Data Workflows

class cartoframes.data.Dataset(table_name=None, schema=None, query=None, df=None, gdf=None, state=None, is_saved_in_carto=False, context=None)

Generic data class for cartoframes data operations. A Dataset instance can be created from a dataframe, geodataframe, a table hosted on a CARTO account, an arbitrary query against a CARTO account, or a local or hosted GeoJSON source. If the data is hosted, it can be retrieved as a pandas DataFrame. If the data is local or defined by a query, a new table can be created in a CARTO account from the Dataset instance.

The recommended way to work with this class is by using the class methods from_table(), from_query(), from_dataframe(), from_geodataframe(), or from_geojson(). Direct use of the Dataset constructor should be avoided.

classmethod from_table(table_name, context=None, schema=None)

Create a Dataset from a table hosted on CARTO.

Parameters:
  • table_name (str) – Name of table on CARTO account associated with context.
  • context (Context, optional) – Context that table_name is associated with. If set_default_context was called previously, this value is filled in implicitly.
  • schema (str, optional) – Name of user in organization (multi-user account) who shared table_name. This option only works with multi-user accounts.

Example

from cartoframes.auth import set_default_context
from cartoframes.data import Dataset

set_default_context('https://cartoframes.carto.com')

d = Dataset.from_table('us_counties_population')

# download into a dataframe
df = d.download()

classmethod from_query(query, context=None)

Create a Dataset from an arbitrary query of data hosted on CARTO.

Parameters:
  • query (str) – SQL query to run against the CARTO account associated with context.
  • context (Context, optional) – Context that query is associated with. If set_default_context was called previously, this value is filled in implicitly.

Example

from cartoframes.auth import set_default_context
from cartoframes.data import Dataset
from cartoframes.viz import Map
from cartoframes.viz.helpers import color_continuous_layer

set_default_context('https://cartoframes.carto.com')

d = Dataset.from_query('''
    SELECT
      CDB_LatLng(pickup_latitude, pickup_longitude) as the_geom,
      ST_Transform(CDB_LatLng(pickup_latitude, pickup_longitude), 3857) as the_geom_webmercator,
      cartodb_id,
      fare_amount
    FROM
      taxi_50k
    ''')

# show dataset on a map
Map(color_continuous_layer(d, 'fare_amount'))

classmethod from_dataframe(df)

Create a Dataset from a local pandas DataFrame.

Parameters: df (pandas.DataFrame) – pandas DataFrame

Example

Create a Dataset from a pandas DataFrame and then map the data.

from cartoframes.data import Dataset
from cartoframes.viz import Map, Layer
import pandas as pd

df = pd.DataFrame({'lat': [0, 10, 20], 'lng': [20, 10, 0]})

d = Dataset.from_dataframe(df)

Map(Layer(d))

classmethod from_geodataframe(gdf)

Create a Dataset from a local GeoPandas GeoDataFrame.

Parameters: gdf (geopandas.GeoDataFrame) – GeoPandas GeoDataFrame

Example

GeoDataFrame example code taken from GeoPandas documentation.

from cartoframes.data import Dataset
from cartoframes.viz import Map, Layer
import pandas as pd
import geopandas as gpd

df = pd.DataFrame(
    {'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
     'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
     'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
     'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.Longitude, df.Latitude)
)

d = Dataset.from_geodataframe(gdf)

Map(Layer(d))

classmethod from_geojson(geojson)

Create a Dataset from a GeoJSON file (hosted or local).

Parameters: geojson (str) – GeoJSON source, such as the URL of a hosted file or the path to a local one.

Example

Create a Dataset from a hosted GeoJSON source and then map the data.

from cartoframes.data import Dataset
from cartoframes.viz import Map, Layer

geojson_source = 'https://cartoframes.carto.com/api/v2/sql?q=select+*+from+nyc_census_tracts&format=geojson'

d = Dataset.from_geojson(geojson_source)

Map(Layer(d))
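
The hosted source in the example above is a CARTO SQL API URL that requests GeoJSON output. As a minimal sketch, a URL of that shape can be assembled with the standard library instead of by hand. The helper name sql_api_geojson_url is hypothetical and not part of cartoframes, and the endpoint layout is copied from the example URL rather than from any other source.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sql_api_geojson_url(base_url, query):
    """Hypothetical helper: build a SQL API URL that returns GeoJSON."""
    # urlencode percent-encodes the SQL text so it is safe inside a URL
    params = urlencode({'q': query, 'format': 'geojson'})
    return '{}/api/v2/sql?{}'.format(base_url.rstrip('/'), params)


geojson_source = sql_api_geojson_url(
    'https://cartoframes.carto.com',
    'select * from nyc_census_tracts'
)

# decoding the query string recovers the original SQL text
decoded = parse_qs(urlsplit(geojson_source).query)
print(decoded['q'][0])
```

The resulting string can be passed to Dataset.from_geojson() in the same way as the literal URL in the example.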

dataframe

Dataset DataFrame

geodataframe

Dataset GeoDataFrame

table_name

Dataset table name

schema

Dataset schema

query

Dataset query

context

Dataset Context

is_saved_in_carto

Whether the Dataset is saved in a CARTO account

dataset_info

DatasetInfo associated with Dataset instance

Note

This property only works for Datasets created from tables.

Example

from cartoframes.auth import set_default_context
from cartoframes.data import Dataset

set_default_context(
    base_url='https://your_user_name.carto.com/',
    api_key='your api key'
)

d = Dataset.from_table('tablename')
d.dataset_info

update_dataset_info(privacy=None, name=None)

Update/change Dataset privacy and name

Parameters:
  • privacy (str, optional) – One of DatasetInfo.PRIVATE, DatasetInfo.PUBLIC, or DatasetInfo.LINK
  • name (str, optional) – Name of the dataset on CARTO.

Example

from cartoframes.data import Dataset
from cartoframes.auth import set_default_context

set_default_context(
    base_url='https://your_user_name.carto.com/',
    api_key='your api key'
)

d = Dataset.from_table('tablename')
d.update_dataset_info(privacy='link')

upload(with_lnglat=None, if_exists='fail', table_name=None, schema=None, context=None)

Upload Dataset to CARTO account associated with context.

Parameters:
  • with_lnglat (tuple, optional) – Two columns that have the longitude and latitude information. If used, a point geometry will be created upon upload to CARTO. Example input: (‘long’, ‘lat’). Defaults to None.
  • if_exists (str, optional) – Behavior for adding data from Dataset. Options are ‘fail’, ‘replace’, or ‘append’. Defaults to ‘fail’, which means that the Dataset instance will not overwrite a table of the same name if it exists. If the table does not exist, it will be created.
  • table_name (str) – Desired table name for the dataset on CARTO. If the name does not conform to SQL naming conventions, it will be ‘normalized’ (e.g., converted to lower case, with _ in place of spaces and other special characters).
  • context (Context, optional) – Context of user account to send Dataset to. If not provided, the default context (if set with set_default_context) is used.
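
The if_exists options can be read as a small decision table. The sketch below is an illustration of the behavior described in the parameter list, not the library's implementation: the function name resolve_upload_action is hypothetical, and raising an exception for the ‘fail’ case is an assumption.

```python
def resolve_upload_action(table_exists, if_exists='fail'):
    """Hypothetical decision table for the if_exists options."""
    if if_exists not in ('fail', 'replace', 'append'):
        raise ValueError("if_exists must be 'fail', 'replace', or 'append'")
    if not table_exists:
        # a missing table is created regardless of the option
        return 'create'
    if if_exists == 'fail':
        # assumption: refusing to overwrite is signalled with an exception
        raise RuntimeError('table already exists; nothing was overwritten')
    # 'replace' and 'append' apply only when the table is already there
    return if_exists


print(resolve_upload_action(table_exists=False))
print(resolve_upload_action(table_exists=True, if_exists='append'))
```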

Example

Send a pandas DataFrame to CARTO.

from cartoframes.auth import set_default_context
from cartoframes.data import Dataset
import pandas as pd

set_default_context(
    base_url='https://your_user_name.carto.com',
    api_key='your api key'
)

df = pd.DataFrame({
    'lat': [40, 45, 50],
    'lng': [-80, -85, -90]
})
d = Dataset.from_dataframe(df)
d.upload(with_lnglat=('lng', 'lat'), table_name='sample_table')
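
To give a feel for the table name normalization mentioned in the parameter list, here is a rough approximation. It is a sketch under stated assumptions: normalize_table_name is a hypothetical function, and the exact rules CARTO applies may differ (for example, around leading digits or length limits).

```python
import re


def normalize_table_name(name):
    """Hypothetical approximation of table-name normalization."""
    lowered = name.strip().lower()
    # collapse each run of characters outside a-z, 0-9, and _ into one underscore
    underscored = re.sub(r'[^a-z0-9_]+', '_', lowered)
    # drop underscores left over at either end
    return underscored.strip('_')


print(normalize_table_name('Sample Table'))  # sample_table
```

Choosing a name that is already lower case with underscores, as in the example above, avoids any surprise about the final table name.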

download(limit=None, decode_geom=False, retry_times=3)

Download / read a Dataset (table or query) from CARTO account associated with the Dataset’s instance of Context.

Parameters:
  • limit (int, optional) – The number of rows of the Dataset to download. Default is to download all rows. This value must be >= 0.
  • decode_geom (bool, optional) – Decode Dataset geometries into Shapely geometries from EWKB encoding.
  • retry_times (int, optional) – Number of times to retry the download in case it fails. Default is Dataset.DEFAULT_RETRY_TIMES.
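
The retry behavior can be pictured as a simple loop around the download call. This is only a sketch: call_with_retries and flaky_fetch are hypothetical names, and it assumes retry_times counts the extra attempts made after the first failure, which the text above does not state.

```python
def call_with_retries(fetch, retry_times=3):
    """Hypothetical retry loop: one initial attempt plus retry_times retries."""
    last_error = None
    for _ in range(retry_times + 1):
        try:
            return fetch()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
    # every attempt failed, so surface the most recent error
    raise last_error


attempts = []


def flaky_fetch():
    # fails twice, then succeeds, to exercise the retry path
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError('temporary failure')
    return 'rows'


result = call_with_retries(flaky_fetch, retry_times=3)
print(result, len(attempts))
```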

Example

from cartoframes.data import Dataset
from cartoframes.auth import set_default_context

# use cartoframes example account
set_default_context('https://cartoframes.carto.com')

d = Dataset.from_table('brooklyn_poverty')

df = d.download(decode_geom=True)
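
To show what the EWKB encoding mentioned for decode_geom contains, the sketch below decodes a single 2D point using only the standard library. It is not how cartoframes does it (the parameter description says geometries are decoded into Shapely objects); decode_ewkb_point is a hypothetical helper, and the assumption that the value arrives as a hex string is mine.

```python
import struct

SRID_FLAG = 0x20000000  # EWKB marks an embedded SRID with this type-flag bit


def decode_ewkb_point(hex_ewkb):
    """Hypothetical decoder for a hex-encoded 2D EWKB point."""
    raw = bytes.fromhex(hex_ewkb)
    # the first byte gives the byte order: 1 is little-endian, 0 is big-endian
    endian = '<' if raw[0] == 1 else '>'
    (geom_type,) = struct.unpack_from(endian + 'I', raw, 1)
    offset = 5
    srid = None
    if geom_type & SRID_FLAG:
        # when the flag is set, a 4-byte SRID follows the type field
        (srid,) = struct.unpack_from(endian + 'I', raw, offset)
        offset += 4
    if (geom_type & ~SRID_FLAG) != 1:
        raise ValueError('this sketch only handles 2D points')
    x, y = struct.unpack_from(endian + 'dd', raw, offset)
    return srid, x, y


# build a sample value: little-endian point with SRID 4326
sample = struct.pack('<BIIdd', 1, 1 | SRID_FLAG, 4326, -73.95, 40.65).hex()
print(decode_ewkb_point(sample))
```

With decode_geom=True this step is unnecessary, since the downloaded DataFrame already holds decoded geometries.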

delete()

Delete table on CARTO account associated with a Dataset instance

Example

from cartoframes.data import Dataset
from cartoframes.auth import set_default_context

set_default_context(
    base_url='https://your_user_name.carto.com',
    api_key='your api key'
)

d = Dataset.from_table('table_name')
d.delete()

Returns: True if deletion is successful, False otherwise.
Return type: bool

exists()

Checks to see if table exists

is_public()

Checks to see if table or table used by query has public privacy

get_table_columns()

Get column names and types from a table or query result

get_table_column_names(exclude=None)

Get column names from a table

compute_geom_type()

Compute the geometry type from the data
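
One way to picture computing a geometry type from data is to inspect GeoJSON-style geometries and report the family of the first usable one. This is a hedged sketch, not the library's logic: the function geom_family, the grouping into ‘point’, ‘line’, and ‘polygon’, and the choice to stop at the first match are all assumptions.

```python
def geom_family(geometries):
    """Hypothetical sketch: report the family of the first usable geometry."""
    families = {
        'Point': 'point', 'MultiPoint': 'point',
        'LineString': 'line', 'MultiLineString': 'line',
        'Polygon': 'polygon', 'MultiPolygon': 'polygon',
    }
    for geometry in geometries:
        if geometry is None:
            # rows without a geometry say nothing about the type
            continue
        family = families.get(geometry.get('type'))
        if family is not None:
            return family
    return None


rows = [None, {'type': 'MultiPolygon', 'coordinates': []}]
print(geom_family(rows))  # polygon
```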