Ingesting from common file formats
Learn how to ingest data from common file formats into Tilebox
Using xarray and pandas, you can also easily ingest existing datasets stored in common file formats such as CSV, Parquet, Feather, and more.
CSV
Comma-separated values (CSV) is a common file format for tabular data. It’s widely used in data science. Tilebox supports CSV ingestion using the pandas.read_csv function.
Assume you have a CSV file named data.csv. If you want to follow along, you can download the file here.
This data already conforms to the schema of the MyCustomDataset dataset, except for some_unwanted_column, which you want to drop before you ingest it.
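A minimal sketch of the CSV workflow described above: read the file with pandas.read_csv, drop some_unwanted_column, and hand the resulting DataFrame to a collection. The helper name ingest_csv is hypothetical, and the code assumes that the collection object exposes an ingest method accepting a DataFrame, as a Tilebox collection is expected to. The dataset slug and collection name in the commented usage are placeholders.

```python
import pandas as pd


def ingest_csv(collection, path: str = "data.csv"):
    """Read a CSV file, drop the column outside the schema, and ingest the rest."""
    data = pd.read_csv(path)
    # some_unwanted_column is not part of the MyCustomDataset schema
    data = data.drop(columns=["some_unwanted_column"])
    # Assumption: `collection` exposes ingest(DataFrame), as a Tilebox collection does
    return collection.ingest(data)


# Hypothetical usage with the Tilebox Python client; the dataset slug and
# collection name below are placeholders, and valid credentials are required:
#
# from tilebox.datasets import Client
# client = Client()
# dataset = client.dataset("my_org.my_custom_dataset")
# collection = dataset.get_or_create_collection("Measurements")
# ingest_csv(collection, "data.csv")
```

Passing the collection in as an argument keeps the file handling separate from the client setup, so the same helper works with any collection you have already created.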
Parquet
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. Tilebox supports Parquet ingestion using the pandas.read_parquet function.
The Parquet file used in this example is available here.
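As a rough sketch, Parquet ingestion could look like the following. The helper name ingest_parquet is hypothetical, and the code assumes a collection object whose ingest method accepts a DataFrame, as a Tilebox collection is expected to. Note that pandas.read_parquet requires a Parquet engine such as pyarrow to be installed.

```python
import pandas as pd


def ingest_parquet(collection, path: str):
    """Read a Parquet file into a DataFrame and pass it to collection.ingest."""
    # pandas.read_parquet needs a Parquet engine such as pyarrow installed
    data = pd.read_parquet(path)
    # Assumption: `collection` exposes ingest(DataFrame), as a Tilebox collection does
    return collection.ingest(data)
```

Because Parquet stores column types alongside the data, no extra parsing step is shown here; whether your file already matches the dataset schema is something to check before ingesting.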
Feather
Feather is a file format originating from the Apache Arrow project, designed for storing tabular data in a fast and memory-efficient way. It’s supported by many programming languages, including Python. Tilebox supports Feather ingestion using the pandas.read_feather function.
The Feather file used in this example is available here.
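A comparable sketch for Feather files follows. The helper name ingest_feather is hypothetical, and the code again assumes a collection object whose ingest method accepts a DataFrame, as a Tilebox collection is expected to. pandas.read_feather relies on pyarrow being installed.

```python
import pandas as pd


def ingest_feather(collection, path: str):
    """Read a Feather file into a DataFrame and pass it to collection.ingest."""
    # pandas.read_feather relies on pyarrow to parse the Arrow-based format
    data = pd.read_feather(path)
    # Assumption: `collection` exposes ingest(DataFrame), as a Tilebox collection does
    return collection.ingest(data)
```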
GeoParquet
Please check out the Ingesting data guide for an example of ingesting a GeoParquet file.