> ## Documentation Index
> Fetch the complete documentation index at: https://docs.tilebox.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Async support

> The Tilebox Python SDK offers full async support with asyncio, allowing you to run data operations in parallel for significantly better performance.

## Why use async?

When working with external datasets, such as [Tilebox datasets](/datasets/concepts/datasets), loading data may take some time. To speed up this process, you can run requests in parallel. While you can use multi-threading or multi-processing, which can be complex, often times a simpler option is to perform data loading tasks asynchronously using coroutines and `asyncio`.

## Switching to an async datasets client

To switch to the async client, change the import statement for the `Client`. The example below illustrates this change.

<CodeGroup>
  ```python Python (Sync) theme={"system"}
  from tilebox.datasets import Client

  # This client is synchronous
  client = Client()
  ```

  ```python Python (Async) theme={"system"}
  from tilebox.datasets.aio import Client

  # This client is asynchronous
  client = Client()
  ```
</CodeGroup>

After switching to the async client, use `await` for operations that interact with the Tilebox API.

<CodeGroup>
  ```python Python (Sync) theme={"system"}
  # Listing datasets
  datasets = client.datasets()

  # Listing collections
  dataset = datasets.open_data.copernicus.sentinel1_sar
  collections = dataset.collections()

  # Collection information
  collection = collections["S1A_IW_RAW__0S"]
  info = collection.info()
  print(f"Data for My-collection is available for {info.availability}")

  # Loading data
  data = collection.query(temporal_extent=("2022-05-01", "2022-06-01"), show_progress=True)

  # Finding a specific datapoint
  datapoint_uuid = "01910b3c-8552-7671-3345-b902cc0813f3"
  datapoint = collection.find(datapoint_uuid)
  ```

  ```python Python (Async) theme={"system"}
  # Listing datasets
  datasets = await client.datasets()

  # Listing collections
  dataset = datasets.open_data.copernicus.sentinel1_sar
  collections = await dataset.collections()

  # Collection information
  collection = collections["S1A_IW_RAW__0S"]
  info = await collection.info()
  print(f"Data for My-collection is available for {info.availability}")

  # Loading data
  data = await collection.query(temporal_extent=("2022-05-01", "2022-06-01"), show_progress=True)

  # Finding a specific datapoint
  datapoint_uuid = "01910b3c-8552-7671-3345-b902cc0813f3"
  datapoint = await collection.find(datapoint_uuid)
  ```
</CodeGroup>

<Note>
  Jupyter notebooks and similar interactive environments support asynchronous code execution. You can use
  `await some_async_call()` as the output of a code cell.
</Note>

## Fetching data concurrently

The primary benefit of the async client is that it allows concurrent requests, enhancing performance.
In below example, data is fetched from multiple collections. The synchronous approach retrieves data sequentially, while the async approach does so concurrently, resulting in faster execution.

<CodeGroup>
  ```python Python (Sync) theme={"system"}
  # Example: fetching data sequentially

  # switch to the async example to compare the differences
  import time
  from tilebox.datasets import Client
  from tilebox.datasets.sync.timeseries import TimeseriesCollection

  client = Client()
  datasets = client.datasets()
  collections = datasets.open_data.copernicus.landsat8_oli_tirs.collections()

  def stats_for_2020(collection: TimeseriesCollection) -> None:
      """Fetch data for 2020 and print the number of data points that were loaded."""
      data = collection.query(temporal_extent=("2020-01-01", "2021-01-01"), show_progress=True)
      n = data.sizes['time'] if 'time' in data else 0
      return (collection.name, n)

  start = time.monotonic()
  results = [stats_for_2020(collections[name]) for name in collections]
  duration = time.monotonic() - start

  for collection_name, n in results:
      print(f"There are {n} datapoints in {collection_name} for 2020.")
  print(f"Fetching data took {duration:.2f} seconds")
  ```

  ```python Python (Async) theme={"system"}
  # Example: fetching data concurrently

  import asyncio
  import time
  from tilebox.datasets.aio import Client
  from tilebox.datasets.aio.timeseries import TimeseriesCollection

  client = Client()
  datasets = await client.datasets()
  collections = await datasets.open_data.copernicus.landsat8_oli_tirs.collections()

  async def stats_for_2020(collection: TimeseriesCollection) -> None:
      """Fetch data for 2020 and print the number of data points that were loaded."""
      data = await collection.query(temporal_extent=("2020-01-01", "2021-01-01"), show_progress=True)
      n = data.sizes['time'] if 'time' in data else 0
      return (collection.name, n)

  start = time.monotonic()

  # Initiate all requests concurrently
  requests = [stats_for_2020(collections[name]) for name in collections]
  # Wait for all requests to finish in parallel
  results = await asyncio.gather(*requests)

  duration = time.monotonic() - start

  for collection_name, n in results:
      print(f"There are {n} datapoints in {collection_name} for 2020.")
  print(f"Fetching data took {duration:.2f} seconds")
  ```
</CodeGroup>

The output demonstrates that the async approach runs approximately 30% faster for this example. With `show_progress` enabled, the progress bars update concurrently.

<CodeGroup>
  ```plaintext Python (Sync) theme={"system"}
  There are 19624 datapoints in L1GT for 2020.
  There are 1281 datapoints in L1T for 2020.
  There are 65313 datapoints in L1TP for 2020.
  There are 25375 datapoints in L2SP for 2020.
  Fetching data took 10.92 seconds
  ```

  ```plaintext Python (Async) theme={"system"}
  There are 19624 datapoints in L1GT for 2020.
  There are 1281 datapoints in L1T for 2020.
  There are 65313 datapoints in L1TP for 2020.
  There are 25375 datapoints in L2SP for 2020.
  Fetching data took 7.45 seconds
  ```
</CodeGroup>

## Async workflows

The Tilebox workflows Python client does not have an async client. This is because workflows are designed for distributed and concurrent execution outside a single async event loop. But within a single task, you may use still use`async` code to take advantage of asynchronous execution, such as parallel data loading. You can achieve this by wrapping your async code in `asyncio.run`.

Below is an example of using async code within a workflow task.

<CodeGroup>
  ```python Python (Async) theme={"system"}
  import asyncio
  import xarray as xr

  from tilebox.datasets.aio import Client as DatasetsClient
  from tilebox.datasets.query import TimeIntervalLike
  from tilebox.workflows import Task, ExecutionContext

  class FetchData(Task):
      def execute(self, context: ExecutionContext) -> None:
          # The task execution itself is synchronous
          # But we can leverage async code within the task using asyncio.run

          # This will fetch three months of data in parallel
          data_jan, data_feb, data_mar = asyncio.run(load_first_three_months())
          
  async def load_data(interval: TimeIntervalLike):
      datasets = await DatasetsClient().datasets()
      collections = await datasets.open_data.copernicus.landsat8_oli_tirs.collections()
      return await collections["L1T"].query(temporal_extent=interval)

  async def load_first_three_months() -> tuple[xr.Dataset, xr.Dataset, xr.Dataset]:
      jan = load_data(("2020-01-01", "2020-02-01"))
      feb = load_data(("2020-02-01", "2020-03-01"))
      mar = load_data(("2020-03-01", "2020-04-01"))
      # load the three months in parallel
      jan, feb, mar = await asyncio.gather(jan, feb, mar)
      return jan, feb, mar
  ```
</CodeGroup>

<Tip>
  If you encounter an error like `RuntimeError: asyncio.run() cannot be called from a running event loop`, it means you're trying to start another asyncio event loop (with `asyncio.run`) from within an existing one. This often happens in Jupyter notebooks since they automatically start an event loop. A way to resolve this is by using [nest-asyncio](https://pypi.org/project/nest-asyncio/).
</Tip>
