Why async?

When interacting with external datasets, such as Tilebox datasets, loading data can often take a little while. One way to speed this up is to run requests in parallel. This can be achieved with multi-threading or multi-processing, but that is not always the easiest approach. An alternative is to load data asynchronously, using coroutines and asyncio.

Switching to an async datasets client

Typically, all you need to do is swap out the import statement of the Client and you’re good to go. Check out the example below to see how that is done.

Once you have switched to the async client, you can use the async and await keywords to make your code asynchronous. Check out the examples below to see how that works.
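As a minimal sketch of the async and await pattern, the snippet below uses only the standard library. The load_collection coroutine and the collection name are hypothetical stand-ins for a real data-loading call; they are not part of the Tilebox client.

```python
import asyncio


async def load_collection(name: str) -> list[str]:
    """Hypothetical stand-in for an async data-loading call."""
    await asyncio.sleep(0.05)  # simulates waiting on a network response
    return [f"{name}-datapoint-{i}" for i in range(3)]


async def main() -> list[str]:
    # await suspends main() until load_collection finishes, which frees
    # the event loop to run other coroutines in the meantime.
    return await load_collection("collection-a")


# asyncio.run starts an event loop and runs main() to completion.
datapoints = asyncio.run(main())
print(datapoints)
```

In a script, asyncio.run is the usual entry point. Inside a function that is itself declared with async def, you would await the call directly instead.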

Jupyter notebooks or similar interactive environments also support asynchronous code execution. You can even use await some_async_call() as the output of a code cell.

Benefits

The main benefit of using an async client is that you can run requests concurrently, which improves performance. This is especially useful when you are loading data from different collections. Check out the example below to see how that works.

Example: Fetching data concurrently

The following example fetches data from different collections. In the synchronous example, it fetches the data sequentially, whereas in the async example it fetches the data concurrently. This means that the async approach is faster for such use cases.

The output is shown below. As you can see, the async approach is 5 seconds faster. If you have show_progress enabled, the progress bars are updated concurrently. In this example the second collection contains less data than the first one, so it finishes first.
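The difference between sequential and concurrent loading can also be illustrated without any network access. The sketch below simulates two collections with asyncio.sleep; the collection names and load times are made up for illustration, so the timings it prints are unrelated to the ones quoted above.

```python
import asyncio
import time

# Hypothetical collections with simulated load times in seconds.
SIMULATED_LOAD_TIMES = {"collection-a": 0.2, "collection-b": 0.1}


async def load_collection(name: str) -> str:
    """Stand-in for an async request; sleeps instead of fetching data."""
    await asyncio.sleep(SIMULATED_LOAD_TIMES[name])
    return f"data from {name}"


async def load_sequentially() -> list[str]:
    # Each request starts only after the previous one has finished.
    return [await load_collection(name) for name in SIMULATED_LOAD_TIMES]


async def load_concurrently() -> list[str]:
    # asyncio.gather schedules all requests at once and returns the
    # results in the order the coroutines were passed in.
    return await asyncio.gather(
        *(load_collection(name) for name in SIMULATED_LOAD_TIMES)
    )


def timed(coroutine_function):
    """Run a coroutine function and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coroutine_function())
    return result, time.perf_counter() - start


sequential_result, sequential_seconds = timed(load_sequentially)
concurrent_result, concurrent_seconds = timed(load_concurrently)
print(f"sequential: {sequential_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

The sequential run takes roughly the sum of the simulated load times, while the concurrent run takes roughly as long as the slowest request.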

Async workflows

The Tilebox workflows Python client doesn’t offer an async client. That’s because workflows are already designed to be executed in a distributed and concurrent fashion, outside the context of a single async event loop. Within a single task execution, however, you may still want to use async code to benefit from async execution, for example to load data in parallel. You can achieve this by wrapping your async code in asyncio.run.

Below is an example of how you can leverage async code within a workflow task.
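The general shape of this pattern, a synchronous entry point that hands off to async code through asyncio.run, might look like the following sketch. LoadDataTask is a hypothetical plain class used for illustration, not the Tilebox Task base class, and load_collection simulates a request with asyncio.sleep.

```python
import asyncio


async def load_collection(name: str) -> str:
    """Stand-in for an async data-loading call."""
    await asyncio.sleep(0.05)
    return f"data from {name}"


async def load_all(names: list[str]) -> list[str]:
    # Load all collections concurrently within one event loop.
    return list(await asyncio.gather(*(load_collection(n) for n in names)))


class LoadDataTask:
    """Hypothetical stand-in for a workflow task with a sync execute method."""

    def execute(self) -> list[str]:
        # asyncio.run creates an event loop, runs the coroutine to
        # completion, and closes the loop before returning the result.
        return asyncio.run(load_all(["collection-a", "collection-b"]))


result = LoadDataTask().execute()
print(result)
```

Because execute itself stays synchronous, the code calling the task does not need to know that an event loop is used inside it.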

If you encounter an error like RuntimeError: asyncio.run() cannot be called from a running event loop, it means you are trying to start another asyncio event loop (with asyncio.run) from within an already running event loop. One situation where this can easily occur is if you are using asyncio.run in a Jupyter notebook, since Jupyter automatically starts an event loop for you. One way to work around this is to use nest-asyncio.
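The error can be reproduced with the standard library alone. In the sketch below, outer already runs inside an event loop, so a nested asyncio.run call fails, whereas awaiting the coroutine directly works. The function names are illustrative only.

```python
import asyncio


async def inner() -> int:
    return 42


async def outer() -> tuple[str, int]:
    # outer() is executed by an event loop, so a nested asyncio.run fails.
    coroutine = inner()
    try:
        asyncio.run(coroutine)
        message = "no error"
    except RuntimeError as error:
        coroutine.close()  # avoid a "never awaited" warning for the unused coroutine
        message = str(error)

    # Inside a running event loop, await the coroutine directly instead.
    value = await inner()
    return message, value


message, value = asyncio.run(outer())
print(message)
print(value)
```

The same reasoning applies in a notebook cell: since an event loop is already running there, awaiting the coroutine directly avoids the nested asyncio.run call.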