...

The CDS Toolbox offers a broad variety of tools for retrieving, processing and visualising datasets from the C3S Climate Data Store. The cdstoolbox Python package contains everything you need to explore datasets and develop applications in the Toolbox editor. The package (usually imported as ct) draws on several widely used Python libraries, including xarray, numpy, scipy and pandas; if you're familiar with these libraries, you'll find much of the same functionality within cdstoolbox.

Why cdstoolbox?

One major advantage of using the Toolbox over processing data "offline" is that your code, or workflow, runs entirely within the CDS infrastructure. This means you do not need to download large volumes of data or own a powerful computer to work with CDS data: you can use our computers instead!

...

This is all you need to know, but read on to learn how the cdstoolbox package achieves remote data processing and caching of results.

What are remote objects?

When you retrieve data from the CDS catalogue in a Toolbox workflow, the result is returned as a remote object, which is simply a pointer to a data file stored (cached) on the CDS. Remote objects can be printed within a workflow to get an xarray summary of the underlying data:

...

This is a great way to get a quick view of the data you're working with. In reality, though, a remote object is not an xarray DataArray but an object containing all the information the Toolbox needs to find, retrieve and operate on the data. Because the data isn't present within the object itself, remote objects cannot be treated as Python arrays; instead, we need to use the tools and services within the cdstoolbox namespace, which understand how to access and process the data.
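The idea of a pointer that prints a summary but holds no data can be illustrated with a toy model. This is only a sketch in plain Python, not the cdstoolbox implementation: the class ToyRemote, the SERVER_FILES store, the function toy_mean and the example.invalid URL are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRemote:
    """Hypothetical stand-in for a remote object: a pointer, not the data."""
    location: str  # where the cached file lives (illustrative URL)
    summary: str   # short description shown when the object is printed

    def __str__(self):
        return self.summary


# Hypothetical server-side store, keyed by file location.
SERVER_FILES = {"https://example.invalid/cache/t2m.nc": [280.0, 282.5, 285.0]}


def toy_mean(remote):
    """A toy 'tool' that resolves the pointer and processes the data."""
    values = SERVER_FILES[remote.location]
    return sum(values) / len(values)


remote = ToyRemote("https://example.invalid/cache/t2m.nc", "<toy array: 3 values>")
print(remote)            # prints the summary only
print(toy_mean(remote))  # the tool, not the object, touches the data
```

In this toy model the object itself has no length, indexing or arithmetic; only a function that knows how to resolve the location can work with the values, which mirrors why remote objects must be passed to cdstoolbox tools and services.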

How do CDS tools and services work?

The CDS tools and services are functions designed to work with remote objects. They take dictionaries as input, which provide parameters and/or file locations, and return a dictionary containing a "resultlocation": the URL of the file produced by the service, whether that is a netCDF, JSON, PNG or any other file type.

...

This means that all of the 'heavy lifting' takes place on powerful CDS compute nodes, and every result produced is cached on the CDS.
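The dictionary-in, dictionary-out pattern described above can be sketched as follows. This is a toy illustration under stated assumptions, not the real service interface: the function toy_service, the request keys "operation" and "input", and the example.invalid URLs are hypothetical; only the "resultlocation" key comes from the description above.

```python
def toy_service(request):
    """Hypothetical service: takes a dict of parameters and file locations,
    returns a dict whose "resultlocation" is the URL of the output file."""
    # Pretend the heavy processing ran on a compute node and the output
    # was written to a cached file; only its URL is sent back.
    source_name = request["input"].rsplit("/", 1)[-1]
    result_name = request["operation"] + "_" + source_name
    return {"resultlocation": "https://example.invalid/cache/" + result_name}


request = {"operation": "mean", "input": "https://example.invalid/cache/t2m.nc"}
response = toy_service(request)
print(response["resultlocation"])

# Services can be chained by passing one result location to the next call,
# so no data ever has to travel to the caller.
followup = toy_service({"operation": "plot", "input": response["resultlocation"]})
print(followup["resultlocation"])
```

Because each call exchanges only small dictionaries, the caller never holds the data itself, just the location of each intermediate result.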

Caching

Once a cdstoolbox function has been executed, the result is stored in the CDS cache. This means that the next time that service is run with the exact same inputs, the result is retrieved instantly from the cache and the data doesn't need to be processed again.
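A minimal sketch of this kind of input-keyed caching is shown below. How the CDS actually builds its cache keys is not described here, so hashing the service name together with its inputs is an assumption; cached_call, slow_square, CACHE and CALLS are hypothetical names used only for illustration.

```python
import hashlib
import json

CACHE = {}            # maps a key derived from the inputs to a stored result
CALLS = {"count": 0}  # counts how often real processing happens


def cached_call(name, func, **kwargs):
    """Run func(**kwargs) only if this exact request has not been seen before."""
    # Assumption: the key is a hash of the service name plus its sorted inputs.
    key_source = json.dumps({"service": name, "inputs": kwargs}, sort_keys=True)
    key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
    if key not in CACHE:
        CALLS["count"] += 1
        CACHE[key] = func(**kwargs)
    return CACHE[key]


def slow_square(x):
    """Stand-in for an expensive processing step."""
    return x * x


first = cached_call("square", slow_square, x=4)   # processed
second = cached_call("square", slow_square, x=4)  # served from the cache
third = cached_call("square", slow_square, x=5)   # new inputs, processed again
print(first, second, third, CALLS["count"])
```

Note that any change to the inputs produces a different key, which is why only a request with exactly the same inputs can be served from the cache.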

...