zcollection.collection.Collection.update

Collection.update(func, /, *args, delayed=True, depth=0, filters=None, npartitions=None, selected_variables=None, trim=True, variables=None, **kwargs)

Update the selected partitions.

Parameters:
  • func (UpdateCallable) – The function to apply on each partition.

  • *args – The positional arguments to pass to the function.

  • delayed (bool) – Whether to load the data as Dask arrays.

  • depth (int) – The depth of the overlap between partitions. Default is 0 (no overlap). If depth is greater than 0, the function is applied to the partition together with the neighbors selected by the depth. If func accepts partition_info as a keyword argument, it is passed a tuple containing the name of the partitioned dimension and the slice that selects the target partition within the dataset.

  • filters (str | Callable[[Dict[str, int]], bool] | None) – The expression used to filter the partitions to update.

  • npartitions (int | None) – The number of partitions to update in parallel. By default, it is equal to the number of Dask workers available when calling this method.

  • selected_variables (list[str] | None) – A list of variables to load from the collection. If None, all variables are loaded.

  • trim (bool) – Whether to trim depth items from each partition after calling func. Set it to False if your function does this for you.

  • variables (Sequence[str] | None) – The list of variables updated by the function. If None, the variables are inferred by calling the function on the first partition. In this case, the function must be safe to call twice on the same partition without side effects. Default is None.

  • **kwargs – The keyword arguments to pass to the function.
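
The callable form of filters takes a dictionary of partition keys and returns a boolean. The sketch below uses no library and only illustrates the shape of such a predicate. The key names "year" and "month" and the function name january_2000 are assumptions for illustration; the actual keys depend on how the collection is partitioned.

```python
from typing import Dict


def january_2000(keys: Dict[str, int]) -> bool:
    """Hypothetical predicate: keep only the partitions for January 2000.

    Assumes partition keys named "year" and "month"; the real key names
    depend on the partitioning scheme of the collection.
    """
    return keys.get("year") == 2000 and keys.get("month") == 1


# Stand-in partition keys, used here only to exercise the predicate.
partitions = [
    {"year": 2000, "month": 1},
    {"year": 2000, "month": 2},
    {"year": 2001, "month": 1},
]
selected = [keys for keys in partitions if january_2000(keys)]
```

With a collection partitioned this way, such a predicate would presumably be passed as collection.update(func, filters=january_2000).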

Raises:

ValueError – If the variables to update are not in the collection.

Return type:

None

Example

>>> import zcollection
>>> def ones(ds):
...     # Build var2 with the same shape as var1, filled with ones.
...     return dict(var2=ds.variables["var1"].values * 0 + 1)
>>> # Open an existing collection in write mode.
>>> collection = zcollection.open_collection("my_collection", mode="w")
>>> collection.update(ones)
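
The interaction between depth, partition_info and trim can be hard to picture, so here is a library-free sketch of the idea. It is not zcollection code: plain lists stand in for the dataset, the dimension name "time" and the function smooth are hypothetical, and one neighboring partition on each side stands in for an overlap of depth 1. The function receives the target partition together with its neighbors and uses the slice from partition_info to return only the target partition, which corresponds to the case where the function does the trimming itself.

```python
from typing import Dict, List, Tuple


def smooth(values: List[float],
           partition_info: Tuple[str, slice]) -> Dict[str, List[float]]:
    """3-point centered moving average over the values it receives.

    Only the items selected by the slice in partition_info are returned,
    so the overlap taken from the neighbors is dropped by the function.
    """
    _dim, selected = partition_info
    n = len(values)
    averaged = []
    for i in range(n):
        # The window shrinks at the edges of the data that was passed in.
        window = values[max(i - 1, 0):min(i + 2, n)]
        averaged.append(sum(window) / len(window))
    return {"smoothed": averaged[selected]}


# Stand-in data: the target partition and one neighbor on each side.
left, center, right = [1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0]
combined = left + center + right
info = ("time", slice(len(left), len(left) + len(center)))

with_overlap = smooth(combined, partition_info=info)["smoothed"]
without_overlap = smooth(center, ("time", slice(None)))["smoothed"]
```

In this sketch the overlapped call yields values that are unaffected by the partition boundary, whereas the call on the isolated partition shows edge effects in its first and last items.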