zcollection.collection.Collection.map_overlap

Collection.map_overlap(func, /, *args, delayed=True, depth=1, filters=None, partition_size=None, npartition=None, selected_variables=None, **kwargs)

Map a function over the partitions of the collection with some overlap.

Parameters:
  • func (PartitionCallable) – The function to apply to every partition of the collection. If func accepts a partition_info keyword argument, it receives a tuple containing the name of the partitioned dimension and the slice that selects, within the dataset passed to func, the current partition without the overlap.

  • *args – The positional arguments to pass to the function.

  • delayed (bool) – Whether to load the data lazily. Defaults to True.

  • depth (int) – The depth of the overlap between the partitions. Defaults to 1.

  • filters (str | Callable[[Dict[str, int]], bool] | None) – The predicate used to select the partitions to process. For details on the predicate, see the documentation of the partitions() method.

  • partition_size (int | None) – The length of each bag partition.

  • npartition (int | None) – The number of desired bag partitions.

  • selected_variables (Sequence[str] | None) – A list of variables to retain from the collection. If None, all variables are kept.

  • **kwargs – The keyword arguments to pass to the function.

Returns:

A bag in which each item is a tuple of the partition scheme and the result of the function for that partition.

Return type:

Bag

Example

>>> futures = collection.map_overlap(
...     lambda x: (x["var1"] + x["var2"]).values,
...     depth=1)
>>> for item in futures:
...     print(item)
[1.0, 2.0, 3.0, 4.0]
[5.0, 6.0, 7.0, 8.0]
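
The idea behind partition_info can be illustrated with a small, library-free sketch. It is a toy model, not zcollection's implementation: partitions are plain lists, the helper map_overlap_sketch and the callback window_sum_and_core are hypothetical names, and depth is assumed to count the neighbouring partitions included on each side. For each partition, the callback receives the widened window together with the slice that locates the current partition inside that window.

```python
from typing import Callable, List, Sequence, Tuple


def map_overlap_sketch(
    partitions: Sequence[List[float]],
    func: Callable[[List[float], slice], object],
    depth: int = 1,
) -> List[Tuple[int, object]]:
    """Toy overlapping map over in-memory lists (hypothetical helper)."""
    results = []
    for index in range(len(partitions)):
        # Assumption: `depth` counts neighbouring partitions on each side,
        # clipped at the first and last partition.
        first = max(0, index - depth)
        last = min(len(partitions), index + depth + 1)
        window = [value for part in partitions[first:last] for value in part]
        # Slice that selects the current partition, without the overlap,
        # inside the widened window.
        start = sum(len(part) for part in partitions[first:index])
        selected = slice(start, start + len(partitions[index]))
        results.append((index, func(window, selected)))
    return results


def window_sum_and_core(window: List[float], selected: slice):
    # Uses the whole window for the computation, then reports which
    # values belong to the current partition.
    return sum(window), window[selected]


results = map_overlap_sketch([[1, 2], [3, 4], [5, 6]], window_sum_and_core)
for item in results:
    print(item)
```

In this sketch the middle partition sees all six values, while the first and last partitions see only four, because no neighbour exists beyond the ends.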