zcollection.collection.Collection

class zcollection.collection.Collection(axis, ds, partition_handler, partition_base_dir, *, mode=None, filesystem=None, synchronizer=None)

Bases: ReadOnlyCollection

This class manages a collection of files in Zarr format stored in a set of subdirectories. These subdirectories partition the data (by cycle or by date, for example) so that reading, updating, deleting, and adding data only touch the partitions concerned.
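The partitioning idea can be illustrated with a small standard-library sketch. It is not zcollection's implementation: the `year=YYYY/month=MM` directory layout, the `partition_path` and `split_by_partition` helpers, and the sample records are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def partition_path(day: date) -> str:
    """Return a hypothetical monthly partition directory for a date."""
    return f"year={day.year}/month={day.month:02d}"


def split_by_partition(records):
    """Group (date, value) records by their partition directory."""
    groups = defaultdict(list)
    for day, value in records:
        groups[partition_path(day)].append(value)
    return dict(groups)


# Sample data invented for the illustration.
records = [
    (date(2000, 1, 5), 1.0),
    (date(2000, 1, 20), 2.0),
    (date(2000, 2, 3), 3.0),
]
partitions = split_by_partition(records)
for path in sorted(partitions):
    print(path, partitions[path])
```

In this toy layout, replacing or deleting the February data only involves the `year=2000/month=02` group; the January partition is left alone.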

Parameters:
  • axis (str) – The axis of the collection. This is the dimension along which the data is partitioned.

  • ds (meta.Dataset) – The dataset containing the collection. This dataset is used to create the metadata of the collection, which is used to validate datasets that are inserted in the collection.

  • partition_handler (partitioning.Partitioning) – The partitioning strategy for the collection. This is an instance of a subclass of partitioning.Partitioning.

  • partition_base_dir (str) – The base directory for the collection. This is the directory where the subdirectories containing the partitioned data are stored.

  • mode (Literal['r', 'w'] | None) – The mode of the collection. This can be either ‘r’ (read-only) or ‘w’ (write). In read-only mode, the collection can only be read and no data can be inserted or modified. In write mode, the collection can be read and modified.

  • filesystem (fsspec.AbstractFileSystem | str | None) – The filesystem to use for the collection. This is either an instance of a subclass of fsspec.AbstractFileSystem or a string identifying the filesystem.

  • synchronizer (sync.Sync | None) – The synchronizer to use for the collection. This is an instance of a subclass of sync.Sync.

Raises:

ValueError – If the axis does not exist in the dataset, if the partition key is not defined in the dataset or if the access mode is not supported.
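The three conditions above can be pictured as the checks below. This is a minimal sketch under stated assumptions, not the constructor's actual code: `check_collection_args` is a hypothetical helper, `names` stands for the names defined in the dataset, and the error messages are invented.

```python
def check_collection_args(axis, names, partition_keys, mode=None):
    """Raise ValueError for the three documented conditions (toy version)."""
    # 1. The axis must exist in the dataset.
    if axis not in names:
        raise ValueError(f"axis {axis!r} is not defined in the dataset")
    # 2. Every partition key must be defined in the dataset.
    missing = [key for key in partition_keys if key not in names]
    if missing:
        raise ValueError(f"partition keys not defined in the dataset: {missing}")
    # 3. Only 'r', 'w' or None are accepted as access modes.
    if mode not in (None, "r", "w"):
        raise ValueError(f"unsupported access mode: {mode!r}")


# Passes silently: the axis and the partition key are both defined.
check_collection_args("time", {"time", "ssh"}, ["time"], mode="w")
```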

Notes

Normally, this class is not instantiated directly but through the create_collection and open_collection functions of this library.

Attributes

CONFIG

Configuration filename of the collection.

axis

Return the axis of the collection.

fs

Return the filesystem of the collection.

immutable

Return True if the collection contains immutable data relative to the partitioning.

metadata

Return the metadata of the collection.

mode

Return the mode of the collection.

partition_properties

Return the partitioning properties of the collection.

partitioning

Return the partitioning strategy of the collection.

synchronizer

Return the synchronizer of the collection.

Public Methods

add_variable(variable)

Add a variable to the collection.

copy(target, *[, filters, filesystem, mode, ...])

Copy the collection to a new location.

drop_partitions(*[, filters, timedelta])

Drop the selected partitions.

drop_variable(variable)

Delete the variable from the collection.

from_config(path, *[, mode, filesystem, ...])

Open a Collection described by a configuration file.

insert(ds, *[, merge_callable, npartitions, ...])

Insert a dataset into the collection.

is_readonly()

Return True if the collection is read-only.

update(func, /, *args[, delayed, depth, ...])

Update the selected partitions.

validate_partitions([filters, fix])

Validate the partitions of the collection by checking that they exist and are readable.

Protected Methods

_read_only_mode()

Set the unsupported methods to raise an exception when the collection is opened in read-only mode.

_unsupported_operation(*args, **kwargs)

Raise an exception if the operation is not supported.

_write_config([skip_if_exists])

Write the configuration file.
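The pattern behind _read_only_mode() and _unsupported_operation() can be sketched with a toy class. This is an illustration of the general technique, not the library's code: `ToyCollection`, its `_MUTATORS` tuple, the in-memory `items` list, and the choice of `io.UnsupportedOperation` as the exception type are assumptions.

```python
import io


class ToyCollection:
    """Toy class that disables its mutating methods in read-only mode."""

    # Methods that modify the collection (chosen for this sketch).
    _MUTATORS = ("insert", "drop_partitions")

    def __init__(self, mode="w"):
        if mode not in ("r", "w"):
            raise ValueError(f"unsupported mode: {mode!r}")
        self.mode = mode
        self.items = []
        if mode == "r":
            self._read_only_mode()

    def _read_only_mode(self):
        # Shadow each mutating method on this instance with one that raises.
        for name in self._MUTATORS:
            setattr(self, name, self._unsupported_operation)

    def _unsupported_operation(self, *args, **kwargs):
        raise io.UnsupportedOperation("the collection is read-only")

    def insert(self, item):
        self.items.append(item)

    def drop_partitions(self):
        self.items.clear()

    def is_readonly(self):
        return self.mode == "r"


writable = ToyCollection("w")
writable.insert("partition-1")

readonly = ToyCollection("r")
try:
    readonly.insert("partition-2")
except io.UnsupportedOperation as err:
    print(f"rejected: {err}")
```

Shadowing the methods once at construction time keeps the mutating methods themselves free of mode checks; the trade-off is that the restriction lives on the instance rather than in the method bodies.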

Special Methods

__str__()

Return str(self).

Inherited Methods

is_locked()

Return True if the collection is locked.

iterate_on_records(*[, relative])

Iterate over the partitions and the Zarr groups.

load(*[, delayed, filters, indexer, ...])

Load the selected partitions.

map(func, /, *args[, delayed, filters, ...])

Map a function over the partitions of the collection.

map_overlap(func, /, *args[, delayed, ...])

Map a function over the partitions of the collection with some overlap.

partitions(*[, cache, lock, filters, relative])

List the partitions of the collection.

variables([selected_variables])

Return the variables of the collection.
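The difference between map and map_overlap can be made concrete with a toy over plain lists. The sketch below is not the library's implementation: `toy_map`, `toy_map_overlap`, and the meaning given here to `depth` (the number of neighbouring partitions included on each side) are assumptions made for the illustration.

```python
def toy_map(func, partitions):
    """Apply func to each partition on its own."""
    return [func(values) for values in partitions]


def toy_map_overlap(func, partitions, depth=1):
    """Apply func to each partition extended by `depth` neighbours per side."""
    results = []
    for index in range(len(partitions)):
        # Clamp the window to the first and last partitions.
        start = max(0, index - depth)
        stop = min(len(partitions), index + depth + 1)
        window = [value for part in partitions[start:stop] for value in part]
        results.append(func(window))
    return results


partitions = [[1, 2], [3, 4], [5, 6]]
print(toy_map(sum, partitions))          # [3, 7, 11]
print(toy_map_overlap(sum, partitions))  # [10, 21, 18]
```

Overlap of this kind matters for computations such as moving averages or filters, where values near a partition boundary depend on data stored in the adjacent partitions.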