zcollection.collection.Collection
- class zcollection.collection.Collection(axis, ds, partition_handler, partition_base_dir, *, mode=None, filesystem=None, synchronizer=None)
Bases: ReadOnlyCollection
This class manages a collection of Zarr files stored in a set of subdirectories. The subdirectories partition the data, for example by cycle or by date, so that reading, updating, deleting, and adding data only touch the partitions concerned.
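To illustrate the idea of splitting data into subdirectories along one axis, here is a minimal sketch that uses only the standard library. It is not the library's implementation: the helper names `partition_key` and `split_by_partition` are hypothetical, and the `year=YYYY/month=MM` directory layout is an assumption chosen for the example.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(timestamp: datetime) -> str:
    """Build a month-level subdirectory name (assumed layout, for illustration)."""
    return f"year={timestamp.year:04d}/month={timestamp.month:02d}"


def split_by_partition(records):
    """Group (timestamp, value) records by the partition they would fall into."""
    partitions = defaultdict(list)
    for timestamp, value in records:
        partitions[partition_key(timestamp)].append((timestamp, value))
    return dict(partitions)


# Three records along a time axis; two fall in January, one in February.
records = [
    (datetime(2020, 1, 5), 1.0),
    (datetime(2020, 1, 20), 2.0),
    (datetime(2020, 2, 3), 3.0),
]
partitions = split_by_partition(records)
for key in sorted(partitions):
    print(key, len(partitions[key]))
```

With this kind of layout, replacing the February data only requires rewriting one subdirectory, which is the motivation the class description gives for partitioning.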
- Parameters:
axis (str) – The axis of the collection. This is the dimension along which the data is partitioned.
ds (meta.Dataset) – The dataset containing the collection. This dataset is used to create the metadata of the collection, which is used to validate datasets that are inserted in the collection.
partition_handler (partitioning.Partitioning) – The partitioning strategy for the collection. This is an instance of a subclass of zarr_partitioning.PartitionHandler.
partition_base_dir (str) – The base directory for the collection. This is the directory where the subdirectories containing the partitioned data are stored.
mode (Literal['r', 'w'] | None) – The mode of the collection. This can be either ‘r’ (read-only) or ‘w’ (write). In read-only mode, the collection can only be read and no data can be inserted or modified. In write mode, the collection can be read and modified.
filesystem (fsspec.AbstractFileSystem | str | None) – The filesystem to use for the collection. This is an instance of a subclass of fsspec.AbstractFileSystem.
synchronizer (sync.Sync | None) – The synchronizer to use for the collection. This is an instance of a subclass of zarr_synchronizer.Synchronizer.
- Raises:
ValueError – If the axis does not exist in the dataset, if the partition key is not defined in the dataset, or if the access mode is not supported.
Notes
Normally, this class is not instantiated directly but through the create_collection and open_collection methods of this library.
Attributes
Configuration filename of the collection.
Return the axis of the collection.
Return the filesystem of the collection.
Return True if the collection contains immutable data relative to the partitioning.
Return the metadata of the collection.
Return the mode of the collection.
Return the partitioning properties of the collection.
Return the partitioning strategy of the collection.
Return the synchronizer of the collection.
Public Methods
add_variable(variable) – Add a variable to the collection.
copy(target, *[, filters, filesystem, mode, ...]) – Copy the collection to a new location.
drop_partitions(*[, filters, timedelta]) – Drop the selected partitions.
drop_variable(variable) – Delete the variable from the collection.
from_config(path, *[, mode, filesystem, ...]) – Open a Collection described by a configuration file.
insert(ds, *[, merge_callable, npartitions, ...]) – Insert a dataset into the collection.
Return True if the collection is read-only.
update(func, /, *args[, delayed, depth, ...]) – Update the selected partitions.
validate_partitions([filters, fix]) – Validates partitions in the collection by checking if they exist and are readable.
Protected Methods
Set the unsupported methods to raise an exception when the collection is opened in read-only mode.
_unsupported_operation(*args, **kwargs) – Raise an exception if the operation is not supported.
_write_config([skip_if_exists]) – Write the configuration file.
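The read-only behaviour described above (write methods replaced by a handler that raises) can be sketched in a few lines. This is an illustrative pattern only, not the library's code: the class `TinyCollection`, its `_WRITE_METHODS` tuple, and the choice of `io.UnsupportedOperation` as the exception type are all assumptions made for the example.

```python
import io


class TinyCollection:
    """Toy stand-in for a collection with 'r' and 'w' access modes."""

    # Hypothetical list of methods that modify the collection.
    _WRITE_METHODS = ("insert", "drop_partitions")

    def __init__(self, mode="r"):
        if mode not in ("r", "w"):
            raise ValueError(f"unsupported access mode: {mode!r}")
        self.mode = mode
        self._data = []
        if mode == "r":
            # Shadow each write method on this instance with the raising handler.
            for name in self._WRITE_METHODS:
                setattr(self, name, self._unsupported_operation)

    def _unsupported_operation(self, *args, **kwargs):
        """Raise because the collection was opened in read-only mode."""
        raise io.UnsupportedOperation("collection is read-only")

    def insert(self, item):
        self._data.append(item)

    def drop_partitions(self):
        self._data.clear()


writer = TinyCollection(mode="w")
writer.insert("partition-a")  # allowed in write mode

reader = TinyCollection(mode="r")
try:
    reader.insert("partition-b")
except io.UnsupportedOperation as exc:
    print("rejected:", exc)
```

Replacing the methods once at construction time keeps the check out of every write method, at the cost of making the instance's behaviour depend on how it was opened.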
Special Methods
__str__() – Return str(self).
Inherited Methods
Return True if the collection is locked.
iterate_on_records(*[, relative]) – Iterate over the partitions and the zarr groups.
load(*[, delayed, filters, indexer, ...]) – Load the selected partitions.
map(func, /, *args[, delayed, filters, ...]) – Map a function over the partitions of the collection.
map_overlap(func, /, *args[, delayed, ...]) – Map a function over the partitions of the collection with some overlap.
partitions(*[, cache, lock, filters, relative]) – List the partitions of the collection.
variables([selected_variables]) – Return the variables of the collection.