zcollection.dataset.Dataset

class zcollection.dataset.Dataset(variables, *, attrs=None, block_size_limit=None, chunks=None, delayed=None)

Bases: object

Hold variables, dimensions, and attributes that together form a dataset.

Parameters:
  • variables (DelayedArray | Array) – The variables of the dataset, given as Array or DelayedArray objects. The dataset stores them in a dictionary keyed by variable name.

  • attrs (Sequence[Attribute] | None) – A tuple of global attributes on this dataset.

  • block_size_limit (int | None) – The maximum size, in bytes, of a block (chunk) of a variable’s data. Defaults to 128 MiB.

  • chunks (Sequence[Dimension] | None) – The chunk size of each dimension.

  • delayed (bool | None) – A boolean indicating whether the dataset contains delayed variables (numpy arrays wrapped in dask arrays).
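To make the 128 MiB default concrete, the sketch below works out how many rows of a two-dimensional float64 variable fit in a single block. The helper `rows_per_block` is hypothetical and assumes chunking along the first dimension only; zcollection's own rechunking logic is not described on this page and may split data differently.

```python
import math

# Documented default for block_size_limit: 128 MiB.
BLOCK_SIZE_LIMIT = 128 * 1024**2  # 134_217_728 bytes


def rows_per_block(shape: tuple[int, ...], itemsize: int,
                   limit: int = BLOCK_SIZE_LIMIT) -> int:
    """Hypothetical helper: how many leading-axis rows fit in one block.

    Assumes chunking along the first dimension only, so each block holds
    whole rows (every element of the trailing dimensions).
    """
    row_bytes = math.prod(shape[1:]) * itemsize
    # Keep at least one row per block, even if a single row exceeds the limit.
    return max(1, limit // row_bytes)


# A float64 (8-byte) variable of shape (10_000_000, 240):
# one row is 240 * 8 = 1_920 bytes, so 134_217_728 // 1_920 rows fit.
print(rows_per_block((10_000_000, 240), itemsize=8))  # 69905
```

Under these assumptions, the 10-million-row variable would be split into ceil(10_000_000 / 69_905) = 144 blocks.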

Raises:
  • ValueError – If the dataset contains variables that share a dimension name but disagree on that dimension’s size.

  • ValueError – If the dataset contains both delayed and non-delayed variables.
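The two checks listed under Raises can be pictured with the following sketch. `VarInfo` and `check_variables` are hypothetical stand-ins that model only a variable's name, dimension names, shape, and whether it is delayed; the variable and dimension names are illustrative, and zcollection's own validation may differ in detail and in its error messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarInfo:
    """Hypothetical, minimal description of a variable."""
    name: str
    dimensions: tuple[str, ...]
    shape: tuple[int, ...]
    delayed: bool


def check_variables(variables: list[VarInfo]) -> dict[str, int]:
    """Return the size of each dimension, raising ValueError on conflicts."""
    # All variables must agree on being delayed or not.
    kinds = {var.delayed for var in variables}
    if len(kinds) > 1:
        raise ValueError("cannot mix delayed and non-delayed variables")

    dimensions: dict[str, int] = {}
    for var in variables:
        for dim, size in zip(var.dimensions, var.shape):
            # A dimension shared by several variables must have a single size.
            if dimensions.setdefault(dim, size) != size:
                raise ValueError(
                    f"variable {var.name!r}: dimension {dim!r} has size "
                    f"{size}, expected {dimensions[dim]}")
    return dimensions


time = VarInfo("time", ("num_lines",), (100,), delayed=False)
swh = VarInfo("swh", ("num_lines", "num_pixels"), (100, 240), delayed=False)
print(check_variables([time, swh]))  # {'num_lines': 100, 'num_pixels': 240}
```

Adding a variable of shape (99,) on `num_lines`, or any variable with `delayed=True`, makes `check_variables` raise ValueError.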

Notes

The dataset is a dictionary-like container of variables that also holds the dataset’s dimensions and attributes. If the dataset contains delayed variables, the values are DelayedArray objects; otherwise, they are Array objects. Delayed and non-delayed variables cannot be mixed in the same dataset.

Attributes

dimensions

A dictionary of dimension names and their index in the dataset

variables

A dictionary of Variable objects (zcollection.variable.abc.Variable), keyed by variable name.

attrs

The list of global attributes on this dataset

chunks

Chunk size for each dimension

block_size_limit

Maximum data chunk size

delayed

Whether the variables in the dataset are delayed (DelayedArray) or non-delayed (Array)

dims_chunk

The chunk size of each dimension, as a tuple.

nbytes

Return the total number of bytes in the dataset.
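The nbytes attribute reports the total number of bytes in the dataset. Assuming that total is simply the sum of each variable's data size (this page does not spell out the exact accounting), the arithmetic for three hypothetical variables, modelled here as plain NumPy arrays, looks like this:

```python
import numpy as np

# Hypothetical variables, modelled as plain NumPy arrays keyed by name.
variables = {
    "time": np.zeros(100, dtype="datetime64[ns]"),  # 100 * 8 bytes
    "swh": np.zeros((100, 240), dtype=np.float32),   # 100 * 240 * 4 bytes
    "flag": np.zeros((100, 240), dtype=np.uint8),    # 100 * 240 * 1 byte
}

# Total size = sum of each array's nbytes (itemsize * number of elements).
total = sum(array.nbytes for array in variables.values())
print(total)  # 120800
```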

Public Methods

add_variable(var, /[, data])

Add a variable to the dataset.

compute(**kwargs)

Compute the dataset variables.

concat(other, dim)

Concatenate datasets along a dimension.

delete(indexer, axis)

Return a new dataset without the data selected by the provided indices.

drops_vars(names)

Drop variables from the dataset.

fill_attrs(mds)

Fill the dataset and its variables attributes using the provided metadata.

from_xarray(zds[, delayed])

Create a new dataset from an xarray dataset.

isel(slices)

Return a new dataset with each array indexed along the specified slices.

merge(other)

Merge the provided dataset into this dataset.

metadata()

Get the dataset metadata.

persist(*[, compress])

Persist the dataset variables.

rechunk(**kwargs)

Rechunk the dataset.

rename(names)

Rename variables in the dataset.

select_variables_by_dims(dims[, predicate])

Return a new dataset with only the variables that have the specified dimensions if predicate is true; otherwise, return a new dataset with only the variables that do not have the specified dimensions.

select_vars(names)

Return a new dataset containing only the selected variables.

set_for_insertion(mds)

Create a new dataset ready to be inserted into a collection.

to_dict([variables])

Convert the dataset to a dictionary mapping variable names to their data.

to_xarray(**kwargs)

Convert the dataset to an xarray dataset.

to_zarr(path[, fs, parallel])

Write the dataset to a Zarr store.
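The select_variables_by_dims and isel entries are easier to follow with a toy model. In the sketch below, a dataset is a plain dictionary mapping each (hypothetical) variable name to its dimension names and a NumPy array; `select_by_dims` and `isel` are hypothetical helpers, not zcollection functions. Two readings are assumptions: a variable "has the specified dimensions" if it uses at least one of them, and `slices` maps dimension names to slice objects.

```python
import numpy as np

# Toy dataset (hypothetical): variable name -> (dimension names, data).
variables = {
    "time": (("num_lines",), np.arange(4)),
    "swh": (("num_lines", "num_pixels"), np.arange(12).reshape(4, 3)),
    "mission": ((), np.array(7)),
}


def select_by_dims(variables, dims, predicate=True):
    """Keep variables using at least one of `dims` (predicate=True),
    or those using none of them (predicate=False)."""
    wanted = set(dims)
    return {
        name: (var_dims, data)
        for name, (var_dims, data) in variables.items()
        if bool(wanted & set(var_dims)) == predicate
    }


def isel(variables, slices):
    """Index every variable along the dimensions named in `slices`;
    dimensions not mentioned are kept whole."""
    result = {}
    for name, (var_dims, data) in variables.items():
        index = tuple(slices.get(dim, slice(None)) for dim in var_dims)
        result[name] = (var_dims, data[index])
    return result


print(sorted(select_by_dims(variables, ["num_pixels"])))         # ['swh']
print(sorted(select_by_dims(variables, ["num_pixels"], False)))  # ['mission', 'time']
subset = isel(variables, {"num_lines": slice(1, 3)})
print(subset["swh"][1].tolist())  # [[3, 4, 5], [6, 7, 8]]
```

Under this reading, `predicate=False` is the complement of `predicate=True`, so together the two calls partition the variables.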

Special Methods

__bool__()

__getattr__(name)

__getitem__(name)

Return a variable from the dataset.

__getstate__()

Helper for pickle.

__len__()

__repr__()

Return repr(self).

__setstate__(state)

__str__()

Return str(self).
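Several special methods above have no description. The class below is a hypothetical, much-reduced stand-in (`MiniDataset` is not part of zcollection) showing one plausible way they fit together for a dictionary-like container: item and attribute access by variable name, `len()` and truthiness based on the number of variables, and pickling through `__getstate__`/`__setstate__`. What zcollection's own methods return is not documented here, so treat these behaviours as assumptions.

```python
import pickle
from typing import Any


class MiniDataset:
    """Hypothetical dict-like container illustrating the special methods."""

    def __init__(self, variables: dict[str, Any], attrs: tuple = ()):
        self.variables = dict(variables)
        self.attrs = tuple(attrs)

    def __getitem__(self, name: str) -> Any:
        # ds["swh"]: look a variable up by name (KeyError if missing).
        return self.variables[name]

    def __getattr__(self, name: str) -> Any:
        # ds.swh: only reached when normal attribute lookup fails.
        # Read __dict__ directly to avoid recursion before state is set.
        variables = self.__dict__.get("variables", {})
        if name in variables:
            return variables[name]
        raise AttributeError(name)

    def __len__(self) -> int:
        # Number of variables in the dataset.
        return len(self.variables)

    def __bool__(self) -> bool:
        # A dataset without variables is falsy.
        return bool(self.variables)

    def __getstate__(self) -> tuple:
        # Pickle only the data needed to rebuild the object.
        return self.variables, self.attrs

    def __setstate__(self, state: tuple) -> None:
        self.variables, self.attrs = state


ds = MiniDataset({"time": [1, 2, 3], "swh": [0.5, 0.7, 0.6]})
print(ds["swh"], ds.time, len(ds), bool(MiniDataset({})))
restored = pickle.loads(pickle.dumps(ds))
print(restored.variables == ds.variables)  # True
```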