ascat package

Subpackages

Submodules

ascat.accessors module

class ascat.accessors.CFDiscreteGeometryAccessor(xarray_obj: Dataset)[source]

Bases: object

property array_type: str

resample_to_orthomulti(**kwargs)[source]

sel_instances(instance_vals: Sequence[int, str] | np.ndarray | None = None, instance_lookup_vector: np.ndarray[Any, np.dtype[np.bool]] | None = None, **kwargs)[source]

set_coord_vars(coord_vars: Sequence[str])[source]

set_instance_vars(instance_vars: Sequence[str])[source]

set_sample_dimension(sample_dim: str)[source]

property timeseries_id: str

to_contiguous_ragged(**kwargs)[source]

to_indexed_ragged(**kwargs)[source]

to_orthomulti(**kwargs)[source]

to_point_array()[source]

to_raster(*args, **kwargs)[source]

class ascat.accessors.PyGeoGriddedArrayAccessor(xarray_obj: Dataset)[source]

Bases: object

property grid

lonlat_vars_from_gpi_var(gpi_var, lon_var='lon', lat_var='lat') → tuple[DataArray, DataArray][source]

sel_bbox(bbox: Sequence[float]) → Dataset[source]

Select data for a bounding box.

bbox (tuple, optional) – Tuple of (latmin, latmax, lonmin, lonmax) coordinates.
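As a rough illustration of the bounding-box ordering described above, the sketch below filters plain (lon, lat) pairs with a bbox given as (latmin, latmax, lonmin, lonmax). The helper `in_bbox` and the sample points are hypothetical and not part of ascat, and treating the bounds as inclusive is an assumption.

```python
def in_bbox(lon, lat, bbox):
    """Return True if (lon, lat) lies inside bbox = (latmin, latmax, lonmin, lonmax)."""
    latmin, latmax, lonmin, lonmax = bbox
    # Assumption: bounds are inclusive on all four sides.
    return latmin <= lat <= latmax and lonmin <= lon <= lonmax


# Hypothetical grid points as (lon, lat) pairs.
points = [(16.4, 48.2), (2.35, 48.85), (-0.13, 51.5)]
bbox = (47.0, 49.0, 0.0, 20.0)  # latitudes first, then longitudes

selected = [p for p in points if in_bbox(p[0], p[1], bbox)]
print(selected)
```

Note that the latitudes come before the longitudes in the tuple, which is easy to mix up with (lon, lat) coordinate pairs.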

sel_cells(cells: Sequence[float]) → Dataset[source]

sel_coords(coords: Sequence[Sequence[float]], max_coord_dist: float = inf) → Dataset[source]

sel_geom(geom: BaseGeometry) → Dataset[source]

sel_gpis(gpis: Sequence[int] | None = None, lookup_vector: ndarray | None = None) → Dataset[source]

set_grid_name(grid_name: str, grid_class: type | None = None)[source]

ascat.cell module

class ascat.cell.CellGridFiles(root_path, file_class, grid, fn_format='{cell:04d}.nc', sf_format=None, preprocessor=None)[source]

Bases: object

convert_to_contiguous(out_dir, print_progress=True, **kwargs)[source]

Convert all files in the collection to contiguous format and write to disk.

Parameters:

out_dir (str) – Output directory.
print_progress (bool, optional) – Whether to print progress messages to console. Default is True.
kwargs (dict) – Keyword arguments passed to the reprocess method.

fn_search(cell)[source]
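The default `fn_format` shown in the class signature is an ordinary Python format string, so the file name for a given cell can be previewed with `str.format`. This is only a sketch of the string formatting; how `fn_search` combines the name with `root_path` and `sf_format` is not covered here.

```python
fn_format = "{cell:04d}.nc"  # default value from the class signature

# Cell numbers are zero-padded to four digits.
names = [fn_format.format(cell=c) for c in (7, 42, 1394)]
print(names)
```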

classmethod from_product_class(root_path, product_class, **kwargs)[source]

classmethod from_product_id(root_path, product_id, **kwargs)[source]

read(cell=None, location_id=None, coords=None, bbox=None, geom=None, max_coord_dist=inf, date_range=None, **kwargs)[source]

Read data matching a spatial and temporal criterion.

Parameters:

cell (int or list of int) – Grid cell number to read.
location_id (int or list of int) – Location id.
coords (tuple of numeric or tuple of iterable of numeric) –
Tuple of (lon, lat) coordinates. lon and lat can each be numpy arrays to read multiple coordinates. For each coordinate the nearest grid point within max_coord_dist (in spherical cartesian coordinates) will be selected.

Note that if any passed coordinates share the same nearest grid point, that grid point will only be represented once in the output dataset.
bbox (tuple) – Tuple of (latmin, latmax, lonmin, lonmax) coordinates.
geom (shapely.geometry) – Geometry object.
max_coord_dist (float) – The maximum distance a coordinate’s nearest grid point can be from it to be selected (in spherical cartesian coordinates). Default is np.inf.
date_range (tuple of np.datetime64) – Tuple of (start, end) dates.

Returns:

Filtered and merged data for the specified spatiotemporal region.

Return type:

xarray.Dataset
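The behaviour of `coords` together with `max_coord_dist` can be pictured with a small stand-alone sketch. It is not the ascat implementation: the `grid` dictionary and the helper `nearest_gpis` are hypothetical, the distance is a planar distance in degrees rather than a spherical cartesian one, and dropping coordinates whose nearest grid point is too far away is an assumption.

```python
import math

# Hypothetical grid: location_id -> (lon, lat)
grid = {100: (10.0, 45.0), 101: (10.5, 45.0), 102: (11.0, 45.0)}


def nearest_gpis(lons, lats, max_coord_dist=math.inf):
    """Return the unique nearest location ids, in first-seen order."""
    found = []
    for lon, lat in zip(lons, lats):
        # Planar distance in degrees, a simplification of the
        # spherical cartesian distance mentioned in the text.
        gpi, dist = min(
            ((g, math.hypot(lon - glon, lat - glat))
             for g, (glon, glat) in grid.items()),
            key=lambda pair: pair[1],
        )
        # Each grid point is kept only once, even if several
        # coordinates share it as their nearest neighbour.
        if dist <= max_coord_dist and gpi not in found:
            found.append(gpi)
    return found


lons = [10.1, 10.2, 10.9, 20.0]
lats = [45.0, 45.1, 45.0, 45.0]
result = nearest_gpis(lons, lats, max_coord_dist=0.5)
print(result)
```

The first two coordinates share grid point 100, so it appears once. The last coordinate has no grid point within 0.5 and contributes nothing.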

reprocess(out_dir, func, parallel=True, **kwargs)[source]

Use Filenames.reprocess to apply a function to all files in the collection and save the results to out_dir.

Parameters:

out_dir (str) – Output directory.
func (callable) – Function to apply to each file.
parallel (bool, optional) – Whether to process files in parallel. Default is True.
kwargs (dict) – Keyword arguments passed to func.

spatial_search(cell=None, location_id=None, coords=None, bbox=None, geom=None)[source]

Search files for cells matching a spatial criterion. All arguments are declared optional, but exactly one should be passed.

Parameters:

cell (int or list of int) – Grid cell number to read.
location_id (int or list of int) – Location id.
coords (tuple of numeric or tuple of iterable of numeric) – Tuple of (lon, lat) coordinates.
bbox (tuple) – Tuple of (latmin, latmax, lonmin, lonmax) coordinates.
geom (shapely.geometry) – Geometry object.

Returns:

filenames – Filenames.

Return type:

list of str

class ascat.cell.OrthoMultiTimeseriesCell(filenames)[source]

Bases: Filenames

Class to read and merge orthomulti cell files.

read(date_range=None, location_id=None, lookup_vector=None, preprocessor=None, parallel=False, **kwargs)[source]

Read data from OrthoMulti Cell files.

Parameters:

date_range (tuple of np.datetime64) – Tuple of (start, end) dates.
location_id (list of int) – List of timeseries IDs to read.
lookup_vector (np.ndarray) – Lookup vector.
preprocessor (callable, optional) – Function to preprocess the dataset.
parallel (bool, optional) – Whether or not to read/preprocess in parallel. Default is False.

class ascat.cell.RaggedArrayTs(filenames)[source]

Bases: Filenames

Class to read and merge ragged array cell files.

read(date_range=None, location_id=None, lookup_vector=None, preprocessor=None, return_format=None, parallel=False, **kwargs)[source]

Read data from Ragged Array Cell files.

Parameters:

date_range (tuple of np.datetime64) – Tuple of (start, end) dates.
location_id (list of int) – List of timeseries IDs to read.
lookup_vector (np.ndarray) – Lookup vector.
preprocessor (callable, optional) – Function to preprocess the dataset.
return_format (str, optional) – CF discrete geometry format to return data as. Can be “point”, “indexed”, or “contiguous”.
parallel (bool, optional) – Whether or not to read/preprocess in parallel. Default is False.
**kwargs (dict) –

ascat.cf_array module

class ascat.cf_array.CFDiscreteGeom(xarray_obj: Dataset, coord_vars: Sequence[str] | None = None, instance_vars: Sequence[str] | None = None, contiguous_sort_vars: Sequence[str] | None = None)[source]

Bases: object

property array_type

class ascat.cf_array.OrthoMultiTimeseriesArray(xarray_obj: Dataset, coord_vars: Sequence[str] | None = None, instance_vars: Sequence[str] | None = None, contiguous_sort_vars: Sequence[str] | None = None)[source]

Bases: CFDiscreteGeom

property array_type

sel_instances(instance_vals: Sequence[int | str] | ndarray | None = None, instance_lookup_vector: ndarray | None = None)[source]

Select requested timeseries instances from an orthomulti timeseries array dataset.

Parameters:

instance_vals (Union[Sequence[Union[int, str]], np.ndarray], optional) – List of instance values to select, by default None
instance_lookup_vector (Union[np.ndarray], optional) – Lookup vector for instance values, by default None

set_sample_dimension(sample_dim: str)[source]

to_raster(x_var, y_var)[source]

class ascat.cf_array.PointArray(xarray_obj: Dataset, coord_vars: Sequence[str] | None = None, instance_vars: Sequence[str] | None = None, contiguous_sort_vars: Sequence[str] | None = None)[source]: Bases: CFDiscreteGeom

class ascat.cf_array.RaggedArray(xarray_obj: Dataset, coord_vars: Sequence[str] | None = None, instance_vars: Sequence[str] | None = None, contiguous_sort_vars: Sequence[str] | None = None)[source]

Bases: CFDiscreteGeom

property array_type

sel_instances(instance_vals: Sequence[int | str] | ndarray | None = None, instance_lookup_vector: ndarray | None = None) → Dataset[source]

set_sample_dimension(sample_dim: str)[source]

property timeseries_id

to_contiguous_ragged(count_var: str = 'row_size', sort_vars: Sequence[str] | None = None) → Dataset[source]

to_indexed_ragged(index_var: str = 'locationIndex') → Dataset[source]

to_point_array()[source]

class ascat.cf_array.TimeseriesPointArray(xarray_obj: Dataset, coord_vars: Sequence[str] | None = None, instance_vars: Sequence[str] | None = None, contiguous_sort_vars: Sequence[str] | None = None)[source]

Bases: PointArray

Assumptions made beyond basic CF conventions:

cf_role=“timeseries_id” is used to identify the timeseries ID variable for purposes of selecting instances and converting to ragged arrays. If you only have a single timeseries there’s not much point in using this class.

property array_type: str

resample_to_orthomulti(instance_dim: str = 'locations', timeseries_id: str = 'location_id', count_var: str = 'row_size', instance_vars: ~typing.Sequence[str] | None = None, coord_vars: ~typing.Sequence[str] | None = None, sort_vars: ~typing.Sequence[str] | None = None, vars_to_resample: ~typing.Sequence[str] | None = None, resample_method: callable = <function mean>, resample_period: str = '1M')[source]

sel_instances(instance_vals: Sequence[int | str] | ndarray | None = None, instance_lookup_vector: ndarray | None = None, timeseries_id: str = 'location_id')[source]

set_sample_dimension(sample_dim: str)[source]

property timeseries_id

to_contiguous_ragged(instance_dim: str = 'locations', timeseries_id: str = 'location_id', count_var: str = 'row_size', instance_vars: Sequence[str] | None = None, coord_vars: Sequence[str] | None = None, sort_vars: Sequence[str] | None = None) → Dataset[source]

to_indexed_ragged(instance_dim: str = 'locations', timeseries_id: str = 'location_id', index_var: str = 'locationIndex', instance_vars: Sequence[str] | None = None, coord_vars: Sequence[str] | None = None) → Dataset[source]

to_orthomulti(instance_dim: str = 'locations', timeseries_id: str = 'location_id', count_var: str = 'row_size', instance_vars: Sequence[str] | None = None, coord_vars: Sequence[str] | None = None, sort_vars: Sequence[str] | None = None)[source]

to_point_array()[source]

ascat.cf_array.cf_array_class(ds, array_type, **kwargs)[source]

ascat.cf_array.cf_array_type(ds)[source]

ascat.cf_array.check_orthomulti_ts(ds)[source]

ascat.cf_array.contiguous_to_indexed(ds: Dataset, sample_dim: str, instance_dim: str, count_var: str, index_var: str) → Dataset[source]: Convert a contiguous ragged array dataset to an indexed ragged array dataset.
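The core idea of this conversion can be sketched on plain Python lists: each instance position is repeated as many times as the count variable says it has samples. The names `row_size` and `location_index` mirror the default variable names used in this module, but the snippet is an illustration on lists, not the xarray-based implementation.

```python
from itertools import chain, repeat

row_size = [3, 0, 2]  # count variable: number of samples per instance

# Repeat each instance position once per sample it owns.
location_index = list(
    chain.from_iterable(repeat(i, n) for i, n in enumerate(row_size))
)
print(location_index)
```

An instance with a count of zero (position 1 here) simply does not appear in the index.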

ascat.cf_array.contiguous_to_point(ds: Dataset, sample_dim: str, instance_dim: str, count_var: str)[source]

Convert a contiguous ragged array dataset to a Point Array.

Parameters:

ds (xarray.Dataset) – Dataset.
sample_dim (str) – Name of the sample dimension.
instance_dim (str) – Name of the instance dimension.
count_var (str) – Name of the count variable.

Returns:

Dataset with only the time series variables.

Return type:

xarray.Dataset

ascat.cf_array.indexed_to_contiguous(ds: Dataset, sample_dim: str, instance_dim: str, count_var: str, index_var: str, sort_vars: Sequence[str] | None = None) → Dataset[source]: Convert an indexed ragged array dataset to a contiguous ragged array dataset
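Going from indexed to contiguous needs two steps: reorder the samples so that each instance is stored back to back, then count the samples per instance. The sketch below shows this on plain lists with hypothetical data. Using a stable sort so that the original order within each instance is kept is an assumption, and extra sort keys (compare `sort_vars`) are left out.

```python
from collections import Counter

location_index = [2, 0, 2, 0, 0]    # index variable, one entry per sample
values = [5.0, 1.0, 6.0, 2.0, 3.0]  # a sample variable
n_instances = 3

# Stable sort of the sample positions by instance.
order = sorted(range(len(location_index)), key=location_index.__getitem__)
sorted_values = [values[k] for k in order]

# Count variable: number of samples for every instance, including empty ones.
counts = Counter(location_index)
row_size = [counts.get(i, 0) for i in range(n_instances)]

print(sorted_values, row_size)
```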

ascat.cf_array.indexed_to_point(ds: Dataset, sample_dim: str, instance_dim: str, index_var: str)[source]

ascat.cf_array.point_to_contiguous(ds: Dataset, sample_dim: str, instance_dim: str, timeseries_id: str, count_var: str = 'row_size', instance_vars: Sequence[str] | None = None, coord_vars: Sequence[str] | None = None, sort_vars: Sequence[str] | None = None) → Dataset[source]

ascat.cf_array.point_to_indexed(ds: Dataset, sample_dim: str, instance_dim: str, timeseries_id: str, index_var: str = 'locationIndex', instance_vars: Sequence[str] | None = None, coord_vars: Sequence[str] | None = None) → Dataset[source]

ascat.cgls module

ascat.file_handling module

File search methods.

class ascat.file_handling.ChronFiles(root_path, cls, fn_templ, sf_templ, cls_kwargs=None, err=True, fn_read_fmt=None, sf_read_fmt=None, fn_write_fmt=None, sf_write_fmt=None, cache_size=0)[source]

Bases: MultiFileHandler

Managing chronological files with a date field in the filename.

read_period(dt_start, dt_end, dt_delta=datetime.timedelta(days=1), dt_buffer=datetime.timedelta(days=1), search_date_fmt='%Y%m%d*', date_field='date', date_field_fmt='%Y%m%d', end_inclusive=True, fmt_kwargs={}, **kwargs)[source]

Read data for given interval.

Parameters:

dt_start (datetime) – Start datetime.
dt_end (datetime) – End datetime.
dt_delta (timedelta, optional) – Time delta used to jump through search date.
dt_buffer (timedelta, optional) – Search buffer used to find files which could possibly contain data but would be left out because of dt_start.
search_date_fmt (str, optional) – Search date string format used during file search (default: %Y%m%d*).
date_field (str, optional) – Date field name (default: “date”).
date_field_fmt (str, optional) – Date field string format (default: %Y%m%d).

Returns:

data – Data stored in file.

Return type:

dict, numpy.ndarray

search_date(timestamp, search_date_fmt='%Y%m%d*', date_field='date', date_field_fmt='%Y%m%d', return_date=False, **fmt_kwargs)[source]

Search files for given date.

Parameters:

timestamp (datetime) – Search date.
search_date_fmt (str, optional) – Search date string format used during file search (default: %Y%m%d*).
date_field (str, optional) – Date field name (default: “date”)
date_field_fmt (str, optional) – Date field string format (default: %Y%m%d).
return_date (bool, optional) – Return date parsed from filename (default: False).

Returns:

filenames (list of str) – Filenames.
dates (list of datetime) – Parsed date of filename (only returned if return_date=True).
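The two format strings play different roles: `search_date_fmt` turns the timestamp into a wildcard for matching, while `date_field_fmt` parses the date field back out of a matching name. The sketch below mimics this with the standard library only. The file names and the `ascat_` prefix are hypothetical, and the real class builds and matches names from `fn_templ`.

```python
from datetime import datetime
from fnmatch import fnmatch

# Hypothetical file names with the date as second "_"-separated field.
files = ["ascat_20000101_a.nc", "ascat_20000101_b.nc", "ascat_20000102_a.nc"]

timestamp = datetime(2000, 1, 1)
search_date_fmt = "%Y%m%d*"  # default from the signature
date_field_fmt = "%Y%m%d"    # default from the signature

# Format the timestamp into a wildcard pattern and match it.
pattern = "ascat_" + timestamp.strftime(search_date_fmt)
filenames = [f for f in files if fnmatch(f, pattern)]

# Parse the date back out of each matching file name.
dates = [datetime.strptime(f.split("_")[1], date_field_fmt) for f in filenames]
print(filenames, dates)
```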

search_period(dt_start, dt_end, dt_delta=datetime.timedelta(days=1), search_date_fmt='%Y%m%d*', date_field='date', date_field_fmt='%Y%m%d', end_inclusive=True, **fmt_kwargs)[source]

Search files for time period.

Parameters:

dt_start (datetime) – Start datetime.
dt_end (datetime) – End datetime.
dt_delta (timedelta, optional) – Time delta used to jump through search date.
search_date_fmt (str, optional) – Search date string format used during file search (default: %Y%m%d*).
date_field (str, optional) – Date field name (default: “date”).
date_field_fmt (str, optional) – Date field string format (default: %Y%m%d).
end_inclusive (bool, optional) – Include files from a dt_delta length period beyond dt_end if True (default: True).

Returns:

filenames – Filenames.

Return type:

list of str
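One way to picture the period search is as a loop that steps from `dt_start` to `dt_end` in increments of `dt_delta` and produces one search string per step. The helper `iter_search_dates` below is hypothetical, and its treatment of `end_inclusive` (keeping or dropping the step that lands on `dt_end`) is an assumption about the behaviour, not a description of the ascat code.

```python
from datetime import datetime, timedelta


def iter_search_dates(dt_start, dt_end, dt_delta=timedelta(days=1),
                      search_date_fmt="%Y%m%d*", end_inclusive=True):
    """Yield one search string per dt_delta step (hypothetical helper)."""
    cur = dt_start
    # Assumption: end_inclusive decides whether the step at dt_end is kept.
    while (cur <= dt_end) if end_inclusive else (cur < dt_end):
        yield cur.strftime(search_date_fmt)
        cur += dt_delta


start, end = datetime(2000, 1, 30), datetime(2000, 2, 1)
inclusive = list(iter_search_dates(start, end))
exclusive = list(iter_search_dates(start, end, end_inclusive=False))
print(inclusive, exclusive)
```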

class ascat.file_handling.CsvFile(filename, mode='r')[source]

Bases: Filenames

Read and write single CSV file.

header2dtype(header)[source]

Convert header string to dtype info.

Parameters:: header (str) – Header string with dtype info.
Returns:: dtype – Data type.
Return type:: numpy.dtype

read_period(dt_start, dt_end)[source]

Read subset data from CSV file for given interval.

Parameters:

dt_start (datetime) – Start of the time interval to extract data.
dt_end (datetime) – End of the time interval to extract data.
Returns:: data – Data.
Return type:: numpy.ndarray

class ascat.file_handling.CsvFiles(root_path)[source]

Bases: ChronFiles

Write CSV files.

class ascat.file_handling.FileSearch(root_path, fn_pattern, sf_pattern=None)[source]

Bases: object

FileSearch class.

create_isearch_func(func, recursive=False)[source]

Create and return a custom search function.

Parameters:

func (function) – Search function with its own args/kwargs returning a filename format dictionary and subfolder format dictionary depending on the passed arguments.
recursive (bool, optional) – If recursive is true, the pattern “**” will match any files and zero or more directories, subdirectories and symbolic links to directories (default: False).

Returns:

custom_search – Custom search function returning an iterator of path/file names that match.

Return type:

function

create_search_func(func, recursive=False)[source]

Create and return a custom search function.

Parameters:

func (function) – Search function with its own args/kwargs returning a filename format dictionary and subfolder format dictionary depending on the passed arguments.
recursive (bool, optional) – If recursive is true, the pattern “**” will match any files and zero or more directories, subdirectories and symbolic links to directories (default: False).

Returns:

custom_search – Custom search function returning a possibly-empty list of path/file names that match.

Return type:

function

isearch(fn_fmt, sf_fmt=None, recursive=False)[source]

Search filesystem for given pattern returning iterator.

Parameters:

fn_fmt (dict) – Filename format dictionary.
sf_fmt (dict of dicts, optional) – Format dictionary for subfolders (default: None).
recursive (bool, optional) – If recursive is true, the pattern “**” will match any files and zero or more directories, subdirectories and symbolic links to directories (default: False).

Returns:

filenames – Iterator which yields the same values as search() without actually storing them all simultaneously.

Return type:

iterator

search(fn_fmt, sf_fmt=None, recursive=False)[source]

Search filesystem for given pattern returning list.

Parameters:

fn_fmt (dict) – Filename format dictionary.
sf_fmt (dict of dicts, optional) – Format dictionary for subfolders (default: None).
recursive (bool, optional) – If recursive is true, the pattern “**” will match any files and zero or more directories, subdirectories and symbolic links to directories (default: False).

Returns:

filenames – Return a possibly-empty list of path/file names that match.

Return type:

list of str
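The difference between `search` (a list) and `isearch` (an iterator) resembles the difference between `glob.glob` and `glob.iglob` in the standard library. The sketch below assumes that analogy. It fills a filename pattern with `str.format` and searches a temporary directory. The file names are made up for the example, and nothing here calls ascat.

```python
import glob
import os
import tempfile

fn_pattern = "{date}*.{suffix}"  # pattern style used in this module's examples

with tempfile.TemporaryDirectory() as root_path:
    # Create a few empty files to search for.
    for name in ("20000101_a.nc", "20000101_b.nc",
                 "20000102_a.nc", "20000101_a.txt"):
        open(os.path.join(root_path, name), "w").close()

    # Escape the root path so only the filename part acts as a wildcard.
    pattern = os.path.join(
        glob.escape(root_path),
        fn_pattern.format(date="20000101", suffix="nc"),
    )

    # List of matches, comparable to search().
    found = sorted(os.path.basename(p) for p in glob.glob(pattern))
    # Lazy iterator over the same matches, comparable to isearch().
    lazy = sorted(os.path.basename(p) for p in glob.iglob(pattern))

print(found)
```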

class ascat.file_handling.FilenameTemplate(root_path, fn_templ, sf_templ=None)[source]

Bases: object

FilenameTemplate class.

build_basename(fmt)[source]

Create file basename from format dictionary.

Parameters:: fmt (dict) – Filename format applied on filename pattern (fn_pattern). e.g. fn_pattern = “{date}*.{suffix}” with fmt = {“date”: “20000101”, “suffix”: “nc”} returns “20000101*.nc”
Returns:: filename – Filename with format_dict applied.
Return type:: str
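The example in the parameter description can be reproduced with plain `str.format`, which is presumably close to what happens internally. The snippet only demonstrates the string formatting and does not use the `FilenameTemplate` class.

```python
fn_pattern = "{date}*.{suffix}"
fmt = {"date": "20000101", "suffix": "nc"}

# Fill the named fields; the "*" wildcard is left untouched.
basename = fn_pattern.format(**fmt)
print(basename)
```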

build_filename(fn_fmt, sf_fmt=None)[source]

Create filename from format dictionary.

Parameters:

fn_fmt (dict) – Filename format applied on filename pattern (fn_pattern). e.g. fn_pattern = “{date}*.{suffix}” with fn_format_dict = {“date”: “20000101”, “suffix”: “nc”} returns “20000101*.nc”
sf_fmt (dict of dicts) – Format dictionary for subfolders. Each subfolder contains a dictionary defining the format of the folder name. e.g. sf_templ = {“years”: {year}, “months”: {month}} with sf_format = {“years”: {“year”: “2000”}, “months”: {“month”: “02”}} returns [“2000”, “02”]

Returns:

filename – Filename with format_dict applied.

Return type:

str

build_subfolder(fmt)[source]

Create subfolder path from format dictionary.

Parameters:

fmt (dict of dicts) – Format dictionary for subfolders. Each subfolder contains a dictionary defining the format of the folder name. e.g. sf_pattern = {“years”: {year}, “months”: {month}} with format_dict = {“years”: {“year”: “2000”}, “months”: {“month”: “02”}} returns [“2000”, “02”]

Returns:

subfolder – Subfolder with format_dict applied.

Return type:

list of str
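The subfolder example can be sketched with nested dictionaries and `str.format`. The snippet assumes that the template values are format strings such as "{year}" (the description writes them without quotes) and that the levels are applied in the order of the template dictionary. The root path `/data` and the final path join are purely illustrative.

```python
from pathlib import PurePosixPath

sf_pattern = {"years": "{year}", "months": "{month}"}
fmt = {"years": {"year": "2000"}, "months": {"month": "02"}}

# Format every subfolder level with its own dictionary.
subfolder = [templ.format(**fmt[level]) for level, templ in sf_pattern.items()]

# Illustrative only: join a root path, the subfolders and a file basename.
path = str(PurePosixPath("/data", *subfolder, "20000201*.nc"))
print(subfolder, path)
```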

property template: Name property.

class ascat.file_handling.Filenames(filenames)[source]

Bases: object

A class to handle operations on multiple filenames.

This class provides methods for reading from, writing to, and merging data from multiple files.

close()[source]

Close file(s).

This method can be overridden in subclasses if necessary.

iter_read(print_progress=False, **kwargs)[source]

Iterate over all files and yield data.

Yields:: object – Data read from each file.

iter_read_nbytes(max_nbytes, print_progress=False, **kwargs)[source]: Iterate over all files and yield data until the specified number of bytes is reached. If _read returns dask objects, they are computed (in parallel) before merging the data.

merge(data)[source]

Merge data from multiple data objects.

Parameters:: data (list) – List of data objects.
Returns:: Merged data, or None if the input list is empty.
Return type:: object

read(parallel=False, closer_attr=None, **kwargs)[source]

Read all data from files.

Returns:: Merged data from all files.
Return type:: object

reprocess(out_dir, func, parallel=False, print_progress=False, read_kwargs=None, **write_kwargs)[source]

Reprocess data from all files through func, writing the results to out_dir. Assumes that if any files have the same name, they should be merged.

Parameters:

out_dir (Path) – Directory to write the output files. This will be prepended to the filenames.
func (function) – The function to apply to the data before writing out.
parallel (bool, optional) – Whether to process the data in parallel (default: False).
**write_kwargs (dict) – Additional keyword arguments for writing.

write(data, parallel=False, print_progress=False, **kwargs)[source]

Write data to file.

If there’s only one filename in self.filenames, write provided data to that file. If there is more than one filename, write each element of the provided data list to the corresponding filename.

Parameters:: data (list of objects) – The data to write. Should be a list with the same length as self.filenames, where each element is the data to be written to the corresponding filename.

class ascat.file_handling.MultiFileHandler(root_path, cls, fn_templ, sf_templ=None, cls_kwargs=None, err=False, cache_size=0)[source]

Bases: object

MultiFileHandler class.

read(*fmt_args, fmt_kwargs=None, cls_kwargs=None)[source]

Read data.

Parameters:

fmt_args (tuple) – Format arguments.
fmt_kwargs (dict, optional) – Format keywords (Default: None).
cls_kwargs (dict, optional) – Class keywords (Default: None).

Returns:

data – Data stored in file.

Return type:

dict, numpy.ndarray

read_file(filename, cls_kwargs=None)[source]

Read data for given filename.

Parameters:: filename (str) – Filename.

search(fn_search_pattern, sf_search_pattern=None, custom_fn_templ=None, custom_sf_templ=None)[source]

Search files for given root path and filename/folder pattern.

Returns:: filenames – Filenames.
Return type:: list of str

write(data, *fmt_args, fmt_kwargs=None, cls_kwargs=None)[source]

Write data.

Parameters:

data (dict, numpy.ndarray) – Data to write.
fmt_args (tuple) – Format arguments.
fmt_kwargs (dict, optional) – Format keywords (Default: None).
cls_kwargs (dict, optional) – Class keywords (Default: None).

write_file(data, filename, cls_kwargs=None)[source]

Write data for given filename.

Parameters:: filename (str) – Filename.

ascat.h_saf module

ascat.ragged_array module

class ascat.ragged_array.ContiguousRaggedArray(ds: Dataset, count_var: str, instance_dim: str, instance_id_var: str = None)[source]

Bases: object

Contiguous ragged array representation (CF convention).

In a contiguous ragged array representation, the data for all time series are stored in a single 1D array. Additional variables or dimensions provide the metadata needed to map these values back to their respective time series.

The contiguous ragged array representation can be used only if the size of each instance is known at the time that it is created. In this representation the data for each instance will be contiguous on disk.

If the instance dimension exists as a variable, it is assumed that the values represent the identifiers for each instance; otherwise they are counted upwards from 0.
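The layout described above can be sketched with plain lists: the count variable partitions one flat array into consecutive blocks, one per instance. The data are hypothetical and the snippet does not use the `ContiguousRaggedArray` class.

```python
from itertools import accumulate

# Three time series stored back to back in a single 1D array.
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
row_size = [2, 1, 3]  # count variable: number of samples per instance

# Cumulative sums give the start and stop offset of every instance.
stops = list(accumulate(row_size))
starts = [0] + stops[:-1]

series = [values[a:b] for a, b in zip(starts, stops)]
print(series)
```

Because the blocks are contiguous, reading one instance is a single slice. The price is that the counts must be known when the array is written.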

instance_dim

Name of the instance dimension.

Type:: str

sample_dim

Name of the sample dimension. The variable bearing the sample_dimension attribute (i.e. count_var) must have the instance dimension as its single dimension, and must have an integer type.

Type:: str

count_var

Name of the count variable. The count variable must be an integer type and must have the instance dimension as its sole dimension. The count variable is identifiable by the presence of an attribute, sample_dimension, found on the count variable, which names the sample dimension being counted.

Type:: str

ds

Contiguous ragged array dataset.

Type:: xarray.Dataset

instance_variables

List of instance variables.

Type:: list

instance_ids

List of instance ids.

Type:: list

sel_instance(i)[source]: Read time series for given instance.

iter()[source]: Yield time series for each instance.

apply(func)[source]: Apply function on each instance.

property ds

Dataset.

Returns:: ds – Contiguous ragged array dataset.
Return type:: xr.Dataset

classmethod from_file(filename: str, count_var: str, instance_dim: str, instance_id_var: str = None, **kwargs)[source]

Load time series from file.

Parameters:

filename (str) – Filename.
count_var (str) – Count variable name.
instance_dim (str) – Instance dimension name.
instance_id_var (str, optional) – Variable used as instance identifier (default: None).

Returns:

data – ContiguousRaggedArray object loaded from a file.

Return type:

ContiguousRaggedArray

get_instance_variables(include_dtype: bool = False) → list[source]

Instance variables.

Returns:: instance_variables – Instance variables.
Return type:: list of str

property instance_ids: list

Instance ids

Returns:: instance_ids – Instance ids.
Return type:: list of int

property instance_variables: list

Instance variables.

Returns:: instance_variables – Instance variables.
Return type:: list of str

iter()[source]

Explicit iterator method.

Returns:: ds – Time series for instance.
Return type:: xr.Dataset

sel_instance(i: int)[source]: Read time series

sel_instances(i: ndarray) → Dataset[source]

Read time series for given instance IDs using a LUT and preserve order.

Parameters:: i (np.ndarray) – Array of instance IDs.
Returns:: ds – Dataset containing the selected instances in the correct order.
Return type:: xr.Dataset
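A look-up table (LUT) here means a mapping from instance ID to its position along the instance dimension, so that requested IDs can be resolved without searching. The sketch below shows the idea on plain lists with hypothetical IDs; the method itself returns an xarray Dataset.

```python
instance_ids = [1200, 1203, 1207]             # ids in stored order
series = [[0.1, 0.2], [0.3], [0.4, 0.5, 0.6]]  # one time series per instance

# Look-up table: instance id -> position along the instance dimension.
lut = {iid: pos for pos, iid in enumerate(instance_ids)}

# The output follows the order of the request, not the stored order.
requested = [1207, 1200]
positions = [lut[i] for i in requested]
selected = [series[p] for p in positions]
print(positions, selected)
```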

property size: list

Number of instances.

Returns:: instance_ids – Number of instances.
Return type:: int

to_indexed()[source]

Convert to indexed ragged array.

Returns:: data – Indexed ragged array time series.
Return type:: IndexedRaggedArray

to_orthomulti()[source]

Convert to orthogonal multidimensional array.

Returns:: data – Orthogonal multidimensional array time series.
Return type:: OrthoMultiArray

to_point_data()[source]

validate()[source]: Validate format.

class ascat.ragged_array.IndexedRaggedArray(ds: Dataset, index_var: str, sample_dim: str)[source]

Bases: object

Indexed ragged array representation (CF convention).

In an indexed ragged array representation, the dataset is structured to store variable-length data (e.g., time series with varying lengths) compactly. To achieve this, auxiliary indexing variables map the flat array storage to meaningful groups (e.g. locations).

If the instance dimension exists as a variable, it is assumed that the values represent the identifiers for each instance; otherwise they are counted upwards from 0.
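In contrast to the contiguous layout, samples of different instances may be interleaved, and the index variable says for each sample which instance it belongs to. The sketch below groups a flat list by such an index. The data and IDs are hypothetical and the snippet does not use the `IndexedRaggedArray` class.

```python
values = [5.0, 1.0, 6.0, 2.0]  # flat sample variable
location_index = [1, 0, 1, 0]  # index variable: instance position per sample
instance_ids = [1200, 1203]    # identifiers along the instance dimension

# Collect the samples of every instance in their stored order.
by_instance = {iid: [] for iid in instance_ids}
for value, idx in zip(values, location_index):
    by_instance[instance_ids[idx]].append(value)

print(by_instance)
```

New samples can be appended at the end without knowing the final length of any time series, which is the main advantage of this representation.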

index_var

The indexed ragged array representation must contain an index variable, which must be an integer type, and must have the sample dimension as its single dimension. The index variable can be identified by having an attribute ‘instance_dimension’ whose value is the instance dimension.

Type:: str

sample_dim

Name of the sample dimension. The sample dimension spans the individual samples (observations) of all instances (e.g. stations, locations).

Type:: str

instance_dim

The name of the instance dimension. The value is defined by the ‘instance_dimension’ attribute, which must be present on the index variable. All variables having the instance dimension are instance variables, i.e. variables holding time series data.

Type:: str

ds

Indexed ragged array dataset.

Type:: xarray.Dataset

instance_variables

List of instance variables.

Type:: list

instance_ids

List of instance ids.

Type:: list

sel_instance(i)[source]: Read time series for given instance.

iter()[source]: Yield time series for each instance.

append(ds: Dataset)[source]

Append indexed ragged array time series.

Parameters:: ds (xarray.Dataset) – Indexed ragged array time series.

apply(func)[source]: Apply function on each instance.

property ds: Dataset

Dataset.

Returns:: ds – Indexed ragged array dataset.
Return type:: xr.Dataset

classmethod from_file(filename: str, index_var: str, sample_dim: str)[source]

Read data from file.

Parameters:

filename (str) – Filename.
index_var (str) – Index variable name.
sample_dim (str) – Sample dimension name.

Returns:

data – IndexedRaggedArray object loaded from a file.

Return type:

IndexedRaggedArray

property instance_ids: list

Instance ids.

Returns:: instance_ids – Instance ids.
Return type:: list of int

property instance_variables: list

Instance variables.

Returns:: instance_variables – Instance variables.
Return type:: list of str

iter() → Dataset[source]

Explicit iterator method.

Returns:: ds – Time series for instance.
Return type:: xr.Dataset

save(filename: str)[source]

Write data to file.

Parameters:: filename (str) – Filename.

sel_instance(i: int) → Dataset[source]

Read time series.

Parameters:: i (int) – Instance identifier.
Returns:: ds – Time series for instance.
Return type:: xr.Dataset

sel_instances(i: array, ignore_missing: bool = True) → Dataset[source]

Select multiple instances (time series).

Parameters:: i (numpy.array) – Instance identifier.
Returns:: ds – Time series for instance.
Return type:: xr.Dataset

property size: list

Number of instances.

Returns:: instance_ids – Number of instances.
Return type:: int

to_contiguous(count_var: str = 'row_size') → ContiguousRaggedArray[source]

Convert to contiguous ragged array.

Parameters:: count_var (str, optional) – Count variable (default: “row_size”).
Returns:: data – Contiguous ragged array time series.
Return type:: ContiguousRaggedArray

to_orthomulti() → OrthoMultiArray[source]

Convert to orthogonal multidimensional array.

Returns:: data – Orthogonal multidimensional array time series.
Return type:: OrthoMultiArray

to_point_data()[source]

validate()[source]: Validate format.

class ascat.ragged_array.OrthoMultiArray(ds: Dataset, instance_dim: str = 'loc', element_dim: str = 'time')[source]

Bases: object

Orthogonal multidimensional array.

instance_dim

Name of the instance dimension.

Type:: str

element_dim

Element dimension name.

Type:: str

ds

Orthomulti array dataset.

Type:: xarray.Dataset

instance_variables

List of instance variables.

Type:: list

sel_instance(i)[source]: Read time series for given instance.

iter()[source]: Yield time series for each instance.

apply(func)[source]

property ds

iter()[source]: Explicit iterator method

sel_instance(instance_id: int)[source]: Read time series

to_contiguous()[source]

to_indexed()[source]

to_point_data()[source]

validate()[source]: Validate format.

class ascat.ragged_array.PointData(ds: Dataset, sample_dim: str)[source]

Bases: object

Point data represent scattered locations and times with no implied relationship among coordinate positions. Both data and coordinates must share the same (sample) instance dimension.

property ds

to_contiguous(count_var: str = 'row_size', instance_dim: str = 'loc')[source]

Convert point data to contiguous ragged array.

Parameters:

count_var (str) – Name of the new count variable to be added (default: ‘row_size’).
instance_dim (str) – Name of the instance dimension (default: ‘loc’).

Returns:

contiguous – Contiguous ragged array object.

Return type:

ContiguousRaggedArray

to_indexed(index_var: str = 'obs', instance_dim: str = 'loc')[source]

Convert point data to indexed ragged array.

Parameters:

index_var (str) – Name of the new index variable to be added.
instance_dim (str) – Name of the instance dimension.

Returns:

indexed – Indexed ragged array object.

Return type:

IndexedRaggedArray

validate()[source]: Validate format.

ascat.ragged_array.create_contiguous_ragged()[source]: Create CF-compliant contiguous ragged array.

ascat.ragged_array.create_indexed_ragged()[source]: Create CF-compliant indexed ragged array.

ascat.ragged_array.create_ortho_multi()[source]: Create CF-compliant orthomulti array.

ascat.ragged_array.create_point_data()[source]: Create CF-compliant point data array.

ascat.ragged_array.pad_to_2d(var: DataArray, x: array, y: array, shape: tuple) → array[source]

Pad each time series

Parameters:

var (xarray.DataArray) – 1d array to be converted into 2d array.
x (np.array) – Row indices.
y (np.array) – Column indices.
shape (tuple) – Array shape.

Returns:

padded – Padded 2d array.

Return type:

numpy.array
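
The padding step can be pictured as scattering a 1d sample array into a pre-filled 2d array using row and column indices. This is a sketch under assumptions: the function name pad_to_2d_sketch and the fill_value argument are hypothetical, and the fill value actually used by the library is not stated here.

```python
import numpy as np

def pad_to_2d_sketch(values, x, y, shape, fill_value=np.nan):
    """Scatter 1d values into a 2d array; untouched cells keep fill_value."""
    padded = np.full(shape, fill_value, dtype=float)
    padded[x, y] = values
    return padded

# Hypothetical contiguous ragged data: three series of length 2, 1 and 3.
row_size = np.array([2, 1, 3])
values = np.array([0.2, 0.5, 0.4, 0.1, 0.3, 0.6])

# Row index: which series each sample belongs to.
x = np.repeat(np.arange(row_size.size), row_size)
# Column index: position of each sample within its own series.
row_start = np.cumsum(row_size) - row_size
y = np.arange(values.size) - np.repeat(row_start, row_size)

padded = pad_to_2d_sketch(values, x, y, (row_size.size, row_size.max()))
```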

ascat.ragged_array.verify_contiguous_ragged(ds: Dataset, count_var: str, instance_dim: str) → None[source]

Verify dataset follows contiguous ragged array CF definition.

Parameters:

ds (xarray.Dataset) – Dataset to be verified.
count_var (str) – Name of the count variable. Count variable contains the length of each time series feature. It is identified by having an attribute with name ‘sample_dimension’ whose value is name of the sample dimension. The count variable implicitly partitions into individual instances all variables that have the sample dimension.

Raises:

RuntimeError – If verification fails.

ascat.ragged_array.verify_indexed_ragged(ds: Dataset, index_var: str, sample_dim: str) → None[source]

Verify dataset follows indexed ragged array CF definition.

Parameters:

ds (xarray.Dataset) – Dataset.
index_var (str) – The index variable can be identified by having an attribute with name of instance_dimension whose value is the instance dimension.
sample_dim (str) – Name of the sample dimension.

Raises:

RuntimeError – If verification fails.

ascat.ragged_array.verify_ortho_multi(ds: Dataset, instance_dim: str, element_dim: str) → None[source]

Verify dataset follows orthogonal multidimensional array CF definition.

Parameters:

ds (xarray.Dataset) – Dataset to be verified.
instance_dim (str) – Name of the instance dimension.
element_dim (str) – Name of the element dimension.

Returns:

sample_dimension – Name of the sample dimension.

Return type:

str

Raises:

RuntimeError – If verification fails.

ascat.ragged_array.verify_point_array(ds: Dataset, sample_dim: str) → None[source]

Verify dataset follows the CF point data array convention.

Parameters:

ds (xarray.Dataset) – Dataset to be verified.
sample_dim (str) – Name of the sample dimension.

Raises:

RuntimeError – If verification fails.

ascat.ragged_array.vrange(starts, stops)[source]

Create concatenated ranges of integers for multiple start/stop values.

Parameters:

starts (numpy.ndarray) – Starts for each range.
stops (numpy.ndarray) – Stops for each range (same shape as starts).

Returns:

ranges – Concatenated ranges.

Return type:

numpy.ndarray

Example

>>> starts = [1, 3, 4, 6]
>>> stops  = [1, 5, 7, 6]
>>> vrange(starts, stops)
array([3, 4, 4, 5, 6])
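
One way to build such concatenated ranges without a Python loop is sketched below. It is an illustrative NumPy version (vrange_sketch is a hypothetical name), assuming integer inputs with stops >= starts, and is not necessarily how the library implements vrange.

```python
import numpy as np

def vrange_sketch(starts, stops):
    """Concatenate the half-open ranges [start, stop) for each pair."""
    starts = np.asarray(starts)
    stops = np.asarray(stops)
    lengths = stops - starts
    # Position of the first element of each range in the output.
    first = np.cumsum(lengths) - lengths
    # Offset of every output element within its own range.
    offsets = np.arange(lengths.sum()) - np.repeat(first, lengths)
    return np.repeat(starts, lengths) + offsets

ranges = vrange_sketch([1, 3, 4, 6], [1, 5, 7, 6])
```

Empty ranges (start equal to stop) contribute no elements, which matches the example above.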

ascat.swath module

class ascat.swath.Swath(filenames)[source]

Bases: Filenames

Class to read and merge swath files given one or more file paths.

static combine_attributes(attrs_list, context)[source]

Decides which attributes to keep when merging swath files.

Parameters:

attrs_list (list of dict) – List of attributes dictionaries.
context (None) – This currently is None, but will eventually be passed information about the context in which this was called. (see https://github.com/pydata/xarray/issues/6679#issuecomment-1150946521)

read(parallel=False, mask_and_scale=True, **kwargs)[source]

Read the file or a subset of it.

Parameters:

parallel (bool, optional) – If True, read files in parallel.
mask_and_scale (bool, optional) – If True, mask and scale the data.
kwargs (dict) – Additional keyword arguments passed to Filenames.read.

Returns:

ds – Dataset.

Return type:

xarray.Dataset

class ascat.swath.SwathGridFiles(root_path, fn_templ, sf_templ, grid_name, date_field_fmt, cell_fn_format=None, cls_kwargs=None, err=True, fn_read_fmt=None, sf_read_fmt=None, fn_write_fmt=None, sf_write_fmt=None, preprocessor=None, postprocessor=None, cache_size=0)[source]

Bases: ChronFiles

Class to manage chronological swath files with a date field in the filename.

classmethod from_product_class(path, product_class)[source]

Create a SwathGridFiles object from a given product class.

Returns a SwathGridFiles object initialized with the given product class.

Parameters:

path (str or Path) – Path to the swath file collection.
product_class (class) – Class to use for reading and writing the swath files.

Examples

>>> my_swath_collection = SwathGridFiles.from_product_class(
...     "/path/to/swath/files",
...     AscatH129Swath,
... )

classmethod from_product_id(path, product_id)[source]

Create a SwathGridFiles object based on a product_id.

Returns a SwathGridFiles object initialized with an io_class specified by product_id (case-insensitive).

Parameters:

path (str or Path) – Path to the swath file collection.
product_id (str) – Identifier for the specific ASCAT product the swath files are part of.

Raises:

ValueError – If product_id is not recognized.

Examples

>>> my_swath_collection = SwathGridFiles.from_product_id(
...     "/path/to/swath/files",
...     "H129",
... )

read(date_range, dt_delta=None, search_date_fmt='%Y%m%d*', date_field='date', end_inclusive=True, cell=None, location_id=None, coords=None, max_coord_dist=None, bbox=None, geom=None, read_kwargs=None, **fmt_kwargs)[source]

Extract data from swath files within a time range and spatial criterion.

Parameters:

date_range (tuple of datetime.datetime) – Start and end date.
dt_delta (timedelta) – Time delta.
search_date_fmt (str) – Search date format.
date_field (str) – Date field.
end_inclusive (bool) – If True (default), include data from the end date in the result. Otherwise, exclude it.
cell (int or list of int) – Grid cell number to read.
location_id (int or list of int) – Location id to read.
coords (tuple of numeric or tuple of iterable of numeric) – Tuple of (lon, lat) coordinates to read.
max_coord_dist (float) – Maximum distance in meters to search for grid points near the given coordinates. If None, the default is np.inf.
bbox (tuple) – Tuple of (latmin, latmax, lonmin, lonmax) coordinates to bound the data.
geom (shapely.geometry) – Geometry to bound the data.

Returns:

Dataset.

Return type:

xarray.Dataset

stack_to_cell_files(out_dir, max_nbytes, date_range=None, fmt_kwargs=None, cells=None, print_progress=True, parallel=True)[source]

Stack all swath files to cell files, writing them in parallel.

Parameters:

out_dir (str) – Output directory.
max_nbytes (int) – Maximum number of bytes to open as xarray datasets before dumping to disk.
date_range (tuple of datetime.datetime, optional) – Start and end date for the search.
fmt_kwargs (dict, optional) – Additional keyword arguments passed to ascat.file_handling.ChronFiles.search_period.
cells (list of int, optional) – List of grid cell numbers to read. If None (default), all cells are read.
print_progress (bool, optional) – If True (default), print progress bars.
parallel (bool, optional) – If True, write data to files in parallel (use all available resources).

swath_search(dt_start, dt_end, dt_delta=None, search_date_fmt='%Y%m%d*', date_field='date', end_inclusive=True, cell=None, location_id=None, coords=None, bbox=None, geom=None, **fmt_kwargs)[source]

Search for swath files within a time range and spatial criterion.

Parameters:

dt_start (datetime) – Start date.
dt_end (datetime) – End date.
dt_delta (timedelta) – Time delta.
search_date_fmt (str) – Search date format.
date_field (str) – Date field.
end_inclusive (bool) – End date inclusive.
cell (int or list of int) – Grid cell number to read.
location_id (int or list of int) – Location id.
coords (tuple of numeric or tuple of iterable of numeric) – Tuple of (lon, lat) coordinates.
bbox (tuple) – Tuple of (latmin, latmax, lonmin, lonmax) coordinates.
geom (shapely.geometry) – Geometry.
fmt_kwargs (dict) – Additional keyword arguments passed to ascat.file_handling.ChronFiles.search_period.

Returns:

Filenames.

Return type:

list of str

ascat.utils module

class ascat.utils.Spacecraft(name)[source]

Bases: object

Spacecraft class.

valid_spacecraft_names = ['METOPA', 'METOPB', 'METOPC', 'METOP-A', 'METOP-B', 'METOP-C', 'METOP-SG B1', 'METOP-SG B2', 'METOP-SG B3']

ascat.utils.append_to_netcdf(filename, ds_to_append, unlimited_dim)[source]

Appends an xarray dataset to an existing netCDF file along a given unlimited dim.

Parameters:

filename (str or Path) – Filename of netCDF file to append to.
ds_to_append (xarray.Dataset) – Dataset to append.
unlimited_dim (str or list of str) – Name of the unlimited dimension to append along.

Raises:

ValueError – If more than one unlimited dim is given.

ascat.utils.boxcar(radius, distance)[source]

Boxcar filter

Parameters:

radius (float) – Radius of the window.
distance (numpy.ndarray) – Array with distances.

Returns:

weights (numpy.ndarray) – Distance weights.
tw (float32) – Sum of weights.

ascat.utils.create_variable_encodings(ds, custom_variable_encodings=None, custom_dtypes=None)[source]

Create an encoding dictionary for a dataset, optionally overriding the default encoding or adding additional encoding parameters. New parameters cannot be added to default encoding for a variable, only overridden.

E.g. if you want to add a “units” encoding to “lon”, you should also pass “dtype”, “zlib”, “complevel”, and “_FillValue” if you don’t want to lose those.

Parameters:

ds (xarray.Dataset) – Dataset.
custom_variable_encodings (dict, optional) – Custom encodings.

Returns:

ds – Dataset with encodings.

Return type:

xarray.Dataset

ascat.utils.daterange(start_date, end_date)[source]

Generator for daily datetimes.

Parameters:

start_date (datetime) – Start date.
end_date (datetime) – End date.
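
A daily generator of this kind can be sketched as follows. Whether the library includes end_date in the output is not stated here; this sketch assumes the end date is excluded, similar to Python's range(), and daterange_sketch is a hypothetical name.

```python
from datetime import datetime, timedelta

def daterange_sketch(start_date, end_date):
    """Yield one datetime per day from start_date up to (not including) end_date."""
    n_days = (end_date - start_date).days
    for n in range(n_days):
        yield start_date + timedelta(days=n)

days = list(daterange_sketch(datetime(2021, 1, 30), datetime(2021, 2, 2)))
```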

ascat.utils.db2lin(val)[source]

Converting from dB to linear domain.

Parameters:: val (numpy.ndarray) – Values in dB domain.
Returns:: val – Values in linear domain.
Return type:: numpy.ndarray
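
For reference, the usual conversion between the dB and linear domains for power-like quantities such as backscatter is sketched below. The factor of 10 (power ratio) is an assumption about the convention in use, and the function names are hypothetical.

```python
import numpy as np

def db2lin_sketch(val_db):
    """Convert dB values to the linear domain (power-ratio convention)."""
    return 10.0 ** (np.asarray(val_db, dtype=float) / 10.0)

def lin2db_sketch(val_lin):
    """Convert linear values to dB (power-ratio convention)."""
    return 10.0 * np.log10(np.asarray(val_lin, dtype=float))

lin = db2lin_sketch([0.0, 10.0, -10.0])
round_trip = lin2db_sketch(lin)
```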

ascat.utils.get_bit(a, bit_pos)[source]

Returns 1 if the bit is set and 0 otherwise.

Parameters:

a (int or numpy.ndarray) – Input array.
bit_pos (int) – Bit position. The first bit is the rightmost one.

Returns:

b – 1 if bit is set and 0 if not.

Return type:

numpy.ndarray
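
Reading a single bit is typically done with a shift and a mask. The sketch below assumes bit positions are counted from 0 starting at the rightmost bit, which the description above does not state explicitly; get_bit_sketch is a hypothetical name.

```python
import numpy as np

def get_bit_sketch(a, bit_pos):
    """Return 1 where the bit at bit_pos (0 = rightmost) is set, else 0."""
    return (np.asarray(a) >> bit_pos) & 1

flags = np.array([5, 2])  # binary 101 and 010
bit0 = get_bit_sketch(flags, 0)
bit1 = get_bit_sketch(flags, 1)
```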

ascat.utils.get_file_format(filename)[source]

Try to guess the file format from the extension.

Parameters:: filename (str) – File name.
Returns:: file_format – File format indicator.
Return type:: str

ascat.utils.get_grid_gpis(grid, cell=None, location_id=None, coords=None, bbox=None, geom=None, max_coord_dist=inf, return_lookup: bool = False)[source]

Get grid point indices.

Parameters:

grid (pygeogrids.CellGrid) – Grid object.
cell (int or iterable of int, optional) – Cell number(s).
location_id (int or iterable of int, optional) – Location ID.
coords (tuple, optional) – Tuple of (lon, lat) coordinates.
bbox (tuple, optional) – Tuple of (latmin, latmax, lonmin, lonmax) coordinates.
geom (shapely.geometry.BaseGeometry, optional) – Geometry object.
max_coord_dist (float, optional) – Maximum distance from coordinates to return a gpi.

Returns:

gpi (int) – Grid point index.
lookup_vector (numpy.ndarray) – Lookup vector. (only if return_lookup is True)

ascat.utils.get_roi_subset(ds, roi)[source]

Filter dataset for given region of interest.

Parameters:

ds (xarray.Dataset) – Dataset to be filtered for region of interest.
roi (tuple of 4 float) – Region of interest: latmin, lonmin, latmax, lonmax

Returns:

ds – Filtered dataset.

Return type:

xarray.Dataset
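
Note that the roi order given here (latmin, lonmin, latmax, lonmax) differs from the bbox order (latmin, latmax, lonmin, lonmax) used elsewhere in this package. The filtering itself amounts to a boolean mask, sketched below with plain NumPy arrays. The lat and lon arrays are hypothetical, and the sketch assumes inclusive bounds and no wrap-around at the antimeridian.

```python
import numpy as np

# Hypothetical observation coordinates.
lat = np.array([46.0, 48.2, 50.5, 47.1])
lon = np.array([14.0, 16.4, 16.0, 9.5])

# Region of interest in the order documented for get_roi_subset.
roi = (47.0, 10.0, 49.0, 17.0)
latmin, lonmin, latmax, lonmax = roi

# True for observations inside the region (bounds assumed inclusive).
inside = (lat >= latmin) & (lat <= latmax) & (lon >= lonmin) & (lon <= lonmax)
```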

ascat.utils.get_toi_subset(ds, toi)[source]

Filter dataset for given time of interest.

Parameters:

ds (xarray.Dataset) – Dataset to be filtered for time of interest.
toi (tuple of datetime) – Time of interest.

Returns:

ds – Filtered dataset.

Return type:

xarray.Dataset

ascat.utils.get_window_radius(window, hp_radius)[source]

Calculates the required radius of a window function in order to achieve the provided half power radius.

Parameters:

window (string) – Window function name. Currently supported windows: Hamming, Boxcar.
hp_radius (float32) – Half power radius. Radius of window function for weight equal to 0.5 (-3 dB). In the spatial domain this corresponds to half of the spatial resolution one would like to achieve with the given window.

Returns:

r – Window radius needed to achieve the given half power radius

Return type:

float32
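
As an illustration of the half power relation for the Hamming case, the sketch below assumes the standard Hamming weight w(d) = 0.54 + 0.46 cos(pi d / r) for distance d within radius r and solves w(hp_radius) = 0.5 for r. The coefficients and the function name are assumptions; the library may define its window differently.

```python
import numpy as np

def hamming_radius_sketch(hp_radius):
    """Window radius r such that 0.54 + 0.46*cos(pi*hp_radius/r) == 0.5."""
    return np.pi * hp_radius / np.arccos((0.5 - 0.54) / 0.46)

r = hamming_radius_sketch(10.0)
# Weight at the half power radius for the computed window radius.
w_hp = 0.54 + 0.46 * np.cos(np.pi * 10.0 / r)
```

Under these assumptions the window radius comes out at roughly 1.9 times the half power radius.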

ascat.utils.get_window_weights(window, radius, distance, norm=False)[source]

Function returning weights for the provided window function

Parameters:

window (str) – Window function name
radius (float) – Radius of the window.
distance (numpy.ndarray) – Distance array
norm (boolean) – If true, normalised weights will be returned.

Returns:

weights – Weights according to distances and given window function

Return type:

numpy.ndarray

ascat.utils.gpis_to_lookup(grid, gpis)[source]

Create lookup vector from grid point indices.

Parameters:

grid (pygeogrids.BasicGrid) – Grid object.
gpis (numpy.ndarray) – Grid point indices.

Returns:

lookup_vector – Lookup vector.

Return type:

numpy.ndarray
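
A lookup vector of this kind can be thought of as a boolean mask indexed by grid point index. The sketch below is an assumption about its shape and meaning: n_gpis is a hypothetical stand-in for the size information that the real function would take from the grid object.

```python
import numpy as np

def gpis_to_lookup_sketch(n_gpis, gpis):
    """Boolean vector of length n_gpis that is True at the given gpis."""
    lookup = np.zeros(n_gpis, dtype=bool)
    lookup[np.asarray(gpis)] = True
    return lookup

lookup = gpis_to_lookup_sketch(6, [1, 4])
```

With such a mask, testing whether many grid point indices are selected becomes a single indexing operation, for example lookup[some_gpis].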

ascat.utils.hamming_window(radius, distances)[source]

Hamming window filter.

Parameters:

radius (float32) – Radius of the window.
distances (numpy.ndarray) – Array with distances.

Returns:

weights (numpy.ndarray) – Distance weights.
tw (float32) – Sum of weights.
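
A Hamming-type distance weighting can be sketched as below. The coefficients 0.54 and 0.46 are those of the standard Hamming window, and setting weights to zero beyond the radius is an assumption; neither is confirmed by this reference, and hamming_window_sketch is a hypothetical name.

```python
import numpy as np

def hamming_window_sketch(radius, distances):
    """Return Hamming-type weights for the distances and their sum."""
    distances = np.asarray(distances, dtype=float)
    weights = np.where(
        distances <= radius,
        0.54 + 0.46 * np.cos(np.pi * distances / radius),
        0.0,  # assumed: no contribution outside the window radius
    )
    return weights, weights.sum()

weights, tw = hamming_window_sketch(10.0, [0.0, 10.0, 20.0])
```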

ascat.utils.lin2db(val)[source]

Converting from linear to dB domain.

Parameters:: val (numpy.ndarray) – Values in linear domain.
Returns:: val – Values in dB domain.
Return type:: numpy.ndarray

ascat.utils.mask_dtype_nans(ds)[source]: Mask NaNs in a dataset based on the dtypes of its variables.

ascat.utils.set_bit(a, bit_pos, value=1)[source]

Set bit at given position.

Parameters:

a (int or numpy.ndarray) – Input array.
bit_pos (int) – Bit position. The first bit is the rightmost one.
value (1 or 0, optional) – Set bit either to 1 or 0 (default: 1).

Returns:

a – Modified input array with bit=value.

Return type:

numpy.ndarray
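
Setting or clearing a bit is usually done with a mask and a bitwise OR or AND. The sketch below assumes bit positions are counted from 0 starting at the rightmost bit and returns a new array instead of modifying the input; set_bit_sketch is a hypothetical name.

```python
import numpy as np

def set_bit_sketch(a, bit_pos, value=1):
    """Return a copy of a with the bit at bit_pos set to value (1 or 0)."""
    a = np.asarray(a)
    # Build the mask in the array's own integer type so that the
    # inverted mask stays in range for unsigned dtypes.
    mask = a.dtype.type(1 << bit_pos)
    if value:
        return a | mask
    return a & ~mask

set_result = set_bit_sketch(np.array([0, 5]), 1)
cleared = set_bit_sketch(np.array([7, 5], dtype=np.uint8), 0, value=0)
```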

ascat.utils.tmp_unzip(filename)[source]

Unzip file to temporary directory.

Parameters:: filename (str) – Filename.
Returns:: unzipped_filename – Unzipped filename
Return type:: str
