{"cells":[{"cell_type":"markdown","metadata":{},"source":["Working with ASCAT data and Xarray\n","==================================\n","\n"]},{"cell_type":"markdown","metadata":{},"source":["## Working with swath files\n","\n"]},{"cell_type":"markdown","metadata":{},"source":["### Creating a SwathFileCollection\n","\n"]},{"cell_type":"markdown","metadata":{},"source":["If we have a collection of time-series swath files, we can create a SwathFileCollection object that will handle them as a group.\n","\n"]},{"cell_type":"code","execution_count":1,"metadata":{},"outputs":[],"source":["from datetime import datetime\n","from importlib import reload\n","\n","import ascat.read_native.ragged_array_ts as rat\n","from ascat.read_native import xarray_io\n","\n","from time import time"]},{"cell_type":"code","execution_count":2,"metadata":{},"outputs":[],"source":["swath_source = \"/home/charriso/p14/data-write/RADAR/hsaf/h121_v1.0/swaths\""]},{"cell_type":"markdown","metadata":{},"source":["It’s important to understand the structure of the data, so that the SwathFileCollection can find and parse all of the data that is requested from it. Handily, this package comes with existing parsers for several ASCAT products. These can be used with `SwathFileCollection.from_product_id()`:\n","\n"]},{"cell_type":"code","execution_count":3,"metadata":{},"outputs":[],"source":["swath_collection = rat.SwathFileCollection.from_product_id(swath_source, \"H121_V1.0\")"]},{"cell_type":"markdown","metadata":{},"source":["The currently included project ids are the keys of `ascat.read_native.xarray_io.swath_io_catalog`\n","\n"]},{"cell_type":"code","execution_count":4,"metadata":{},"outputs":[{"data":{"text/plain":["dict_keys(['H129', 'H129_V1.0', 'H121_V1.0', 'H122', 'SIG0_6.25', 'SIG0_12.5'])"]},"execution_count":4,"metadata":{},"output_type":"execute_result"}],"source":["from ascat.read_native.xarray_io import swath_io_catalog\n","swath_io_catalog.keys()"]},{"cell_type":"markdown","metadata":{},"source":["If your data structure does not match anything included in the package, you can write a class inheriting from `ascat.read_native.xarray_io.SwathIOBase`:\n","\n"]},{"cell_type":"code","execution_count":1,"metadata":{"executable":"False"},"outputs":[],"source":["class AscatH121v1Swath(SwathIOBase):\n"," # outlines the basic structure of the swath filenames, with {bracketed} variable names\n"," # in the place of anything that changes according to the data inside\n"," fn_pattern = \"W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOP{sat}-12.5km-H121_C_LIIB_{placeholder}_{placeholder1}_{date}____.nc\"\n","\n"," # defines the names of the subfolder/directory names that contain the data. In this case,\n"," # files are sorted into folders by satellite and year, so that a typical filepath looks like\n"," # \".../metop_b/2021/W_IT-HSAF-ROME...\"\n"," sf_pattern = {\n"," \"satellite_folder\": \"metop_[abc]\",\n"," \"year_folder\": \"{year}\"\n"," }\n","\n"," # specifies the string format of the date that occupies the \"date\" field in fn_pattern\n"," date_format = \"%Y%m%d%H%M%S\"\n","\n"," # provides the grid that the data is based on. In this case, using the grid_cache allows\n"," # us to reuse the same grid for multiple classes in the source code without\n"," # actually generating several different FibGrids.\n"," # however, it would also work to just write, for example:\n"," # grid = FibGrid(12.5)\n"," grid = grid_cache.fetch_or_store(\"Fib12.5\", FibGrid, 12.5)[\"grid\"]\n","\n"," # specifies the size (in degrees) of the grid cell to be written out by SwathFileCollection.stack()\n"," grid_cell_size = 5\n","\n"," # specifies the filename format for the cells written out by SwathFileCollection.stack()\n"," cell_fn_format = \"{:04d}.nc\"\n","\n"," # names any dataset variables with a \"beams\" dimension - not relevant with this particular product\n"," beams_vars = []\n","\n"," # defines all the dataset's data variable names and the dtype that they should be written as when packed.\n"," ts_dtype = np.dtype([\n"," (\"sat_id\", np.int8),\n"," (\"as_des_pass\", np.int8),\n"," (\"swath_indicator\", np.int8),\n"," (\"surface_soil_moisture\", np.float32),\n"," (\"surface_soil_moisture_noise\", np.float32),\n"," (\"backscatter40\", np.float32),\n"," (\"slope40\", np.float32),\n"," (\"curvature40\", np.float32),\n"," (\"surface_soil_moisture_sensitivity\", np.float32),\n"," (\"backscatter_flag\", np.uint8),\n"," (\"correction_flag\", np.uint8),\n"," (\"processing_flag\", np.uint8),\n"," (\"surface_flag\", np.uint8),\n"," (\"snow_cover_probability\", np.int8),\n"," (\"frozen_soil_probability\", np.int8),\n"," (\"wetland_fraction\", np.int8),\n"," (\"topographic_complexity\", np.int8),\n"," (\"subsurface_scattering_probability\", np.int8),\n"," ])\n","\n"," # a function for generating a filename regex for a particular timestamp\n"," # (used for filtering swath files by date)\n"," @staticmethod\n"," def fn_read_fmt(timestamp):\n"," return {\"date\": timestamp.strftime(\"%Y%m%d*\"),\n"," \"sat\": \"[ABC]\",\n"," \"placeholder\": \"*\",\n"," \"placeholder1\": \"*\"}\n","\n"," # a function that returns subfolder regexes based on a timestamp\n"," # Here, in the case of satellite_folder, we don't actually need to be able to search\n"," # by satellite, so we just have a regex that is inclusive of all three possible satellites.\n"," # However, in the case of year_folder, we want the search year to actually match with the\n"," # timestamp we pass to the function.\n"," @staticmethod\n"," def sf_read_fmt(timestamp):\n"," return {\n"," \"satellite_folder\": {\"satellite\": \"metop_[abc]\"},\n"," \"year_folder\": {\"year\": f\"{timestamp.year}\"},\n"," }\n","\n"," def __init__(self, filename, **kwargs):\n"," super().__init__(filename, \"netcdf4\", **kwargs)"]},{"cell_type":"markdown","metadata":{},"source":["After creating your IO class, you can use it to make a collection by passing it to the SwathFileCollection class:\n","\n"]},{"cell_type":"code","execution_count":1,"metadata":{},"outputs":[],"source":["swath_collection = rat.SwathFileCollection(swath_source, ioclass=ASCATH121v1Swath)"]},{"cell_type":"markdown","metadata":{},"source":["Regardless of how you define you define your collection, once created it can be used to read data from your swath collection for any given date range and geographic extent. It can also be used to stack data in the collection into cellwise timeseries in indexed ragged array format, according to the CellGrid defined in the ioclass.\n","\n"]},{"cell_type":"markdown","metadata":{},"source":["Let’s start with a rectangular, roughly country-sized area in Central Europe, and a time period of four days.\n","\n"]},{"cell_type":"code","execution_count":5,"metadata":{},"outputs":[],"source":["bounds = (45, 50, 10, 20) #latmin, latmax, lonmin, lonmax\n","dates = (datetime(2020, 12, 1), datetime(2020, 12, 5))"]},{"cell_type":"markdown","metadata":{},"source":["By calling the `read` method of our SwathFileCollection, we open every dataset within the `date_range` we’ve passed, concatenate them together into an xarray dataset, and filter out any observations that don’t intersect with the bounding box we’ve passed to `bbox`. Other spatial selections we could pass to `read` are `cell` (cell number or list of cell numbers in the swath data’s grid system), `location_id` (grid point ID or list of IDs in the swath data’s grid system), `coords` (lat/lon coordinate or list of coorinates that will be converted to the nearest grid point ID or list of grid point IDs), or `geom` (a Shapely geometry).\n","\n"]},{"cell_type":"code","execution_count":7,"metadata":{},"outputs":[{"data":{"text/html":["
<xarray.Dataset>\n","Dimensions: (obs: 45007)\n","Coordinates:\n"," latitude (obs) float64 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," longitude (obs) float64 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," time (obs) datetime64[ns] 2020-12-01T08:09:...\n","Dimensions without coordinates: obs\n","Data variables: (12/19)\n"," location_id (obs) float64 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," as_des_pass (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," swath_indicator (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," surface_soil_moisture (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," surface_soil_moisture_noise (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," backscatter40 (obs) float64 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," ... ...\n"," snow_cover_probability (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," frozen_soil_probability (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," wetland_fraction (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," topographic_complexity (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," subsurface_scattering_probability (obs) float32 dask.array<chunksize=(1056,), meta=np.ndarray>\n"," sat_id (obs) int64 ...
<xarray.DataArray 'location_id' (obs: 45007)>\n","array([1258693., 1259680., 1260057., ..., 1260222., 1260311., 1260455.])\n","Coordinates:\n"," latitude (obs) float64 49.91 49.96 49.98 49.82 ... 49.98 49.99 49.99 50.0\n"," longitude (obs) float64 19.94 19.78 19.35 19.84 ... 11.69 10.57 12.38 11.26\n"," time (obs) datetime64[ns] 2020-12-01T08:09:59.415000064 ... 2020-12...\n","Dimensions without coordinates: obs\n","Attributes:\n"," long_name: Location identifier (Grid Point ID)\n"," valid_min: 0\n"," valid_max: 3300000
<xarray.DataArray 'surface_soil_moisture' (location_id: 2652)>\n","array([55.3 , 73.21357 , 54.573635 , ..., 3.9599998, 41.566666 ,\n"," 25.1 ], dtype=float32)\n","Coordinates:\n"," * location_id (location_id) float64 1.163e+06 1.163e+06 ... 1.26e+06 1.26e+06\n","Attributes:\n"," long_name: surface soil moisture\n"," units: percent saturation\n"," valid_min: 0\n"," valid_max: 10000
<xarray.DataArray 'surface_soil_moisture' (location_id: 2652)>\n","array([55.3 , 73.21357 , 54.57363 , ..., 3.9599998, 41.566666 ,\n"," 25.100002 ], dtype=float32)\n","Coordinates:\n"," * location_id (location_id) float64 1.163e+06 1.163e+06 ... 1.26e+06 1.26e+06\n","Attributes:\n"," long_name: surface soil moisture\n"," units: percent saturation\n"," valid_min: 0\n"," valid_max: 10000
<xarray.DataArray 'surface_soil_moisture' (location_id: 2652, hour: 9)>\n","array([[91.71 , 37.46 , 71.503334 , ..., 55.53 , 40.95 ,\n"," nan],\n"," [75.7 , 56.953327 , 63.656666 , ..., 80.432495 , 94.45 ,\n"," nan],\n"," [52. , 42.173332 , 57.54333 , ..., 58.72333 , nan,\n"," nan],\n"," ...,\n"," [ 3.9599998, nan, nan, ..., nan, nan,\n"," nan],\n"," [ nan, 26.9 , 70.9 , ..., nan, nan,\n"," nan],\n"," [14.49 , 21.055 , 12.83 , ..., nan, 31.404999 ,\n"," nan]], dtype=float32)\n","Coordinates:\n"," * location_id (location_id) float64 1.163e+06 1.163e+06 ... 1.26e+06 1.26e+06\n"," * hour (hour) int32 7 8 9 10 17 18 19 20 21\n","Attributes:\n"," long_name: surface soil moisture\n"," units: percent saturation\n"," valid_min: 0\n"," valid_max: 10000
<xarray.Dataset>\n","Dimensions: (obs: 59143813, locations: 4378)\n","Coordinates:\n"," time (obs) datetime64[ns] 2007-01-01T08:00:...\n"," lon (locations) float32 ...\n"," lat (locations) float32 ...\n"," alt (locations) float32 ...\n","Dimensions without coordinates: obs, locations\n","Data variables: (12/22)\n"," locationIndex (obs) int64 ...\n"," as_des_pass (obs) float32 ...\n"," swath_indicator (obs) float32 ...\n"," surface_soil_moisture (obs) float32 ...\n"," surface_soil_moisture_noise (obs) float32 ...\n"," backscatter40 (obs) float32 ...\n"," ... ...\n"," topographic_complexity (obs) float32 ...\n"," subsurface_scattering_probability (obs) float32 ...\n"," global_attributes_flag (locations) int64 ...\n"," sat_id (obs) float32 ...\n"," location_id (locations) int64 ...\n"," location_description (locations) object ...\n","Attributes: (12/15)\n"," title: ASCAT surface soil moisture near real-time product\n"," summary: ASCAT surface soil moisture expressed in degree of sat...\n"," doi: unset\n"," keywords: Metop-A ASCAT surface soil moisture\n"," history: original generated product\n"," institution: H SAF\n"," ... ...\n"," disposition_mode: Operational\n"," environment: Operational\n"," references: h-saf.eumetsat.int\n"," software_version: warp_h_nrt 0.0.0\n"," conventions: CF-1.10\n"," featureType: timeSeries
<xarray.Dataset>\n","Dimensions: (obs: 226965, locations: 4378)\n","Coordinates:\n"," time (obs) datetime64[ns] 2020-12-01T08:11:...\n"," lon (locations) float32 ...\n"," lat (locations) float32 ...\n"," alt (locations) float32 ...\n","Dimensions without coordinates: obs, locations\n","Data variables: (12/22)\n"," locationIndex (obs) int64 ...\n"," as_des_pass (obs) float32 ...\n"," swath_indicator (obs) float32 ...\n"," surface_soil_moisture (obs) float32 ...\n"," surface_soil_moisture_noise (obs) float32 ...\n"," backscatter40 (obs) float32 ...\n"," ... ...\n"," topographic_complexity (obs) float32 ...\n"," subsurface_scattering_probability (obs) float32 ...\n"," global_attributes_flag (locations) int64 ...\n"," sat_id (obs) float32 ...\n"," location_id (locations) int64 ...\n"," location_description (locations) object ...\n","Attributes: (12/15)\n"," title: ASCAT surface soil moisture near real-time product\n"," summary: ASCAT surface soil moisture expressed in degree of sat...\n"," doi: unset\n"," keywords: Metop-A ASCAT surface soil moisture\n"," history: original generated product\n"," institution: H SAF\n"," ... ...\n"," disposition_mode: Operational\n"," environment: Operational\n"," references: h-saf.eumetsat.int\n"," software_version: warp_h_nrt 0.0.0\n"," conventions: CF-1.10\n"," featureType: timeSeries
<xarray.DataArray 'surface_soil_moisture' (locationIndex: 4000)>\n","array([73.23913 , 85.52 , 52.248085, ..., 64.45 , 35.750626,\n"," 62.632 ], dtype=float32)\n","Coordinates:\n"," * locationIndex (locationIndex) int64 1 2 4 5 6 ... 4373 4374 4375 4376 4377\n","Attributes:\n"," long_name: surface soil moisture\n"," units: percent saturation\n"," valid_min: 0\n"," valid_max: 10000\n"," coordinates: time latitude longitude