Zonal

Apply

xrspatial.zonal.apply(zones: xarray.core.dataarray.DataArray, values: xarray.core.dataarray.DataArray, func: Callable, nodata: Optional[int] = 0)[source]

Apply a function to the values agg within zones in zones agg. Change the agg content.

Parameters
  • zones (xr.DataArray) – zones.values is a 2d array of integers. A zone is all the cells in a raster that have the same value, whether or not they are contiguous. The input zone layer defines the shape, values, and locations of the zones. An integer field in the zone input is specified to define the zones.

  • agg (xr.DataArray) – agg.values is either a 2D or 3D array of integers or floats. The input value raster.

  • func (callable function to apply.) –

  • nodata (int, default=None) – Nodata value in zones raster. Cells with nodata does not belong to any zone, and thus excluded from calculation.

Examples

>>> zones_val = np.array([[1, 1, 0, 2],
>>>                       [0, 2, 1, 2]])
>>> zones = xarray.DataArray(zones_val)
>>> values_val = np.array([[2, -1, 5, 3],
>>>                        [3, np.nan, 20, 10]])
>>> agg = xarray.DataArray(values_val)
>>> func = lambda x: 0
>>> apply(zones, agg, func)
>>> agg
array([[0, 0, 5, 0],
       [3, 0, 0, 0]])

Crop

xrspatial.zonal.crop(zones: xarray.core.dataarray.DataArray, values: xarray.core.dataarray.DataArray, zones_ids: Union[list, tuple], name: str = 'crop')[source]

Crop scans from edges and eliminates rows / cols until one of the input values is found.

Parameters
  • zones (xr.DataArray) – Input zone raster.

  • values (xr.DataArray) – Input values raster.

  • zones_ids (list or tuple) – List of zone ids to crop raster.

  • name (str, default='crop') – Output xr.DataArray.name property.

Returns

crop_agg

Return type

xarray.DataArray

Notes

  • This operation will change the output size of the raster.

Examples

>>> print(terrain_agg.shape)
(300, 500)

>>> print(terrain_agg.attrs)
{
    'res': (80000.0, 133333.3333333333),
    'Description': 'Example Terrain',
    'units': 'km',
    'Max Elevation': '4000',
}

>>> print(cropped_agg.shape)
(300, 250)

>>> print(cropped_agg.attrs)
{
    'res': (80000.0, 133333.3333333333),
    'Description': 'Example Crop',
    'units': 'km',
    'Max Elevation': '4000',
}

Regions

xrspatial.zonal.regions(raster: xarray.core.dataarray.DataArray, neighborhood: int = 4, name: str = 'regions') xarray.core.dataarray.DataArray[source]

Create unique regions of raster based on pixel value connectivity. Connectivity can be based on either 4 or 8-pixel neighborhoods. Output raster contain a unique int for each connected region.

Parameters
  • raster (xr.DataArray) –

  • connections (int, default=4) – 4 or 8 pixel-based connectivity.

  • name (str, default='regions') – output xr.DataArray.name property.

Returns

regions_agg

Return type

xarray.DataArray

References

Examples

>>> print(terrain_agg[200:203, 200:202])
<xarray.DataArray 'Elevation' (lat: 3, lon: 2)>
array([[1264.02296597, 1261.947921  ],
       [1285.37105519, 1282.48079719],
       [1306.02339636, 1303.4069579 ]])
Coordinates:
* lon      (lon) float64 -3.96e+06 -3.88e+06
* lat      (lat) float64 6.733e+06 6.867e+06 7e+06
Attributes:
    res:            (80000.0, 133333.3333333333)
    Description:    Example Terrain
    units:          km
    Max Elevation:  4000

>>> print(regions_agg[200:203, 200:202])
<xarray.DataArray 'Region Value' (lat: 3, lon: 2)>
array([[39557., 39558.],
       [39943., 39944.],
       [40327., 40328.]])
Coordinates:
* lon      (lon) float64 -3.96e+06 -3.88e+06
* lat      (lat) float64 6.733e+06 6.867e+06 7e+06
Attributes:
    res:            (80000.0, 133333.3333333333)
    Description:    Example Regions
    units:
    Max Elevation:  4000

Trim

xrspatial.zonal.trim(raster: xarray.core.dataarray.DataArray, values: Union[list, tuple] = (nan,), name: str = 'trim') xarray.core.dataarray.DataArray[source]

Trim scans from the edges and eliminates rows / cols which only contain the values supplied.

Parameters
  • raster (xr.DataArray) –

  • values (list or tuple, default=(np.nan)) – List of zone ids to trim from raster edge.

  • name (str, default='trim') – Output xr.DataArray.name property.

Returns

trim_agg

Return type

xarray.DataArray

Notes

  • This operation will change the output size of the raster.

Examples

>>> print(terrain_agg.shape)
(300, 500)

>>> print(terrain_agg.attrs)
{
    'res': (80000.0, 133333.3333333333),
    'Description': 'Example Terrain',
    'units': 'km',
    'Max Elevation': '4000',
}

>>> print(trimmed_agg.shape)
(268, 500)

>>> print(trimmed_agg.attrs)
{
    'res': (80000.0, 133333.3333333333),
    'Description': 'Example Trim',
    'units': 'km',
    'Max Elevation': '4000',
}

Zonal Statistics

xrspatial.zonal.get_full_extent(crs)[source]

Returns the full extent of a map projection, available projections are ‘Mercator’ and ‘Geographic’.

Parameters

crs (str) – Input projection.

Returns

extent – Projection extent of form ((min_x, max_x), (min_y, max_y)).

Return type

tuple

Examples

>>> from xrspatial.zonal import get_full_extent

>>> extent = get_full_extent('Mercator')
>>> print(extent)
((-20000000.0, 20000000.0), (-20000000.0, 20000000.0))
xrspatial.zonal.stats(zones: xarray.core.dataarray.DataArray, values: xarray.core.dataarray.DataArray, zone_ids: Optional[List[Union[int, float]]] = None, stats_funcs: Union[Dict, List] = ['mean', 'max', 'min', 'sum', 'std', 'var', 'count'], nodata_values: Optional[Union[int, float]] = None) Union[pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame][source]

Calculate summary statistics for each zone defined by a zones dataset, based on values aggregate.

A single output value is computed for every zone in the input zones dataset.

This function currently supports numpy backed, and dask with numpy backed xarray DataArrays.

Parameters
  • zones (xr.DataArray) – zones is a 2D xarray DataArray of numeric values. A zone is all the cells in a raster that have the same value, whether or not they are contiguous. The input zones raster defines the shape, values, and locations of the zones. An integer field in the input zones DataArray defines a zone.

  • values (xr.DataArray) – values is a 2D xarray DataArray of numeric values (integers or floats). The input values raster contains the input values used in calculating the output statistic for each zone. In dask case, the chunksizes of zones and values should be matching. If not, values will be rechunked to be the same as of zones.

  • zone_ids (list of ints, or floats) – List of zones to be included in calculation. If no zone_ids provided, all zones will be used.

  • stats_funcs (dict, or list of strings, default=['mean', 'max', 'min',) – ‘sum’, ‘std’, ‘var’, ‘count’] The statistics to calculate for each zone. If a list, possible choices are subsets of the default options. In the dictionary case, all of its values must be callable. Function takes only one argument that is the values raster. The key become the column name in the output DataFrame. Note that if zones and values are dask backed DataArrays, stats_funcs must be provided as a list that is a subset of default supported stats.

  • nodata_values (int, float, default=None) – Nodata value in values raster. Cells with nodata_values do not belong to any zone, and thus excluded from calculation.

Returns

stats_df – A pandas DataFrame, or a dask DataFrame where each column is a statistic and each row is a zone with zone id.

Return type

Union[pandas.DataFrame, dask.dataframe.DataFrame]

Examples

>>> # Calculate Stats
>>> stats_df = stats(zones=zones, values=values)
>>> print(stats_df)
    zone  mean  max  min   sum       std    var  count
0   0    22.0   44    0   550  14.21267  202.0     25
1  10    27.0   49    5   675  14.21267  202.0     25
2  20    72.0   94   50  1800  14.21267  202.0     25
3  30    77.0   99   55  1925  14.21267  202.0     25

>>> # Custom Stats
>>> custom_stats ={'double_sum': lambda val: val.sum()*2}
>>> custom_stats_df = stats(zones=zones,
                            values=values,
                            stats_funcs=custom_stats)
>>> print(custom_stats_df)
    zone  double_sum
0   0     1100
1  10     1350
2  20     3600
3  30     3850

>>> # Calculate Stats with dask backed xarray DataArrays
>>> dask_stats_df = stats(zones=dask_zones, values=dask_values)
>>> print(type(dask_stats_df))
<class 'dask.dataframe.core.DataFrame'>
>>> print(dask_stats_df.compute())
    zone  mean  max  min   sum       std    var  count
0     0  22.0   44    0   550  14.21267  202.0     25
1    10  27.0   49    5   675  14.21267  202.0     25
2    20  72.0   94   50  1800  14.21267  202.0     25
3    30  77.0   99   55  1925  14.21267  202.0     25

>>> # Custom Stats with dask backed xarray DataArrays
>>> dask_custom_stats ={'double_sum': lambda val: val.sum()*2}
>>> dask_custom_stats_df = stats(
>>>      zones=dask_zones, values=dask_values, stats_funcs=custom_stats
>>> )
>>> print(type(dask_custom_stats_df))
<class 'dask.dataframe.core.DataFrame'>
>>> print(dask_custom_stats_df.compute())
    zone  double_sum
0     0        1100
1    10        1350
2    20        3600
3    30        3850
xrspatial.zonal.suggest_zonal_canvas(smallest_area: Union[int, float], x_range: Union[tuple, list], y_range: Union[tuple, list], crs: str = 'Mercator', min_pixels: int = 25) tuple[source]

Given a coordinate reference system (crs), a set of polygons with corresponding x range and y range, calculate the height and width of canvas so that the smallest polygon (polygon with smallest area) is rasterized with at least min pixels.

Currently, we assume that the smallest polygon does not intersect others. One should note that a polygon can have different shapes when it is rasterized in canvases of different size. Thus, we cannot 100% guarantee the actual number of pixels after rasterization. It is recommended to add an additional of 5% to @min_pixels parameter.

Parameters
  • x_range (tuple or list of float or int) – The full x extent of the polygon GeoDataFrame.

  • y_range (tuple or list of float or int) – The full y extent of the polygon GeoDataFrame.

  • smallest_area (float or int) – Area of the smallest polygon.

  • crs (str, default='Mercator') – Name of the coordinate reference system.

  • min_pixels (int, default=25) – Expected number of pixels of the polygon with smallest area when the whole dataframe is rasterized.

Returns

height, width – Height and width of the canvas in pixel space.

Return type

int

Examples

>>> # Imports
>>> from spatialpandas import GeoDataFrame
>>> import geopandas as gpd
>>> import datashader as ds

>>> df = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
>>> df = df.to_crs("EPSG:3857")
>>> df = df[df.continent != 'Antarctica']
>>> df['id'] = [i for i in range(len(df.index))]
>>> xmin, ymin, xmax, ymax = (
        df.bounds.minx.min(),
        df.bounds.miny.min(),
        df.bounds.maxx.max(),
        df.bounds.maxy.max(),
    )
>>> x_range = (xmin, xmax)
>>> y_range = (ymin, ymax)
>>> smallest_area = df.area.min()
>>> min_pixels = 20
>>> height, width = suggest_zonal_canvas(
        x_range=x_range,
        y_range=y_range,
        smallest_area=smallest_area,
        crs='Mercator',
        min_pixels=min_pixels,
    )
>>> cvs = ds.Canvas(x_range=x_range, y_range=y_range,
>>>             plot_height=height, plot_width=width)
>>> spatial_df = GeoDataFrame(df, geometry='geometry')
>>> agg = cvs.polygons(spatial_df, 'geometry', agg=ds.max('id'))
>>> min_poly_id = df.area.argmin()
>>> actual_min_pixels = len(np.where(agg.data==min_poly_id)[0])

Zonal Cross Tabulate

xrspatial.zonal.crosstab(zones: xarray.core.dataarray.DataArray, values: xarray.core.dataarray.DataArray, zone_ids: Optional[List[Union[int, float]]] = None, cat_ids: Optional[List[Union[int, float]]] = None, layer: Optional[int] = None, agg: Optional[str] = 'count', nodata_zones: Optional[Union[int, float]] = None, nodata_values: Optional[Union[int, float]] = None) Union[pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame][source]

Calculate cross-tabulated (categorical stats) areas between two datasets: a zone dataset zones, a value dataset values (a value raster). Infinite and NaN values in zones and values will be ignored.

Outputs a pandas DataFrame if zones and values are numpy backed. Outputs a dask DataFrame if zones and values are dask with numpy-backed xarray DataArrays.

Requires a DataArray with a single data dimension, here called the “values”, indexed using either 2D or 3D coordinates.

DataArrays with 3D coordinates are expected to contain values distributed over different categories that are indexed by the additional coordinate. Such an array would reduce to the 2D-coordinate case if collapsed across the categories (e.g. if one did aggc.sum(dim='cat') for a categorical dimension cat).

Parameters
  • zones (xr.DataArray) – 2D data array of integers or floats. A zone is all the cells in a raster that have the same value, whether or not they are contiguous. The input zones raster defines the shape, values, and locations of the zones. An unique field in the zone input is specified to define the zones.

  • values (xr.DataArray) – 2D or 3D data array of integers or floats. The input value raster contains the input values used in calculating the categorical statistic for each zone.

  • zone_ids (List of ints, or floats) – List of zones to be included in calculation. If no zone_ids provided, all zones will be used.

  • cat_ids (List of ints, or floats) – List of categories to be included in calculation. If no cat_ids provided, all categories will be used.

  • layer (int, default=0) – index of the categorical dimension layer inside the values DataArray.

  • agg (str, default = 'count') – Aggregation method. If the data is 2D, available options are: percentage, count. If the data is 3D, available option is: count.

  • nodata_zones (int, float, default=None) – Nodata value in zones raster. Cells with nodata do not belong to any zone, and thus excluded from calculation.

  • nodata_values (int, float, default=None) – Nodata value in values raster. Cells with nodata do not belong to any zone, and thus excluded from calculation.

Returns

crosstab_df – A pandas DataFrame, or an uncomputed dask DataFrame, where each column is a categorical value and each row is a zone with zone id. Each entry presents the statistics, which computed using the specified aggregation method, of the category over the zone.

Return type

Union[pandas.DataFrame, dask.dataframe.DataFrame]

Examples

>>> # Calculate Crosstab, numpy case
>>> df = crosstab(zones=zones, values=values)
>>> print(df)
        zone  0.0  10.0  20.0  30.0  40.0  50.0
    0      0    1     0     0     0     0     0
    1      1    3     0     0     0     0     0
    2      3    1     2     0     0     0     0
    3      5    0     0     0     1     0     0
    4      6    1     2     2     0     0     1
    5      7    0     1     0     0     1     1

>>> # Calculate Crosstab, dask case
>>> df = crosstab(zones=zones_dask, values=values_dask)
>>> print(df)
    Dask DataFrame Structure:
    zone        0.0     10.0    20.0    30.0    40.0    50.0
    npartitions=5
    0   float64 int64   int64   int64   int64   int64   int64
    1   ...     ...     ...     ...     ...     ...     ...
    ... ...     ...     ...     ...     ...     ...     ...
    4   ...     ...     ...     ...     ...     ...     ...
    5   ...     ...     ...     ...     ...     ...     ...
    Dask Name: astype, 1186 tasks
>>> print(dask_df.compute)
        zone  0.0  10.0  20.0  30.0  40.0  50.0
    0      0    1     0     0     0     0     0
    1      1    3     0     0     0     0     0
    2      3    1     2     0     0     0     0
    3      5    0     0     0     1     0     0
    4      6    1     2     2     0     0     1
    5      7    0     1     0     0     1     1