xrspatial.zonal.stats

xrspatial.zonal.stats(zones: xarray.core.dataarray.DataArray, values: xarray.core.dataarray.DataArray, zone_ids: Optional[List[Union[int, float]]] = None, stats_funcs: Union[Dict, List] = ['mean', 'max', 'min', 'sum', 'std', 'var', 'count'], nodata_values: Optional[Union[int, float]] = None, return_type: str = 'pandas.DataFrame') -> Union[pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame, xarray.core.dataarray.DataArray]

Calculate summary statistics of a values aggregate (raster) for each zone defined by a zones dataset.

For each statistic, a single output value is computed for every zone in the input zones dataset.

This function currently supports NumPy-backed xarray DataArrays and Dask-backed xarray DataArrays whose chunks are NumPy arrays.

Parameters
  • zones (xr.DataArray) – zones is a 2D xarray DataArray of numeric values. A zone is all the cells in a raster that have the same value, whether or not they are contiguous. The input zones raster defines the shape, values, and locations of the zones. An integer field in the input zones DataArray defines a zone.

  • values (xr.DataArray) – values is a 2D xarray DataArray of numeric values (integers or floats). The input values raster contains the input values used in calculating the output statistic for each zone. With Dask-backed inputs, the chunk sizes of zones and values should match; if they do not, values is rechunked to match zones.

  • zone_ids (list of ints or floats) – List of zones to include in the calculation. If zone_ids is not provided, all zones are used.

  • stats_funcs (dict, or list of strings, default=['mean', 'max', 'min', 'sum', 'std', 'var', 'count']) – The statistics to calculate for each zone. If a list, it must be a subset of the default options. If a dictionary, all of its values must be callable; each function takes a single argument, the values raster, and each key becomes a column name in the output DataFrame. Note that if zones and values are Dask-backed DataArrays, stats_funcs must be provided as a list that is a subset of the default supported stats.

  • nodata_values (int, float, default=None) – Nodata value in the values raster. Cells whose value equals nodata_values do not belong to any zone and are thus excluded from the calculation.

  • return_type (str, default='pandas.DataFrame') – Format of the returned data. If zones and values are NumPy-backed xarray DataArrays, allowed values are 'pandas.DataFrame' and 'xarray.DataArray'. Otherwise, only 'pandas.DataFrame' is supported.

Returns

stats_df – A pandas DataFrame or a Dask DataFrame in which each column is a statistic and each row is a zone, identified by its zone id. With return_type='xarray.DataArray', an xarray DataArray is returned instead.

Return type

Union[pandas.DataFrame, dask.dataframe.DataFrame, xarray.DataArray]

Examples

stats() works with NumPy backed DataArray

>>> import numpy as np
>>> import xarray as xr
>>> from xrspatial.zonal import stats
>>> height, width = 10, 10
>>> values_data = np.array([
    [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
    [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
    [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
    [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
    [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
    [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
    [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
    [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
>>> values = xr.DataArray(values_data)
>>> zones_data = np.array([
    [ 0.,  0.,  0.,  0.,  0., 10., 10., 10., 10., 10.],
    [ 0.,  0.,  0.,  0.,  0., 10., 10., 10., 10., 10.],
    [ 0.,  0.,  0.,  0.,  0., 10., 10., 10., 10., 10.],
    [ 0.,  0.,  0.,  0.,  0., 10., 10., 10., 10., 10.],
    [ 0.,  0.,  0.,  0.,  0., 10., 10., 10., 10., 10.],
    [20., 20., 20., 20., 20., 30., 30., 30., 30., 30.],
    [20., 20., 20., 20., 20., 30., 30., 30., 30., 30.],
    [20., 20., 20., 20., 20., 30., 30., 30., 30., 30.],
    [20., 20., 20., 20., 20., 30., 30., 30., 30., 30.],
    [20., 20., 20., 20., 20., 30., 30., 30., 30., 30.]])
>>> zones = xr.DataArray(zones_data)

>>> # Calculate Stats
>>> stats_df = stats(zones=zones, values=values)
>>> print(stats_df)
    zone  mean  max  min   sum       std    var  count
0     0  22.0   44    0   550  14.21267  202.0     25
1    10  27.0   49    5   675  14.21267  202.0     25
2    20  72.0   94   50  1800  14.21267  202.0     25
3    30  77.0   99   55  1925  14.21267  202.0     25
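
As a cross-check on the table above, the following is a minimal NumPy-only sketch of what "one value per statistic per zone" means. It is not xrspatial's implementation: the helper name `zonal_summary` is hypothetical, and the sketch assumes population variance and standard deviation (NumPy's default, `ddof=0`), which is what reproduces the 202.0 and 14.21267 shown above.

```python
import numpy as np

# Same data as the example above, built compactly.
values_data = np.arange(100).reshape(10, 10)
zones_data = np.repeat(
    np.repeat(np.array([[0., 10.], [20., 30.]]), 5, axis=0), 5, axis=1
)


def zonal_summary(zones, values):
    """Hypothetical helper: summarize `values` per distinct value in `zones`."""
    summary = {}
    for zone_id in np.unique(zones):
        # A zone is every cell sharing the same zone value, contiguous or not.
        cells = values[zones == zone_id]
        summary[float(zone_id)] = {
            'mean': float(cells.mean()),
            'max': int(cells.max()),
            'min': int(cells.min()),
            'sum': int(cells.sum()),
            'std': float(cells.std()),   # population std (ddof=0)
            'var': float(cells.var()),   # population variance (ddof=0)
            'count': int(cells.size),
        }
    return summary


summary = zonal_summary(zones_data, values_data)
for zone_id, zone_stats in summary.items():
    print(zone_id, zone_stats)
```

Each boolean mask `zones == zone_id` selects one zone's cells, so the loop yields one row of the table per zone.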

>>> # Custom Stats
>>> custom_stats = {'double_sum': lambda val: val.sum() * 2}
>>> custom_stats_df = stats(zones=zones,
                            values=values,
                            stats_funcs=custom_stats)
>>> print(custom_stats_df)
    zone  double_sum
0     0        1100
1    10        1350
2    20        3600
3    30        3850
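
The dictionary form of stats_funcs can be pictured with the NumPy-only sketch below. It assumes that each callable receives the values falling inside one zone, which is consistent with the per-zone `double_sum` output above; the loop and the `rows` list are illustrative only and are not taken from the xrspatial source.

```python
import numpy as np

values_data = np.arange(100).reshape(10, 10)
zones_data = np.repeat(
    np.repeat(np.array([[0., 10.], [20., 30.]]), 5, axis=0), 5, axis=1
)

# Keys become column names; values are single-argument callables.
custom_stats = {'double_sum': lambda val: val.sum() * 2}

rows = []
for zone_id in np.unique(zones_data):
    cells = values_data[zones_data == zone_id]  # values belonging to this zone
    row = {'zone': int(zone_id)}
    for column_name, func in custom_stats.items():
        row[column_name] = int(func(cells))
    rows.append(row)

for row in rows:
    print(row)
```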

stats() works with Dask with NumPy backed DataArray

>>> import dask.array as da
>>> values_dask = xr.DataArray(da.from_array(values_data, chunks=(3, 3)))
>>> zones_dask = xr.DataArray(da.from_array(zones_data, chunks=(3, 3)))
>>> # Calculate Stats with dask backed xarray DataArrays
>>> dask_stats_df = stats(zones=zones_dask, values=values_dask)
>>> print(type(dask_stats_df))
<class 'dask.dataframe.core.DataFrame'>
>>> print(dask_stats_df.compute())
    zone  mean  max  min   sum       std    var  count
0     0  22.0   44    0   550  14.21267  202.0     25
1    10  27.0   49    5   675  14.21267  202.0     25
2    20  72.0   94   50  1800  14.21267  202.0     25
3    30  77.0   99   55  1925  14.21267  202.0     25
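
The zone_ids and nodata_values parameters are not exercised in the examples above. The sketch below illustrates their documented meaning with plain NumPy, under stated assumptions: the chosen `zone_ids` and the nodata value `0` are hypothetical, and the masking logic is an interpretation of the parameter descriptions, not xrspatial's code.

```python
import numpy as np

values_data = np.arange(100).reshape(10, 10)
zones_data = np.repeat(
    np.repeat(np.array([[0., 10.], [20., 30.]]), 5, axis=0), 5, axis=1
)

zone_ids = [0, 30]   # hypothetical: restrict the calculation to two zones
nodata = 0           # hypothetical: treat the value 0 as "no data"

results = {}
valid = values_data != nodata            # nodata cells belong to no zone
for zone_id in zone_ids:
    in_zone = zones_data == zone_id
    cells = values_data[in_zone & valid]
    results[zone_id] = {
        'count': int(cells.size),
        'mean': float(cells.mean()),
    }

for zone_id, zone_stats in results.items():
    print(zone_id, zone_stats)
```

In this sketch zone 0 loses the single cell holding the nodata value, so its count drops from 25 to 24 and its mean rises accordingly, while zones 10 and 20 are skipped entirely because they are not listed in `zone_ids`.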