xrspatial.zonal.crosstab

xrspatial.zonal.crosstab(zones: xarray.core.dataarray.DataArray, values: xarray.core.dataarray.DataArray, zone_ids: Optional[List[Union[int, float]]] = None, cat_ids: Optional[List[Union[int, float]]] = None, layer: Optional[int] = None, agg: Optional[str] = 'count', nodata_zones: Optional[Union[int, float]] = None, nodata_values: Optional[Union[int, float]] = None) → Union[pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame]

Calculate cross-tabulated (categorical stats) areas between two datasets: a zone dataset (zones) and a value dataset (values, a value raster). Infinite and NaN values in zones and values are ignored.
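
The counting described above can be illustrated with a minimal pure-Python sketch. This is not the xrspatial implementation: the helper name crosstab_counts and the tiny input grids are hypothetical, chosen only to show how cells are tallied per (zone, category) pair and how NaN cells are skipped.

```python
import math
from collections import Counter


def crosstab_counts(zones, values):
    """Count cells per (zone, category) pair, skipping non-finite cells.

    Hypothetical helper for illustration; not part of xrspatial.
    """
    counts = {}
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            # A cell is ignored if either raster holds NaN or infinity there.
            if not (math.isfinite(zone) and math.isfinite(value)):
                continue
            counts.setdefault(zone, Counter())[value] += 1
    return counts


nan = float("nan")
zones = [[1, 1, 2],
         [1, nan, 2]]
values = [[0, 10, 10],
          [0, 10, nan]]

result = crosstab_counts(zones, values)
for zone in sorted(result):
    print(zone, dict(result[zone]))
```

Each zone maps to a row of per-category counts, which corresponds to one row of the returned DataFrame.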

Outputs a pandas DataFrame if zones and values are NumPy-backed. Outputs a dask DataFrame if zones and values are xarray DataArrays backed by dask arrays with NumPy chunks.

Requires a DataArray with a single data dimension, here called the “values”, indexed using either 2D or 3D coordinates.

DataArrays with 3D coordinates are expected to contain values distributed over different categories that are indexed by the additional coordinate. Such an array would reduce to the 2D-coordinate case if collapsed across the categories (e.g. if one did aggc.sum(dim='cat') for a categorical dimension cat).
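
As a rough illustration of the collapse mentioned above, the following pure-Python sketch sums a stack of per-category 2D layers into a single 2D grid. The category names "a" and "b", the layer contents, and the dictionary layout are all assumptions made for the example; a real input would be an xarray DataArray with a categorical dimension.

```python
# Hypothetical 3D data: one 2D layer per category, keyed by category name.
layers = {
    "a": [[1, 0],
          [2, 0]],
    "b": [[0, 3],
          [1, 1]],
}

n_rows, n_cols = 2, 2

# Collapse across categories: sum the layers cell by cell.
collapsed = [
    [sum(layer[r][c] for layer in layers.values()) for c in range(n_cols)]
    for r in range(n_rows)
]

print(collapsed)
```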

Parameters
  • zones (xr.DataArray) – 2D data array of integers or floats. A zone is all the cells in a raster that have the same value, whether or not they are contiguous. The input zones raster defines the shape, values, and locations of the zones. A unique field in the zone input is specified to define the zones.

  • values (xr.DataArray) – 2D or 3D data array of integers or floats. The input value raster contains the input values used in calculating the categorical statistic for each zone.

  • zone_ids (List of ints or floats) – List of zones to be included in the calculation. If no zone_ids are provided, all zones will be used.

  • cat_ids (List of ints or floats) – List of categories to be included in the calculation. If no cat_ids are provided, all categories will be used.

  • layer (int, default=0) – index of the categorical dimension layer inside the values DataArray.

  • agg (str, default = 'count') – Aggregation method. If the data is 2D, the available options are percentage and count. If the data is 3D, the only available option is count.

  • nodata_zones (int, float, default=None) – Nodata value in the zones raster. Cells with this value do not belong to any zone and are thus excluded from the calculation.

  • nodata_values (int, float, default=None) – Nodata value in the values raster. Cells with this value are excluded from the calculation.
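
The way these parameters interact can be sketched in pure Python. The helper crosstab_table below is a hypothetical stand-in, not xrspatial's code. In particular, it assumes that percentage means the share of each category among the counted cells of a zone, and that filtering by cat_ids happens before that share is computed; neither detail is stated in the parameter descriptions above.

```python
import math


def crosstab_table(zones, values, zone_ids=None, cat_ids=None,
                   agg="count", nodata_zones=None, nodata_values=None):
    """Hypothetical pure-Python stand-in; not xrspatial's implementation."""
    counts = {}
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            # NaN and infinite cells are ignored in both rasters.
            if not (math.isfinite(zone) and math.isfinite(value)):
                continue
            # Nodata cells are excluded from the calculation.
            if zone == nodata_zones or value == nodata_values:
                continue
            # Optional restriction to selected zones and categories.
            if zone_ids is not None and zone not in zone_ids:
                continue
            if cat_ids is not None and value not in cat_ids:
                continue
            row = counts.setdefault(zone, {})
            row[value] = row.get(value, 0) + 1
    if agg == "percentage":
        # Assumption: percentage is each category's share within its zone.
        return {
            zone: {cat: 100.0 * n / sum(row.values()) for cat, n in row.items()}
            for zone, row in counts.items()
        }
    return counts


zones = [[1, 1, 2, 2],
         [1, 1, 0, 2]]
values = [[0, 10, 10, 10],
          [0, 0, 10, -1]]

counts = crosstab_table(zones, values, nodata_zones=0, nodata_values=-1)
percentages = crosstab_table(zones, values, agg="percentage",
                             nodata_zones=0, nodata_values=-1)
only_zone_2 = crosstab_table(zones, values, zone_ids=[2],
                             nodata_zones=0, nodata_values=-1)
print(counts)
print(percentages)
print(only_zone_2)
```

Here the cell in zone 0 and the cell with value -1 are dropped as nodata, so they contribute to no row or column of the result.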

Returns

crosstab_df – A pandas DataFrame, or an uncomputed dask DataFrame, where each column is a categorical value and each row is a zone with its zone id. Each entry is the statistic of the category over the zone, computed using the specified aggregation method.

Return type

Union[pandas.DataFrame, dask.dataframe.DataFrame]

Examples

import dask.array as da
import numpy as np
import xarray as xr
from xrspatial.zonal import crosstab

values_data = np.asarray([[0, 0, 10, 20],
                          [0, 0, 0, 10],
                          [0, np.nan, 20, 50],
                          [10, 30, 40, np.inf],
                          [10, 10, 50, 0]])
values = xr.DataArray(values_data)

zones_data = np.asarray([[1, 1, 6, 6],
                         [1, np.nan, 6, 6],
                         [3, 5, 6, 6],
                         [3, 5, 7, np.nan],
                         [3, 7, 7, 0]])
zones = xr.DataArray(zones_data)

# Wrap the raw NumPy arrays in chunked dask arrays for the dask case
values_dask = xr.DataArray(da.from_array(values_data, chunks=(3, 3)))
zones_dask = xr.DataArray(da.from_array(zones_data, chunks=(3, 3)))
>>> # Calculate Crosstab, numpy case
>>> df = crosstab(zones=zones, values=values)
>>> print(df)
        zone  0.0  10.0  20.0  30.0  40.0  50.0
    0      0    1     0     0     0     0     0
    1      1    3     0     0     0     0     0
    2      3    1     2     0     0     0     0
    3      5    0     0     0     1     0     0
    4      6    1     2     2     0     0     1
    5      7    0     1     0     0     1     1

>>> # Calculate Crosstab, dask case
>>> df = crosstab(zones=zones_dask, values=values_dask)
>>> print(df)
    Dask DataFrame Structure:
    zone        0.0     10.0    20.0    30.0    40.0    50.0
    npartitions=5
    0   float64 int64   int64   int64   int64   int64   int64
    1   ...     ...     ...     ...     ...     ...     ...
    ... ...     ...     ...     ...     ...     ...     ...
    4   ...     ...     ...     ...     ...     ...     ...
    5   ...     ...     ...     ...     ...     ...     ...
    Dask Name: astype, 1186 tasks
>>> print(df.compute())
        zone  0.0  10.0  20.0  30.0  40.0  50.0
    0      0    1     0     0     0     0     0
    1      1    3     0     0     0     0     0
    2      3    1     2     0     0     0     0
    3      5    0     0     0     1     0     0
    4      6    1     2     2     0     0     1
    5      7    0     1     0     0     1     1