# Classification

## Equal Interval

`xrspatial.classify.equal_interval(agg: xarray.DataArray, k: int = 5, name: Optional[str] = 'equal_interval') -> xarray.DataArray`

Reclassifies data for array agg into new values based on intervals of equal width.

Parameters
• agg (xarray.DataArray) – 2D NumPy, CuPy, NumPy-backed Dask, or CuPy-backed Dask array of values to be reclassified.

• k (int, default=5) – Number of classes to be produced.

• name (str, default='equal_interval') – Name of output aggregate.

Returns

equal_interval_agg – 2D aggregate array of equal interval allocations. All other input attributes are preserved.

Return type

xarray.DataArray of the same type as agg

Examples

```
>>> print(data)
<xarray.DataArray (dim_0: 5, dim_1: 5)>
array([[nan,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., inf]])
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)

>>> print(data_equal_interval)
<xarray.DataArray 'equal_interval' (dim_0: 5, dim_1: 5)>
array([[nan,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  2.,  2.,  2.,  2.],
       [ 2.,  2.,  2.,  2., nan]], dtype=float32)
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)
```

## Natural Breaks

`xrspatial.classify.natural_breaks(agg: xarray.DataArray, num_sample: Optional[int] = 20000, name: Optional[str] = 'natural_breaks', k: int = 5) -> xarray.DataArray`

Reclassifies data for array agg into new values based on the Natural Breaks (Jenks) or K-Means clustering method. Values are grouped so that similar values are placed in the same group and the space between groups is maximized.

Parameters
• agg (xarray.DataArray) – 2D NumPy, CuPy, NumPy-backed Dask, or CuPy-backed Dask array of values to be reclassified.

• num_sample (int, default=20000) – Number of sample data points used to fit the model. Natural Breaks (Jenks) classification has O(n²) complexity, where n is the total number of data points (agg.size). When n is large, the model should be fit on a small sub-sample of the data rather than on the whole dataset.

• k (int, default=5) – Number of classes to be produced.

• name (str, default='natural_breaks') – Name of output aggregate.

Returns

natural_breaks_agg – 2D aggregate array of natural break allocations. All other input attributes are preserved.

Return type

xarray.DataArray of the same type as agg

Examples

```
>>> print(data)
<xarray.DataArray (dim_0: 5, dim_1: 5)>
array([[nan,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., inf]])
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)

>>> print(data_natural_breaks)
<xarray.DataArray 'natural_breaks' (dim_0: 5, dim_1: 5)>
array([[nan,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  2.],
       [ 2.,  2.,  2.,  2.,  3.],
       [ 3.,  3.,  3.,  3.,  4.],
       [ 4.,  4.,  4.,  4., nan]], dtype=float32)
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)
```

## Reclassify

`xrspatial.classify.reclassify(agg: xarray.DataArray, bins: List[int], new_values: List[int], name: Optional[str] = 'reclassify') -> xarray.DataArray`

Reclassifies data for array agg into new values based on user-defined bins.

Parameters
• agg (xarray.DataArray) – 2D NumPy, CuPy, NumPy-backed Dask, or CuPy-backed Dask array of values to be reclassified.

• bins (array-like object) – Values or ranges of values to be changed.

• new_values (array-like object) – New values for each bin.

• name (str, default='reclassify') – Name of output aggregate array.

Returns

reclass_agg – 2D aggregate array of reclassified allocations. All other input attributes are preserved.

Return type

xarray.DataArray of the same type as agg

Examples

```
>>> print(data)
<xarray.DataArray (dim_0: 5, dim_1: 5)>
array([[nan,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., inf]])
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)

>>> print(data_reclassify)
<xarray.DataArray 'reclassify' (dim_0: 5, dim_1: 5)>
array([[nan,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  2.,  2.,  2.,  2.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 2.,  3.,  3.,  3., nan]], dtype=float32)
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)
```

## Quantile

`xrspatial.classify.quantile(agg: xarray.DataArray, k: int = 4, name: Optional[str] = 'quantile') -> xarray.DataArray`

Reclassifies data for array agg into new values based on quantile groups of equal size.

Parameters
• agg (xarray.DataArray) – 2D NumPy, CuPy, NumPy-backed Dask, or CuPy-backed Dask array of values to be reclassified.

• k (int, default=4) – Number of quantiles to be produced.

• name (str, default='quantile') – Name of the output aggregate array.

Returns

quantile_agg – 2D aggregate array of quantile allocations. All other input attributes are preserved.

Return type

xarray.DataArray of the same type as agg

Notes

• Dask’s percentile algorithm is approximate, while NumPy’s is exact. This may cause some differences between the results of the NumPy and Dask versions of the input agg (https://github.com/dask/dask/issues/3099).

Examples

```
>>> print(data)
<xarray.DataArray (dim_0: 5, dim_1: 5)>
array([[nan,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., inf]])
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)

>>> print(data_quantile)
<xarray.DataArray 'quantile' (dim_0: 5, dim_1: 5)>
array([[nan,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  4.],
       [ 4.,  4.,  4.,  4., nan]], dtype=float32)
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)
```