xrspatial.classify.natural_breaks

xrspatial.classify.natural_breaks(agg: xarray.core.dataarray.DataArray, num_sample: Optional[int] = 20000, name: Optional[str] = 'natural_breaks', k: int = 5) → xarray.core.dataarray.DataArray

Reclassifies the values of agg into k classes using the Natural Breaks (Jenks) or K-Means clustering method. Similar values are placed in the same class, and the separation between classes is maximized.

Parameters
  • agg (xarray.DataArray) – 2D NumPy, CuPy, NumPy-backed Dask, or CuPy-backed Dask array of values to be reclassified.

  • num_sample (int, default=20000) – Number of sample data points used to fit the model. Natural Breaks (Jenks) classification has O(n²) complexity, where n is the total number of data points, i.e. agg.size. When n is large, the model should be fit on a small sub-sample of the data rather than on the whole dataset.

  • k (int, default=5) – Number of classes to be produced.

  • name (str, default='natural_breaks') – Name of output aggregate.
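
The sub-sampling idea behind num_sample can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the library's actual implementation: the helper name sample_for_fit, the seed argument, and the choice to drop non-finite values before sampling are all hypothetical.

```python
import numpy as np

def sample_for_fit(values, num_sample=20000, seed=0):
    """Hypothetical helper: return at most num_sample finite values."""
    flat = np.asarray(values, dtype=float).ravel()
    # Assumption: NaN and +/-inf are excluded before the model is fit.
    finite = flat[np.isfinite(flat)]
    if num_sample is None or finite.size <= num_sample:
        # Small input: fit on every finite value.
        return finite
    # Large input: draw a random subset without replacement.
    rng = np.random.default_rng(seed)
    return rng.choice(finite, size=num_sample, replace=False)

big = np.arange(1_000_000, dtype=float)
sample = sample_for_fit(big, num_sample=20000)
small = sample_for_fit([np.nan, 1.0, 2.0, np.inf])
print(sample.size, small.size)
```

Because the cost of fitting grows quadratically with the number of points, fitting on 20000 sampled values instead of one million reduces the work by a factor of roughly 2500, at the price of break values that are only approximate for the full dataset.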

Returns

natural_breaks_agg – 2D aggregate array of natural break allocations. All other input attributes are preserved.

Return type

xarray.DataArray of the same type as agg

Examples

import numpy as np
import xarray as xr
from xrspatial.classify import natural_breaks

# 5x5 elevation grid that includes one NaN cell and one inf cell;
# both come back as nan in the classified result printed below.
elevation = np.array([
    [np.nan,  1.,  2.,  3.,  4.],
    [ 5.,  6.,  7.,  8.,  9.],
    [10., 11., 12., 13., 14.],
    [15., 16., 17., 18., 19.],
    [20., 21., 22., 23., np.inf]
])
data = xr.DataArray(elevation, attrs={'res': (10.0, 10.0)})
# Classify the values into k=5 classes; the result is named 'natural_breaks'.
data_natural_breaks = natural_breaks(data, k=5)
>>> print(data)
<xarray.DataArray (dim_0: 5, dim_1: 5)>
array([[nan,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., inf]])
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)

>>> print(data_natural_breaks)
<xarray.DataArray 'natural_breaks' (dim_0: 5, dim_1: 5)>
array([[nan,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  2.],
       [ 2.,  2.,  2.,  2.,  3.],
       [ 3.,  3.,  3.,  3.,  4.],
       [ 4.,  4.,  4.,  4., nan]], dtype=float32)
Dimensions without coordinates: dim_0, dim_1
Attributes:
    res:      (10.0, 10.0)
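
To make the printed result easier to read, the following sketch shows how a set of break values maps each cell to a class index. It uses only NumPy and does not call natural_breaks. The upper_bounds array is an assumption read off the output above (the largest value in each class), not something the function returns, and np.digitize is used here as a stand-in for whatever binning the library performs internally.

```python
import numpy as np

elevation = np.array([
    [np.nan,  1.,  2.,  3.,  4.],
    [ 5.,  6.,  7.,  8.,  9.],
    [10., 11., 12., 13., 14.],
    [15., 16., 17., 18., 19.],
    [20., 21., 22., 23., np.inf],
])

# Assumed upper bound of each class, inferred from the printed result above.
upper_bounds = np.array([4., 8., 13., 18., 23.])

# Non-finite cells (NaN, inf) stay nan in the output.
finite = np.isfinite(elevation)
classes = np.full(elevation.shape, np.nan, dtype=np.float32)

# right=True assigns a value equal to a bound to the lower class,
# so 4.0 falls in class 0 and 5.0 in class 1.
classes[finite] = np.digitize(elevation[finite], upper_bounds, right=True)
print(classes)
```

With these assumed bounds, the classes array has the same values as data_natural_breaks above, including nan for the NaN and inf cells.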