Zonal

Importing Packages

[1]:
import numpy as np
import datashader as ds
from datashader.transfer_functions import shade
from datashader.transfer_functions import stack
from datashader.transfer_functions import dynspread
from datashader.transfer_functions import set_background
from datashader.colors import Elevation

import xrspatial

Generate Terrain Data

The rest of the geo-related functions focus on raster data (or rasterized data, after a previous Datashader step that returns an Xarray object). To demonstrate using these raster-based functions, let’s generate some fake terrain as an elevation raster:

[2]:
from xrspatial import generate_terrain

W = 800
H = 600

cvs = ds.Canvas(plot_width=W, plot_height=H, x_range=(-20e6, 20e6), y_range=(-20e6, 20e6))
terrain = generate_terrain(canvas=cvs)

shade(terrain, cmap=['black', 'white'], how='linear')
[2]:

The grayscale value above shows the elevation linearly in intensity (with the large black areas indicating low elevation), but it will look more like a landscape if we map the lowest values to colors representing water, and the highest to colors representing mountaintops:

[3]:
shade(terrain, cmap=Elevation, how='linear')
[3]:

Zonal Statistics

Zonal statistics allows for calculating summary statistics for specific areas or zones within a datashader aggregate. Zones are defined by creating an integer aggregate where the cell values are zone_ids. The output of zonal statistics is a Pandas dataframe containing summary statistics for each zone based on a value raster.

Imagine the following scenario: - You are a hiker on a six-day-trip. - The path for each day is defined by a line segement. - You wish to calculate the max and min elevations for each hiking segment as a Pandas dataframe based on an elevation dataset.

[4]:
from xrspatial import hillshade
from datashader.colors import Set1
import pandas as pd

cvs = ds.Canvas(plot_width=W, plot_height=H, x_range=(-20, 20), y_range=(-20, 20))

terrain = generate_terrain(canvas=cvs)
terrain_shaded = shade(terrain, cmap=Elevation, alpha=128, how='linear')

illuminated = hillshade(terrain)
illuminated_shaded = shade(illuminated, cmap=['gray', 'white'], alpha=255, how='linear')

zone_df = pd.DataFrame({
   'x': [-11, -5, 4, 12, 14, 18, 19],
   'y': [-5, 4, 10, 13, 13, 13, 10],
   'trail_segement_id': [11, 12, 13, 14, 15, 16, 17]
})

zones_agg = cvs.line(zone_df, 'x', 'y', ds.sum('trail_segement_id'))
zones_shaded = dynspread(shade(zones_agg, cmap=Set1), max_px=5)

stack(illuminated_shaded, terrain_shaded, zones_shaded)
[4]:
[5]:
from xrspatial import zonal_stats
zones_agg.values = np.nan_to_num(zones_agg.values, copy=False).astype(np.int)
zonal_stats(zones_agg, terrain)
/tmp/ipykernel_14688/1703978268.py:2: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  zones_agg.values = np.nan_to_num(zones_agg.values, copy=False).astype(np.int)
[5]:
zone mean max min sum std var count
0 0 887.687779 4000.000000 0.000000 4.255211e+08 872.639188 761499.152736 479359
1 11 1321.919879 1428.384240 1228.065037 1.797811e+05 53.409131 2852.535316 136
2 12 1489.600166 2399.556289 1220.694206 2.681280e+05 310.177594 96210.140061 180
3 13 1711.632114 2430.430303 1368.922124 2.738611e+05 278.369664 77489.669636 160
4 14 1355.661087 1368.952389 1346.856316 5.422644e+04 6.249734 39.059170 40
5 15 1381.950485 1443.856734 1319.891060 1.105560e+05 38.918956 1514.685125 80
6 16 1486.314866 1709.222348 1367.673562 6.688417e+04 120.950684 14629.068060 45

Calculate custom stats for each zone

[6]:
custom_stats = dict(elevation_change=lambda zone: zone.max() - zone.min(),
                    elevation_min=np.min,
                    elevation_max=np.max)

zonal_stats(zones_agg, terrain, custom_stats)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_14688/2868890181.py in <module>
      3                     elevation_max=np.max)
      4
----> 5 zonal_stats(zones_agg, terrain, custom_stats)

~/.local/lib/python3.8/site-packages/xrspatial/zonal.py in stats(zones, values, zone_ids, stats_funcs, nodata_zones, nodata_values)
    354     if isinstance(values.data, np.ndarray):
    355         # numpy case
--> 356         stats_df = _stats_numpy(
    357             zones,
    358             values,

~/.local/lib/python3.8/site-packages/xrspatial/zonal.py in _stats_numpy(zones, values, zone_ids, stats_funcs, nodata_zones, nodata_values)
    130         unique_zones = np.array(zone_ids)
    131
--> 132     stats_dict = _stats(
    133         zones,
    134         values,

~/.local/lib/python3.8/site-packages/xrspatial/zonal.py in _stats(zones, values, unique_zones, stats_funcs, nodata_values)
     98         stats_dict[stats] = []
     99
--> 100     for zone_id in unique_zones:
    101         # get zone values
    102         zone_values = _zone_cat_data(zones, values, zone_id, nodata_values)

TypeError: iteration over a 0-d array

Here the zones are defined by line segments, but they can be any spatial pattern, and in particular can be any region computable as a Datashader aggregate.