Masked-array NumPy interface

Masked-array NumPy interface#

NumPy possesses an array interface that allows to properly map their functions on objects that depend on ndarrays.

GeoUtils utilizes this interface to work with all Rasters and their subclasses.

Universal functions#

A first category of NumPy functions supported by Rasters through the array interface is that of universal functions, which operate on ndarrays in an element-by-element fashion. Examples of such functions are add(), absolute(), isnan() or sin(), and they number at more than 90.

Universal functions can take one or two inputs, and return one or two outputs. Through GeoUtils, as long as one of the two inputs is a Rasters, the output will be a Raster. If there is a second input, it can be a Raster or ndarray with matching georeferencing or shape, respectively.

These functions inherently support the casting of different dtype and values masked by nodata in the MaskedArray.

Below, we re-use the same example created in Support of pythonic operators.

Hide code cell source
import geoutils as gu
import rasterio as rio
import pyproj
import numpy as np

# Create a random 3 x 3 masked array
np.random.seed(42)
arr = np.random.randint(0, 255, size=(3, 3), dtype="uint8")
mask = np.random.randint(0, 2, size=(3, 3), dtype="bool")
ma = np.ma.masked_array(data=arr, mask=mask)

# Create an example raster
rast = gu.Raster.from_array(
       data = ma,
       transform = rio.transform.from_bounds(0, 0, 1, 1, 3, 3),
       crs = pyproj.CRS.from_epsg(4326),
       nodata = 255
    )
rast
Hide code cell output
Raster(
  data=[[102 -- --]
        [-- 179 61]
        [234 203 --]]
  transform=| 0.33, 0.00, 0.00|
            | 0.00,-0.33, 1.00|
            | 0.00, 0.00, 1.00|
  crs=EPSG:4326
  nodata=255)
# Universal function with a single input and output
np.sin(rast)
Raster(
  data=[[0.99462890625 -- --]
        [-- 0.07073974609375 -0.96630859375]
        [0.9990234375 0.93310546875 --]]
  transform=| 0.33, 0.00, 0.00|
            | 0.00,-0.33, 1.00|
            | 0.00, 0.00, 1.00|
  crs=EPSG:4326
  nodata=255)
# Universal function with a two inputs and single output
np.add(arr, rast)
Raster(
  data=[[204 -- --]
        [-- 102 122]
        [212 150 --]]
  transform=| 0.33, 0.00, 0.00|
            | 0.00,-0.33, 1.00|
            | 0.00, 0.00, 1.00|
  crs=EPSG:4326
  nodata=255)
# Universal function with a single input and two outputs
np.modf(rast)
(Raster(
   data=[[0.0 -- --]
         [-- 0.0 0.0]
         [0.0 0.0 --]]
   transform=| 0.33, 0.00, 0.00|
             | 0.00,-0.33, 1.00|
             | 0.00, 0.00, 1.00|
   crs=EPSG:4326
   nodata=255),
 Raster(
   data=[[102.0 -- --]
         [-- 179.0 61.0]
         [234.0 203.0 --]]
   transform=| 0.33, 0.00, 0.00|
             | 0.00,-0.33, 1.00|
             | 0.00, 0.00, 1.00|
   crs=EPSG:4326
   nodata=255))

Similar to with Python operators, NumPy’s logical comparison functions cast Rasters to a Mask.

# Universal function with a single input and two outputs
np.greater(rast, rast + np.random.normal(size=np.shape(arr)))
Mask(
  data=[[False -- --]
        [-- False False]
        [True False --]]
  transform=| 0.33, 0.00, 0.00|
            | 0.00,-0.33, 1.00|
            | 0.00, 0.00, 1.00|
  crs=EPSG:4326)

Array functions#

The second and last category of NumPy array functions supported by Rasters through the array interface is that of array functions, which are all other non-universal functions that can be applied to an array. Those function always modify the dimensionality of the output, such as mean(), count_nonzero() or nanmax(). Consequently, the output is the same as it would be with ndarrays.

# Traditional mathematical function
np.max(rast)
234
# Expliciting an axis for reduction
np.count_nonzero(rast, axis=1)
masked_array(data=[1, 2, 2],
             mask=[False, False, False],
       fill_value=999999)

Not all array functions are supported, however. GeoUtils supports nearly all mathematical functions, masked-array functions and logical functions. A full list of supported array function is available in geoutils.raster.handled_array_funcs.

Respecting masked values#

There are two ways to compute statistics on Rasters while respecting masked values:

  1. Use any NumPy core function (np.func) directly on the Raster (this includes NaN functions np.nanfunc),

  2. Use any NumPy masked-array function (np.ma.func) on Raster.data.

# Numpy core function applied to the raster
np.median(rast)
179.0
# Numpy NaN function applied to the raster
np.nanmedian(rast)
179.0
# Masked-array function on the data
np.ma.median(rast.data)
179.0

If a NumPy core function raises an error (e.g., numpy.percentile()), nodata values might not be respected. In this case, use the NaN function on the Raster.

Note

Unfortunately, masked-array functions np.ma.func cannot be recognized yet if applied directly to a Raster, but this should come soon as related interfacing is in the works in NumPy!