Array manipulation and booleans in numpy, hunting for what went wrong

Recently I was doing some raster processing with gdal_calc.py but I was getting some surprising results.
The goal was to convert some continuous value into a binary value based on a treshold and then detect places where only A was True (value: -1), where only B was True (value: 1) and where both A and B where True (value: 0). With a treshold set to 2 I came with the following formula:

(A > 2) - (B > 2)

But after calling the gdal_calc.py with my formula and inspecting the results I only got values of 0 and 1.
After inspecting gdal_calc.py I noticed that it uses numpy and more specifically numpy arrays for the raster manipulation.

This how my python shell session went (numpy_array_math.py):

>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> b = np.array([1,5,3,2])
>>> print(a > 2)
[False False  True  True]
>>> print(b > 2)
[False  True  True False]
>>> print(True-False)
1
>>> print(False-True)
-1
>>> print((a>2)-(b>2))
[False  True False  True]
>>> print((a>2)*1-(b>2))  # we got a winner
[ 0 -1  0  1]

The problem was that boolean substraction in Python does generate the expected numeric results but the results where converted back into a boolean array by numpy after the substraction. And indeed converting -1, 1 or any other non zero number to a boolean generates a True which when converted back to number for writing the raster to disk gives you the value 1.
The solution was to force at least one of the arrays to be numeric so that we substract numeric values.

>>> bool(1)
True
>>> bool(2)
True
>>> bool(-1)
True
>>> bool(0)
False
>>> print((a>2)*1)
[0 0 1 1]

If you want to try this yourself on Windows then the easiest way to install gdal is with the OSGeo4W installer. A windows installer for numpy can be found on the website by Christopher Golke but consider also installing the full SciPy stack with one of the Scientific Python distributions.

What surprising results have you encountered with numpy.array or gdal_calc ?

No comments: