Hey there,
I am just getting started with Numba and am trying to port an anisotropic diffusion filter to it. The function looks like this:
from numba import jit
import numpy as np


@jit(nopython=True)
def anisodiff(img, niter=1, kappa=50, gamma=0.1, step=(1., 1.), option=1):
    """
    Anisotropic diffusion.

    Usage:
        imgout = anisodiff(im, niter, kappa, gamma, option)

    Arguments:
        img    - input image
        niter  - number of iterations
        kappa  - conduction coefficient 20-100 ?
        gamma  - max value of .25 for stability
        step   - tuple, the distance between adjacent pixels in (y, x)
        option - 1 Perona Malik diffusion equation No 1
                 2 Perona Malik diffusion equation No 2

    Returns:
        imgout - diffused image.

    kappa controls conduction as a function of gradient. If kappa is low,
    small intensity gradients are able to block conduction and hence diffusion
    across step edges. A large value reduces the influence of intensity
    gradients on conduction.

    gamma controls the speed of diffusion (you usually want it at a maximum
    of 0.25).

    step is used to scale the gradients in case the spacing between adjacent
    pixels differs in the x and y axes.

    Diffusion equation 1 favours high-contrast edges over low-contrast ones.
    Diffusion equation 2 favours wide regions over smaller ones.

    Reference:
        P. Perona and J. Malik.
        Scale-space and edge detection using anisotropic diffusion.
        IEEE Transactions on Pattern Analysis and Machine Intelligence,
        12(7):629-639, July 1990.

    Original MATLAB code by Peter Kovesi
    School of Computer Science & Software Engineering
    The University of Western Australia
    pk @ csse uwa edu au
    <http://www.csse.uwa.edu.au>

    Translated to Python and optimised by Alistair Muldal
    Department of Pharmacology
    University of Oxford
    <alistair.muldal@pharm.ox.ac.uk>

    June 2000  original version.
    March 2002 corrected diffusion eqn No 2.
    July 2012  translated to Python
    April 2019 corrected for Python 3.7 - AvW
    """
    # initialize output array
    imgout = img.copy()

    # initialize some internal variables
    deltaS = np.zeros_like(imgout)
    deltaE = deltaS.copy()
    E = deltaS.copy()
    S = deltaS.copy()
    NS = deltaS.copy()
    EW = deltaS.copy()
    gS = np.ones_like(imgout)
    gE = gS.copy()

    for ii in range(niter):
        # calculate the diffs
        deltaS[:-1, :] = np.diff(imgout.T).T
        deltaE[:, :-1] = np.diff(imgout)

        # conduction gradients (only need to compute one per dim!)
        if option == 1:
            gS[:, :] = np.exp(-np.square(np.divide(deltaS, kappa))) / step[0]
            gE[:, :] = np.exp(-np.square(np.divide(deltaE, kappa))) / step[1]
        elif option == 2:
            gS[:, :] = 1. / (1. + np.square(np.divide(deltaS, kappa))) / step[0]
            gE[:, :] = 1. / (1. + np.square(np.divide(deltaE, kappa))) / step[1]

        # update matrices
        E[:, :] = gE * deltaE
        S[:, :] = gS * deltaS

        # subtract a copy that has been shifted 'North/West' by one
        # pixel. don't ask questions. just do it. trust me.
        NS[:] = S
        EW[:] = E
        NS[1:, :] -= S[:-1, :]
        EW[:, 1:] -= E[:, :-1]

        # update the image
        imgout += gamma * (NS + EW)

    return imgout
I’m using it in a Jupyter notebook and timing it with the following cell:
%%timeit
_ = anisodiff(np.random.rand(4096, 128).astype(np.float32), niter=10, kappa=10, gamma=1, step=np.asarray((.5, .5), dtype=np.float32), option=1)
I took care to run the call once before timing, so that the cached compiled function is being measured rather than the compilation itself. I also tried to keep the code type-stable (I need all my computations done in float32). I was shocked to see that I did not get any improvement over the NumPy version; in fact, performance is significantly worse with Numba.
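For completeness, the warm-up and the comparison against the plain-NumPy version look roughly like this (just a sketch; I use the dispatcher's .py_func attribute here to stand in for the NumPy version, which is the same function body without the @jit decorator):

# warm-up: run once in its own cell before the %%timeit cell, so the cached
# compiled function is measured rather than the compilation itself
_ = anisodiff(np.random.rand(4096, 128).astype(np.float32), niter=10,
              kappa=10, gamma=1,
              step=np.asarray((.5, .5), dtype=np.float32), option=1)

# plain-NumPy baseline: the undecorated Python function, available on the
# dispatcher as .py_func, timed with the same %%timeit setup
_ = anisodiff.py_func(np.random.rand(4096, 128).astype(np.float32), niter=10,
                      kappa=10, gamma=1,
                      step=np.asarray((.5, .5), dtype=np.float32), option=1)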
This leads me to two questions:
- What is the best way to profile Numba code and identify bottlenecks?
- Could someone look at the code above and tell me whether something obvious is making the Numba version much slower than the NumPy version?
Any help appreciated!
-Guillaume