I am doing some benchmarking of accelerators for Python. In the example below, TensorFlow runs in about 56% of the Numba runtime:
```python
import numpy as np
import tensorflow as tf
from numba import njit, prange

@tf.function
def compute_tf(m, n):
    x1 = tf.range(0, m-1, 1) ** 2
    x2 = tf.range(0, n-1, 1) ** 2
    return x1[:, None] + x2[None, :]

compute_tf(tf.constant(1), tf.constant(1))  # warm-up call to trace the graph

m = 50000
n = 10000
%timeit compute_tf(m, n)
```
```
557 ms ± 30.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```python
@njit(parallel=True)
def compute_numba(m, n):
    x = np.empty((m, n))
    for i in prange(m):
        for j in prange(n):
            x[i, j] = i**2 + j**2
    return x

compute_numba(1, 1)  # warm-up call to trigger JIT compilation
%timeit compute_numba(m, n)
```
```
995 ms ± 38.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
This is a very simple computation, so I don't see why the TF version should run any faster. Do you have any idea how I can make the Numba version run on par with TF?
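For reference, here is the same computation as a plain-NumPy broadcast, which I use as a sanity-check baseline (a sketch; `compute_np` is my own name, and unlike the TF version above it covers the full `0..m-1` and `0..n-1` ranges, matching the Numba loops):

```python
import numpy as np

def compute_np(m, n):
    # Square each index vector once, then broadcast to an (m, n) grid:
    # x[i, j] = i**2 + j**2
    x1 = np.arange(m, dtype=np.float64) ** 2
    x2 = np.arange(n, dtype=np.float64) ** 2
    return x1[:, None] + x2[None, :]
```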