Just-in-time Compiling with Numba
This week I worked on implementing a `numba` version of `PerfForesightConsumerType` in `ConsIndShockNumba.py`. The class `PerfForesightConsumerTypeNumba` is a child class of `PerfForesightConsumerType`, inheriting most of the existing structure but changing the solver to a `numba` implementation called `ConsPerfForesightSolverNumba`.
The class `ConsPerfForesightSolverNumba` itself inherits from the standard `ConsPerfForesightSolver` class, but modifies certain methods to allow for `numba` just-in-time compiling. In particular, the methods `makePFcFunc` and `addSSmNrm` are computationally intensive and would benefit most from jit-compiling.
First, I created a global jitted function called `makePFcFuncNumba` that takes only primitives and (`numpy`) arrays. While most of the code can stay as in `makePFcFunc`, `numba` implementations of `numpy` ufuncs can only take `numpy` arrays, not lists. This is easy to adapt: any `[x, y]` is replaced with `np.array([x, y])`, as in this instance. Any assignments to objects, or code that cannot be implemented in `numba`, is left to the class method to take care of, such as creating a `LinearInterp` instance for `cfunc` here.
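A minimal sketch of this pattern follows. The names and formulas are illustrative stand-ins, not the actual HARK solver code, and the `try`/`except` fallback simply lets the snippet run even without `numba` installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # No-op decorator so the sketch also runs without numba installed
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit
def makePFcFuncSketch(DiscFac, Rfree, CRRA, hNrm):
    # Stylized stand-in for the solver math: the function takes only
    # primitives and arrays, and nopython mode accepts np.array([x, y])
    # where a plain Python list [x, y] would be rejected by a ufunc.
    PatFac = (Rfree * DiscFac) ** (1.0 / CRRA) / Rfree
    mNrm = np.array([0.0, 1.0])  # np.array([x, y]) instead of [x, y]
    cNrm = (1.0 - PatFac) * (mNrm + hNrm)
    return mNrm, cNrm
```

The class method would then call this jitted helper and wrap its array outputs in a `LinearInterp` afterwards, since that step cannot be jitted.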
Implementing `addSSmNrmNumba` took a bit more work. This method calls `cfunc`, which is an instance of `LinearInterp`, to find roots using `scipy.optimize.newton`. Again, `numba` does not (yet) support user-defined classes (although there is `@jitclass`, which could be interesting to look into) or non-jitted functions such as those from `scipy`, so we need alternatives for both. As a replacement for `LinearInterp`, I used interpolation.py from EconForge. This replacement is straightforward: instead of `cfunc(m)`, I take the arrays that created `cfunc` and use `interp(mNrmNow, cNrmNow, m)`, which returns the interpolated value of the consumption function at `m`.
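A hedged sketch of the substitution, with made-up grid values; if interpolation.py is not available, `np.interp` computes the same in-grid linear interpolation (with the evaluation point as its first argument rather than its last):

```python
import numpy as np

try:
    from interpolation import interp  # EconForge's jitted linear interpolation
except ImportError:
    def interp(grid, values, point):
        # np.interp as a stand-in: same in-grid linear interpolation,
        # but the point comes first in numpy's argument order
        return np.interp(point, grid, values)

# Arrays that would have defined the LinearInterp instance (illustrative values)
mNrmNow = np.array([0.0, 1.0, 2.0])
cNrmNow = np.array([0.0, 0.5, 0.9])

# Instead of cfunc(m), evaluate directly from the defining arrays:
c_at_m = interp(mNrmNow, cNrmNow, 1.5)  # linear interpolation between 0.5 and 0.9
```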
In the standard (non-`numba`) implementation of `ConsPerfForesightSolver`, `scipy`'s `newton` takes two parameters, `func` and `x0`. According to the documentation, this means that without `fprime`, `scipy` will default to the secant method. Fortunately, I know just the library that has jitted optimization functions: QuantEcon. I thus use `quantecon.optimize.newton_secant` in place of `scipy`'s `newton` for root-finding. Another shortcoming of `numba` is that lambda functions cannot be passed as arguments to jitted functions such as QuantEcon's `newton_secant`. Thus I create an additional jitted function, `searchSSfuncNumba`, to be used by `newton_secant`.
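The constraint can be illustrated with a self-contained sketch: the residual must be a module-level jitted function, not a lambda. Everything below is illustrative; a stylized residual and a minimal secant loop stand in for the actual `searchSSfuncNumba` and `quantecon.optimize.newton_secant`:

```python
try:
    from numba import njit
except ImportError:
    # No-op decorator so the sketch also runs without numba installed
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit
def searchSSfuncSketch(m, PatFac, hNrm):
    # Stylized steady-state residual (not HARK's actual expression):
    # zero where consumption keeps normalized resources constant
    return (1.0 - PatFac) * (m + hNrm) - (m - 1.0)

@njit
def secant_root(f, x0, x1, PatFac, hNrm):
    # Minimal secant iteration; quantecon.optimize.newton_secant plays
    # this role in the actual code, and it likewise requires a jitted,
    # non-lambda callable as its first argument.
    f0 = f(x0, PatFac, hNrm)
    f1 = f(x1, PatFac, hNrm)
    for _ in range(50):
        if abs(f1) < 1e-12:
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1 = x2
        f1 = f(x1, PatFac, hNrm)
    return x1

root = secant_root(searchSSfuncSketch, 0.0, 2.0, 0.96, 10.0)
```

Because both the residual and the root-finder are top-level jitted functions, `numba` can compile the whole call chain; a lambda closure at the call site would break this.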
These small changes create significant gains in speed, even though I haven't parallelized or used GPU computing, only just-in-time compiling. For some basic benchmarks on my system (Intel Core i7-8565U CPU @ 1.80GHz, 1992 MHz, 4 cores, 8 logical processors, 16.0 GB RAM), see `PerfForesightConsumerType.ipynb` and `PerfForesightConsumerTypeNumba.ipynb` for a comparison. Using `%timeit`, the standard solution reports `234 ms ± 82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)`, while the numba-fied solution comes in at `63.1 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)`, meaning the `numba` version is almost 4 times faster. It is important to note, however, that `%timeit` runs several loops of the same command and reports averages, while a `numba` implementation's first call always takes significantly longer, since that is when just-in-time compilation happens. So if we were to compare first run to first run, the `numba` implementation would be significantly slower. Nevertheless, `PerfForesightConsumerType` is a simple model, and as a proof of concept this exercise was instructive.
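The first-call compile cost is easy to observe directly on any machine. The toy function below is arbitrary and the timings will vary; the point is only the gap between the first and second call when `numba` is installed:

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # No-op decorator so the sketch also runs without numba installed
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit
def total(x):
    # Trivial jitted loop, just enough to trigger compilation
    s = 0.0
    for v in x:
        s += v
    return s

x = np.random.rand(100_000)

t0 = time.perf_counter()
total(x)                              # first call: pays the JIT compile cost
first_call = time.perf_counter() - t0

t0 = time.perf_counter()
total(x)                              # later calls: run the compiled code
later_call = time.perf_counter() - t0
```

With `numba` installed, the first call is typically far slower than the second, which is why first-run comparisons flatter the pure-Python version while `%timeit` averages flatter the jitted one.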
I will next work on implementing a `numba` version of `IndShockConsumerType`. Perhaps with a more complex model the gains from using `numba` will be more apparent, even on the first run.