There are many reasons why Python is gaining traction for solving problems in finance. Firstly, it’s relatively easy to learn. It’s also generally quicker to write programs in Python. There are also many Python libraries suited to analysing financial markets, such as pandas for dealing with time series. However, one major drawback of Python is that it’ll typically run much slower than code written in languages like C. Python also has the GIL (global interpreter lock), which means that only one thread can execute Python code at any one time. The question I always get asked is: how can we make Python quicker, and what skills do I need to learn to do this?
One solution is Cython. In practice, Cython code is very similar to Python, albeit with a few bells and whistles. Lots of well-known Python libraries use Cython. The big plus is that Cython code can be converted into fast C code and you can “release the GIL”, which lets sections of your code run in parallel across threads. However, you’ll need to compile your Cython code, and to get the most speed you might need to spend time tuning it and annotating it with C type declarations. If your project ends up using Cython extensively, these changes can take up a lot of time.
If you don’t fancy messing around with Cython, Numba is another solution. It’s a just-in-time compiler built on LLVM (a name that originally stood for low level virtual machine). It involves annotating your functions with special decorators. You still run your code through the usual Python interpreter, and Numba compiles the decorated functions to machine code when they’re first called. However, as with Cython, you’ll need to spend a bit of time rewriting your code to get the maximum speed benefit, particularly if you’re trying to run the code on a powerful GPU.
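To give a flavour of what the Numba annotation looks like, here is a minimal sketch of a loop-heavy calculation (an exponentially weighted moving average) decorated with Numba’s `njit`. The function name `ewma`, the sample prices and the smoothing factor are my own illustrative choices, and the fallback decorator is only there so that the sketch still runs, uncompiled, on a machine without Numba installed.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function on first call
except ImportError:
    # Assumption for illustration: fall back to plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def ewma(prices, alpha):
    """Exponentially weighted moving average, written as an explicit loop."""
    out = np.empty_like(prices)
    out[0] = prices[0]
    for i in range(1, prices.shape[0]):
        # Each value depends on the previous one, so this is awkward to vectorise.
        out[i] = alpha * prices[i] + (1.0 - alpha) * out[i - 1]
    return out


prices = np.array([100.0, 101.0, 99.5, 102.0, 103.5])  # made-up sample data
smoothed = ewma(prices, 0.5)
```

The first call includes the compilation time, so any speed-up only shows on later calls or on large arrays.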
If most of what you do is time series analysis, and you extensively use the pandas library, it’s worth considering Modin. With Modin you simply change one line of code (yes, just one line, typically the import, which becomes `import modin.pandas as pd`) to speed up pandas, and that’s it. However, at this stage the library is relatively new, so it doesn’t support all of pandas’ functionality. Dask is another similar library and also allows you to run your pandas-like computations in parallel.
If your code does a lot of downloading of data from external sources such as the web or databases, some nice choices to speed this up are Python’s threading and asyncio libraries, which let your program get on with other requests while it waits for each response. For more heavy-duty parallel computation you can also try the multiprocessing library. One of my personal favourites is the Celery library, which lets you run pretty much whatever Python computation you want in a distributed way. It does have a bit of a steep learning curve to begin with, but understanding Celery properly is definitely time well spent in my opinion. I’ve used it extensively over the past few years to speed up transaction cost analysis calculations which use a lot of tick data.
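As a rough sketch of the threading approach, the example below uses `concurrent.futures.ThreadPoolExecutor` from the standard library, which is built on top of threading. The `fetch_quote` function is a hypothetical stand-in that just sleeps to mimic a slow request. It returns dummy values rather than real market data, and the tickers are only there for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_quote(ticker):
    """Hypothetical stand-in for a slow web or database request."""
    time.sleep(0.2)  # simulated latency; other threads can run while this one waits
    return ticker, len(ticker)  # dummy payload, not real market data


tickers = ["EURUSD", "USDJPY", "GBPUSD", "AUDUSD"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() submits every request up front and yields results in input order
    quotes = dict(pool.map(fetch_quote, tickers))
elapsed = time.perf_counter() - start

print(f"Fetched {len(quotes)} quotes in {elapsed:.2f}s")
```

Run one after another, the four simulated requests would take around 0.8 seconds. With four threads the waits overlap, so the total should be much closer to a single request.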
The last way to speed up your Python code is probably the first thing you should try! It involves trying to vectorise your code to use libraries like NumPy and pandas, rather than overusing for loops. Very often this might actually make your code quick enough anyway.
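To make the idea concrete, here is a small sketch that computes simple returns from a price series twice: once with a for loop and once as a single NumPy expression. The prices are made-up numbers purely for illustration.

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0])  # made-up sample data

# Loop version: one Python-level iteration per observation.
loop_returns = []
for i in range(1, len(prices)):
    loop_returns.append(prices[i] / prices[i - 1] - 1.0)

# Vectorised version: one array expression, evaluated in compiled code.
vec_returns = prices[1:] / prices[:-1] - 1.0
```

Both versions give the same numbers. On four prices you won’t notice any difference, but on millions of ticks the vectorised version is usually far quicker.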
So, yes, Python isn’t the fastest language out there. However, there are solutions to help speed up your code. The bad news is there’s no free lunch, and most of the skills I’ve mentioned will require some time to learn.
Saeed Amen is a systematic FX trader, running a proprietary trading book trading liquid G10 FX, since 2013. He developed systematic trading strategies at major investment banks including Lehman Brothers and Nomura, and runs Cuemacro, a consulting and research firm focused on systematic trading.