If you aspire to find a job in data science and artificial intelligence, you're probably wondering how to allocate your time. Should you focus on learning maths, or Python? Or both?

The answer is: both. You should not neglect mathematics.

Mathematics is an unusual subject. It is not a -logy, unlike, for example, theology or biology. The word itself comes from the Greek “mathematikos”, meaning, quite simply, “fond of learning”. In a sense, mathematics is our prowess in learning.

Unfortunately, at school we are led to believe that mathematics is about numbers. Indeed, there are three kinds of mathematicians: those who can count and those who cannot. Only later, if we choose to pursue the subject to undergraduate level and beyond, do we learn that numbers are incidental, whereas mathematics is about ideas, logic and intuition – a sense of truth.

Jacques Hadamard believed mathematics was primarily about intuition, since “logic only sanctions the conquests of intuition”. Intuition begins with observation – just as philosophy begins with wonder – a deep and thoughtful observation, and a desire to discover the truth – the ultimate goal of a data scientist.

The temptation of complexity thwarts the efforts of a mathematician. Once we have learned the Nobel prize-winning Black-Scholes-Merton option pricing theory, the demon of complexity starts to whisper in our ear: “Why stop at vanilla options? Consider the most exotic payoff that you can possibly price!” Here discernment and introspection are required: are we increasing complexity because it is genuinely needed or because we want to show how clever we are? As Isaac Newton remarked in *Rules for methodizing the Apocalypse*, “Truth is ever to be found in simplicity, and not in the multiplicity and confusion of things”. How can we distinguish true complexity from entropy, signal from noise?

Indeed, some of the simpler branches of mathematics are the most useful for the data scientist. If you want to work in data science and machine learning, you will not necessarily need to understand stochastic calculus, but you will need to understand the mathematical concepts below.

You need to be familiar with linear algebra if you want to work in data science and machine learning because it helps deal with matrices – mathematical objects consisting of multiple numbers organised in a grid. The data collected by a data scientist naturally comes in the form of a matrix – the data matrix – of *n* observations by *p* features, thus an *n*-by-*p* grid.
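As a minimal sketch of this idea (using NumPy; the observations and features here are invented for illustration), the data matrix is simply an *n*-by-*p* array, and linear algebra lets us operate on all observations at once:

```python
import numpy as np

# A hypothetical data matrix: n = 4 observations, p = 3 features
# (say, height, weight and age -- the features are made up).
X = np.array([
    [170.0, 65.0, 34.0],
    [182.0, 81.0, 45.0],
    [165.0, 58.0, 29.0],
    [178.0, 74.0, 52.0],
])

n, p = X.shape  # the n-by-p grid: (4, 3)

# Linear algebra acts on the whole grid at once, e.g. centring
# each feature by subtracting its column mean:
X_centred = X - X.mean(axis=0)
```

Operations such as centring, projecting, and decomposing the data matrix are the everyday vocabulary of a data scientist, and they are all linear algebra.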

Probability theory – even the basic, not yet measure-theoretic probability theory – helps the data scientist deal with uncertainty and express it in models. Frequentists, Bayesians, and indeed quantum physicists argue to this day what probability really is (in many languages, such as Russian and Ukrainian, the word for probability comes from “having faith”), whereas pragmatists, such as Andrey Kolmogorov, shirk the question, postulate some axioms that describe how probability behaves (rather than what it is) and say: stop asking questions, just use the axioms.
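Kolmogorov's axioms are easy to state and to check on a finite sample space. A toy sketch (the die and the chosen events are, of course, invented):

```python
# A discrete distribution over the outcomes of a fair six-sided die.
pmf = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

# Axiom 1: probabilities are non-negative.
assert all(p >= 0 for p in pmf.values())

# Axiom 2: the probability of the whole sample space is 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

def prob(event):
    """Probability of an event, represented as a set of outcomes."""
    return sum(pmf[x] for x in event)

# Axiom 3 (additivity): for disjoint events A and B,
# P(A or B) = P(A) + P(B).
A, B = {1, 2}, {5, 6}  # disjoint events
assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12
```

Kolmogorov's point stands: whatever probability “really is”, anything obeying these rules behaves like one.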

After probability theory, there comes statistics. As Ian Hacking remarked, “The quiet statisticians have changed our world – not by discovering new facts or technical developments, but by changing the ways that we reason, experiment, and form opinions”. Read Darrell Huff’s *How to Lie with Statistics* – if only to learn how to be truthful and how to recognise truth – just as Moses learned “all the wisdom of the Egyptians” – in order to reject it.

A particular branch of statistics – estimation theory – has long been neglected in mathematical finance, at a high cost. It tells us how well we know a particular number: what is the error present in our estimates? How much of it is due to bias and how much due to variance?
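A classic illustration of bias: the sample variance computed with a 1/*n* divisor systematically undershoots the true variance, which is why the 1/(*n*−1) (Bessel-corrected) version is usually preferred. A small simulation sketch (the distribution, sample size, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0          # variance of N(0, 2^2)
n, trials = 5, 100_000  # small samples make the bias visible

biased, unbiased = [], []
for _ in range(trials):
    x = rng.normal(0.0, 2.0, size=n)
    biased.append(x.var(ddof=0))    # divides by n      -> biased
    unbiased.append(x.var(ddof=1))  # divides by n - 1  -> unbiased

# On average, the biased estimator undershoots the true variance
# by a factor of (n - 1) / n.
mean_biased = np.mean(biased)
mean_unbiased = np.mean(unbiased)
```

Estimation theory quantifies exactly this kind of systematic error, and how it trades off against variance.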

Going beyond classical statistics, in machine learning we want to minimise the error on new data – out-of-sample – rather than on the data that we have already seen – in-sample. As someone remarked, probably Niels Bohr or Piet Hein, “prediction is very difficult, especially about the future.”
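The in-sample/out-of-sample distinction is easy to demonstrate. In this sketch (the data-generating line, noise level, and polynomial degrees are all invented), a flexible model beats a simple one on the data it was fitted to, while the held-out data tells a different story:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples from a simple underlying line.
x = rng.uniform(-1, 1, size=30)
y = 2.0 * x + rng.normal(0.0, 0.3, size=30)

x_train, y_train = x[:20], y[:20]  # data we fit on (in-sample)
x_test, y_test = x[20:], y[20:]    # new data (out-of-sample)

def mse(deg):
    """In- and out-of-sample mean squared error of a degree-`deg` polynomial fit."""
    coeffs = np.polyfit(x_train, y_train, deg)
    in_sample = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    out_sample = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return in_sample, out_sample

# A degree-12 polynomial always fits the training data at least as
# well as a straight line -- that says nothing about new data.
in1, out1 = mse(1)
in12, out12 = mse(12)
```

Minimising in-sample error is easy; the machine-learning problem is that we are judged on `out12`, not `in12`.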

You can spend a lifetime studying this. Much of machine learning is about optimisation: we want to find the weights that give the best (in optimisation speak, optimal) performance of a neural network on new data, so naturally we have to optimise – perhaps with some form of regularisation. (And before you calibrate that long short-term memory (LSTM) network – have you tried basic linear regression on your data?)
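That baseline need not be heavy machinery. A minimal sketch of ridge-regularised linear regression in closed form (the data, the true weights, and the penalty strength are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented data: y depends linearly on two features, plus noise.
X = rng.normal(size=(100, 2))
true_w = np.array([1.5, -0.5])
y = X @ true_w + rng.normal(0.0, 0.1, size=100)

def ridge(X, y, lam):
    """Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

w_ols = ridge(X, y, 0.0)   # lam = 0: ordinary least squares
w_reg = ridge(X, y, 10.0)  # the penalty shrinks the weights toward 0
```

Regularisation trades a little bias for less variance – the same trade-off estimation theory warned us about – and a few lines like these make a surprisingly strong baseline.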

There is always more. The average data scientist may not speak its language, but some of the recent advances in neural networks have been powered by Claude Shannon’s information theory – and by thermodynamics. After all, entropy is our enemy, and we should keep our friends close and our enemies closer.
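Shannon entropy itself fits in a few lines (the coin distributions below are toy examples):

```python
import math

def entropy(pmf):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# A fair coin is maximally uncertain: one full bit per toss.
h_fair = entropy([0.5, 0.5])    # 1.0

# A heavily biased coin carries far less information per toss.
h_biased = entropy([0.9, 0.1])  # about 0.469
```

The same quantity turns up throughout machine learning, most visibly as the cross-entropy loss used to train classifiers.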

*Paul Bilokon is a founder of The Thalesians. The Thalesians are an Artificial Intelligence (AI) company specialising in neocybernetics, digital economy, quantitative finance, education, and consulting. They are experts in (and run courses in) the application of Machine Learning (ML) techniques to time series data, particularly Big Data and high-frequency data. Their areas of expertise also include the mathematics of ML, Deep Learning (DL), Python, and kdb+/q. A former quant and algorithmic trader at Deutsche Bank, Citi and Nomura, Paul also lectures part time at Imperial College London.*

*Have a confidential story, tip, or comment you’d like to share? Contact: sbutcher@efinancialcareers.com in the first instance. Whatsapp/Signal/Telegram also available.*

*Bear with us if you leave a comment at the bottom of this article: all our comments are moderated by human beings. Sometimes these humans might be asleep, or away from their desks, so it may take a while for your comment to appear. Eventually it will – unless it’s offensive or libellous (in which case it won’t).*