**Data has become extremely valuable to many companies around the world. Huge data centres are being built to cope with the vast amount of it that is transmitted every single second. With the rise of AI, and machine learning in particular, this information can be used to build models that predict future outcomes with phenomenal accuracy. These techniques could also be very powerful for predicting stock markets. However, stock prices can be very volatile and investor sentiment influences them as well, so how accurate are these machine learning models when we apply them to financial markets?**

**Basic principles**

Machine learning extracts patterns from historical data and uses them to make predictions on data it has not seen before. The model is first ‘trained’ on a large portion of the data, so it can learn which patterns occur. It is then run on a held-out ‘test’ dataset to measure how accurate its predictions are. A machine learning programme is therefore not explicitly programmed to make predictions: the model teaches itself from the training set.
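To make the train/test idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy data, the helper names `train_test_split`, `fit_threshold` and `accuracy`, and the one-parameter "model" are hypothetical and far simpler than anything used on real markets.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into a training part and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(threshold, rows):
    """Fraction of rows for which the rule 'x > threshold' matches the label."""
    return sum((x > threshold) == label for x, label in rows) / len(rows)

def fit_threshold(train):
    """'Train' a one-parameter model: keep the threshold that fits the training rows best."""
    candidates = sorted(x for x, _ in train)
    return max(candidates, key=lambda t: accuracy(t, train))

# Hypothetical toy data: one feature x, with label True whenever x > 0.5
rng = random.Random(42)
data = [(x, x > 0.5) for x in (rng.random() for _ in range(200))]

train, test = train_test_split(data)      # 160 training rows, 40 test rows
threshold = fit_threshold(train)          # learned from the training rows only
test_accuracy = accuracy(threshold, test) # measured on rows the model never saw
print(f"threshold={threshold:.3f}, test accuracy={test_accuracy:.2%}")
```

The model only ever sees the training rows, so the test accuracy is an honest estimate of how it behaves on new data. For real price series one would normally split chronologically instead of shuffling, so that the model is never tested on data older than what it was trained on.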

There are many mathematical models used in machine learning. They take different approaches and some are better suited to specific applications. The basic principle stays the same, however: we feed the model data to train on and then test it by predicting values in a test dataset. You may have heard of some of these models, such as the neural network, random forest or support vector machine. If you want to know more about neural networks, I recommend reading David Anthonio’s and Stan Koobs’ article about this model, which is loosely based on how the human brain works. The figure below shows a simplified neural network.

This may sound like a theory to be sceptical about, but the technique has proven to be very effective. Today, machine learning is used to predict user preferences on social media, support medical diagnostics, steer self-driving cars, reduce risk at pension funds and much more. In finance, the method is used to detect fraud and also in high-frequency trading, for example at companies such as FlowTraders. These applications do not all use the same underlying model, however. For example, neural networks are very useful for speech recognition, while a random forest is more often used to predict decisions or to support the decision-making process.

**SVM and news sentiments**

One of the most robust prediction methods is the support-vector machine (SVM) model. Given a training dataset, the SVM model classifies the data into two categories. Suppose each point in a given set of data points belongs to one of two classes; the SVM then tries to find a separating hyperplane that separates these classes with maximum margin. (Definition of a separating hyperplane: let $A$ and $B$ be nonempty convex sets; a nonzero vector $v$ and a real number $c$ define a separating hyperplane if $\langle x, v \rangle \geq c$ for all $x \in A$ and $\langle y, v \rangle \leq c$ for all $y \in B$.) The margin can be seen as the distance from the hyperplane to the nearest point of a class. We choose the hyperplane that maximizes this distance to the nearest data point on each side. An example is given in the figure below: of the three hyperplanes shown, one separates the two classes with maximum margin, while the other two do not. Notice, however, that one of those two does separate the two classes; because the first hyperplane leaves a larger distance to the nearest points, that is the one we choose.
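The margin idea can be sketched in a few lines of Python. This is not a real SVM solver, which finds the best hyperplane by solving an optimization problem; it only compares three hypothetical candidate lines on made-up 2-D data and keeps the one with the largest margin. The data points, the labels and the candidate names `A`, `B` and `C` are all assumptions for illustration.

```python
import math

def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w·x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm

def margin(points, labels, w, b):
    """Smallest label-adjusted distance; negative if a point lies on the wrong side."""
    return min(y * signed_distance(p, w, b) for p, y in zip(points, labels))

# Hypothetical toy data: two points per class, labelled -1 and +1
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Three hypothetical candidate hyperplanes, each given as (w, b)
candidates = {
    "A": ((1.0, -1.0), 0.0),    # the line x = y, which does not separate the classes
    "B": ((1.0, 0.0), -2.5),    # the line x = 2.5, which separates them with a small margin
    "C": ((2.0, 3.0), -13.5),   # the line 2x + 3y = 13.5, which separates them with a wide margin
}

margins = {name: margin(points, labels, w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, value in margins.items():
    print(f"{name}: margin = {value:.3f}")
print("chosen hyperplane:", best)
```

A negative margin means the candidate misclassifies at least one point, so it is never chosen. Among the candidates that do separate the classes, the one whose nearest point is furthest away wins.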

This SVM model is particularly good at recognizing handwritten characters and can be very helpful in text categorization. It is also widely used in computer vision, a field in AI that tries to teach machines to understand videos and images like the human eye can. Returning to the subject of this article, which is how these kinds of models can be used to predict stock markets, the SVM model can be used to predict investor sentiment. This can be done by analysing Twitter feeds of world leaders, financial newspapers or any text that talks about a specific company or stock. Investor sentiment can play a huge role in the price of a stock in a given week. This has especially been the case since the corona pandemic. Stocks go down when Trump announces that there will be no more support for the economy from the government, while news about how well the economy is recovering drives stocks upwards. A paper by Wang and Zhao (2020) researched how the SVM model can be used to predict investor sentiment based on news feeds and therefore predict how stocks will change. They found that the SVM model has an accuracy of 60% and concluded that this model is highly accurate for predicting the sentiments that have an effect on stock prices.

**Machine Learning vs Black-Scholes**

All these methods to predict stock prices are relatively new compared to the more conventional approach to pricing. However, they do seem to outperform some of the conventional approaches when it comes to overall accuracy. One of the most popular and frequently used formulas for pricing options or any security in the financial market is the Black-Scholes equation (BSE). The formula to price a call option from this equation is $C = S N(d_1) - K e^{-rT} N(d_2)$, with $d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}$ and $d_2 = d_1 - \sigma\sqrt{T}$, where the parameters are defined as follows: $S$ is the current price of an asset; $K$ is the strike price; $N$ is the CDF of a standard normal distribution; $r$ is the interest rate; $T$ is the time to maturity and $\sigma$ is the volatility of the asset. I have talked about this equation in one of my previous articles about differential equations, so if you want to know more about this equation you can read that article here. While the BSE is used to value options, it can also be used to find the prices of stocks. A paper by Chowdhury et al. (2020) found that the predictions of many machine learning methods are very close to the Black-Scholes prediction of stock prices. This may indicate that these methods are just as accurate as conventional formulas. However, the paper notes that some models can actually outperform the BSE, because Black-Scholes relies on risk and dividend parameters that change continuously.
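For readers who like to see the numbers, below is a minimal sketch of the standard Black-Scholes price of a European call option, using only the Python standard library. The function names and the input values are assumptions chosen for illustration, and the sketch ignores dividends; it is not the modified model used in the paper.

```python
import math

def norm_cdf(x):
    """CDF of the standard normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(S, K, r, sigma, T):
    """Price of a European call option without dividends.

    S: current asset price, K: strike price, r: risk-free interest rate,
    sigma: volatility of the asset, T: time to maturity in years.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

# Hypothetical inputs: an at-the-money option, one year to maturity
call_price = black_scholes_call(S=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
print(f"call price: {call_price:.4f}")
```

Note that the interest rate and the volatility are entered as fixed numbers, whereas in real markets they move all the time.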

Interestingly, the paper did not use a single model to outperform the BSE, but multiple models. In machine learning, this is called ensemble learning. It is a technique in which multiple algorithms or learners are strategically generated and combined in order to predict one particular variable of interest, which in our case is the stock price. It is the combination of these models that makes this method so powerful and accurate: the individual models compensate for each other's errors and biases, which has major implications for the accuracy of a prediction.

The figure above illustrates the sheer power of an ensemble model. In the figure, the dark blue line indicates the actual close price of a stock and the red line the ensemble prediction. The ensemble prediction follows the actual price very closely, with an accuracy very near to 100%. Although the neural network and decision tree come very close, the results from the paper show that combining multiple algorithms leads to better predictions. This also shows that machine learning methods can outperform Black-Scholes predictions.
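Why combining models helps can be illustrated with a small sketch. The prices and the three sets of predictions below are made up, and simple averaging is only one of several ways to build an ensemble; the paper's own method may well differ. The point is just that errors in opposite directions partly cancel out.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def ensemble_average(prediction_lists):
    """Combine several models by averaging their predictions day by day."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]

# Hypothetical closing prices and predictions from three hypothetical models
actual = [100.0, 102.0, 101.0, 105.0, 107.0]
predictions = {
    "model_a": [102.0, 104.0, 103.0, 107.0, 109.0],  # always 2.0 too high
    "model_b": [99.0, 101.0, 100.0, 104.0, 106.0],   # always 1.0 too low
    "model_c": [101.5, 100.5, 102.5, 103.5, 108.5],  # alternately 1.5 too high and too low
}

errors = {name: mean_absolute_error(actual, p) for name, p in predictions.items()}
errors["ensemble"] = mean_absolute_error(actual, ensemble_average(list(predictions.values())))

for name, error in errors.items():
    print(f"{name}: mean absolute error = {error:.3f}")
```

In this toy example the ensemble has a lower error than each of its members. That is not guaranteed in general: if all models make the same mistake, averaging them does not remove it.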

**What’s next?**

Machine learning has proven itself to be a very reliable and accurate method to predict a stock’s price and to give insight into it. Its value is therefore becoming more apparent in the world of finance. Many companies have a big interest in this technology as it brings many benefits, such as increased revenues, better risk management, a better customer experience and reduced operational costs, since many operations can be automated with machine learning.

However, there are also some concerns about this method. As machine learning uses datasets that can include biases, these machines will learn the biases and reproduce them when used. For example, a firm with a racially biased hiring policy may end up with a machine learning system that reproduces this bias. This can be addressed by responsible data collection and by algorithmic rules that ensure these biases are penalized rather than learned.

Another concern for the financial world was raised in a report by the World Economic Forum (2020), which warned that widespread use of AI technology can introduce new systemic and security risks into the system, because a tiny flaw in an algorithm could have terrible consequences. The report also warned that the big tech companies that invest heavily in this technology will gain a very strong position in the market, making them even more powerful.

In the meantime, machine learning algorithms are providing investment advice, combating fraud in finance, authenticating documents, trading on stock exchanges and gathering crucial information that might affect markets and investments. While machine learning algorithms are busy with all these tasks, they are learning and getting smarter, bringing the world closer to a completely automated financial system, which would amount to the ultimate achievement of machine learning in the financial market.

### References

Chowdhury, R., Mahdy, M. R. C., Alam, T. N., Al Quaderi, G. D., & Arifur Rahman, M. (2020). Predicting the stock price of frontier markets using machine learning and modified Black–Scholes Option pricing model. *Physica A: Statistical Mechanics and Its Applications, 555,* 124444. https://doi.org/10.1016/j.physa.2020.124444

Henrique, B. M., Sobreiro, V. A., & Kimura, H. (2019). Literature review: Machine learning techniques applied to financial market prediction. *Expert Systems with Applications, 124,* 226–251. https://doi.org/10.1016/j.eswa.2019.01.012

Wang, D., & Zhao, Y. (2020). Using News to Predict Investor Sentiment: Based on SVM Model. *Procedia Computer Science, 174,* 191–199. https://doi.org/10.1016/j.procs.2020.06.074

World Economic Forum (2020). How artificial intelligence is transforming the financial ecosystem. *The new physics of financial services.*

*This article is written by Sam Ansari*