This post is an exercise in learning how to make predictions by feeding different data to a machine learning model.
Market breadth data and indicators are very popular in the investment world. I find them useful, and since I know them well, I will use them as the basis for experimenting with machine learning models.
I cleaned up the data in Excel and then exported it to a .CSV file. From there, I did a basic visual analysis in a Jupyter notebook. The results of the analysis can be found in this GitHub repository.
The correlation matrix of the raw data is:
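For readers who want to reproduce this kind of matrix, here is a minimal sketch with pandas. It does not use the post's data: the values are synthetic and the column names (`advancers`, `new_highs`, `index_close`) are hypothetical stand-ins for whatever breadth series the CSV contains.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the real series would be loaded from the CSV.
rng = np.random.default_rng(42)
n = 250
advancers = rng.normal(size=n)

df = pd.DataFrame({
    "advancers": advancers,
    # Built to be correlated with "advancers" so the matrix shows something.
    "new_highs": 0.8 * advancers + rng.normal(scale=0.5, size=n),
    "index_close": rng.normal(size=n),
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))
```

The diagonal is always 1, since each column is perfectly correlated with itself. The off-diagonal values are the ones worth reading.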
I have also created a relative matrix from the data, but its basic correlation is worse than that of the original data.
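The post does not say how the relative version was built. One common reading, and it is only an assumption here, is to replace each level with its period-over-period percentage change and then correlate those changes. A sketch on synthetic data, with hypothetical column names:

```python
import numpy as np
import pandas as pd

# Synthetic positive "level" series (assumption: the raw data are levels).
rng = np.random.default_rng(7)
n = 250
levels = pd.DataFrame({
    "breadth_a": np.exp(np.cumsum(rng.normal(scale=0.01, size=n))),
    "breadth_b": np.exp(np.cumsum(rng.normal(scale=0.01, size=n))),
    "index_close": np.exp(np.cumsum(rng.normal(scale=0.01, size=n))),
})

# Assumed meaning of "relative": percentage change from the previous row.
# The first row has no previous value, so it is dropped.
relative = levels.pct_change().dropna()

raw_corr = levels.corr()
relative_corr = relative.corr()
print(raw_corr.round(2))
print(relative_corr.round(2))
```

Correlations between levels and correlations between changes can differ a lot, because trending series tend to look correlated in levels even when their period-to-period moves are unrelated.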
Applying models to the data
I used the basic regressor from XGBoost and created 10 basic models, varying the parameters slightly (n_estimators and learning_rate), looking for a lower mean absolute error (MAE).
from xgboost import XGBRegressor

model_1 = XGBRegressor(n_estimators=100, learning_rate=0.05)
model_2 = XGBRegressor(n_estimators=500, learning_rate=0.05)
model_3 = XGBRegressor(n_estimators=1000, learning_rate=0.05)
model_4 = XGBRegressor(n_estimators=2000, learning_rate=0.05)
model_5 = XGBRegressor(n_estimators=3000, learning_rate=0.05)
model_6 = XGBRegressor(n_estimators=100, learning_rate=0.1)
model_7 = XGBRegressor(n_estimators=500, learning_rate=0.1)
model_8 = XGBRegressor(n_estimators=1000, learning_rate=0.1)
model_9 = XGBRegressor(n_estimators=2000, learning_rate=0.1)
model_10 = XGBRegressor(n_estimators=3000, learning_rate=0.1)

models = [model_1, model_2, model_3, model_4, model_5,
          model_6, model_7, model_8, model_9, model_10]
Model 3 had the lowest MAE, so I picked it. I then ran the prediction and plotted it. This was the result of the prediction on the test data (from 1/1/2020 to 25/4/2021):
The result is quite funny: the model is able to predict the part of 2020 that was horrible for investors (February – May), but for the rest of the period it fails to hit anything.
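Holding out a date range as test data can be sketched with a date-indexed DataFrame. Only the test window (1 January 2020 to 25 April 2021) comes from the post. The start date of the history, the column names, and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily (business-day) data; the 2018 start date is an assumption.
dates = pd.date_range("2018-01-01", "2021-04-25", freq="B")
rng = np.random.default_rng(1)
df = pd.DataFrame(
    {
        "breadth": rng.normal(size=len(dates)),  # hypothetical feature
        "target": rng.normal(size=len(dates)),   # hypothetical target
    },
    index=dates,
)

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
train = df.loc[:"2019-12-31"]
test = df.loc["2020-01-01":"2021-04-25"]

X_train, y_train = train[["breadth"]], train["target"]
X_test, y_test = test[["breadth"]], test["target"]
print(len(train), "training rows,", len(test), "test rows")
```

Splitting by date rather than at random keeps future observations out of the training set, which matters for time series such as market data.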
You can see the complete code and charts here:
I will continue working on this; it is proving to be fun.