Data analysis on market breadth data

This post is an exercise in learning how to make predictions by feeding different data to a machine learning model.

Background

Market breadth data and indicators are very popular in the investment world. I find them useful and I know them well, so I will use them as the basis for experimenting with machine learning models.

Data Analysis

I cleaned up the data in Excel and then exported it to a .csv file. From there I did a basic visual analysis in a Jupyter notebook. The results of the analysis can be found in this GitHub repository.

The correlation matrix of the raw data is:

I have also created a relative matrix from the data, but its basic correlations are worse than those of the original data.
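As a rough sketch of how the two correlation matrices could be computed with pandas: the column names and the synthetic data below are made up for illustration, and reading "relative" as period-over-period percentage change is an assumption, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd


def correlation_matrices(df):
    """Return the correlation matrix of the raw values and of their relative changes."""
    raw_corr = df.corr()
    # Assumption: "relative" data means percentage change between consecutive rows.
    rel_corr = df.pct_change().dropna().corr()
    return raw_corr, rel_corr


# Synthetic stand-in for the market breadth CSV (hypothetical column names).
rng = np.random.default_rng(0)
n = 250
base = rng.normal(0, 1, n).cumsum() + 100
demo = pd.DataFrame({
    "index_close": base,
    "advancers": base * 10 + rng.normal(0, 5, n),
    "new_highs": rng.normal(50, 10, n),
})

raw_corr, rel_corr = correlation_matrices(demo)
print(raw_corr.round(2))
print(rel_corr.round(2))
```

Trending series often correlate strongly in levels simply because they drift together, so a drop in correlation after switching to relative changes is not unusual.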

Applying models on the data

I used the basic regressor from XGBoost (XGBRegressor) and created 10 basic models, varying the parameters slightly (n_estimators and learning_rate) in search of a lower mean absolute error (MAE).

from xgboost import XGBRegressor

model_1 = XGBRegressor(n_estimators=100, learning_rate=0.05)
model_2 = XGBRegressor(n_estimators=500, learning_rate=0.05)
model_3 = XGBRegressor(n_estimators=1000, learning_rate=0.05)
model_4 = XGBRegressor(n_estimators=2000, learning_rate=0.05)
model_5 = XGBRegressor(n_estimators=3000, learning_rate=0.05)
model_6 = XGBRegressor(n_estimators=100, learning_rate=0.1)
model_7 = XGBRegressor(n_estimators=500, learning_rate=0.1)
model_8 = XGBRegressor(n_estimators=1000, learning_rate=0.1)
model_9 = XGBRegressor(n_estimators=2000, learning_rate=0.1)
model_10 = XGBRegressor(n_estimators=3000, learning_rate=0.1)

models = [model_1, model_2, model_3, model_4, model_5, model_6, model_7, model_8, model_9, model_10]

Model 3 had the lowest MAE, so I picked it. Then I ran the prediction and plotted it. This was the result of the prediction on the test data (from 1 January 2020 to 25 April 2021):

Result of the model 3 prediction using an XGBoost regressor

The result is quite curious: the model manages to predict the part of 2020 that was horrible for investors (February to May), but for the rest of the test period it is unable to hit anything.
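One way to put numbers on that observation is to hold out the test window by date and then compute the MAE separately for each sub-period. The sketch below uses pandas only; the helper names, the `target` column, and the fake predictions are all hypothetical stand-ins, so the printed values say nothing about the real model.

```python
import numpy as np
import pandas as pd


def split_by_date(df, test_start, test_end):
    """Chronological split: rows before test_start train, rows in the window test."""
    train = df.loc[df.index < test_start]
    test = df.loc[(df.index >= test_start) & (df.index <= test_end)]
    return train, test


def mae_by_period(actual, predicted, periods):
    """Mean absolute error for each named (start, end) date range, inclusive."""
    abs_err = (actual - predicted).abs()
    return {
        name: float(abs_err.loc[start:end].mean())
        for name, (start, end) in periods.items()
    }


# Synthetic daily series standing in for the real target (hypothetical data).
rng = np.random.default_rng(1)
dates = pd.bdate_range("2018-01-01", "2021-04-25")
df = pd.DataFrame({"target": rng.normal(0, 1, len(dates)).cumsum()}, index=dates)

test_start = pd.Timestamp("2020-01-01")
test_end = pd.Timestamp("2021-04-25")
train, test = split_by_date(df, test_start, test_end)

# Fake predictions: the actual values plus noise, in place of model output.
predicted = test["target"] + rng.normal(0, 0.5, len(test))

periods = {
    "Jan 2020": ("2020-01-01", "2020-01-31"),
    "Feb-May 2020": ("2020-02-01", "2020-05-31"),
    "Jun 2020-Apr 2021": ("2020-06-01", "2021-04-25"),
}
errors = mae_by_period(test["target"], predicted, periods)
for name, mae in errors.items():
    print(f"{name}: MAE = {mae:.3f}")
```

A clearly lower MAE in the February to May window than in the other two would back up what the chart suggests.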

You can see the complete code and charts here:

https://github.com/joapen/ML-Learning-bucket/blob/main/marketbreadthdata_v2.ipynb

I will continue working on this; it is proving to be fun.
