# Data analysis on market breadth data

This post is an exercise in learning how to make predictions by feeding different data into a machine learning model.

## Background

Market breadth data and indicators are very popular in the investment world. I find them useful, and since I know them well, I will use them as the basis for experimenting with machine learning models.

## Data Analysis

I cleaned the data in Excel and then exported it to a .csv file. From there I did a basic visual analysis in a Jupyter notebook. The results of the analysis can be found in this GitHub repository.

### The correlation matrix of the raw data is:

I have also created a matrix of relative values from the data, but its basic correlations are weaker than those of the original raw data.
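As a rough illustration of the comparison above, here is a minimal sketch of how both correlation matrices could be computed with pandas. The column names and the synthetic data are hypothetical stand-ins for the cleaned CSV, and reading "relative" as the day-over-day percentage change is my assumption, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned CSV; the column names are hypothetical.
rng = np.random.default_rng(0)
n_days = 250
raw = pd.DataFrame(
    {
        "index_close": 100 + np.cumsum(rng.normal(0, 1, n_days)),
        "advancers": 1500 + 50 * rng.normal(size=n_days),
        "new_highs": 100 + 10 * rng.normal(size=n_days),
    },
    index=pd.date_range("2019-01-01", periods=n_days, freq="B"),
)

# Pearson correlation matrix of the raw values.
raw_corr = raw.corr()

# Assumption: "relative" means the day-over-day percentage change.
relative = raw.pct_change().dropna()
relative_corr = relative.corr()

print(raw_corr.round(2))
print(relative_corr.round(2))
```

With the real data, `raw` would come from `pd.read_csv(...)` on the exported file instead of the synthetic frame.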

## Applying models on the data

I used the basic regressor from XGBoost (`XGBRegressor`) and created 10 simple models, changing the parameters slightly in each one, looking for the lowest mean absolute error (MAE).

```python
from xgboost import XGBRegressor

model_1 = XGBRegressor(n_estimators=100, learning_rate=0.05)
model_2 = XGBRegressor(n_estimators=500, learning_rate=0.05)
model_3 = XGBRegressor(n_estimators=1000, learning_rate=0.05)
model_4 = XGBRegressor(n_estimators=2000, learning_rate=0.05)
model_5 = XGBRegressor(n_estimators=3000, learning_rate=0.05)
model_6 = XGBRegressor(n_estimators=100, learning_rate=0.1)
model_7 = XGBRegressor(n_estimators=500, learning_rate=0.1)
model_8 = XGBRegressor(n_estimators=1000, learning_rate=0.1)
model_9 = XGBRegressor(n_estimators=2000, learning_rate=0.1)
model_10 = XGBRegressor(n_estimators=3000, learning_rate=0.1)

models = [model_1, model_2, model_3, model_4, model_5, model_6, model_7, model_8, model_9, model_10]
```
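One possible way to compare a list of models like this by mean absolute error is sketched below. It is not the exact code from the notebook: `rank_by_mae` and `MeanBaseline` are hypothetical names, and `MeanBaseline` is only a tiny stand-in so the example runs without XGBoost installed. Any object with `fit` and `predict` methods, such as an `XGBRegressor`, should work in its place.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def rank_by_mae(models, X_train, y_train, X_valid, y_valid):
    """Fit each model and return (position, mae) pairs, lowest MAE first.

    Positions are 1-based so they line up with names like model_1, model_2.
    """
    scores = []
    for position, model in enumerate(models, start=1):
        model.fit(X_train, y_train)
        predictions = model.predict(X_valid)
        scores.append((position, mean_absolute_error(y_valid, predictions)))
    return sorted(scores, key=lambda pair: pair[1])


class MeanBaseline:
    """Hypothetical stand-in model: predicts the training mean plus a shift."""

    def __init__(self, shift=0.0):
        self.shift = shift

    def fit(self, X, y):
        self.value_ = float(np.mean(y)) + self.shift
        return self

    def predict(self, X):
        return np.full(len(X), self.value_)


# Toy data, only to show the call; real features would be the breadth columns.
X_train, y_train = np.zeros((4, 1)), np.array([1.0, 2.0, 3.0, 4.0])
X_valid, y_valid = np.zeros((2, 1)), np.array([2.0, 3.0])

candidates = [MeanBaseline(1.0), MeanBaseline(0.0), MeanBaseline(-2.0)]
ranking = rank_by_mae(candidates, X_train, y_train, X_valid, y_valid)
print(ranking[0])  # position and MAE of the best candidate
```

Passing the `models` list defined above, together with real training and validation splits, would give the same kind of ranking for the ten XGBoost regressors.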

Model 3 had the lowest MAE, so I picked it. I then ran the prediction and plotted it. This was the result of the prediction on the test data (from 1/1/2020 to 25/4/2021):