Machine Learning project: Agile or Waterfall approach?

This question has an easy answer: the agile approach. Why? Because it recognizes that building the solution requires several iterations. Reason 1: ML models change over time. Machine Learning projects are built on ML models, and models change over time. Why does a model change over time? A model changes over time because the data used to train the …

AWS SageMaker

These are some notes about the basics of SageMaker. SageMaker services: SageMaker Neo optimizes the trained model and compiles it into an executable. Taking as input the target hardware where the model will run, the compiler uses an ML model to apply performance optimizations to your model. Ground Truth makes it easy to label data. …

CRISP-DM methodology

The cross-industry standard process for data mining, or CRISP-DM, is an open standard process model for data mining project planning, created in 1996. The CRISP-DM process is divided into 6 phases or components: Business understanding – What does the business need? Data understanding – What data do we have / need? Is it clean? Data preparation – How do we organize …

Box-Cox transformation

These are reminder notes about the Box-Cox transformation. One of the problems that the Box-Cox transformation tries to solve is “heteroscedasticity” (non-constant variance). This article explains a problem that the Box-Cox transformation can solve: https://blog.minitab.com/en/applying-statistics-in-quality-projects/how-could-you-benefit-from-a-box-cox-transformation SciPy has added an inverse Box-Cox transformation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.inv_boxcox.html Does Box-Cox always work? The answer is NO. Box-Cox does not …
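As a reminder of what the transformation and its inverse compute, here is a minimal sketch in plain Python. It writes out the standard one-parameter Box-Cox formula by hand instead of calling SciPy, so the function names `boxcox` and `inv_boxcox` below are local helpers defined here, and the sample data and the value of `lam` are made up for illustration.

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of one strictly positive value x for a given lambda."""
    if x <= 0:
        raise ValueError("the Box-Cox formula is only defined for x > 0")
    if lam == 0:
        return math.log(x)            # limiting case of the formula as lambda -> 0
    return (x ** lam - 1.0) / lam


def inv_boxcox(y, lam):
    """Inverse of the transform above: recovers x from y."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


# Illustrative values only (assumption, not taken from the notes).
data = [1.0, 2.5, 10.0, 40.0]
lam = 0.5

transformed = [boxcox(x, lam) for x in data]
restored = [inv_boxcox(y, lam) for y in transformed]
```

Transforming and then inverting returns the original values up to floating-point error, which is what makes the inverse useful for mapping model predictions back to the original scale.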

Time Series notes

I have taken this course offered by Kaggle, and I would like to take some notes. The trend component of a time series represents a persistent, long-term change in the mean of the series. We mainly have: time-dependent properties (trends and seasonality) and serial-dependent properties (cycles and lagged series). 1. Trends: The trend component of a time series …
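To make the two kinds of properties concrete, here is a small sketch in plain Python: a centered moving average as one common way to estimate a trend, and a lagged copy of the series as a serial-dependence feature. The helper names `moving_average` and `lag` and the `sales` numbers are hypothetical, chosen only for illustration; they are not taken from the course.

```python
def moving_average(series, window):
    """Centered moving average; positions without a full window are None."""
    if window % 2 == 0 or window < 1:
        raise ValueError("this sketch assumes an odd, positive window")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = i - half, i + half + 1
        if lo < 0 or hi > len(series):
            smoothed.append(None)      # not enough neighbours at the edges
        else:
            smoothed.append(sum(series[lo:hi]) / window)
    return smoothed


def lag(series, k):
    """Shift the series k steps into the past; assumes 0 <= k <= len(series)."""
    return [None] * k + series[:len(series) - k]


# Made-up series with a slow upward drift (assumption for illustration).
sales = [10, 12, 11, 14, 13, 16, 15]

trend = moving_average(sales, window=3)   # smooths out short-term wiggles
lag_1 = lag(sales, 1)                     # value observed one step earlier
```

The moving average approximates the slowly changing mean, while `lag_1` lines up each observation with the one before it, so a model can use yesterday's value as a feature for today.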

Intermediate Machine Learning, by Kaggle

Some notes on this course offered by Kaggle, for my poor memory. Cross-validation: Cross-validation gives a more accurate measure of model quality, which is especially important if you are making a lot of modeling decisions. Use pipelines when doing cross-validation; you will save a lot of time. XGBoost = Gradient boosting: We refer to …
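The idea behind cross-validation can be sketched without any library: split the data into k folds, hold each fold out once, and average the scores. The example below is a plain-Python illustration under simplifying assumptions. The "model" is just a baseline that predicts the training mean, and `k_fold_indices`, `cross_val_mae`, and the values in `y` are hypothetical names and data, not the tooling used in the course.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        valid = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, valid
        start = stop


def cross_val_mae(y, k):
    """Mean absolute error per fold for a baseline predicting the training mean."""
    scores = []
    for train, valid in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)   # "fit" on training folds
        mae = sum(abs(y[i] - prediction) for i in valid) / len(valid)
        scores.append(mae)
    return scores


# Made-up target values (assumption for illustration).
y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]

scores = cross_val_mae(y, k=3)
mean_score = sum(scores) / len(scores)   # single summary of model quality
```

Every observation is used for validation exactly once, so the averaged score depends less on one lucky or unlucky split than a single hold-out set would.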