What is overfitting?
When you prepare a machine learning solution, you work with data sets that contain:
- Data: the relevant and/or important information.
- Noise: the irrelevant and/or unimportant information.
With this data, you want to identify a trigger: a signal that responds to the target pattern you want your code to recognize.
So you start by identifying a pattern, and then you work to improve it.
At some point, you may improve the pattern identification so much that it no longer relies only on the data: it also uses the noise to trigger the signal.
This phenomenon is undesirable, and it is called overfitting. An overfitted pattern matches the data it was built on very closely, but it performs worse on new data, because the noise does not repeat.
In the picture on the left:
- The black line represents a healthy pattern.
- The green line represents an overfitted pattern.
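
The same contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the data is synthetic, the underlying signal (`true_signal`, a straight line), the noise level, and the two toy models are all made up for the example. The "healthy" model is a least-squares straight line; the "overfitted" model simply memorizes every training point, noise included, and answers with the value of the closest one.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def true_signal(x):
    # Hypothetical underlying pattern, chosen only for this illustration.
    return 2.0 * x + 1.0


def make_samples(n, noise_std=0.5):
    # Each observation is the signal plus Gaussian noise.
    xs = [random.uniform(0.0, 10.0) for _ in range(n)]
    ys = [true_signal(x) + random.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Healthy model: ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    # Overfitted model: return the y of the closest training x, noise included.
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    # Mean squared error of the model's predictions on the given samples.
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_samples(100)   # data used to build the models
test_x, test_y = make_samples(1000)    # fresh data the models never saw

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

line_train = mse(line, train_x, train_y)
line_test = mse(line, test_x, test_y)
memo_train = mse(memorizer, train_x, train_y)
memo_test = mse(memorizer, test_x, test_y)

print(f"line:      train error {line_train:.3f}, test error {line_test:.3f}")
print(f"memorizer: train error {memo_train:.3f}, test error {memo_test:.3f}")
```

The memorizer reproduces its training data perfectly, so its training error is zero, yet on the fresh samples its error should come out clearly higher than the straight line's. That gap between training and new data is the usual symptom of overfitting.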