{"id":3664,"date":"2018-04-11T11:50:10","date_gmt":"2018-04-11T11:50:10","guid":{"rendered":"http:\/\/joapen.com\/blog\/?p=3664"},"modified":"2018-04-12T15:57:47","modified_gmt":"2018-04-12T15:57:47","slug":"machine-learning-source-errors","status":"publish","type":"post","link":"https:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/","title":{"rendered":"Machine learning, source of errors"},"content":{"rendered":"<h1>Before starting<\/h1>\n<p>What is an error?<\/p>\n<p>Observation prediction error = Target &#8211; Prediction<\/p>\n<p>Expected squared prediction error = Bias<sup>2<\/sup> + Variance + Noise<\/p>\n<h1>The main sources of errors are<\/h1>\n<ul>\n<li>Bias and\u00a0Variability (variance).<\/li>\n<li>Underfitting or overfitting.<\/li>\n<li>Underclustering or overclustering.<\/li>\n<li>Improper validation (after the training). This can come from using the wrong validation set. It is important to keep the training and validation processes completely separate to minimize this error, and to document assumptions in detail.<\/li>\n<\/ul>\n<h1><a href=\"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/source-of-errors-diagram\/\" rel=\"attachment wp-att-3665\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3665 size-medium\" src=\"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram-300x226.jpg\" alt=\"\" width=\"300\" height=\"226\" srcset=\"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram-300x226.jpg 300w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram-398x300.jpg 398w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram.jpg 566w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a>Underfitting<\/h1>\n<p>This phenomenon happens when we have low variance and high bias.<\/p>\n<p>This typically happens when we have too few features and the final model is too simple.<\/p>\n<p>How can I prevent 
underfitting?<\/p>\n<ul>\n<li>Increase the number of features and hence the model complexity.<\/li>\n<li>If you are using PCA, which reduces the number of dimensions, remove that reduction step or keep more components.<\/li>\n<li>Perform cross-validation.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/underfitting\/\" rel=\"attachment wp-att-3666\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3666 size-full\" src=\"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/underfitting.jpg\" alt=\"\" width=\"637\" height=\"262\" srcset=\"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/underfitting.jpg 637w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/underfitting-300x123.jpg 300w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/underfitting-500x206.jpg 500w\" sizes=\"auto, (max-width: 637px) 100vw, 637px\" \/><\/a><\/p>\n<h1>Overfitting<\/h1>\n<p>This phenomenon happens when we have high variance and low bias.<\/p>\n<p>This typically happens when we have too many features and the final model is too complex.<\/p>\n<p>How can I prevent overfitting?<\/p>\n<ul>\n<li>Decrease the number of features and hence the complexity of the model.<\/li>\n<li>Perform a dimension reduction (PCA).<\/li>\n<li>Perform cross-validation.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/overfitting-2\/\" rel=\"attachment wp-att-3667\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3667\" src=\"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/overfitting.jpg\" alt=\"\" width=\"567\" height=\"243\" srcset=\"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/overfitting.jpg 567w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/overfitting-300x129.jpg 300w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/overfitting-500x214.jpg 500w\" sizes=\"auto, 
(max-width: 567px) 100vw, 567px\" \/><\/a><\/p>\n<h1>Cross validation<\/h1>\n<p>This is one of the typical methods to reduce the appearance of errors in a machine learning solution. It consists of testing the model on many different subsets of the data.<\/p>\n<p>You have to be careful when re-testing a model on the same training\/test sets. The reason? Tuning against a single split often leads to underfitting or overfitting errors that go unnoticed.<\/p>\n<p>Cross-validation tries to mitigate these behaviors.<\/p>\n<p>The typical way to do cross-validation is to divide the data set into several sections: you hold one section out for testing and train on the others, then rotate so that each section is used for testing once. For instance, you can take a stock data set from 2010 to 2017, hold out the data from 2012 as the test set, and train your trading model on the remaining years.<\/p>\n<h1>Neural networks<\/h1>\n<p>Neural networks are trained by backpropagating the prediction error through the network. The network minimizes the error by adjusting its weights as it processes more data.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Before to start What is an error? Observation prediction error = Target &#8211; Prediction = Bias + Variance + Noise The main sources of errors are Bias and\u00a0Variability (variance). Underfitting or overfitting. Underclustering or overclustering. Improper validation (after the training). It could be that comes from the wrong validation set. 
It is important to divide &#8230; <a title=\"Machine learning, source of errors\" class=\"read-more\" href=\"https:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/\" aria-label=\"Read more about Machine learning, source of errors\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[151],"tags":[],"class_list":["post-3664","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Machine learning, source of errors -<\/title>\n<meta name=\"description\" content=\"Before to start What is an error? Observation prediction error = Target - Prediction = Bias + Variance + Noise The main sources of errors are Bias - joapen projects\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Machine learning, source of errors -\" \/>\n<meta property=\"og:description\" content=\"Before to start What is an error? 
Observation prediction error = Target - Prediction = Bias + Variance + Noise The main sources of errors are Bias - joapen projects\" \/>\n<meta property=\"og:url\" content=\"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/\" \/>\n<meta property=\"og:site_name\" content=\"joapen projects\" \/>\n<meta property=\"article:published_time\" content=\"2018-04-11T11:50:10+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-04-12T15:57:47+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram-300x226.jpg\" \/>\n<meta name=\"author\" content=\"joapen\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"joapen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/#article\",\"isPartOf\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/\"},\"author\":{\"name\":\"joapen\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"headline\":\"Machine learning, source of 
errors\",\"datePublished\":\"2018-04-11T11:50:10+00:00\",\"dateModified\":\"2018-04-12T15:57:47+00:00\",\"mainEntityOfPage\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/\"},\"wordCount\":344,\"commentCount\":0,\"publisher\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"image\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2018\\\/04\\\/source-of-errors-diagram-300x226.jpg\",\"articleSection\":[\"Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/\",\"url\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/\",\"name\":\"Machine learning, source of errors -\",\"isPartOf\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/#primaryimage\"},\"image\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2018\\\/04\\\/source-of-errors-diagram-300x226.jpg\",\"datePublished\":\"2018-04-11T11:50:10+00:00\",\"dateModified\":\"2018-04-12T15:57:47+00:00\",\"description\":\"Before to start What is an error? 
Observation prediction error = Target - Prediction = Bias + Variance + Noise The main sources of errors are Bias - joapen projects\",\"breadcrumb\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/#primaryimage\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2018\\\/04\\\/source-of-errors-diagram.jpg\",\"contentUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2018\\\/04\\\/source-of-errors-diagram.jpg\",\"width\":566,\"height\":427},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2018\\\/04\\\/11\\\/machine-learning-source-errors\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\\\/\\\/joapen.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Machine learning, source of errors\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#website\",\"url\":\"http:\\\/\\\/joapen.com\\\/blog\\\/\",\"name\":\"joapen projects\",\"description\":\"Just a place to 
write\",\"publisher\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\\\/\\\/joapen.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\",\"name\":\"joapen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"contentUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"width\":400,\"height\":400,\"caption\":\"joapen\"},\"logo\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\"},\"sameAs\":[\"http:\\\/\\\/www.joapen.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Machine learning, source of errors -","description":"Before to start What is an error? Observation prediction error = Target - Prediction = Bias + Variance + Noise The main sources of errors are Bias - joapen projects","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/","og_locale":"en_US","og_type":"article","og_title":"Machine learning, source of errors -","og_description":"Before to start What is an error? 
Observation prediction error = Target - Prediction = Bias + Variance + Noise The main sources of errors are Bias - joapen projects","og_url":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/","og_site_name":"joapen projects","article_published_time":"2018-04-11T11:50:10+00:00","article_modified_time":"2018-04-12T15:57:47+00:00","og_image":[{"url":"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram-300x226.jpg","type":"","width":"","height":""}],"author":"joapen","twitter_misc":{"Written by":"joapen","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/#article","isPartOf":{"@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/"},"author":{"name":"joapen","@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"headline":"Machine learning, source of errors","datePublished":"2018-04-11T11:50:10+00:00","dateModified":"2018-04-12T15:57:47+00:00","mainEntityOfPage":{"@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/"},"wordCount":344,"commentCount":0,"publisher":{"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"image":{"@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/#primaryimage"},"thumbnailUrl":"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram-300x226.jpg","articleSection":["Machine Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/#respond"]}]},{"@type":"WebPage","@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/","url":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/","name":"Machine learning, source of errors 
-","isPartOf":{"@id":"http:\/\/joapen.com\/blog\/#website"},"primaryImageOfPage":{"@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/#primaryimage"},"image":{"@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/#primaryimage"},"thumbnailUrl":"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram-300x226.jpg","datePublished":"2018-04-11T11:50:10+00:00","dateModified":"2018-04-12T15:57:47+00:00","description":"Before to start What is an error? Observation prediction error = Target - Prediction = Bias + Variance + Noise The main sources of errors are Bias - joapen projects","breadcrumb":{"@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/#primaryimage","url":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram.jpg","contentUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2018\/04\/source-of-errors-diagram.jpg","width":566,"height":427},{"@type":"BreadcrumbList","@id":"http:\/\/joapen.com\/blog\/2018\/04\/11\/machine-learning-source-errors\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/joapen.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Machine learning, source of errors"}]},{"@type":"WebSite","@id":"http:\/\/joapen.com\/blog\/#website","url":"http:\/\/joapen.com\/blog\/","name":"joapen projects","description":"Just a place to 
write","publisher":{"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/joapen.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217","name":"joapen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","url":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","contentUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","width":400,"height":400,"caption":"joapen"},"logo":{"@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg"},"sameAs":["http:\/\/www.joapen.com"]}]}},"_links":{"self":[{"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/3664","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/comments?post=3664"}],"version-history":[{"count":4,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/3664\/revisions"}],"predecessor-version":[{"id":3685,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/3664\/revisions\/3685"}],"wp:attachment":[{"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/media?parent=3664"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/categories?post=3664"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/joapen.com\
/blog\/wp-json\/wp\/v2\/tags?post=3664"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}