{"id":5927,"date":"2021-11-22T11:42:49","date_gmt":"2021-11-22T10:42:49","guid":{"rendered":"https:\/\/joapen.com\/blog\/?p=5927"},"modified":"2021-11-22T11:42:53","modified_gmt":"2021-11-22T10:42:53","slug":"intermediate-machine-learning-by-kaggle","status":"publish","type":"post","link":"http:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/","title":{"rendered":"Intermediate Machine Learning, by Kaggle"},"content":{"rendered":"\n<p>Some notes on this course offered by Kaggle, for my poor memory.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Cross validation<\/h2>\n\n\n\n<p>Cross-validation gives a more accurate measure of model quality, which is especially important if you are making a lot of modeling decisions.\u00a0<\/p>\n\n\n\n<p>Use pipelines when doing cross-validation; you will save a lot of time.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/i.imgur.com\/9k60cVA.png\" alt=\"tut5_crossval\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">XGBoost = Gradient boosting<\/h2>\n\n\n\n<p>We refer to the random forest method as an &#8220;ensemble method&#8221;. By definition,\u00a0<strong>ensemble methods<\/strong>\u00a0combine the predictions of several models (e.g., several trees, in the case of random forests).<\/p>\n\n\n\n<p>Gradient boosting is also an ensemble method; it goes through cycles to iteratively add models to an ensemble.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>A naive model is built as the initial prediction, which is used as the basis for the following predictions.<\/li><li>Make predictions: we use the current ensemble (the initial model, or the models added so far) to generate predictions for each observation in the dataset. To make a prediction, the predictions from all models in the ensemble are added together.<\/li><li>Calculate loss: we use these predictions to calculate a loss function, such as mean squared error (MSE).<\/li><li>Train new model: we use the loss function to fit a new model that will be added to the ensemble. 
Specifically, we determine model parameters so that adding this new model to the ensemble will reduce the loss. (<em>Side note: The &#8220;gradient&#8221; in &#8220;gradient boosting&#8221; refers to the fact that we&#8217;ll use\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\">gradient descent<\/a>\u00a0on the loss function to determine the parameters in this new model.<\/em>).<\/li><li>Add new model to ensemble, and repeat again and again.<\/li><\/ol>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/i.imgur.com\/MvCGENh.png\" alt=\"tut6_boosting\"\/><figcaption>Gradient Boosting process<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Data leakage<\/h2>\n\n\n\n<p>Data leakage happens when your training data contains information about the target, but similar data will not be available when the model is used for prediction. This leads you to obtain high performance on the training set (and possibly even the validation data), but low performance in production (and a lot of frustration).<\/p>\n\n\n\n<p>Types of data leakage:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Target leakage: it occurs when your predictors include data that will not be available at the time you make predictions.<\/li><li>Train-test leakage: it occurs when you aren&#8217;t careful to distinguish training data from validation data.<\/li><\/ul>\n\n\n\n<p>How to prevent them:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Target leakage: Any variable updated (or created) after the target value is realized should be excluded.<\/li><li>Train-test leakage: take care separating the train and validation data properly.<\/li><\/ul>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some notes of this course offered by Kaggle, for my poor memory. 
Cross validation Cross-validation gives a more accurate measure of model quality, which is especially important if you are making a lot of modeling decisions.\u00a0 Use pipelines for doing cross-validation, you will save a lot of time. XGBoost = Gradient boosting We refer to &#8230; <a title=\"Intermediate Machine Learning, by Kaggle\" class=\"read-more\" href=\"http:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/\" aria-label=\"Read more about Intermediate Machine Learning, by Kaggle\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[151],"tags":[],"class_list":["post-5927","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Intermediate Machine Learning, by Kaggle -<\/title>\n<meta name=\"description\" content=\"Some notes of this course offered by Kaggle, for my poor memory. Cross validation Cross-validation gives a more accurate measure of model quality, which - joapen projects\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Intermediate Machine Learning, by Kaggle -\" \/>\n<meta property=\"og:description\" content=\"Some notes of this course offered by Kaggle, for my poor memory. 
Cross validation Cross-validation gives a more accurate measure of model quality, which - joapen projects\" \/>\n<meta property=\"og:url\" content=\"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/\" \/>\n<meta property=\"og:site_name\" content=\"joapen projects\" \/>\n<meta property=\"article:published_time\" content=\"2021-11-22T10:42:49+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-11-22T10:42:53+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i.imgur.com\/9k60cVA.png\" \/>\n<meta name=\"author\" content=\"joapen\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"joapen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/\"},\"author\":{\"name\":\"joapen\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"headline\":\"Intermediate Machine Learning, by 
Kaggle\",\"datePublished\":\"2021-11-22T10:42:49+00:00\",\"dateModified\":\"2021-11-22T10:42:53+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/\"},\"wordCount\":370,\"commentCount\":0,\"publisher\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"image\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i.imgur.com\\\/9k60cVA.png\",\"articleSection\":[\"Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/\",\"name\":\"Intermediate Machine Learning, by Kaggle -\",\"isPartOf\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i.imgur.com\\\/9k60cVA.png\",\"datePublished\":\"2021-11-22T10:42:49+00:00\",\"dateModified\":\"2021-11-22T10:42:53+00:00\",\"description\":\"Some notes of this course offered by Kaggle, for my poor memory. 
Cross validation Cross-validation gives a more accurate measure of model quality, which - joapen projects\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i.imgur.com\\\/9k60cVA.png\",\"contentUrl\":\"https:\\\/\\\/i.imgur.com\\\/9k60cVA.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2021\\\/11\\\/22\\\/intermediate-machine-learning-by-kaggle\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\\\/\\\/joapen.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Intermediate Machine Learning, by Kaggle\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#website\",\"url\":\"http:\\\/\\\/joapen.com\\\/blog\\\/\",\"name\":\"joapen projects\",\"description\":\"Just a place to 
write\",\"publisher\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\\\/\\\/joapen.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\",\"name\":\"joapen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"contentUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"width\":400,\"height\":400,\"caption\":\"joapen\"},\"logo\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\"},\"sameAs\":[\"http:\\\/\\\/www.joapen.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Intermediate Machine Learning, by Kaggle -","description":"Some notes of this course offered by Kaggle, for my poor memory. Cross validation Cross-validation gives a more accurate measure of model quality, which - joapen projects","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/","og_locale":"en_US","og_type":"article","og_title":"Intermediate Machine Learning, by Kaggle -","og_description":"Some notes of this course offered by Kaggle, for my poor memory. 
Cross validation Cross-validation gives a more accurate measure of model quality, which - joapen projects","og_url":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/","og_site_name":"joapen projects","article_published_time":"2021-11-22T10:42:49+00:00","article_modified_time":"2021-11-22T10:42:53+00:00","og_image":[{"url":"https:\/\/i.imgur.com\/9k60cVA.png","type":"","width":"","height":""}],"author":"joapen","twitter_misc":{"Written by":"joapen","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/#article","isPartOf":{"@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/"},"author":{"name":"joapen","@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"headline":"Intermediate Machine Learning, by Kaggle","datePublished":"2021-11-22T10:42:49+00:00","dateModified":"2021-11-22T10:42:53+00:00","mainEntityOfPage":{"@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/"},"wordCount":370,"commentCount":0,"publisher":{"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"image":{"@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/#primaryimage"},"thumbnailUrl":"https:\/\/i.imgur.com\/9k60cVA.png","articleSection":["Machine Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/","url":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/","name":"Intermediate Machine Learning, by Kaggle 
-","isPartOf":{"@id":"http:\/\/joapen.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/#primaryimage"},"image":{"@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/#primaryimage"},"thumbnailUrl":"https:\/\/i.imgur.com\/9k60cVA.png","datePublished":"2021-11-22T10:42:49+00:00","dateModified":"2021-11-22T10:42:53+00:00","description":"Some notes of this course offered by Kaggle, for my poor memory. Cross validation Cross-validation gives a more accurate measure of model quality, which - joapen projects","breadcrumb":{"@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/#primaryimage","url":"https:\/\/i.imgur.com\/9k60cVA.png","contentUrl":"https:\/\/i.imgur.com\/9k60cVA.png"},{"@type":"BreadcrumbList","@id":"https:\/\/joapen.com\/blog\/2021\/11\/22\/intermediate-machine-learning-by-kaggle\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/joapen.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Intermediate Machine Learning, by Kaggle"}]},{"@type":"WebSite","@id":"http:\/\/joapen.com\/blog\/#website","url":"http:\/\/joapen.com\/blog\/","name":"joapen projects","description":"Just a place to 
write","publisher":{"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/joapen.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217","name":"joapen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","url":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","contentUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","width":400,"height":400,"caption":"joapen"},"logo":{"@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg"},"sameAs":["http:\/\/www.joapen.com"]}]}},"_links":{"self":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/5927","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/comments?post=5927"}],"version-history":[{"count":1,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/5927\/revisions"}],"predecessor-version":[{"id":5928,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/5927\/revisions\/5928"}],"wp:attachment":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/media?parent=5927"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/categories?post=5927"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-
json\/wp\/v2\/tags?post=5927"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}