{"id":6751,"date":"2022-10-17T14:43:55","date_gmt":"2022-10-17T12:43:55","guid":{"rendered":"https:\/\/joapen.com\/blog\/?p=6751"},"modified":"2022-10-17T14:44:00","modified_gmt":"2022-10-17T12:44:00","slug":"amazon-sagemaker-spark","status":"publish","type":"post","link":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/","title":{"rendered":"Amazon SageMaker + Spark"},"content":{"rendered":"\n<p>Some screenshots and notes for my poor memory<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Apache Spark<\/strong>\u00a0is an\u00a0open-source\u00a0unified analytics engine for large-scale data processing.\u00a0<\/li><li>PCA = principal components analysis.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"519\" src=\"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-1024x519.png\" alt=\"\" class=\"wp-image-6753\" srcset=\"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-1024x519.png 1024w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-300x152.png 300w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-768x389.png 768w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13.png 1058w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Collaborative Filtering<\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"882\" height=\"464\" src=\"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-16.png\" alt=\"\" class=\"wp-image-6756\" srcset=\"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-16.png 882w, https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-16-300x158.png 300w, 
https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-16-768x404.png 768w\" sizes=\"auto, (max-width: 882px) 100vw, 882px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Deep Structured Semantic Model (DSSM)<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>At its core, a matrix factorization solution is the product of two matrices.<\/li><li>Neural networks are good at picking up semantic intent at the phrase \/ sentence level.<\/li><li>Neural networks are great at image captioning.<\/li><li>The output of a network is a tensor.<\/li><li>So we can use the output of several networks as our embedding layer for an enriched recommendation system.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Some screenshots and notes for my poor memory ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker Apache Spark\u00a0is an\u00a0open-source\u00a0unified analytics engine for large-scale data processing.\u00a0 PCA = principal components analysis. Collaborative Filtering Deep Structured Semantic Model (DSSM) At its core, a matrix factorization solution is the product of two matrices. 
Neural Networks are &#8230; <a title=\"Amazon SageMaker + Spark\" class=\"read-more\" href=\"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/\" aria-label=\"Read more about Amazon SageMaker + Spark\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[66,151],"tags":[],"class_list":["post-6751","post","type-post","status-publish","format-standard","hentry","category-aws","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Amazon SageMaker + Spark -<\/title>\n<meta name=\"description\" content=\"Some screenshots and notes for my poor memory ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker Apache Spark\u00a0is an\u00a0open-source\u00a0unified - joapen projects\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Amazon SageMaker + Spark -\" \/>\n<meta property=\"og:description\" content=\"Some screenshots and notes for my poor memory ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker Apache Spark\u00a0is an\u00a0open-source\u00a0unified - joapen projects\" \/>\n<meta property=\"og:url\" content=\"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"joapen projects\" \/>\n<meta property=\"article:published_time\" content=\"2022-10-17T12:43:55+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-10-17T12:44:00+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-1024x519.png\" \/>\n<meta name=\"author\" content=\"joapen\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"joapen\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/\"},\"author\":{\"name\":\"joapen\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"headline\":\"Amazon SageMaker + Spark\",\"datePublished\":\"2022-10-17T12:43:55+00:00\",\"dateModified\":\"2022-10-17T12:44:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/\"},\"wordCount\":102,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"image\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/10\\\/image-13-1024x519.png\",\"articleSection\":[\"AWS\",\"Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/\",\"name\":\"Amazon SageMaker + Spark 
-\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/10\\\/image-13-1024x519.png\",\"datePublished\":\"2022-10-17T12:43:55+00:00\",\"dateModified\":\"2022-10-17T12:44:00+00:00\",\"description\":\"Some screenshots and notes for my poor memory ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker Apache Spark\u00a0is an\u00a0open-source\u00a0unified - joapen projects\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/#primaryimage\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/10\\\/image-13-1024x519.png\",\"contentUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/10\\\/image-13-1024x519.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2022\\\/10\\\/17\\\/amazon-sagemaker-spark\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/joapen.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Amazon SageMaker + Spark\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/\",\"name\":\"joapen projects\",\"description\":\"Just a place to 
write\",\"publisher\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/joapen.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\",\"name\":\"joapen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"contentUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"width\":400,\"height\":400,\"caption\":\"joapen\"},\"logo\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\"},\"sameAs\":[\"http:\\\/\\\/www.joapen.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Amazon SageMaker + Spark -","description":"Some screenshots and notes for my poor memory ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker Apache Spark\u00a0is an\u00a0open-source\u00a0unified - joapen projects","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/","og_locale":"en_US","og_type":"article","og_title":"Amazon SageMaker + Spark -","og_description":"Some screenshots and notes for my poor memory ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker Apache Spark\u00a0is an\u00a0open-source\u00a0unified - joapen projects","og_url":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/","og_site_name":"joapen projects","article_published_time":"2022-10-17T12:43:55+00:00","article_modified_time":"2022-10-17T12:44:00+00:00","og_image":[{"url":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-1024x519.png","type":"","width":"","height":""}],"author":"joapen","twitter_misc":{"Written by":"joapen"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/#article","isPartOf":{"@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/"},"author":{"name":"joapen","@id":"https:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"headline":"Amazon SageMaker + 
Spark","datePublished":"2022-10-17T12:43:55+00:00","dateModified":"2022-10-17T12:44:00+00:00","mainEntityOfPage":{"@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/"},"wordCount":102,"commentCount":0,"publisher":{"@id":"https:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"image":{"@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-1024x519.png","articleSection":["AWS","Machine Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/","url":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/","name":"Amazon SageMaker + Spark -","isPartOf":{"@id":"https:\/\/joapen.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/#primaryimage"},"image":{"@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-1024x519.png","datePublished":"2022-10-17T12:43:55+00:00","dateModified":"2022-10-17T12:44:00+00:00","description":"Some screenshots and notes for my poor memory ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker Apache Spark\u00a0is an\u00a0open-source\u00a0unified - joapen 
projects","breadcrumb":{"@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/#primaryimage","url":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-1024x519.png","contentUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2022\/10\/image-13-1024x519.png"},{"@type":"BreadcrumbList","@id":"https:\/\/joapen.com\/blog\/2022\/10\/17\/amazon-sagemaker-spark\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/joapen.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Amazon SageMaker + Spark"}]},{"@type":"WebSite","@id":"https:\/\/joapen.com\/blog\/#website","url":"https:\/\/joapen.com\/blog\/","name":"joapen projects","description":"Just a place to 
write","publisher":{"@id":"https:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/joapen.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217","name":"joapen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","url":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","contentUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","width":400,"height":400,"caption":"joapen"},"logo":{"@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg"},"sameAs":["http:\/\/www.joapen.com"]}]}},"_links":{"self":[{"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/6751","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/comments?post=6751"}],"version-history":[{"count":2,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/6751\/revisions"}],"predecessor-version":[{"id":6760,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/6751\/revisions\/6760"}],"wp:attachment":[{"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/media?parent=6751"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/categories?post=6751"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/joapen.c
om\/blog\/wp-json\/wp\/v2\/tags?post=6751"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}