Amazon SageMaker + Spark

October 17, 2022 by joapen

Some screenshots and notes for my poor memory

ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker

Apache Spark is an open-source unified analytics engine for large-scale data processing.
PCA = principal component analysis, used here to reduce the number of features before clustering with K-Means.
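To remind myself what the two stages of that pipeline do, here is a minimal sketch in plain Python. It is not Spark or SageMaker code: it only mimics the idea (reduce dimensions with PCA, then cluster the reduced data with K-Means) on a tiny made-up dataset. The function names, the data, the single retained component, and the deterministic choice of starting centers are all assumptions for illustration.

```python
def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=200):
    """Power iteration: approximates the leading eigenvector of cov."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on scalars, with evenly spaced starting centers."""
    srt = sorted(values)
    centers = [srt[i * (len(srt) - 1) // max(k - 1, 1)] for i in range(k)]

    def nearest(x):
        return min(range(k), key=lambda c: abs(x - centers[c]))

    for _ in range(iters):
        labels = [nearest(x) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return [nearest(x) for x in values]


def pca_then_kmeans(rows, k):
    """Stage 1: project onto the first principal component. Stage 2: cluster."""
    centered = center(rows)
    v = top_component(covariance(centered))
    projected = [sum(a * b for a, b in zip(r, v)) for r in centered]
    return kmeans_1d(projected, k)


# Made-up 3-feature data with two obvious groups.
points = [
    [1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.9, 1.0, 1.1],
    [5.0, 5.1, 4.9], [5.2, 4.9, 5.0], [4.9, 5.0, 5.1],
]
labels = pca_then_kmeans(points, 2)
print(labels)
```

In the real pipeline the PCA stage would run on the Spark cluster and the K-Means stage on SageMaker; the sketch only shows how the output of one stage feeds the next.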

Collaborative Filtering

Deep Structured Semantic Model (DSSM)

At its core, a matrix factorization solution is the multiplication of two matrices: one of user factors and one of item factors, whose product approximates the ratings.
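A small sketch of that multiplication, in plain Python. The factor values are invented for illustration (in practice they are learned from observed ratings), and the names `user_factors`, `item_factors` and `matmul` are my own.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as lists of rows."""
    inner = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# 3 users x 2 latent factors (made-up values).
user_factors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]

# 2 latent factors x 3 items (made-up values).
item_factors = [
    [5.0, 4.0, 1.0],
    [1.0, 2.0, 5.0],
]

# Each cell is the predicted rating of one user for one item.
predicted = matmul(user_factors, item_factors)
for row in predicted:
    print(row)
```

The third user sits halfway between the two latent factors, so the predicted ratings for that user are the average of the two factor rows.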
Neural Networks are good at picking up semantic intent at phrase / sentence level.
Neural Networks are great at image captioning.
The output of a network is a tensor.
So we can combine the outputs of several networks into the embedding layer of an enriched recommendation system.
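One simple way to picture this, as a sketch only: treat each network's output as a vector, concatenate the vectors into one embedding per item, and compare items by cosine similarity. The vectors below are made up and stand in for the outputs of a hypothetical text network and a hypothetical image network; concatenation is just one possible way to combine them.

```python
import math


def concat_embeddings(*vectors):
    """Join the outputs of several networks into one flat embedding."""
    combined = []
    for v in vectors:
        combined.extend(v)
    return combined


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Per item: (made-up text-network output, made-up image-network output).
items = {
    "item_a": concat_embeddings([1.0, 0.0], [0.0, 1.0]),
    "item_b": concat_embeddings([0.9, 0.1], [0.1, 0.9]),
    "item_c": concat_embeddings([0.0, 1.0], [1.0, 0.0]),
}


def most_similar(name):
    """Return the other item whose combined embedding is closest to `name`."""
    others = [other for other in items if other != name]
    return max(others, key=lambda other: cosine(items[name], items[other]))


print(most_similar("item_a"))
```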
