Amazon SageMaker + Spark

Some screenshots and notes for my poor memory

ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker

  • Apache Spark is an open-source unified analytics engine for large-scale data processing.
  • PCA = principal component analysis, used here to reduce the dimensionality of the features before clustering.
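The two-stage pipeline in the heading (PCA on Spark, then K-Means on SageMaker) can be sketched locally in plain Python to show what each stage does. This is a stand-in, not the Spark or SageMaker API: at scale, `pyspark.ml.feature.PCA` and SageMaker's built-in K-Means algorithm would do these steps. All helper names (`pca_then_kmeans`, `top_components`, ...) are hypothetical, and PCA is computed here by power iteration on the covariance matrix, which is only one of several ways to get principal components.

```python
# Local stand-in for "PCA on Spark, then K-Means on SageMaker" (not either real API).
import math
import random


def mean_center(rows):
    # Subtract each column's mean so the principal axes pass through the origin.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    # Sample covariance matrix (d x d) of already-centered rows.
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=200, seed=0):
    # Power iteration with deflation: the k leading eigenvectors of cov.
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(rows, comps):
    # Coordinates of each centered row along each principal component.
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps] for r in rows]


def kmeans(points, k, iters=50, seed=0):
    # Lloyd's algorithm: alternate nearest-centroid assignment and mean update.
    rng = random.Random(seed)
    centroids = [p[:] for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                                                 for a, b in zip(p, centroids[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def pca_then_kmeans(rows, n_components, n_clusters):
    # Stage 1 (Spark's role in these notes): reduce dimensionality with PCA.
    centered = mean_center(rows)
    reduced = project(centered, top_components(covariance(centered), n_components))
    # Stage 2 (SageMaker's role in these notes): cluster the reduced features.
    labels, _ = kmeans(reduced, n_clusters)
    return reduced, labels


rows = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.0, 0.2, 0.1],
        [10.0, 10.0, 10.0], [10.1, 9.9, 10.0], [9.8, 10.2, 10.1]]
reduced, labels = pca_then_kmeans(rows, n_components=1, n_clusters=2)
print(labels)
```

Reducing dimensionality first makes the distance computations in k-means cheaper and less sensitive to noisy, redundant features, which is the usual reason to put PCA in front of it.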

Collaborative Filtering

Deep Structured Semantic Model (DSSM)

  • At its core, a matrix factorization solution is the multiplication of two matrices: a user-factor matrix and an item-factor matrix, whose product approximates the user-item rating matrix.
  • Neural networks are good at picking up semantic intent at the phrase or sentence level.
  • Neural networks are great at image captioning.
  • The output of a network is a tensor.
  • So we can use the outputs of several networks as the embedding layer of an enriched recommendation system.
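The two ideas in the list above can be sketched in plain Python: a matrix factorization whose predictions are the product of a user-factor matrix and an item-factor matrix, and a DSSM-style step that ranks items by cosine similarity between embeddings. Learning the factors with stochastic gradient descent is an assumption (it is one common choice; alternating least squares is another), the "encoder outputs" are hand-written stand-in vectors rather than real text or image-captioning models, and all function names are hypothetical.

```python
# Sketch: matrix factorization (R ~ P . Q^T) and DSSM-style cosine scoring.
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    # Learn user factors P (n_users x rank) and item factors Q (n_items x rank)
    # so that dot(P[u], Q[i]) approximates each observed rating r.
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict_matrix(P, Q):
    # The full predicted rating matrix is the product P . Q^T.
    return [[dot(p, q) for q in Q] for p in P]


def enrich(*embeddings):
    # Concatenate the output vectors of several encoders into one embedding.
    return [x for e in embeddings for x in e]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def rank_items(query, items):
    # Item indices ordered from most to least similar to the query.
    return sorted(range(len(items)), key=lambda j: cosine(query, items[j]),
                  reverse=True)


ratings = [(0, 0, 5.0), (0, 1, 3.0), (0, 2, 1.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 4.0), (2, 2, 2.0)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(predict_matrix(P, Q))  # also fills in the unobserved (user, item) cells

# Hypothetical encoder outputs: [text vector] + [image vector] per item.
items = [enrich([0.0, 1.0], [0.0]), enrich([1.0, 0.0], [1.0]),
         enrich([1.0, 1.0], [0.0])]
print(rank_items(enrich([1.0, 0.0], [1.0]), items))  # -> [1, 2, 0]
```

Concatenation is the simplest way to combine several networks' outputs; since it weights each part by its raw scale, normalizing each encoder's vector before concatenating may be worth considering.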
