Some screenshots and notes for my poor memory
ML Pipeline with PCA on Spark, and K-Means on Amazon SageMaker
- Apache Spark is an open-source unified analytics engine for large-scale data processing.
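- PCA = principal component analysis, a technique that reduces the number of features by projecting the data onto the directions of greatest variance.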
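
To remind myself what PCA actually computes, here is a minimal pure-Python sketch that finds the first principal component with power iteration on the covariance matrix. The function name `first_principal_component` and the toy `points` are made up for illustration; this is not how Spark implements it. On Spark itself I would presumably reach for the built-in `pyspark.ml.feature.PCA` transformer instead.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (means, unit direction of greatest variance) for a list of rows."""
    n = len(rows)
    dims = len(rows[0])
    # Center each column so variance is measured around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix (dims x dims).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1)
            for j in range(dims)] for i in range(dims)]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

# Toy 2-D data lying roughly along the line y = 2x (illustrative values only).
points = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
means, pc1 = first_principal_component(points)

# Project each centered point onto the first component: 2 features become 1.
projected = [sum((p[j] - means[j]) * pc1[j] for j in range(2)) for p in points]
print(pc1, projected)
```

The direction `pc1` should point roughly along (1, 2), which is the line the toy data follows, and `projected` is the one-dimensional version of the data that a later step such as K-Means could cluster.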
Collaborative Filtering
Deep Structured Semantic Model (DSSM)
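- At its core, a matrix factorization solution is the multiplication of two matrices: the user-item matrix is approximated by the product of a user-factor matrix and an item-factor matrix.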
- Neural Networks are good at picking up semantic intent at phrase / sentence level.
- Neural Networks are great at image captioning.
- The output of a network is a tensor.
- So we can use the outputs of several networks as the embedding layer of an enriched recommendation system.
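
A small sketch of the two ideas in these notes, in plain Python: predicted ratings as the product of two factor matrices, and an enriched item embedding built by concatenating the outputs of several networks. All numbers and names (`user_factors`, `item_factors`, `text_embedding`, `image_embedding`) are hypothetical; real factors would be learned from data and real embeddings would come from trained networks.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical latent factors: 3 users x 2 factors, and 2 factors x 4 items.
user_factors = [[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]]
item_factors = [[5.0, 4.0, 1.0, 0.0],
                [0.0, 1.0, 4.0, 5.0]]

# Predicted user-item matrix (3 x 4): one row of scores per user.
predicted = matmul(user_factors, item_factors)
print(predicted[2])  # → [2.5, 2.5, 2.5, 2.5]

# Hypothetical outputs of two networks for the same item, e.g. a text
# model and an image model. Concatenation is one simple way to combine them.
text_embedding = [0.2, 0.7]
image_embedding = [0.9, 0.1, 0.4]
item_embedding = text_embedding + image_embedding
print(len(item_embedding))  # → 5
```

Concatenation is only one option for combining embeddings; it keeps each network's signal separate and lets whatever layer comes next decide how to weigh them.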