Machine Learning Planning and Architectures

There are multiple types of machine learning projects, so the phases and steps differ. I will try to reduce them to a few basic project types.

Basic project plans (main phases)

Machine learning solution based on a product

  • Technology assessment = 2 – 3 days.
  • Production trial = 8 – 12 days.
  • Application deployment in production = 3 – 6 months.

Machine learning solution based on a platform

  • Proof of concept and business case preparation = 2 – 4 weeks.
  • Executive briefing with results = 2 – 4 hours.
  • Production trial = 8 – 12 days.
  • Application deployment in production = 3 – 6 months.

Six Sigma projects can be implemented using both approaches.

Solution Architecture

Some examples of architecture representations (just the main picture):

Architecture example based on the C3.ai platform.

Another example, from Microsoft:

Example of components of the Azure Machine Learning architecture.

Another example related to AWS:

AWS predictive maintenance example.


As usual, the choice of solution brand depends on the partnerships and the knowledge your organization already has with one or another brand, platform, or company.

Tuning a Machine Learning Model

I continue taking basic notes from the book “Enterprise Artificial Intelligence and Machine Learning for Managers”.

Tuning a machine learning model is an iterative process. Data scientists typically run numerous experiments to train and evaluate models, trying out different features, different loss functions, different AI/ML models, and adjusting model parameters and hyper-parameters.

Feature engineering

Feature engineering broadly refers to mathematical transformations of raw data in order to feed appropriate signals into AI/ML models.

In the real world, data are derived from a variety of source systems and are typically not reconciled or aligned in time and space. Data scientists often put significant effort into defining data transformation pipelines and building out their feature vectors.

In addition, data scientists need to normalize or scale features to ensure that no single feature overpowers the algorithm, as in the sketch below.
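As a minimal sketch of the two most common scaling approaches (the feature values below are made up for illustration):

```python
import numpy as np

# Hypothetical raw features on very different scales,
# e.g., temperature in degrees C and pressure in Pa.
X = np.array([[20.0, 101_000.0],
              [25.0,  99_500.0],
              [30.0, 102_300.0]])

# Min-max scaling: map each feature to the [0, 1] range.
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score): zero mean and unit variance per feature,
# so no single feature overpowers the algorithm because of its units.
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)
```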

Loss Functions

A loss function serves as the objective function that the AI/ML algorithm seeks to optimize during training. During model training, the algorithm aims to minimize the loss function. Data scientists often consider different loss functions to improve the model – e.g., to make the model less sensitive to outliers, handle noise better, or reduce over-fitting.

A simple example of a loss function is mean squared error (MSE), which is often used to optimize regression models. MSE measures the average squared difference between predictions and actual output values.
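A quick sketch in Python (the numbers are illustrative only):

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error: the average squared difference
    between predictions and actual output values."""
    return float(np.mean((y_true - y_pred) ** 2))

# A model that consistently under-predicts by 2 has the same MSE
# as one that consistently over-predicts by 2.
y_true = np.array([10.0, 12.0, 14.0])
print(mse(y_true, y_true - 2.0))  # 4.0
print(mse(y_true, y_true + 2.0))  # 4.0
```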

These two linear regression models have the same MSE, but the model on the left is under-predicting and the model on the right is over-predicting.

It is important to recognize the weaknesses of loss functions. Over-relying on a loss function as an indicator of prediction accuracy may lead to erroneous model set points.

Regularization

Regularization is a method to balance over-fitting and under-fitting a model during training. Both over-fitting and under-fitting are problems that ultimately cause poor predictions on new data.

  • Over-fitting occurs when a machine learning model is tuned to learn the noise in the data rather than the patterns or trends in the data. A supervised model that is over-fit will typically perform well on data the model was trained on, but poorly on data the model has not seen before.
  • Under-fitting occurs when the machine learning model does not capture variations in the data, where those variations are not caused by noise. Such a model is considered to have high bias and low variance. A supervised model that is under-fit will typically perform poorly both on data the model was trained on and on data the model has not seen before.

Regularization helps to balance variance and bias during model training.

Regularization is a technique to adjust how closely a model is trained to fit historical data. One way to apply regularization is to add a term that penalizes the loss function when the tuned model is over-fit, as in the sketch below.
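One common concrete form of that idea is the L2 (ridge) penalty. A minimal sketch, assuming a linear model with weight vector w:

```python
import numpy as np

def ridge_loss(w: np.ndarray, X: np.ndarray, y: np.ndarray, lam: float) -> float:
    """MSE plus an L2 penalty on the weights.
    lam is the regularization hyper-parameter: lam = 0 gives ordinary
    least squares, while larger lam pulls the weights toward zero,
    trading a little bias for less variance (less over-fitting)."""
    residuals = X @ w - y
    return float(np.mean(residuals ** 2) + lam * np.sum(w ** 2))
```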

Hyper-parameters

Hyper-parameters are parameters that are specified before training a model. They are distinct from model parameters – the weights that an AI/ML model learns during training.

Finding the best hyper-parameters is an iterative and potentially time-intensive process called “hyper-parameter optimization.”

Examples:

  • Number of hidden layers and the learning rate of deep neural network algorithms.
  • Number of leaves and depth of trees in decision tree algorithms.
  • Number of clusters in clustering algorithms.

To address the challenge of hyper-parameter optimization, data scientists use optimization algorithms designed for this task (e.g., grid search, random search, and Bayesian optimization). These approaches help narrow the search space of all possible hyper-parameter combinations to find the best (or near-best) result, as in the sketch below.
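A grid search sketch using scikit-learn (the data set and the candidate values are made up for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate hyper-parameter values: grid search trains and cross-validates
# a model for every combination and keeps the best-scoring one.
param_grid = {
    "max_depth": [2, 4, 8],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```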

Enterprise Artificial Intelligence and Machine Learning for Managers

I decided to read this short book published by C3.ai, which is aimed at managers. It focuses on concepts and gives you the basic nomenclature to understand how these types of initiatives are implemented. Later, when you review C3.ai’s product list, you realize where their products fit within the standard AI classification.

Below are some notes on the concepts I would like to review in the future.

The author is Nikhil Krishnan, PhD.

Machine Learning categories

Common categories of Machine Learning algorithms

Main types of supervised learning

Supervised learning algorithms learn by tuning a set of model parameters that operate on the model’s inputs and that best fit the set of known outputs.

Examples of classification and regression techniques
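As a minimal sketch of supervised learning (the toy data below are made up, roughly y = 2x), a regression model tunes its parameters to fit known outputs and can then predict unseen inputs:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Labeled training data: inputs X with known outputs y.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.0, 6.2, 7.9])

model = LinearRegression()
model.fit(X, y)                    # tune parameters to best fit the outputs
print(model.coef_, model.intercept_)
print(model.predict([[5.0]]))      # predict for an unseen input
```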

Unsupervised learning

Unsupervised learning techniques operate without known outputs or observations – that is, these techniques are not trying to predict any specific outcomes. Instead, unsupervised techniques attempt to uncover patterns within data sets.

Unsupervised techniques include clustering algorithms that group data in meaningful ways.

Clustering algorithms

Unsupervised machine learning models do not require labels to train on past data. Instead, they automatically detect patterns in data to generate predictions.
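A minimal k-means sketch with scikit-learn (the two synthetic groups of points are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two loose groups of 2-D points, no output labels at all.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                  rng.normal(5.0, 1.0, size=(50, 2))])

# k-means groups the points into k clusters; k is chosen by the modeler.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.cluster_centers_)        # one center per discovered group
print(kmeans.predict([[4.8, 5.1]]))   # assign a new point to a cluster
```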

Dimensionality reduction

Dimensionality reduction is a powerful approach to construct a low-dimensional representation of high-dimensional input data. The purpose of dimensionality reduction is to reduce noise so that a model can identify strong signals among complex inputs – i.e., to identify useful information.

High dimensionality poses two challenges. First, it is hard for a person to conceptualize high-dimensional space, meaning that interpreting a model is non-intuitive. Second, algorithms have a hard time learning patterns when there are many sources of input data relative to the amount of available training data.
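A PCA sketch with scikit-learn (the high-dimensional data are synthetic, generated from two hidden factors):

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 observations of 20 correlated features driven by 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(100, 2))
X = factors @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(100, 20))

# PCA builds a low-dimensional representation that keeps the strong signals.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # most variance sits in these 2 axes
```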

Example of an unsupervised machine learning model for anomaly detection.

Reinforcement learning

Reinforcement learning (RL) is a category of machine learning that uses a trial-and-error approach. RL is a more goal-directed learning approach than either supervised or unsupervised machine learning.

Deep Learning

Deep learning is a subset of machine learning that involves the application of complex, multi-layered artificial neural networks to solve problems.

Deep learning takes advantage of yet another step change in compute capabilities. Deep learning models are typically compute-intensive to train and much harder to interpret than conventional approaches.

A deep learning neural network is a collection of many nodes. The nodes are organized into layers, and the outputs from the nodes in one layer become the inputs for the nodes in the next layer.

Single nodes are combined to form input, output, and hidden layers of a deep learning neural network.
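A minimal NumPy sketch of the forward pass through such a network (the layer sizes and weights are arbitrary):

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """A common activation function applied at each hidden node."""
    return np.maximum(0.0, x)

# A tiny feed-forward network: 3 inputs -> 4 hidden nodes -> 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input layer  -> hidden layer
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)   # hidden layer -> output layer

x = np.array([0.5, -1.2, 2.0])   # one input example
hidden = relu(x @ W1 + b1)       # outputs of the hidden layer...
output = hidden @ W2 + b2        # ...become inputs to the output layer
print(output)
```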