Machine Learning: map and players

This post is a mixed review of the types of Machine Learning solutions the market offers, and a quick look at some players I have in mind.

Machine Learning Wardley Map

Components:

  1. Machine Learning can be used by companies and individuals. The B2B/B2C distinction is important when you consider that many individuals don't know they are using solutions based on machine learning every day.
  2. Data scientist: I have placed this role at the center as the element creating the machine learning solution, which is not completely true, because around the person or team building a machine learning solution for a big organization there are many stakeholders: operations, data analysts, marketing, the legal department…
  3. Specific solution: a solution for a given problem that is pre-built or built to order and is available to be acquired by a company or a person. They are usually centered on an industry and are very specific to a niche or a given problem. We will not find generic problems or generic solutions here.
  4. Machine learning development kit and platform: the tools and the environment available to the data scientist. The existence of cloud and industrialized solutions enables an individual to have environments and development kits available to work on Machine Learning. We cannot forget that the amount of resources available is incredible.

After drawing this simple map, I got distracted by stocks and put my attention on components 3 and 4: specific solutions and tools + platforms. After spending two hours looking around, I got these two tables. The second table has some empty spaces, which means there is no solution in that area for the given company (at least I have not seen one).

Some notes that I have not included in the pictures:

  • Impressive work done by Google in many areas; their capacity and the amount of resources offered to a person like me seem infinite.
  • Facebook is a refined machine of data and algorithms that seems to work perfectly, taking into account that what they are managing is very difficult: people's opinions and behaviors. They are investing a lot in virtual reality and augmented reality.
  • Robotics: you have Amazon, and then the rest of the world. The day they decide to sell their robotics solutions, it is going to be interesting to see what happens in the industry.
  • Palantir has a good set of solutions and they are very close to big clients. They are addressing customer issues and have a lot of work ahead of them.
  • C3.ai: same as Palantir, they are addressing end-customer issues with their solutions. For the moment they have mostly industrial customers; they have to demonstrate they can serve other types of customers with a good volume of purchase orders.
  • There is space for all of them and for other players. This has just started.

What are your thoughts about Machine Learning solutions and their players?

 

Selecting the right Machine Learning problem

This post is very boring: there are just questions and no answers.

I will start with 2 questions:

  1. Which problem in my organization is the right one where investing in a Machine Learning solution will capture valuable returns?
  2. What are the features of a problem that make it appropriate for an ML solution?

The answers:

For question #1, I do not have an answer. For question #2, I learned some suggestions from the book Enterprise Artificial Intelligence and Machine Learning for Managers:

  1. The problem is tractable, with a reasonable scope and solution time.
  2. It unlocks sufficient business value and can be operationalized so you can capture that value.
  3. It addresses ethical considerations.

Other secondary questions:

  1. Do we have enough data to track the problem?
  2. Do we know the economic value of the problem?
  3. Can we measure the performance of the business function where we intend to apply the Machine Learning solution with respect to a baseline?
  4. Do we have data sets that are fair?
  5. Does our potential solution have the right balance between fairness and bias?
  6. Did we take into account potential safety issues?
  7. Can the solution approach be explainable?
  8. Is the solution approach transparent and easily understandable?
  9. What is the advantage of using a Machine Learning solution instead of another solution?
  10. Can we classify different issues or business cases in terms of priority?
  11. Are the different business cases related to one another?

Machine Learning Planning and architectures

There are multiple types of machine learning projects, so the phases and steps differ. I will try to reduce them to a few basic types of projects.

Basic project plans (main phases)

Machine learning solution based on a Product

  • Technology assessment = 2 – 3 days.
  • Production trial = 8 – 12 days.
  • Application deployment in production = 3 – 6 months.

Machine learning solution based on a platform

  • Proof of concept and business case preparation = 2 – 4 weeks.
  • Executive briefing with results = 2 – 4 hours.
  • Production trial = 8 – 12 days.
  • Application deployment in production = 3 – 6 months.

Six Sigma projects can be implemented using both approaches.

Solution Architecture

Some examples of architecture representations (just the main picture):

Architecture example based on C3.ai company.

Another example, from Microsoft:

Example of components of the Azure Machine Learning architecture.

Another example related to AWS:

AWS predictive maintenance example

 

As usual, the selection of the solution brand depends on the partnerships and the knowledge your organization has with one brand, platform, or company or another.

Tuning a Machine Learning Model

I continue taking some basic notes from the book “Enterprise Artificial Intelligence and Machine Learning for Managers”.

Tuning a machine learning model is an iterative process. Data scientists typically run
numerous experiments to train and evaluate models, trying out different features,
different loss functions, different AI/ML models, and adjusting model parameters
and hyper-parameters.

Feature engineering

Feature engineering broadly refers to mathematical transformations of raw data in order to feed appropriate signals into AI/ML models.

In the real world, data are derived from a variety of source systems and typically are not reconciled or aligned in time and space. Data scientists often put significant effort into defining data transformation pipelines and building out their feature vectors.

In addition, data scientists should implement requirements for feature normalization or scaling to ensure that no one feature overpowers the algorithm.
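As an illustration of this kind of scaling (my own sketch, not from the book), scikit-learn's StandardScaler can be used inside a pipeline so that every feature is rescaled before it reaches the algorithm; the feature names and values below are invented for the example.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Made-up feature vectors: [monthly_income_eur, age_years, num_purchases].
# The income column is on a much larger scale than the other two.
X = np.array([
    [3200.0, 34, 5],
    [1800.0, 52, 1],
    [5400.0, 41, 12],
    [2500.0, 29, 3],
])
y = np.array([1, 0, 1, 0])  # e.g., bought / did not buy

# StandardScaler removes the mean and scales each feature to unit variance,
# so no single feature overpowers the algorithm because of its units.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

print(model.named_steps["standardscaler"].mean_)   # per-feature means learned from the data
print(model.predict([[3000.0, 45, 4]]))            # prediction on a new, automatically scaled input
```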

Loss Functions

A loss function serves as the objective function that the AI/ML algorithm is seeking to optimize during training efforts. During model training, the AI/ML algorithm aims to minimize the loss function. Data scientists often consider different loss functions to
improve the model – e.g., make the model less sensitive to outliers, better handle noise, or reduce over-fitting.

A simple example of a loss function is mean squared error (MSE), which is often used to optimize regression models. MSE measures the average squared difference between predictions and actual output values.
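As a quick numeric illustration (my own, not from the book), MSE can be computed by hand or with scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([10.0, 12.0, 15.0, 11.0])
y_pred = np.array([11.0, 11.0, 16.0, 9.0])

# MSE = average of the squared differences between predictions and actual values
mse_manual = np.mean((y_true - y_pred) ** 2)     # ((-1)^2 + 1^2 + (-1)^2 + 2^2) / 4 = 1.75
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # both print 1.75
```

Note that a model that systematically over-predicts and one that systematically under-predicts can end up with the same MSE, which is exactly the weakness the figure below illustrates.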

These two linear regression models have the same MSE, but the model on
the left is under-predicting and the model on the right is over-predicting.

It is important to recognize the weaknesses of loss functions. Over-relying on loss functions as an indicator of prediction accuracy may lead to erroneous model set points.

Regularization

Regularization is a method to balance over-fitting and under-fitting a model during training. Both over-fitting and under-fitting are problems that ultimately cause poor predictions on new data.

  • Over-fitting occurs when a machine learning model is tuned to learn the noise in the data rather than the patterns or trends in the data. A supervised model that is over-fit will typically perform well on data the model was trained on, but perform poorly on data the model has not seen before.
  • Under-fitting occurs when the machine learning model does not capture variations in the data – where the variations in data are not caused by noise. Such a model is considered to have high bias and low variance. A supervised model that is under-fit will typically perform poorly both on data the model was trained on and on data the model has not seen before.

Regularization helps to balance variance and bias during model training.

Regularization is a technique to adjust how closely a model is trained to fit historical data. One way to apply regularization is by adding a parameter that penalizes the loss function when the tuned model is overfit.
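For example (a sketch of one common technique, not necessarily the one the book describes), ridge regression adds an L2 penalty, weighted by a parameter alpha, to the squared-error loss; a larger alpha constrains the weights more and pushes back against over-fitting. The data below is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
# Only the first feature truly matters; the other four are noise the model could overfit.
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

plain = LinearRegression().fit(X, y)
# alpha controls the strength of the L2 penalty added to the loss function.
regularized = Ridge(alpha=10.0).fit(X, y)

print("unregularized weights:", np.round(plain.coef_, 2))
print("ridge weights:        ", np.round(regularized.coef_, 2))
# The ridge weights on the noise features are pulled closer to zero.
```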

Hyper-parameters

Hyper-parameters are parameters that are specified before training a model – i.e., they are different from the model parameters, or weights, that an AI/ML model learns during training.

Finding the best hyper-parameters is an iterative and potentially time intensive
process called “hyper-parameter optimization.”

Examples:

  • Number of hidden layers and the learning rate of deep neural network algorithms.
  • Number of leaves and depth of trees in decision tree algorithms.
  • Number of clusters in clustering algorithms.

To address the challenge of hyper-parameter optimization, data scientists use specific optimization algorithms designed for this task (e.g., grid search, random search, and Bayesian optimization). These optimization approaches help narrow the search
space of all possible hyper-parameter combinations to find the best (or near best) result.
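Here is a minimal sketch of the first of those approaches, grid search, using scikit-learn; the model choice and the parameter grid values are arbitrary examples, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hyper-parameters are fixed before training; grid search tries each combination
# with cross-validation and keeps the best-scoring one.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)              # best hyper-parameter combination found
print(round(search.best_score_, 3))     # its mean cross-validated accuracy
```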

Enterprise Artificial Intelligence and Machine Learning for Managers

I decided to read this short book published by C3.ai that is aimed at managers. The book focuses on concepts and gives you the basic nomenclature to understand how these types of initiatives are implemented. Later, when you review the product list of the C3.ai company, you realize where their products fit in terms of the standard AI classification.

Below are some notes on the concepts I would like to review in the future.

The author is Nikhil Krishnan, PhD.

Machine Learning categories

Common categories of Machine Learning algorithms

Main types of supervised learning

Supervised learning algorithms learn by tuning a set of model parameters that operate on the model’s inputs, and that best fit the set of outputs.

Examples of classification and regression techniques
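As a tiny illustration of the supervised setting (my own example, not from the book): a classifier is trained on known inputs and labeled outputs, and then evaluated on data it has not seen.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Known inputs (flower measurements) and known outputs (species labels).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The model tunes its parameters to best fit the labeled training outputs.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print("accuracy on unseen data:", round(clf.score(X_test, y_test), 3))
```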

Unsupervised learning

Unsupervised learning techniques operate without known outputs or observations – that is, these techniques are not trying to predict any specific outcomes. Instead, unsupervised techniques attempt to uncover patterns within data sets.

Unsupervised techniques include clustering algorithms that group data in meaningful ways.

Clustering algorithms

Unsupervised machine learning models do not require labels to train on past data. Instead, they automatically
detect patterns in data to generate predictions.

Dimensionality reduction

Dimensionality reduction is a powerful approach to construct a low-dimensional representation of high-dimensional input data. The purpose of dimensionality reduction is to reduce noise so that a model can identify strong signals among complex inputs – i.e., to identify useful information.

High dimensionality poses two challenges. First, it is hard for a person to conceptualize high-dimensional space, meaning that interpreting a model is non-intuitive. Second, algorithms have a hard time learning patterns when there are many sources of input data relative to the amount of available training data.
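A minimal PCA sketch (my own example) of reducing a high-dimensional input to the few components that carry most of the signal:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 observations with 50 input dimensions, but only 3 real underlying signals plus noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

# Keep just enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)              # e.g. (200, 50) -> (200, 3)
print(np.round(pca.explained_variance_ratio_, 3))  # variance explained by each kept component
```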

Example of an unsupervised machine learning model for anomaly detection.

Reinforcement learning

Reinforcement learning (RL) is a category of machine learning that uses a trial-and-error approach. RL is a more goal-directed learning approach than either supervised or unsupervised machine learning.

Deep Learning

Deep learning is a subset of machine learning that involves the application of complex, multi-layered artificial neural networks to solve problems.

Deep learning takes advantage of yet another step change in compute capabilities. Deep learning models are typically compute-intensive to train and much harder to interpret than conventional approaches.

A deep learning neural network is a collection of many nodes. The nodes are organized into layers, and the outputs from neurons in one layer become the inputs for the nodes in the next layer.

Single nodes are combined to form input, output, and hidden layers of a deep learning neural network.
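To make the layered structure concrete, here is a minimal forward pass written in plain NumPy (an illustrative sketch with random weights, not a trained model): the outputs of one layer become the inputs of the next.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # A common activation function applied at each node.
    return np.maximum(0, x)

# Input layer: 4 features. Two hidden layers with 8 and 5 nodes. Output layer: 1 node.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 5)), np.zeros(5)
W3, b3 = rng.normal(size=(5, 1)), np.zeros(1)

x = rng.normal(size=(1, 4))        # one observation

h1 = relu(x @ W1 + b1)             # hidden layer 1: consumes the input layer's output
h2 = relu(h1 @ W2 + b2)            # hidden layer 2: consumes hidden layer 1's output
output = h2 @ W3 + b3              # output layer

print(output)
# Training would adjust W1..W3 and b1..b3 (the weights) via backpropagation.
```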

Year 4, Q1 personal readjustments and Artificial Intelligence

During my last learning slot I focused on Wardley maps: learning, generating content, and deepening my knowledge and practice of mapping. This has been the second quarter I focused on it, and I feel I have reached a working speed that lets me move on to another chapter.

This quarter I want to make some readjustments to how I work on my priorities and learning, focusing on habits and my schedule. Why? I have the feeling I'm working in a way that is not focused on the things that matter most, and I need to readjust some habits to improve on the real priorities of my life.

This doesn't mean that I will not add specific learning in a field of knowledge; I will, and I thought about it a lot during Christmas. The focus this time is going to be on some specific areas of Artificial Intelligence. Artificial Intelligence is a broad field, so I want to understand the topology of its different areas and review what I learned during 2018.

So the V2MOM for this quarter will be:

  • Vision: Readjust a set of habits and learn about Artificial Intelligence areas.
  • Values: have fun, change habits day by day.
  • Method:
    • For the readjustments, use the principles and suggestions I learned from the book Triggers. Add some meditation to it.
    • Artificial Intelligence: read about AI and gain perspective on the work being done, draw some maps that provide me with context, and understand how these types of projects are planned and executed.
  • Obstacles:
    • Time.
    • Aversion to do some activities that are not comfortable to me.
    • Ariel’s surgery.
  • Measures:
    • Readjustments of behavior:
      • Reschedule the daily agenda (once it's defined, check how closely it is followed). There are 60 working days this quarter, so do it for at least 45 days.
      • Follow the “Daily questions” challenge. There are 90 days this quarter, so complete it on at least 70 days.
      • Listen to the meditation audios and complete at least 70 days.
      • Track the measures in a notebook: reduce time on the computer.
    • Artificial Intelligence:
      • Read 1 book related to the area (1 per quarter).
      • Listen to 13 hours of Artificial Intelligence podcasts (1 per week).
      • Draw at least 3 maps related to a specific topic on Artificial Intelligence (1 per month).

Deadline = 31/March/2021

Results (April 2021)

  • Measures (Actual / target):


Quantitative trading on cryptocurrency market Q3

This is the second chapter of a learning process that started last September.

Third Quarter

The third step is defined for the next 3 months, where the main goal is to define a specific quantitative trading strategy and work on it with real money in the cryptocurrency market.

Following the V2MOM model:

  • Vision: Have a strategy running in the cryptocurrency market, not with a period of 2 – 3 hours but of some days (stop operating at 3m).
  • Values: have fun, learn a lot, build a team with Dani, do practices and more practices.
  • Method: learn the basics of trading, do backtesting with Quantopian on stocks or Forex (analyze the results in depth).
  • Obstacles: Time.
  • Measures:
    • Make short/long decisions based on the 1-hour timeframe.
    • Read at least 1 book of trading.
    • Perform backtesting with Quantopian and document the results and findings.
    • Improve and document the “mode operations” and “mode backtesting”.

Deadline = June 2018

Results (July 1st, 2018)

  • Time to be accountable, let’s go…
  • I have done more than 50 operations, with May and June ending with negative results. July had 34 operations and positive overall results.
  • I have learned to trade in a bearish market, and interestingly I have issues working in a bullish market (I short too soon).
  • I have not worked with Quantopian or TradingView on any backtesting; this was a big fault.
  • I discovered an interesting indicator: VPCI, the Volume Price Confirmation Indicator. It really helps to identify real moves of the market.
  • I have been able to cultivate patience during these months, but some days I made moves that did not make sense. I have to improve on this.
  • I finished reading one of the fundamental analysis books.
  • I have started to apply the knowledge I'm acquiring to the medium-term trading I do on stocks, though not too much yet.
  • I need to get back to the backtesting exercises and continue practicing.

Support vector machine (SVM)

The basis

  • Support vector machine (SVM) is a supervised learning method.
  • It can be used for regression or classification purposes.
  • SVM is useful for hypertext categorization, image classification, character recognition…
  • The basic visual idea is the creation of planes (lines in two dimensions) to separate classes of data points.
  • The position of these planes can be adjusted to maximize the margin separating the classes.
  • How do we determine which plane is best? Well, it's done using support vectors.

Support vectors

The support vectors are the data points that determine the planes. The orange and blue points generate lines, and in the middle you can find what is called the hyperplane.

The minimum distance between the lines created by the support vectors is called the margin.

The diagram above represents the simplest support vector drawing you can find; from here you can make it more complex.
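Here is a minimal sketch (my own toy data) of a linear SVM in scikit-learn, which exposes the support vectors it selected after fitting:

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable groups of points (toy data).
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel looks for the separating hyperplane with the maximum margin.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)   # the points that define the margin
print("prediction for (3, 3):", clf.predict([[3, 3]]))
```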

Machine learning, source of errors

Before we start

What is an error?

Expected prediction error = average of (Target – Prediction)² = Bias² + Variance + Noise

The main sources of error are:

  • Bias and Variability (variance).
  • Underfitting or overfitting.
  • Underclustering or overclustering.
  • Improper validation (after training). It could come from using the wrong validation set. It is important to completely separate the training and validation processes to minimize this error, and to document assumptions in detail.
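To make the bias² + variance + noise decomposition above more concrete, here is a small simulation sketch (my own, with a made-up target function): we repeatedly fit a deliberately simple model to noisy samples and estimate the bias and variance of its prediction at one point.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(x)

x_train = np.linspace(0, 3, 20)
x_test = 1.5                     # the point where we study the error
noise_std = 0.3

preds = []
for _ in range(500):
    # Draw a new noisy training sample each run.
    y_train = true_f(x_train) + rng.normal(scale=noise_std, size=x_train.size)
    # Deliberately simple model: fit a straight line (degree-1 polynomial).
    coeffs = np.polyfit(x_train, y_train, deg=1)
    preds.append(np.polyval(coeffs, x_test))

preds = np.array(preds)
bias_squared = (preds.mean() - true_f(x_test)) ** 2   # systematic error of the model
variance = preds.var()                                 # sensitivity to the training sample
print(f"bias^2 = {bias_squared:.4f}, variance = {variance:.4f}, noise var = {noise_std**2:.4f}")
```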

Underfitting

This phenomenon happens when we have low variance and high bias.

This typically happens when we have too few features and the final model is too simple.

How can I prevent underfitting?

  • Increase the number of features and hence the model complexity.
  • If you are using PCA, it applies a dimensionality reduction, so the step here would be to undo or relax that dimensionality reduction.
  • Perform cross-validation.

Overfitting

This phenomenon happens when we have high variance and low bias.

This typically happens when we have too many features and the final model is too complex.

How can I prevent overfitting?

  • Decrease the number of features and hence the complexity of the model.
  • Perform a dimensionality reduction (e.g., PCA).
  • Perform cross-validation.
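To illustrate both phenomena (my own sketch, not from the post's sources): fitting polynomials of increasing degree to noisy data, a very low degree under-fits and a very high degree over-fits, which shows up as a gap between training and validation error.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 3, 40)).reshape(-1, 1)
y = np.sin(2 * x).ravel() + rng.normal(scale=0.2, size=40)

x_train, y_train = x[::2], y[::2]      # half the points for training
x_val, y_val = x[1::2], y[1::2]        # the other half for validation

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    val_err = mean_squared_error(y_val, model.predict(x_val))
    # Degree 1 under-fits (both errors high); degree 15 over-fits
    # (tiny training error, much larger validation error).
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, validation MSE {val_err:.3f}")
```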

Cross validation

This is one of the typical methods to reduce the appearance of errors in a machine learning solution. It consists of testing the model in many different contexts.

You have to be careful when re-testing a model on the same training/test sets. The reason? This often leads to underfitting or overfitting errors.

Cross-validation tries to mitigate these behaviors.

The typical way to enable cross-validation is to divide the data set into different sections, so you use one for testing and the others for validation. For instance, you can take a stock data set from 2010 to 2017, use the data from 2012 as the test set, and use the other yearly divisions to validate your trading model.
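A minimal k-fold cross-validation sketch with scikit-learn (my own example; for time-ordered data like the yearly stock splits above, a time-series splitter such as TimeSeriesSplit would be more appropriate):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Split the data set into 5 sections; each section is used once for testing
# while the model is trained on the remaining 4.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv, scoring="r2")

print(np.round(scores, 3))               # one R^2 score per fold
print("mean R^2:", round(scores.mean(), 3))
```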

Neural networks

Neural networks handle errors through backpropagation: the prediction error is propagated backwards through the network and used to adjust the weights of each node, so the accumulated error is progressively minimized as more data is processed.

 

k-means clustering

The basis

  • K-means clustering is an unsupervised learning method.
  • The aim is to find the clusters and the centroids that can potentially identify them.
  • What is a cluster? A set of data points grouped in the same category.
  • What is a centroid? The center or average of a given cluster.
  • What is “k”? The number of centroids.

Typical questions you will face

  • The number k is not always known upfront.
  • First look for the number of centroids (k), then find their values (separate the two problems).
  • Is our data “clusterable”, or is it homogeneous?
  • How can I determine if the dataset can be clustered?
  • Can we measure its clustering tendency?

Visual example

A real situation: identification of geographical clusters

You can apply a k-means algorithm where distance is used to measure similarity or dissimilarity. The properties of the observations are mapped into a feature space so that you can decide the groups using k-means or hierarchical clustering.

Since k-means groups based solely on the Euclidean distance between objects, you will get back clusters of locations that are close to each other.

To find the optimal number of clusters you can try making an ‘elbow’ plot of the within-group sum of squared distances.

http://nbviewer.jupyter.org/github/nborwankar/LearnDataScience/blob/master/notebooks/D3.%20K-Means%20Clustering%20Analysis.ipynb
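Here is a minimal sketch of that elbow approach, using made-up latitude/longitude points and scikit-learn's KMeans; the within-group sum of squared distances is exposed as the model's inertia_ attribute.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up geographical points scattered around three city centers (lat, lon).
centers = np.array([[40.4, -3.7], [41.4, 2.2], [37.4, -6.0]])
points = np.vstack([c + rng.normal(scale=0.05, size=(50, 2)) for c in centers])

# Elbow plot data: within-group sum of squared distances (inertia) for each k.
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    print(f"k={k}: inertia={km.inertia_:.3f}")
# The inertia drops sharply up to k=3 and flattens afterwards: that bend is the 'elbow'.
```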
