Quantopian concepts

I have invested some hours learning about the Quantopian environment and the basic concepts around the platform. The environment is very powerful, so I wanted to gain some basic clarity about the fundamentals.

Quantopian platform

It consists of several linked components, where the main ones are:

  • The Quantopian Research platform is an IPython notebook environment used for research and data analysis during algorithm creation. It is also used for analyzing the past performance of algorithms.
  • The IDE is used for writing algorithms and kicking off backtests using historical data.
  • Paper trading, so you can run simulations using live data.
  • Alphalens is a Python package for performance analysis of alpha factors, which can be used to create cross-sectional equity algorithms.
  • Alpha factors express a predictive relationship between some given set of information and future returns.
  • Help: https://www.quantopian.com/help#optimize-title

The workflow

To get the most out of Quantopian, it is important to understand how to work through the different steps to achieve your goals.

The basics are the same as the ones explained in this post, and they are represented by this diagram:

I did not find a diagram anywhere, so I drew my own.

Workflow, step by step

1. Universe Selection: define a subset of tradeable instruments (stocks/futures). The universe should be broad but have some degree of self-similarity to enable extraction of relative value. It should also exclude hard-to-trade or prohibited instruments. (Example: select companies with revenue above $10B and a dividend yield above 3%.)

This is done through the object named “Pipeline”. The idea is not to limit yourself to a set of specific stocks, but to define a pipeline of stocks that allows you to quickly and efficiently consider many thousands of companies.

Pipeline allows you to address all companies, then filter them.
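To make this concrete, here is a minimal sketch of a universe-selection pipeline. I use QTradableStocksUS plus a dollar-volume screen as a simplified stand-in for the revenue/dividend example above; fundamentals-based filters would need the exact field names from the data reference.

```python
# Minimal sketch of a universe-selection pipeline (Quantopian IDE).
# The screen (QTradableStocksUS + top dollar volume) is a simplified
# stand-in for the "revenue > $10B and dividend > 3%" example above.
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline.filters import QTradableStocksUS


def make_pipeline():
    base_universe = QTradableStocksUS()
    # Keep the 500 most liquid names over the last 30 days.
    dollar_volume = AverageDollarVolume(window_length=30)
    liquid = dollar_volume.top(500, mask=base_universe)
    return Pipeline(
        columns={'dollar_volume': dollar_volume},
        screen=liquid,
    )


def initialize(context):
    attach_pipeline(make_pipeline(), 'universe')


def before_trading_start(context, data):
    # A DataFrame indexed by the ~500 securities that passed the screen.
    context.universe = pipeline_output('universe')
```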

2. Single Alpha Factor Modeling:

Initially, these four words together sounded like Greek to me, so I will try to explain them as I understood them: it is a model composed of a single factor that tries to find a result with statistical significance (an alpha factor).

Not enough? OK, I will try to explain some concepts.

First, I need to review some basics of statistics.

What is an alpha factor?

In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred by chance. The level of significance is commonly represented by the Greek letter α (alpha). Significance levels of 0.05, 0.01 and 0.001 are common.

What is a factor model?

A model in Quantopian is composed of a set of factors; usually it should include:

  • a factor for the market,
  • one or two factors for value/pricing,
  • and maybe a factor for momentum.

Now let’s come back to Quantopian’s Single Alpha Factor Modeling.

It basically means defining and evaluating individual expressions which rank the cross-section of equities in your universe. By applying this relationship to multiple stocks, we can hope to generate an alpha signal and trade off of it.

This can be done in two ways:

  • Manually: hand-crafted alphas. For the moment I will start with this method (see the sketch after this list).
  • Deep learning: alphas are learned directly instead of being defined by hand (long short-term memory (LSTM) networks, 1D convolutional nets). I will leave this method for later.
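As a first hand-crafted alpha, here is a sketch of a simple short-term mean-reversion factor; the five-day window and the sign flip are my own arbitrary choices, purely for illustration.

```python
# Sketch of a hand-crafted alpha factor: short-term mean reversion.
# Idea (an assumption, not a recommendation): recent losers tend to bounce back.
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.filters import QTradableStocksUS


def make_alpha_pipeline():
    universe = QTradableStocksUS()
    recent_returns = Returns(window_length=5, mask=universe)
    # Flip the sign so the biggest recent losers get the highest alpha score.
    alpha = recent_returns.zscore() * -1
    return Pipeline(columns={'alpha': alpha}, screen=universe)
```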


  • Developing a good alpha signal is challenging (for instance, detecting an earnings surprise before the formal announcement based on sentiment data).
  • It is important to have a scientific mindset when doing this exercise.

By analyzing your factors in an IPython notebook, you can spend less time writing and running full backtests. It also enables you to annotate your assumptions and analyze potential bias.

This is indeed the main function of the Alphalens Python package: to surface the most relevant statistics and plots about a single alpha factor. This information can tell you whether the alpha factor you found is predictive. These statistics cover the following (a usage sketch follows the list):

  • Returns Analysis.
  • Information Coefficient Analysis.
  • Turnover Analysis.
  • Sector Analysis.
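In a research notebook, that analysis boils down to a couple of Alphalens calls. A rough sketch, assuming the alpha pipeline from the previous step and a placeholder date range:

```python
# Research-notebook sketch: evaluate the single alpha factor with Alphalens.
import alphalens as al
from quantopian.research import get_pricing, run_pipeline

# Run the factor pipeline over a placeholder period.
raw = run_pipeline(make_alpha_pipeline(), '2017-01-01', '2017-12-31')

# Pricing must extend past the end date so forward returns can be computed.
prices = get_pricing(
    raw.index.levels[1],          # the assets that appeared in the pipeline
    start_date='2017-01-01',
    end_date='2018-01-31',
    fields='close_price',
)

# Align the factor with 1-, 5- and 10-day forward returns, in 5 quantiles.
factor_data = al.utils.get_clean_factor_and_forward_returns(
    raw['alpha'],
    prices,
    quantiles=5,
    periods=(1, 5, 10),
)

# Returns, information coefficient and turnover analysis in one call
# (sector analysis needs a sector groupby as well).
al.tears.create_full_tear_sheet(factor_data)
```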

3. Alpha Combination: you basically combine many single alphas into a final alpha which has stronger predictive power than the best single alpha. Two examples of how to do it:

  • A classifier (e.g. SVM, random forest).
  • Deep learning: alphas are learned directly, instead of defined by hand (long short-term memory (LSTM) networks, 1D convolutional nets).

For simplicity I have started with just one alpha factor, so for now I am skipping this step.

4. Portfolio Construction: implement a process which takes your final combined alpha and your risk model and produces a target portfolio that minimizes risk under your model (see the sketch after step 5). The natural steps to perform it are:

  • Define the objective: what you want to maximize.
  • Define the constraints: such as leverage, turnover, position size…
  • Define the risk model: define and calculate the set of risk factors you want to use to constrain your portfolio.

5. Execution: implement a trading process to transition the current portfolio (if any) to the target portfolio.
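Steps 4 and 5 map quite directly onto Quantopian’s Optimize API together with order_optimal_portfolio. A hedged sketch, where the objective, the constraint values and the alpha column name are just example assumptions:

```python
# Sketch of portfolio construction + execution with the Optimize API.
# Objective: maximize the combined alpha; constraints keep the book
# dollar-neutral, capped in gross leverage and in per-position size.
import quantopian.optimize as opt
from quantopian.algorithm import order_optimal_portfolio


def rebalance(context, data):
    # Assumes the alpha pipeline from step 2 was attached in initialize()
    # and this function is wired up with schedule_function().
    alpha = context.universe['alpha']

    objective = opt.MaximizeAlpha(alpha)
    constraints = [
        opt.MaxGrossExposure(1.0),        # gross leverage <= 1x
        opt.DollarNeutral(),              # longs roughly equal to shorts
        opt.PositionConcentration.with_equal_bounds(min=-0.01, max=0.01),
    ]

    # Computes the target portfolio and places the orders to get there.
    order_optimal_portfolio(objective, constraints)
```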

TradingView pros and cons

I have been testing TradingView for a month, writing different scripts with the Pine editor, performing some backtesting, annotating results, and writing down assumptions, parameters…

What is really nice

  • Data from whatever market you can think of.
  • You can jump from one instrument to another and visually navigate at all levels with very good response times. The way to draw and play with standard shapes is quite impressive.
  • The Pine editor is really nice; the learning curve is very short, and with a few hours of learning you can build thousands of things.
  • The community of people sharing ideas and scripts is very useful.
  • Time intervals adjusted to whatever you want. For instance, you can define intervals such as 7 minutes, 33 minutes and so on.

What I miss

  • The Pine editor should let you package functions and play with a function using different parameters in an automated way, for instance to scan for the best parameter values.
  • The Pine editor should let you plot outside of the main screen. The context is always the instrument you have on the screen; I would like the possibility to plot outside of that context, for instance a simple script that scans for the best parameter values for my strategy.
  • Strategy tester: I can only read the list of trades, I cannot work on that data (for instance export it to Excel), so analyzing the results is difficult and very tedious.
  • I cannot use other instruments as input for my strategies. For instance, I would like to use an oil-industry instrument as input to define a signal for trading a chemical company. In the Pine editor I can only work with data from the symbol currently on the chart.


The basics

An Ethereum platform enabling the trading of sports players’ rights.

There are different solutions explained in the white paper; the most interesting one to me is the solution for athletes:

GLOBATALENT will allow young players to sell part of their future incomes without having to have an everlasting mortgage on their life.

The other brilliant alternative is the engagement offered to the fans:

Fans will be able to buy future benefits of the club that they support and at the same time are able to make investments and receive profits.

This part of the business does not exist at the moment, so there are no numbers about the volume. The potential is huge: sports betting is very popular right now. Imagine betting not on a specific game but on young or established players who offer part of their rights to their supporters, so fans can go long and short a percentage of those rights through the players’ own tokens (can you imagine holding a GlobaPlayerLebronJames token?).

Another revenue stream I see can come from people who have been playing sports video games for years: there are people who play these virtual games trading players over a season, running worldwide competitions, etc. Well, in 2019 they will be able to do it live, with real money.

Launch calendar

  • Private Pre-Sale: before the Public Sale.
  • Primary Crowd Sale: from 16th April 2018 to 6th May 2018.
  • Partner on-boarding: April – June 2018.
  • Platform release: October – December 2018.

Some numbers

The investment target is €42M.

The table below shows the sports market revenue.

The total spending on transfer fees per year is a key element for Globatalent. Imagine they manage to handle 2% of the $5B transfer market, i.e. about $100M; with a 3% fee, the revenue would be around $3M.


  • GLOBATALENT Tokens (GBT), being security tokens, will be limited to users who are accredited investors.
  • GlobaPlayer Tokens: each player will have their own custom token.



What is overfitting?

When preparing a machine learning solution, you basically work with data sets that contain:

  • Data: relevant and/or important data.
  • Noise: irrelevant and/or unimportant data.

With this data you want to identify a trigger: a signal that responds to the target pattern you want your code to identify.

So you start identifying a pattern and you work to improve it.

Suddenly you improve your pattern identification so much that, at some point, you are no longer using just the data: your pattern is also using the noise side of the data to trigger the signal.

This phenomenon is not desirable, and it is what is called overfitting.

In the picture on the left:

  • The black line represents a healthy pattern.
  • The green line represents an overfitted pattern.
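A quick way to see this in code is to fit the same noisy data with a simple model and an overly flexible one; in this scikit-learn sketch the degree-15 polynomial plays the role of the green line (the data, degrees and noise level are all made up for illustration):

```python
# Sketch: a simple model (degree 1) vs an overfitted one (degree 15)
# on the same noisy data. The overfitted model scores better on the
# training points but typically worse on unseen test points.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=60)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          'train R2:', round(model.score(X_train, y_train), 2),
          'test R2:', round(model.score(X_test, y_test), 2))
```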

Understanding machine learning

I have watched this video from @TessFerrandez: Machine Learning for Developers.

The video explains the process of building a machine learning solution. She explains it in plain English and with very nice examples that make the concepts easy to remember.

The video helped me to link a lot of technical ideas explained in the courses into a natural flow. Now it all makes sense to me.

When do I need a machine learning solution?

Imagine that you have this catalogue of pictures:

and that you want to identify whether a picture shows a muffin or a chihuahua.

The traditional way to do it is with “if / else” statements, like in the toy sketch after the list below. The results are not going to be good. Why?

  • The problem is more complex than the basic questions you are asking, and it requires thousands of combinations of conditional statements.
  • Finding the right sequence of conditional statements can take years.
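A deliberately naive version of that rule-based approach could look like this toy sketch; the features and thresholds are completely invented, which is exactly the problem:

```python
# Toy illustration of the rule-based approach (invented features and thresholds).
# Every new tricky photo forces yet another hand-written rule.
def classify(image_features):
    if image_features['has_ears'] and image_features['has_eyes']:
        return 'chihuahua'
    if image_features['is_round'] and image_features['brown_ratio'] > 0.6:
        return 'muffin'
    # ...and so on, for thousands of edge cases (blueberry "eyes", folded ears...)
    return 'unknown'
```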

This is where machine learning techniques can help you. At the end of the day, it is a different approach to finding a solution to a complex problem.

What are the steps to perform a machine learning solution?

The basic steps to build a machine learning solution are:

1.- Ask a sharp question:

At the end of the day, depending on the question you ask, you will use a different machine learning technique.

What type of machine learning technique can I use? Well, there are many of them, but these are the basic ones (a small sketch of the first two follows the list):

  • Supervised learning: a model learns from a set of labeled examples. For instance, you want to identify when there is a cat in a picture, so you train on a set of pictures where you know there are indeed cats.
  • Unsupervised learning: think about a population data set where you use a clustering algorithm to classify people into five different groups, without saying in advance what kind of groups they are. For instance, when looking for movie recommendations, a pattern by age may suddenly emerge, a cluster we did not initially know was relevant.
  • Reinforcement learning: it uses feedback to make decisions. For instance, a system that measures the temperature, compares it with the target temperature, and then raises or lowers it. This reminds me of the servo systems and fuzzy logic used at the electronics level when I studied electronic engineering.
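Here is a tiny contrast between the first two flavours using scikit-learn; the data, the labeling rule and the five clusters are arbitrary toy choices:

```python
# Tiny contrast between supervised and unsupervised learning (toy data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))

# Supervised: we provide labels (here, a made-up rule) and learn to predict them.
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
print('predicted label:', clf.predict([[0.5, 0.5]]))

# Unsupervised: no labels at all, the algorithm proposes its own five groups.
clusters = KMeans(n_clusters=5, random_state=0).fit_predict(X)
print('cluster sizes:', np.bincount(clusters))
```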

2.- Collect data

Look for data banks; there are plenty on the internet. Of course, if you want precise trading data from a good number of markets with thousands of parameters, you will have to pay for it.

3.- Explore data: relevant, important and simple.

  • Relevant: determine the features, keep the relevant ones and discard the irrelevant ones.
  • Important: define which data is important.
  • Simple: keep it simple (for instance, avoid raw GPS coordinates and replace them with the distance to a lake).

4.- Clean data:

  • Identify duplicate observations,
  • complete or discard missing data,
  • identify outliers,
  • identify and remove structural errors.

This step is a tedious process, but it also helps to understand the data.
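A hedged sketch of what those cleaning steps often look like with pandas; the file and column names are placeholders:

```python
# Sketch of typical cleaning steps with pandas (placeholder column names).
import pandas as pd

df = pd.read_csv('observations.csv')

# 1. Duplicate observations.
df = df.drop_duplicates()

# 2. Missing data: complete where reasonable, discard the rest.
df['price'] = df['price'].fillna(df['price'].median())
df = df.dropna(subset=['date', 'ticker'])

# 3. Outliers: drop values far outside the bulk of the distribution.
z = (df['price'] - df['price'].mean()) / df['price'].std()
df = df[z.abs() < 4]

# 4. Structural errors, e.g. inconsistent category labels.
df['sector'] = df['sector'].str.strip().str.lower()
```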

5.- Transform features

Do things like turning the GPS coordinates into the distance to a lake.
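For instance, the GPS-to-distance idea could be a small helper like this one, using the haversine formula; the lake coordinates are made up:

```python
# Sketch: turn raw GPS coordinates into a single, simpler feature:
# the distance (in km) to a reference point such as a lake (coordinates made up).
from math import asin, cos, radians, sin, sqrt

LAKE_LAT, LAKE_LON = 46.45, 6.60  # hypothetical lake


def distance_to_lake_km(lat, lon):
    lat1, lon1, lat2, lon2 = map(radians, (lat, lon, LAKE_LAT, LAKE_LON))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = Earth's radius


print(round(distance_to_lake_km(46.52, 6.63), 1))  # a point a few km from the lake
```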

6.- Select algorithms

The base algorithms are:

  • Linear regression,
  • decision trees,
  • naive Bayes,
  • logistic regression,
  • neural nets: basically a combination of different layers of data and algorithms.

In many cases the more complex algorithms are built out of these base algorithms: combining them gives neural nets, and making things more complex still gives full neural architectures.

As reflected in the table above, the choice of algorithm depends on the question we want to answer.
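As a rough, personal mapping (not the exact table from the video), the base algorithms correspond to scikit-learn estimators like these:

```python
# Rough mapping of question types to base scikit-learn estimators
# (my own summary, not the table from the video).
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

candidates = {
    'how much / how many (regression)': LinearRegression(),
    'which category, with interpretable rules': DecisionTreeClassifier(max_depth=5),
    'which category, from simple probabilistic features': GaussianNB(),
    'yes / no probability': LogisticRegression(),
    'complex patterns (small neural net)': MLPClassifier(hidden_layer_sizes=(32, 16)),
}
```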

7.- Train the model

Apply the algorithms to the cleaned data sets and fine-tune the algorithm.

8.- Score the model

Test the model and evaluate how good/bad it is.

Typical metrics are (sketched in code after the list):

  • Accuracy: out of all the predictions, how many were correct?
  • Precision: out of the items predicted as positive, how many really were positive?
  • Recall: out of the items that really were positive, how many did the model find?
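Steps 7 and 8 together, as a small sketch; the iris dataset and the decision tree are just stand-ins:

```python
# Sketch of training a model and scoring it (iris is a stand-in dataset).
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 7. Train the model (and fine-tune hyper-parameters such as max_depth).
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# 8. Score the model on data it has never seen.
y_pred = model.predict(X_test)
print('accuracy :', accuracy_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred, average='macro'))
print('recall   :', recall_score(y_test, y_pred, average='macro'))
```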

9.- Use the answer

You did all this with a purpose, so if the solution works, use it. 🙂

A couple of tools are mentioned in the video:

  • Jupyter Notebook (python)
  • Azure Machine Learning Studio: the video includes a demo walking on the tool.

Some other notes:

  • Take notes on the assumptions and decisions you make at every step, as you will have to review them when you want to improve the algorithm.
  • Hyper-parameters: the different algorithms have settings that you can define (for instance: how deep will the decision tree be?).
  • Bias / intercept: it refers to an error that is not captured by the rest of the model.

Machine Learning, sources of information

My problem

I want to learn machine learning concepts and understand how to apply them to real cases.

There are many sources of information, some of good or poor quality, others very complex.

The questions are simple:

  • When is it useful to use a machine learning solution?
  • What is a neural net?
  • What are the steps to build a machine learning solution?
  • What types of algorithms can you use for specific problems?

The solution I found

  1. I’m doing some Skillsoft courses available on the company platform. These cover the basics, and now I’m jumping into specific topics. They are nice as they include examples in Python with scikit-learn.
  2. Review these notes from @TessFerrandez: Notes from Coursera Deep Learning courses: https://www.slideshare.net/TessFerrandez/notes-from-coursera-deep-learning-courses-by-andrew-ng
  3. Watch the video from @TessFerrandez: Machine Learning for Developers. The video explains the process of building a machine learning solution in plain English, with very nice examples that are easy to remember. It helped me to link a lot of technical ideas explained in the courses into a natural flow.



What is Modum.io?

modum.io offers a passive monitoring solution, ensuring GDP compliance and auditability by using blockchain and IoT technology.

Modum’s tested solution offers significant cost savings over the active-cooling methods currently used. An additional benefit, enabled by the use of cutting-edge technology, is that valuable data can be created to drive continuous improvement in supply-chain logistics.

The company combines blockchain (for contracts) and Internet of Things technologies (programmable sensors) to achieve these business goals.

What was the reason to create Modum.io?

The regulations outlined by the European Commission for the Good Distribution Practice of medicinal products for human use (GDP 2013/C 343/01) were published on November 5, 2013.

Based in Zürich and created in 2016, very close to the Novartis and Hoffmann-La Roche headquarters.


You cannot do all this alone; you need to integrate with the existing providers covering these processes.

SAP: Agreement signed with SAP for a Co-Innovation Lab (https://modum.io/ceo-update-february-2018/ ).


I miss a section where they list the current customers they are working with.