Quantitative trading strategy: Pandas and matplotlib

Starting from the basics is important to me: it helps me understand how to handle basic data and get real contact with both the data and the code.

I found this tutorial very useful for these purposes. Using Pandas to read data from Yahoo, Google… and matplotlib to build a simple chart is key to taking the first steps. The best part of the tutorial is the comments on each step.

Problems I have found

Yahoo has closed the API that enabled Pandas to retrieve data. My colleague has found a workaround to continue using the API. The solution consists of:

  1. Install the fix_yahoo_finance library with pip.
  2. Add these 2 lines to the code:

import fix_yahoo_finance as yf

yf.pdr_override() # <== that’s all it takes 🙂

This video is a must-see for me, as Jev Kuznetsov explains it from scratch.

The first code I ran was this one:

from pandas_datareader import data
import pandas as pd
import matplotlib.pyplot as plt

# Define the instruments to download: AT&T (T), Verizon (VZ) and SPY, an ETF that tracks the S&P 500 index.
tickers = ['T', 'VZ', 'SPY']

# Define which online source one should use
data_source = 'google'

# We would like all available data from 2015-01-01 until 2017-10-10.
start_date = '2015-01-01'
end_date = '2017-10-10'

# Use pandas_datareader.data.DataReader to load the desired data. As simple as that.
panel_data = data.DataReader(tickers, data_source, start_date, end_date)

# Getting just the closing prices (not adjusted). This will return a Pandas DataFrame
# indexed by date, with one column per ticker.
close = panel_data['Close']

# Getting all weekdays between 2015-01-01 and 2017-10-10
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')

# How do we align the existing prices in close with our new set of dates?
# All we need to do is reindex close using all_weekdays as the new index
close = close.reindex(all_weekdays)
#print(close.head(-100))
print(close.describe())
# Fill the gaps left by reindexing (e.g. holidays) with the last known closing price
close = close.ffill()
# Get the VZ and SPY time series. Each one is a Pandas Series object indexed by date.
vz = close['VZ']
spy = close['SPY']
# Calculate the 20 and 100 days moving averages of the closing prices
short_rolling_vz = vz.rolling(window=20).mean()
long_rolling_vz = vz.rolling(window=100).mean()

# Plot everything by leveraging the very powerful matplotlib package
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(vz.index, vz, label='VZ')
#ax.plot(spy.index, spy, label='SPY')
ax.plot(short_rolling_vz.index, short_rolling_vz, label='20 days rolling')
ax.plot(long_rolling_vz.index, long_rolling_vz, label='100 days rolling')
ax.set_xlabel('Date')
ax.set_ylabel('Closing price ($)')
ax.legend()
plt.show()
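
Since the online data sources may no longer answer, the reindex, forward-fill and rolling-mean steps can also be tried offline. The following is only a minimal sketch: the prices are randomly generated (not market data), and the `DEMO` name, the two removed days and the 5-day window are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up closing prices on 60 business days (random walk, not real market data).
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-02", periods=60, freq="B")
prices = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(),
                   index=dates, name="DEMO")

# Remove two days to mimic exchange holidays missing from the download.
prices = prices.drop(dates[[10, 25]])

# Reindexing onto every weekday brings the two missing days back as NaN.
all_weekdays = pd.date_range(start=dates[0], end=dates[-1], freq="B")
aligned = prices.reindex(all_weekdays)
n_missing = int(aligned.isna().sum())

# Forward-filling carries the last known close into each gap.
filled = aligned.ffill()

# A rolling mean needs a full window, so the first (window - 1) values are NaN.
short_window = 5
short_rolling = filled.rolling(window=short_window).mean()

print("missing days after reindex:", n_missing)
print("NaN values at the start of the rolling mean:", int(short_rolling.isna().sum()))
```

The same pattern applies to the downloaded `close` DataFrame: reindex first, then fill, then compute the rolling averages, so that the windows always cover the same number of weekdays.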

Quantitative trading, Ernie Chan

This book presents the basic concepts and a step-by-step approach for those who want to get started in quantitative trading. The focus is on statistical arbitrage trading, which deals with the simplest financial instruments: stocks, futures, and sometimes currencies.

The chapters cover the steps a trader should take:

  1. The Whats, Whos, and Whys of Quantitative Trading,
  2. Fishing for Ideas,
  3. Backtesting,
  4. Setting Up Your Business,
  5. Execution Systems,
  6. Money and Risk Management,
  7. Special Topics in Quantitative Trading, (reading it now)
  8. Conclusions.

You can follow the author’s blog, which contains further valuable information and some nice examples.

A short list of common pitfalls related to how the back-test program is written:

  1. Survivorship bias: the data does not contain companies that have gone bankrupt.
  2. Look-ahead bias: this happens when you use information that only became available after the instant the trade was made. For instance, “Buy when the stock is within 1 percent of the day’s low”: the day’s low is not known until the market has closed.
  3. Data-snooping bias: using too many parameters leads to an over-optimized model. A rule of thumb: no more than 5 parameters.
  4. Sample size: you need enough data to back-test, but how much? As a rule of thumb, assume that the number of data points needed for optimizing your parameters is equal to 252 (roughly the number of trading days in a year) times the number of free parameters your model has. For instance, if you define a daily trading model with three parameters, you should have at least three years’ worth of back-test data with daily prices.
  5. Out-of-Sample Testing: divide your historical data into 2 parts. Use the first part for training, and keep the second part to test the resulting model.
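
Some of the points above can be sketched in a few lines of Pandas. This is only an illustration under my own assumptions, not code from the book: the helper names `min_backtest_points` and `split_in_out_of_sample`, the 50/50 split, the made-up prices and the moving-average signal are all hypothetical. The one-bar shift is one common way to avoid look-ahead bias when a signal is computed from the closing price.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252


def min_backtest_points(n_free_parameters: int) -> int:
    """Rule of thumb from the list: 252 data points per free parameter."""
    return TRADING_DAYS_PER_YEAR * n_free_parameters


def split_in_out_of_sample(data: pd.Series, train_fraction: float = 0.5):
    """Chronological split: earlier part for training, later part for testing.

    The 50/50 default is an assumption; the list only says "2 parts".
    """
    cut = int(len(data) * train_fraction)
    return data.iloc[:cut], data.iloc[cut:]


# Made-up daily prices (random walk), just long enough for a 3-parameter model.
rng = np.random.default_rng(1)
n_points = min_backtest_points(3)
dates = pd.date_range("2014-01-01", periods=n_points, freq="B")
prices = pd.Series(100 + rng.normal(0, 0.5, n_points).cumsum(), index=dates)

train, test = split_in_out_of_sample(prices)

# Hypothetical signal: 1 when the close is above its 20-day average, else 0.
signal = (prices > prices.rolling(window=20).mean()).astype(int)

# Look-ahead guard: a signal based on today's close can only be traded on the
# next bar, so shift it by one day before applying it to the daily returns.
position = signal.shift(1).fillna(0)
strategy_returns = position * prices.pct_change()

print("data points needed for 3 parameters:", n_points)
print("training days:", len(train), "test days:", len(test))
```

Any parameter tuning would then be done on `train` only, with `test` kept aside until the model is fixed.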