Motivation¶
Both Python an R are very powerful data analysis tools, and each has its own strength. The best approach would be to combine the power of both tools and use whichever one when it is appropriate. In this article we will use an example of Cointegrating test to demonstrate how to seamlessly combine Python and R in the IPython Notebook environment. Cointegration test is an important method to determine if two stocks are good for pair trading
Related Packages¶
Below we import the necessary Python packages for data processing
%matplotlib inline
from IPython.core.pylabtools import figsize
import pandas as pd
import numpy as np
import datetime
The key package to enble the communication below Python and R is rpy2. The subpackage rpy2.ipython will allow user to easily run R scripts in IPython Notebook and send data back and forth between Python and R easily
%load_ext rpy2.ipython
Cointegration test using both Python and R¶
In this example we download data from Yahoo Finance using Python
import pandas.io.data as web
class DataObj(object):
    def __init__(self, symbols = ['SPY', 'VGK', 'FXI', 'TLT'], 
                 start_time = datetime.datetime(2010, 1, 1),
                 end_time = datetime.datetime.now().date() - \
                 datetime.timedelta(days = 1)
                ):
        self.data = {}; self.returns = pd.DataFrame()
        self.adjClose = pd.DataFrame()
        for s in symbols:
            try:
                self.data[s] = web.DataReader(s, 'yahoo', 
                                              start_time, end_time)
                self.adjClose[s] = self.data[s]['Adj Close']
                self.returns[s] = self.data[s].diff()['Adj Close'].dropna() \
                    / self.data[s]['Adj Close'][:-1].values
            except:
                pass
We choose GLD and IAU as the example in this article because they are two different ETFs tracking gold price. They are closely related to each and is expected to be cointegrated
figsize(10, 8)
time_span_days  = 365*4
today = datetime.date.today()
start_date  = today + datetime.timedelta(-time_span_days)
StockData = DataObj(symbols = ['GLD', 'IAU'], start_time = start_date, 
                  end_time = today)
print "Got %s Days of data" % str(len(StockData.adjClose))
StockData.adjClose.plot()
Using the command below, we can send the obtained Pandas DataFrame to R and do the Johansen test in R
X = StockData.adjClose.values
%Rpush X
%%R
library('urca')
coRes = ca.jo(data.frame(X),type = "trace", K = 2, 
              ecdet = "none", spec = "longrun")
summary(coRes)
Moreover, we want to let R do the Johansen test and get the results back to Python, and then we can continue our work with Python. So instead of using the %%R to run an R Script, we define the following Python function to call the Johansen test function in R, and send the results back to Python in the format of Numpy arrays
def isCointegrated(X, verbose = False):
    %R -i X coRes = ca.jo(data.frame(X),type = "trace", K = 2, ecdet = "none", spec = "longrun")
    testStat = %R slot(coRes, "teststat")
    cVal = %R slot(coRes, "cval")
    if verbose:
        print "Test Statics: ", testStat
        print "Critical Values: "
        print cVal
    testStat = np.array(testStat)
    cVal = np.matrix(cVal)
    coIntFlag = testStat[-1] > cVal[-1, 1]
    if verbose:
        print "Cointegrate?: ", coIntFlag
    eigenVec = %R slot(coRes, "V")
    if verbose:
        print "Eigen vectors: "
        print eigenVec
    eigenVec = np.array(eigenVec)
    return coIntFlag, eigenVec[:, 0]
coIntFlag, eigenVec = isCointegrated(StockData.adjClose.values, 
                                     verbose = True)
Johansen test estimates the rank (r) of given matrix of time series with confidence level. In our example we have two time series, therefore Johansen tests null hypothesis of r=0 < (no cointegration at all), r<1 (till n-1, where n=2 in our example). In this case, when r = 0, the test Statistics is larger than the 5% (even 1%) of the critical value. So the pair is cointegrated. Once a cointegration is established, eigenvector (normalized first column) would be used as weight for a portfolio. Below we use the obtained weight to check if the residual of the pair is really stationary
resid = pd.Series(index = StockData.adjClose.index); resid = 0
for ii, s in enumerate(StockData.adjClose.columns):
    resid = resid + eigenVec[ii] * StockData.adjClose[s]
resid.plot()
Clearly, using the obtained weight, the residual of the pair formed a quite stationary time series
 
