---
title: "Getting Into Day Trading: Analyzing The Moving Average"
date: 2017-11-04T14:11:54-04:00
draft: false
tags: ["day trading", "data analysis", "python"]
---

I have a bit of data for the WLTW symbol, and it would be helpful to see that data plotted in all of its glory. Let's take a look at the closing price (y) plotted against the date (x).

![Image](/img/post/WLTW_CLOSING_COSTS.png)

This is a good start, but how good are the SMAs at tracking the closing price? Let's first write a little Python that grabs the SMA for a given window, along with the position where that window starts, which I'll use for the x-axis.

```python
import numpy as np

def moving_avs(col, window):
    # Map each window's starting position to the mean of that window.
    averages = {}
    for i in range(0, len(col), window):
        # The last window may hold fewer than `window` values.
        averages[i] = np.mean(col[i:i+window])
    return averages
```

Using NumPy for analysis and a pandas Series to hold my values, I can use this function to create a dictionary tracking exactly which day each SMA calculation starts on, as well as the SMA for that range. `window` doubles as the step size in the `range` call, so the windows don't overlap, and `np.mean` does the work of calculating the simple moving average for each slice of the data.

Now I can plug my values into the function to generate some simple moving averages.

```python
import pandas as pd

data = pd.read_csv("prices.csv")
# Reset the index so row labels match the positional keys returned by moving_avs.
wltw = data[data["symbol"] == "WLTW"].reset_index(drop=True)
# Pass only the closing prices, not the whole DataFrame.
threedaysma = moving_avs(wltw["close"], 3)
fivedaysma = moving_avs(wltw["close"], 5)
```

Back to the original question: how well do the SMAs track the closing price? Let's find out.

```python
import matplotlib.pyplot as plt

plt.plot(wltw["close"])
plt.plot(list(threedaysma.keys()), list(threedaysma.values()))
plt.show()
```

Here is the resulting graph for the three-day SMA:

![Image](/img/post/threedaysma.png)

That looks very promising for this small time range. The three-day SMA follows the closing price very closely. Now I don't know about you, but I'd like to see just how closely the three-day SMA follows the closing price.
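Before measuring that, it helps to see what the windowing function returns on a tiny input. This is only a minimal sketch on made-up numbers: the `window_means` name and the `toy_prices` list are hypothetical stand-ins of mine, written to behave like `moving_avs` above. The keys are the positions where each window starts, and the final window can be shorter than the others.

```python
import numpy as np

def window_means(values, window):
    """Average each non-overlapping block of `window` values, keyed by block start."""
    return {
        i: float(np.mean(values[i:i + window]))
        for i in range(0, len(values), window)
    }

# Made-up prices for illustration only, not WLTW data.
toy_prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]

result = window_means(toy_prices, 3)
print(result)  # {0: 11.0, 3: 14.0, 6: 16.0}
```

Seven prices with a window of 3 give three entries, and the last one averages a single leftover value.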
In statistics I learned about a little measure called correlation. From the interwebs:

> Correlation is a statistical measure for how two or more variables fluctuate together.

Now, I won't go into too many details here, as I have mammoth libraries at my disposal. However, I can explain the basics. A correlation coefficient falls between -1 and 1, inclusive. A value of 1 means that the two datasets are perfectly positively correlated (they fluctuate together), while a value of -1 means that they are perfectly negatively correlated (they fluctuate inversely). Any number in between indicates how strongly the datasets are correlated, positively or negatively, and 0 means there is no linear correlation at all.

To calculate correlation, I use NumPy's `np.corrcoef`, which returns the Pearson product-moment correlation coefficients for two array-like inputs. First, I clean the data by dropping the closing values whose positions don't appear as keys in the 3-day SMA dictionary, so both arrays have the same length.

```python
cleaned = wltw.iloc[list(threedaysma.keys()),:]
```

And now, to calculate our Pearson product-moment correlation coefficients:

```python
threedaysma_array = np.array(list(threedaysma.values()))
print(np.corrcoef(cleaned["close"], threedaysma_array))
#[[ 1.          0.96788571]
# [ 0.96788571  1.        ]]
```

What this 2D output array tells us is that the data are very strongly correlated! The off-diagonal entries are the coefficient between the closing values and the SMA, and for the points we kept it is almost 1. That is great news, and we can most likely use this moving forward for forecasting and short-term trading.
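For the curious, here is a minimal sketch of what `np.corrcoef` is computing under the hood for a pair of series. The `pearson_r` helper and the numbers are my own made-up illustrations, not part of NumPy and not WLTW data: the helper divides the sum of the products of the two series' deviations from their means by the square root of the product of their summed squared deviations.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up numbers for illustration only, not WLTW prices.
closes = [10.0, 11.0, 12.5, 12.0, 13.5, 15.0]
averages = [10.5, 11.2, 12.0, 12.4, 13.3, 14.6]

r = pearson_r(closes, averages)
print(r)
# The off-diagonal entry of np.corrcoef should agree with the hand-rolled value.
print(np.corrcoef(closes, averages)[0, 1])
```

Both prints should show the same number up to floating-point rounding, which is a handy sanity check that the library call means what I think it means.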