---
title: "Getting Into Day Trading: Analyzing The Moving Average"
date: 2017-11-04T14:11:54-04:00
draft: false
tags: ["day trading", "data analysis", "python"]
---
I know that I have this bit of data for the WLTW symbol, and what would be helpful is to see that data plotted in all of its glory.
Let's take a look at the closing prices (y) plotted against the date (x).
![Image](/img/post/WLTW_CLOSING_COSTS.png)
This is a good start, but how well do the SMAs track this closing price? Let's first write a little Python that calculates the SMA for a
given window and records the day each window ends on, to use as the x-axis.
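```python
import numpy as np

def moving_avs(col, window):
    # Map the position of the last day in each block of `window` days
    # to the mean of that block. Blocks don't overlap (the step is
    # `window`), and a leftover partial block at the end is skipped.
    values = np.asarray(col, dtype=float)
    averages = {}
    for start in range(0, len(values) - window + 1, window):
        end = start + window - 1
        averages[end] = np.mean(values[start:end + 1])
    return averages
```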
Using NumPy for analysis and a pandas Series to hold my values, I can use this function to build a dictionary that maps the day each
SMA calculation ends on to the SMA for that range. `window` doubles as the step size in the `range` call, and `np.mean` does the work
of averaging each slice of the data array.
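Because `window` is also the step size in the `range` call, `moving_avs` averages non-overlapping blocks of days and produces one value every `window` days. A conventional SMA instead slides the window forward one day at a time. As a point of comparison, here is a minimal sketch using pandas' `Series.rolling` on a handful of made-up closing prices (the numbers are illustrative only, not WLTW data):

```python
import pandas as pd

# Made-up closing prices, purely for illustration
closes = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])

# Conventional 3-day SMA: the window slides one day at a time, so every
# day from the third onward gets a value (the first two are NaN)
rolling_sma = closes.rolling(window=3).mean()

# Non-overlapping 3-day blocks keyed by the last day of each block,
# which is what stepping range() by the window size produces
block_sma = {end: float(closes.iloc[end - 2:end + 1].mean())
             for end in range(2, len(closes), 3)}

print(rolling_sma.tolist())  # [nan, nan, 11.0, 12.0, 13.0, 14.0, 15.0]
print(block_sma)             # {2: 11.0, 5: 14.0}
```

The block averages are simply the rolling SMA sampled every third day, so they trace the same curve with fewer points.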
Now I can plug my values into the function to generate some simple moving averages.
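```python
import pandas as pd

data = pd.read_csv("prices.csv")
# Keep only WLTW rows, renumbered 0..n-1 so labels match positions
wltw = data[data["symbol"] == "WLTW"].reset_index(drop=True)
# Pass the closing-price column, not the whole DataFrame
threedaysma = moving_avs(wltw["close"], 3)
fivedaysma = moving_avs(wltw["close"], 5)
```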
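One subtlety worth seeing in isolation: filtering a DataFrame with a boolean mask keeps the original row labels, so the filtered rows are generally not labeled 0, 1, 2, and so on. The keys that `moving_avs` returns are positions, which is what `iloc` expects, but anything that works by label will disagree with them. A small sketch on a made-up frame (the symbols `AAA` and `BBB` and their prices are hypothetical) shows the difference and how `reset_index(drop=True)` makes labels and positions line up:

```python
import pandas as pd

# Hypothetical interleaved rows for two symbols, like a date-sorted CSV
data = pd.DataFrame({
    "symbol": ["AAA", "BBB", "AAA", "BBB", "AAA", "BBB"],
    "close": [1.0, 50.0, 2.0, 51.0, 3.0, 52.0],
})

bbb = data[data["symbol"] == "BBB"]
print(bbb.index.tolist())        # [1, 3, 5]: labels carried over from data
print(bbb["close"].iloc[0])      # 50.0: first BBB row by position
print(bbb["close"].loc[1])       # 50.0: the same row, by label

# After resetting, label i and position i refer to the same row
bbb = bbb.reset_index(drop=True)
print(bbb.index.tolist())        # [0, 1, 2]
```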
Back to the original question: how well do the SMAs track the closing price? Let's find out.
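```python
import matplotlib.pyplot as plt

# Plot closes against row position so the x-axis matches the SMA keys
plt.plot(wltw["close"].values, label="close")
plt.plot(list(threedaysma.keys()), list(threedaysma.values()), label="3-day SMA")
plt.legend()
plt.show()
```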
The resulting graph is here for the three day SMA:
![Image](/img/post/threedaysma.png)
That looks very promising for this small time range: the three-day SMA follows the closing price very closely.
Now, I don't know about you, but I'd like to see just how closely the three-day SMA follows the closing price. In statistics
I learned about a little measure called correlation. From the interwebs:
> Correlation is a statistical measure for how two or more variables fluctuate together.
Now, I won't go into too many details here, as I have mammoth libraries at my disposal. However, I can explain the basics of
the measure. The correlation coefficient ranges from -1 to 1, inclusive. A value of 1 means that the two datasets are perfectly positively
correlated (they fluctuate together), while a value of -1 means that they are perfectly negatively correlated (they fluctuate inversely). Values
in between indicate how strongly the datasets are positively or negatively correlated, and 0 means there is no linear correlation at all.
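For the curious, the Pearson coefficient for paired samples x and y is their covariance divided by the product of their standard deviations. Here is a minimal sketch of that definition, checked against `np.corrcoef` on a few made-up numbers (the helper name `pearson_r` is my own, not a NumPy function):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from each mean
    dy = y - y.mean()
    # Sum of co-deviations over the square root of the product of the
    # sums of squared deviations (the 1/(n-1) factors cancel out)
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0]))   # 1.0: rises together
print(pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0]))   # -1.0: moves inversely
print(np.isclose(pearson_r(x, [3.0, 1.0, 4.0, 1.0, 5.0]),
                 np.corrcoef(x, [3.0, 1.0, 4.0, 1.0, 5.0])[0, 1]))  # True
```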
To calculate correlation, I use NumPy's `np.corrcoef`, which returns Pearson product-moment correlation coefficients for array-like inputs.
First, I clean the data by dropping closing prices that don't correspond to the end of a 3-day SMA window.
```python
# Keep only the rows at the positions where a 3-day window ends
cleaned = wltw.iloc[list(threedaysma.keys()), :]
```
And now, to calculate our Pearson product-moment correlation coefficients:
```python
threedaysma_array = np.array(list(threedaysma.values()))
print(np.corrcoef(cleaned["close"], threedaysma_array))
#[[ 1. 0.96788571]
# [ 0.96788571 1. ]]
```
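Since `np.corrcoef` returns the full matrix, the single number of interest is entry `[0, 1]`. The same steps can be wrapped in a small helper to compare window sizes, for instance the three- and five-day SMAs computed earlier. This is a sketch under a few assumptions: `sma_correlation` is a hypothetical helper of my own, it pairs each non-overlapping block average with the close on the last day of its block, and the demo runs on a synthetic random walk rather than the WLTW prices, so its numbers say nothing about WLTW.

```python
import numpy as np

def sma_correlation(closes, window):
    """Correlate closing prices with non-overlapping SMAs of size `window`.

    Each SMA is paired with the close on the last day of its window.
    """
    closes = np.asarray(closes, dtype=float)
    # Positions of the last day of each full window: window-1, 2*window-1, ...
    ends = np.arange(window - 1, len(closes), window)
    smas = np.array([closes[end - window + 1:end + 1].mean() for end in ends])
    # corrcoef returns a 2x2 matrix; [0, 1] is the correlation between inputs
    return float(np.corrcoef(closes[ends], smas)[0, 1])

# Synthetic random-walk prices, for demonstration only
rng = np.random.default_rng(seed=0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=250))

for window in (3, 5):
    print(window, round(sma_correlation(prices, window), 4))
```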
What this 2D array tells us is that the data are very strongly correlated! The off-diagonal entries hold the correlation between the cleaned
closing prices and the 3-day SMA, and at about 0.968 it is almost 1. That is great news, and it suggests we can use this moving forward for
forecasting and short-term trading.