Publishing data analysis post

Jacob Windle 2017-12-28 20:09:26 -05:00
parent 4247ac4d79
commit 61c8fac08c
2 changed files with 69 additions and 23 deletions

@@ -1,38 +1,84 @@
---
title: "Getting Into Day Trading: Analyzing The Moving Average"
date: 2017-11-04T14:11:54-04:00
draft: true
tags: ["day trading", "data analysis", "julia"]
draft: false
tags: ["day trading", "data analysis", "python"]
---
Now that we have a Julia environment good to go, and a dataset available, it's time to start doing some real analysis.
I know that I have this bit of data for the WLTW symbol, and what would be helpful is to see that data completely
plotted in all of its glory. Let's take a look at the closing costs (y) plotted against the date (x).
![Image](/img/post/WLTW_CLOSING_COSTS.png)
Not bad, we can see an ok trend going from January to December 2016. This data isn't very useful yet but I can
showcase some awesome Julia packages, and how I generated the graph.
This is a good start, but how good are the SMAs at tracking this close cost? Let's first write a little Python that will grab
the SMA for a given window, along with the index where that window starts, which I'll use for the X-axis.
I used DataFrames.jl to store the data, Query.jl to grab a subset of the data, and Gadfly.jl to plot the data.
All of these are excellent libraries for doing your thing when analyzing.
```julia
using DataFrames, Query, Gadfly

data = readtable("prices.csv", header=true)
# Keep only the date and close columns for the WLTW symbol
q = @from i in data begin
    @where i.symbol == "WLTW"
    @select {i.date, i.close}
    @collect DataFrame
end
p = plot(q, x=:date, y=:close, Geom.point, Guide.title("Closing Costs: WLTW - 2016"))
draw(PNG("wltw_closing_costs.png", 6inch, 4inch), p)
```
```python
import numpy as np

def moving_avs(col, window):
    # Map the starting index of each window to the mean of that window
    averages = {}
    for i in range(0, len(col), window):
        averages[i] = np.mean(col[i:i+window])
    return averages
```
Now I'd like to add the 3-day SMA and the 5-day SMA to the plot of WLTW closing costs. Each point of an SMA
is the average of the closing costs over the last 3 days or the last 5 days. I believe that
by plotting them, we may be able to see whether either average is adequate for predicting trends in this data. I'll be looking at
how close each moving average is to the actual trend of the close costs for the WLTW security.
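To make the "average of the last 3 or 5 days" idea concrete, here is a minimal sketch using pandas' `rolling`. The prices are made up for illustration (they are not WLTW data), and `rolling` is just one way to compute this, not necessarily how the graphs in this post were produced.

```python
import pandas as pd

# Made-up closing prices, purely for illustration (not WLTW data)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# Each value is the mean of the current day and the days before it in the window
sma3 = close.rolling(window=3).mean()
sma5 = close.rolling(window=5).mean()

print(sma3.tolist())  # the first two entries are NaN: not enough history yet
print(sma5.tolist())  # the first four entries are NaN
```

Note that a rolling average like this produces one value per day once the window is full, so it stays aligned with the closing costs it is compared against.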
Using NumPy for analysis, and Pandas Series to hold my values, I can use this function to create a dictionary that maps the index
where each window starts to the SMA for that window. `window` becomes the step size in the `range` call, so the windows do not overlap, and `np.mean` does
the work of calculating the simple moving average for each slice of the data array.
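To see what the dictionary looks like, here is a small worked example on a toy list. It repeats the function so the snippet runs on its own; the `float(...)` conversion and the `toy` values are my additions for readable output, not part of the original code.

```python
import numpy as np

def moving_avs(col, window):
    """Average consecutive chunks of `col`, keyed by the index where each chunk starts."""
    averages = {}
    for i in range(0, len(col), window):
        # float() is only here so the printed dictionary shows plain numbers
        averages[i] = float(np.mean(col[i:i + window]))
    return averages

toy = [1, 2, 3, 4, 5, 6, 7]  # hypothetical values, not real prices
result = moving_avs(toy, 3)
print(result)  # {0: 2.0, 3: 5.0, 6: 7.0}
```

The last key covers a shorter chunk (just the value 7) because the list length is not a multiple of the window.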
Now I can plug in my values to the function to generate some simple moving averages.
```python
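import pandas as pd

data = pd.read_csv("prices.csv")
# Reset the index so row positions start at 0 and line up with the SMA keys
wltw = data[data["symbol"] == "WLTW"].reset_index(drop=True)
# Average the closing column only, not the whole DataFrame
threedaysma = moving_avs(wltw["close"], 3)
fivedaysma = moving_avs(wltw["close"], 5)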
```
Back to the original question: how well do the SMAs track against the closing cost? Well, let's find out.
```python
import matplotlib.pyplot as plt
plt.plot(wltw["close"])
plt.plot(list(threedaysma.keys()), list(threedaysma.values()))
plt.show()
```
The resulting graph is here for the three day SMA:
![Image](/img/post/threedaysma.png)
That looks very promising for this small time range. The three-day SMA follows the closing cost very closely.
Now I don't know about you, but I'd like to see just how closely the three day SMA follows the closing cost. I learned
in statistics of a little measure called correlation. From the interwebs:
> Correlation is a statistical measure for how two or more variables fluctuate together.
Now, I won't go into too many details here, as I have mammoth libraries at my disposal. However, I can explain the basics of
the measure of correlation. The correlation coefficient lies between -1 and 1, inclusive. A value of 1 means that the two datasets are perfectly positively
correlated (fluctuate together), while a value of -1 means that the two datasets are perfectly negatively correlated (fluctuate inversely). Any number in between
represents how strongly the datasets are correlated, positively or negatively, and 0 means that there is no linear correlation between them.
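For the curious, the coefficient can be sketched by hand in a few lines. This is a minimal illustration of the standard Pearson formula on made-up numbers; the `pearson` helper is hypothetical and exists only for this example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance of x and y scaled by both spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

up = [1, 2, 3, 4, 5]
together = pearson(up, [2, 4, 6, 8, 10])   # fluctuate together: close to 1
inverse = pearson(up, [10, 8, 6, 4, 2])    # fluctuate inversely: close to -1
unrelated = pearson(up, [3, 1, 0, 1, 3])   # no linear relationship: close to 0

print(together, inverse, unrelated)
```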
To calculate correlation, I use NumPy's `np.corrcoef`, which computes the Pearson product-moment correlation given two array-like inputs. First, I clean the data.
I'll do this by dropping close cost values that don't correspond to the start of a window for the 3-day SMA, so both arrays have the same length.
```python
cleaned = wltw.iloc[list(threedaysma.keys()),:]
```
And now, to calculate our Pearson product-moment correlation coefficients:
```python
threedaysma_array = np.array(list(threedaysma.values()))
print(np.corrcoef(cleaned["close"], threedaysma_array))
#[[ 1. 0.96788571]
# [ 0.96788571 1. ]]
```
This 2D output array tells us that the data are very strongly correlated! For the points we kept after cleaning, the correlation coefficient is about 0.97, almost 1. That
is great news, and we can most likely use this moving forward for forecasting and short-term trading.
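If the shape of that output is confusing, here is a small sketch of how to read it. The two arrays are made-up values, not the WLTW data; the point is only the layout of the matrix that `np.corrcoef` returns.

```python
import numpy as np

# Made-up series that move closely together (not the WLTW data)
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

matrix = np.corrcoef(a, b)

# The diagonal is each series correlated with itself (always 1).
# The two off-diagonal entries are equal and hold the number we care about.
r = matrix[0, 1]
print(matrix.shape)
print(r)
```

So in the output above, 0.96788571 is the correlation between the closing costs and the three day SMA, and the 1s on the diagonal can be ignored.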

Binary file not shown.
