Publishing data analysis post
parent 4247ac4d79
commit 61c8fac08c
@@ -1,38 +1,84 @@
---
title: "Getting Into Day Trading: Analyzing The Moving Average"
date: 2017-11-04T14:11:54-04:00
draft: true
tags: ["day trading", "data analysis", "julia"]
draft: false
tags: ["day trading", "data analysis", "python"]
---

Now that we have a Julia environment good to go, and a dataset available, time to start doing some real analysis.

I know that I have this bit of data for the WLTW symbol, and what would be helpful is to see that data completely
plotted in all of its glory. Let's take a look at the closing costs (y) plotted against the date(x).
plotted in all of its glory. Let's take a look at the closing costs (y) plotted against the date (x).

![Image](/img/post/WLTW_CLOSING_COSTS.png)

Not bad, we can see an OK trend going from January to December 2016. This data isn't very useful yet, but I can
showcase some awesome Julia packages, and how I generated the graph.
This is a good start, but how good are simple moving averages (SMAs) at tracking this close cost? Let's first write a little Python that will grab
the SMA for a given window, along with the position where that window starts, which we'll use for the X-axis.

I used DataFrames.jl to store the data, Query.jl to grab a subset of the data, and Gadfly.jl to plot the data.
All of these are excellent libraries for doing your thing when analyzing.
```python
import numpy as np

```julia
data = readtable("prices.csv", header=True)
q = @from i in data begin
    @where i.symbol == "WLTW"
    @select {i.date, i.close}
    @collect DataFrame
end

p = (q, y=:close, Geom.Point, Guide.Title("Closing Costs: WLTW - 2016"))
draw(PNG("wltw_closing_costs.png", 6inch, 4inch), p)
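def moving_avs(col, window):
    """Average each consecutive block of `window` values, keyed by the block's start position."""
    values = np.asarray(col)  # slice by position, whatever index the column carries
    averages = {}
    for i in range(0, len(values), window):
        averages[i] = np.mean(values[i:i+window])
    return averages
```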

Now I'd like to add the plots for the 3-day SMA and the 5-day SMA to the plot of WLTW closing costs. Each point on
these lines is the average of either the last 3 days or the last 5 days of closing costs. I believe that
by doing so, we may be able to visualize whether either average is adequate for predicting trends in this data. I'll be looking for
how close any given moving average is to the actual trend of the close costs for the WLTW security.
Using Numpy for analysis, and Pandas for Series to hold my values, I can use this function to create a dictionary tracking exactly what
row each SMA window starts on, as well as the SMA for that range. Window becomes the step size in the range call, and `np.mean` does
the work calculating simple moving averages for slices of the data array.
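
To make the windowing concrete, here is a small sketch on made-up prices. The `toy_close` values and the `block_means` and `rolling_means` helper names are illustrative assumptions, not part of the post's code. It shows that stepping `range` by the window size averages consecutive, non-overlapping blocks and keys each average by the block's starting position. For comparison, it also computes a sliding-window SMA, where the window advances one day at a time.

```python
import numpy as np

# Made-up closing prices, purely for illustration
toy_close = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]

def block_means(values, window):
    """Average consecutive, non-overlapping blocks, keyed by block start."""
    values = np.asarray(values, dtype=float)
    return {i: float(np.mean(values[i:i + window]))
            for i in range(0, len(values), window)}

def rolling_means(values, window):
    """Average every run of `window` consecutive values (window slides by one)."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

blocks = block_means(toy_close, 3)
rolling = rolling_means(toy_close, 3)

print(blocks)   # keys 0, 3, 6; the last block holds a single value
print(rolling)  # one average per full 3-day window: 5 values for 7 prices
```

Note that the last block can be shorter than the window whenever the number of rows is not a multiple of the window size.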

Now I can plug in my values to the function to generate some simple moving averages.

```python
import pandas as pd

data = pd.read_csv("prices.csv")
# Keep only the WLTW rows, renumbered from 0 so row positions match the SMA keys
wltw = data[data["symbol"] == "WLTW"].reset_index(drop=True)

# Pass the closing column, not the whole DataFrame
threedaysma = moving_avs(wltw["close"], 3)
fivedaysma = moving_avs(wltw["close"], 5)
```

Back to the original question: how well do the SMAs track against the closing cost? Let's find out.

```python
import matplotlib.pyplot as plt

# Plot by row position so the x-axis matches the SMA dictionary keys
plt.plot(wltw["close"].values)
plt.plot(list(threedaysma.keys()), list(threedaysma.values()))
plt.show()
```

The resulting graph is here for the three day SMA:

![Image](/img/post/threedaysma.png)

That looks very promising for this small time range. The three day SMA follows the closing cost very closely.
Now I don't know about you, but I'd like to see just how closely the three day SMA follows the closing cost. I learned
of a little measure in statistics called correlation. From the interwebs:

> Correlation is a statistical measure for how two or more variables fluctuate together.

Now, I won't go into too many details here, as I have mammoth libraries at my disposal. However, I can explain the basics of
the measure of correlation. Correlation is between the values of -1 and 1, inclusive. A value of 1 means that the two datasets are perfectly positively
correlated (fluctuate together), while a value of -1 means that the two datasets are perfectly negatively correlated (fluctuate inversely). Any number in between
represents how strongly the datasets are correlated, positively or negatively, and 0 means that there is no linear correlation between them.
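
As a rough sketch of what the library computes, Pearson's r can be written out by hand: the sum of the products of each series' deviations from its mean, divided by the square root of the product of their summed squared deviations. The `pearson_r` helper and the toy lists below are assumptions for illustration only; `np.corrcoef` is the real NumPy call, and its off-diagonal entry should agree with the hand-rolled value.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Toy series, made up for illustration
up = [1, 2, 3, 4, 5]
doubled = [2, 4, 6, 8, 10]   # moves exactly with `up`
reversed_ = [10, 8, 6, 4, 2] # moves exactly against `up`
noisy = [2, 1, 4, 3, 5]      # mostly moves with `up`

r_same = pearson_r(up, doubled)       # about 1
r_opposite = pearson_r(up, reversed_) # about -1
r_noisy = pearson_r(up, noisy)        # about 0.8

# The off-diagonal entry of np.corrcoef is the same coefficient
r_numpy = np.corrcoef(up, noisy)[0, 1]
print(r_same, r_opposite, r_noisy, r_numpy)
```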

To calculate correlation, I use the Numpy function for the Pearson product-moment correlation given two array-like inputs. First, I clean the data.
I'll do this by dropping close cost values that don't correspond to the start of an SMA window for the 3-day SMA.

```python
# Keep only the rows whose positions are keys in the 3-day SMA dictionary
cleaned = wltw.iloc[list(threedaysma.keys()),:]
```

And now, to calculate our Pearson product-moment correlation coefficients:

```python
threedaysma_array = np.array(list(threedaysma.values()))
print(np.corrcoef(cleaned["close"], threedaysma_array))

#[[ 1.          0.96788571]
# [ 0.96788571  1.        ]]
```

What this 2D output array tells us is that the data are very strongly correlated! The off-diagonal entries are the correlation between the two series, and for the points we cleaned, that coefficient is almost 1. That
is great news, and we can most likely use this moving forward for forecasting and short-term trading.
BIN
static/img/post/threedaysma.png
Normal file
Binary file not shown.
After: Size 20 KiB