diff --git a/content/post/day-trading-generating-training-data.md b/content/post/day-trading-generating-training-data.md
index 3623f95..faf0f4e 100644
--- a/content/post/day-trading-generating-training-data.md
+++ b/content/post/day-trading-generating-training-data.md
@@ -1,38 +1,84 @@
 ---
 title: "Getting Into Day Trading: Analyzing The Moving Average"
 date: 2017-11-04T14:11:54-04:00
-draft: true
-tags: ["day trading", "data analysis", "julia"]
+draft: false
+tags: ["day trading", "data analysis", "python"]
 ---
-Now that we have a Julia environment good to go, and a dataset available, time to start doing some real analysis.
-
 I know that I have this bit of data for the WLTW symbol, and what would be helpful is to see that data completely
-plotted in all of it's glory. Let's take a look at the closing costs (y) plotted against the date(x).
+plotted in all of its glory. Let's take a look at the closing costs (y) plotted against the date (x).
 
 ![Image](/img/post/WLTW_CLOSING_COSTS.png)
 
-Not bad, we can see an ok trend going from January to December 2016. This data isn't very useful yet but I can
-showcase some awesome Julia packages, and how I generated the graph.
+This is a good start, but how good are SMAs at tracking this closing cost? Let's first write a little Python that computes
+the SMA for each window, along with the index where that window starts, which I'll use for the x-axis.
 
-I used DataFrames.jl to store the data, Query.jl to grab a subset of the data, and Gadfly.jl to plot the data.
-All of these are excellent libraries for doing your thing when analyzing.
+```python
+import numpy as np
 
-```julia
-data = readtable("prices.csv", header=True)
-q = @from i in data begin
-    @where i.symbol == "WLTW"
-    @select {i.date, i.close}
-    @collect DataFrame
-end
-
-p = (q, y=:close, Geom.Point, Guide.Title("Closing Costs: WLTW - 2016"))
-draw(PNG("wltw_closing_costs.png", 6inch, 4inch), p)
+def moving_avs(col, window):
+    """Map each window's start index to the simple moving average of that window."""
+    moving_avs = {}
+    for i in range(0, len(col), window):
+        moving_avs[i] = np.mean(col[i:i+window])
+    return moving_avs
 ```
 
-Now I'd like to add the plots for the 3-day SMA, and the 5-day SMA to the plot of WLTW closing costs. What these
-are, are the average of either the last 3 days or the last 5 days for a single datapoint. I believe that
-by doing so, we may be able to visualize if either datapoint is adequate in predicting trends in this data. I'll be looking for
-how close any given moving average is to the actual trend of the close costs for the WLTW security.
+Using NumPy for the analysis and a Pandas Series to hold my values, I can use this function to create a dictionary tracking exactly which
+day each SMA window starts on, as well as the SMA for that range. The window becomes the step size in the `range` call, and `np.mean` does
+the work of calculating the simple moving average for each slice of the data.
+Now I can plug my values into the function to generate some simple moving averages.
+
+```python
+import pandas as pd
+
+data = pd.read_csv("prices.csv")
+wltw = data[data["symbol"] == "WLTW"]
+
+# Pass only the closing cost column so the SMAs are computed over closes.
+threedaysma = moving_avs(wltw["close"], 3)
+fivedaysma = moving_avs(wltw["close"], 5)
+```
+
+Back to the original question: how well do the SMAs track the closing cost? Let's find out.
+
+```python
+import matplotlib.pyplot as plt
+
+plt.plot(wltw["close"])
+plt.plot(list(threedaysma.keys()), list(threedaysma.values()))
+plt.show()
+```
+
+Here is the resulting graph for the three-day SMA:
+
+![Image](/img/post/threedaysma.png)
+
+That looks very promising for this small time range.
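+
+For comparison, the 5-day SMA computed earlier can be overlaid on the same axes. Here is a minimal sketch of how that might look; the labels, legend, title, and plotting the closing costs by position via `.values` are my own choices rather than part of the original workflow:
+
+```python
+import matplotlib.pyplot as plt
+
+# Plot the closing costs by position (0, 1, 2, ...) so they line up with the
+# window-start indices that moving_avs uses as dictionary keys.
+plt.plot(wltw["close"].values, label="close")
+plt.plot(list(threedaysma.keys()), list(threedaysma.values()), label="3-day SMA")
+plt.plot(list(fivedaysma.keys()), list(fivedaysma.values()), label="5-day SMA")
+plt.legend()
+plt.title("WLTW closing costs vs. simple moving averages")
+plt.show()
+```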
+
+The three-day SMA follows the closing cost very closely.
+Now I don't know about you, but I'd like to see just how closely it follows. In statistics I learned about
+a little measure called correlation. From the interwebs:
+
+> Correlation is a statistical measure for how two or more variables fluctuate together.
+
+Now, I won't go into too many details here, as I have mammoth libraries at my disposal, but I can explain the basics of
+the measure. Correlation falls between -1 and 1, inclusive. A value of 1 means that the two datasets are positively
+correlated (they fluctuate together), while a value of -1 means that they are negatively correlated (they fluctuate inversely). Values in between
+indicate how strongly the datasets are positively or negatively correlated, and 0 means the data are not correlated at all.
+
+To calculate correlation, I use NumPy's `np.corrcoef`, which computes the Pearson product-moment correlation coefficients for two array-like inputs.
+First, I clean the data by dropping closing cost values that don't correspond to the start of a 3-day SMA window.
+
+```python
+# Keep only the rows where a 3-day SMA window starts.
+cleaned = wltw.iloc[list(threedaysma.keys()), :]
+```
+
+And now, to calculate our Pearson product-moment correlation coefficients:
+
+```python
+threedaysma_array = np.array(list(threedaysma.values()))
+print(np.corrcoef(cleaned["close"], threedaysma_array))
+
+#[[ 1.          0.96788571]
+# [ 0.96788571  1.        ]]
+```
+
+What this 2D output array tells us is that the two series are very strongly correlated! For the points we kept, the correlation coefficient is almost 1. That
+is great news, and we can most likely use this moving forward for forecasting and short-term trading.
diff --git a/static/img/post/threedaysma.png b/static/img/post/threedaysma.png
new file mode 100644
index 0000000..27bb9d5
Binary files /dev/null and b/static/img/post/threedaysma.png differ