Stock data reduction via identification of “pivot points”
TLDR
(Too Long, Didn’t Read or the contents of your typical README.md file)
This project implements a method for simplifying and reducing time series data via the identification of “pivot points”, defined here as a set of points selected from the data that determine a sequence of line segments from which the actual data deviates by no more than a given tolerance.
Two different algorithms are presented for calculating these points, depending on whether the full time series is available in advance or the points must be calculated in real time as new data arrives.
It is assumed that representing the actual data via these “pivot points” is more advantageous for certain applications than using the full data, particularly for stock market price prediction, trend identification and pattern recognition.
The following files are contained in the repository:
- A .R script implementing functions for finding the pivot points and for generating randomized stock data for testing.
- .R scripts for creating animations to illustrate features of each of the algorithms used for calculation of the pivot points.
- An images folder, where the animation scripts save the temporary .png files and the resulting .gif files, and from which this report sources its images.
If you want to replicate this analysis, you can clone this repository and either 1) run the declaration_of_functions.R script to make the find.pivot() and rstock() functions available or 2) run the animate scripts to create animations based on randomly generated stock data.
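A minimal usage sketch is shown below; the argument names are assumptions made for illustration, so check declaration_of_functions.R for the actual signatures of find.pivot() and rstock().

```r
# Hypothetical usage sketch: the argument names below are assumptions,
# not the actual signatures defined in declaration_of_functions.R.
source("declaration_of_functions.R")
prices <- rstock(960)                     # one randomized daily price series
pivots <- find.pivot(prices, tol = 0.10)  # pivot points within a 10% tolerance
```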
The remainder of this file contains a detailed discussion of the implementation and characteristics. Animated gifs illustrate the algorithms and results.
Introduction
Motivation and goals
The methods developed here were motivated by difficulties encountered in previous attempts to write scripts with explicit rules for making buy/sell decisions in the stock market.
Due to the highly dynamic nature of the price signals involved, the identification of patterns in the data is a complex task, which renders the definition of rules based on prominent features very difficult.
The usual solution to the problem of dealing with highly noisy price data is the use of longer price candles. However, this method has a shortcoming: since each candle represents the same amount of time, it is not sensitive to varying levels of market volatility.
In order to deal with these problems, the goal of this project is to define algorithms that reduce the data so that:
- The reduced time series is comprised of a minimal set of points (“pivot points”) selected from the original data
- The points in the reduced data determine a set of line segments that represent the most prominent features (such as peaks and valleys) of the original data
- The original data falls within a specified tolerance around the line segments
- The reduced data is sensitive to different levels of volatility in the original data
Expected advantages
As a hypothesis (to be confirmed in a subsequent analysis), three different advantages are expected from the use of this reduction technique:
- Easier identification of relationships: Through analytic comparison of the pivot points for different stocks, patterns and relationships will be identified more easily than from the raw price data. For instance, companies whose prices follow the same industry trends will have that feature readily identifiable from the pivot point reduction.
- More efficient trend identification: The data obtained by the reduction to pivot points presents the bare essentials of the price history and its prominent features, free of noise and day-to-day market volatility. This should be a more robust data set for algorithms attempting to predict future prices over extended time ranges, since it is closer to the features considered in a fundamental analysis.
- Precise market volatility indication: The density of pivot points is closely related to market volatility. A histogram showing the daily concentration of pivot points for multiple stocks over time is therefore expected to be a precise indicator of market volatility and a relevant predictor of future behavior.
The technique
Two different algorithms were developed to identify the pivot points for a given time series, depending on whether or not the full data is available at the time of calculation:
- Recursive algorithm: when the entire data series is available, the algorithm is able to identify the most prominent features in the data and set the pivot points accordingly. This version tends to be more precise in the placement of points, thus requiring fewer points to characterize the data.
- Sequential algorithm: when the data becomes available one day (or other unit of time) at a time, past pivot points are fixed and the algorithm only decides if a new pivot point needs to be added. This version tends to require more points, but can be used for real time calculation as each new data point becomes available.
We now detail the two algorithms and illustrate their results.
Recursive algorithm
As stated above, this approach requires knowledge of the entire data set at the time of calculation.
These are its steps:
1. Define the interval under analysis as the full interval between the first and last available points
2. Draw a reference line segment between the two points
3. Evaluate the deviation between every value in the interval and the reference line
4. If any point deviates by more than the specified tolerance, the point with the highest deviation is defined as a new pivot point
5. For each new pivot point added, run steps 2-4 recursively on the two sub-intervals obtained by splitting the interval from step 1 at the new pivot point
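A minimal R sketch of this recursive procedure is shown below. It is illustrative only: the function name, the use of approx() for the reference line, and the choice of measuring deviation relative to the reference line are assumptions and may differ from the repository's find.pivot() implementation.

```r
# Illustrative sketch of the recursive pivot search (not the repository's
# find.pivot()); deviation is measured relative to the reference line.
find_pivots_recursive <- function(price, tol = 0.10, start = 1, end = length(price)) {
  # Reference line between the first and last points of the interval
  ref <- approx(x = c(start, end), y = price[c(start, end)], xout = start:end)$y
  # Relative deviation of the actual prices from the reference line
  dev <- abs(price[start:end] - ref) / ref
  worst <- which.max(dev)
  if (dev[worst] <= tol) return(integer(0))   # interval already within tolerance
  pivot <- start + worst - 1                  # index of the largest deviation
  # Recurse on the two sub-intervals created by the new pivot point
  c(find_pivots_recursive(price, tol, start, pivot),
    pivot,
    find_pivots_recursive(price, tol, pivot, end))
}

# Random walk standing in for a 960-day price series (a stand-in for rstock())
set.seed(1)
price <- cumprod(1 + rnorm(960, sd = 0.01)) * 100
pivots <- c(1, find_pivots_recursive(price, tol = 0.10), length(price))
```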
The animation below illustrates the calculation for several randomly generated stocks. They each cover a period of 960 days (roughly 4 years of daily prices), and each frame represents one iteration of the recursive process. The red points and line segments represent the pivot points and respective price approximation, and the green band represents the tolerance. The tolerance for all the examples is 10%.
From the images, it is evident that, given the 10% tolerance, a set of around 20 points is normally sufficient to represent the full set of 960 points in the original data.
Sequential algorithm
To deal with the limitation of the recursive algorithm, namely that the entire data set must be known at the time of calculation, the sequential approach was developed.
In this method, the set of past pivot points is not affected by the addition of new data. All the algorithm does is evaluate whether the new point requires the creation of a new pivot point in order to keep all the original data within the specified tolerance.
These are the algorithm steps:
1. Define the interval under analysis: it begins at the last pivot point (or at the first point in the time series if no pivot point exists yet) and ends at the current value
2. Draw a reference line segment between the two points
3. Evaluate the deviation between every value in the interval and the reference line
4. If any point deviates by more than the tolerance, the point with the highest deviation is defined as a new pivot point
5. For each new value, repeat steps 1-4
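The sketch below mirrors this sequential logic under the same assumptions as the recursive sketch above; the names and deviation measure are illustrative, not the repository's exact code.

```r
# Illustrative sketch of the sequential (online) update: past pivots are
# never revised, only a possible new pivot is appended.
update_pivots <- function(price, pivots, tol = 0.10) {
  # Interval runs from the last pivot (or the first observation) to the newest value
  start <- if (length(pivots) == 0) 1 else tail(pivots, 1)
  end <- length(price)
  if (end - start < 2) return(pivots)          # too few points to deviate
  ref <- approx(c(start, end), price[c(start, end)], xout = start:end)$y
  dev <- abs(price[start:end] - ref) / ref     # relative deviation from the reference line
  worst <- which.max(dev)
  if (dev[worst] > tol) pivots <- c(pivots, start + worst - 1)  # fix a new pivot
  pivots
}

# Feed the data one observation at a time, as it would arrive in real time
set.seed(2)
price <- cumprod(1 + rnorm(960, sd = 0.01)) * 100
pivots <- integer(0)
for (t in 2:length(price)) {
  pivots <- update_pivots(price[1:t], pivots, tol = 0.10)
}
```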
The algorithm can be seen in action in the animation below. Here, the blue error bar on the last segment in each frame represents the current pivot point candidate, which is the point that deviates the most from the reference line. This point is added to the list of pivot points when its deviation exceeds the given tolerance.
The obvious advantage of this approach is that the algorithm can be applied indefinitely, as new pivot points are simply appended to the running list. The following animation illustrates this feature: