One of the fascinating aspects of time series is the intrinsic complexity of such an apparently simple kind of data.
At the end of the day, a time series has an x axis that usually represents time (t) and a y axis that represents the quantity of interest (stock price, temperature, traffic, clicks, etc.). This is significantly simpler than a video, for example, where you might have thousands of images, and each image is a tensor with width, height, and three channels (RGB).
However, the complexity is hidden in how the quantity of interest (y axis) evolves over time (x axis). Does this evolution present a trend? Does it have any data points that clearly deviate from the expected signal? Is it stable or unpredictable? Is the average value of the quantity larger than what we would expect? Each of these can be defined as an anomaly.
This article is a collection of multiple anomaly detection techniques. The goal is that, given a dataset of multiple time series, we can detect which time series is anomalous and why.
These are the 4 time series anomalies we are going to detect:

Image made by author
We are going to describe the theory behind each anomaly detection method in this collection, and we are going to show its Python implementation. All the code I used for this blog post is included in the PieroPaialungaAI/timeseriesanomaly GitHub folder.
To build the anomaly collector, we need a dataset where we know exactly which anomaly we are searching for, so that we can tell whether our anomaly detector is working or not. To do that, I have created a data.py script. The script contains a DataGenerator object that:
This is the code snippet:

Image made by author
So we can see that we have:
The anomalies are, as expected:
Now our goal will be to have a toolbox that can identify each one of these anomalies for the whole dataset.
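If you want to experiment without the repository, a labeled bank of synthetic time series can be sketched in a few lines. This is not the DataGenerator from data.py: the function name make_bank, the label names, and every magnitude below are assumptions chosen only for illustration.

```python
import math
import random


def make_bank(n_series=20, n_points=200, seed=0):
    """Return (bank, labels): a list of series and the anomaly injected in each."""
    rng = random.Random(seed)
    kinds = ["trend", "volatility", "point", "level"]  # hypothetical label names
    bank, labels = [], []
    for i in range(n_series):
        # Baseline signal: a sine wave plus unit-variance Gaussian noise.
        y = [math.sin(2 * math.pi * t / 50) + rng.gauss(0, 1) for t in range(n_points)]
        # The first four series get one anomaly each, the rest stay normal.
        label = kinds[i] if i < len(kinds) else "normal"
        if label == "trend":
            y = [v + 0.05 * t for t, v in enumerate(y)]  # steady upward drift
        elif label == "volatility":
            y = [4 * v for v in y]  # inflate the spread
        elif label == "point":
            y[n_points // 2] += 15  # one isolated spike
        elif label == "level":
            y = [v + 20 for v in y]  # shift the whole series upward
        bank.append(y)
        labels.append(label)
    return bank, labels


bank, labels = make_bank()
print(labels[:5])
```

Because the seed is fixed, the same bank is produced on every run, which makes it easy to check whether a detector recovers the labels we injected.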
*The config.json file allows you to modify all the parameters of our dataset, such as the number of time series, the time series axis and the kind of anomalies. This is what it looks like:
When we say “a trend anomaly”, we are looking for a structural behavior: the series moves upward or downward over time, or it bends in a consistent way. This matters in real data because drift often means sensor degradation, changing user behavior, model/data pipeline issues, or another underlying phenomenon to be investigated in your dataset.
We consider two kinds of trends:
In practice, we measure the error of the Linear Regression model. If it is too large, we fit the Polynomial Regression one. We consider a trend to be “significant” when the p value is lower than a set threshold (commonly p < 0.05).
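To make the linear part of this test concrete, here is a minimal sketch of a slope significance check. It is not the code from anomaly_detector.py: the function name linear_trend_test and the alpha parameter are assumptions, and the p-value uses a normal approximation to the t distribution (reasonable only for long series) so that the sketch stays within the standard library. The polynomial fallback is left out.

```python
import math
import random
from statistics import NormalDist


def linear_trend_test(y, alpha=0.05):
    """Fit y = a + b*t by least squares and test H0: b == 0.

    Assumes len(y) >= 3. The p-value is a large-sample normal
    approximation, not the exact t-test.
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Residual sum of squares, then the standard error of the slope
    # using n - 2 degrees of freedom.
    sse = sum((v - (intercept + slope * t)) ** 2 for t, v in enumerate(y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    if se_slope == 0:
        # Perfect fit: any non-zero slope is trivially significant.
        p_value = 0.0 if slope != 0 else 1.0
    else:
        z = slope / se_slope
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, p_value, p_value < alpha


rng = random.Random(1)
drifting = [0.02 * t + rng.gauss(0, 1) for t in range(300)]
slope, p_value, has_trend = linear_trend_test(drifting)
print(f"slope={slope:.4f}, p={p_value:.3g}, trend={has_trend}")
```

For short series, an exact t-test (for example the one reported by scipy.stats.linregress) would be the safer choice.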
The AnomalyDetector object in anomaly_detector.py will run the code described above using the following functions:
We can use plot_trend_anomalies to display the time series and see how we are doing:

Image made by author
Good! So we are able to retrieve the “trendy” time series in our dataset without any bugs. Let’s move on!
Now that we have covered the global trend, we can focus on volatility. By volatility I mean, in plain English, how all over the place our time series is. In more precise terms: how does the variance of a time series compare to the average variance across our dataset?
This is how we are going to test this anomaly:
Pretty simple, right? Let’s dive in with the code!
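Before looking at the repository functions, here is a rough standalone sketch of the idea: compute the standard deviation of every series and flag the ones that sit far above the dataset average. The function name volatility_anomalies and the ratio_threshold of 2.0 are assumptions for illustration, not values taken from the repository.

```python
from statistics import fmean, pstdev


def volatility_anomalies(bank, ratio_threshold=2.0):
    """Flag series whose standard deviation is far above the dataset average.

    Returns a list of (series_index, std / average_std) pairs.
    ratio_threshold is a hypothetical cutoff.
    """
    stds = [pstdev(series) for series in bank]
    avg_std = fmean(stds)
    return [
        (i, s / avg_std)
        for i, s in enumerate(stds)
        if s > ratio_threshold * avg_std
    ]


# Nine calm series oscillating between -1 and 1, plus one between -5 and 5.
calm = [[(-1) ** t for t in range(100)] for _ in range(9)]
wild = [5 * (-1) ** t for t in range(100)]
flagged = volatility_anomalies(calm + [wild])
print(flagged)
```

Note that the anomalous series inflates the average it is compared against, so in a small dataset a median-based reference might be more robust.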
Similarly to what we have done for the trends, we have:
This is how we display the results:

Image made by author
Ok, now let’s ignore all the other time series of the dataset and focus on one time series at a time. For our time series of interest, we want to see if there is any point that is clearly anomalous. There are many ways to do that; we can leverage Transformers, 1D CNN, LSTM, Encoder-Decoder, etc. For the sake of simplicity, let’s use a very simple algorithm:
We define a point as anomalous when its Z-score exceeds a fixed threshold. We are going to use a Z-score threshold of 3, which means a point is flagged when it lies more than 3 standard deviations away from the mean.
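As a minimal sketch of this rule, assuming the mean and standard deviation are computed over the whole series (the function name point_anomalies is hypothetical and this is not the repository code):

```python
from statistics import fmean, pstdev


def point_anomalies(series, z_threshold=3.0):
    """Return (index, z_score) for points beyond z_threshold standard deviations."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [
        (i, (v - mu) / sigma)
        for i, v in enumerate(series)
        if abs(v - mu) / sigma > z_threshold
    ]


# A series alternating between 0 and 1, with one spike injected at index 40.
series = [0, 1] * 50
series[40] = 10
spikes = point_anomalies(series)
print(spikes)
```

Keep in mind that a large spike also inflates the mean and the standard deviation it is measured against, so a rolling window or a median-based score could be considered when several outliers are expected.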
Similarly to what we have done for the trends and volatility, we have:
And this is how it is performing:

Image made by author
This part is intentionally simple. Here we are not looking for weird points in time, we are looking for weird signals in the bank. What we want to answer is:
Is there any time series whose overall magnitude is significantly larger (or smaller) than what we expect given the rest of the dataset?
To do that, we compress each time series into a single “baseline” number (a typical level), and then we compare those baselines across the whole bank. The comparison will be done in terms of the median and Z score.
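The paragraph above could be sketched as follows. The exact scoring is an assumption: here each baseline is the median of its series, and the baselines are compared with a robust Z-score built from the median and the median absolute deviation (MAD). The function name level_anomalies and the threshold are hypothetical.

```python
from statistics import median


def level_anomalies(bank, z_threshold=3.0):
    """Flag series whose typical level is far from the rest of the bank.

    Returns a list of (series_index, robust_z_score) pairs.
    """
    # One "baseline" number per series.
    baselines = [median(series) for series in bank]
    center = median(baselines)
    mad = median([abs(b - center) for b in baselines])
    if mad == 0:
        return []  # all baselines (nearly) identical: nothing to flag
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # Z-score when the baselines are roughly normally distributed.
    scores = [0.6745 * (b - center) / mad for b in baselines]
    return [(i, z) for i, z in enumerate(scores) if abs(z) > z_threshold]


# Nine series centered on levels 0..8 and one centered on 50.
levels = list(range(9)) + [50]
bank = [[level + (t % 3 - 1) for t in range(99)] for level in levels]
outliers = level_anomalies(bank)
print(outliers)
```

A robust score is used here because, with a plain mean and standard deviation, a single extreme series in a small bank pulls both statistics toward itself and can fall below a threshold of 3.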
This is how we do the dataset-level anomaly:
This is the code to do so:

Ok, it’s time to put it all together. We will use detector.detect_all_anomalies() and we will evaluate anomalies for the whole dataset based on trend, volatility, single-point and dataset-level anomalies. The script to do this is very simple:
The df will give you the anomaly for each time series. This is what it looks like:
If we use the following function we can see that in action:

Image made by author
Pretty impressive, right? We did it. 🙂
Thank you for spending time with us, it means a lot. ❤️ Here’s what we have done together:
In many real projects, a toolbox like the one we built here gets you very far, because:
Keep in mind that the baseline is simple on purpose, and it uses very simple statistics. However, the modularity of the code allows you to easily add complexity by adding new functionality to anomaly_detector_utils.py and anomaly_detector.py.
Thank you again for your time. It means a lot ❤️
My name is Piero Paialunga, and I’m this guy here:

Image made by author
I’m originally from Italy, hold a Ph.D. from the University of Cincinnati, and work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists both here on TDS and on LinkedIn. If you liked the article and want to know more about machine learning and follow my studies, you can:
A. Follow me on Linkedin, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can send me an email at piero.paialunga@hotmail