🌎 Anomaly Detection In Network Services
# Task
At Frontier Communications, one of the more ambitious goals on our team's roadmap for the year was implementing predictive analytics that support teams could operationalize. Our team aimed to harness the enormous datasets at Frontier and develop machine learning models that proactively address customer issues, with a particular focus on our copper network customers. These customers contributed substantially to the bottom line, so detecting anomalous network behavior and reporting it to operations teams could sustain customer satisfaction and reduce churn.
# Implementation
I put myself in the driver's seat and took this initiative from 0 to 1. I scoped out the data requirements, built the ETL pipelines, and delivered a robust anomaly detection platform that could ingest data from millions of customers and flag problematic outliers daily. Model outputs were exposed in an internal data table for network operations teams.
In detail, Frontier works with near-petabyte-scale data, so distributed learning frameworks (e.g., PySpark) were the only viable option for training a performant machine learning model. SynapseML is Microsoft's distributed machine learning library for Apache Spark, and it supports anomaly detection tasks. In combination with other libraries such as MLflow and Hyperopt, the SynapseML package (specifically its isolation forest¹ class) comprised a complete end-to-end solution for training a model on a substantial sample of data.
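Roughly, the training setup looked like the sketch below, which uses SynapseML's `IsolationForest` estimator on a PySpark DataFrame. The toy data, column names, and hyperparameter values are illustrative placeholders rather than the production configuration, and the example assumes a Spark session with SynapseML installed.

```python
import random

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from synapse.ml.isolationforest import IsolationForest

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for per-customer network telemetry (these columns are made up).
rows = [(random.gauss(50.0, 5.0), random.gauss(0.0, 1.0)) for _ in range(500)]
raw_df = spark.createDataFrame(rows, ["sync_rate_mbps", "errored_seconds"])

# Spark ML estimators consume a single vector column, so assemble features first.
assembler = VectorAssembler(
    inputCols=["sync_rate_mbps", "errored_seconds"], outputCol="features"
)
train_df = assembler.transform(raw_df)

# Distributed isolation forest; hyperparameter values are placeholders.
isolation_forest = (
    IsolationForest()
    .setNumEstimators(100)       # trees in the ensemble
    .setMaxSamples(256)          # subsample size used to grow each tree
    .setFeaturesCol("features")
    .setPredictionCol("is_outlier")
    .setScoreCol("anomaly_score")
    .setContamination(0.01)      # assumed fraction of anomalous customers
    .setRandomSeed(42)
)

model = isolation_forest.fit(train_df)
scored = model.transform(train_df)   # adds is_outlier and anomaly_score columns
scored.orderBy("anomaly_score", ascending=False).show(5)
```

The main structural detail is that, as with Spark ML generally, the raw telemetry columns have to be combined into one vector column via `VectorAssembler` before the estimator can consume them.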
The `hyperopt` package provided a means to select the best hyperparameters during training. With unsupervised learning problems, however, deciding which model is "best" can be something of an art. Something new I experimented with was using a quantitative measure of model performance to guide the hyperparameter optimization: the `emmv` package provides functions for computing excess-mass and mass-volume metrics, two criteria designed for evaluating anomaly detectors when no labels are available.
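To keep an example self-contained, the sketch below swaps in scikit-learn's `IsolationForest` for the distributed model, since `emmv`'s documented interface expects a scikit-learn-style estimator; wiring the SynapseML model into the objective requires a thin adapter that is not shown here. The search space, the two-argument `emmv_scores` call, and the choice to maximize excess mass are assumptions for illustration, not the exact production setup.

```python
import numpy as np
from emmv import emmv_scores
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from sklearn.ensemble import IsolationForest

# Toy feature matrix standing in for a sample of customer metrics.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))

# Illustrative search space, not the production ranges.
space = {
    "n_estimators": hp.choice("n_estimators", [50, 100, 200]),
    "max_samples": hp.choice("max_samples", [128, 256, 512]),
    "contamination": hp.uniform("contamination", 0.005, 0.05),
}


def objective(params):
    model = IsolationForest(
        n_estimators=params["n_estimators"],
        max_samples=params["max_samples"],
        contamination=params["contamination"],
        random_state=0,
    ).fit(X)

    # Assumed emmv usage: returns a dict with excess-mass ('em', higher is
    # better) and mass-volume ('mv', lower is better) scores.
    scores = emmv_scores(model, X)

    # hyperopt minimizes the loss, so negate the excess-mass score.
    return {"loss": -scores["em"], "status": STATUS_OK}


trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=25, trials=trials)
print("best hyperparameters:", best)
```

The point of the exercise is that excess mass and mass volume give the search a label-free objective to optimize, filling the role an accuracy-style metric would play in a supervised problem.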
# Outcomes
In accordance with CI/CD principles, anomaly detection models were regularly retrained on recent data to mitigate model drift, with deployments managed through MLflow's model versioning. In operation, the model flags the top outliers each day, ranks them by anomaly score, and writes the results to a dashboard table accessible to internal teams. Backed by a serverless architecture, operations teams can query the dashboard and efficiently triage the most egregious network issues.
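The retrain-and-publish loop can be pictured with a sketch like the one below, which logs a newly trained model to MLflow's registry and has the daily scoring job read the latest registered version, rank outliers, and overwrite the table behind the dashboard. The registered model name, registry URI, table name, and top-100 cut-off are assumptions, and the example presumes the fitted SynapseML model is Spark-MLWritable, which is what `mlflow.spark.log_model` requires.

```python
import mlflow
import mlflow.spark
from pyspark.sql import functions as F

# model: the fitted SynapseML isolation forest from the training sketch above.
# Logging it under a registered name creates a new version in the registry,
# so retraining runs accumulate as versions of one model.
with mlflow.start_run(run_name="isolation_forest_retrain"):
    mlflow.spark.log_model(
        model,
        artifact_path="model",
        registered_model_name="copper_anomaly_detector",
    )

# Daily scoring job: resolve the latest registered version, score the latest
# features (features_df: a DataFrame with an assembled "features" column),
# and publish the highest-ranked outliers to the table behind the dashboard.
latest_model = mlflow.spark.load_model("models:/copper_anomaly_detector/latest")
scored = latest_model.transform(features_df)

top_outliers = (
    scored.filter(F.col("is_outlier") == 1)
    .orderBy(F.col("anomaly_score").desc())
    .limit(100)                                   # illustrative cut-off
)
top_outliers.write.mode("overwrite").saveAsTable("network_ops.daily_anomalies")
```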
Overall, this project represented a significant step forward for Frontier's network analytics capabilities. By leveraging Frontier's gold mine of data, this anomaly detection solution empowered operations teams and enabled a more proactive approach to solving customer network issues.
# Notes
1. Isolation forests are, as you might have guessed, close relatives of random forest models. Each tree recursively partitions a random sample of the feature space until individual data points are isolated. Anomaly scores are computed from the average path length from the root to the isolating leaf across all trees: the shorter that average path, the higher the anomaly score. In other words, anomalies get isolated close to the root because the tree needs only a few partitions to separate them.
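For reference, the note above paraphrases the scoring rule from the original isolation forest paper (Liu, Ting, and Zhou, 2008); individual implementations may differ in small details, but the standard formula for the anomaly score of a point x over a sample of size n is:

```latex
s(x, n) = 2^{-\,\mathbb{E}[h(x)]/c(n)}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
```

Here h(x) is the number of edges from the root to the leaf that isolates x in a single tree, E[h(x)] is its average across the ensemble, and c(n) normalizes by the expected path length of an unsuccessful binary-search-tree lookup; scores close to 1 flag likely anomalies, while scores well below 0.5 indicate normal points.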