SDDM: an interpretable statistical concept drift detection method for data streams

By

Micevska S.

Awad A.

Sakr S.

Machine learning models assume that data is drawn from a stationary distribution. In practice, however, models must make sense of fast-evolving data streams whose content changes over time. This change between the distribution of the training data seen so far and the distribution of newly arriving data is called concept drift. Detecting concept drift is essential to maintaining the accuracy and reliability of online classifiers. Reactive drift detectors monitor the performance of the underlying machine learning model. That is, to detect a drift, feedback on the classifier output has to be given to the drift detector, a scheme known as prequential evaluation. In many real-life scenarios, immediate feedback on classifier output is not possible, so drift detection is delayed and loses its context. Moreover, the drift detector's output is a binary answer: either there is a drift or there is not. However, it is equally important to explain the source of the drift. In this paper, we present the Statistical Drift Detection Method (SDDM), which detects drifts by monitoring changes in the data distribution without the need for feedback on classifier output. Moreover, the detection is quantified and the source of drift is identified. We empirically evaluate our method against the state of the art on both synthetic and real-life data sets. SDDM outperforms other related approaches by producing fewer false positives and false negatives. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
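The general idea of label-free, distribution-based drift detection can be illustrated with a minimal sketch. This is not the SDDM algorithm itself: the abstract does not specify which statistic SDDM uses, so the code below assumes a histogram-based Kullback-Leibler divergence computed per feature between a reference window and a current window. The function names (`histogram`, `kl_divergence`, `feature_drift_scores`, `detect_drift`) and the `threshold` and `bins` parameters are hypothetical and chosen only for illustration.

```python
import math


def histogram(values, lo, hi, bins):
    """Return smoothed bin probabilities for values over [lo, hi]."""
    counts = [1.0] * bins  # add-one smoothing avoids zero probabilities
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_drift_scores(reference, current, bins=10):
    """Per-feature divergence between two windows, each a list of rows."""
    scores = []
    for j in range(len(reference[0])):
        ref_col = [row[j] for row in reference]
        cur_col = [row[j] for row in current]
        lo = min(ref_col + cur_col)
        hi = max(ref_col + cur_col)
        p = histogram(cur_col, lo, hi, bins)
        q = histogram(ref_col, lo, hi, bins)
        scores.append(kl_divergence(p, q))
    return scores


def detect_drift(reference, current, threshold=0.1, bins=10):
    """Flag drift without labels and report which features exceed the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    scores = feature_drift_scores(reference, current, bins)
    drifted = [j for j, s in enumerate(scores) if s > threshold]
    return bool(drifted), drifted, scores
```

Calling `detect_drift(reference_window, current_window)` returns a flag, the indices of the features whose score exceeds the threshold, and the raw scores. The scores quantify the change, and the indices point to its source, without any feedback on classifier output.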