Publications:An adaptive algorithm for anomaly and novelty detection in evolving data streams

From ISLAB/CAISR

Title An adaptive algorithm for anomaly and novelty detection in evolving data streams
Author Mohamed-Rafik Bouguelia and Sławomir Nowaczyk and Amir H. Payberah
Year 2018
PublicationType Journal Paper
Journal Data Mining and Knowledge Discovery
HostPublication
Conference
DOI http://dx.doi.org/10.1007/s10618-018-0571-0
Diva url http://hh.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:1205294
Abstract In the era of big data, considerable research focus is being put on designing efficient algorithms capable of learning and extracting high-level knowledge from ubiquitous data streams in an online fashion. While most existing algorithms assume that data samples are drawn from a stationary distribution, several complex environments deal with data streams that are subject to change over time. Taking this aspect into consideration is an important step towards building truly aware and intelligent systems. In this paper, we propose GNG-A, an adaptive method for incremental unsupervised learning from evolving data streams experiencing various types of change. The proposed method maintains a continuously updated network (graph) of neurons by extending the Growing Neural Gas algorithm with three complementary mechanisms, allowing it to closely track both gradual and sudden changes in the data distribution. First, an adaptation mechanism handles local changes where the distribution is only non-stationary in some regions of the feature space. Second, an adaptive forgetting mechanism identifies and removes neurons that become irrelevant due to the evolving nature of the stream. Finally, a probabilistic evolution mechanism creates new neurons when there is a need to represent data in new regions of the feature space. The proposed method is demonstrated for anomaly and novelty detection in non-stationary environments. Results show that the method handles different data distributions and efficiently reacts to various types of change.
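
To make the three mechanisms named in the abstract more concrete, the following is a rough, heavily simplified Python sketch of a streaming prototype-based novelty detector. It is not the GNG-A algorithm from the paper: it keeps no graph edges, and where the abstract describes adaptive and probabilistic mechanisms, this toy uses fixed values. The class name StreamingPrototypes and the parameters radius, learning_rate, and max_idle are all hypothetical and chosen only for illustration.

```python
import math


class StreamingPrototypes:
    """Toy streaming detector; an illustrative assumption, NOT GNG-A itself."""

    def __init__(self, radius=1.0, learning_rate=0.1, max_idle=200):
        self.radius = radius                # hypothetical fixed novelty threshold
        self.learning_rate = learning_rate  # hypothetical fixed adaptation step
        self.max_idle = max_idle            # hypothetical fixed forgetting horizon
        self.neurons = []                   # each neuron: {"w": [...], "last_win": int}
        self.t = 0                          # number of samples seen so far

    def update(self, x):
        """Process one sample; return True if it was flagged as novel."""
        self.t += 1
        x = list(x)
        novel = True
        if self.neurons:
            # Find the neuron closest to the sample (the "winner").
            winner = min(self.neurons, key=lambda n: math.dist(n["w"], x))
            if math.dist(winner["w"], x) <= self.radius:
                novel = False
                # Adaptation: move the winner a small step towards the sample.
                winner["w"] = [
                    w + self.learning_rate * (xi - w)
                    for w, xi in zip(winner["w"], x)
                ]
                winner["last_win"] = self.t
        if novel:
            # Evolution: represent a new region with a new neuron
            # (deterministic here, unlike the probabilistic rule in the abstract).
            self.neurons.append({"w": x, "last_win": self.t})
        # Forgetting: drop neurons that have not won for more than max_idle steps.
        self.neurons = [
            n for n in self.neurons if self.t - n["last_win"] <= self.max_idle
        ]
        return novel
```

Under these assumptions, feeding samples from one region and then from a distant region would flag the first distant sample as novel, and the neurons of the old region would be dropped once they have been idle for more than max_idle samples.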