Modeling Recurring Concepts in Single-label and Multi-label Streams

Mainz: Univ. 2020 0 pp.

Year of publication: 2020

Publication type: Book (dissertation)

Language: English

DOI/URN: urn:nbn:de:hebis:77-diss-1000032202

Full text via DOI/URN

Verified (library)

Abstract


Today, we have access to a vast amount of data in the form of images, speech signals, structured and unstructured texts, and sensor-based signals. Our digital universe is growing quickly. Statistics indicate that 500 million tweets are posted every day. 65 billion messages are transferred on WhatsApp per day. 294 billion emails are sent daily via different platforms. Each self-driving car creates 4 terabytes of data per day. According to a study by Digital Universe, the amount of data produced by humans and machines will exceed 44 billion terabytes by 2020. This means that there will be 5,200 gigabytes of data for every person on earth. It is estimated that by 2025, the created data will increase to 463 million terabytes per day. Processing these sources of data and extracting knowledge from them requires proper infrastructure and efficient methods that can analyze them in real time. Data stream mining is the field that develops such scalable and efficient methods, which process data incrementally. Incremental induction from a limited set of observations of an unknown distribution has long been the topic of many studies. Depending on the application, the target can be a single label or multiple labels among which some unknown dependencies exist. Although this problem is challenging enough, in many stream mining applications the statistical properties of the input and target variable(s) may change over time in unforeseen ways. This phenomenon is called concept drift.
If drift is not considered and captured properly, trained online models quickly become obsolete. However, these drifts are not well-defined and could comprise any change in the statistical properties of the data, which adds further difficulty to the prediction problem. In this thesis, our overall focus is to model one type of drift, called recurrent concepts. It is important to capture recurrent concepts independently, as most stream mining methods employ a forgetting mechanism in the learning process and discard their outdated extracted knowledge. To this end, we propose the GraphPool and multi-label GraphPool frameworks for single-label and multi-label data streams. These frameworks keep a pool of concepts and their transitions in a first-order Markov chain to quickly recover from drifts in streams with periodic behavior. In the course of designing such a framework for multi-label streams, we develop an efficient algorithm for classifying stationary multi-label streams. To show the effectiveness of our methods, we conduct an extensive set of experiments with both synthetic and real-world data.

Authors


Ahmadi, Zahra (author)

Classification


DDC subject group:
Natural sciences