A systematic review on detection and adaptation of concept drift in streaming data using machine learning techniques

Detecting concept drift: A visual guide.


Abstract

The last decade has demonstrated massive growth in organizational data, which keeps increasing multi-fold as millions of records are updated every second. Handling such vast and continuously growing data is challenging, which in turn opens up many research areas. Data that flows continuously and in real time from various sources is termed streaming data. While deriving valuable statistics from data streams, the variation that occurs in the data distribution is called concept drift. These drifts play a significant role in a variety of disciplines, including data mining, machine learning, ubiquitous knowledge discovery, and quantitative decision theory. As a result, a substantial amount of research has been carried out on methodologies and approaches for dealing with drifts. However, the available material is scattered and lacks guidelines for selecting an effective technique for a particular application. The primary objective of this survey is to present an understanding of concept drift challenges and the allied studies. Further, it assists researchers from diverse domains in accommodating detection and adaptation algorithms for concept drift in their applications. Overall, this study aims to contribute deeper insights into the classification of the various types of drift and the methods for detection and adaptation, along with their key features and limitations. Furthermore, this study highlights the performance metrics used to evaluate concept drift detection methods for streaming data. Finally, the paper outlines the future research scope by highlighting gaps in the existing literature for the development of techniques to handle concept drift.

This article is categorized under: Algorithmic Development > Ensemble Methods; Application Areas > Data Mining Software Tools; Fundamental Concepts of Data and Knowledge > Big Data Mining

The state‐of‐art review of ultra‐precision machining using text mining: Identification of main themes and recommendations for the future direction

The graphical abstract of the study including research procedures of the text mining approach (bottom left) and the text mining/thematic network of ultra-precision machining (bottom right).


Abstract

Ultra-precision machining (UPM), one of the most advanced machining techniques that can produce exact components, significantly impacts the technological community, and its significance attracts the attention of academic and industrial partners. As a result of the rapid development of UPM driven by technological advancement, it is necessary to revisit the current stages and evolution of UPM in order to sustain and advance this technology. This study first investigates the state of the art in UPM systematically by identifying its four current major themes. The UPM thematic network is then built, along with a structural analysis of the network, to determine the interactions between the themes and the primary roles of the theme members responsible for those interactions. Furthermore, the “bridge” role is assigned to specific UPM theme content. In addition, sentiment analysis is conducted to determine how the academic community feels about the UPM themes, so that research can focus on the themes in which more confidence is needed. Considering the above findings, the future perspective of UPM and suggestions for its advancement are discussed. This study provides a comprehensive understanding and a current state-of-the-art review of UPM technology, using a text mining technique to critically analyze its research content, as well as suggestions to enhance UPM development by focusing on its current challenges, thereby assisting academia and institutions in leveraging this technology to benefit society.

This article is categorized under: Algorithmic Development > Text Mining; Application Areas > Science and Technology; Application Areas > Industry Specific Applications

Bias in human data: A feedback from social sciences

The human-machine cultivation cycle.


Abstract

The fairness of human-related software has become critical with its widespread use in our daily lives, where life-changing decisions are made. However, the use of these systems has produced many erroneous results, and technologies have started to be developed to tackle them. As a solution, companies generally focus on algorithm-oriented errors, but the solutions employed usually work only for some algorithms, because the cause of the problem is not just the algorithm; it is also the data itself. For instance, deep learning cannot easily establish cause–effect relationships. In addition, the boundaries between statistical and heuristic algorithms are unclear, and an algorithm's fairness may vary depending on the data related to its context. From this point of view, our article focuses on how the data should be, which is not merely a matter of statistics. In this direction, the picture in question is revealed through a scenario specific to “vulnerable and disadvantaged” groups, one of the most fundamental problems today. With the joint contribution of computer science and the social sciences, this study aims to predict the possible social dangers that may arise from artificial intelligence algorithms, using the clues obtained here. To highlight the potential social and mass problems caused by data, Gerbner's “cultivation theory” is reinterpreted. To this end, we conduct an experimental evaluation on popular algorithms and their data sets, such as Word2Vec, GloVe, and ELMo. The article stresses the importance of a holistic approach combining the algorithm, the data, and an interdisciplinary assessment.

This article is categorized under: Algorithmic Development > Statistics
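The kind of data-borne bias the article probes in embeddings such as Word2Vec and GloVe is often quantified with WEAT-style association scores over cosine similarities. Below is a minimal sketch of that measurement; the two-dimensional vectors and word lists are toy assumptions for illustration, not trained embeddings or the article's actual experiment.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def association(word_vec, attr_a, attr_b):
    """WEAT-style association: positive means the word sits closer to
    attribute set A than to attribute set B in the embedding space."""
    sim_a = sum(cosine(word_vec, a) for a in attr_a) / len(attr_a)
    sim_b = sum(cosine(word_vec, b) for b in attr_b) / len(attr_b)
    return sim_a - sim_b

# Toy 2-d "embeddings" (illustrative values, not real Word2Vec/GloVe output).
vecs = {
    "career": [0.9, 0.1], "salary": [0.8, 0.2],   # attribute set A
    "home":   [0.1, 0.9], "family": [0.2, 0.8],   # attribute set B
    "engineer": [0.7, 0.3],
}
score = association(vecs["engineer"],
                    [vecs["career"], vecs["salary"]],
                    [vecs["home"], vecs["family"]])
```

A positive score here simply reflects how the toy vectors were laid out; on real embeddings, systematic nonzero scores for socially sensitive word pairs are the data-level bias signal the article is concerned with.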

Unsupervised EHR‐based phenotyping via matrix and tensor decompositions

Computational phenotyping based on low-rank approximations makes it possible to transform electronic health records (EHR) into clinically relevant and interpretable concepts.


Abstract

Computational phenotyping allows for unsupervised discovery of subgroups of patients, as well as the corresponding co-occurring medical conditions, from electronic health records (EHR). Typically, EHR data contain demographic information, diagnoses, and laboratory results. Discovering (novel) phenotypes has the potential to be of prognostic and therapeutic value. Providing medical practitioners with transparent and interpretable results is an important requirement and an essential part of advancing precision medicine. Low-rank data approximation methods such as matrix (e.g., nonnegative matrix factorization) and tensor decompositions (e.g., CANDECOMP/PARAFAC) have demonstrated that they can provide such transparent and interpretable insights. Recent developments have adapted low-rank data approximation methods by incorporating different constraints and regularizations that further facilitate interpretability. In addition, they offer solutions for common challenges within EHR data, such as high dimensionality, data sparsity, and incompleteness. Extracting temporal phenotypes from longitudinal EHR, in particular, has received much attention in recent years. In this paper, we provide a comprehensive review of low-rank approximation-based approaches for computational phenotyping. The existing literature is categorized into temporal versus static phenotyping approaches based on matrix versus tensor decompositions. Furthermore, we outline different approaches for the validation of phenotypes, that is, the assessment of their clinical significance.

This article is categorized under: Algorithmic Development > Structure Discovery; Fundamental Concepts of Data and Knowledge > Explainable AI; Technologies > Machine Learning
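To illustrate the matrix side of the decompositions reviewed above, here is a minimal nonnegative matrix factorization (NMF) via the classic multiplicative updates, applied to a toy patient-by-feature count matrix. The matrix, rank, and iteration count are illustrative assumptions; real phenotyping pipelines add the constraints and regularizations the review discusses.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V ~ W @ H with multiplicative updates
    (Lee & Seung). Rows of H can be read as candidate 'phenotypes';
    rows of W give each patient's loading on them."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update phenotype factors
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update patient loadings
    return W, H

# Illustrative counts for four patients over four features, with two
# latent groups (features 0-1 vs. features 2-3).
V = np.array([[5, 4, 0, 0],
              [4, 5, 1, 0],
              [0, 0, 4, 5],
              [1, 0, 5, 4]], dtype=float)
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H)
```

Because both factors stay elementwise nonnegative, each phenotype is an additive combination of features, which is the interpretability property that motivates using NMF over unconstrained factorizations in this setting.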

Interpretable and explainable machine learning: A methods‐centric overview with concrete examples

Interpretability and explainability are essential principles of machine learning model and method design and development for applications in medicine, economics, law, and the natural sciences. Over the last 30 years, many techniques motivated by these properties have been developed. This review is intended for a general machine learning audience interested in exploring the challenges of interpretation and explanation beyond logistic regression or random forest variable importance. We will examine the inductive biases behind interpretable and explainable machine learning and illustrate them with concrete examples from the literature.


Abstract

Interpretability and explainability are crucial for machine learning (ML) and statistical applications in medicine, economics, law, and natural sciences and form an essential principle for ML model design and development. Although interpretability and explainability have escaped a precise and universal definition, many models and techniques motivated by these properties have been developed over the last 30 years, with the focus currently shifting toward deep learning. We will consider concrete examples of the state of the art, including specially tailored rule-based, sparse, and additive classification models, interpretable representation learning, and methods for explaining black-box models post hoc. The discussion will emphasize the need for and relevance of interpretability and explainability, the divide between them, and the inductive biases behind the presented “zoo” of interpretable models and explanation methods.

This article is categorized under: Fundamental Concepts of Data and Knowledge > Explainable AI; Technologies > Machine Learning; Commercial, Legal, and Ethical Issues > Social Considerations
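As a concrete instance of post hoc, model-agnostic explanation of a black-box model, here is a minimal permutation-importance sketch: shuffle one feature at a time and measure the drop in accuracy. The toy "black box" and data are assumptions for illustration; the review itself covers a much broader zoo of methods.

```python
import random

def accuracy(model, X, y):
    """Fraction of samples the model classifies correctly."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Post hoc and model-agnostic: for each feature, shuffle its column
    and record the average drop in accuracy relative to the baseline."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            Xp = [x[:j] + [c] + x[j + 1:] for x, c in zip(X, col)]
            drops.append(base - accuracy(model, Xp, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy "black box": the prediction depends only on feature 0.
model = lambda x: int(x[0] > 0.5)
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(x[0] > 0.5) for x in X]
imp = permutation_importance(model, X, y)
```

Since the toy model ignores feature 1, its importance comes out exactly zero, while shuffling feature 0 destroys roughly half of the accuracy: the explanation correctly recovers which input the black box actually uses.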

A survey of online video advertising

Online video advertising is an important research topic in both the social and computer sciences. Social science studies focus on which factors influence users' attitudes toward ads and brands and their intention to purchase. Computer science studies aim to place ads into videos conveniently and effectively.


Abstract

With the development of social media and the ubiquity of the Internet, recent years have witnessed the rapid development of online video advertising among publishers and advertisers. Video advertising, as a new type of advertisement, has gained significant research attention from both academia and industry, coinciding with the ever-growing volume of online videos. In this research, we provide a comprehensive survey of online video advertising in the fields of social science and computer science. We investigate state-of-the-art articles from 1990 to the present and provide a new taxonomy of extant research topics based on these articles. We also highlight the factors that cause advertising to affect people and the most popular video advertising techniques used in computer science. Finally, on the basis of the analytics of the surveyed papers, future challenges are identified and potential solutions to these are discussed.

This article is categorized under: Algorithmic Development > Multimedia; Application Areas > Internet and Web-Based Applications

Open source intelligence extraction for terrorism‐related information: A review

Overview of OSINT extraction process from terrorism-related textual information in three phases, that is, (i) Data Acquisition, (ii) Data Enrichment, and (iii) Knowledge Inference.


Abstract

In this contemporary era, where a large part of the world's population is deluged by extensive use of the internet and social media, terrorists have found in them a potential opportunity to execute their vicious plans. They have a befitting medium to reach out to their targets to spread propaganda, disseminate training content, operate virtually, and further their goals. To restrain such activities, information over the internet in the context of terrorism needs to be analyzed and channeled into appropriate counter-terrorism measures. Open Source Intelligence (OSINT), an emerging discipline that leverages publicly accessible sources of information over the internet and effectively utilizes them to extract intelligence, offers a felicitous solution to this problem. The process of OSINT extraction is broadly observed to comprise three phases: (i) Data Acquisition, (ii) Data Enrichment, and (iii) Knowledge Inference. In the context of terrorism, researchers have made notable contributions in each of these three phases. However, a comprehensive review that delineates these research contributions into an integrated workflow of intelligence extraction has not been found. This paper presents the most current review in OSINT, reflecting how various state-of-the-art tools and techniques can be applied to extracting terrorism-related textual information from publicly accessible sources. Various data mining and text analysis-based techniques, that is, natural language processing, machine learning, and deep learning, have been reviewed for extracting and evaluating textual data. Additionally, towards the end of the paper, we discuss the challenges and gaps observed in the different phases of OSINT extraction.

This article is categorized under: Application Areas > Government and Public Sector; Commercial, Legal, and Ethical Issues > Social Considerations; Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining

On the application of machine learning in astronomy and astrophysics: A text‐mining‐based scientometric analysis

This work presents a text-mining-based scientometric analysis of the scientific output in the last three decades regarding the use of artificial intelligence and machine learning in the fields of astronomy and astrophysics.


Abstract

Since the beginning of the 21st century, the fields of astronomy and astrophysics have experienced significant growth at observational and computational levels, leading to the acquisition of increasingly huge volumes of data. In order to process this vast quantity of information, artificial intelligence (AI) techniques are being combined with data mining to detect patterns with the aim of modeling, classifying, or predicting the behavior of certain astronomical phenomena or objects. Parallel to the exponential development of the aforementioned techniques, the scientific output related to the application of AI and machine learning (ML) in astronomy and astrophysics has also experienced considerable growth in recent years. Therefore, the increasingly abundant articles make it difficult to monitor this field in terms of which research topics are the most prolific or novel, or which countries or authors are leading them. In this article, a text-mining-based scientometric analysis of scientific documents published over the last three decades on the application of AI and ML in the fields of astronomy and astrophysics is presented. The VOSviewer software and data from the Web of Science (WoS) are used to elucidate the evolution of publications in this research field, their distribution by country (including co-authorship), the most relevant topics addressed, and the most cited elements and most significant co-citations according to publication source and authorship. The obtained results demonstrate how the application of AI/ML to the fields of astronomy/astrophysics represents an established and rapidly growing field of research that is crucial to obtaining a scientific understanding of the universe.

This article is categorized under: Algorithmic Development > Text Mining; Technologies > Machine Learning; Application Areas > Science and Technology
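The keyword maps produced by tools like VOSviewer are built on keyword co-occurrence counts. A minimal sketch of that raw quantity is below; the paper keyword lists are invented for illustration, not actual Web of Science records.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(papers):
    """Count how often each pair of keywords appears together in a paper;
    these pair counts are the raw input of keyword co-occurrence maps."""
    pairs = Counter()
    for keywords in papers:
        # Sort so each unordered pair is counted under one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

papers = [  # illustrative keyword lists, not real bibliographic data
    ["machine learning", "galaxy classification", "deep learning"],
    ["machine learning", "exoplanets"],
    ["deep learning", "galaxy classification"],
]
pairs = cooccurrence(papers)
```

Scientometric software then normalizes these counts (e.g., by keyword frequency) and lays the keywords out so that strongly co-occurring terms cluster together, which is how the thematic maps described in the abstract arise.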

A review on data fusion in multimodal learning analytics and educational data mining

General multimodal data fusion approach for EDM/LA.


Abstract

New educational models such as smart learning environments make use of digital and context-aware devices to facilitate the learning process. In this new educational scenario, a huge quantity of multimodal student data from a variety of different sources can be captured, fused, and analyzed. This offers researchers and educators a unique opportunity to discover new knowledge, better understand the learning process, and intervene if necessary. However, it is necessary to correctly apply data fusion approaches and techniques in order to combine the various sources of multimodal learning analytics (MLA). These sources or modalities in MLA include audio, video, electrodermal activity data, eye tracking, user logs, and click-stream data, but also learning artifacts and more natural human signals such as gestures, gaze, speech, or writing. This survey introduces data fusion in learning analytics (LA) and educational data mining (EDM) and how these data fusion techniques have been applied in smart learning. It shows the current state of the art by reviewing the main publications, the main types of fused educational data, and the data fusion approaches and techniques used in EDM/LA, as well as the main open problems, trends, and challenges in this specific research area.

This article is categorized under: Application Areas > Education and Learning
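The simplest of the fusion approaches such reviews cover is feature-level ("early") fusion: normalize each modality on its own scale, then concatenate the per-student feature vectors. Below is a minimal sketch; the two modalities and their values are made-up assumptions for illustration.

```python
def zscore_columns(rows):
    """Normalize each feature column to zero mean and unit variance, so
    modalities on very different scales become comparable."""
    normed = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
        normed.append([(v - mean) / std for v in col])
    return [list(row) for row in zip(*normed)]

def early_fusion(*modalities):
    """Feature-level ('early') fusion: normalize each modality separately,
    then concatenate each student's feature vectors across modalities."""
    normalized = [zscore_columns(m) for m in modalities]
    return [sum((mod[i] for mod in normalized), [])
            for i in range(len(normalized[0]))]

# Illustrative data for three students: eye-tracking (2 features) and
# interaction logs (3 features); all values are invented.
gaze = [[220.0, 0.4], [180.0, 0.7], [260.0, 0.5]]
logs = [[12.0, 3.0, 0.1], [25.0, 1.0, 0.4], [18.0, 2.0, 0.2]]
fused = early_fusion(gaze, logs)
```

The fused vectors can then feed any single downstream EDM/LA model; late-fusion alternatives instead train one model per modality and combine their predictions.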

A review of bus arrival time prediction using artificial intelligence

Graphical representation of the review on bus arrival time prediction using AI.


Abstract

Buses are an important part of public transport systems, and providing accurate information about bus arrival and departure times at bus stops is one of the main parameters of good-quality public transport. Accurate arrival and departure time information is important for a public transport mode since it enhances both ridership and traveler satisfaction; with such information, travelers can make informed decisions about their journeys. The application of artificial intelligence (AI) based methods and algorithms to predict the bus arrival time (BAT) is reviewed in detail. A systematic survey of the existing research applying the different branches of AI has been conducted, and the prediction models have been segregated and accumulated under the respective branches of AI. A thorough discussion is presented to elaborate on the different branches of AI that have been applied to several aspects of BAT prediction. Research gaps and possible future directions for further research are summarized.

This article is categorized under: Application Areas > Science and Technology; Technologies > Artificial Intelligence; Technologies > Prediction
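One of the simplest instance-based baselines for BAT prediction is k-nearest-neighbors over historical trips: estimate a segment's travel time from the trips whose departure time was most similar. The sketch below is such a baseline under invented data, not a method from the review itself.

```python
def knn_predict_travel_time(history, hour, k=3):
    """Predict segment travel time as the mean over the k historical trips
    whose departure hour is closest to the query hour (a minimal
    instance-based baseline for bus arrival time prediction)."""
    nearest = sorted(history, key=lambda trip: abs(trip[0] - hour))[:k]
    return sum(minutes for _, minutes in nearest) / len(nearest)

# Illustrative (departure_hour, travel_minutes) records for one segment:
# slow in the rush hours, faster at midday.
history = [(7, 18.0), (8, 22.0), (9, 20.0),
           (13, 12.0), (14, 11.0), (18, 21.0)]
morning_eta = knn_predict_travel_time(history, hour=8)
midday_eta = knn_predict_travel_time(history, hour=13)
```

Even this crude predictor reproduces the rush-hour/midday contrast in the toy data; the AI branches surveyed in the review (neural networks, SVMs, hybrid models) improve on such baselines by incorporating features like GPS traces, weather, and dwell times.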