Knowledge graph-driven data processing for business intelligence

An occupational safety ontology built from news articles published by OSHA provides a comprehensive view of incidents, causes, regions, and so on.


Abstract

With the proliferation of Big Data, organizational decision making has become more complex. Business Intelligence (BI) is no longer restricted to querying marketing and sales data. It now involves linking data from disparate applications and churning through large volumes of unstructured data such as emails, call logs, social media, and news, in an attempt to derive insights that provide actionable intelligence and better inputs for future strategy making. Semantic technologies like knowledge graphs have proved to be useful tools for linking disparate data sources intelligently, and they enable reasoning through the complex networks created by this linking. Over the last decade, the processes of creating, storing, and maintaining knowledge graphs have sufficiently matured, and knowledge graphs are now making inroads into business decision making. Very recently, these graphs have also been seen as a potential way to reduce hallucination in large language models, by incorporating them during pre-training as well as during output generation. A number of challenges remain, including building and maintaining the graphs and reasoning over missing links. While these are open research problems, we present in this article a survey of how knowledge graphs are currently used for deriving business intelligence, with use cases from various domains.
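
To make the linking-and-reasoning idea concrete, the following Python sketch stores facts as subject-predicate-object triples and answers a simple two-hop "BI" query; the incident entities, relations, and helper functions are hypothetical illustrations, not taken from the article.

```python
# Minimal knowledge-graph sketch: facts as (subject, predicate, object) triples.
# All entities and relations below are hypothetical, for illustration only.
triples = [
    ("incident_42", "occurred_in", "region_north"),
    ("incident_42", "caused_by", "equipment_failure"),
    ("incident_42", "reported_by", "news_article_7"),
    ("incident_77", "occurred_in", "region_north"),
    ("incident_77", "caused_by", "missing_guardrail"),
    ("equipment_failure", "is_a", "mechanical_cause"),
]

def objects(subject, predicate):
    """All objects linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def subjects(predicate, obj):
    """All subjects linked to `obj` via `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}

# A simple BI-style query that reasons over the links: the causes of all
# incidents reported in a given region (two hops through the graph).
region_incidents = subjects("occurred_in", "region_north")
causes = {c for i in region_incidents for c in objects(i, "caused_by")}
print(causes)  # {'equipment_failure', 'missing_guardrail'}
```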

This article is categorized under: Algorithmic Development > Text Mining; Application Areas > Business and Industry

A survey of episode mining

The search space of frequent episodes (subsequences of events that appear frequently in an event sequence).


Abstract

Episode mining is a research area in data mining, where the aim is to discover interesting episodes, that is, subsequences of events, in an event sequence. The most popular episode-mining task is frequent episode mining (FEM), which consists of identifying episodes that appear frequently in an event sequence, but this task has also been extended in various ways. It has been shown that episode mining can reveal insightful patterns for numerous applications such as web stream analysis, network fault management, and cybersecurity, and that episodes can be useful for prediction. Episode mining is an active research area, and there have been numerous advances over the last 25 years. However, due to the rapid evolution of the pattern mining field, there is no prior study that summarizes and gives a detailed overview of episode mining. The contribution of this article is to fill that gap with an up-to-date survey that provides an introduction to episode mining and an overview of recent developments and research opportunities. This advanced review first introduces the field of episode mining and the first algorithms. Then, the main concepts used in these algorithms are explained. After that, several recent studies are reviewed that have addressed some limitations of these algorithms and proposed novel solutions to overcome them. Finally, the paper lists some possible extensions of the existing frameworks to mine more meaningful patterns and presents some possible orientations for future work that may contribute to the evolution of the episode mining field.
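
To give a concrete feel for the FEM task, the following Python sketch computes WINEPI-style window support for short serial episodes over a toy event sequence; the sequence, window width, support threshold, and exhaustive candidate enumeration are simplifications for illustration, not how real episode miners are implemented.

```python
# WINEPI-style frequent serial episode mining on a toy event sequence.
sequence = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B"),
            (6, "A"), (7, "B"), (8, "C")]  # (timestamp, event) pairs

def occurs_in(window, episode):
    """True if `episode` occurs, in order, as a subsequence of `window`."""
    it = iter(window)
    return all(e in it for e in episode)   # `in` advances the iterator

def frequent_serial_episodes(sequence, width, minsup):
    """Support = fraction of sliding time windows of the given width that
    contain the episode as an ordered subsequence (as in WINEPI)."""
    times = [t for t, _ in sequence]
    windows = [[e for t, e in sequence if start <= t < start + width]
               for start in range(times[0], times[-1] - width + 2)]
    alphabet = sorted({e for _, e in sequence})
    # Exhaustive candidates of length 1 and 2, feasible only for toy data.
    candidates = [(a,) for a in alphabet] + \
                 [(a, b) for a in alphabet for b in alphabet]
    return {ep: sup for ep in candidates
            if (sup := sum(occurs_in(w, ep) for w in windows) / len(windows))
            >= minsup}

print(frequent_serial_episodes(sequence, width=3, minsup=0.5))
```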

This article is categorized under: Algorithmic Development > Spatial and Temporal Data Mining; Algorithmic Development > Association Rules; Technologies > Association Rules

Multispectral data mining: A focus on remote sensing satellite images

(Left) The multispectral image of Kerala, India, captured by Sentinel-2, shows the aftermath of the devastating floods of August 2018. Data mining of such an image leads to knowledge discovery, which is critical for damage estimation, risk assessment, and so on. Data mining in this scenario involves processes such as segmentation and change detection from multi-source and/or time-series images. These processes lead to knowledge of the flood extent. (Right) Our article reviews all such processes in the context of the entire data science workflow for multispectral images from satellite sensors. (Image generated by authors. (Left) Satellite images and data story courtesy: https://earthobservatory.nasa.gov/images/92669/before-and-after-the-kerala-floods)


Abstract

This article gives a brief overview of various aspects of data mining of multispectral image data. We focus specifically on remote sensing satellite images acquired using multispectral imaging (MSI), given that the technology is used, with considerable variation, across multiple knowledge domains such as chemistry, medical imaging, and remote sensing. In this article, the different data mining processes are reviewed along with state-of-the-art methods and applications. To study data mining, it is important to know how the data are acquired and preprocessed; hence, those topics are briefly covered as well. The article concludes with applications demonstrating the knowledge discovery enabled by data mining, modern challenges, and promising future directions for MSI data mining research.
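
As a toy illustration of the change-detection process mentioned above, here is a minimal numpy sketch that thresholds the difference of a standard band ratio (NDVI) between two acquisitions; the synthetic band values, image size, and threshold are placeholders, and the article covers far richer methods.

```python
import numpy as np

# Synthetic stand-ins for two co-registered multispectral acquisitions
# (before/after an event); real data would come from, e.g., Sentinel-2.
rng = np.random.default_rng(0)
red_before, nir_before = rng.random((2, 64, 64))
red_after, nir_after = rng.random((2, 64, 64))

def ndvi(red, nir, eps=1e-6):
    """Normalized Difference Vegetation Index, a standard band ratio."""
    return (nir - red) / (nir + red + eps)

# Simple change detection: threshold the per-pixel NDVI difference.
delta = ndvi(red_after, nir_after) - ndvi(red_before, nir_before)
change_mask = np.abs(delta) > 0.4  # threshold is arbitrary here

print(f"{change_mask.mean():.1%} of pixels flagged as changed")
```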

This article is categorized under: Application Areas > Science and Technology; Fundamental Concepts of Data and Knowledge > Knowledge Representation; Fundamental Concepts of Data and Knowledge > Big Data Mining

Deepfake detection using deep learning methods: A systematic and comprehensive review

The suggested DL-based deepfake detection taxonomy separates four distinct methods.


Abstract

Deep Learning (DL) has been effectively utilized for various complicated challenges in healthcare, industry, and academia, including thyroid diagnosis, lung nodule recognition, computer vision, big data analytics, and human-level control. Nevertheless, developments in digital technology have also been used to produce software that poses a threat to democracy, national security, and confidentiality. Deepfake is one such DL-powered application that has recently surfaced. Deepfake systems can create fake images, videos, and sounds, primarily by replacing scenes or faces, that humans cannot tell apart from real ones. Various technologies have brought the capacity to alter synthetic speech, images, or video to our fingertips. Furthermore, video and image frauds are now so convincing that it is hard to distinguish between false and authentic content with the naked eye. This can result in problems ranging from deceiving public opinion to using doctored evidence in court. For such reasons, it is critical to have technologies that can assist us in discerning reality. This study gives a complete assessment of the literature on deepfake detection strategies using DL-based algorithms. We categorize deepfake detection methods based on their applications: video detection, image detection, audio detection, and hybrid multimedia detection. The objective of this paper is to give the reader a better knowledge of (1) how deepfakes are generated and identified, (2) the latest developments and breakthroughs in this realm, (3) the weaknesses of existing security methods, and (4) areas requiring further investigation and consideration. The results suggest that the convolutional neural network (CNN) is the DL method most often employed in publications. The majority of the articles address video deepfake detection, and most focus on enhancing only one parameter, with accuracy receiving the most attention.
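
As a minimal sketch of the CNN-based detectors that dominate this literature, the following PyTorch snippet wires up a tiny binary real-vs-fake image classifier; the architecture, input size, and random stand-in data are illustrative assumptions, not a model from any surveyed paper.

```python
import torch
from torch import nn

# Minimal CNN sketch for binary real-vs-fake image classification.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # 64x64 -> 32x32
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 1),           # single logit: fake vs. real
)

images = torch.randn(8, 3, 64, 64)        # stand-in for face crops
labels = torch.randint(0, 2, (8, 1)).float()

loss = nn.BCEWithLogitsLoss()(model(images), labels)
loss.backward()                           # one illustrative training step
print(f"loss: {loss.item():.3f}")
```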

This article is categorized under: Technologies > Machine Learning; Algorithmic Development > Multimedia; Application Areas > Science and Technology

Deep learning models for price forecasting of financial time series: A review of recent advancements: 2020–2022

A review of recent research on the application of deep learning models to price forecast of financial time series, with information on model architectures, applications, advantages and disadvantages, and directions for future research.


Abstract

Accurately predicting the prices of financial time series is essential and challenging for the financial sector. Owing to recent advancements in deep learning techniques, deep learning models are gradually replacing traditional statistical and machine learning models as the first choice for price forecasting tasks. This shift in model selection has led to a notable rise in research related to applying deep learning models to price forecasting, resulting in a rapid accumulation of new knowledge. Therefore, we conducted a literature review of relevant studies over the past 3 years with a view to aiding researchers and practitioners in the field. This review delves deeply into deep learning-based forecasting models, presenting information on model architectures, practical applications, and their respective advantages and disadvantages. In particular, detailed information is provided on advanced models for price forecasting, such as Transformers, generative adversarial networks (GANs), graph neural networks (GNNs), and deep quantum neural networks (DQNNs). The present contribution also includes potential directions for future research, such as examining the effectiveness of deep learning models with complex structures for price forecasting, extending from point prediction to interval prediction using deep learning models, scrutinizing the reliability and validity of decomposition ensembles, and exploring the influence of data volume on model performance.
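
To make the forecasting setup concrete, here is a minimal PyTorch sketch of a basic recurrent point forecaster, simpler than the advanced architectures highlighted above, trained on sliding windows of a synthetic price series; the series, window length, and hyperparameters are all illustrative assumptions.

```python
import torch
from torch import nn

# Sliding-window next-price prediction on a synthetic random-walk series.
torch.manual_seed(0)
prices = torch.cumsum(torch.randn(200), dim=0)  # stand-in "prices"

window = 20
X = torch.stack([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:].unsqueeze(1)                # next-step targets

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, window)
        out, _ = self.lstm(x.unsqueeze(-1))     # (batch, window, hidden)
        return self.head(out[:, -1])            # last hidden state -> price

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(50):                             # tiny training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print(f"final MSE: {loss.item():.3f}")
```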

This article is categorized under: Technologies > Prediction; Technologies > Artificial Intelligence

A review on client selection models in federated learning

Basic federated learning architecture.


Abstract

Federated learning (FL) is a decentralized machine learning (ML) technique that enables multiple clients to collaboratively train a common ML model without having to share their raw data with each other. A typical FL process involves (1) FL client(s) selection, (2) global model distribution, (3) local training, and (4) aggregation. FL clients are typically heterogeneous edge devices (e.g., mobile phones) that differ in terms of computational resources, training data quality, and data distribution. Therefore, FL client(s) selection has a significant influence on the execution of the remaining steps of an FL process. A variety of FL client(s) selection models have been proposed in the literature; however, their critical review and/or comparative analysis is much less discussed. This paper brings the scattered FL client(s) selection models onto a single platform by first categorizing them into five categories, and then providing a detailed analysis of the benefits and shortcomings of these models and their applicability to different FL scenarios. Such understanding can help researchers in academia and industry to develop improved FL client(s) selection models that address the challenges and shortcomings of the current ones. Finally, future research directions in the area of FL client(s) selection are also discussed.
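
The four-step round described above is easy to simulate; the numpy sketch below runs one round with uniformly random client selection and FedAvg aggregation, where the clients, data sizes, "local training" noise, and learning rate are all synthetic placeholders (the surveyed models would replace the selection line with smarter strategies).

```python
import numpy as np

# Toy simulation of one FL round following steps (1)-(4) above.
rng = np.random.default_rng(42)
num_clients, dim = 10, 5
client_sizes = rng.integers(50, 500, size=num_clients)  # data heterogeneity

global_model = np.zeros(dim)

# (1) Client selection -- here uniformly at random; client-selection
# models replace this with resource-, data-, or fairness-aware strategies.
selected = rng.choice(num_clients, size=3, replace=False)

updates, weights = [], []
for c in selected:
    # (2) Global model distribution, then (3) local training, mocked
    # here as a noisy local gradient step on the received model.
    local_model = global_model - 0.1 * rng.normal(size=dim)
    updates.append(local_model)
    weights.append(client_sizes[c])

# (4) Aggregation: FedAvg, weighting each update by local dataset size.
weights = np.array(weights) / np.sum(weights)
global_model = np.average(updates, axis=0, weights=weights)
print("selected clients:", selected)
print("new global model:", global_model)
```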

This article is categorized under: Technologies > Machine Learning; Technologies > Artificial Intelligence

A survey on artificial intelligence in pulmonary imaging

Human chest CT-based airway tree segmentation using deep learning and multi-parametric iterative thresholding.


Abstract

Over the last decade, deep learning (DL) has contributed to a paradigm shift in computer vision and image recognition, creating widespread opportunities for using artificial intelligence in research as well as industrial applications. DL has been extensively studied in medical imaging applications, including those related to pulmonary diseases. Chronic obstructive pulmonary disease, asthma, lung cancer, pneumonia, and, more recently, COVID-19 are common lung diseases affecting nearly 7.4% of the world population. Pulmonary imaging has been widely investigated toward improving our understanding of disease etiologies, early diagnosis, and assessment of disease progression and clinical outcomes. DL has been broadly applied to solve various pulmonary image processing challenges, including classification, recognition, registration, and segmentation. This article presents a survey of pulmonary diseases, the roles of imaging in translational and clinical pulmonary research, and applications of different DL architectures and methods in pulmonary imaging, with emphasis on DL-based segmentation of major pulmonary anatomies such as lung volumes, lung lobes, pulmonary vessels, and airways, as well as thoracic musculoskeletal anatomies related to pulmonary diseases.
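
The figure caption pairs deep learning with iterative thresholding; as a hedged illustration of the classical side only, here is an ISODATA-style iterative threshold on synthetic Hounsfield-unit values. The intensity distributions and convergence tolerance are made up, and this single-parameter baseline is far simpler than the multi-parametric method referenced.

```python
import numpy as np

# Classical iterative (ISODATA-style) thresholding on a synthetic "CT
# slice": not the multi-parametric method or the DL segmentation above.
rng = np.random.default_rng(1)
airway = rng.normal(-900, 40, size=2000)   # air-like HU values
tissue = rng.normal(-100, 60, size=8000)   # soft-tissue-like HU values
slice_hu = np.concatenate([airway, tissue])

t = slice_hu.mean()                        # initial threshold guess
for _ in range(100):
    lo, hi = slice_hu[slice_hu <= t], slice_hu[slice_hu > t]
    t_new = (lo.mean() + hi.mean()) / 2    # midpoint of class means
    if abs(t_new - t) < 0.5:               # converged
        break
    t = t_new

mask = slice_hu <= t                       # airway/lung candidate voxels
print(f"threshold = {t:.0f} HU, {mask.mean():.1%} of voxels below it")
```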

This article is categorized under: Application Areas > Health Care; Technologies > Artificial Intelligence; Technologies > Computational Intelligence; Application Areas > Science and Technology

Sentiment analysis using fuzzy logic: A comprehensive literature review

A comprehensive review on sentiment analysis using fuzzy logic.


Abstract

Sentiment analysis (SA) is the task of understanding and comprehending humans' views, beliefs, attitudes, or opinions toward a particular entity. Advancements in e-commerce platforms have led to an abundance of real-time, free-form opinions floating on social media platforms. These real-world data are imprecise and vague; hence, fuzzy logic is required to deal with such subjective data. Since opinions can be fuzzy in nature and opinion words can be interpreted differently, fuzzy logic has proven to be an effective method for capturing the expression of opinions. The study presents an elaborate review of around 170 published research works on SA using fuzzy logic. The primary emphasis is on text-based SA, audio-based SA, and SA based on fused text-audio features. This article discusses various novel ways of classifying fuzzy logic-based SA research articles, which has not been accomplished by any other review article to date. The article puts forward the importance of SA tasks and identifies how fuzzy logic adds to this importance. Finally, the article outlines a taxonomy for sentiment classification based on the technique (supervised or unsupervised) used in the SA models and comprehensively reviews the SA approaches specific to their task. Prominently, this study groups fuzzy-based SA approaches into five classes: (a) sentiment cognition from words using fuzzy logic, (b) sentiment cognition from phrases using fuzzy logic, (c) fuzzy rule-based SA, (d) neuro-fuzzy network-based SA, and (e) fuzzy emotion recognition.
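
As a toy illustration of how fuzzy logic grades sentiment rather than forcing a hard label, the Python sketch below fuzzifies word-level polarity scores with triangular membership functions; the lexicon scores and membership parameters are invented for the example and do not come from any surveyed system.

```python
# Fuzzy word-level sentiment sketch: a polarity score in [-1, 1] gets
# graded memberships in three fuzzy sets instead of one hard label.
def triangular(x, a, b, c):
    """Triangular membership: rises from a, peaks at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(score):
    return {
        "negative": triangular(score, -1.5, -1.0, 0.0),
        "neutral":  triangular(score, -1.0,  0.0, 1.0),
        "positive": triangular(score,  0.0,  1.0, 1.5),
    }

lexicon = {"good": 0.6, "awful": -0.9, "okay": 0.1}  # hypothetical scores
for word, score in lexicon.items():
    memberships = {k: round(v, 2) for k, v in fuzzify(score).items()}
    print(word, memberships)  # e.g., "good" is 0.6 positive, 0.4 neutral
```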

This article is categorized under: Algorithmic Development > Text Mining; Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining

The benefits and dangers of using machine learning to support making legal predictions

The prediction of the potential outcome of a legal dispute is an important aspect of the provision of legal advice. Unfortunately, legal data are far more open to bias and misinterpretation than data in medicine or the natural and physical sciences. Explanation and governance are vital in legal domains and are thus examined in detail. This review is intended for a machine learning audience interested in legal and governance issues, as well as the community interested in using data analytic tools to enhance legal decision-making.


Abstract

Rule-based systems have been used in the legal domain since the 1970s. Save for rare exceptions, machine learning has only recently been used. But why this delay? We investigate the appropriate use of machine learning to support and make legal predictions. To do so, we need to examine the appropriate use of data in global legal domains, including common law, civil law, and hybrid jurisdictions. The use of various forms of Artificial Intelligence in law, including rule-based reasoning, case-based reasoning, and machine learning, requires an understanding of jurisprudential theories. We will see that the use of machine learning is particularly appropriate for non-professionals, in particular self-represented litigants or those relying upon legal aid services. The primary use of machine learning to support decision-making in legal domains has been in criminal detection, financial domains, and sentencing. Its use in these areas has led to concerns that the inappropriate use of Artificial Intelligence leads to biased decision making. This requires us to examine concerns about governance and ethics. Ethical concerns can be minimized by providing enhanced explanation, choosing appropriate data, appropriately cleaning that data, and having human review of any decisions.
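
One concrete governance check implied by these bias concerns is auditing a model's favorable-outcome rates across groups. The Python sketch below applies a demographic-parity comparison using the common "four-fifths" rule of thumb to hypothetical predictions; the groups, outcomes, and 0.8 cutoff are illustrative assumptions, not a legal standard endorsed by the article.

```python
# Minimal fairness audit sketch: compare favorable-outcome rates
# across groups (demographic parity) on hypothetical predictions.
predictions = [  # (group, predicted_favorable) for hypothetical cases
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 1), ("group_b", 0), ("group_b", 0),
]

rates = {}
for group in {g for g, _ in predictions}:
    outcomes = [y for g, y in predictions if g == group]
    rates[group] = sum(outcomes) / len(outcomes)

ratio = min(rates.values()) / max(rates.values())
print(rates, f"parity ratio = {ratio:.2f}")
if ratio < 0.8:  # a common rule-of-thumb threshold, not a legal test
    print("Warning: potential disparate impact; human review advised.")
```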

This article is categorized under: Commercial, Legal, and Ethical Issues > Legal Issues; Commercial, Legal, and Ethical Issues > Fairness in Data Mining

A feature selection for video quality of experience modeling: A systematic literature review

Analysis of how feature selection methods are applied in the data preparation step, in relation to the data collection and data modeling steps of video QoE modeling.


Abstract

The multidimensional concept of Quality of Experience (QoE) is key to the successful delivery of multimedia services. Growing user expectations for new experiences such as augmented reality, virtual reality, and future 6G services set higher requirements for QoE. A more complex QoE space requires the use of data mining methods to process the data for better QoE prediction, yet the increased dimensionality of the QoE space becomes a limiting factor for achieving the desired prediction accuracy. Existing studies that consider the multidimensional QoE concept and propose approaches to overcome the challenge of increased QoE space dimensionality are therefore of great importance for future research. Accordingly, this article reviews the applications of Feature Selection (FS) methods in video QoE modeling. It provides a comprehensive overview of the existing studies, categorizing and reviewing the applied FS methods with reference to the data collection and data modeling steps. The analysis includes 71 studies and gives an overview of FS method applications in video QoE modeling depending on input Influence Factor (IF) dimension sizes, IF types, QoE prediction methods used, and QoE evaluation type. Our review reveals the advantages of using FS methods in video QoE modeling and the frequency of their application, including the potential for applying several FS methods in series or in parallel; it gives an overview of the degree of dimensionality reduction achieved by different methods; and it provides insights into opportunities for applying FS methods to the complex multidimensional QoE space.
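
To illustrate the simplest filter-style FS step on a QoE-like problem, the numpy sketch below ranks synthetic influence factors by absolute correlation with a mean-opinion-score-like target; the data, the two "true" factors, and the choice of keeping three features are assumptions made for the example.

```python
import numpy as np

# Filter-style feature selection sketch: rank synthetic influence
# factors (IFs) by absolute correlation with a MOS-like target.
rng = np.random.default_rng(7)
n, d = 500, 8
X = rng.normal(size=(n, d))                    # candidate IFs
# Make the target depend mainly on features 0 and 3 (standing in for,
# e.g., bitrate and stalling time in a real study), plus noise.
mos = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

scores = np.array([abs(np.corrcoef(X[:, j], mos)[0, 1]) for j in range(d)])
top_k = np.argsort(scores)[::-1][:3]           # keep the 3 strongest IFs

print("correlation per feature:", scores.round(2))
print("selected feature indices:", top_k)      # expect 0 and 3 near the top
```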

This article is categorized under: Technologies > Data Preprocessing; Algorithmic Development > Multimedia