Journal of AI and Data Mining

Fast COVID-19 Infection Prediction with In-House Data Using Machine Learning Classification Algorithms: A Case Study of Iran

January 2, 2024, Journal of AI and Data Mining

To mitigate COVID-19’s overwhelming burden, a rapid and efficient early screening scheme for first-line use is required. Much prior research has relied on laboratory tests, CT scans, and X-ray data, which are obstacles to agile, real-time screening. In this study, we propose a user-friendly and low-cost COVID-19 detection model based on data that can be self-reported at home. The most exhaustive set of input features was identified, spanning the demographic, symptom, semi-clinical, and past/present disease categories. We employed grid search to identify the combination of hyperparameter settings that yields the most accurate prediction. Next, we applied the proposed model with tuned hyperparameters to 11 classic state-of-the-art classifiers. The results show that the XGBoost classifier provides the highest accuracy, 73.3%. Statistical analysis shows no significant difference between the accuracy of XGBoost and AdaBoost, although it confirms the superiority of these two methods over the others. Furthermore, the most important features, obtained using SHapley Additive exPlanations (SHAP), were analyzed. “Contact with infected people,” “cough,” “muscle pain,” “fever,” “age,” “cardiovascular comorbidities,” “PO2,” and “respiratory distress” are the most important variables. Among these variables, the first three have a relatively large positive impact on the target variable, whereas “age,” “PO2,” and “respiratory distress” are strongly negatively correlated with it. Finally, we built a clinically operable, visual, and easy-to-interpret decision tree model to predict COVID-19 infection.
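The grid search step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the parameter names (`max_depth`, `learning_rate`), the grid values, and the `toy_score` function are hypothetical stand-ins, and a real study would replace `toy_score` with a cross-validated accuracy computed by training a classifier for each setting.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one.

    param_grid maps a parameter name to a list of candidate values.
    score_fn takes a dict of parameters and returns a number (higher is better).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    # itertools.product enumerates the full Cartesian product of candidates.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for cross-validated accuracy (assumption)."""
    depth_penalty = abs(params["max_depth"] - 4) * 0.05
    rate_penalty = abs(params["learning_rate"] - 0.1)
    return 1.0 - depth_penalty - rate_penalty


# Hypothetical grid; the paper does not list its search space here.
grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the candidate list lengths (nine evaluations here), which is why grid search suits small, discrete search spaces.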

LSTM Modeling and Optimization of Rice (Oryza sativa L.) Seedling Growth using Intelligent Chamber

January 2, 2024, Journal of AI and Data Mining

An intelligent growth chamber was designed in 2021 to model and optimize the growth of rice seedlings. An experiment was conducted with it at Sari University of Agricultural Sciences and Natural Resources, Iran, in March, April, and May 2021. The model inputs were radiation, temperature, carbon dioxide, and soil acidity, each studied at ambient and incremental levels. The model outputs were seedling height, root length, chlorophyll content, crop growth rate (CGR), relative growth rate (RGR), leaf number, and shoot dry weight. Seedling growth was modeled using LSTM neural networks and optimized by the Bayesian method. The best parameter setting was epoch = 100, learning rate = 0.001, and iteration number = 500, and the best training performance was obtained at a validation RMSE of 0.2884.
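For readers unfamiliar with the reported metric, the following sketch shows how a root mean squared error (RMSE) is computed from observed and predicted values. The function name and the sample numbers are illustrative assumptions and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up seedling heights (illustrative only, not from the paper).
observed = [10.0, 12.0, 15.0, 18.0]
predicted = [10.5, 11.5, 15.5, 17.5]
validation_rmse = rmse(observed, predicted)
```

Because RMSE is expressed in the units of the target, a value such as 0.2884 is only interpretable relative to the scale (or normalization) of the outputs being predicted.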

Applying Twin-Hybrid Feature Selection Scheme on Transient Multi-Trajectory Data for Transient Stability Prediction

January 2, 2024, Journal of AI and Data Mining

Speedy and accurate transient stability assessment (TSA) can be achieved by applying efficient machine learning- and statistics-based (MLST) algorithms to the transient nonlinear time series space. In the MLST setting, feature selection that forms a compacted optimal transient feature space (COTFS) from raw high-dimensional transient data can pave the way for high-performance TSA. Hence, a comprehensive feature selection scheme (FSS) that populates the COTFS with relevant-discriminative transient features (RDTFs) is urgently needed. This work introduces a twin hybrid FSS (THFSS) to select RDTFs from transient 28-variate time series data. Each fold of THFSS comprises filter-wrapper mechanisms. The conditional relevancy rate (CRR), based on mutual information (MI) and entropy calculations, serves as the filter method, and incremental wrapper subset selection (IWSS) and IWSS with replacement (IWSSr), built on kernelized support vector machine (SVM) and twin SVM (TWSVM), serve as the wrapper methods. After THFSS is applied to the transient univariates, the RDTFs are passed to a cross-validation-based train-test procedure to evaluate their efficiency in TSA. The results show that THFSS-based RDTFs achieve a prediction accuracy of 98.87% and a processing time of 102.653 milliseconds for TSA.
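The filter stage described above rests on entropy and mutual information. The sketch below shows these two quantities for discrete variables and a simple ranking of features by MI with the class label. It does not reproduce the paper's conditional relevancy rate (CRR), whose exact definition is not given in this abstract; the feature names and values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_features_by_mi(features, labels):
    """Return feature names sorted by decreasing MI with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical discretized features and stability labels (assumptions).
labels = [0, 0, 1, 1]
features = {
    "informative": [0, 0, 1, 1],    # mirrors the labels exactly
    "uninformative": [0, 1, 0, 1],  # independent of the labels
}
ranking, mi_scores = rank_features_by_mi(features, labels)
```

A filter of this kind is cheap because it never trains a classifier, which is why hybrid schemes typically run it before the more expensive wrapper search.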

Using Convolutional Neural Network to Enhance Classification Accuracy of Cancerous Lung Masses from CT Scan Images

December 10, 2023, Journal of AI and Data Mining

Lung cancer is a highly serious illness, and detecting cancer cells early significantly enhances patients' chances of recovery. Doctors regularly examine a large number of CT scan images, which can lead to fatigue and errors. Therefore, there is a need to create a tool that can automatically detect and classify lung nodules in their early stages. Computer-aided diagnosis systems, often employing image processing and machine learning techniques, assist radiologists in identifying and categorizing these nodules. Previous studies have often used complex models or pre-trained networks that demand significant computational power and a long time to execute. Our goal is to achieve accurate diagnosis without the need for extensive computational resources. We introduce a simple convolutional neural network with only two convolution layers, capable of accurately classifying nodules without requiring advanced computing capabilities. We conducted training and validation on two datasets, LIDC-IDRI and LUNA16, achieving impressive accuracies of 99.7% and 97.52%, respectively. These results demonstrate the superior accuracy of our proposed model compared to state-of-the-art research papers.

Identification of Influential Nodes in Social Networks based on Profile Analysis

November 29, 2023, Journal of AI and Data Mining

Analyzing the influence of people and nodes in social networks has attracted a lot of attention. Social networks gain meaning through the groups, associations, and people interested in a specific issue or topic, and people express their theoretical and practical tendencies in such places. Influential nodes are often identified from information about the social network structure, while less attention is paid to the information spread by the network's users. The present study aims to identify influential users by combining the structural information of the network with the information users share on it. To this end, users' feelings were extracted. Then, an emotional (affective) score was assigned to each user based on an emotion dictionary, and his/her weight in the network was determined using centrality criteria. The Twitter network was used as the case study. After the data were collected and processed, the structure of the social network was defined and its graph was drawn. Then, the analysis capability of the network and the existing data was extracted, and influential users and nodes were identified with the proposed algorithm. Based on the results, the nodes identified by the proposed algorithm are of high quality, and the simulated speed of information spread is higher than with other existing algorithms.
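One way to picture the combination of a dictionary-based emotional score with a centrality measure is the sketch below. It is an assumption-laden illustration, not the paper's algorithm: degree centrality stands in for the unspecified "centrality criteria", the tiny lexicon is invented, and the linear weighting with `alpha` is a hypothetical choice.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    return {
        node: (len(adj) / (n - 1) if n > 1 else 0.0)
        for node, adj in neighbors.items()
    }


def lexicon_score(text, lexicon):
    """Average lexicon polarity over the words of one post."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(lexicon.get(w, 0.0) for w in words) / len(words)


def influence_scores(edges, posts, lexicon, alpha=0.5):
    """Blend centrality with mean emotional intensity (hypothetical weighting)."""
    scores = {}
    for user, centrality in degree_centrality(edges).items():
        texts = posts.get(user, [])
        # Emotional intensity: mean absolute polarity of the user's posts.
        emotion = (
            sum(abs(lexicon_score(t, lexicon)) for t in texts) / len(texts)
            if texts
            else 0.0
        )
        scores[user] = alpha * centrality + (1 - alpha) * emotion
    return scores


# Hypothetical toy network, posts, and emotion dictionary (assumptions).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
posts = {"a": ["great day"], "d": ["bad bad"]}
lexicon = {"great": 1.0, "bad": -1.0}
scores = influence_scores(edges, posts, lexicon)
top_user = max(scores, key=scores.get)
```

In this toy example user `d` has few connections but strongly emotional posts, so it outranks the better-connected `b` and `c`, which shows how content can shift a purely structural ranking.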

Autoencoder-PCA-based Online Supervised Feature Extraction-Selection Approach

November 25, 2023, Journal of AI and Data Mining

Due to the growing number of data-driven approaches, especially in artificial intelligence and machine learning, extracting appropriate information from gathered data with the best performance is a considerable challenge. Another important aspect of this issue is storage cost. Principal component analysis (PCA) and autoencoders (AEs) are typical feature extraction methods in data science and machine learning and are widely used in various approaches. The current work integrates the advantages of AEs and PCA to present an online supervised feature extraction-selection method. Accordingly, the desired labels for the final model are involved in the feature extraction procedure and embedded in the PCA method as well. Also, stacking the nonlinear autoencoder layers with the PCA algorithm eliminates the kernel selection required by traditional kernel PCA methods. Besides the performance improvement shown by the experimental results, the main advantage of the proposed method is that, in contrast with traditional PCA approaches, the model does not require all samples for feature extraction. Compared with previous works, the proposed method can outperform other state-of-the-art methods in terms of accuracy and authenticity of feature extraction.

A New Hybrid Method to Detect Risk of Gastric Cancer Using Machine Learning Techniques

November 12, 2023, Journal of AI and Data Mining

Machine learning (ML) is a popular tool in healthcare because it can help analyze large amounts of patient data, such as medical records, predict diseases, and identify early signs of cancer. Gastric cancer starts in the cells lining the stomach and is the fifth most common cancer worldwide. Therefore, predicting patient survival, checking health status, and detecting the risk of gastric cancer in the early stages can be very beneficial. With the help of machine learning methods, this is possible without any invasive procedures, which can help both patients and physicians make informed decisions. Accordingly, a new hybrid machine learning-based method for detecting the risk of gastric cancer is proposed in this paper. The proposed model is compared with traditional methods, and the empirical results show that it outperforms existing methods with an accuracy of 98%. The results also indicate that gastric cancer can be one of the most important consequences of H. pylori infection. Additionally, it can be concluded that lifestyle and dietary factors can heighten the risk of gastric cancer, especially among individuals who frequently consume fried foods and suffer from chronic atrophic gastritis and stomach ulcers. This risk is further exacerbated in individuals with limited fruit and vegetable intake and high salt consumption.

Auto-UFSTool: An Automatic Unsupervised Feature Selection Toolbox for MATLAB

October 15, 2023, Journal of AI and Data Mining

Recent data analysis research increasingly needs to find and select relevant features without class labels, using Unsupervised Feature Selection (UFS) approaches. Although several open-source toolboxes provide feature selection techniques to reduce redundant features, data dimensionality, and computation costs, they require programming knowledge, which limits their popularity, and they do not adequately address unlabeled real-world data. The Automatic UFS Toolbox (Auto-UFSTool) for MATLAB, proposed in this study, is a user-friendly and fully automatic toolbox that uses several UFS approaches from the most recent research. It is a collection of 25 robust UFS approaches, most of which were developed within the last five years. A clear and systematic comparison of competing methods is therefore feasible without writing a single line of code. Even users without any previous programming experience can use the implementation through the Graphical User Interface (GUI). The toolbox also makes it possible to evaluate the feature selection results and generate graphs that facilitate the comparison of subsets of varying sizes. It is freely accessible in the MATLAB File Exchange repository and includes scripts and source code for each technique. The toolbox is available to the general public at: bit.ly/AutoUFSTool

Link Prediction in Social Networks: A Bibliometric Analysis and Review of Literature (1987-2021)

October 9, 2023, Journal of AI and Data Mining

Link prediction (LP) has become a hot topic in the data mining, machine learning, and deep learning communities. This study applies bibliometric analysis to establish the current status of LP studies and investigate the field from different perspectives. It provides a Scopus-based bibliometric overview of the LP landscape since 1987, when LP studies were first published. Various kinds of analysis, including document, subject, and country distribution, are applied. Moreover, author productivity, citation analysis, and keyword analysis are used, and Bradford’s law is applied to discover the main journals in this field. Most documents were published in conference proceedings. The majority of LP documents have been published in the computer science and mathematics fields. So far, China has been at the forefront of publishing countries. In addition, the most active sources of LP publications are Lecture Notes in Computer Science (including the subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) and IEEE Access. The keyword analysis shows that while social networks attracted attention in the early period, knowledge graphs have attracted more attention recently. Since the LP problem has recently been approached using machine learning (ML), the current study may encourage researchers to concentrate on ML techniques. This is the first bibliometric study of the “link prediction” literature and provides a broad landscape of the field.
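Bradford's law, used in this study to find the main journals, can be sketched as follows: sources are ranked by productivity and split into zones that each hold roughly the same number of documents, with the first zone forming the core. The source names and counts below are invented for illustration, and the zone-assignment rule is one simple convention among several.

```python
def bradford_zones(doc_counts, n_zones=3):
    """Split sources into zones holding roughly equal document totals.

    doc_counts maps a source name to its number of documents.
    Returns a list of n_zones lists of source names, core zone first.
    """
    total = sum(doc_counts.values())
    if total <= 0:
        raise ValueError("doc_counts must contain at least one document")
    # Rank sources from most to least productive (stable for ties).
    ranked = sorted(doc_counts.items(), key=lambda kv: kv[1], reverse=True)
    zones = [[] for _ in range(n_zones)]
    cumulative = 0
    for name, count in ranked:
        # The zone is set by the share of documents already covered.
        index = min((cumulative * n_zones) // total, n_zones - 1)
        zones[index].append(name)
        cumulative += count
    return zones


# Hypothetical journals and document counts (assumptions, not Scopus data).
counts = {"A": 30, "B": 15, "C": 15, "D": 10, "E": 10, "F": 5, "G": 5}
zones = bradford_zones(counts)
```

Here each zone holds 30 of the 90 documents, while the number of sources per zone grows as 1, 2, 4, which is the geometric pattern (1 : n : n²) that Bradford's law describes.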

Segmentation of Breast Cancer using Convolutional Neural Network and U-Net Architecture

October 9, 2023, Journal of AI and Data Mining

Breast cancer is a disease of abnormal cell proliferation in breast tissue. One method for diagnosing and screening breast cancer is mammography. However, mammography images have limitations, including low contrast, high noise, and non-coherence. This research segmented breast cancer images derived from Ultrasonography (USG) using a Convolutional Neural Network (CNN) with the U-Net architecture. In testing, the CNN model with the U-Net architecture achieved its highest Mean Intersection over Union (Mean IoU), 77%, in the data scenario with a ratio of 70:30, 100 epochs, and a learning rate of 5 × 10⁻⁵, and its lowest Mean IoU, 64.4%, in the data scenario with a ratio of 90:10, 50 epochs, and a learning rate of 1 × 10⁻⁴.
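The Mean IoU metric reported above can be illustrated with a short sketch. It assumes masks are given as nested lists of integer class labels and that classes absent from both masks are skipped; the study's exact averaging convention is not stated in the abstract, and the small masks are invented.

```python
def mean_iou(true_mask, pred_mask, num_classes):
    """Mean Intersection over Union across classes for 2D label masks."""
    ious = []
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for true_row, pred_row in zip(true_mask, pred_mask):
            for t, p in zip(true_row, pred_row):
                in_true, in_pred = t == cls, p == cls
                intersection += int(in_true and in_pred)
                union += int(in_true or in_pred)
        # Skip classes that appear in neither mask (assumed convention).
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 2x2 masks: 0 = background, 1 = lesion.
truth = [[0, 0],
         [1, 1]]
prediction = [[0, 1],
              [1, 1]]
score = mean_iou(truth, prediction, num_classes=2)
```

In this example the background IoU is 1/2 and the lesion IoU is 2/3, so the mean is 7/12; a single misclassified pixel lowers both class scores because it shrinks one intersection and enlarges the other union.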