An academic recommender system on large citation data based on clustering, graph modeling and deep learning

Abstract

Recommender systems (RS) have played a significant role in both research and industry in recent years. In academia, there is a need to help researchers discover the most appropriate and relevant scientific information through recommendations. Nevertheless, we argue that there is a major gap between state-of-the-art academic RS and real-world problems. In this paper, we present a novel multi-stage RS based on clustering, graph modeling and deep learning that runs on a full dataset (a scientific digital library) on the order of millions of users and items (papers). We run several experiments to find the best way to tune our system, and we present and compare three versions of our RS in terms of recall and NDCG. The results show that a multi-stage RS that combines a variety of techniques and algorithms can handle real-world problems and large academic datasets. In this way, we suggest a means of closing, or at least narrowing, the gap between research-grade and industry-grade RS.
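
The abstract compares system versions by recall and NDCG but does not state the cutoff k or how relevance is graded. A minimal sketch of both metrics, assuming binary relevance (a paper is either relevant to a user or not) and a top-k cutoff; the function and variable names are illustrative, not taken from the paper.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["p3", "p7", "p1", "p9", "p4"]  # papers recommended to one user, best first
relevant = {"p1", "p4", "p8"}            # papers that user actually cited or read
print(recall_at_k(ranked, relevant, 5))  # 2 of 3 relevant papers retrieved
print(ndcg_at_k(ranked, relevant, 5))    # about 0.416: hits at ranks 3 and 5 are discounted
```

Recall only counts hits, while NDCG also rewards placing them near the top, which is why the two metrics are usually reported together.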

Exploring aspect-based sentiment analysis: an in-depth review of current methods and prospects for advancement

Abstract

Aspect-based sentiment analysis (ABSA) is a natural language processing technique that seeks to recognize and extract the sentiment associated with the various qualities or aspects of a specific product, service, or entity. It entails dissecting a text into its component parts, identifying the aspects being discussed, and then analyzing the attitude expressed about each aspect. The main objective of this research is to present a comprehensive understanding of ABSA, including its potential, ongoing trends and advancements, structure, practical applications, real-world implementation, and open issues. Current sentiment analysis research aims to increase granularity to the aspect level, with two main objectives: aspect extraction and sentiment polarity classification. Aspect extraction methods fall into three main categories: pattern-based, machine learning, and deep learning. These methods can capture both syntactic and semantic features of text without relying heavily on the high-level feature engineering that earlier approaches required. Beyond a traditional survey, this article also provides a comprehensive account of the procedure for carrying out this task and of the applications of ABSA. Each strategy is evaluated, compared, and investigated to clarify its benefits and drawbacks. Finally, the open challenges of ABSA are reviewed to identify future directions.
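
The pattern-based family of aspect extraction methods can be illustrated with two hand-written part-of-speech patterns. This is a toy sketch under stated assumptions: the text is already tokenized and POS-tagged (Universal POS tags such as ADJ and NOUN are hard-coded), and the opinion lexicon is hypothetical. It shows the mechanics of rule-based aspect and polarity extraction, not any specific method covered by the review.

```python
# Hypothetical mini opinion lexicon; real systems use much larger sentiment resources.
POSITIVE = {"great", "friendly", "fast"}
NEGATIVE = {"slow", "rude", "bland"}
COPULAS = {"is", "was", "are", "were"}


def polarity(opinion):
    """Look up the sentiment of an opinion word in the toy lexicon."""
    word = opinion.lower()
    if word in POSITIVE:
        return "positive"
    if word in NEGATIVE:
        return "negative"
    return "neutral"


def extract_aspects(tagged):
    """Apply two POS patterns to (word, tag) pairs; return (aspect, opinion, polarity) triples."""
    pairs = []
    # Pattern 1: ADJ NOUN, e.g. "friendly staff"
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "ADJ" and t2 == "NOUN":
            pairs.append((w2, w1))
    # Pattern 2: NOUN copula ADJ, e.g. "food was bland"
    for (w1, t1), (w2, _), (w3, t3) in zip(tagged, tagged[1:], tagged[2:]):
        if t1 == "NOUN" and w2.lower() in COPULAS and t3 == "ADJ":
            pairs.append((w1, w3))
    return [(aspect, opinion, polarity(opinion)) for aspect, opinion in pairs]


sentence = [("The", "DET"), ("friendly", "ADJ"), ("staff", "NOUN"), ("was", "VERB"),
            ("great", "ADJ"), ("but", "CCONJ"), ("the", "DET"), ("food", "NOUN"),
            ("was", "VERB"), ("bland", "ADJ")]
print(extract_aspects(sentence))
# [('staff', 'friendly', 'positive'), ('staff', 'great', 'positive'), ('food', 'bland', 'negative')]
```

Such rules are precise but brittle; the machine learning and deep learning families replace them with learned extractors.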

Protecting the privacy of social network data using graph correction

Abstract

The rapid development of online social networks, together with their low cost, ease of communication, and quick access with minimal facilities, has made social networks an attractive and highly influential phenomenon. The users of these networks tend to share sensitive and private information with friends and acquaintances. Consequently, the data of these networks has become a very important source of information about users, their interests, feelings, and activities. Analyzing this information can be very useful in predicting how users will behave when dealing with various issues. However, publishing this data for data mining can violate users' privacy. As a result, data privacy protection in social networks has become an important and attractive research topic. In this context, various algorithms have been proposed, all of which meet privacy requirements by modifying both the information and the graph structure. However, because of their high processing costs and long execution times, these algorithms are poorly suited to anonymizing big data. In this research, we improved the speed of data anonymization by using the number factorization technique to select and delete the best edges in the graph correction stage. We also used the chaotic krill herd algorithm to add edges: taking into account the combined effect of all edges on the structure of the graph, we selected edges and added them so as to preserve the graph's utility. Evaluation results on real-world datasets demonstrate the efficiency of the proposed algorithm, compared with state-of-the-art methods, in reducing execution time and maintaining the utility of the anonymized graph.
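
The abstract does not name the privacy model that its graph correction targets. As one common example, k-degree anonymity requires every degree value in the published graph to be shared by at least k nodes, so that an attacker who knows a user's degree cannot narrow that user down to fewer than k candidates. The sketch below only checks this property; choosing k-degree anonymity is an assumption, and it implements neither number factorization nor the chaotic krill herd algorithm.

```python
from collections import Counter


def degrees(nodes, edges):
    """Degree of every node in an undirected simple graph."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_k_degree_anonymous(nodes, edges, k):
    """True if each degree value that occurs is shared by at least k nodes."""
    counts = Counter(degrees(nodes, edges).values())
    return all(c >= k for c in counts.values())


nodes = ["a", "b", "c", "d"]
path = [("a", "b"), ("b", "c"), ("c", "d")]    # degrees 1, 2, 2, 1
print(is_k_degree_anonymous(nodes, path, 2))    # True
print(is_k_degree_anonymous(nodes, path, 3))    # False
cycle = path + [("a", "d")]                     # one added edge: every degree becomes 2
print(is_k_degree_anonymous(nodes, cycle, 4))   # True
```

The last case shows why edge addition is a lever for anonymization, and also why it must be chosen with care: every added edge changes the structure that downstream analyses rely on.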

An approach for fuzzy group decision making and consensus measure with hesitant judgments of experts

Abstract

In some real-world decision-making problems, experts may be hesitant when judging the performance of alternatives, which leads them to provide decision matrices with incomplete information. However, most existing estimation methods for incomplete information in group decision-making (GDM) neglect the hesitant judgments of experts, which may make the group decision outcomes unreasonable. Taking the hesitation degrees of experts into account, an approach is proposed for GDM and consensus measurement based on triangular intuitionistic fuzzy numbers (TIFNs) and the TODIM (interactive and multiple criteria decision-making) method. First, TIFNs are applied to handle the incomplete information arising from experts' hesitant judgments. Second, considering the risk attitudes of experts, a decision-making model is proposed to rank alternatives for GDM with incomplete information. Subsequently, based on measuring the concordance between solutions, a consensus model is presented to measure the consensus degrees of the group and of individual experts. Finally, an illustrative example shows the detailed implementation procedure of the proposed approach. Comparisons with existing estimation methods verify the effectiveness of the proposed approach in handling incomplete information, and a sensitivity analysis discusses the impact and necessity of considering experts' hesitation degrees.
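
TIFNs encode how sure an expert is through separate membership and non-membership functions; whatever is left over is the hesitation degree. A minimal sketch, assuming the common definition of a TIFN <(a, b, c); w, u> with piecewise-linear membership and non-membership functions whose extremes w and u are reached at b, and hesitation pi(x) = 1 - mu(x) - nu(x). It illustrates the representation only, not the TODIM-based ranking or the consensus model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TIFN:
    """Triangular intuitionistic fuzzy number <(a, b, c); w, u> (assumed textbook form)."""
    a: float
    b: float
    c: float
    w: float  # maximum membership degree, reached at x = b
    u: float  # minimum non-membership degree, reached at x = b

    def __post_init__(self):
        if not self.a <= self.b <= self.c:
            raise ValueError("require a <= b <= c")
        if not (0 <= self.w <= 1 and 0 <= self.u <= 1 and self.w + self.u <= 1):
            raise ValueError("require 0 <= w, u <= 1 and w + u <= 1")

    def membership(self, x):
        if x == self.b:
            return self.w
        if self.a <= x < self.b:
            return self.w * (x - self.a) / (self.b - self.a)
        if self.b < x <= self.c:
            return self.w * (self.c - x) / (self.c - self.b)
        return 0.0

    def non_membership(self, x):
        if x == self.b:
            return self.u
        if self.a <= x < self.b:
            return (self.b - x + self.u * (x - self.a)) / (self.b - self.a)
        if self.b < x <= self.c:
            return (x - self.b + self.u * (self.c - x)) / (self.c - self.b)
        return 1.0

    def hesitation(self, x):
        """Hesitation degree pi(x) = 1 - mu(x) - nu(x)."""
        return 1.0 - self.membership(x) - self.non_membership(x)


# An expert rates an alternative "about 0.7", with membership 0.6 and non-membership 0.3.
rating = TIFN(0.5, 0.7, 0.9, w=0.6, u=0.3)
print(round(rating.hesitation(0.7), 2))  # 0.1
print(round(rating.hesitation(0.6), 2))  # 0.05
```

Outside the support [a, c] the hesitation is 0 (membership 0, non-membership 1), so the expert's uncertainty is concentrated around the rated value.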

Range control-based class imbalance and optimized granular elastic net regression feature selection for credit risk assessment

Abstract

Credit risk, which stems from the failure of a contractual party to meet its obligations, is a significant concern for financial institutions. Assessing credit risk involves evaluating the creditworthiness of individuals, businesses, or entities to predict the likelihood that they will default on their financial obligations. Although financial institutions categorize consumers by creditworthiness, there is no universally defined set of attributes or indices for doing so. This research proposes Range control-based class imbalance and Optimized Granular Elastic Net regression (ROGENet) for feature selection in credit risk assessment. The dataset exhibits severe class imbalance, which is addressed using the Range-Controlled Synthetic Minority Oversampling TEchnique (RCSMOTE). The balanced data then undergo Granular Elastic Net regression with hybrid Gazelle sand cat Swarm Optimization (GENGSO) for feature selection. Elastic net, which enforces sparsity while grouping correlated features, proves beneficial for assessing credit risk. ROGENet provides a detailed perspective on credit risk evaluation and surpasses conventional methods. Oversampling combined with feature selection enhances the accuracy of the minority class by 99.4, 99, 98.6 and 97.3%, respectively.
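
RCSMOTE builds on SMOTE, whose core step creates a synthetic minority sample at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The sketch below shows only that standard SMOTE step in pure Python; the range-control mechanism of RCSMOTE is not described in the abstract and is not reproduced, and the function name, defaults, and toy data are illustrative.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Standard SMOTE: interpolate between minority samples and their nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself), by Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from x to nb
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


# Toy minority class (e.g. defaulters) described by two features
defaulters = [(0.2, 1.0), (0.3, 1.2), (0.25, 0.9), (0.4, 1.1)]
new_samples = smote(defaulters, n_synthetic=6, k=2)
print(len(new_samples))  # 6
```

Because every synthetic point lies between two real minority samples, oversampling fills in the minority region instead of duplicating points, which is what makes the subsequent elastic net feature selection less biased toward the majority class.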

A fuzzy rough set-based horse herd optimization algorithm for map reduce framework for customer behavior data

Abstract

A large number of association rules often reduces the reliability of data mining results; hence, a dimensionality reduction technique is crucial for data analysis. When analyzing massive datasets, existing models take more time to scan the entire database because they process items and transactions that are not necessary for the analysis. To address this, the Fuzzy Rough Set-based Horse Herd Optimization (FRS-HHO) algorithm is proposed and integrated with MapReduce to minimize query retrieval time and improve performance. The HHO algorithm reduces the number of unnecessary, low-support items and transactions in the dataset and maximizes a multi-objective fitness, based on support, confidence, interestingness, and lift, that evaluates the quality of association rules. The feature value of each item in the population is obtained by a MapReduce-based fitness function to generate optimal frequent itemsets in minimal time. HHO is employed to solve the resulting high-dimensional optimization problem. The proposed FRS-HHO approach requires less execution time across dimensions and has a space complexity of 38% for a total of 10k transactions. In addition, the FRS-HHO approach offers a 17% speedup and a 12% decrease in input–output communication cost compared with other approaches. Overall, the proposed FRS-HHO model improves performance in terms of execution time, space complexity, and speed.
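
Support, confidence, and lift, three of the fitness objectives named above, are standard association-rule measures. The sketch below counts itemsets in a MapReduce style (a map step per transaction, a reduce step that sums the partial counts) in plain Python, then computes the three measures for one rule. It runs on a single machine rather than a Hadoop or Spark cluster, and it does not model interestingness, the fuzzy rough set reduction, or HHO.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_transaction(transaction, max_size=2):
    """Map step: emit count 1 for every itemset (up to max_size items) in one transaction."""
    items = sorted(set(transaction))
    return Counter(frozenset(c) for r in range(1, max_size + 1)
                   for c in combinations(items, r))


def reduce_counts(partials):
    """Reduce step: sum the per-transaction counters into global itemset counts."""
    return reduce(lambda acc, part: acc + part, partials, Counter())


def rule_metrics(counts, n, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    if counts[a] == 0 or counts[c] == 0:
        raise ValueError("antecedent and consequent must occur in the data")
    support = counts[a | c] / n
    confidence = counts[a | c] / counts[a]
    lift = confidence / (counts[c] / n)
    return support, confidence, lift


baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
counts = reduce_counts(map_transaction(t) for t in baskets)
print(rule_metrics(counts, len(baskets), {"bread"}, {"milk"}))
# support 0.5, confidence 2/3, lift 8/9
```

A lift below 1, as here, means buying bread makes milk slightly less likely than its base rate; pruning low-support items before this counting step is what shrinks the search space.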

Argumentation-based multi-agent distributed reasoning in dynamic and open environments

Abstract

This work presents an approach for distributed and contextualized reasoning in multi-agent systems, considering environments in which agents may have incomplete, uncertain and inconsistent knowledge. Knowledge is represented by defeasible logic with mapping rules, which model the capability of agents to acquire knowledge from other agents during reasoning. Based on this knowledge representation, an argumentation-based reasoning model is proposed that enables the distributed construction of reusable argument structures to support conclusions. Conflicts between arguments are resolved by an argument strength calculation that considers the trust among agents and the degree of similarity between the knowledge of different agents, based on the intuition that greater similarity between the knowledge defined by different agents implies less uncertainty about the validity of the constructed argument. Contextualized reasoning is supported by having an agent share relevant knowledge when it issues queries to other agents, which enables the cooperating agents to become aware of knowledge that is not known a priori but is important for reaching a reasonable conclusion given the context of the querying agent. A distributed algorithm is presented and evaluated both analytically and experimentally, confirming its computational feasibility. Finally, our approach is compared with related work, highlighting its contributions, demonstrating its applicability in a broader range of scenarios, and presenting perspectives for future work.

Graph neural architecture search with heterogeneous message-passing mechanisms

Abstract

In recent years, neural architecture search has been used to design effective heterogeneous graph neural networks (HGNNs) and has achieved remarkable performance beyond that of manually designed networks. Generally, there are two mainstream design approaches in heterogeneous graph neural architecture search (HGNAS). The first automatically designs a meta-graph to guide the direction of message passing in a heterogeneous graph, thereby capturing semantic information. The second learns to design the convolutional operator, aiming to enhance message extraction so as to handle the diverse information in a heterogeneous graph. Through experiments, we observe a strong interdependence between message-passing direction and message extraction, which significantly affects the performance of HGNNs. However, previous HGNAS methods focus on one-sided design and lack the ability to capture this interdependence. To address this issue, we propose a novel perspective for HGNAS, the heterogeneous message-passing mechanism, which enables HGNAS to effectively capture the interdependence between message-passing direction and message extraction and thus to automatically design better-performing HGNNs. We call our method heterogeneous message-passing mechanisms search (HMMS). Extensive experiments on two popular tasks show that our method designs powerful HGNNs that achieve state-of-the-art (SOTA) results on various benchmark datasets. Code is available at https://github.com/HetGNAS/HMMS.
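
The two design axes discussed above can be separated in a few lines of code. A minimal sketch on a toy author-paper graph (an assumption, not one of the paper's benchmarks): the Author-Paper-Author meta-path fixes the message-passing direction, that is, which nodes exchange messages, while a plain mean aggregator stands in for the message-extraction operator that HMMS would search over. Both choices are fixed here purely for illustration.

```python
def apa_neighbors(author_papers):
    """Author-Paper-Author (APA) meta-path: two authors are neighbours if they share a paper."""
    paper_authors = {}
    for author, papers in author_papers.items():
        for paper in papers:
            paper_authors.setdefault(paper, set()).add(author)
    return {
        author: sorted({other for p in papers for other in paper_authors[p]} - {author})
        for author, papers in author_papers.items()
    }


def mean_aggregate(features, neighbors):
    """One message-passing step: average a node's own features with its neighbours' features."""
    out = {}
    for node, nbrs in neighbors.items():
        group = [features[node]] + [features[n] for n in nbrs]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


author_papers = {"A": {"p1"}, "B": {"p1", "p2"}, "C": {"p2"}, "D": {"p3"}}
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0], "D": [2.0, 2.0]}
nbrs = apa_neighbors(author_papers)
print(nbrs)                                 # {'A': ['B'], 'B': ['A', 'C'], 'C': ['B'], 'D': []}
print(mean_aggregate(features, nbrs)["A"])  # [0.5, 0.5]
```

Swapping the meta-path (for example Author-Paper-Venue-Paper-Author) changes who talks to whom, and swapping the aggregator changes how messages are combined; the abstract's observation is that the best operator depends on the chosen direction, so the two should be searched jointly.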

Adaptive semi-supervised learning from stronger augmentation transformations of discrete text information

Abstract

Semi-supervised learning is a promising approach to the problem of insufficient labeled data. Recent methods, grouped into the consistency regularization and pseudo-labeling paradigms, perform outstandingly on image data but achieve limited improvements on textual information, because they neglect its discrete nature and because high-quality text augmentation transformations are lacking. In this paper, we propose a novel method, SeqMatch. It automatically detects abnormal model states caused by anomalous data produced by text augmentations, reduces their interference, and instead leverages normal ones to improve the effectiveness of consistency regularization. It also generates hard artificial pseudo-labels so that the model can be efficiently updated and optimized toward low entropy. We further design several much stronger, well-organized text augmentation transformation pipelines that increase the divergence between two views of unlabeled discrete textual sequences, enabling the model to learn more knowledge from their alignment. Extensive comparative experiments show that SeqMatch significantly outperforms previous methods on three widely used benchmarks. In particular, SeqMatch achieves a maximum performance improvement of 16.4% over purely supervised training when only a minimal number of labeled examples is provided.
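
Hard pseudo-labels push the model toward low-entropy predictions by replacing a soft distribution with a one-hot target. A minimal sketch of that general idea in the style of FixMatch-like methods, assuming the model's logits are already available: the prediction on a weakly augmented view becomes a hard label, kept only if its confidence clears a threshold, and is used as the target for the strongly augmented view. The threshold value and masking rule are assumptions, and SeqMatch's detection of abnormal model states is not reproduced.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def hard_pseudo_label(weak_logits, threshold):
    """Argmax class of the weak view and whether its confidence clears the threshold."""
    probs = [math.exp(lp) for lp in log_softmax(weak_logits)]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label] >= threshold


def unlabeled_loss(pairs, threshold=0.95):
    """Mean cross-entropy of strong-view logits against hard pseudo-labels from weak views.

    Low-confidence examples are masked out and contribute 0 to the mean.
    """
    total = 0.0
    for weak_logits, strong_logits in pairs:
        label, keep = hard_pseudo_label(weak_logits, threshold)
        if keep:
            total += -log_softmax(strong_logits)[label]
    return total / len(pairs) if pairs else 0.0


# (weak-view logits, strong-view logits) for two unlabeled sentences, two classes
batch = [([4.0, 0.0], [1.0, 0.0]), ([0.2, 0.0], [0.0, 1.0])]
print(round(unlabeled_loss(batch), 4))  # 0.1566: only the first pair passes the threshold
```

The stronger the text augmentation, the more the strong view disagrees with the weak one, which is why the quality of the augmentation pipeline matters as much as the pseudo-labeling rule.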

Deep graph clustering via mutual information maximization and mixture model

Abstract

Attributed graph clustering, or community detection, which learns to cluster the nodes of a graph, is a challenging task in graph analysis. Recently, contrastive learning has shown significant results in various unsupervised graph learning tasks. Despite the success of graph contrastive learning methods in self-supervised graph learning, their use for graph clustering has not been well explored. In this paper, we introduce a contrastive learning framework for learning clustering-friendly node embeddings. We propose Gaussian mixture information maximization, which uses a mutual information maximization approach for node embedding. To obtain a clustering-friendly embedding space, it also imposes a mixture-of-Gaussians distribution on this space. The parameters of the contrastive node embedding model and the mixture distribution are optimized jointly in a unified framework. Experiments show that our clustering-directed embedding space can enhance clustering performance compared with the case in which the community structure of the graph is ignored during node representation learning. The results on real-world datasets demonstrate the effectiveness of our method for community detection.
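
Imposing a mixture of Gaussians on the embedding space means that communities can be read off directly from each component's posterior probability. A minimal sketch of that readout, assuming isotropic Gaussian components whose weights, means and variances have already been fitted: responsibilities are computed in log space for numerical stability, and each node goes to its most probable component. The contrastive mutual-information objective and the joint optimization are not shown.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) at point x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given embedding x (log-sum-exp trick)."""
    logs = [math.log(w) + log_gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def assign_communities(embeddings, weights, means, variances):
    """Hard community assignment: the component with the highest responsibility."""
    labels = []
    for x in embeddings:
        r = responsibilities(x, weights, means, variances)
        labels.append(max(range(len(r)), key=r.__getitem__))
    return labels


weights, means, variances = [0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [1.0, 1.0]
node_embeddings = [[0.1, -0.2], [2.9, 3.2], [0.4, 0.3]]
print(assign_communities(node_embeddings, weights, means, variances))  # [0, 1, 0]
```

Training the embeddings and the mixture jointly encourages nodes to sit near one component mean, so responsibilities become close to 0 or 1 and the hard assignment above loses little information.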