Deep Graph Matching and Searching for Semantic Code Retrieval

Xiang Ling, Lingfei Wu, Saizhuo Wang, Gaoning Pan, Tengfei Ma, Fangli Xu, Alex X. Liu, Chunming Wu, Shouling Ji

Code retrieval is to find the code snippet from a large corpus of source code repositories that highly matches the query of natural language description. Recent work mainly uses natural language processing techniques to process both query texts (i.e., human natural language) and code snippets (i.e., machine programming language), however, neglecting the deep structured features of query texts and source codes, both of which contain rich semantic information. In this article, we propose an end-to-end deep graph matching and searching (DGMS) model based on graph neural networks for the task of semantic code retrieval. To this end, we first represent both natural language query texts and programming language code snippets with the unified graph-structured data, and then use the proposed graph matching and searching model to retrieve the best matching code snippet.

Optimal Algebraic Breadth-First Search for Sparse Graphs

Paul Burkhardt

There has been a rise in the popularity of algebraic methods for graph algorithms given the development of the GraphBLAS library and other sparse matrix methods. An exemplar for these approaches is Breadth-First Search (BFS). The algebraic BFS algorithm is simply a recurrence of matrix-vector multiplications with the n × n adjacency matrix, but the many redundant operations over nonzeros ultimately lead to suboptimal performance. Therefore an optimal algebraic BFS should be of keen interest especially if it is easily integrated with existing matrix methods. Current methods, notably in the GraphBLAS, use a Sparse Matrix masked-Sparse Vector multiplication in which the input vector is kept in a sparse representation in each step of the BFS, and nonzeros in the vector are masked in subsequent steps.

Sequential Transform Learning

Shalini Sharma, Angshul Majumdar

This work proposes a new approach for dynamical modeling; we call it sequential transform learning. This is loosely based on the transform (analysis dictionary) learning formulation. This is the first work on this topic. Transform learning, was originally developed for static problems; we modify it to model dynamical systems by introducing a feedback loop. The learnt transform coefficients for the tth instant are fed back along with the t + 1st sample, thereby establishing a Markovian relationship. Furthermore, the formulation is made supervised by the label consistency cost. Our approach keeps the best of two worlds, marrying the interpretability and uncertainty measure of signal processing with the function approximation ability of neural networks.

A Survey on Causal Inference

Liuyi Yao, Zhixuan Chu, Sheng Li, Yaliang Li, Jing Gao, Aidong Zhang

Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy, and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not.

Self-Adaptive Skeleton Approaches to Detect Self-Organized Coalitions From Brain Functional Networks Through Probabilistic Mixture Models

Kai Liu, Hongbo Liu, Tomas E. Ward, Hua Wang, Yu Yang, Bo Zhang, Xindong Wu

Detecting self-organized coalitions from functional networks is one of the most important ways to uncover functional mechanisms in the brain. Determining these raises well-known technical challenges in terms of scale imbalance, outliers and hard-examples. In this article, we propose a novel self-adaptive skeleton approach to detect coalitions through an approximation method based on probabilistic mixture models. The nodes in the networks are characterized in terms of robust k-order complete subgraphs (k-clique) as essential substructures. The k-clique enumeration algorithm quickly enumerates all k-cliques in a parallel manner for a given network. Then, the cliques, from max-clique down to min-clique, of each order k, are hierarchically embedded into a probabilistic mixture model.

Density Guarantee on Finding Multiple Subgraphs and Subtensors

Quang-huy Duong, Heri Ramampiaro, Kjetil Nørvåg, Thu-lan Dam

Dense subregion (subgraph & subtensor) detection is a well-studied area, with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most of the existing works utilize the state-or-the-art greedy 2-approximation algorithm to capably provide solutions with a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density.

Anomaly Detection With Kernel Preserving Embedding

Huawen Liu, Enhui Li, Xinwang Liu, Kaile Su, Shichao Zhang

Similarity representation plays a central role in increasingly popular anomaly detection techniques, which have been successfully applied in various realistic scenes. Until now, many low-rank representation techniques have been introduced to measure the similarity relations of data; yet, they only concern to minimize reconstruction errors, without involving the structural information of data. Besides, the traditional low-rank representation methods often take nuclear norm as their low-rank constraints, easily yielding a suboptimal solution. To address the problems above, in this article, we propose a novel anomaly detection method, which exploits kernel preserving embedding, as well as the double nuclear norm, to explore the similarity relations of data.

Improved Customer Lifetime Value Prediction With Sequence-To-Sequence Learning and Feature-Based Models

Josef Bauer, Dietmar Jannach

The prediction of the Customer Lifetime Value (CLV) is an important asset for tool-supported marketing by customer relationship managers. Since standard methods based on purchase recency, frequency, and past profit and revenue statistics often have limited predictive power, advanced machine learning (ML) techniques were applied to this task in recent years. However, existing approaches are often not fully capable of modeling certain temporal patterns that can be commonly found in practice, such as periodic purchasing behavior of customers. To address these shortcomings, we propose a novel method for CLV prediction based on a combination of several ML techniques. At its core, our method consists of a tailored deep learning approach based on encoder–decoder sequence-to-sequence recurrent neural networks with augmented temporal convolutions.

Graph Neural Networks for Fast Node Ranking Approximation

Sunil Kumar Maurya, Xin Liu, Tsuyoshi Murata

Graphs arise naturally in numerous situations, including social graphs, transportation graphs, web graphs, protein graphs, etc. One of the important problems in these settings is to identify which nodes are important in the graph and how they affect the graph structure as a whole. Betweenness centrality and closeness centrality are two commonly used node ranking measures to find out influential nodes in the graphs in terms of information spread and connectivity. Both of these are considered as shortest path based measures as the calculations require the assumption that the information flows between the nodes via the shortest paths. However, exact calculations of these centrality measures are computationally expensive and prohibitive, especially for large graphs.

Adaptive Influence Maximization: If Influential Node Unwilling to Be the Seed

Jianxiong Guo, Weili Wu

Influence maximization problem attempts to find a small subset of nodes that makes the expected influence spread maximized, which has been researched intensively before. They all assumed that each user in the seed set we select is activated successfully and then spread the influence. However, in the real scenario, not all users in the seed set are willing to be an influencer. Based on that, we consider each user associated with a probability with which we can activate her as a seed, and we can attempt to activate her many times. In this article, we study the adaptive influence maximization with multiple activations (Adaptive-IMMA) problem, where we select a node in each iteration, observe whether she accepts to be a seed, if yes, wait to observe the influence diffusion process; if no, we can attempt to activate her again with a higher cost or select another node as a seed.