ACM Transactions on Knowledge Discovery from Data (TKDD)

DeepDepict: Enabling Information Rich, Personalized Product Description Generation With the Deep Multiple Pointer Generator Network

June 28, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Shaoyang Hao, Bin Guo, Hao Wang, Yunji Liang, Lina Yao, Qianru Wang, Zhiwen Yu

On e-commerce platforms, the online descriptive information of products has a significant impact on purchase behavior. To attract potential buyers and promote products, many workers are employed to write compelling product descriptions. Hand-crafted descriptions are inefficient, incurring high labor costs and long production times. Meanwhile, existing generated product descriptions do not take customization and diversity into consideration to meet users' interests. To address these problems, we propose a generic framework, namely DeepDepict, to automatically generate information-rich and personalized product descriptions. Specifically, DeepDepict leverages graph attention to retrieve product-related knowledge from an external knowledge base to enrich the diversity of product descriptions, constructs a personalized lexicon to capture individuals' linguistic traits for the personalization of product descriptions, and utilizes a multiple pointer-generator network to fuse heterogeneous data from multiple sources to generate informative and personalized product descriptions.
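The abstract does not give DeepDepict's equations. As a hedged illustration only, the sketch below shows the standard pointer-generator idea (mixing a vocabulary distribution with attention-based copy distributions) generalized to several sources by assuming one mixing gate per source. The function name `mix_distributions`, the gate layout, and the example sources are hypothetical, not the paper's formulation.

```python
from collections import defaultdict


def mix_distributions(p_vocab, sources, gates):
    """Combine a vocabulary distribution with copy distributions from several sources.

    p_vocab: dict word -> probability of generating the word from the vocabulary
    sources: list of (tokens, attention) pairs; attention[i] is the weight on tokens[i]
    gates:   len(sources) + 1 non-negative weights summing to 1; gates[0] scales
             generation and gates[k + 1] scales copying from source k (an assumption)
    """
    assert len(gates) == len(sources) + 1
    assert abs(sum(gates) - 1.0) < 1e-9
    final = defaultdict(float)
    for word, p in p_vocab.items():
        final[word] += gates[0] * p
    for gate, (tokens, attention) in zip(gates[1:], sources):
        # Copy mass: a token appearing in a source receives that source's attention weight.
        for token, a in zip(tokens, attention):
            final[token] += gate * a
    return dict(final)


# Hypothetical example: product attributes and a personalized lexicon as copy sources.
p_vocab = {"soft": 0.5, "cotton": 0.3, "shirt": 0.2}
attributes = (["cotton", "blue"], [0.7, 0.3])
lexicon = (["cozy", "soft"], [0.6, 0.4])
dist = mix_distributions(p_vocab, [attributes, lexicon], gates=[0.5, 0.3, 0.2])
```

Because every component is a probability distribution and the gates sum to 1, the result is again a distribution, and words such as "cozy" that are absent from the vocabulary can still be produced by copying.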

Robust Image Representation via Low Rank Locality Preserving Projection

June 18, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Shuai Yin, Yanfeng Sun, Junbin Gao, Yongli Hu, Boyue Wang, Baocai Yin

Locality preserving projection (LPP) is a dimensionality reduction algorithm that preserves the neighborhood graph structure of data. However, conventional LPP is sensitive to outliers in the data. This article proposes a novel low-rank LPP model called LR-LPP. In this new model, the original data are decomposed into a clean intrinsic component and a noise component. The projective matrix is then learned from the clean intrinsic component, which is encoded in low-rank features. The noise component is constrained by the ℓ1-norm, which is more robust to outliers. Finally, the LR-LPP model is extended to LR-FLPP, in which the low-dimensional features are measured by the F-norm. LR-FLPP reduces the aggregated error and weakens the effect of outliers, making the proposed model even more robust to outliers.
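Two generic ingredients behind the abstract can be sketched in a few lines, assuming the textbook forms rather than LR-LPP itself: the heat-kernel neighborhood graph and locality objective of classic LPP, and elementwise soft thresholding, which is the proximal operator of the ℓ1-norm and a common update for a sparse noise term in ADMM-style solvers. The helper names are hypothetical, and LPP's normalization constraint is omitted for brevity.

```python
import math


def sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def heat_kernel_weights(X, k=2, t=1.0):
    """Symmetric k-nearest-neighbor graph with heat-kernel weights, as in classic LPP."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # k nearest neighbors of point i, excluding i itself.
        neighbors = sorted((j for j in range(n) if j != i), key=lambda j: sqdist(X[i], X[j]))[:k]
        for j in neighbors:
            w = math.exp(-sqdist(X[i], X[j]) / t)
            W[i][j] = W[j][i] = w  # connect i and j if either is a neighbor of the other
    return W


def lpp_objective(X, W, a):
    """Sum of W_ij * (a.x_i - a.x_j)^2; small values mean neighbors stay close after projection."""
    y = [sum(ai * xi for ai, xi in zip(a, x)) for x in X]
    n = len(X)
    return sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))


def soft_threshold(values, lam):
    """Proximal operator of lam * ||e||_1: shrink each entry toward zero by lam."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


X = [[0, 0], [0, 1], [5, 0], [5, 1]]
W = heat_kernel_weights(X, k=1, t=1.0)
```

With this toy data, projecting onto the first axis keeps each pair of neighbors together (objective 0), whereas projecting onto the second axis separates them; soft thresholding zeroes small residuals and shrinks large ones, which is why an ℓ1 noise term tolerates a few large outliers.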

Side Information Fusion for Recommender Systems over Heterogeneous Information Network

June 10, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Huan Zhao, Quanming Yao, Yangqiu Song, James T. Kwok, Dik Lun Lee

Collaborative filtering (CF) has been one of the most important and popular recommendation methods, which aims at predicting users’ preferences (ratings) based on their past behaviors. Recently, various types of side information beyond the explicit ratings users give to items, such as social connections among users and metadata of items, have been introduced into CF and shown to be useful for improving recommendation performance. However, previous works process different types of information separately, thus failing to capture the correlations that might exist across them. To address this problem, in this work, we study the application of heterogeneous information network (HIN), which offers a unifying and flexible representation of different types of side information, to enhance CF-based recommendation methods.
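A common way to expose side information in an HIN is through metapaths, typed paths such as user→item→category→item. The sketch below counts metapath instances between node pairs, which could serve as features linking users to candidate items via item metadata. It is a generic illustration under that assumption, not the fusion method of this article; `metapath_counts` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def metapath_counts(relations, path):
    """Count instances of a metapath such as ["user", "item", "category", "item"].

    relations maps (src_type, dst_type) to a list of (src, dst) edges. Returns a
    Counter keyed by (start_node, end_node) with the number of path instances.
    """
    counts = Counter()
    for s, d in relations[(path[0], path[1])]:
        counts[(s, d)] += 1
    # Extend the partial paths one relation at a time.
    for src_type, dst_type in zip(path[1:-1], path[2:]):
        adjacency = defaultdict(list)
        for s, d in relations[(src_type, dst_type)]:
            adjacency[s].append(d)
        extended = Counter()
        for (start, end), c in counts.items():
            for d in adjacency[end]:
                extended[(start, d)] += c
        counts = extended
    return counts


rated = [("u1", "i1"), ("u1", "i2"), ("u2", "i3")]
item_category = [("i1", "shoes"), ("i2", "shoes"), ("i3", "books"), ("i4", "shoes")]
relations = {
    ("user", "item"): rated,
    ("item", "category"): item_category,
    ("category", "item"): [(c, i) for i, c in item_category],
}
counts = metapath_counts(relations, ["user", "item", "category", "item"])
```

Here user `u1` reaches the unrated item `i4` along two paths (via `i1` and `i2`, both in "shoes"), a signal that ratings alone would not provide.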

Clustering Heterogeneous Information Network by Joint Graph Embedding and Nonnegative Matrix Factorization

June 10, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Benhui Zhang, Maoguo Gong, Jianbin Huang, Xiaoke Ma

Many complex systems derived from nature and society consist of multiple types of entities and heterogeneous interactions, which can be effectively modeled as heterogeneous information networks (HINs). Structural analysis of heterogeneous networks is of great significance because it can leverage the rich semantic information of the objects and links in these networks. Clustering heterogeneous networks aims to group vertices into classes, which sheds light on the structure–function relations of the underlying systems. Current algorithms perform feature extraction and clustering independently and are criticized for not fully characterizing the structure of clusters. In this study, we propose a learning model by joint Graph Embedding and Nonnegative Matrix Factorization (aka GEjNMF), in which feature extraction and clustering are learned simultaneously by exploiting the graph embedding and the latent structure of networks.
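GEjNMF couples graph embedding with NMF; the abstract does not specify the joint objective, so the sketch below shows only the clustering half with plain NMF, using the well-known Lee–Seung multiplicative updates for the Frobenius loss and assigning each vertex to its largest latent factor. The helper names and the toy two-block graph are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(X, k, iters=500, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for X ~ W H with W, H >= 0 (Frobenius loss)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def cluster_labels(H):
    """Assign each column (vertex) to the latent factor with the largest coefficient."""
    return [max(range(len(H)), key=lambda r: H[r][j]) for j in range(len(H[0]))]


# Two disconnected groups {0, 1, 2} and {3, 4, 5}, with self-loops.
A = [[1 if (i < 3) == (j < 3) else 0 for j in range(6)] for i in range(6)]
W, H = nmf(A, k=2)
labels = cluster_labels(H)
```

Because the factors are nonnegative, each vertex is explained as an additive mix of parts, so the dominant factor reads naturally as a cluster label.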

Exploring BCI Control in Smart Environments: Intention Recognition Via EEG Representation Enhancement Learning

May 28, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Lin Yue, Hao Shen, Sen Wang, Robert Boots, Guodong Long, Weitong Chen, Xiaowei Zhao

Brain–computer interface (BCI) control technology, which uses motor imagery to perform the desired action instead of manual operation, is expected to be widely used in smart environments. However, most existing research lacks a robust feature representation of multi-channel EEG series, resulting in low intention recognition accuracy. This article proposes EEG2Image-based Denoised-ConvNets (called EID) to enhance the feature representation for the intention recognition task. Specifically, we perform signal decomposition, slicing, and image mapping to reduce the noise from irrelevant frequency bands. After that, we construct the Denoised-ConvNets structure to learn the colorspace and spatial variations of image objects without cropping new training images precisely.
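The abstract names slicing and image mapping but not their exact form. As a hedged sketch under simple assumptions, the code below cuts a multi-channel series into overlapping windows and min-max scales each window into a channels × samples grayscale "image"; the frequency-band decomposition step is omitted, and the function names and parameters are hypothetical.

```python
def slice_windows(signal, window, step):
    """Cut a (channels x samples) series into windows of `window` samples every `step` samples."""
    n = len(signal[0])
    return [[ch[start:start + window] for ch in signal]
            for start in range(0, n - window + 1, step)]


def to_grayscale_image(window):
    """Min-max scale a (channels x samples) window to integer pixel values in [0, 255]."""
    flat = [v for ch in window for v in ch]
    lo, hi = min(flat), max(flat)
    span = hi - lo or 1.0  # avoid division by zero for a constant window
    return [[round(255 * (v - lo) / span) for v in ch] for ch in window]


eeg = [[0.0, 1.0, 2.0, 3.0, 4.0],   # channel 1
       [4.0, 3.0, 2.0, 1.0, 0.0]]   # channel 2
windows = slice_windows(eeg, window=3, step=2)  # windows start at samples 0 and 2
images = [to_grayscale_image(w) for w in windows]
```

Each row of an image corresponds to one electrode, so a convolutional network can learn spatial patterns across channels as well as temporal patterns along each row.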

A Method for Mining Granger Causality Relationship on Atmospheric Visibility

May 28, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Bo Liu, Xi He, Mingdong Song, Jiangqiang Li, Guangzhi Qu, Jianlei Lang, Rentao Gu

Atmospheric visibility is an indicator of atmospheric transparency, and its range directly reflects the quality of the atmospheric environment. With the acceleration of industrialization and urbanization, the natural environment has suffered some damage. In recent decades, atmospheric visibility has shown an overall downward trend. A decrease in atmospheric visibility leads to a higher frequency of haze, which seriously affects people's daily lives and also has a significant negative economic impact. Mining the causal relationships of atmospheric visibility can reveal potential relations between visibility and other influencing factors, which is very important for environmental management, air pollution control, and haze control.
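The article's mining method is not described in the abstract. For reference, a textbook bivariate Granger test with one lag compares a restricted model y_t = a + b·y_{t-1} against an unrestricted model that also includes x_{t-1}, using the F statistic ((RSS_r − RSS_u)/1) / (RSS_u/(n − 3)). The sketch below implements that under those assumptions with synthetic data; the p-value step (comparison with an F(1, n − 3) distribution) is left out, and the function names are hypothetical.

```python
import random


def ols_rss(rows, targets):
    """Least-squares fit via the normal equations; returns the residual sum of squares."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on [X^T X | X^T y].
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, targets))


def granger_f_stat(x, y):
    """F statistic for 'x Granger-causes y' with one lag of each series."""
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_u = ols_rss(restricted, targets), ols_rss(unrestricted, targets)
    n, k = len(targets), 3
    return (rss_r - rss_u) / (rss_u / (n - k))


# Synthetic series in which y follows x with a one-step delay.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
f_xy, f_yx = granger_f_stat(x, y), granger_f_stat(y, x)
```

A large F for x → y and a small one for y → x indicates a directional, predictive (Granger) relationship, which is weaker than true causation.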

Critique on Natural Noise in Recommender Systems

May 28, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Wissam Al Jurdi, Jacques Bou Abdo, Jacques Demerjian, Abdallah Makhoul

Recommender systems have been upgraded, tested, and applied in many, often incomparable, ways. In attempts to diligently understand user behavior in certain environments, these systems have been frequently utilized in domains like e-commerce, e-learning, and tourism. Their growing demand and popularity have given rise to numerous research directions on major issues like data sparsity, cold start, malicious noise, and natural noise, which severely limit their performance. The data that fuel these systems are typically expected to be highly reliable. Inconsistent user information in datasets can degrade the performance of recommenders, even those running advanced personalization algorithms. The consequences can be costly, as such systems are employed by numerous online businesses.

Anomaly Detection With Kernel Preserving Embedding

May 10, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Huawen Liu, Enhui Li, Xinwang Liu, Kaile Su, Shichao Zhang

Similarity representation plays a central role in increasingly popular anomaly detection techniques, which have been successfully applied in various real-world scenarios. To date, many low-rank representation techniques have been introduced to measure the similarity relations of data; however, they only aim to minimize reconstruction errors, without taking the structural information of data into account. In addition, traditional low-rank representation methods often use the nuclear norm as their low-rank constraint, which easily yields a suboptimal solution. To address these problems, in this article we propose a novel anomaly detection method that exploits kernel preserving embedding, as well as the double nuclear norm, to explore the similarity relations of data.
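To make the role of similarity representation concrete, the sketch below scores anomalies with a fixed RBF kernel: a point whose average similarity to the rest of the data is low receives a high score. This is a simplified baseline under that assumption, not the article's method, which learns the similarity via kernel preserving embedding and the double nuclear norm; `gamma` and the function names are hypothetical.

```python
import math


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in X] for x in X]


def anomaly_scores(X, gamma=1.0):
    """Score each point by 1 minus its mean kernel similarity to the other points."""
    K = rbf_kernel(X, gamma)
    n = len(X)
    return [1.0 - (sum(K[i]) - K[i][i]) / (n - 1) for i in range(n)]


X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]]
scores = anomaly_scores(X)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

The isolated point `[3.0, 3.0]` has near-zero similarity to every other point, so it receives the highest score; a learned similarity matrix would replace `K` in the same scoring step.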

A Survey on Causal Inference

May 10, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Liuyi Yao, Zhixuan Chu, Sheng Li, Yaliang Li, Jing Gao, Aidong Zhang

Causal inference has been a critical research topic for decades across many domains, such as statistics, computer science, education, public policy, and economics. Nowadays, estimating causal effects from observational data has become an appealing research direction, owing to the large amount of available data and its low budget requirements compared with randomized controlled trials. Alongside the rapid development of machine learning, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether or not they require all three assumptions of the potential outcome framework.
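The three assumptions of the potential outcome framework are commonly stated as SUTVA, ignorability (unconfoundedness), and positivity. Under them, one classical re-weighting estimator of the average treatment effect (ATE) is inverse propensity weighting (IPW). The sketch below is a minimal illustration on synthetic confounded data and assumes the propensity scores are known; in practice they would be estimated, and this is only one of many methods a survey of this kind covers.

```python
import random


def ipw_ate(treatment, outcome, propensity):
    """Inverse-propensity-weighted estimate of the average treatment effect.

    Assumes ignorability and positivity (0 < e(x) < 1): each observed outcome is
    reweighted by the inverse probability of the treatment the unit actually received.
    """
    n = len(outcome)
    treated = sum(t * y / e for t, y, e in zip(treatment, outcome, propensity)) / n
    control = sum((1 - t) * y / (1 - e) for t, y, e in zip(treatment, outcome, propensity)) / n
    return treated - control


# Synthetic data: covariate x raises both the chance of treatment and the outcome.
rng = random.Random(0)
n = 20000
x = [rng.random() < 0.5 for _ in range(n)]
e = [0.8 if xi else 0.2 for xi in x]                 # known propensity scores
t = [1 if rng.random() < ei else 0 for ei in e]
y = [2.0 * ti + 3.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]  # true ATE = 2

treated_y = [yi for yi, ti in zip(y, t) if ti]
control_y = [yi for yi, ti in zip(y, t) if not ti]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
ipw = ipw_ate(t, y, e)
```

The naive difference in means is biased upward (to roughly 3.8 here) because treated units are more likely to have x = 1, while the IPW estimate recovers a value close to the true effect of 2.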

Sequential Transform Learning

May 10, 2021 | ACM Transactions on Knowledge Discovery from Data (TKDD)

Shalini Sharma, Angshul Majumdar

This work proposes a new approach for dynamical modeling, which we call sequential transform learning. It is loosely based on the transform (analysis dictionary) learning formulation, and this is the first work on the topic. Transform learning was originally developed for static problems; we modify it to model dynamical systems by introducing a feedback loop. The learned transform coefficients for the t-th instant are fed back along with the (t+1)-th sample, thereby establishing a Markovian relationship. Furthermore, the formulation is made supervised via a label-consistency cost. Our approach keeps the best of both worlds, marrying the interpretability and uncertainty measures of signal processing with the function-approximation ability of neural networks.
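The feedback loop described above can be sketched as a forward pass, assuming a linear transform applied to the stacked vector [x_{t+1}; z_t]. The transform `T` here is fixed and hypothetical; the paper learns it (with a label-consistency cost), which this sketch does not attempt.

```python
def sequential_transform(T, samples, z0):
    """Propagate coefficients z_{t+1} = T @ [x_{t+1}; z_t] through a sequence of samples."""
    z = list(z0)
    history = []
    for x in samples:
        v = list(x) + z  # stack the new sample with the previous coefficients
        z = [sum(tij * vj for tij, vj in zip(row, v)) for row in T]
        history.append(z)
    return history


T = [[1.0, 0.5]]  # hypothetical 1 x 2 transform: weight on the new sample, weight on the fed-back coefficient
history = sequential_transform(T, samples=[[1.0], [1.0], [1.0]], z0=[0.0])
```

Because each coefficient depends only on the current sample and the previous coefficient, the sequence is Markovian; with this `T` it yields 1.0, 1.5, and 1.75, an exponentially weighted memory of past inputs.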