Review – Recent Articles

A cluster differences unfolding method for large datasets of preference ratings on an interval scale: Minimizing the mean squared centred residuals

January 11, 2024; April 7, 2024 Rodrigo Macías, J. Fernando Vera, Willem J. Heiser

Abstract

Clustering and spatial representation methods are often used in combination to analyse preference ratings when a large number of individuals and/or objects are involved. When analysed under an unfolding model, row-conditional linear transformations are usually most appropriate when the goal is to determine clusters of individuals with similar preferences. However, a significant problem with transformations that include both slope and intercept is the occurrence of degenerate solutions. In this paper, we propose a least squares unfolding method that performs clustering of individuals while simultaneously estimating the location of cluster centres and object locations in low-dimensional space. The method is based on minimising the mean squared centred residuals of the preference ratings with respect to the distances between cluster centres and object locations. At the same time, the distances are row-conditionally transformed with optimally estimated slope parameters. It is computationally efficient for large datasets, and does not suffer from the appearance of degenerate solutions. The performance of the method is analysed in an extensive Monte Carlo experiment. It is illustrated for a real data set and the results are compared with those obtained using a two-step clustering and unfolding procedure.
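To make the criterion in this abstract more concrete, the following is a minimal sketch of how a mean squared centred residual could be evaluated for a fixed clustering and a fixed configuration. It is not the authors' algorithm: the function name, the argument layout, and the use of an ordinary least-squares slope per row are all assumptions made for illustration, and the sketch only evaluates the loss rather than optimising clusters or locations.

```python
import numpy as np

def mean_squared_centred_residuals(ratings, labels, centres, objects):
    """Hypothetical helper (not from the paper).

    ratings: (n_individuals, n_objects) preference ratings
    labels:  (n_individuals,) cluster index of each individual
    centres: (n_clusters, dim) cluster centre coordinates
    objects: (n_objects, dim) object coordinates
    """
    ratings = np.asarray(ratings, dtype=float)
    centres = np.asarray(centres, dtype=float)
    objects = np.asarray(objects, dtype=float)

    # Euclidean distance between every cluster centre and every object.
    dist = np.linalg.norm(centres[:, None, :] - objects[None, :, :], axis=2)
    d = dist[np.asarray(labels)]  # distances seen by each individual

    # Centring each row removes the intercept, so only a slope remains.
    rc = ratings - ratings.mean(axis=1, keepdims=True)
    dc = d - d.mean(axis=1, keepdims=True)

    # Assumed here: the row-wise slope is the least-squares slope.
    denom = (dc ** 2).sum(axis=1)
    slopes = np.divide((rc * dc).sum(axis=1), denom,
                       out=np.zeros_like(denom), where=denom > 0)

    resid = rc - slopes[:, None] * dc
    return float((resid ** 2).mean()), slopes


# Synthetic check: ratings built as an exact linear function of distance.
centres = np.array([[0.0, 0.0], [3.0, 0.0]])
objects = np.array([[0.0, 1.0], [1.0, 0.0], [3.0, 4.0], [4.0, 0.0]])
labels = np.array([0, 1, 0])
true_dist = np.linalg.norm(centres[:, None, :] - objects[None, :, :], axis=2)
ratings = 10.0 - 2.0 * true_dist[labels]
loss, slopes = mean_squared_centred_residuals(ratings, labels, centres, objects)
```

Because the rows are centred before the slope is fitted, adding a constant to any individual's ratings leaves the loss unchanged, which is one way to read the abstract's point about avoiding an explicit intercept.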

Structure-based, deep-learning models for protein-ligand binding affinity prediction

January 3, 2024 Debby D. Wang, Wenhui Wu and Ran Wang

The launch of the AlphaFold series has brought deep-learning techniques into molecular structural science. As another crucial problem, structure-based prediction of protein-ligand binding affinity urgently cal...

A comparison of approaches to accessing existing biological and chemical relational databases via SPARQL

June 20, 2023 Jakub Galgonek and Jiří Vondrášek

Current biological and chemical research is increasingly dependent on the reusability of previously acquired data, which typically come from various sources. Consequently, there is a growing need for database ...

Review of techniques and models used in optical chemical structure recognition in images and scanned documents

September 9, 2022 Fidan Musazade, Narmin Jamalova and Jamaladdin Hasanov

Extraction of chemical formulas from images was not a top priority among computer vision tasks for a while. The complexity on both the input and prediction sides has made this task challenging for the conven...

Novel digital approaches to the assessment of problematic opioid use

July 15, 2022 Philip J. Freda Jr, Henry R. Kranzler and Jason H. Moore

The opioid epidemic continues to contribute to loss of life through overdose and significant social and economic burdens. Many individuals who develop problematic opioid use (POU) do so after being exposed to ...

A tutorial on generative adversarial networks with application to classification of imbalanced data

December 31, 2021 Yuxiao Huang, Kara G. Fields, Yan Ma

Abstract

A challenge unique to classification model development is imbalanced data. In a binary classification problem, class imbalance occurs when one class, the minority group, contains significantly fewer samples than the other class, the majority group. In imbalanced data, the minority class is often the class of interest (e.g., patients with disease). However, when training a classifier on imbalanced data, the model will exhibit bias towards the majority class and, in extreme cases, may ignore the minority class completely. A common strategy for addressing class imbalance is data augmentation. However, traditional data augmentation methods are associated with overfitting, where the model is fit to the noise in the data. In this tutorial we introduce an advanced method for data augmentation: generative adversarial networks (GANs). The advantages of GANs over traditional data augmentation methods are illustrated using the Breast Cancer Wisconsin study. To promote the adoption of GANs for data augmentation, we present an end-to-end pipeline that encompasses the complete life cycle of a machine learning project along with alternatives and good practices both in the paper and in a separate video. Our code, data, full results and video tutorial are publicly available in the paper's GitHub repository (https://github.com/yuxiaohuang/research/tree/master/gwu/accepted/sam_2021).
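The bias towards the majority class described in this abstract can be illustrated with a small, self-contained sketch. The labels below are synthetic (they are not taken from the Breast Cancer Wisconsin study), and the helper names `accuracy` and `recall` are illustrative only. The sketch shows why a classifier that ignores the minority class can still look accurate.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class samples that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not positives:
        return 0.0
    return sum(p == positive for p in positives) / len(positives)


# Synthetic, imbalanced labels: 5 minority (1) and 95 majority (0) samples.
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)   # 0.95
rec = recall(y_true, y_pred)     # 0.0
print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")
```

Accuracy alone hides the problem here, which is why minority-class metrics such as recall are usually reported alongside it when evaluating augmentation strategies.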

Next waves in veridical network embedding*

November 26, 2020; January 20, 2021 Owen G. Ward, Zhen Huang, Andrew Davison, Tian Zheng

Abstract

Embedding nodes of a large network into a metric (e.g., Euclidean) space has become an area of active research in statistical machine learning, which has found applications in natural and social sciences. Generally, a representation of a network object is learned in a Euclidean geometry and is then used for subsequent tasks regarding the nodes and/or edges of the network, such as community detection, node classification and link prediction. Network embedding algorithms have been proposed in multiple disciplines, often with domain-specific notations and details. In addition, different measures and tools have been adopted to evaluate and compare the methods proposed under different settings, often dependent on the downstream tasks. As a result, it is challenging to study these algorithms in the literature systematically. Motivated by the recently proposed PCS framework for Veridical Data Science, we propose a framework for network embedding algorithms and discuss how the principles of predictability, computability, and stability (PCS) apply in this context. The utilization of this framework in network embedding holds the potential to motivate and point to new directions for future research.

Tag: Review

A cluster differences unfolding method for large datasets of preference ratings on an interval scale: Minimizing the mean squared centred residuals

Structure-based, deep-learning models for protein-ligand binding affinity prediction

A comparison of approaches to accessing existing biological and chemical relational databases via SPARQL

Review of techniques and models used in optical chemical structure recognition in images and scanned documents

Novel digital approaches to the assessment of problematic opioid use

A tutorial on generative adversarial networks with application to classification of imbalanced data

Next waves in veridical network embedding*

Molecular representations in AI-driven drug discovery: a review and practical guide

Software architectures for big data: a systematic literature review

The C++ programming language in cheminformatics and computational chemistry