Correcting Biased Centered Kernel Alignment Measures in Biological and Artificial Neural Networks

arXiv:2405.01012v1 Announce Type: new Abstract: Centered Kernel Alignment (CKA) has recently emerged as a popular metric to compare activations from biological and artificial neural networks (ANNs) in order to quantify the alignment between internal representations derived from stimulus sets (e.g. images, text, video) that are presented to both systems. In this paper we highlight issues that the community should take into account if using CKA as an alignment metric with neural data. Neural data are in the low-data high-dimensionality domain, which is one of the cases where (biased) CKA results in high similarity scores even for pairs of random matrices. Using fMRI and MEG data from the THINGS project, we show that if biased CKA is applied to representations of different sizes in the low-data high-dimensionality domain, the resulting scores are not directly comparable, because biased CKA is sensitive to differing feature-sample ratios rather than to stimuli-driven responses. This situation can arise both when comparing a pre-selected area of interest (e.g. an ROI) to multiple ANN layers and when determining to which ANN layer multiple regions of interest (ROIs) / sensor groups of different dimensionality are most similar. We show that biased CKA can be artificially driven to its maximum value when using independent random data of different sample-feature ratios. We further show that shuffling sample-feature pairs of real neural data does not drastically alter biased CKA similarity in comparison to unshuffled data, indicating an undesirable lack of sensitivity to stimuli-driven neural responses. Positive alignment of true stimuli-driven responses is only achieved by using debiased CKA. Lastly, we report findings that suggest biased CKA is sensitive to the inherent structure of neural data, only differing from shuffled data when debiased CKA detects stimuli-driven alignment.
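
The contrast between biased and debiased CKA described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: it assumes linear kernels, uses the standard unbiased HSIC estimator for the debiased variant, and the function names, matrix sizes, and random seed are hypothetical choices made only for illustration.

```python
import numpy as np


def _center(K):
    """Double-center a Gram matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def biased_cka(X, Y):
    """Linear CKA using the biased HSIC estimator (rows are samples)."""
    Kc, Lc = _center(X @ X.T), _center(Y @ Y.T)
    # For symmetric matrices, trace(A @ B) equals the sum of elementwise products.
    return np.sum(Kc * Lc) / np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))


def _hsic_unbiased(K, L):
    """Unbiased HSIC estimator; requires n > 3 samples."""
    n = K.shape[0]
    K, L = K.copy(), L.copy()
    np.fill_diagonal(K, 0.0)  # the estimator ignores diagonal entries
    np.fill_diagonal(L, 0.0)
    term1 = np.sum(K * L)
    term2 = K.sum() * L.sum() / ((n - 1) * (n - 2))
    term3 = 2.0 * (K.sum(axis=0) @ L.sum(axis=1)) / (n - 2)
    return (term1 + term2 - term3) / (n * (n - 3))


def debiased_cka(X, Y):
    """Linear CKA built from the unbiased HSIC estimator."""
    K, L = X @ X.T, Y @ Y.T
    return _hsic_unbiased(K, L) / np.sqrt(_hsic_unbiased(K, K) * _hsic_unbiased(L, L))


# Assumed toy setting: few samples, many features, independent random data
# with different feature counts (no shared stimulus-driven structure).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5000))
Y = rng.standard_normal((50, 2000))

b = biased_cka(X, Y)
d = debiased_cka(X, Y)
print(f"biased CKA:   {b:.3f}")
print(f"debiased CKA: {d:.3f}")
```

In this toy regime each Gram matrix is dominated by its diagonal, which the biased estimator includes and the unbiased estimator discards. That is one way to see why the biased score for independent random matrices can sit near its maximum while the debiased score stays near zero.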

A Survey on the Real Power of ChatGPT

arXiv:2405.00704v1 Announce Type: new Abstract: ChatGPT has changed the AI community and an active research line is the performance evaluation of ChatGPT. A key challenge for the evaluation is that ChatGPT is still closed-source and traditional benchmark datasets may have been used by ChatGPT as the training data. In this paper, (i) we survey recent studies which uncover the real performance levels of ChatGPT in seven categories of NLP tasks, (ii) review the social implications and safety issues of ChatGPT, and (iii) emphasize key challenges and opportunities for its evaluation. We hope our survey can shed some light on its black-box nature, so that researchers are not misled by its surface generation.

Understanding Social Perception, Interactions, and Safety Aspects of Sidewalk Delivery Robots Using Sentiment Analysis

arXiv:2405.00688v1 Announce Type: cross Abstract: This article presents a comprehensive sentiment analysis (SA) of comments on YouTube videos related to Sidewalk Delivery Robots (SDRs). We manually annotated the collected YouTube comments with three sentiment labels: negative (0), positive (1), and neutral (2). We then constructed models for text sentiment classification and tested the models' performance on both binary and ternary classification tasks in terms of accuracy, precision, recall, and F1 score. Our results indicate that, in binary classification tasks, the Support Vector Machine (SVM) model using Term Frequency-Inverse Document Frequency (TF-IDF) and N-gram features achieves the highest accuracy. In ternary classification tasks, the model using Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory Networks (LSTM) and Gated Recurrent Unit (GRU) significantly outperforms other machine learning models, achieving an accuracy, precision, recall, and F1 score of 0.78. Additionally, we employ the Latent Dirichlet Allocation model to generate 10 topics from the comments to explore the public's underlying views on SDRs. Drawing from these findings, we propose targeted recommendations for shaping future policies concerning SDRs. This work provides valuable insights for stakeholders in the SDR sector regarding social perception, interaction, and safety.

Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models

arXiv:2405.01501v1 Announce Type: new Abstract: Knowledge workers often need to extract and analyze information from a collection of documents to solve complex information tasks in the workplace, e.g., hiring managers reviewing resumes or analysts assessing risk in contracts. However, foraging for relevant information can become tedious and repetitive over many documents and criteria of interest. We introduce Marco, a mixed-initiative workspace supporting sensemaking over diverse business document collections. Through collection-centric assistance, Marco reduces the cognitive costs of extracting and structuring information, allowing users to prioritize comparative synthesis and decision making processes. Users interactively communicate their information needs to an AI assistant using natural language and compose schemas that provide an overview of a document collection. Findings from a usability study (n=16) demonstrate that when using Marco, users complete sensemaking tasks 16% more quickly, with less effort, and without diminishing accuracy. A design probe with seven domain experts identifies how Marco can benefit various real-world workflows.

Designing Algorithmic Recommendations to Achieve Human-AI Complementarity

arXiv:2405.01484v1 Announce Type: new Abstract: Algorithms frequently assist, rather than replace, human decision-makers. However, the design and analysis of algorithms often focus on predicting outcomes and do not explicitly model their effect on human decisions. This discrepancy between the design and role of algorithmic assistants becomes of particular concern in light of empirical evidence that suggests that algorithmic assistants repeatedly fail to improve human decisions. In this article, we formalize the design of recommendation algorithms that assist human decision-makers without making restrictive ex-ante assumptions about how recommendations affect decisions. We formulate an algorithmic-design problem that leverages the potential-outcomes framework from causal inference to model the effect of recommendations on a human decision-maker's binary treatment choice. Within this model, we introduce a monotonicity assumption that leads to an intuitive classification of human responses to the algorithm. Under this monotonicity assumption, we can express the human's response to algorithmic recommendations in terms of their compliance with the algorithm and the decision they would take if the algorithm sends no recommendation. We showcase the utility of our framework using an online experiment that simulates a hiring task. We argue that our approach explains the relative performance of different recommendation algorithms in the experiment, and can help design solutions that realize human-AI complementarity.

Student Reflections on Self-Initiated GenAI Use in HCI Education

arXiv:2405.01467v1 Announce Type: new Abstract: This study explores students' self-initiated use of Generative Artificial Intelligence (GenAI) tools in an interactive systems design class. Through 12 group interviews, students revealed the dual nature of GenAI in (1) stimulating creativity and (2) speeding up design iterations, alongside concerns over its potential to cause shallow learning and reliance. GenAI's benefits were pronounced in the execution phase of design, aiding rapid prototyping and ideation, while its use in initial insight generation posed risks to depth and reflective practice. This reflection highlights the complex role of GenAI in Human-Computer Interaction education, emphasizing the need for balanced integration to leverage its advantages without compromising fundamental learning outcomes.

Quantifying Spatial Domain Explanations in BCI using Earth Mover’s Distance

arXiv:2405.01277v1 Announce Type: new Abstract: Brain-computer interface (BCI) systems facilitate unique communication between humans and computers, benefiting severely disabled individuals. Despite decades of research, BCIs are not fully integrated into clinical and commercial settings. It's crucial to assess and explain BCI performance, offering clear explanations for potential users to avoid frustration when it doesn't work as expected. This work investigates the efficacy of different deep learning and Riemannian geometry-based classification models in the context of motor imagery (MI) based BCI using electroencephalography (EEG). We then propose an optimal transport theory-based approach using earth mover's distance (EMD) to quantify the comparison of the feature relevance map with the domain knowledge of neuroscience. For this, we utilized explainable AI (XAI) techniques for generating feature relevance in the spatial domain to identify important channels for model outcomes. Three state-of-the-art models are implemented - 1) Riemannian geometry-based classifier, 2) EEGNet, and 3) EEG Conformer, and the observed trend in the models' accuracy across different architectures on the dataset correlates with the proposed feature relevance metrics. The models with diverse architectures perform significantly better when trained on channels relevant to motor imagery than when trained on channels chosen by data-driven channel selection. This work focuses attention on the necessity for interpretability and for incorporating metrics beyond accuracy, and underscores the value of combining domain knowledge and quantifying model interpretations with data-driven approaches in creating reliable and robust Brain-Computer Interfaces (BCIs).
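
As a rough illustration of how an earth mover's distance can score the agreement between a feature relevance map and a reference distribution, here is a minimal sketch. It is a simplification, not the paper's procedure: it assumes both maps are non-negative weights over the same evenly spaced one-dimensional support, whereas EEG channel layouts are spatial and would need a ground distance between electrode positions. The function name `emd_1d` and the example weights are hypothetical.

```python
import numpy as np


def emd_1d(p, q, spacing=1.0):
    """Earth mover's distance between two 1-D histograms on a shared,
    evenly spaced support. Both inputs are normalized to sum to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.ndim != 1 or p.shape != q.shape:
        raise ValueError("p and q must be 1-D arrays of equal length")
    if p.sum() <= 0 or q.sum() <= 0:
        raise ValueError("p and q must have positive total mass")
    p = p / p.sum()
    q = q / q.sum()
    # In one dimension the optimal transport cost is the summed absolute
    # difference between the two cumulative distributions.
    return spacing * np.abs(np.cumsum(p - q)).sum()


# Hypothetical relevance weights over five ordered positions.
model_relevance = [0.05, 0.10, 0.60, 0.20, 0.05]
reference_prior = [0.00, 0.20, 0.60, 0.20, 0.00]
print(emd_1d(model_relevance, reference_prior))
```

A smaller value means less relevance mass has to be moved, and over a shorter distance, to match the reference, so under these assumptions lower scores indicate closer agreement with the assumed domain knowledge.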

Towards Optimising EEG Decoding using Post-hoc Explanations and Domain Knowledge

arXiv:2405.01269v1 Announce Type: new Abstract: Decoding EEG during motor imagery is pivotal for the Brain-Computer Interface (BCI) system, influencing its overall performance significantly. As end-to-end data-driven learning methods advance, the challenge lies in balancing model complexity with the need for human interpretability and trust. Despite strides in EEG-based BCIs, challenges like artefacts and low signal-to-noise ratio emphasise the ongoing importance of model transparency. This work proposes using post-hoc explanations to interpret model outcomes and validate them against domain knowledge. Leveraging the GradCAM post-hoc explanation technique on the motor imagery dataset, this work demonstrates that relying solely on accuracy metrics may be inadequate to ensure BCI performance and acceptability. A model trained using all EEG channels of the dataset achieves 72.60% accuracy, while a model trained with motor-imagery/movement-relevant channel data has a statistically insignificant decrease of 1.75%. However, the relevant features for both are very different based on neurophysiological facts. This work demonstrates that integrating domain-specific knowledge with XAI techniques emerges as a promising paradigm for validating the neurophysiological basis of model outcomes in BCIs. Our results reveal the significance of neurophysiological validation in evaluating BCI performance, highlighting the potential risks of exclusively relying on performance metrics when selecting models for dependable and transparent BCIs.

Attention and Sensory Processing in Augmented Reality: Empowering ADHD population

arXiv:2405.01218v1 Announce Type: new Abstract: The brain's attention system is a complex and adaptive network of brain regions that enables individuals to interact effectively with their surroundings and perform complex tasks. This system involves the coordination of various brain regions, including the prefrontal cortex and the parietal lobes, to process and prioritize sensory information, manage tasks, and maintain focus. In this study, we investigate the intricate mechanisms underpinning the brain's attention system, followed by an exploration within the context of augmented reality (AR) settings. AR emerges as a viable technological intervention to address the multifaceted challenges faced by individuals with Attention Deficit Hyperactivity Disorder (ADHD). Given that the primary characteristics of ADHD include difficulties related to inattention, hyperactivity, and impulsivity, AR offers tailor-made solutions specifically designed to mitigate these challenges and enhance cognitive functioning. Conversely, if these ADHD-related difficulties are not adequately addressed, the condition of these individuals could worsen in AR settings. This underscores the importance of employing effective interventions such as AR to support individuals with ADHD in managing their symptoms. We examine the attentional mechanisms within AR environments and the sensory processing dynamics prevalent among the ADHD population. Our objective is to comprehensively address the attentional needs of this population in AR settings and offer a framework for designing cognitively accessible AR applications.

Using Schema to Inform Method Design Practices

arXiv:2405.00901v1 Announce Type: new Abstract: There are many different forms of design knowledge that guide and shape a designer's ability to act and realize potential realities. Methods and schemas are examples of design knowledge commonly used by design researchers and designers alike. In this pictorial, we explore, engage, and describe the role of schemas as tools that can support design researchers in formulating methods to support design action, with our framing of method design specifically focused on ethical design complexity. We present four ways for method designers to engage with schema: 1) Systems to operationalize complex design constructs such as ethical design complexity through an A.E.I.O.YOU schema; 2) Classifiers to map existing methods and identify the possibility for new methods through descriptive semantic differentials; 3) Tools that enable the creation of methods that relate to one or more elements of the schema through creative departures from research to design; and 4) Interactive channels to playfully engage potential and new opportunities through schema interactivity.