Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments

Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to scenarios with larger dynamic environments beyond a fixed size of a few square meters remains challenging. In this paper, we aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale with both static and dynamic scene entities at practical bandwidth requirements, based only on lightweight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system built upon a novel hybrid volumetric scene representation that combines a voxel-based representation for the static contents, which stores not only the reconstructed surface geometry but also information about object semantics and their accumulated dynamic movement over time, with a point-cloud-based representation for dynamic scene parts. The separation of dynamic from static parts is based on semantic and instance information extracted from the input frames. With an independent yet simultaneous streaming of both static and dynamic content, where we seamlessly integrate potentially moving but currently static scene entities into the static model until they become dynamic again, as well as the fusion of static and dynamic data at the remote client, our system is able to achieve VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our novel approach in terms of visual quality and performance, and includes ablation studies regarding the involved design choices.

AI Assistance for UX: A Literature Review Through Human-Centered AI

Recent advancements in HCI and AI research attempt to support user experience (UX) practitioners with AI-enabled tools. Despite the potential of emerging models and new interaction mechanisms, mainstream adoption of such tools remains limited. We took the lens of Human-Centered AI and presented a systematic literature review of 359 papers, aiming to synthesize the current landscape, identify trends, and uncover UX practitioners' unmet needs in AI support. Guided by the Double Diamond design framework, our analysis uncovered that UX practitioners' unique focuses on empathy building and experiences across UI screens are often overlooked. Simplistic AI automation can obstruct the valuable empathy-building process. Furthermore, focusing solely on individual UI screens without considering interactions and user flows reduces the system's practical value for UX designers. Based on these findings, we call for a deeper understanding of UX mindsets and more designer-centric datasets and evaluation metrics, for HCI and AI communities to collaboratively work toward effective AI support for UX.

Artificial Intelligence for Literature Reviews: Opportunities and Challenges

This manuscript presents a comprehensive review of the use of Artificial Intelligence (AI) in Systematic Literature Reviews (SLRs). An SLR is a rigorous and organised methodology that assesses and integrates previous research on a given topic. Numerous tools have been developed to assist and partially automate the SLR process. The increasing role of AI in this field shows great potential in providing more effective support for researchers, moving towards the semi-automatic creation of literature reviews. Our study focuses on how AI techniques are applied in the semi-automation of SLRs, specifically in the screening and extraction phases. We examine 21 leading SLR tools using a framework that combines 23 traditional features with 11 AI features. We also analyse 11 recent tools that leverage large language models for searching the literature and assisting academic writing. Finally, the paper discusses current trends in the field, outlines key research challenges, and suggests directions for future research.

What is a “bug”? On subjectivity, epistemic power, and implications for software research

Considerable effort in software research and practice is spent on bugs. Finding, reporting, tracking, triaging, attempting to fix them automatically, detecting "bug smells": these comprise a substantial portion of large projects' time and development cost, and are of significant interest to researchers in Software Engineering, Programming Languages, and beyond. But what is a bug, exactly? While segmentation faults rarely spark joy, most bugs are not so clear cut. Per the Oxford English Dictionary, the word "bug" has been a colloquialism for an engineering "defect" at least since the 1870s. Most modern software-oriented definitions speak to a disconnect between what a developer intended and what a program actually does. Formal verification, from its inception, has developed means to identify deviations from a formal specification, expected to more or less fully encode desired behavior. However, software is rarely accompanied by full and formal specifications, and this intention is instead treated as implicit or partially documented at best. The International Software Testing Qualifications Board writes: "A human being can make an error (mistake), which produces a defect (fault, bug) in the program code, or in a document. If a defect in code is executed, the system may fail to do what it should do (or do something it shouldn't), causing a failure. Defects may result in failures, but not all [do]". Most sources forsake this precision. The influential paper "Finding bugs is easy" begins by saying "bug patterns are code idioms that are often errors", with no particular elaboration. Other work relies on imperfect practical proxies for specifications. For example, in automatic program repair research, a bug corresponds to a failing test case: when the test passes, the bug is considered fixed. However, when we interrogate fairly straightforward definitions, they start to break down...

Extending 3D body pose estimation for robotic-assistive therapies of autistic children

Robotic-assistive therapy has demonstrated very encouraging results for children with autism. Accurate estimation of the child's pose is essential both for human-robot interaction and for therapy assessment purposes. Non-intrusive methods are the sole viable option since these children are sensitive to touch. While depth cameras have been used extensively, existing methods face two major limitations: (i) they are usually trained with adult-only data and do not correctly estimate a child's pose, and (ii) they fail in scenarios with a high number of occlusions. Therefore, our goal was to develop a 3D pose estimator for children, by adapting an existing state-of-the-art 3D body modelling method and incorporating a linear regression model to fine-tune one of its inputs, thereby correcting the pose of children's 3D meshes. In controlled settings, our method has an error below $0.3\,\mathrm{m}$, which is considered acceptable for this kind of application and lower than current state-of-the-art methods. In real-world settings, the proposed model performs similarly to a Kinect depth camera and manages to successfully estimate the 3D body poses in a much higher number of frames.
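The abstract does not say which input is adjusted or how the regression is fitted. As a minimal sketch of the general idea only, the following fits an ordinary least-squares line that maps a raw scalar input to a corrected one. The function names, the single-scalar input, and the calibration numbers are all hypothetical and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (1D case)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correct_input(raw, slope, intercept):
    """Apply the fitted linear correction to a raw input value."""
    return slope * raw + intercept


# Made-up calibration pairs: raw input value -> value that gave a correct pose.
raw_values = [1.0, 2.0, 3.0, 4.0]
target_values = [1.5, 2.5, 3.5, 4.5]

slope, intercept = fit_line(raw_values, target_values)
corrected = correct_input(2.0, slope, intercept)
```

A linear model like this is cheap to fit on a small child-specific calibration set, which is one plausible reason to prefer it over retraining the full body model; the paper's actual motivation may differ.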

Point and Instruct: Enabling Precise Image Editing by Unifying Direct Manipulation and Text Instructions

Machine learning has enabled the development of powerful systems capable of editing images from natural language instructions. However, in many common scenarios it is difficult for users to specify precise image transformations with text alone. For example, in an image with several dogs, it is difficult to select a particular dog and move it to a precise location. Doing this with text alone would require a complex prompt that disambiguates the target dog and describes the destination. However, direct manipulation is well suited to visual tasks like selecting objects and specifying locations. We introduce Point and Instruct, a system for seamlessly combining familiar direct manipulation and textual instructions to enable precise image manipulation. With our system, a user can visually mark objects and locations, and reference them in textual instructions. This allows users to benefit from both the visual descriptiveness of natural language and the spatial precision of direct manipulation.

The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation Setting

We explored the viability of Large Language Models (LLMs) for triggering and personalizing content for Just-in-Time Adaptive Interventions (JITAIs) in digital health. JITAIs are being explored as a key mechanism for sustainable behavior change, adapting interventions to an individual's current context and needs. However, traditional rule-based and machine learning models for JITAI implementation face scalability and reliability limitations, such as lack of personalization, difficulty in managing multi-parametric systems, and issues with data sparsity. To investigate JITAI implementation via LLMs, we tested GPT-4, the overall performance-leading model at the time, with examples grounded in the use case of fostering heart-healthy physical activity in outpatient cardiac rehabilitation. Three personas and five sets of context information per persona were used as the basis for triggering and personalizing JITAIs. Subsequently, we generated a total of 450 proposed JITAI decisions and message content, divided equally into JITAIs generated by 10 iterations with GPT-4, a baseline provided by 10 laypersons (LayPs), and a gold standard set by 10 healthcare professionals (HCPs). Ratings from 27 LayPs indicated that JITAIs generated by GPT-4 were superior to those by HCPs and LayPs across all assessed scales, i.e., appropriateness, engagement, effectiveness, and professionality. This study indicates that LLMs have significant potential for implementing JITAIs as a building block of personalized or "precision" health, offering scalability, effective personalization based on opportunistically sampled information, and good acceptability.
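The total of 450 can be checked from the counts given in the abstract. The short calculation below assumes that each of the 10 GPT-4 iterations, 10 laypersons, and 10 healthcare professionals responded to every persona and context combination; the abstract does not state this explicitly, but it is the reading consistent with an equal three-way split.

```python
personas = 3
contexts_per_persona = 5
sources = ["GPT-4", "LayPs", "HCPs"]
responses_per_source = 10  # 10 iterations or 10 people per source

# Each persona is paired with each of its context sets.
scenarios = personas * contexts_per_persona

# Assumption: every iteration/person answers every scenario.
jitais_per_source = scenarios * responses_per_source
total_jitais = jitais_per_source * len(sources)

print(scenarios, jitais_per_source, total_jitais)
```

Under that assumption there are 15 scenarios and 150 proposed JITAIs per source, which matches the reported total of 450.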

Assessing the Privacy Risk of Cross-Platform Identity Linkage using Eye Movement Biometrics

The recent emergence of ubiquitous, multi-platform eye tracking has raised user privacy concerns over re-identification across platforms, where a person is re-identified across multiple eye tracking-enabled platforms using personally identifying information that is implicitly expressed through their eye movement. We present an empirical investigation quantifying a modern eye movement biometric model's ability to link subject identities across three different eye tracking devices using eye movement signals from each device. We show that a state-of-the-art eye movement biometrics model demonstrates above-chance levels of biometric performance (34.99% equal error rate, 15% rank-1 identification rate) when linking user identities across one pair of devices, but not for the other. Considering these findings, we also discuss the impact that eye tracking signal quality has on the model's ability to meaningfully associate a subject's identity between two substantially different eye tracking devices. Our investigation advances a fundamental understanding of the privacy risks for identity linkage across platforms by employing both quantitative and qualitative measures of biometric performance, including a visualization of the model's ability to distinguish genuine and imposter authentication attempts across platforms.
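For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute them from similarity scores. It is an illustration under stated assumptions, not the authors' evaluation code: the function names and toy scores are hypothetical, and higher scores are assumed to mean "more likely the same person". An equal error rate of 50% corresponds to chance, and chance-level rank-1 accuracy is one divided by the number of enrolled identities.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER: the operating point where the false accept rate
    and the false reject rate are closest to each other."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        # False reject: a genuine comparison scores below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False accept: an impostor comparison scores at or above it.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def rank1_rate(similarity, true_index):
    """similarity[i][j] is the score between probe i and gallery identity j;
    true_index[i] is the gallery index of probe i's real identity."""
    hits = 0
    for i, row in enumerate(similarity):
        best_match = max(range(len(row)), key=row.__getitem__)
        hits += best_match == true_index[i]
    return hits / len(similarity)


# Toy scores, made up for illustration.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
rank1 = rank1_rate([[0.9, 0.1], [0.2, 0.3], [0.6, 0.4]], [0, 1, 1])
```

With a finite set of scores the two error rates rarely cross exactly, so the sketch reports the midpoint at the closest threshold; production toolkits typically interpolate along the ROC curve instead.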

Exploring diversity perceptions in a community through a Q&A chatbot

While diversity has become a debated issue in design, very little research exists on positive use-cases for diversity beyond scholarly criticism. The current work addresses this gap through the case of a diversity-aware chatbot, exploring what benefits a diversity-aware chatbot could bring to people and how people interpret diversity when presented with it. In this paper, we motivate a Q&A chatbot as a technology probe and deploy it in two student communities within a study. During the study, we collected contextual data on people's expectations and perceptions when presented with diversity. Our key findings show that, when presented with diversity, people either seek out others with shared niche interests or are driven by exploration and inspiration in their search. Although interaction with chatbots is limited, participants found the engagement novel and interesting, which motivates future research.

Moonwalk: Advancing Gait-Based User Recognition on Wearable Devices with Metric Learning

Personal devices have adopted diverse authentication methods, including biometric recognition and passcodes. In contrast, headphones have limited input mechanisms, depending solely on the authentication of connected devices. We present Moonwalk, a novel method for passive user recognition utilizing the built-in headphone accelerometer. Our approach centers on gait recognition, enabling users to establish their identity simply by walking for a brief interval, despite the sensor's placement away from the feet. We employ self-supervised metric learning to train a model that yields a highly discriminative representation of a user's 3D acceleration, with no retraining required. We tested our method in a study involving 50 participants, achieving an average F1 score of 92.9% and equal error rate of 2.3%. We extend our evaluation by assessing performance under various conditions (e.g. shoe types and surfaces). We discuss the opportunities and challenges these variations introduce and propose new directions for advancing passive authentication for wearable devices.
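A metric-learning model of this kind is typically used by comparing a new embedding against an enrolled one, which is why no retraining is needed for new users. The sketch below illustrates that pattern and the F1 score reported above. The use of cosine similarity, the threshold value, and all function names are assumptions made for illustration; the abstract does not specify how Moonwalk compares embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def is_same_user(enrolled, probe, threshold=0.8):
    """Accept the probe if it is close enough to the enrolled embedding.
    The threshold here is an arbitrary illustrative value."""
    return cosine_similarity(enrolled, probe) >= threshold


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall over boolean decisions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy embeddings and decisions, made up for illustration.
accepted = is_same_user([1.0, 0.0, 0.0], [0.9, 0.1, 0.0])
rejected = is_same_user([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
score = f1_score([True, True, False, False], [True, False, True, False])
```

Raising the threshold trades false accepts for false rejects; the equal error rate quoted in the abstract is the point where those two rates coincide.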