On the Shape of Brainscores for Large Language Models (LLMs)

arXiv:2405.06725v3 Announce Type: replace Abstract: With the rise of Large Language Models (LLMs), the novel metric "Brainscore" emerged as a means to evaluate the functional similarity between LLMs and human brain/neural systems. We set out to interpret this novel score by constructing topological features derived from human fMRI data from 190 subjects and from 39 LLMs plus their untrained counterparts. We then trained 36 Linear Regression Models and conducted thorough statistical analyses to identify reliable and valid features among those we constructed. Our findings reveal distinctive feature combinations that help interpret existing brainscores across various brain regions of interest (ROIs) and hemispheres, contributing to research on interpretable machine learning (iML). The study is enriched by further discussion and analysis of existing brainscores. To our knowledge, this study represents the first attempt to interpret the brainscore metric in this interdisciplinary domain.
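
Below is a minimal sketch of the regression setup the abstract describes: predicting an ROI's brainscore from constructed topological features with a linear model. The feature dimensionality, the pairing of trained and untrained models, and all data values are placeholders, not the paper's actual feature construction.

```python
# Hypothetical shapes: one topological feature vector and one brainscore per model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_models = 39 * 2                      # 39 LLMs plus untrained counterparts (assumed)
X = rng.normal(size=(n_models, 5))     # placeholder topological features per model
y = rng.normal(size=n_models)          # placeholder brainscores for one ROI/hemisphere

reg = LinearRegression().fit(X, y)
print("coefficients:", reg.coef_)
# Cross-validated R^2 as one sanity check before per-feature statistics.
print("CV R^2:", cross_val_score(LinearRegression(), X, y, cv=5).mean())
```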

Three Dimensional Spatial Cognition: Bees and Bats

arXiv:2405.09413v1 Announce Type: new Abstract: The paper describes a program which computes the best possible Bayesian model of 3D space from vision (in bees) or echolocation (in bats), at Marr's [1982] Level 2. The model exploits the strong Bayesian prior that most other things do not move as the animal moves. 3D locations of things are computed from successive sightings or echoes, computing structure from the animal's motion (SFM). The program can be downloaded and run. It also computes an approximate tracking model, which is more tractable for animal brains than the full Bayesian computation. The tracking model is nearly as good as the full Bayesian model, but only if spatial memory storage errors are small. Neural storage of spatial positions incurs error levels that are too high, and is too slow. Alternatively, a 3D model of space could be stored in a wave excitation, as a Fourier transform of real space. This could give high memory capacity and precision, with low spatial distortion, fast response, and simpler computation. Evidence is summarized from related papers that a wave excitation holds spatial memory in the mammalian thalamus and in the central body of the insect brain.
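
The following toy illustrates one reading of the tracking approximation: a static landmark's 3D position is refined recursively from noisy sightings instead of re-running the full Bayesian computation. The Gaussian observation model and all noise levels are illustrative assumptions, not the paper's program.

```python
import numpy as np

landmark_true = np.array([2.0, 1.0, 0.5])   # metres, arbitrary
est, var = np.zeros(3), np.full(3, 10.0)    # Gaussian prior per axis
obs_var = 0.2 ** 2                          # sighting/echo noise (assumed)

rng = np.random.default_rng(1)
for _ in range(20):                         # successive sightings as the animal moves
    z = landmark_true + rng.normal(scale=0.2, size=3)
    gain = var / (var + obs_var)            # per-axis Kalman gain
    est = est + gain * (z - est)            # tracking update
    var = (1 - gain) * var                  # uncertainty shrinks; spatial memory
                                            # storage errors in est/var would
                                            # degrade exactly this step
print("estimate:", est.round(3), "std:", np.sqrt(var).round(3))
```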

Matching domain experts by training from scratch on domain knowledge

arXiv:2405.09395v1 Announce Type: new Abstract: Recently, large language models (LLMs) have outperformed human experts in predicting the results of neuroscience experiments (Luo et al., 2024). What is the basis for this performance? One possibility is that statistical patterns in that specific scientific literature, rather than emergent reasoning abilities arising from broader training, underlie LLMs' performance. To evaluate this possibility, we trained a relatively small 124M-parameter GPT-2 model on next-word prediction over 1.3 billion tokens of domain-specific text. Despite being orders of magnitude smaller than LLMs trained on trillions of tokens, these small models achieved expert-level performance in predicting neuroscience results. The small models succeeded whether they were trained from scratch with a tokenizer fit specifically to neuroscience text or whether a pretrained GPT-2 was finetuned on the neuroscience literature. Our results indicate that expert-level performance may be attained by even small LLMs through domain-specific, auto-regressive training.
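
A hedged sketch of a from-scratch setup consistent with the abstract: fit a tokenizer on domain text, then train GPT-2 small (~124M parameters) with a next-word-prediction objective. The corpus path, vocabulary size, and training hyperparameters are illustrative, not the paper's actual values.

```python
import os
from tokenizers import ByteLevelBPETokenizer
from transformers import (GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast,
                          Trainer, TrainingArguments)

# 1) Fit a byte-level BPE tokenizer on the neuroscience corpus (path assumed).
os.makedirs("neuro_tokenizer", exist_ok=True)
bpe = ByteLevelBPETokenizer()
bpe.train(files=["neuroscience_corpus.txt"], vocab_size=50257)
bpe.save_model("neuro_tokenizer")
tokenizer = GPT2TokenizerFast.from_pretrained("neuro_tokenizer")

# 2) Instantiate GPT-2 small with random weights (training from scratch).
model = GPT2LMHeadModel(GPT2Config(vocab_size=tokenizer.vocab_size))

# 3) Standard auto-regressive training; `train_dataset` would hold tokenized
#    fixed-length chunks of the 1.3B-token corpus.
args = TrainingArguments(output_dir="neuro-gpt2",
                         per_device_train_batch_size=8, num_train_epochs=1)
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```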

ReconBoost: Boosting Can Achieve Modality Reconcilement

arXiv:2405.09321v1 Announce Type: cross Abstract: This paper explores a novel multi-modal alternating learning paradigm that pursues a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current multi-modal learning paradigms tend to explore multi-modal features simultaneously. The resulting gradient suppresses further exploitation of the features in the weak modality, leading to modality competition, where the dominant modality overpowers the learning process. To address this issue, we study the modality-alternating learning paradigm to achieve reconcilement. Specifically, we propose a new method called ReconBoost, which updates a single, fixed modality at a time. Herein, the learning objective is dynamically adjusted with a reconcilement regularization against competition with the historical models. By choosing a KL-based reconcilement, we show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by the others and enhance overall performance. The major difference from classic GB is that we preserve only the newest model for each modality, to avoid the overfitting caused by ensembling strong learners. Furthermore, we propose a memory consolidation scheme and a global rectification scheme to make this strategy more effective. Experiments on six multi-modal benchmarks demonstrate the efficacy of the method. We release the code at https://github.com/huacong/ReconBoost.
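
A schematic sketch of the alternating update: only one modality's learner is refreshed per step, with a KL-based reconcilement term computed against the frozen newest models of the other modalities. The model classes, the loss weighting `alpha`, and the sign convention are assumptions, not ReconBoost's published objective.

```python
import torch
import torch.nn.functional as F

def reconboost_step(models, opt, x_by_mod, y, active, alpha=0.5):
    """Update models[active]; `opt` optimizes only that model's parameters."""
    with torch.no_grad():  # logits of the frozen, newest historical models
        others = sum(models[m](x_by_mod[m]) for m in models if m != active)
    logits = models[active](x_by_mod[active])
    ce = F.cross_entropy(logits + others, y)   # correct the ensemble's errors
    # Reconcilement regularization: discourage the active learner from merely
    # duplicating what the historical ensemble already predicts.
    kl = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(others, dim=-1), reduction="batchmean")
    loss = ce - alpha * kl
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```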

Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

arXiv:2405.09266v1 Announce Type: cross Abstract: Generating dance from music is an important task, yet current methods, which mainly produce joint sequences, yield outputs that lack visual intuitiveness and complicate data collection because they require precise joint annotations. We introduce the Dance Any Beat Diffusion model (DabFusion), which employs music as a conditional input to create dance videos directly from still images, following conditional image-to-video generation principles. This approach pioneers the use of music as a conditioning factor in image-to-video synthesis. Our method unfolds in two stages: training an auto-encoder to predict latent optical flow between reference and driving frames, eliminating the need for joint annotation, and training a U-Net-based diffusion model to produce these latent optical flows guided by music rhythm encoded by CLAP. Although capable of producing high-quality dance videos, the baseline model struggles with rhythm alignment. We enhance the model by adding beat information, improving synchronization. We introduce a 2D motion-music alignment score (2D-MM Align) for quantitative assessment. Evaluated on the AIST++ dataset, our enhanced model shows marked improvements in the 2D-MM Align score and in established metrics. Video results can be found on our project page: https://DabFusion.github.io.
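
A schematic sketch of the second stage: a diffusion model denoises a latent optical-flow sequence conditioned on a music embedding (CLAP in the paper). Every module and the update rule below are stand-ins; a real sampler would follow the trained noise schedule, and beat conditioning is omitted.

```python
import torch

def sample_flows(unet, music_emb, steps=50, shape=(1, 16, 4, 32, 32)):
    """DDPM-style ancestral sampling of latent flows, guided by music."""
    z = torch.randn(shape)                    # start from pure noise
    for t in reversed(range(steps)):
        t_batch = torch.full((shape[0],), t)
        eps = unet(z, t_batch, music_emb)     # predict noise, music-conditioned
        z = z - eps / steps                   # schematic denoising step only
    return z                                  # decode to flows, then warp the
                                              # still image frame by frame

dummy_unet = lambda z, t, c: torch.zeros_like(z)   # stand-in for the U-Net
flows = sample_flows(dummy_unet, music_emb=torch.randn(1, 512))
```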

QMedShield: A Novel Quantum Chaos-based Image Encryption Scheme for Secure Medical Image Storage in the Cloud

arXiv:2405.09191v1 Announce Type: cross Abstract: In the age of digital technology, medical images play a crucial role in the healthcare industry, aiding surgeons in making precise decisions and reducing diagnosis time. However, storing large numbers of these images in third-party cloud services raises privacy and security concerns. Many classical security mechanisms exist to protect them; however, the advent of quantum computing motivates the development of quantum-based encryption models for healthcare. Hence, we introduce a novel quantum chaos-based encryption scheme for medical images in this article. The model transforms a plain medical image into a cipher medical image using bit-plane scrambling, a quantum logistic map, and quantum operations in the diffusion phase, together with a hybrid chaotic map, DNA encoding, and related computations in the confusion phase. The proposed scheme has been evaluated using multiple statistical measures and validated against multiple attacks, such as differential attacks, on three different medical datasets. The introduced encryption model thus proves to be more attack-resistant and robust than other existing image encryption schemes, ensuring the secure storage of medical images in cloud environments.
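
To illustrate the chaos-based diffusion idea, here is a classical logistic-map keystream XORed over image bytes. This is a stand-in for the paper's quantum logistic map and omits the bit-plane scrambling and DNA-encoding confusion stages; the parameters below are arbitrary.

```python
import numpy as np

def logistic_keystream(n, x0=0.61, r=3.99):
    ks = np.empty(n, dtype=np.uint8)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)                  # chaotic iteration
        ks[i] = int(x * 256) % 256           # quantize state to a key byte
    return ks

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in image
ks = logistic_keystream(img.size)
cipher = img ^ ks.reshape(img.shape)                        # diffusion by XOR
assert np.array_equal(cipher ^ ks.reshape(img.shape), img)  # decryption inverts it
```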

Scalable Image Coding for Humans and Machines Using Feature Fusion Network

arXiv:2405.09152v1 Announce Type: cross Abstract: As image recognition models become more prevalent, scalable coding methods for machines and humans gain importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, scalable coding is effective because the tasks require only occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent, but they are effective only for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model that provides additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. Because the feature fusion network can combine features of different sizes, the additional-information compression model in our method can be made smaller, reducing the number of parameters. Our experiments confirm that the feature fusion network efficiently combines the image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating its image compression performance in terms of decoded image quality and bitrate.
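
A minimal sketch of a feature fusion network in the spirit described above: features from the machine-oriented codec and the smaller additional-information model are projected to a shared width and merged. The channel sizes and the fusion operator (concatenation plus 1x1 convolution) are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, c_machine=192, c_extra=64, c_out=192):
        super().__init__()
        # allow differently sized feature maps from the two branches
        self.proj_extra = nn.Conv2d(c_extra, c_machine, kernel_size=1)
        self.fuse = nn.Conv2d(2 * c_machine, c_out, kernel_size=1)

    def forward(self, f_machine, f_extra):
        f_extra = self.proj_extra(f_extra)
        if f_extra.shape[-2:] != f_machine.shape[-2:]:   # align spatial sizes
            f_extra = nn.functional.interpolate(f_extra, f_machine.shape[-2:])
        return self.fuse(torch.cat([f_machine, f_extra], dim=1))

fused = FeatureFusion()(torch.randn(1, 192, 16, 16), torch.randn(1, 64, 8, 8))
print(fused.shape)   # torch.Size([1, 192, 16, 16])
```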

MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding

arXiv:2405.09286v1 Announce Type: new Abstract: Recent years have witnessed the rapid development of short videos, which usually contain both visual and audio modalities. Background music is important to short videos and can significantly influence viewers' emotions. At present, however, the background music of a short video is generally chosen by the video producer, and automatic music recommendation methods for short videos are lacking. This paper introduces MVBind, an innovative Music-Video embedding space Binding model for cross-modal retrieval. MVBind operates as a self-supervised approach, acquiring inherent knowledge of inter-modal relationships directly from data, without the need for manual annotations. Additionally, to compensate for the lack of a paired music-video dataset for short videos, we construct a dataset, SVM-10K (Short Video with Music-10K), which mainly consists of meticulously selected short videos. On this dataset, MVBind achieves significantly improved performance compared to baseline methods. The constructed dataset and code will be released to facilitate future research.
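
One plausible objective for binding the two embedding spaces is a symmetric InfoNCE loss over matched video/music pairs, shown below. The abstract does not specify the loss, so treat this as a generic self-supervised contrastive formulation, not MVBind's actual training recipe.

```python
import torch
import torch.nn.functional as F

def binding_loss(video_emb, music_emb, temperature=0.07):
    v = F.normalize(video_emb, dim=-1)
    m = F.normalize(music_emb, dim=-1)
    logits = v @ m.t() / temperature          # pairwise similarities
    targets = torch.arange(len(v))            # i-th video matches i-th music
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = binding_loss(torch.randn(8, 512), torch.randn(8, 512))
```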

Methodology for isolation of serum, cerebrospinal fluid, and hippocampal neuron proteins from rat and their analysis using mass spectrometry-based shotgun proteomics

Emerging interest in research related to diseases such as Alzheimer's disease, Parkinson's disease, cognition, and other mental health-related disorders has prompted the need for a common method for the isolation of serum, CSF, and hippocampal neurons. The hippocampus is responsible for learning and memory and can be affected by various neurological and psychiatric disorders. However, collecting samples such as CSF and hippocampal neurons is challenging, especially from small animals like rats. We present here a method for the isolation of serum, CSF, and hippocampal neurons that can be used for downstream applications such as proteomics. We used high-speed centrifugation instruments and density gradient centrifugation methods, which are easy to follow. Additionally, we analyzed the proteins identified through mass spectrometry. Our method enables the study of proteins in serum, CSF, and neural cells for investigating protein cross-talk and the mechanisms of neurological disorders.

The cost of behavioral flexibility: reversal learning driven by a spiking neural network

To survive in a changing world, animals often need to suppress an obsolete behavior and acquire a new one. This process is known as reversal learning (RL). The neural mechanisms underlying RL in spatial navigation have received limited attention, and it remains unclear what neural mechanisms maintain behavioral flexibility. We extended an existing closed-loop simulator of spatial navigation and learning based on spiking neural networks (Ghazinouri et al. 2023). The activity of place cells and boundary cells was fed as input to action selection neurons, which drove the movement of the agent. When the agent reached the goal, behavior was reinforced with spike-timing-dependent plasticity (STDP) coupled with an eligibility trace, which marks synaptic connections for future reward-based updates. The modeled RL task had an ABA design, in which the goal was switched between two locations, A and B, every 10 trials. Agents using symmetric STDP initially excelled at finding target A, but failed to find target B after the goal switch, perseverating on target A. Using asymmetric STDP, using many small place fields, and injecting short noise pulses into action selection neurons were each effective in driving spatial exploration in the absence of rewards, which ultimately led to finding target B. However, this flexibility came at the price of slower learning and lower performance. Our work shows three examples of neural mechanisms that achieve flexibility at the behavioral level, each with a different characteristic cost.
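
A simplified sketch of the plasticity rule described above: pair-based STDP writes into an eligibility trace, and the trace is converted into an actual weight change only when reward arrives at the goal. The time constants and learning rates are illustrative; the asymmetric variant is obtained by making the depression amplitude differ from the potentiation amplitude.

```python
import numpy as np

def stdp_kernel(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    """dt = t_post - t_pre (ms); a_minus != a_plus gives the asymmetric variant."""
    return a_plus * np.exp(-dt / tau) if dt >= 0 else -a_minus * np.exp(dt / tau)

w, trace = 0.5, 0.0
tau_elig = 500.0                              # eligibility trace decay (ms), assumed
for t_pre, t_post, reward, dt_step in [(10, 15, 0, 5), (40, 38, 0, 25),
                                       (60, 62, 1, 20)]:
    trace *= np.exp(-dt_step / tau_elig)      # trace decays between events
    trace += stdp_kernel(t_post - t_pre)      # mark the synapse for later update
    if reward:                                # goal reached: consolidate the trace
        w += trace
        trace = 0.0
print("weight after reward:", round(w, 4))
```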