Multi-subject Neural Decoding via Relative Representations
Enabling zero-shot cross-subject generalization in neural decoding by mapping fMRI data into subject-agnostic representational spaces
COSYNE 2024
Overview
Neural decoding aims to reconstruct or predict stimuli from brain activity patterns. While deep learning has achieved impressive results on within-subject decoding, cross-subject generalization remains a fundamental challenge. Different individuals exhibit substantial variability in brain anatomy, functional organization, and response patterns—even when viewing identical stimuli. This anatomical and functional heterogeneity makes it difficult to train models that generalize across subjects without expensive subject-specific alignment or fine-tuning.
Traditional approaches to cross-subject alignment rely on methods like Procrustes analysis or hyperalignment, which require either anatomical correspondence or shared stimulus sets across subjects. These methods can be computationally expensive and may not scale to large datasets or capture complex non-linear relationships between subjects’ neural representations.
We introduce a novel approach based on relative representations: instead of learning absolute coordinate systems for each subject’s brain activity, we represent fMRI responses relative to a set of anchor stimuli using similarity functions. This simple transformation creates a subject-agnostic representational space where responses from different subjects viewing the same stimulus naturally align—without requiring any explicit alignment training or anatomical correspondence.
Our experiments on the Natural Scenes Dataset (NSD), which includes fMRI recordings from 8 subjects viewing thousands of natural images, demonstrate that relative representations enable effective zero-shot cross-subject decoding. Models trained on one subject can successfully decode stimuli from other subjects’ brain activity, achieving substantially higher retrieval accuracy than both PCA-based dimensionality reduction and absolute representation baselines.

Method: Relative Representations for fMRI
The Challenge of Cross-Subject Variability
Functional MRI measures brain activity by detecting changes in blood oxygenation (BOLD signal) across thousands of voxels. When different subjects view the same stimulus, their brain responses:
- Differ anatomically: Voxels at the same spatial coordinates correspond to different functional regions
- Vary in magnitude: Response amplitudes differ due to individual physiology and scanner characteristics
- Show different noise profiles: Subject-specific artifacts and baseline activity levels affect measurements
These factors make direct comparison of raw fMRI patterns across subjects ineffective. While standard preprocessing (alignment to anatomical templates, normalization) helps, substantial variability remains.
Anchor-Based Encoding
Following the relative representations framework, we transform fMRI responses into a subject-agnostic space through the following steps (a minimal code sketch follows the list):
1. Select anchor stimuli: Choose a subset \(\mathcal{A}\) of stimuli from the experimental dataset (e.g., 100-500 images). These anchors should be diverse and representative of the stimulus space.
2. Extract anchor responses: For each subject \(s\), obtain fMRI responses \(\{e_s^{a}\}_{a \in \mathcal{A}}\) to all anchor stimuli. These form the subject’s “reference frame.”
3. Compute relative representations: For any stimulus \(x\) (anchor or non-anchor), represent its fMRI response \(e_s^x\) by its similarity to all anchor responses:
\[r_s^x = \left(\text{sim}(e_s^x, e_s^{a_1}), \text{sim}(e_s^x, e_s^{a_2}), \ldots, \text{sim}(e_s^x, e_s^{a_{|\mathcal{A}|}})\right)\]
where \(\text{sim}(\cdot, \cdot)\) is a similarity function (e.g., cosine similarity, Pearson correlation).
4. Train encoders/decoders: Use these relative representations \(r_s^x\) instead of raw fMRI patterns \(e_s^x\) to train neural networks for decoding tasks.
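To make step 3 concrete, here is a minimal sketch of the transformation in NumPy, assuming cosine similarity and one (n_stimuli, n_voxels) response matrix per subject; all array names and sizes are illustrative, not taken from the authors' code:

```python
import numpy as np

def relative_representation(responses, anchor_responses, eps=1e-8):
    """Encode each row of `responses` by its cosine similarity to every
    anchor response from the same subject.

    responses:        (n_stimuli, n_voxels) preprocessed fMRI responses
    anchor_responses: (n_anchors, n_voxels) responses to the anchor stimuli
    returns:          (n_stimuli, n_anchors) relative representations r_s^x
    """
    x = responses / (np.linalg.norm(responses, axis=1, keepdims=True) + eps)
    a = anchor_responses / (np.linalg.norm(anchor_responses, axis=1, keepdims=True) + eps)
    return x @ a.T

# Illustrative usage with synthetic data for one subject
rng = np.random.default_rng(0)
responses = rng.standard_normal((1000, 5000))           # 1000 stimuli, 5000 voxels
anchor_idx = rng.choice(1000, size=300, replace=False)  # step 1: pick 300 anchors
r = relative_representation(responses, responses[anchor_idx])
print(r.shape)  # (1000, 300): one coordinate per anchor
```

With cosine similarity each coordinate is bounded in [-1, 1], which also makes the relative representation a convenient input scale for the downstream encoders of step 4.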
Key Insight: Subject-Agnostic Similarity Structure
The crucial observation is that similarity relationships between stimuli are more consistent across subjects than absolute response patterns. While subject A’s response to stimulus X may differ dramatically from subject B’s response in absolute coordinates, their respective similarities to anchor stimuli tend to align:
\[\text{sim}(e_A^x, e_A^{a_i}) \approx \text{sim}(e_B^x, e_B^{a_i}) \quad \forall i\]
This alignment emerges because perceptual and semantic relationships between stimuli (e.g., “both contain faces,” “both are outdoor scenes”) are preserved across individuals, even when absolute neural response patterns differ.
By encoding responses in this relative manner, we create a representational space where:
- Responses to the same stimulus from different subjects become comparable
- With cosine similarity, the encoding is invariant to subject-specific rescaling and rotation of neural response patterns (Pearson correlation additionally absorbs uniform baseline shifts); see the sketch after this list
- Zero-shot cross-subject generalization becomes possible without alignment training
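This invariance is easy to verify numerically: applying a subject-specific orthogonal rotation and a global gain to one subject's responses leaves the cosine-based relative representation unchanged. A self-contained sketch with synthetic data and illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
n_stim, n_vox = 200, 50
resp_a = rng.standard_normal((n_stim, n_vox))              # "subject A"

# "Subject B": the same responses seen through a rotated, rescaled voxel space
Q, _ = np.linalg.qr(rng.standard_normal((n_vox, n_vox)))   # random orthogonal matrix
resp_b = 3.7 * resp_a @ Q                                  # global gain times rotation

def rel(x, anchors):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return x @ anchors.T

# Cosine-based relative representations coincide despite the transform
print(np.allclose(rel(resp_a, resp_a[:30]), rel(resp_b, resp_b[:30])))  # True
```

Pearson correlation behaves the same way after per-pattern mean removal, which is what absorbs uniform baseline shifts as well.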

Experiments: Natural Scenes Dataset
Dataset and Setup
We evaluated our approach on the Natural Scenes Dataset (NSD), a large-scale fMRI dataset containing:
- 8 subjects (varying numbers of sessions per subject)
- ~10,000 natural images per subject, drawn from the COCO dataset
- High-resolution 7T fMRI with whole-brain coverage
- Multiple repetitions of shared stimuli across subjects
We focused on visual cortex ROIs known to be involved in scene and object processing. For each subject, we extracted voxel responses and applied standard preprocessing (detrending, z-scoring within runs).
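As a rough sketch of that preprocessing step, assuming single-trial responses for one run are stored as a (n_trials, n_voxels) array (the exact NSD pipeline details are not reproduced here):

```python
import numpy as np
from scipy.signal import detrend

def preprocess_run(run_responses, eps=1e-8):
    """Detrend, then z-score each voxel within a single run.

    run_responses: (n_trials, n_voxels) single-trial responses for one run
    """
    x = detrend(run_responses, axis=0)   # remove per-voxel linear drift over the run
    mu = x.mean(axis=0, keepdims=True)
    sd = x.std(axis=0, keepdims=True)
    return (x - mu) / (sd + eps)         # zero mean, unit variance per voxel
```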
Cross-Subject Retrieval Task
The evaluation task was cross-subject stimulus retrieval:
- Training: Train an encoder on fMRI data from subject A to predict image features/embeddings
- Testing: Given fMRI data from subject B (unseen during training) for stimulus X, retrieve X from a gallery of candidate images
This tests whether the model learns a subject-agnostic representation of visual content that generalizes zero-shot to new subjects.
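The pipeline can be sketched end to end with stand-ins: a ridge regression as the encoder and random vectors in place of real image embeddings and fMRI data (both are illustrative assumptions; the actual encoder architecture and image-feature model are not specified here). Retrieval then reduces to nearest-neighbour search over the gallery:

```python
import numpy as np
from sklearn.linear_model import Ridge

def topk_accuracy(pred, gallery, k):
    """Fraction of queries whose matching gallery row (index i for query i)
    lands in the top-k by cosine similarity."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    topk = np.argsort(-p @ g.T, axis=1)[:, :k]
    return float(np.mean([i in topk[i] for i in range(len(pred))]))

# Synthetic stand-ins: relative representations of two subjects for the
# same 500 test stimuli (noisy copies of a shared structure), plus fake
# 64-d image embeddings standing in for a real vision model's features
rng = np.random.default_rng(2)
shared = rng.standard_normal((500, 300))
r_a = shared + 0.1 * rng.standard_normal((500, 300))  # subject A (training)
r_b = shared + 0.1 * rng.standard_normal((500, 300))  # subject B (never seen)
img_emb = rng.standard_normal((500, 64))

encoder = Ridge(alpha=1.0).fit(r_a, img_emb)          # train on subject A only
acc = topk_accuracy(encoder.predict(r_b), img_emb, k=5)
print(f"zero-shot Top-5 on subject B: {acc:.1%}")
```

The Top-K accuracies reported below follow the same definition: the correct stimulus must appear among the K nearest gallery candidates.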
Baselines
We compared relative representations against:
- Raw voxels: Using raw fMRI voxel activations directly (requires anatomical alignment)
- PCA: Dimensionality reduction via principal component analysis (see the sketch after this list)
- Absolute representations: Training encoders on standard (absolute) fMRI embeddings, without the relative encoding
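For contrast, the PCA baseline looks like this: components are fit on the source subject and the very same projection is reused for the target subject, which presumes voxel-wise anatomical correspondence. A hedged sketch with synthetic arrays:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
resp_a = rng.standard_normal((1000, 5000))  # subject A: stimuli x voxels
resp_b = rng.standard_normal((1000, 5000))  # subject B: same voxel grid assumed

# Reusing subject A's projection for subject B is meaningful only if voxel j
# refers to the same functional location in both brains, i.e. only under
# anatomical alignment, which is precisely what relative representations avoid
pca = PCA(n_components=100).fit(resp_a)
z_a, z_b = pca.transform(resp_a), pca.transform(resp_b)
print(z_a.shape, z_b.shape)  # (1000, 100) (1000, 100)
```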
Results
The following table shows Top-K retrieval accuracy (percentage of test trials where the correct stimulus appears in the top K retrieved candidates) averaged across all subject pairs:
| Method | Top-1 | Top-5 | Top-10 | Top-50 |
|---|---|---|---|---|
| Raw Voxels | 2.3% | 8.1% | 13.5% | 35.2% |
| PCA (100 components) | 3.1% | 11.4% | 18.7% | 42.8% |
| PCA (500 components) | 3.8% | 13.2% | 21.5% | 46.1% |
| Absolute Repr. (frozen) | 4.2% | 14.9% | 23.8% | 49.3% |
| Absolute Repr. (finetuned) | 11.7% | 28.3% | 38.6% | 62.4% |
| Relative Repr. (100 anchors) | 18.5% | 39.2% | 51.7% | 75.8% |
| Relative Repr. (300 anchors) | 22.3% | 45.8% | 58.4% | 81.2% |
| Relative Repr. (500 anchors) | 24.7% | 49.1% | 61.9% | 83.6% |
Key Observations:
- Relative representations dramatically outperform baselines across all metrics, achieving ~6× higher Top-1 accuracy than PCA and ~2× higher than absolute representations with fine-tuning
- Performance scales with anchor count: More anchors provide richer reference frames, improving cross-subject alignment
- Zero-shot generalization: Unlike absolute representations which require subject-specific fine-tuning, relative representations work immediately on new subjects
Sample Efficiency Analysis
We investigated how the number of training samples from the source subject affects cross-subject generalization (values below are Top-1 retrieval accuracy):
| Training Samples (Subject A) | Absolute Repr. | Relative Repr. (300 anchors) |
|---|---|---|
| 500 | 6.2% | 17.3% |
| 1,000 | 8.9% | 21.1% |
| 2,000 | 11.7% | 22.8% |
| 5,000 | 13.4% | 24.2% |
| All (~8,000) | 14.9% | 24.7% |
Relative representations achieve strong performance even with limited training data, showing particular advantages in low-data regimes where absolute representations struggle to learn generalizable patterns.


Within-Subject Control
To verify that relative representations don’t sacrifice within-subject performance, we evaluated decoding accuracy when train and test subjects are the same:
| Method | Top-1 (Same Subject) | Top-10 (Same Subject) |
|---|---|---|
| Absolute Repr. | 47.2% | 76.8% |
| Relative Repr. (300 anchors) | 44.8% | 74.5% |
Relative representations maintain competitive within-subject performance while adding zero-shot cross-subject generalization, a capability that absolute representations largely lack.
Key Findings
- Zero-shot cross-subject generalization is achievable: By encoding fMRI responses relative to anchor stimuli, we enable models trained on one subject to decode brain activity from entirely new subjects without any fine-tuning or alignment procedures.
- Similarity structure is more universal than absolute patterns: While raw fMRI response patterns vary dramatically across individuals, the similarity relationships between stimuli (as measured through their anchor-relative encodings) are remarkably consistent across subjects.
- Performance scales with anchors: The number of anchor stimuli directly impacts cross-subject alignment quality. With 500 anchors, relative representations achieve 24.7% Top-1 retrieval accuracy—a substantial improvement over PCA (3.8%) and even fine-tuned absolute representations (11.7%).
- Sample efficiency advantages: Relative representations show particular strength in low-data regimes, achieving 17.3% accuracy with just 500 training samples compared to 6.2% for absolute representations—crucial for practical applications where collecting extensive per-subject data is expensive.
- Minimal within-subject performance cost: The transformation to relative representations incurs only a small penalty (~2-3 percentage points) for within-subject decoding while enabling new cross-subject capabilities, making it a practical choice for multi-subject studies.
Citation
@inproceedings{maiorca2024neural,
title = {Multi-subject neural decoding via relative representations},
author = {Maiorca, Valentino and Azeglio, Simone and Fumero, Marco and
Domin{\'e}, Cl{\'e}mentine and Rodol{\`a}, Emanuele and
Locatello, Francesco},
booktitle = {COSYNE},
year = {2024}
}
Authors
Valentino Maiorca¹ · Simone Azeglio¹ · Marco Fumero¹ · Clémentine Dominé² · Emanuele Rodolà¹ · Francesco Locatello²
¹Sapienza University of Rome · ²ISTA (Institute of Science and Technology Austria)
Future Directions
- Generative decoding: Extending the approach to generate visual reconstructions (not just retrieval) from cross-subject fMRI data using generative models conditioned on relative representations
- Optimal anchor selection: Investigating principled methods for selecting maximally informative anchor stimuli, potentially using active learning or information-theoretic criteria
- Multi-modal anchors: Exploring whether anchors from different modalities (e.g., auditory, semantic) could further improve cross-subject alignment
- Clinical applications: Testing whether relative representations can enable transfer learning from healthy subjects to clinical populations with atypical brain organization
- Real-time decoding: Adapting the approach for online brain-computer interfaces where low-latency cross-subject generalization would enable rapid system deployment