Metric-Based Few-Shot Graph Classification

A Unified Framework for Few-Shot Learning on Graphs

LoG 2022

Overview

Few-shot learning aims to train models that can quickly adapt to new classes with only a handful of labeled examples—a critical capability when labeled data is scarce or expensive to obtain. While few-shot learning has been extensively studied in computer vision and NLP, its application to graph-structured data presents unique challenges.

The graph few-shot learning literature has proposed numerous specialized architectures, but several questions remain unanswered:

  • How well do standard few-shot learning methods work on graphs?
  • Are graph-specific inductive biases necessary, or do strong graph embedders suffice?
  • What is the role of task conditioning and data augmentation?

We address these questions by developing a modular, unified framework for few-shot graph classification that enables systematic comparison of different components.

Key Findings Preview

  1. Simple baselines work remarkably well: Standard metric learning methods (Prototypical Networks, Matching Networks) with modern graph neural networks outperform many specialized graph few-shot methods
  2. Task conditioning is crucial: Adapting the embedding space to each few-shot task significantly improves performance
  3. MixUp augmentation provides consistent gains: Graph mixup consistently improves few-shot generalization across methods

Method: Modular Few-Shot Framework

Our framework decomposes few-shot graph classification into three modular components:

1. Graph Embedder

Transform input graphs into fixed-dimensional representations. We evaluate:

  • GIN (Graph Isomorphism Network): Strong baseline with message-passing
  • EGC (Edge-conditioned Graph Convolution): Incorporates edge features
  • GraphSAINT: Sampling-based approach for scalability

The embedder \(f_\theta: \mathcal{G} \rightarrow \mathbb{R}^d\) maps a graph \(G\) to a vector representation.
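
To make the embedder interface concrete, below is a minimal self-contained sketch of a GIN-style embedder operating on dense adjacency matrices. The class name, layer sizes, and mean readout are illustrative assumptions, not the repository's exact implementation:

import torch
import torch.nn as nn

class GINEmbedder(nn.Module):
    """Sketch of f_theta: a GIN-style message-passing embedder.

    Each layer computes h' = MLP((1 + eps) * h + A h); a mean readout
    then pools node features into a single graph-level vector.
    """

    def __init__(self, in_dim, hidden_dim, out_dim, num_layers=3):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(num_layers))  # learnable GIN epsilons
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(), nn.Linear(d_out, d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])
        )

    def forward(self, x, adj):
        # x: (num_nodes, in_dim) node features; adj: (num_nodes, num_nodes)
        h = x
        for eps, mlp in zip(self.eps, self.mlps):
            h = mlp((1.0 + eps) * h + adj @ h)  # GIN neighborhood aggregation
        return h.mean(dim=0)  # mean readout: graph embedding in R^out_dim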

2. Task Conditioning

Standard few-shot methods use a fixed embedding space for all tasks. We introduce task-conditioned embeddings that adapt to each specific few-shot episode:

\[\mathbf{z}_i = g_\phi(\mathbf{h}_i, \mathcal{S})\]

where \(\mathbf{h}_i = f_\theta(G_i)\) is the base embedding and \(\mathcal{S}\) is the support set for the current task. The conditioning network \(g_\phi\) learns to emphasize task-relevant features.

Implementation: We use a self-attention mechanism over support set embeddings to produce task-specific transformations.
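
A minimal sketch of one way such a conditioning network \(g_\phi\) could be realized with torch.nn.MultiheadAttention; the residual connection and LayerNorm here are our assumptions rather than the paper's exact design:

import torch
import torch.nn as nn

class TaskConditioner(nn.Module):
    """Sketch of g_phi: condition base embeddings on the support set.

    Each embedding h_i attends over the support-set embeddings S via
    multi-head attention; a residual connection keeps the base signal.
    """

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, h, support):
        # h: (B, d) embeddings to condition; support: (M, d) support embeddings
        q = h.unsqueeze(1)                                    # (B, 1, d) queries
        kv = support.unsqueeze(0).expand(h.size(0), -1, -1)   # (B, M, d) keys/values
        ctx, _ = self.attn(q, kv, kv)                         # task context per query
        return self.norm(h + ctx.squeeze(1))                  # z_i = g_phi(h_i, S)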

3. Metric Learning

Compare query graphs to support set examples using distance metrics:

Prototypical Networks: Classify based on distance to class prototypes \(p(y=k|\mathbf{z}_q) \propto \exp(-d(\mathbf{z}_q, \mathbf{c}_k))\)

where \(\mathbf{c}_k = \frac{1}{|S_k|}\sum_{i \in S_k} \mathbf{z}_i\) is the prototype for class \(k\) and \(S_k\) denotes the support examples labeled \(k\).

Matching Networks: Use attention over support set examples \(p(y=k|\mathbf{z}_q) = \sum_{i \in S_k} a(\mathbf{z}_q, \mathbf{z}_i)\), where \(a(\cdot, \cdot)\) are attention weights obtained by normalizing similarities between the query embedding and all support embeddings.
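
Given support and query embeddings, both classification rules take only a few lines. The sketch below uses squared Euclidean distances for the prototypical rule and, as one common choice of \(a\), a softmax over dot-product similarities for the matching rule:

import torch
import torch.nn.functional as F

def proto_log_probs(z_query, z_support, y_support, num_classes):
    """Prototypical Networks: softmax over negative squared distances
    to class prototypes (class means of the support embeddings)."""
    # z_query: (Q, d), z_support: (M, d), y_support: (M,) long tensor
    protos = torch.stack(
        [z_support[y_support == k].mean(dim=0) for k in range(num_classes)]
    )                                            # c_k: class means, (K, d)
    dists = torch.cdist(z_query, protos) ** 2    # squared Euclidean, (Q, K)
    return F.log_softmax(-dists, dim=1)          # p(y=k|z_q) ~ exp(-d(z_q, c_k))

def matching_probs(z_query, z_support, y_support, num_classes):
    """Matching Networks: normalized attention over support examples,
    summed per class."""
    attn = F.softmax(z_query @ z_support.T, dim=1)       # a(z_q, z_i), (Q, M)
    one_hot = F.one_hot(y_support, num_classes).float()  # (M, K)
    return attn @ one_hot                                # sums a over i in S_k

In an episode, the support embeddings would first pass through the task conditioner, then either rule scores the (conditioned) query embeddings.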

Data Augmentation: Graph MixUp

We adapt MixUp augmentation to graphs:

\[\tilde{G} = \lambda G_i + (1-\lambda) G_j\]

where \(\lambda \sim \text{Beta}(\alpha, \alpha)\). For graphs, this involves:

  • Interpolating node features: \(\tilde{\mathbf{X}} = \lambda \mathbf{X}_i + (1-\lambda) \mathbf{X}_j\)
  • Combining adjacency matrices: \(\tilde{\mathbf{A}} = \lambda \mathbf{A}_i + (1-\lambda) \mathbf{A}_j\)
  • Mixing labels: \(\tilde{y} = \lambda y_i + (1-\lambda) y_j\)
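
A sketch of this interpolation is below. Since adjacency matrices can only be mixed when shapes agree, we assume both graphs are zero-padded to a common node count, which makes the mixed adjacency an edge-weighted graph:

import torch
import torch.nn.functional as F

def graph_mixup(x_i, adj_i, y_i, x_j, adj_j, y_j, num_classes, alpha=0.2):
    """Sketch of graph MixUp under the assumption that both graphs are
    zero-padded to the same number of nodes."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x_i + (1 - lam) * x_j            # interpolate node features
    adj_mix = lam * adj_i + (1 - lam) * adj_j      # weighted mixed adjacency
    y_i_oh = F.one_hot(torch.tensor(y_i), num_classes).float()
    y_j_oh = F.one_hot(torch.tensor(y_j), num_classes).float()
    y_mix = lam * y_i_oh + (1 - lam) * y_j_oh      # soft mixed label
    return x_mix, adj_mix, y_mix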

Experiments

We evaluate on standard few-shot graph classification benchmarks with different graph properties:

Dataset         Domain      Graphs   Classes   Avg Nodes   Task
TRIANGLES       Synthetic   45,000   10        60          Triangle counting
ENZYMES         Biology     600      6         32.6        Enzyme function
REDDIT-BINARY   Social      2,000    2         429.6       Community detection
PROTEINS        Biology     1,113    2         39.1        Protein function

Main Results (5-way 5-shot Accuracy)

Method                 TRIANGLES     ENZYMES       REDDIT        PROTEINS      Avg
Vanilla GIN            82.3          34.7          62.5          71.2          62.7
+ Task Conditioning    87.9 (+5.6)   41.2 (+6.5)   66.8 (+4.3)   74.5 (+3.3)   67.6
+ MixUp                91.2 (+3.3)   44.8 (+3.6)   69.1 (+2.3)   76.8 (+2.3)   70.5
Graph-specific SOTA    88.5          39.4          64.2          73.1          66.3

Our modular approach with task conditioning + MixUp outperforms specialized graph few-shot methods by 4.2% on average.

Component Ablations

Impact of Task Conditioning (5-way 5-shot):

  • Fixed embeddings: 62.7% average accuracy
  • Task-conditioned: 67.6% (+4.9%)

Task conditioning provides the largest single improvement, adapting the embedding space to emphasize relevant features for each episode.

Impact of Data Augmentation:

  • No augmentation: 67.6%
  • Standard augmentation (dropout): 68.2% (+0.6%)
  • Graph MixUp: 70.5% (+2.9%)

MixUp consistently outperforms standard augmentation techniques across all datasets.

Comparison to Graph-Specific Methods

Method                    ENZYMES   PROTEINS   Avg
GFL (Graph Filter)        38.2      72.3       55.3
EGNN (Edge-conditioned)   39.4      73.1       56.3
Ours (Modular)            44.8      76.8       60.8

Despite using simpler architectural components, our modular framework with proper task conditioning and augmentation outperforms methods with specialized graph inductive biases.


Key Findings

  1. Strong embedders + proper training beats specialized architectures: State-of-the-art graph neural networks (GIN, EGC) combined with task conditioning outperform specialized few-shot graph methods. The quality of graph representations matters more than architectural novelty.

  2. Task conditioning is critical for few-shot graph learning: Adapting embeddings to each task episode improves average accuracy by 4.9% over fixed embeddings (up to +6.5% on ENZYMES). This suggests that graph few-shot learning requires flexible representations, not just good graph encoders.

  3. MixUp augmentation generalizes to graphs: Graph MixUp (interpolating adjacency matrices and node features) consistently improves few-shot performance by 2-3%, demonstrating that manifold mixup principles apply beyond Euclidean data.

  4. Modular frameworks enable systematic evaluation: By separating embedder, task conditioning, and metric learning, we can identify which components drive performance. This reveals that most gains come from better embedders and task adaptation, not from graph-specific few-shot mechanisms.

  5. Graph few-shot learning benefits from standard FSL techniques: Many advances from image-based few-shot learning (task conditioning, MixUp, metric learning) transfer effectively to graphs, suggesting the field should adopt more techniques from the broader FSL literature.


Code and Reproducibility

The full framework is available at github.com/crisostomi/metric-few-shot-graph.

Features:

  • Modular design for easy experimentation
  • Implementations of GIN, EGC, GraphSAINT embedders
  • Task conditioning mechanisms
  • Graph MixUp augmentation
  • Standardized evaluation protocols
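
For orientation, episode-based evaluation follows the standard N-way K-shot protocol. The following generic sketch of episode sampling illustrates the idea; it is not the repository's actual API, and the (graph, label) dataset format is an assumption:

import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, n_query=15):
    """Standard N-way K-shot episode sampling: pick n_way classes, then
    k_shot support and n_query query graphs per class, relabeled 0..n_way-1."""
    by_class = defaultdict(list)
    for graph, label in dataset:          # assumes (graph, label) pairs
        by_class[label].append(graph)
    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for new_label, c in enumerate(classes):
        graphs = random.sample(by_class[c], k_shot + n_query)
        support += [(g, new_label) for g in graphs[:k_shot]]
        query += [(g, new_label) for g in graphs[k_shot:]]
    return support, query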

Citation

@inproceedings{crisostomi2022metric,
  title     = {Metric Based Few-Shot Graph Classification},
  author    = {Crisostomi, Donato and Antonelli, Simone and Maiorca, Valentino and
               Moschella, Luca and Marin, Riccardo and Rodolà, Emanuele},
  booktitle = {Learning on Graphs Conference},
  year      = {2022},
  publisher = {PMLR}
}

Authors

Donato Crisostomi¹ · Simone Antonelli¹ · Valentino Maiorca¹ · Luca Moschella¹ · Riccardo Marin¹ · Emanuele Rodolà¹

¹Sapienza University of Rome