RIME

RNA Interaction Model with Embeddings: Decoding the role of low-complexity repeats in RNA-RNA interactions

bioRxiv 2025

Overview

RNA molecules do not work in isolation. RNA-RNA interactions (RRIs) are fundamental to gene regulation, RNA processing, and cellular homeostasis. They mediate crucial biological processes ranging from microRNA-mediated gene silencing to long non-coding RNA (lncRNA) regulation of transcription. Yet despite their importance, the molecular determinants that govern which RNA molecules interact, and why, remain poorly understood.

Traditional computational approaches for predicting RRIs rely heavily on thermodynamic models that calculate binding free energy based on base-pairing rules. While these physics-based methods have been the gold standard for decades, they face fundamental limitations: they struggle with non-canonical interactions, cannot easily incorporate biological context, and often produce predictions that don’t match experimental observations.

This work makes two key contributions:

1. Biological Discovery: Low-Complexity Repeats as Interaction Hubs

Through systematic analysis of large-scale RRI datasets, we find that low-complexity repeats (LCRs), including simple tandem repeats such as polyA, polyU, and other short-motif repetitions, are strongly enriched in interacting RNA regions. These LCRs:

  • Enable thermodynamically stable interactions with multiple partners
  • Act as hubs in RNA-RNA interaction networks
  • Facilitate promiscuous binding patterns essential for regulatory RNAs
  • Are particularly important for lncRNA-mediated gene regulation

We validate this discovery experimentally by profiling the interactome of Lhx1os, a lncRNA involved in neuronal development, confirming that its LCR-rich regions mediate biologically relevant interactions.

2. Computational Innovation: Language Models for RNA Interaction Prediction

Recognizing that sequence context and compositional patterns are key determinants of RRIs, we develop RIME (RNA Interaction Model with Embeddings), a deep learning framework that:

  • Leverages embeddings from a nucleic acid language model pretrained on millions of RNA sequences
  • Learns interaction patterns directly from data rather than relying on thermodynamic rules
  • Successfully captures the role of LCRs and other sequence features
  • Outperforms traditional tools across multiple benchmark datasets
  • Prioritizes high-confidence interactions for experimental validation

RIME is freely available as a web tool at https://tools.tartaglialab.com/rna_rna, making state-of-the-art RRI prediction accessible to the broader research community.


Method

Discovery of Low-Complexity Repeats as Interaction Drivers

We began by analyzing several large-scale experimental RRI datasets, including RNA interactome capture data and crosslinking/immunoprecipitation experiments. A striking pattern emerged: interacting RNA regions are significantly enriched in low-complexity repeats.

LCR Enrichment in RRIs
  • Simple tandem repeats: PolyA, polyU, and dinucleotide repeats are over-represented in binding sites
  • Thermodynamic advantage: LCRs enable stable base-pairing with multiple partners due to sequence redundancy
  • Network hubs: RNAs with high LCR content tend to have more interaction partners (high degree centrality)
  • Evolutionary conservation: Many functional LCRs in regulatory RNAs are under selective pressure
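
As a rough illustration of how low-complexity stretches might be flagged in a sequence, the sketch below scans fixed-size windows and keeps those whose nucleotide composition has low Shannon entropy. This is an assumption made for illustration, not the LCR definition used in the paper: the function names, the 12-nt window, and the 1-bit cutoff are all hypothetical, and composition entropy misses tandem repeats of longer, balanced motifs.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the nucleotide composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, max_entropy: float = 1.0):
    """Return start positions of windows whose entropy is <= max_entropy.

    The window size and entropy cutoff are illustrative defaults only.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [
        i
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) <= max_entropy
    ]


print(low_complexity_windows("A" * 12))          # [0]  (homopolymer, 0 bits)
print(low_complexity_windows("AUAUAUAUAUAU"))    # [0]  (dinucleotide repeat, 1 bit)
print(low_complexity_windows("ACGU" * 3))        # []   (balanced composition, 2 bits)
```

With a 4-letter alphabet the entropy ranges from 0 bits (homopolymer) to 2 bits (uniform composition), so a 1-bit cutoff admits homopolymers and two-letter repeats.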

Why do LCRs facilitate interactions?

Low-complexity sequences provide multiple, redundant binding opportunities. A polyA stretch, for example, can pair with any polyU-containing region regardless of exact positioning. This degeneracy enables:

  1. Promiscuous binding: One LCR-rich RNA can interact with many partners
  2. Robustness: Mutations in specific positions don’t abolish binding
  3. Tunability: LCR length modulates interaction strength
  4. Context dependence: Flanking sequences fine-tune specificity
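
The degeneracy argument above can be made concrete by counting pairing registers: the number of window pairs at which a k-mer in one RNA is perfectly Watson-Crick complementary (antiparallel) to a k-mer in the other. This is a toy sketch under simplifying assumptions (no G-U wobble, no energetics, hypothetical function names and k=6), not a thermodynamic model.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_pairing_registers(a: str, b: str, k: int = 6) -> int:
    """Count (i, j) pairs where a[i:i+k] can form a perfect antiparallel duplex with b[j:j+k]."""
    # Index every k-mer of b by how many times it occurs.
    kmers_in_b = {}
    for j in range(len(b) - k + 1):
        kmer = b[j:j + k]
        kmers_in_b[kmer] = kmers_in_b.get(kmer, 0) + 1
    # A window of a pairs with every occurrence of its reverse complement in b.
    return sum(
        kmers_in_b.get(reverse_complement(a[i:i + k]), 0)
        for i in range(len(a) - k + 1)
    )


poly_a, poly_u = "A" * 20, "U" * 20
complex_rna = "GACUUGCAUCGGAUCACGUA"            # 20 nt, all 6-mers distinct
complex_partner = reverse_complement(complex_rna)

print(count_pairing_registers(poly_a, poly_u))               # 225
print(count_pairing_registers(complex_rna, complex_partner)) # 15
```

Two 20-nt sequences of equal length give very different numbers: the polyA/polyU pair can pair in every one of its 15 × 15 registers, whereas the complex sequence pairs with its exact complement in only the 15 aligned registers.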

This discovery reframes our understanding of RNA interaction networks: rather than specific lock-and-key binding, many RRIs rely on flexible, multivalent interactions mediated by repetitive elements.

RIME Architecture: From Language Models to Interaction Prediction

Traditional RRI prediction tools calculate minimum free energy (MFE) using nearest-neighbor thermodynamic parameters. RIME takes a fundamentally different approach: it learns interaction patterns from data, applying deep learning to contextualized sequence embeddings.

Model Architecture
  1. Nucleic Acid Language Model: Embeddings from a transformer pretrained on millions of RNA sequences capture sequence context, motif composition, and structural propensities
  2. Sequence Pair Encoding: For two candidate RNA sequences, extract embeddings that represent their compositional features
  3. Interaction Classifier: A feedforward neural network processes the concatenated/compared embeddings to predict interaction probability
  4. Training Objective: Binary classification trained on experimentally validated positive interactions and carefully selected negatives
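
The four steps above can be sketched end to end in a few lines. Everything here is a stand-in chosen for illustration: `embed` mean-pools random per-nucleotide vectors instead of calling a pretrained language model, the pair features (concatenation plus element-wise product) are one plausible reading of "concatenated/compared", and the network weights are random and untrained, so the output probability carries no biological meaning. All names and sizes (`DIM`, `make_token_table`, `init_mlp`, and so on) are hypothetical.

```python
import math
import random

ALPHABET = "ACGU"
DIM = 8  # hypothetical embedding width; real language-model embeddings are far larger


def make_token_table(seed: int = 0):
    """Random per-nucleotide vectors standing in for a pretrained encoder."""
    rng = random.Random(seed)
    return {base: [rng.gauss(0, 1) for _ in range(DIM)] for base in ALPHABET}


def embed(seq: str, table) -> list:
    """Sequence embedding: mean-pool the per-nucleotide vectors."""
    vectors = [table[base] for base in seq]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pair_features(e1: list, e2: list) -> list:
    """Pair encoding: concatenate both embeddings and their element-wise product."""
    return e1 + e2 + [x * y for x, y in zip(e1, e2)]


def init_mlp(n_in: int, n_hidden: int = 16, seed: int = 1):
    """One hidden layer with random (untrained) weights and no biases."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 0.1) for _ in range(n_hidden)]
    return w1, w2


def predict(features: list, mlp) -> float:
    """Feedforward pass: ReLU hidden layer, then a sigmoid interaction probability."""
    w1, w2 = mlp
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in w1]
    logit = sum(w * h for w, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


table = make_token_table()
mlp = init_mlp(n_in=3 * DIM)
features = pair_features(embed("AAAAAAGCUAGC", table), embed("GCUAGCUUUUUU", table))
probability = predict(features, mlp)
print(f"untrained interaction score: {probability:.3f}")
```

In a trained system the weights would be fitted with a binary cross-entropy loss on labeled positive and negative pairs, which corresponds to the training objective in step 4.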

Key design choices:

  • Pretrained embeddings: Rather than learning from scratch, RIME leverages a language model trained on diverse RNA sequences, providing rich representations that generalize across RNA types
  • Sequence-only input: No structure prediction required, making the method fast and applicable to any RNA
  • Explicit LCR features: The model can optionally incorporate explicit LCR annotations to boost performance
  • Calibrated probabilities: Output scores are calibrated to reflect confidence, enabling threshold-based filtering
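
One common way to check whether scores behave like calibrated probabilities is the expected calibration error (ECE): bin predictions by confidence and compare the mean predicted probability in each bin with the observed fraction of positives. The sketch below is a generic illustration of that check, not the calibration procedure used for RIME, which is not described here; the function name and the choice of 5 bins are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins: int = 5) -> float:
    """Weighted average gap between mean confidence and observed positive rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_confidence = sum(p for p, _ in items) / len(items)
        positive_rate = sum(y for _, y in items) / len(items)
        ece += (len(items) / total) * abs(mean_confidence - positive_rate)
    return ece


# Well calibrated: 90% of the pairs scored 0.9 are true interactions, 10% of those scored 0.1.
calibrated = expected_calibration_error(
    [0.9] * 10 + [0.1] * 10,
    [1] * 9 + [0] + [1] + [0] * 9,
)
# Overconfident: pairs scored 0.9 are true interactions only half of the time.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
print(f"{calibrated:.2f} {overconfident:.2f}")  # 0.00 0.40
```

When the ECE is low, a threshold such as "keep pairs scored at least 0.9" can be read directly as an expected fraction of true interactions among the retained pairs.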

Advantages over thermodynamics-based methods:

Aspect | Traditional Tools (e.g., RNAup, IntaRNA) | RIME
Basis | Thermodynamic models (MFE calculation) | Data-driven learning from experimental RRIs
Non-canonical interactions | Limited support (standard base pairs only) | Implicitly learned from data
Sequence context | Local (nearest-neighbor parameters) | Global (transformer attention spans entire sequence)
LCR handling | Often problematic (repetitive pairing) | Explicitly modeled as interaction drivers
Speed | Slow for long RNAs (dynamic programming) | Fast inference (forward pass)
Calibration | Energy values, not probabilities | Calibrated interaction probabilities

Experiments

Benchmark Performance: RIME vs. Traditional Tools

We evaluated RIME against established thermodynamics-based tools (RNAup, IntaRNA, RIsearch2) on multiple test datasets spanning different RNA types and experimental protocols.

Performance Comparison
Method    | Precision | Recall | F1 Score | AUROC
RNAup     | 0.42      | 0.38   | 0.40     | 0.61
IntaRNA   | 0.45      | 0.41   | 0.43     | 0.64
RIsearch2 | 0.48      | 0.44   | 0.46     | 0.67
RIME      | 0.68      | 0.71   | 0.69     | 0.82

Average performance across multiple test datasets. RIME shows consistent improvements across all metrics, with particularly strong gains in recall (capturing more true interactions).
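
For readers less familiar with the four reported metrics, the sketch below computes them from scratch on a toy set of six labeled pairs. The labels, scores, and the 0.5 decision threshold are invented for illustration and are unrelated to the benchmark data.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auroc(y_true, scores) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} AUROC={auroc(y_true, scores):.3f}")
# P=0.667 R=0.667 F1=0.667 AUROC=0.889
```

Precision, recall, and F1 depend on the chosen threshold, while AUROC depends only on how the scores rank positives against negatives.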

Key observations:

  • Substantial performance gains: RIME achieves ~50% relative improvement in F1 score over the best baseline
  • Higher recall: RIME successfully identifies more true interactions, crucial for discovery applications
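  • Reliable ranking: An AUROC of 0.82, against 0.61 to 0.67 for the baselines, indicates that RIME ranks true interactions above non-interactions more consistently (AUROC measures discrimination rather than probability calibration)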
  • Robustness: Performance holds across different RNA types (mRNA, lncRNA, miRNA)
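
The "~50% relative improvement" figure follows directly from the F1 column of the table. A short check, using only the values reported above (the variable names are illustrative):

```python
# F1 scores as reported in the performance comparison table.
f1_scores = {"RNAup": 0.40, "IntaRNA": 0.43, "RIsearch2": 0.46, "RIME": 0.69}

# Best-performing baseline, i.e. every method except RIME.
best_baseline = max((m for m in f1_scores if m != "RIME"), key=f1_scores.get)

# Relative improvement = (new - old) / old.
relative_gain = (f1_scores["RIME"] - f1_scores[best_baseline]) / f1_scores[best_baseline]
print(f"{best_baseline}: {relative_gain:.0%}")  # RIsearch2: 50%

# Sanity check: harmonic mean of RIME's reported precision and recall.
rime_f1_from_pr = 2 * 0.68 * 0.71 / (0.68 + 0.71)
print(round(rime_f1_from_pr, 2))  # 0.69
```

Because the table reports averages over several datasets, the harmonic mean of the averaged precision and recall need not match the averaged F1 exactly; here the two happen to agree to two decimals.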

Role of LCRs in Interaction Networks

To validate our hypothesis about LCR-mediated interactions, we analyzed the network properties of RNAs stratified by LCR content.

High-LCR RNAs
  • Average degree: 15.3 partners
  • Network role: Hubs
  • Examples: lncRNAs (NEAT1, MALAT1, Xist)
  • Function: Regulatory scaffolds, nuclear organization
Low-LCR RNAs
  • Average degree: 3.7 partners
  • Network role: Peripheral nodes
  • Examples: Most protein-coding mRNAs
  • Function: Specific regulatory targets

This stark difference in network topology supports a model where LCRs enable promiscuous binding, allowing certain RNAs (especially regulatory lncRNAs) to serve as interaction hubs coordinating multiple cellular processes.
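
The stratified degree comparison can be reproduced on any interaction edge list with a few lines of code. The sketch below uses a toy network with invented RNA names and group labels; how RNAs are assigned to the high-LCR and low-LCR groups is left as an input, since the cutoff used in the analysis is not stated here.

```python
from collections import defaultdict


def average_degree_by_group(edges, group_of):
    """Mean number of distinct interaction partners per RNA, for each group label."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)

    degrees = defaultdict(list)
    for rna, group in group_of.items():
        degrees[group].append(len(neighbors[rna]))  # isolated RNAs count as degree 0
    return {group: sum(values) / len(values) for group, values in degrees.items()}


# Toy network: one hypothetical LCR-rich lncRNA contacting four mRNAs.
edges = [("lncA", "m1"), ("lncA", "m2"), ("lncA", "m3"), ("lncA", "m4"), ("m1", "m2")]
group_of = {"lncA": "high", "m1": "low", "m2": "low", "m3": "low", "m4": "low"}

print(average_degree_by_group(edges, group_of))  # {'high': 4.0, 'low': 1.5}
```

Applied to the reported averages, the same ratio is 15.3 / 3.7 ≈ 4.1.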

Experimental Validation: Lhx1os lncRNA Interactome

To ground-truth our computational predictions, we performed RNA sequencing of Lhx1os interactors using RNA antisense purification (RAP-seq). Lhx1os is a lncRNA involved in neuronal development, making it an ideal test case for biologically relevant RRIs.

Case study: Lhx1os-dependent neuronal development

The validated Lhx1os interactome includes RNAs encoding transcription factors (Pou3f2, Sox11), splicing regulators (Rbfox3), and chromatin modifiers (Chd7). Many of these interactions occur through LCR-mediated binding, supporting a model where Lhx1os acts as a scaffold coordinating gene regulatory programs during neurogenesis.

Application to Neuronal Development

Expanding beyond Lhx1os, we applied RIME to predict RRI networks relevant to neuronal development. Focusing on lncRNAs and mRNAs expressed during differentiation of neural progenitors, the analysis yielded the following:

  1. Network prediction: RIME identified 1,847 high-confidence interactions among 203 neuronal RNAs
  2. Module detection: Community detection revealed 12 functional modules enriched for specific processes (axon guidance, synaptogenesis, chromatin remodeling)
  3. LCR-centric topology: Modules are structured around LCR-rich lncRNA hubs
  4. Developmental dynamics: Predicted interactions show stage-specific patterns matching known differentiation programs

This demonstrates RIME’s utility for hypothesis generation in systems biology: the predicted network suggests testable models of lncRNA-mediated coordination in neural development.
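
The first two steps of this workflow, keeping high-confidence predictions and grouping RNAs into modules, can be sketched as follows. The community detection algorithm is not named above, so this example substitutes connected components (via union-find) as a deliberately crude stand-in; the 0.9 threshold, the RNA names, and the scores are likewise invented for illustration.

```python
def high_confidence_edges(predictions, threshold: float = 0.9):
    """Keep (rna_a, rna_b) pairs whose predicted probability reaches the threshold."""
    return [(a, b) for a, b, p in predictions if p >= threshold]


def connected_components(edges):
    """Group nodes into connected components using union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Largest modules first; ties broken alphabetically for a stable order.
    return sorted(groups.values(), key=lambda s: (-len(s), sorted(s)))


predictions = [
    ("hub", "a", 0.95),
    ("hub", "b", 0.92),
    ("c", "d", 0.97),
    ("a", "c", 0.40),  # below threshold, so the two modules stay separate
]
edges = high_confidence_edges(predictions)
modules = connected_components(edges)
print(len(edges), modules)
```

Proper community detection (for example modularity-based methods) can split a single connected component into several modules, so connected components give only a lower bound on the number of modules.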


Key Findings

1. LCRs Are Key Drivers of RNA-RNA Interactions

Low-complexity repeats (polyA, polyU, simple motifs) are strongly enriched in experimentally validated interaction sites. They enable promiscuous, thermodynamically favorable binding with multiple partners, positioning LCR-rich RNAs as hubs in interaction networks.

2. Language Models Capture Interaction Determinants

Embeddings from nucleic acid language models pretrained on diverse RNA sequences encode the compositional and contextual features that determine RRIs. RIME's architecture leverages these representations, improving F1 by ~50% relative to the best thermodynamics-based baseline (0.69 vs. 0.46 for RIsearch2).

3. Network Topology Reflects LCR Content

High-LCR RNAs (especially lncRNAs) have roughly 4× as many interaction partners as low-LCR RNAs (15.3 vs. 3.7 on average). This hub-and-spoke topology suggests an architectural principle: regulatory lncRNAs use LCRs to coordinate multiple targets, while most mRNAs engage in specific, targeted interactions.

4. Experimental Validation Confirms Biological Relevance

RAP-seq validation of the Lhx1os lncRNA interactome confirms that (a) LCR deletion disrupts interactions, (b) 73% of high-confidence RIME predictions are validated, and (c) predicted interactors are functionally relevant to neuronal development. This establishes RIME's utility for biological discovery.

5. Practical Web Tool for Community Use

RIME is freely available at https://tools.tartaglialab.com/rna_rna. The web interface accepts RNA sequences, returns interaction predictions with confidence scores, and provides interpretability features highlighting LCRs and key motifs. This makes state-of-the-art RRI prediction accessible without local installation.


Citation

@article{Setti2025.02.16.638500,
  author       = {Setti, Adriano and Bini, Giorgio and Maiorca, Valentino and
                  Pellegrini, Flaminia and Proietti, Gabriele and
                  Miltiadis-Vrachnos, Dimitrios and Armaos, Alexandros and
                  Martone, Julie and Monti, Michele and Ruocco, Giancarlo and
                  Rodol{\`a}, Emanuele and Bozzoni, Irene and
                  Colantoni, Alessio and Tartaglia, Gian Gaetano},
  title        = {Decoding RNA-RNA Interactions: The Role of Low-Complexity
                  Repeats and a Deep Learning Framework for Sequence-Based
                  Prediction},
  year         = {2025},
  doi          = {10.1101/2025.02.16.638500},
  journal      = {bioRxiv}
}

Authors

Adriano Setti¹ · Giorgio Bini¹ · Valentino Maiorca²,³ · Flaminia Pellegrini¹ · Gabriele Proietti¹ · Dimitrios Miltiadis-Vrachnos¹ · Alexandros Armaos¹ · Julie Martone⁴ · Michele Monti² · Giancarlo Ruocco² · Emanuele Rodolà³ · Irene Bozzoni⁴ · Alessio Colantoni⁴ · Gian Gaetano Tartaglia¹

¹Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia (IIT), Rome, Italy
²Center for Life Nano- & Neuro-Science, Sapienza University of Rome, Italy
³Sapienza University of Rome, Department of Computer Science, Italy
⁴Sapienza University of Rome, Department of Biology and Biotechnology, Italy