TL;DR
We identify low-complexity repeats (LCRs) as key drivers of RNA-RNA interactions and develop RIME, a deep learning model using nucleic acid language model embeddings to predict RNA-RNA interactions. RIME outperforms traditional thermodynamics-based tools and successfully captures LCR-mediated interactions important for gene regulation and neuronal development.
Overview
RNA molecules do not work in isolation. RNA-RNA interactions (RRIs) are fundamental to gene regulation, RNA processing, and cellular homeostasis. They mediate crucial biological processes ranging from microRNA-mediated gene silencing to long non-coding RNA (lncRNA) regulation of transcription. Yet despite their importance, the molecular determinants that govern which RNA molecules interact—and why—remain poorly understood.
Traditional computational approaches for predicting RRIs rely heavily on thermodynamic models that calculate binding free energy based on base-pairing rules. While these physics-based methods have been the gold standard for decades, they face fundamental limitations: they struggle with non-canonical interactions, cannot easily incorporate biological context, and often produce predictions that don’t match experimental observations.
This work makes two key contributions:
1. Biological Discovery: Low-Complexity Repeats as Interaction Hubs
Through systematic analysis of large-scale RRI datasets, we discover that low-complexity repeats (LCRs)—including simple tandem repeats like polyA, polyU, and other short motif repetitions—are dramatically enriched in interacting RNA regions. These LCRs:
- Enable thermodynamically stable interactions with multiple partners
- Act as hubs in RNA-RNA interaction networks
- Facilitate promiscuous binding patterns essential for regulatory RNAs
- Are particularly important for lncRNA-mediated gene regulation
We validate this discovery experimentally by profiling the interactome of Lhx1os, a lncRNA involved in neuronal development, confirming that its LCR-rich regions mediate biologically relevant interactions.
2. Computational Innovation: Language Models for RNA Interaction Prediction
Recognizing that sequence context and compositional patterns are key determinants of RRIs, we develop RIME (RNA Interaction Model with Embeddings), a deep learning framework that:
- Leverages embeddings from a nucleic acid language model pretrained on millions of RNA sequences
- Learns interaction patterns directly from data rather than relying on thermodynamic rules
- Successfully captures the role of LCRs and other sequence features
- Outperforms traditional tools across multiple benchmark datasets
- Prioritizes high-confidence interactions for experimental validation
RIME is freely available as a web tool at https://tools.tartaglialab.com/rna_rna, making state-of-the-art RRI prediction accessible to the broader research community.
Method
Discovery of Low-Complexity Repeats as Interaction Drivers
We began by analyzing several large-scale experimental RRI datasets, including RNA interactome capture data and crosslinking/immunoprecipitation experiments. A striking pattern emerged: interacting RNA regions are significantly enriched in low-complexity repeats.
LCR Enrichment in RRIs
- Simple tandem repeats: PolyA, polyU, and dinucleotide repeats are over-represented in binding sites
- Thermodynamic advantage: LCRs enable stable base-pairing with multiple partners due to sequence redundancy
- Network hubs: RNAs with high LCR content tend to have more interaction partners (high degree centrality)
- Evolutionary conservation: Many functional LCRs in regulatory RNAs are under selective pressure
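One simple way to flag candidate tandem repeats of the kind listed above is a regular-expression scan. The sketch below is illustrative only: the function names, the 1 to 3 nt motif size, and the four-copy minimum are assumptions made for this example, not the LCR definition used in the study.

```python
import re

def tandem_repeats(seq, max_unit=3, min_copies=4):
    """Return (start, end, unit) for motifs of 1..max_unit nt repeated
    at least min_copies times in a row (non-overlapping, left to right)."""
    pattern = re.compile(r"([ACGU]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]

def lcr_fraction(seq, **kwargs):
    """Fraction of the sequence covered by detected tandem repeats."""
    covered = sum(end - start for start, end, _ in tandem_repeats(seq, **kwargs))
    return covered / len(seq) if seq else 0.0

# Toy sequence (made up): a polyA run and a UG dinucleotide repeat in mixed flanks.
seq = "GCAUCGGAUC" + "A" * 12 + "CGUAGCUAGC" + "UG" * 6 + "CAGUCGAUGC"
repeats = tandem_repeats(seq)
print(repeats)                      # [(10, 22, 'A'), (32, 44, 'UG')]
print(round(lcr_fraction(seq), 3))  # 0.444
```

A per-transcript `lcr_fraction` of this kind is one possible input for stratifying RNAs into high-LCR and low-LCR groups; dedicated low-complexity masking tools would be the more rigorous choice in practice.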
Why do LCRs facilitate interactions?
Low-complexity sequences provide multiple, redundant binding opportunities. A polyA stretch, for example, can pair with any polyU-containing region regardless of exact positioning. This degeneracy enables:
- Promiscuous binding: One LCR-rich RNA can interact with many partners
- Robustness: Mutations in specific positions don’t abolish binding
- Tunability: LCR length modulates interaction strength
- Context dependence: Flanking sequences fine-tune specificity
This discovery reframes our understanding of RNA interaction networks: rather than specific lock-and-key binding, many RRIs rely on flexible, multivalent interactions mediated by repetitive elements.
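The positional degeneracy described above can be made concrete by counting the alignment registers in which two RNAs can form a short perfect duplex. The following is a toy sketch under stated assumptions: only Watson-Crick pairs are counted (no G-U wobble, no energies), the six-pair minimum is arbitrary, and the function names and sequences are invented for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[nt] for nt in reversed(seq))

def pairing_registers(a, b, min_len=6):
    """Count antiparallel alignments of a against b that contain at least
    one run of min_len consecutive Watson-Crick pairs."""
    b_rev = b[::-1]  # read b 3'->5' so that position i of a faces position j of b_rev
    count = 0
    for offset in range(-(len(b_rev) - 1), len(a)):
        run = best = 0
        for i in range(len(a)):
            j = i - offset
            if 0 <= j < len(b_rev) and COMPLEMENT[a[i]] == b_rev[j]:
                run += 1
                best = max(best, run)
            else:
                run = 0
        if best >= min_len:
            count += 1
    return count

poly = pairing_registers("A" * 20, "U" * 20)
mixed_seq = "GCAUCGGAUCAGUCCGUAAG"  # made-up 20-mer with no repeated 6-mer
mixed = pairing_registers(mixed_seq, reverse_complement(mixed_seq))
print(poly, mixed)  # 29 1
```

Under these assumptions a 20-nt polyA/polyU pair has 29 usable registers, whereas a complex sequence pairs with its exact reverse complement in only one, which is the sense in which repeats tolerate shifts and point mutations.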
RIME Architecture: From Language Models to Interaction Prediction
Traditional RRI prediction tools calculate minimum free energy (MFE) using nearest-neighbor thermodynamic parameters. RIME takes a fundamentally different approach: it learns interaction patterns from data, applying deep learning to contextualized sequence embeddings.
Model Architecture
- Nucleic Acid Language Model: Embeddings from a transformer pretrained on millions of RNA sequences capture sequence context, motif composition, and structural propensities
- Sequence Pair Encoding: For two candidate RNA sequences, extract embeddings that represent their compositional features
- Interaction Classifier: A feedforward neural network processes the concatenated/compared embeddings to predict interaction probability
- Training Objective: Binary classification trained on experimentally validated positive interactions and carefully selected negatives
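The pipeline outlined above (embed each RNA, combine the pair, classify) can be sketched in a few lines. Everything here is a hypothetical stand-in rather than the RIME implementation: `placeholder_embed` replaces the pretrained language model with fixed random vectors per nucleotide, the pair features (concatenation plus element-wise difference and product) are one common choice among many, and the network weights are random and untrained.

```python
import math
import random

def placeholder_embed(seq, dim=8):
    """Stand-in for a language model: one fixed random vector per nucleotide."""
    table = {nt: [random.Random(ord(nt) * 1000 + k).uniform(-1, 1) for k in range(dim)]
             for nt in "ACGU"}
    return [table[nt] for nt in seq]

def mean_pool(token_embeddings):
    """Average per-token vectors into one fixed-size vector."""
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]

def pair_features(u, v):
    """Concatenate both embeddings with element-wise comparison terms."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return u + v + diff + prod

class TinyMLP:
    """One hidden ReLU layer followed by a sigmoid output (untrained)."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0, 1 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
        self.b2 = 0.0

    def predict_proba(self, x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))

u = mean_pool(placeholder_embed("GCAUAAAAAAAAGCU"))
v = mean_pool(placeholder_embed("AGCUUUUUUUUAUGC"))
x = pair_features(u, v)
model = TinyMLP(n_in=len(x), n_hidden=16)
p = model.predict_proba(x)
print(f"interaction probability (untrained, meaningless): {p:.3f}")
```

In a real setting the weights would be fit with a binary cross-entropy loss on labeled positive and negative pairs, matching the training objective listed above.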
Key design choices:
- Pretrained embeddings: Rather than learning from scratch, RIME leverages a language model trained on diverse RNA sequences, providing rich representations that generalize across RNA types
- Sequence-only input: No structure prediction required, making the method fast and applicable to any RNA
- Explicit LCR features: The model can optionally incorporate explicit LCR annotations to boost performance
- Calibrated probabilities: Output scores are calibrated to reflect confidence, enabling threshold-based filtering
Advantages over thermodynamics-based methods:
Aspect | Traditional Tools (e.g., RNAup, IntaRNA) | RIME |
---|---|---|
Basis | Thermodynamic models (MFE calculation) | Data-driven learning from experimental RRIs |
Non-canonical interactions | Limited support (standard base pairs only) | Implicitly learned from data |
Sequence context | Local (nearest-neighbor parameters) | Global (transformer attention spans entire sequence) |
LCR handling | Often problematic (repetitive pairing) | Explicitly modeled as interaction drivers |
Speed | Slow for long RNAs (dynamic programming) | Fast inference (forward pass) |
Calibration | Energy values, not probabilities | Calibrated interaction probabilities |
Experiments
Benchmark Performance: RIME vs. Traditional Tools
We evaluated RIME against established thermodynamics-based tools (RNAup, IntaRNA, RIsearch2) on multiple test datasets spanning different RNA types and experimental protocols.
Performance Comparison
Method | Precision | Recall | F1 Score | AUROC |
---|---|---|---|---|
RNAup | 0.42 | 0.38 | 0.40 | 0.61 |
IntaRNA | 0.45 | 0.41 | 0.43 | 0.64 |
RIsearch2 | 0.48 | 0.44 | 0.46 | 0.67 |
RIME | 0.68 | 0.71 | 0.69 | 0.82 |
Average performance across multiple test datasets. RIME shows consistent improvements across all metrics, with particularly strong gains in recall (capturing more true interactions).
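As a quick consistency check on the table, F1 can be recomputed from the reported precision and recall as F1 = 2PR / (P + R). The snippet below uses only the numbers in the table; note that F1 computed from averaged precision and recall need not equal an average of per-dataset F1 scores, although here the two agree to two decimals.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) as reported in the table above.
reported = {
    "RNAup": (0.42, 0.38),
    "IntaRNA": (0.45, 0.41),
    "RIsearch2": (0.48, 0.44),
    "RIME": (0.68, 0.71),
}

f1_scores = {name: round(f1(p, r), 2) for name, (p, r) in reported.items()}
best_baseline = max(v for k, v in f1_scores.items() if k != "RIME")
relative_gain = (f1_scores["RIME"] - best_baseline) / best_baseline

print(f1_scores)               # {'RNAup': 0.4, 'IntaRNA': 0.43, 'RIsearch2': 0.46, 'RIME': 0.69}
print(f"{relative_gain:.0%}")  # 50%
```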
Key observations:
- Substantial performance gains: RIME achieves a ~50% relative improvement in F1 score over the best baseline (0.69 vs. 0.46 for RIsearch2)
- Higher recall: RIME successfully identifies more true interactions, crucial for discovery applications
- Better ranking: An AUROC of 0.82, compared with 0.61 to 0.67 for the baselines, indicates more reliable ranking of interaction candidates
- Robustness: Performance holds across different RNA types (mRNA, lncRNA, miRNA)
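AUROC has a direct interpretation: it is the probability that a randomly chosen true interaction receives a higher score than a randomly chosen non-interaction. The minimal sketch below computes it by pairwise comparison on made-up scores and labels, and shows that it depends only on the ordering of the scores, not on their absolute values.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented): 3 true interactions, 3 non-interactions.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

a = auroc(scores, labels)
a_squared = auroc([s ** 2 for s in scores], labels)  # order-preserving transform
print(round(a, 3), round(a_squared, 3))  # 0.889 0.889
```

Because any order-preserving transformation leaves AUROC unchanged, a high AUROC speaks to ranking quality; whether a score of 0.8 corresponds to an 80% chance of interaction is a separate calibration question.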
Role of LCRs in Interaction Networks
To validate our hypothesis about LCR-mediated interactions, we analyzed the network properties of RNAs stratified by LCR content.
High-LCR RNAs
- Average degree: 15.3 partners
- Network role: Hubs
- Examples: lncRNAs (NEAT1, MALAT1, Xist)
- Function: Regulatory scaffolds, nuclear organization
Low-LCR RNAs
- Average degree: 3.7 partners
- Network role: Peripheral nodes
- Examples: Most protein-coding mRNAs
- Function: Specific regulatory targets
This stark difference in network topology supports a model where LCRs enable promiscuous binding, allowing certain RNAs (especially regulatory lncRNAs) to serve as interaction hubs coordinating multiple cellular processes.
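The stratified comparison above amounts to computing node degree from an interaction list and averaging it within each LCR group. A minimal sketch follows; the node names, LCR fractions, edges, and the 0.2 cutoff are all invented for illustration, and only the final ratio uses figures reported in this section.

```python
from collections import defaultdict

def degrees(edges):
    """Number of distinct interaction edges touching each RNA."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

def mean_degree_by_group(edges, lcr_fraction, threshold=0.2):
    """Average degree of RNAs above vs. at or below an LCR-content threshold."""
    deg = degrees(edges)
    groups = {"high": [], "low": []}
    for rna, frac in lcr_fraction.items():
        groups["high" if frac > threshold else "low"].append(deg.get(rna, 0))
    return {g: sum(v) / len(v) for g, v in groups.items() if v}

# Hypothetical toy network: one LCR-rich hub and four low-LCR transcripts.
edges = [("lnc_hub", "m1"), ("lnc_hub", "m2"), ("lnc_hub", "m3"),
         ("lnc_hub", "m4"), ("m1", "m2")]
lcr = {"lnc_hub": 0.45, "m1": 0.05, "m2": 0.04, "m3": 0.06, "m4": 0.03}

toy_means = mean_degree_by_group(edges, lcr)
print(toy_means)             # {'high': 4.0, 'low': 1.5}
print(round(15.3 / 3.7, 1))  # 4.1, the ratio of the reported average degrees
```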
Experimental Validation: Lhx1os lncRNA Interactome
To test our computational predictions experimentally, we profiled the RNA interactors of Lhx1os by RNA antisense purification followed by sequencing (RAP-seq). Lhx1os is a lncRNA involved in neuronal development, making it an ideal test case for biologically relevant RRIs.
Validation Results
- LCR importance confirmed: Deletion of LCR-rich regions in Lhx1os dramatically reduced interaction counts
- RIME predictions validated: 73% of high-confidence RIME predictions (score > 0.8) were confirmed as true interactors
- Functional relevance: Several validated interactors are involved in neuronal differentiation pathways
- Novel discoveries: RIME identified previously unknown Lhx1os partners missed by thermodynamic tools
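The validation rate quoted above is the precision among predictions that pass a score cutoff. As a sketch of that bookkeeping, assuming predictions arrive as a mapping from candidate RNA to score and experimentally confirmed interactors as a set (the names and numbers below are invented, not study data):

```python
def validation_rate(predictions, confirmed, threshold=0.8):
    """Return the candidates scoring above threshold and the fraction confirmed."""
    selected = sorted(rna for rna, score in predictions.items() if score > threshold)
    if not selected:
        return selected, float("nan")
    hits = sum(rna in confirmed for rna in selected)
    return selected, hits / len(selected)

# Hypothetical scores and confirmed set, for illustration only.
predictions = {"rna_a": 0.95, "rna_b": 0.91, "rna_c": 0.85,
               "rna_d": 0.82, "rna_e": 0.60, "rna_f": 0.30}
confirmed = {"rna_a", "rna_b", "rna_d", "rna_e"}

selected, rate = validation_rate(predictions, confirmed)
print(selected, rate)  # ['rna_a', 'rna_b', 'rna_c', 'rna_d'] 0.75
```

Raising the threshold usually trades fewer candidates for a higher confirmed fraction, which is why a calibrated score is useful when choosing how many predictions to follow up.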
Case study: Lhx1os-dependent neuronal development
The validated Lhx1os interactome includes RNAs encoding transcription factors (Pou3f2, Sox11), splicing regulators (Rbfox3), and chromatin modifiers (Chd7). Many of these interactions occur through LCR-mediated binding, supporting a model where Lhx1os acts as a scaffold coordinating gene regulatory programs during neurogenesis.
Application to Neuronal Development
Expanding beyond Lhx1os, we applied RIME to predict RRI networks relevant to neuronal development, focusing on lncRNAs and mRNAs expressed during the differentiation of neural progenitors:
- Network prediction: RIME identified 1,847 high-confidence interactions among 203 neuronal RNAs
- Module detection: Community detection revealed 12 functional modules enriched for specific processes (axon guidance, synaptogenesis, chromatin remodeling)
- LCR-centric topology: Modules are structured around LCR-rich lncRNA hubs
- Developmental dynamics: Predicted interactions show stage-specific patterns matching known differentiation programs
This demonstrates RIME’s utility for hypothesis generation in systems biology: the predicted network suggests testable models of lncRNA-mediated coordination in neural development.
Key Findings
1. LCRs Are Key Drivers of RNA-RNA Interactions
Low-complexity repeats (polyA, polyU, simple motifs) are dramatically enriched in experimentally validated interaction sites. They enable promiscuous, thermodynamically favorable binding with multiple partners, positioning LCR-rich RNAs as hubs in interaction networks.
2. Language Models Capture Interaction Determinants
Embeddings from nucleic acid language models pretrained on diverse RNA sequences encode the compositional and contextual features that determine RRIs. RIME's architecture leverages these representations and improves F1 score by ~50% relative to the best thermodynamics-based tool (0.69 vs. 0.46).
3. Network Topology Reflects LCR Content
High-LCR RNAs (especially lncRNAs) have roughly 4× as many interaction partners as low-LCR RNAs (15.3 vs. 3.7 on average). This hub-and-spoke topology suggests architectural principles: regulatory lncRNAs use LCRs to coordinate multiple targets, while most mRNAs engage in specific, targeted interactions.
4. Experimental Validation Confirms Biological Relevance
RAP-seq profiling of the Lhx1os interactome confirms that (a) LCR deletion disrupts interactions, (b) 73% of high-confidence RIME predictions (score > 0.8) are validated, and (c) predicted interactors are functionally relevant to neuronal development. This establishes RIME's utility for biological discovery.
5. Practical Web Tool for Community Use
RIME is freely available at https://tools.tartaglialab.com/rna_rna. The web interface accepts RNA sequences, returns interaction predictions with confidence scores, and provides interpretability features highlighting LCRs and key motifs. This makes state-of-the-art RRI prediction broadly accessible.
Citation
@article{Setti2025.02.16.638500,
author = {Setti, Adriano and Bini, Giorgio and Maiorca, Valentino and
Pellegrini, Flaminia and Proietti, Gabriele and
Miltiadis-Vrachnos, Dimitrios and Armaos, Alexandros and
Martone, Julie and Monti, Michele and Ruocco, Giancarlo and
Rodol{\`a}, Emanuele and Bozzoni, Irene and
Colantoni, Alessio and Tartaglia, Gian Gaetano},
title = {Decoding RNA-RNA Interactions: The Role of Low-Complexity
Repeats and a Deep Learning Framework for Sequence-Based
Prediction},
year = {2025},
doi = {10.1101/2025.02.16.638500},
journal = {bioRxiv}
}
Authors
Adriano Setti¹ · Giorgio Bini¹ · Valentino Maiorca²,³ · Flaminia Pellegrini¹ · Gabriele Proietti¹ · Dimitrios Miltiadis-Vrachnos¹ · Alexandros Armaos¹ · Julie Martone⁴ · Michele Monti² · Giancarlo Ruocco² · Emanuele Rodolà³ · Irene Bozzoni⁴ · Alessio Colantoni⁴ · Gian Gaetano Tartaglia¹
¹Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia (IIT), Rome, Italy
²Center for Life Nano- & Neuro-Science, Sapienza University of Rome, Italy
³Sapienza University of Rome, Department of Computer Science, Italy
⁴Sapienza University of Rome, Department of Biology and Biotechnology, Italy