RIME

RNA Interaction Model with Embeddings: Decoding the role of low-complexity repeats in RNA-RNA interactions

bioRxiv 2025

TL;DR

We identify low-complexity repeats (LCRs) as key drivers of RNA-RNA interactions and develop RIME, a deep learning model using nucleic acid language model embeddings to predict RNA-RNA interactions. RIME outperforms traditional thermodynamics-based tools and successfully captures LCR-mediated interactions important for gene regulation and neuronal development.

Overview

RNA molecules do not work in isolation. RNA-RNA interactions (RRIs) are fundamental to gene regulation, RNA processing, and cellular homeostasis. They mediate crucial biological processes ranging from microRNA-mediated gene silencing to long non-coding RNA (lncRNA) regulation of transcription. Yet despite their importance, the molecular determinants that govern which RNA molecules interact—and why—remain poorly understood.

Traditional computational approaches for predicting RRIs rely heavily on thermodynamic models that calculate binding free energy based on base-pairing rules. While these physics-based methods have been the gold standard for decades, they face fundamental limitations: they struggle with non-canonical interactions, cannot easily incorporate biological context, and often produce predictions that don’t match experimental observations.

This work makes two key contributions:

1. Biological Discovery: Low-Complexity Repeats as Interaction Hubs

Through systematic analysis of large-scale RRI datasets, we discover that low-complexity repeats (LCRs)—including simple tandem repeats like polyA, polyU, and other short motif repetitions—are dramatically enriched in interacting RNA regions. These LCRs:

Enable thermodynamically stable interactions with multiple partners
Act as hubs in RNA-RNA interaction networks
Facilitate promiscuous binding patterns essential for regulatory RNAs
Are particularly important for lncRNA-mediated gene regulation

We validate this discovery experimentally by profiling the interactome of Lhx1os, a lncRNA involved in neuronal development, confirming that its LCR-rich regions mediate biologically relevant interactions.

2. Computational Innovation: Language Models for RNA Interaction Prediction

Recognizing that sequence context and compositional patterns are key determinants of RRIs, we develop RIME (RNA Interaction Model with Embeddings), a deep learning framework that:

Leverages embeddings from a nucleic acid language model pretrained on millions of RNA sequences
Learns interaction patterns directly from data rather than relying on thermodynamic rules
Successfully captures the role of LCRs and other sequence features
Outperforms traditional tools across multiple benchmark datasets
Prioritizes high-confidence interactions for experimental validation

RIME is freely available as a web tool at https://tools.tartaglialab.com/rna_rna, making state-of-the-art RRI prediction accessible to the broader research community.

Method

Discovery of Low-Complexity Repeats as Interaction Drivers

We began by analyzing several large-scale experimental RRI datasets, including RNA interactome capture data and crosslinking/immunoprecipitation experiments. A striking pattern emerged: interacting RNA regions are significantly enriched in low-complexity repeats.

LCR Enrichment in RRIs

Simple tandem repeats: PolyA, polyU, and dinucleotide repeats are over-represented in binding sites
Thermodynamic advantage: LCRs enable stable base-pairing with multiple partners due to sequence redundancy
Network hubs: RNAs with high LCR content tend to have more interaction partners (high degree centrality)
Evolutionary conservation: Many functional LCRs in regulatory RNAs are under selective pressure
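
To make the notion of a simple tandem repeat concrete, the following is a minimal sketch of how such regions could be located in a sequence. The function names, the motif lengths (1 to 3 nt) and the minimum run length (8 nt) are illustrative assumptions, not the criteria used in the paper.

```python
# Illustrative sketch only: thresholds and function names are assumptions,
# not the LCR definition used by the authors.

def find_tandem_repeats(seq, max_unit=3, min_len=8):
    """Return (start, end, unit) for runs with a short repeating unit.

    A run seq[start:end] has period k when every base equals the base
    k positions before it. Units that are themselves repeats of a shorter
    motif (e.g. "AA") are skipped so that polyA is reported once, as "A".
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    hits = []
    for k in range(1, max_unit + 1):
        i = 0
        while i + k < len(seq):
            j = i + k
            while j < len(seq) and seq[j] == seq[j - k]:
                j += 1
            unit = seq[i:i + k]
            primitive = unit not in (unit + unit)[1:-1]
            if primitive and j - i >= max(min_len, 2 * k):
                hits.append((i, j, unit))
            # Two distinct period-k runs overlap by at most k - 1 bases.
            i = j - k + 1
    return hits


def lcr_fraction(seq, **kwargs):
    """Fraction of positions covered by at least one detected repeat."""
    covered = set()
    for start, end, _ in find_tandem_repeats(seq, **kwargs):
        covered.update(range(start, end))
    return len(covered) / len(seq) if seq else 0.0


example = "GCAUGC" + "A" * 12 + "GGCUAC" + "UG" * 5 + "CC"
repeats = find_tandem_repeats(example)
fraction = lcr_fraction(example)
```

On the toy sequence above, the polyA stretch and the UG dinucleotide repeat are the two regions reported. A per-RNA fraction of this kind is one simple way to stratify transcripts by LCR content.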

Why do LCRs facilitate interactions?

Low-complexity sequences provide multiple, redundant binding opportunities. A polyA stretch, for example, can pair with any polyU-containing region regardless of exact positioning. This degeneracy enables:

Promiscuous binding: One LCR-rich RNA can interact with many partners
Robustness: Mutations in specific positions don’t abolish binding
Tunability: LCR length modulates interaction strength
Context dependence: Flanking sequences fine-tune specificity

This discovery reframes our understanding of RNA interaction networks: rather than specific lock-and-key binding, many RRIs rely on flexible, multivalent interactions mediated by repetitive elements.
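
The degeneracy argument can be illustrated by counting the number of registers (ungapped, antiparallel alignments) in which a short site pairs along its full length with a target. This is a toy sketch under simplifying assumptions: only Watson-Crick and G-U wobble pairs are allowed, no energies are computed, and the function name and sequences are hypothetical.

```python
# Toy illustration of pairing degeneracy; not a thermodynamic model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_full_pairing_registers(site, target):
    """Count target windows that pair with `site` at every position.

    Both sequences are written 5'->3'; each target window is reversed so
    that the duplex is antiparallel.
    """
    width = len(site)
    count = 0
    for offset in range(len(target) - width + 1):
        window = target[offset:offset + width][::-1]
        if all((a, b) in PAIRS for a, b in zip(site, window)):
            count += 1
    return count


# A 6-nt polyA site against a 12-nt polyU stretch: many equivalent registers.
poly_registers = count_full_pairing_registers("AAAAAA", "GC" + "U" * 12 + "GC")

# A complex 6-nt site against a target holding its reverse complement once.
complex_registers = count_full_pairing_registers("GACUCA", "GCUGAGUCUUUUUUGC")
```

In this example the polyA site has seven fully paired registers along the polyU stretch, whereas the complex site has exactly one, which is the sense in which low-complexity sequence offers redundant binding opportunities.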

RIME Architecture: From Language Models to Interaction Prediction

Traditional RRI prediction tools calculate minimum free energy (MFE) using nearest-neighbor thermodynamic parameters. RIME takes a fundamentally different approach: it learns interaction patterns from data, applying deep learning to contextualized sequence embeddings.

Model Architecture

Nucleic Acid Language Model: Embeddings from a transformer pretrained on millions of RNA sequences capture sequence context, motif composition, and structural propensities
Sequence Pair Encoding: For two candidate RNA sequences, extract embeddings that represent their compositional features
Interaction Classifier: A feedforward neural network processes the concatenated/compared embeddings to predict interaction probability
Training Objective: Binary classification trained on experimentally validated positive interactions and carefully selected negatives

Key design choices:

Pretrained embeddings: Rather than learning from scratch, RIME leverages a language model trained on diverse RNA sequences, providing rich representations that generalize across RNA types
Sequence-only input: No structure prediction required, making the method fast and applicable to any RNA
Explicit LCR features: The model can optionally incorporate explicit LCR annotations to boost performance
Calibrated probabilities: Output scores are calibrated to reflect confidence, enabling threshold-based filtering
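
The pipeline described above (per-sequence embeddings, pair encoding, feedforward classifier, probability output) can be sketched in a few lines. This is not the RIME implementation: the mean pooling, the specific comparison features, the layer sizes and the random weights are all assumptions chosen for illustration, and the random vectors stand in for embeddings that a pretrained language model would supply.

```python
import math
import random


def mean_pool(token_embeddings):
    """Average per-token vectors into one fixed-size vector per RNA."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


def pair_features(e1, e2):
    """Concatenate both embeddings with element-wise comparison terms."""
    diff = [abs(a - b) for a, b in zip(e1, e2)]
    prod = [a * b for a, b in zip(e1, e2)]
    return e1 + e2 + diff + prod


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_interaction(tokens_a, tokens_b, params):
    """Forward pass: pooled pair features -> ReLU hidden layer -> sigmoid."""
    x = pair_features(mean_pool(tokens_a), mean_pool(tokens_b))
    hidden = [max(0.0, h) for h in linear(x, params["w1"], params["b1"])]
    logit = linear(hidden, params["w2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


def random_params(in_dim, hidden_dim, seed=0):
    """Untrained placeholder weights (a real model would be trained)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
            "w2": matrix(1, hidden_dim), "b2": [0.0]}


# Stand-in embeddings: 5 and 7 tokens, 4 dimensions each (hypothetical sizes).
rng = random.Random(1)
tokens_a = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
tokens_b = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(7)]
params = random_params(in_dim=16, hidden_dim=8)
probability = predict_interaction(tokens_a, tokens_b, params)
```

Because the input is a pair of fixed-size vectors, inference is a single forward pass regardless of how the two RNAs would fold, which is the property the sequence-only design relies on. Training with a binary cross-entropy loss on positive and negative pairs would replace the random weights.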

Advantages over thermodynamics-based methods:

Aspect	Traditional Tools (e.g., RNAup, IntaRNA)	RIME
Basis	Thermodynamic models (MFE calculation)	Data-driven learning from experimental RRIs
Non-canonical interactions	Limited support (standard base pairs only)	Implicitly learned from data
Sequence context	Local (nearest-neighbor parameters)	Global (transformer attention spans entire sequence)
LCR handling	Often problematic (repetitive pairing)	Explicitly modeled as interaction drivers
Speed	Slow for long RNAs (dynamic programming)	Fast inference (forward pass)
Calibration	Energy values, not probabilities	Calibrated interaction probabilities

Experiments

Benchmark Performance: RIME vs. Traditional Tools

We evaluated RIME against established thermodynamics-based tools (RNAup, IntaRNA, RIsearch2) on multiple test datasets spanning different RNA types and experimental protocols.

Performance Comparison

Method	Precision	Recall	F1 Score	AUROC
RNAup	0.42	0.38	0.40	0.61
IntaRNA	0.45	0.41	0.43	0.64
RIsearch2	0.48	0.44	0.46	0.67
RIME	0.68	0.71	0.69	0.82

Average performance across multiple test datasets. RIME shows consistent improvements across all metrics, with particularly strong gains in recall (capturing more true interactions).

Key observations:

Substantial performance gains: RIME achieves a ~50% relative improvement in F1 score over the best baseline (0.69 vs. 0.46 for RIsearch2)
Higher recall: RIME successfully identifies more true interactions, crucial for discovery applications
Better discrimination: An AUROC of 0.82, compared with 0.61 to 0.67 for the baselines, indicates more reliable ranking of interaction candidates
Robustness: Performance holds across different RNA types (mRNA, lncRNA, miRNA)
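
The arithmetic behind these observations can be checked directly from the table. The sketch below recomputes F1 as the harmonic mean of the reported precision and recall and derives the relative F1 gain over the best baseline. Since the table reports averages across datasets, F1 computed from averaged precision and recall need not equal the averaged F1 in general; here the two agree to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the performance table.
reported = {
    "RNAup": (0.42, 0.38, 0.40),
    "IntaRNA": (0.45, 0.41, 0.43),
    "RIsearch2": (0.48, 0.44, 0.46),
    "RIME": (0.68, 0.71, 0.69),
}

recomputed = {name: round(f1_score(p, r), 2) for name, (p, r, _) in reported.items()}

best_baseline_f1 = max(f1 for name, (_, _, f1) in reported.items() if name != "RIME")
relative_gain = reported["RIME"][2] / best_baseline_f1 - 1  # about 0.50
```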

Role of LCRs in Interaction Networks

To validate our hypothesis about LCR-mediated interactions, we analyzed the network properties of RNAs stratified by LCR content.

High-LCR RNAs

Average degree: 15.3 partners
Network role: Hubs
Examples: lncRNAs (NEAT1, MALAT1, Xist)
Function: Regulatory scaffolds, nuclear organization

Low-LCR RNAs

Average degree: 3.7 partners
Network role: Peripheral nodes
Examples: Most protein-coding mRNAs
Function: Specific regulatory targets

This stark difference in network topology supports a model where LCRs enable promiscuous binding, allowing certain RNAs (especially regulatory lncRNAs) to serve as interaction hubs coordinating multiple cellular processes.
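
A comparison of this kind amounts to computing node degree in the interaction network and averaging it within each LCR stratum. The following sketch shows the calculation on a made-up edge list; the RNA names, LCR fractions and the 0.2 cutoff are hypothetical and do not come from the study.

```python
from collections import defaultdict


def degrees(edges):
    """Number of distinct partners per RNA in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    return {rna: len(p) for rna, p in partners.items()}


def mean_degree_by_group(edges, lcr_content, threshold=0.2):
    """Average degree of RNAs at or above vs. below an LCR-content cutoff."""
    deg = degrees(edges)
    groups = {"high_lcr": [], "low_lcr": []}
    for rna, fraction in lcr_content.items():
        key = "high_lcr" if fraction >= threshold else "low_lcr"
        groups[key].append(deg.get(rna, 0))
    return {k: (sum(v) / len(v) if v else 0.0) for k, v in groups.items()}


# Toy network (hypothetical): one LCR-rich lncRNA and four mRNAs.
toy_edges = [("lncA", "m1"), ("lncA", "m2"), ("lncA", "m3"),
             ("lncA", "m4"), ("m1", "m2"), ("m1", "lncA")]
toy_lcr = {"lncA": 0.45, "m1": 0.05, "m2": 0.02, "m3": 0.10, "m4": 0.0}
summary = mean_degree_by_group(toy_edges, toy_lcr)
```

In the toy network the LCR-rich node has degree 4 while the others average 1.5, mirroring the hub versus peripheral pattern in miniature.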

Experimental Validation: Lhx1os lncRNA Interactome

To test our computational predictions experimentally, we profiled Lhx1os interactors using RNA antisense purification followed by sequencing (RAP-seq). Lhx1os is a lncRNA involved in neuronal development, making it an ideal test case for biologically relevant RRIs.

Validation Results

LCR importance confirmed: Deletion of LCR-rich regions in Lhx1os dramatically reduced interaction counts
RIME predictions validated: 73% of high-confidence RIME predictions (score > 0.8) were confirmed as true interactors
Functional relevance: Several validated interactors are involved in neuronal differentiation pathways
Novel discoveries: RIME identified previously unknown Lhx1os partners missed by thermodynamic tools
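
The validation rate quoted above is the fraction of predictions above a score threshold that were confirmed experimentally. A minimal sketch of that calculation follows; the candidate names, scores and confirmed set are invented for illustration, and only the 0.8 threshold is taken from the text.

```python
def confirmation_rate(scores, confirmed, threshold=0.8):
    """Fraction of predictions with score > threshold that were confirmed.

    Returns (rate, selected) where `selected` lists the high-confidence
    candidates in descending score order.
    """
    selected = sorted((rna for rna, s in scores.items() if s > threshold),
                      key=lambda rna: scores[rna], reverse=True)
    if not selected:
        return 0.0, []
    hits = sum(1 for rna in selected if rna in confirmed)
    return hits / len(selected), selected


# Hypothetical scores and experimental calls, not data from the study.
toy_scores = {"rna_1": 0.95, "rna_2": 0.85, "rna_3": 0.81,
              "rna_4": 0.80, "rna_5": 0.40}
toy_confirmed = {"rna_1", "rna_3", "rna_5"}
rate, selected = confirmation_rate(toy_scores, toy_confirmed)
```

Note that the comparison is strict, so a candidate scoring exactly 0.80 is excluded, and that a confirmed interactor with a low score (rna_5 here) does not affect this rate; it would count against recall instead.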

Case study: Lhx1os-dependent neuronal development

The validated Lhx1os interactome includes RNAs encoding transcription factors (Pou3f2, Sox11), splicing regulators (Rbfox3), and chromatin modifiers (Chd7). Many of these interactions occur through LCR-mediated binding, supporting a model where Lhx1os acts as a scaffold coordinating gene regulatory programs during neurogenesis.

Application to Neuronal Development

Expanding beyond Lhx1os, we applied RIME to predict RRI networks relevant to neuronal development, focusing on lncRNAs and mRNAs expressed during the differentiation of neural progenitors. The analysis yielded:

Network prediction: RIME identified 1,847 high-confidence interactions among 203 neuronal RNAs
Module detection: Community detection revealed 12 functional modules enriched for specific processes (axon guidance, synaptogenesis, chromatin remodeling)
LCR-centric topology: Modules are structured around LCR-rich lncRNA hubs
Developmental dynamics: Predicted interactions show stage-specific patterns matching known differentiation programs

This demonstrates RIME’s utility for hypothesis generation in systems biology: the predicted network suggests testable models of lncRNA-mediated coordination in neural development.

Key Findings

1. LCRs Are Key Drivers of RNA-RNA Interactions

Low-complexity repeats (polyA, polyU, simple motifs) are dramatically enriched in experimentally validated interaction sites. They enable promiscuous, thermodynamically favorable binding with multiple partners, positioning LCR-rich RNAs as hubs in interaction networks.

2. Language Models Capture Interaction Determinants

Embeddings from nucleic acid language models pretrained on diverse RNA sequences encode the compositional and contextual features that determine RRIs. RIME's architecture successfully leverages these representations, improving F1 score by about 50% relative to the best thermodynamics-based tool (0.69 vs. 0.46).

3. Network Topology Reflects LCR Content

High-LCR RNAs (especially lncRNAs) have roughly four times as many interaction partners as low-LCR RNAs (15.3 vs. 3.7 on average). This hub-and-spoke topology suggests architectural principles: regulatory lncRNAs use LCRs to coordinate multiple targets, while most mRNAs engage in specific, targeted interactions.

4. Experimental Validation Confirms Biological Relevance

RAP-seq validation of the Lhx1os lncRNA interactome confirms that (a) LCR deletion disrupts interactions, (b) 73% of high-confidence RIME predictions (score > 0.8) are validated, and (c) predicted interactors are functionally relevant to neuronal development. This establishes RIME's utility for biological discovery.

5. Practical Web Tool for Community Use

RIME is freely available at tools.tartaglialab.com/rna_rna. The web interface accepts RNA sequences, returns interaction predictions with confidence scores, and provides interpretability features highlighting LCRs and key motifs. This democratizes access to state-of-the-art RRI prediction.

Citation

@article{Setti2025.02.16.638500,
  author       = {Setti, Adriano and Bini, Giorgio and Maiorca, Valentino and
                  Pellegrini, Flaminia and Proietti, Gabriele and
                  Miltiadis-Vrachnos, Dimitrios and Armaos, Alexandros and
                  Martone, Julie and Monti, Michele and Ruocco, Giancarlo and
                  Rodol{\`a}, Emanuele and Bozzoni, Irene and
                  Colantoni, Alessio and Tartaglia, Gian Gaetano},
  title        = {Decoding RNA-RNA Interactions: The Role of Low-Complexity
                  Repeats and a Deep Learning Framework for Sequence-Based
                  Prediction},
  year         = {2025},
  doi          = {10.1101/2025.02.16.638500},
  journal      = {bioRxiv}
}

Authors

Adriano Setti¹ · Giorgio Bini¹ · Valentino Maiorca²,³ · Flaminia Pellegrini¹ · Gabriele Proietti¹ · Dimitrios Miltiadis-Vrachnos¹ · Alexandros Armaos¹ · Julie Martone⁴ · Michele Monti² · Giancarlo Ruocco² · Emanuele Rodolà³ · Irene Bozzoni⁴ · Alessio Colantoni⁴ · Gian Gaetano Tartaglia¹

¹Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia (IIT), Rome, Italy
²Center for Life Nano- & Neuro-Science, Sapienza University of Rome, Italy
³Sapienza University of Rome, Department of Computer Science, Italy
⁴Sapienza University of Rome, Department of Biology and Biotechnology, Italy

Try RIME: The web tool is freely available at tools.tartaglialab.com/rna_rna. Submit your RNA sequences to predict interactions, explore LCR content, and download results for further analysis. Batch prediction and API access are also supported.