Abstract:
- RNA-protein interactions are key effectors of post-transcriptional regulation → characterizing protein binding mechanisms and identifying the sequence/structure features of RNA that determine the binding specificity of different proteins is important → current in silico prediction of these interactions performs poorly
- paper introduces RPI-Net, a GNN approach for RNA-protein interaction prediction
- uses a graph representation of the RNA molecule's secondary structure → important for modelling post-transcriptional regulatory events in which structure matters: RNA splicing, capping, nuclear export, degradation, subcellular localization and translation
- many regulatory processes are influenced by a diverse population of RNA-binding proteins (RBPs) → each has an affinity for particular RNA motifs
- identifying defects in RBPs and in their interactions with RNA is key to understanding neuromuscular disorders and cancer*
- RBP binding has been shown to depend on both the sequence and the structure of the RNA → e.g. Vts1p is an RBP that binds a certain sequence motif only when it lies within a hairpin loop (finding such points of interest can inform design and diagnosis)
- RNA secondary structure (RSS) is represented as strings in dot-bracket notation, and a number of recently proposed prediction approaches use these strings together with nucleotide sequence information *(note: probably not optimal for LSTMs to identify motifs, most likely because of a lack of diversity to learn from)
- RNA folding is fundamentally stochastic: mispairing can lead to alternative structures that are not the most energetically favourable state but are still viable → graph representations are better suited to this task
- RPI-Net dataset drawn from (Hafner et al., 2010; Licatalosi et al., 2008; König et al., 2010), originally assembled and curated by Maticzka et al. (2014)
- beyond predicting interactions, the model can reveal RBP sequence and structure binding preferences that closely match experimentally derived motifs
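A minimal sketch of the graph idea in these notes: turning a sequence plus its dot-bracket structure into a graph (nodes = nucleotides, edges = backbone links + base pairs), then running one round of neighbourhood averaging as a stand-in for a GNN layer. This is NOT RPI-Net's actual architecture; the function names, the one-hot A/C/G/U features, the weight-free mean aggregation and the mean-pooling readout are all illustrative assumptions. Only plain "()" pairs are handled, so pseudoknots (which need extra bracket types) are out of scope.

```python
from typing import Dict, List, Tuple

NUCLEOTIDES = "ACGU"  # assumed alphabet for one-hot node features


def dot_bracket_to_graph(seq: str, structure: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Hypothetical helper: build node labels and undirected edges from dot-bracket notation."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    # Backbone edges link each nucleotide to the next one in the chain.
    edges: List[Tuple[int, int]] = [(i, i + 1) for i in range(len(seq) - 1)]
    stack: List[int] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            edges.append((stack.pop(), j))  # base-pair edge
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return list(seq), edges


def one_hot(nt: str) -> List[float]:
    """Encode a nucleotide as a 4-dimensional one-hot vector (assumed featurisation)."""
    return [1.0 if nt == n else 0.0 for n in NUCLEOTIDES]


def mean_aggregate(features: List[List[float]], edges: List[Tuple[int, int]],
                   rounds: int = 1) -> List[List[float]]:
    """Weight-free message passing: each node takes the mean of itself and its neighbours."""
    n = len(features)
    neighbours: Dict[int, List[int]] = {v: [v] for v in range(n)}  # self-loop included
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    h = [row[:] for row in features]
    for _ in range(rounds):
        h = [[sum(h[u][k] for u in neighbours[v]) / len(neighbours[v])
              for k in range(len(h[v]))]
             for v in range(n)]
    return h


def readout(h: List[List[float]]) -> List[float]:
    """Mean-pool node vectors into one graph-level vector (e.g. input to a classifier)."""
    return [sum(col) / len(h) for col in zip(*h)]


if __name__ == "__main__":
    # Small hairpin: three base pairs closing a three-nucleotide loop.
    labels, edges = dot_bracket_to_graph("GGGAAAUCC", "(((...)))")
    h = mean_aggregate([one_hot(nt) for nt in labels], edges, rounds=2)
    print(edges)
    print(readout(h))
```

Because base pairs become explicit edges, a paired nucleotide at one end of the sequence exchanges information with its partner at the other end in a single round; a purely sequential model would have to carry that information across the whole string. A trained GNN would replace the plain mean with learned weight matrices and non-linearities.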