Using in-cell SHAPE-Seq and simulations to probe structure–function design principles of RNA transcriptional regulators

Antisense RNA-mediated transcriptional regulators are powerful tools for controlling gene expression and creating synthetic gene networks. RNA transcriptional repressors derived from natural mechanisms called attenuators are particularly versatile, though their mechanistic complexity has made them difficult to engineer. Here we identify a new structure–function design principle for attenuators that enables the forward engineering of new RNA transcriptional repressors. Using in-cell SHAPE-Seq to characterize the structures of attenuator variants within Escherichia coli, we show that attenuator hairpins that facilitate interaction with antisense RNAs require interior loops for proper function. Molecular dynamics simulations of these attenuator variants suggest these interior loops impart structural flexibility. We further observe hairpin flexibility in the cellular structures of natural RNA mechanisms that use antisense RNA interactions to repress translation, confirming earlier results from in vitro studies. Finally, we design new transcriptional attenuators in silico using an interior loop as a structural requirement and show that they function as desired in vivo. This work establishes interior loops as an important structural element for designing synthetic RNA gene regulators. We anticipate that the coupling of experimental measurement of cellular RNA structure and function with computational modeling will enable rapid discovery of structure–function design principles for a diverse array of natural and synthetic RNA regulators.


Simulation details
All-atom replica exchange molecular dynamics (REMD) simulations were performed for the Fusion 3 and Fusion 3 L2(GU-CA) sense-strand hairpins (nucleotides 14-61) using the GROMACS software package version 5.0.4 (Pronk et al. 2013). The Amber-99 force field (Wang et al. 2000), ported to GROMACS by Sorin and Pande (Sorin and Pande 2005), was used with the modifications for nucleic acids introduced by Chen and García (Chen and García 2013). Further improvements to the nucleic acid torsion and base-pairing potentials, calibrated against ultrasonic absorption (Nishikawa et al. 2000) and NMR relaxation dispersion (Rinnenthal et al. 2010), were incorporated. A total of 13209 and 13186 explicit water molecules, described with the TIP3P model (Jorgensen et al. 1983), were added to the Fusion 3 and Fusion 3 L2(GU-CA) systems, respectively.
Additionally, 86 Na⁺ and 39 Cl⁻ ions were added to each system to neutralize the net charge and bring the salt concentration to 0.15 M. Ions were modeled using the parameters of Åqvist (Aaqvist 1990), following the approach of Chen and Pappu (Chen and Pappu 2007) to eliminate spurious ion-pairing artifacts.
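As a rough consistency check on these ion counts, the sketch below estimates how many salt ion pairs a 0.15 M concentration implies for a rectangular box and then adds counterions for the RNA charge. It assumes that the box edges quoted in the next paragraph (6.0 x 6.0 x 12.0) are in nanometres and that the 48-nucleotide hairpin carries a net charge of -47 (one per phosphodiester linkage, no terminal phosphate). Both are illustrative assumptions, not values stated in the text, and the function and variable names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def salt_pairs(box_nm, molar):
    """Ion pairs needed for `molar` (mol/L) in a rectangular box with edges in nm."""
    volume_nm3 = box_nm[0] * box_nm[1] * box_nm[2]
    volume_litres = volume_nm3 * 1e-24  # 1 nm^3 = 1e-24 L
    return round(molar * AVOGADRO * volume_litres)


box = (6.0, 6.0, 12.0)  # assumed to be nanometres
rna_charge = -47        # assumed net charge of the 48-nt hairpin

n_cl = salt_pairs(box, 0.15)  # anions come from the added salt only
n_na = n_cl - rna_charge      # salt cations plus neutralizing counterions
print(n_na, n_cl)
```

Under these assumptions the counts come out to 86 Na⁺ and 39 Cl⁻, in line with the numbers reported above.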
Each RNA was centered in a 6.0 x 6.0 x 12.0 nm box and aligned to the principal axis.
The box size was chosen to maintain a minimum distance of 10 Å between periodic images across the conformational space explored during long, preliminary high-temperature simulations. A rough alignment with the principal axis was maintained by applying 3 Å, flat-bottomed, cylindrical restraints with weak harmonic edges (100 kJ/mol force constant) to the C5' residues at the base of the stem, and a 5 Å, flat-bottomed, spherical restraint with weak harmonic edges (50 kJ/mol force constant) to the C3' residue at the center of the loop. Long-range electrostatic interactions were treated using the particle mesh Ewald approach (Cheatham et al. 1995).
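To illustrate what a flat-bottomed restraint with harmonic edges does, here is a minimal sketch of the energy penalty as a function of distance from the restraint reference. It is a generic form of such a potential, not the authors' input files. The force constant is assumed to be in kJ mol⁻¹ nm⁻² (the text gives only kJ/mol), and the function names are hypothetical.

```python
import math


def flat_bottom_energy(distance_nm, radius_nm, k):
    """Zero inside the flat region; harmonic in the excess distance outside it."""
    excess = distance_nm - radius_nm
    if excess <= 0.0:
        return 0.0
    return 0.5 * k * excess ** 2


def radial_distance(x_nm, y_nm):
    """Distance from the cylinder axis (taken here as the z axis through the origin)."""
    return math.hypot(x_nm, y_nm)


# Cylindrical restraint: 3 Å = 0.3 nm flat region, k = 100 (assumed kJ/mol/nm^2)
inside = flat_bottom_energy(radial_distance(0.1, 0.2), 0.3, 100.0)
outside = flat_bottom_energy(radial_distance(0.5, 0.0), 0.3, 100.0)
print(inside, outside)
```

An atom within the flat region feels no force, so the hairpin can fluctuate freely while large drifts away from the box axis are gently penalized.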
Initial, all-atom RNA structures were generated by the MC-Sym package (Parisien and Major 2008) using secondary structures generated by RNAstructure (Reuter and Mathews 2010) as input. A steepest descent energy minimization was performed until the maximum force was less than 100.0 kJ/mol/nm. A 100 ps, constant volume simulation was used to equilibrate the temperature to 300 K and was followed by a 100 ps, constant pressure equilibration at 1 bar.
Production simulations were performed for 130 ns with conformational sampling by replica exchange. Constant temperature was maintained for each replica using a modified Berendsen thermostat with a tau-t of 0.1 ps. A 2 fs time step was used and snapshots were saved every 2 ps. The first 30 ns were considered equilibration based on analysis of cumulative average base pair occupancy (Supplementary Figure S9).
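The convergence check described above can be sketched as a running mean over per-snapshot base-pair indicators (1 if the pair is formed, 0 otherwise). This is a minimal illustration with a toy input series. The helper names are hypothetical, and how a base pair is scored as formed is left out because the text does not specify it.

```python
from itertools import accumulate


def cumulative_average(values):
    """Running mean of a per-frame series, one value per snapshot."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]


def n_frames(duration_ns, stride_ps):
    """Number of snapshots saved over `duration_ns` at one frame every `stride_ps`."""
    return int(round(duration_ns * 1000.0 / stride_ps))


total_frames = n_frames(130, 2)      # full production run
discarded_frames = n_frames(30, 2)   # equilibration portion
kept_frames = total_frames - discarded_frames

toy_series = [1, 0, 1, 1]            # toy base-pair indicator per frame
running = cumulative_average(toy_series)
print(total_frames, discarded_frames, kept_frames, running)
```

When the running mean flattens out, the occupancy estimate is no longer sensitive to additional sampling, which is the kind of criterion used to separate equilibration from the analyzed portion of a trajectory.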
A preliminary REMD temperature schedule was generated using the temperature predictor algorithm by Patriksson and van der Spoel (Patriksson and van der Spoel 2008). A 1 ns REMD run was used to optimize the temperature schedule by calculating the rate of acceptance using Gaussian energy distributions as implemented by García and Paschek (García et al. 2006). The resulting schedule comprised 66 replicas ranging from 290.00 K to 435.10 K. The upper temperature limit was selected to permit significant melting of the loop and fusion region while leaving the stem relatively intact. Exchange rates of 25% were obtained with swaps attempted every 2 ps.
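For readers unfamiliar with replica exchange, the following sketch shows the standard Metropolis criterion for swapping configurations between two neighboring temperatures. It uses potential energies only and ignores any pressure-volume term, which is a simplifying assumption. It is not the schedule-optimization procedure cited above, and the function name and example energies are hypothetical.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    e_i, e_j: potential energies (kJ/mol) of the configurations currently
    held at temperatures t_i and t_j (K).
    """
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0  # favorable swaps are always accepted
    return math.exp(delta)


# Colder replica holds the higher-energy configuration: always swapped.
p_favorable = exchange_probability(-1000.0, -1010.0, 290.0, 292.0)
# Colder replica holds the lower-energy configuration: swapped only sometimes.
p_unfavorable = exchange_probability(-1010.0, -1000.0, 290.0, 292.0)
print(p_favorable, p_unfavorable)
```

Because the acceptance probability depends on the overlap of the energy distributions at adjacent temperatures, the spacing of the temperature ladder controls the exchange rate.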
Supplementary Table S1: Plasmids used in this study. Abbreviations are as follows: TrrnB = rrnB terminator, CmR = chloramphenicol resistance, AmpR = ampicillin resistance, SFGFP = super folder green fluorescent protein, t500 = T500 terminator, ECK120051404 = ECK120051404 terminator.

Table S5: Oligonucleotides used for in-cell SHAPE-Seq. Abbreviations within primer sequences are as follows: '/5Biosg/' is a 5' biotin moiety, '/5Phos/' is a 5' monophosphate group, '/3SpC3/' is a 3' 3-carbon spacer group, VIC and NED are fluorophores (ABI), and asterisks indicate a phosphorothioate backbone modification.

... (Reuter and Mathews 2010) and are reported in kcal/mol. The ON structure was obtained by forced pairing of the antiterminator with the 5' half of the terminator stem using the RNAstructure fold utility with default parameters. Similarly, the OFF (no antisense) structure is the lowest free energy structure in which the complete terminator and poly U are formed. The OFF-with-antisense free energy was calculated using the duplex utility by also including the complete antisense sequence (without terminator). All of these analyses predict that OFF (with antisense) is much more stable than OFF (no antisense), indicating that, from a thermodynamic perspective, all fusions should be functional.

Figure S9. The simulation cumulative average base pair occupancy is shown for A) Fusion 3 and B) Fusion 3 L2 at 311 K. The first 30 ns (the grayed region) were discarded as equilibration, while the following 100 ns were considered converged and used in the calculation of the reported base pair occupancies.

Figure 3F. Solvent accessible surface representations, shown in transparent gray, depict the points contacted by a spherical probe of 1.4 Å radius rolled across the van der Waals radii of the RNA atoms. Images were generated using VMD software (Humphrey et al. 1996).

Supplementary Movie 1.
Movie of a 2 ns segment from REMD simulations of Fusion 3. These trajectories were generated by following the dynamics of an initial conformation, including exchanges across neighboring temperature replicas. This process results in a physically contiguous trajectory in which the simulation temperature is free to vary gradually within the REMD range (290.00 K to 435.10 K). Ribbon representations of the nucleic acid backbone and bases are colored according to in-cell SHAPE-Seq reactivities from Figure 3F. The surface, shown in transparent gray, follows the contour of a constant global atomic density generated using the VMD software (Chen and García 2013; Humphrey et al. 1996) qsurf representation.

Supplementary Movie 2.
Movie of a 2 ns segment from REMD simulations of Fusion 3 L2(GU-CA). As in Supplementary Movie 1.