Dissection of protein and RNA regions required for SPEN binding to XIST A-repeat RNA

XIST noncoding RNA promotes the initiation of X chromosome silencing by recruiting the protein SPEN to one X chromosome in female mammals. The SPEN protein is also called SHARP (SMRT and HDAC-associated repressor protein) and MINT (Msx-2 interacting nuclear target) in humans. SPEN recruits N-CoR2 and HDAC3 to initiate histone deacetylation on the X chromosome, leading to the formation of repressive chromatin marks and silencing gene expression. We dissected the contributions of different RNA and protein regions to the formation of a human XIST–SPEN complex in vitro and identified novel sequence and structure determinants that may contribute to X chromosome silencing initiation. Binding of SPEN to XIST RNA requires RRM 4 of the protein, in contrast to the requirement of RRM 3 and RRM 4 for specific binding to SRA RNA. Measurements of SPEN binding to full-length, dimeric, trimeric, or other truncated versions of the A-repeat region revealed that high-affinity binding of XIST to SPEN in vitro requires a minimum of four A-repeat segments. SPEN binding to XIST A-repeat RNA changes the accessibility of the RNA at specific nucleotide sequences, as indicated by changes in RNA reactivity through chemical structure probing. Based on computational modeling, we found that inter-repeat duplexes formed by multiple A-repeats can present an unpaired adenosine in the context of a double-stranded region of RNA. The presence of this specific combination of sequence and structural motifs correlates with high-affinity SPEN binding in vitro. These data provide new information on the molecular basis of the XIST and SPEN interaction.


INTRODUCTION
Mammals use an RNA-based silencing mechanism to equalize gene expression dosage in females with two X chromosomes (XX) compared to males with one X chromosome (XY). Silencing of all but one X chromosome in XX, XXX, or XXY individuals is termed X chromosome inactivation (XCI) and takes place during early embryonic development (Lyon 1962). This silencing process ensures that only a single allele for most X-linked genes will be transcribed in each somatic female cell. In humans, XCI occurs randomly, and either the maternal or paternal X chromosome becomes silenced (Moreira de Mello et al. 2010). The silenced X chromosome is condensed to form a Barr body (Barr and Bertram 1949), and this transcriptional silencing is maintained during clonal expansion through development.
Expression of the XIST noncoding RNA is essential for XCI in humans and other eutherian mammals (Penny et al. 1996; Sahakyan et al. 2018), while metatherian mammals rely on the Rsx noncoding RNA for a similar process (Grant et al. 2012). Human XIST is a 17-kilonucleotide RNA containing multiple regions of conserved repeat sequences with functions in silencing (Brockdorff et al. 1992; Brown et al. 1992). During initiation of XCI, XIST RNA directly binds and recruits SPEN to the inactive X chromosome (McHugh et al. 2015). Specifically, the A-repeat region of XIST is required for recruitment of SPEN and initiation of silencing (Wutz et al. 2002; Chu et al. 2015). In humans, the XIST A-repeat region consists of eight and a half A-repeat sequences with intervening U-rich linkers. This linear structure, consisting of alternating repeats and U-rich linker regions, is conserved among species. In other mammalian species such as chiropters and lagomorphs, the A-repeat region can contain as few as six full repeats. While the overall structure of each XIST A-repeat is conserved, nucleotide sequence variations occur across individual repeats even in the same species (Supplemental Fig. S1). The unusual sequence and structure of the A-repeat region of XIST RNA may be critical for SPEN binding and recruitment to the X chromosome. A single XIST A-repeat sequence can form a double hairpin while multiple repeats can form inter-repeat duplexes (Duszczyk et al. 2008, 2011). Higher order and more complex structures have been proposed to form through base complementarity with neighboring or distant repeats (Maenner et al. 2010; Fang et al. 2015). Structural analyses of the A-repeat region suggested that this region can fold into a compact RNA structure, independent of the remainder of the XIST RNA transcript (Lu et al. 2016; Liu et al. 2017).
SPEN is a human homolog of the Drosophila spen (split ends) protein. This protein is also called SHARP (SMRT and HDAC-associated repressor protein) and MINT (Msx-2 interacting nuclear target) in humans. SPEN is a large multidomain RNA-binding protein (RBP). The SPEN family proteins regulate gene expression in several developmental processes. The Drosophila spen protein was first identified as a positive regulator of the DER/Ras signaling pathway (Chen and Rebay 2000) and interacts with Hox to regulate segmental morphologies (Wiellette et al. 1999). SPEN can act as a transcriptional repressor (Ariyoshi and Schwabe 2003). SPEN contains several conserved protein domains that enable interactions with RNA coactivators and the nuclear hormone coreceptors N-CoR2 and histone deacetylases (HDACs) to silence transcription (Shi et al. 2001). SPEN is a predominantly nuclear protein, containing four RNA recognition motif (RRM) domains at the N-terminus and a SPOC domain at the C-terminus. The SPOC domain allows SPEN to achieve transcriptional repressor activity by recruiting regulatory proteins to the X chromosome (Dossin et al. 2020).
RNA binding of SPEN occurs through the four N-terminal RRM domains. The N-terminal region of the protein contains RRM 1 followed by a disordered linker region, RRM 2, a second linker region, RRM 3, and RRM 4. The SPEN RRM 2-4 construct has been crystallized and can bind tightly to the steroid receptor RNA activator (SRA) noncoding RNA (Shi et al. 2001; Arieti et al. 2014). The mechanism by which SPEN recognizes and binds XIST is still unknown. RRM domains can recognize and bind a wide range of nucleotide sequences (Afroz et al. 2015), making prediction of specific RNA-protein interactions from sequence information a challenge. For human XIST RNA, the number and identity of individual A-repeat sequences within the XIST A-repeat region that are required to achieve high-affinity binding to SPEN in vitro were not previously investigated systematically. Since the XIST-SPEN interaction is critical for X chromosome silencing, we performed truncation and binding studies of the RNA and protein in vitro to investigate the molecular basis for this RNA-protein interaction. We also examined the consequences of protein binding on XIST A-repeat RNA structure and identified a specific structural and sequence motif that was correlated with high-affinity in vitro SPEN binding to XIST A-repeat sequences.

XIST A-full and SRA RNA bind SPEN with similar affinity
To investigate the binding regions required for formation of stable SPEN and human XIST RNA complexes in vitro, we cloned the N-terminal region of human SPEN to create the SPEN RRM 1-4 protein construct, along with the human XIST A-repeat RNA region and SRA RNA transcript (Fig. 1A). Previous work suggested that the A-repeat region folds independently and may form complex inter-repeat interactions. We performed in vitro binding studies using a multiplexed version of the electrophoretic mobility shift assay (EMSA) where trace amounts of fluorescently labeled RNA were incubated with increasing concentrations of SPEN protein. The resulting RNA-protein complexes were separated by native polyacrylamide gel electrophoresis (Fig. 1B). BSA binding to SRA RNA was used as a negative control for the in vitro RNA-protein binding experiments (Supplemental Fig. S2). We quantified the bound fraction of RNA and calculated the binding affinity of SPEN RRM 1-4 for XIST A-full. The equilibrium dissociation constant (Kd) of the SPEN RRM 1-4 and XIST A-full RNA-protein interaction was 100 ± 30 nM (Fig. 1C; Table 1). The same experiments were also performed for SRA RNA binding to SPEN RRM 1-4, and the Kd for this interaction was 100 ± 20 nM (Fig. 1D,E). SRA RNA and XIST A-full RNA each bound to SPEN RRM 1-4 with similar affinity in EMSA studies (Fig. 1F). A size-matched fragment from an unrelated messenger RNA, BARD1, was used as a negative control for SPEN binding (Supplemental Fig. S2). Since SRA and XIST RNA were predicted to bind to SPEN at the same site (Monfort et al. 2015; Carter et al. 2020), we performed a competition assay between XIST and SRA for SPEN binding. Labeled XIST A-full RNA was mixed with SPEN RRM 1-4, then increasing quantities of unlabeled SRA or BARD1 negative control RNA were added to the samples to determine whether SRA or BARD1 binding could outcompete XIST binding. Addition of SRA RNA to the complexes could displace the binding of XIST A-full RNA to SPEN RRM 1-4 protein, indicating that XIST A-full and SRA binding to SPEN were mutually exclusive (Fig. 1G). The control experiment performed with BARD1 negative control RNA showed that BARD1 could not compete with XIST A-full for binding to SPEN.
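As an illustration of how a Kd can be estimated from quantified EMSA bound fractions, the following is a minimal sketch that assumes a simple one-site binding model with trace RNA (fraction bound = [P] / (Kd + [P])). The text does not state which fitting model or software was used, so the model, the function names, the grid-search fit, and the titration values are all hypothetical.

```python
import math


def fraction_bound(protein_nM, kd_nM):
    """One-site isotherm for trace RNA (assumed model): [P] / (Kd + [P])."""
    return protein_nM / (kd_nM + protein_nM)


def sum_squared_error(kd_nM, protein_nM, observed):
    """Squared deviation between the isotherm and the observed bound fractions."""
    return sum(
        (fraction_bound(p, kd_nM) - y) ** 2 for p, y in zip(protein_nM, observed)
    )


def fit_kd(protein_nM, observed, log10_min=0.0, log10_max=4.0, points=4001):
    """Least-squares Kd estimate from a log-spaced grid search (1 nM to 10 uM)."""
    best_kd, best_sse = None, math.inf
    for i in range(points):
        kd = 10 ** (log10_min + (log10_max - log10_min) * i / (points - 1))
        sse = sum_squared_error(kd, protein_nM, observed)
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Hypothetical titration (nM protein) with noise-free simulated data, Kd = 100 nM.
titration_nM = [0, 10, 30, 100, 300, 1000, 3000]
simulated = [fraction_bound(p, 100.0) for p in titration_nM]
estimated_kd = fit_kd(titration_nM, simulated)
print(f"Estimated Kd: {estimated_kd:.1f} nM")
```

With real gel data, replicate measurements and a cooperative (Hill) or depletion-corrected model may be more appropriate; the grid search here is chosen only to keep the sketch dependency-free.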

SPEN RRM 1-4 increases SRA RNA binding compared to SPEN RRM 2-4 construct
In previous work, the binding of a SPEN RRM 2-4 protein construct to SRA RNA was examined in detail in vitro (Arieti et al. 2014). We assessed the contribution of RRM 1 and the linker sequence between RRM 1 and RRM 2 of SPEN protein by comparing the binding affinities of SPEN RRM 1-4 and SPEN RRM 2-4 for the XIST A-full and SRA RNA transcripts. The SPEN RRM 1-RRM 2 linker consists of a low complexity, predicted intrinsically disordered protein region with multiple serine and arginine residues. Low-complexity protein domains have been proposed to confer nonspecific RNA binding (Basu and Bahadur 2016). SPEN RRM 2-4 is necessary for the initial recruitment of SPEN to the X chromosome while RRM 1 has been found to be dispensable for XCI (Dossin et al. 2020). Consistent with these results, we found that inclusion of RRM 1 in the SPEN RRM 1-4 construct had no effect on XIST A-full binding. However, inclusion of this additional protein sequence significantly increased SPEN RRM 1-4 binding to both SRA RNA and the negative control BARD1 RNA, when compared to the previously studied SPEN RRM 2-4 construct containing only RRM 2, RRM 3, and RRM 4 (Fig. 2).

SPEN RRM 3 is required for binding to SRA RNA but not to XIST A-full RNA
SRA binding to SPEN RRM 2-4 can be eliminated by mutation of several aromatic amino acid residues at the RNA-binding interface of RRM 3 within the SPEN RRM 2-4 protein construct (Arieti et al. 2014). We created the same RRM 3 mutant SPEN RRM 2-4 mut3 protein and tested whether the RRM 3 domain was similarly required for SPEN recognition of XIST RNA, or whether these two RNA targets can be recognized differently by the protein. The negative control for nonspecific RNA binding in these experiments was BARD1 mRNA, a size-matched fragment from an unrelated messenger RNA that would not be expected to bind to SPEN. Consistent with the previous findings, SPEN RRM 2-4 mut3 binding to SRA was greatly reduced, confirming that specific amino acid mutations (F282A, K311A, Y319A, F321A, and K353A) on the β-sheet face of RRM 3 disrupted SRA binding (Fig. 3). Mutation of these residues in RRM 3 of SPEN RRM 2-4 resulted in a loss of binding (defined as Kd > 1000 nM) for both SRA and the negative control, BARD1 RNA. In contrast, the same mutations did not eliminate binding to XIST. The SPEN RRM 2-4 mut3 construct showed decreased but not completely abrogated binding to XIST A-full A-repeat RNA, with an approximately twofold increase in the Kd to 230 ± 40 nM (Table 2).

SPEN RRM 4 is required for binding to any of the RNA transcripts examined
Since mutations in RRM 3 did not eliminate SPEN binding to XIST, we next evaluated the contribution of RRM 4 to RNA-binding affinity and specificity. The minimal SRA RNA binding construct of SPEN is RRM 3-4 (Arieti et al. 2014), and our results showed that XIST and SRA bind mutually exclusively to SPEN RRM 1-4. Therefore, we hypothesized that RRM 4 would be required for both XIST binding and SRA binding. We found that RRM 4 was indeed required for XIST-SPEN or SRA-SPEN binding interactions (Fig. 4). The SPEN RRM 2-3 construct was unable to bind XIST or SRA with high affinity. Furthermore, elimination of SPEN RRM 4 to create the construct SPEN RRM 2-3 also resulted in a significant loss of binding to the negative control RNA transcript BARD1.

Four XIST A-repeat units are sufficient for high-affinity SPEN RRM 1-4 binding
To evaluate the minimum number of XIST A-repeat region RNA units required for binding to SPEN, we designed a series of RNA constructs with varying sequences of A-repeat units, linkers, and additional sequences in dimers, trimers, or multimers (Fig. 5A). Initial creation of RNA constructs was based on a structural model of the human XIST A-repeat region from previous work which combined enzymatic cleavage and chemical structure probing information (Maenner et al. 2010). The equilibrium dissociation constant of RNA binding to SPEN RRM 1-4 for each XIST A-repeat construct was measured and compared to XIST A-full binding (Table 2). Negative controls for these in vitro experiments were the BARD1 RNA and a small, structured RNA transcript, tRNA Cys.

First, we determined whether dimer or trimer A-repeat sequences were sufficient for high-affinity SPEN RRM 1-4 binding. The trimer A-repeat RNA consisted of XIST A-repeat numbers 6-8 (XIST A6-8), while the dimer A-repeat RNA consisted of repeat numbers 7-8 (XIST A7-8). RNA transcripts containing two or three repeat units could not recapitulate the higher binding affinity of the XIST A-full A-repeat region construct, and these RNA transcripts bound SPEN RRM 1-4 with similar affinity to the negative control transcripts (Fig. 5B; Table 2).
We next tested whether the identity of the specific XIST A-repeats included in the RNA transcript affected the RNA-binding affinity for SPEN RRM 1-4. The SPEN binding of three different XIST transcripts containing repeat unit numbers 1-4 (XIST A1-4), repeat numbers 4-6 (XIST A4-6), or repeat numbers 5-8 (XIST A5-8) was compared to the SPEN binding of the XIST A-full RNA transcript containing all of the XIST repeats. We found that the exact identity of the repeat sequences included in each RNA construct did not significantly affect binding to SPEN RRM 1-4 (Fig. 5B). Since these three fragments were of similar length, but each contained portions of repeat number 5, we could not determine with certainty whether the presence of the sequence from repeat number 5 was enabling binding to SPEN RRM 1-4. Therefore, we cloned an RNA construct containing the A-repeat numbers 6-9, plus an additional region of the downstream XIST sequence (XIST A6-9+), to create an RNA of similar length to the previous three constructs. Surprisingly, the XIST A6-9+ RNA bound to SPEN RRM 1-4 with a significantly higher affinity than XIST A-full. Finally, a shorter RNA transcript, containing only the XIST A-repeat numbers 6-9 without the extended downstream sequence (XIST A6-9), showed much lower binding to SPEN RRM 1-4.

SPEN binding changes XIST A-repeat accessibility to chemical probing
We next used chemical structure probing to examine changes in accessibility and reactivity of the XIST A6-9+ RNA transcript when bound or unbound to SPEN RRM 1-4. To this end, we performed SHAPE-MaP structural probing on the construct XIST A6-9+ by treating the RNA with 1-methyl-7-nitroisatoic anhydride (1M7) either in the presence or absence of SPEN RRM 1-4. The RNA and protein concentrations for the SHAPE-MaP assay were selected to ensure complete binding of the RNA, based on the EMSA studies. Single nucleotide reactivity profiles were calculated for RNA only and RNA-protein complexes along the entire XIST A6-9+ transcript (Fig. 6A,B). Next, we calculated the deltaSHAPE-MaP profile, comparing the reactivity of the XIST RNA-only sample at each nucleotide position with the reactivity within the XIST-SPEN complex (Fig. 6C). The deltaSHAPE-MaP plot of flexible RNA bases revealed several changes in nucleotide accessibility upon protein binding. Notably, unpaired adenosines in a double-stranded RNA sequence were reduced in accessibility on both repeat 7 and repeat 8 of the XIST A6-9+ RNA transcript after SPEN RRM 1-4 binding (Fig. 6D). In both repeat 7 and repeat 8 structures, the unpaired nucleotides were the conserved adenosines in the GAUAC sequence in the 5′ region of the A-repeat unit. Repeat 6 also forms a similar structure with an unpaired adenosine, but this sequence was included in the primer region, so we did not obtain nucleotide reactivity profiles in this area. Based on the secondary structure modeling combined with deltaSHAPE-MaP, inter-repeat interactions at the GAUAC sequence were most strongly altered upon protein binding.
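The per-nucleotide comparison described above can be illustrated with a simplified sketch. It only subtracts the complex reactivity from the RNA-only reactivity and applies a fixed cutoff; the published deltaSHAPE approach applies additional smoothing and significance filters that are omitted here. The function name, the 0.3 threshold, the sign convention, and the example reactivities are assumptions for illustration, not values from this study.

```python
def delta_reactivity(rna_only, with_protein, threshold=0.3):
    """Flag positions whose reactivity changes by at least `threshold`.

    Positive delta (RNA-only minus complex) is labeled "protected";
    negative delta is labeled "enhanced". Positions with missing data
    (None) in either profile are skipped. Positions are 1-based.
    """
    if len(rna_only) != len(with_protein):
        raise ValueError("reactivity profiles must have the same length")
    calls = []
    for position, (free, bound) in enumerate(zip(rna_only, with_protein), start=1):
        if free is None or bound is None:
            continue  # e.g. primer-binding regions without reactivity data
        delta = free - bound
        if delta >= threshold:
            calls.append((position, round(delta, 3), "protected"))
        elif delta <= -threshold:
            calls.append((position, round(delta, 3), "enhanced"))
    return calls


# Hypothetical normalized reactivities for a five-nucleotide stretch.
rna_only_profile = [0.1, 0.9, 0.2, None, 0.5]
complex_profile = [0.1, 0.3, 0.7, 0.4, 0.45]
calls = delta_reactivity(rna_only_profile, complex_profile)
print(calls)
```

In this toy input, position 2 loses reactivity in the complex and position 3 gains it, mirroring the decreased and increased flexibility classes reported for the RNA-protein complex.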

Presentation of an unpaired adenosine in a bulge or internal loop correlates with high-affinity SPEN binding
Secondary structure predictions of the minimum free energy conformations for each truncated A-repeat construct were analyzed for the presence of common RNA sequence and structure motifs to identify features correlated with high-affinity SPEN binding in vitro (Fig. 7). We particularly focused on the differences between XIST A6-9 and XIST A6-9+ since the latter RNA showed the highest binding affinity for SPEN RRM 1-4. An unpaired adenosine in the GAUAC sequence of the A-repeat, presented in a bulge or small internal loop within the context of a double-stranded RNA helix, was identified in each repeat in the computationally predicted secondary structure of the extended XIST A6-9+ transcript when compared to XIST A6-9.
Similarly, the other tested XIST RNA transcripts containing an unpaired adenosine in one or more GAUAC, GUAUC, or GAUAUC sequences presented within a bulge or small internal loop (XIST A-full, XIST A1-4, XIST A4-6, and XIST A5-8) were able to bind SPEN RRM 1-4 with similar high affinity in the in vitro binding assays (Fig. 7, gray ovals). A statistically significant decrease in the in vitro binding affinity for SPEN RRM 1-4 was observed for XIST A-repeat transcripts XIST A6-9, XIST A6-8 (trimer), and XIST A7-8 (dimer), which did not present this specific combination of sequence and structural motifs.
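The motif described above can be expressed as a small search over a sequence and its predicted secondary structure in dot-bracket notation. The following is a hypothetical sketch, not the analysis pipeline used in this work: the function names, the `max_unpaired` cutoff of three nucleotides, and the toy hairpins are assumptions. An adenosine is reported when it lies in a GAUAC, GUAUC, or GAUAUC sequence, is unpaired, and sits in a short unpaired run flanked by base-paired nucleotides that do not pair with each other (which excludes hairpin loops but would also accept multibranch junctions).

```python
MOTIFS = ("GAUAUC", "GAUAC", "GUAUC")


def pair_table(structure):
    """Map each position to its pairing partner (None if unpaired)."""
    stack, partner = [], [None] * len(structure)
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket structure")
    return partner


def unpaired_motif_adenosines(sequence, structure, max_unpaired=3):
    """Return 1-based positions of unpaired motif adenosines in a helical context."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    partner = pair_table(structure)
    hits = set()
    for motif in MOTIFS:
        start = sequence.find(motif)
        while start != -1:
            for pos in range(start, start + len(motif)):
                if sequence[pos] != "A" or partner[pos] is not None:
                    continue
                left, right = pos, pos
                while left >= 0 and partner[left] is None:
                    left -= 1
                while right < len(sequence) and partner[right] is None:
                    right += 1
                if left < 0 or right >= len(sequence):
                    continue  # dangling end, not enclosed by base pairs
                run_length = right - left - 1
                if run_length <= max_unpaired and partner[left] != right:
                    hits.add(pos + 1)
            start = sequence.find(motif, start + 1)
    return sorted(hits)


# Toy hairpin with a bulged adenosine inside GAUAC (hypothetical sequence).
print(unpaired_motif_adenosines("GGAUACCUUCGGGUACC", "((.((((....))))))"))
# Toy hairpin with GAUAC entirely in the hairpin loop: no hit expected.
print(unpaired_motif_adenosines("GGGGAUACCCC", "(((.....)))"))
```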

DISCUSSION
In this study, we dissected the regions required for RNA interaction and protein binding in the XIST-SPEN complex through in vitro binding experiments and RNA structure probing. This RNA-protein interaction is necessary for the initiation of X chromosome silencing in humans. We constructed a series of mutants of human SPEN including RRM 1-RRM 4 and quantified the affinity of these proteins for binding to SRA, XIST, or negative control RNA transcripts (Table 1). SPEN RRM 1-4 bound strongly to either SRA or the XIST A-repeat RNA region (XIST A-full) with a similar Kd of 100 nM for each RNA transcript. Binding of SPEN RRM 1-4 to a size-matched negative control transcript from an unrelated messenger RNA (BARD1) or tRNA Cys showed a significantly higher Kd of approximately 250-300 nM. A competitive EMSA of SPEN RRM 1-4 binding to SRA and XIST showed that SRA was able to outcompete XIST A-full binding to SPEN. The negative control BARD1 RNA could not compete with XIST A-full for SPEN binding.
Since SRA and XIST RNA are both functional binding partners of SPEN in human cells, we investigated the determinants of these RNA-protein interactions by mutation or deletion of SPEN RRMs predicted to be required for RNA binding. SPEN contains four RRM domains. Previous work focused on the functions of SPEN RRM 2, RRM 3, and RRM 4 in SRA binding. We found that inclusion of RRM 1 increased the binding affinity of SPEN for both SRA and the negative control RNA BARD1. However, inclusion of the SPEN RRM 1 domain did not increase the binding affinity of the protein for the XIST A-repeat region RNA. We concluded that the RRM 1 domain is not required for high-affinity XIST A-repeat region binding in vitro but increases SPEN binding to the other RNA transcripts tested. Although XIST and SRA bind SPEN RRM 1-4 in a mutually exclusive manner, the molecular details of the interactions between SPEN and these two RNA transcripts are not identical. Next, we investigated whether the same RRM domains of SPEN were required for both SRA and XIST binding. Previously, SPEN RRM 3 and RRM 4 were both shown to be required for specific binding to human SRA RNA. Binding depends on both single- and double-stranded regions of SRA (Hatchell et al. 2006; Arieti et al. 2014). Mutation of SPEN RRM 3 or RRM 4 domains had different effects on binding to the XIST or SRA RNA transcripts. In contrast to SRA binding, we observed that the canonical SPEN RRM 3 RNA-binding residues were not required to maintain high-affinity interactions with the XIST A-repeat region RNA in vitro. These results were consistent with the previous finding that RRM 3 mutation results in an approximately twofold decrease in binding to XIST A-repeat by fluorescence anisotropy (Carter et al. 2020). The SPEN RRM 2-3 construct was unable to bind XIST or SRA with high affinity, so we concluded that a functional RRM 4 was required for binding to XIST A-repeat region RNA. The combination of RRM 2, mutated RRM 3, and RRM 4 did retain some additional XIST binding compared to the SPEN RRM 2-4 construct, so it is possible that the combination of RRM 2 and RRM 4 or the linkers between these regions may play a role in binding XIST. Additional experiments with specific mutation or deletion of residues in RRM 2 would be required to assess the contribution of RRM 2 to SPEN-XIST binding. Loss of RRM 4 also abolished binding to all other RNA transcripts tested. In the SPEN RRM 2-4 crystal structure, a C-terminal helix occludes the β-sheet face of RRM 4 (Arieti et al. 2014). Atypical RRM domains with structures similar to SPEN RRM 4 have also been identified in other RBPs including Prp24 (Montemayor et al. 2014) and U1A (de Vries et al. 2022).
We also investigated the minimal number of XIST A-repeats required for high-affinity binding between SPEN and XIST in vitro. We found that at least four full or partial XIST RNA A-repeat sequences were required for high-affinity RNA-protein interactions with SPEN RRM 1-4. Human XIST contains eight and a half A-repeat sequences, numbered here as repeats 1-9 for simplicity, beginning from the 5′ end of the RNA. In human XIST, nucleotide sequence variation occurs across individual repeats to give each repeat a distinct sequence identity (Supplemental Fig. S1). Four different XIST A-repeat constructs of similar length were created that each contained a set of four or more repeats: XIST A1-4, XIST A4-6, XIST A5-8, and XIST A6-9+. The linear sequence length of the XIST A-repeat transcripts did not directly correlate with the in vitro binding affinity for SPEN. Each of these transcripts bound SPEN RRM 1-4 with similar or higher affinity than the XIST A-full RNA containing the full complement of human A-repeats.
FIGURE 6. [...] fragment containing RNA alone or (B) XIST A6-9+ RNA bound to SPEN RRM 1-4. (C) deltaSHAPE-MaP analysis of XIST RNA only compared to XIST-SPEN complex calculated from the two data sets above. Nucleotide positions colored in green showed decreased flexibility in the RNA-protein complex, while nucleotide positions colored in purple showed increased flexibility in the RNA-protein complex, compared to XIST A6-9+ RNA alone. (D) deltaSHAPE-MaP changes in nucleotide flexibility after SPEN binding were mapped onto the predicted secondary structure of XIST A6-9+ RNA alone. Nucleotide positions colored in green showed decreased flexibility in the RNA-protein complex, while nucleotide positions colored in purple showed increased flexibility in the RNA-protein complex, compared to XIST A6-9+ RNA alone. Unpaired adenosines in the GAUAC sequence of XIST A-repeat numbers 7 and 8 are indicated with brackets.
Surprisingly, we found that the RNA transcript with the highest binding affinity for SPEN RRM 1-4 was the XIST A6-9+ construct composed of a partial repeat 6, full repeats 7 and 8, and the partial repeat 9 plus additional RNA sequence downstream from the A-repeat region. Truncation of the additional downstream sequence of XIST A6-9+ to create a transcript consisting of only the terminal repeats (XIST A6-9) resulted in loss of binding to SPEN RRM 1-4. This transcript had a binding affinity similar to the nonspecific negative control RNA transcripts BARD1 and tRNA Cys. However, this specific downstream sequence does not appear to be required for high-affinity binding. The XIST A5-8 construct containing four complete A-repeats was also able to recover high-affinity binding to SPEN. Four-repeat structures of XIST RNA have been predicted to have low free energies and can contain functional inter-repeat interactions, based on previous work (Maenner et al. 2010).
FIGURE 7. Presentation of an unpaired adenosine in a bulge or internal loop correlates with high-affinity SPEN binding. Minimum free energy secondary structure predictions for XIST A-repeat transcripts. RNA secondary structures are labeled with specific nucleotide sequences (left column) or colored according to the repeat number for XIST A-repeats 1-9 (right column). XIST A-repeat 1 is colored in pink, repeat 2 is orange, repeat 3 is yellow, repeat 4 is olive green, repeat 5 is dark green, repeat 6 is light blue, repeat 7 is dark blue, repeat 8 is violet, and repeat 9 is purple. Gray ovals indicate the location of unpaired adenosines in GUAUC, GAUAC, or GAUAUC sequences within a dsRNA context for each XIST A-repeat region transcript. The specific sequence (or sequences) containing an unpaired adenosine within an internal loop is color-coded by repeat number, and listed to the right of each secondary structure.
To identify potential protein interaction regions, we created a secondary structure model of XIST A6-9+ based on SHAPE-MaP structure probing in the presence or absence of SPEN RRM 1-4. The per-nucleotide reactivity and flexibility of XIST A6-9+ RNA were calculated both with and without binding to the SPEN RRM 1-4 protein. Notably, unpaired adenosines in a double-stranded helical context in the GAUAC sequences on both repeat 7 and repeat 8 in our XIST A6-9+ model had reduced flexibility upon SPEN binding. We compared the sequence and structural features of the computationally predicted minimum free energy RNA secondary structures of each A-repeat construct. We particularly focused on differences between the XIST A6-9 and XIST A6-9+ RNA transcripts, since these two constructs contained similar sequence information but showed highly divergent in vitro SPEN binding affinities. The comparative analysis of computationally predicted RNA secondary structures based on sequence information revealed that the presence of a single adenosine in an unpaired bulge or small internal loop in a GAUAC, GUAUC, or GAUAUC sequence was correlated with high SPEN binding affinity of the RNA transcript in vitro (Fig. 7). A major difference between XIST A6-9+ and XIST A6-9 is the ability to present this specific combination of sequence and structural motifs. The additional downstream nonrepeat sequences included in the strongly binding XIST A6-9+ RNA can interact through G-U wobble pairs with the 5′ portion of repeat 8. These interactions create a "pseudo inter-repeat" structure to enable the presentation of an unpaired adenosine in the GAUAC sequence. In our SHAPE studies of XIST A6-9+, the GAUAC sequences in both repeat 7 and repeat 8 were significantly reduced in flexibility when bound by SPEN RRM 1-4. A significantly lower SPEN binding affinity was observed for XIST A6-9 RNA, where this combination of sequence and structural motifs is not present.
The binding data combined with the computational secondary structure predictions suggest that high-affinity XIST-SPEN interactions may be facilitated by the presentation of an unpaired adenosine in a GAUAC or similar sequence, within a bulge or small internal loop in a double-stranded RNA context. A minimum of four full or partial A-repeat sequences was required for high-affinity SPEN binding in vitro, and a significant loss in SPEN binding for XIST RNA was observed in transcripts which did not have this combination of sequence and structural motifs in the computationally predicted secondary structures based on sequence information. A dimer of A-repeats (XIST A7-8) or trimer of A-repeats (XIST A6-8) showed significantly lower binding affinity for SPEN RRM 1-4 (Kd ∼300 nM), similar to the negative controls. RNA transcripts with higher numbers of A-repeats may be able to access multiple inter-repeat conformations that would not be possible to create with a dimer repeat construct, trimer repeat construct, or the shorter XIST A6-9 construct. In several constructs, repeat 4 was predicted to form noncanonical interactions with portions of the adjacent U-rich linkers. Although the U-rich linkers are overall less highly conserved than the repeat regions, they may still contribute to the XIST A-repeat region function (Liu et al. 2017). The negative control RNA transcripts tRNA Cys and BARD1 do not contain unpaired adenosines in a GAUAC or similar sequence in an extended stem-loop structure in the computationally predicted secondary structures. This pattern, along with our SHAPE data, suggests that SPEN binding to XIST A-repeat RNA could be facilitated by a combination of minimal sequence and structural components that can be formed by the conserved A-repeat units as well as regions of the linkers and downstream sequences.
We compared our results with other models of XIST A-repeat RNA structure and SPEN binding and observed several similarities with the previous results. Previous structural studies indicated that the conserved D1 domain of XIST RepA, which contains the A-repeat region, folds as an independent unit (Liu et al. 2017). A pairwise alignment of RNA secondary structure with BEAGLE (Mattei et al. 2015) revealed a significant correlation between the secondary structure of our XIST A6-9+ model and the same RNA region from a model of the entire XIST RNA based on a prior in-cell SHAPE-seq data set (Sun et al. 2019; Carter et al. 2020). The RNA models showed around 30% structural similarity, with an alignment P-value of 0.003, indicating that secondary structures formed by XIST A6-9+ in vitro are similar to the structures formed by the same region in the full-length XIST RNA in vivo.
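To give a concrete sense of what comparing two secondary structure models involves, the sketch below computes a simple base-pair overlap (Jaccard index) between two dot-bracket structures of equal length. This is not the BEAGLE algorithm, which aligns structures using its own structural encoding and scoring; it is a much simpler stand-in metric, and the function names and toy structures are hypothetical.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced dot-bracket structure")
    return pairs


def base_pair_jaccard(structure_a, structure_b):
    """Shared base pairs divided by all base pairs found in either model."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must cover the same nucleotides")
    pairs_a, pairs_b = base_pairs(structure_a), base_pairs(structure_b)
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # two fully unpaired models are trivially identical
    return len(pairs_a & pairs_b) / len(union)


# Toy example: the second model keeps only the outer base pair of the first.
print(base_pair_jaccard("((..))", "(....)"))
```

A position-by-position overlap like this requires both models to describe exactly the same nucleotides, whereas an alignment-based tool can tolerate shifted or partially matching helices.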
Each XIST A-repeat contains two regions of high sequence conservation: a 5′ region with a CAUCG sequence and a 3′ region with a GAUAC sequence (Supplemental Fig. S1). Each repeat 1-9 also has an individual sequence identity based on nucleotide variation outside of these two conserved regions (Supplemental Fig. S1). The ability of A-repeat sequences to form thermodynamically stable AUCG tetraloops in the 5′ region and a propensity for the 3′ region containing the GAUAC sequence to create inter-repeat interactions was previously characterized by NMR (Duszczyk et al. 2011). Focusing on SPEN binding, only a fraction of XIST transcripts in the cell may be bound during in vivo structure probing experiments and a large number of A-repeat secondary structure conformations and combinations may exist simultaneously in cells. Our data on changes in the in vitro structure of the XIST RNA upon SPEN binding, and direct comparison of changes among specific mutants of XIST A-repeat RNA, have therefore enabled additional insights into the sequences and structures required for this specific RNA-protein binding interaction that are complementary to the observations obtained from in-cell experiments. The deltaSHAPE-MaP data indicate that the largest changes in nucleotide accessibility upon SPEN binding to XIST occurred in regions with a bulged adenosine in the conserved GAUAC sequence, when this sequence was presented within a double-stranded context in an inter-repeat interaction. In some models of XIST A-repeat structure, GAUAC sequences are unpaired and form part of a large loop along with an adjacent U-rich linker sequence, or alternately, form a short stem-loop (Maenner et al. 2010; Fang et al. 2015; Liu et al. 2017; Carter et al. 2020). In CLIP experiments, the GAUAC sequence in the XIST A-repeat also showed the highest crosslinking signal after capture of the endogenous SPEN protein, indicating that this RNA region is likely to be important for SPEN binding in cells (Carter et al. 2020). Our results are consistent with the hypothesis that a specific type of inter-repeat interaction by XIST repeats is required to present the appropriate combination of sequence and structural information for SPEN binding. NMR studies of XIST A-repeat dimers showed a potential for inter-repeat interactions to create an unpaired adenosine structure in the GAUAC sequence (Duszczyk et al. 2011). In our in vitro studies, XIST A-repeat structures predicted by minimum free energy computational secondary structure modeling to contain this specific combination of structure and sequence motifs also showed high-affinity SPEN binding. Inter-repeat interactions have been found in living cells, and complex inter-repeat interactions in the XIST RNA structure may create multiple simultaneously coexisting RNA structures in vivo (Lu et al. 2016). In addition, multiple tertiary structures have been described for the RepA transcript which contains the A-repeat region (Aguilar et al. 2022). A novel insight from our in vitro studies was that the identity of the XIST repeats included in each construct did not significantly affect SPEN binding interactions in vitro. Given that the specific identity of repeats included in each XIST construct did not impact the in vitro RNA-binding affinity for SPEN RRM 1-4, we hypothesized that multiple combinations of inter-repeat interactions may be able to present an unpaired adenosine within an internal loop of the GAUAC or similar sequence to facilitate high-affinity SPEN binding.
These findings are consistent with previous studies of the in vivo functions of truncated or synthetic XIST A-repeat constructs in mouse and human cells, while providing novel insight on the specific sequence and structural information that may facilitate high-affinity SPEN binding. Constructs containing fewer than eight and a half A-repeats can still achieve silencing of a nearby transgene by XIST in human cells, though with much lower efficacy as repeat numbers decrease (Minks et al. 2013). Similarly, four synthetic consensus "XCR" repeats could still accomplish a low level of silencing on the X chromosome (Wutz et al. 2002). XIST may have co-opted the existing nucleic acid binding ability of SPEN to achieve developmentally regulated gene silencing (Elisaphenko et al. 2008; Carter et al. 2020), since SPEN proteins can interact with multiple functional RNA transcripts. These proteins can have multiple binding sites on autosomes in addition to interaction sites across the X chromosome (Dossin et al. 2020). Since XIST can effectively compete with SRA and other RNA transcripts for SPEN binding, unique binding modes for each RNA may enable precisely timed XIST RNA expression in development to enable recruitment of SPEN to the X chromosome during silencing initiation. Since our binding experiments were performed in vitro with unmodified RNA transcripts, it is not possible to evaluate the contribution of RNA modifications on XIST to SPEN binding. Further research will be needed to validate the physiological relevance of these findings and verify whether the presentation of an unpaired adenosine in the context of a double-stranded inter-repeat interaction can contribute to the binding of XIST and SPEN during XCI in human cells.
Evolutionary modulation of RNA and RBP interactions can occur through the incorporation of multiple protein domains (Lunde et al. 2007). In the case of SPEN, the presence of multiple RRM domains with different nucleotide binding specificities may allow the protein to achieve transient binding in the nuclear environment, while still preserving the ability to interact with specific RNA transcripts including SRA and XIST during development. Similarly, the inclusion of multiple XIST A-repeat sequences may enable flexibility and protection against insertions, deletions, or mutations, since only four repeats are required to form the correct presentation of sequence and structural motifs to facilitate high-affinity SPEN binding. In conclusion, this work provides insight into the complex interactions between XIST and SPEN, highlighting the contributions of different regions of the protein and RNA to forming a biologically critical interaction driving gene silencing in early female development.

Cloning and purification of SPEN RRM 1-4 and mutant proteins
The nucleotide sequence encoding human SPEN RRM 1-4 was cloned from HEK293 complementary DNA (cDNA) prepared from purified total RNA by reverse transcription. A 6×histidine tag, maltose binding protein, and a TEV cleavage site were added to the N-terminus of the protein. Point mutations were created using gene synthesis. The SPEN RRM 3 mutant constructs contained the following mutations: F282A, K311A, Y319A, F321A, and K353A. SPEN protein purifications were performed essentially following the method of Arieti et al. (2014), with some modifications as listed below. All proteins were produced with the same method. First, Escherichia coli Rosetta 2 cells transformed with the construct of interest were grown in Terrific Broth medium at 37°C until ∼0.4 OD600, followed by overnight induction at 16°C with 1 mM IPTG and 0.1% arabinose. E. coli cells were harvested by centrifugation at 1000g for 60 min, washed with 1× PBS, and then resuspended in 50 mM HEPES buffer (pH 7.5) containing 300 mM NaCl, 20 mM imidazole, 0.1% Triton-X100, 1 μg/mL DNase I, 1 μg/mL lysozyme, 5 mM β-mercaptoethanol, and 1× Roche cOmplete protease inhibitor (EDTA-free). After a 30-min incubation on ice, cells were sonicated using a Branson microtip, and the lysate was cleared by centrifugation at 10,000g for 20 min. The clarified lysate was then applied to a 10 mL Ni Sepharose HP column (17-5268-02, GE). The protein was eluted with 250 mM imidazole and dialyzed against 50 mM HEPES (pH 7.5), 300 mM NaCl, 20 mM imidazole, and 5 mM β-mercaptoethanol overnight. TEV protease was added, and the sample was incubated at 30°C for 2 h. This sample was then cleared by centrifugation at 10,000g for 20 min, and the supernatant was applied to charged Ni-NTA resin again. The unbound material from this column was collected, diluted twofold with heparin binding buffer (50 mM HEPES [pH 7], 0.1% Triton-X, 5 mM β-mercaptoethanol), and then applied to a 2 mL heparin column (Hi-Trap Heparin HP, GE). Elution was performed with a linear salt gradient between 0.05 and 2 M NaCl.
The protein was further purified by gel filtration chromatography with a HiLoad 16/600 Superdex 200 pg column (28-9893-35, GE Healthcare) in 50 mM HEPES (pH 7.5) containing 300 mM NaCl, 5 mM β-mercaptoethanol, and 0.1% Triton-X. The protein eluted from the gel filtration column as a monomer and was concentrated with a 10 kDa MWCO Amicon spin filter to ∼10 mg/mL. All protein samples were more than 95% pure as judged by Coomassie-stained SDS-PAGE. Sequences of the protein constructs used in this study are provided in Table 3. Purified bovine serum albumin (BSA) protein, used as a negative control, was purchased from Sigma-Aldrich.

Cloning and production of XIST A-repeat RNA transcripts
XIST A-repeat RNA was cloned from human cDNA prepared from purified total RNA by reverse transcription. Primers were designed to target specific repeats or partial repeats and provide amplification specificity in the repeat region. All clones were validated by Sanger sequencing through Azenta. Plasmid templates were linearized by restriction digestion, and RNA transcripts of interest were generated by in vitro runoff transcription using an NEB HiScribe T7 High Yield RNA synthesis kit (E2040S), with the addition of fluorophore-labeled UTPs: cyanine 3-uridine-5′-triphosphate (enhanced) from Enzo Life Sciences, Inc. (catalog #ENZ-42505), fluorescein-12-uridine-5′-triphosphate from Enzo Life Sciences, Inc. (catalog #ENZ-42834), or cyanine 5-UTP from Apexbio Technology LLC (purchased through Fisher Scientific, catalog #50-199-8343). In vitro transcribed RNA was purified with the Zymo Clean & Concentrator-5 Kit. RNA transcript integrity and complexity were evaluated by gel electrophoresis or Agilent TapeStation analysis. Sequences of RNA constructs used in this study are provided in Table 3.

Electrophoretic mobility shift assays
RNA transcripts with unique fluorophores were diluted to 2 nM each in UltraPure water and were denatured at 85°C for 5 min followed by rapid cooling on ice. Purified SPEN proteins were incubated with labeled RNA for 30 min at room temperature in buffer conditions containing 20 mM Tris-HCl (pH 7.5), 150 mM KCl, 5 mM MgCl2, and 10% glycerol and then loaded onto a 4.5% TBE polyacrylamide gel for separation of monomers and complexes. Multiplexed EMSA gels were imaged on a Typhoon FLA 9500 with a pixel size of 100 µm for each fluorophore. The laser/filter combinations were 473 nm/BPB1 for fluorescein, 532 nm/BPG1 for Cy3, and 635 nm/LPR for Cy5. Between three and six replicate experiments were performed for each RNA-protein combination, and the intensity of bands on each gel image was quantified using Image Studio Lite with background correction. The equilibrium dissociation constant (Kd) for each reaction was calculated by curve fitting with custom R code. The fit was initialized with start = c(K_d = 10, n = 1) and run with algorithm = "port", lower = 0.001, and upper = 10000, and the fitted model was reported with summary(logistic_fit).
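The curve-fitting step can be illustrated with a minimal R sketch. It assumes a Hill-type binding model, fraction bound = [P]^n / (K_d^n + [P]^n), which is suggested by the K_d and n starting values but is an assumption here rather than the published equation. The `hill` helper, the `emsa` data frame, and its titration values are hypothetical; only the `start`, `algorithm`, `lower`, and `upper` arguments and the `logistic_fit` name are taken from the text.

```r
# Hill-type binding isotherm (ASSUMED model form, not the published equation)
hill <- function(protein, K_d, n) protein^n / (K_d^n + protein^n)

# HYPOTHETICAL titration: protein concentration (nM) and fraction of RNA shifted
emsa <- data.frame(
  protein        = c(1, 3, 10, 30, 100, 300, 1000),
  fraction_bound = c(0.00, 0.02, 0.09, 0.31, 0.75, 0.93, 0.99)
)

# Bounded nonlinear least squares; start, algorithm, lower, and upper follow the text
logistic_fit <- nls(
  fraction_bound ~ hill(protein, K_d, n),
  data      = emsa,
  start     = c(K_d = 10, n = 1),
  algorithm = "port",
  lower     = 0.001,
  upper     = 10000
)

summary(logistic_fit)
```

With real data, `fraction_bound` would presumably be derived from the background-corrected band intensities of each lane, and one fit would be run per replicate or per pooled titration.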
Competitive EMSAs were performed as described in Carter et al. (2020). Briefly, Cy3-labeled XIST A-full and SPEN RRM 1-4 concentrations were kept at 1 µM and 10 µM, respectively. Increasing concentrations of unlabeled competitor RNA were added. Competition reactions were done in buffer conditions containing 20 mM Tris-HCl (pH 7.5), 150 mM KCl, 5 mM MgCl2, and 10% glycerol, incubated at room temperature for 30 min, and run on a 4.5% TBE polyacrylamide gel. The gels were imaged on a Typhoon FLA 9500 at 100 µm resolution using the laser/filter combination of 532 nm/BPG1 for Cy3.

SHAPE-MaP reactions, high-throughput sequencing, and data analysis
The SHAPE-MaP protocol was adapted from previously published protocols (Smola and Weeks 2018). We followed the small RNA workflow with the addition of NEBNext mRNA second strand synthesis following cDNA synthesis. SHAPE-MaP of XIST A6-9+ RNA was performed either in the presence or absence of 1.7 µM SPEN RRM 1-4. After sequencing with MiSeq Micro at the UCSD Institute for Genomic Medicine, the SHAPE data were analyzed with ShapeMapper 2 (Busan and Weeks 2018). Changes in the nucleotide reactivity of XIST A6-9+ upon the binding of SPEN RRM 1-4 were calculated with deltaSHAPE_v1.0 (Smola and Weeks 2018).
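As a simplified illustration of this comparison, and not the deltaSHAPE implementation itself, per-nucleotide reactivity differences between the protein-free and protein-bound samples can be computed and smoothed as sketched below. The function name, the example reactivities, the 3-nt smoothing window, the sign convention, and the 0.5 threshold are all hypothetical choices made for this sketch.

```r
# Difference in normalized SHAPE reactivity, smoothed with a centered moving average.
# Sign convention (ASSUMED): positive values mean lower reactivity with protein bound.
delta_reactivity <- function(rna_only, with_protein, window = 3) {
  stopifnot(length(rna_only) == length(with_protein))
  delta <- rna_only - with_protein
  # stats::filter returns NA at the ends where the window is incomplete
  as.numeric(stats::filter(delta, rep(1 / window, window), sides = 2))
}

# HYPOTHETICAL reactivities for an 8-nt stretch
rna_only     <- c(0.10, 0.20, 1.20, 1.50, 1.30, 0.20, 0.10, 0.15)
with_protein <- c(0.10, 0.25, 0.40, 0.50, 0.45, 0.20, 0.12, 0.10)

smoothed  <- delta_reactivity(rna_only, with_protein)
# Positions whose smoothed change exceeds an arbitrary illustrative cutoff
protected <- which(smoothed >= 0.5)
print(protected)
```

A real analysis would also need to account for measurement error at each nucleotide before calling a change significant, which this sketch omits.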

Sequence alignments and prediction of RNA secondary structures
Minimum free energy predictions of RNA secondary structures were calculated using the Vienna RNA Package program RNAfold and visualized with forna (Gruber et al. 2008) or VARNA (Darty et al. 2009), using parameters as previously described (Mathews et al. 2004). For XIST A6-9+, the per-nucleotide SHAPE reactivity data were used in addition to the RNA primary sequence; for all other constructs, only the primary RNA sequence was used to predict the secondary structure. Multiple sequence alignments were performed with MAFFT (Katoh et al. 2002) and visualized with MView (Madeira et al. 2022).
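Once a predicted structure is available in dot-bracket notation, the motif discussed in this study (an unpaired adenosine within a GAUAC sequence in a double-stranded context) can be located with a short scan. The sketch below is a heuristic written for illustration only: the function name and the toy hairpin are hypothetical, and "double-stranded context" is approximated as an unpaired A whose two immediate neighbors are both paired.

```r
# Return 1-based positions of adenosines inside GAUAC that are unpaired
# while both neighboring nucleotides are paired (simple heuristic).
find_bulged_A <- function(sequence, structure, motif = "GAUAC") {
  stopifnot(nchar(sequence) == nchar(structure))
  db <- strsplit(structure, "")[[1]]
  starts <- as.integer(gregexpr(motif, sequence, fixed = TRUE)[[1]])
  starts <- starts[starts > 0]          # gregexpr reports -1 when there is no match
  hits <- integer(0)
  for (s in starts) {
    for (offset in c(1L, 3L)) {         # the two adenosines of GAUAC
      i <- s + offset
      unpaired <- db[i] == "."
      flanked  <- i > 1 && i < length(db) && db[i - 1] != "." && db[i + 1] != "."
      if (unpaired && flanked) hits <- c(hits, i)
    }
  }
  hits
}

# HYPOTHETICAL toy hairpin: GAUAC at positions 2-6, with A5 bulged out of the stem
toy_seq <- "GGAUACGAAAGAUCC"
toy_db  <- "((((.(....)))))"
bulged  <- find_bulged_A(toy_seq, toy_db)
print(bulged)
```

The same function could be applied to the dot-bracket output for each construct to compare which constructs are predicted to present the motif.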

SUPPLEMENTAL MATERIAL
Supplemental material is available for this article.

TABLE 1. Kd of XIST A-full, SRA, and BARD1 binding to SPEN protein constructs

TABLE 2. Kd of XIST constructs and control RNA binding to SPEN RRM 1-4

TABLE 3. Sequences of protein and RNA constructs used in this study