Splicing regulation: From a parts list of regulatory elements to an integrated splicing code
Abstract
Alternative splicing of pre-mRNAs is a major contributor to both proteomic diversity and control of gene expression levels. Splicing is tightly regulated in different tissues and developmental stages, and its disruption can lead to a wide range of human diseases. An important long-term goal in the splicing field is to determine a set of rules or “code” for splicing that will enable prediction of the splicing pattern of any primary transcript from its sequence. Outside of the core splice site motifs, the bulk of the information required for splicing is thought to be contained in exonic and intronic cis-regulatory elements that function by recruitment of sequence-specific RNA-binding protein factors that either activate or repress the use of adjacent splice sites. Here, we summarize the current state of knowledge of splicing cis-regulatory elements and their context-dependent effects on splicing, emphasizing recent global/genome-wide studies and open questions.
PRE-MRNA SPLICING
Because human genes typically contain multiple introns, the process of pre-mRNA splicing is an essential step in the expression of most genes. A majority of human genes undergo alternative splicing (AS), generating multiple splicing isoforms containing different combinations of exons (Johnson et al. 2003). The effects of AS on protein products can be dramatic, e.g., producing soluble versus membrane-bound forms of the Fas receptor that have opposing effects on apoptosis (Cascino et al. 1995), or producing isoforms of the Drosophila fruitless protein that act to specify sexual orientation (Demir and Dickson 2005). The major forms of AS are summarized in Figure 1A. Splicing regulation has been comprehensively described in several recent reviews (Black 2003; Konarska and Query 2005; Matlin et al. 2005; Blencowe 2006; House and Lynch 2008). Here, we briefly summarize some aspects of splicing specificity and regulation before turning to our main topic of splicing regulatory elements and the rules governing their activity.
Figure 1. (A) Major forms of alternative splicing. In many cases, these common forms can be combined to generate more complicated alternative splicing events. (B) A schematic of regulated splicing. (Open boxes) Exons, (jagged lines) introns, (brackets) splice sites (ss). The consensus motifs of ss are shown in pictogram, and the branch point adenosine is indicated. (Dashed lines) Two alternative splicing pathways, with the middle exon either included or excluded. Splicing is regulated by cis-elements (ESE, ESS, ISS, and ISE) and trans-acting splicing factors (SR proteins, hnRNP, and unknown factors).
The sequential phosphodiester transfer reactions involved in splicing are catalyzed by large ribonucleoprotein complexes known as spliceosomes. Containing more than 100 core proteins and five small nuclear RNAs (snRNAs U1, U2, U4, U5, and U6), spliceosomes may be the most complex machines in the cell (Zhou et al. 2002; Jurica and Moore 2003; Nilsen 2003). In addition to these core factors, additional regulatory proteins participate in the splicing of particular pre-mRNAs. Splicing of most introns is thought to occur cotranscriptionally with fairly extensive interactions between splicing factors and the core transcription machinery (Das et al. 2006; Hicks et al. 2006). Splicing mechanisms have also been well reviewed recently (Hertel and Graveley 2005; Konarska and Query 2005).
CORE SPLICING SIGNALS
Three sites, the 5′ splice site (5′ss), the 3′ splice site (3′ss), and the branch point sequence (BPS), participate in the splicing reaction and are present in every intron, and thus are known as the core splicing signals. These signals are recognized multiple times during spliceosome assembly, with the 5′ss interacting initially with U1 and later with U6 snRNP, and the BPS interacting with SF1/mBBP and later with U2 snRNP (Fig. 1B; Will and Lührmann 2007). Because the 5′ss and 3′ss can be mapped precisely by aligning cDNA or expressed sequence tag (EST) sequences to the genome, large datasets of accurately annotated exon/intron structures can be obtained, permitting development of statistical models of 5′ss and 3′ss motifs that capture second-order statistical interactions between positions to more accurately predict splice site locations (Yeo and Burge 2004). However, only a few dozen mammalian BPSs have been mapped. The limited size of the available BPS dataset and the low information content of this motif make it difficult to derive a reliable sequence model to predict BPSs in introns. A recent study used comparative genomics to improve BPS prediction (Kol et al. 2005), and more human BPS sequences were identified recently by sequencing lariat RT-PCR products (Gao et al. 2008).
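The idea of scoring candidate splice sites against a statistical motif model can be illustrated with a minimal sketch. The example below uses a simple position weight matrix (log-odds against a uniform background), which treats positions independently and is therefore simpler than the models cited above that capture dependencies between positions. The frequency table `TOY_5SS_FREQS`, the 4-nt window, and the function names are hypothetical and chosen only for illustration; they are not taken from any published model.

```python
import math

# Hypothetical base frequencies for a 4-nt window spanning a 5'ss:
# last exonic base, then the first three intronic bases. Toy values only.
TOY_5SS_FREQS = [
    {"A": 0.10, "C": 0.05, "G": 0.80, "U": 0.05},
    {"A": 0.01, "C": 0.01, "G": 0.97, "U": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "U": 0.97},
    {"A": 0.60, "C": 0.05, "G": 0.30, "U": 0.05},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def pwm_score(site, freqs=TOY_5SS_FREQS, background=BACKGROUND):
    """Log-odds score (bits) of one candidate site under the toy matrix."""
    if len(site) != len(freqs):
        raise ValueError("site length must match the matrix width")
    return sum(math.log2(col[base] / background)
               for col, base in zip(freqs, site))


def best_site(seq, freqs=TOY_5SS_FREQS):
    """Return (score, start) of the highest-scoring window in an RNA sequence."""
    width = len(freqs)
    scored = [(pwm_score(seq[i:i + width], freqs), i)
              for i in range(len(seq) - width + 1)]
    return max(scored)


if __name__ == "__main__":
    score, start = best_site("CCAGGUAAGU")
    print(f"best window starts at {start} with score {score:.2f} bits")
```

Sliding such a matrix along an intron would also assign high scores to the many "decoy" sites present in long transcripts, which is one reason motif scores alone are not sufficient to define exons.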
EXON DEFINITION AS A KEY STEP IN MAMMALIAN SPLICING REGULATION
A typical human gene contains relatively short exons (typically, 50–250 base pairs [bp] in length) separated by much larger introns (typically, hundreds to thousands of base pairs or more) that on average account for >90% of the primary transcript. This transcript geometry, and the predominant exon-skipping phenotype of splice site mutations, are consistent with the idea that in mammals splice sites are predominantly recognized in pairs across the exon through “exon definition” (Robberson et al. 1990; Nakai and Sakamoto 1994; Sterner et al. 1996). Exon definition involves initial interaction across the exon between factors recognizing the 5′ss and the upstream 3′ss, whereas in the alternative model, intron definition, interactions occur first across the intron between factors recognizing the 5′ss and the downstream 3′ss (for review, see Berget 1995). Recent analyses of the coevolution of the 5′ss and 3′ss have detected predominant cross-exon interactions in human and mouse, but cross-intron interactions in invertebrates, plants, and fungi, with pufferfishes representing an intermediate state, supporting the primacy of exon definition in mammals and intron definition in most other eukaryotes (Xiao et al. 2007). Because exon definition occurs early during splicing and may lead to commitment of the exon to splicing, this step is critical for splicing regulation and specificity. For example, polypyrimidine tract binding protein (PTB/hnRNP I) can inhibit exon definition complex formation by binding to an ESS sequence, causing skipping of Fas exon 6 (Izquierdo et al. 2005); this same factor can also inhibit spliceosome assembly across introns (intron definition) in repressing splicing of the c-src N1 exon (Sharma et al. 2005).
Following initial splice site recognition in exon definition, a series of sequential structural rearrangements is required to activate the spliceosome, and commitment to alternative splice site pairing may occur after initial splice site recognition and E complex formation (Lim and Hertel 2004).
SPLICING REGULATORY ELEMENTS AND ASSOCIATED FACTORS
Monte Carlo simulations inserting artificial motifs with varying information content into transcripts in place of natural splice sites have been used to estimate that the core human splice site motifs contain only about half of the information required to accurately define exon/intron boundaries, even considering only short introns (Lim and Burge 2001). Large human introns typically contain numerous “decoy” splice sites: sequences with similar score/degree of consensus matching as authentic splice sites. Not infrequently, decoy splice sites occur in pairs as “pseudoexons,” which resemble authentic exons in terms of length and splice site strength, but are very rarely if ever spliced (Sun and Chasin 2000). Despite the large potential for errors, the splicing process appears to occur with very high fidelity, implying the widespread involvement of transcript features besides the core splice signals in splice site selection. This additional information is thought to derive in large part from the presence of numerous cis-regulatory elements that serve as either splicing enhancers or silencers. These elements are conventionally classified as exonic splicing enhancers (ESEs) or silencers (ESSs) if from an exonic location they function to promote or inhibit inclusion of the exon they reside in, and as intronic splicing enhancers (ISEs) or silencers (ISSs) if they enhance or inhibit usage of adjacent splice sites or exons from an intronic location. In general, these splicing regulatory elements (SREs) function by recruiting trans-acting splicing factors that activate or suppress splice site recognition or spliceosome assembly by various mechanisms (Matlin et al. 2005; Chasin 2007).
SREs have most often been identified by directed mutagenesis of alternatively spliced genes and by analysis of naturally occurring mutations that perturb splicing of disease genes. ESEs are very abundant in constitutive exons, but there is ample evidence that silencers play at least as important a role in splicing regulation as enhancers (Fairbrother et al. 2002; Wang et al. 2004; Zhang and Chasin 2004). Based on the selective constraints on constitutive exons for efficient and accurate splicing and on AS exons for regulated splicing, enhancing elements are expected to play predominant roles in constitutive splicing, with silencers being relatively more prominent in control of AS.
EXONIC SPLICING ENHANCERS AND SILENCERS
It is now well established that ESEs include a diverse range of sequences, and many if not all exons contain internal ESE sequences (Schaal and Maniatis 1999; Fairbrother et al. 2002; Cartegni et al. 2003). Most ESEs function by recruiting members of the SR protein family (for review, see Graveley 2000). These factors usually regulate splicing by binding ESEs through their N-terminal RRM domains and mediating protein–protein interactions that facilitate spliceosome assembly through C-terminal RS domains (Graveley and Maniatis 1998). However, at least for in vitro splicing, the RS domains may not always be required for ESE-dependent splicing activation (Shaw et al. 2007), and RS domains can interact with RNA at the BPS and 5′ss during splicing complex assembly, at least in some cases (Shen et al. 2004; Shen and Green 2006).
ESSs are often bound by splicing repressors of the hnRNP class, a diverse group of proteins containing one or more RNA-binding domains and sometimes splicing inhibitory domains such as glycine-rich motifs (Pozzoli and Sironi 2005). hnRNPs function by a wide variety of mechanisms. For example, PTB (hnRNP I) can block essential interactions between U1 and U2 snRNPs (Izquierdo et al. 2005; Sharma et al. 2005), whereas hnRNP A1 can inhibit splicing by binding on either side and “looping out” exons or by directly displacing snRNP binding (Zhu et al. 2001; Nasim et al. 2002). Other splicing inhibitory mechanisms may also be used by these or other repressors.
Beyond molecular genetics, global approaches, including both computational and experimental methods, have been developed to identify ESEs and ESSs on a large scale (for review, see Chasin 2007). ESEs have been identified experimentally by in vitro and in vivo SELEX approaches (Tian and Kole 1995; Coulter et al. 1997; Liu et al. 1998, 2000). ESEs have also been computationally identified based on their enrichment in authentic exons versus introns and in exons with weak splice sites (Fairbrother et al. 2002), and by their enrichment in authentic exons versus pseudoexons and 5′ UTRs of intronless genes (Zhang and Chasin 2004). These studies have generated comprehensive lists of ESE oligonucleotides that help to predict the splicing phenotypes of exonic mutations (Cartegni et al. 2002; Fairbrother et al. 2002; Pfarr et al. 2005). ESSs have also been predicted based on the assumption that ESSs are depleted from authentic exons (Zhang and Chasin 2004). We developed a cell-based fluorescence-activated screen (FAS) and used it to identify 133 ESS decanucleotides active in human cells from a library of random sequences (Wang et al. 2004). Further analyses of these FAS–ESS sequences suggested that ESSs play an important role in distinguishing authentic exons from pseudoexons and in regulating alternative splice site usage and intron retention (Wang et al. 2004, 2006).
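The enrichment-based logic behind computational ESE and ESS prediction can be sketched as a comparison of oligonucleotide frequencies between two sequence sets. The following is a minimal illustration only: the function names, the pseudocount, and the toy sequences are assumptions made for this example, and the published analyses applied additional criteria (such as splice site strength) and statistical tests that are not reproduced here.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def log_enrichment(foreground, background, k=6, pseudocount=1.0):
    """log2 ratio of k-mer frequency in foreground vs. background sequences.

    Positive values suggest enrichment in the foreground set (e.g., exons);
    negative values suggest depletion. The pseudocount is an arbitrary choice.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    kmers = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {
        kmer: math.log2(((fg[kmer] + pseudocount) / fg_total)
                        / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in kmers
    }


if __name__ == "__main__":
    toy_exons = ["GAAGAAGAAGAA"]      # hypothetical purine-rich exon
    toy_introns = ["UUUUUUUUUUUU"]    # hypothetical U-rich intron
    scores = log_enrichment(toy_exons, toy_introns, k=6)
    for kmer, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(kmer, round(s, 2))
```

In this framing, candidate enhancers would be the most enriched k-mers in authentic exons and candidate silencers the most depleted, with the choice of background set (introns, pseudoexons, or UTRs of intronless genes) determining what the contrast measures.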
Additional exonic SREs have been predicted based on sequence conservation (Goren et al. 2006), but the activities of these elements as either splicing enhancers or silencers or neutral sequences were observed to depend heavily on their exonic context. It is not clear whether the context dependence observed was due to the flexible activity of the sequences tested, or resulted from alterations to endogenous SREs in the reporters used. For example, all of the sequences inserted into the SXN minigene promoted exon skipping, which could be explained by disruption of an unknown ESE in the site where the foreign sequences were inserted (Goren et al. 2006). Other studies of ESSs have found much greater consistency of function in different exonic contexts (Wang et al. 2004, 2006).
INTRONIC SPLICING ENHANCERS AND SILENCERS
A number of intronic SREs are also known (for reviews, see Ladd and Cooper 2002; Zheng 2004), but fewer large-scale screens have been conducted for intronic elements, and many more intronic elements likely remain to be identified. One well-characterized ISE is the G triplet (GGG) or G run (Gn; n ≥ 3), which often occurs in clusters and can enhance recognition of adjacent 5′ss or 3′ss (McCullough and Berget 1997, 2000). This ISE is common in GC-rich introns and is conserved between human and mouse (Yeo et al. 2004). Intronic CA repeats in several cases can enhance splicing of upstream exons, probably through binding of hnRNP L (Hui et al. 2005; Hung et al. 2007). UGCAUG hexanucleotides or slight variations often occur downstream of neuron-specific exons and function as ISEs by binding to the brain- and muscle-specific splicing factors Fox-1/Fox-2 (Brudno et al. 2001; Jin et al. 2003; Minovitsky et al. 2005; Nakahata and Kawamoto 2005; Underwood et al. 2005). Pairs of YCAY motifs (Y = C or U) are recognized by the neuron-specific Nova family of splicing factors to regulate a large number of splicing events in the brain (Jensen et al. 2000; Ule et al. 2003). Interestingly, depending on their relative location in pre-mRNA, YCAY pairs can also function as either ESSs or ISSs (Hui et al. 2005; Ule et al. 2006). Such context dependence will be discussed in more detail below.
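Because the intronic elements described above are short sequence motifs, candidate occurrences can be located with simple pattern matching. The sketch below scans an RNA sequence for G runs (three or more Gs), the UGCAUG hexamer, and YCAY tetramers; the motif labels, the function name `find_motifs`, and the test sequence are hypothetical. A match identifies only a candidate site: as discussed in this review, whether a given occurrence is bound and functional depends on its context.

```python
import re

# Patterns for motifs named in the text. The YCAY pattern uses a lookahead
# so that overlapping occurrences (e.g., UCAUCAU) are all reported.
MOTIFS = {
    "G_run": re.compile(r"G{3,}"),
    "UGCAUG": re.compile(r"UGCAUG"),
    "YCAY": re.compile(r"(?=([CU]CA[CU]))"),
}


def find_motifs(rna):
    """Return (motif_name, start, matched_text) tuples sorted by position."""
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA alphabets
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(rna):
            # Lookahead patterns capture the motif in group 1.
            matched = m.group(1) if pattern.groups else m.group(0)
            hits.append((name, m.start(), matched))
    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    for hit in find_motifs("AAGGGGCUUGCAUGAUCAUCAU"):
        print(hit)
```

Reporting positions rather than only counts makes it possible to relate each hit to the nearest splice site, which matters because the same motif can have different activities at different locations.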
Characterized ISSs include binding sites for the splicing repressors PTB and hnRNP A1 (for reviews, see Zheng 2004; Matlin et al. 2005), CA-rich sequences bound by hnRNP L (Hui et al. 2005), specific octamers flanking exon IIIb of the FGFR2 gene (Wagner et al. 2005), and two elements in intron 7 of human SMN2, a therapeutic target for spinal muscular atrophy (Singh et al. 2006; Kashima et al. 2007).
Intronic elements (ISS and ISE) are likely of primary importance in regulating AS events, as the intronic regions surrounding alternative exons are far more conserved in mammals than those surrounding constitutive exons, out to a distance of 150 bp or more (Sorek and Ast 2003). Such increased conservation has been used to predict unannotated alternative exons (Sorek et al. 2004; Yeo et al. 2005), and to predict intronic SREs (Voelker and Berglund 2007; Yeo et al. 2007). In the predictions of intronic SREs, both groups identified the Fox-1/2 binding motif UGCAUG as the most conserved sequence near AS exons, and the elements identified by Yeo and colleagues included most known binding motifs associated with tissue-specific splicing factors. Remarkably, not only the Fox-1/Fox-2 splicing factor, but also its highly specific binding motif UGCAUG, are conserved from nematodes to mammals (Kabat et al. 2006), indicating an ancient role in splicing regulation. The analyses by Voelker and colleagues also suggested that the AU-rich motifs are strongly associated with constitutive splicing and may function as ISEs (Voelker and Berglund 2007); rigorous controls for GC content effects and experimental tests will be needed to firmly establish this result.
Splicing enhancers and silencers often function additively, with additional copies increasing their effect on splicing regulation (e.g., Huh and Hynes 1994; McCullough and Berget 1997; Chou et al. 2000; Wang et al. 2004; Zhang and Chasin 2004), either because they increase the affinity of the associated factor (Dominguez and Allain 2006) or because they increase the number of copies of the factor that are recruited, sometimes in synergistic fashion. Different SREs may also function cooperatively to regulate alternative splicing. For example, exonic UAGG motifs and intronic GGGG motifs overlapping the 5′ss can function cooperatively to silence the brain-specific CI cassette exon (exon 19) of the glutamate NMDA R1 receptor gene (Han et al. 2005); the purine-rich ESE in exon N1 of the c-src gene can cooperate with the downstream G-run ISE to increase exon N1 inclusion (Modafferi and Black 1999).
ELEMENTS THAT REGULATE INTRON RETENTION
Among the major types of alternative splicing, intron retention is unique in that it does not involve a choice between competing pairs of splice sites. Instead, it involves the choice between using a pair of splice sites to excise an intron and bypassing splicing to export intron-containing mRNA to the cytoplasm. Therefore, intron retention is probably regulated both by factors generally involved in splicing—some of which may also play a role in mRNA export—and by dedicated mRNA export factors (for review, see Reed and Cheng 2005).
Many SREs that affect other types of AS can regulate intron retention when properly situated (Sakabe and de Souza 2007). For example, G-run ISEs can promote splicing of a retained intron in human thrombopoietin (Marcucci et al. 2007), and a subset of FAS–ESS elements can activate splicing of a retained intron (Wang et al. 2006). In addition, the CA-rich motifs that bind to hnRNP L can enhance the splicing of multiple retained introns as revealed by splicing microarray analyses following RNAi against hnRNP L (Hung et al. 2007). Another intensely studied case is the Drosophila P-element third intron (IVS3), which is retained in somatic cells but is fully spliced in germ cells (Rio 1991). One element that promotes intron retention is a 5′ss-like sequence in the upstream exon (Siebel et al. 1992); similar 5′ss-like sequences have ESS activity in the ATM gene (Pagani et al. 2002) and were identified in the FAS–ESS screen (Wang et al. 2004). Some protein factors that regulate IVS3 retention are homologous with vertebrate proteins that regulate splicing by binding to splicing enhancers or silencers (Siebel et al. 1994, 1995; Min et al. 1997). Therefore, it is reasonable to speculate that most general SREs can also regulate intron retention since they usually have a direct effect on splice site activity.
Perhaps the best-studied systems involving regulated intron retention occur in retroviruses (e.g., HIV) that need to balance between splicing of mRNA and transport of unspliced genomic RNA to the cytoplasm for packaging (Fischer et al. 1999). Retroviruses use many general SREs to regulate splicing of their RNA (for review, see Stoltzfus and Madsen 2006) and also use RNA elements to regulate nuclear export of unspliced RNA (for reviews, see Pollard and Malim 1998; Cullen 2003). How HIV RNAs balance between the two pathways is not completely clear, but some common elements might serve as a functional bridge between the two pathways by participating in the regulation of both splicing and RNA export (e.g., an ESS recognized by hnRNP A1 [Asai et al. 2003] and the Rev response elements [Pongoski et al. 2002]).
CONTEXT DEPENDENCE OF SREs
It was realized early in the study of splicing that the activities of SREs may depend on the relative locations of the elements in pre-mRNAs. For example, G triplets commonly enhance splicing from intronic locations (McCullough and Berget 1997), but they function as splicing silencers when located in exons (Chen et al. 1999). A more subtle effect is that the activity of SR proteins to promote splicing depends on the distance between the ESE and the adjacent splice site (Graveley et al. 1998). In some cases, the SREs identified in one AS event fail to regulate splicing when located in a heterologous exon/intron context. These phenomena may be collectively described as the “context dependence” of SREs and can be considered to fall into two categories: (1) location-dependent activity, in which activity varies with relative positions in the pre-mRNA (Fig. 2A); and (2) gene-dependent activity, in which activity observed in one gene is lost when the SRE is moved to another (Fig. 2B).
Figure 2. Schematic of two types of context dependence for SREs. (A) Location-dependent activity of SREs, in which activity varies with pre-mRNA position. For example, G runs can function as both ESSs and ISEs (upper panel), and some SR protein binding sites can function as both ESEs and ISSs (lower panel). (Yellow) Alternative exons; (blue) constitutive exons. (B) Gene-dependent activity of SREs, in which activity observed in one gene is lost when the SRE is moved to another. Different genes are shown in different colors.
Location-dependent activity reflects the flexibility of splicing regulatory factors in their interactions with core splicing machinery. Given the size and complexity of spliceosomes and their multistep assembly, it is perhaps not surprising that the activities of factors that interact with core spliceosome components would vary depending on their locations relative to these components. For example, the G-run-binding factor hnRNP H can participate in a splicing enhancer complex when G runs are located downstream of the 5′ss (Chou et al. 1999; Caputi and Zahler 2001; Hastings et al. 2001; Schaub et al. 2007) but can inhibit splicing when similar sequences are located in an exon (Chen et al. 1999; Caputi and Zahler 2001).
Although the degree of context dependence of SREs may initially appear chaotic or confusing, there are usually patterns and rules that summarize this activity. For example, the dual activity of G runs as ISEs and ESSs can be conceptually interpreted as defining a region of pre-mRNA as intron rather than exon. We have observed that most ESSs inhibit the intron-proximal 5′ss or 3′ss when located between competing alternative splice sites, and that some ESSs, including G runs, can promote splicing of retained introns when located inside the intron (Wang et al. 2006). The underlying mechanism for these activities of G runs could involve inhibition by hnRNP F/H of exon definition complex formation “across” the site of binding. As noted above, clusters of YCAY motifs that bind the Nova family of neuron-specific splicing factors can function as ESSs, ISEs, or ISSs, depending on their position relative to the regulated exon. An “RNA map” describing these context-dependent activities has been generated that predicts the direction of Nova-dependent splicing regulation with very high accuracy (Ule et al. 2006). This study, including also a dissection of the mechanisms of splicing activation and repression by Nova, represents an important demonstration that the set of splicing regulatory activities of a factor can be rationalized.
Analogous to the common overlap between ESSs and ISEs, some SR proteins can promote splicing when bound to sites (ESEs) in exons, and also inhibit splicing when bound to intronic (ISS) sites (Kanopka et al. 1996; Ibrahim et al. 2005; Buratti et al. 2007). Results of a FAS-based screen for ISSs suggest that overlap between ISS and ESE activity could be a general phenomenon (Z. Wang, J. Zhang, X. Xiao, and C.B. Burge, unpubl.). A recent report suggested the counterintuitive result that some elements (designated “ESRs”) can function as both ESSs and ESEs depending on exonic context (Goren et al. 2006). However, in our own analyses, large sets of ESEs and ESSs identified using entirely different methods (statistically based and experimentally based) were nonoverlapping, suggesting that a single sequence is unlikely to commonly function as both an ESS and an ESE (Fairbrother et al. 2002; Wang et al. 2004).
For intronic elements, the situation is much more complex. For example, Nova binding sequences commonly function as ISEs when located near the splice sites of the intron downstream of the regulated exon and as ISSs when located near the 5′ss of the upstream intron (Ule et al. 2006). Similarly, the Fox-1/2 binding site (UGCAUG) can function as an ISS when located upstream of an exon (Jin et al. 2003) and as an ISE when located downstream of an alternative exon (Huh and Hynes 1994; Lim and Sharp 1998; Jin et al. 2003). CA repeats bound by hnRNP L can function either as ISEs or ISSs depending on their distance to the upstream exon (Hui et al. 2005).
Gene-dependent activity of SREs may often result from factors that determine whether a cis-element can be recognized by the associated factor. Because most splicing regulatory cis-elements are short RNA oligonucleotides, they tend to be very abundant in pre-mRNA. Based on a recent comparative analysis of exonic splicing elements, it was estimated that about 1000 hexamers can function as either ESSs or ESEs (Stadler et al. 2006). Therefore, every pre-mRNA will contain many potential splicing regulatory motifs, only a fraction of which may be recognized by trans-factors. This is analogous to transcriptional regulation in that a search near a promoter region will predict a large number of transcription factor binding sites, only a small fraction of which are functional. Determining which cis-elements will be recognized remains an open problem. One obvious factor is the local structure of the pre-mRNA, which could affect cis-element accessibility. For example, RNA structure in the fibronectin EDA exon appears to affect recognition of an ESE by SR proteins, and this structure accounts for much of the difference between the splicing behavior of the orthologous exons in human and mouse (Buratti et al. 2004). A recent analysis of experimentally determined SREs suggested that these SRE motifs are significantly enriched in single-stranded regions of pre-mRNA (Hiller et al. 2007), suggesting that pre-mRNA secondary structure may play a general role in determining SRE function. Some factors, including Nova and muscle-blind-like splicing factors, recognize their cognate motifs in specific structural contexts (Jensen et al. 2000; Warf and Berglund 2007).
In addition to affecting SRE accessibility, secondary structures may also regulate splicing by directly affecting splice site accessibility. For example, a stem–loop structure at the 5′ss of exon 10 of the human tau gene directly affects the activity of the 5′ss, with stabilization of this structure decreasing exon 10 inclusion and destabilization of this stem–loop increasing exon 10 inclusion (Donahue et al. 2006). Another remarkable example was found in exon 6 of the Drosophila Dscam gene, where the intronic secondary structure ensures that inclusion of 48 alternative exons occurs in a mutually exclusive fashion (Graveley 2005). However, it is unclear whether examples like this represent unusual cases or are a general rule, and even for tau exon 10 the role of structure in splicing regulation is debated (D'Souza and Schellenberg 2002). Spliceosomes contain multiple RNA helicase components that can unwind RNA structures and remodel RNA/protein complexes (for review, see Bleichert and Baserga 2007). Although the primary function of these spliceosome-associated helicases appears to be rearrangement of snRNA/snRNA, snRNA/pre-mRNA, and snRNA/protein interactions in the spliceosome, at least one appears to function at an earlier stage, influencing the alternative splicing of the CD44 pre-mRNA, possibly by remodeling its structure and/or associated protein complexes (Honig et al. 2002; Lee 2002). By analogy to translation, where ribosome-associated helicases alter mRNA structure to facilitate translocation, it is tempting to speculate that spliceosome-associated helicases may also disrupt splicing-inhibitory structure in pre-mRNAs. As the general roles of structure in splicing regulation are still not clearly defined, large-scale measurements of RNA structures should be very valuable. 
Some recently developed methods like selective 2′-hydroxyl acylation and primer extension (SHAPE) provide new possibilities for high-throughput measurements of RNA structures (Merino et al. 2005), but high-throughput methods capable of assessing structure in vivo still need to be developed.
The activities of SREs are of course dependent on the presence and activity of the associated trans-factors, and this dependence is probably responsible for most of the tissue- or cell-type-specific splicing. SRE activity can also respond to external signals that alter the expression or activity of specific trans-factors (for review, see Shin and Manley 2004). The mechanisms by which inducible splicing responds to different stimuli can be very diverse, including, but not limited to, neuronal depolarization through CaM kinase (Xie and Black 2001; Lee et al. 2007), heat shock response through SR protein dephosphorylation (Shin and Manley 2002; Shin et al. 2004), response to T cell activation through an inducible ESS bound by hnRNP L (Rothrock et al. 2003, 2005), and response to neuronal excitation through a UAGG motif recognized by hnRNP A1 (Han et al. 2005).
GLOBAL ANALYSES OF SPLICING REGULATION
A number of new technologies have been developed recently for genome-wide analysis of AS. Several microarray platforms have been designed to distinguish between different splicing isoforms and to detect AS at a genomic scale. “Exon junction arrays” represent one early design, with high-density oligonucleotide probes targeted to the junctions between consecutive exons (Johnson et al. 2003). Other designs include the use of probe sets to target bodies and junctions of constitutive and alternative exons (Clark et al. 2002; Pan et al. 2004; Blanchette et al. 2005; Sugnet et al. 2006), as well as the use of bead-based fiber-optic microarray platforms with high detection sensitivity (Yeakley et al. 2002). These designs have facilitated analyses of genome-wide alternative splicing in human, mouse, and chimp (Srinivasan et al. 2005; Boutz et al. 2007; Calarco et al. 2007; Ip et al. 2007; Ni et al. 2007), as well as detection of the global impact of specific splicing factors or environmental stimuli on splicing regulation (Park et al. 2004; Hung et al. 2007; Makeyev et al. 2007; Pleiss et al. 2007a,b). A very high-density “exon array” containing probes in essentially all known and predicted human exons was recently developed by Affymetrix (Gardina et al. 2006). Generally speaking, these array platforms appear able to identify a subset of AS events with high confidence, but have an unknown and probably substantial rate of false negatives.
The systematic identification of RNA targets for different trans-factors can also be achieved by new approaches such as cross-linking/immunoprecipitation (CLIP) (Ule et al. 2003), RNP immunoprecipitation (RIP) (Keene et al. 2006; Townley-Tilson et al. 2006), and genomic SELEX (Lorenz et al. 2006). These analyses have the potential to identify regulatory targets of a factor and can be applied genome-wide when coupled with microarray or high-throughput sequencing technologies. Analysis of the target sequences will help to define the sequence determinants of binding, and may also help to identify cooperative or antagonistic relationships between different factors. Since only a subset of binding events confers regulatory activity, it is important to also have evidence of regulation. Such evidence can be obtained from knockout/knockdown or overexpression of the factor, which can also be applied on a genome-wide scale when coupled with isoform-specific microarrays (Blanchette et al. 2005), or with high-throughput sequencing.
SPLICING SIMULATION: PUTTING THE PIECES TOGETHER
An important long-term goal in the splicing field is to determine a “splicing code:” a set of rules that can predict the splicing pattern of any primary transcript sequence (Fu 2004; Matlin et al. 2005). As large “parts lists” of splicing regulatory cis-elements are identified, and identities and functions of associated trans-acting factors are determined, a natural next step is to assemble the available information into a predictive framework to simulate the recognition of exons and introns that occurs during splicing. An initial approach to this very challenging problem was the development of the general splicing simulation algorithm, ExonScan (Fig. 3; Wang et al. 2004). The goal of splicing simulation is to predict the splicing patterns of transcripts based only on the information that is accessible and known (or strongly implicated) to be recognized by the nuclear pre-mRNA splicing machinery. Thus, in addition to predictive accuracy, faithfulness to the in vivo mechanism is important in splicing simulation, and an important application of splicing simulation is to evaluate how different sequence features contribute to the determination of splicing specificity. This philosophy is different from that of most gene prediction/gene finding algorithms, whose goal is generally to make the most accurate prediction of exon and gene locations using whatever information is available. For example, most current gene finders use reading frame consistency of exons and/or cross-species conservation in their predictions, information that is inaccessible to the spliceosome.
Schematic diagram of a general splicing simulator, ExonScan. (Green boxes) Splicing enhancers, (red boxes) silencers. For simplicity, only ESSs and ESEs are indicated. Step 1, Identify each exon candidate as a pair of splice sites (shown as brackets) that are 50–250 bp apart. Step 2, Score the splice sites and the splicing enhancers and silencers, and sum these scores to obtain the score of each exon candidate. Step 3, Determine the threshold and make the final prediction.
The ExonScan algorithm, illustrated in Figure 3, involves scanning the pre-mRNA for pairs of nearby potential 3′ss and 5′ss, and scoring these sites and nearby exonic and intronic SREs. Log-odds scoring is used, which rewards elements based on their statistical enrichment in the relevant location relative to their background frequency in the genome and naturally assigns positive scores to enhancers and negative scores to silencers. The results generated with this algorithm provided important clues to the in vivo function of ESSs (Wang et al. 2004). Such simulation should be improved by use of more refined sets of sequence elements and context-dependent scoring schemes, and by consideration of the functional interactions between different elements, as inferred from experimental assays or from patterns of coevolution (Xiao et al. 2007).
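The three steps described above can be illustrated with a minimal sketch. This is not the ExonScan implementation: the hexamer frequencies, the score threshold, and all function names are hypothetical, splice sites are recognized only by their minimal AG and GT dinucleotides, and the splice-site score is left as a placeholder argument. The sketch is meant only to show how log-odds scoring assigns positive scores to elements enriched in exons and negative scores to elements depleted from them.

```python
import math

def log_odds(freq_in_location, freq_background):
    """Log2 ratio of an element's frequency in the relevant location to its background frequency."""
    return math.log2(freq_in_location / freq_background)

# Hypothetical (exonic frequency, background frequency) pairs; invented for illustration.
ELEMENT_FREQS = {
    "GAAGAA": (4e-4, 1e-4),  # enriched in exons: enhancer-like, positive score
    "TTAGGG": (5e-5, 2e-4),  # depleted in exons: silencer-like, negative score
}
ELEMENT_SCORES = {hexamer: log_odds(*freqs) for hexamer, freqs in ELEMENT_FREQS.items()}

def candidate_exons(seq, min_len=50, max_len=250):
    """Step 1: pair each potential 3'ss (AG) with each potential 5'ss (GT) lying 50-250 nt downstream."""
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]  # first exonic base
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]        # first intronic base
    for start in starts:
        for end in ends:
            if min_len <= end - start <= max_len:
                yield start, end

def score_exon(seq, start, end, site_score=0.0):
    """Step 2: add the log-odds scores of all known hexamers in the exon body to the splice-site score."""
    exon = seq[start:end]
    total = site_score  # placeholder; a real simulator would score both splice sites here
    for i in range(len(exon) - 5):
        total += ELEMENT_SCORES.get(exon[i:i + 6], 0.0)
    return total

def predict_exons(seq, threshold=1.0):
    """Step 3: report candidates whose summed score reaches the (hypothetical) threshold."""
    scored = ((s, e, score_exon(seq, s, e)) for s, e in candidate_exons(seq))
    return [hit for hit in scored if hit[2] >= threshold]

# Toy transcript: a 60-nt candidate exon that contains one enhancer-like hexamer.
toy = "CCCAG" + "C" * 20 + "GAAGAA" + "C" * 34 + "GTCCC"
for start, end, score in predict_exons(toy):
    print(start, end, round(score, 2))
```

Replacing the enhancer-like hexamer in the toy sequence with the silencer-like one drops the candidate below the threshold, which mirrors in miniature how the balance of enhancers and silencers can decide whether a pair of potential splice sites is predicted to define an exon.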
SPLICING REGULATORY NETWORK AS A SUBNETWORK OF GENE REGULATION
The splicing regulatory network is part of a larger network of gene regulation with which it is linked both physically and functionally. Most introns are thought to be spliced cotranscriptionally (de la Mata et al. 2003; Das et al. 2006; Hicks et al. 2006), and SR proteins associate with the C-terminal domain of RNA polymerase II (de la Mata and Kornblihtt 2006; Das et al. 2007). As a downstream event of transcription, AS can be regulated by factors that affect transcription. For example, a mutated RNA polymerase II with a slow elongation rate has been shown to affect the AS of endogenous genes (de la Mata et al. 2003), and the insertion of a transcription-pausing element, MAZ4, into a minigene construct can influence splicing of an alternative exon (Robson-Dixon and Garcia-Blanco 2004). Therefore, it is tempting to speculate that some of the identified SREs could act through effects on transcription, e.g., functioning as a transcription pause site to promote the inclusion of a weak upstream exon. In addition, the whole process of mRNA biosynthesis and processing, including transcription, 5′ capping, splicing, polyadenylation, and transport, is extensively coupled, with a number of factors involved in more than one step of this process (Hieronymus and Silver 2004; Kornblihtt et al. 2004). The splicing regulatory network should therefore be viewed as a specialized subnetwork of a more general gene regulatory network.
PERSPECTIVE
Sorting out the complex network that controls constitutive and alternative splicing represents a major challenge for post-genomic biology. A promising route to such global understanding is a “bottom-up” approach, involving systematic identification of the cis-regulatory components of the network, determination of rules—context dependence or otherwise—for their activity and functional interactions with other elements, and integration of these rules into simulation algorithms representing successive approximations to the splicing code. Beyond the challenges discussed above, another important challenge will be to learn how to generalize the cis-elements, factors, and rules identified in one system (e.g., in one cell type or developmental stage) to others. This will undoubtedly require new experimental approaches that can be readily adapted to a variety of systems.
Given the complexity of splicing regulation, the splicing code will not have the simple tabular form of the genetic code. Instead, it may look more like the U.S. tax code (i.e., IRS form 1040), with a variety of tables and subtables that are applicable in different circumstances, and a certain amount of arithmetic required to obtain the correct answer.
ACKNOWLEDGMENTS
We thank Xinshu Xiao for critical reading of our manuscript. Our research is supported by grants from the NIH and NSF (C.B.B.) and from the Damon Runyon Cancer Research Foundation (Z.W.).
Footnotes
- Reprint requests to: Zefeng Wang, Department of Pharmacology, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; e-mail: zefeng{at}med.unc.edu; fax: (919) 966-5640; or Christopher B. Burge, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; e-mail: cburge{at}mit.edu; fax: (617) 452-2936.
- Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.876308.
- Copyright © 2008 RNA Society





