Sequence Organization of Repetitive Elements in the Flanking Regions of the Chloroplast Ribosomal Unit of Chlamydomonas reinhardii

The flanking regions and the end of the chloroplast ribosomal unit of Chlamydomonas reinhardii have been sequenced. The upstream region of the ribosomal unit contains three open reading frames coding for 111, 117 and 124 amino acids, respectively. The latter polypeptide is partially related to the ribosomal protein L16 of E. coli. Two of the open reading frames overlap each other and are oriented in opposite directions. The region between these open reading frames and the 5' end of the 16S rRNA gene contains numerous short direct and inverted repeats which can be folded into large stem-loop structures. Sequence elements that resemble prokaryotic promoters are found in the same region. Several of the repeated elements are distributed throughout the non-coding regions of the chloroplast inverted repeat. Sequence comparison between the 5S rRNA and its gene does not reveal any significant sequence heterogeneity between the chloroplast 5S rRNA genes.


INTRODUCTION
As in most higher plants, the chloroplast ribosomal unit of Chlamydomonas reinhardii is contained within a segment of the chloroplast genome that is repeated in an inverted orientation (1). In C. reinhardii the ribosomal unit consists of the 16S, 7S, 3S, 23S and 5S rRNA genes (2). Previous work has revealed a remarkable sequence homology between the chloroplast rRNA gene sequences of C. reinhardii and those of higher plants and E. coli (3,4,5). It was also shown that a 1.5 kb restriction fragment on the 3' side of the ribosomal unit hybridizes to a large number of chloroplast DNA regions, indicating the existence of numerous repeats in this genome (7,8).
In order to gain more insight into the structure of the promoter region of the chloroplast rRNA genes and into the organization of the repeated elements, we have sequenced both ends of the ribosomal unit of C. reinhardii. The sequences include 1449 bp upstream of the 16S rRNA gene and the complete sequence of the 5S rRNA gene and its 3' flanking regions. Analysis of the sequences of both flanking regions reveals a large number of direct and inverted repeats that appear to be scattered throughout the inverted repeat. An unusual feature of the promoter region is the presence of three open reading frames, two of which overlap each other and are oriented in opposite directions. One of these open reading frames appears to be related to a ribosomal protein gene of E. coli.

DNA.
The 3 kb chloroplast BamHI fragment Ba 1 (1) was cut with HindIII and the smaller of the two HindIII-BamHI subfragments was cloned into pBR322. The construction of the recombinant plasmid containing the chloroplast BamHI-EcoRI fragment BR1.3 was described previously (2). Plasmid DNA was isolated by published procedures (9). DNA sequencing was performed according to the chemical cleavage method of Maxam and Gilbert (10). Sequence analysis was performed on a Hewlett Packard computer, model 9845.

RNA.
Cellular RNA of C. reinhardii was prepared as described (11). The RNA was fractionated by centrifugation on a 5 to 20% linear sucrose gradient in 100 mM LiCl, 10 mM Tris-HCl pH 7.8, 4 mM EDTA, 0.2% SDS in a SW27 rotor for 18 h at 25 000 revs/min. The top fractions of the gradient, which contained mostly 5S RNA and tRNA, were pooled and ethanol precipitated. This RNA was electrophoresed on an 8% polyacrylamide gel in TBE buffer (75 mM Tris-borate, 1 mM EDTA, pH 8.3). Under these conditions the chloroplast 5S RNA migrates slightly faster than the cytoplasmic 5S RNA. After extraction from the gel, the chloroplast 5S RNA was dephosphorylated with calf intestine alkaline phosphatase (Sigma) and labelled at its 5' end using [γ-32P]ATP and polynucleotide kinase (PL Biochemicals). The labelled RNA was repurified by electrophoresis on a polyacrylamide gel in TBE-7 M urea. RNA sequencing was performed as described (12,13). S1 nuclease mapping was done as described by Berk and Sharp (14).

RESULTS AND DISCUSSION
The BamHI-EcoRI BR1.3 fragment contains 1449 bp of the upstream region of the 16S rRNA gene plus the 5' terminal end of this gene. The restriction map of this region and the sequencing strategy are shown in fig. 1. The nucleotide sequence of the upstream region is displayed in fig. 2. The corresponding region has been well conserved in mustard (15), maize (16), tobacco (17), spinach (18) and Spirodela oligorhiza (19), but the C. reinhardii sequence differs considerably from its higher plant counterpart. While in all higher plants examined a tRNAVal gene is located about 300 bp upstream of the 5' end of the 16S rRNA, the comparable C. reinhardii region does not contain this tRNAVal gene, nor does it contain any pseudo tRNA genes, as has been reported for Euglena (20).
Three open reading frames are present in the 5' upstream region of the chloroplast ribosomal unit of C. reinhardii (figs. 1, 2). Two of them, ORF1 and ORF3, have the same orientation as the 16S rRNA gene and are located between positions 152 to 487 and 519 to 893, respectively (fig. 2). They could possibly code for polypeptides of 111 and 124 amino acids. A computer search for related proteins has revealed a 20% amino acid sequence homology over a 55 amino acid stretch between the ORF3 polypeptide and the ribosomal protein L16 of E. coli (21). It has been shown that 5-6 out of 33 polypeptides of the large chloroplast ribosomal subunit of C. reinhardii are synthesized within the chloroplast (22). However, the homologue of the E. coli L16 protein has not yet been identified among the ribosomal proteins of C. reinhardii. It remains to be explored whether ORF3 is indeed a ribosomal protein gene. Another possibility is that ORF3 is a pseudogene: we have been unable to detect a homologous transcript.
The open reading frame ORF2 (positions 465 to 112 in fig. 2) is oriented in the opposite direction. It overlaps nearly all of the first open reading frame and encodes a putative polypeptide of 117 amino acids. An open reading frame of opposite polarity to the rRNA genes has also been found in the ribosomal upstream region of higher plants. In mustard and maize the homologous open reading frames, of 44 and 55 codons respectively, share 30 equivalent codons (15). A similar open reading frame is also present in tobacco, except that the 15 amino terminal codons are deleted. In spinach, the first 38 codons of an open reading frame have been determined; they specify an amino acid sequence that is different from the corresponding maize sequence (18). Comparison of the higher plant open reading frames with those of C. reinhardii does not reveal any significant amino acid sequence homology. However, the hydropathic profiles of the C. reinhardii polypeptide encoded by ORF2 (fig. 1) and of its maize counterpart are similar (data not shown). Whether this observation has any functional significance remains an open question. In contrast to spinach, where a transcript of the ribosomal open reading frame has been detected (18), none could be found in C. reinhardii for ORF1, 2 or 3. It is noteworthy that while the chloroplast protein genes of C. reinhardii sequenced to date (23,24,25) contain the TAA stop codon, this is not observed for the three ribosomal open reading frames, which end with TGA and TAG. Although there is no evidence that these open reading frames are expressed, it cannot be excluded that they are transcribed at a very low level and/or that their transcripts are unstable.
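The polypeptide lengths quoted for the three open reading frames can be checked against their coordinates in fig. 2. The Python sketch below assumes, as the quoted numbers suggest but the text does not state, that each coordinate range is inclusive and contains the stop codon. The function names `orf_residues` and `overlap_length` are hypothetical helpers introduced only for this illustration.

```python
def orf_residues(start, end, includes_stop=True):
    """Number of amino acids encoded by an ORF with inclusive coordinates.

    start > end denotes an ORF read on the opposite strand.
    Assumption (not stated in the text): the range includes the stop codon.
    """
    length = abs(end - start) + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return codons - 1 if includes_stop else codons


def overlap_length(a, b):
    """Number of positions shared by two inclusive ranges, ignoring strand."""
    a_lo, a_hi = sorted(a)
    b_lo, b_hi = sorted(b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


# Coordinates as quoted in the text (fig. 2 numbering).
orfs = {"ORF1": (152, 487), "ORF2": (465, 112), "ORF3": (519, 893)}

for name, (start, end) in orfs.items():
    print(name, orf_residues(start, end), "amino acids")

shared = overlap_length(orfs["ORF1"], orfs["ORF2"])
orf1_length = abs(orfs["ORF1"][1] - orfs["ORF1"][0]) + 1
print(f"ORF1/ORF2 overlap: {shared} of {orf1_length} bp")
```

Under these assumptions the coordinates give 111, 117 and 124 residues for ORF1, ORF2 and ORF3, and ORF2 covers 314 of the 336 bp of ORF1, in line with the statement that it overlaps nearly all of the first open reading frame.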
The 5' end of the ribosomal transcript (fig. 2) was determined by S1 nuclease digestion (fig. 3) of hybrids formed between C. reinhardii RNA and a 5' end-labelled 391 bp DdeI fragment (fig. 1) that spans the 5' end of the 16S rRNA gene. Although this analysis does not allow one to discriminate between authentic transcriptional start sites and 5' termini of processed transcripts (26), transcription appears to start 72 bp upstream of the 5' end of the coding sequence of the mature 16S rRNA gene. The presumed start site is preceded by the sequences TAAATT [...]
The DNA sequence shown in fig. 2 contains numerous nearly perfect direct and inverted repeats outside of the ribosomal coding regions. The positions of those repeats that contain at least 10 contiguous bases are indicated in fig. 1. It is possible to combine several of these repeats into a large palindromic structure, displayed in fig. 4. Secondary structures of this sort have been observed previously in the electron microscope in the 5' and 3' flanking regions of the 16S rRNA gene (2). An extensive secondary structure was also observed downstream of the chloroplast ribosomal unit (2).
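A search for direct and inverted repeats of at least 10 contiguous bases can be illustrated with a short Python sketch. This is not the program used in this work (the analysis was run on a Hewlett Packard 9845). It only finds exact repeats, whereas the repeats described here are nearly perfect, and the toy sequence is invented for the example. The names `reverse_complement` and `find_repeats` are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G and T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_repeats(seq, k=10):
    """Find exact repeats of length k (0-based start positions).

    direct:   pairs (i, j), i < j, where seq[i:i+k] == seq[j:j+k]
    inverted: pairs (i, j), i < j, where seq[i:i+k] equals the
              reverse complement of seq[j:j+k]
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    direct, inverted = [], []
    for kmer, positions in index.items():
        # Every pair of occurrences of the same k-mer is a direct repeat.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                direct.append((positions[a], positions[b]))
        # Occurrences of the reverse complement give inverted repeats;
        # the i < j test reports each pair once.
        for i in positions:
            for j in index.get(reverse_complement(kmer), []):
                if i < j:
                    inverted.append((i, j))
    return sorted(direct), sorted(inverted)


# Invented toy sequence: a 10 bp unit, a second copy, and its reverse complement.
unit = "ACGGTCATTC"
toy = unit + "TTTTT" + unit + "AAAAA" + reverse_complement(unit)
direct, inverted = find_repeats(toy, k=10)
print("direct:", direct)
print("inverted:", inverted)
```

Inverted pairs found in this way are the candidates for stems in a stem-loop structure such as the one in fig. 4. Tolerating mismatches would require an alignment-based comparison rather than exact k-mer lookup.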

In order to gain more insight into the structure of this downstream region, its DNA sequence was established. The region sequenced includes the gene of the chloroplast 5S RNA. RNA sequencing was also performed on the 5S rRNA, which allowed precise mapping of the 5S rRNA gene boundaries. Since the RNA and DNA sequences are consistent, it does not appear that there is any significant sequence heterogeneity between the chloroplast 5S rRNA genes. Fig. 5 shows that the 5S RNA contains 121 nucleotides and that it can be folded into a structure similar to that of the E. coli 5S rRNA. It can be seen that the sequences of two of the loops are nearly identical in the 5S rRNA of C. reinhardii and E. coli. In contrast, there is no apparent homology between the chloroplast and cytoplasmic 5S rRNAs of C. reinhardii (28).
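The consistency check between the 5S RNA sequence and its gene amounts to a position-by-position comparison of the RNA with the RNA-like DNA strand, reading T as U. The following Python sketch shows one way to do this. The function name `rna_gene_mismatches` is hypothetical, and the short sequences are invented placeholders, not the C. reinhardii 5S sequence.

```python
def rna_gene_mismatches(rna, gene_dna):
    """Compare an RNA sequence with the RNA-like strand of its gene.

    Returns a list of (position, expected_base, observed_base) tuples,
    with 1-based positions; an empty list means the sequences agree.
    """
    expected = gene_dna.upper().replace("T", "U")
    observed = rna.upper()
    if len(expected) != len(observed):
        raise ValueError("RNA and gene segment differ in length")
    return [
        (pos, e, o)
        for pos, (e, o) in enumerate(zip(expected, observed), start=1)
        if e != o
    ]


# Invented placeholder sequences for illustration only.
gene_segment = "TATTCTGGTGTCC"
rna_same = "UAUUCUGGUGUCC"
rna_variant = "UAUUCUGAUGUCC"

print(rna_gene_mismatches(rna_same, gene_segment))
print(rna_gene_mismatches(rna_variant, gene_segment))
```

An empty result for the full-length RNA is what the statement above corresponds to: if the gene copies differed appreciably, the sequenced RNA population would be expected to show ambiguous or mismatched positions.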
A spacer of 90 bp separates the 5' end of the 5S rRNA gene from the 3' end of the 23S rRNA gene (29). The downstream region of the 5S rRNA gene contains repetitive sequence elements, some of which are related to repeated motifs found in the ribosomal promoter region (element b, fig. 6). In addition to the 5' and 3' flanking regions of the ribosomal operon of C. reinhardii, other non-coding regions of the inverted repeat have been sequenced. They include the ribosomal 16S-23S spacer region (Schneider and Rochaix, unpublished results) and the 5' flanking region of the psbA gene (Erickson, Rahire and Rochaix, unpublished results). A computer search for common sequences between all of these regions has revealed several repeated elements present either in the same or in inverted orientation. The locations of these elements (a to i) are indicated in fig. 6. Since these sequences have homology with the IR2, IR5, IR7 and IR9 repeats (fig. 2), they may also be involved in the formation of secondary structures similar to that seen in fig. 4.
Palmer (30) has shown that the chloroplast inverted repeat of higher plants undergoes recombination which results in an inversion in the chloroplast genome of one single copy region relative to the other. A similar flipping mechanism has also been observed in C. reinhardii (31,32). In this organism it has been demonstrated that flipping still occurs in chloroplast deletion mutants lacking either end of the inverted repeat (32). This implies that the recombination events are not restricted to one unique site but that they can occur in several regions of the inverted repeat. It remains to be explored whether the repetitive elements described here play a role in this process.

Fig. 2. Nucleotide sequence of the 5' upstream region of the 16S rRNA gene of C. reinhardii as shown in fig. 1. The sequence of the non-coding ribosomal strand is shown. The protein sequences corresponding to the three open reading frames are indicated.

Fig. 3. Mapping of the start site of transcription of the ribosomal unit. Left: autoradiogram of the A+G sequence ladder of the 5' end-labelled coding strand of the 391 bp DdeI fragment that spans the 5' end of the 16S rRNA gene (cf. fig. 1). Right: size of the protected fragment after hybridization with C. reinhardii RNA and subsequent S1 nuclease digestion.

Fig. 4. Stem-loop structure present upstream of the 16S rRNA gene of C. reinhardii. IR2 to IR9 correspond to the repeated elements shown in figs. 1 and 2. Key nucleotides are numbered as in fig. 2.

Fig. 5. Upper: Sequence of the chloroplast 5S rRNA gene and of its flanking regions. The 5S RNA gene is framed. The two arrows indicate two 17 bp direct repeats. Lower: Secondary structure of the chloroplast 5S RNA. The dark lines indicate conserved regions between C. reinhardii and E. coli 5S rRNA.

Fig. 6. Location of repeated elements in the non-coding regions of the chloroplast inverted repeat of C. reinhardii. The 16S, 7S, 3S, 23S and 5S RNA genes (2), the psbA gene (24) and the tRNAIle and tRNAAla genes are indicated. Hatched areas in these genes represent introns. ORF 1+2 (overlapping open reading frames) and ORF3 are as shown in figs. 1, 2. The different repeated elements are labelled a to i. BR refers to the 1449 bp upstream region of the ribosomal unit (fig. 2); HR represents the 1805 bp 16S-7S spacer (Schneider and Rochaix, unpublished results); 5S indicates the DNA region of fig. 5; and IRJ refers to the 1060 bp upstream region of psbA (Erickson and Rochaix, unpublished results). These four regions are shown enlarged. The ends of the inverted repeat are indicated by arrow heads. Size bars of 1 kb and 100 bp are given for the entire repeat and for the enlarged portions, respectively.