Sequence of MmRV

Nucleotide sequence of the proviral genome of a novel type-C retrovirus (MmRV), taken from positions 112341 to 121005 of Mus musculus PAC clone 657p21 (Accession: AC005743). Total length: 8665 bp. This sequence has been deposited in GenBank (Accession number: XXXXXXX). The 9 bp imperfect repeats that define the LTRs are marked in green; note that 2 bp are deleted from the terminal copies of these repeats upon insertion. The CAT box is not clearly identifiable (see Ref 5), though a candidate sequence appears upstream of the TATA box on the opposite strand. The repeat region spanning U3 and R is marked in italics, and the following features are identified: terminal 9 bp repeats (blue), pol-like motif (bold), 12 bp inverse palindrome (underlined), polyadenylation signal (underlined). Upstream of the gag ATG start codon, a CTG start codon defines the start of 99 bp of glycosylated Gag. The untranslated region between U5 and the start of glycosylated gag contains a variable number (two to six) of perfect 35 bp repeats.
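
The coordinate arithmetic in the legend can be checked with a short sketch. The names below are illustrative only. The codon counts are an assumption read off the translation rows printed under sequence rows 901, 961 and 1021 of this figure (6, 20 and 7 codons before the gag Met); they are not values stated by the authors.

```python
# Inclusive, 1-based coordinates of the provirus within PAC clone 657p21 (AC005743).
PAC_START = 112341
PAC_END = 121005


def inclusive_length(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


provirus_length = inclusive_length(PAC_START, PAC_END)

# Codons translated upstream of the gag ATG, counted per figure row.
# Assumption: counts taken from the translation rows under rows 901, 961 and 1021.
glyco_gag_leader_codons = [6, 20, 7]
glyco_gag_leader_bp = 3 * sum(glyco_gag_leader_codons)

print(provirus_length)      # 8665
print(glyco_gag_leader_bp)  # 99
```

Both values agree with the legend: 8665 bp for the provirus, and 99 bp between the CTG and the gag ATG.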

     |U3->
1    TGAAGAATAAAAGTTACTGAACTCTTCCTCACCCCAGAACTCGTTCCCAGAACACTCCTG

61   AACTCTTCACCCTAGAATGCATTCCTGAACTCCTCACCCTAGAGTTCGAACCCTCCCAAC

121  TAAAGACTGTTCCAAGAACATTTTTGAGATAAGGGCCTCCTGGAACAACCTCAGAATGAA

181  CCGGGTACATTGCCAAATAATAGGACATGACCCCTTAGTTACGTAGAATCCCTTGGCAGA

241  ACCCCTTGTCCCTTGGCAGAACCCCTTAGTTATGTAAACTTGTACTTTCCCTACCCCGCT

                        promoter       <-U3||R->
301  CTCCCCCCTTGAGTTTTCCTATATAAGCCTGTGAAAAATTTGGCTGGTCGTCGATTCTCC

                                 inverse palindrome   repeat region
361  TCTACACCACTAGGTGTATGAGTTTCGACCCCAGAGCTCTGGTCTATGTGCTTTCATGCT
                               pol motif

           polyA signal  <-R||U5->
421  GCTGCTTTATTAAATCTTGCCTTCAACATTTTGAGTTCGGTCTCAGTGTCTTCTTGGGTC

                                              <-U5||pbs->
481  CGCGGCTGTCCCGAGGCTTGAGTGAGGGTCTCCCTTCGGGGGTCTTTCATTTGGGGGCTC

     primer binding site                    splice donor site
541  GTCCGGGATCAGTGCGACCACCCAGAGACCCTAGACCCACTTTAAGGTAAGATTCTTTGA

601  CCTGTCTTGGTTTGGTGTCTCTGTTCTGTTTCTAAGTTTGGTGCGATCGCAGTTTCGGTT

661  TTGCGGACGCTCAGTGAGACTGCGCTCCGAGAGGGAACGCGGGGTGGATAAGGATAGACG

721  TGTCCAGGTGTCCGCCGTCCGTTCGCCCTGGGAGACGTCCCAGGAGGAACAGGGGAGGAC

                                                   [     Rpt1
781  CAGGGACGCCTGGTGGACCCCTTTGGAGGCCAAGAGACCATCTGGGGTTGCGAGATCGTG

                         ]   [     Rpt2                        ]
841  GGTTCGAGTCCCACCTCGTGCCTTGTTGCGAGATCGTGGGTTCGAGTCCCACCTCGTGCA

                                               |glyco-gag->
901  GAGGGTCTCAATCGGCCGGCCTTAGAAAAGCCATCTGATTCTCTGAGTTGCTTGTGGTCG
                                               LeuSerCysLeuTrpSer

961  ACGCGAAGTCGCCGCCGCTTTTGGTTTCTTTTTTGTCTTAGTCTCGTGTTCGCTCTTGTT
     ThrArgSerArgArgArgPheTrpPheLeuPheCysLeuSerLeuValPheAlaLeuVal

                          |gag->
1021 GTGTCTACTATTGTTCTGGAAATGGGACAATCTGTGTCCACTCCCCTTTCTCTAACTCTG
     ValSerThrIleValLeuGluMetGlyGlnSerValSerThrProLeuSerLeuThrLeu

1081 GAGCATTGGAAGGAGGTGCGGGTCAGAGCTCACAACCAGTCGGTGGAGGTCAGAAAGGGT
     GluHisTrpLysGluValArgValArgAlaHisAsnGlnSerValGluValArgLysGly

1141 CCGTGGCAGACCTTTTGCACCTCCGAGTGGCCGACGTTTGGAGTGGGCTGGCCACCAGAA
     ProTrpGlnThrPheCysThrSerGluTrpProThrPheGlyValGlyTrpProProGlu

1201 GGTGCTTTTGACTTATCACTAATCGCCGCCGTCAGGCGAATTGTTTTTCAGGAGGAAGGG
     GlyAlaPheAspLeuSerLeuIleAlaAlaValArgArgIleValPheGlnGluGluGly

1261 GGTCACCCTGATCAGATCCCCTACATTGTGACCTGGCAGAATCTCGTCCAATTCCCACCT
     GlyHisProAspGlnIleProTyrIleValThrTrpGlnAsnLeuValGlnPheProPro

1321 CCGTGGGTCAAGCCTTGGACCCCAAATTCTTCGAAACTGACGGTCGCGGTTGCCCAGTCT
     ProTrpValLysProTrpThrProAsnSerSerLysLeuThrValAlaValAlaGlnSer

1381 GATGCAGCCGGAAAGTCCAGCCCGTCAGCTCCCCCCAAGATTTATCCAGAGATTGACGAC
     AspAlaAlaGlyLysSerSerProSerAlaProProLysIleTyrProGluIleAspAsp

1441 CTCCTCTGGATGGACTCCCAACCTCCCCCTTATCCCCTGCCCCAGCAGCCACCTGCAGCC
     LeuLeuTrpMetAspSerGlnProProProTyrProLeuProGlnGlnProProAlaAla

1501 GCCCCACCACAGGGACCAATAGCGAGAGGGGCTCAGGGACCGGCGGGGGGGACTCGGAGC
     AlaProProGlnGlyProIleAlaArgGlyAlaGlnGlyProAlaGlyGlyThrArgSer

1561 CGACGAGGCCGAAGCCCCGGGGAGGAAGGGGGGCCGGATTCAACAGTTGCCTTACCACTT
     ArgArgGlyArgSerProGlyGluGluGlyGlyProAspSerThrValAlaLeuProLeu

1621 AGAGCACATGTGGGAGGGCCAGCGCCAGGACCCAATGATCTCATTCCTTTACAGTACTGG
     ArgAlaHisValGlyGlyProAlaProGlyProAsnAspLeuIleProLeuGlnTyrTrp

1681 TCTTTTTCCTCTTCTGATTTATATAATTGGAAAACTAACCACCCTCCTTTCTCAGAGAAC
     SerPheSerSerSerAspLeuTyrAsnTrpLysThrAsnHisProProPheSerGluAsn

1741 CCCTCTGGGCTTACTGGGCTCCTTGAGTCACTTATGTTCTCCCATCAACCCACTTGGGAT
     ProSerGlyLeuThrGlyLeuLeuGluSerLeuMetPheSerHisGlnProThrTrpAsp

1801 GATTGTCAGCAGCTTTTGCAGGTTCTTTTTACCACAGAGGAAAGAGAAAGAATCCTGATG
     AspCysGlnGlnLeuLeuGlnValLeuPheThrThrGluGluArgGluArgIleLeuMet

1861 GAGGCGAGAAAAAATGTTCTGGGAGAGGACGGCACACCCACTGCCCTCCCTAACCTCGTG
     GluAlaArgLysAsnValLeuGlyGluAspGlyThrProThrAlaLeuProAsnLeuVal

1921 GACGAGGCTTTCCCCTTGAACCGCCCCAACTGGGACTACAACACCGCAGAAGGTAGGGGA
     AspGluAlaPheProLeuAsnArgProAsnTrpAspTyrAsnThrAlaGluGlyArgGly

1981 CGCCTCCTTGTCTATCGCCAGACTCTAGTGGCAGGTCTCAGAGGAGCCGCTAGACGGCCC
     ArgLeuLeuValTyrArgGlnThrLeuValAlaGlyLeuArgGlyAlaAlaArgArgPro

2041 ACCAATTTGGCTAAGGTAAGAGAGGTCTTGCAGGGGCAGACTGAACCACCCTCAGTCTTC
     ThrAsnLeuAlaLysValArgGluValLeuGlnGlyGlnThrGluProProSerValPhe

2101 CTTGAGCGTCTAATGGAGGCATATAGGAGGTACACCCCTTTTGACCCCTTGTCAGAGGGG
     LeuGluArgLeuMetGluAlaTyrArgArgTyrThrProPheAspProLeuSerGluGly

2161 CAGAGAGCCGCTGTAGCCATGGCCTTCATTGGTCAGTCCGTTCCCGACATTAAGAAAAAG
     GlnArgAlaAlaValAlaMetAlaPheIleGlyGlnSerValProAspIleLysLysLys

2221 CTGCAAAGGCTGGAGGGGCTCCAAGATCATACGCTCCAAGATTTAGTAAAAGAAGCAGAG
     LeuGlnArgLeuGluGlyLeuGlnAspHisThrLeuGlnAspLeuValLysGluAlaGlu

2281 AAAGTCTATCATAAGAGGGAAACAGAAGAAGAGAGGCAGGAGAGAGAGAAGAAAGAAATG
     LysValTyrHisLysArgGluThrGluGluGluArgGlnGluArgGluLysLysGluMet

2341 GAGGAGAGGGAAAATAGACGGGGATTTCAGGAGAGAAATTTGAGTAAAATTTTGGCCGCA
     GluGluArgGluAsnArgArgGlyPheGlnGluArgAsnLeuSerLysIleLeuAlaAla

2401 GTTGTAAATGATAGACAGTCAGGAAAAGGTAAAATAGGGCTCCTGGGCAACAGGGCAGTG
     ValValAsnAspArgGlnSerGlyLysGlyLysIleGlyLeuLeuGlyAsnArgAlaVal

2461 AAACCGCCAGGTGGCAGAAAGATACCACTGGAAAAAGACCAATGCACCTATTGCAAAGAG
     LysProProGlyGlyArgLysIleProLeuGluLysAspGlnCysThrTyrCysLysGlu

2521 AAAGGACACTGGGCTAGAGATTGCCCTAAAAAACGGGAGCGATCCAAGGTCCTGACCCTA
     LysGlyHisTrpAlaArgAspCysProLysLysArgGluArgSerLysValLeuThrLeu

         <-gag | pro-pol->
2581 GAAGATGATTAGGGAAGTCGGGGCTCAGACCCCCTCCCTGAGCCTAGGGTAACTTTGTCC
     GluAspAspEndGlySerArgGlySerAspProLeuProGluProArgValThrLeuSer

2641 GTGGAGGGGACTCCCGTCAACTTCCTGATAGACACCGGAGCAGAGCATTCAGTACTCACT
     ValGluGlyThrProValAsnPheLeuIleAspThrGlyAlaGluHisSerValLeuThr

2701 AGCCCCCTAGGCAAGCTAGGCTCTAAAAAGACCATGGTGATTGGAGCCACTGGTAGTAAA
     SerProLeuGlyLysLeuGlySerLysLysThrMetValIleGlyAlaThrGlySerLys

2761 TTTTACCCCTGGACGACCGAACGAGCCCTACAGATAAACAAGAACATAGTGACTCATTCC
     PheTyrProTrpThrThrGluArgAlaLeuGlnIleAsnLysAsnIleValThrHisSer

2821 TTCCTGGTGATACCTGAGTGTCCTGCTCCCCTCTTGGGGCGCGATCTGCTAACCAAACTA
     PheLeuValIleProGluCysProAlaProLeuLeuGlyArgAspLeuLeuThrLysLeu


2881 AAGGCTCAAGTCCAATTTACTTCAGAAGGCCCACAAGTAAGCTGGGGAAAAGCCCCCGTT
     LysAlaGlnValGlnPheThrSerGluGlyProGlnValSerTrpGlyLysAlaProVal

2941 GCCTGCCTTGTCCTCAACACAGAGGAAGAATATCGGTTGCATGAAGAGCAACCCAAAAAT
     AlaCysLeuValLeuAsnThrGluGluGluTyrArgLeuHisGluGluGlnProLysAsn

3001 GCAGTCTCTTCAGGCTGGCTAACTGCGTTCCCCAATGTCTGGGCAGAACAAGCAGGAATG
     AlaValSerSerGlyTrpLeuThrAlaPheProAsnValTrpAlaGluGlnAlaGlyMet

3061 GGGTTGGCTAAACAAGTGCCTCCGGTTGTGGTAGAACTTAAAGCTGATGCCACCCCCATC
     GlyLeuAlaLysGlnValProProValValValGluLeuLysAlaAspAlaThrProIle

3121 TCGGTAAGACAATACCCCATGAGCAAGGAAGCTAGGGAGGGCATCCGGCCTCATATCCAG
     SerValArgGlnTyrProMetSerLysGluAlaArgGluGlyIleArgProHisIleGln

3181 AGGTTGCTAGACCAAGGAGTTTTAGTGGCCTGTCAGTCCCCCTGGAATACACCACTTCTG
     ArgLeuLeuAspGlnGlyValLeuValAlaCysGlnSerProTrpAsnThrProLeuLeu

3241 CCGGTTCGAAAACCAGGGACCAATGACTATCGCCCAGTGCAAGACCTCCGGGAAGTTAAC
     ProValArgLysProGlyThrAsnAspTyrArgProValGlnAspLeuArgGluValAsn

3301 AAAAGGGTCCTGGACATTCACCCCACAGTCCCGAACCCATACAATTTATTAAGCTCTCTC
     LysArgValLeuAspIleHisProThrValProAsnProTyrAsnLeuLeuSerSerLeu

3361 CCACCTGAGAGAACATGGTATACAGTCTTGGACTTAAAAGATGCCTTCTTTTGCCTGCGC
     ProProGluArgThrTrpTyrThrValLeuAspLeuLysAspAlaPhePheCysLeuArg

3421 TTGCACCCTAAGAGTCAGCTCCTGTTTGCCTTTGAATGGAGGGACCCAGAGGGCGGACAG
     LeuHisProLysSerGlnLeuLeuPheAlaPheGluTrpArgAspProGluGlyGlyGln

3481 ACTGGTCAACTAACCTGGACTAGGCTACCACAGGGGTTCAAAAATTCCCCCACCCTGTTT
     ThrGlyGlnLeuThrTrpThrArgLeuProGlnGlyPheLysAsnSerProThrLeuPhe

3541 GACGAGGCCCTCCATCGGGATCTCGCGCCTTTTCGTGCTCGAAACCCTCAGCTTACCCTA
     AspGluAlaLeuHisArgAspLeuAlaProPheArgAlaArgAsnProGlnLeuThrLeu

3601 CTACAGTATGTGGATGATCTCTTGGTCGCGGCGGCCTCGAAGGAGCTGTGTCACCAGGGA
     LeuGlnTyrValAspAspLeuLeuValAlaAlaAlaSerLysGluLeuCysHisGlnGly

3661 ACTGAGAGGCTCCTTGCAGAACTGAGTGACTTGGGGTATCGAGTTTCGGCTAAGAAGGCA
     ThrGluArgLeuLeuAlaGluLeuSerAspLeuGlyTyrArgValSerAlaLysLysAla

3721 CAAATTTGTCAAACTGAGGTAACCTACCTGGGGTATACCCTCCGAGGGGGCAAAAGATGG
     GlnIleCysGlnThrGluValThrTyrLeuGlyTyrThrLeuArgGlyGlyLysArgTrp

3781 CTCACAGAGGCCCGGAAGAAGACTGTTATGATGATCCCATCGCCAACTACCCCACGGCAG
     LeuThrGluAlaArgLysLysThrValMetMetIleProSerProThrThrProArgGln

3841 GTACGTGAGTTTCTGGGGACTGCTGGCTTTTGTAGACTCTGGATTCCAGGCTTTGCAACC
     ValArgGluPheLeuGlyThrAlaGlyPheCysArgLeuTrpIleProGlyPheAlaThr

3901 CTAGCAGCACCTCTATATCCTTTGACTAAGGAAGGGTTTCCTTTTGAGTGGAAAGAAGAG
     LeuAlaAlaProLeuTyrProLeuThrLysGluGlyPheProPheGluTrpLysGluGlu

3961 CACCAAAGAGCTTTTGAGGCTATCAAGTCGTCTCTAATGACTGCCCCCGCGCTAGCATTA
     HisGlnArgAlaPheGluAlaIleLysSerSerLeuMetThrAlaProAlaLeuAlaLeu

4021 CCAGACTTGACTAAGCCTTTCGTCCTATATGTGGACGAGAGAGCGGGTGTAGCCAGGGGA
     ProAspLeuThrLysProPheValLeuTyrValAspGluArgAlaGlyValAlaArgGly

4081 GTGTTGACACAAGCACTGGGACCCTGGAAGAGACCTGTAGCCTATTTGTCAAAGAAATTA
     ValLeuThrGlnAlaLeuGlyProTrpLysArgProValAlaTyrLeuSerLysLysLeu

4141 GATCCCGTTGCTAGTGGATGGCCCACATGCCTGAAAGCTATTGCGGCAATGGCCCTGCTG
     AspProValAlaSerGlyTrpProThrCysLeuLysAlaIleAlaAlaMetAlaLeuLeu

4201 ATCAAAGATGCTGACAAATTGACAATGGGACAACAGGTGACTGTTGTGGCCCCTCATGCC
     IleLysAspAlaAspLysLeuThrMetGlyGlnGlnValThrValValAlaProHisAla

4261 TTGGAAAGTATCGTGCGGCAGCCACCTGACAGATGGATGACAAATGCCCGAATGACACAC
     LeuGluSerIleValArgGlnProProAspArgTrpMetThrAsnAlaArgMetThrHis

4321 TATCAGAGCTTGCTGCTAAATGAGCGTGTAACCTTTGCGCCCCCTGCCATCCTCAACCCA
     TyrGlnSerLeuLeuLeuAsnGluArgValThrPheAlaProProAlaIleLeuAsnPro

4381 GCTACCCTTCTCCCTCTAACAAATGATTCCGTCCCAGTACATCAATGTACAGACATCCTC
     AlaThrLeuLeuProLeuThrAsnAspSerValProValHisGlnCysThrAspIleLeu

4441 GCTGAAGAGACTGGGACCAGAAGAGACCTGACTGACCAACCCTGGCCTGGAGCTCCCAGT
     AlaGluGluThrGlyThrArgArgAspLeuThrAspGlnProTrpProGlyAlaProSer

4501 TGGTATACGGATGGCAGCAGTTTCCTGATAGAGGGGAAGCGAAAGGCTGGAGCTGCGGTG
     TrpTyrThrAspGlySerSerPheLeuIleGluGlyLysArgLysAlaGlyAlaAlaVal

4561 GTGGACGGGAAAAAGGTAATTTGGGCAAGCGCTTTGCCTGAAGGAACGTCGGCACAAAAG
     ValAspGlyLysLysValIleTrpAlaSerAlaLeuProGluGlyThrSerAlaGlnLys

4621 GCTGAACTTATAGCACTTATACAAGCCCTCCGAGAGGCTAAAGGTAAGATCGTTAACATC
     AlaGluLeuIleAlaLeuIleGlnAlaLeuArgGluAlaLysGlyLysIleValAsnIle

4681 TACACTGACAGCCGCTATGCTTTTGCTACCGCACACATCCATGGGGCCATCTACAGGCAG
     TyrThrAspSerArgTyrAlaPheAlaThrAlaHisIleHisGlyAlaIleTyrArgGln

4741 CGAGGGCTATTGACTTCGGCTGGTAAAGACATTAAAAACAAAGAAGAAATTCTGGCCCTG
     ArgGlyLeuLeuThrSerAlaGlyLysAspIleLysAsnLysGluGluIleLeuAlaLeu

4801 TTGGAAGCCATACATGCACCTAAGAAGGTAGCCATCATCCACTGCCCCGGCCACCAAAGA
     LeuGluAlaIleHisAlaProLysLysValAlaIleIleHisCysProGlyHisGlnArg

4861 GGAGAAGACTTGGTGGCCAAGGGCAACCGAATGGCAGACTCAGTGGCAAAACAAGTTGCT
     GlyGluAspLeuValAlaLysGlyAsnArgMetAlaAspSerValAlaLysGlnValAla

4921 CAAGGGGCCATGATCTTAACTGAAAAAGGTGATCCACCCAAAAGCCCTGAGGATGAGAGG
     GlnGlyAlaMetIleLeuThrGluLysGlyAspProProLysSerProGluAspGluArg

4981 TATAACATAAAAGAGCTATTGTGGACCAGTGATCCCCTCCCATACTTTTTTGAAGGGAAA
     TyrAsnIleLysGluLeuLeuTrpThrSerAspProLeuProTyrPhePheGluGlyLys

5041 ATAGAATTGACTCCCGAAGAAGGAATAAAATTTGTGAAAGGACTACACCAATTCACCCAC
     IleGluLeuThrProGluGluGlyIleLysPheValLysGlyLeuHisGlnPheThrHis

5101 CTGGGAGTTGAAAAAATGATGAGACTAATTAAGAATTCCCGATACCAAGTCCCCAACCTG
     LeuGlyValGluLysMetMetArgLeuIleLysAsnSerArgTyrGlnValProAsnLeu

5161 AAGTCAGTGGCTCAAAAGATTATAGACTCCTGCAAACCATGTGCATTCACTAATGCGACT
     LysSerValAlaGlnLysIleIleAspSerCysLysProCysAlaPheThrAsnAlaThr

5221 AAAGCCTACAAAGAACCTGGAAAGAGACAACGGGGAGACCGTCCTGGAGTGTATTGGGAG
     LysAlaTyrLysGluProGlyLysArgGlnArgGlyAspArgProGlyValTyrTrpGlu

5281 GTAGATTTTACTGAAGTTAAACCTGGAATGTATGGTAACAAGTATCTGTTAGTATTTGTA
     ValAspPheThrGluValLysProGlyMetTyrGlyAsnLysTyrLeuLeuValPheVal

5341 GACACTTTTTCAGGATGGGTTGAGGCGTTTCCCACTAAAACTGAGACTGCCCAGATTGTG
     AspThrPheSerGlyTrpValGluAlaPheProThrLysThrGluThrAlaGlnIleVal

5401 GCCAAGAAGATCCTTGAAGAAATCCTGCCAAGATTTGGAATCCCTAAGGTAATCGGGTCC
     AlaLysLysIleLeuGluGluIleLeuProArgPheGlyIleProLysValIleGlySer

5461 GATAATGGACCAGCCTTTGTTGCCCAGGTAAGTCAGGGCTTGGCCACTCAGTTGGGCATC
     AspAsnGlyProAlaPheValAlaGlnValSerGlnGlyLeuAlaThrGlnLeuGlyIle

5521 GATTGGAAATTACACTGTGCTTACCGCCCTCAAAGCTCAGGACAGGTAGAGAGGATGAAT
     AspTrpLysLeuHisCysAlaTyrArgProGlnSerSerGlyGlnValGluArgMetAsn

5581 AGGACCTTAAAAGAGACCTTGACTAAATTAGCCATTGAGACCGGCGGGAAAGACTGGGTG
     ArgThrLeuLysGluThrLeuThrLysLeuAlaIleGluThrGlyGlyLysAspTrpVal

5641 GCTCTCCTTCCTCTTGCGCTCTTCCGAGCCCGAAACACCCCTGGACGTTTCGGGCTCACT
     AlaLeuLeuProLeuAlaLeuPheArgAlaArgAsnThrProGlyArgPheGlyLeuThr

5701 CCTTTTGAAGTTCTGTATGGAGGACCTCCCCCTTTAATGGAAGCTGGTGGAACATTGGTT
     ProPheGluValLeuTyrGlyGlyProProProLeuMetGluAlaGlyGlyThrLeuVal

                               splice acceptor site |
5761 TCCGGCTCTGACCCTGTCTTACCCTCCTCTTTGCTTATTCATTTAAAGGCCCTAGAAGTG
     SerGlySerAspProValLeuProSerSerLeuLeuIleHisLeuLysAlaLeuGluVal

5821 ATTAGGACCCAGATTTGGGACCAACTGAAGGCAGCCTATACCCCAGGGACCACCGCAGTA
     IleArgThrGlnIleTrpAspGlnLeuLysAlaAlaTyrThrProGlyThrThrAlaVal

5881 CCCCACGGGTTCCGAGTTGGAGATAAAGTCTTGGTCAGACGGCATCGAACCGGCAGCCTC
     ProHisGlyPheArgValGlyAspLysValLeuValArgArgHisArgThrGlySerLeu

5941 GAGCCACAGTGGAAGGGACCCTATTTGGTGTTACTGACAACCCCTACTGCGGTAAAAGTC
     GluProGlnTrpLysGlyProTyrLeuValLeuLeuThrThrProThrAlaValLysVal

                                                            |env->
6001 GACGGGATTGCCTCCTGGATCCACGCCTCCCACGTCAAGAGGGCCGCAAGTCAAGATGAA
     AspGlyIleAlaSerTrpIleHisAlaSerHisValLysArgAlaAlaSerGlnAspGlu
                                                            MetLys

6061 GAAAACCATGAAGACAATTGGACAGTGGCAGCCACTGACAATCCTCTTAAGCTTCGTTTG
     GluAsnHisGluAspAsnTrpThrValAlaAlaThrAspAsnProLeuLysLeuArgLeu
      LysThrMetLysThrIleGlyGlnTrpGlnProLeuThrIleLeuLeuSerPheValCys

                                <-pol|
6121 TGCCGCAGGCGCCACCCTGAGCCTAGGGAACCATAACCCTCATGCTCCAATTCAACAGTC
     CysArgArgArgHisProGluProArgGluProEnd
      AlaAlaGlyAlaThrLeuSerLeuGlyAsnHisAsnProHisAlaProIleGlnGlnSer

6181 TTGGGAAGTGCTTAATGAGGAGGGAAACATTGTGTGGGCAACCACTGCAGTCCATCCCCT
      TrpGluValLeuAsnGluGluGlyAsnIleValTrpAlaThrThrAlaValHisProLeu

6241 CTGGACTTGGTGGCCTGATCTCACACCTGACATCTGTAAGTTAGTGGCAGGATCCACCAA
      TrpThrTrpTrpProAspLeuThrProAspIleCysLysLeuValAlaGlySerThrLys

6301 ATGGGACCTCCCTGATCATACCGATCTTAGTAACCCACCCCCTGAAGAGCGGTGTGTCCC
      TrpAspLeuProAspHisThrAspLeuSerAsnProProProGluGluArgCysValPro

6361 AAACGGGATAGGGAGCACATATTGGTGTTCGGGGCAGTTTTACCGAGCTAATCTTAGAGC
      AsnGlyIleGlySerThrTyrTrpCysSerGlyGlnPheTyrArgAlaAsnLeuArgAla

6421 TGCACAATTTTATGTTTGCCCTGGTCAGGGTCAGAGCAAAAGGCTTCAACGAGAATGTGG
      AlaGlnPheTyrValCysProGlyGlnGlyGlnSerLysArgLeuGlnArgGluCysGly

6481 AGGGGCATCAGATTACTTTTGTGGTAAATGGACATGTGAAACGACAGGGGAAGCTTACTG
      GlyAlaSerAspTyrPheCysGlyLysTrpThrCysGluThrThrGlyGluAlaTyrTrp

6541 GAAGCCCTCCTCTGACTGGGACCTAATCACGGTAAAACGAGGAAGTGGCTATGATAGGTC
      LysProSerSerAspTrpAspLeuIleThrValLysArgGlySerGlyTyrAspArgSer

6601 AAACGAAGGAGAAAGAAACCCCTATAAATATCCAGAGAATGGGTGCGCTTTTAAAAACAG
      AsnGluGlyGluArgAsnProTyrLysTyrProGluAsnGlyCysAlaPheLysAsnSer

6661 CCCCCCAGGACCATGCAAAGGTAAATACTGCAACCCCCTACTTATAAAGTTCACCGAGAA
      ProProGlyProCysLysGlyLysTyrCysAsnProLeuLeuIleLysPheThrGluLys

6721 AGGGAAACAACACCGTCTAAGTTGGCTTAAAGGAAATAGGTGGGGTTGGCGAGTATACCT
      GlyLysGlnHisArgLeuSerTrpLeuLysGlyAsnArgTrpGlyTrpArgValTyrLeu

6781 TCCACTAAGAGATCCTGGGTTCATTTTCACGATCAGGCTGACAGTGAGAGACCTGGCGGT
      ProLeuArgAspProGlyPheIlePheThrIleArgLeuThrValArgAspLeuAlaVal

6841 GACACCTGTTGGGCCCAACAAGGTCCTTATAGAACAGGGCCCCCCAGTCGTACCGGCTCC
      ThrProValGlyProAsnLysValLeuIleGluGlnGlyProProValValProAlaPro

6901 CCCAAAGGTCCCAGCCGTACCAGCTCCACCAACTCCACAGCCCAACATAGTGGTACCCTC
      ProLysValProAlaValProAlaProProThrProGlnProAsnIleValValProSer

6961 CCTAGGGACTAATACTCCCCTCATAAAGCCTACCTTGGCTTCCCCACCGCCCCTAGGTAC
      LeuGlyThrAsnThrProLeuIleLysProThrLeuAlaSerProProProLeuGlyThr

7021 AGAGGACCGTCTGGTCAGTCTACTCCAGGGAGCTTTTTTAGCTTTAAATAGAACTAACCC
      GluAspArgLeuValSerLeuLeuGlnGlyAlaPheLeuAlaLeuAsnArgThrAsnPro

7081 TAATATGACTCAATCATGCTGGTTATGCTATACCTCTAGCCCCCCTTATTATGAAGGAAT
      AsnMetThrGlnSerCysTrpLeuCysTyrThrSerSerProProTyrTyrGluGlyIle

7141 AGCTCAGATCAGGACTTATAATATTACTTCAGATCATTCTCAATGTCTTTGGGGAGAAAA
      AlaGlnIleArgThrTyrAsnIleThrSerAspHisSerGlnCysLeuTrpGlyGluAsn

7201 CAGAAAGTTGACTCTGGCAGCAGTTTCAGGAAGAGGGCTTTGTTTGGGCCAGGTACCTCA
      ArgLysLeuThrLeuAlaAlaValSerGlyArgGlyLeuCysLeuGlyGlnValProGln

7261 GGATAAAGGGCACCTCTGTAATCAGACCCAGAACATCCAGTCTAGCAAAAGTGGTCAGTA
      AspLysGlyHisLeuCysAsnGlnThrGlnAsnIleGlnSerSerLysSerGlyGlnTyr

7321 TCTAGTGCCCCCCTTAGACACAGTATGGGCTTGCAATACCGGTCTCACTCCTTGTGTGTC
      LeuValProProLeuAspThrValTrpAlaCysAsnThrGlyLeuThrProCysValSer

7381 TATGTCTGTTTTTAATAGTTCCAAAGATTTCTGCATTTTGGTTCAGCTTATTCCTAGACT
      MetSerValPheAsnSerSerLysAspPheCysIleLeuValGlnLeuIleProArgLeu

7441 CCTGTATCATGATGATAGCTCCTTTTTAGACAAATTTGAGCGTTGGGTCCGCTGGAGAAG
      LeuTyrHisAspAspSerSerPheLeuAspLysPheGluArgTrpValArgTrpArgArg

7501 AGAGCCCGTTACCCTAACTTTGGCAGTTCTATTAGGATTAGGAGTAGCGGCTGGAGTAGG
      GluProValThrLeuThrLeuAlaValLeuLeuGlyLeuGlyValAlaAlaGlyValGly

7561 TACAGGAACCGCTGCCTTAATTAAGACCCCCCAATACTATGAAGAACTACGTGCAGCTAT
      ThrGlyThrAlaAlaLeuIleLysThrProGlnTyrTyrGluGluLeuArgAlaAlaMet

7621 GGATGTTGATCTTAGAACTATAGAACAGTCTATAACCAAATTAGAAGAATCTTTAACTTC
      AspValAspLeuArgThrIleGluGlnSerIleThrLysLeuGluGluSerLeuThrSer

7681 CCTGTCCGAAGTGGTGCTACAGAAAGGAAGGGGATTAGACTTATTATTCCTTAAAGAAGG
      LeuSerGluValValLeuGlnLysGlyArgGlyLeuAspLeuLeuPheLeuLysGluGly

7741 AGGACTCTGTGCTGCCCTAAAAGAAGAATGTTGTTTTTATGTTGACCATTCAGGAGTAAT
      GlyLeuCysAlaAlaLeuLysGluGluCysCysPheTyrValAspHisSerGlyValIle

7801 CAAAGATTCTATGGCCAAACTTAGAGAACGCCTAGATATACGTAAAAGAGAAAGAGAAAG
      LysAspSerMetAlaLysLeuArgGluArgLeuAspIleArgLysArgGluArgGluSer

7861 CCAACAAGGATGGTTCGAAAGCTGGTTTAATAAGTCCCCTTGGCTCACCACTCTCCTCTC
      GlnGlnGlyTrpPheGluSerTrpPheAsnLysSerProTrpLeuThrThrLeuLeuSer

7921 CACCATAGCAGGACCTTTGATTACTCTTATGCTTTTGCTTACTTTTGGCCCCTGCATCCT
      ThrIleAlaGlyProLeuIleThrLeuMetLeuLeuLeuThrPheGlyProCysIleLeu

7981 TAATAAGTTAGTAGCTTTTATTAGAGAAAGGATAAATGCAGTACAGGTTATGGTACTAAG
      AsnLysLeuValAlaPheIleArgGluArgIleAsnAlaValGlnValMetValLeuArg

                                       <-env|
8041 GCAACAATATCGGGTCCTTCAGGAGGTTGAAAACTCGCTCTAAGATTAGAGCTATTTCCT
      GlnGlnTyrArgValLeuGlnGluValGluAsnSerLeuEnd

      PPT        |U3->
8101 AAAAAGAGTGGGGAATGAAGAATAAAAGTTACTGAACTCTTCCTCACCCCAGAACTCGAC

8161 CCCTTCCATCTAGAGAGTGTTCCCAGAACACTCCTGAACTCTTCACCCTAGAATGCATTC

8221 CTGAACTCCTCACCCTAGAGTTCGAACCCTCCCAACTAAAGACTGTTCCAAGAACATTTT

8281 TGAGATAAGGGCCTCCTGGAACAACCTCAGAATAAACCGGGTACATTGCCAAATAATAGG

8341 ACATGACCCCTTAGTTACGTAGAATCCCTTGGCAGAACCCCTTGTCCCTTGGCAGAACCC

                                                         promoter
8401 CTTAGTTATGTAAACTTGTACTTTCCCTACCCCGCTCTCCCGCCTTGAGTTTTCCTATAT

                <-U3||R->
8461 AAGCCTGTGAAAAATTTGGCTGGTCGTCGATTCTCCTCTACACCACTAGGTGTATGAGTT

          inverse palindrome                     polyA signal
8521 TCGACCCCAGAGCTCTGGTCTATGTGCTTTCATGCTGCTGCTTTATTAAATCTTGCCTTC
       pol motif

  <-R||U5->
8581 AACATTTTGAGTTCGGTCTCAGTGTCTTCTTGGGTCCGCGGCTGTCCCGAGGCTTGAGTG

                         <-U5|
8641 AGGGTCTCCCTTCGGGGGTCTTTCA
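
Two of the features named in the legend can be checked directly against the rows above. The following is a minimal sketch, not the authors' method. The three sequence rows are copied from this figure. The repeat coordinates (827-861 and 865-899) are an assumption inferred from where the Rpt1 and Rpt2 brackets sit above rows 781 and 841, and all function and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq: str, first_pos: int, size: int = 12):
    """List (position, window) for windows equal to their own reverse complement."""
    hits = []
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        if window == reverse_complement(window):
            hits.append((first_pos + i, window))
    return hits


def slice_by_position(seq: str, first_pos: int, start: int, end: int) -> str:
    """Extract start..end (1-based, inclusive) from a row that begins at first_pos."""
    return seq[start - first_pos:end - first_pos + 1]


# Rows copied from the figure (5' LTR row 361; leader rows 781 and 841).
ROW_361 = "TCTACACCACTAGGTGTATGAGTTTCGACCCCAGAGCTCTGGTCTATGTGCTTTCATGCT"
ROW_781 = "CAGGGACGCCTGGTGGACCCCTTTGGAGGCCAAGAGACCATCTGGGGTTGCGAGATCGTG"
ROW_841 = "GGTTCGAGTCCCACCTCGTGCCTTGTTGCGAGATCGTGGGTTCGAGTCCCACCTCGTGCA"

palindrome_hits = find_palindromes(ROW_361, first_pos=361)

leader = ROW_781 + ROW_841
# Assumed coordinates, inferred from the bracket positions in the figure.
rpt1 = slice_by_position(leader, 781, 827, 861)
rpt2 = slice_by_position(leader, 781, 865, 899)

print(palindrome_hits)                  # [(391, 'CCAGAGCTCTGG')]
print(len(rpt1), len(rpt2), rpt1 == rpt2)  # 35 35 True
```

Under these assumptions, row 361 contains a single 12 bp window that equals its own reverse complement, and the two bracketed repeats are identical 35 bp units separated by 3 bp.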