Nucleotide sequence of the proviral genome of a novel type-C retrovirus (MmRV), taken from positions 112341 to 121005 of Mus musculus PAC clone 657p21 (Accession: AC005743). Total length: 8665 bp. This sequence has been deposited in GenBank (Accession number: XXXXXXX). The 9 bp imperfect repeats that define the LTRs are marked in green; note that 2 bp are deleted from the terminal copies of these repeats upon insertion. The CAT box is not clearly identifiable (see Ref 5), though a candidate sequence appears upstream of the TATA box on the opposite strand. The repeat region spanning U3 and R is marked in italics, and the following features are identified: terminal 9 bp repeats (blue), pol-like motif (bold), 12 bp inverse palindrome (underlined), polyadenylation signal (underlined). Upstream of the gag ATG start codon, a CTG start codon marks the beginning of a 99 bp in-frame extension that encodes the N terminus of glycosylated Gag. The untranslated region between U5 and the start of glycosylated Gag contains a variable number (two to six) of perfect 35 bp repeats.
|U3-> 1 TGAAGAATAAAAGTTACTGAACTCTTCCTCACCCCAGAACTCGTTCCCAGAACACTCCTG 61 AACTCTTCACCCTAGAATGCATTCCTGAACTCCTCACCCTAGAGTTCGAACCCTCCCAAC 121 TAAAGACTGTTCCAAGAACATTTTTGAGATAAGGGCCTCCTGGAACAACCTCAGAATGAA 181 CCGGGTACATTGCCAAATAATAGGACATGACCCCTTAGTTACGTAGAATCCCTTGGCAGA 241 ACCCCTTGTCCCTTGGCAGAACCCCTTAGTTATGTAAACTTGTACTTTCCCTACCCCGCT promoter <-U3||R-> 301 CTCCCCCCTTGAGTTTTCCTATATAAGCCTGTGAAAAATTTGGCTGGTCGTCGATTCTCC inverse palindrome repeat region 361 TCTACACCACTAGGTGTATGAGTTTCGACCCCAGAGCTCTGGTCTATGTGCTTTCATGCT pol motif polyA signal <-R||U5-> 421 GCTGCTTTATTAAATCTTGCCTTCAACATTTTGAGTTCGGTCTCAGTGTCTTCTTGGGTC <-U5||pbs-> 481 CGCGGCTGTCCCGAGGCTTGAGTGAGGGTCTCCCTTCGGGGGTCTTTCATTTGGGGGCTC primer binding site splice donor site 541 GTCCGGGATCAGTGCGACCACCCAGAGACCCTAGACCCACTTTAAGGTAAGATTCTTTGA 601 CCTGTCTTGGTTTGGTGTCTCTGTTCTGTTTCTAAGTTTGGTGCGATCGCAGTTTCGGTT 661 TTGCGGACGCTCAGTGAGACTGCGCTCCGAGAGGGAACGCGGGGTGGATAAGGATAGACG 721 TGTCCAGGTGTCCGCCGTCCGTTCGCCCTGGGAGACGTCCCAGGAGGAACAGGGGAGGAC [ Rpt1 781 CAGGGACGCCTGGTGGACCCCTTTGGAGGCCAAGAGACCATCTGGGGTTGCGAGATCGTG ] [ Rpt2 ] 841 GGTTCGAGTCCCACCTCGTGCCTTGTTGCGAGATCGTGGGTTCGAGTCCCACCTCGTGCA |glyco-gag-> 901 GAGGGTCTCAATCGGCCGGCCTTAGAAAAGCCATCTGATTCTCTGAGTTGCTTGTGGTCG LeuSerCysLeuTrpSer 961 ACGCGAAGTCGCCGCCGCTTTTGGTTTCTTTTTTGTCTTAGTCTCGTGTTCGCTCTTGTT ThrArgSerArgArgArgPheTrpPheLeuPheCysLeuSerLeuValPheAlaLeuVal |gag-> 1021 GTGTCTACTATTGTTCTGGAAATGGGACAATCTGTGTCCACTCCCCTTTCTCTAACTCTG ValSerThrIleValLeuGluMetGlyGlnSerValSerThrProLeuSerLeuThrLeu 1081 GAGCATTGGAAGGAGGTGCGGGTCAGAGCTCACAACCAGTCGGTGGAGGTCAGAAAGGGT GluHisTrpLysGluValArgValArgAlaHisAsnGlnSerValGluValArgLysGly 1141 CCGTGGCAGACCTTTTGCACCTCCGAGTGGCCGACGTTTGGAGTGGGCTGGCCACCAGAA ProTrpGlnThrPheCysThrSerGluTrpProThrPheGlyValGlyTrpProProGlu 1201 GGTGCTTTTGACTTATCACTAATCGCCGCCGTCAGGCGAATTGTTTTTCAGGAGGAAGGG GlyAlaPheAspLeuSerLeuIleAlaAlaValArgArgIleValPheGlnGluGluGly 1261 GGTCACCCTGATCAGATCCCCTACATTGTGACCTGGCAGAATCTCGTCCAATTCCCACCT GlyHisProAspGlnIleProTyrIleValThrTrpGlnAsnLeuValGlnPheProPro 
1321 CCGTGGGTCAAGCCTTGGACCCCAAATTCTTCGAAACTGACGGTCGCGGTTGCCCAGTCT ProTrpValLysProTrpThrProAsnSerSerLysLeuThrValAlaValAlaGlnSer 1381 GATGCAGCCGGAAAGTCCAGCCCGTCAGCTCCCCCCAAGATTTATCCAGAGATTGACGAC AspAlaAlaGlyLysSerSerProSerAlaProProLysIleTyrProGluIleAspAsp 1441 CTCCTCTGGATGGACTCCCAACCTCCCCCTTATCCCCTGCCCCAGCAGCCACCTGCAGCC LeuLeuTrpMetAspSerGlnProProProTyrProLeuProGlnGlnProProAlaAla 1501 GCCCCACCACAGGGACCAATAGCGAGAGGGGCTCAGGGACCGGCGGGGGGGACTCGGAGC AlaProProGlnGlyProIleAlaArgGlyAlaGlnGlyProAlaGlyGlyThrArgSer 1561 CGACGAGGCCGAAGCCCCGGGGAGGAAGGGGGGCCGGATTCAACAGTTGCCTTACCACTT ArgArgGlyArgSerProGlyGluGluGlyGlyProAspSerThrValAlaLeuProLeu 1621 AGAGCACATGTGGGAGGGCCAGCGCCAGGACCCAATGATCTCATTCCTTTACAGTACTGG ArgAlaHisValGlyGlyProAlaProGlyProAsnAspLeuIleProLeuGlnTyrTrp 1681 TCTTTTTCCTCTTCTGATTTATATAATTGGAAAACTAACCACCCTCCTTTCTCAGAGAAC SerPheSerSerSerAspLeuTyrAsnTrpLysThrAsnHisProProPheSerGluAsn 1741 CCCTCTGGGCTTACTGGGCTCCTTGAGTCACTTATGTTCTCCCATCAACCCACTTGGGAT ProSerGlyLeuThrGlyLeuLeuGluSerLeuMetPheSerHisGlnProThrTrpAsp 1801 GATTGTCAGCAGCTTTTGCAGGTTCTTTTTACCACAGAGGAAAGAGAAAGAATCCTGATG AspCysGlnGlnLeuLeuGlnValLeuPheThrThrGluGluArgGluArgIleLeuMet 1861 GAGGCGAGAAAAAATGTTCTGGGAGAGGACGGCACACCCACTGCCCTCCCTAACCTCGTG GluAlaArgLysAsnValLeuGlyGluAspGlyThrProThrAlaLeuProAsnLeuVal 1921 GACGAGGCTTTCCCCTTGAACCGCCCCAACTGGGACTACAACACCGCAGAAGGTAGGGGA AspGluAlaPheProLeuAsnArgProAsnTrpAspTyrAsnThrAlaGluGlyArgGly 1981 CGCCTCCTTGTCTATCGCCAGACTCTAGTGGCAGGTCTCAGAGGAGCCGCTAGACGGCCC ArgLeuLeuValTyrArgGlnThrLeuValAlaGlyLeuArgGlyAlaAlaArgArgPro 2041 ACCAATTTGGCTAAGGTAAGAGAGGTCTTGCAGGGGCAGACTGAACCACCCTCAGTCTTC ThrAsnLeuAlaLysValArgGluValLeuGlnGlyGlnThrGluProProSerValPhe 2101 CTTGAGCGTCTAATGGAGGCATATAGGAGGTACACCCCTTTTGACCCCTTGTCAGAGGGG LeuGluArgLeuMetGluAlaTyrArgArgTyrThrProPheAspProLeuSerGluGly 2161 CAGAGAGCCGCTGTAGCCATGGCCTTCATTGGTCAGTCCGTTCCCGACATTAAGAAAAAG GlnArgAlaAlaValAlaMetAlaPheIleGlyGlnSerValProAspIleLysLysLys 2221 CTGCAAAGGCTGGAGGGGCTCCAAGATCATACGCTCCAAGATTTAGTAAAAGAAGCAGAG 
LeuGlnArgLeuGluGlyLeuGlnAspHisThrLeuGlnAspLeuValLysGluAlaGlu 2281 AAAGTCTATCATAAGAGGGAAACAGAAGAAGAGAGGCAGGAGAGAGAGAAGAAAGAAATG LysValTyrHisLysArgGluThrGluGluGluArgGlnGluArgGluLysLysGluMet 2341 GAGGAGAGGGAAAATAGACGGGGATTTCAGGAGAGAAATTTGAGTAAAATTTTGGCCGCA GluGluArgGluAsnArgArgGlyPheGlnGluArgAsnLeuSerLysIleLeuAlaAla 2401 GTTGTAAATGATAGACAGTCAGGAAAAGGTAAAATAGGGCTCCTGGGCAACAGGGCAGTG ValValAsnAspArgGlnSerGlyLysGlyLysIleGlyLeuLeuGlyAsnArgAlaVal 2461 AAACCGCCAGGTGGCAGAAAGATACCACTGGAAAAAGACCAATGCACCTATTGCAAAGAG LysProProGlyGlyArgLysIleProLeuGluLysAspGlnCysThrTyrCysLysGlu 2521 AAAGGACACTGGGCTAGAGATTGCCCTAAAAAACGGGAGCGATCCAAGGTCCTGACCCTA LysGlyHisTrpAlaArgAspCysProLysLysArgGluArgSerLysValLeuThrLeu <-gag | pro-pol-> 2581 GAAGATGATTAGGGAAGTCGGGGCTCAGACCCCCTCCCTGAGCCTAGGGTAACTTTGTCC GluAspAspEndGlySerArgGlySerAspProLeuProGluProArgValThrLeuSer 2641 GTGGAGGGGACTCCCGTCAACTTCCTGATAGACACCGGAGCAGAGCATTCAGTACTCACT ValGluGlyThrProValAsnPheLeuIleAspThrGlyAlaGluHisSerValLeuThr 2701 AGCCCCCTAGGCAAGCTAGGCTCTAAAAAGACCATGGTGATTGGAGCCACTGGTAGTAAA SerProLeuGlyLysLeuGlySerLysLysThrMetValIleGlyAlaThrGlySerLys 2761 TTTTACCCCTGGACGACCGAACGAGCCCTACAGATAAACAAGAACATAGTGACTCATTCC PheTyrProTrpThrThrGluArgAlaLeuGlnIleAsnLysAsnIleValThrHisSer 2821 TTCCTGGTGATACCTGAGTGTCCTGCTCCCCTCTTGGGGCGCGATCTGCTAACCAAACTA PheLeuValIleProGluCysProAlaProLeuLeuGlyArgAspLeuLeuThrLysLeu 2881 AAGGCTCAAGTCCAATTTACTTCAGAAGGCCCACAAGTAAGCTGGGGAAAAGCCCCCGTT LysAlaGlnValGlnPheThrSerGluGlyProGlnValSerTrpGlyLysAlaProVal 2941 GCCTGCCTTGTCCTCAACACAGAGGAAGAATATCGGTTGCATGAAGAGCAACCCAAAAAT AlaCysLeuValLeuAsnThrGluGluGluTyrArgLeuHisGluGluGlnProLysAsn 3001 GCAGTCTCTTCAGGCTGGCTAACTGCGTTCCCCAATGTCTGGGCAGAACAAGCAGGAATG AlaValSerSerGlyTrpLeuThrAlaPheProAsnValTrpAlaGluGlnAlaGlyMet 3061 GGGTTGGCTAAACAAGTGCCTCCGGTTGTGGTAGAACTTAAAGCTGATGCCACCCCCATC GlyLeuAlaLysGlnValProProValValValGluLeuLysAlaAspAlaThrProIle 3121 TCGGTAAGACAATACCCCATGAGCAAGGAAGCTAGGGAGGGCATCCGGCCTCATATCCAG SerValArgGlnTyrProMetSerLysGluAlaArgGluGlyIleArgProHisIleGln 3181 
AGGTTGCTAGACCAAGGAGTTTTAGTGGCCTGTCAGTCCCCCTGGAATACACCACTTCTG ArgLeuLeuAspGlnGlyValLeuValAlaCysGlnSerProTrpAsnThrProLeuLeu 3241 CCGGTTCGAAAACCAGGGACCAATGACTATCGCCCAGTGCAAGACCTCCGGGAAGTTAAC ProValArgLysProGlyThrAsnAspTyrArgProValGlnAspLeuArgGluValAsn 3301 AAAAGGGTCCTGGACATTCACCCCACAGTCCCGAACCCATACAATTTATTAAGCTCTCTC LysArgValLeuAspIleHisProThrValProAsnProTyrAsnLeuLeuSerSerLeu 3361 CCACCTGAGAGAACATGGTATACAGTCTTGGACTTAAAAGATGCCTTCTTTTGCCTGCGC ProProGluArgThrTrpTyrThrValLeuAspLeuLysAspAlaPhePheCysLeuArg 3421 TTGCACCCTAAGAGTCAGCTCCTGTTTGCCTTTGAATGGAGGGACCCAGAGGGCGGACAG LeuHisProLysSerGlnLeuLeuPheAlaPheGluTrpArgAspProGluGlyGlyGln 3481 ACTGGTCAACTAACCTGGACTAGGCTACCACAGGGGTTCAAAAATTCCCCCACCCTGTTT ThrGlyGlnLeuThrTrpThrArgLeuProGlnGlyPheLysAsnSerProThrLeuPhe 3541 GACGAGGCCCTCCATCGGGATCTCGCGCCTTTTCGTGCTCGAAACCCTCAGCTTACCCTA AspGluAlaLeuHisArgAspLeuAlaProPheArgAlaArgAsnProGlnLeuThrLeu 3601 CTACAGTATGTGGATGATCTCTTGGTCGCGGCGGCCTCGAAGGAGCTGTGTCACCAGGGA LeuGlnTyrValAspAspLeuLeuValAlaAlaAlaSerLysGluLeuCysHisGlnGly 3661 ACTGAGAGGCTCCTTGCAGAACTGAGTGACTTGGGGTATCGAGTTTCGGCTAAGAAGGCA ThrGluArgLeuLeuAlaGluLeuSerAspLeuGlyTyrArgValSerAlaLysLysAla 3721 CAAATTTGTCAAACTGAGGTAACCTACCTGGGGTATACCCTCCGAGGGGGCAAAAGATGG GlnIleCysGlnThrGluValThrTyrLeuGlyTyrThrLeuArgGlyGlyLysArgTrp 3781 CTCACAGAGGCCCGGAAGAAGACTGTTATGATGATCCCATCGCCAACTACCCCACGGCAG LeuThrGluAlaArgLysLysThrValMetMetIleProSerProThrThrProArgGln 3841 GTACGTGAGTTTCTGGGGACTGCTGGCTTTTGTAGACTCTGGATTCCAGGCTTTGCAACC ValArgGluPheLeuGlyThrAlaGlyPheCysArgLeuTrpIleProGlyPheAlaThr 3901 CTAGCAGCACCTCTATATCCTTTGACTAAGGAAGGGTTTCCTTTTGAGTGGAAAGAAGAG LeuAlaAlaProLeuTyrProLeuThrLysGluGlyPheProPheGluTrpLysGluGlu 3961 CACCAAAGAGCTTTTGAGGCTATCAAGTCGTCTCTAATGACTGCCCCCGCGCTAGCATTA HisGlnArgAlaPheGluAlaIleLysSerSerLeuMetThrAlaProAlaLeuAlaLeu 4021 CCAGACTTGACTAAGCCTTTCGTCCTATATGTGGACGAGAGAGCGGGTGTAGCCAGGGGA ProAspLeuThrLysProPheValLeuTyrValAspGluArgAlaGlyValAlaArgGly 4081 GTGTTGACACAAGCACTGGGACCCTGGAAGAGACCTGTAGCCTATTTGTCAAAGAAATTA 
ValLeuThrGlnAlaLeuGlyProTrpLysArgProValAlaTyrLeuSerLysLysLeu 4141 GATCCCGTTGCTAGTGGATGGCCCACATGCCTGAAAGCTATTGCGGCAATGGCCCTGCTG AspProValAlaSerGlyTrpProThrCysLeuLysAlaIleAlaAlaMetAlaLeuLeu 4201 ATCAAAGATGCTGACAAATTGACAATGGGACAACAGGTGACTGTTGTGGCCCCTCATGCC IleLysAspAlaAspLysLeuThrMetGlyGlnGlnValThrValValAlaProHisAla 4261 TTGGAAAGTATCGTGCGGCAGCCACCTGACAGATGGATGACAAATGCCCGAATGACACAC LeuGluSerIleValArgGlnProProAspArgTrpMetThrAsnAlaArgMetThrHis 4321 TATCAGAGCTTGCTGCTAAATGAGCGTGTAACCTTTGCGCCCCCTGCCATCCTCAACCCA TyrGlnSerLeuLeuLeuAsnGluArgValThrPheAlaProProAlaIleLeuAsnPro 4381 GCTACCCTTCTCCCTCTAACAAATGATTCCGTCCCAGTACATCAATGTACAGACATCCTC AlaThrLeuLeuProLeuThrAsnAspSerValProValHisGlnCysThrAspIleLeu 4441 GCTGAAGAGACTGGGACCAGAAGAGACCTGACTGACCAACCCTGGCCTGGAGCTCCCAGT AlaGluGluThrGlyThrArgArgAspLeuThrAspGlnProTrpProGlyAlaProSer 4501 TGGTATACGGATGGCAGCAGTTTCCTGATAGAGGGGAAGCGAAAGGCTGGAGCTGCGGTG TrpTyrThrAspGlySerSerPheLeuIleGluGlyLysArgLysAlaGlyAlaAlaVal 4561 GTGGACGGGAAAAAGGTAATTTGGGCAAGCGCTTTGCCTGAAGGAACGTCGGCACAAAAG ValAspGlyLysLysValIleTrpAlaSerAlaLeuProGluGlyThrSerAlaGlnLys 4621 GCTGAACTTATAGCACTTATACAAGCCCTCCGAGAGGCTAAAGGTAAGATCGTTAACATC AlaGluLeuIleAlaLeuIleGlnAlaLeuArgGluAlaLysGlyLysIleValAsnIle 4681 TACACTGACAGCCGCTATGCTTTTGCTACCGCACACATCCATGGGGCCATCTACAGGCAG TyrThrAspSerArgTyrAlaPheAlaThrAlaHisIleHisGlyAlaIleTyrArgGln 4741 CGAGGGCTATTGACTTCGGCTGGTAAAGACATTAAAAACAAAGAAGAAATTCTGGCCCTG ArgGlyLeuLeuThrSerAlaGlyLysAspIleLysAsnLysGluGluIleLeuAlaLeu 4801 TTGGAAGCCATACATGCACCTAAGAAGGTAGCCATCATCCACTGCCCCGGCCACCAAAGA LeuGluAlaIleHisAlaProLysLysValAlaIleIleHisCysProGlyHisGlnArg 4861 GGAGAAGACTTGGTGGCCAAGGGCAACCGAATGGCAGACTCAGTGGCAAAACAAGTTGCT GlyGluAspLeuValAlaLysGlyAsnArgMetAlaAspSerValAlaLysGlnValAla 4921 CAAGGGGCCATGATCTTAACTGAAAAAGGTGATCCACCCAAAAGCCCTGAGGATGAGAGG GlnGlyAlaMetIleLeuThrGluLysGlyAspProProLysSerProGluAspGluArg 4981 TATAACATAAAAGAGCTATTGTGGACCAGTGATCCCCTCCCATACTTTTTTGAAGGGAAA TyrAsnIleLysGluLeuLeuTrpThrSerAspProLeuProTyrPhePheGluGlyLys 5041 
ATAGAATTGACTCCCGAAGAAGGAATAAAATTTGTGAAAGGACTACACCAATTCACCCAC IleGluLeuThrProGluGluGlyIleLysPheValLysGlyLeuHisGlnPheThrHis 5101 CTGGGAGTTGAAAAAATGATGAGACTAATTAAGAATTCCCGATACCAAGTCCCCAACCTG LeuGlyValGluLysMetMetArgLeuIleLysAsnSerArgTyrGlnValProAsnLeu 5161 AAGTCAGTGGCTCAAAAGATTATAGACTCCTGCAAACCATGTGCATTCACTAATGCGACT LysSerValAlaGlnLysIleIleAspSerCysLysProCysAlaPheThrAsnAlaThr 5221 AAAGCCTACAAAGAACCTGGAAAGAGACAACGGGGAGACCGTCCTGGAGTGTATTGGGAG LysAlaTyrLysGluProGlyLysArgGlnArgGlyAspArgProGlyValTyrTrpGlu 5281 GTAGATTTTACTGAAGTTAAACCTGGAATGTATGGTAACAAGTATCTGTTAGTATTTGTA ValAspPheThrGluValLysProGlyMetTyrGlyAsnLysTyrLeuLeuValPheVal 5341 GACACTTTTTCAGGATGGGTTGAGGCGTTTCCCACTAAAACTGAGACTGCCCAGATTGTG AspThrPheSerGlyTrpValGluAlaPheProThrLysThrGluThrAlaGlnIleVal 5401 GCCAAGAAGATCCTTGAAGAAATCCTGCCAAGATTTGGAATCCCTAAGGTAATCGGGTCC AlaLysLysIleLeuGluGluIleLeuProArgPheGlyIleProLysValIleGlySer 5461 GATAATGGACCAGCCTTTGTTGCCCAGGTAAGTCAGGGCTTGGCCACTCAGTTGGGCATC AspAsnGlyProAlaPheValAlaGlnValSerGlnGlyLeuAlaThrGlnLeuGlyIle 5521 GATTGGAAATTACACTGTGCTTACCGCCCTCAAAGCTCAGGACAGGTAGAGAGGATGAAT AspTrpLysLeuHisCysAlaTyrArgProGlnSerSerGlyGlnValGluArgMetAsn 5581 AGGACCTTAAAAGAGACCTTGACTAAATTAGCCATTGAGACCGGCGGGAAAGACTGGGTG ArgThrLeuLysGluThrLeuThrLysLeuAlaIleGluThrGlyGlyLysAspTrpVal 5641 GCTCTCCTTCCTCTTGCGCTCTTCCGAGCCCGAAACACCCCTGGACGTTTCGGGCTCACT AlaLeuLeuProLeuAlaLeuPheArgAlaArgAsnThrProGlyArgPheGlyLeuThr 5701 CCTTTTGAAGTTCTGTATGGAGGACCTCCCCCTTTAATGGAAGCTGGTGGAACATTGGTT ProPheGluValLeuTyrGlyGlyProProProLeuMetGluAlaGlyGlyThrLeuVal splice acceptor site | 5761 TCCGGCTCTGACCCTGTCTTACCCTCCTCTTTGCTTATTCATTTAAAGGCCCTAGAAGTG SerGlySerAspProValLeuProSerSerLeuLeuIleHisLeuLysAlaLeuGluVal 5821 ATTAGGACCCAGATTTGGGACCAACTGAAGGCAGCCTATACCCCAGGGACCACCGCAGTA IleArgThrGlnIleTrpAspGlnLeuLysAlaAlaTyrThrProGlyThrThrAlaVal 5881 CCCCACGGGTTCCGAGTTGGAGATAAAGTCTTGGTCAGACGGCATCGAACCGGCAGCCTC ProHisGlyPheArgValGlyAspLysValLeuValArgArgHisArgThrGlySerLeu 5941 GAGCCACAGTGGAAGGGACCCTATTTGGTGTTACTGACAACCCCTACTGCGGTAAAAGTC 
GluProGlnTrpLysGlyProTyrLeuValLeuLeuThrThrProThrAlaValLysVal |env-> 6001 GACGGGATTGCCTCCTGGATCCACGCCTCCCACGTCAAGAGGGCCGCAAGTCAAGATGAA AspGlyIleAlaSerTrpIleHisAlaSerHisValLysArgAlaAlaSerGlnAspGlu MetLys 6061 GAAAACCATGAAGACAATTGGACAGTGGCAGCCACTGACAATCCTCTTAAGCTTCGTTTG GluAsnHisGluAspAsnTrpThrValAlaAlaThrAspAsnProLeuLysLeuArgLeu LysThrMetLysThrIleGlyGlnTrpGlnProLeuThrIleLeuLeuSerPheValCys <-pol| 6121 TGCCGCAGGCGCCACCCTGAGCCTAGGGAACCATAACCCTCATGCTCCAATTCAACAGTC CysArgArgArgHisProGluProArgGluProEnd AlaAlaGlyAlaThrLeuSerLeuGlyAsnHisAsnProHisAlaProIleGlnGlnSer 6181 TTGGGAAGTGCTTAATGAGGAGGGAAACATTGTGTGGGCAACCACTGCAGTCCATCCCCT TrpGluValLeuAsnGluGluGlyAsnIleValTrpAlaThrThrAlaValHisProLeu 6241 CTGGACTTGGTGGCCTGATCTCACACCTGACATCTGTAAGTTAGTGGCAGGATCCACCAA TrpThrTrpTrpProAspLeuThrProAspIleCysLysLeuValAlaGlySerThrLys 6301 ATGGGACCTCCCTGATCATACCGATCTTAGTAACCCACCCCCTGAAGAGCGGTGTGTCCC TrpAspLeuProAspHisThrAspLeuSerAsnProProProGluGluArgCysValPro 6361 AAACGGGATAGGGAGCACATATTGGTGTTCGGGGCAGTTTTACCGAGCTAATCTTAGAGC AsnGlyIleGlySerThrTyrTrpCysSerGlyGlnPheTyrArgAlaAsnLeuArgAla 6421 TGCACAATTTTATGTTTGCCCTGGTCAGGGTCAGAGCAAAAGGCTTCAACGAGAATGTGG AlaGlnPheTyrValCysProGlyGlnGlyGlnSerLysArgLeuGlnArgGluCysGly 6481 AGGGGCATCAGATTACTTTTGTGGTAAATGGACATGTGAAACGACAGGGGAAGCTTACTG GlyAlaSerAspTyrPheCysGlyLysTrpThrCysGluThrThrGlyGluAlaTyrTrp 6541 GAAGCCCTCCTCTGACTGGGACCTAATCACGGTAAAACGAGGAAGTGGCTATGATAGGTC LysProSerSerAspTrpAspLeuIleThrValLysArgGlySerGlyTyrAspArgSer 6601 AAACGAAGGAGAAAGAAACCCCTATAAATATCCAGAGAATGGGTGCGCTTTTAAAAACAG AsnGluGlyGluArgAsnProTyrLysTyrProGluAsnGlyCysAlaPheLysAsnSer 6661 CCCCCCAGGACCATGCAAAGGTAAATACTGCAACCCCCTACTTATAAAGTTCACCGAGAA ProProGlyProCysLysGlyLysTyrCysAsnProLeuLeuIleLysPheThrGluLys 6721 AGGGAAACAACACCGTCTAAGTTGGCTTAAAGGAAATAGGTGGGGTTGGCGAGTATACCT GlyLysGlnHisArgLeuSerTrpLeuLysGlyAsnArgTrpGlyTrpArgValTyrLeu 6781 TCCACTAAGAGATCCTGGGTTCATTTTCACGATCAGGCTGACAGTGAGAGACCTGGCGGT ProLeuArgAspProGlyPheIlePheThrIleArgLeuThrValArgAspLeuAlaVal 6841 
GACACCTGTTGGGCCCAACAAGGTCCTTATAGAACAGGGCCCCCCAGTCGTACCGGCTCC ThrProValGlyProAsnLysValLeuIleGluGlnGlyProProValValProAlaPro 6901 CCCAAAGGTCCCAGCCGTACCAGCTCCACCAACTCCACAGCCCAACATAGTGGTACCCTC ProLysValProAlaValProAlaProProThrProGlnProAsnIleValValProSer 6961 CCTAGGGACTAATACTCCCCTCATAAAGCCTACCTTGGCTTCCCCACCGCCCCTAGGTAC LeuGlyThrAsnThrProLeuIleLysProThrLeuAlaSerProProProLeuGlyThr 7021 AGAGGACCGTCTGGTCAGTCTACTCCAGGGAGCTTTTTTAGCTTTAAATAGAACTAACCC GluAspArgLeuValSerLeuLeuGlnGlyAlaPheLeuAlaLeuAsnArgThrAsnPro 7081 TAATATGACTCAATCATGCTGGTTATGCTATACCTCTAGCCCCCCTTATTATGAAGGAAT AsnMetThrGlnSerCysTrpLeuCysTyrThrSerSerProProTyrTyrGluGlyIle 7141 AGCTCAGATCAGGACTTATAATATTACTTCAGATCATTCTCAATGTCTTTGGGGAGAAAA AlaGlnIleArgThrTyrAsnIleThrSerAspHisSerGlnCysLeuTrpGlyGluAsn 7201 CAGAAAGTTGACTCTGGCAGCAGTTTCAGGAAGAGGGCTTTGTTTGGGCCAGGTACCTCA ArgLysLeuThrLeuAlaAlaValSerGlyArgGlyLeuCysLeuGlyGlnValProGln 7261 GGATAAAGGGCACCTCTGTAATCAGACCCAGAACATCCAGTCTAGCAAAAGTGGTCAGTA AspLysGlyHisLeuCysAsnGlnThrGlnAsnIleGlnSerSerLysSerGlyGlnTyr 7321 TCTAGTGCCCCCCTTAGACACAGTATGGGCTTGCAATACCGGTCTCACTCCTTGTGTGTC LeuValProProLeuAspThrValTrpAlaCysAsnThrGlyLeuThrProCysValSer 7381 TATGTCTGTTTTTAATAGTTCCAAAGATTTCTGCATTTTGGTTCAGCTTATTCCTAGACT MetSerValPheAsnSerSerLysAspPheCysIleLeuValGlnLeuIleProArgLeu 7441 CCTGTATCATGATGATAGCTCCTTTTTAGACAAATTTGAGCGTTGGGTCCGCTGGAGAAG LeuTyrHisAspAspSerSerPheLeuAspLysPheGluArgTrpValArgTrpArgArg 7501 AGAGCCCGTTACCCTAACTTTGGCAGTTCTATTAGGATTAGGAGTAGCGGCTGGAGTAGG GluProValThrLeuThrLeuAlaValLeuLeuGlyLeuGlyValAlaAlaGlyValGly 7561 TACAGGAACCGCTGCCTTAATTAAGACCCCCCAATACTATGAAGAACTACGTGCAGCTAT ThrGlyThrAlaAlaLeuIleLysThrProGlnTyrTyrGluGluLeuArgAlaAlaMet 7621 GGATGTTGATCTTAGAACTATAGAACAGTCTATAACCAAATTAGAAGAATCTTTAACTTC AspValAspLeuArgThrIleGluGlnSerIleThrLysLeuGluGluSerLeuThrSer 7681 CCTGTCCGAAGTGGTGCTACAGAAAGGAAGGGGATTAGACTTATTATTCCTTAAAGAAGG LeuSerGluValValLeuGlnLysGlyArgGlyLeuAspLeuLeuPheLeuLysGluGly 7741 AGGACTCTGTGCTGCCCTAAAAGAAGAATGTTGTTTTTATGTTGACCATTCAGGAGTAAT 
GlyLeuCysAlaAlaLeuLysGluGluCysCysPheTyrValAspHisSerGlyValIle 7801 CAAAGATTCTATGGCCAAACTTAGAGAACGCCTAGATATACGTAAAAGAGAAAGAGAAAG LysAspSerMetAlaLysLeuArgGluArgLeuAspIleArgLysArgGluArgGluSer 7861 CCAACAAGGATGGTTCGAAAGCTGGTTTAATAAGTCCCCTTGGCTCACCACTCTCCTCTC GlnGlnGlyTrpPheGluSerTrpPheAsnLysSerProTrpLeuThrThrLeuLeuSer 7921 CACCATAGCAGGACCTTTGATTACTCTTATGCTTTTGCTTACTTTTGGCCCCTGCATCCT ThrIleAlaGlyProLeuIleThrLeuMetLeuLeuLeuThrPheGlyProCysIleLeu 7981 TAATAAGTTAGTAGCTTTTATTAGAGAAAGGATAAATGCAGTACAGGTTATGGTACTAAG AsnLysLeuValAlaPheIleArgGluArgIleAsnAlaValGlnValMetValLeuArg <-env| 8041 GCAACAATATCGGGTCCTTCAGGAGGTTGAAAACTCGCTCTAAGATTAGAGCTATTTCCT GlnGlnTyrArgValLeuGlnGluValGluAsnSerLeuEnd PPT |U3-> 8101 AAAAAGAGTGGGGAATGAAGAATAAAAGTTACTGAACTCTTCCTCACCCCAGAACTCGAC 8161 CCCTTCCATCTAGAGAGTGTTCCCAGAACACTCCTGAACTCTTCACCCTAGAATGCATTC 8221 CTGAACTCCTCACCCTAGAGTTCGAACCCTCCCAACTAAAGACTGTTCCAAGAACATTTT 8281 TGAGATAAGGGCCTCCTGGAACAACCTCAGAATAAACCGGGTACATTGCCAAATAATAGG 8341 ACATGACCCCTTAGTTACGTAGAATCCCTTGGCAGAACCCCTTGTCCCTTGGCAGAACCC promoter 8401 CTTAGTTATGTAAACTTGTACTTTCCCTACCCCGCTCTCCCGCCTTGAGTTTTCCTATAT <-U3||R-> 8461 AAGCCTGTGAAAAATTTGGCTGGTCGTCGATTCTCCTCTACACCACTAGGTGTATGAGTT inverse palindrome polyA signal 8521 TCGACCCCAGAGCTCTGGTCTATGTGCTTTCATGCTGCTGCTTTATTAAATCTTGCCTTC pol motif <-R||U5-> 8581 AACATTTTGAGTTCGGTCTCAGTGTCTTCTTGGGTCCGCGGCTGTCCCGAGGCTTGAGTG <-U5| 8641 AGGGTCTCCCTTCGGGGGTCTTTCA
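Three numerical statements in the legend (the 8665 bp total length, the perfect 35 bp repeats in the leader, and the 99 bp separating the glycosylated Gag CTG from the gag ATG) can be checked against the listing itself. The Python sketch below is illustrative only. It copies rows 781 to 1021 verbatim from the listing and assumes that the glyco-gag CTG lies at position 943, a position inferred here from the six residues printed under row 901 rather than stated in the legend. The helper names (find_perfect_repeats, translate) are hypothetical and not part of any published pipeline.

```python
from collections import defaultdict

# Rows 781-1021 of the listing, copied verbatim (60 nt per row).
ROWS = {
    781: "CAGGGACGCCTGGTGGACCCCTTTGGAGGCCAAGAGACCATCTGGGGTTGCGAGATCGTG",
    841: "GGTTCGAGTCCCACCTCGTGCCTTGTTGCGAGATCGTGGGTTCGAGTCCCACCTCGTGCA",
    901: "GAGGGTCTCAATCGGCCGGCCTTAGAAAAGCCATCTGATTCTCTGAGTTGCTTGTGGTCG",
    961: "ACGCGAAGTCGCCGCCGCTTTTGGTTTCTTTTTTGTCTTAGTCTCGTGTTCGCTCTTGTT",
    1021: "GTGTCTACTATTGTTCTGGAAATGGGACAATCTGTGTCCACTCCCCTTTCTCTAACTCTG",
}
FIRST_POS = min(ROWS)  # 1-based position of the first base in SEQ
SEQ = "".join(ROWS[start] for start in sorted(ROWS))

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_perfect_repeats(seq, k, first_pos):
    """Map every k-mer occurring more than once to its 1-based start positions."""
    seen = defaultdict(list)
    for i in range(len(seq) - k + 1):
        seen[seq[i:i + k]].append(first_pos + i)
    return {kmer: starts for kmer, starts in seen.items() if len(starts) > 1}


def translate(seq):
    """Translate complete codons of seq into one-letter amino acid codes."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# 1. Length implied by the PAC clone coordinates (both ends inclusive).
PAC_START, PAC_END = 112341, 121005
provirus_length = PAC_END - PAC_START + 1

# 2. Perfect 35 bp repeats within the copied leader region.
repeats = find_perfect_repeats(SEQ, 35, FIRST_POS)

# 3. Distance from the glyco-gag CTG to the first in-frame ATG.
GLYCO_GAG_START = 943  # assumption: inferred from the layout of row 901
offset = GLYCO_GAG_START - FIRST_POS
codons = [SEQ[i:i + 3] for i in range(offset, len(SEQ) - 2, 3)]
gag_atg_pos = GLYCO_GAG_START + 3 * codons.index("ATG")
leader_bp = gag_atg_pos - GLYCO_GAG_START
leader_protein = translate(SEQ[offset:offset + leader_bp])

print(provirus_length)
print(repeats)
print(gag_atg_pos, leader_bp, leader_protein)
```

Under these assumptions the sketch finds a single 35 bp unit present twice, starting at positions 827 and 865 (Rpt1 and Rpt2, separated by a 3 bp spacer), and places the gag ATG at position 1042, 99 bp (33 codons) downstream of the CTG. Note that the CTG is translated here as leucine, as printed in the listing.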