
Please read FE.dox to see the 5 questions that need to be done.
The price is negotiable.

Rules: The only modules allowed for this exam are the os and re modules.
Q1) Write a program that asks the user for a file containing a FASTA nucleotide sequence (included is a file called sequence.fasta you can use). Then prompt the user to select from the following menu:
A. Calculate DNA composition: This will print to the screen the numbers of A, G, C and T nucleotides, and any unknowns (N’s).
B. Calculate AT content: Prints to the screen the percentage of AT in the sequence.
C. Calculate GC content: Prints to the screen the percentage of GC in the sequence.
D. Complement: Prints to the screen the complement of the DNA sequence.
E. Reverse complement: Prints to the screen the reverse complement.

Each menu item above should be implemented in its own function. The function should be called when the user selects the respective menu item. Each function should accept the DNA sequence as an argument and then perform the appropriate calculation or algorithm.
Input validation: Check to see that the file name entered by the user exists AND that the sequence is in FASTA format. You can assume that there is only one sequence in the file.
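One way Q1 could be organized is sketched below, using only the os and re modules. All function names here (read_fasta, composition, run_menu_choice and so on) are made up for illustration. The sketch assumes that "FASTA format" means a first line starting with ">" followed by lines containing only A, C, G, T or N, and that percentages are taken over the full sequence length, Ns included.

```python
import os
import re


def read_fasta(path):
    """Return (header, sequence) from a single-record FASTA file.

    Raises ValueError if the file is missing or does not look like FASTA.
    """
    if not os.path.isfile(path):
        raise ValueError("File not found: " + path)
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("Not FASTA: first line must start with '>'")
    sequence = "".join(lines[1:]).upper()
    # Assumption: only A, C, G, T and N are valid nucleotide symbols.
    if not re.fullmatch(r"[ACGTN]+", sequence):
        raise ValueError("Not FASTA: sequence has non-nucleotide characters")
    return lines[0][1:], sequence


def composition(seq):
    """Menu A: counts of A, G, C, T and unknowns (N)."""
    return {base: seq.count(base) for base in "AGCTN"}


def at_content(seq):
    """Menu B: percentage of A and T over the whole sequence."""
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def gc_content(seq):
    """Menu C: percentage of G and C over the whole sequence."""
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def complement(seq):
    """Menu D: base-by-base complement (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))


def reverse_complement(seq):
    """Menu E: complement read in the opposite direction."""
    return complement(seq)[::-1]


MENU = {"A": composition, "B": at_content, "C": gc_content,
        "D": complement, "E": reverse_complement}


def run_menu_choice(choice, seq):
    """Dispatch a menu letter to its function; None for an invalid letter."""
    func = MENU.get(choice.strip().upper())
    return func(seq) if func else None
```

A driver would prompt with input() for the file name, call read_fasta inside a try/except ValueError loop, then print the result of run_menu_choice for the letter the user types.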

Q2) Write a program that asks the user for a file containing a FASTA nucleotide sequence (you can use the same sequence.fasta file as above). Then prompt the user to select a frame (number 1 through 6). Your program should then find the translation (protein sequence) of the nucleotide sequence in that frame. Print the translation to the screen.
Input validation: Check to see that the file name entered by the user exists AND that the sequence is in FASTA format. You can assume that there is only one sequence in the file.
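A possible core for Q2 is sketched below. It assumes the standard genetic code, that frames 4 to 6 mean frames 1 to 3 of the reverse complement, that stop codons are printed as "*" and translation continues past them, and that codons containing N become "X". The question does not specify any of these, and translate_frame is a hypothetical name. File reading and validation would follow the same approach as Q1.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Complement each base and reverse the strand."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate_frame(seq, frame):
    """Translate seq in reading frame 1..6 and return the protein string."""
    if frame not in (1, 2, 3, 4, 5, 6):
        raise ValueError("Frame must be a number from 1 to 6")
    seq = seq.upper()
    if frame > 3:
        # Assumption: frames 4-6 are frames 1-3 on the reverse strand.
        seq = reverse_complement(seq)
        frame -= 3
    protein = []
    for i in range(frame - 1, len(seq) - 2, 3):
        # Codons with N (or anything unexpected) are shown as X.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)
```

As a check against the attached record: the first 27 bases of the sequence in frame 1 give EQE*RAKRM, where the "*" is the upstream in-frame stop codon annotated at positions 10..12 and the final M is the start of the CDS at position 25.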

Q3) Write a program that asks the user for a sequence in GenBank format (included is a file called sequence.gb that you can use). Your program should convert the GenBank-formatted sequence into FASTA format. Write the FASTA-formatted sequence to a file whose name includes the accession number (e.g. NM_001250672.txt, where NM_001250672 is the accession number).
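The conversion in Q3 could look roughly like the sketch below. It assumes a single record whose sequence sits between the ORIGIN line and the closing "//", builds the header from the VERSION and DEFINITION fields (which is how the attached sequence.fasta header appears to be formed), and wraps lines at 70 characters. The names genbank_to_fasta and convert_file are illustrative only.

```python
import os
import re


def genbank_to_fasta(text, width=70):
    """Convert one GenBank record (a string) to (accession, fasta_text)."""
    accession = re.search(r"^ACCESSION\s+(\S+)", text, re.MULTILINE)
    origin = re.search(r"^ORIGIN[^\n]*\n(.*?)^//", text,
                       re.MULTILINE | re.DOTALL)
    if accession is None or origin is None:
        raise ValueError("Input does not look like a GenBank record")

    # Prefer the versioned identifier for the header when it is present.
    version = re.search(r"^VERSION\s+(\S+)", text, re.MULTILINE)
    seq_id = version.group(1) if version else accession.group(1)

    # DEFINITION may wrap onto indented continuation lines.
    definition = re.search(r"^DEFINITION\s+(.*(?:\n[ \t]+.*)*)", text,
                           re.MULTILINE)
    description = ""
    if definition:
        description = " ".join(definition.group(1).split()).rstrip(".")

    # Drop position numbers and spaces from the ORIGIN block.
    sequence = re.sub(r"[^A-Za-z]", "", origin.group(1)).upper()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    header = ">" + (seq_id + " " + description).strip()
    return accession.group(1), "\n".join([header] + lines) + "\n"


def convert_file(genbank_path):
    """Write <accession>.txt in the current directory; return its name."""
    if not os.path.isfile(genbank_path):
        raise ValueError("File not found: " + genbank_path)
    with open(genbank_path) as handle:
        accession, fasta = genbank_to_fasta(handle.read())
    out_name = accession + ".txt"
    with open(out_name, "w") as out:
        out.write(fasta)
    return out_name
```

Calling convert_file("sequence.gb") on the attached record would be expected to produce NM_001250672.txt.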
Q4) Write a program that asks the user for a file containing a nucleotide sequence AND the name of a restriction enzyme. Your program should return the positions in the sequence where the enzyme cuts. Parse out the enzymes and their cut sites from the attached RestrictionEnzymes.txt file.
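For Q4, one approach is to turn each recognition site into a regular expression and search the sequence with it. The sketch below assumes each line of RestrictionEnzymes.txt is "name site", that an apostrophe (straight or curly) inside the site marks where the enzyme cuts, and that letters other than A, C, G, T are IUPAC ambiguity codes. It searches only the strand as given. A site with no cut mark is reported at the position where the recognition site begins, which is a stand-in choice rather than anything the question specifies. parse_enzymes and find_cuts are hypothetical names.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Assumption: any of these apostrophe variants marks the cut point.
CUT_MARKS = "'\u2019\u2018"


def parse_enzymes(text):
    """Return {enzyme name: site string} from the enzyme file contents."""
    enzymes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skips blank or malformed lines
            enzymes[parts[0]] = parts[1]
    return enzymes


def find_cuts(sequence, site):
    """Return, for every match, how many bases lie to the left of the cut."""
    marks = [i for i, ch in enumerate(site) if ch in CUT_MARKS]
    offset = marks[0] if marks else 0  # no mark: report the site start
    bases = "".join(ch for ch in site if ch not in CUT_MARKS).upper()
    # A lookahead lets overlapping recognition sites all be reported.
    pattern = "(?=" + "".join(IUPAC[b] for b in bases) + ")"
    return [m.start() + offset for m in re.finditer(pattern, sequence.upper())]
```

For example, with the site G’GATCC and the sequence AAGGATCCTT, the recognition site starts after two bases and the cut falls one base into it, so the function returns [3]: three bases (AAG) lie to the left of the cut.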

Q5) Read in a whole genome (in FASTA format; a file called genome.txt, see attached) and compute the background codon frequencies. The background frequency of a codon is computed by the formula: background_frq(codon) = 100 * N(codon) / Total_codons, where N(codon) is the number of occurrences of the codon across the entire genome, and Total_codons is the total number of all codons in the whole genome. Print out the background frequency of each codon, from AAA to TTT. Use a dictionary in your solution. Your program should count codons that appear in all reading frames and then calculate and display the average.
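The wording "all reading frames ... display the average" can be read more than one way. The sketch below takes one reading: apply the formula separately in each frame, then average the per-frame percentages. It uses the three forward frames by default, with an optional flag to include the three reverse-complement frames. It also assumes the genome has already been read into one string and that codons containing N are skipped and left out of Total_codons. The function names are illustrative.

```python
import re

BASES = "ACGT"
# Alphabetical order, so the dictionary runs from AAA to TTT.
ALL_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def frame_frequencies(seq, frame):
    """Apply 100 * N(codon) / Total_codons within one frame (0, 1 or 2)."""
    counts = {codon: 0 for codon in ALL_CODONS}
    total = 0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # codons containing N are not counted
            counts[codon] += 1
            total += 1
    return {c: (100.0 * n / total if total else 0.0)
            for c, n in counts.items()}


def background_frequencies(seq, six_frames=False):
    """Average the per-frame codon percentages over 3 (or 6) frames."""
    seq = re.sub(r"[^ACGTN]", "", seq.upper())
    strands = [seq]
    if six_frames:
        # Assumption: "all frames" may also include the reverse strand.
        strands.append(seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1])
    per_frame = [frame_frequencies(s, f) for s in strands for f in range(3)]
    return {c: sum(fr[c] for fr in per_frame) / len(per_frame)
            for c in ALL_CODONS}


def print_frequencies(freqs):
    """Print one 'codon percentage' line per codon, AAA first, TTT last."""
    for codon in ALL_CODONS:
        print(codon, round(freqs[codon], 3))
```

An alternative reading is to pool the counts from all frames before dividing. That gives slightly different numbers when the frames contain different numbers of codons, so it is worth confirming which one is intended.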

>NM_001250672.2 Glycine max cationic peroxidase 2 (PRX2), mRNA
GAGCAAGAGTGAAGAGCGAAGAGAATGGCTCCCAAGGGTTTAATCTTTTTGGCTGTGTTATGCTTCTCAG
CACTGTCACTGAGTCGTTGTCTTGCGGAGGATAATGGACTTGTTATGAACTTCTACAAGGAATCATGCCC
TCAGGCTGAAGACATCATCAAAGAACAAGTCAAGCTTCTCTACAAGCGCCACAAGAACACTGCTTTCTCC
TGGCTCAGAAACATCTTCCATGACTGTGCTGTTCAGAGTTGTGATGCTTCACTGTTGCTGGACTCCACAA
GAAGGAGCTTGTCTGAGAAGGAAACAGATAGAAGCTTTGGGTTGAGAAATTTCAGGTACATTGAGACCAT
CAAAGAAGCTTTGGAAAGGGAATGCCCAGGAGTTGTTTCCTGTGCTGATATCCTCGTTCTCTCTGCCAGA
GATGGCATTGTTTCGCTAGGAGGTCCCCATATCCCTCTTAAAACAGGAAGAAGGGATGGTAGAAGGAGCA
GAGCCGATGTGGTAGAGCAGTTCCTCCCAGACCACAATGAATCCATTTCTGCAGTTCTTGACAAGTTTGG
TGCCATGGGAATTGACACCCCCGGCGTAGTTGCATTGCTTGGAGCACACAGTGTTGGTCGAACCCATTGT
GTGAAGTTGGTGCACCGTTTGTACCCAGAGATTGATCCAGCTCTGAACCCTGACCACGTCCCTCACATTC
TGAAGAAGTGCCCTGATGCCATTCCAGACCCTAAGGCCGTGCAGTACGTGAGAAACGACCGTGGCACCCC
CATGATTCTAGACAACAATTACTACAGAAATATATTGGACAACAAGGGCTTGTTGATAGTGGATCACCAA
CTAGCCAATGACAAGAGGACCAAGCCTTATGTGAAGAAAATGGCCAAGAGCCAGGACTATTTCTTCAAGG
AGTTTTCTAGAGCCATTACTTTGCTCTCTGAGAACAACCCTCTCACTGGCACAAAGGGTGAGATCAGAAA
GCAGTGCAATGCTGCCAACAAGCACCATGAGGAGCCTTAATTGCTTCCCGCTTAATTTGGGCCTTGAATT
TTCTTCCCCTTCTCTATGTGGAAGAAATCTGTAAGATATTATGCAAAAAATAATTAAGGTGTTTTTCTTT
AAATGGGTTGGTTGATTGGTTCAATGAACCGATCAAGACCACAGCAGGTTCATGGGGATGCGAGGATTAA
GACGCTTTGTTTTTTAATCTTCCGATGTCACTCTTGTTTGTTAGTTTGTTTTTTTATTTTTTATTTCAAT
AAGTACTGTGCAAGTAGGTTAGAGTTGGGTAGAAGGGCATGTTCATGGTGTTAATTACTATGTTATGTAT
GCATGTGAGTGCTGCTATCGATGGCAAGATGTCAATGTATGCGTGTAGTGCTGTTATCGATGAGAGTGAA
AATGTTTATGATATCCACACTAATAAAGCTAGCTTGCTCTTGCTACATAATAAATAAATCATGGCCCACG
GTCATTATACAAAAAAAAAAAAAAAAAA

AanI TTA’TAA

AarI CACCTGCNNNN’NNNN

AasI GACNNNN’NNGTC

AatI AGG’CCT

AatII GACGT’C

AbsI CC’TCGAGG

AccI GT’MKAC

AccII CG’CG

AccIII T’CCGGA

Acc16I TGC’GCA

Acc36I ACCTGCNNNN’NNNN

Acc65I G’GTACC

AccB1I G’GYRCC

AccB7I CCANNNN’NTGG

AccBSI CCG’CTC

AceIII CAGCTCNNNNNNN’NNNN

AciI C’CGC

AclI AA’CGTT

AclWI GGATCNNNN’N

AcoI Y’GGCCR

AcsI R’AATTY

AcuI CTGAAGNNNNNNNNNNNNNNNN’

AcvI CAC’GTG

AcyI GR’CGYC

AdeI CACNNN’GTG

AfaI GT’AC

AfeI AGC’GCT

AfiI CCNNNNN’NNGG

AflII C’TTAAG

AflIII A’CRYGT

AgeI A’CCGGT

AgsI TTS’AA

AhdI GACNNN’NNGTC

AhlI A’CTAGT

AjiI CAC’GTC

AjnI ‘CCWGG

AjuI GAANNNNNNNTTGGNNNNNNNNNNN’

AleI CCAANNNNNNNTTCNNNNNNNNNNNN’

AlfI CACNN’NNGTG

AloI GCANNNNNNTGCNNNNNNNNNNNN’

AluI GAACNNNNNNTCCNNNNNNNNNNNN’

AluBI GGANNNNNNGTTCNNNNNNNNNNNN’

AlwI AG’CT

Alw21I AG’CT

Alw26I GGATCNNNN’N

Alw44I GWGCW’C

AlwFI GTCTCN’NNNN

AlwNI G’TGCAC

Ama87I GAAAYNNNNNRTG

Aor13HI CAGNNN’CTG

Aor51HI C’YCGRG

AoxI T’CCGGA

ApaI AGC’GCT

ApaBI ‘GGCC

ApaLI GGGCC’C

ApeKI GCANNNNN’TGC

ApoI G’TGCAC

ApyPI G’CWGC

AquII R’AATTY

AquIII ATCGACNNNNNNNNNNNNNNNNNNNN’

AquIV GCCGNACNNNNNNNNNNNNNNNNNNNN’

ArsI GAGGAGNNNNNNNNNNNNNNNNNNNN’

AscI GRGGAAGNNNNNNNNNNNNNNNNNNN’

AseI GACNNNNNNTTYGNNNNNNNNNNN’

Asi256I CRAANNNNNNGTCNNNNNNNNNNNNN’

AsiGI GG’CGCGCC

AsiSI AT’TAAT

AspI G’ATC

Asp700I A’CCGGT

Asp718I GCGAT’CGC

AspA2I GACN’NNGTC

AspCNI GAANN’NNTTC

AspEI G’GTACC

AspLEI C’CTAGG

AspS9I GCCGC

AssI GACNNN’NNGTC

AsuII GCG’C

AsuC2I G’GNCC

AsuHPI AGT’ACT

AsuNHI TT’CGAA

AvaI CC’SGG

AvaII GGTGANNNNNNNN’

AvaIII G’CTAGC

AviII C’YCGRG

AvrII G’GWCC

AxyI ATGCAT

BaeI TGC’GCA

BaeGI C’CTAGG

BalI CC’TNAGG

BamHI ACNNNNGTAYCNNNNNNNNNNNN’

BanI GRTACNNNNGTNNNNNNNNNNNNNNN’

BanII GKGCM’C

BanIII TGG’CCA

BarI G’GATCC

BasI G’GYRCC

BauI GRGCY’C

BbeI AT’CGAT

Bbr7I GAAGNNNNNNTACNNNNNNNNNNNN’

BbrPI GTANNNNNNCTTCNNNNNNNNNNNN’

BbsI CCANNNN’NTGG

BbuI C’ACGAG

BbvI GGCGC’C

Bbv12I GAAGACNNNNNNN’NNNN

BbvCI CAC’GTG

BccI GAAGACNN’NNNN

BceAI GCATG’C

BcefI GCAGCNNNNNNNN’NNNN

BcgI GWGCW’C

BciVI CC’TCAGC

BclI CCATCNNNN’N

BcnI ACGGCNNNNNNNNNNNN’NN

BcuI ACGGCNNNNNNNNNNNN’N

BdaI CGANNNNNNTGCNNNNNNNNNNNN’

BfaI GCANNNNNNTCGNNNNNNNNNNNN’

BfiI GTATCCNNNNNN’

BfmI T’GATCA

BfoI CC’SGG

BfrI A’CTAGT

BfuI TGANNNNNNTCANNNNNNNNNNNN’

BfuAI C’TAG

BfuCI ACTGGGNNNNN’

BglI C’TRYAG

BglII RGCGC’Y

BisI C’TTAAG

BlnI GTATCCNNNNNN’

BlpI ACCTGCNNNN’NNNN

BlsI ‘GATC

BmcAI GCCNNNN’NGGC

Bme18I A’GATCT

Bme1390I GC’NGC

BmeRI C’CTAGG

BmeT110I GC’TNAGC

BmgI GCN’GC

BmgBI AGT’ACT

BmgT120I G’GWCC

BmiI CC’NGG

BmrI GACNNN’NNGTC

BmrFI C’YCGRG

BmsI GKGCCC

BmtI CAC’GTC

BmuI GG’NCC

BoxI GGN’NCC

BpiI ACTGGGNNNNN’

BplI CC’NGG

BpmI GCATCNNNNN’NNNN

Bpu10I GCTAG’C

Bpu14I ACTGGGNNNNN’

Bpu1102I GACNN’NNGTC

BpuAI GAAGACNN’NNNN

BpuEI GAGNNNNNCTCNNNNNNNNNNNNN’

BpuMI CTGGAGNNNNNNNNNNNNNNNN’

BpvUI CC’TNAGC

BsaI TT’CGAA

Bsa29I GC’TNAGC

BsaAI GAAGACNN’NNNN

BsaBI CTTGAGNNNNNNNNNNNNNNNN’

BsaHI CC’SGG

BsaJI CGAT’CG

BsaMI GGTCTCN’NNNN

BsaWI AT’CGAT

BsaXI YAC’GTR

BsbI GATNN’NNATC

Bsc4I GR’CGYC

BscAI C’CNNGG

BscGI GAATGCN’

Bse1I W’CCGGW

Bse8I ACNNNNNCTCCNNNNNNNNNN’

Bse21I GGAGNNNNNGTNNNNNNNNNNNN’

Bse118I CAACACNNNNNNNNNNNNNNNNNNNNN’

BseAI CCNNNNN’NNGG

BseBI GCATCNNNN’NN

BseCI CCCGT

BseDI ACTGGN’

Bse3DI GATNN’NNATC

BseGI CC’TNAGG

BseJI R’CCGGY

BseLI T’CCGGA

BseMI CC’WGG

BseMII AT’CGAT

BseNI C’CNNGG

BsePI GCAATGNN’

BseRI GGATGNN’

BseSI GATNN’NNATC

BseXI CCNNNNN’NNGG

BseX3I GCAATGNN’

BseYI CTCAGNNNNNNNNNN’

BsgI ACTGGN’

Bsh1236I G’CGCGC

Bsh1285I GAGGAGNNNNNNNNNN’

BshFI GKGCM’C

BshNI GCAGCNNNNNNNN’NNNN

BshTI C’GGCCG

BshVI C’CCAGC

BsiEI GTGCAGNNNNNNNNNNNNNNNN’

BsiHKAI CG’CG

BsiHKCI CGRY’CG

BsiSI GG’CC

BsiWI G’GYRCC

BslI A’CCGGT

BslFI AT’CGAT

BsmI CGRY’CG

BsmAI GWGCW’C

BsmBI C’YCGRG

BsmFI C’CGG

BsnI C’GTACG

Bso31I CCNNNNN’NNGG

BsoBI GGGACNNNNNNNNNN’NNNN

Bsp13I GAATGCN’

Bsp19I GTCTCN’NNNN

Bsp24I CGTCTCN’NNNN

Bsp68I GGGACNNNNNNNNNN’NNNN

Bsp119I GG’CC

Bsp120I GGTCTCN’NNNN

Bsp143I C’YCGRG

Bsp1286I T’CCGGA

Bsp1407I C’CATGG

Bsp1720I GACNNNNNNTGGNNNNNNNNNNNN’

BspACI CCANNNNNNGTCNNNNNNNNNNNNN’

BspCNI TCG’CGA

BspDI TT’CGAA

BspD6I G’GGCCC

BspEI ‘GATC

BspFNI GDGCH’C

BspGI T’GTACA

BspHI GC’TNAGC

BspLI C’CGC

BspMI CTCAGNNNNNNNNN’

BspNCI AT’CGAT

BspOI GACTCNNNN’NN

BspPI T’CCGGA

BspQI CG’CG

BspTI CTGGAC

BspT104I T’CATGA

BspT107I GGN’NCC

BspTNI ACCTGCNNNN’NNNN

BsrI CCAGA

BsrBI GCTAG’C

BsrDI GGATCNNNN’N

BsrFI GCTCTTCN’NNN

BsrGI C’TTAAG

BsrSI TT’CGAA

BssAI G’GYRCC

BssECI GGTCTCN’NNNN

BssHII ACTGGN’

BssKI CCG’CTC

BssMI GCAATGNN’

BssNI R’CCGGY

BssNAI T’GTACA

BssSI ACTGGN’

BssT1I R’CCGGY

Bst6I C’CNNGG

Bst98I G’CGCGC

Bst1107I ‘CCNGG

BstACI ‘GATC

BstAFI GR’CGYC

BstAPI GTA’TAC

BstAUI C’ACGAG

BstBI C’CWWGG

Bst2BI CTCTTCN’NNN

BstBAI C’TTAAG

Bst4CI GTA’TAC

BstC8I GR’CGYC

BstDEI C’TTAAG

BstDSI GCANNNN’NTGC

BstEII T’GTACA

BstENI TT’CGAA

BstF5I C’ACGAG

BstFNI YAC’GTR

BstH2I ACN’GT

BstHHI GCN’NGC

BstKTI C’TNAG

BstMAI C’CRYGG

BstMBI G’GTNACC

BstMCI CCTNN’NNNAGG

BstMWI GGATGNN’

BstNI CG’CG

BstNSI RGCGC’Y

BstOI GCG’C

BstPI GAT’C

BstPAI GTCTCN’NNNN

BstSCI ‘GATC

BstSFI CGRY’CG

BstSLI GCNNNNN’NNGC

BstSNI CC’WGG

BstUI RCATG’Y

Bst2UI CC’WGG

BstV1I G’GTNACC

BstV2I GACNN’NNGTC

BstXI ‘CCNGG

BstX2I C’TRYAG

BstYI GKGCM’C

BstZI TAC’GTA

BstZ17I CG’CG

BsuI CC’WGG

Bsu15I GCAGCNNNNNNNN’NNNN

Bsu36I GAAGACNN’NNNN

BsuRI CCANNNNN’NTGG

BsuTUI R’GATCY

BtgI R’GATCY

BtgZI C’GGCCG

BthCI GTA’TAC

BtrI GTATCCNNNNNN’

BtsI AT’CGAT

BtsCI CC’TNAGG

BtuMI GG’CC

BveI AT’CGAT

Cac8I C’CRYGG

CaiI GCGATGNNNNNNNNNN’NNNN

CciI GCNG’C

CciNI CAC’GTC

CdiI GCAGTGNN’

CdpI GGATGNN’

CelII TCG’CGA

CfoI ACCTGCNNNN’NNNN

CfrI GCN’NGC

Cfr9I CAGNNN’CTG

Cfr10I T’CATGA

Cfr13I GC’GGCCGC

Cfr42I CATC’G

ChaI GCGGAGNNNNNNNNNNNNNNNNNNNN’

CjeI GC’TNAGC

CjeNII GCG’C

CjePI Y’GGCCR

CjeP659IV C’CCGGG

CjuI R’CCGGY

CjuII G’GNCC

ClaI CCGC’GG

CpoI GATC’

CseI CCANNNNNNGTNNNNNNNNNNNNNNN’

CsiI ACNNNNNNTGGNNNNNNNNNNNNNN’

CspI GAGNNNNNGT

Csp6I CCANNNNNNNTCNNNNNNNNNNNNNN’

Csp45I GANNNNNNNTGGNNNNNNNNNNNNN’

CspAI CACNNNNNNNGAA

CspCI CAYNNNNNRTG

CstMI CAYNNNNNCTC

CviAII AT’CGAT

CviJI CG’GWCCG

CviKI-1 GACGCNNNNN’NNNNN

CviQI A’CCWGGT

DdeI CG’GWCCG

DinI G’TAC

DpnI TT’CGAA

DpnII A’CCGGT

DraI CAANNNNNGTGGNNNNNNNNNNNN’

DraII CCACNNNNNTTGNNNNNNNNNNNNN’

DraIII AAGGAGNNNNNNNNNNNNNNNNNNNN’

DraRI C’ATG

DrdI RG’CY

DrdII RG’CY

DrdIV G’TAC

DriI C’TNAG

DseDI GGC’GCC

EaeI GA’TC

EagI ‘GATC

Eam1104I TTT’AAA

Eam1105I RG’GNCCY

EarI CACNNN’GTG

EciI CAAGNACNNNNNNNNNNNNNNNNNNNN’

Ecl136II GACNNNN’NNGTC

EclXI GAACCA

Eco24I TACGACNNNNNNNNNNNNNNNNNNNN’

Eco31I GACNNN’NNGTC

Eco32I GACNNNN’NNGTC

Eco47I Y’GGCCR

Eco47III C’GGCCG

Eco52I CTCTTCN’NNN

Eco57I GACNNN’NNGTC

Eco72I CTCTTCN’NNN

Eco81I GGCGGANNNNNNNNNNN’

Eco88I GAG’CTC

Eco91I C’GGCCG

Eco105I GRGCY’C

Eco130I GGTCTCN’NNNN

Eco147I GAT’ATC

EcoHI G’GWCC

EcoICRI AGC’GCT

Eco57MI C’GGCCG

EcoNI CTGAAGNNNNNNNNNNNNNNNN’

EcoO65I CAC’GTG

EcoO109I CC’TNAGG

EcoRI C’YCGRG

EcoRII G’GTNACC

EcoRV TAC’GTA

EcoT14I C’CWWGG

EcoT22I AGG’CCT

EcoT38I ‘CCSGG

Eco53kI GAG’CTC

EgeI CTGRAGNNNNNNNNNNNNNNNN’

EheI CCTNN’NNNAGG

ErhI G’GTNACC

EsaBC3I RG’GNCCY

EsaSSI G’AATTC

Esp3I ‘CCWGG

FaeI GAT’ATC

FaiI C’CWWGG

FalI ATGCA’T

FaqI GRGCY’C

FatI GAG’CTC

FauI GGC’GCC

FauNDI GGC’GCC

FbaI C’CWWGG

FblI TC’GA

FinI GACCAC

FmuI CGTCTCN’NNNN

Fnu4HI CATG’

FokI YA’TR

FriOI AAGNNNNNCTTNNNNNNNNNNNNN’

FseI GGGACNNNNNNNNNN’NNNN

FspI ‘CATG

FspAI CCCGCNNNN’NN

FspBI CA’TATG

FspEI T’GATCA

Fsp4HI GT’MKAC

GdiII GGGAC

GlaI GGNC’C

GluI GC’NGC

GsaI GGATGNNNNNNNNN’NNNN

GsuI GRGCY’C

HaeI GGCCGG’CC

HaeII TGC’GCA

HaeIII RTGC’GCAY

HaeIV C’TAG

HapII CCNNNNNNNNNNNN’NNNN

HgaI GC’NGC

HgiEII C’GGCCR

HhaI GC’GC

Hin1I GC’NGC

Hin1II CCCAG’C

Hin4I CTGGAGNNNNNNNNNNNNNNNN’

Hin6I WGG’CCW

HinP1I RGCGC’Y

HincII GG’CC

HindII GAYNNNNNRTCNNNNNNNNNNNNNN’

HindIII GAYNNNNNRTCNNNNNNNNNNNNN’

HinfI C’CGG

HpaI GACGCNNNNN’NNNNN

HpaII ACCNNNNNNGGT

HphI GCG’C

Hpy8I GR’CGYC

Hpy99I CATG’

Hpy166II GAYNNNNNVTCNNNNNNNNNNNNN’

Hpy188I GABNNNNNRTCNNNNNNNNNNNNN’

Hpy188III G’CGC

HpyAV G’CGC

HpyCH4III GTY’RAC

HpyCH4IV GTY’RAC

HpyCH4V A’AGCTT

HpyF3I G’ANTC

HpyF10VI GTT’AAC

Hsp92I C’CGG

Hsp92II GGTGANNNNNNNN’

HspAI GTN’NAC

KasI CGWCG’

KflI GTN’NAC

KpnI TCN’GA

Kpn2I TC’NNGA

KspI CCTTCNNNNNN’

Ksp22I ACN’GT

KspAI A’CGT

Kzo9I TG’CA

LguI C’TNAG

LlaGI GCNNNNN’NNGC

LpnI GR’CGYC

LpnPI CATG’

Lsp1109I G’CGC

LweI G’GCGCC

MabI GG’GWCCC

MaeI GGTAC’C

MaeII T’CCGGA

MaeIII CCGC’GG

MalI T’GATCA

MaqI GTT’AAC

MauBI ‘GATC

MbiI GCTCTTCN’NNN

MboI CTNGAYG

MboII RGC’GCY

McaTI CCDGNNNNNNNNNN’NNNN

MfeI GCAGCNNNNNNNN’NNNN

MflI GCATCNNNNN’NNNN

MhlI A’CCWGGT

MjaIV C’TAG

MlsI A’CGT

MluI ‘GTNAC

MluNI GA’TC

MlyI CRTTGACNNNNNNNNNNNNNNNNNNNNN’

Mly113I CG’CGCGCG

MmeI CCG’CTC

MnlI ‘GATC

Mph1103I GAAGANNNNNNNN’

MreI GCGC’GC

MroI C’AATTG

MroNI R’GATCY

MroXI GDGCH’C

MscI GTNNAC

MseI TGG’CCA

MslI A’CGCGT

MspI TGG’CCA

Msp20I GAGTCNNNNN’

MspA1I GG’CGCC

MspCI TCCRACNNNNNNNNNNNNNNNNNNNN’

MspJI CCTCNNNNNNN’

MspR9I ATGCA’T

MssI CG’CCGGCG

MunI T’CCGGA

MvaI G’CCGGC

Mva1269I GAANN’NNTTC

MvnI TGG’CCA

MvrI T’TAA

MwoI CAYNN’NNRTG

NaeI C’CGG

NarI TGG’CCA

NciI CMG’CKG

NcoI C’TTAAG

NdeI CNNRNNNNNNNNN’NNNN

NdeII CC’NGG

NgoAVIII GTTT’AAAC

NgoMIV C’AATTG

NhaXI CC’WGG

NheI GAATGCN’

NlaIII CG’CG

NlaIV CGAT’CG

NlaCI GCNNNNN’NNGC

Nli3877I GCC’GGC

NmeAIII GG’CGCC

NmeDI CC’SGG

NmuCI C’CATGG

NotI CA’TATG

NruI ‘GATC

NsbI GACNNNNNTGANNNNNNNNNNNNN’

NsiI TCANNNNNGTCNNNNNNNNNNNNNN’

NspI G’CCGGC

NspV CAAGRAG

OliI G’CTAGC

PabI CATG’

PacI GGN’NCC

PaeI CATCACNNNNNNNNNNNNNNNNNNN’

PaeR7I CYCGR’G

PagI GCCGAGNNNNNNNNNNNNNNNNNNNNN’

PalAI RCCGGYNNNNNNN’NNNNN

PasI ‘GTSAC

PauI GC’GGCCGC

PceI TCG’CGA

PciI TGC’GCA

PciSI ATGCA’T

PcsI RCATG’Y

PctI TT’CGAA

PdiI CACNN’NNGTG

PdmI GTA’C

PfeI TTAAT’TAA

Pfl23II GCATG’C

Pfl1108I C’TCGAG

PflFI T’CATGA

PflMI GG’CGCGCC

PfoI CC’CWGGG

PhoI G’CGCGC

PinAI AGG’CCT

PlaDI A’CATGT

PleI GCTCTTCN’NNN

Ple19I WCGNNNN’NNNCGW

PmaCI GAATGCN’

PmeI GCC’GGC

PmlI GAANN’NNTTC

PpiI G’AWTC

PpsI C’GTACG

Ppu10I TCGTAG

Ppu21I GACN’NNGTC

PpuMI CCANNNN’NTGG

PscI T’CCNGGA

PshAI GG’CC

PshBI A’CCGGT

PsiI CATCAGNNNNNNNNNNNNNNNNNNNNN’

Psp03I GAGTCNNNN’N

Psp5II CGAT’CG

Psp6I CAC’GTG

Psp1406I GTTT’AAAC

Psp124BI CAC’GTG

PspCI GAACNNNNNCTCNNNNNNNNNNNNN’

PspEI GAGNNNNNGTTCNNNNNNNNNNNN’

PspGI GAGTCNNNN’N

PspLI A’TGCAT

PspN4I YAC’GTR

PspOMI RG’GWCCY

PspOMII A’CATGT

PspPI GACNN’NNGTC

PspPPI AT’TAAT

PspPRI TTA’TAA

PspXI GGWC’C

PsrI RG’GWCCY

PssI ‘CCWGG

PstI AA’CGTT

PstNI GAGCT’C

PsuI CAC’GTG

PsyI G’GTNACC

PteI ‘CCWGG

PvuI C’GTACG

PvuII GGN’NCC

RcaI G’GGCCC

RceI CGCCCARNNNNNNNNNNNNNNNNNNNN’

RgaI G’GNCC

RigI RG’GWCCY

RleAI CCYCAGNNNNNNNNNNNNNNN’

RpaBI VC’TCGAGB

RpaB5I GAACNNNNNNTACNNNNNNNNNNNN’

RruI GTANNNNNNGTTCNNNNNNNNNNNN’

RsaI RGGNC’CY

RsaNI CTGCA’G

RseI CAGNNN’CTG

RsrII R’GATCY

Rsr2I GACN’NNGTC

SacI G’CGCGC

SacII CGAT’CG

SalI CAG’CTG

SapI T’CATGA

SaqAI CATCGACNNNNNNNNNNNNNNNNNNNN’

SatI GCGAT’CGC

Sau96I GGCCGG’CC

Sau3AI CCCACANNNNNNNNNNNN’

SbfI CCCGCAGNNNNNNNNNNNNNNNNNNNN’

ScaI CGRGGACNNNNNNNNNNNNNNNNNNNN’

SchI TCG’CGA

SciI GT’AC

ScrFI G’TAC

SdaI CAYNN’NNRTG

SdeAI CG’GWCCG

SdeOSI CG’GWCCG

SduI GAGCT’C

SelI CCGC’GG

SetI G’TCGAC

SexAI GCTCTTCN’NNN

SfaAI T’TAA

SfaNI GC’NGC

SfcI G’GNCC

SfiI ‘GATC

SfoI CCTGCA’GG

Sfr274I AGT’ACT

Sfr303I GAGTCNNNNN’

SfuI CTC’GAG

SgeI CC’NGG

SgfI CCTGCA’GG

SgrAI CAGRAGNNNNNNNNNNNNNNNNNNNNN’

SgrBI GACNNNNRTGANNNNNNNNNNNN’

SgrDI TCAYNNNNGTCNNNNNNNNNNNNN’

SgsI GDGCH’C

SimI ‘CGCG

SinI ASST’

SlaI A’CCWGGT

SmaI GCGAT’CGC

SmiI GCATCNNNNN’NNNN

SmiMI C’TRYAG

SmlI GGCCNNNN’NGGCC

SmoI GGC’GCC

SmuI C’TCGAG

SnaI CCGC’GG

SnaBI TT’CGAA

SpeI CNNGNNNNNNNNN’NNNN

SphI GCGAT’CGC

SpoDI CR’CCGGYG

SrfI CCGC’GG

Sse9I CG’TCGACG

Sse8387I GG’CGCGCC

Sse8647I GG’GTC

SseBI G’GWCC

SsiI C’TCGAG

SspI CCC’GGG

SspDI ATTT’AAAT

SspD5I CAYNN’NNRTG

SstI C’TYRAG

SstII C’TYRAG

SstE37I CCCGCNNNN’NN

Sth132I GTATAC

Sth302II TAC’GTA

StrI A’CTAGT

StsI GCATG’C

StuI GCGGRAG

StyI GCCC’GGGC

StyD4I ‘AATT

SwaI CCTGCA’GG

TaaI AG’GWCCT

TaiI AGG’CCT

TaqI C’CGC

TaqII AAT’ATT

TasI G’GCGCC

TatI GGTGANNNNNNNN’

TauI GAGCT’C

TfiI CCGC’GG

TliI CGAAGACNNNNNNNNNNNNNNNNNNNN’

Tru1I CCCGNNNN’NNNN

Tru9I CC’GG

TscAI C’TCGAG

TseI GGATGNNNNNNNNNN’NNNN

TsoI AGG’CCT

Tsp45I C’CWWGG

Tsp509I ‘CCNGG

TspDTI ATTT’AAAT

TspEI ACN’GT

TspGWI ACGT’

TspMI T’CGA

TspRI GACCGANNNNNNNNNNN’

TssI CACCCANNNNNNNNNNN’

TstI ‘AATT

TsuI W’GTACW

Tth111I GCSG’C

Tth111II G’AWTC

UbaF9I C’TCGAG

UbaF11I T’TAA

UbaF12I T’TAA

UbaF13I CASTGNN’

UbaF14I G’CWGC

UbaPI TARCCANNNNNNNNNNN’

UnbI ‘GTSAC

Van91I ‘AATT

Vha464I ATGAANNNNNNNNNNN’

VneI ‘AATT

VpaK11AI ACGGANNNNNNNNNNN’

VpaK11BI C’CCGGG

VspI CASTGNN’

XagI GAGNNNCTC

XapI CACNNNNNNTCCNNNNNNNNNNNN’

XbaI GGANNNNNNGTGNNNNNNNNNNNNN’

XceI GCGAC

XcmI GACN’NNGTC

XhoI CAARCANNNNNNNNNNN’

XhoII TACNNNNNRTGT

XmaI TCGTA

XmaJI CTACNNNGTC

XmiI GAGNNNNNNCTGG

XmnI CCANNNNNTCG

XspI CGAACG

ZraI ‘GGNCC

ZrmI CCANNNN’NTGG

Zsp2I C’TTAAG


LOCUS NM_001250672 1498 bp mRNA linear PLN 18-OCT-2018
DEFINITION Glycine max cationic peroxidase 2 (PRX2), mRNA.
ACCESSION NM_001250672
VERSION NM_001250672.2
KEYWORDS RefSeq.
SOURCE Glycine max (soybean)
ORGANISM Glycine max
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
Pentapetalae; rosids; fabids; Fabales; Fabaceae; Papilionoideae; 50
kb inversion clade; NPAAA clade; indigoferoid/millettioid clade;
Phaseoleae; Glycine; Glycine subgen. Soja.
REFERENCE 1 (bases 1 to 1498)
AUTHORS Gijzen M, Miller SS, Bowman LA, Batchelor AK, Boutilier K and Miki
BL.
TITLE Localization of peroxidase mRNAs in soybean seeds by in situ
hybridization
JOURNAL Plant Mol. Biol. 41 (1), 57-63 (1999)
PUBMED 10561068
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence was derived from BT099403.1.
On Sep 4, 2012 this sequence version replaced NM_001250672.1.

##Evidence-Data-START##
Transcript exon combination :: AK244214.1, AK286032.1 [ECO:0000332]
RNAseq introns :: single sample supports all introns
SAMN00264986, SAMN00264988
[ECO:0000348]
##Evidence-Data-END##
PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP
1-1498 BT099403.1 1-1498 c
FEATURES Location/Qualifiers
source 1..1498
/organism="Glycine max"
/mol_type="mRNA"
/db_xref="taxon:3847"
/chromosome="17"
/map="17"
gene 1..1498
/gene="PRX2"
/note="cationic peroxidase 2"
/db_xref="GeneID:547513"
misc_feature 10..12
/gene="PRX2"
/note="upstream in-frame stop codon"
CDS 25..1020
/gene="PRX2"
/EC_number="1.11.1.7"
/note="class III plant peroxidase"
/codon_start=1
/product="cationic peroxidase 2 precursor"
/protein_id="NP_001237601.1"
/db_xref="GeneID:547513"
/translation="MAPKGLIFLAVLCFSALSLSRCLAEDNGLVMNFYKESCPQAEDI
IKEQVKLLYKRHKNTAFSWLRNIFHDCAVQSCDASLLLDSTRRSLSEKETDRSFGLRN
FRYIETIKEALERECPGVVSCADILVLSARDGIVSLGGPHIPLKTGRRDGRRSRADVV
EQFLPDHNESISAVLDKFGAMGIDTPGVVALLGAHSVGRTHCVKLVHRLYPEIDPALN
PDHVPHILKKCPDAIPDPKAVQYVRNDRGTPMILDNNYYRNILDNKGLLIVDHQLAND
KRTKPYVKKMAKSQDYFFKEFSRAITLLSENNPLTGTKGEIRKQCNAANKHHEEP"
sig_peptide 25..96
/gene="PRX2"
/inference="COORDINATES: ab initio prediction:SignalP:4.0"
ORIGIN
1 gagcaagagt gaagagcgaa gagaatggct cccaagggtt taatcttttt ggctgtgtta
61 tgcttctcag cactgtcact gagtcgttgt cttgcggagg ataatggact tgttatgaac
121 ttctacaagg aatcatgccc tcaggctgaa gacatcatca aagaacaagt caagcttctc
181 tacaagcgcc acaagaacac tgctttctcc tggctcagaa acatcttcca tgactgtgct
241 gttcagagtt gtgatgcttc actgttgctg gactccacaa gaaggagctt gtctgagaag
301 gaaacagata gaagctttgg gttgagaaat ttcaggtaca ttgagaccat caaagaagct
361 ttggaaaggg aatgcccagg agttgtttcc tgtgctgata tcctcgttct ctctgccaga
421 gatggcattg tttcgctagg aggtccccat atccctctta aaacaggaag aagggatggt
481 agaaggagca gagccgatgt ggtagagcag ttcctcccag accacaatga atccatttct
541 gcagttcttg acaagtttgg tgccatggga attgacaccc ccggcgtagt tgcattgctt
601 ggagcacaca gtgttggtcg aacccattgt gtgaagttgg tgcaccgttt gtacccagag
661 attgatccag ctctgaaccc tgaccacgtc cctcacattc tgaagaagtg ccctgatgcc
721 attccagacc ctaaggccgt gcagtacgtg agaaacgacc gtggcacccc catgattcta
781 gacaacaatt actacagaaa tatattggac aacaagggct tgttgatagt ggatcaccaa
841 ctagccaatg acaagaggac caagccttat gtgaagaaaa tggccaagag ccaggactat
901 ttcttcaagg agttttctag agccattact ttgctctctg agaacaaccc tctcactggc
961 acaaagggtg agatcagaaa gcagtgcaat gctgccaaca agcaccatga ggagccttaa
1021 ttgcttcccg cttaatttgg gccttgaatt ttcttcccct tctctatgtg gaagaaatct
1081 gtaagatatt atgcaaaaaa taattaaggt gtttttcttt aaatgggttg gttgattggt
1141 tcaatgaacc gatcaagacc acagcaggtt catggggatg cgaggattaa gacgctttgt
1201 tttttaatct tccgatgtca ctcttgtttg ttagtttgtt tttttatttt ttatttcaat
1261 aagtactgtg caagtaggtt agagttgggt agaagggcat gttcatggtg ttaattacta
1321 tgttatgtat gcatgtgagt gctgctatcg atggcaagat gtcaatgtat gcgtgtagtg
1381 ctgttatcga tgagagtgaa aatgtttatg atatccacac taataaagct agcttgctct
1441 tgctacataa taaataaatc atggcccacg gtcattatac aaaaaaaaaa aaaaaaaa
//