A PCR primer bank for quantitative gene expression analysis
Xiaowei Wang and Brian Seed*
Department of Molecular Biology, Massachusetts General Hospital, 50 Blossom Street, Boston, MA 02114, USA
Received September 15, 2003; Revised and Accepted October 20, 2003
ABSTRACT
Although gene expression profiling by microarray analysis is a useful tool for assessing global levels of transcriptional activity, variability associated with the data sets usually requires that observed differences be validated by some other method, such as real-time quantitative polymerase chain reaction (real-time PCR). However, non-specific amplification of non-target genes is frequently observed in the latter, confounding the analysis in ~40% of real-time PCR attempts when primer-specific labels are not used. Here we present an experimentally validated algorithm for the identification of transcript-specific PCR primers on a genomic scale that can be applied to real-time PCR with sequence-independent detection methods. An online database, PrimerBank, has been created for researchers to retrieve primer information for their genes of interest. PrimerBank currently contains 147 404 primers encompassing most known human and mouse genes. The primer design algorithm has been tested by conventional and real-time PCR for a subset of 112 primer pairs with a success rate of 98.2%.
INTRODUCTION
Quantitative transcript abundance analysis by real-time PCR has become widely applied in recent years (1,2). Typical applications of this method monitor amplicon production after each thermocycle by the appearance of a fluorescent signal that is either dependent on dye binding to the DNA product of the reaction or generated from a fluorophore engineered in the primer sequences. Because of its simplicity and sensitivity, real-time PCR is now widely used for precise evaluation of gene expression.
However, the sensitivity of the method is also a liability. PCR can generate unintended products, especially when an RNA sample containing thousands of genes is used as the template. The unexpected amplicons are usually the result of primer mispriming to non-target sites. The presence of extraneous amplicons complicates data analysis and can lead to incorrect inferences about message abundance. Extraneous amplicons are a particularly serious problem for the cheapest and most widely practiced form of real-time PCR, which relies on fluorescent detection of amplified DNA by sequence non-selective dyes, such as SYBR Green I.
Most existing primer design programs are predicated on a single template of limited genetic complexity (3). Primer failures resulting from flawed design tool predictions are sufficiently widespread that several online databases have been established as repositories for empirically validated primer sequences submitted by researchers (4). Unfortunately, the databases contain primers for only a few hundred genes at present. Thus, in most cases, an investigator will need to identify primers for genes of interest by trial and error.
With the advent of microarray technology (5), gene-specific oligo probe design has become the subject of multiple studies (6–10). A few genome-wide primer design programs have been developed for the production of amplicons with minimal potential for cross-hybridization, which are then spotted as probes in cDNA microarrays (11–13). Most of the programs have used BLAST (14) to identify gene-specific regions from which the PCR primers are designed. This strategy is appropriate for cDNA microarrays, since amplicon probe cross-hybridization is the main concern for microarray specificity. However, the programs are not designed for real-time PCR studies.
Here we present an algorithm and its implementation to identify specific primers for real-time PCR. An online primer database has been created to allow any investigator to freely retrieve primer information for genes of interest. The algorithm has been tested by conventional and real-time PCR experiments for a subset of 112 primer pairs and has been shown to be highly reliable.
MATERIALS AND METHODS
Mouse total RNA
C57BL6 mouse liver total RNA was either purchased from Stratagene or prepared with Trizol (protocol available at h.harvard.edu/Parabiosys/resources/microarrays.php). Total RNAs from other mouse tissues were from the Mouse Total RNA Master Panel (Clontech). DNA contamination of RNA samples was not assessed prior to use, but has been evaluated for the above protocol in prior microarray analyses.
PCR primer preparation
PCR primer sequences were retrieved from the online PrimerBank database. The primers were synthesized at the Molecular Biology Core Facility, Massachusetts General Hospital. Both UV absorbance and capillary electrophoresis were used to assess the quality of primer synthesis.
*To whom correspondence should be addressed. Tel: +1 617 726 5975; Fax: +1 617 726 5962; Email: h.harvard.edu
Nucleic Acids Research, 2003, Vol. 31, No. 24 e154
DOI: 10.1093/nar/gng154
Nucleic Acids Research, Vol. 31 No. 24 Oxford University Press 2003; all rights reserved
RT-PCR
Reverse transcription (RT) was carried out with the SuperScript First-Strand Synthesis System using the manufacturer's protocol (Invitrogen). A 20 µl RT reaction included 5 µg of total RNA, 150 ng of random hexamers, 2 µl of 10× RT buffer, 4 µl of 25 mM MgCl2, 2 µl of 0.1 M dithiothreitol, 1 µl of RNaseOUT, 1 µl of 50 U/µl SuperScript II and DEPC-treated water. The RNA template was then removed by adding 1 µl of RNase H and incubating at 37°C for 20 min.
Conventional and real-time PCRs were carried out on an ABI Prism 7000 Sequence Detection System (Applied Biosystems). Conventional PCRs were sometimes also carried out on a PTC-200 cycler (MJ Research). In both cases, hot-start PCR was performed with the SYBR Green PCR Master Mix (Applied Biosystems). In brief, the PCR mixtures were pre-heated at 50°C for 2 min and then at 95°C for 10 min to activate the AmpliTaq Gold DNA polymerase, followed by 40 cycles of amplification (95°C for 15 s; 60°C for 30 s; 68°C for 40 s). A final extension step was performed at 60°C for 10 min. The PCR products were checked on 3.5% NuSieve 3:1 agarose gel (Cambrex Bio Science Rockland). Real-time PCR results were also analyzed using the ABI Prism 7000 SDS software (Applied Biosystems).
PrimerBank website
The PrimerBank database is freely accessible at h.harvard.edu/primerbank/index.html. Detailed information for the primers in Supplementary Material Table S1 can be obtained from this website.
RESULTS
Figure 1 shows a simplified flow chart describing the primer selection algorithm. The algorithm was implemented in Perl as a program called uPrimer. uPrimer requires ~2 days on a 1.5 GHz Linux system to design primers for human or mouse genes.
Gene sequences
The principal source of gene sequence information for this project is the NCBI protein database GenPept ( bi.v/Entrez/). The corresponding DNA coding sequences were retrieved and redundant sequences were clustered using a program called DeRedund (8). Low complexity regions may contribute to primer cross-reactivity (15) and thus are excluded by the DUST program (16). To further enhance sequence complexity, a primer sequence is rejected if it contains six or more contiguous identical residues, and no primer candidate is considered from sequence regions with ambiguous residues.
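The two simple sequence-complexity rules above can be sketched in a few lines of Python. This is only an illustration, not the authors' Perl code: the function name is hypothetical, "ambiguous" is assumed to mean any character other than A, C, G or T, and the DUST low-complexity masking step is not reproduced here.

```python
import re

# Assumption: any residue other than A, C, G or T counts as ambiguous.
AMBIGUOUS = re.compile(r"[^ACGT]")
# Six or more contiguous identical residues (one residue plus five repeats).
HOMOPOLYMER = re.compile(r"(.)\1{5,}")


def passes_complexity_filter(primer: str) -> bool:
    """Return True if the primer has no ambiguous residue and no run of
    six or more identical residues (hypothetical helper name)."""
    seq = primer.upper()
    if AMBIGUOUS.search(seq):
        return False
    return HOMOPOLYMER.search(seq) is None
```

A run of five identical residues is still accepted under this rule; only runs of six or longer cause rejection.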
Two kinds of priming reactions are commonly used in RT reactions: random priming and oligo(dT) priming. Oligo(dT) priming usually results in cDNA libraries enriched for mRNA and tends to over-represent the 3′ ends of transcripts. As the detection of different splice isoforms is one major goal in gene expression analysis, we expect to perform random priming in RT reactions. In general, maximum sensitivity in random priming lies close to the 5′ end of a coding sequence (8). Therefore coding regions were scanned from the 5′ end to the 3′ end until three qualified primer pairs had been picked.
Primer uniformity
To facilitate running multiple PCRs, all human and mouse primers are designed to have similar properties. All primers are 19–23 nt long, with a preferred length of 21 residues. This is long enough to permit generation of gene-specific primers, while reducing the potential for cross-reactivity and allowing cost-effective generation of large primer sets. The GC contents are also similar (35–65%) to ensure uniform priming. Because 3′ end residues contribute most to non-specific primer extension, especially if the binding of the residues is relatively stable (17), the algorithm evaluates the ΔG value for the last five residues at the 3′ end and a threshold value of −9 kcal/mol is adopted for primer rejection.
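As a rough illustration of the length and GC-content criteria, the following Python sketch checks a candidate against the stated windows. The function names and default arguments are hypothetical, and the 3′ end ΔG test is deliberately left out because it needs nearest-neighbor free-energy parameters that are not listed in the text.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C residues in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_uniformity_filter(primer: str,
                             min_len: int = 19, max_len: int = 23,
                             min_gc: float = 0.35, max_gc: float = 0.65) -> bool:
    """Length must be 19-23 nt and GC content 35-65%, as stated in the text.
    The 3' end stability (delta G) check is not implemented in this sketch."""
    if not (min_len <= len(primer) <= max_len):
        return False
    return min_gc <= gc_fraction(primer) <= max_gc
```

For instance, the 19 nt forward primer listed for P450, 1a2 in Table 1 (ccaggtggtggaatcggtg) has 12 G or C residues, a GC fraction of about 0.63, and so falls inside both windows.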
The melting temperature (Tm) determines the optimal annealing temperature. In recent years, significant progress has been made in accurately estimating the Tm of oligonucleotides (18–20). The nearest neighbor method is to date the most accurate approach and is implemented by the following formula (18):
$$T_m = \frac{\Delta H}{\Delta S + R \ln(C_T/4)}$$
where $R$ is the gas constant (1.987 cal/K·mol), $C_T$ is the primer concentration, $\Delta H$ is the enthalpy change and $\Delta S$ is the entropy change. $\Delta H$ and $\Delta S$ are calculated by using the published thermodynamic parameters (18). The entropy change is dependent on salt concentration, so an entropy correction is performed:
$$\Delta S = \Delta S(1\ \mathrm{M\ Na^+}) + 0.368 \times (N - 1) \times \ln[\mathrm{Na^+_{eq}}],$$
where $N$ is the length of the primer and $[\mathrm{Na^+_{eq}}]$ is the Na+ equivalent concentration from all salts in a reaction. The default parameters for Tm calculation are 250 nM primer and 0.15 M Na+ equivalent (21). Variations in primer and salt concentrations in other typical PCR conditions affect the Tm values only slightly. All primer Tm values are in the narrow range 60–63°C.
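The two formulas can be combined in a short Python sketch. It takes the summed nearest-neighbor ΔH (in cal/mol) and ΔS at 1 M Na+ (in cal/K·mol) as inputs rather than deriving them from the sequence, since the parameter table of reference (18) is not reproduced here. The function names are hypothetical, and the conversion from kelvin to °C by subtracting 273.15 is a standard step that the text does not spell out.

```python
import math

R = 1.987  # gas constant in cal/(K*mol), as given in the text


def salt_corrected_entropy(ds_1m_na: float, length: int, na_eq: float) -> float:
    """Entropy correction: dS = dS(1 M Na+) + 0.368 * (N - 1) * ln[Na+ eq]."""
    return ds_1m_na + 0.368 * (length - 1) * math.log(na_eq)


def melting_temperature_celsius(dh_cal: float, ds_1m_na: float, length: int,
                                primer_conc: float = 250e-9,
                                na_eq: float = 0.15) -> float:
    """Tm = dH / (dS + R ln(C_T / 4)), reported in degrees Celsius.

    dh_cal and ds_1m_na are the summed nearest-neighbor values supplied by
    the caller; the defaults are the 250 nM primer and 0.15 M Na+ equivalent
    quoted in the text.
    """
    ds = salt_corrected_entropy(ds_1m_na, length, na_eq)
    tm_kelvin = dh_cal / (ds + R * math.log(primer_conc / 4))
    return tm_kelvin - 273.15


# Hypothetical thermodynamic totals for a 21 nt primer (illustrative only).
example_tm = melting_temperature_celsius(-160000.0, -430.0, 21)
```

With these made-up inputs the result falls near the 60–63°C window; lowering the salt concentration makes the corrected entropy more negative and therefore lowers the estimate.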
Since PCR efficiency is decreased for very long amplicons, only short amplicons of 150–350 bp are considered during primer selection. Occasionally, if this requirement cannot be satisfied, a wider range of 100–800 bp is used. In general the larger amplicons are less attractive, but they are included in the database because under some circumstances primer efficiency may not be the foremost consideration for the end user.
Primer cross-reactivity
Mismatches are known to significantly reduce priming stability (22,23) and at times even a single mismatch can destabilize a significant length of DNA duplex (24,25). Therefore we expect contiguous base pairing to be one of the most important factors in duplex stability. Our principal filter for cross-reactivity is the rejection of primers containing contiguous residues that are also found in other sequences. An analysis of the distribution of lengths of contiguous residues shared by two or more sequences in the design space of mammalian coding regions showed that a filter cut-off
rejecting perfect 15mer matches was the most stringent feasible filter (8). Non-unique 15mers can be efficiently identified by a software 'hashing' technique with 10mers as the basic hash keys (8). Every possible 15mer in a primer sequence is compared to both strands of all known sequences in the design space. The presence of a repetitive 15mer excludes a primer from further consideration. To further reduce cross-reactivity, BLAST searches for primer sequence similarity were carried out against all known sequences in the design space and qualified primers were required to have BLAST scores of less than 30 [the threshold values were recommended from previous studies (8)].
Figure 1. A simplified flow chart describing the primer design algorithm.
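The 15mer filter with 10mer hash keys might look roughly like the Python sketch below. It is an illustration under stated assumptions rather than the published implementation: the function names and the in-memory dictionary are hypothetical, the design space is passed as a mapping of sequence identifiers to sequences, and matches to the primer's own target gene are skipped. The BLAST score step is not reproduced.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_10mer_index(sequences: dict, key_len: int = 10):
    """Index every 10mer on both strands of every design-space sequence.

    Returns (strands, index): strands is a list of (sequence id, strand
    sequence) and index maps a 10mer to (strand number, offset) pairs.
    """
    strands = []
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for strand in (seq.upper(), reverse_complement(seq)):
            strands.append((seq_id, strand))
            strand_no = len(strands) - 1
            for i in range(len(strand) - key_len + 1):
                index[strand[i:i + key_len]].append((strand_no, i))
    return strands, index


def has_shared_15mer(primer: str, target_id: str, strands, index,
                     word_len: int = 15, key_len: int = 10) -> bool:
    """True if any 15mer of the primer occurs in a non-target sequence.

    The first 10 residues of each 15mer serve as the hash key; every hit is
    then verified by comparing the full 15 residues.
    """
    p = primer.upper()
    for i in range(len(p) - word_len + 1):
        word = p[i:i + word_len]
        for strand_no, offset in index.get(word[:key_len], ()):
            seq_id, strand = strands[strand_no]
            if seq_id != target_id and strand[offset:offset + word_len] == word:
                return True
    return False
```

Hashing on the 10mer prefix keeps the table smaller than a direct 15mer table would be, at the cost of one extra string comparison per candidate hit.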
Random priming in RT reactions results in a significant contribution of template from non-coding RNAs. To compensate for the abundance of the templates, more stringent filters were applied to minimize primer residues also found in non-coding RNAs.
The primer 3′ end residues are essential for controlling non-specific amplicons because DNA polymerase extension can be greatly reduced by mismatches (26,27). Therefore a more stringent filter should apply to cross-hybridization at the 3′ ends. In our algorithm the cross-hybridizing Tm for the 3′ end perfectly matched residues does not exceed 46°C; the Tm does not exceed 42°C when compared to non-coding RNA sequences.
Sequence self-complementarity
Secondary structure in the primer or target can retard primer annealing, leading to reduced PCR efficiency. Although the prediction of primer secondary structure is still challenging at present, secondary structure is most likely to occur in regions of self-complementarity (28). To reduce self-complementarity, no contiguous 5mer match is allowed anywhere between a primer and its complementary sequence. To avoid picking primers from a sequence region with high likelihood of secondary structure, no contiguous 9mer match is allowed when a primer sequence is compared to the complementary strand of its cognate gene sequence. A BLAST similarity search for the primer sequence is also carried out on the complementary strand and the score is required to be less than 18.
The formation of products arising from primers serving as template (primer dimers) can deplete free primers and result in poor PCR yield. Primer dimers are a common cause of real-time PCR quantitation failures when DNA intercalating dyes (e.g. SYBR Green I) are used. To prevent primer homodimer formation, candidate primers are rejected if the four residues at the 3′ end of a primer could be found in its complementary sequence. Complementarity of the forward and reverse primers in a primer pair is examined in the same way to prevent detrimental heterodimer formation.
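A minimal Python sketch of the self-complementarity and primer-dimer rules follows. It assumes that "complementary sequence" means the reverse complement of the primer, which is one reading of the text; the function names are hypothetical, and the 9mer and BLAST checks against the cognate gene are omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def shares_kmer(a: str, b: str, k: int) -> bool:
    """True if any contiguous k-mer of a also occurs in b."""
    kmers = {b[i:i + k] for i in range(len(b) - k + 1)}
    return any(a[i:i + k] in kmers for i in range(len(a) - k + 1))


def is_self_complementary(primer: str, k: int = 5) -> bool:
    """Reject if a contiguous 5mer matches the primer's reverse complement."""
    p = primer.upper()
    return shares_kmer(p, reverse_complement(p), k)


def three_prime_end_can_pair(primer: str, partner: str, end_len: int = 4) -> bool:
    """True if the last four 3' residues of primer occur in the reverse
    complement of partner, i.e. the 3' end could anneal to partner."""
    return primer.upper()[-end_len:] in reverse_complement(partner)


def forms_homodimer(primer: str) -> bool:
    return three_prime_end_can_pair(primer, primer)


def forms_heterodimer(forward: str, reverse: str) -> bool:
    return (three_prime_end_can_pair(forward, reverse)
            or three_prime_end_can_pair(reverse, forward))
```

The heterodimer test is applied in both directions because either primer's 3′ end can be extended on the other.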
Distribution of the rejected primers
A total of 15 562 332 primers were evaluated before 37 277 primer pairs were picked to cover 15 697 mouse genes. The very high rejection rate, 99.5%, reflects filter stringency. The distribution of the rejected mouse primers is shown in Figure 2. Among the rejected primers, 50.7% had too high or too low Tm values, 28.7% cross-hybridized to non-target genes, 19.8% were rejected because of sequence self-complementarity, 0.5% were from low complexity regions and 0.3% were rejected because of other properties (GC content and end stability).
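The quoted rejection rate can be checked with simple arithmetic. The sketch below assumes that each picked pair contributes two accepted primers and that the rate is computed per primer; both are assumptions, since the text does not say how the 99.5% was derived.

```python
evaluated = 15_562_332          # primers evaluated (from the text)
picked_pairs = 37_277           # primer pairs picked (from the text)

# Assumption: two accepted primers per picked pair.
accepted = 2 * picked_pairs
rejection_rate = 1 - accepted / evaluated

# Reported breakdown of rejected primers, in percent.
breakdown = {
    "Tm out of range": 50.7,
    "cross-hybridization": 28.7,
    "self-complementarity": 19.8,
    "low complexity": 0.5,
    "other (GC content, end stability)": 0.3,
}
breakdown_total = round(sum(breakdown.values()), 1)
```

Under these assumptions the rate rounds to 99.5%, and the five reported categories add up to 100.0%.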
The online primer database
Successfully designed human and mouse primers were imported into a MySQL database installed on a Linux server. A web-based interface was established to allow users to query the primer database, PrimerBank. Figure 3 shows the search page of the website. A total of 147 404 primers were picked and included in PrimerBank to cover 16 293 human and 15 697 mouse genes. There are several ways to search for primers: by GenBank accession no., NCBI protein accession no., LocusLink ID, PrimerBank ID or Keyword (gene description). Batch primer retrieval is also available by entering multiple IDs at the same time. Detailed instructions are included in the Help page of the website. Because of the sequence redundancy in public sequence databases, PrimerBank uses LocusLink index files (29,30), updated weekly from ftp://bi.nih.gov/, to map gene accessions to gene loci and associate the gene information with the primers.
Experimental evaluation of the primers
To evaluate the quality of the primers identified by the algorithm, 112 primer pairs representing 108 genes were tested in conventional RT-PCR and real-time PCR experiments. The primer information was retrieved from PrimerBank and is summarized in Supplementary Material Table S1. The genes were chosen because they had been shown to be expressed in mouse liver by microarray experiments and were of interest to local investigators (unpublished data). Some genes were from closely related gene families. Among them, 16 genes were from the cytochrome P450 family and five genes were from the Dok family.
The results for the 16 cytochrome P450 genes are included here as examples and the relevant primer information is summarized in Table 1. The cytochrome P450 genes are closely related and the sequence similarity is ~90% between some family members. Despite the high template homology, all 16 PCRs resulted in single specific amplicons, as determined by gel electrophoresis (Fig. 4A). All 16 P450 genes were also efficiently amplified in real-time PCR and the amplification plots indicated no obvious correlation between amplicon length and PCR efficiency (Fig. 5A). The melting curve analysis indicated single amplicons for 15 of the P450 genes (six examples shown in Fig. 5B). PCR specificity was confirmed by sequencing the PCR products.
An analysis of PCR efficiency was also conducted by measuring the slope of a standard curve created from serially diluted templates. Six primer pairs with a range in predicted amplicon length of 152–347 bp were analyzed and yielded an efficiency of 96 ± 4%.
Figure 2. Distribution of the rejected mouse primers. 15 562 332 primers were rejected during primer selection. They were rejected because they could not meet the primer selection criteria for melting temperature (Tm), cross-match to other sequences, sequence self-complementarity, sequence low complexity or other properties of the primers (GC content and end stability).
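For readers unfamiliar with the standard-curve approach, the Python sketch below shows the conventional calculation: threshold cycle (Ct) is regressed on log10 of the template amount and the efficiency is taken as 10^(−1/slope) − 1. That formula is the common convention and is assumed here, since the text does not state which expression was used; the function names and the Ct values are hypothetical.

```python
def standard_curve_slope(log10_amounts, ct_values):
    """Least-squares slope of Ct against log10(template amount)."""
    n = len(log10_amounts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_amounts, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    return sxy / sxx


def pcr_efficiency(slope: float) -> float:
    """Efficiency as a fraction: 1.0 means the product doubles every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold dilution series in which each dilution step delays
# Ct by about 3.32 cycles, the value expected for perfect doubling.
log10_amounts = [0, -1, -2, -3, -4]
ct_values = [20 + 3.321928 * k for k in range(5)]
efficiency = pcr_efficiency(standard_curve_slope(log10_amounts, ct_values))
```

A slope near −3.32 corresponds to an efficiency near 100%; shallower or steeper slopes give values above or below that.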
Among the 112 primer pairs tested, 106 detected their target genes in liver total RNA. Literature searching indicated that five of the six undetected genes had been shown to be expressed in tissues other than liver (31–35). Thus total RNA from embryo, brain, kidney or testis was used to test primers designed for the genes (see Supplementary Material Table S1). In this case five primer pairs yielded single specific PCR products. Only one gene was not detected using the primer pairs we designed. Among the 106 genes detected in liver, all except one primer pair resulted in single specific amplicons on agarose gel (unpublished data). One primer pair yielded a minor band in addition to the desired major band (Fig. 4B). Sequencing indicated this is a novel splice isoform that was not identified in GenBank.
The 112 primer pairs were also tested in real-time PCR experiments. Melting curve analysis (plotted as the first derivative of the absorbance with respect to temperature) indicated the presence of single PCR products in 104 PCRs. Six reactions resulted in bimodal first derivative plots, although single bands were observed by agarose gel. Sequencing results confirmed that the PCR products were homogeneous and correct, indicating that the observed heterogeneity in melting temperature was due to internal sequence inhomogeneity (e.g. independently melting blocks of high and low GC content) rather than amplicon contamination. In summary, 110 out of 112 primer pairs led to single specific PCR products, yielding a primer design success rate of 98.2%.
DISCUSSION
Primer specificity
Most approaches to primer design are based on the expectation of a single low complexity target sequence. With existing tools one can design a number of primer pairs and then individually
Table 1. Primer information for 16 cytochrome P450 genes
PrimerBank ID  Protein accession no.  Amplicon length  Forward primer           Reverse primer           Gene name
6753566_1      NP_034123              194              ccaggtggtggaatcggtg      tcttaaacctcttgagggccg    P450, 1a2
6681103_1      NP_031838              217              atgctgacctcaggactcctc    ggtagatggtgaatacaggacca  P450, 2a5
6753578_1      NP_034130              285              gctcattctctggtcagatgttt  cgcttgtggtctcagttcca     P450, 2b9
6681105_1      NP_031839              224              cagatgaacagttcctgcgtt    gatgaagtctcgtggctcact    P450, 2b13
6681109_1      NP_031841              218              atctggtcgtgttcctagcg     agtaggctttgagcccaaatac   P450, 2c29
4249591_1      AAD13720               193              acaggcaaaccacatcgaaca    gctacggtgtctaccaaccac    P450, 2c38
6857779_1      NP_034134              167              gatccatttgtagtcttggtgct  aaattggtaaggcactgccca    P450, 2c40
13386414_1     NP_083838              245              ttggagatgacttatgggctgt   tccgttgaccacaaccacg      P450, 2d26
11276065_1     NP_067257              168              catcaccgttgccttgcttg     gccaacttggttaaagacttggg  P450, 2e1
6753586_1      NP_034137              231              tctgggaagcactccatctca    ccactggtgattggcccaa      P450, 2j5
6681117_1      NP_031846              173              atcctttgtccttgtcagtagca  cagataaataaagtccacgcggt  P450, 3a16
1914796_1      CAA72720               297              atatgggacctattctcatggct  tcctcagatatggtaatggcctt  P450, 3a25
3738263_1      BAA33804               211              ttccctgatggacgctcttta    ccttcagctcactcatagcaaa   P450, 4a10
21729747_1     NP_031847              347              atgagtgcctctgctctgag     ccattagcttttgggtctgatct  P450, 4a12
6681121_1      NP_031848              183              tttagccctacaaggtacttgga  gtccttcagatggtgcccc      P450, 4a14
6681125_1      NP_031850              185              agcatttctttgatctggggg    ccatgtttcctttgctttgctct  P450, 7a1
Figure 3. A screenshot of the web interface for PrimerBank. There are several ways to search for primers: GenBank accession no., NCBI protein accession no., LocusLink ID, PrimerBank ID or Keyword (gene description). PrimerBank currently contains 147 404 primers designed for human and mouse genes.