GenBank

doi:10.1093/nar/gkl986

Nucleic Acids Research 2007 35(Database issue):D21-D25; doi:10.1093/nar/gkl986

Nucleic Acids Research, 2007, Vol. 35, Database issue D21-D25
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Articles

GenBank

Dennis A. Benson, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell and David L. Wheeler*

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA

*To whom correspondence should be addressed. Tel: +1 301 435 5950; Fax: +1 301 480 9241; Email: wheeler{at}ncbi.nlm.nih.gov

Received September 15, 2006. Accepted October 26, 2006.

ABSTRACT

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage (www.ncbi.nlm.nih.gov).

INTRODUCTION

GenBank (1) is a comprehensive public database of nucleotide sequences and supporting bibliographic and biological annotation, built and distributed by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM), located on the campus of the US National Institutes of Health (NIH) in Bethesda, MD.

NCBI builds GenBank primarily from the submission of sequence data from authors and from the bulk submission of expressed sequence tag (EST), genome survey sequence (GSS), and other high-throughput data from sequencing centers. The US Patent and Trademark Office also contributes sequences from issued patents. GenBank, the EMBL Data Library (2) in Europe, and the DNA Data Bank of Japan (DDBJ) (3) together comprise the International Nucleotide Sequence Databases, a long-standing collaboration in which information is exchanged daily to ensure a uniform and comprehensive collection of sequence information. NCBI makes the GenBank data available at no cost over the Internet, via FTP and via a wide range of web-based retrieval and analysis services that operate on the GenBank data (4).

ORGANIZATION OF THE DATABASE

From its inception, GenBank has doubled in size about every 18 months. It currently contains over 65 billion nucleotide bases from more than 61 million individual sequences, with 15 million new sequences added in the past year. Contributions from whole genome shotgun (WGS) projects supplement the data in the traditional divisions to bring the total beyond 145 billion bases. Complete genomes (www.ncbi.nlm.nih.gov/Genomes/index.html) continue to represent a growing portion of the database: GenBank now holds more than 370 complete microbial genomes, over 120 of which were deposited in the past year. The number of eukaryotic genomes for which coverage and assembly are significant also continues to increase, with over 104 assemblies now available, including that of the reference human genome.

Sequence-based taxonomy
Database sequences are classified and can be queried using a comprehensive sequence-based taxonomy (www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html) developed by NCBI in collaboration with EMBL and DDBJ and with the valuable assistance of external advisers and curators. Over 240 000 named species are represented in GenBank, and new species are being added at a rate of over 2900 per month. About 16% of the sequences in GenBank are of human origin, and 13% of all sequences are human ESTs. After Homo sapiens, the top species in GenBank in terms of number of bases are Mus musculus, Rattus norvegicus, Bos taurus, Danio rerio, Zea mays, Oryza sativa, Strongylocentrotus purpuratus, Sus scrofa, Xenopus tropicalis, and Canis familiaris.

GenBank records and divisions
Each GenBank entry includes a concise description of the sequence, the scientific name and taxonomy of the source organism, bibliographic references, and a table of features (www.ncbi.nlm.nih.gov/collab/FT/index.html) listing areas of biological significance, such as coding regions and their protein translations, transcription units, repeat regions, and sites of mutations or modifications.

The files in the GenBank distribution have traditionally been partitioned into ‘divisions’ that roughly correspond to taxonomic groups such as bacteria (BCT), viruses (VRL), primates (PRI), and rodents (ROD). In recent years, divisions have been added to support specific sequencing strategies. These include divisions for expressed sequence tag (EST), genome survey (GSS), high-throughput genomic (HTG), high-throughput cDNA (HTC), and environmental sample (ENV) sequences, making a total of 18 divisions. For convenience in file transfer, the larger divisions, such as EST and PRI, are partitioned into multiple files for the bimonthly GenBank releases on NCBI's FTP site.
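To make the division codes concrete, the sketch below reads the three-letter division code from a record's LOCUS line. It assumes a LOCUS layout in which the division code is the second-to-last whitespace-separated token, just before the date; that layout, the sample LOCUS line, and the helper name `division_of` are illustrative assumptions, and the dictionary covers only the divisions named in this article, not all 18.

```python
# Division codes named in this article (a subset of the 18 divisions);
# the descriptions paraphrase the text.
DIVISIONS = {
    "BCT": "bacteria",
    "VRL": "viruses",
    "PRI": "primates",
    "ROD": "rodents",
    "EST": "expressed sequence tags",
    "GSS": "genome survey sequences",
    "STS": "sequence-tagged sites",
    "HTG": "high-throughput genomic",
    "HTC": "high-throughput cDNA",
    "ENV": "environmental samples",
    "CON": "assembled from smaller records",
}


def division_of(locus_line: str) -> str:
    """Return the three-letter division code from a flat-file LOCUS line.

    Assumption: the division code is the second-to-last whitespace-separated
    token, immediately before the date.
    """
    tokens = locus_line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError(f"not a LOCUS line: {locus_line!r}")
    code = tokens[-2]
    if len(code) != 3 or not code.isalpha() or not code.isupper():
        raise ValueError(f"unexpected division field: {code!r}")
    return code


# Hypothetical LOCUS line built around the article's example accession.
line = "LOCUS       AF000001                1234 bp    mRNA    linear   PRI 15-SEP-2006"
code = division_of(line)
print(code, DIVISIONS.get(code, "not listed in this article"))
```

Splitting on whitespace rather than slicing fixed columns keeps the sketch tolerant of spacing differences, at the cost of assuming the date is always the final token.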

Expressed sequence tags
ESTs continue to be a major source of new sequence records and gene sequences, comprising over 21 billion nucleotide bases in GenBank release 155. Over the past year, the number of ESTs has increased by over 40% to a total of 38.3 million sequences representing more than 1200 different organisms. The top organisms represented in the EST division are H. sapiens (7.8 million records), M. musculus (4.7 million records), O. sativa (1.2 million records), Z. mays (1.1 million records), B. taurus (1.1 million records), and D. rerio (1.1 million records). As part of its daily processing of GenBank EST data, NCBI uses BLAST searches to identify all homologies for new EST sequences and incorporates that information into the companion database, dbEST (www.ncbi.nlm.nih.gov/dbEST/index.html) (5). The data in dbEST are processed further to produce the UniGene database (www.ncbi.nlm.nih.gov/UniGene/) of more than 1.2 million gene-oriented sequence clusters representing over 70 organisms, described more fully in (4).

Sequence-tagged sites (STSs), genome survey sequences (GSSs) and environmental sample sequences (ENV)
The STS division of GenBank (www.ncbi.nlm.nih.gov/dbSTS/index.html) contains over 883 000 sequences, including anonymous STSs based on genomic sequence as well as gene-based STSs derived from the 3' ends of genes and ESTs. These STS records usually include mapping information.

The GSS division of GenBank (www.ncbi.nlm.nih.gov/dbGSS/index.html) has grown over the past year by 22% to a total of 14.9 million records for over 600 organisms and comprises over 9.4 billion nucleotide bases. GSS records are predominantly single reads from the ends of bacterial artificial chromosomes (‘BAC-ends’) used in a variety of genome sequencing projects. The most highly represented species in the GSS division are Z. mays (2.0 million records), M. musculus (1.5 million records), H. sapiens (970 000 records) and C. familiaris (854 000 records). Human GSS records have been used (www.ncbi.nlm.nih.gov/genome/clone), along with the STS records, in tiling the BACs for the Human Genome Project (6).

The ENV division of GenBank accommodates non-WGS sequences obtained via environmental sampling methods in which the source organism is unknown. Records in the ENV division contain ‘ENV’ in the keyword field and use an ‘/environmental_sample’ qualifier in the source feature. As of GenBank release 155, the ENV division of GenBank contained over 275 000 sequences, comprising 236 million base pairs and representing more than 4900 studies.

High-throughput genomic (HTG) and high-throughput cDNA (HTC) sequences
The HTG division of GenBank (www.ncbi.nlm.nih.gov/HTGS/) contains unfinished large-scale genomic records that are in transition to a finished state (7). These records are designated as Phase 0–3 depending on the quality of the data. Upon reaching Phase 3, the finished state, HTG records are moved into the appropriate organism division of GenBank. As of release 155 of GenBank, the HTG division contained 15.9 billion base pairs of sequence, an increase of almost 3 billion bases over the past year.

The HTC division of GenBank accommodates high-throughput cDNA sequences. HTCs are of draft quality but may contain 5'-untranslated regions (5'-UTRs) and 3'-UTRs, partial coding regions, and introns. HTC sequences that are finished and of high quality are moved to the appropriate organism division of GenBank. GenBank release 155 contained more than 441 000 HTC sequences totaling over 539 million bases. One project generating HTC data is described in (8).

Whole genome shotgun sequence (WGS)
Over 80 billion bases of WGS sequence appear in GenBank as sets of WGS contigs, many of them bearing annotations, with each set originating from a single sequencing project. These sequences are issued accession numbers consisting of a four-letter project ID, followed by a two-digit version number and a six-digit contig ID. Hence, the WGS accession number ‘AAAA01072744’ is assigned to contig number ‘072744’ of the first version of project ‘AAAA’. WGS sequencing projects have contributed over 18 million contigs to GenBank, a 64% increase over the past year. These primary sequences have been used to construct some 760 000 large-scale assemblies of scaffolds and chromosomes. WGS project contigs for H. sapiens, C. familiaris, Pan troglodytes, Macaca mulatta, Drosophila, Saccharomyces, and more than 450 other organisms and environmental samples are available. For a complete list of WGS projects with links to the data, see www.ncbi.nlm.nih.gov/projects/WGS/WGSprojectlist.cgi.
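The four-letter/two-digit/six-digit layout described above can be decoded mechanically. The sketch below splits a WGS accession into its project ID, version and contig ID; the names `WgsAccession` and `parse_wgs` are hypothetical, and the pattern covers only the 4+2+6 form given in this article.

```python
import re
from typing import NamedTuple


class WgsAccession(NamedTuple):
    project: str  # four-letter project ID, e.g. "AAAA"
    version: int  # two-digit assembly version, e.g. 1
    contig: str   # six-digit contig ID, kept as text to preserve leading zeros


# Exactly four capital letters, two digits, six digits (the layout in the text).
_WGS_RE = re.compile(r"^([A-Z]{4})(\d{2})(\d{6})$")


def parse_wgs(accession: str) -> WgsAccession:
    m = _WGS_RE.match(accession)
    if m is None:
        raise ValueError(f"not a 4+2+6 WGS accession: {accession!r}")
    project, version, contig = m.groups()
    return WgsAccession(project, int(version), contig)


print(parse_wgs("AAAA01072744"))
```

Run on the article's example, this yields project ‘AAAA’, version 1 and contig ‘072744’, matching the reading given in the text.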

WGS projects may be annotated; however, many low-coverage genome projects do not contain annotation. Because these sequence projects are considered draft rather than complete, their annotations may not be tracked from one assembly version to the next and should be considered preliminary.

Submitters of WGS sequences, and of genomic sequences in general, are urged to use a new set of evidence qualifiers of the form ‘/experimental=text’ and ‘/inference=TYPE:text’, where ‘TYPE’ is one of a number of standard inference types and ‘text’ is structured text. These new qualifiers replace ‘/evidence=experimental’ and ‘/evidence=non-experimental’, respectively, which are no longer supported.
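A submission pipeline might check for the retired form before sending data. The sketch below classifies one qualifier string and splits ‘/inference’ values at the first colon into TYPE and text. The helper name, the sample TYPE string "similar to DNA sequence", and the ‘INSD:accession’ text are illustrative assumptions; the authoritative list of inference types is not reproduced here.

```python
def parse_evidence(qualifier: str) -> dict:
    """Classify one evidence qualifier from a feature table.

    Accepts '/experimental=text', '/inference=TYPE:text' and the retired
    '/evidence=...' form; surrounding double quotes on the value are dropped.
    """
    q = qualifier.strip()
    if not q.startswith("/") or "=" not in q:
        raise ValueError(f"not a key=value qualifier: {qualifier!r}")
    key, _, value = q[1:].partition("=")
    value = value.strip('"')
    if key == "evidence":
        # Old style: no longer supported, so flag it for conversion.
        return {"key": key, "deprecated": True, "text": value}
    if key == "experimental":
        return {"key": key, "deprecated": False, "text": value}
    if key == "inference":
        # TYPE ends at the first colon; the remaining text may itself
        # contain colons (for example database:accession).
        inference_type, sep, text = value.partition(":")
        if not sep or not inference_type:
            raise ValueError(f"/inference needs TYPE:text, got {value!r}")
        return {"key": key, "deprecated": False,
                "type": inference_type, "text": text}
    raise ValueError(f"not an evidence qualifier: /{key}")


# Illustrative values only; the TYPE string and accession are examples.
print(parse_evidence('/inference="similar to DNA sequence:INSD:AF000001.1"'))
print(parse_evidence("/evidence=experimental"))
```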

Special record types
Third Party Annotation
Third Party Annotation (TPA) records support the reporting of published sequence annotation by a scientist other than the original submitter of the primary sequence record in DDBJ/EMBL/GenBank. TPA records fall into one of two categories: ‘experimental’, in which case there is direct experimental evidence for the existence of the annotated molecule, and ‘inferential’, in which case the experimental evidence is indirect. TPA sequences may be created by assembling a number of primary sequences. The format of a TPA record (e.g. BK000016) is similar to that of a conventional GenBank record but includes the label ‘TPA:’ at the beginning of each Definition Line and the keywords ‘Third Party Annotation; TPA’ in the Keywords field. The Comment field of TPA records lists the primary sequences used to assemble the TPA sequence; the Primary field provides the base ranges of the primary sequences that contribute to the TPA sequence.
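The TPA markers described above lend themselves to a simple check. The sketch below recognizes a TPA record from its Definition and Keywords values and parses one row of the Primary field. It assumes the field labels have already been stripped, that keywords are separated by ‘;’ with an optional trailing period, and that each Primary row lists a TPA span, a primary identifier, a primary span and an optional ‘c’ for complement; that column order and all helper names are assumptions for illustration.

```python
from typing import NamedTuple


def is_tpa(definition: str, keywords: str) -> bool:
    """True if DEFINITION and KEYWORDS values carry the TPA markers."""
    kws = {k.strip().rstrip(".").strip() for k in keywords.split(";")}
    return (definition.lstrip().startswith("TPA:")
            and {"Third Party Annotation", "TPA"} <= kws)


class PrimarySpan(NamedTuple):
    tpa_start: int      # 1-based start in the TPA sequence
    tpa_end: int
    primary_id: str     # Accession.version of the contributing primary record
    primary_start: int  # 1-based start in the primary record
    primary_end: int
    complement: bool    # True if the primary span is used reverse-complemented


def parse_primary_row(row: str) -> PrimarySpan:
    """Parse one data row of the Primary field (assumed column order)."""
    fields = row.split()
    if len(fields) not in (3, 4):
        raise ValueError(f"unexpected PRIMARY row: {row!r}")
    tpa_start, tpa_end = (int(x) for x in fields[0].split("-"))
    p_start, p_end = (int(x) for x in fields[2].split("-"))
    complement = len(fields) == 4 and fields[3] == "c"
    return PrimarySpan(tpa_start, tpa_end, fields[1], p_start, p_end, complement)


# Hypothetical field values.
print(is_tpa("TPA: Drosophila melanogaster hypothetical gene, complete cds.",
             "Third Party Annotation; TPA."))
print(parse_primary_row("1-426   AF000001.1   80043-80468"))
```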

Over 5000 TPA records are contained in GenBank release 155, including over 2170 for Drosophila melanogaster, 950 for H. sapiens, 330 for O. sativa and 290 for M. musculus. TPA sequences are not released to the public until their accession numbers or sequence data and annotation appear in a peer-reviewed biological journal. TPA submissions to GenBank may be made using either BankIt or Sequin. For more information on TPA, see www.ncbi.nlm.nih.gov/Genbank/TPA.html.

GenBank CON records for assemblies of smaller records
Although many genomes, such as bacterial genomes, are represented in GenBank as single sequences, it is desirable from the standpoints of data transfer and analysis to break some very long sequences, such as portions of eukaryotic genomes, into smaller segments. In these cases, CON division records for the entire sequence are produced that contain assembly instructions to allow the seamless display and download of the full sequence. Many CON records also include annotations.
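To illustrate how assembly instructions can rebuild a full sequence, the sketch below expands a ‘join(...)’ expression of component spans and gaps. It assumes the instructions use a restricted location syntax (‘ACC.V:start..end’, ‘complement(ACC.V:start..end)’ and ‘gap(N)’ with a numeric length) with 1-based inclusive coordinates, and that the component sequences are already in hand. This is a simplified stand-in, not NCBI's own assembly code; the function names and component records are hypothetical.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def assemble(contig: str, components: Dict[str, str]) -> str:
    """Expand a join(...) of component spans and gaps into one sequence."""
    contig = "".join(contig.split())  # drop line breaks and padding
    if not (contig.startswith("join(") and contig.endswith(")")):
        raise ValueError("expected join(...)")
    pieces = []
    for item in contig[len("join("):-1].split(","):
        if item.startswith("gap(") and item.endswith(")"):
            # Gaps of known length are filled with N.
            pieces.append("N" * int(item[len("gap("):-1]))
            continue
        reverse = item.startswith("complement(") and item.endswith(")")
        if reverse:
            item = item[len("complement("):-1]
        accession, _, span = item.partition(":")
        start, _, end = span.partition("..")
        # Convert 1-based inclusive coordinates to a Python slice.
        segment = components[accession][int(start) - 1:int(end)]
        pieces.append(reverse_complement(segment) if reverse else segment)
    return "".join(pieces)


# Hypothetical component records and assembly instructions.
parts = {"AF000001.1": "ACGTACGTAC", "AF000002.1": "GGGCCCAAAT"}
print(assemble("join(AF000001.1:1..4,gap(3),complement(AF000002.1:1..3))", parts))
```

Storing only the instructions keeps each component record small while still letting a client reconstruct the full sequence on demand.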

BUILDING THE DATABASE

The sequences and biological annotations in GenBank and the collaborating databases, EMBL and DDBJ, are submitted primarily by individual authors to one of the three databases, or by sequencing centers as batches of EST, STS, GSS, HTC, WGS, or HTG sequences. Information is exchanged daily with DDBJ and EMBL so that the daily updates from NCBI servers incorporate the most recently available sequence data from all sources.

Direct electronic submission
Virtually all records enter GenBank as direct electronic submissions (www.ncbi.nlm.nih.gov/Genbank/index.html), with the majority of authors using the BankIt or Sequin programs. Many journals require authors with sequence data to submit the data to a public database as a condition of publication.

GenBank staff can usually assign an accession number to a sequence submission within two working days of receipt, and do so at a rate of almost 1600 per day. The accession number serves as confirmation that the sequence has been submitted and allows readers of articles in which the sequence is cited to retrieve the data. Direct submissions receive a quality assurance review that includes checks for vector contamination, proper translation of coding regions, correct taxonomy, and correct bibliographic citations. A draft of the GenBank record is passed back to the author for review before it enters the database. Authors may ask that their sequences be kept confidential until the time of publication. Because GenBank policy requires that deposited sequence data be made public when the sequence or accession number is published, authors are instructed to inform GenBank staff of the publication date of the article in which the sequence is cited, to ensure timely release of the data. Although only the submitting scientist is permitted to modify sequence data or annotations, all users are encouraged to report lags in releasing data, or possible errors or omissions, to GenBank at update{at}ncbi.nlm.nih.gov.

NCBI works closely with sequencing centers to ensure timely incorporation of bulk data into GenBank for public release. GenBank offers special batch procedures for large-scale sequencing groups to facilitate data submission, including the program ‘tbl2asn’, described at www.ncbi.nlm.nih.gov/Sequin/table.html.

Submission using BankIt
About one-third of author submissions are received through NCBI's web-based data submission tool, BankIt (www.ncbi.nlm.nih.gov/BankIt). Using BankIt, authors enter sequence information directly into a form and add biological annotations such as coding regions or mRNA features. Free-form text boxes, list boxes, and pull-down menus allow the submitter to further describe the sequence without having to learn formatting rules or restricted vocabularies. BankIt validates submissions, flagging many common errors, and checks for vector contamination using a variant of BLAST called VecScreen, before creating a draft record in GenBank flat file format for the submitter to review. BankIt is the tool of choice for simple submissions, especially when only one or a small number of records is to be submitted (7). BankIt can also be used by submitters to update their existing GenBank records.

Submission using Sequin and tbl2asn
NCBI also offers a standalone, multi-platform submission program called Sequin (www.ncbi.nlm.nih.gov/Sequin/index.html) that can be used interactively with other NCBI sequence retrieval and analysis tools. Sequin handles simple sequences such as a cDNA, as well as segmented entries, phylogenetic studies, population studies, mutation studies, environmental samples, and alignments, for which BankIt and other web-based submission tools are not well suited. Sequin has convenient editing and complex annotation capabilities and contains a number of built-in validation functions for quality assurance. In addition, Sequin is able to accommodate large sequences, such as that of the 5.6 Mb Escherichia coli genome, and to read in a full complement of annotations via simple tables. Versions for Macintosh, PC and Unix computers are available via anonymous FTP from ftp.ncbi.nih.gov in the ‘sequin’ directory. Once a submission is completed, submitters can e-mail the Sequin file to gb-sub{at}ncbi.nlm.nih.gov.

Submitters of large, heavily annotated genomes may find it convenient to use ‘tbl2asn’, referenced above under ‘Direct electronic submission’, to convert a table of annotations generated via an annotation pipeline into an ASN.1 record suitable for submission to GenBank.
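As a sketch of the pipeline side of this workflow, the code below renders annotations as a tab-delimited, five-column feature table of the kind tbl2asn reads. The layout it assumes (a ‘>Feature’ header naming the sequence; one ‘start, stop, key’ line per feature; qualifier lines that leave the first three columns empty) follows the table format page cited above as best understood here, and should be checked against that page. The sequence ID is assumed to match the identifier in the accompanying sequence file; all names and values are hypothetical.

```python
from typing import List, Sequence, Tuple

# (start, stop, feature key, [(qualifier, value), ...]); positions are 1-based.
Feature = Tuple[int, int, str, Sequence[Tuple[str, str]]]


def feature_table(seq_id: str, features: Sequence[Feature]) -> str:
    """Render features as a tab-delimited five-column feature table."""
    lines: List[str] = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        # Columns 1-3: location and feature key.
        lines.append(f"{start}\t{stop}\t{key}")
        for name, value in qualifiers:
            # Columns 4-5: qualifier name and value, first three columns empty.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical pipeline output for one gene and its coding region.
table = feature_table("contig01", [
    (1, 1050, "gene", [("locus_tag", "ABC_0001")]),
    (1, 1050, "CDS", [("product", "hypothetical protein"),
                      ("locus_tag", "ABC_0001")]),
])
print(table, end="")
```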

Submission of barcode sequences
The Consortium for the Barcode of Life (CBOL) is an international initiative to develop DNA barcoding as a tool for characterizing species of organisms using a short DNA sequence derived from a portion of the cytochrome oxidase subunit I gene. NCBI, in collaboration with CBOL (barcoding.si.edu/index_detail.htm), has created an online tool for the bulk submission of barcode sequences to GenBank (www.ncbi.nlm.nih.gov/BankIt/barcode/) that allows users to upload files containing a batch of sequences with associated source information. It is anticipated that this tool will be used for other types of bulk submissions in the near future.

Sequence identifiers and accession numbers
Each GenBank record, consisting of both a sequence and its annotations, is assigned a unique identifier, the accession number, that is shared across the three collaborating databases (GenBank, DDBJ, EMBL) and remains constant over the lifetime of the record, even when there is a change to the sequence or annotation. Each version of the DNA sequence within a GenBank record is also assigned a unique NCBI identifier, called a ‘gi’, that appears on the VERSION line of GenBank flatfile records following the accession number. A third identifier of the form ‘Accession.version’, also displayed on the VERSION line of flatfile records, contains the information present in both the gi and the accession number. An entry appearing in the database for the first time has an ‘Accession.version’ identifier equal to the ACCESSION number of the GenBank record followed by ‘.1’ to indicate the first version of the sequence for the record, e.g.

ACCESSION   AF000001
VERSION     AF000001.1  GI:987654321

When a change is made to a sequence in a GenBank record, a new gi number is issued to the sequence and the version extension of the ‘Accession.version’ identifier is incremented. The accession number for the record as a whole remains unchanged, and the older sequence remains available under the old ‘Accession.version’ identifier and gi.
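The versioning rule just described (same accession, version plus one, new gi) can be modelled in a few lines. The sketch below parses a VERSION line like the example above and derives the identifiers a record would carry after a sequence change. The function names are hypothetical, and the new gi value is made up because real gi numbers are assigned by NCBI.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    accession: str     # stable for the lifetime of the record
    version: int       # incremented whenever the sequence changes
    gi: Optional[int]  # a new gi is issued for each sequence version


def parse_version_line(line: str) -> Version:
    """Parse a flat-file line such as 'VERSION     AF000001.1  GI:987654321'."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "VERSION":
        raise ValueError(f"not a VERSION line: {line!r}")
    accession, dot, version = tokens[1].rpartition(".")
    if not dot:
        raise ValueError(f"missing .version suffix: {tokens[1]!r}")
    gi = next((int(t[3:]) for t in tokens[2:] if t.startswith("GI:")), None)
    return Version(accession, int(version), gi)


def after_sequence_change(current: Version, new_gi: int) -> Version:
    """Same accession, version + 1, and a freshly issued gi."""
    return Version(current.accession, current.version + 1, new_gi)


def accession_version(v: Version) -> str:
    return f"{v.accession}.{v.version}"


v1 = parse_version_line("VERSION     AF000001.1  GI:987654321")
v2 = after_sequence_change(v1, 987654322)  # hypothetical new gi
print(accession_version(v1), accession_version(v2))  # AF000001.1 AF000001.2
```

Because the old ‘Accession.version’ and gi remain valid, code that stores either one can always retrieve exactly the sequence it originally referenced.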

A similar system tracks changes in the corresponding protein translations. These identifiers appear as qualifiers for CDS features in the FEATURES portion of a GenBank entry, e.g. /protein_id="AAA00001.1". Protein sequence translations also receive their own unique gi number, which appears as a second qualifier on the CDS feature, e.g. /db_xref="GI:1233445".
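Extracting both protein identifiers from a CDS feature is a common first step in linking nucleotide records to their translations. The sketch below scans qualifier lines that have already been isolated for a single CDS feature and returns the protein ‘Accession.version’ and protein gi. It assumes each qualifier of interest fits on one line, and the function name is hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# One '/key="value"' (or '/key=value') qualifier on a single line.
_QUALIFIER = re.compile(r'^\s*/(\w+)="?([^"]*)"?\s*$')


def protein_ids(cds_lines: Iterable[str]) -> Tuple[Optional[str], Optional[int]]:
    """Return (protein Accession.version, protein gi) from CDS qualifier lines."""
    protein_id: Optional[str] = None
    gi: Optional[int] = None
    for line in cds_lines:
        m = _QUALIFIER.match(line)
        if not m:
            continue  # location line or continuation of a long value
        key, value = m.groups()
        if key == "protein_id":
            protein_id = value
        elif key == "db_xref" and value.startswith("GI:"):
            gi = int(value[len("GI:"):])
    return protein_id, gi


cds = [
    '     CDS             1..300',
    '                     /protein_id="AAA00001.1"',
    '                     /db_xref="GI:1233445"',
]
print(protein_ids(cds))  # ('AAA00001.1', 1233445)
```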

Ensuring stable access to sequence data
It is becoming increasingly popular for research groups to share new biological sequences and update existing sequences by posting the data directly on the Web. While this is a convenient and effective way to share the data among a set of collaborators, three significant problems arise if original data and updates are not also submitted to a central repository: the access lifetime of the data may be reduced, the full biological context of the data may not be realized, and existing data in heavily used centralized databases will become outdated.

The ephemeral nature of much of the content on the web is part of the common experience of web users. In one attempt to quantify content lifetime, 360 randomly selected web pages were tracked for a period of 4 years, and a half-life of only 2 years was measured for the set (9). Although a well-maintained web page can certainly persist for longer than 2 years, the relatively short half-life reported for this set of pages reflects the many factors that can intervene to affect access to web-posted data.

Even during the accessible lifetime of web-posted sequence data, however, the full biological context of a sequence may not be realized if the sequence cannot be conveniently compared with others, including sequences from distantly related organisms that are beyond the scope of the host web page.

In addition, if updates to sequences contained within centralized databases are made to a web page but not to the corresponding records in the central database, the newer data will not reach the wider research community and much of the impact of the data will be lost.

Submission of sequence data to a centralized repository such as GenBank solves these three problems. Researchers are ensured stable access to the data via versioned bimonthly releases available by FTP, via NCBI-maintained as well as numerous third-party interfaces to a uniform dataset, and via the archival redundancy offered by the tripartite International Nucleotide Sequence Databases collaboration. Combining new data with that of other researchers worldwide within a central database provides a broad biological context that stimulates discovery, and keeping each sequence current magnifies the utility of all the sequences in the database.

RETRIEVING GenBank DATA

The Entrez system
The sequence records in GenBank are accessible via Entrez (www.ncbi.nlm.nih.gov/Entrez/), a flexible database retrieval system that covers over 30 biological databases. These include DNA and protein sequences derived from GenBank and other sources, genome maps, population, phylogenetic and environmental sequence sets, gene expression data, the NCBI taxonomy, protein domain information, and protein structures from the Molecular Modeling Database, MMDB (10). Each database is linked to the scientific literature via PubMed and PubMed Central.

BLAST sequence-similarity searching
Sequence-similarity searches are the most fundamental and frequent type of analysis performed on the GenBank data. NCBI offers the BLAST (www.ncbi.nlm.nih.gov/BLAST/) family of programs to detect similarities between a query sequence and database sequences (11,12). BLAST searches may be performed on NCBI's website or via a set of standalone programs distributed by FTP. BLAST is discussed in a separate article in this issue (4).

Obtaining GenBank by FTP
NCBI distributes GenBank releases in the traditional flat-file format as well as in the Abstract Syntax Notation One (ASN.1) format used for internal maintenance. The complete bimonthly GenBank release and the daily updates, which also incorporate sequence data from EMBL and DDBJ, are available by anonymous FTP from NCBI at ftp.ncbi.nih.gov as well as from a mirror site at Indiana University (ftp://bio-mirror.net/biomirror/genbank/). The complete release in the flat-file format is available as compressed files in the ‘genbank’ directory, with a non-cumulative set of daily updates contained in ‘daily-nc’. A script is provided in the ‘tools’ directory of the GenBank FTP site to convert a set of daily updates into a cumulative update.
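The idea behind turning non-cumulative daily files into a cumulative update can be sketched as a fold over the days in order, keyed by accession, so that a record seen on a later day replaces the earlier copy. This is not the script in the ‘tools’ directory, whose behavior is not described here. The function names are hypothetical, the sketch assumes each flat-file record ends with a ‘//’ line and carries its primary accession as the first token after the ACCESSION label, and it ignores deletions and withdrawn records.

```python
from typing import Dict, Iterable, Iterator, List


def iter_records(flatfile_text: str) -> Iterator[str]:
    """Split flat-file text into records, each ending with a '//' line."""
    record: List[str] = []
    for line in flatfile_text.splitlines():
        record.append(line)
        if line.strip() == "//":
            yield "\n".join(record)
            record = []


def accession_of(record: str) -> str:
    """Primary accession: the first token after the ACCESSION label."""
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            return line.split()[1]
    raise ValueError("record has no ACCESSION line")


def cumulative(daily_files: Iterable[str]) -> Dict[str, str]:
    """Fold daily update files, oldest first, into one set of latest records."""
    merged: Dict[str, str] = {}
    for text in daily_files:
        for record in iter_records(text):
            merged[accession_of(record)] = record  # later day wins
    return merged


# Hypothetical, heavily truncated daily files.
day1 = ("ACCESSION   AF000001\nVERSION     AF000001.1\n//\n"
        "ACCESSION   AF000002\nVERSION     AF000002.1\n//\n")
day2 = "ACCESSION   AF000001\nVERSION     AF000001.2\n//\n"
latest = cumulative([day1, day2])
print(sorted(latest))
```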

MAILING ADDRESS

GenBank, National Center for Biotechnology Information, Building 38A, Room 3N-301-B, 8600 Rockville Pike, Bethesda, MD 20894, USA. Tel: +1 301 496 2475; Fax: +1 301 480 9241.

ELECTRONIC ADDRESSES

NCBI Home Page: info{at}ncbi.nlm.nih.gov

Submission of sequence data to GenBank: gb-sub{at}ncbi.nlm.nih.gov

Revisions to or notification of release of ‘confidential’ GenBank entries: update{at}ncbi.nlm.nih.gov

General information about NCBI and services: info{at}ncbi.nlm.nih.gov

CITING GenBank

If you use the GenBank database in your published research, we ask that this paper be cited.

ACKNOWLEDGEMENTS

Funding to pay the Open Access publication charges for this article was provided by the National Institutes of Health.

Conflict of interest statement. None declared.

REFERENCES

1. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. and Wheeler, D.L. (2006) GenBank. Nucleic Acids Res., 34, 16–20.
2. Cochrane, G., Aldebert, P., Althorpe, N., Andersson, M., Baker, W., Baldwin, A., Bates, K., Bhattacharyya, S., Browne, P., van den Broek, A., et al. (2006) EMBL Nucleotide Sequence Database: developments in 2005. Nucleic Acids Res., 34, 10–15.
3. Okubo, K., Sugawara, H., Gojobori, T. and Tateno, Y. (2006) DDBJ in preparation for overview of research activities behind data submissions. Nucleic Acids Res., 34, 6–9.
4. Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., et al. (2006) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 34, 173–180.
5. Boguski, M.S., Lowe, T.M. and Tolstoshev, C.M. (1993) dbEST—database for ‘expressed sequence tags’. Nature Genet., 4, 332–333.
6. Smith, M.W., Holmsen, A.L., Wei, Y.H., Peterson, M. and Evans, G.A. (1994) Genomic sequence sampling: a strategy for high resolution sequence-based physical mapping of complex genomes. Nature Genet., 7, 40–47.
7. Kans, J. and Ouellette, B. (2001) Submitting DNA sequences to the databases. In Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. John Wiley and Sons, Inc., NY, pp. 65–81.
8. Kawai, J., Shinagawa, A., Shibata, K., Yoshino, M., Itoh, M., Ishii, Y., Arakawa, T., Hara, A., Fukunishi, Y., Konno, H., et al. (2001) Functional annotation of a full-length mouse cDNA collection. Nature, 409, 685–690.
9. Koehler, W. (2002) Web page change and persistence—a four-year longitudinal study. J. Am. Soc. Inform. Sci. Technol., 53, 162–171.
10. Marchler-Bauer, A., Anderson, J.B., Cherukuri, P.F., DeWeese-Scott, C., Geer, L.Y., Gwadz, M., He, S., Hurwitz, D.I., Jackson, J.D., Ke, Z., et al. (2005) CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res., 33, 192–196.
11. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
12. Zhang, Z., Schäffer, A.A., Miller, W., Madden, T.L., Lipman, D.J., Koonin, E.V. and Altschul, S.F. (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res., 26, 3986–3990.
