SEQUEST requires FASTA formatted databases. As far as SEQUEST is concerned, a FASTA formatted database is an ASCII text file in which each sequence entry has exactly one header/description line. The header line is identified by the greater than sign ('>') as its first character, and it ends at the carriage return-line feed. Every other line is treated as sequence data.
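The rules above can be sketched as a small reader. This is only an illustration of the format as described here, not SEQUEST's own parser; the function name `read_fasta` is hypothetical, and skipping any lines that appear before the first header is an assumption.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        # The header ends at the carriage return-line feed (a bare LF also works).
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line.strip():
            # Every non-header line is sequence data for the current entry.
            # Assumption: lines before the first header are ignored.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


# Made-up entries, used only to exercise the reader.
sample = [">seq1 first entry\r\n", "MKFL\r\n", "ILLF\r\n", ">seq2 second entry\r\n", "QVQH\r\n"]
for header, sequence in read_fasta(sample):
    print(header, sequence)
```

Passing an open text file instead of a list works the same way, since the function only iterates over lines.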
For example:
>104K_THEPA pir|P15711| 104 KD MICRONEME-RHOPTRY ANTIGEN. - THEILERIA PARVA.
MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLTAVEMAGVKYL
QVQHGSNVNIHRLVEGNVVIWENASTPLYTGAIVTNNDGPYMAYVEVLGDPNLQFFIKSG
DAWVTLSEHEYLAKLQEIRQAVHIESVFSLNMAFQLENNKYEVETHAKNGANMVTFIPRN

Some FASTA formatted databases available via FTP:
Protein databases
ftp://ncbi.nlm.nih.gov/genbank/genpept.fsa.Z
ftp://ncbi.nlm.nih.gov/genbank/README.genbank
ftp://ncbi.nlm.nih.gov/blast/db/nr.Z
ftp://ncbi.nlm.nih.gov/blast/db/README
ftp://ncbi.nlm.nih.gov/repository/OWL/owl.fasta.Z
ftp://ncbi.nlm.nih.gov/repository/OWL/README
ftp://ftp.ncifcrf.gov/pub/nonredun/protein.nrdb.Z
ftp://ftp.ncifcrf.gov/pub/nonredun/NRP.readme
*** nr from GenBank and protein.nrdb from NCIFCRF are updated daily.
EST sequences
ftp://ncbi.nlm.nih.gov/repository/dbEST/dbEST.weekly.FASTA.mmddyy.Z (mmddyy=date)
ftp://ncbi.nlm.nih.gov/repository/dbEST/README
ftp://ncbi.nlm.nih.gov/repository/dbEST/TIGR.dbEST.reports.mmddyy.Z (mmddyy=date)
ftp://ncbi.nlm.nih.gov/repository/unigene/Hs.seq.uniq.Z
ftp://ncbi.nlm.nih.gov/repository/unigene/README
ftp://ncbi.nlm.nih.gov/repository/unigene/Hs.info
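The mmddyy placeholder in the dbEST file names can be filled in from a date. A minimal sketch, assuming mmddyy means two-digit month, day, and year; the helper name `dbest_weekly_name` is hypothetical, and the date used is arbitrary, so check the directory listing or README for the files that actually exist.

```python
from datetime import date


def dbest_weekly_name(release: date) -> str:
    """Fill the mmddyy placeholder in the dbEST weekly FASTA file name."""
    # %m, %d and %y give zero-padded two-digit month, day and year.
    return f"dbEST.weekly.FASTA.{release.strftime('%m%d%y')}.Z"


print(dbest_weekly_name(date(1998, 12, 4)))  # dbEST.weekly.FASTA.120498.Z
```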
Yeast sequence databases
ftp://genome-ftp.stanford.edu/pub/yeast/yeast_protein/yeast_nrpep.fasta.Z
ftp://genome-ftp.stanford.edu/pub/yeast/yeast_ORFs/orf_trans.fasta.Z
ftp://genome-ftp.stanford.edu/pub/yeast/yeast_protein/README
Last updated 12/04/98