What is FASTA format?

SEQUEST requires FASTA-formatted databases. As far as SEQUEST is concerned, a FASTA-formatted database is an ASCII text file in which each sequence entry has a single header (description) line. The header line is identified by a greater-than sign ('>') as its first character, and it ends at the carriage return-line feed. All other lines are sequence lines.

For example:

>104K_THEPA pir|P15711| 104 KD MICRONEME-RHOPTRY ANTIGEN. - THEILERIA PARVA.
MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLTAVEMAGVKYL
QVQHGSNVNIHRLVEGNVVIWENASTPLYTGAIVTNNDGPYMAYVEVLGDPNLQFFIKSG
DAWVTLSEHEYLAKLQEIRQAVHIESVFSLNMAFQLENNKYEVETHAKNGANMVTFIPRNO
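
The rules above can be sketched as a small reader. This is only an illustration of the format as described here, not SEQUEST's own parsing code; the function name read_fasta is hypothetical, and the sketch assumes that blank lines can be ignored and that sequence lines belonging to one entry are simply concatenated.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines.

    Hypothetical helper: a line starting with '>' opens a new entry,
    and every other non-blank line is treated as sequence data.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # accept both CR-LF and bare LF endings
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


# The first two sequence lines of the example entry, with CR-LF endings.
example = [
    ">104K_THEPA pir|P15711| 104 KD MICRONEME-RHOPTRY ANTIGEN. - THEILERIA PARVA.\r\n",
    "MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLTAVEMAGVKYL\r\n",
    "QVQHGSNVNIHRLVEGNVVIWENASTPLYTGAIVTNNDGPYMAYVEVLGDPNLQFFIKSG\r\n",
]
records = list(read_fasta(example))
```

Passing an open text file instead of a list works the same way, since the function only iterates over lines.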

Some FASTA-formatted databases available via FTP:

Protein databases

ftp://ncbi.nlm.nih.gov/genbank/genpept.fsa.Z
ftp://ncbi.nlm.nih.gov/genbank/README.genbank
ftp://ncbi.nlm.nih.gov/blast/db/nr.Z
ftp://ncbi.nlm.nih.gov/blast/db/README

ftp://ncbi.nlm.nih.gov/repository/OWL/owl.fasta.Z
ftp://ncbi.nlm.nih.gov/repository/OWL/README

ftp://ftp.ncifcrf.gov/pub/nonredun/protein.nrdb.Z
ftp://ftp.ncifcrf.gov/pub/nonredun/NRP.readme

*** nr from GenBank and protein.nrdb from ncifcrf are updated daily!

EST sequences

ftp://ncbi.nlm.nih.gov/repository/dbEST/dbEST.weekly.FASTA.mmddyy.Z (mmddyy=date)
ftp://ncbi.nlm.nih.gov/repository/dbEST/README
ftp://ncbi.nlm.nih.gov/repository/dbEST/TIGR.dbEST.reports.mmddyy.Z (mmddyy=date)
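
The mmddyy placeholder in the dbEST file names stands for a two-digit month, day, and year. As a minimal sketch, the weekly file name for a given date could be built as follows; the function name is hypothetical, and which date a particular release actually carries is an assumption to check against the dbEST README.

```python
from datetime import date


def dbest_weekly_name(release: date) -> str:
    """Build a dbEST weekly file name from a date (hypothetical helper).

    %m%d%y gives zero-padded month, day, and two-digit year (mmddyy).
    """
    return f"dbEST.weekly.FASTA.{release:%m%d%y}.Z"


name = dbest_weekly_name(date(1998, 12, 4))
```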

ftp://ncbi.nlm.nih.gov/repository/unigene/Hs.seq.uniq.Z
ftp://ncbi.nlm.nih.gov/repository/unigene/README
ftp://ncbi.nlm.nih.gov/repository/unigene/Hs.info

Yeast sequence database

ftp://genome-ftp.stanford.edu/pub/yeast/yeast_protein/yeast_nrpep.fasta.Z
ftp://genome-ftp.stanford.edu/pub/yeast/yeast_ORFs/orf_trans.fasta.Z
ftp://genome-ftp.stanford.edu/pub/yeast/yeast_protein/README

Last updated 12/04/98