Research Tools

Since the 1980s, the Yates Lab has pioneered many new technologies used in the field of proteomics. The following tools were developed by, and are used by, the Yates team.

SEQUEST
Our lab developed a method for protein identification and peptide sequencing that uses mass spectrometry fragmentation patterns to search protein and nucleotide databases. Our program, SEQUEST, converts the character-based representation of amino acid sequences in a protein database to fragmentation patterns, which are compared against the MS/MS spectrum generated from the target peptide. The algorithm initially identifies amino acid sequences in the database that match the measured mass of the peptide, compares their fragment ions against the MS/MS spectrum, and generates a preliminary score for each amino acid sequence. A cross-correlation analysis is then performed on the 500 peptides with the highest preliminary scores by correlating theoretical, reconstructed spectra against the experimental spectrum. The results are then displayed. In short, SEQUEST performs automated peptide/protein sequencing via database searching of MS/MS spectra without the need for any manual sequence interpretation, though it can make use of interpreted sequence information if available.
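The steps described above (mass filter, preliminary score, cross-correlation on the top candidates) can be illustrated with a minimal Python sketch. This is not the SEQUEST implementation: the function names, the 1 Da bins, the peak-count preliminary score, and the simplified cross-correlation (zero-shift dot product minus the mean over shifts of up to 75 bins) are all assumptions chosen for illustration, and only singly charged b and y ions are modeled.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fragment_mzs(seq):
    """m/z values of singly charged b and y ions (a deliberate simplification)."""
    mzs = []
    for i in range(1, len(seq)):
        b_ion = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        y_ion = sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON
        mzs.extend([b_ion, y_ion])
    return mzs


def to_vector(peaks, size=2000, width=1.0):
    """Bin (m/z, intensity) pairs into a fixed-length intensity vector."""
    vec = [0.0] * size
    for mz, intensity in peaks:
        idx = int(mz / width)
        if idx < size:
            vec[idx] = max(vec[idx], intensity)
    return vec


def xcorr(theo, expt, max_shift=75):
    """Simplified cross-correlation: zero-shift dot product minus the
    mean dot product over the non-zero shifts in [-max_shift, max_shift]."""
    n = len(expt)

    def dot_at(shift):
        return sum(theo[i] * expt[i + shift]
                   for i in range(n) if 0 <= i + shift < n)

    background = sum(dot_at(s) for s in range(-max_shift, max_shift + 1)
                     if s != 0) / (2 * max_shift)
    return dot_at(0) - background


def search(precursor_mass, peaks, candidates, tol=1.0, top_n=500):
    """Rank candidate peptides against one experimental spectrum."""
    expt = to_vector(peaks)
    scored = []
    for pep in candidates:
        # Step 1: keep only candidates matching the measured peptide mass.
        if abs(peptide_mass(pep) - precursor_mass) > tol:
            continue
        theo = to_vector([(mz, 1.0) for mz in fragment_mzs(pep)])
        # Step 2: preliminary score, here just the number of matched bins.
        prelim = sum(1 for t, e in zip(theo, expt) if t > 0 and e > 0)
        scored.append((prelim, pep, theo))
    scored.sort(key=lambda item: item[0], reverse=True)
    # Step 3: cross-correlate only the top preliminary candidates.
    ranked = [(pep, xcorr(theo, expt)) for _, pep, theo in scored[:top_n]]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Calling `search(peptide_mass("PEPTIDEK"), peaks, candidates)` with a peak list built from `fragment_mzs("PEPTIDEK")` returns the candidates that pass the mass filter, ordered by the simplified cross-correlation score.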
DTASelect / Contrast
DTASelect and Contrast were designed to make interpretation and comparison of proteomic data faster and more effective. DTASelect organizes and filters SEQUEST identifications, reducing the time required to interpret the results for each sample. Contrast compares multiple samples and serves as a powerful meta-analysis tool.
Census
A quantitative analysis tool for both labeled and label-free analyses. Visit the Census web page for more info.
RelEx
A quantitative analysis tool. Visit the RelEx web page for more info.
ProLuCID
ProLuCID is a fast and sensitive tandem mass spectra-based protein identification program recently developed in the Yates laboratory at The Scripps Research Institute.
Mzxml2msn
mzxml2msn is a Java program we developed to convert mzXML files to ms1 and ms2 files. Although ProLuCID and Census can handle mzXML files, we prefer ms1 and ms2 files for computational efficiency.
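To show why a flat ms2 file is cheap to process, here is a small Python reader. It is a sketch that assumes the commonly described ms2 layout: `H` header lines, an `S` line per spectrum (first scan, last scan, precursor m/z), `Z` lines (charge, M+H), optional `I` and `D` lines, and one "m/z intensity" pair per peak line. The `Spectrum` class and `parse_ms2` function are hypothetical names, not part of mzxml2msn.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    first_scan: int
    last_scan: int
    precursor_mz: float
    charges: list = field(default_factory=list)  # (charge, M+H) pairs
    peaks: list = field(default_factory=list)    # (m/z, intensity) pairs


def parse_ms2(text):
    """Parse ms2-formatted text into a list of Spectrum objects."""
    spectra = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        tag = parts[0]
        if tag == "H":
            continue  # file-level header, ignored in this sketch
        if tag == "S":
            current = Spectrum(int(parts[1]), int(parts[2]), float(parts[3]))
            spectra.append(current)
        elif current is None:
            continue  # stray line before the first spectrum
        elif tag == "Z":
            current.charges.append((int(parts[1]), float(parts[2])))
        elif tag in ("I", "D"):
            continue  # per-spectrum annotations, ignored in this sketch
        else:
            current.peaks.append((float(parts[0]), float(parts[1])))
    return spectra
```

Each line is handled with a single split and no XML parsing, which is one plausible reason a line-oriented format is faster to read than mzXML.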
GutenTag
GutenTag is software that identifies peptides using the sequence tagging technique. Whereas SEQUEST searches a sequence database by mass, GutenTag searches with short sequences derived directly from the spectrum.
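The idea of sequence tagging can be sketched in Python as follows. This is an illustration of the general technique, not GutenTag's algorithm: peaks whose m/z differences match a residue mass are chained into short tags, and the tags are then looked up in the database. The function names, the tag length of 3, and the 0.02 Da tolerance are assumptions. Isoleucine is folded into leucine because the two have the same mass.

```python
# Monoisotopic residue masses in Da; "L" stands for both Leu and Ile.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def extract_tags(mzs, length=3, tol=0.02):
    """Return all tags of the given length readable from the peak ladder."""
    mzs = sorted(mzs)
    # edges[i] lists (j, residue) where peak j sits one residue above peak i.
    edges = {i: [] for i in range(len(mzs))}
    for i, low in enumerate(mzs):
        for j in range(i + 1, len(mzs)):
            diff = mzs[j] - low
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))

    tags = set()

    def walk(i, tag):
        if len(tag) == length:
            tags.add(tag)
            return
        for j, aa in edges[i]:
            walk(j, tag + aa)

    for start in range(len(mzs)):
        walk(start, "")
    return tags


def search_by_tag(tags, database):
    """Map each protein name to the tags found in its sequence.

    Tags are also tried reversed, since a y-ion ladder reads the
    sequence from the C-terminus.
    """
    hits = {}
    for name, seq in database.items():
        normalized = seq.replace("I", "L")
        matched = sorted(t for t in tags
                         if t in normalized or t[::-1] in normalized)
        if matched:
            hits[name] = matched
    return hits
```

A real tagging search would also use the flanking masses on either side of the tag to constrain the match; this sketch omits that step.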
RawExtractor
RawExtractor is a program that extracts MS and MS/MS spectra from RAW files generated by Thermo mass spectrometers, such as the LTQ, LTQ-Orbitrap, and LCQ, and stores the spectra in ms1, ms2, or mzXML file format. The spectra files generated by RawExtractor are used as input for the protein identification programs SEQUEST and ProLuCID and the quantitation program Census.
MudPIT
For truly complex protein samples, separation prior to mass spectrometry is increasingly necessary. MudPIT describes the process of digesting, separating, and identifying the components of samples consisting of thousands of proteins. Our protocol uses nanoscale strong cation exchange liquid chromatography upstream of reversed phase liquid chromatography online with microelectrospray.
Shamu Cluster
A modification of the SEQUEST algorithm allows the software to be run in parallel, sharing the protein identification task across several computers. Our Beowulf cluster, Shamu, has processed millions of spectra to date.
RC Clusters
Research Computing at TSRI has three SGI supercomputers (2x64 CPU and 1x128 CPU, SGI Origin 2400 and 3800 respectively) and a Linux cluster (1584 nodes with 3936 CPUs). run_ms2, PEP_PROBE, ProLuCID, and DTASelect2 have been ported to run on these clusters. Group members can obtain information on how to use RC computers here.
DFCalc
Biological mass spectrometry need not be limited to peptides, of course. DFCalc is software designed to assist the interpretation of tandem mass spectra from DNA molecules. The program predicts the fragment ions for known sequences, producing a list to be compared against a spectrum.
NoDupe
NoDupe identifies similarity among uninterpreted tandem mass spectra. Optionally, the program can remove duplicate copies of spectra.
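One common way to measure similarity between uninterpreted spectra is the cosine of their binned intensity vectors. The Python sketch below uses that measure to drop near-duplicates. NoDupe's actual similarity measure is not described here, so the cosine score, the 1 Da bins, the 0.95 threshold, and all function names are assumptions for illustration only.

```python
import math
from collections import defaultdict


def binned(peaks, width=1.0):
    """Sum intensities of (m/z, intensity) pairs into m/z bins."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / width)] += intensity
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_duplicates(spectra, threshold=0.95):
    """Keep the first spectrum of each group of near-identical spectra.

    `spectra` is a list of (name, peaks) pairs; the names of the kept
    spectra are returned in their original order.
    """
    kept = []
    for name, peaks in spectra:
        vec = binned(peaks)
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((name, vec))
    return [name for name, _ in kept]
```

The comparison is quadratic in the number of kept spectra, which is acceptable for a sketch but would need indexing (for example, by precursor mass) on large datasets.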
QCorr
CPM
Charge Prediction Machine: Tool for Inferring Precursor Charge States of Low Resolution Electron Transfer Dissociation Tandem Mass Spectra
YADA
YADA can deisotope and decharge high-resolution mass spectra from large peptide molecules, link the precursor monoisotopic peak information to the corresponding tandem mass spectrum, and account for different co-fragmenting ion species (multiplexed spectra). YADA also enables a pipeline consisting of ProLuCID and DTASelect for analyzing large-scale middle-down proteomics data.
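As a rough illustration of what deisotoping and decharging mean, the Python sketch below infers a charge state from the spacing of isotope peaks (about 1.00335/z in m/z) and converts the cluster to a neutral monoisotopic mass. It is not YADA's algorithm. It assumes the cluster has already been isolated and that its lowest peak is the monoisotopic one, which often fails for large peptides; the function names are hypothetical.

```python
PROTON = 1.00728           # proton mass in Da
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def infer_charge(cluster_mzs, max_charge=10):
    """Infer the charge from the mean spacing between isotope peaks."""
    mzs = sorted(cluster_mzs)
    if len(mzs) < 2:
        raise ValueError("need at least two isotope peaks")
    spacings = [hi - lo for lo, hi in zip(mzs, mzs[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    charge = round(ISOTOPE_SPACING / mean_spacing)
    if not 1 <= charge <= max_charge:
        raise ValueError("spacing does not match a plausible charge")
    return charge


def decharge(cluster_mzs):
    """Return (charge, neutral monoisotopic mass) for an isotope cluster.

    Assumes the lowest m/z peak is the monoisotopic peak.
    """
    charge = infer_charge(cluster_mzs)
    neutral_mass = charge * (min(cluster_mzs) - PROTON)
    return charge, neutral_mass
```

For example, peaks spaced roughly 0.5 m/z apart indicate a doubly charged ion, so the neutral mass is twice the monoisotopic m/z minus two proton masses.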
Unitemare
Unitemare is a tool created by Johannes Graumann of Caltech for migrating existing SEQUEST results to the new unified file format. Unitemare is written in Perl.
PatternLab
PatternLab provides tools for analyzing shotgun proteomic data quantitated by spectral counting (with DTASelect) or labeling (with Census). Its modules pinpoint differentially expressed proteins and peptides (ACFold and TFold modules), find proteins with similar expression profiles in time-course experiments (TrendQuest module), find proteins unique to a state (Area-Proportional Venn Diagram module), and help interpret results according to the Gene Ontology (Gene Ontology Explorer module). Minimum requirements: a computer with Windows XP SP2.

The Yates Touch

Despite the seriousness of the projects they embark upon, the Yates group has named many of its technologies with a bit of humor.

For example, the Yates Lab was the lead inventor of SEQUEST, a computer algorithm that automatically correlates tandem mass spectrometry data to amino acid sequences in protein and nucleotide databases. The name was derived from the popular sci-fi television series seaQuest, which ran in the mid-nineties.

The SEQUEST algorithm was later modified so that the software can be run quickly on large datasets using computing clusters, including our Beowulf cluster, Shamu. For more on our whimsically named technologies, consult the TSRI News and Views.



 
