Research Tools

Since the 1980s, the Yates Lab has pioneered many new technologies used in proteomics. The following tools were developed by, and are used by, the Yates team.


Our lab has developed a method for protein identification and peptide sequencing that uses mass spectrometry fragmentation patterns to search protein and nucleotide databases. Our program, SEQUEST, converts the character-based representation of amino acid sequences in a protein database to fragmentation patterns, which are compared against the MS/MS spectrum generated from the target peptide. The algorithm first identifies amino acid sequences in the database that match the measured mass of the peptide, compares their predicted fragment ions against the MS/MS spectrum, and generates a preliminary score for each amino acid sequence. A cross-correlation analysis is then performed on the 500 peptides with the highest preliminary scores by correlating theoretical, reconstructed spectra against the experimental spectrum. The results are then reported. In short, SEQUEST performs automated peptide/protein sequencing via database searching of MS/MS spectra without the need for any manual sequence interpretation, though it can make use of interpreted sequence information if available.
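The first two stages described above (selecting candidates by peptide mass, then comparing predicted fragment ions with the spectrum) can be illustrated with a minimal Python sketch. This is not the SEQUEST implementation: the function names, the tolerances, and the simple shared-peak count used in place of the preliminary score are all assumptions made for illustration, and the cross-correlation stage is not shown. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[a] for a in seq) + WATER


def fragment_ions(seq):
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    b_ions, y_ions = [], []
    for i in range(1, len(seq)):
        b_ions.append(sum(MONO[a] for a in seq[:i]) + PROTON)
        y_ions.append(sum(MONO[a] for a in seq[i:]) + WATER + PROTON)
    return b_ions, y_ions


def candidates(peptides, measured_mass, tol=1.0):
    """Keep database peptides whose mass matches the measured peptide mass."""
    return [p for p in peptides if abs(peptide_mass(p) - measured_mass) <= tol]


def shared_peak_count(seq, spectrum_mz, tol=0.5):
    """Hypothetical stand-in for a preliminary score: matched fragment ions."""
    b_ions, y_ions = fragment_ions(seq)
    return sum(
        any(abs(frag - mz) <= tol for mz in spectrum_mz)
        for frag in b_ions + y_ions
    )
```

In a full search, only the best candidates from this stage would go on to the more expensive cross-correlation step.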

DTASelect / Contrast

DTASelect and Contrast were designed to make interpretation and comparison of proteomic data faster and more effective. DTASelect organizes and filters SEQUEST identifications, reducing the time required to interpret the results for each sample. Contrast compares results across multiple samples, making it a powerful meta-analytical tool.
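As a rough illustration of what "organizing and filtering" identifications can mean, the Python sketch below drops low-scoring peptide matches and groups the survivors by protein. The record fields, the function name, and the per-charge score cutoffs are hypothetical and chosen only for the example; they are not presented as DTASelect's actual input format or default criteria.

```python
from collections import defaultdict

# Illustrative minimum cross-correlation score per precursor charge state.
# These values are assumptions for the example only.
MIN_XCORR = {1: 1.8, 2: 2.5, 3: 3.5}


def filter_and_group(identifications):
    """Keep identifications that pass the score cutoff, grouped by protein."""
    by_protein = defaultdict(list)
    for ident in identifications:
        cutoff = MIN_XCORR.get(ident["charge"])
        # Charge states without a configured cutoff are rejected.
        if cutoff is not None and ident["xcorr"] >= cutoff:
            by_protein[ident["protein"]].append(ident["peptide"])
    return dict(by_protein)
```

Grouping by protein after filtering gives one short list per protein to review instead of one result per spectrum.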





Quantitative analysis tool for both labeled and label-free analysis. Visit the Census web page for more information.



Quantitative analysis tool. Visit the RelEx web page for more information.



ProLuCID is a fast and sensitive tandem mass spectra-based protein identification program recently developed in the Yates laboratory at The Scripps Research Institute.



mzxml2msn is a Java program we developed to convert mzXML files to ms1 and ms2 files. Although ProLuCID and Census can handle mzXML files, we prefer ms1 and ms2 files for computational efficiency.


GutenTag is software to identify peptides by the sequence tagging technique. SEQUEST searches a sequence database by mass, but GutenTag searches with short sequences derived directly from the spectrum.
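The difference between the two search styles can be illustrated with a small Python sketch of the lookup step in a tag-based search: finding every database position where a short tag occurs. It is not GutenTag's algorithm. The function names and the dictionary-based database are assumptions, and any mass information a real tag search would use alongside the tag is left out. Isoleucine and leucine are treated as equivalent because they have the same mass and cannot be told apart from fragment mass differences alone.

```python
def normalize(seq):
    """Collapse the isobaric residues I and L into a single symbol."""
    return seq.replace("I", "L")


def find_tag(database, tag):
    """Return (protein name, start index) for each occurrence of the tag."""
    hits = []
    wanted = normalize(tag)
    for name, seq in database.items():
        haystack = normalize(seq)
        start = haystack.find(wanted)
        while start != -1:
            hits.append((name, start))
            # Continue one position later so overlapping matches are found.
            start = haystack.find(wanted, start + 1)
    return hits
```

For example, searching for the tag "ALL" in a database containing "MKTAILVGG" reports a hit at index 3, because the I at position 4 is read as L.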


RawConverter provides the ability to take advantage of the high resolution and accuracy provided by the latest Thermo Fisher instruments. RawConverter extracts MS and tandem mass spectrometry (MS/MS) data from RAW files like its predecessor RawXtract but also selects the correct precursor mass-to-charge (m/z) ratios. It accepts RAW data generated by either data-dependent acquisition (DDA) or data-independent acquisition (DIA). The output file format can be MS1/MS2, MGF or mzXML.



RawExtractor is a program that extracts MS and MS/MS spectra from RAW files generated by Thermo mass spectrometers (such as the LTQ, LTQ-Orbitrap, and LCQ) and stores the spectra in ms1, ms2, or mzXML file format. The spectra files generated by RawExtractor are used as input for the protein identification programs SEQUEST and ProLuCID and for the quantitation program Census.




For truly complex protein samples, separation prior to mass spectrometry is increasingly necessary. MudPIT describes the process of digesting, separating, and identifying the components of samples consisting of thousands of proteins. Our protocol uses nanoscale strong cation exchange liquid chromatography upstream of reversed phase liquid chromatography online with microelectrospray.

Shamu Cluster

A modification of the SEQUEST algorithm allows the software to be run in parallel, sharing the protein identification task across several computers. Our Beowulf cluster, Shamu, has processed millions of spectra to date.
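The general idea of sharing a search across workers can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not say how the work is divided, so the sketch assumes it is split by spectrum, uses threads on one machine as a stand-in for cluster nodes, and calls a placeholder `search_chunk` function that performs no real database search.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(items, n_workers):
    """Deal items out to n_workers lists, one item at a time."""
    return [items[i::n_workers] for i in range(n_workers)]


def search_chunk(chunk):
    """Placeholder for the per-worker search; returns one result per spectrum."""
    return [(spectrum_id, "no-search-performed") for spectrum_id in chunk]


def run_parallel(spectrum_ids, n_workers=4):
    """Search each chunk on its own worker and merge the results."""
    chunks = split_round_robin(spectrum_ids, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(search_chunk, chunks)
    return [result for chunk in results for result in chunk]
```

Because each spectrum is searched independently in this sketch, no communication between workers is needed until the results are merged.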

RC Clusters

Research Computing at TSRI has three SGI supercomputers (two 64-CPU SGI Origin 2400 systems and one 128-CPU SGI Origin 3800) and a Linux cluster (1584 nodes with 3936 CPUs). run_ms2, PEP_PROBE, ProLuCID and DTASelect2 have been ported to run on these clusters. Group members can obtain information on how to use RC computers here.


Biological mass spectrometry need not be limited to peptides, of course. DFCalc is software designed to assist the interpretation of tandem mass spectra from DNA molecules. The program predicts the fragment ions for known sequences, producing a list to be compared against a spectrum.


NoDupe identifies similarity among uninterpreted tandem mass spectra. Optionally, the program can remove duplicate copies of spectra.
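One common way to compare uninterpreted spectra is to bin the peaks by m/z and take the cosine similarity of the resulting intensity vectors. The Python sketch below uses that approach to drop near-identical spectra. It is an assumption made for illustration, not a description of NoDupe's method; the bin width, the similarity threshold, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // width)] += intensity
    return binned


def cosine(a, b):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_duplicates(spectra, threshold=0.95):
    """Keep a spectrum only if it is not too similar to one already kept."""
    kept = []
    for peaks in spectra:
        vector = bin_spectrum(peaks)
        if all(cosine(vector, other) < threshold for _, other in kept):
            kept.append((peaks, vector))
    return [peaks for peaks, _ in kept]
```

Each spectrum here is a list of (m/z, intensity) pairs, and the first copy of any group of similar spectra is the one retained.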


QCorr


Charge Prediction Machine: Tool for Inferring Precursor Charge States of Low Resolution Electron Transfer Dissociation Tandem Mass Spectra


YADA can deisotope and decharge high-resolution mass spectra from large peptide molecules, link the precursor monoisotopic peak information to the corresponding tandem mass spectrum, and account for different co-fragmenting ion species (multiplexed spectra). YADA also enables a pipeline consisting of ProLuCID and DTASelect for analyzing large-scale middle-down proteomics data.
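The arithmetic behind decharging can be shown with a short Python sketch. It is not YADA's algorithm, and the function names are hypothetical; it relies only on two general facts: neighbouring isotope peaks of an ion with charge z are separated by roughly 1.00335/z in m/z, and a positive ion with charge z carries z extra protons.

```python
PROTON = 1.00728           # mass of a proton, Da
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference, Da


def charge_from_spacing(mz_a, mz_b):
    """Estimate the charge state from two adjacent isotope peaks."""
    return round(ISOTOPE_SPACING / abs(mz_b - mz_a))


def neutral_mass(mz, charge):
    """Convert an observed m/z to the neutral mass of the molecule."""
    return charge * (mz - PROTON)
```

With high-resolution data the isotope spacing is resolved well enough for the first function to work, which is why decharging is practical for large, highly charged peptides.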



Unitemare is a tool created by Johannes Graumann of Caltech for migrating existing SEQUEST results to the new unified file format. Unitemare is written in Perl.














protein inferencer