Research Tools

Since the 1980s, the Yates Lab has pioneered many new technologies used in the field of proteomics. The following tools have been developed and are used by the Yates team.

SEQUEST

Our lab has developed a method for protein identification and peptide sequencing that uses mass spectrometry fragmentation patterns to search protein and nucleotide databases. Our program, SEQUEST, converts the character-based representation of amino acid sequences in a protein database into fragmentation patterns, which are compared against the MS/MS spectrum generated from the target peptide. The algorithm first identifies amino acid sequences in the database that match the measured mass of the peptide. It then compares their fragment ions against the MS/MS spectrum and generates a preliminary score for each sequence. Next, a cross-correlation analysis is performed on the 500 peptides with the highest preliminary scores, correlating theoretical, reconstructed spectra against the experimental spectrum, and the resulting matches are reported. In short, SEQUEST performs automated peptide and protein sequencing by database searching of MS/MS spectra without the need for manual sequence interpretation, though it can make use of interpreted sequence information if available.
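
The three steps above can be illustrated with a minimal Python sketch. It is not SEQUEST's implementation, and it rests on several simplifying assumptions:

- Only singly charged b and y ions are generated, and residues are unmodified.
- Spectra are placed into unit-width m/z bins.
- A shared-peak count stands in for SEQUEST's preliminary score.
- The cross-correlation is computed as the zero-offset dot product minus the mean dot product over nonzero offsets up to ±75 bins.

The function names (`search`, `xcorr`, `fragment_ions`) and the ±3 Da precursor tolerance are hypothetical.

```python
# Sketch only: simplified b/y-ion model and cross-correlation, not SEQUEST's code.
# Monoisotopic residue masses (Da); unmodified residues only.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
N_BINS = 2000  # unit-width m/z bins covering 0-2000 (assumption)

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER

def fragment_ions(seq):
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    ions = []
    for i in range(1, len(seq)):
        ions.append(sum(RESIDUE[aa] for aa in seq[:i]) + PROTON)          # b ion
        ions.append(sum(RESIDUE[aa] for aa in seq[i:]) + WATER + PROTON)  # y ion
    return ions

def binned(mzs, intensities):
    """Place peaks into unit-width bins, keeping the most intense peak per bin."""
    vec = [0.0] * N_BINS
    for mz, inten in zip(mzs, intensities):
        idx = int(round(mz))
        if 0 <= idx < N_BINS:
            vec[idx] = max(vec[idx], inten)
    return vec

def xcorr(theo, exp, max_offset=75):
    """Zero-offset correlation minus the mean correlation over nonzero offsets."""
    peaks = [(i, v) for i, v in enumerate(theo) if v]

    def dot_at(tau):
        return sum(v * exp[i - tau] for i, v in peaks if 0 <= i - tau < len(exp))

    background = [dot_at(t) for t in range(-max_offset, max_offset + 1) if t != 0]
    return dot_at(0) - sum(background) / len(background)

def search(mzs, intensities, precursor_mass, database, tol=3.0, top_n=500):
    exp = binned(mzs, intensities)
    # Step 1: keep sequences whose mass matches the measured precursor mass.
    candidates = [s for s in database if abs(peptide_mass(s) - precursor_mass) <= tol]

    # Step 2: preliminary score = number of predicted fragments hitting an observed peak.
    def preliminary(seq):
        return sum(1 for mz in fragment_ions(seq)
                   if 0 <= round(mz) < N_BINS and exp[round(mz)] > 0)

    top = sorted(candidates, key=preliminary, reverse=True)[:top_n]
    # Step 3: cross-correlate reconstructed theoretical spectra with the experimental one.
    scored = []
    for seq in top:
        ions = fragment_ions(seq)
        scored.append((xcorr(binned(ions, [50.0] * len(ions)), exp), seq))
    return sorted(scored, reverse=True)
```

As a usage example, build an experimental spectrum from `fragment_ions("PEPTIDE")` and search it against a small list that contains `PEPTIDE` and shuffled sequences of the same mass. `PEPTIDE` should rank first, because only its reconstructed spectrum lines up with every observed peak at zero offset.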

DTASelect / Contrast


Census

Quantitative analysis tool for both label-based and label-free analysis. Visit the Census web page for more information.

RelEx

Quantitative analysis tool. Visit the RelEx web page for more information.

ProLuCID

ProLuCID-GUI

ProLuCID-GUI is a graphical user interface for the ProLuCID database search engine for bottom-up proteomics protein identification. It takes ms2 files as input and outputs filtered peptide and protein identification results. Internally, it uses ProLuCID for the database search and DTASelect2 for result filtering.

Mzxml2msn

mzxml2msn is a Java program we developed to convert mzXML files to ms1 and ms2 files. Although our ProLuCID and Census programs can handle mzXML files, we prefer ms1 and ms2 files for computational efficiency.
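
The core of an mzXML-to-ms2 conversion can be sketched in Python. This is not mzxml2msn's code, and it makes several assumptions:

- The text of an mzXML `<peaks>` element has already been read from the XML.
- That text holds base64-encoded, interleaved m/z and intensity values in network (big-endian) byte order.
- zlib compression is present only when flagged.

The output covers only the S, Z and peak lines of an MS2 record, with no H or I lines. The function names are hypothetical.

```python
import base64
import struct
import zlib

PROTON = 1.007276

def decode_mzxml_peaks(b64_text, precision=32, zlib_compressed=False):
    """Decode an mzXML <peaks> payload (network byte order) into (m/z, intensity) pairs."""
    raw = base64.b64decode(b64_text)
    if zlib_compressed:
        raw = zlib.decompress(raw)
    code = "f" if precision == 32 else "d"
    count = len(raw) // (precision // 8)
    values = struct.unpack(">" + code * count, raw)  # ">" = big-endian ("network")
    return list(zip(values[0::2], values[1::2]))

def ms2_record(scan, precursor_mz, charges, peaks):
    """Format one spectrum as an MS2 block: S line, one Z line per charge, then peaks."""
    lines = [f"S\t{scan:06d}\t{scan:06d}\t{precursor_mz:.4f}"]
    for z in charges:
        mh = (precursor_mz - PROTON) * z + PROTON  # singly protonated [M+H]+ mass
        lines.append(f"Z\t{z}\t{mh:.4f}")
    lines.extend(f"{mz:.4f} {inten:.1f}" for mz, inten in peaks)
    return "\n".join(lines) + "\n"
```

A precursor of unknown charge can be written with several Z lines, one per candidate charge, which the `charges` list allows.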

GutenTag

GutenTag is software that identifies peptides by the sequence tagging technique. Whereas SEQUEST searches a sequence database by mass, GutenTag searches with short sequences derived directly from the spectrum.
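
Tag derivation can be illustrated with a minimal sketch, assuming centroided fragment peaks and a fixed 0.02 Da tolerance. Two peaks are linked when their spacing equals a residue mass, and every chain of at least three links is read off as a tag. GutenTag's own tag inference and scoring are more involved, and `sequence_tags` is a hypothetical name.

```python
# Monoisotopic residue masses (Da); "L" stands for both Leu and Ile, which are isobaric.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def sequence_tags(peak_mzs, min_len=3, tol=0.02):
    """Return residue strings read from chains of peaks spaced by residue masses."""
    mzs = sorted(peak_mzs)
    # Edge i -> j when the gap between two peaks matches one residue mass.
    edges = {i: [] for i in range(len(mzs))}
    for i in range(len(mzs)):
        for j in range(i + 1, len(mzs)):
            gap = mzs[j] - mzs[i]
            for aa, mass in RESIDUE.items():
                if abs(gap - mass) <= tol:
                    edges[i].append((j, aa))
    tags = set()

    def walk(node, tag):
        # Record every path of at least min_len residues, then keep extending it.
        if len(tag) >= min_len:
            tags.add(tag)
        for nxt, aa in edges[node]:
            walk(nxt, tag + aa)

    for start in range(len(mzs)):
        walk(start, "")
    return tags
```

Given the b ions of PEPTIDE, the sketch recovers `EPTLD` (Ile reported as Leu) among its tags. A y-ion ladder yields tags in reverse order, so a search would need to try both orientations.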

RawConverter

RawConverter takes advantage of the high resolution and accuracy of the latest Thermo Fisher instruments. Like its predecessor RawXtract, it extracts MS and tandem mass spectrometry (MS/MS) data from RAW files, but it also selects the correct precursor mass-to-charge (m/z) ratios. It accepts RAW data generated by either data-dependent acquisition (DDA) or data-independent acquisition (DIA). The output file format can be MS1/MS2, MGF or mzXML.
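
One common reason a reported precursor m/z is "incorrect" is that the instrument picked a heavier isotope peak rather than the monoisotopic one. RawConverter's actual algorithm is not described here, so this Python sketch shows only that generic correction. It steps down the isotope envelope in the MS1 scan by 1.003355/z at a time while a reasonably intense peak is present. The tolerance, the intensity ratio and the function name are assumptions.

```python
ISOTOPE_SPACING = 1.003355  # 13C - 12C mass difference (Da)

def monoisotopic_mz(ms1_peaks, reported_mz, charge, tol=0.01, max_steps=4, min_ratio=0.1):
    """Walk down the isotope envelope from the reported m/z; return the lowest peak reached."""
    def find_peak(target):
        # Most intense MS1 peak within tol of the target m/z, or None.
        hits = [(mz, inten) for mz, inten in ms1_peaks if abs(mz - target) <= tol]
        return max(hits, key=lambda p: p[1]) if hits else None

    current = find_peak(reported_mz)
    if current is None:
        return reported_mz  # nothing to correct against
    for _ in range(max_steps):
        candidate = find_peak(current[0] - ISOTOPE_SPACING / charge)
        if candidate is None or candidate[1] < min_ratio * current[1]:
            break
        current = candidate
    return current[0]
```

The intensity ratio guards against walking into an unrelated low-level peak. It cannot distinguish an overlapping co-eluting species, which a production tool would also need to model.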

RawExtractor

RawExtractor is a program that extracts MS and MS/MS spectra from RAW files generated by Thermo mass spectrometers, such as the LTQ, LTQ-Orbitrap and LCQ, and stores the spectra in ms1, ms2 or mzXML file format. The spectra files generated by RawExtractor are used as input for the protein identification programs SEQUEST and ProLuCID and for the quantitation program Census.

RawExtractor1.8

RawXtract1.9.9.2

MudPIT

For truly complex protein samples, separation prior to mass spectrometry becomes increasingly necessary. MudPIT (Multidimensional Protein Identification Technology) describes the process of digesting, separating, and identifying the components of samples consisting of thousands of proteins. Our protocol uses nanoscale strong cation exchange liquid chromatography upstream of reversed-phase liquid chromatography, online with microelectrospray.

Shamu Cluster

A modification of the SEQUEST algorithm allows the software to be run in parallel, sharing the protein identification task across several computers. Our Beowulf cluster, Shamu, has processed millions of spectra to date.
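
Sharing the identification task works because spectra can be searched independently. They can therefore be dealt out to workers, with the results merged by scan number. The sketch below illustrates that idea on a single machine; it is not Shamu's implementation. A Beowulf cluster distributes work across separate computers, whereas this sketch uses a thread pool as a stand-in. `search_one` is a hypothetical placeholder that just echoes the precursor mass.

```python
from concurrent.futures import ThreadPoolExecutor

def search_one(spectrum):
    """Hypothetical placeholder for a per-spectrum SEQUEST-style search."""
    scan, precursor_mass = spectrum
    return scan, round(precursor_mass, 2)

def partition(spectra, n_workers):
    """Deal spectra round-robin into one work list per worker."""
    return [spectra[i::n_workers] for i in range(n_workers)]

def search_chunk(chunk):
    return [search_one(s) for s in chunk]

def parallel_search(spectra, n_workers=4):
    """Search each partition concurrently and merge results keyed by scan number."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(search_chunk, partition(spectra, n_workers)):
            results.update(partial)
    return results
```

For CPU-bound scoring in CPython, a `ProcessPoolExecutor` or a cluster job scheduler would replace the thread pool, since threads share one interpreter lock.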

RC Clusters

Research Computing at TSRI has three SGI supercomputers (2×64 CPU and 1×128 CPU; SGI Origin 2400 and 3800, respectively) and a Linux cluster (1584 nodes with 3936 CPUs). run_ms2, PEP_PROBE, ProLuCID and DTASelect2 have been ported to run on these clusters. Group members can obtain information on how to use RC computers here.

DFCalc

Biological mass spectrometry need not be limited to peptides, of course. DFCalc is software designed to assist the interpretation of tandem mass spectra from DNA molecules. The program predicts the fragment ions for known sequences, producing a list to be compared against a spectrum.
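
The ion types DFCalc predicts are not specified here. This Python sketch therefore computes just two series, using the standard a/b/c/d, w/x/y/z nomenclature:

- d ions: 5' fragments ending in a 3'-phosphate.
- w ions: 3' fragments beginning with a 5'-phosphate.

It assumes a linear, unmodified DNA oligonucleotide with 5'-OH and 3'-OH ends and negative-mode ions. Masses are built from elemental formulas. The function names are hypothetical.

```python
# Monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "P": 30.9737620}
# In-chain deoxynucleotide residue = nucleoside 5'-monophosphate minus H2O.
RESIDUE_FORMULA = {
    "A": {"C": 10, "H": 12, "N": 5, "O": 5, "P": 1},
    "C": {"C": 9, "H": 12, "N": 3, "O": 6, "P": 1},
    "G": {"C": 10, "H": 12, "N": 5, "O": 6, "P": 1},
    "T": {"C": 10, "H": 13, "N": 2, "O": 7, "P": 1},
}
PROTON = 1.00727646688

def formula_mass(formula):
    return sum(MONO[el] * n for el, n in formula.items())

RESIDUE_MASS = {base: formula_mass(f) for base, f in RESIDUE_FORMULA.items()}
WATER = 2 * MONO["H"] + MONO["O"]
HPO3 = MONO["H"] + MONO["P"] + 3 * MONO["O"]

def oligo_mass(seq):
    """Neutral mass of a linear oligo with 5'-OH and 3'-OH (n - 1 phosphates)."""
    return sum(RESIDUE_MASS[b] for b in seq) + WATER - HPO3

def fragment_ladder(seq, charge=1):
    """d and w ion m/z values for [M - zH]^z- ions, keyed like 'd1', 'w1'."""
    ions = {}
    for k in range(1, len(seq)):
        d_mass = sum(RESIDUE_MASS[b] for b in seq[:k]) + WATER   # 5' fragment, 3'-phosphate
        w_mass = sum(RESIDUE_MASS[b] for b in seq[-k:]) + WATER  # 3' fragment, 5'-phosphate
        ions[f"d{k}"] = (d_mass - charge * PROTON) / charge
        ions[f"w{k}"] = (w_mass - charge * PROTON) / charge
    return ions
```

As a sanity check, `oligo_mass("A")` reproduces the monoisotopic mass of deoxyadenosine (about 251.102 Da). Other ion series would follow the same pattern with different terminal groups.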

NoDupe

NoDupe identifies similarity among uninterpreted tandem mass spectra. Optionally, the program can remove duplicate copies of spectra.
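
NoDupe's similarity measure is not stated here. A common choice for comparing uninterpreted spectra is the cosine similarity of binned peak vectors, and the sketch below uses that as an assumed stand-in. It keeps the first spectrum of each near-identical group. The 1 m/z bin width, the 0.9 threshold and the function names are illustrative.

```python
import math

def binned(peaks, bin_width=1.0):
    """Sparse vector: bin index -> summed intensity."""
    vec = {}
    for mz, inten in peaks:
        key = int(mz // bin_width)
        vec[key] = vec.get(key, 0.0) + inten
    return vec

def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def remove_duplicates(spectra, threshold=0.9):
    """Keep a spectrum only if it is below threshold similarity to every kept one."""
    kept, kept_vecs = [], []
    for name, peaks in spectra:
        vec = binned(peaks)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(name)
            kept_vecs.append(vec)
    return kept
```

Each new spectrum is compared against every spectrum kept so far, so cost grows quadratically. Grouping spectra by precursor m/z first is a common way to limit comparisons.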

QCorr

CPM

Charge Prediction Machine: Tool for Inferring Precursor Charge States of Low Resolution Electron Transfer Dissociation Tandem Mass Spectra
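
CPM's own method is not described here, so the following is only an illustration of the problem. It uses one simple ETD-specific heuristic. After electron transfer, a precursor [M+zH]^z+ can appear as charge-reduced species [M+zH]^(z-k)+ near z*(m/z)/(z-k), ignoring the electron mass. Each candidate charge is scored by the average intensity found at those positions. The heuristic, the 0.5 m/z tolerance and the function names are assumptions.

```python
def charge_reduced_positions(precursor_mz, z):
    """Approximate m/z of [M+zH]^(z-k)+ for k = 1..z-1 (electron mass ignored)."""
    return [z * precursor_mz / (z - k) for k in range(1, z)]

def infer_charge(peaks, precursor_mz, candidates=(2, 3, 4, 5), tol=0.5):
    """Return the candidate charge whose charge-reduced positions carry the most signal."""
    def intensity_near(target):
        return sum(inten for mz, inten in peaks if abs(mz - target) <= tol)

    best_z, best_score = None, 0.0
    for z in candidates:
        positions = charge_reduced_positions(precursor_mz, z)
        score = sum(intensity_near(p) for p in positions) / len(positions)
        if score > best_score:
            best_z, best_score = z, score
    return best_z  # None when no charge-reduced signal is found
```

Averaging over positions keeps a true 2+ precursor from being called 4+, since the 4+ hypothesis dilutes the same peak across three expected positions.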

YADA

YADA can deisotope and decharge high-resolution mass spectra from large peptide molecules, link the precursor monoisotopic peak information to the corresponding tandem mass spectrum, and account for different co-fragmenting ion species (multiplexed spectra). YADA also enables a pipeline consisting of ProLuCID and DTASelect for analyzing large-scale middle-down proteomics data.
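
Deisotoping and decharging can be illustrated with a greedy Python sketch. This is not YADA's algorithm. Starting from the lowest unassigned peak, it tries each charge z and follows peaks spaced 1.003355/z apart within a ppm tolerance. It keeps the longest cluster and converts the first peak to a neutral monoisotopic mass. The sketch assumes the monoisotopic peak is observed, which often fails for large molecules. It also ignores overlapping envelopes, which YADA's multiplexed-spectrum handling addresses. The function names and tolerances are hypothetical.

```python
ISOTOPE_SPACING = 1.003355  # 13C - 12C mass difference (Da)
PROTON = 1.007276

def find_peak(peaks, target, tol_ppm):
    """Index of the first peak within tol_ppm of target, or None."""
    for idx, (mz, _) in enumerate(peaks):
        if abs(mz - target) <= target * tol_ppm * 1e-6:
            return idx
    return None

def deisotope(peaks, max_charge=10, tol_ppm=10.0):
    """Return (neutral monoisotopic mass, charge, summed intensity) per isotope cluster."""
    peaks = sorted(peaks)
    used, results = set(), []
    for i, (mz, _) in enumerate(peaks):
        if i in used:
            continue
        best = None  # (cluster peak indices, charge)
        for z in range(max_charge, 0, -1):
            cluster = [i]
            while True:
                nxt = find_peak(peaks, peaks[cluster[-1]][0] + ISOTOPE_SPACING / z, tol_ppm)
                if nxt is None or nxt in used:
                    break
                cluster.append(nxt)
            # A cluster needs at least two peaks; on ties the higher charge is kept.
            if len(cluster) >= 2 and (best is None or len(cluster) > len(best[0])):
                best = (cluster, z)
        if best is None:
            continue  # isolated peak: charge cannot be assigned
        cluster, z = best
        used.update(cluster)
        total = sum(peaks[k][1] for k in cluster)
        results.append((z * (mz - PROTON), z, total))
    return results
```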

Unitemare

Unitemare is a tool created by Johannes Graumann of Caltech for migrating existing SEQUEST results to the new unified file format. Unitemare is written in Perl.

GutenTag

ShowCGI

SQTSort

Colander

NoDupe

DeBunker

Charge_Prediction_Machine

Unitemare

QCorr

mzxml2msn

microcapillary_stage

pressure_cell

quality_check

protein inferencer

GlycoMotifFilter

real-time search