Shamelessly patterned after (plagiarised from) the Avalon FAQ at LANL ... #113 on the Nov. 1998 Top 500 Supercomputers List.
Each node of the machine is a DEC/Compaq Alpha workstation in an ATX case
and contains
If I had to do it all over again, the SCSI hard drive, SCSI CDROM drive, and Matrox graphics cards would be replaced with cheaper ATA/IDE drives and minimal graphics cards. I would keep a decent graphics card in the single master node. Memory is so cheap now that I would also think about upping the memory in each node to at least 128MB and more likely 256MB.
The IBM Deskstar 8.4GB IDE drives were not part of the initial cluster setup; they were added later for sequence database storage.
With a Bay Networks 350T fast ethernet switch. The 350T is a 16-port 10Base-T/100Base-TX autosense switch.
A Raritan MCP16 MasterConsole KVM (keyboard, video, mouse) switch (16-nodes) is used to allow access to each node from one monitor, PS/2 keyboard, and PS/2 mouse set.
With the two 16-port switches, we have the ability to add four more nodes to this cluster before requiring additional connectivity hardware.
50-something thousand American dollars (in late 1997/early 1998).
I would guess the price for the entire setup as of early 1999 would be
in the mid-30K range (and falling!).
Started out using NT4 but ended up running Red Hat Linux.
Too long ... the PVM port for NT-Win32/Alpha wasn't working correctly for me initially, and Linux was a whole learning experience in its own right. If I had to do it all over again now, I could probably have everything up and running in a few days, with most of that time spent installing the OS on each box.
I'll get some industry standard benchmarks up here as soon as I figure out what they are and how to run them.
For those mass spectrometrists out there, here are the overall times needed to search five hundred (500) MS/MS spectra against various databases using a PVM port of SEQUEST version 27 (benchmarks run on 12/98).
A mass tolerance of +/- 1.0 amu was used in all searches; tryptic searches (including preprocessed database searches) allowed 1 internal cleavage site. DNA databases were translated in the 3 forward reading frames and searched as protein.
| Database | # sequence entries | PVM search enzyme=none (HH:MM:SS) | PVM search enzyme=trypsin (HH:MM:SS) | PVM search preprocessed DB enzyme=trypsin (HH:MM:SS) |
| Unigene (clustered human ESTs) | 52,277 | 00:38:37 | 00:15:58 | 00:01:21 |
| Non-redundant protein | 382,465 | 01:43:56 | 01:24:14 | 00:01:37 |
| Human protein | 58,692 | 00:07:33 | 00:05:15 | 00:01:16 |
| Yeast ORFs | 6,351 | 00:02:38 | 00:01:38 | 00:00:52 |
To give some perspective to these numbers, I would guess a majority of SEQUEST users out there experience search times anywhere from 1 to 10+ minutes for a single MS/MS spectrum through a protein database on Intel x86 and slower DEC Alpha boxes.
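To make that comparison concrete, the table's wall-clock totals can be converted to an average time per spectrum. The short Python sketch below does only that arithmetic; the times are copied from the table, and the function and variable names are my own, chosen for illustration. By this measure even the slowest search listed (non-redundant protein, enzyme=none) averages roughly 12.5 seconds per spectrum on the cluster.

```python
# Wall-clock totals copied from the benchmark table (HH:MM:SS for 500 spectra).
# Column order: enzyme=none, enzyme=trypsin, preprocessed DB enzyme=trypsin.
BENCHMARKS = {
    "Unigene (clustered human ESTs)": ("00:38:37", "00:15:58", "00:01:21"),
    "Non-redundant protein": ("01:43:56", "01:24:14", "00:01:37"),
    "Human protein": ("00:07:33", "00:05:15", "00:01:16"),
    "Yeast ORFs": ("00:02:38", "00:01:38", "00:00:52"),
}
N_SPECTRA = 500  # number of MS/MS spectra in each benchmark run


def to_seconds(hhmmss):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_per_spectrum(hhmmss, n_spectra=N_SPECTRA):
    """Average wall-clock seconds spent per spectrum for one benchmark run."""
    return to_seconds(hhmmss) / n_spectra


if __name__ == "__main__":
    for database, times in BENCHMARKS.items():
        per_spectrum = ", ".join(f"{seconds_per_spectrum(t):.2f}" for t in times)
        print(f"{database}: {per_spectrum} s/spectrum")
```

These are averages over the whole cluster run, so they include PVM startup and communication overhead, not just search time.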
The benchmark times vary from run to run (by anywhere from seconds to a few minutes), probably because communication across nodes varies (e.g. sometimes the slave processes all start up quickly ... other times it takes a little longer to start them all). Binaries compiled with the C compiler supplied with Digital Unix 4.0B don't seem to be appreciably faster than those compiled with gcc. Also, I timed searches with local databases stored on the IBM 5400 rpm IDE drive vs. the Quantum 7200 rpm Ultra SCSI drive, and there doesn't seem to be a significant difference in performance whether the databases reside on one drive or the other.
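For readers unfamiliar with the "forward 3 reading frames" used for the DNA database searches, here is a minimal Python sketch of what translating a DNA sequence in those three frames means. It uses the standard genetic code and is not taken from SEQUEST; the function names, the `*` for stop codons, and the `X` for codons containing unrecognised bases are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, written in the usual compact form: codons are
# enumerated with bases in the order T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1, or 2).

    Stop codons become '*'; codons with unrecognised bases become 'X'.
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    residues = []
    for start in range(frame, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[start:start + 3], "X"))
    return "".join(residues)


def forward_frames(dna):
    """Return the translations of dna in the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    print(forward_frames("ATGGCCTAA"))
```

Each DNA entry therefore yields three candidate protein sequences, which is one reason an EST database takes longer to search than its entry count alone would suggest.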
We use RedHat 5.0 and 6.2.
It is running the basic 2.0.30 kernel compiled with the 0.89F tulip ethernet driver written by Donald Becker at CESDIS, NASA Goddard Space Flight Center.
The standard gcc 2.0. compiler that came with RedHat 5.0.
I use the PVM (Parallel Virtual Machine) software package.
We bought them from Aspen Systems and Lodgepole Technology Inc. (they worked together for this order). The IBM hard drives came from Hard Drives Northwest. The additional 256MB SDRAM for the master node was ordered from The Memory Man.
We started out with NT4.0 but had a hard time getting PVM (3.4beta6) to run on it. We finally gave up and went with Linux. Linux (or any Unix) makes working with the cluster so much easier (for example, I can telnet to the master node and from there into the slave nodes ... something I didn't bother to figure out how to do in NT, since telnet there is a Windows GUI application).
The problem we're trying to solve is coarse-grained and not I/O bound. Myrinet would have added significant cost to each node with little, if any, return in performance.
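To illustrate what "coarse-grained" means here, the Python sketch below deals a batch of spectra out to a set of workers, lets each worker process its share independently, and gathers the results only at the end. It is a plain single-process simulation, not PVM code, and it is not a description of how the SEQUEST PVM port actually divides its work; `search_spectrum` is a hypothetical stand-in for a real search, and the worker count of 12 is purely illustrative.

```python
def partition(items, n_workers):
    """Deal items round-robin into n_workers lists of near-equal size."""
    return [items[i::n_workers] for i in range(n_workers)]


def search_spectrum(spectrum_id):
    """Hypothetical stand-in for one database search; returns (id, score)."""
    return (spectrum_id, spectrum_id % 7)


def run_worker(chunk):
    """Work done by one node: search every spectrum in its chunk.

    No communication with other workers is needed while this runs.
    """
    return [search_spectrum(spectrum_id) for spectrum_id in chunk]


def run_cluster(spectra, n_workers):
    """Simulate the master: hand out chunks, then collect all results."""
    results = []
    for chunk in partition(spectra, n_workers):
        results.extend(run_worker(chunk))
    return sorted(results)


if __name__ == "__main__":
    spectra = list(range(500))  # 500 spectra, as in the benchmarks
    chunks = partition(spectra, 12)  # 12 workers is an illustrative choice
    print([len(chunk) for chunk in chunks])
    print(len(run_cluster(spectra, 12)))
```

In a scheme like this, network traffic is limited to sending each worker its inputs once and returning its results once, so a faster interconnect has little to speed up.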
Seeking a witty name, our lab members took a few minutes
and brainstormed ... the best name that we came up with is Shamu.
Speed, strength, size, ... ????
We needed a big/fast computer to run our analysis software and this
solution seemed to be the best balance of cost and utility.
Webmaster <dtabb@scripps.edu>
Updated 7/3/2000