The Richmond Computing Clusters
-
Welcome
This section is an introduction to the software configuration of the research cluster and to some of the packages available for using it.
-
Overview
There are two clusters in SpiderWulf: the current one is called quark and the old one is called pscm1. The OS of the master and the remote nodes on each cluster is Linux. The quark cluster runs Red Hat Enterprise Linux Server release 5.5 (Tikanga), while pscm1 uses the Scyld distribution; its master node (also named pscm1) runs kernel 2.4.19.

The Scyld distribution is built for running a beowulf cluster. It is based on a process migration package called bproc. IMPORTANT: bproc is the _ONLY_ way to run programs on the nodes. There is no rsh and no ssh to the nodes; programs are run on them using the bproc system. Everything that runs on the cluster runs on top of bproc; it is the lowest and ONLY level of communication between the master and the nodes.

There is bproc and everything else.

-
Queuing system
The queuing system uses the standard Linux batch and atd packages for submitting jobs to the cluster. The scripts used to submit jobs to the cluster are listed below. We can run in two modes.
  1. Data files can be copied from the fileserver to the slave nodes before analysis. In this case the script waits 1-2 minutes before submitting a job on the next slave node. This is to prevent the system from 'locking up' when too many slaves are copying too much data over the network.
  2. Data files can be left on the slaves between analysis runs ('round-robin mode'). In this case only 15 seconds is used between submitting jobs, so this mode is much faster. Of course, you have to be careful not to overload the disks on the slave nodes. The steps for switching from one mode to the other are shown below:
    1. Here is an example of the topmost script for using the cluster: submit_eod3c.pl. It does housekeeping and uses the batch command to submit jobs to the cluster; each job executes the second script below. A hedged sketch of this submit loop follows this list.
    2. This is the script that performs the tasks on the slave node: run_root_on_node.pl.
    3. These are some guidelines for switching to round-robin mode and back.
  3. To change the data set on quark, see the guidelines here.
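
For reference, here is a minimal sketch of the submit loop described in item 1 above. This is not the actual submit_eod3c.pl; the script path and the exact delay are assumptions for illustration.

     # Minimal sketch (not the real submit_eod3c.pl): queue one job per slave node.
     # The path to run_root_on_node.pl is an assumption; adjust it for the cluster.
     for node in $(seq 0 48); do
         echo "bpsh $node /path/to/run_root_on_node.pl" | batch
         sleep 90   # wait 1-2 minutes between submissions while data is copied to the slaves
     done

In round-robin mode the sleep would drop to about 15 seconds, since no data is copied over the network.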
-
Nimbus Documentation
This section contains documentation on booting, configuring, and using the Nimbus distribution of beowulf and Linux.
  1. booting.
  2. bpcp.
  3. bpctl.
  4. bpsh.
  5. contents.
  6. master.
  7. node specifications.
-
bproc
bproc is a process migration system. It can start processes on the master and then migrate them to a node. After migration the process is represented by a ghost on the master. This differs radically from a system that uses rsh to log into a node and then run a command. Most importantly:

1) Processes started using bproc still show up on the master in the ps and top commands. Signals can be sent to the process on the master. This means that you can kill a program that is running on a node by issuing

     kill -9 15345
if your process was assigned PID 15345 (see the example after point 2 below).

2) Process start time is VERY fast. It's on the order of milliseconds.
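
Putting point 1) into practice, here is a hedged example of killing a job on a node entirely from the master; the process name root.exe and the PID are only illustrations:

     # the migrated process shows up in the master's process table as a ghost
     ps aux | grep root.exe
     # killing the ghost by its PID kills the program running on the node
     kill -9 15345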

This is a short list of commands available to the users of bproc:

bpstat: Used to obtain node status: up, unavailable, error, reboot, halt, pwroff, or down. The status will be 'up' for all nodes that are ready for use. bpstat can give you process space information (which processes run on which nodes), but nothing about the current CPU load.
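
If you want to see that process-to-node mapping, something like

     bpstat -p
may do it; the -p option is an assumption here (it exists on many bproc installations), so check the bpstat man page on the cluster.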

bpsh: The most important command. It is used for running commands on nodes. Example:

     bpsh 5 ls
runs ls on node 5 in the current directory. It starts a SUBSHELL on a node, meaning that all properties of the shell are taken from the current shell (current directory, path, environment variables, ...).
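
For instance, because the current directory carries over, the following illustration (assuming /tmp exists on both the master and the node, which it normally does)

     cd /tmp
     bpsh 5 ls
lists the contents of /tmp on node 5.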

Another example:

     bpsh 5 ls -la /scratch/gilfoyle/e5
lists all the files on node 5 in the directory /scratch/gilfoyle/e5/.

Yet another example:

     bpsh 5 df -h
lists the size and available space on all the disks mounted on node 5. The output is
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1              18G  3.7G   13G  22% /
192.168.1.1:/usr       12G  8.3G  3.5G  70% /usr
192.168.1.3:/data1   1008G  440G  517G  46% /data1
192.168.1.3:/data2   1008G  795G  162G  84% /data2
192.168.1.3:/data3   1008G  318G  639G  34% /data3
which shows the local disk (/dev/hda1) with 13 GBytes available, the RAID disks, and the /usr area on the master.

bpctl: A configuration command with one feature worth noting: nodes can be restricted so that only certain users or groups can use them. If you need this feature, let me know.
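
For illustration only, restricting a node to a single user or group typically looks something like the following on bproc systems; the option names are an assumption, so check the bpctl man page on the cluster, and the group name is hypothetical.

     # allow only the user gilfoyle to run jobs on node 5 (option names assumed)
     bpctl -S 5 -u gilfoyle
     # or restrict node 5 to members of the group physics (hypothetical group name)
     bpctl -S 5 -g physics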

-
Cluster status information
Tools for obtaining information about currently running programs are built on top of bproc. The main one is bpstat. The usage of this command is described here. A useful form of this command is
     bpstat -U all
which shows the status of all the nodes and is continuously updated. This is useful when rebooting the cluster.

The command 'wakinyanmon' will pop up a status display showing CPU, memory, swap, and disk use, along with the temperatures in the nodes. A sample of the display is here.

Please let me know if you need anything specific. As I start using the cluster on a regular basis I'll add what I need and let you know.

-
Email archive
I have kept most of the emails that were exchanged between the Richmond group and our vendor during the commissioning of the cluster. A Mozilla file containing those messages is here. Beware, this is a large file! An easier-to-read version is here.
-
Remote editing
If you are logging into the cluster remotely and use the Emacs editor, you might find the Lisp extension to Emacs called TRAMP useful. It lets you run Emacs on your home machine and remotely edit a file on the cluster. The website is here. To use it in Emacs you open a file in the usual way (C-x C-f or the `File' menu) and enter the filename on the remote machine with the following syntax.
/[`IP address']/`path'/`filename'
So if I want to open the file eppi1.C in my account in the subdirectory `test' on the machine catherine I would use the following.
/[catherine.richmond.edu]/home/gilfoyle/test/eppi1.C
TRAMP will assume the remote username is the same as the local one. If it isn't, then use the following syntax.
/[myusername@catherine.richmond.edu]/home/gilfoyle/test/eppi1.C

With the installation of the firewall you may have to make multiple hops to access your files. An example is below for hopping through another machine first ('dude.richmond.edu') to get to catherine.richmond.edu.

/[multi/ssh:gilfoyle@dude.richmond.edu/ssh:gilfoyle@catherine.richmond.edu]/home/gilfoyle/test/eppi1.C
In this case you need to explicitly specify the username and method (ssh here) for the remote access and also alert TRAMP that this will involve multiple hops (the `multi' in the first section of the command).

With the installation of RedHat Enterprise on my Richmond machine I found that TRAMP no longer worked because RHE 3.0 did not include a program called mimencode, which TRAMP requires. I simply copied an old version of mimencode from a RedHat 7.3 machine into the /usr/bin area on my Richmond machine. This seemed to work. This may become an issue when the new master is installed.
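
A hedged sketch of that workaround (the source hostname is hypothetical, and you need root privileges on the machine you are fixing):

     # copy mimencode from a machine that still ships it (hostname is hypothetical)
     scp oldbox.richmond.edu:/usr/bin/mimencode /tmp/mimencode
     # install it locally; requires root
     cp /tmp/mimencode /usr/bin/mimencode
     chmod 755 /usr/bin/mimencode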

-
Setting up the RAIDs
Occasionally we have problems when rebooting the fileserver: either the RAID partitions won't mount or we get a message like 'stale NFS file handle'. This happens when the fileserver is rebooted while the cluster is up and running and the RAIDs were not unmounted beforehand. To get the RAIDs working properly after rebooting the fileserver you may have to do the following as root on the fileserver.
vgscan          # rescan the disks for LVM volume groups
vgchange -a y   # activate the volume groups
mount /data1
mount /data2
mount /data3
exportfs -r     # re-export the NFS shares to the cluster
Then, back on the cluster, do the following.
umount /data1
mount /data1
umount /data2
mount /data2
umount /data3
mount /data3
-
Software FAQ

Q) How do you reference the different nodes?
A) A typical method for specifying nodes is:

-1 - the master 
5  - slave node 5
The slave nodes are numbered 0-48.

Q) How much disk space is available on a slave node?
A) Use the following command for, say, node 21. Each slave has a total of 18 GBytes of space.

bpsh 21 df -h