Running

Nimbus comes pre-loaded with a slightly modified version of MPICH that knows about beomap and bproc. These modifications allow MPI applications to run conveniently on a Nimbus cluster without the need for wrapper applications such as mpirun.

The number of processes spawned, and where they run, are controlled through environment variables that are read by the beomap system.
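
As a concrete illustration, the minimal MPI program below can be started directly on the master with no mpirun wrapper; the beomap environment variables then determine how many ranks are created and where they land. The NP variable shown in the comment is an assumption about the name of the variable that requests a process count; check the beomap documentation on your system for the exact names.

    /* hello.c - a minimal sketch of an MPI job on a Nimbus cluster.
     * Launched directly on the master, for example:
     *     NP=4 ./hello
     * (NP is assumed here to be the beomap variable that requests
     *  four processes; confirm the variable names for your install.)
     */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* spawns and migrates the other ranks */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }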

Should something go wrong and you need to kill the job, all of the processes are visible with the ps command on the master. You can then use the kill command just as you would for local processes. This is in sharp contrast to a 1st generation Beowulf, where you would need to rsh around the cluster killing off runaway processes.

Compiling and linking

You may use the mpicc wrapper just as you would with unmodified MPICH. Otherwise, all that is necessary is to link your application against bproc and MPI by adding -lbproc -lmpi to your command line. That's actually all that mpicc does for you anyway.
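
For example, assuming the source file is named hello.c, a build line might be either mpicc -o hello hello.c or plain cc -o hello hello.c -lmpi -lbproc; the latter assumes the MPI and bproc headers and libraries are already on your compiler's default search paths.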

Programming

For maximum compatibility, the bproc-modified MPICH is meant to meet the normal assumptions of an MPI program. It also offers a few features that can be quite useful, though caution is advised to avoid breaking compatibility with old-school systems.

The primary difference is in the exact behavior of MPI_Init. On 1st generation Beowulf systems, and many others, some sort of wrapper (mpirun, for example) is responsible for using rsh to start a copy of the program on each desired node. The command line arguments are prepended with MPI arguments that assign ranks and specify how the child processes should connect back to the root rank.

On a Nimbus system, the program simply starts executing on the master as any non-MPI program would. Inside MPI_Init, a call is made to beomap to get a list of available CPUs that meets the requirements of the environment variables described above. Then a series of bproc_rfork calls is made to fork child processes and migrate them to their intended CPUs. MPI_Init will return in each process of the job.
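
To make the fork-and-migrate behavior concrete, the fragment below mimics what MPI_Init does internally. It is a conceptual sketch only: the hard-coded node list stands in for the real beomap lookup, the <sys/bproc.h> header name is an assumption about your installation, and the rank bookkeeping and connection setup of the real library are omitted.

    /* Conceptual sketch of the fork-and-migrate step inside MPI_Init.
     * The hard-coded node list stands in for the real beomap lookup;
     * error handling and the connection back to rank 0 are omitted. */
    #include <stdio.h>
    #include <sys/bproc.h>            /* bproc_rfork(); header name may vary */

    int main(void)
    {
        int cpu[] = { 0, 1, 2, 3 };   /* stand-in for the beomap CPU list */
        int ncpus = 4;
        int rank = 0;                 /* the original process becomes rank 0 */

        for (int i = 1; i < ncpus; i++) {
            if (bproc_rfork(cpu[i]) == 0) {   /* child: forked and migrated to node cpu[i] */
                rank = i;
                break;                        /* children spawn no further ranks */
            }
        }

        /* Every process, parent and children alike, continues from here,
         * just as every rank returns from the real MPI_Init. */
        printf("hello from rank %d\n", rank);
        return 0;
    }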

As long as MPI_Init is the first thing done in the program (as it should be), there is no difference. If other initialization is done first, there may be subtle differences in behavior, and a few of them may be fatal. Foremost, files must NOT be opened before MPI_Init: the act of migration causes all open file handles other than stdin, stdout, and stderr to become invalid. Another case where a subtle behavior change may be noticed is fetching a random seed value from /dev/random. On a bproc system, that read will only happen on the root rank; the other ranks will inherit a copy of the seed value instead. Don't do this unless you would then bcast the value anyway.
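
For example, the portable way to give every rank the same seed is to read /dev/random after MPI_Init on the root rank and broadcast the value explicitly, as in the sketch below.

    /* Portable seeding: open /dev/random only after MPI_Init, on rank 0,
     * then broadcast the value so every rank sees the same seed. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank;
        unsigned int seed = 0;

        MPI_Init(&argc, &argv);               /* do this before touching any files */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            FILE *f = fopen("/dev/random", "rb");
            if (f) {
                fread(&seed, sizeof(seed), 1, f);
                fclose(f);
            }
        }
        MPI_Bcast(&seed, 1, MPI_UNSIGNED, 0, MPI_COMM_WORLD);
        srand(seed);                          /* every rank now agrees on the seed */

        MPI_Finalize();
        return 0;
    }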

It may be tempting to do setup steps like the above before calling MPI_Init as an elegant way to skip a bcast or twelve. While it IS elegant, it is not portable, and thus should probably be avoided.

To emphasize: MAKE MPI_Init the first thing your program does unless you have thought very carefully about unintended consequences.

Note that the semantics of MPI_Init in Nimbus meet the requirements of the MPI standard. The differences lie in the undefined gray areas.