
Re: thanks and a long question



hi steven,

the latest.

1. i put the scripts you sent in /usr/lib/beoboot/bin and restarted
beowulf.

2. i powered down slaves 0-2 and brought them back up (actually our
admin person did this; i'm at jefferson lab today and tomorrow).

3. slaves 0 and 2 came up with an error, and slave 1 did not come back
at all.

i don't know if this is related to the scripts or not. i will be back in
richmond tomorrow evening and will try to bring things back up myself.

you said you would modify the node_up script. is that the one you sent
me?

jerry

steven james wrote:
> 
> Greetings,
> 
> The reboot of the master is not actually necessary. Instead, you can just
> do:
> 
> /etc/init.d/beowulf restart
> on the master and reboot the slaves. Note that the restart command will
> crash any running jobs on the cluster (of course, so does rebooting the
> master :-)
> 
> For item 4, the problem may be the size of the library, or the nodes may
> be confused by the number of library paths. I have seen that before (in
> particular with the Intel compiler libraries). I may need to modify the
> node_up script to preload /usr/root/PRO/lib. I will be happy to take care
> of that.
> 
> Alternatively, placing the attached scripts into
> /usr/lib/beoboot/bin (make sure to chmod +x the scripts) should cause the
> nodes to preload the needed library and make sure they can find it.
> 
> The instructions for running X should not be necessary. I suppose that
> since the program is linked against the X libs, they get loaded even when
> the command-line options say not to use X.
> 
> Hope the evening beer was good (he says over the half-pot-sized cup of
> morning coffee).
> 
> G'day,
> sjames
> 
-- 
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482