the Richmond saga continues



Hi Steven,

   The saga continues. After you made your changes last Friday I was 
able to run root on slaves 0-2. I could execute it from the master 
using the following command:

bpsh 0 root -b -q /scratch/gilfoyle/e5/24023/run_eod3.C

I was also able to run my scripts for just those two nodes. On Sunday, 
I rebooted the remaining nodes (3-48), removed the /home area and put 
in a link home->/usr/home. I then started to run ten jobs which would 
run on nodes 0-9. The master hung: wouldn't budge. I rebooted the 
master and brought up slaves 0-5 and tried again and got the same 
results. After rebooting the master and slaves 0-5 this is what I have 
noticed. 

1. I ran my scripts without running root and they appeared to work!

2. There are two sub-directories on slave 0, /include and /cint that 
are not visible on any of the other slaves. These two subdirectories 
are needed by root.  This would seem to be a smoking gun for the 
problem except for one thing. Slave 1 seemed to run root successfully 
even though those areas are not visible to it.

3. I can run root on slaves 3-5 from the master using the bpsh 
command. The master only gets hung when I am running my script. I am 
using perl for these scripts and I have attached them to this message. 
Perhaps there is some library that perl needs?

4. The problem seems to be with the nodes that I rebooted on Sunday 
and not the ones you worked on last Friday. Did I reboot them 
incorrectly? I checked some of the permissions of directories on the 
slaves and they all appear to be the same.
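
The directory check in item 2 could be repeated on every node with 
something like the sketch below. It is only a sketch under assumptions: 
it assumes bpsh passes back the exit status of the command it runs on 
the slave, and the names NODE_CMD and check_root_dirs are made up for 
illustration and are not part of the attached scripts.

```shell
#!/bin/sh
# Sketch: report which of the directories root needs are visible on each slave.
# NODE_CMD is a hypothetical knob naming the command that runs something on a
# node; it defaults to bpsh, invoked as "bpsh NODE COMMAND" like the root
# command earlier in this message.
NODE_CMD=${NODE_CMD:-bpsh}

check_root_dirs() {
    # usage: check_root_dirs FIRST_NODE LAST_NODE
    node=$1
    while [ "$node" -le "$2" ]; do
        for dir in /include /cint; do
            # ASSUMPTION: the remote exit status comes back to the master, so
            # "test -d" succeeding means the directory exists on that node.
            if "$NODE_CMD" "$node" test -d "$dir" </dev/null >/dev/null 2>&1; then
                echo "node $node: $dir visible"
            else
                echo "node $node: $dir MISSING"
            fi
        done
        node=$((node + 1))
    done
}
```

Calling "check_root_dirs 0 5" after sourcing this would print one line 
per node and directory, which makes it easy to compare the nodes 
rebooted on Sunday with the ones worked on last Friday.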

I have rebooted the master and nodes 0-5. I am at JLab this week so I
can only work on this sporadically, but I will try to get as much 
done as I can.

Let me know what you think.

Jerry

p.s. description of perl scripts:

submit_eod3c.pl - main script, does some housekeeping and generates the
input file for the batch command.

run_root_on_node2.pl - copies files over to the slave, runs root, and 
cleans up.





-- 
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482

Attachment: submit_eod3c.pl
Description: Perl program

Attachment: run_root_on_node3.pl
Description: Perl program