yo, the saga continues. attached is a copy of my latest call for help to steven james.

jerry

--
Dr. Gerard P. Gilfoyle
Physics Department                 e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173   phone:  804-289-8255
USA                                fax:    804-289-8482
--- Begin Message ---
- To: steven james <pyro@linuxlabs.com>
- Subject: the Richmond saga continues
- From: gilfoyle <ggilfoyl@richmond.edu>
- Date: Mon, 11 Nov 2002 15:38:20 -0500
Hi Steven,

The saga continues. After you made your changes last Friday I was able to run root on the slaves 0-2. I could execute it from the master using the following command.

    bpsh 0 root -b -q /scratch/gilfoyle/e5/24023/run_eod3.C

I was also able to run my scripts for just those two nodes.

On Sunday, I rebooted the remaining nodes (3-48), removed the /home area and put in a link home->/usr/home. I then started to run ten jobs which would run on nodes 0-9. The master hung: wouldn't budge. I rebooted the master and brought up slaves 0-5 and tried again and got the same results.

After rebooting the master and slaves 0-5 this is what I have noticed.

1. I ran my scripts without running root and they appeared to work!

2. There are two sub-directories on slave 0, /include and /cint, that are not visible on any of the other slaves. These two subdirectories are needed by root. This would seem to be a smoking gun for the problem except for one thing. Slave 1 seemed to run root successfully even though those areas are not visible to it.

3. I can run root on slaves 3-5 from the master using the bpsh command. The master only gets hung when I am running my script. I am using perl for these scripts and I have attached them to this message. Perhaps there is some library that perl needs??

4. The problem seems to be with the nodes that I rebooted on Sunday and not the ones you worked on last Friday. Did I reboot them incorrectly? I checked some of the permissions of directories on the slaves and they all appear to be the same.

I have rebooted the master and nodes 0-5. I am at JLab this week so I can only work on this sporadically, but I will try to get as much done as I can.

Let me know what you think.

Jerry

p.s. description of perl scripts:

submit_eod3c.pl - main script, does some housekeeping and generates the input file for the batch command.

run_root_on_node2.pl - copies files over to the slave, runs root, and cleans up.

--
Dr. Gerard P. Gilfoyle
Physics Department                 e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173   phone:  804-289-8255
USA                                fax:    804-289-8482

Attachment: submit_eod3c.pl
Description: Perl program

Attachment: run_root_on_node3.pl
Description: Perl program
--- End Message ---