
the latest on the cluster



hi sasko,

   hope you enjoyed philadelphia last week. to update you on the
cluster, we are still not fully operational. we had trouble
last week getting our software running on the slave nodes
because it could not find libraries, input files, etc. we
finally got things set up last friday (with lots of help from
steven james), but i ran into another problem over the weekend.
the symptoms are the following.

1. i was able to run the scripts that call our analysis code
(called root) to analyze a couple of files on two slaves.

2. when i tried to analyze five or more files, the jobs started
up and then pscm1 `hung' and i got no response. this
happened for the first time sunday afternoon and again sunday
night. the cluster is still in this hung state.

3. when you get a chance today, could you go over to physics
and reboot pscm1? also reboot the first 5-10 slaves. there is
no sense in taking the time to reboot all the slaves since
things are failing with 5-10 slaves.

4. once we get rebooted, i will check some stuff and send a 
message to steven james.

i have shifts at jlab all week, so i will have only a little time
to work on this during the week.

thanks-in-advance,

jerry
-- 
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482