
latest on the Richmond cluster



hi steven,

   here is the latest. 

the good news is the cluster seems to be working at some level. some
of the behavior is significantly different from before the upgrade 
so we may still have some issues to resolve.

when we last talked i had submitted a large number of jobs (about 
115) to run and then left for chicago for thanksgiving. things had 
looked good for smaller numbers of jobs. when i returned, only the 
first 55 jobs had started. the remaining ones were sitting in the 
batch queue (i used the bbq and atq commands to see this). even 
stranger, those first 55 jobs were still running root after 3 days!! 
i killed them by hand (kill -9) and the perl scripts finished up, but 
the jobs in the batch queue stayed there and never got started. my 
questions are the following.

1. is the apparent limit of 55 jobs fixed? can we raise it? it seems 
reasonable to run two jobs per machine (one per cpu). according to 
the man page, the 'atd -l' command looks like it should do this.

2. after i killed the long-running root executables, i thought the 
queued up jobs would get submitted, but they didn't. do you have any 
idea why?

3. i noticed that root found no good events even when it ran 
successfully with a smaller number of submitted jobs. this is 
mysterious since this code and these scripts worked before the 
upgrade. i will investigate this problem this week. if you have any 
ideas, please let me know.

4. the last problem i'm having is that i submitted some jobs tonight 
(sunday), and they immediately went into the 'b' queue and never 
started. they are listed by the atq command but do not appear when 
i run the bbq command. this i don't understand.
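regarding item 1, here is how i read the atd man page: 'atd -l' sets a 
load-average limit rather than a job count, and jobs in the batch ('b') 
queue are only started while the load is below that limit (the man page 
suggests setting it above n-1 on a machine with n cpus). the python 
sketch below is only my rough model of that rule, not atd's actual 
code. the function name is hypothetical, and the use of the 1-minute 
load average and the 60-second spacing between batch starts (atd's -b 
option) are assumptions on my part.

```python
import os


def may_start_batch_job(load_avg, load_limit, secs_since_last_start, min_interval=60):
    """hypothetical model of atd's batch-queue gate (an assumption, not atd's code).

    a queued batch job is started only if the load average is below the
    'atd -l' limit and at least min_interval seconds (cf. 'atd -b') have
    passed since the previous batch job was started.
    """
    return load_avg < load_limit and secs_since_last_start >= min_interval


if __name__ == "__main__":
    # os.getloadavg() exists only on unix-like systems
    if hasattr(os, "getloadavg"):
        one_minute_load = os.getloadavg()[0]
        # compare a few candidate limits for a dual-cpu node
        for limit in (0.8, 1.5, 2.0):
            print(limit, may_start_batch_job(one_minute_load, limit, secs_since_last_start=60))
```

if this model is right, the load limit only controls when queued jobs 
start, not how many can be queued, so it may not explain the 55-job 
ceiling by itself.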

let me know what you think.

jerry

-- 
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482