
Re: latest on the Richmond cluster



Greetings,

Your /var filesystem is full. The at daemon (atd) depends on /var to
operate, since job output spools there.
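
A quick way to check for this condition, sketched in Python. The paths
listed and the 95% threshold are illustrative assumptions, not values
taken from the cluster; the actual spool location depends on how at was
built on your distribution.

```python
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=95.0):
    """True if the filesystem holding `path` is at or above `threshold` percent."""
    return usage_percent(path) >= threshold


if __name__ == "__main__":
    # Hypothetical list of paths to check; adjust to the local layout.
    for path in ("/var", "/var/spool"):
        try:
            pct = usage_percent(path)
        except FileNotFoundError:
            print(f"{path}: not present on this machine")
            continue
        flag = "NEARLY FULL" if is_nearly_full(path) else "ok"
        print(f"{path}: {pct:.1f}% used ({flag})")
```

If /var turns out to be full, freeing space there (old logs, stale job
output) is the first thing to try before looking at queue settings.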

G'day,
sjames



On Sun, 1 Dec 2002, gilfoyle wrote:

> hi steven,
> 
>    here is the latest. 
> 
> the good news is the cluster seems to be working at some level. some
> of the behavior is significantly different from before the upgrade 
> so we may still have some issues to resolve.
> 
> when i last left you i had submitted a large number of jobs (about 
> 115) to run and left for chicago for thanksgiving. things had looked 
> good for smaller numbers of jobs. when i returned only the first 55 
> jobs had been run. the remaining ones were sitting in the batch queue 
> (used the bbq and atq commands to see this). even more strange was 
> that the first 55 jobs that got submitted were still running root 
> after 3 days!! i killed those jobs by hand (kill -9), the perl scripts 
> finished up, and the jobs in the batch queue remained there and never 
> got started.  my questions are the following.
> 
> 1. is the apparent limit of 55 jobs fixed? can we raise it? it seems 
> reasonable to run two jobs per machine (one per cpu). the 'atd -l ' 
> command looks like it should work (according to the man page).
> 
> 2. after i killed the long-running root executables, i thought the 
> queued up jobs would get submitted, but they didn't. do you have any 
> idea why?
> 
> 3. i noticed that root found no good events even when it ran 
> successfully with a smaller number of submitted jobs. this is 
> mysterious since this code and these scripts worked before the 
> upgrade. i will investigate this problem this week. if you have any 
> ideas, please let me know.
> 
> 4. the last problem i'm having is that i submitted some jobs tonight 
> (sunday) and they immediately go into the 'b' queue and don't get 
> submitted. they are listed under the atq command, but do not appear 
> when i execute the bbq command. this i don't understand.
> 
> let me know what you think.
> 
> jerry
> 
> 

-- 
-------------------------steven james, director of research, linux labs
... ........ ..... ....                     230 peachtree st nw ste 701
the original linux labs                             atlanta.ga.us 30303
      -since 1995                              http://www.linuxlabs.com
                                   office 404.577.7747 fax 404.577.7743
-----------------------------------------------------------------------