Re: latest on the Richmond cluster
Greetings,
Your /var filesystem is full. The at system depends on /var to operate
(job output spools there).
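
A quick way to keep an eye on this before submitting a large batch is a
small check like the one below. It is only a sketch: the function names
and the 95% threshold are my own illustrative choices, not part of at or
of anything installed on the cluster, and the figure it reports can differ
slightly from what df shows because df accounts for reserved blocks.

```python
import shutil


def usage_fraction(path="/var"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    total, used, _free = shutil.disk_usage(path)
    if total == 0:
        # Some pseudo-filesystems report a size of zero; treat them as empty.
        return 0.0
    return used / total


def is_nearly_full(path="/var", threshold=0.95):
    """True if the filesystem holding `path` is at or above `threshold` (assumed value)."""
    return usage_fraction(path) >= threshold


if __name__ == "__main__":
    frac = usage_fraction("/var")
    print("/var is %.1f%% full" % (frac * 100))
    if is_nearly_full("/var"):
        print("warning: free space on /var before submitting more jobs")
```

Running it on the head node before queueing jobs would show whether /var
has room for the spooled job output.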
G'day,
sjames
On Sun, 1 Dec 2002, gilfoyle wrote:
> hi steven,
>
> here is the latest.
>
> the good news is the cluster seems to be working at some level. some
> of the behavior is significantly different from before the upgrade
> so we may still have some issues to resolve.
>
> when i last left you i had submitted a large number of jobs (about
> 115) to run and left for chicago for thanksgiving. things had looked
> good for smaller numbers of jobs. when i returned only the first 55
> jobs had been run. the remaining ones were sitting in the batch queue
> (used the bbq and atq commands to see this). even more strange was
> that the first 55 jobs that got submitted were still running root
> after 3 days!! i killed those jobs by hand (kill -9), the perl scripts
> finished up, and the jobs in the batch queue remained there and never
> got started. my questions are the following.
>
> 1. is the apparent limit of 55 jobs fixed? can we raise it? it seems
> reasonable to run two jobs per machine (one per cpu). the 'atd -l '
> command looks like it should work (according to the man page).
>
> 2. after i killed the long-running root executables, i thought the
> queued up jobs would get submitted, but they didn't. do you have any
> idea why?
>
> 3. i noticed that root found no good events even when it ran
> successfully with a smaller number of submitted jobs. this is
> mysterious since this code and these scripts worked before the
> upgrade. i will investigate this problem this week. if you have any
> ideas, please let me know.
>
> 4. the last problem i'm having is that i submitted some jobs tonight
> (sunday) and they immediately go into the 'b' queue and don't get
> submitted. they are listed under the atq command, but do not appear
> when i execute the bbq command. this i don't understand.
>
> let me know what you think.
>
> jerry
>
>
--
-------------------------steven james, director of research, linux labs
... ........ ..... .... 230 peachtree st nw ste 701
the original linux labs atlanta.ga.us 30303
-since 1995 http://www.linuxlabs.com
office 404.577.7747 fax 404.577.7743
-----------------------------------------------------------------------