
problem with University of Richmond cluster



Hi Steven,

   I am one of the physicists from the University of Richmond (along 
with Mike Vineyard) that is using the cluster you delivered to us 
earlier this year. We have recently run into a problem which is 
limiting our ability to make full use of the cluster. Until a couple 
of days ago we had never tried to run multiple jobs on each slave 
node across the entire cluster. On Tuesday, 
for the first time, I submitted 148 jobs evenly distributed among the 
48 slave nodes. After about 4-5 hours no more jobs were running, but I 
noticed that only about 1/3 of the submitted jobs produced any usable 
output. Today, I was running another large set of jobs and found I 
could no longer run any new processes even from the command line of a 
shell. For example, I would type in 'ls' and get back 'no more 
processes'. There appears to be an upper limit on the number of 
processes that can run on the master. Once that limit is exceeded, 
any new attempt to start a process appears to be ignored. When I 
submitted the full set of 148 jobs, many were not run because they 
would have exceeded this upper limit on the allowed 
number of processes. Right now I can run no more than about 40 jobs on 
the cluster without encountering this problem. This is fewer than one 
job per slave node. Each job I submit starts three separate processes, 
so 40 jobs start about 120 processes. Searching the web, I found 
discussions of this limitation and a solution (which involves building 
a new kernel). The URLs are below. I have also attached the scripts I 
am using to do the data analysis (one shell script and one Perl 
script). Any help you can provide would be greatly appreciated.
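
To make the arithmetic above concrete, here is a minimal Python sketch 
that counts the processes a batch of jobs needs and, where available, 
compares that with the per-user process limit. It assumes a Linux-like 
system where Python's resource module exposes RLIMIT_NPROC. The helper 
names processes_needed and max_jobs are hypothetical, and a kernel-wide 
task limit of the kind discussed at the URLs below would not 
necessarily be visible through this call.

```python
import resource

# From the message: each submitted job starts three separate processes.
PROCS_PER_JOB = 3


def processes_needed(jobs, procs_per_job=PROCS_PER_JOB):
    """Total processes started by `jobs` jobs."""
    return jobs * procs_per_job


def max_jobs(process_limit, procs_per_job=PROCS_PER_JOB, reserved=0):
    """Jobs that fit under process_limit while keeping `reserved`
    process slots free for shells, daemons, and so on."""
    usable = max(process_limit - reserved, 0)
    return usable // procs_per_job


if __name__ == "__main__":
    print("40 jobs need", processes_needed(40), "processes")
    print("148 jobs need", processes_needed(148), "processes")

    # Assumption: the per-user limit is the relevant one. A system-wide
    # limit compiled into the kernel is not reported here.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft == resource.RLIM_INFINITY:
        print("no per-user process limit is set")
    else:
        print("per-user soft limit:", soft,
              "-> at most", max_jobs(soft), "jobs")
```

If the reported limit turns out to be far above 120, the per-user limit 
is probably not the one being hit, which would be consistent with the 
kernel rebuild suggested in the discussions below.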

Thanks in advance,

Jerry Gilfoyle

http://www.ltsp.org/documentation/lts_ig_v2.4/lts_ig_v2.4-14.html

http://www.geocrawler.com/archives/3/61/1998/10/0/2207294/

-- 
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482

Attachment: submit_eod3b.pl
Description: Binary data

File attachment: run_root_on_node2.sh

The file attached to this email was removed because files of this type are not accepted for delivery by your email gateway.