
Re: even more questions about the Richmond cluster



Greetings,

This time, I will answer inline:


Quoting gilfoyle <ggilfoyl@richmond.edu>:

> hi steven,
> 
>    thanks for the quick response. a couple more questions. i cc'ed mike
> vineyard so he hears the latest.
> 
> 1. can we just turn off sending mail to the user? it would be painful if
> i got 160 emails every time i do an analysis run.
> 
If all stdout and stderr from the job are redirected into a file, no mail will
be sent.
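
For example, here is a rough sketch of a submission that keeps the output out
of the mail path entirely; the analyze command, run name, and /data3/scratch
location are only placeholders for whatever your scripts actually use:

    #!/bin/sh
    # Sketch only: redirect both stdout and stderr of the job into a file
    # on the large volume, so atd has nothing left to mail back.
    RUN=run017                          # placeholder run identifier
    LOG=/data3/scratch/${RUN}.log       # assumed scratch area with space

    echo "/home/jerry/bin/analyze ${RUN} > ${LOG} 2>&1" | batch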


> 2. i have heard that sendmail has some security problems. is that true?
> otherwise i have no great problem running it beyond point #1.
> 
Sendmail does have security problems. I wouldn't allow the outside world to talk
to it. The problems are not of the remote-execution or gain-root-privileges
variety; the major problem is spammers using it as an open relay.
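
If you do end up running it on the master, one way to keep outsiders from
talking to it is to bind the daemon to the loopback interface only. A rough
sketch (config file locations vary by distribution, so treat the paths here as
placeholders):

    # In the m4 config (sendmail.mc), restrict the listener to localhost:
    #   DAEMON_OPTIONS(`Port=smtp, Addr=127.0.0.1, Name=MTA')dnl
    # then regenerate sendmail.cf and restart the daemon:
    m4 /etc/mail/sendmail.mc > /etc/sendmail.cf
    /etc/rc.d/init.d/sendmail restart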


> 3. i've already symlinked /var/spool/at/spool to /data3/spool so it has
> plenty of space. it seems like an easy fix to do the same with
> /var/spool/mqueue.
> 
Agreed. That would certainly take care of the problem of filling up the /var
filesystem.
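
Concretely, something like this on the master would do it, with /data3/mqueue
standing in for wherever you want the queue to live:

    # Sketch: relocate the sendmail queue onto the big volume and leave a
    # symlink behind; mqueue should stay mode 0700 and owned by root.
    mv /var/spool/mqueue /data3/mqueue
    ln -s /data3/mqueue /var/spool/mqueue
    chmod 0700 /data3/mqueue
    chown root:root /data3/mqueue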


> 4. running one job per compute node is fine.
> 
> i've also noticed a couple of other things. using beomap to allocate
> nodes creates the following 'problem'. by compiling root we gained a
> very large factor (83) in speed. this means that many of the runs get
> analyzed in a few minutes, thus freeing up that node. with the long
> sleep time i'm using between job submissions, the low-number slave
> nodes are usually done before we get to the high-number nodes. on the
> analysis run this morning i did not use any node above 14. this also
> means there is lots of data dumped on the first few nodes (about 11
> GBytes in several cases). this has the danger of filling up the
> slave's disk (which holds 18 GBytes). perhaps picking the slave nodes
> 'by hand' as i do in one of my scripts is a better way to distribute
> the jobs??
> 

Assigning the jobs round-robin sounds like a good way to go there. I hope to
have significant improvements to the batch system in the next couple of months;
I will certainly keep you posted.
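
In the meantime, a crude version of that in the submission script might look
like the sketch below; the node count, run list, sleep time, and analyze
command are placeholders for whatever your 'by hand' script already does:

    #!/bin/sh
    # Sketch: cycle successive runs over compute nodes 0..NNODES-1 with
    # bpsh instead of letting beomap choose, so data and work spread
    # evenly across the slaves.
    NNODES=16
    n=0
    for run in /data3/runs/*.dat; do
        node=`expr $n % $NNODES`
        bpsh $node /home/jerry/bin/analyze $run > /data3/scratch/run$n.log 2>&1 &
        n=`expr $n + 1`
        sleep 300    # stand-in for the existing sleep between submissions
    done
    wait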

I did some testing on the batch system. It is quite likely that the problem is
that batch is failing to insert NO_LOCAL=1 into the environment, which causes
beomap to make bogus decisions that atd then throws out. Adding NO_LOCAL=1 to
the environment yourself when using batch may help.
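
Until the update is out, that is easy to set by hand at submission time; a
sketch (the job line itself is a placeholder):

    # Sketch: make sure NO_LOCAL=1 is in the environment that batch records
    # for the job, so beomap won't try to place it on the master.
    NO_LOCAL=1
    export NO_LOCAL
    echo "/home/jerry/bin/analyze run017 > /data3/scratch/run017.log 2>&1" | batch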

G'day,
sjames


> Steven James wrote:
> > 
> > Greetings,
> > 
> > Just for variety, I'll answer the questions in order from easy to hard :-)
> > 
> > /var/spool/mqueue filling:
> > /var/spool/mqueue is where outgoing mail goes for sendmail to process
> > (even if it's going from and to a local user).
> > 
> > Currently, sendmail isn't running on pscm1, so the mail piles up in
> > mqueue rather than being delivered to user mailbox files in
> > /var/spool/mail.
> > 
> > /var/spool/at is where the batch system holds job output until
> > completion. Upon completion, it will mail the output to the job's
> > owner.
> > 
> > Probably, sendmail should be running on the master.
> > 
> > To avoid overfilling /var, either /var/spool can be symlinked to a
> > larger volume (/usr or one of the RAID volumes), or your batch scripts
> > can redirect the output to a scratch file and then move (or bpcp) it
> > to the user's home directory when the run is complete.
> > 
> > I am currently tracking down a bug in the batch queue that pushes jobs
> > off to the b queue and leaves them there when they can't be
> > immediately dispatched. I believe there is a workaround for that; I'll
> > dig it up and send it to you. The ultimate solution will be an update
> > to the at rpm.
> > 
> > Changing the load-average threshold won't do anything here. The bproc
> > mods to the batch queue system use a 'slot allocation' rather than
> > loadavg. The slot allocator assumes that the best allocation is 1 job
> > per CPU on a compute node.
> > 
> > That is done for three reasons. First, that's generally a better
> > allocation for any scientific computation, while loadavg is better
> > suited for general-purpose servers. Second, loadavg is more expensive
> > to compute in that it generates extra net traffic, disrupts the cache,
> > and takes extra locks on the compute nodes (which hurts performance).
> > Third, compute jobs often have bursts of I/O where the job sleeps
> > waiting, followed by long periods of pure computation; that can cause
> > a brief dip in apparent loadavg and lead to bad allocations.
> > 
> > G'day,
> > sjames
> > 
> > Quoting gilfoyle <ggilfoyl@mindspring.com>:
> > 
> > > Hi Steven,
> > >
> > >    Happy New Year and yet another question about the Richmond
> > > cluster. I have been experimenting with different ways of running
> > > the cluster and I have run into a problem with the batch system. I'm
> > > submitting jobs in two different ways; one uses the beomap command
> > > to allocate slave nodes and the other just picks the slave nodes `by
> > > hand'. I'm using this second method because the limiting factor now
> > > is the ability to transfer the data files to the slave nodes. I was
> > > thinking that I could transfer the data on the first pass and leave
> > > it there for later passes to speed things up. The problem now is
> > > that after many jobs are submitted (from 60-100 or so) the remaining
> > > jobs get sent to the `b' batch queue and never run. This has
> > > happened even when the /var/spool area is not full. My thoughts are
> > > the following.
> > >
> > > 1. Can we reset the average cpu load with the 'atd -l' command? I've
> > > tried this and it seems to have little effect.
> > >
> > > 2. Can we restart the jobs in the queue? Now they just sit there and
> > > never get started.
> > >
> > > 3. In some of the recent analysis runs, the /var/spool/mqueue area
> > > has filled up and hung things up. Before it was the /var/spool/at or
> > > /var/spool/mail areas. Do you have any idea what would cause that?
> > > Should we make a link for /var/spool/mqueue to one of the RAID disks
> > > so there is plenty of space?
> > >
> > > Let me know what you think.
> > >
> > > Thanks-in-advance,
> > >
> > > jerry
> > >
> > > --
> > > Dr. Gerard P. Gilfoyle
> > > Physics Department                e-mail: ggilfoyl@richmond.edu
> > > University of Richmond, VA 23173  phone:  804-289-8255
> > > USA                               fax:    804-289-8482
> > >
> > 
> > ----------------------------steven james, director of research, linux labs
> > 
> > LinuxBIOS Cluster Solutions                   230 peachtree st nw ste 2705
> > 
> > High-Speed Colocation, Hosting,                        atlanta.ga.us 30303
> > 
> > Linux Hardware, Development & Support             http://www.linuxlabs.com
> > 
> > * Visit us at SuperComputing 2002, Booth 1441 *   office/fax 404.577.7747/3
> > 
> >
> --------------------------------------------------------------------------
> 
> -- 
> Dr. Gerard P. Gilfoyle
> Physics Department                e-mail: ggilfoyl@richmond.edu
> University of Richmond, VA 23173  phone:  804-289-8255
> USA                               fax:    804-289-8482
> 



----------------------------steven james, director of research, linux labs

LinuxBIOS Cluster Solutions                   230 peachtree st nw ste 2705

High-Speed Colocation, Hosting,                        atlanta.ga.us 30303

Linux Hardware, Development & Support             http://www.linuxlabs.com

* Visit us at SuperComputing 2002, Booth 1441 *   office/fax 404.577.7747/3

--------------------------------------------------------------------------