
Re: even more questions about the Richmond cluster



Greetings,

Life?!? I can relate!
G'day,
sjames



On Thu, 2 Jan 2003, gilfoyle wrote:

> thanks!
> 
>    i'll work on this some more this evening (life?, what life?).
> 
> jerry
> 
> Steven James wrote:
> > 
> > Greetings,
> > 
> > This time, I will answer inline:
> > 
> > Quoting gilfoyle <ggilfoyl@richmond.edu>:
> > 
> > > hi steven,
> > >
> > >    thanks for the quick response. a couple more questions. i cc'ed mike
> > > vineyard so he hears the latest.
> > >
> > > 1. can we just turn off sending mail to the user? it would be painful
> > > if
> > > i got 160 emails every time i do an analysis run.
> > >
> > If all stdout and stderr from the job are redirected into a file, no mail
> > will be sent.
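> > 
> > For example (just a sketch; the wrapper name, log path and ROOT macro are
> > placeholders), putting a redirect at the top of the submitted job leaves
> > atd with no output to mail:
> > 
> >     #!/bin/sh
> >     # hypothetical analysis wrapper submitted through batch
> >     # send everything the job prints to a log file on the RAID volume,
> >     # so there is no stdout/stderr left for atd to mail back
> >     exec > /data3/logs/run_$$.log 2>&1
> >     root -b -q analyze.C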
> > 
> > > 2. i have heard that sendmail has some security problems. is that true?
> > > otherwise i have no great problem running it beyond point #1.
> > >
> > Sendmail does have security problems. I wouldn't allow the outside world to
> > talk to it. The problems are not of the remote-execution or gain-root-privs
> > variety; the major problem is spammers using it as a relay.
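> > 
> > If you do run it on the master, one common precaution (a sketch for a
> > Red Hat-era sendmail; the file locations and exact macro may differ on
> > your install) is to bind it to the loopback interface so only local
> > processes can submit mail:
> > 
> >     # in /etc/mail/sendmail.mc:
> >     #   DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA')dnl
> >     # then rebuild the .cf and restart (paths vary by distribution)
> >     m4 /etc/mail/sendmail.mc > /etc/mail/sendmail.cf
> >     service sendmail restart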
> > 
> > > 3. i've already symlinked /var/spool/at/spool to /data3/spool so it has
> > > plenty of space. it seems like an easy fix to do the same with
> > > /var/spool/mqueue.
> > >
> > Agreed. That would certainly take care of the problems with overfilling the
> > /var filesystem.
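> > 
> > Roughly (a sketch; stop sendmail first and adjust the target path to
> > whatever RAID volume you prefer):
> > 
> >     service sendmail stop
> >     # move the existing queue onto the big disk and leave a symlink
> >     # behind so sendmail still finds it at the old path
> >     mv /var/spool/mqueue /data3/mqueue
> >     ln -s /data3/mqueue /var/spool/mqueue
> >     service sendmail start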
> > 
> > > 4. running one job per compute node is fine.
> > >
> > > i've also noticed a couple of other things. using beomap to allocate
> > > nodes creates the following 'problem'. by compiling root we gained a very
> > > large (83) factor in speed. this means that many of the runs get analyzed
> > > in a few minutes, thus freeing up that node. with the long sleep time i'm
> > > using between job submissions, the low-number slave nodes are usually
> > > done before we get to the high-number nodes. on the analysis run this
> > > morning i did not use any node above 14. this also means there is lots of
> > > data dumped on the first few nodes (about 11 GBytes in several cases).
> > > this has the danger of filling up the slave's disk (which holds 18
> > > GBytes). perhaps picking the slave nodes 'by hand' as i do in one of my
> > > scripts is a better way to distribute the jobs??
> > >
> > 
> > Assigning in a round robin sounds like a good way to go there. I hope to have
> > significant improvements to the batch system in the next couple of months. I
> > will certainly keep you posted.
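> > 
> > A rough sketch of that (node count, paths and the analysis wrapper are
> > placeholders for whatever you actually use):
> > 
> >     #!/bin/sh
> >     # hand the run list out to the slave nodes round-robin, one stream
> >     # of jobs per node, so data and work are spread evenly
> >     NODES=16
> >     for node in $(seq 0 $(( NODES - 1 ))); do
> >         (
> >             i=0
> >             for run in /data3/runs/*.dat; do
> >                 # every NODES-th run, starting at $node, goes to this node
> >                 if [ $(( i % NODES )) -eq $node ]; then
> >                     bpsh $node /home/jerry/analyze_run.sh "$run" \
> >                         > /data3/logs/node${node}_run${i}.log 2>&1
> >                 fi
> >                 i=$(( i + 1 ))
> >             done
> >         ) &
> >     done
> >     wait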
> > 
> > I did some testing on the batch system. It is quite likely that the problem is
> > that batch is failing to insert NO_LOCAL=1 into the environment (which causes
> > bogus decisions from beomap that atd throws out). Adding NO_LOCAL=1 to the
> > environment when using batch may be of help.
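> > 
> > i.e. something along these lines (the job line itself is only a
> > placeholder):
> > 
> >     # make sure beomap maps the job onto a compute node rather than
> >     # the master when going through the batch queue
> >     export NO_LOCAL=1
> >     echo "/home/jerry/analyze_run.sh /data3/runs/run042.dat" | batch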
> > 
> > G'day,
> > sjames
> > 
> > > Steven James wrote:
> > > >
> > > > Greetings,
> > > >
> > > > Just for variety, I'll answer the questions in easy-to-hard order :-)
> > > >
> > > > /var/spool/mqueue filling:
> > > > /var/spool/mqueue is where outgoing mail goes for sendmail to process
> > > > (even if it's going from and to a local user).
> > > >
> > > > Currently, sendmail isn't running on pscm1, so the mail piles up in
> > > > mqueue rather than being delivered to user mailbox files in
> > > > /var/spool/mail.
> > > >
> > > > /var/spool/at is where the batch system holds job output until
> > > > completion. Upon completion, it will mail the output to the job's owner.
> > > >
> > > > Probably, sendmail should be running on the master.
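> > > >
> > > > On a Red Hat-style install (assuming the stock init scripts) that is
> > > > roughly:
> > > >
> > > >     chkconfig sendmail on    # come up at boot from now on
> > > >     service sendmail start   # start it now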
> > > >
> > > > To avoid overfilling /var, you can either symlink /var/spool to a larger
> > > > volume (/usr or one of the RAID volumes), or have your batch scripts
> > > > redirect the output to a scratch file and then move (or bpcp) it to the
> > > > user's home directory when the run is complete.
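> > > >
> > > > For the second approach, the copy-back step would look roughly like
> > > > this (run from the master once a job on, say, node 3 finishes; the
> > > > paths are placeholders):
> > > >
> > > >     # pull the scratch output off the compute node's local disk into
> > > >     # the user's home directory, then remove the scratch copy
> > > >     bpcp 3:/scratch/run42.out /home/jerry/results/run42.out
> > > >     bpsh 3 rm /scratch/run42.out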
> > > >
> > > > I am currently tracking down a bug in the batch queue that pushes jobs
> > > > off to the b queue and leaves them there when they can't be immediately
> > > > dispatched. I believe there is a workaround for that; I'll dig it up and
> > > > send it to you. The ultimate solution will be an update to the at rpm.
> > > >
> > > > Changing the load average threshold won't do anything here. The bproc
> > > > mods to the batch queue system use a 'slot allocation' rather than
> > > > loadavg. The slot allocator assumes that the best allocation is 1 job
> > > > per CPU on a compute node.
> > > >
> > > > That is done for three reasons. First, that's generally a better
> > > > allocation for any scientific computation, while loadavg is better
> > > > suited for general-purpose servers. Second, loadavg is more expensive to
> > > > compute, in that it will generate extra net traffic, disrupt the cache,
> > > > and take extra locks on the compute nodes (which will hurt performance).
> > > > Third, compute jobs often have bursts of I/O where the job will sleep
> > > > waiting, followed by long periods of pure computation; that can cause a
> > > > brief dip in apparent loadavg and lead to bad allocations.
> > > >
> > > > G'day,
> > > > sjames
> > > >
> > > > Quoting gilfoyle <ggilfoyl@mindspring.com>:
> > > >
> > > > > Hi Steven,
> > > > >
> > > > >    Happy New Year and yet another question about the Richmond cluster.
> > > > > I have been experimenting with different ways of running the cluster
> > > > > and I have run into a problem with the batch system. I'm submitting
> > > > > jobs in two different ways; one uses the beomap command to allocate
> > > > > slave nodes and the other just picks the slave nodes `by hand'. I'm
> > > > > using this second method because the limiting factor now is the
> > > > > ability to transfer the data files to the slave nodes. I was thinking
> > > > > that I could transfer the data on the first pass and leave it there
> > > > > for later passes to speed things up. The problem now is that after
> > > > > many jobs are submitted (from 60-100 of so) the remaining jobs get
> > > > > sent to the `b' batch queue and never run. This has happened even when
> > > > > the /var/spool area is not full. My thoughts are the following.
> > > > >
> > > > > 1. Can we reset the average cpu load with the 'atd -l' command? I've
> > > > > tried this and it seems to have little effect.
> > > > >
> > > > > 2. Can we restart the jobs in the queue? Now they just sit there and
> > > > > never get started.
> > > > >
> > > > > 3. In some of the recent analysis runs, the /var/spool/mqueue area has
> > > > > filled up and hung things up. Before it was the /var/spool/at or
> > > > > /var/spool/mail areas. Do you have any idea what would cause that?
> > > > > Should we make a link for /var/spool/mqueue to one of RAID disks so
> > > > > there is plenty of space?
> > > > >
> > > > > Let me know what you think.
> > > > >
> > > > > Thanks-in-advance,
> > > > >
> > > > > jerry
> > > > >
> > > > > --
> > > > > Dr. Gerard P. Gilfoyle
> > > > > Physics Department                e-mail: ggilfoyl@richmond.edu
> > > > > University of Richmond, VA 23173  phone:  804-289-8255
> > > > > USA                               fax:    804-289-8482
> > > > >
> > > >
> > > > ----------------------------steven james, director of research, linux labs
> > > >
> > > > LinuxBIOS Cluster Solutions                   230 peachtree st nw ste 2705
> > > >
> > > > High-Speed Colocation, Hosting,                        atlanta.ga.us 30303
> > > >
> > > > Linux Hardware, Development & Support             http://www.linuxlabs.com
> > > >
> > > > * Visit us at SuperComputing 2002, Booth 1441 *   office/fax 404.577.7747/3
> > > >
> > > > --------------------------------------------------------------------------
> > >
> > > --
> > > Dr. Gerard P. Gilfoyle
> > > Physics Department                e-mail: ggilfoyl@richmond.edu
> > > University of Richmond, VA 23173  phone:  804-289-8255
> > > USA                               fax:    804-289-8482
> > >
> > 
> > ----------------------------steven james, director of research, linux labs
> > 
> > LinuxBIOS Cluster Solutions                   230 peachtree st nw ste 2705
> > 
> > High-Speed Colocation, Hosting,                        atlanta.ga.us 30303
> > 
> > Linux Hardware, Development & Support             http://www.linuxlabs.com
> > 
> > * Visit us at SuperComputing 2002, Booth 1441 *   office/fax 404.577.7747/3
> > 
> > --------------------------------------------------------------------------
> 
> 

-- 
-------------------------steven james, director of research, linux labs
... ........ ..... ....                     230 peachtree st nw ste 701
the original linux labs                             atlanta.ga.us 30303
      -since 1995                              http://www.linuxlabs.com
                                   office 404.577.7747 fax 404.577.7743
-----------------------------------------------------------------------