Re: status of the Richmond cluster
hi steven,
i tried running things yesterday and got the following.
1. i tried running my perl scripts on slaves 10-11 (i.e., analyzing
two runs) and root did not run. the other tasks in the perl scripts
completed correctly.
2. i tried running root with the bpsh command from pscm1. i executed
the command in /scratch/gilfoyle/e5/24023, which is the corresponding
area on the slave (what is the jargon for this? a mirror/ghost
directory?). it did not run correctly or produce any output. however,
there is a core file in /scratch/gilfoyle/e5/24023 on slave 10.
3. i tried running my perl scripts on slaves 0-1 since they worked
before. they worked!! root ran and produced output files with filled
histograms and all the good stuff.
4. i tried running root on pscm1 (to look at the results of step 3)
and it did not run! it flashed its little greeting (which is an
X-window function) and then crashed. the core file is in
/home/gilfoyle/eod/run/results/.
if you want to run this yourself, the commands are the following.
1. to run root:
root<cr>
if you want to do more than that, let me know and i can give you a
quick how-to for looking at data.
2. to run root on slave 10:
bpsh 10 root -b -q /scratch/gilfoyle/e5/24023/run_eod3.C
the data files are already on the slave. usually i would delete them
after an analysis run, but i have left them on the disk for now for
testing.
3. to submit a job to the cluster:
a. go to /home/gilfoyle/eod/run.
b. execute submit_eod3c.pl<cr>
the scripts are submit_eod3c.pl and run_root_on_node3.pl. the main
input file is /home/gilfoyle/eod/run/E5_run_numbers.inp, which
determines which runs to analyze. right now it lists only two runs, so
only two runs will get analyzed when you run submit_eod3c.pl. the
script submit_eod3c.pl sets some parameters, including which slaves to
run the analysis on; for example, see the parameter first_node in
submit_eod3c.pl. a rough sketch of the flow is just below.
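in case it helps to see the flow without digging through the scripts,
the submit step boils down to something like the sketch below. this is
not the real submit_eod3c.pl: the run-list file, the first_node
parameter, and the bpsh line are the ones described above, but the
loop itself, the n_nodes variable, and the macro path are guesses
filled in for illustration.

#!/usr/bin/perl -w
use strict;

# sketch only: the real submit_eod3c.pl / run_root_on_node3.pl differ
my $run_list   = "/home/gilfoyle/eod/run/E5_run_numbers.inp";
my $first_node = 10;   # first slave to hand runs to
my $n_nodes    = 2;    # how many slaves to spread runs over (guess)

open(my $fh, '<', $run_list) or die "cannot open $run_list: $!";
my @runs = grep { /\S/ } map { chomp; $_ } <$fh>;
close($fh);

my $i = 0;
foreach my $run (@runs) {
    my $node  = $first_node + ($i++ % $n_nodes);
    # guessed layout: one directory per run on the slave's /scratch,
    # like /scratch/gilfoyle/e5/24023 in item 2 above
    my $macro = "/scratch/gilfoyle/e5/$run/run_eod3.C";

    # run_root_on_node3.pl presumably ends up doing a bpsh call like
    # this; checking the exit status here would at least tell us
    # whether bpsh itself or root is the thing that fails
    my $status = system("bpsh", $node, "root", "-b", "-q", $macro);
    warn "run $run on slave $node returned status $status\n"
        if $status != 0;
}

the real scripts obviously do more bookkeeping than that, but that is
the basic shape of it.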
let me know if there is anything more that would help. i'm starting
to get a bit desperate to get this thing working.
jerry
steven james wrote:
>
> Greetings,
>
> I believe I have all of the library issues dealt with.
>
> I noticed a possibly confusing behaviour that might have been the root of
> some of this.
>
> Perl depends on several libraries in /lib to run. Unlike those in
> /usr/lib, they were being managed by caching rather than just being
> available from NFS. It can take about a minute for the libs to be fetched
> from the master. During that time, the app will appear hung, but will
> eventually start.
>
> I have pre-cached the files onto the node's local drive to try to avoid
> that delay.
>
> Since the libs are cached, once that startup penalty is paid, it doesn't
> happen again for those libs on that node until reboot.
>
> You can see this happen using tcpdump (I have a binary of it in my home
> directory). The libs are transferred as a stream of multicast packets.
>
> Please let me know if this gets it going. If problems remain, a good
> approach might be for me to make a copy of your test data and try the runs
> myself until the expected results come up.
>
> G'day,
> sjames
>
> On Thu, 14 Nov 2002, gilfoyle wrote:
>
> > hi steven,
> >
> > i'm checking in (when there is no beam) to find out the
> > status of the cluster. have the library issues been resolved?
> > if so, what was the solution? i'm itching to let this thing
> > get cooking.
> >
> > jerry
> >
> >
>
> --
> -------------------------steven james, director of research, linux labs
> ... ........ ..... .... 230 peachtree st nw ste 701
> the original linux labs atlanta.ga.us 30303
> -since 1995 http://www.linuxlabs.com
> office 404.577.7747 fax 404.577.7743
> -----------------------------------------------------------------------
--
Dr. Gerard P. Gilfoyle
Physics Department e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173 phone: 804-289-8255
USA fax: 804-289-8482