
[Fwd: status of the Richmond cluster and thanks to Steven James]



hi sasko,

   attached is the latest status report on the cluster, which appears
to be working! thanks for all your help in getting it to this point.
now that that problem is fixed, there are some administrative-type things
i want to work on in the next few weeks, and i will need your help.

1. before the upgrade i had set up my account on pscm1 with the
same uid as my account on gpg2. this enabled me to access my
pscm1 files from gpg2 transparently. since the upgrade that is no
longer true, and i cannot edit or delete files on pscm1 from
gpg2. what is the best way to fix this? i could change my uid
on gpg2 to match the one on pscm1, but is there a more elegant
way to do it? i want my files on pscm1 to be accessible by me
and no one else. (a rough sketch of what i have in mind is after
item 6.)

2. when i am on mfv1 and i try to ssh to gpg2 i get the following
message:

physxcd:gilfoyle> ssh gpg2
ssh_exchange_identification: Connection closed by remote host

i have tried various fixes and none of them work. do you have any
ideas? (the checks i would look at are listed after item 6.)

3. node 8 in the cluster is still dead. can you try to resurrect
it? if you can't, we should send it back to linuxlabs to be fixed
or replaced.

4. node 48 in the cluster is also dead. can you try to resurrect
it? if you can't, we should send it back along with node 8.

5. we got money from the university faculty research committee to
purchase some new nodes to add to the cluster. this would involve
removing some of the nodes from the old cluster (psc1) so we can 
use the rack for the new ones. we should talk in the next week
or two to plan this move.

6. adnan iqbal will be working with me over break and i may have 
him help us with some of the stuff above.
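
for item 1, here is roughly what i had in mind (the uid values are
placeholders, and this assumes the pscm1 disks are still shared the
same way they were before the upgrade):

# on each machine, check the numeric uid of my account
gpg2>  id -u gilfoyle
pscm1> id -u gilfoyle

# if they differ, change the uid on gpg2 to match pscm1 (as root),
# then fix ownership of any files still owned by the old numeric uid
usermod -u <uid_from_pscm1> gilfoyle
find / -user <old_uid> -exec chown gilfoyle {} \;

if there is a cleaner way (nis, or mapping uids on the server side),
i am happy to do that instead.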
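
for item 2, these are the kinds of things i would check (just guesses
on my part, and the log file name may be different on gpg2):

ssh -v gpg2                  # from mfv1; the verbose output shows where it dies
# then on gpg2, as root:
grep sshd /etc/hosts.deny    # tcp wrappers can close the connection before the banner
grep sshd /etc/hosts.allow
tail /var/log/messages       # look for sshd or 'refused connect' entries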

let me know what you think.

jerry


-- 
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482
--- Begin Message ---
Status of the Richmond cluster:

It works! Yesterday I did the first full analysis using the entire 
cluster. It involved processing 148 data runs consisting of 1,440 files. 
This is the E5, 4.232 GeV data set. It took about 12 hours to complete 
and processed 71,790,968 events. Of those events, 10,384,406 were ep 
events and 911,913 were ep(n) events. I don't know how this compares 
with the performance of the JLab farm, but if anyone knows, please 
tell me. I want to thank Steven James for all his help in upgrading 
the cluster and solving the long string of problems we encountered in 
getting to this point.
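
(For a rough number to compare against the JLab farm: 71,790,968 events 
in about 12 hours works out to roughly 1,660 events per second, or about 
6 million events per hour, summed over the whole cluster.)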

I have attached copies of the two Perl scripts that I used to perform 
this analysis run. They are heavily commented, so I won't bore you with 
the details here. If you see ways to improve them, let me know. One 
important point to realize is that with this many experimental runs to 
process, I had to build into my script the ability to wait until a 
slave node becomes available before submitting the next job. In our 
previous work we did not pay attention to this, and as a result I 
ended up filling up the /var area, which caused the remaining jobs to 
fail or never even start. Be aware of this 'feature' if your jobs 
mysteriously disappear.
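
In case it helps anyone writing their own submission script, here is a 
bare-bones sketch of the wait-for-a-free-node idea. It is NOT the 
attached scripts; the node names, the 'user_ana' job name, and the 
one-job-per-node limit are placeholders for illustration only.

#!/usr/bin/perl -w
# Sketch only: submit each data file to the first slave node that is
# not already running an analysis job, and wait if every node is busy.
use strict;

my @nodes = map { sprintf "node%02d", $_ } 1 .. 48;   # slave nodes
my @files = @ARGV;                                     # data files to process

foreach my $file (@files) {
    my $free;
    until ( defined $free ) {
        foreach my $node (@nodes) {
            # count analysis jobs already running on this node
            my $njobs = `ssh $node "ps -C user_ana --no-headers | wc -l"`;
            next unless defined $njobs && $njobs =~ /(\d+)/;
            if ( $1 == 0 ) { $free = $node; last; }
        }
        sleep 60 unless defined $free;   # everything busy; wait and look again
    }
    print "submitting $file to $free\n";
    # launch the job; the '&' backgrounds the ssh so the loop can move on
    system("ssh $free user_ana $file &");
}

The point is simply that nothing new gets submitted until some node 
reports it has no analysis job running.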

There are undoubtedly other bugs, problems, etc. to solve. Please start 
doing your analyses so that we can find and fix those problems.

The cluster run I did yesterday is NOT the full 4.232 GeV data set,
but it is a large fraction of it. There are bunches of data files that 
I have yet to move over since they were pulled off the silo after I
first moved this data set to the cluster.

I will be generating a webpage in the next few weeks to keep 
documentation, notes, advice, etc. about using the cluster.


later, 

jerry

-- 
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482

Attachment: submit_eod3d.pl
Description: Perl program

Attachment: run_root_on_node3.pl
Description: Perl program


--- End Message ---