hi sasko,

attached is the latest status report on the cluster which appears to be working! thanks for all your help in getting the cluster working. since that problem is fixed, there are some administrative type things i want to work on in the next few weeks and need your help.

1. before the upgrade i had set up my account on pscm1 with the same uid as my account on gpg2. this enabled me to access my 'pscm1' files from gpg2 transparently. since the upgrade this is no longer true and i cannot edit and delete files on pscm1 from gpg2. what is the best way to fix this? i could change my uid on gpg2 to match the one on pscm1. is there a more elegant way to do this? i want my files on pscm1 to be accessible only by me and no one else.

2. when i am on mfv1 and i try to ssh to gpg2 i get the following message.

   physxcd:gilfoyle> ssh gpg2
   ssh_exchange_identification: Connection closed by remote host

   i tried various fixes and none work. do you have any ideas?

3. node 8 in the cluster is still dead. can you try to resurrect it? if you can't we should send it back to linuxlabs to get fixed or replaced.

4. node 48 in the cluster is also dead. can you try to resurrect it? if you can't we should send it back with node 8.

5. we got money from the university faculty research committee to purchase some new nodes to add to the cluster. this would involve removing some of the nodes from the old cluster (psc1) so we can use the rack for the new ones. we should talk in the next week or two to plan this move.

6. adnan iqbal will be working with me over break and i may have him help us with some of the stuff above.

let me know what you think.

jerry

--
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482
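For item 1, a quick check along these lines might help narrow things down. It is only a sketch, and it assumes the pscm1 files are shared to gpg2 through something like NFS, where ownership is matched by numeric uid rather than by user name (the message does not say how the files are shared). The function names are made up for illustration. It compares the uid that owns a file with the uid of the account running the script, and checks whether the group and other permission bits are all clear.

```python
import os
import sys


def uid_report(path):
    """Return (owner_uid, my_uid, match) for the given path."""
    owner_uid = os.stat(path).st_uid   # numeric uid that owns the file
    my_uid = os.getuid()               # numeric uid of the current account
    return owner_uid, my_uid, owner_uid == my_uid


def owner_only(path):
    """True if no group or other permission bits are set (e.g. mode 0700 or 0600)."""
    return (os.stat(path).st_mode & 0o077) == 0


if __name__ == "__main__":
    # Usage: python check_uid.py <path> [<path> ...]
    for p in sys.argv[1:]:
        owner, me, match = uid_report(p)
        print(f"{p}: owner uid={owner}, my uid={me}, match={match}, "
              f"owner-only={owner_only(p)}")
```

Running it on gpg2 against a file that lives on pscm1 would show whether the two accounts still resolve to the same numeric uid after the upgrade.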
--- Begin Message ---
- To: Mike Vineyard <vineyarm@union.edu>, Luninita Tudor <luminita@jlab.org>, steven james <pyro@linuxlabs.com>, Markus Geiger <markus@linuxlabs.com>, Francisco Chinchilla <fchinchi@richmond.edu>
- Subject: status of the Richmond cluster and thanks to Steven James
- From: gilfoyle <ggilfoyl@richmond.edu>
- Date: Thu, 05 Dec 2002 14:18:39 -0500
Status of the Richmond cluster: It works!

Yesterday I did the first full analysis using the entire cluster. It involved processing 148 data runs consisting of 1440 files. This is the E5, 4.232 GeV data set. It took about 12 hours to complete and processed 71790968 events. Of those events, 10384406 were ep events and 911913 were ep(n) events. I don't know how this compares with the performance of the JLab farm, but if anyone knows, please tell me.

I want to thank Steven James for all his help in upgrading the cluster and solving the long string of problems we encountered in getting to this point.

I have attached copies of the two perl scripts that I used to perform this analysis run. They are heavily commented so I won't bore you with the details here. If you see ways to improve them, let me know. One important point to realize is that with this many experimental runs to process, I had to build into my script the ability to wait until a slave node becomes available. In our previous work, we did not pay attention to this. As a result, I ended up filling up the /var area with stuff which caused the remaining jobs to fail or not even be started. Be aware of this 'feature' if your jobs mysteriously disappear.

There are undoubtedly other bugs, problems, etc. to solve. Please start doing your analyses so that we can find and fix those problems.

The cluster run I did yesterday is NOT the full 4.232 GeV data set, but it is a large fraction of it. There are bunches of data files that I have yet to move over since they were pulled off the silo after I first moved this data set to the cluster.

I will be generating a webpage in the next few weeks to keep documentation, notes, advice, etc. about using the cluster.

later,

jerry

--
Dr. Gerard P. Gilfoyle
Physics Department                e-mail: ggilfoyl@richmond.edu
University of Richmond, VA 23173  phone:  804-289-8255
USA                               fax:    804-289-8482

Attachment: submit_eod3d.pl
Description: Perl program
Attachment: run_root_on_node3.pl
Description: Perl program
--- End Message ---
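The status report mentions building in "the ability to wait until a slave node becomes available" so that jobs do not pile up and fill /var. The attached perl scripts are not reproduced here, so the following is not the author's implementation. It is a minimal Python sketch of that general idea, under stated assumptions: the function names, the node labels, and the `run_N.dat` file names are all hypothetical, and the stand-in launcher starts a trivial local process instead of a real job on a cluster node.

```python
import subprocess
import sys
import time


def run_throttled(jobs, nodes, launch, poll_seconds=0.05):
    """Run each job on a free node, waiting whenever every node is busy.

    jobs   : iterable of job descriptions (here, hypothetical file names)
    nodes  : list of node labels (hypothetical)
    launch : callable(node, job) returning an object with a poll() method,
             such as subprocess.Popen
    Returns a list of (node, job, return_code) in completion order.
    """
    free = list(nodes)
    pending = list(jobs)
    running = {}      # node -> (job, process)
    finished = []

    while pending or running:
        # Start new jobs only while a free node exists.
        while pending and free:
            node = free.pop(0)
            job = pending.pop(0)
            running[node] = (job, launch(node, job))

        # Collect finished jobs and return their nodes to the free list.
        for node in list(running):
            job, proc = running[node]
            code = proc.poll()
            if code is not None:
                finished.append((node, job, code))
                del running[node]
                free.append(node)

        # Wait a little before polling again rather than spinning.
        if running:
            time.sleep(poll_seconds)

    return finished


def local_launch(node, job):
    """Stand-in launcher (assumption): runs a no-op local process, not a real node job."""
    return subprocess.Popen([sys.executable, "-c", "pass"])


if __name__ == "__main__":
    files = [f"run_{i}.dat" for i in range(6)]   # hypothetical file names
    results = run_throttled(files, [1, 2, 3], local_launch)
    print(len(results), "jobs finished")
```

The point of the design is that at most one job per node is ever outstanding, so the number of queued processes, and whatever temporary output they write, stays bounded by the number of nodes rather than by the number of files.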