
cube status

To: mvineyar@richmond.edu
Subject: cube status
From: Francisco Chinchilla <fchinchi@richmond.edu>
Date: Wed, 19 Jun 2002 16:21:35 -0400
Cc: aiqbal@richmond.edu, ggilfoyl@richmond.edu, prubin@richmond.edu
Sender: Francisco Chinchilla <fchinchi@richmond.edu>

I had successfully run about 200 of the h2root files on the old cluster when 
the cube crashed.  I attached a monitor and did the following:

1) Restarted the network... nope.  A "df" on psc1 still hung, and 
"cd /net/pscr1/" gave a "service not available" msg.

2) I hit the green button, and the system hung while probing PCI bus #4.  
One of the hard drives was making a funny noise and taking a bit longer before 
its light turned off during bootup.  I called raidzone and they said that one 
of the hard drives had probably gone bad (which is the case) and that, since 
we were using RAID0, we lost all the data.  I am tinkering right now with the 
bad drive to see if I can get anything off it.  I will try to see if we can 
salvage the other data and lose only what was on the bad drive, but that may 
be impossible to do.  I will keep you posted as soon as I have more news.
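
For anyone wondering why one bad drive can take out a whole RAID0 array, here 
is a rough Python sketch of striping.  It assumes a file is laid out 
contiguously starting on the first disk and that chunks go round-robin across 
the disks; the function name, disk count, and stripe size are made up for 
illustration and are not the cube's actual layout.

```python
def disks_touched(file_bytes, n_disks, stripe_bytes):
    """Return the set of disk indices that hold at least one chunk of a file.

    Simplified model (an assumption, not the real controller layout):
    the file starts at stripe 0 and its chunks are written round-robin.
    """
    if n_disks < 1 or stripe_bytes < 1:
        raise ValueError("n_disks and stripe_bytes must be positive")
    n_chunks = -(-file_bytes // stripe_bytes)  # ceiling division
    return {chunk % n_disks for chunk in range(n_chunks)}


if __name__ == "__main__":
    # Hypothetical numbers: 4 disks, 64 KB stripes, a 1 MB file.
    print(sorted(disks_touched(1_000_000, 4, 64 * 1024)))
```

In this model any file larger than (number of disks - 1) stripes has a piece on 
every disk, so a single dead drive damages it.  Smaller files may sit entirely 
on the good drives, but the file system's own bookkeeping is striped too, 
which is why getting even those back may not be possible.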

"Did u back up the data?"
No.  It is quite hard to back up over 250GB of data, and I am not sure we have 
enough tapes to even do that.  I believe that is why the different levels of 
RAID exist: you lose some usable disk space, but you get redundancy, so 
recovery from a drive failure is feasible.
This might be a good time to move all the data from pscm1:/data# into data1, 
then switch data2 and data3 from RAID0 to another RAID level (5 or 10), and 
then do the same for data1.
Of course, you can always argue that the data could be re-downloaded, etc., so 
it really is your call.  I can send statistics on the space lost with each 
RAID configuration if you want me to.
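
As a rough idea of what those statistics would look like, here is a small 
Python sketch using the standard capacity rules for equal-size disks: RAID0 
keeps everything, RAID5 gives up one disk's worth for parity, and RAID10 gives 
up half for mirroring.  The function name and the 8 x 40GB example are 
placeholders, not the actual drive counts or sizes in the cube.

```python
def usable_capacity(level, n_disks, disk_gb):
    """Return (usable_gb, lost_gb) for a RAID level built from equal-size disks."""
    raw = n_disks * disk_gb
    if level == 0:
        if n_disks < 2:
            raise ValueError("RAID0 needs at least 2 disks")
        usable = raw                       # pure striping, no redundancy
    elif level == 5:
        if n_disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        usable = (n_disks - 1) * disk_gb   # one disk's worth goes to parity
    elif level == 10:
        if n_disks < 4 or n_disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        usable = (n_disks // 2) * disk_gb  # every disk is mirrored
    else:
        raise ValueError("only RAID levels 0, 5 and 10 are handled here")
    return usable, raw - usable


if __name__ == "__main__":
    # Hypothetical array: 8 disks of 40GB each.
    for level in (0, 5, 10):
        usable, lost = usable_capacity(level, 8, 40)
        print(f"RAID{level}: {usable}GB usable, {lost}GB lost to redundancy")
```

With the real disk counts and sizes plugged in, this gives the trade-off for 
each of data1, data2, and data3.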

Francisco Chinchilla
