RaidCore 4852 -- DON'T DO IT!

If you are looking at inexpensive storage solutions, please read this in its entirety.  I was in the same boat as you, and lost my precious data due to countless problems with a Broadcom RAIDCore product.  The technician was quick to say it was my problem and to absolve the company of liability, when in fact the problem was in the card itself.  My advice to you: BE CAREFUL AND READ THIS!

 


Configuration:
  • Tyan 2518 Motherboard
  • 4U rackmount case
  • 500 watt power supply
  • Supermicro 5-drive SATA cage
  • 3 GB ECC PC133 SDRAM
  • 2x 1.4 GHz Pentium III-S chips
  • RAIDCore 4852 "RAID" controller
  • 8x 200 GB Western Digital Caviars

When you do a search on RAIDCore, you find a lot of results on benchmarking.  What you don't see is anything about recovering the array when it actually fails.  They boast a very high MTBF (mean time between failures), but let's be realistic for a moment.

I wanted a bunch of storage.  I went out and bought all this hardware brand new.  I paid through the nose.  I was interested in stability more than anything.  I constructed the box first, then started researching reasonably priced storage.  SCSI was instantly out of the question.  I came across RAIDCore's name and read up on them.  The card was very reasonably priced (compared to competitors) and had RAVE reviews.  I went out and got 10 WD2000JD 200 GB SATA drives (8 to go in the array, 2 for my desktop), a Supermicro hot-swap cage, and the RAIDCore 4852 controller.  I had no idea what the next month had in store for me....


I thought, all new hardware!  What a treat!  Nothing can stop me now!  Data warehousing, here I come! 

The first fishy thing I noticed:

8 x 200 GB (200.94 GB formatted capacity each) = 1,607.52 GB (formatted)

RAID 5 configuration (estimating about 25% for parity): 1,607.52 GB * 0.75 = 1,205.64 GB

So that's 1.2 terabytes for a RAID 5 configuration.

Now, for some reason, the RAIDCore 4852 only showed 1.0 terabytes in this array, at 100% utilization on EVERY drive.  Right off the bat, I lost 200 GB.  I wasn't that concerned.  For crying out loud, I have my first TERABYTE of storage!  I was excited!  Time to get an OS on it and fill it on up!
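For reference, RAID 5 does not reserve a flat 25% for parity: it stores one drive's worth of parity spread across all members, so an array of n identical drives offers (n - 1) × the per-drive capacity. A minimal Python sketch of that arithmetic, assuming identical drives and that the 200.94 GB per-drive figure uses the same units the card reports (the function name is just for illustration):

```python
def raid5_usable_capacity(drive_count: int, drive_size_gb: float) -> float:
    """Usable capacity of a RAID 5 array built from identical drives.

    RAID 5 spreads one drive's worth of parity across all members,
    so exactly one drive's capacity is lost, whatever the array width.
    """
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_size_gb


if __name__ == "__main__":
    raw = 8 * 200.94                         # 8 drives, formatted size each
    usable = raid5_usable_capacity(8, 200.94)
    print(f"raw: {raw:.2f} GB, RAID 5 usable: {usable:.2f} GB")
    print(f"parity overhead: {1 - usable / raw:.1%}")  # 1/8 of raw space
```

By this formula, 8 drives should give about 1,406.58 GB (roughly 1.4 TB), so if the units match, the shortfall against the 1.0 TB the card reported is closer to 400 GB than 200 GB.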

The array initialized pretty fast, and I was on my way to a stable computer (so I thought).

I made 2 partitions: one 30 GB for the OS and the rest for data.  Yippee!


I started copying all my backup CDs to the array, stuff I have had for years and years.  I wanted to have everything at the drop of a hat.  Before you know it, I have about 200 gigs taken up and all my CDs stacked on the desk.  What could be better?!?

So I spend the next few days organizing everything.  With that much space, it would be easy to get everything messy, quick!  So, yadda yadda... I got everything in order and was as happy as a kid on Christmas morning.

I am pretty much done with needing to log into the box locally, so I mount it back in my rack.  I feel a sense of gratification as I screw the sucker in...

2 days later, I go to grab some applications off the server and it's not responding.  Confused and disoriented, I walk downstairs to my rack.  I find that the OS is not responding and hit the reset button to reboot and see what's going on with it.  It comes up NO OS INSTALLED.  My heart fluttered!  I had just spent over $2000 on the storage system alone.  I wasn't expecting this for at least another 8-12 months! (Granted, it is running Windows.)

I frantically grab the case out of the rack, bring it upstairs to my office, pop it open, and hook up the CD-ROM drive so I can boot from the CD and perform a repair on the OS.  Everything is going as planned: I get through the DOS portion of Setup and into the GUI part.  Next thing I know, a drive goes out on me WHILE I am repairing the OS to get it running again.  I thought to myself, what the heck is going on here?!??!  What kind of luck is this, you know?  So I think to myself, no biggie, it's RAID 5, fault tolerant, right?  I took out the drive that stopped responding and replaced it with one from my desktop.  I was proud of myself for keeping a spare!

I reboot, toss the spare in, and go into the controller's BIOS looking for the rebuild feature.  LO AND BEHOLD, IT'S NOT THERE!  All I see is a BIOS screen that bitches about the array going critical, AND NO WAY TO FIX IT!

I shot an email over to RAIDCore tech support and waited for a reply.  OF COURSE it was like 2 AM when this happened, so they were not open for business.

I would post the thread, but I lost about a week of email; my personal folder was on the server with the RAID array.  We go back and forth a few times, and the tech support guy informs me that I HAVE TO HAVE AN OS TO DO A REBUILD OF THE ARRAY.  So what does that mean?  THIS CONTROLLER IS NOT HARDWARE RAID!  Okay, no biggie, I do as he suggests and grab an IDE drive.  Thank GOODNESS I had one just lying around after the SATA upgrade in my desktop.  If I didn't have this drive sitting there, I would be off to the store to purchase one.

I hook the drive up and repair the OS so I can work off this hardware.  I get it up and running, initialize the new drive, and it starts rebuilding automatically.  Unfortunately, it transformed the array into a 7-drive RAID 5 instead of 8 drives.  But the size didn't change.  I didn't care; whatever, here was my data back the way I had it.
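What makes a rebuild like this possible is RAID 5's parity: each stripe's parity block is the XOR of its data blocks, so any one missing block can be recomputed by XOR-ing the blocks that survive. A minimal Python sketch of that idea, using toy byte strings as hypothetical drive contents (it does not model how the RAIDCore driver actually lays out stripes):

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recover the single missing block of a stripe (data or parity).

    Since parity = d0 ^ d1 ^ ... ^ d(n-1), XOR-ing every surviving
    block, parity included, yields the block that is gone.
    """
    return xor_blocks(surviving)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]  # toy data blocks on three drives
    parity = xor_blocks(data)            # parity block on a fourth drive
    # Pretend the drive holding data[1] died and rebuild it from the rest.
    recovered = rebuild_missing([data[0], data[2], parity])
    print(recovered)  # b'BBBB'
```

The same arithmetic shows why RAID 5 survives exactly one bad drive and no more: with two blocks missing, or with a drive that silently returns wrong data, XOR-ing the survivors no longer gives back the original, and nothing in the parity says which block is wrong.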

You ask what happened to the drive I replaced?  I set it as a spare, as directed by RAIDCore support.  They said the rebuild would start automatically and use the new drive.  It didn't, but like I said, there was my data.  Cool, let's go on with life.  It only took 3 days to get back up and running.  My OS was hosed, though; the Windows boot CD would not even see my old OS to repair it.  I reinstalled and reconfigured, tossed it back in the rack, and away we went again.  I was a little distraught over what I had just gone through but saw no need to dwell on it.  I had what mattered -- my data intact.


Sad story: it does not end here.  2 weeks later, something similar happened.

I email RAIDCore again, and check out the response I get (cut and pasted from the email):

ME: The reason all this happened this time is just like last time.  Somehow, the os had been corrupted, I went to repair the os yet again and the same thing happened..  a drive said it failed… 

RAIDCORE: The new Version 1.2 has enhanced bad block code that will more clearly indentify bad drives and mark them offline.  We have a drive in our lab that reports media errors and our code writes the data bad to the drive based on our parity drive and the drive reports the data is written back sucessfully while it has not been.  This was shown to cause corruption, so we added code to mark drives that report multiple errors within the same sector as bad.

WOW!  A "Production Environmentally Ready Product" getting RAVE reviews all over the internet with a bug like this in the firmware!  What does this tell you about this company?  Perhaps they need to do some more testing.
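The Version 1.2 fix RAIDCore describes boils down to an error-counting policy. A hypothetical Python sketch of that kind of policy; the class name, the threshold of 2 (reading "multiple errors" literally), and the per-sector bookkeeping are all assumptions for illustration, not RAIDCore's firmware:

```python
from collections import defaultdict


class SectorErrorPolicy:
    """Hypothetical sketch: take a drive offline once it reports
    repeated media errors on the same sector."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold  # assumed value for "multiple errors"
        self.errors: dict[tuple[int, int], int] = defaultdict(int)
        self.offline: set[int] = set()

    def report_error(self, drive: int, sector: int) -> bool:
        """Record a media error; return True if the drive is now offline."""
        if drive in self.offline:
            return True
        self.errors[(drive, sector)] += 1
        if self.errors[(drive, sector)] >= self.threshold:
            self.offline.add(drive)
        return drive in self.offline


if __name__ == "__main__":
    policy = SectorErrorPolicy()
    print(policy.report_error(drive=3, sector=1024))  # False: first error
    print(policy.report_error(drive=3, sector=2048))  # False: different sector
    print(policy.report_error(drive=3, sector=1024))  # True: repeat on same sector
```

Keying the count on the same sector is what targets the drive described in the email: its rewrite of a bad sector claimed success, so the only symptom is that the next read of that sector fails again.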

My outlook at this point:

I paid good money for a card that was supposed to provide speed and fault tolerance for my precious data.

What I got:

I got to BETA TEST a product, and paid RETAIL for it.


So what's going on now?
Because of this little "error" in their code, my partition table is hosed.  I am out of pocket $1500 for some data recovery software that MIGHT be able to restore my data.  The saddest thing of all: I trusted this card enough that I had ALL of my business information on the massive storage share.  That set me back about 2 months, AND I had another small business owner's computer backed up to the array while I was reinstalling his operating system.  ALL of his business information is lost as well!

What do I get from RAIDCore concerning this issue?  I will show you:

Hi Austin,
 
I talked with our developer about this issue.  You should be able to see the contents of the disk with the array in this state.  He does not believe that the recovery is interferring with this.  We believe that the multiple disk errors caused data corruption on the array and affected the partion information.  At this point even if the transform was running it would not result in a usable disk.  It is also possible that the 64 bit PCI slot running at 66MHz was the culprit.
 
At this point the only thing you might try is some type of partition repair tool (ie Partition Magic).  I would give this a low probability of success, so unless you already have a copy, I would not recommend buying it.
 
If you should decide to re-install, I assume you have already returned the two faulty disks for replacement.  I would suggest you run some load tests on the system to see if the 64 bit PCI bus is stable before loading critical data on the system.
 
Sorry I don't have a better solution for you.
 
Thanks ... *EDITED*
RAIDCore Support

Yet more sad details:

I have 2 machines with the exact same configuration (minus the SATA drives and RAIDCore card) and have NEVER had any issues with the 64 bit bus.  I think the culprit is shoddy firmware code coupled with software RAID instead of REAL hardware RAID.  Buy a 3ware 9500 and save yourself the trouble.

Of the 2 drives he described as "failed," one was falsely reported as bad by the card during the 2nd OS rebuild and was not really bad.  I tested it in my desktop, and it came out fine.  I placed it back in the array expecting a rebuild, and it still just sits there.  The other drive was actually the spare, which did go bad and has been advance-replaced by Western Digital, shipped 2nd-day air.  (Now that is a great company!)


CURRENT STATUS:  Waiting on a raw data recovery of this freaking TERABYTE... 10:35pm 9-3-04, 2 weeks after the 2nd crash.  I paid $1500 for the software as well, mind you...

09/28/2004 -- Still waiting on the recovery... it SAYS it's halfway done...

10/20/2004 -- This recovery is like the Energizer Bunny: it just keeps going and going and going... 72% there, 1.5 billion to go.

11/12/2004 -- The recovery finally finished, well, sort of: it froze at 100%.... What luck, eh?  I've already bought myself a 3ware 9500S-8.  This card is WAY WAY WAY less problematic than the RAIDCore and is super fast!  Before you have a problem with RAIDCore, go 3ware!


From here, my intention is to recover as much data as possible from the damage this nasty firmware bug left behind, toss this card, and buy a real hardware SATA RAID card from an established company: *3WARE*

Disclaimer:  This is my personal experience with the RAIDCore 4852 SATA RAID card.  Your mileage may vary, but I DOUBT IT!

