  #4  
06-14-2011, 01:34 PM
titus
Administrator

Hello,

Just for the record, this past weekend's fiasco was totally embarrassing. This has to be the longest outage we have sustained, despite being an announced planned outage with data loss.

Ultimately the fault lies with the hosting company. They were unable to get the RAID array built despite having replaced a faulty drive on the old server, and they basically required me to enter into another year's contract just to get a server built, yet didn't follow through on assisting me with the transfer unless I chased, called, and chatted.

Having said that, I did make a rather embarrassing mistake (not going to comment further here) that resulted in about 18 hours of data loss, followed by having to rebuild two servers from scratch. That was the easy part.

Then there was the attempt to transfer the web server across and, believe me, we tried just about everything under the sun: normal scp, fast scp, mounting eSATA, a direct crossover cable connection, etc. Each time the transfer crawled along at an average rate of 3Mb/s. Given that the server holds about 250GB, one can imagine the time it was going to take.
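To put a rough number on that, here is a back-of-the-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes) and a perfectly steady rate, which a real transfer would not have. "3Mb/s" could mean megabits or megabytes per second, so both readings are worked out; the function name is made up purely for illustration.

```python
def transfer_hours(size_gb: float, rate: float, rate_in_bits: bool) -> float:
    """Hours needed to move size_gb gigabytes at `rate` mega-units per second.

    Assumes decimal units and a constant rate (both are simplifications).
    """
    size_bytes = size_gb * 1e9
    # Convert megabits/s to bytes/s by dividing by 8; megabytes/s need no conversion.
    bytes_per_second = rate * 1e6 / (8 if rate_in_bits else 1)
    return size_bytes / bytes_per_second / 3600


# Figures from the post: about 250GB at an average of 3Mb/s.
as_megabits = transfer_hours(250, 3, rate_in_bits=True)
as_megabytes = transfer_hours(250, 3, rate_in_bits=False)

print(f"3 megabits/s:  {as_megabits:.0f} hours ({as_megabits / 24:.1f} days)")
print(f"3 megabytes/s: {as_megabytes:.0f} hours")
```

Under these assumptions the copy takes roughly 185 hours (about 7.7 days) if the rate is in megabits, or about 23 hours if it is in megabytes. Either way it is a long time for a server that is supposed to be back online.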

In the end, I rebuilt it as well. It turned out that had I decided to rebuild in the first place, I could have brought the server back up by the target time of 18:00 PST on Sunday. Yes, 3 servers rebuilt by hand. All in all, not bad in terms of recovery time given that everything had to be re-installed. I was saved somewhat because I had multiple backups on hand.

Anyway, as a result of this, I'm going to look seriously into taking data protection and service resiliency to another level. There is a lot of work to do here, as there are many solutions available. Do we use two sets of servers sharing a SAN? Do we use an additional server doing CDP? Do we use two sets of servers with load-balanced web and database replication? And so on. This needs to tie into a plan that I have to use memcache as well. Another consideration is how to make uploaded files appear on both web servers if we go that route.

So this will sidetrack the recent efforts on the reference library and the vBulletin 4 and wiki upgrades.

One step at a time.

Titus
__________________
A link to http://www.yahoo.com