Hello,
Sorry guys, give me a few days to fix this. I seriously need some sleep. I'll have a lot of work to do aside from the system time. :redface:

Titus
Hello,
Just for the record, this past weekend's fiasco was totally embarrassing. This has to be the longest outage we have sustained, despite being an announced, planned outage, and it came with data loss. Ultimately it's the hosting company's fault: they couldn't get the RAID array rebuilt despite having replaced a faulty drive in the old server, and they basically required me to enter into another year's contract just to get a new server built, but then didn't follow through on assisting with the transfer unless I chased them by phone and chat.

Having said that, I did make a rather embarrassing mistake (not going to comment further here) resulting in about 18 hours of data loss, followed by having to rebuild two servers from scratch. That was the easy part. Then there was an attempt to transfer the web server across, and believe me, we tried just about everything under the sun: normal scp, fast scp, mounting eSATA, a direct cross-over cable connection, etc., but each time it crawled at an average rate of 3 MB/s. Given the server holds about 250 GB, one can imagine the time it would take. In the end, I rebuilt it as well. It turned out that had I made the decision to rebuild from the start, I could have brought the server back up by the target time of 18:00 PST on Sunday. Yes, three servers rebuilt by hand. All in all, not bad in terms of recovery time given everything had to be re-installed. I was saved somewhat by having multiple backups available on hand.

Anyway, as a result of this, I'm going to look seriously into taking data protection and service resiliency to another level. There is a lot of work to do here, as there are many solutions available: do we use two sets of servers sharing a SAN, do we use an additional server doing CDP, do we use two sets of servers with the web tier load balanced and database replication, etc.? This needs to tie into my plan to use memcache as well. Another consideration would be how to make uploaded files appear on both web servers if we go that route. So this will sidetrack the recent efforts on the reference library and the vBulletin 4 and wiki upgrades. One step at a time.

Titus
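(For scale, a quick back-of-the-envelope on that abandoned copy -- reading the 3 MB/s figure as megabytes per second, which is an assumption:)

[code]
# Rough transfer-time estimate: ~250 GB at the observed ~3 MB/s.
size_mb = 250 * 1024            # 250 GB expressed in MB
rate_mb_per_s = 3               # observed throughput (assumed megabytes/s)
hours = size_mb / rate_mb_per_s / 3600
print(f"{hours:.1f} hours")     # ~23.7 hours, i.e. about a full day per attempt
[/code]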
I appreciate all of your hard work! Great that you finally got it going.
Just as a side note: our servers went deadsville the other day (ones that I look after), and one call and 4 hours later everything was back up. Everything was down, including e-mail! Isn't it the job of the ISP to fix that? I do all of the programming, database management, user management, etc., but when things get pooched, they fix it, unless you're doing co-location. Would it be more cost effective to just run it on their servers? I'm only saying this from experience. Years ago we used to run the servers here at work and had to manage, manage, manage, update, patch, etc. Now I don't have to do that and can just concentrate on the other side of things. Saved time. Even backups are done by them (and cron), so we never lose any data.

Anyhow, glad I can get my fix again, and I know how you feel. I've been there. Even when ours went down I imagined talking to staff: "umm, ya, we lost everything, sorry.." Thankfully, in both our cases, that didn't happen!
No worries, we all make mistakes!
At least we only lost a tiny bit of data, compared to the huge BCA forum data loss a year or two ago.

P.S. Appreciate all the hard work gettin' us back online! :)

-Cody
Hmmm, just so you know -- I don't think I'm the only other IT-related person on here that would offer free advice :)
I work in the public sector side of things now, so I don't do consulting anymore -- which translates to some reasonable advice without me trying to make a buck.

Anyways, I currently run a pretty large server setup (about 10 racks full at the moment). As we ONLY run open source solutions, mostly due to cost, the solutions are kinda neat.

The first piece of advice is this: if you are running your own servers, virtualize. This allows you to expand and move things around, and not waste money on extra servers until you really, really need them.

My current new and shiny setup -- pushing about 100 GB daily. 8 VMs total, on 8 cores/16 GB RAM:

1) squid as a reverse proxy / load balancer -- handles 50% of the load at about 5% of one CPU and 256 MB RAM
2) bulk storage over NFS for the load-balanced apaches
3, 4, 5) matching apache web servers
6) memcache server -- 128 MB RAM plus tiny CPU usage
7) MySQL server -- on fast disk, everything tweaked for fast DB access
8) MySQL slave -- allows read-only access, and backups run from here without slowing down the sites

Backups -- we have a MASSIVE backup system, but it runs "rsbackup". We use ZFS on FreeBSD, with de-duplication and filesystem-level snapshots. The backup server calls out to the NFS/DB servers and does an rsync that pulls only the changed data, then takes a snapshot, then replicates the backups to a second FreeBSD box in a separate data center. Files are stored as a copy of the filesystem and are compressed by ZFS. Backups are small in size (each one stores only the diff of the files), snapshots happen in a few seconds, and we can roll any server back to any snapshot time. In a few cases the backups happen about every 10 minutes. Of course, the backup server holds about 40 TB of storage, but it backs up about 90 servers as a full daily backup retained for about a year. (These cost us about $4000 each, but are a 5U case.) A sketch of the cycle is below.

Whew -- that is what I get for being a Linux geek.

BTW -- your slow transfers were due to SSH slowdowns -- it doesn't perform well over long high-speed links -- look into the HPN patches (10x the speed).
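(A minimal sketch of that pull-style rsync + ZFS snapshot cycle -- the host names, dataset layout, and the send/receive step are assumptions for illustration, not the actual rsbackup setup:)

[code]
#!/usr/bin/env python3
# Backup cycle sketch: rsync pulls only changed files,
# then a ZFS snapshot freezes that state almost instantly.
import datetime
import subprocess

SERVERS = ["nfs1", "db1"]   # boxes the backup server pulls from (placeholders)
POOL = "backup"             # ZFS pool on the FreeBSD backup server (placeholder)

for host in SERVERS:
    dataset = f"{POOL}/{host}"
    # Pull only the changed data into the dataset's live copy.
    subprocess.run(
        ["rsync", "-a", "--delete",
         f"root@{host}:/data/", f"/{dataset}/"],
        check=True,
    )
    # The snapshot takes a few seconds and stores only the diff on disk.
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M")
    subprocess.run(["zfs", "snapshot", f"{dataset}@{stamp}"], check=True)
    # Replication to the second data center would be an incremental
    # "zfs send -i <previous-snapshot> | ssh offsite zfs receive ..." here.
[/code]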
Hello,
Thanks. I've looked into a number of options, and the latest on the table is to set up another box mirroring everything I have on the current box. In addition to the daily database dump, which also gets transferred onto another box for safety, I'm looking into doing a local database cluster. If possible, I'll also set up a link to my home so I'll either get asynchronous replication or, if not, a scheduled dump transfer. The cluster is the bit where I need to do some serious testing, as I only know of one person using vBulletin who is testing this setup.

Anyway, I just bought a new PC to replace my laptop, so I've got some serious gear here to improve my work efficiency. Sweet stuff: Shuttle box, Core i7 Sandy Bridge, 16GB, 10,000rpm WD VelociRaptor, Radeon 6870. I'm going to set up at least 2 screens so I can see things better. Then I'll set up my two other boxes here to test the database cluster.

Titus
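(A rough sketch of the scheduled-dump-transfer fallback, meant to run from cron -- paths, host names, and credentials handling are placeholders, not Titus's actual setup:)

[code]
#!/usr/bin/env python3
# Daily database dump plus offsite copy.
import datetime
import subprocess

stamp = datetime.date.today().isoformat()
dump = f"/backups/forum-{stamp}.sql.gz"

# Consistent dump of the live database (InnoDB), compressed on the fly.
# Assumes credentials come from ~/.my.cnf rather than the command line.
subprocess.run(
    f"mysqldump --single-transaction --all-databases | gzip > {dump}",
    shell=True, check=True,
)

# Ship it to the second box (or home) for safety.
subprocess.run(["scp", dump, "backupbox:/backups/"], check=True)
[/code]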
Clusters and backups
Most importantly, I can make config changes and restart the apaches one at a time without an outage. And if the new settings break things (I check before I restart the other apache servers), I get a chance to fix it by just turning that one off.
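(A minimal sketch of that one-at-a-time restart -- the host names and ssh access are assumptions:)

[code]
#!/usr/bin/env python3
# Rolling restart of the load-balanced apaches: validate the new
# config first, then gracefully restart one box at a time.
import subprocess

WEB_SERVERS = ["web1", "web2", "web3"]   # the matching apache VMs (placeholders)

for host in WEB_SERVERS:
    # Refuse to continue if the new config does not even parse.
    subprocess.run(["ssh", host, "apachectl", "configtest"], check=True)
    # Graceful restart: finish in-flight requests, then pick up the config.
    subprocess.run(["ssh", host, "apachectl", "graceful"], check=True)
    input(f"{host} restarted -- check the site, then press Enter for the next")
[/code]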
Master-slave replication is a nice backup solution. The slave can run on much less hardware, as it does not serve the live read queries. Slaves are asynchronous and can be shut down; they just catch up when restarted. Since no queries go to the slave, it can be used for other things -- in my case it is where the backups run from, and it is a live backup itself in my "backup" data center. It takes about 30 minutes to set up a slave from an existing DB, and if you run one at home: turn it on, let it catch up (the data transfers quickly, and it runs the relay log after), then turn it back off. A great solution for a home backup, as well as having backups run without slowing down the database of the live site.
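(And a sketch of that 30-minute slave setup for MySQL of that era -- the host, replication user, and binlog coordinates are placeholders you would copy out of the dump:)

[code]
#!/usr/bin/env python3
# Bringing up a slave from an existing master, per the steps above.
import subprocess

# 1) Dump the master with the binlog position recorded in the dump.
#    --master-data=2 writes the CHANGE MASTER TO line as a comment;
#    --single-transaction gives a consistent InnoDB snapshot without locking.
subprocess.run(
    "mysqldump -h db-master --single-transaction --master-data=2 "
    "--all-databases > /tmp/master.sql",
    shell=True, check=True,
)

# 2) Load the dump on the slave, point it at the master, start replication.
subprocess.run("mysql < /tmp/master.sql", shell=True, check=True)
subprocess.run(
    ["mysql", "-e",
     "CHANGE MASTER TO MASTER_HOST='db-master', MASTER_USER='repl', "
     "MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001', "
     "MASTER_LOG_POS=4; START SLAVE;"],   # file/pos come from the dump comment
    check=True,
)
[/code]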