07/26/2005 Archived Entry: "RTF FTW!"
My morning has been anything but calm.
I had hoped that we'd be able to have a nice, quiet, issue-free launch, but sadly, that was not to be.
At approximately 08:30, I begin hearing muttering from across the way, where our workload coordinator sits--he's the poor sap who interfaces with the help desk and hands out tickets to us. People are having difficulty accessing our file shares.
The problem gets worse. More people call. By 09:00, we have a bona fide service outage in progress. Internal web sites hosted on the SAN are not responding. Most file shares are inaccessible. Everyone from the site VP to the janitor is crapping their drawers. Since we're the prime contractor and provide, among other things, MER support for NASA, words like "possible countdown hold" start getting bandied about. My lead and I are elbows-deep into our SAN, trying to find out WTF is happening, but there are no errors, no alerts, no hardware problems, no nothing--just slowness.
By 09:15 we bring the networking group into the mix, and they immediately start howling about how there are no network issues, it's all our fault and all of our hardware sucks. I call $GIGANTIC_STORAGE_VENDOR, open a severity-1 call, and get bounced around between multiple engineers. It is revealed, through frantic troubleshooting, that the network segment on which the file shares and web server storage volumes live is saturated.
Networking comes on and says that our upstream WAN link (which also carries public Internet traffic for our site) is pegged at 100% and is dropping packets like crazy, and at the same time someone catches sight of a hardware failure light on our local DC. At some point in here the shuttle actually launches, but everybody is too busy on the phones with support to notice.
The cause turned out to be a comedy of errors that breaks down like this:
1) At some point last night, our local DC poops itself
2) This morning, four thousand people simultaneously try to watch the live streaming video of the shuttle launch
3) All those connections saturate the WAN link that carries Internet traffic
4) Users then try to access file shares while watching video
5) Since the local DC has pooped itself, an upstream DC takes over authentication duties
6) Authentication request is sent over the WAN link, which is busy carrying four thousand users' worth of streaming video
7) Panic and chaos ensue as nobody can access any file shares because the authentication requests usually don't get through
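For anyone who wants to see why steps 3 through 7 hang together, here is a rough back-of-the-envelope sketch in Python. Only the four thousand viewers come from the story above. The per-stream bitrate, the link capacity, the packet loss rate, and the number of round trips in an authentication exchange are all made-up illustrative assumptions, not measurements from that morning, and the function names are hypothetical.

```python
def wan_utilization(viewers, stream_kbps, link_mbps):
    """Offered streaming load as a fraction of WAN link capacity.

    A result above 1.0 means the streams alone ask for more than the
    link can carry, so packets get queued and dropped.
    """
    offered_mbps = viewers * stream_kbps / 1000
    return offered_mbps / link_mbps


def auth_success_probability(packet_loss, round_trips):
    """Chance that every packet of one authentication exchange survives.

    Simplification: each packet is lost independently and nothing is
    retried. Each round trip is one request plus one reply.
    """
    packets = 2 * round_trips
    return (1 - packet_loss) ** packets


# 4000 viewers is from the post; every other number is an assumption.
ASSUMED_STREAM_KBPS = 300   # hypothetical bitrate of one video stream
ASSUMED_LINK_MBPS = 155     # hypothetical WAN link capacity
ASSUMED_PACKET_LOSS = 0.5   # hypothetical loss rate on the pegged link
ASSUMED_ROUND_TRIPS = 3     # hypothetical exchanges per authentication

load = wan_utilization(4000, ASSUMED_STREAM_KBPS, ASSUMED_LINK_MBPS)
odds = auth_success_probability(ASSUMED_PACKET_LOSS, ASSUMED_ROUND_TRIPS)
print(f"streaming load is {load:.1f}x link capacity")
print(f"one auth exchange gets through {odds:.1%} of the time")
```

The exact numbers don't matter much. Once the offered load is well past 1.0x, even a short multi-packet exchange to the upstream DC has poor odds of completing, which matches the "no errors, just slowness" symptom on the SAN side.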
The problem vanished like a fart in the wind after the shuttle reached orbit. Performance right now is perfect.
Hooray for NASA and the astronauts on orbit--I wish I were with them.