Friday, May 11, 2007

Exchange Cluster issue

About two weeks ago I was pretty much alone running the NY side of things. My boss, the Director, was out at our other offices, first London and then Shanghai, and my Admin was on vacation. So I was left to handle the back end and make decisions on my own, AGAIN!

It was a dark and stormy Tuesday night... (it was just dark). The phone rang right as my wife was telling me that my baby girl had a fever of 100+ degrees. Yikes! It's my boss on the phone, and he says he can't connect to the Exchange server from Shanghai. We are on an MPLS network, so everything should work. So I dig up my laptop with a million things running through my head, the biggest one being why my daughter has such a high fever. I boot up and VPN into the office to check things out. At first glance everything looks fine. I am in my Outlook and I can get into OWA as well. So what is he talking about? I VNC ALLLLLLLLLLLLLLLL the way to the Shanghai server to see if I can do anything from there, and I can. So what is the deal here? He tells me he keeps getting an error when trying to open Outlook and OWA. So I try to log in from there and I can get into OWA fine. I try his credentials from the same box and I get the error. I use his credentials on my box in the NY office (remote desktop + VPN is great) and I get the error too. So what the hell, I say.

I start snooping around Exchange System Manager to see if I can spot anything abnormal. Nothing. Nothing, b/c the damn thing gives no errors and the console does not refresh on its own, so I didn't know there was a problem until later. I start checking the event log. Mind you, it is going on 11pm and I am getting sleepy and worrying about my daughter and this damn problem at the same time. The event log was saying that the mailbox store was having problems writing to the disk and was stopping, I think it said. But that didn't register b/c I wasn't focused on this problem. My daughter was burning up and I was scared to shit. I'm still on the phone with my boss when he tells me he has to go to a meeting over there and will call me back.

I'm off the phone, worried about two things: my work and my daughter. Well, my daughter had gone to bed and her fever had come down, and my work was really starting to get to me. It was about 12am by now and I was only just realizing what was happening. At first I thought my transaction logs had filled up, so I checked the space on that drive and it was fine. Then I reread the event error and was like, hmm. Then it dawned on me: the store can't write b/c the drive is FULL. I check, and sure enough the drive is 100% full. Then I really lost it, b/c everything that was going on had me not thinking logically. At that point I thought new information was overwriting existing information (why? like I said, I was worried about my daughter all night and not thinking straight). So I look back in Exchange System Manager and refresh the mailbox stores, and mailbox store #4 was down. I nearly had a heart attack. In that second I thought my boss's mailbox and others were completely gone. I got up from the dining room table, walked into the living room and collapsed on the floor. It felt like all the blood drained from my head and extremities and pooled up in my stomach. Did I have an anxiety attack or a panic attack or both? After a few minutes on the floor I got up and regained my composure. I was able to analyze what had happened and came up with a game plan to resolve the issue. I needed to move one of the mailbox stores to another partition to free up space on this one so that all the stores could come back online. I went to bed, got up at 3am, drove into work and moved the mailbox store. It took about 15 minutes to move the 16GB store. But that did the trick.
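In hindsight, a small scheduled check on free disk space would have flagged the full drive long before the store went down. Nothing like this was running at the time; what follows is only a minimal sketch in Python, and the function names, the 10% threshold and the example path are all assumptions for illustration, not anything taken from Exchange itself.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Return the fraction of the volume that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def volume_is_low(path, min_free_fraction=0.10):
    """Return True when the volume holding `path` has less free space
    than `min_free_fraction` (a hypothetical threshold, 10% by default)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return free_fraction(usage.total, usage.free) < min_free_fraction


if __name__ == "__main__":
    # "." is a placeholder; point this at the drive that holds the stores.
    store_volume = "."
    if volume_is_low(store_volume):
        print(f"WARNING: {store_volume} is below 10% free space")
    else:
        print(f"{store_volume} has enough free space")
```

Run from a scheduled task, a check along these lines could send a warning while there is still time to move a store or clean up, instead of at midnight.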

What I need to do next (still) is shrink the database with the eseutil tool to reclaim the white space. In all I should get back about 25GB. What caused all of this so suddenly was our move to the cluster. The mailbox limits on the stores were not put back afterwards, which let the users fill up their mailboxes in about a month. We are back on track now and all is good again. For now!
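One thing to check before that eseutil run: an offline defrag builds a temporary copy of the database, and the usual rule of thumb is to have free space of roughly 110% of the database size available. The arithmetic is simple enough to sketch in Python. The function names are made up for illustration, and the 1.10 headroom factor is that rule of thumb treated as an assumption, not a guarantee.

```python
def space_needed_for_defrag(db_size_gb, headroom=1.10):
    """Estimate the free space (GB) to have on hand before an offline defrag.
    `headroom` is the assumed rule-of-thumb multiplier (110% of the database)."""
    return db_size_gb * headroom


def can_defrag(db_size_gb, free_gb, headroom=1.10):
    """Return True when `free_gb` covers the estimated temporary space."""
    return free_gb >= space_needed_for_defrag(db_size_gb, headroom)


if __name__ == "__main__":
    # Example numbers only: the 16GB store from this post and a guessed 20GB free.
    db_size_gb = 16
    free_gb = 20
    needed = space_needed_for_defrag(db_size_gb)
    print(f"Need about {needed:.1f} GB free, have {free_gb} GB:",
          "OK" if can_defrag(db_size_gb, free_gb) else "not enough")
```

If the space is not there on the same drive, the temporary database would have to go on another volume, which is worth planning before taking a store offline.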
