Monday, October 16, 2006

8 hour tech support call

About a week and a half ago, on a Thursday, everything was running well for the most part. Then one of my admins said he was getting errors in Exchange System Manager when trying to click on objects. My first thought was, hmm... how long has that server been up and running? You know, the usual thought when something goes wrong in a Windows environment. So I checked my Outlook to see if emails were flowing, and it was fine. It was already after 9am, so rebooting the server to clean up errors wasn't an option at that point. I made a note to myself to reboot the server in the morning.

A few hours passed and I got a call from our overseas office asking if we were having issues with email. I logged into the Exchange server and it seemed fine: no errors on the screen, Task Manager clear, no memory leaks and nothing frozen. As soon as I got off the phone I started getting calls from users saying they were getting email bounce-backs. Now I knew something was wrong. I ran over to the server to see what was going on again. NOTHING! All looked fine from the typical admin perspective. So, to not waste time, I rebooted the server. Once I did that, NONE of the Exchange services started. I tried to start them manually and no joy. I had not touched anything in Exchange for weeks, so how could something just happen all of a sudden?

I called the overseas office back to tell them that yes, we were having a problem, and to ask how they had noticed it. They said they had a consultant in installing Intellisync software on their end, and that they had done nothing to anything else.

So I called Microsoft. Hours rolled by after I explained the problem. These seemed to be textbook guys who went by the book. There were little errors in AD, and they wanted to stray from the matter at hand and resolve those first. We did have a DC error, but it was a rogue DC acting up and NOT affecting AD itself. These guys spent a few hours on that alone, and I basically had to tell them that we were going to demote this DC, get it offline and continue with the matter at hand. I had only been on the phone for 6 hours at that point.

To make a long story short: the permissions on the Exchange site name, and on anything that inherited those permissions, had been stripped somehow. We had to add them back one by one, and even when we finished doing this the services still would not start. The phone call was already at 7.5 hours, and at this point the MS tech decided to say, "Let's reinstall Exchange." I'm like, WHAT! He said it again, but explained that we would select Reinstall from the drop-down box. Basically we reinstalled Exchange right on top of the current install, wiping out the service pack as well. After the install we went ahead and reinstalled the service pack too. After this we attempted to restart the services and they came up. I rebooted the server to make sure the services started by themselves, and they did.

After 8 hours the problem was finally resolved. Still, no one knows why these permissions were stripped out, but I have my suspicions.

3 comments:

Anonymous said...

Nice. It's nice to know that I'm not the only one that occasionally sees the MS gremlin.

Anonymous said...

At a school district I helped with over the summer, one of the admins sat down and went to log in with the main domain admin account, and it would not let him log in. The account wasn't locked or anything. They weren't able to do anything until he reset the password to the old one again, and then it worked. I've seen a lot of strange errors when it comes to AD. The weirdest is at both sites (big installs, one with 500+ computers, the other with several thousand): all of a sudden a computer will lose its association with the domain and cannot log on until it is re-joined. It doesn't matter if Deepfreeze is on the machine or not. Yay for the Microsoft Feature!

Nocturnalis said...

We get the can't-log-in issue here a few times as well. When my guys are on the floor they call me up and tell me that the workstation can't log in, not even with Admin. I just tell them to take the workstation off the domain and rejoin it. I'm so immune to that problem now that I categorize it with rebooting to solve other problems. LOL!