Tuesday, March 6, 2012

ec2-107-21-251-69.compute-1.amazonaws.com: what is this?

I had a user come to me this week with a very weird issue. He had a jailbroken iPhone, and for the past few weeks it had been freezing on him. My first reply was: that's what you get when you jailbreak an iPhone ;) To be fair, jailbreaking an iPhone does give you a lot of freedom to do things on it.

Anyway, I looked through the phone and did not find anything odd on it. I then enabled an SSH server on it, connected, and checked whether the phone was making any unusual TCP connections.

Running netstat revealed all. Among the "normal" traffic was a connection I had not seen before: ec2-107-21-251-69.compute-1.amazonaws.com
The connection to this destination seemed to be kept alive all the time. Since the version of netstat on the iPhone has no option to show the PID of the owning process, I was at a loss.
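netstat cannot name the process, but its output can at least be filtered so the persistent connection stands out across repeated runs. Below is a minimal Python sketch, assuming BSD-style netstat columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state) and output copied off the phone over SSH. The `SAMPLE` text and the `established_to` helper are hypothetical and only for illustration; the addresses are not captured from this phone.

```python
from __future__ import annotations

# Hypothetical BSD-style netstat output, for illustration only.
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.168.1.20.49301     ec2-107-21-251-69.compute-1.amazonaws.com.443 ESTABLISHED
tcp4       0      0  192.168.1.20.49288     203.0.113.10.443       ESTABLISHED
tcp4       0      0  *.22                   *.*                    LISTEN
"""


def established_to(netstat_output: str, host_substring: str) -> list[tuple[str, str]]:
    """Return (local, foreign) pairs for ESTABLISHED TCP rows whose
    foreign address contains host_substring."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows have exactly 6 fields and a tcp* protocol name;
        # this skips the banner and the column header.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _proto, _recv_q, _send_q, local, foreign, state = fields
        if state == "ESTABLISHED" and host_substring in foreign:
            matches.append((local, foreign))
    return matches


if __name__ == "__main__":
    for local, foreign in established_to(SAMPLE, "amazonaws.com"):
        print(local, "->", foreign)
```

Running it against several snapshots taken a few minutes apart would show whether the same remote host keeps reappearing, which is what made this connection suspicious.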

I googled the address and found some articles that linked it to Viber (a Skype-like calling app). Since SBSettings was already installed on the iPhone, I looked through the currently running processes and found Viber listed. I killed it, and presto! The connection to ec2-107-21-251-69.compute-1.amazonaws.com was no more!

The iPhone was still slow, so in the end the best solution was to install a legit (non-jailbroken) iOS on it :(

The cluster service has determined that this node does not have the latest copy of cluster configuration data

A few days ago I got a big surprise on my Exchange 2010 servers. During a routine daily check in Exchange Management Console, I noticed that one of my DAG servers was reporting its status as Failed. This looked really weird, since the server itself was online and I could ping it.

I RDP'd to the problematic server and checked the event logs. To my dismay, I found the following being logged in the System log:

Event ID 1564 - File share witness resource 'File Share Witness (\\Server1.domain.com\DAG.domain.com)' failed to arbitrate for the file share '\\Server1.domain.com\DAG.domain.com'. Please ensure that file share '\\Server1.domain.com\DAG.domain.com' exists and is accessible by the cluster.

I opened Failover Cluster Manager and found that this server had been marked as down, which explained why Exchange Management Console was reporting its status as Failed.

I checked the permissions on the DAG share above. The NTFS permissions looked alright; however, the share permissions contained an unresolved SID. I took this to be the culprit, and a bit of googling showed that the cluster computer account should be listed in the share permissions with Full Control. Since it was not listed, I added DAG$ (the cluster computer account) to the share permissions with Full Control (my DAG is called DAG... yeah, yeah, very original).

After doing the above, the earlier error was no longer being logged, but the following one appeared instead:
Event ID 1561 - The cluster service has determined that this node does not have the latest copy of cluster configuration data.
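Instead of scrolling through Event Viewer, the two cluster events quoted in this post (1564 and 1561) can be pulled out of the System log with the built-in `wevtutil` tool and an XPath filter. Below is a minimal Python sketch that builds that query; the helper names are hypothetical, and it only calls `wevtutil` when run on Windows (elsewhere it just prints the command).

```python
from __future__ import annotations

import subprocess
import sys

# Event IDs seen in this post: 1564 (file share witness failed to
# arbitrate) and 1561 (node lacks the latest cluster configuration).
CLUSTER_EVENT_IDS = (1564, 1561)


def build_query(event_ids) -> str:
    """XPath filter matching System-log events with any of event_ids."""
    condition = " or ".join(f"EventID={i}" for i in event_ids)
    return f"*[System[({condition})]]"


def wevtutil_command(event_ids, count: int = 10) -> list[str]:
    """Argument list asking wevtutil for the newest `count` matches as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query(event_ids)}",
        f"/c:{count}",  # maximum number of events to return
        "/rd:true",     # reverse direction: newest events first
        "/f:text",      # human-readable output instead of XML
    ]


if __name__ == "__main__":
    cmd = wevtutil_command(CLUSTER_EVENT_IDS)
    if sys.platform == "win32":
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
    else:
        print(" ".join(cmd))
```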

To rub salt in the wound, the server was still marked as down in Failover Cluster Manager. I googled the error and found some articles on it. One of Microsoft's support articles said to start all the other nodes; once they are up, the affected node reads the configuration from them and starts. However, since I already had a node running (the other DAG server), this didn't quite apply.

I tried restarting the Cluster service on the affected server, but even this did not resolve the issue. Restarting the server did not help either.

Finally, with a stroke of genius (and luck), I decided to restart the cluster. In Failover Cluster Manager, I right-clicked the cluster name (which, quite originally, was dag.domain.com) and under More Actions selected Shut Down Cluster. A prompt asked whether I really wanted to shut down the cluster, and I chose Yes.

After a few minutes (well, actually 2 minutes), with fingers crossed, I started the cluster again (in Failover Cluster Manager, right-click the DAG name and under More Actions select Start Cluster). Voilà, both DAG servers came back online!
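I did this through the GUI, but the same shut-down-then-start sequence could be scripted with the FailoverClusters PowerShell module, whose `Stop-Cluster` cmdlet stops the Cluster service on all nodes and whose `Start-Cluster` cmdlet starts it again. Below is a minimal Python sketch that wraps those two cmdlets; the helper names are hypothetical, the use of `-Force` to skip the confirmation prompt is an assumption worth checking with `Get-Help Stop-Cluster` on your server, and by default it is a dry run that only returns the commands. Like the GUI option, a real run takes the whole cluster offline.

```python
from __future__ import annotations

import subprocess
import time

CLUSTER_NAME = "dag.domain.com"  # cluster name used in this post


def powershell_command(script: str) -> list[str]:
    """Wrap a PowerShell snippet so subprocess can run it."""
    return ["powershell.exe", "-NoProfile", "-Command", script]


def stop_cluster_script(cluster: str) -> str:
    # Import-Module is needed on PowerShell 2.0 (Windows Server 2008 R2).
    # -Force is assumed to suppress the "are you sure?" prompt.
    return f"Import-Module FailoverClusters; Stop-Cluster -Cluster '{cluster}' -Force"


def start_cluster_script(cluster: str) -> str:
    return f"Import-Module FailoverClusters; Start-Cluster -Name '{cluster}'"


def restart_cluster(cluster: str, pause_seconds: int = 120,
                    dry_run: bool = True) -> list[list[str]]:
    """Return the stop/start commands; execute them only if dry_run is False."""
    commands = [
        powershell_command(stop_cluster_script(cluster)),
        powershell_command(start_cluster_script(cluster)),
    ]
    if not dry_run:
        subprocess.run(commands[0], check=True)
        time.sleep(pause_seconds)  # roughly the 2 minutes waited in this post
        subprocess.run(commands[1], check=True)
    return commands


if __name__ == "__main__":
    for cmd in restart_cluster(CLUSTER_NAME):
        print(" ".join(cmd))
```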

I quickly checked Exchange Management Console and saw that both servers were now reported as online. The problematic server was now being updated from the other server (you might see a large CPU spike on the problematic server while the updates are copied to it).

Take care until next time. And remember: with Windows, you restart :)