Windows Cluster Fails with Critical System Event Log Error: Failover Clustering Event ID 1135
For the past two to three months the cluster had not been stable. It failed at random and logged Failover Clustering Event ID 1135 each time.
“Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 9/3/2013 8:01:45 PM
Event ID: 1135
Task Category: None
Cluster node 'EXTRANET1' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.”
Environment: a Windows failover cluster running Windows Server 2008 Enterprise SP2 x64.
Impact: high, because Windows clustering is unstable.
Goal: restore Windows Failover Clustering to its original stable state.
If the problem is found to be due to third-party code, we will provide information to substantiate this.
Troubleshooting & Resolution:
The network drivers were updated to the latest version (confirm the correct version with the hardware vendor for your NIC make and model).
The storage drivers, including PowerPath, were checked and upgraded. All Windows patches were installed successfully.
The Windows Server cluster configuration was checked and nothing wrong was found, but the issue still persisted.
Advanced settings for NIC parameters:
1. Recheck that the Broadcom drivers are up to date.
2. Disable TCP Chimney and all offloading features at both the OS and the NIC level.
Back up the registry key and its values before making any registry changes.
a. Disable RSS (Receive Side Scaling) in the registry by adding a DWORD value at
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnableRSS and setting it to 0.
b. Disable task offload in the registry by adding a DWORD value at
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\DisableTaskOffload and setting it to 1.
c. Disable TCP Chimney in the registry by adding a DWORD value at
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnableTCPChimney and setting it to 0.
d. Disable TCPA (NetDMA) in the registry by adding a DWORD value at
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnableTCPA and setting it to 0.
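The registry changes in steps a to d can also be scripted. What follows is a minimal sketch, not a tested deployment script: it only builds the `reg export` and `reg add` command lines (standard `reg.exe` switches) so they can be reviewed before being run in an elevated command prompt on each node. The backup file name `tcpip-parameters-backup.reg` and the function names are illustrative assumptions, not part of the original procedure.

```python
# Sketch: build reg.exe command lines for the Tcpip\Parameters changes above.
# Nothing is executed here; review the printed commands, then run them
# yourself in an elevated command prompt on each cluster node.

KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Value name -> DWORD data, exactly as listed in steps a to d.
SETTINGS = {
    "EnableRSS": 0,
    "DisableTaskOffload": 1,
    "EnableTCPChimney": 0,
    "EnableTCPA": 0,
}


def backup_command(backup_file="tcpip-parameters-backup.reg"):
    """Return the command that exports the key before it is changed.

    The default file name is a placeholder (assumption).
    """
    return f'reg export "{KEY}" "{backup_file}"'


def set_commands(settings=SETTINGS):
    """Return one 'reg add' command per DWORD value, in listed order."""
    return [
        f'reg add "{KEY}" /v {name} /t REG_DWORD /d {data} /f'
        for name, data in settings.items()
    ]


if __name__ == "__main__":
    print(backup_command())
    for command in set_commands():
        print(command)
```

Printing the commands instead of running them keeps the backup step visible and lets you compare each value against the list above before anything is written to the registry.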
e. At the NIC level, disable the offload features in the advanced properties of each physical NIC.
For example, the following are some of the features commonly seen in a NIC's advanced properties:
– IPv4 Checksum Offload
– IPv6 Checksum Offload
– IPv4 Large Send Offload
– IPv6 Large Send Offload
– Receive Side Scaling
– TCP Connection Offload (IPv4)
– TCP Connection Offload (IPv6)
3. Increase the cluster heartbeat timeouts: run the commands below in an elevated command prompt.
cluster /prop SameSubnetDelay=2000
cluster /prop SameSubnetThreshold=10
cluster /prop CrossSubnetDelay=2000
cluster /prop CrossSubnetThreshold=10
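To see what these four values mean in practice: the delay is the interval between cluster heartbeats in milliseconds, and the threshold is the number of consecutive missed heartbeats tolerated before a node is removed from the cluster membership, so their product is roughly how long a node can stay silent. The sketch below only does that arithmetic. The "default" figures of 1000 ms and 5 are what I understand Windows Server 2008 to ship with, so treat them as an assumption and confirm the current values by running `cluster /prop` on your own cluster.

```python
# Sketch: approximate how long a node may miss heartbeats before removal.

def tolerance_seconds(delay_ms, threshold):
    """Heartbeat interval (ms) times allowed misses, converted to seconds."""
    return delay_ms * threshold / 1000


# (delay in ms, threshold) per heartbeat scope.
ASSUMED_DEFAULTS = {"SameSubnet": (1000, 5), "CrossSubnet": (1000, 5)}  # assumption
TUNED = {"SameSubnet": (2000, 10), "CrossSubnet": (2000, 10)}  # values set above

for scope in TUNED:
    before = tolerance_seconds(*ASSUMED_DEFAULTS[scope])
    after = tolerance_seconds(*TUNED[scope])
    print(f"{scope}: {before:.0f} s -> {after:.0f} s")
```

Under these assumptions the tolerance grows from about 5 seconds to about 20 seconds. That rides out short network interruptions, but it does not remove their cause, so the driver and offload changes above still matter.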
After these steps were implemented, the Windows Server cluster was stable and working correctly.