Tuesday, June 13, 2017

SQL Cluster connection problem after removing a node

After removing a node from a SQL cluster, some machines had connection problems when trying to access the DB.
Investigation revealed a ping timeout exactly every 5 minutes.
5 minutes is the default timeout of ARP entries.
Checking the ARP table during the ping timeouts showed the MAC address of the removed SQL node.
Shutting the node down solved the issue.
IP Conflicts…
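
A rough sketch of how this kind of check can be done from a client machine. The address 10.0.0.10 is only a placeholder for the cluster IP that times out, not a value from the actual incident:

```powershell
# 10.0.0.10 is a placeholder; use the address that shows the timeouts.
# Window 1: leave a continuous ping running and wait for a timeout.
ping -t 10.0.0.10

# Window 2: during a timeout, see which MAC address is cached for that IP.
arp -a | Select-String -SimpleMatch "10.0.0.10"
Get-NetNeighbor -IPAddress 10.0.0.10 | Select-Object IPAddress, LinkLayerAddress, State

# On the suspected (removed) node: list its MAC addresses for comparison.
Get-NetAdapter | Select-Object Name, MacAddress
```

If the MAC address cached on the client matches an adapter of the removed node, that node is still answering for the address.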

Tuesday, June 06, 2017

Connect to a Service Fabric Cluster Secured by Windows Security - from a non-trusted domain

If you have a Service Fabric cluster secured with Windows security, you need to use the following PowerShell command to connect to it:
Connect-ServiceFabricCluster -ConnectionEndpoint "ServerName:19000" -WindowsCredential
This, however, requires you to be logged in as a member of the cluster's domain or of a trusted domain.
If this is not the case, the login will fail (after a long wait, in my case).

If you need to connect from a different domain, you can use this workaround:
(Note: you must run this elevated, as Administrator)
runas /netonly /user:Domain\Username powershell
Enter the password when prompted, and a new PowerShell window will open where you can run the Connect command successfully.

The other way is to add this Windows credential to Credential Manager in Control Panel.

Then you can simply call the Connect command, and it will connect without any problem.
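
The same credential can probably also be stored from the command line with cmdkey instead of the Control Panel UI. This is only a sketch: ServerName and Domain\Username are the placeholders used above, and I am assuming the target name has to match the host in the connection endpoint:

```powershell
# Placeholders: replace ServerName and Domain\Username with your own values.
# Leaving the password off after /pass makes cmdkey prompt for it.
cmdkey /add:ServerName /user:Domain\Username /pass

# Confirm the entry was stored.
cmdkey /list

# Then connect as usual.
Connect-ServiceFabricCluster -ConnectionEndpoint "ServerName:19000" -WindowsCredential
```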

Azure Service Fabric - Windows security using gMSA - Details

On your Domain Controller:

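Check whether you already have a KDS root key:
Get-KdsRootKey
If not, run the next line (the effective time is backdated by 10 hours so the key can be used immediately; otherwise you would have to wait about 10 hours before it takes effect):
Add-KdsRootKey -EffectiveTime ((get-date).addhours(-10))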
Validate it was created:
Get-KdsRootKey
Create the gMSA (in this sample its name is gMSA-SF-1, and it is used by 4 machines: SF1, SF2, SF3, SF4):
New-ADServiceAccount -Name gMSA-SF-1 -DNSHostName gMSA-SF-1.myDomain.local -PrincipalsAllowedToRetrieveManagedPassword SF1$,SF2$,SF3$,SF4$ -ServicePrincipalNames ServiceFabric/gMSA-SF-1.myDomain.local
If you later need to add or remove nodes, set the complete list of machines again (the new list replaces the old one):
Set-ADServiceAccount -Identity gMSA-SF-1 -PrincipalsAllowedToRetrieveManagedPassword SF1$, SF2$, SF3$, SF4$, SF5$
Don't forget to create a Domain group with all users that should get Admin rights on the cluster using the UI - in the example below the name of this group is "SFAdmins"

On each of the cluster machines (before deploying the Service Fabric Cluster):

Add PowerShell support for managing AD:
Add-WindowsFeature RSAT-AD-PowerShell
Install the gMSA:
Install-AdServiceAccount gMSA-SF-1
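
Before deploying, it may be worth checking on each machine that the gMSA is actually usable there. A minimal check, using the sample account name from above:

```powershell
# Returns True when this machine can retrieve and use the gMSA password.
Test-ADServiceAccount gMSA-SF-1
```

If it returns False, a likely cause is that the machine is missing from the PrincipalsAllowedToRetrieveManagedPassword list.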
Configure the security section of the cluster configuration file:

"security": {
            "ServerCredentialType": "Windows",
            "WindowsIdentities": {
                "ClustergMSAIdentity": "mydomain.local\\gMSA-SF-1",
                "ClusterSPN": "ServiceFabric/gMSA-SF-1.mydomain.local",
                "ClientIdentities": [
                    {
                        "Identity": "mydomain.local\\SFAdmins",
                        "IsAdmin": true
                    }
                ]
            }
        },

Azure Service Fabric - Windows security on Standalone - Timeout

While trying to deploy a new secured Service Fabric cluster (using Windows security), I got the following error for all the nodes:
Timed out waiting for Installer Service to complete for machine SF1.mydomain.local. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
When trying the same configuration file, but without the security section or with certificate-based security, everything worked perfectly.

A Microsoft engineer gave me a solution:
Replace the FQDN in the nodes configuration (the "iPAddress" property) with the NetBIOS host name.
This solved the issue.