In this article, I will explain a scenario in which one of my MemSQL leaf nodes failed to rejoin the cluster after the VM was rebooted from the Microsoft Azure cloud side.
The memsql-admin list-nodes command lists all nodes in the cluster along with their status.
Prod-master-node-1:~$ memsql-admin list-nodes
+------------+------+------------------+------+---------------+--------------+---------+----------------+--------------------+
| MemSQL ID  | Role | Host             | Port | Process State | Connectable? | Version | Recovery State | Availability Group |
+------------+------+------------------+------+---------------+--------------+---------+----------------+--------------------+
| F1A958A8C7 | Leaf | Prod-leaf-node-1 | 3306 | Running       | True         | 6.7.12  | Online         | 2                  |
| 9281A98A28 | Leaf | Prod-leaf-node-2 | 3306 | Running       | True         | 6.7.12  | Online         | 1                  |
| 6272C7B82B | Leaf | Prod-leaf-node-3 | 3306 | Stopped       | False        | 6.7.12  | Unknown        | 2                  |
| 0BB4FE131A | Leaf | Prod-leaf-node-4 | 3306 | Running       | True         | 6.7.12  | Online         | 1                  |
+------------+------+------------------+------+---------------+--------------+---------+----------------+--------------------+
Here, the Prod-leaf-node-3 VM was restarted from the Azure cloud side, after which the MemSQL process state changed to Stopped and the recovery state changed to Unknown. Our data stayed online because the partitions on the buddy (secondary) node were in a healthy state.
To bring the node back online, I started it from the master aggregator.
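A check like this can also be scripted instead of reading the table by eye. Below is a minimal Python sketch, assuming the list-nodes output has already been captured as text in the table layout shown above. The function names parse_table and unhealthy_nodes and the SAMPLE text are my own illustration and are not part of MemSQL Toolbox.

```python
def parse_table(text):
    """Parse a '|'-delimited text table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' row is the header
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_nodes(text):
    """Return rows whose process is not Running or whose recovery state is not Online."""
    return [
        row for row in parse_table(text)
        if row["Process State"] != "Running" or row["Recovery State"] != "Online"
    ]


# Sample text copied from the listing above (only the columns' layout matters).
SAMPLE = """
| MemSQL ID  | Role | Host             | Port | Process State | Connectable? | Version | Recovery State | Availability Group |
| F1A958A8C7 | Leaf | Prod-leaf-node-1 | 3306 | Running       | True         | 6.7.12  | Online         | 2                  |
| 9281A98A28 | Leaf | Prod-leaf-node-2 | 3306 | Running       | True         | 6.7.12  | Online         | 1                  |
| 6272C7B82B | Leaf | Prod-leaf-node-3 | 3306 | Stopped       | False        | 6.7.12  | Unknown        | 2                  |
| 0BB4FE131A | Leaf | Prod-leaf-node-4 | 3306 | Running       | True         | 6.7.12  | Online         | 1                  |
"""

for node in unhealthy_nodes(SAMPLE):
    print(node["Host"], node["Process State"], node["Recovery State"])
```

With the sample above, only Prod-leaf-node-3 is reported. The same function would also flag a node that is Running but still Recovering, since both columns are checked.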
vm-memsqlmasteragg-prod-1:~$ memsql-admin start-node --memsql-id 6272C7B82B
Toolbox is about to perform the following actions on host vm-memsqlleaf-prod-6:
· Run 'memsqlctl start-node --memsql-id 6272C7B82B'
Would you like to continue? [y/N]: y
✓ Started node on vm-memsqlleaf-prod-6
Operation completed successfully
After that, the node's recovery state changed to Recovering.
+------------+------+------------------+------+---------------+--------------+---------+----------------+--------------------+
| MemSQL ID  | Role | Host             | Port | Process State | Connectable? | Version | Recovery State | Availability Group |
+------------+------+------------------+------+---------------+--------------+---------+----------------+--------------------+
| F1A958A8C7 | Leaf | Prod-leaf-node-1 | 3306 | Running       | True         | 6.7.12  | Online         | 2                  |
| 9281A98A28 | Leaf | Prod-leaf-node-2 | 3306 | Running       | True         | 6.7.12  | Online         | 1                  |
| 6272C7B82B | Leaf | Prod-leaf-node-3 | 3306 | Running       | False        | 6.7.12  | Recovering     | 2                  |
| 0BB4FE131A | Leaf | Prod-leaf-node-4 | 3306 | Running       | True         | 6.7.12  | Online         | 1                  |
+------------+------+------------------+------+---------------+--------------+---------+----------------+--------------------+
I ran the SHOW LEAVES command from the MemSQL command prompt to check the status. It showed that the leaf was in the attaching state.
memsql> show leaves;
+-------------+------+--------------------+-------------+-----------+-----------+--------------------+------------------------------+--------+
| Host        | Port | Availability_Group | Pair_Host   | Pair_Port | State     | Opened_Connections | Average_Roundtrip_Latency_ms | NodeId |
+-------------+------+--------------------+-------------+-----------+-----------+--------------------+------------------------------+--------+
| leaf-node-1 | 3306 | 2                  | leaf-node-3 | 3306      | online    | 154                | 0.928                        | 8      |
| leaf-node-2 | 3306 | 1                  | leaf-node-6 | 3306      | online    | 262                | 1.482                        | 9      |
| leaf-node-3 | 3306 | 2                  | leaf-node-5 | 3306      | attaching | 55                 | 0.785                        | 10     |
| leaf-node-4 | 3306 | 1                  | leaf-node-8 | 3306      | online    | 118                |
After about three and a half hours (probably because of the large amount of data), the leaf's state changed to online.
memsql> show leaves;
+-------------+------+--------------------+----------------------+-----------+-----------+--------------------+------------------------------+--------+
| Host        | Port | Availability_Group | Pair_Host            | Pair_Port | State     | Opened_Connections | Average_Roundtrip_Latency_ms | NodeId |
+-------------+------+--------------------+----------------------+-----------+-----------+--------------------+------------------------------+--------+
| leaf-node-1 | 3306 | 2                  | vm-memsqlleaf-prod-3 | 3306      | online    | 154                | 0.928                        | 8      |
| leaf-node-2 | 3306 | 1                  | vm-memsqlleaf-prod-6 | 3306      | online    | 262                | 1.482                        | 9      |
| leaf-node-3 | 3306 | 2                  | vm-memsqlleaf-prod-5 | 3306      | online    | 55                 | 0.785                        | 10     |
| leaf-node-4 | 3306 | 1                  | vm-memsqlleaf-prod-8 | 3306      | online    | 118                | 1.22
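Since the recovery took hours, re-running SHOW LEAVES by hand gets tedious. The following is a rough Python sketch of a polling loop, under the assumption that you supply your own get_state callable that returns the State value for the leaf (for example by running SHOW LEAVES through whatever MySQL-compatible client you use). The name wait_until_online, the default timeout, and the polling interval are all hypothetical choices of mine, not anything provided by MemSQL.

```python
import time


def wait_until_online(get_state, timeout_s=4 * 3600, interval_s=60,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns 'online' or timeout_s seconds elapse.

    get_state is a caller-supplied (hypothetical) function returning the
    leaf's current state as a string, e.g. 'attaching' or 'online'.
    Returns the list of states observed; raises TimeoutError on timeout.
    """
    deadline = clock() + timeout_s
    seen = []
    while True:
        state = get_state()
        seen.append(state)
        if state == "online":
            return seen
        if clock() >= deadline:
            raise TimeoutError(
                f"leaf still in state {state!r} after {timeout_s} seconds"
            )
        sleep(interval_s)


# Demo with a fake state source standing in for a real SHOW LEAVES query.
fake_states = iter(["attaching", "attaching", "online"])
history = wait_until_online(lambda: next(fake_states), sleep=lambda _s: None)
print(history)
```

The sleep and clock parameters are injectable only so the loop can be tried out without actually waiting; in real use the defaults would be left alone and only get_state would be provided.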