Primary/secondary switchover - ApsaraDB for MongoDB - Alibaba Cloud Documentation Center

This topic describes primary/secondary switchover issues in ApsaraDB for MongoDB.

Why does primary/secondary switchover occur in an instance?

The following causes result in primary/secondary switchover for an instance:

Manual switchover: You or an authorized Alibaba Cloud technical expert manually triggers primary/secondary switchover.
Hidden risks: Alibaba Cloud detects potential risks in the instance that may affect its normal services. ApsaraDB for MongoDB starts O&M tasks to fix these risks and performs primary/secondary switchover within the specified maintenance window.
You can query processed O&M tasks in historical events, and manage pending O&M tasks in scheduled events.
Host offline: An exception occurs on the host where a node of the instance is deployed, which may affect the normal services of the instance. ApsaraDB for MongoDB treats the host as offline and triggers primary/secondary switchover to replace the affected node.
Instance exceptions: When Alibaba Cloud detects that the instance is faulty and cannot work normally, ApsaraDB for MongoDB immediately triggers primary/secondary switchover to recover the instance promptly and shorten the downtime.

If primary/secondary switchover is triggered because a host goes offline or an instance exception occurs, you receive a notification by internal message or email in the following format:

[Alibaba Cloud] Dear ****: Your ApsaraDB for MongoDB instance dds-bp**** (name: ****) has an exception. The high-availability system has triggered switchover to ensure the stable running of your instance. We recommend that you check whether your application is still connected to your instance and configure your application to automatically reconnect to your instance.

What are the impacts of primary/secondary switchover?

Impacts:

During the primary/secondary switchover, a transient disconnection that lasts about 30 seconds occurs.
If your application connects directly to the address of a primary node, the read and write operations of your application are affected by the primary/secondary switchover.

Business deployment recommendations:

We recommend that you make sure that your application can automatically reconnect to your instance after it is disconnected, and that it handles exceptions to protect business continuity.
In a production environment, we recommend that you connect your application to your instance by using an SRV connection string URI or a connection string URI. Such a URI refers to multiple nodes of the instance instead of a single node, so the client can locate the new primary node after a node fails, and the read and write operations of your application are not affected by the primary/secondary switchover. For more information, see Connect to a replica set instance or Connect to a sharded cluster instance.
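
The following is a minimal sketch of how a connection string URI for a replica set can be assembled so that it lists every node rather than only the primary node. The function name build_replica_set_uri, the host names, the port, the account, and the replica set name are hypothetical placeholders; copy the real values from the console. Only the Python standard library is used, and the resulting string is what you would pass to a MongoDB driver.

```python
from urllib.parse import quote_plus, urlencode


def build_replica_set_uri(hosts, user, password, replica_set, database="admin"):
    """Build a mongodb:// URI that lists all nodes of a replica set.

    hosts is a list of "host:port" strings. All values are placeholders
    chosen for illustration, not real instance endpoints.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # Percent-encode the credentials so characters such as "@" or "/"
    # do not break the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    # Listing every node lets the driver discover the current primary node
    # instead of depending on one fixed address.
    host_list = ",".join(hosts)
    options = urlencode({"replicaSet": replica_set})
    return f"mongodb://{credentials}@{host_list}/{database}?{options}"


if __name__ == "__main__":
    uri = build_replica_set_uri(
        ["node-a.example.com:3717", "node-b.example.com:3717"],  # hypothetical hosts
        "root",
        "p@ss/word",
        "mgset-0000",  # hypothetical replica set name
    )
    print(uri)
```

A URI that contains only the address of the current primary node keeps pointing at that node after its role changes to secondary, which is the situation the recommendation above is meant to avoid.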

How do I configure manual primary/secondary switchover?

ApsaraDB for MongoDB allows you to configure manual primary/secondary switchover, so that you can perform real-time disaster recovery drills and verify the exception handling capabilities of your client. If your instance is deployed in multiple zones, you can also configure manual switchover to allow your application to connect to the nearest node.

Configure primary/secondary switchover for a replica set instance.
Configure primary/secondary failover for a sharded cluster instance.
Standalone instances do not support primary/secondary switchover due to architectural limits.

Why do I receive an error when writing data to my replica set instance?

Issue description

When writing data to your replica set instance, you receive one of the following error messages:

"errmsg": "not master", "code": 10107, "codeName": "NotMaster"
"errmsg": "not master", "code": 10107, "codeName": "NotWritablePrimary"
Time out after 30000ms while waiting for a server that matches writableServerSelector

Cause

Primary/secondary switchover occurs in the replica set instance, which causes the node roles to change. If your application connects directly to the address of a primary node, that node becomes a secondary node after the primary/secondary switchover. Secondary nodes do not accept write operations, so the write operations fail.
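
To confirm that a role change is the cause, you can inspect the reply of the MongoDB hello command on the node that your application connects to. The sketch below classifies such a reply; the helper name node_role and the sample replies are hypothetical and written for illustration. With PyMongo, the reply document could be obtained with client.admin.command("hello").

```python
def node_role(hello_reply):
    """Return "primary", "secondary", or "other" for a hello reply document."""
    # isWritablePrimary is the field returned by the hello command;
    # ismaster is the field returned by the legacy isMaster command.
    if hello_reply.get("isWritablePrimary") or hello_reply.get("ismaster"):
        return "primary"
    if hello_reply.get("secondary"):
        return "secondary"
    return "other"


if __name__ == "__main__":
    # Hypothetical replies from the same node before and after a switchover.
    before = {"isWritablePrimary": True, "secondary": False}
    after = {"isWritablePrimary": False, "secondary": True}
    print(node_role(before), "->", node_role(after))
```

If the node reports the secondary role, writes sent to its address are rejected until the application connects to the current primary node.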

Solutions

Manually switch node roles to change the node that your application connects to back to a primary node.
In a production environment, we recommend that you connect your application to your instance by using an SRV connection string URI or a connection string URI. Such a URI refers to multiple nodes of the instance instead of a single node, so the client can locate the new primary node after a node fails, and the read and write operations of your application are not affected by the primary/secondary switchover. For more information, see Connect to a replica set instance.
We recommend that you make sure that your application can automatically reconnect to your instance after it is disconnected, and that it handles exceptions to protect business continuity.
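
One possible way to handle the exceptions raised during a switchover is to retry the failed operation with exponential backoff. The helper below is a sketch, not part of any ApsaraDB for MongoDB or driver API: the name retry_on_failover and its parameters are assumptions, and the default delays (1, 2, 4, 8, and 16 seconds) are chosen only so that the total wait slightly exceeds the transient disconnection of about 30 seconds. The exception types are passed in so that the sketch runs without a driver; with PyMongo, you might pass pymongo.errors.AutoReconnect.

```python
import time


def retry_on_failover(operation, retryable=(ConnectionError,), attempts=6,
                      base_delay=1.0, sleep=time.sleep):
    """Call operation() and retry it when a retryable exception is raised.

    The wait doubles after each failed attempt. The last failure is
    re-raised so that the caller can log it or alert on it.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_write():
        # Simulates a write that fails twice while the switchover is in progress.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("not master")
        return "ok"

    # Sleeping is disabled here so that the demonstration finishes immediately.
    print(retry_on_failover(flaky_write, sleep=lambda seconds: None))
```

Retrying is safest for idempotent operations. A write that reached the server before the connection dropped may be applied twice if it is sent again, so consider whether each operation can be repeated.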