[SDBM-2310] Extend Mysql replication metric reporting to support mixed replication topologies#22485
[SDBM-2310] Extend Mysql replication metric reporting to support mixed replication topologies#22485eric-weaver merged 6 commits intomasterfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 51485de760
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
This comment has been minimized.
This comment has been minimized.
Codecov Report❌ Patch coverage is Additional details and impacted files🚀 New features to boost your workflow:
|
Review from lu-zhengda is dismissed. Related teams and files:
- database-monitoring-agent
- mysql/datadog_checks/mysql/mysql.py
- mysql/tests/common.py
- mysql/tests/compose/mysql8-hybrid.yaml
- mysql/tests/conftest.py
- mysql/tests/test_mysql.py
- mysql/tests/test_unit.py
- mysql/tests/variables.py
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 594db4ae48
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| def _collect_replication_metrics(self, db, results, above_560): | ||
| # Get replica stats | ||
| results.update(self._get_replica_stats(db)) | ||
| results.update(self._get_replicas_connected_count(db, above_560)) | ||
| return REPLICA_VARS | ||
| replica_stats = self._get_replica_stats(db) | ||
| replicas_connected = self._get_replicas_connected_count(db, above_560) | ||
|
|
||
| # When group replication is not active, always report traditional replication metrics | ||
| # so _check_replication_status can emit WARNING for sources with 0 replicas. | ||
| has_traditional_replication = ( | ||
| replica_stats or replicas_connected.get('Replicas_connected', 0) > 0 or not self._group_replication_active | ||
| ) |
There was a problem hiding this comment.
Avoid reporting classic replication metrics on group-only setups
When group replication is active without any classic replicas, the replicas_connected probe can still be non‑zero because SQL_REPLICA_WORKER_THREADS matches any PROCESSLIST_COMMAND LIKE 'Binlog dump%', which group replication also uses for its internal channels. With the new gate, that replicas_connected > 0 path treats a pure group‑replication node as having traditional replicas and returns REPLICA_VARS, which in turn emits mysql.replication.* metrics and source/replica service checks that were previously suppressed for group-only nodes. This causes false positives on group‑only clusters; consider filtering out group replication channels or basing the decision on replica status rows instead of binlog dump thread counts.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This looks to be a very minimal edge case that might show up only while a replication group member is bootstrapping replication from the primary, which is technically done via traditional replication
…d replication topologies (DataDog#22485) * Add support for collecting traditional replication metrics when group replication is enabled * Add test harness for hybrid replication topology * Add changelog * Fix replica offline and add tests * Run ddev validate ci --sync Signed-off-by: lukepatrick <lukephilips@gmail.com>
What does this PR do?
Currently when replication metric reporting is enabled we report back group replication OR traditional primary/replica replication. It's possible for Mysql topologies to make use of both replication methods. The changes in this pull request extend support so that we collect both if they are both identified as active.
A new test harness for integration tests was added which spins up a 3 node group replication setup with a 4th instance that is configured as a traditional replication replica of node1.
Motivation
The ability to collect metrics for mixed replication topologies was requested in https://datadoghq.atlassian.net/browse/SDBM-2310
Review checklist (to be filled by reviewers)
qa/skip-qalabel if the PR doesn't need to be tested during QA.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged