Defragment the disks of an instance

This topic describes how to run the compact command to defragment the disks of an ApsaraDB for MongoDB instance to increase disk utilization. Disk fragments are generated by a large number of data insert, update, or delete operations. The command is used to defragment the disks of primary and secondary nodes in the instance.

We recommend that you use the storage analysis feature in the ApsaraDB for MongoDB console to defragment the disks of the instance. This simplifies defragmentation and minimizes the impact on your business. The feature defragments only the disks of hidden nodes in the instance. To defragment the disks of primary and secondary nodes in the instance, perform a primary/secondary switchover.

Only the following versions support the storage analysis feature for defragmentation:

MongoDB 8.0: all minor versions.
MongoDB 7.0: all minor versions.
MongoDB 6.0: all minor versions.
MongoDB 5.0: all minor versions.
MongoDB 4.4: V5.0.7 or later.
MongoDB 4.2: V4.0.23 or later.

Prerequisites

The storage engine of the instance is WiredTiger.

Usage notes

Data backup: Before you defragment the disks of your instance, we recommend that you back up the instance data. For more information, see Configure manual backup for an instance.
Impacts of the compact command:
The compact command is used for defragmentation.
- Blocked read/write operations and degraded performance
  - If you run the compact command on an instance that runs a version earlier than MongoDB 4.4, the database to which the specified collection belongs is locked, and read/write operations on the database are blocked. The more fragments there are, the longer the compact command takes to run. In this case, hidden nodes in the instance may have higher replication latency. We recommend that you perform this operation during off-peak hours and increase the oplog size based on the data volume written to your instance, or upgrade your instance to MongoDB 4.4 or later. For more information about how to upgrade the version of an instance, see Upgrade the major version of an instance.
  - If you run the compact command on an instance that runs MongoDB 4.4 or later, read/write operations performed on the instance are not blocked. However, instance performance may be degraded. We recommend that you perform this operation during off-peak hours.
- Node rebuilding
  - In an instance that runs MongoDB 3.4.x, MongoDB 4.0.x, MongoDB 4.0.22 or earlier, or MongoDB 5.0.6 or earlier, a node on which the compact command is running enters the RECOVERING state. If the node remains in this state for a long period of time, the instance detection component identifies the node as unhealthy, which triggers a rebuild of the node. For more information about MongoDB versions, see MongoDB minor versions.
  - In an instance that runs a version later than the preceding versions, a node on which the compact command is running remains in the SECONDARY state. This does not trigger rebuilding operations.
Invalid execution of the compact command:
The compact command cannot be run in the following scenarios. For more information, see Open source code.
- The size of a physical collection is less than 1 MB.
- Among the first 80% of the file storage, the free storage is less than 20%. Among the first 90% of the file storage, the free storage is less than 10%.

Defragmentation duration: The time required to defragment the disks of an instance by running the compact command depends on factors such as the amount of data in collections and the system load.
Others:
- If you run the compact command, the released storage may be smaller than the free storage. In this case, start the next execution of the compact command only after the previous execution is complete. This avoids frequent, repeated executions.
- The compact command can be run when an instance is locked because its disk space is full.

Background information

Why are disk fragments generated?

Cause: After data is deleted from an instance, the storage used by the deleted data is marked as free storage. Newly written data may be stored directly in the free storage, or stored at the end of files after the files are expanded. As a result, some free storage is not used. Unused free storage constitutes disk fragments.
Impacts: More disk fragments cause lower disk utilization. For example, if the disk size is 100 GB, among which fragments occupy 20 GB and business data occupies 60 GB, the disk utilization is 80%. However, the effective disk utilization is 60%.
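
The arithmetic in this example can be sketched in JavaScript as follows. The function name `diskUtilization` and its parameters are hypothetical and chosen for illustration only. The figures are the ones from the example, in GB.

```javascript
// Hypothetical helper: all sizes are in GB and come from the example above.
function diskUtilization(diskSize, businessData, fragments) {
  return {
    // Percentage of the disk reported as used, fragments included.
    reportedPct: ((businessData + fragments) * 100) / diskSize,
    // Percentage of the disk that actually holds business data.
    effectivePct: (businessData * 100) / diskSize,
  };
}

const utilization = diskUtilization(100, 60, 20);
console.log(`reported: ${utilization.reportedPct}%, effective: ${utilization.effectivePct}%`);
```

The gap between the two percentages is the share of the disk occupied by fragments.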

When do I need to defragment the disks of my instance?

Delete a large amount of data at a time
After a large amount of data is deleted, the disk space occupied by these documents is not actively returned to the operating system, but is preferentially used to store newly written data. This way, a large amount of disk space may not be effectively used.
Important
Manual data deletion and the time to live (TTL) mechanism do not automatically trigger defragmentation. You must manually defragment the disks of your instance.
Execute a large number of data writes for a long period of time
If your instance executes a large number of data writes for a long period of time, such as frequent data inserts, updates, and deletes, fragmented disk space keeps increasing over time, resulting in a large number of disk fragments.
Confirm that the disk space is insufficient and fragments occupy at least 20% of disk space
If the disk space of your instance becomes insufficient, for example when disk utilization reaches 85% to 90% or higher, defragmentation releases the storage occupied by fragments, which reduces disk utilization and relieves disk space pressure.
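
As a rough illustration of the last scenario, the check can be written as a small JavaScript function. This is a sketch under assumptions: the name `shouldDefragment` is hypothetical, and the thresholds (85% utilization, 20% fragments) are taken from the guidance above and may need tuning for your workload.

```javascript
// Thresholds taken from the guidance above; treat them as starting points.
const UTILIZATION_THRESHOLD_PCT = 85;
const FRAGMENT_THRESHOLD_PCT = 20;

// Hypothetical helper: all arguments use the same unit (for example, bytes).
function shouldDefragment(diskSize, usedSpace, fragmentSpace) {
  const utilizationPct = (usedSpace * 100) / diskSize;
  const fragmentPct = (fragmentSpace * 100) / diskSize;
  return (
    utilizationPct >= UTILIZATION_THRESHOLD_PCT &&
    fragmentPct >= FRAGMENT_THRESHOLD_PCT
  );
}

console.log(shouldDefragment(100, 90, 25)); // disk nearly full, many fragments
console.log(shouldDefragment(100, 90, 10)); // disk nearly full, few fragments
```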

View disk space

View the storage of a specified collection

You can run the db.runCommand({collStats: "<collection_name>"}) command to view the storage of a specified collection. The output includes the following keywords:

size: the logical storage size of the collection.
storageSize: the physical storage size of the collection.
freeStorageSize: the size of the free storage that can be defragmented. The keyword is returned only for instances that run MongoDB 4.4 or later.

If you run the remove command to delete documents, the size value decreases. However, the storageSize value does not necessarily decrease. In this case, divide the freeStorageSize value by the storageSize value. A high ratio indicates a high fragmentation rate.

Note

For more information about the size, storageSize, and freeStorageSize keywords, see collStats-Output.
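
The ratio of freeStorageSize to storageSize can be computed with a short JavaScript function. This is a minimal sketch: the function name `fragmentationRatio` and the sample values are hypothetical, and the function returns null when freeStorageSize is missing, as it is on versions earlier than MongoDB 4.4.

```javascript
// Hypothetical helper: takes the document returned by the collStats command.
function fragmentationRatio(collStats) {
  const { storageSize, freeStorageSize } = collStats;
  // freeStorageSize is not available before MongoDB 4.4.
  if (typeof freeStorageSize !== "number" || !storageSize) {
    return null;
  }
  return freeStorageSize / storageSize;
}

// Made-up sample output, sizes in bytes.
const sampleStats = { size: 52428800, storageSize: 104857600, freeStorageSize: 41943040 };
console.log(fragmentationRatio(sampleStats));
```

In a shell session, the same function could be called with the real command output, for example fragmentationRatio(db.runCommand({collStats: "<collection_name>"})).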

Estimate disk space to be defragmented

Use the mongo shell to connect to an instance. To reduce business impacts, we recommend that you connect to a secondary node in a replica set instance. Connection methods vary based on the instance architecture. For more information, see the following topics:
Switch to the database where a specific collection is stored.
Syntax:
```
use <database_name>
```
<database_name> indicates the name of the database to which the collection belongs.
Note
You can run the show dbs command to list the databases in the instance.
Example:
Switch to the test_database database.
```
use test_database
```
View disk space to be defragmented for the collection.
Syntax:
```
db.<collection_name>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
```
<collection_name> indicates the collection name.
Note
You can run the show tables command to list the collections in the current database.
Example:
```
db.test_database_collection.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
```
The following result is returned.
```
207806464
```
This result indicates that the estimated disk space to be defragmented is 207,806,464 bytes.
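
Byte counts of this size are easier to read in MiB. The following JavaScript sketch converts the returned value. The function name `bytesToMiB` is hypothetical.

```javascript
// Hypothetical helper: converts a byte count to MiB with two decimal places.
function bytesToMiB(bytes) {
  return (bytes / (1024 * 1024)).toFixed(2);
}

// Value returned in the example above.
const reusableBytes = 207806464;
console.log(`${bytesToMiB(reusableBytes)} MiB can be reclaimed`);
```

For the example value, the estimate is approximately 198.18 MiB.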

Standalone or replica set instance

A standalone instance has only one node. You can connect to the primary node and run the compact command to defragment the disks of the primary node.
A replica set instance has multiple nodes. You must defragment the disks of primary and secondary nodes in the instance.
Important
- To reduce business impacts, we recommend that you defragment the disks of secondary nodes in the instance, perform a primary/secondary switchover to switch the primary node to a secondary node, and then defragment the disks of the new secondary node. For more information about how to perform a primary/secondary switchover, see Configure a primary/secondary switchover for a replica set instance.
- If the replica set instance has read-only nodes, defragment their disks in the same way as you defragment the disks of primary and secondary nodes.

Connect to a standalone or replica set instance by using the mongo shell. Connection methods vary based on the instance architecture. For more information, see the following topics:
- Connect to a standalone instance by using the mongo shell
- Connect to a replica set instance by using the mongo shell
Switch to the database where a specific collection is stored.
Syntax:
```
use <database_name>
```
<database_name> indicates the name of the database to which the collection belongs.
Note
You can run the show dbs command to list the databases in the instance.
Example:
Switch to the replica_database database.
```
use replica_database
```
View disk space occupied by the database before defragmentation.
```
db.stats()
```
Note
This command can be used without any changes.
Defragment the disks of the collection.
Syntax:
```
db.runCommand({compact:"<collection_name>",force:true})
```
Parameters in the preceding command:
- <collection_name>: the collection name.
  Note
  You can run the show tables command to list the collections in the current database.
- force: Optional. Set the value to true.
  This parameter is required if you run the command on the primary node of an instance that runs MongoDB 4.2 or earlier.
Example:
```
db.runCommand({compact:"sharded_collection"})
```
The following result is returned:
```
{ "ok" : 1 }
```
View disk space occupied by the database after defragmentation.
```
db.stats()
```
Note
This command can be used without any changes.
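
To quantify the effect of defragmentation, the two db.stats() outputs can be compared. The following JavaScript sketch assumes that the storageSize field of the db.stats() output is the figure of interest. The function name `reclaimedBytes` and the sample values are hypothetical.

```javascript
// Hypothetical helper: difference in storageSize between two db.stats() results.
function reclaimedBytes(statsBefore, statsAfter) {
  return statsBefore.storageSize - statsAfter.storageSize;
}

// Made-up sample values, in bytes, standing in for real db.stats() output.
const statsBefore = { storageSize: 524288000 };
const statsAfter = { storageSize: 314572800 };
console.log(`${reclaimedBytes(statsBefore, statsAfter)} bytes released`);
```

In a shell session, store the result of db.stats() in a variable before you run the compact command, then pass it together with a fresh db.stats() result.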

Sharded cluster instance

For a sharded cluster instance, you only need to defragment the disks of shards in the instance. The mongos and ConfigServer components in the instance do not store user data. In addition, more insert and update operations and fewer delete operations are performed on these components. Therefore, you do not need to defragment the disks of the mongos and ConfigServer components.

Note

The compact command is not supported on the read-only nodes of the instance. Therefore, the disks of these nodes cannot be defragmented.

Use the mongo shell to connect to a sharded cluster instance. For more information, see Connect to a sharded cluster instance by using the mongo shell.
Switch to the database where a specific collection is stored.
Syntax:
```
use <database_name>
```
<database_name> indicates the name of the database to which the collection belongs.
Note
You can run the show dbs command to list the databases in the instance.
Example:
Switch to the sharded_database database.
```
use sharded_database
```
View disk space occupied by the database before defragmentation.
```
db.stats()
```
Note
This command can be used without any changes.
Defragment the disks of the collection.
You must defragment the disks of primary and secondary nodes in a shard in the instance.
Important
To reduce business impacts, we recommend that you defragment the disks of secondary nodes in the shard, perform a primary/secondary switchover to switch the primary node to a secondary node, and then defragment the disks of the new secondary node. For more information about how to perform a primary/secondary switchover, see Configure a primary/secondary switchover for a sharded cluster instance.
- Defragment the disks of secondary nodes in the shard.
  The command syntax differs between the mongo shell and mongosh. Use the syntax that matches your client.
  Note
  Compared with mongosh V1.x, mongosh V2.x allows you to configure the read preference. For more information, see Read Preference.
  mongo shell
  Syntax:
  db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"},$queryOptions: {$readPreference: {mode: 'secondary'}}})
  Parameters in the preceding command:
  <Shard ID>: the shard ID.
  Note
  You can log on to the ApsaraDB for MongoDB console and view the shard ID in the Shard List section on the Basic Information page.
  <collection_name>: the collection name.
  Note
  You can run the show tables command to list the collections in the current database.
  Example:
  db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection"},$queryOptions: {$readPreference: {mode: 'secondary'}}})
  mongosh V1.x
  Syntax:
  db.getMongo().setReadPref('secondary')
  db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"}})
  Parameters in the preceding command:
  <Shard ID>: the shard ID.
  Note
  You can log on to the ApsaraDB for MongoDB console and view the shard ID in the Shard List section on the Basic Information page.
  <collection_name>: the collection name.
  Note
  You can run the show tables command to list the collections in the current database.
  Example:
  db.getMongo().setReadPref('secondary')
  db.runCommand({runCommandOnShard:"d-2ze91ae9d55d6604","command":{compact:"test"}})
  mongosh V2.x
  Syntax:
  db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"}},{readPreference: "secondary"})
  Parameters in the preceding command:
  <Shard ID>: the shard ID.
  Note
  You can log on to the ApsaraDB for MongoDB console and view the shard ID in the Shard List section on the Basic Information page.
  <collection_name>: the collection name.
  Note
  You can run the show tables command to list the collections in the current database.
  Example:
  db.runCommand({runCommandOnShard:"d-2ze657bce53fb6d4","command":{compact:"test_collection"}}, { readPreference: "secondary" })
- Defragment the disks of the primary node in the shard.
  Syntax:
```
db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>",force:true}})
```
  Parameters in the preceding command:
  - <Shard ID>: the shard ID.
    Note
    You can log on to the ApsaraDB for MongoDB console and view the shard ID in the Shard List section on the Basic Information page.
  - <collection_name>: the collection name.
    Note
    You can run the show tables command to list the collections in the current database.
  - force: Optional. Set the value to true.
    This parameter is required if you run the command on a sharded cluster instance that runs MongoDB 4.2 or earlier.
  Example:
```
db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection",force:true}})
```
View disk space occupied by the database after defragmentation.
```
db.stats()
```
Note
This command can be used without any changes.
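
When an instance has several shards, the primary-node command must be repeated for each shard ID. The following JavaScript sketch generates the command documents. The function name `buildShardCompactCommand` and the shard IDs are hypothetical. The runCommandOnShard, command, compact, and force fields follow the syntax shown above.

```javascript
// Hypothetical helper: builds one runCommandOnShard document per shard.
function buildShardCompactCommand(shardId, collectionName, force = false) {
  const command = { compact: collectionName };
  if (force) {
    // Needed on instances that run MongoDB 4.2 or earlier.
    command.force = true;
  }
  return { runCommandOnShard: shardId, command };
}

// Placeholder shard IDs; use the IDs shown in the console for your instance.
const shardIds = ["shard01", "shard02"];
const shardCommands = shardIds.map((id) =>
  buildShardCompactCommand(id, "sharded_collection", true)
);
shardCommands.forEach((doc) => console.log(JSON.stringify(doc)));
```

Pass each generated document to db.runCommand() one at a time, and wait for each command to return before you start the next one.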

FAQ

What do I do if the "Compaction interrupted on table:xxx due to cache eviction pressure' on server xxx." error is returned?

When you run the compact command on an instance that has small specifications and runs an earlier version, the instance may exit. We recommend that you perform this operation during off-peak hours.
