MaxCompute: Split Size hint

Last Updated: Jan 06, 2025

In MaxCompute, you can use the Split Size hint to adjust the split size. This lets you control the concurrency of subtasks and optimize computing performance. The hint is applied to individual tables. The split size is measured in MB, and the default value is 256 MB.

Precautions

  • If you apply the Split Size hint to a clustered table and bucketing operations are performed on that table to optimize computing performance, the hint does not take effect.

  • You can change the split size to a value that is a factor or multiple of 256 MB, such as 128 MB or 512 MB.

  • If an SQL statement reads the same table multiple times, the smallest split size specified for that table is used for splitting. For example, suppose a statement reads the src table twice:

    • If one split size is set to 1 MB and the other is set to 10 MB, a split size of 1 MB is used.

    • If one split size is set to 1 MB and the other is not configured, a split size of 1 MB is used.
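
The two cases above can be sketched as follows. This is an illustrative sketch only: it assumes the src table has a key column, as in the example at the end of this topic, and it reuses that example's hint placement after the table alias. The self-join is a hypothetical way to read src twice in one statement.

```sql
-- Case 1: src is read twice, with hints of 1 MB and 10 MB.
-- Per the rule above, the smaller value (1 MB) is used for splitting.
SELECT a.key
FROM src a /*+split_size(1)*/
JOIN src b /*+split_size(10)*/
ON a.key = b.key;

-- Case 2: src is read twice, but only one read carries a hint.
-- Per the rule above, the configured value (1 MB) is used for splitting.
SELECT a.key
FROM src a /*+split_size(1)*/
JOIN src b
ON a.key = b.key;
```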

Scenarios

  • If a large number of subtasks in a job are waiting for resources that cannot be allocated, you can increase the split size to reduce the concurrency of subtasks. This reduces the time spent starting and stopping subtasks.

  • If the concurrency of subtasks is low and a running subtask does not return results within the expected time, you can decrease the split size to increase the concurrency of subtasks, provided that sufficient resources are available in the resource pool. This can reduce the overall job run duration.
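
As a rough illustration of the two scenarios, the statements below raise and lower the split size relative to the 256 MB default. The values 512 and 128 are assumptions chosen only because they are a multiple and a factor of 256. The tables src and src2 and the key column are taken from the example at the end of this topic. The right value for a real job depends on the data volume and the available resources.

```sql
-- Scenario 1: many subtasks are waiting for resources.
-- A larger split size (512 MB here) reduces the concurrency of subtasks for the src read.
SELECT a.key
FROM src a /*+split_size(512)*/
JOIN src2 b
ON a.key = b.key;

-- Scenario 2: concurrency is low and the resource pool has spare capacity.
-- A smaller split size (128 MB here) increases the concurrency of subtasks for the src read.
SELECT a.key
FROM src a /*+split_size(128)*/
JOIN src2 b
ON a.key = b.key;
```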

Examples

-- Set the split size to 1 MB for the src table. When data in the src table is read, the job is split into subtasks based on a split size of 1 MB.
SELECT a.key FROM src a /*+split_size(1)*/ JOIN src2 b ON a.key = b.key;