MaxCompute: TPC-DS performance test

Last Updated: Apr 28, 2025

MaxCompute is built for high-performance queries over terabytes, petabytes, or even exabytes of data. This topic describes how to run the TPC-DS big data benchmark with the public datasets and the test tool that MaxCompute provides, so that you can verify this performance yourself. Two test methods are available: Method 1 runs the TPC-DS test on the new MaxQA query acceleration engine, and Method 2 runs it on the MCQA engine.

Preparations

  1. Configure an environment.

    • Before you perform a TPC-DS test, activate MaxCompute and create a project.

    • Method 1: Activate MaxQA (MaxCompute Query Acceleration 2.0). MaxQA is currently in public preview. To participate, use the public preview link. The public preview plan lists when the preview starts in each region.

    • Method 2: Activate MCQA (MaxCompute Query Acceleration 1.0). Use this method in regions where the MaxQA public preview is not yet available.

  2. Prepare a test tool.

    MaxCompute provides a TPC-DS automated performance test tool to help you quickly complete a TPC-DS test and automatically generate test results.

    Important

    The test tool runs only on Linux with Java Development Kit (JDK) 1.7 or later installed.

    Download the mc_tpcds_benchmark tool package, and then run the following command on the Linux server to decompress it.

    unzip mc_tpcds_benchmark.zip

    The following code shows the directory structure of the decompressed file.

    .
    |_t1c7039e3-2a1d-451b-bfda-d14c49016243-tpc-ds-tool.zip
    |_config
    |_init_tools.sh
    |_load_table.sh
    |_logs
    |_odps_clt
    |_patches
    |_pt.sh
    |_queries_1
    |_queries_1.quality
    |_queries_10
    |_queries_100
    |_queries_1000
    |_queries_10000
    |_queries_100000
    |_querygen.sh
    |_results
    |_run_stream.sh
    |_run_stream.sh.offline
    |_sqls
    |_start_session_only.sh
    |_start_session.sql
    |_start_session.sql_tmp
    |_tools_file
    |_tt.sh
    |_v2.10.1rc3
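
Because the tool requires JDK 1.7 or later, a quick preflight check before the first run can catch an unsuitable environment early. The following is a minimal sketch and is not part of the tool package: the function names java_major_version and check_jdk are hypothetical, and the parsing assumes the usual version "x.y.z" line that java -version prints.

```shell
#!/bin/sh
# Hypothetical preflight helper; not part of mc_tpcds_benchmark.

# Map a Java version string to its major version number.
# Legacy strings such as 1.7.0_80 use the "1.x" scheme; newer ones
# (11.0.2, 17) start with the major number itself.
java_major_version() {
    v=$1
    case "$v" in
        1.*) v=${v#1.} ;;   # drop the leading "1." of the legacy scheme
    esac
    # Strip everything from the first non-digit character onward.
    echo "${v%%[!0-9]*}"
}

# Succeed only if a java binary is on PATH and reports JDK 1.7 (7) or later.
check_jdk() {
    command -v java >/dev/null 2>&1 || { echo "java not found on PATH" >&2; return 1; }
    raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major_version "$raw")
    [ -n "$major" ] && [ "$major" -ge 7 ]
}
```

For example, check_jdk && echo "JDK OK" could be run once before the test is started.
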
  3. Obtain a test dataset.

    MaxCompute provides public datasets. You do not need to prepare test data yourself. All data is stored in the MaxCompute public project BIGDATA_PUBLIC_DATASET. For more information, see Overview of public datasets.

    TPC-DS test datasets are divided into 10 GB, 100 GB, 1 TB, and 10 TB datasets based on the data size. The following table describes the datasets.

    Category: TPC-DS

    Introduction: TPC-DS is a decision support benchmark that models several generally applicable aspects of decision support systems, including queries and data maintenance, enabling new technologies such as big data systems to perform benchmark tests.

    Dataset name                            | Schema name
    ----------------------------------------|------------
    TPC-DS 10-GB performance test dataset   | tpcds_10g
    TPC-DS 100-GB performance test dataset  | tpcds_100g
    TPC-DS 1-TB performance test dataset    | tpcds_1t
    TPC-DS 10-TB performance test dataset   | tpcds_10t
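
The four dataset sizes correspond to the scale factor in GB that is later set as SF in the tool configuration. As a small sketch of that correspondence, the hypothetical helper below (not part of the tool) maps a scale factor to the schema name listed above and rejects sizes for which no public dataset exists.

```shell
#!/bin/sh
# Hypothetical helper: map a scale factor (in GB) to the schema name of the
# matching TPC-DS dataset in the BIGDATA_PUBLIC_DATASET project.
schema_for_sf() {
    case "$1" in
        10)    echo "tpcds_10g" ;;
        100)   echo "tpcds_100g" ;;
        1000)  echo "tpcds_1t" ;;
        10000) echo "tpcds_10t" ;;
        *)     echo "no public TPC-DS dataset for SF=$1" >&2; return 1 ;;
    esac
}
```

For example, schema_for_sf 1000 prints tpcds_1t.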

Test process

Modify the test tool configuration file

Go to the mc_tpcds_benchmark directory of the decompressed test tool and modify the config file. The test tool supports both MaxQA and MCQA modes, so in addition to the basic configuration, a few items must be set differently depending on the mode.

Basic configuration

Configuration item | Description | Value
-------------------|-------------|------
ODPS_CLT_CMD | The absolute path of the executable file of the MaxCompute client. The client provided in this toolkit is in the odps_clt directory of the working directory. Modify the corresponding configuration. For more information, see Connect to MaxCompute by using the local client (odpscmd). | Example: /xxxxx/mc_tpcds_benchmark/odps_clt/bin/odpscmd
PROJECT | The MaxCompute project that is used for the test. | Example: tpcds_test
SF | The data size of the TPC-DS test. Unit: GB. 1 indicates 1 GB. 1000 indicates 1 TB. You can change the value based on your test requirements. | Default value: 1000
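
Before starting a long benchmark run, it can help to sanity-check the three basic settings. The following is a minimal sketch under stated assumptions: check_basic_config is a hypothetical helper that is not part of the tool, and it takes the values as arguments rather than reading the config file, whose exact syntax is not described here.

```shell
#!/bin/sh
# Hypothetical sanity check for the basic settings; not part of the tool.
# Usage: check_basic_config <ODPS_CLT_CMD> <PROJECT> <SF>
check_basic_config() {
    clt=$1
    project=$2
    sf=$3

    # ODPS_CLT_CMD must be an absolute path to an executable file.
    case "$clt" in
        /*) ;;
        *) echo "ODPS_CLT_CMD must be an absolute path: $clt" >&2; return 1 ;;
    esac
    if [ ! -f "$clt" ] || [ ! -x "$clt" ]; then
        echo "ODPS_CLT_CMD is not an executable file: $clt" >&2
        return 1
    fi

    # PROJECT must not be empty.
    [ -n "$project" ] || { echo "PROJECT is empty" >&2; return 1; }

    # SF is a data size in GB, so it must be a positive integer.
    case "$sf" in
        ''|*[!0-9]*|0*) echo "SF must be a positive integer (GB): $sf" >&2; return 1 ;;
    esac
    return 0
}
```

For example, check_basic_config /xxxxx/mc_tpcds_benchmark/odps_clt/bin/odpscmd tpcds_test 1000 succeeds only if the client file exists at that path and is executable.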

Differential configuration - MCQA vs MaxQA mode

Configuration item | Description | MCQA value | MaxQA value
-------------------|-------------|------------|------------
ODPS_CLT_CMD | The absolute path of the executable file of the MaxCompute client. The client provided in this toolkit is in the odps_clt directory of the working directory. Modify the corresponding configuration. For more information, see Connect to MaxCompute by using the local client (odpscmd). | Example: /xxxxx/mc_tpcds_benchmark/odps_clt/bin/odpscmd | Example: /xxxxx/mc_tpcds_benchmark/odps_clt/bin/odpscmd
PROJECT | The MaxCompute project that is used for the test. | Example: tpcds_test | Example: tpcds_test
SF | The data size of the TPC-DS test. Unit: GB. 1 indicates 1 GB. 1000 indicates 1 TB. You can change the value based on your test requirements. | Default value: 1000. Currently, MaxCompute public datasets provide data in four sizes: 10 GB, 100 GB, 1 TB, and 10 TB. Therefore, SF can be set to 10, 100, 1000, or 10000. | Default value: 1000. Currently, MaxCompute public datasets provide data in four sizes: 10 GB, 100 GB, 1 TB, and 10 TB. Therefore, SF can be set to 10, 100, 1000, or 10000.
MODE | The mode for this benchmark run. | MCQA | MaxQA
MAXQA_QUOTA_NAME | The name of the MaxQA interactive quota group used for the test. You can find it on the quota management page of the MaxCompute console. Enter the alias of the quota, that is, the name that you assigned to it. | N/A | Example: maxqa_test_quota
SQL_FLAGS | The built-in flag parameters of MaxCompute. You do not need to modify these parameters. | See the MCQA flags below. | See the MaxQA flags below.

MCQA flags:

  • set odps.sql.session.result.cache.enable=false: Disables the result cache in MCQA mode to ensure that each query is executed independently.

  • set odps.sql.allow.cartesian=true: Allows SQL statements to compute Cartesian products.

  • set odps.sql.session.query.timeout=600: Sets the timeout period for Fuxi jobs in MCQA mode.

MaxQA flags:

  • set odps.sql.mcqa2.result.cache.enable=false: Disables the result cache in MaxQA mode to ensure that each query is executed independently.

  • set odps.sql.allow.cartesian=true: Allows SQL statements to compute Cartesian products.
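
The mode-specific rules above can be summarized as a small check: SF must match one of the four public dataset sizes, MODE must be MCQA or MaxQA, and MaxQA mode additionally needs a quota name. The sketch below is a hypothetical helper, not part of the tool, and it assumes that the MODE values are spelled exactly as shown in the table.

```shell
#!/bin/sh
# Hypothetical check of the mode-specific settings; not part of the tool.
# Usage: check_mode_config <MODE> <SF> [MAXQA_QUOTA_NAME]
check_mode_config() {
    mode=$1
    sf=$2
    quota=${3:-}

    # Public datasets exist only for these four sizes (in GB).
    case "$sf" in
        10|100|1000|10000) ;;
        *) echo "SF=$sf has no matching public dataset" >&2; return 1 ;;
    esac

    case "$mode" in
        MCQA)
            ;;  # no quota name required
        MaxQA)
            # MaxQA runs on an interactive quota group, identified by its alias.
            [ -n "$quota" ] || { echo "MaxQA mode needs MAXQA_QUOTA_NAME" >&2; return 1; }
            ;;
        *)
            echo "MODE must be MCQA or MaxQA: $mode" >&2
            return 1
            ;;
    esac
    return 0
}
```

For example, check_mode_config MaxQA 1000 maxqa_test_quota succeeds, while check_mode_config MaxQA 1000 fails because the quota name is missing.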

Execute the test

Run the following command in the mc_tpcds_benchmark directory to start the TPC-DS test:

nohup sh pt.sh > pt.log 2>&1 &

The command redirects its output to a pt.log file in the mc_tpcds_benchmark directory. You can run the following command to follow the detailed task logs:

tail -f pt.log
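
Because the test runs in the background, it can be useful to record its process ID so that you can later check whether it is still running. The following sketch wraps the same nohup pattern in two hypothetical helper functions, start_in_background and is_running; the PID file name is an assumption and is not created by the tool itself.

```shell
#!/bin/sh
# Hypothetical wrappers around the launch command; not part of the tool.

# Start a script in the background with nohup, send its output to a log
# file, and record the PID of the background process.
# Usage: start_in_background <script> <log file> <pid file>
start_in_background() {
    nohup sh "$1" > "$2" 2>&1 &
    echo "$!" > "$3"
}

# Succeed while the process recorded in the PID file is still alive.
# Usage: is_running <pid file>
is_running() {
    kill -0 "$(cat "$1")" 2>/dev/null
}
```

For example, start_in_background pt.sh pt.log pt.pid launches the test, and is_running pt.pid && echo "still running" reports whether it has finished.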

View MaxCompute task execution status

Log on to the MaxCompute console, switch the region in the upper-left corner, and select Workspace > Jobs in the left-side navigation pane. On the Job Management page, you can query the execution status of tasks. To view the details of a job, click LogView in the Actions column of that job. For more information, see Manage jobs.

View test results

After the test completes successfully, a result file named console_test_result.csv is automatically generated in the mc_tpcds_benchmark directory. The file contains the test results, including the total time consumed, the execution time of each query, and the corresponding Logview information.
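
To get a quick summary of the result file from the command line, the per-query times can be added up with awk. The column layout of console_test_result.csv is not described in this topic, so the sketch below assumes a hypothetical layout: a header row, the query name in column 1, and the elapsed time in seconds in column 2. Adjust the column numbers to match the actual file.

```shell
#!/bin/sh
# Hypothetical summary helper; not part of the tool.
# ASSUMPTION: header row, query name in column 1, elapsed seconds in column 2.
# Usage: summarize_results <csv file>
summarize_results() {
    awk -F, '
        # Skip the header and any row whose second column is not a plain number.
        NR > 1 && $2 ~ /^[0-9]+(\.[0-9]+)?$/ { total += $2; n++ }
        END { printf "%d queries, %.2f s total\n", n, total }
    ' "$1"
}
```

For example, summarize_results console_test_result.csv prints the number of rows counted and the sum of their times under the assumed layout.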