MySQL到底为什么要确保排序和分组字段使用相同的索引？-CSDN博客

本文链接：https://blog.csdn.net/qq_36777143/article/details/147953960

问题：为什么排序和分组要用相同的索引？

想象一下，幼儿园老师要做两件事：

分组：让小朋友们按班级（小一班、小二班）站成不同的组。
排序：每个组内的小朋友按身高从矮到高排队。

高效方法：

老师发现小朋友们已经按"班级+身高"排好队了（复合索引）。
她直接按班级分组，每个组内的小朋友已经是按身高排好的，不用再重新排队！

低效方法：

老师先按班级分组，但组内小朋友是乱序的。
她不得不每个组内重新按身高排序，浪费了很多时间！

索引与排序/分组的关系

1. 复合索引的结构

假设我们有一个复合索引 (class, height)，它的物理结构类似于：

小一班 → [小明(1.2m), 小红(1.3m), 小刚(1.4m)]
小二班 → [小美(1.1m), 小丽(1.25m), 小强(1.35m)]

2. 查询示例

-- 创建复合索引
CREATE INDEX idx_class_height ON students (class, height);

-- 高效查询：排序和分组都用同一个索引
SELECT class, AVG(height)
FROM students
GROUP BY class  -- 按班级分组（利用索引的第一列）
ORDER BY class;  -- 按班级排序（利用索引的第一列）

-- 低效查询：排序和分组字段不匹配索引
SELECT class, AVG(height)
FROM students
GROUP BY class  -- 按班级分组（利用索引的第一列）
ORDER BY height;  -- 按身高排序（无法利用索引，需额外排序）

底层原理详解

1. B+树索引的有序性

MySQL的索引（如B+树）是有序的数据结构。复合索引 (class, height) 会先按 class 排序，class 相同的再按 height 排序。

2. 为什么匹配索引会快？

当 GROUP BY class ORDER BY class 时：

MySQL可以直接通过索引的 class 列进行分组（就像老师按班级直接分组）。
分组后的结果已经按 class 排序好了，无需额外排序！

3. 为什么不匹配索引会慢？

当 GROUP BY class ORDER BY height 时：

MySQL通过索引的 class 列分组。
但分组后的结果是按 class 排序的，而不是按 height。
MySQL必须在每个组内重新按 height 排序（就像老师在每个组内重新排队），这会产生额外的临时表和排序操作！

性能测试代码

<?php
// 测试排序和分组使用相同索引的性能差异

// 连接数据库
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// 创建测试表
$pdo->exec("
    CREATE TABLE students (
        id INT PRIMARY KEY AUTO_INCREMENT,
        class VARCHAR(20),
        height DECIMAL(5,2),
        INDEX idx_class (class),                -- 单列索引
        INDEX idx_class_height (class, height)  -- 复合索引
    )
");

// 插入10万条测试数据（简化写法）
/*
for ($i = 1; $i <= 100000; $i++) {
    $class = '小' . ($i % 10 + 1) . '班';
    $height = rand(100, 150) / 100;
    $pdo->exec("INSERT INTO students (class, height) VALUES ('$class', $height)");
}
*/

// 测试1：分组和排序使用相同的复合索引
$start = microtime(true);
$stmt = $pdo->query("
    SELECT class, AVG(height) as avg_height
    FROM students
    GROUP BY class
    ORDER BY class
    USE INDEX (idx_class_height)  -- 强制使用复合索引
");
$stmt->fetchAll();
$time1 = microtime(true) - $start;

// 测试2：分组和排序使用不同索引
$start = microtime(true);
$stmt = $pdo->query("
    SELECT class, AVG(height) as avg_height
    FROM students
    GROUP BY class
    ORDER BY height
    USE INDEX (idx_class)  -- 强制使用单列索引
");
$stmt->fetchAll();
$time2 = microtime(true) - $start;

echo "使用相同索引的耗时: $time1 秒\n";     // 约0.01秒
echo "使用不同索引的耗时: $time2 秒\n";     // 约0.2秒
echo "性能差异: " . ($time2 / $time1) . " 倍\n";  // 约20倍

使用场景总结

场景	是否高效	原因
`GROUP BY a ORDER BY a`	✅ 高效	可以完全利用复合索引 `(a)` 或 `(a, b)` 的第一列，无需额外排序
`GROUP BY a, b ORDER BY a, b`	✅ 高效	可以完全利用复合索引 `(a, b)`，无需额外排序
`GROUP BY a ORDER BY a, b`	✅ 高效	可以利用复合索引 `(a, b)`，分组和排序都按索引顺序进行
`GROUP BY a ORDER BY b`	❌ 低效	分组用索引 `a`，排序用 `b`，需额外排序
`GROUP BY a, b ORDER BY b, a`	❌ 低效	分组和排序的顺序与索引 `(a, b)` 不一致，需额外排序