Java 中的 hashCode() 方法

最新推荐文章于 2025-04-30 15:56:29 发布

·云扬·

最新推荐文章于 2025-04-30 15:56:29 发布

阅读量1.2k

点赞数 29

分类专栏： Java # JavaSE 文章标签： java 哈希算法开发语言学习笔记

本文链接：https://blog.csdn.net/gaosw0521/article/details/143839672

版权

Java 同时被 2 个专栏收录

166 篇文章

订阅专栏

JavaSE

67 篇文章

订阅专栏

1 引言

hashCode() 方法是 Java 中定义在 Object 类中的一个方法，所有 Java 对象都继承自 Object 类，因此每个对象都可以调用 hashCode() 方法。hashCode() 方法的主要作用是返回一个整数，这个整数称为哈希码（hash code），用于支持哈希表（如 HashMap、HashSet 等）。

哈希表是一种数据结构，它通过哈希算法将对象映射到一个特定的位置，从而实现快速查找、插入和删除操作。哈希码是对象在哈希表中的一个近似表示，用于确定对象在哈希表中的存储位置。

2 `hashCode()` 方法的定义

Object 类中的 hashCode() 方法是一个本地方法（native method），其实现由 JVM 提供。具体的实现可以在 OpenJDK 的源码中找到，通常位于 jdk/src/hotspot/share/runtime/synchronizer.cpp 文件中。get_next_hash() 方法会根据 hashCode 的取值来决定采用哪一种哈希值的生成策略。
在这里插入图片描述

public native int hashCode();

在 Java 9 之后，hashCode() 方法会被 @HotSpotIntrinsicCandidate 注解修饰，表明它在 HotSpot 虚拟机中有一套高效的实现，基于 CPU 指令。

3 `hashCode()` 方法与 `equals()` 方法的关系

在 Java 中，hashCode() 方法和 equals() 方法之间有以下关系：

一致性要求：
- 如果两个对象通过 equals() 方法比较相等，那么它们的 hashCode() 方法返回的哈希码必须相同。
- 如果两个对象的 hashCode() 方法返回的哈希码不同，那么它们一定不相等。
不一致的情况：
- 如果两个对象的 equals() 方法返回 false，它们的 hashCode() 方法返回的哈希码可能相同，也可能不同。
- 如果两个对象的 hashCode() 方法返回的哈希码相同，它们的 equals() 方法可能返回 true，也可能返回 false。

4 自定义类的 `hashCode()` 方法

当你创建一个自定义类并覆盖 equals() 方法时，通常也需要覆盖 hashCode() 方法，以确保相等的对象具有相同的哈希码。这有助于提高哈希表在使用自定义类的对象作为键时的准确性。

例如，假设我们有一个 Student 类：

class Student {
    private int age;
    private String name;

    public Student(int age, String name) {
        this.age = age;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Student student = (Student) o;
        return age == student.age && Objects.equals(name, student.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(age, name);
    }
}

在这个例子中，我们重写了 equals() 方法和 hashCode() 方法，确保在 age 和 name 相同的情况下，两个 Student 对象的哈希码相同。

5 `Objects.hash()` 方法

Objects.hash() 方法是一个方便的工具，用于生成哈希码。它内部使用了一个简单的哈希算法，将每个属性的哈希码结合起来，生成一个新的哈希码。

public static int hash(Object... values) {
    return Arrays.hashCode(values);
}

Arrays.hashCode() 方法的实现如下：

public static int hashCode(Object a[]) {
 if (a == null)
     return 0;

 int result = 1;

 for (Object element : a)
     result = 31 * result + (element == null ? 0 : element.hashCode());

 return result;
}

这个算法使用了质数 31 来生成哈希码，因为质数在哈希计算中通常能够提供较好的分布性。

6 `hashCode()` 方法的实现策略

get_next_hash() 方法是 JVM 中用于生成对象哈希码的底层实现。get_next_hash() 方法根据 hashCode 的取值来决定采用哪一种哈希值的生成策略。以下是该方法的 C++ 源码：

static inline intptr_t get_next_hash(Thread* current, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // 使用全局的 Park-Miller 随机数生成器
    value = os::random();
  } else if (hashCode == 1) {
    // 在 STW（Stop The World）操作中具有稳定（幂等）的特性
    intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
  } else if (hashCode == 2) {
    // 用于敏感性测试
    value = 1;
  } else if (hashCode == 3) {
    // 从 0 开始计算哈希值，不是线程安全的
    value = ++GVars.hc_sequence;
  } else if (hashCode == 4) {
    // 与创建对象的内存位置有关，原样输出
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // 默认值，支持多线程，使用了 Marsaglia 的 xor-shift 算法
    unsigned t = current->_hashStateX;
    t ^= (t << 11);
    current->_hashStateX = current->_hashStateY;
    current->_hashStateY = current->_hashStateZ;
    current->_hashStateZ = current->_hashStateW;
    unsigned v = current->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    current->_hashStateW = v;
    value = v;
  }

  value &= markWord::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value != markWord::no_hash, "invariant");
  return value;
}

哈希值的生成策略

hashCode == 0：
- 使用全局的 Park-Miller 随机数生成器生成随机数。
- 这种策略在多处理器（MP）系统上可能会引发大量的一致性通信。
hashCode == 1：
- 在 STW（Stop The World）操作中具有稳定（幂等）的特性。
- 使用对象地址进行计算，并结合一个不经常更新的随机数（GVars.stw_random）。
hashCode == 2：
- 返回固定值 1，通常用于某些情况下的测试。
hashCode == 3：
- 从 0 开始计算哈希值，不是线程安全的。
- 多个线程可能会得到相同的哈希值。
hashCode == 4：
- 直接返回对象的内存地址。
hashCode == 5（默认值）：
- 使用 Marsaglia 的 xor-shift 算法生成伪随机数。
- 这种算法具有较好的随机性和性能，支持多线程环境。