公司的生产系统大多是java应用,特别是java消费者,前段时间频繁出现故障。消息队列会突然堵塞,查看消费者日志,发现处理延时明显延长,而此时网络无丢包。最后多方排查,是java消费者运行出现死锁。
这种问题可以通过jstack来dump获取java stack和native stack信息查明问题。
如:
jstack [-l] pid
提供一个测试的例子:
/**
* 死锁例子
* @author crane.ding
* @since 2011-3-20
*/
public class DeadLock {
public static void main(String[] args) {
final Object obj_1 = new Object(), obj_2 = new Object();
Thread t1 = new Thread("t1"){
@Override
public void run() {
synchronized (obj_1) {
try {
Thread.sleep(3000);
} catch (InterruptedException e) {}
synchronized (obj_2) {
System.out.println("thread t1 done.");
}
}
}
};
Thread t2 = new Thread("t2"){
@Override
public void run() {
synchronized (obj_2) {
try {
Thread.sleep(3000);
} catch (InterruptedException e) {}
synchronized (obj_1) {
System.out.println("thread t2 done.");
}
}
}
};
t1.start();
t2.start();
}
}
保存为DeadLock.java。以上DeadLock类是一个死锁的例子,假使在我们不知情的情况下,运行DeadLock后,发现等了N久都没有在屏幕打印线程完成信息。这个时候我们就可以使用ps –ef|grep xxx查看该程序的pid值和使用jstack来生产堆栈结果问题。
javac DeadLock.java
jar cvf deadlock.jar DeadLock*.class
java -cp deadlock.jar DeadLock &
ps –ef|grep deadlock.jar #获取deadlock.jar进程的pid为1448
jstack -l 1448 > deadlock.jstack
结果文件deadlock.jstack内容类似如下:
2012-01-20 20:15:20
Full thread dump Java HotSpot(TM) Client VM (19.1-b02 mixed mode, sharing):
"DestroyJavaVM" prio=6 tid=0x00316800 nid=0x9fc waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"t2" prio=6 tid=0x02bcf000 nid=0xc70 waiting for monitor entry [0x02f6f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.demo.DeadLock$2.run(DeadLock.java:40)
- waiting to lock <0x22a297a8> (a java.lang.Object)
- locked <0x22a297b0> (a java.lang.Object)
Locked ownable synchronizers:
- None
"t1" prio=6 tid=0x02bce400 nid=0xba0 waiting for monitor entry [0x02f1f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.demo.DeadLock$1.run(DeadLock.java:25)
- waiting to lock <0x22a297b0> (a java.lang.Object)
- locked <0x22a297a8> (a java.lang.Object)
Locked ownable synchronizers:
- None
"Low Memory Detector" daemon prio=6 tid=0x02bb9400 nid=0xa6c runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"CompilerThread0" daemon prio=10 tid=0x02bb2800 nid=0xcb8 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Attach Listener" daemon prio=10 tid=0x02bb1000 nid=0x7f4 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Signal Dispatcher" daemon prio=10 tid=0x02bd2800 nid=0xd80 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Finalizer" daemon prio=8 tid=0x02bab000 nid=0xe1c in Object.wait() [0x02d3f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x229e1148> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked <0x229e1148> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Locked ownable synchronizers:
- None
"Reference Handler" daemon prio=10 tid=0x02ba6800 nid=0xbe0 in Object.wait() [0x02cef000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x229e1048> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x229e1048> (a java.lang.ref.Reference$Lock)
Locked ownable synchronizers:
- None
"VM Thread" prio=10 tid=0x02b6a400 nid=0x568 runnable
"VM Periodic Task Thread" prio=10 tid=0x02bc8400 nid=0x75c waiting on condition
JNI global references: 878
Found one Java-level deadlock:
=============================
"t2":
waiting to lock monitor 0x02baaeec (object 0x22a297a8, a java.lang.Object),
which is held by "t1"
"t1":
waiting to lock monitor 0x02baa2bc (object 0x22a297b0, a java.lang.Object),
which is held by "t2"
Java stack information for the threads listed above:
===================================================
"t2":
at com.demo.DeadLock$2.run(DeadLock.java:40)
- waiting to lock <0x22a297a8> (a java.lang.Object)
- locked <0x22a297b0> (a java.lang.Object)
"t1":
at com.demo.DeadLock$1.run(DeadLock.java:25)
- waiting to lock <0x22a297b0> (a java.lang.Object)
- locked <0x22a297a8> (a java.lang.Object)
Found 1 deadlock.
从这个结果文件我们一看到发现了一个死锁,具体是线程t2在等待线程t1,而线程t1在等待线程t2造成的,同时也记录了线程的堆栈和代码行数,通过这个堆栈和行数我们就可以去检查对应的代码块,从而发现问题和解决问题。