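The business team reported that some machines in the data center kept freezing and their services stopped responding. Logging in to investigate showed that a softlockup had occurred.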
I. Causes of softlockup
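1. Deadlock (waiting for a lock)
2. A task keeps running in a loop without a rescheduling check (cond_resched())
3. The current task keeps preemption disabled for too long (preempt_disable(), spin_lock())
4. An interrupt storm (irq storm) prevents the CPU from scheduling
5. softirq/tasklet processing runs too long, preventing the CPU from scheduling
6. Real-time threads monopolize the CPU and starve the watchdog thread (resource oversubscription)
7. A scheduler bug (e.g. nr_running computed incorrectly)
8. A hardware bug (the CPU stays in the idle state)
9. Virtualization oversubscription causes severe CPU steal, which leads to a softlockup (high steal time)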
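All of these causes end in the same report because of how the detector works: a per-CPU timer interrupt checks when the per-CPU watchdog task last got to run, and reports a soft lockup once that timestamp is older than a threshold. The Python sketch below is a simplified model, not kernel code. It assumes the common default watchdog_thresh of 10 seconds, a soft lockup threshold of twice that, and a check every threshold/5 seconds, as in recent kernels' kernel/watchdog.c. CpuWatchdog and simulate are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed default of the kernel.watchdog_thresh sysctl, in seconds.
WATCHDOG_THRESH = 10
# Report once the watchdog task has not run for longer than this.
SOFTLOCKUP_THRESH = 2 * WATCHDOG_THRESH
# How often the (simulated) per-CPU timer interrupt performs the check.
SAMPLE_PERIOD = SOFTLOCKUP_THRESH / 5


@dataclass
class CpuWatchdog:
    touch_ts: float = 0.0  # last time the per-CPU watchdog task was scheduled

    def touch(self, now: float) -> None:
        # Process context: only happens if the CPU is still able to schedule.
        self.touch_ts = now

    def check(self, now: float) -> Optional[float]:
        # Interrupt context: runs even while the CPU is stuck in a loop.
        stuck = now - self.touch_ts
        return stuck if stuck > SOFTLOCKUP_THRESH else None


def simulate(stop_scheduling_at: float,
             horizon: float = 60.0) -> Optional[Tuple[float, float]]:
    """Return (report_time, stuck_seconds) of the first soft lockup report, or None."""
    cpu = CpuWatchdog()
    now = 0.0
    while now < horizon:
        now += SAMPLE_PERIOD
        if now <= stop_scheduling_at:
            cpu.touch(now)  # healthy CPU: the watchdog task gets to run
        stuck = cpu.check(now)
        if stuck is not None:
            return now, stuck
    return None
```

For example, simulate(8.0) models a CPU that stops scheduling at t=8s (say, cause 2, a loop without cond_resched()). The first report comes at t=32s with the CPU stuck for 24s, because the check has to see more than 20s without a touch.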
II. Analysis
1. First, the softlockup was caused by a task that could not acquire the ext4 block group spinlock, so the softlockup is a deadlock on that spinlock.
2. Who took the spinlock?
Running bt -a to dump the stacks of all CPUs shows that every CPU is stuck, and all of them are stuck at one of two points:
[2766246.212211] [<ffffffff81741bb0>] _raw_spin_lock+0x20/0x30
[2766246.212238] [<ffffffffa029e606>] ext4_free_inode+0x536/0x650 [ext4]
[2766246.212249] [<ffffffffa02a8cfb>] ext4_evict_inode+0x44b/0x4c0 [ext4]
[2766246.212252] [<ffffffff8126d05a>] evict+0xba/0x190
[2766246.212254] [<ffffffff8126d4d2>] iput+0x1b2/0x230
[2766246.212257] [<ffffffff8126720b>] dentry_unlink_inode+0xab/0xe0
[2766246.212260] [<ffffffff812681e6>] __dentry_kill+0xb6/0x160
[2766246.212262] [<ffffffff812683f1>] dput+0x161/0x270
[2766246.212266] [<ffffffffa050c170>] ovl_dentry_release+0x20/0x60 [overlay]
[2766246.212268] [<ffffffff81268205>] __dentry_kill+0xd5/0x160
[2766246.212454] [<ffffffff81741bb0>] _raw_spin_lock+0x20/0x30
[2766246.212454] [<ffffffffa029eb41>] __ext4_new_inode+0x421/0x14b0 [ext4]
[2766246.212455] [<ffffffffa02b29f6>] ext4_create+0xc6/0x1c0 [ext4]
[2766246.212456] [<ffffffff8125cd17>] vfs_create+0x127/0x1a0
[2766246.212456] [<ffffffffa050f3bb>] ovl_create_real+0xab/0x220 [overlay]
[2766246.212457] [<ffffffffa0510693>] ovl_create_or_link.part.5+0x1e3/0x6e0 [overlay]
[2766246.212457] [<ffffffffa050dba9>] ? ovl_override_creds+0x19/0x20 [overlay]
[2766246.212458] [<ffffffffa0512a38>] ? ovl_copy_up+0xc8/0x137 [overlay]
[2766246.212459] [<ffffffff8126c1c0>] ? alloc_inode+0x30/0x80
[2766246.212459] [<ffffffff8126c05b>] ? inode_sb_list_add+0x3b/0x50
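On a machine with many CPUs, grouping the bt -a stacks by the function that called _raw_spin_lock is a quick way to confirm that every CPU is piling up on the same few call sites. A minimal Python sketch, assuming the frame format shown in the traces above. Frames marked with "?" are unreliable guesses by the unwinder and are skipped. frames, lock_caller and spin_sites are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# Matches kernel stack frames such as
#   [<ffffffffa029e606>] ext4_free_inode+0x536/0x650 [ext4]
# Group 1 is the "? " marker the unwinder adds to unreliable frames.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(stack_text: str) -> List[str]:
    """Reliable function names of one stack, innermost frame first."""
    return [m.group(2) for m in FRAME_RE.finditer(stack_text) if m.group(1) is None]


def lock_caller(stack_text: str, lock_fn: str = "_raw_spin_lock") -> Optional[str]:
    """Function that called lock_fn, i.e. the call site the CPU is spinning at."""
    names = frames(stack_text)
    for inner, outer in zip(names, names[1:]):
        if inner == lock_fn:
            return outer
    return None


def spin_sites(stacks: Iterable[str]) -> Counter:
    """How many CPU stacks are spinning on the lock from each call site."""
    return Counter(site for site in map(lock_caller, stacks) if site is not None)
```

Fed the two stacks above, spin_sites reports one CPU at ext4_free_inode and one at __ext4_new_inode; both functions take the ext4 block group lock.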
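In other words, the task holding the spinlock is not running on any CPU: it took the spinlock and was then scheduled out.
Running foreach bt -a to dump the stacks of all tasks turns up a suspicious task: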
PID: 20410 TASK: ffff8831bb6d0000 CPU: 2 COMMAND: "nginx"
#0 [ffffc900465d7820] __schedule at ffffffff8173ca3b
#1 [ffffc900465d78a8] _cond_resched at ffffffff8173d1c6
#2 [ffffc900465d78c0] __getblk_gfp at ffffffff81289acf
#3 [ffffc900465d7930] find_inode_bit at ffffffffa029d368 [ext4]
#4 [ffffc900465d7978] __ext4_new_inode at ffffffffa029ee33 [ext4]
#5 [ffffc900465d7a30] ext4_create at ffffffffa02b29f6 [ext4]
#6 [ffffc900465d7aa8] vfs_create at ffffffff8125cd17
#7 [ffffc900465d7ae8] ovl_create_real at ffffffffa050f3bb [overlay]
#8 [ffffc900465d7b20] ovl_create_or_link at ffffffffa0510693 [overlay]
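The same search can be scripted over saved foreach bt output: keep only tasks whose stack shows them scheduled out (__schedule) somewhere beneath the function that takes the lock (__ext4_new_inode here). A minimal Python sketch, assuming crash's "PID: ... TASK: ... CPU: ... COMMAND:" header lines and "#N [sp] func at addr" frame lines as shown above. parse_foreach_bt and scheduled_out_under are hypothetical names. A hit is only a candidate; whether the lock is really held at that point still has to be confirmed against the source code.

```python
import re
from typing import Dict, List

# crash header line, e.g.  PID: 20410  TASK: ffff8831bb6d0000  CPU: 2  COMMAND: "nginx"
HEADER_RE = re.compile(r'PID:\s+(\d+)\s+TASK:\s+([0-9a-f]+)\s+CPU:\s+(\d+)\s+COMMAND:\s+"([^"]*)"')
# crash frame line, e.g.  #4 [ffffc900465d7978] __ext4_new_inode at ffffffffa029ee33 [ext4]
FRAME_RE = re.compile(r'#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def parse_foreach_bt(text: str) -> List[Dict]:
    """Split foreach bt output into one dict per task, with its frame names."""
    tasks: List[Dict] = []
    for line in text.splitlines():
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            pid, task, cpu, comm = header.groups()
            tasks.append({"pid": int(pid), "task": task, "cpu": int(cpu),
                          "comm": comm, "frames": []})
            continue
        frame = FRAME_RE.match(line)
        if frame and tasks:
            tasks[-1]["frames"].append(frame.group(1))
    return tasks


def scheduled_out_under(tasks: List[Dict], region_fn: str = "__ext4_new_inode",
                        sleep_fn: str = "__schedule") -> List[Dict]:
    """Tasks that entered the scheduler somewhere beneath region_fn."""
    hits = []
    for t in tasks:
        names = t["frames"]  # innermost frame first
        if (sleep_fn in names and region_fn in names
                and names.index(sleep_fn) < names.index(region_fn)):
            hits.append(t)
    return hits
```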
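Looking at the source code:
After taking the group spinlock, __ext4_new_inode calls find_inode_bit, which eventually reaches the sleepable function __getblk_gfp, so the task is scheduled out while still holding the lock. Every other CPU that needs the lock then spins in _raw_spin_lock without ever scheduling, so the holder never gets a CPU back to release it.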
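The effect of sleeping with a spinlock held can be reproduced in miniature in user space. The Python sketch below uses a toy busy-waiting lock built on threading.Lock.acquire(blocking=False), and a holder that calls time.sleep() while holding it, standing in for the __getblk_gfp to _cond_resched path. SpinLock and run are hypothetical names. It is only an analogy: here the OS scheduler eventually runs the holder again, so the waiter merely wastes CPU, whereas in this incident every CPU was spinning and none was left to run the holder.

```python
import threading
import time


class SpinLock:
    """Toy busy-waiting lock: a waiter keeps polling instead of sleeping, like spin_lock()."""

    def __init__(self) -> None:
        self._flag = threading.Lock()

    def acquire(self) -> int:
        spins = 0
        while not self._flag.acquire(blocking=False):
            spins += 1  # burn CPU until the holder releases
        return spins

    def release(self) -> None:
        self._flag.release()


def run(holder_sleep: float) -> int:
    """Return how many times the waiter polled while the holder slept with the lock held."""
    lock = SpinLock()
    locked = threading.Event()
    result = {}

    def holder() -> None:
        lock.acquire()
        locked.set()
        time.sleep(holder_sleep)  # the bug: blocking while a spinlock is held
        lock.release()

    def waiter() -> None:
        locked.wait()  # make sure the holder owns the lock first
        result["spins"] = lock.acquire()
        lock.release()

    threads = [threading.Thread(target=holder), threading.Thread(target=waiter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["spins"]
```

run(0.2) returns a spin count that grows with the sleep time; the exact number varies from run to run.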
This is clearly a bug in ext4 itself. Checking the Linux mainline shows that it has already been fixed.