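Analyzing a process segmentation fault from the kernel error message

Environment

The hardware is a Hygon platform (x86 architecture) and the operating system is UOS (4.19.0-amd64-desktop).

Problem

During a long-running stability test, glmark2-es2 printed "segment fault" and exited. User space printed nothing else useful. The kernel log contained the following: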

[180478.017641] glmark2-es2[3473]: segfault at 1f2e8 ip 00007f24741bfca0 sp 00007ffdbd47aa98 error 4 in libGLESv2_XXgpu.so.1.1.213621[7f247414e000+180000]
[180478.017646] Code: 00 44 8b 83 00 30 00 00 4c 8d 0d 9e 14 13 00 89 e9 ba 41 00 00 00 be 19 00 00 00 48 8b 38 31 c0 e8 e5 c1 0b 00 e9 bc fd ff ff <0f> b6 06 80 3f 00 89 c2 74 25 84 d2 b8 ff ff ff ff 74 1c 8b 4f 08

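Kernel log analysis

On x86 this message is printed by the functions below. The code shown is from kernel 5.4.191; apart from minor differences, the printed message is the same as on the 4.19 kernel under test: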

linux-5.4.191\arch\x86\mm\fault.c

static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
        unsigned long address, struct task_struct *tsk)
{
    const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;

    if (!unhandled_signal(tsk, SIGSEGV))
        return;

    if (!printk_ratelimit())
        return;

    printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx",
        loglvl, tsk->comm, task_pid_nr(tsk), address,
        (void *)regs->ip, (void *)regs->sp, error_code);

    print_vma_addr(KERN_CONT " in ", regs->ip);

    printk(KERN_CONT "\n");

    show_opcodes(regs, loglvl);   // prints the "Code:" line
}

linux-5.4.191\mm\memory.c

void print_vma_addr(char *prefix, unsigned long ip)
{
    struct mm_struct *mm = current->mm;
    struct vm_area_struct *vma;

    /*
     * we might be running from an atomic context so we cannot sleep
     */
    if (!down_read_trylock(&mm->mmap_sem))
        return;

    vma = find_vma(mm, ip);
    if (vma && vma->vm_file) {
        struct file *f = vma->vm_file;
        char *buf = (char *)__get_free_page(GFP_NOWAIT);
        if (buf) {
            char *p;

            p = file_path(f, buf, PAGE_SIZE);
            if (IS_ERR(p))
                p = "?";
            printk("%s%s[%lx+%lx]", prefix, kbasename(p),
                    vma->vm_start,
                    vma->vm_end - vma->vm_start);
            free_page((unsigned long)buf);
        }
    }
    up_read(&mm->mmap_sem);
}

linux-5.4.191\arch\x86\kernel\dumpstack.c

void show_opcodes(struct pt_regs *regs, const char *loglvl)
{
#define PROLOGUE_SIZE 42
#define EPILOGUE_SIZE 21
#define OPCODE_BUFSIZE (PROLOGUE_SIZE + 1 + EPILOGUE_SIZE)
    u8 opcodes[OPCODE_BUFSIZE];
    unsigned long prologue = regs->ip - PROLOGUE_SIZE;
    bool bad_ip;

    /*
     * Make sure userspace isn't trying to trick us into dumping kernel
     * memory by pointing the userspace instruction pointer at it.
     */
    bad_ip = user_mode(regs) &&
        __chk_range_not_ok(prologue, OPCODE_BUFSIZE, TASK_SIZE_MAX);

    if (bad_ip || probe_kernel_read(opcodes, (u8 *)prologue,
                    OPCODE_BUFSIZE)) {
        printk("%sCode: Bad RIP value.\n", loglvl);
    } else {
        printk("%sCode: %" __stringify(PROLOGUE_SIZE) "ph <%02x> %"
               __stringify(EPILOGUE_SIZE) "ph\n", loglvl, opcodes,
               opcodes[PROLOGUE_SIZE], opcodes + PROLOGUE_SIZE + 1);
    }
}

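The function show_opcodes shows what the Code line contains when a segmentation fault occurs: the 42 bytes (PROLOGUE_SIZE) that precede the faulting instruction pointer, followed by 22 bytes (EPILOGUE_SIZE + 1) starting at the instruction pointer itself. The first byte of the faulting instruction is enclosed in <>.

According to "How do you read a segfault kernel log message", the fields of the message are as follows: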
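A minimal sketch in Python of how the Code line can be split, assuming the layout described above (42 bytes before the instruction pointer, one marked byte at it, 21 bytes after). The helper name split_code_line is hypothetical; the byte string is copied from the kernel log in this article.

```python
# Code bytes copied from the kernel log quoted in this article.
CODE_LINE = (
    "00 44 8b 83 00 30 00 00 4c 8d 0d 9e 14 13 00 89 e9 ba 41 00 00 00 "
    "be 19 00 00 00 48 8b 38 31 c0 e8 e5 c1 0b 00 e9 bc fd ff ff "
    "<0f> b6 06 80 3f 00 89 c2 74 25 84 d2 b8 ff ff ff ff 74 1c 8b 4f 08"
)


def split_code_line(code):
    """Return (bytes before ip, byte at ip, bytes after ip)."""
    tokens = code.split()
    marked = [i for i, t in enumerate(tokens)
              if t.startswith("<") and t.endswith(">")]
    if len(marked) != 1:
        raise ValueError("expected exactly one <xx> byte")
    idx = marked[0]
    before = bytes(int(t, 16) for t in tokens[:idx])
    at_ip = int(tokens[idx][1:-1], 16)
    after = bytes(int(t, 16) for t in tokens[idx + 1:])
    return before, at_ip, after


before, at_ip, after = split_code_line(CODE_LINE)

# The dump starts PROLOGUE_SIZE bytes below the instruction pointer.
ip = 0x00007F24741BFCA0
first_dumped_address = ip - len(before)
print(len(before), len(after), hex(first_dumped_address))
```

Knowing the address of the first dumped byte makes it possible to line the dump up against a disassembly later.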

How do you read a segfault kernel log message

This can be a very simple question. I am attempting to debug an application which generates the following segfault error in the kern.log

kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]

When the report points to a program, not a shared library
Run addr2line -e myapp 080513b (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

If it's a shared library
In the libfoo.so[NNNNNN+YYYY] part, the NNNNNN is where the library was loaded. (This is inaccurate: NNNNNN is the starting virtual address of the VMA that maps the segment containing the faulting instruction, and YYYY is the size of that VMA's virtual address range; see the function print_vma_addr in the code above.) Subtract this from the instruction pointer (ip) and you'll get the offset into the .so of the offending instruction. Then you can use objdump -DCgl libfoo.so and search for the instruction at that offset. You should easily be able to figure out which function it is from the asm labels. If the .so doesn't have optimizations you can also try using addr2line -e libfoo.so <offset>.

What the error means
Here's the breakdown of the fields:

address 794ef0 - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0). In other words, the fault occurred while accessing the data at this address.
ip 080513b - instruction pointer, ie. where the code which is trying to do this lives
sp 794ef0 - stack pointer
error - Architecture-specific flags; see arch/*/mm/fault.c for your platform.

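libfoo.so - the library that contains the faulting instruction

NNNNNN - the starting virtual address of the VMA that maps the segment containing the faulting instruction; see the function print_vma_addr

YYYY - the size of that VMA's virtual address range; see the function print_vma_addr

Applying this to the log above: the instruction at 0x00007f24741bfca0 accessed data at address 0x1f2e8 and caused the segmentation fault. The VMA that maps the segment containing the faulting instruction starts at virtual address 0x7f247414e000 and spans 0x180000 bytes, and the error code is 4. The x86 error code bits are defined as shown below. A value of 4 sets only X86_PF_USER: the access came from user mode, it was a read (X86_PF_WRITE is clear), no page was present at that address (X86_PF_PROT is clear, so the page tables have no mapping for it), and it was not an instruction fetch (X86_PF_INSTR is clear):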
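As an illustration of the field breakdown, the following Python sketch parses the segfault line from this article. The regular expression and the function name parse_segfault_line are assumptions written for this example, not part of any kernel tooling. The "in file[start+size]" part is treated as optional because print_vma_addr can return without printing anything.

```python
import re

# Assumed pattern for the x86 "segfault at ..." message format.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<file>.+)\[(?P<vma_start>[0-9a-f]+)\+(?P<vma_size>[0-9a-f]+)\])?"
)


def parse_segfault_line(line):
    """Split a kernel segfault line into named fields (hex fields as int)."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a segfault line")
    fields = match.groupdict()
    for key in ("addr", "ip", "sp", "error", "vma_start", "vma_size"):
        if fields[key] is not None:
            fields[key] = int(fields[key], 16)
    fields["pid"] = int(fields["pid"])
    return fields


LOG_LINE = (
    "[180478.017641] glmark2-es2[3473]: segfault at 1f2e8 "
    "ip 00007f24741bfca0 sp 00007ffdbd47aa98 error 4 "
    "in libGLESv2_XXgpu.so.1.1.213621[7f247414e000+180000]"
)

info = parse_segfault_line(LOG_LINE)

# Offset of the faulting instruction inside the VMA that contains it.
vma_offset = info["ip"] - info["vma_start"]
ip_inside_vma = info["vma_start"] <= info["ip"] < info["vma_start"] + info["vma_size"]
print(info["file"], hex(vma_offset), ip_inside_vma)
```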

enum x86_pf_error_code {
    X86_PF_PROT    =        1 << 0,
    X86_PF_WRITE    =        1 << 1,
    X86_PF_USER    =        1 << 2,
    X86_PF_RSVD    =        1 << 3,
    X86_PF_INSTR    =        1 << 4,
    X86_PF_PK    =        1 << 5,
};
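The bit definitions above can be applied mechanically. Below is a small Python sketch that decodes an error code using the same bit positions as the enum; the function names decode_pf_error and describe_pf_error are hypothetical helpers for this article.

```python
# Bit positions copied from enum x86_pf_error_code.
X86_PF_FLAGS = [
    (1 << 0, "X86_PF_PROT"),
    (1 << 1, "X86_PF_WRITE"),
    (1 << 2, "X86_PF_USER"),
    (1 << 3, "X86_PF_RSVD"),
    (1 << 4, "X86_PF_INSTR"),
    (1 << 5, "X86_PF_PK"),
]


def decode_pf_error(error_code):
    """Return the names of the flags that are set in error_code."""
    return [name for bit, name in X86_PF_FLAGS if error_code & bit]


def describe_pf_error(error_code):
    """Summarize the three most commonly read bits plus instruction fetch."""
    return {
        "cause": "protection violation" if error_code & (1 << 0) else "page not present",
        "access": "write" if error_code & (1 << 1) else "read",
        "mode": "user" if error_code & (1 << 2) else "kernel",
        "instruction_fetch": bool(error_code & (1 << 4)),
    }


print(decode_pf_error(4), describe_pf_error(4))
```

For the quoted Stack Overflow example with error 6, the same helper reports a user-mode write to a page that is not present.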

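Finding the faulting instruction

The problem is very hard to reproduce and no coredump file was generated, so the only option is to dig further into the error message printed by the kernel and work out which instruction caused the fault. The message shows that the fault occurred in the shared library libGLESv2_XXgpu.so.1.1.213621.

Run the following command to get the program headers: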

root@test-System-Product-Name:/home/segment# readelf -l libGLESv2_XXgpu.so.1.1.213621

Elf file type is DYN (Shared object file)
Entry point 0x11c90
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000010a78 0x0000000000010a78  R      0x1000
  LOAD           0x0000000000011000 0x0000000000011000 0x0000000000011000
                 0x000000000017f9e1 0x000000000017f9e1  R E    0x1000
  LOAD           0x0000000000191000 0x0000000000191000 0x0000000000191000
                 0x0000000000052a14 0x0000000000052a14  R      0x1000
  LOAD           0x00000000001e4540 0x00000000001e5540 0x00000000001e5540
                 0x0000000000004119 0x00000000000063c0  RW     0x1000
  DYNAMIC        0x00000000001e7d30 0x00000000001e8d30 0x00000000001e8d30
                 0x0000000000000280 0x0000000000000280  RW     0x8
  NOTE           0x0000000000000270 0x0000000000000270 0x0000000000000270
                 0x0000000000000024 0x0000000000000024  R      0x4
  TLS            0x00000000001e4540 0x00000000001e5540 0x00000000001e5540
                 0x0000000000000000 0x0000000000000010  R      0x8
  GNU_EH_FRAME   0x00000000001c5490 0x00000000001c5490 0x00000000001c5490
                 0x000000000000378c 0x000000000000378c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x00000000001e4540 0x00000000001e5540 0x00000000001e5540
                 0x0000000000003ac0 0x0000000000003ac0  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   01     .init .plt .plt.got .text .fini
   02     .rodata .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .data.rel.ro .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.gnu.build-id
   06     .tbss
   07     .eh_frame_hdr
   08
   09     .init_array .fini_array .data.rel.ro .dynamic .got

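There are 10 segments in total. The entries 00, 01, ..., 09 under "Segment Sections..." correspond one to one with the rows under Program Headers, so 00 is the first row, and so on. Segment 01 contains five sections: .init, .plt, .plt.got, .text and .fini. Its type is LOAD. Segments of type LOAD are mapped into the process's virtual address space, and the kernel creates a VMA for each mapped LOAD segment. The flags of segment 01 are R E, readable and executable, so it holds instructions.

Align is the alignment requirement for the start of the segment, here 0x1000. The segment's offset in the shared library file is 0x0000000000011000, which is 0x1000-aligned, and the starting virtual address of the corresponding VMA must be 0x1000-aligned as well. From the kernel message, the VMA that maps the segment containing the faulting instruction starts at 0x7f247414e000, which is 0x1000-aligned, and its size is 0x180000. readelf -l reports the segment size as 0x000000000017f9e1; a VMA always covers a whole number of pages, so the size is rounded up to 0x180000.

The sections in this segment all have the same flags. The following command shows that they are flagged A (alloc) and X (execute):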
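The rounding can be checked with a few lines of Python. This is only a sketch, assuming a 4 KiB page size (matching the 0x1000 alignment reported by readelf); the helper names are made up for the example.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


def page_align_down(value, page=PAGE_SIZE):
    return value & ~(page - 1)


def page_align_up(value, page=PAGE_SIZE):
    return (value + page - 1) & ~(page - 1)


# Segment 01 from the readelf -l output: VirtAddr and MemSiz.
seg_vaddr = 0x11000
seg_memsz = 0x17F9E1

# The mapping covers every page touched by [vaddr, vaddr + memsz).
mapped_start = page_align_down(seg_vaddr)
mapped_end = page_align_up(seg_vaddr + seg_memsz)
vma_size = mapped_end - mapped_start
print(hex(vma_size))
```

The result agrees with the +180000 printed in the kernel log.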

root@test-System-Product-Name:/home/segment# readelf -S libGLESv2_XXgpu.so.1.1.213621
There are 30 section headers, starting at offset 0x1e87a8:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000000000000270  00000270
       0000000000000024  0000000000000000   A       0     0     4
  [ 2] .hash             HASH             0000000000000298  00000298
       0000000000001134  0000000000000004   A       4     0     8
  [ 3] .gnu.hash         GNU_HASH         00000000000013d0  000013d0
       0000000000000b04  0000000000000000   A       4     0     8
  [ 4] .dynsym           DYNSYM           0000000000001ed8  00001ed8
       0000000000003630  0000000000000018   A       5     1     8
  [ 5] .dynstr           STRTAB           0000000000005508  00005508
       0000000000002b9f  0000000000000000   A       0     0     1
  [ 6] .gnu.version      VERSYM           00000000000080a8  000080a8
       0000000000000484  0000000000000002   A       4     0     2
  [ 7] .gnu.version_r    VERNEED          0000000000008530  00008530
       00000000000000a0  0000000000000000   A       5     4     8
  [ 8] .rela.dyn         RELA             00000000000085d0  000085d0
       0000000000007230  0000000000000018   A       4     0     8
  [ 9] .rela.plt         RELA             000000000000f800  0000f800
       0000000000001278  0000000000000018  AI       4    24     8
  [10] .init             PROGBITS         0000000000011000  00011000
       0000000000000017  0000000000000000  AX       0     0     4
  [11] .plt              PROGBITS         0000000000011020  00011020
       0000000000000c60  0000000000000010  AX       0     0     16
  [12] .plt.got          PROGBITS         0000000000011c80  00011c80
       0000000000000010  0000000000000008  AX       0     0     8
  [13] .text             PROGBITS         0000000000011c90  00011c90
       000000000017ed45  0000000000000000  AX       0     0     16
  [14] .fini             PROGBITS         00000000001909d8  001909d8
       0000000000000009  0000000000000000  AX       0     0     4
  [15] .rodata           PROGBITS         0000000000191000  00191000
       0000000000034490  0000000000000000   A       0     0     32
  [16] .eh_frame_hdr     PROGBITS         00000000001c5490  001c5490
       000000000000378c  0000000000000000   A       0     0     4
  [17] .eh_frame         PROGBITS         00000000001c8c20  001c8c20
       000000000001adf4  0000000000000000   A       0     0     8
  [18] .tbss             NOBITS           00000000001e5540  001e4540
       0000000000000010  0000000000000000 WAT       0     0     8
  [19] .init_array       INIT_ARRAY       00000000001e5540  001e4540
       0000000000000010  0000000000000008  WA       0     0     8
  [20] .fini_array       FINI_ARRAY       00000000001e5550  001e4550
       0000000000000010  0000000000000008  WA       0     0     8
  [21] .data.rel.ro      PROGBITS         00000000001e5560  001e4560
       00000000000037d0  0000000000000000  WA       0     0     32
  [22] .dynamic          DYNAMIC          00000000001e8d30  001e7d30
       0000000000000280  0000000000000010  WA       5     0     8
  [23] .got              PROGBITS         00000000001e8fb0  001e7fb0
       0000000000000040  0000000000000008  WA       0     0     8
  [24] .got.plt          PROGBITS         00000000001e9000  001e8000
       0000000000000640  0000000000000008  WA       0     0     8
  [25] .data             PROGBITS         00000000001e9640  001e8640
       0000000000000019  0000000000000000  WA       0     0     8
  [26] .bss              NOBITS           00000000001e9660  001e8659
       00000000000022a0  0000000000000000  WA       0     0     32
  [27] .comment          PROGBITS         0000000000000000  001e8659
       0000000000000029  0000000000000001  MS       0     0     1
  [28] .gnu_debuglink    PROGBITS         0000000000000000  001e8684
       000000000000001c  0000000000000000           0     0     4
  [29] .shstrtab         STRTAB           0000000000000000  001e86a0
       0000000000000103  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

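Disassembly analysis

From the analysis above, the offset of the faulting instruction within the VMA is 0x00007f24741bfca0 - 0x7f247414e000 = 0x71ca0, which is also its offset within segment 01. Segment 01 starts at offset 0x0000000000011000 in the shared library file (its VirtAddr has the same value, which is why the result also matches the addresses objdump prints), so the faulting instruction is at 0x71ca0 + 0x0000000000011000 = 0x82ca0. Run the following command to disassemble the shared library: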

root@test-System-Product-Name:/home/segment# objdump -DCgl libGLESv2_XXgpu.so.1.1.213621 | more

libGLESv2_XXgpu.so.1.1.213621:     file format elf64-x86-64


Disassembly of section .note.gnu.build-id:

0000000000000270 <.note.gnu.build-id>:
 270:   04 00                   add    $0x0,%al
 272:   00 00                   add    %al,(%rax)
 274:   14 00                   adc    $0x0,%al
......

Disassembly of section .init:

0000000000011000 <_init@@Base>:
   11000:   48 83 ec 08             sub    $0x8,%rsp
   11004:   48 8b 05 b5 7f 1d 00    mov    0x1d7fb5(%rip),%rax        # 1e8fc0 <__gmon_start__>
   1100b:   48 85 c0                test   %rax,%rax
   1100e:   74 02                   je     11012 <_init@@Base+0x12>
   11010:   ff d0                   callq  *%rax
   11012:   48 83 c4 08             add    $0x8,%rsp
   11016:   c3                      retq

Disassembly of section .plt:

0000000000011020 <PVRSRVReleaseDeviceMapping@plt-0x10>:
   11020:   ff 35 e2 7f 1d 00       pushq  0x1d7fe2(%rip)        # 1e9008 <_fini@@Base+0x58630>
   11026:   ff 25 e4 7f 1d 00       jmpq   *0x1d7fe4(%rip)        # 1e9010 <_fini@@Base+0x58638>
   1102c:   0f 1f 40 00             nopl   0x0(%rax)
......

Disassembly of section .plt.got:

0000000000011c80 <__cxa_finalize@plt>:
   11c80:   ff 25 52 73 1d 00       jmpq   *0x1d7352(%rip)        # 1e8fd8 <__cxa_finalize@GLIBC_2.2.5>
   11c86:   66 90                   xchg   %ax,%ax
......

Disassembly of section .text:

0000000000011c90 <glGetPointerv@@Base-0x42a60>:
   11c90:   53                      push   %rbx
   11c91:   8b 9f 10 ad 00 00       mov    0xad10(%rdi),%ebx
   11c97:   41 89 d3                mov    %edx,%r11d
   11c9a:   31 c9                   xor    %ecx,%ecx
   11c9c:   45 31 c9                xor    %r9d,%r9d
   11c9f:   89 f6                   mov    %esi,%esi
   11ca1:   39 cb                   cmp    %ecx,%ebx

......

0000000000082a10 <glFinish@@Base>:
   82a10:   41 54                   push   %r12
   82a12:   55                      push   %rbp
   82a13:   53                      push   %rbx
   82a14:   48 8d 3d 95 65 16 00    lea    0x166595(%rip),%rdi        # 1e8fb0 <_fini@@Base+0x585d8>
......

   82c6a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
   82c70:   48 8b 83 10 30 00 00    mov    0x3010(%rbx),%rax
   82c77:   44 8b 83 00 30 00 00    mov    0x3000(%rbx),%r8d
   82c7e:   4c 8d 0d 9e 14 13 00    lea    0x13149e(%rip),%r9        # 1b4123 <_fini@@Base+0x2374b>
   82c85:   89 e9                   mov    %ebp,%ecx
   82c87:   ba 41 00 00 00          mov    $0x41,%edx
   82c8c:   be 19 00 00 00          mov    $0x19,%esi
   82c91:   48 8b 38                mov    (%rax),%rdi
   82c94:   31 c0                   xor    %eax,%eax
   82c96:   e8 e5 c1 0b 00          callq  13ee80 <glTexStorage2D@@Base+0x36c0>
   82c9b:   e9 bc fd ff ff          jmpq   82a5c <glFinish@@Base+0x4c>

   82ca0:   0f b6 06                movzbl (%rsi),%eax
   82ca3:   80 3f 00                cmpb   $0x0,(%rdi)
   82ca6:   89 c2                   mov    %eax,%edx
   82ca8:   74 25                   je     82ccf <glFinish@@Base+0x2bf>
   82caa:   84 d2                   test   %dl,%dl
   82cac:   b8 ff ff ff ff          mov    $0xffffffff,%eax
   82cb1:   74 1c                   je     82ccf <glFinish@@Base+0x2bf>
   82cb3:   8b 4f 08                mov    0x8(%rdi),%ecx

   82cb6:   8b 56 08                mov    0x8(%rsi),%edx
......

Disassembly of section .fini:

00000000001909d8 <_fini@@Base>:
  1909d8:   48 83 ec 08             sub    $0x8,%rsp
  1909dc:   48 83 c4 08             add    $0x8,%rsp
  1909e0:   c3                      retq
......

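The bytes from 0x82c76 through 0x82cb5 in the disassembly match the Code bytes printed by the kernel exactly. The instruction at 0x82ca0 (0f b6 06, movzbl (%rsi),%eax) is the faulting one. It loads the byte that rsi points to into eax with zero extension, so the value in rsi was clearly bad and pointed to invalid memory.

Finding the source code for the faulting instruction

objdump attributes the faulting instruction to glFinish, but comparing the glFinish source code with the disassembly shows that glFinish should not contain that many instructions. The likely reason is that the shared library is a release build whose symbol table is incomplete. When a library is loaded, external callers need the addresses of its interfaces, so only the symbols of exported interfaces are required; symbols of functions that are only called inside the library are not.

Analyze the shared library further with gdb. After gdb loads the library, the session looks like this: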
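The match between the kernel's Code bytes and the disassembly can also be verified programmatically. The Python sketch below is an illustration only: the byte table is transcribed by hand from the objdump output in this article, and it assumes that the file offset and virtual address of segment 01 are both 0x11000, as readelf -l reports.

```python
CODE_LINE = (
    "00 44 8b 83 00 30 00 00 4c 8d 0d 9e 14 13 00 89 e9 ba 41 00 00 00 "
    "be 19 00 00 00 48 8b 38 31 c0 e8 e5 c1 0b 00 e9 bc fd ff ff "
    "<0f> b6 06 80 3f 00 89 c2 74 25 84 d2 b8 ff ff ff ff 74 1c 8b 4f 08"
)

# (address, instruction bytes) transcribed from the objdump listing.
DISASM = [
    (0x82C77, "44 8b 83 00 30 00 00"), (0x82C7E, "4c 8d 0d 9e 14 13 00"),
    (0x82C85, "89 e9"), (0x82C87, "ba 41 00 00 00"),
    (0x82C8C, "be 19 00 00 00"), (0x82C91, "48 8b 38"),
    (0x82C94, "31 c0"), (0x82C96, "e8 e5 c1 0b 00"),
    (0x82C9B, "e9 bc fd ff ff"), (0x82CA0, "0f b6 06"),
    (0x82CA3, "80 3f 00"), (0x82CA6, "89 c2"), (0x82CA8, "74 25"),
    (0x82CAA, "84 d2"), (0x82CAC, "b8 ff ff ff ff"),
    (0x82CB1, "74 1c"), (0x82CB3, "8b 4f 08"),
]

ip, vma_start, seg_vaddr = 0x7F24741BFCA0, 0x7F247414E000, 0x11000

# Address of the faulting instruction as objdump numbers it.
fault_vaddr = ip - vma_start + seg_vaddr

tokens = CODE_LINE.split()
fault_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
kernel_bytes = bytes(int(t.strip("<>"), 16) for t in tokens)
dump_start = fault_vaddr - fault_index

# Build an address -> byte map from the disassembly.
image = {}
for addr, hex_bytes in DISASM:
    for i, value in enumerate(bytes.fromhex(hex_bytes)):
        image[addr + i] = value

# Compare every dumped byte that the transcribed listing covers.
compared = 0
mismatches = []
for i, value in enumerate(kernel_bytes):
    addr = dump_start + i
    if addr in image:
        compared += 1
        if image[addr] != value:
            mismatches.append(hex(addr))

print(hex(fault_vaddr), compared, mismatches)
```

Only the first dumped byte (at 0x82c76) is left out of the comparison, because the table starts at the following instruction.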

root@test-System-Product-Name:/home/shen/segment# gdb ./libGLESv2_XXgpu.so.1.1.213621
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./libGLESv2_XXgpu.so.1.1.213621...
(No debugging symbols found in ./libGLESv2_XXgpu.so.1.1.213621)
(gdb) info symbol glFinish
glFinish in section .text

(gdb) info symbol CompareVariables
No symbol table is loaded.  Use the "file" command.

(gdb) list *(0x0000000000082a10)
No symbol table is loaded.  Use the "file" command.

(gdb) list *(0x0000000000082ca0)
No symbol table is loaded.  Use the "file" command.

(gdb) disassemble glFinish
Dump of assembler code for function glFinish:
   0x0000000000082a10 <+0>:     push   %r12
   0x0000000000082a12 <+2>:     push   %rbp
   0x0000000000082a13 <+3>:     push   %rbx
   0x0000000000082a14 <+4>:     lea    0x166595(%rip),%rdi        # 0x1e8fb0
   0x0000000000082a1b <+11>:    callq  0x11560 <__tls_get_addr@plt>
   0x0000000000082a20 <+16>:    mov    0x8(%rax),%rbx
   0x0000000000082a27 <+23>:    test   %rbx,%rbx
.......
   0x0000000000082c77 <+615>:   mov    0x3000(%rbx),%r8d
   0x0000000000082c7e <+622>:   lea    0x13149e(%rip),%r9        # 0x1b4123
   0x0000000000082c85 <+629>:   mov    %ebp,%ecx
   0x0000000000082c87 <+631>:   mov    $0x41,%edx
   0x0000000000082c8c <+636>:   mov    $0x19,%esi
   0x0000000000082c91 <+641>:   mov    (%rax),%rdi
   0x0000000000082c94 <+644>:   xor    %eax,%eax
   0x0000000000082c96 <+646>:   callq  0x13ee80
   0x0000000000082c9b <+651>:   jmpq   0x82a5c <glFinish+76> // as shown above, jmpq is exactly 5 bytes long, so the next instruction starts at 0x82ca0
End of assembler dump.

(gdb) disassemble 0x0000000000082a10
Dump of assembler code for function glFinish:
   0x0000000000082a10 <+0>:     push   %r12
   0x0000000000082a12 <+2>:     push   %rbp
   0x0000000000082a13 <+3>:     push   %rbx
   0x0000000000082a14 <+4>:     lea    0x166595(%rip),%rdi        # 0x1e8fb0
   0x0000000000082a1b <+11>:    callq  0x11560 <__tls_get_addr@plt>
   0x0000000000082a20 <+16>:    mov    0x8(%rax),%rbx
   0x0000000000082a27 <+23>:    test   %rbx,%rbx
.....
   0x0000000000082c77 <+615>:   mov    0x3000(%rbx),%r8d
   0x0000000000082c7e <+622>:   lea    0x13149e(%rip),%r9        # 0x1b4123
   0x0000000000082c85 <+629>:   mov    %ebp,%ecx
   0x0000000000082c87 <+631>:   mov    $0x41,%edx
   0x0000000000082c8c <+636>:   mov    $0x19,%esi
   0x0000000000082c91 <+641>:   mov    (%rax),%rdi
   0x0000000000082c94 <+644>:   xor    %eax,%eax
   0x0000000000082c96 <+646>:   callq  0x13ee80
   0x0000000000082c9b <+651>:   jmpq   0x82a5c <glFinish+76>
End of assembler dump.

(gdb) disassemble 0x0000000000082ca0
No function contains specified address.

The disassembly of glFinish ends at 0x0000000000082c9b + 5 bytes (the length of the jmpq) = 0x0000000000082ca0. Running disassemble 0x0000000000082ca0 reports "No function contains specified address.".

Run add-symbol-file libGLESv2_XXgpu.dbg to load the debug symbol file. The session then looks like this:

(gdb) add-symbol-file libGLESv2_XXgpu.dbg
add symbol table from file "libGLESv2_XXgpu.dbg"
(y or n) y
Reading symbols from libGLESv2_XXgpu.dbg...
 

(gdb) disassemble 0x0000000000082ca0
Dump of assembler code for function CompareVariables:
   0x0000000000082ca0 <+0>:     movzbl (%rsi),%eax
   0x0000000000082ca3 <+3>:     cmpb   $0x0,(%rdi)
   0x0000000000082ca6 <+6>:     mov    %eax,%edx
   0x0000000000082ca8 <+8>:     je     0x82ccf <CompareVariables+47>
   0x0000000000082caa <+10>:    test   %dl,%dl
   0x0000000000082cac <+12>:    mov    $0xffffffff,%eax
   0x0000000000082cb1 <+17>:    je     0x82ccf <CompareVariables+47>
   0x0000000000082cb3 <+19>:    mov    0x8(%rdi),%ecx
   0x0000000000082cb6 <+22>:    mov    0x8(%rsi),%edx
   0x0000000000082cb9 <+25>:    add    $0x1,%ecx
   0x0000000000082cbc <+28>:    add    $0x1,%edx
   0x0000000000082cbf <+31>:    sub    0x4(%rdi),%ecx
   0x0000000000082cc2 <+34>:    sub    0x4(%rsi),%edx
   0x0000000000082cc5 <+37>:    cmp    %edx,%ecx
   0x0000000000082cc7 <+39>:    ja     0x82ccf <CompareVariables+47>
   0x0000000000082cc9 <+41>:    setb   %al
   0x0000000000082ccc <+44>:    movzbl %al,%eax
   0x0000000000082ccf <+47>:    repz retq
End of assembler dump.


(gdb) list *(0x0000000000082ca0)
0x82ca0 is in CompareVariables (compiler/psc/inst.c:607).
602     compiler/psc/inst.c: No such file or directory.

(gdb) info symbol CompareVariables
CompareVariables in section .text of /home/segment/libGLESv2_XXgpu.dbg


(gdb) list CompareVariables
602     compiler/psc/inst.c: No such file or directory.

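0x0000000000082ca0 is exactly the first instruction of the function CompareVariables. The debugging environment has no source code, so list *(0x0000000000082ca0) reports "compiler/psc/inst.c: No such file or directory.".

The source code of CompareVariables is as follows: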

typedef struct tagPSC_VARIABLE {
    IMG_BOOL bInUse;
    IMG_UINT32 ui32FirstID;
    IMG_UINT32 ui32LastID;
    IMG_UINT32 ui32AlignmentInDW;
    IMG_UINT32 ui32LifetimeStart;
    IMG_UINT32 ui32LifetimeEnd;
    IMG_UINT32 ui32FirstHwReg;
    IMG_UINT32 ui32LastHwReg;
} PSC_VARIABLE;

typedef struct tagPSC_CONTEXT
{
......
    PSC_VARIABLE *psVariables;

......
    LABEL_LOCATION *psLabels;
    LABEL_REQUEST *psLabelRequests;
......
}PSC_CONTEXT;

static int C_CALLCONV CompareVariables(void const *pvLhs, void const *pvRhs)
{
    PSC_VARIABLE const *psLhs = pvLhs;
    PSC_VARIABLE const *psRhs = pvRhs;

    if (psLhs->bInUse && psRhs->bInUse)
    {
        IMG_UINT32 ui32LhsSize = psLhs->ui32LastID - psLhs->ui32FirstID + 1;
        IMG_UINT32 ui32RhsSize = psRhs->ui32LastID - psRhs->ui32FirstID + 1;
        if (ui32LhsSize > ui32RhsSize)
        {
            return -1;
        }
        else if (ui32LhsSize < ui32RhsSize)
        {
            return 1;
        }
        else
        {
            return 0;
        }
    }
    else if (psLhs->bInUse)
    {
        return -1;
    }
    else if (psRhs->bInUse)
    {
        return 1;
    }
    return 0;
}

// CompareVariables is only called from CompilePreAmble
XX_INTERNAL void CompilePreAmble(PPSC_CONTEXT psContext)
{

    ......

    /*
     * Sort the variables from largest to smallest and count how many there are.
     */
    if (psContext->ui32VariablesCapacity > 0)
    {
        qsort(psContext->psVariables, psContext->ui32VariablesCapacity, sizeof(PSC_VARIABLE), CompareVariables);
    }

    ......
}
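The field offsets of PSC_VARIABLE matter when mapping memory operands such as 0x4(%rdi) and 0x8(%rsi) back to structure members. The Python ctypes sketch below reproduces the layout under one assumption that the source shown here does not confirm: IMG_BOOL is taken to be a 1-byte type, which is consistent with the byte-sized accesses (movzbl, cmpb) in the disassembly.

```python
import ctypes


class PSC_VARIABLE(ctypes.Structure):
    # Assumption: IMG_BOOL is one byte wide; IMG_UINT32 is a 32-bit unsigned int.
    _fields_ = [
        ("bInUse", ctypes.c_uint8),
        ("ui32FirstID", ctypes.c_uint32),
        ("ui32LastID", ctypes.c_uint32),
        ("ui32AlignmentInDW", ctypes.c_uint32),
        ("ui32LifetimeStart", ctypes.c_uint32),
        ("ui32LifetimeEnd", ctypes.c_uint32),
        ("ui32FirstHwReg", ctypes.c_uint32),
        ("ui32LastHwReg", ctypes.c_uint32),
    ]


# Natural alignment pads bInUse to 4 bytes, so the IDs land at 4 and 8.
offsets = {name: getattr(PSC_VARIABLE, name).offset for name, _ in PSC_VARIABLE._fields_}
print(offsets["bInUse"], offsets["ui32FirstID"], offsets["ui32LastID"],
      ctypes.sizeof(PSC_VARIABLE))
```

Under this assumption, (%rsi) is bInUse, 0x4(%rsi) is ui32FirstID and 0x8(%rsi) is ui32LastID.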

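Reading the disassembly of CompareVariables against its source code gives the following:

AT&T and Intel assembly syntax differ; for example, the source and destination operands are in opposite order. In Intel syntax the first operand is the destination and the second is the source. The listing below uses AT&T syntax, where the source comes first: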
(gdb) disassemble 0x0000000000082ca0
Dump of assembler code for function CompareVariables:
   0x0000000000082ca0 <+0>:     movzbl (%rsi),%eax         // (%rsi) is psRhs->bInUse
   0x0000000000082ca3 <+3>:     cmpb   $0x0,(%rdi)         // (%rdi) is psLhs->bInUse
   0x0000000000082ca6 <+6>:     mov    %eax,%edx           // edx = psRhs->bInUse
   0x0000000000082ca8 <+8>:     je     0x82ccf <CompareVariables+47>  // jump if psLhs->bInUse == 0; the return value eax is psRhs->bInUse
   0x0000000000082caa <+10>:    test   %dl,%dl            // AND psRhs->bInUse with itself; ZF=1 if the result is 0, otherwise ZF=0
   0x0000000000082cac <+12>:    mov    $0xffffffff,%eax    // eax = -1
   0x0000000000082cb1 <+17>:    je     0x82ccf <CompareVariables+47> // jump if psRhs->bInUse == 0 (ZF == 1); the return value eax was already set to -1 by the previous instruction
   0x0000000000082cb3 <+19>:    mov    0x8(%rdi),%ecx  // 0x8(%rdi) is psLhs->ui32LastID
   0x0000000000082cb6 <+22>:    mov    0x8(%rsi),%edx  // 0x8(%rsi) is psRhs->ui32LastID
   0x0000000000082cb9 <+25>:    add    $0x1,%ecx
   0x0000000000082cbc <+28>:    add    $0x1,%edx
   0x0000000000082cbf <+31>:    sub    0x4(%rdi),%ecx  // 0x4(%rdi) is psLhs->ui32FirstID; this computes ui32LhsSize = psLhs->ui32LastID - psLhs->ui32FirstID + 1;
   0x0000000000082cc2 <+34>:    sub    0x4(%rsi),%edx  // 0x4(%rsi) is psRhs->ui32FirstID; this computes ui32RhsSize = psRhs->ui32LastID - psRhs->ui32FirstID + 1;
   0x0000000000082cc5 <+37>:    cmp    %edx,%ecx      // edx is the source operand, ecx is the destination operand
   0x0000000000082cc7 <+39>:    ja     0x82ccf <CompareVariables+47> // jump if ecx > edx (unsigned); eax is still -1
   0x0000000000082cc9 <+41>:    setb   %al    // set al from the preceding cmp: 1 if ecx < edx (unsigned), otherwise 0
   0x0000000000082ccc <+44>:    movzbl %al,%eax  // zero-extend al into eax, the function's return value
   0x0000000000082ccf <+47>:    repz retq
End of assembler dump.

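The faulting instruction loads psRhs->bInUse into eax, so the fault occurred while reading the data that psRhs points to. The psRhs pointer was presumably bad; according to the kernel message, its value was 0x1f2e8.

No stack or register information from the time of the fault is available, so the assembly cannot be analyzed any further. What remains is to review the code that calls CompareVariables. A review of every call path found no problem, so memory corruption is suspected.

The testers were asked to enable coredump generation so that the analysis can continue once the problem reproduces. Because CompareVariables is only used by CompilePreAmble (as the qsort comparator), pvRhs points to an element of the array that psContext->psVariables points to. With a coredump file, the register values at the time of the fault can be used to work back to the value of psContext and to check whether its other members look sane. If they do not, memory was probably corrupted.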
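To double-check that the annotated assembly really implements CompareVariables, both can be modeled in Python and compared over a set of inputs. This is a sketch written for this article: compare_c transcribes the C source, compare_asm follows the instructions one by one, and it assumes bInUse only ever holds 0 or 1.

```python
from collections import namedtuple
from itertools import product

Var = namedtuple("Var", "in_use first_id last_id")
MASK32 = 0xFFFFFFFF


def compare_c(lhs, rhs):
    """Direct transcription of the C source (uint32 arithmetic wraps)."""
    if lhs.in_use and rhs.in_use:
        lhs_size = (lhs.last_id - lhs.first_id + 1) & MASK32
        rhs_size = (rhs.last_id - rhs.first_id + 1) & MASK32
        if lhs_size > rhs_size:
            return -1
        elif lhs_size < rhs_size:
            return 1
        return 0
    elif lhs.in_use:
        return -1
    elif rhs.in_use:
        return 1
    return 0


def compare_asm(lhs, rhs):
    """Step-by-step model of the disassembly at 0x82ca0."""
    eax = rhs.in_use                               # movzbl (%rsi),%eax
    edx = eax                                      # mov %eax,%edx
    if lhs.in_use == 0:                            # cmpb $0x0,(%rdi); je +47
        return eax
    eax = -1                                       # mov $0xffffffff,%eax
    if (edx & 0xFF) == 0:                          # test %dl,%dl; je +47
        return eax
    ecx = (lhs.last_id + 1 - lhs.first_id) & MASK32  # mov/add/sub on %ecx
    edx = (rhs.last_id + 1 - rhs.first_id) & MASK32  # mov/add/sub on %edx
    if ecx > edx:                                  # cmp %edx,%ecx; ja +47
        return eax
    return 1 if ecx < edx else 0                   # setb %al; movzbl %al,%eax


values = [Var(u, f, l) for u, f, l in product((0, 1), (0, 5, MASK32), (0, 5, MASK32))]
all_equal = all(compare_c(a, b) == compare_asm(a, b) for a, b in product(values, values))
print(all_equal)
```

The two models agree on every tested pair, which supports reading (%rsi) as psRhs->bInUse.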
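One possible way to make sure a test run can produce a core file is to raise the core size limit in a small launcher before starting the program. The Python sketch below is only an illustration; the function names are hypothetical, and where the core file ends up still depends on the system's /proc/sys/kernel/core_pattern setting.

```python
import resource
import subprocess


def enable_core_dumps():
    """Raise the soft RLIMIT_CORE to the hard limit and return the new limits."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv):
    """Run argv as a child process with the raised core limit applied."""
    # preexec_fn runs in the child between fork and exec (POSIX only).
    return subprocess.run(argv, preexec_fn=enable_core_dumps).returncode


# Example (not executed here): run_with_core_dumps(["glmark2-es2"])
```

If the hard limit itself is 0, an unprivileged process cannot raise it, and the limit has to be changed by the administrator instead.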

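Supplement

Viewing the addresses of a process's shared libraries

The command cat /proc/[pid]/maps shows the virtual address range of every shared library used by a process. Each line corresponds to one distinct VMA in the kernel: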

root@test-System-Product-Name:/home# cat /proc/50859/maps 
55e0c4b51000-55e0c4b5c000 r--p 00000000 08:02 21759090                   /usr/sbin/sshd
55e0c4b5c000-55e0c4bdb000 r-xp 0000b000 08:02 21759090                   /usr/sbin/sshd
55e0c4bdb000-55e0c4c23000 r--p 0008a000 08:02 21759090                   /usr/sbin/sshd
55e0c4c23000-55e0c4c27000 r--p 000d1000 08:02 21759090                   /usr/sbin/sshd
55e0c4c27000-55e0c4c28000 rw-p 000d5000 08:02 21759090                   /usr/sbin/sshd
55e0c4c28000-55e0c4c2d000 rw-p 00000000 00:00 0
55e0c609b000-55e0c6149000 rw-p 00000000 00:00 0                          [heap]
7f28e424c000-7f28e4252000 r--p 00000000 08:02 21760411                   /usr/lib/x86_64-linux-gnu/libnss_systemd.so.2
7f28e4252000-7f28e4278000 r-xp 00006000 08:02 21760411                   /usr/lib/x86_64-linux-gnu/libnss_systemd.so.2
7f28e4278000-7f28e4284000 r--p 0002c000 08:02 21760411                   /usr/lib/x86_64-linux-gnu/libnss_systemd.so.2
7f28e4284000-7f28e4287000 r--p 00037000 08:02 21760411                   /usr/lib/x86_64-linux-gnu/libnss_systemd.so.2
7f28e4287000-7f28e4288000 rw-p 0003a000 08:02 21760411                   /usr/lib/x86_64-linux-gnu/libnss_systemd.so.2
7f28e42e5000-7f28e42e7000 r--p 00000000 08:02 22417503                   /usr/lib/x86_64-linux-gnu/security/pam_gnome_keyring.so
7f28e42e7000-7f28e42ed000 r-xp 00002000 08:02 22417503                   /usr/lib/x86_64-linux-gnu/security/pam_gnome_keyring.so
7f28e42ed000-7f28e42f0000 r--p 00008000 08:02 22417503                   /usr/lib/x86_64-linux-gnu/security/pam_gnome_keyring.so
7f28e42f0000-7f28e42f1000 r--p 0000a000 08:02 22417503                   /usr/lib/x86_64-linux-gnu/security/pam_gnome_keyring.so
7f28e42f1000-7f28e42f2000 rw-p 0000b000 08:02 22417503                   /usr/lib/x86_64-linux-gnu/security/pam_gnome_keyring.so
......
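Looking up which mapping contains a given address can be automated. The Python sketch below parses a few lines copied from the maps output above; parse_maps and find_mapping are hypothetical helper names. For a file-backed mapping, the offset into the file is the address minus the mapping start plus the offset column.

```python
MAPS_SAMPLE = """\
55e0c4b51000-55e0c4b5c000 r--p 00000000 08:02 21759090                   /usr/sbin/sshd
55e0c4b5c000-55e0c4bdb000 r-xp 0000b000 08:02 21759090                   /usr/sbin/sshd
55e0c4c28000-55e0c4c2d000 rw-p 00000000 00:00 0
55e0c609b000-55e0c6149000 rw-p 00000000 00:00 0                          [heap]
"""


def parse_maps(text):
    """Yield one dict per line of /proc/<pid>/maps content."""
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue
        start, end = (int(x, 16) for x in parts[0].split("-"))
        yield {
            "start": start,
            "end": end,
            "perms": parts[1],
            "offset": int(parts[2], 16),
            "path": parts[5].strip() if len(parts) > 5 else "",
        }


def find_mapping(text, addr):
    """Return the mapping that contains addr, or None."""
    for mapping in parse_maps(text):
        if mapping["start"] <= addr < mapping["end"]:
            return mapping
    return None


hit = find_mapping(MAPS_SAMPLE, 0x55E0C4B5C123)
file_offset = 0x55E0C4B5C123 - hit["start"] + hit["offset"]
print(hit["path"], hit["perms"], hex(file_offset))
```

On a live system the same functions can be fed the contents of /proc/<pid>/maps read from disk.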

Loading a debug symbol file at a specific address in gdb

When debugging with gdb, if the shared library is a release build, the debug symbol file can be loaded as follows:

(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x0000fffff7fcd0c0  0x0000fffff7fe5468  Yes (*)     /lib/ld-linux-aarch64.so.1
0x0000fffff7f9f890  0x0000fffff7fb65c0  Yes (*)     /usr/local/lib/libtest.so.1
0x0000fffff7e4bbc0  0x0000fffff7f3b190  Yes         /lib/aarch64-linux-gnu/libc.so.6
0x0000fffff7dfea50  0x0000fffff7e0ddec  Yes         /lib/aarch64-linux-gnu/libpthread.so.0

Load the symbol file with the address from the share library

(gdb) add-symbol-file ./libsrc/libtest.dbg 0x0000fffff7f9f890
add symbol table from file "./libsrc/libtest.dbg" at
 .text_addr = 0xfffff7f9f890
(y or n) y
Reading symbols from ./libsrc/libtest.dbg...

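Here 0x0000fffff7f9f890 is the address at which the library's .text section is loaded in the process (gdb echoes it as .text_addr). It is the value in the From column of info sharedlibrary.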
