Bug 66077

Summary: Oops reading i915_error_state
Product: DRI Reporter: Chris Wilson <chris>
Component: DRM/IntelAssignee: Mika Kuoppala <mika.kuoppala>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker    
Priority: highest    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Chris Wilson 2013-06-23 12:31:26 UTC
[76973.700434] BUG: unable to handle kernel paging request at 0000000000001000
    [76973.700468] IP: [<ffffffff811fb84d>] strnlen+0xd/0x40
    [76973.700494] PGD 10387b067 PUD 15411a067 PMD 0
    [76973.700515] Oops: 0000 [#1] SMP
    [76973.700529] Modules linked in: cpufreq_userspace cpufreq_powersave cpufreq_stats cpufreq_conservative binfmt_misc nfs lockd fscache sunrpc coretemp crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core acpi_cpufreq mperf evdev processor ext4 crc16 jbd2 mbcache sd_mod crc_t10dif ahci libahci libata scsi_mod thermal fan
    [76973.700718] CPU: 2 PID: 13279 Comm: intel_error_dec Not tainted 3.10.0-rc6+ #99
    [76973.700746] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
    [76973.700793] task: ffff880153fa2100 ti: ffff880152310000 task.ti: ffff880152310000
    [76973.700820] RIP: 0010:[<ffffffff811fb84d>]  [<ffffffff811fb84d>] strnlen+0xd/0x40
    [76973.700849] RSP: 0018:ffff880152311cb0  EFLAGS: 00010286
    [76973.700869] RAX: ffffffff81774000 RBX: ffff880154000072 RCX: fffffffffffffffe
    [76973.700895] RDX: 0000000000001000 RSI: ffffffffffffffff RDI: 0000000000001000
    [76973.700920] RBP: ffff880152311cb0 R08: 000000000000ffff R09: 000000000000ffff
    [76973.700948] R10: 0000000000000000 R11: 000000000000000f R12: 0000000000001000
    [76973.700974] R13: ffff880154001000 R14: 00000000ffffffff R15: 0000000000000000
    [76973.701000] FS:  00007f00fc942880(0000) GS:ffff88015e300000(0000) knlGS:0000000000000000
    [76973.701029] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [76973.701050] CR2: 0000000000001000 CR3: 000000015939a000 CR4: 00000000001407e0
    [76973.701075] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [76973.701103] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [76973.701128] Stack:
    [76973.701136]  ffff880152311ce8 ffffffff811fdafb ffff880154000072 ffff880154001000
    [76973.701167]  ffff880152311d58 ffffffff817a703e ffffffff817a703e ffff880152311d48
    [76973.701198]  ffffffff811fe8a9 0000000a0000ffff 0000000000000001 ffff880154000000
    [76973.701229] Call Trace:
    [76973.701242]  [<ffffffff811fdafb>] string.isra.5+0x3b/0xf0
    [76973.701264]  [<ffffffff811fe8a9>] vsnprintf+0x1d9/0x670
    [76973.701286]  [<ffffffff812d13c9>] i915_error_printf+0xb9/0x160
    [76973.701311]  [<ffffffff812d15fc>] print_error_buffers+0x18c/0x1e0
    [76973.701335]  [<ffffffff812d1a5e>] i915_error_state_read+0x40e/0x8e0
    [76973.701361]  [<ffffffff81114527>] vfs_read+0x97/0x160
    [76973.701381]  [<ffffffff81114f74>] SyS_read+0x44/0x90
    [76973.701403]  [<ffffffff8141a552>] system_call_fastpath+0x16/0x1b
    [76973.701424] Code: c0 01 80 38 00 75 f7 48 29 f8 5d c3 31 c0 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d 4e ff 48 85 f6 48 89 e5 74 2a <80> 3f 00 74 25 48 89 f8 31 d2 eb 10 0f 1f 80 00 00 00 00 48 83
    [76973.701535] RIP  [<ffffffff811fb84d>] strnlen+0xd/0x40
    [76973.701550]  RSP <ffff880152311cb0>
    [76973.701559] CR2: 0000000000001000

The patch I sent doesn't fix the bug - it seems to be using "%s", (void *)4096. Bizarre.
Comment 1 Chris Wilson 2013-06-24 14:42:22 UTC
This is just stack corruption. If I break the long err_printf() of the buffer object into multiple calls, then It Just Works (TM).
Comment 2 Chris Wilson 2013-06-30 08:27:58 UTC
commit c7815d42dfce7439cde3f23766e7dea59ffa7e1a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 29 23:26:50 2013 +0100

    drm/i915: Break up the large vsnprintf() in print_error_buffers()
    
    So it appears that I have encountered some bogosity when trying to call
    i915_error_printf() with many arguments from print_error_buffers(). The
    symptom is that the vsnprintf parser tries to interpret an integer arg
    as a character string, the resulting OOPS indicating stack corruption.
    Replacing the single call with its 13 format specifiers and arguments
    with multiple calls to i915_error_printf() worked fine. This patch goes
    one step further and introduced i915_error_puts() to pass the strings
    simply.
    
    It may not fix the root cause, but it does prevent my box from dying and
    I think helps make print_error_buffers() more friendly.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66077
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch
Comment 3 Mika Kuoppala 2014-01-24 11:14:19 UTC
commit e29bb4ebbf000ff9ac081d29784a3331618f012e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 20 10:20:59 2013 +0100

    drm/i915: Use a temporary va_list for two-pass string handling
    
    In
    
    commit edc3d8848dc9fe2a470316363dab8ef211d77e01
    Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Date:   Thu May 23 13:55:35 2013 +0300
    
        drm/i915: avoid big kmallocs on reading error state
    
    we introduce a two-pass mechanism for splitting long strings being
    formatted into the error-state. The first pass finds the length, and the
    second pass emits the right portion of the string into the accumulation
    buffer. Unfortunately we use the same va_list for both passes, resulting
    in the second pass reading garbage off the end of the argument list. As
    the two passes are only used for boundaries between read() calls, the
    corruption is only rarely seen.
    
    This fixes the root cause behind
    
    commit baf27f9b17bf2f369f3865e38c41d2163e8d815d
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Sat Jun 29 23:26:50 2013 +0100
    
        drm/i915: Break up the large vsnprintf() in print_error_buffers()
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: stable@vger.kernel.org
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.