Bug 31404

Summary: [SNB] some simple operations will cause kernel oops
Product: DRI
Reporter: zhao jian <jian.j.zhao>
Component: DRM/Intel
Assignee: Zou Nan hai <nanhai.zou>
Status: CLOSED FIXED
QA Contact:
Severity: critical
Priority: highest
CC: chris, jbarnes
Version: XOrg git
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
xorg.0.log (flags: none)
dmesg of the kernel oops when doing rendercheck (flags: none)

Description zhao jian 2010-11-04 23:54:14 UTC
Environment:
------------
platform: HuronRiver
Libdrm:         (master)2.4.22-12-ga52e61b5c888444435929a2770f14109c3a94f2f
Mesa:           (master)d3fcadf8400360f4db45a4deb45b3b260e880b49
Xserver:        (master)xorg-server-1.9.0-184-ga52efb096e166e325deb3d6b502671f339a4fa15
Xf86_video_intel: (master)2.12.902-43-g52b32436b9e14a3e13818f80102150ff5bc3c002
Kernel: (drm-intel-fixes)16a02cf08a2de0863daf7ebb91718d7c6bbe7f9c

Bug detailed description:
--------------------------
We start X and run some rendercheck tests; after about 5 minutes a kernel oops occurs. Alternatively, if we just start GNOME, move a window and type on the keyboard, the kernel also oopses. The oopses occur on both the render ring and the blt ring.

Reproduce steps: 
--------------------------
1. xinit
2. rendercheck -o src,over,overreverse,xor -t blend
Comment 1 zhao jian 2010-11-05 00:02:45 UTC
Created attachment 40059 [details]
xorg.0.log
Comment 2 zhao jian 2010-11-05 00:03:59 UTC
Created attachment 40060 [details]
dmesg of the kernel oops when doing rendercheck
Comment 3 Hai 2010-11-05 00:12:16 UTC
When using mplayer to play a video with xv output, the system stops responding (an oops is reported).

Reproduce steps:
----------------
1. xinit
2. mplayer -vo xv mediafile
3. move and resize window
Comment 4 Hai 2010-11-05 00:25:09 UTC
The following is the syslog output from playing a video with xv output.

Message from syslogd@x-hnr1 at Nov  6 02:44:44 ...
 kernel:Oops: 0002 [#1] SMP

Message from syslogd@x-hnr1 at Nov  6 02:44:44 ...
 kernel:last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:02/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full

Message from syslogd@x-hnr1 at Nov  6 02:44:44 ...
 kernel:Process X (pid: 2848, ti=f5a20000 task=f59a2580 task.ti=f5a20000)

Message from syslogd@x-hnr1 at Nov  6 02:44:44 ...
 kernel:Stack:

Message from syslogd@x-hnr1 at Nov  6 02:44:44 ...
 kernel:Call Trace:

Message from syslogd@x-hnr1 at Nov  6 02:44:44 ...
 kernel:Code: 20 8b 53 10 c7 04 02 01 00 80 10 8b 43 20 8d 50 04 89 53 20 8b 53 10 c7 44 02 04 80 00 00 00 8b 43 20 8d 50 04 89 53 20 8b 53 10 <89> 7c 02 04 8b 43 20 8d 50 04 89 53 20 8b 53 10 c7 44 02 04 00

Message from syslogd@x-hnr1 at Nov  6 02:44:44 ...
 kernel:EIP: [<f8425658>] blt_ring_add_request+0x5d/0xa5 [i915] SS:ESP 0068:f5a21d94

Message from syslogd@x-hnr1 at Nov  6 02:44:44 ...
 kernel:CR2: 00000000f8b60000
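
For readers of the log above: on x86, the number after "Oops:" is the page-fault error code in hex, and CR2 holds the faulting address. The snippet below is only a small reading aid, assuming this oops is a page fault (the CR2 line suggests so); the function name is made up for illustration and is not part of any kernel tool.

```python
def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # bit 0: 0 = page not present, 1 = protection violation
        "write_access": bool(code & 0x2),  # bit 1: 0 = read, 1 = write
        "user_mode": bool(code & 0x4),     # bit 2: 0 = kernel mode, 1 = user mode
    }


# "Oops: 0002" from the syslog output above
info = decode_page_fault_error(0x0002)
print(info)
```

Under that assumption, 0002 reads as a kernel-mode write to a page that is not mapped, at the address shown in CR2 (f8b60000).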
Comment 5 Wang Zhenyu 2010-11-05 01:50:42 UTC
Note that I can't see this on sandybridge desktop with D2 CPU. So this might be hw stepping related.
Comment 6 Chris Wilson 2010-11-05 01:59:48 UTC
This is a side effect of bug 31370. I've cherry-picked the workaround from -next onto -staging that should prevent this:

commit a3d4677623ca677c18f90484157be69c4c5f8312
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 5 08:56:38 2010 +0000

    drm/i915/ringbuffer: Use the HEAD auto-reporting mechanism
    
    My Sandybridge only reports 0 for the ring buffer registers, causing it
    to hang as soon as we exhaust the available ring. As a workaround, take
    advantage of our huge ring buffers and use the auto-reporting mechanism
    to update the status page with the HEAD location every 64 KiB.
    
    Cherry-picked from 6aa56062eaba67adfb247cded244fd877329588d.
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
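
To illustrate the commit message above, here is a toy model of circular-buffer space accounting. It is a sketch only, not the i915 code: the ring size, the 8-byte gap between TAIL and HEAD, and all function names are assumptions chosen for illustration. Only the 64 KiB reporting interval and the "register reads 0" behaviour come from the commit message.

```python
RING_SIZE = 128 * 1024    # assumed ring size in bytes (illustration only)
REPORT_STEP = 64 * 1024   # HEAD is auto-reported every 64 KiB (from the commit message)
RESERVED = 8              # assumed gap kept between TAIL and HEAD


def ring_space(head, tail, size=RING_SIZE):
    """Free bytes the writer may use before TAIL would catch up with HEAD."""
    space = head - (tail + RESERVED)
    if space < 0:
        space += size
    return space


def broken_register_head(true_head):
    """Model of the failure: the HEAD register always reads back 0."""
    return 0


def auto_reported_head(true_head, step=REPORT_STEP):
    """Model of the workaround: the status page holds HEAD as of the last
    64 KiB boundary it crossed, i.e. the true value rounded down."""
    return (true_head // step) * step


# The writer is 16 bytes short of the end of the ring and the GPU has
# consumed everything, so in reality the ring is empty.
tail = RING_SIZE - 16
true_head = tail

space_true = ring_space(true_head, tail)
space_broken = ring_space(broken_register_head(true_head), tail)
space_reported = ring_space(auto_reported_head(true_head), tail)
print(space_true, space_broken, space_reported)
```

In this model a HEAD that always reads 0 leaves only 8 bytes of apparent space no matter how far the GPU has progressed, so a writer waiting for more room would wait forever. The auto-reported value lags the true HEAD by less than 64 KiB, which in this example understates the free space but still lets the writer make progress.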
Comment 7 Chris Wilson 2010-11-05 02:01:05 UTC
(In reply to comment #5)
> Note that I can't see this on sandybridge desktop with D2 CPU. So this might be
> hw stepping related.

What does /sys/kernel/debug/dri/0/i915_*ringbuffer_info report? Can you start drm-intel-next without it barfing? And what revision is SNB D2?
Comment 8 zhao jian 2010-11-05 08:06:46 UTC
(In reply to comment #6)
> This is a side effect of bug 31370. I've cherry-picked the workaround from
> -next onto -staging that should prevent this:
> 
> commit a3d4677623ca677c18f90484157be69c4c5f8312
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Nov 5 08:56:38 2010 +0000
> 
>     drm/i915/ringbuffer: Use the HEAD auto-reporting mechanism
> 
>     My Sandybridge only reports 0 for the ring buffer registers, causing it
>     to hang as soon as we exhaust the available ring. As a workaround, take
>     advantage of our huge ring buffers and use the auto-reporting mechanism
>     to update the status page with the HEAD location every 64 KiB.
> 
>     Cherry-picked from 6aa56062eaba67adfb247cded244fd877329588d.
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Chris, what do you mean by "side effect of bug 31370"? KMS can be enabled without problems here. Of course, I can test this commit by applying it to the drm-intel-fixes branch.
Comment 9 Chris Wilson 2010-11-05 08:15:06 UTC
Bug 31370 is that the hw doesn't read the correct values from the ringbuffer registers, which is the root cause of the problem here.
Comment 10 zhao jian 2010-11-07 22:28:41 UTC
(In reply to comment #6)
> This is a side effect of bug 31370. I've cherry-picked the workaround from
> -next onto -staging that should prevent this:
> commit a3d4677623ca677c18f90484157be69c4c5f8312
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Nov 5 08:56:38 2010 +0000
>     drm/i915/ringbuffer: Use the HEAD auto-reporting mechanism
>     My Sandybridge only reports 0 for the ring buffer registers, causing it
>     to hang as soon as we exhaust the available ring. As a workaround, take
>     advantage of our huge ring buffers and use the auto-reporting mechanism
>     to update the status page with the HEAD location every 64 KiB.
>     Cherry-picked from 6aa56062eaba67adfb247cded244fd877329588d.
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

We tested with commit a3d4677623ca677c18f90484157be69c4c5f8312 on the -staging branch; both rendercheck and media playback work well without a kernel oops. So Chris, can you pull it into the -fixes branch? Thanks.
Comment 11 zhao jian 2010-11-09 01:08:08 UTC
Commit a3d4677623ca677c18f90484157be69c4c5f8312 has now been cherry-picked to drm-intel-fixes, and it works well with 3f8ff0e72d75fdbe7f2cba2c4015fd9fdd9e13fd on the drm-intel-fixes branch, so closing.
Comment 12 Jari Tahvanainen 2016-09-23 10:38:44 UTC
Verified->Closed.
