52424 – [Bisected SNB Regression rc6] glxgears causes GPU hung

Summary: [Bisected SNB Regression rc6] glxgears causes GPU hung

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	All Linux (All)

Importance:	high major
Assignee:	Daniel Vetter
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-07-24 02:54 UTC by lu hua
Modified:	2017-10-06 14:48 UTC
CC List:	11 users

See Also:
i915 platform:
i915 features:

Attachments
dmesg (106.41 KB, text/plain) 2012-07-24 02:54 UTC, lu hua	no flags
use 10 usec delay for snb forcewake ack (6.64 KB, patch) 2012-07-25 09:27 UTC, Daniel Vetter	no flags
use 10 usec delay for snb forcewake ack v2 (6.64 KB, patch) 2012-07-25 09:29 UTC, Daniel Vetter	no flags
use posting reads (1.69 KB, patch) 2012-07-26 09:00 UTC, Daniel Vetter	no flags
10 usec dely, simpler patch (801 bytes, patch) 2012-07-26 09:25 UTC, Daniel Vetter	no flags
HNRnetconsole (12.70 KB, text/plain) 2012-07-30 07:36 UTC, lu hua	no flags
dmesg on HuronRiver drm-intel-fixes kernel (74.80 KB, text/plain) 2012-08-07 04:51 UTC, lu hua	no flags
error state (2.32 MB, text/plain) 2012-08-29 01:10 UTC, lu hua	no flags
SNB dmesg (39.42 KB, text/plain) 2012-09-25 06:29 UTC, lu hua	no flags
glxgears hang dmesg (28.71 KB, text/plain) 2012-10-15 03:13 UTC, shui yangwei	no flags
netconsole log (37.52 KB, text/plain) 2012-10-19 02:55 UTC, lu hua	no flags
system hang after glxgears---netconsole messages (43.73 KB, text/plain) 2013-02-05 06:36 UTC, cancan,feng	no flags
system hang after glxgears with recent kernel (27.75 KB, text/plain) 2013-02-17 03:39 UTC, cancan,feng	no flags

Description lu hua 2012-07-24 02:54:07 UTC

Created attachment 64582
dmesg

System Environment:
--------------------------
Arch:               i386
Platform:           sandybridge
Libdrm:             (master) libdrm-2.4.37-11-gfaf26b689d4a2a6d1e851a1ea2fd657406eebfff
Mesa:               (master) cfdf60f236a525a0309146ce2da156bd3856c8b7
Xserver:            (master) xorg-server-1.12.99.902
Xf86_video_intel:   (master) 2.20.1
Libva:              (staging) f12f80371fb534e6bbf248586b3c17c298a31f4e
Libva_intel_driver: (staging) 82fa52510a37ab645daaa3bb7091ff5096a20d0b
Kernel:             (drm-intel-next-queued) b17a616d43882fe1b5818748abcbf89af3033f2d

Bug detailed description:
-------------------------
It happens on Sandy Bridge with the -queued kernel. It doesn't happen with the -fixes kernel.
The last known good commit: 1edc2c89df6cc1730cb2329fbecfe041b8dcc2e0
The last known bad commit:  b17a616d43882fe1b5818748abcbf89af3033f2d

Run ./glxgears
Output:
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
removing GPU device /sys/devices/pci0000:00/0000:00:02.0/drm/card0 152952752
xf86: remove device 0 /sys/devices/pci0000:00/0000:00:02.0/drm/card0
removing GPU device /sys/devices/pci0000:00/0000:00:02.0/drm/card0 152952752
xf86: remove device 0 /sys/devices/pci0000:00/0000:00:02.0/drm/card0
removing GPU device /sys/devices/pci0000:00/0000:00:02.0/drm/card0 152952752
xf86: remove device 0 /sys/devices/pci0000:00/0000:00:02.0/drm/card0
3 frames in 6.0 seconds =  0.500 FPS
removing GPU device /sys/devices/pci0000:00/0000:00:02.0/drm/card0 152952832
xf86: remove device 0 /sys/devices/pci0000:00/0000:00:02.0/drm/card0
removing GPU device /sys/devices/pci0000:00/0000:00:02.0/drm/card0 152952832
xf86: remove device 0 /sys/devices/pci0000:00/0000:00:02.0/drm/card0
intel_do_flush_locked failed: Input/output error

Call trace:
[   45.912225]  [<c022310a>] warn_slowpath_common+0x63/0x78
[   45.912251]  [<f81b890a>] ? gen6_gt_check_fifodbg+0x28/0x3e [i915]
[   45.912255]  [<c0223183>] warn_slowpath_fmt+0x26/0x2a
[   45.912280]  [<f81b890a>] gen6_gt_check_fifodbg+0x28/0x3e [i915]
[   45.912305]  [<f81b894b>] __gen6_gt_force_wake_put+0x13/0x15 [i915]
[   45.912328]  [<f81b8971>] gen6_gt_force_wake_put+0x24/0x32 [i915]
[   45.912352]  [<f81bc2fc>] gen6_ring_put_irq+0x9f/0xa6 [i915]
[   45.912370]  [<f8197e02>] __wait_seqno+0x21f/0x2e9 [i915]
[   45.912374]  [<c0239dc6>] ? remove_wait_queue+0x27/0x27
[   45.912393]  [<f819a134>] i915_gem_throttle_ioctl+0x7f/0xa3 [i915]
[   45.912407]  [<f80c0dc1>] drm_ioctl+0x2d8/0x397 [drm]
[   45.912426]  [<f819a0b5>] ? i915_gem_busy_ioctl+0x81/0x81 [i915]
[   45.912431]  [<c02ea934>] ? fsnotify+0x1b2/0x1c8
[   45.912435]  [<c0272655>] ? __audit_syscall_exit+0x32e/0x349
[   45.912440]  [<c0209a72>] ? syscall_trace_leave+0x2d/0x117
[   45.912452]  [<f80c0ae9>] ? drm_copy_field+0x4f/0x4f [drm]
[   45.912455]  [<c02cdaad>] do_vfs_ioctl+0x43b/0x46c
[   45.912459]  [<c022ec4a>] ? recalc_sigpending+0x12/0x39
[   45.912462] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[   45.912464] [drm:i915_reset] *ERROR* Failed to reset chip.
[   45.912470]  [<c022f6c3>] ? __set_task_blocked+0x66/0x6c
[   45.912473]  [<c0227cc5>] ? do_setitimer+0xb0/0x1a1
[   45.912478]  [<c02c15a0>] ? rw_verify_area+0xc3/0xe6
[   45.912481]  [<c02310d3>] ? __set_current_blocked+0x27/0x39
[   45.912483]  [<c02cdb1f>] sys_ioctl+0x41/0x62
[   45.912488]  [<c053d3ac>] sysenter_do_call+0x12/0x22
[   45.912492]  [<c0530000>] ? rcu_init_percpu_data.constprop.4+0x89/0xc2
[   45.912495] ---[ end trace 7153fccb7e47fa3b ]---

Reproduce steps:
----------------------------
1. xinit
2. ./glxgears

Comment 1 Daniel Vetter 2012-07-24 08:50:57 UTC

Judging by the dmesg, the GPU just falls off the earth and doesn't respond to anything anymore...

Comment 2 Daniel Vetter 2012-07-24 08:52:04 UTC

Can you please retest this on the latest drm-intel-testing branch? The sha1 is from before I rebased the drm-intel-next-queued branch.

Comment 3 Daniel Vetter 2012-07-24 08:53:02 UTC

If it is still broken on drm-intel-testing, please bisect where this issue has been introduced.

Comment 4 lu hua 2012-07-25 07:40:25 UTC

It also happens on the latest drm-intel-testing branch, commit b5430f2760caadd38009.

Comment 5 lu hua 2012-07-25 07:43:06 UTC

Bisect shows: 74792b53cfc2f235bc0e2eef39029817dc2cb726 is the first bad commit
commit 74792b53cfc2f235bc0e2eef39029817dc2cb726
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Sun Jul 15 09:42:38 2012 +0100
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Fri Jul 20 12:21:42 2012 +0200

    drm/i915: Workaround hang with BSD and forcewake on SandyBridge

    For reasons that are not apparent to anybody, 990bbdadaba (drm/i915:
    Group the GT routines together in both code and vtable) breaks the use
    of the BitStream Decoder ring on SandyBridge. The active ingredient of
    that patch is the conversion from a udelay(10) to a udelay(1) in the
    busy-wait loop of waiting for the forcewake acknowledge. If we restore
    that udelay(10) or insert another udelay(1) afterwards (or any wait
    longer than 250ns) everything works again. An alternative is also to
    remove any delay from the busy-wait loop.

    Given that in the atomic sections we want to complete the wait as quick
    as possible to avoid blocking the CPU for too long, it makes sense to
    remove the delay altogether and simply spin on the exit condition until
    it completes. So we replace the udelay(1) with cpu_relax().

    Papers over regression from

    commit 990bbdadabaa51828e475eda86ee5720a4910cc3
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Mon Jul 2 11:51:02 2012 -0300

        drm/i915: Group the GT routines together in both code and vtable

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51738
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
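The change this commit message describes, dropping the fixed udelay() from the forcewake-ack poll loop and simply spinning on the exit condition, can be illustrated with a userspace sketch. Everything below is hypothetical and is not the i915 code: `struct fake_gt`, `fake_read_ack()` and `wait_for_ack()` are invented to model an ack bit that the "hardware" raises after a few reads, and `relax()` uses a C11 compiler fence only as a rough stand-in for the kernel's `cpu_relax()`. The poll budget is an assumption added so that a GT that never acknowledges produces a timeout instead of an endless spin, in the spirit of the "Force wake wait timed out" error that appears later in this report.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a GT that raises an ack bit after some latency. */
struct fake_gt {
    volatile uint32_t ack;    /* bit 0 is the (simulated) forcewake ack */
    unsigned polls_until_ack; /* reads that return 0 before the ack appears */
};

/* Simulated MMIO read of the ack register. */
static uint32_t fake_read_ack(struct fake_gt *gt)
{
    if (gt->polls_until_ack > 0)
        gt->polls_until_ack--;
    else
        gt->ack |= 1u;
    return gt->ack;
}

/* Rough stand-in for cpu_relax(): a compiler barrier only, no PAUSE hint. */
static inline void relax(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Spin on the exit condition with no fixed delay between polls, but give
 * up after max_polls reads so a GT that never acks cannot spin the CPU
 * forever. Returns true if the ack was seen, false on timeout.
 */
static bool wait_for_ack(struct fake_gt *gt, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (fake_read_ack(gt) & 1u)
            return true;
        relax();
    }
    return false;
}
```

With `polls_until_ack = 3`, the ack shows up on the fourth read, so `wait_for_ack(&gt, 10)` returns true while `wait_for_ack(&gt, 3)` returns false. The sketch only shows the shape of the loop; it cannot reproduce the timing sensitivity (udelay(1) vs. udelay(10) vs. no delay) that the commit message reports on real SandyBridge hardware.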

Comment 6 Chris Wilson 2012-07-25 08:00:29 UTC

Confirmed the revert.

Comment 7 Chris Wilson 2012-07-25 08:24:09 UTC

I suspect that in my case it is simply the GPU never completing the ACK that then hard-hangs the machine.

Comment 8 Daniel Vetter 2012-07-25 09:27:02 UTC

Created attachment 64646
use 10 usec delay for snb forcewake ack

Can you please test this patch?

Comment 9 Daniel Vetter 2012-07-25 09:29:02 UTC

Created attachment 64648
use 10 usec delay for snb forcewake ack v2

Meh, I've attached the wrong patch.

Comment 10 Chris Wilson 2012-07-25 10:07:37 UTC

This is only affecting one particular machine of mine, an i5-2500 gt1 desktop. The two laptops I have work fine, and non-GL x11perf/cairo-traces work fine.

Hmm, I suspect the difference between when I was originally playing with the patch and drm-intel-next-queued (dinq) is the enabling of contexts...

Comment 11 Chris Wilson 2012-07-25 10:25:01 UTC

Assuming I can trust dev_priv->no_hw_contexts, the switch to hw contexts (alone, at least) is not a factor.

Comment 12 lu hua 2012-07-26 07:49:37 UTC

Retested on the -queued kernel, commit f27b92651d72e863c3, with the patch. The issue still exists.

Comment 13 Daniel Vetter 2012-07-26 09:00:00 UTC

Created attachment 64704
use posting reads

Another patch for you to try. If this one works, please also check whether bug #51738 is fixed by it.

Comment 14 Chris Wilson 2012-07-26 09:19:52 UTC

A posting flush on top of the dropped patch is sufficient to make glxgears happy. I would prefer to use the cpu_relax() patch, as it is closer to the kernel documentation on how to busy-wait.
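The "posting read" idea discussed here can be shown with a toy model. On PCI, an MMIO write may be buffered ("posted") on its way to the device, and a later read from the same device is not allowed to overtake it, so reading a register back forces the earlier write to land before the CPU moves on. The model below is a simplification and entirely hypothetical: `struct fake_bus`, `bus_write()`, `bus_read()` and `write_posted()` are invented names, and a one-entry buffer stands in for whatever buffering the real chipset does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one device register behind a one-entry posted-write buffer. */
struct fake_bus {
    uint32_t device_reg;  /* value the device has actually received */
    uint32_t pending_val; /* write still sitting in the buffer */
    bool pending;         /* true while a write is in flight */
};

/* Push any buffered write out to the device. */
static void bus_drain(struct fake_bus *bus)
{
    if (bus->pending) {
        bus->device_reg = bus->pending_val;
        bus->pending = false;
    }
}

/* A posted write returns at once; the device has not seen the value yet. */
static void bus_write(struct fake_bus *bus, uint32_t val)
{
    bus_drain(bus); /* an older buffered write is pushed out first */
    bus->pending_val = val;
    bus->pending = true;
}

/* A read cannot overtake earlier writes, so it drains them before returning. */
static uint32_t bus_read(struct fake_bus *bus)
{
    bus_drain(bus);
    return bus->device_reg;
}

/* Write, then read back purely for the side effect: a "posting read". */
static void write_posted(struct fake_bus *bus, uint32_t val)
{
    bus_write(bus, val);
    (void)bus_read(bus);
}
```

After a bare `bus_write(&b, 1)`, `b.device_reg` is still 0; after `write_posted(&c, 5)`, `c.device_reg` is 5 and nothing is pending. In the forcewake case, the point of such a read-back is that the request write has actually reached the GT before the driver starts timing its wait for the acknowledge.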

Comment 15 Chris Wilson 2012-07-26 09:21:11 UTC

Your patch is overkill in that it addresses issues that have never been seen and should not happen without a substantial elephant. Unless you think that where there is one, there is a herd.

Comment 16 Daniel Vetter 2012-07-26 09:25:56 UTC

Created attachment 64706
10 usec dely, simpler patch

If the previous patch doesn't help, please also test this one. Again, please ensure that bug #51738 is also fixed.

Comment 17 lu hua 2012-07-27 05:51:34 UTC

This issue still exists on drm-intel-next-queued kernel ab3951eb74e7c33a2f5b7b64d7 with these patches.

Bug #51738 is fixed by patch 64704.

Comment 18 Daniel Vetter 2012-07-27 06:26:21 UTC

(In reply to comment #17)
> This issue still exists on drm-intel-next-queued kernel
> ab3951eb74e7c33a2f5b7b64d7 with these patch.

drm-intel-next-queued no longer contains the patch that you bisected this regression to. Can you please double-check whether:
- plain -queued still hangs;
- applying patch 64704 really causes it to hang again.

Comment 19 lu hua 2012-07-27 08:01:29 UTC

We have one desktop and one laptop.
This issue goes away on the SugarBay desktop with -queued kernel commit f27b92651d72e863c308ea5dca5615fc98e38ca6, but it still exists on the HuronRiver notebook.

With patch 64704 applied to drm-intel-next-queued kernel ab3951eb74e7c33a2f5b7b64d7, the GPU hangs.

Comment 20 lu hua 2012-07-27 08:04:25 UTC

With patch 64706 applied to drm-intel-next-queued kernel ab3951eb74e7c33a2f5b7b64d7, the GPU also hangs.

Comment 21 lu hua 2012-07-30 07:36:19 UTC

Created attachment 64937
HNRnetconsole

Comment 22 lu hua 2012-07-30 07:38:20 UTC

This issue still exists on HuronRiver with -queued kernel ab3951eb74e7c33a2f. I attached the netconsole log.

Comment 23 Chris Wilson 2012-08-04 11:42:26 UTC

Are we now seeing a side effect of hw contexts conflated into this bug report? It sounds eerily like bug 52429.

Comment 24 Daniel Vetter 2012-08-05 19:53:47 UTC

To check Chris' theory, can you please try the "disable hw context" patch from that bug, i.e. https://bugs.freedesktop.org/attachment.cgi?id=64962, on top of the latest -fixes?

Comment 25 lu hua 2012-08-06 08:25:13 UTC

With the "disable hw context" patch applied to the latest -queued kernel ab3951eb74e7c, the issue still exists.
With it applied to the -fixes kernel e844b990b1df924, it also happens.

Comment 26 Daniel Vetter 2012-08-06 08:53:27 UTC

OK, we need more details on where exactly this blows up:
- Which platforms does it affect exactly? All snb platforms, or only some (list of pci ids please)?
- Can you please check that new userspace on an older kernel, and older userspace on new kernels, are not affected? That way we can be sure we're not hunting a Mesa regression here.

Also, can you please attach an updated dmesg (from the latest drm-intel-fixes)? I've (hopefully) smashed all the patches that Chris needs to make his machine work onto that git branch.

Comment 27 lu hua 2012-08-07 04:51:47 UTC

Created attachment 65217
dmesg on HuronRiver drm-intel-fixes kernel

Comment 28 lu hua 2012-08-07 05:05:07 UTC

With the "disable hw context" patch applied to -fixes kernel e844b990b1df924, this issue still happens on the HuronRiver laptop. It passes on the SugarBay desktop.

Comment 29 Daniel Vetter 2012-08-07 11:50:42 UTC

Can you please be more specific about the platforms you can reproduce this on (i.e. exact pci ids) and on which snb platforms you _can't_ reproduce this on (again, with pci ids)?

At least in our testing here, Chris could only reproduce such hangs on his gt1 ...

Comment 30 lu hua 2012-08-09 03:19:29 UTC

It still hangs on the GT1 desktop (i5-2400S) and the HuronRiver laptop (Lenovo, i7-2630QM).

It doesn't happen on the GT1 desktop (i7-2600) or the GT2 desktop (i7-2600K).

Comment 31 Ben Widawsky 2012-08-13 02:30:39 UTC

Umm, nobody cares that there are ATA errors in the netconsole?

Comment 32 Florian Mickler 2012-08-20 21:56:25 UTC

A patch referencing this bug report has been merged in Linux v3.6-rc2:

commit 6af2d180f82151cf3d58952e35a4f96e45bc453a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jul 26 16:24:50 2012 +0200

    drm/i915: fix forcewake related hangs on snb

Comment 33 lu hua 2012-08-22 07:59:39 UTC

This issue still happens on the -queued kernel bd590bef35cd6f9b01.

dmesg:
[  114.236276] Call Trace:
[  114.236288]  [<c02259c6>] warn_slowpath_common+0x63/0x78
[  114.236320]  [<f81bacd5>] ? gen6_gt_check_fifodbg+0x28/0x3e [i915]
[  114.236326]  [<c0225a3f>] warn_slowpath_fmt+0x26/0x2a
[  114.236347]  [<f81bacd5>] gen6_gt_check_fifodbg+0x28/0x3e [i915]
[  114.236372]  [<f81bad28>] __gen6_gt_force_wake_put+0x1c/0x1e [i915]
[  114.236393]  [<f81bad4e>] gen6_gt_force_wake_put+0x24/0x32 [i915]
[  114.236414]  [<f81be61c>] gen6_ring_put_irq+0x9f/0xa6 [i915]
[  114.236430]  [<f8199ed4>] __wait_seqno+0x22d/0x2f7 [i915]
[  114.236435]  [<c023ce96>] ? remove_wait_queue+0x27/0x27
[  114.236451]  [<f8199fe8>] i915_wait_seqno+0x4a/0x53 [i915]
[  114.236466]  [<f819a01e>] i915_gem_object_wait_rendering+0x2d/0x61 [i915]
[  114.236482]  [<f819a601>] i915_gem_object_set_to_gtt_domain+0x3a/0x105 [i915]
[  114.236499]  [<f819ba88>] i915_gem_set_domain_ioctl+0x64/0xa1 [i915]
[  114.236511]  [<f80c1d35>] drm_ioctl+0x2d8/0x397 [drm]
[  114.236527]  [<f819ba24>] ? i915_gem_mmap_gtt_ioctl+0x1a/0x1a [i915]
[  114.236533]  [<c02efa8c>] ? fsnotify+0x1b2/0x1c8
[  114.236537]  [<c02c6591>] ? do_readv_writev+0x118/0x125
[  114.236548]  [<f80c1a5d>] ? drm_copy_field+0x4f/0x4f [drm]
[  114.236552]  [<c02d2a15>] do_vfs_ioctl+0x43b/0x46c
[  114.236556]  [<c04b11df>] ? sys_recv+0x18/0x1a
[  114.236560]  [<c02d2a87>] sys_ioctl+0x41/0x62
[  114.236564]  [<c05488cc>] sysenter_do_call+0x12/0x22
[  114.236567] ---[ end trace 51f0a5ea70333edc ]---
[  114.236584] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[  114.236590] [drm:i915_reset] *ERROR* Failed to reset chip.

Comment 34 Ben Widawsky 2012-08-22 23:27:47 UTC

Can you try mesa 8.0.4?

Comment 35 lu hua 2012-08-23 07:30:38 UTC

Tested on -queued kernel cee4ab0284fac1c6da5997802cf2d826898da316 with Mesa 8.0.4 (commit: c1f4867c89adb1a6b19d66ec8ad146115909f0a7). The system also hangs.

dmesg:
[  206.687500] Call Trace:
[  206.687508]  [<c02259c6>] warn_slowpath_common+0x63/0x78
[  206.687528]  [<f81bacd5>] ? gen6_gt_check_fifodbg+0x28/0x3e [i915]
[  206.687534]  [<c0225a3f>] warn_slowpath_fmt+0x26/0x2a
[  206.687554]  [<f81bacd5>] gen6_gt_check_fifodbg+0x28/0x3e [i915]
[  206.687576]  [<f81bad28>] __gen6_gt_force_wake_put+0x1c/0x1e [i915]
[  206.687593]  [<f818c779>] i915_read32+0x5a/0xc8 [i915]
[  206.687611]  [<f819cb9a>] i915_gem_init_ppgtt+0xe2/0x18e [i915]
[  206.687628]  [<f818cfd3>] i915_reset+0xea/0x114 [i915]
[  206.687644]  [<f8190db4>] i915_error_work_func+0x8d/0xc3 [i915]
[  206.687654]  [<c0239307>] process_one_work+0x1b1/0x2d8
[  206.687670]  [<f8190d27>] ? gen6_pm_rps_work+0x83/0x83 [i915]
[  206.687675]  [<c023978b>] worker_thread+0x1e1/0x2a4
[  206.687680]  [<c02436fc>] ? complete+0x34/0x3e
[  206.687685]  [<c02395aa>] ? rescuer_thread+0x15b/0x15b
[  206.687690]  [<c023c9c3>] kthread+0x67/0x6c
[  206.687696]  [<c023c95c>] ? kthread_freezable_should_stop+0x49/0x49
[  206.687700]  [<c0548e36>] kernel_thread_helper+0x6/0xd
[  206.687704] ---[ end trace 84da1372ba356a6a ]---
[  206.705270] [drm:__gen6_gt_force_wake_get] *ERROR* Force wake wait timed out
[  206.707262] [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out

Comment 36 Ben Widawsky 2012-08-24 19:35:37 UTC

Can you please try the 3 patches here:

https://patchwork.kernel.org/patch/1372421/
https://patchwork.kernel.org/patch/1372431/
https://patchwork.kernel.org/patch/1372441/

Comment 37 lu hua 2012-08-28 02:52:25 UTC

(In reply to comment #36)
> Can you please try the 3 patches here:
> 
> https://patchwork.kernel.org/patch/1372421/
> https://patchwork.kernel.org/patch/1372431/
> https://patchwork.kernel.org/patch/1372441/

With the 3 patches applied to -queued kernel d7c3b937bdf45f0b844400b7bf6fd3ed50bac604, the system also hangs.

Call Trace:
[   67.106977]  [<c02259c6>] warn_slowpath_common+0x63/0x78
[   67.107011]  [<f81bbf6e>] ? gen6_gt_check_fifodbg+0x28/0x3e [i915]
[   67.107021]  [<c0225a3f>] warn_slowpath_fmt+0x26/0x2a
[   67.107053]  [<f81bbf6e>] gen6_gt_check_fifodbg+0x28/0x3e [i915]
[   67.107087]  [<f81bbfaf>] __gen6_gt_force_wake_put+0x13/0x15 [i915]
[   67.107117]  [<f81bbfd5>] gen6_gt_force_wake_put+0x24/0x32 [i915]
[   67.107121] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[   67.107124] [drm:i915_reset] *ERROR* Failed to reset chip.
[   67.107162]  [<f81bf8a0>] gen6_ring_put_irq+0x9f/0xa6 [i915]
[   67.107190]  [<f819a889>] __wait_seqno+0x22d/0x2f7 [i915]
[   67.107200]  [<c023ce96>] ? remove_wait_queue+0x27/0x27
[   67.107229]  [<f819ccf6>] i915_gem_set_domain_ioctl+0xee/0x186 [i915]
[   67.107252]  [<f80c1d35>] drm_ioctl+0x2d8/0x397 [drm]
[   67.107279]  [<f819cc08>] ? i915_gem_mmap_gtt_ioctl+0x1a/0x1a [i915]
[   67.107291]  [<c02efa8c>] ? fsnotify+0x1b2/0x1c8
[   67.107299]  [<c02c6591>] ? do_readv_writev+0x118/0x125
[   67.107319]  [<f80c1a5d>] ? drm_copy_field+0x4f/0x4f [drm]
[   67.107329]  [<c02d2a15>] do_vfs_ioctl+0x43b/0x46c
[   67.107337]  [<c04b11ff>] ? sys_recv+0x18/0x1a
[   67.107345]  [<c02d2a87>] sys_ioctl+0x41/0x62
[   67.107355]  [<c054894c>] sysenter_do_call+0x12/0x22
[   67.107361] ---[ end trace 820f30d165a4debc ]---

Comment 38 Ben Widawsky 2012-08-28 15:54:39 UTC

And can you show the error state again, please? Also, I think Daniel pulled the patches into -queued, so you could just try that now instead.

Comment 39 Ben Widawsky 2012-08-28 16:08:41 UTC

(In reply to comment #38)
> And can you show the error state again, please? Also, I think Daniel pull the
> patches into -queued, so you could just try that instead now.

Blarg. I confused this with instdone patches. Forget the -queued comment. Please just send the error state, thanks.

Comment 40 lu hua 2012-08-29 01:10:26 UTC

Created attachment 66255
error state

Comment 41 lu hua 2012-09-03 06:38:49 UTC

It also happens on the -fixes kernel (commit: 0fb8728aeb9b67c018fd3573d65d0b2ba9a3e249).

Comment 42 Ben Widawsky 2012-09-24 16:25:08 UTC

Can you confirm this still occurs, and please re-confirm the bisect if it does.

Thanks.

Comment 43 lu hua 2012-09-25 06:29:34 UTC

Created attachment 67665
SNB dmesg

It still happens on the latest -queued kernel (commit: 398b7a1b882a655ee84bd).

Comment 44 shui yangwei 2012-10-15 02:46:48 UTC

This bug also exists with kernel (drm-intel-testing) 6760818aad5622d7f20d7f1c45d75a8165aeaf24.

Comment 45 shui yangwei 2012-10-15 03:13:51 UTC

Created attachment 68565
glxgears hang dmesg

Comment 46 Florian Mickler 2012-10-15 20:55:58 UTC

A patch referencing a commit referencing this bug report has been merged in Linux v3.7-rc1:

commit 8dee3eea3ccd3b6c00a8d3a08dd715d6adf737dd
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Sat Sep 1 22:59:50 2012 -0700

    drm/i915: Never read FORCEWAKE

Comment 47 Chris Wilson 2012-10-17 16:57:04 UTC

Can you please retest using drm-intel-nightly/drm-intel-fixes, with specifically this commit:

commit f8f2ac9a76b0f80a6763ca316116a7bab8486997
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Oct 3 19:34:24 2012 -0700

    drm/i915: Fix GT_MODE default value
    
    I can't even find how I figured this might be needed anymore. But sure
    enough, the value I'm reading back on platforms doesn't match what the
    docs recommends.

Comment 48 lu hua 2012-10-19 02:55:29 UTC

Created attachment 68772
netconsole log

It still happens on the -nightly branch (commit ffc170a47b56).
commit ffc170a47b568c28bcad917e03deaf8fee46f4d6
Merge: 6547fef 0a3af26

Comment 49 shui yangwei 2012-10-29 02:53:12 UTC

Kernel: (drm-intel-testing)b9960e75b5a348759c6e8c9ffb3f45e40ad702a5
Some additional commit info:
Merge: 8e740cd 1623392
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Oct 26 21:10:48 2012 +0200


Tested with this kernel; the issue reproduced.

Comment 50 Jesse Barnes 2012-11-14 18:48:46 UTC

If this is an SDV, we may be running into early-revision differences. The first revisions of SNB used different register offsets for some of the forcewake activity: offset 0xa090 instead of 0xa18c. Probably not the issue here, though. :(

Comment 51 shui yangwei 2012-11-28 02:37:59 UTC

lspci:
-----------------
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

This issue still exists on this SNB hardware.

Comment 52 Jesse Barnes 2012-12-11 19:19:19 UTC

Random test request:

https://patchwork.kernel.org/patch/1819391/

Comment 53 lu hua 2012-12-12 02:50:14 UTC

(In reply to comment #52)
> Random test request:
> 
> https://patchwork.kernel.org/patch/1819391/


With this patch applied to the -queued kernel (commit: 20afbda209d708), the issue still occurs.

Comment 54 lu hua 2012-12-12 07:21:51 UTC

With RC6 disabled, the issue goes away.

Comment 55 Chris Wilson 2012-12-12 09:27:24 UTC


*** This bug has been marked as a duplicate of bug 50619 ***

Comment 56 Florian Mickler 2013-01-26 10:51:05 UTC

A patch referencing a commit referencing this bug report has been merged in Linux v3.8-rc5:

commit b514407547890686572606c9dfa4b7f832db9958
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Thu Jan 17 10:24:09 2013 +0200

    drm/i915: fix FORCEWAKE posting reads

Comment 57 Jani Nikula 2013-01-28 14:02:38 UTC

(In reply to comment #56)
> A patch referencing a commit referencing this bug report has been merged in
> Linux v3.8-rc5:
> 
> commit b514407547890686572606c9dfa4b7f832db9958
> Author: Jani Nikula <jani.nikula@intel.com>
> Date:   Thu Jan 17 10:24:09 2013 +0200
> 
>     drm/i915: fix FORCEWAKE posting reads

There's one more indirection than you suggest: the above commit references a commit that references a commit that references this bug. I'm sorry but I am unsure whether this is signal or noise...

Comment 58 cancan,feng 2013-02-05 06:34:04 UTC

Environment1:
----------------------------------------
Kernel: (drm-intel-fixes)4518f611ba21ba165ea3714055938a8984a44ff9
Some additional commit info:
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jan 23 16:16:35 2013 +0100

    drm/i915: dump UTS_RELEASE into the error_state

Environment2:
---------------------------------------
Kernel: (drm-intel-next-queued)7d37beaaf3dbc6ff16f4d32a4dd6f8c557c6ab50
Some additional commit info:
Author: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Date:   Fri Feb 1 10:14:20 2013 +0900

    GPU/i915: Fix acpi_bus_get_device() check in drivers/gpu/drm/i915/intel_opregion.c

Description:
----------------------------------------
We found that the system hangs while running glxgears on both the -next-queued and -fixes branches, so I reopened this bug.

Comment 59 cancan,feng 2013-02-05 06:36:51 UTC

Created attachment 74217
system hang after glxgears---netconsole messages

I attached the hang messages captured with netconsole.

Comment 60 Daniel Vetter 2013-02-05 12:03:53 UTC

Feng, just to check: Does disabling rc6 still work around these hangs on the affected machine?

Comment 61 cancan,feng 2013-02-06 05:53:13 UTC

(In reply to comment #60)
> Feng, just to check: Does disabling rc6 still work around these hangs on the
> affected machine?

OK, I have checked it: after disabling RC6, glxgears works well.

Comment 62 Ben Widawsky 2013-02-14 05:49:55 UTC

Can you attach the error state with a recent kernel?

Comment 63 cancan,feng 2013-02-17 03:39:05 UTC

Created attachment 74962
system hang after glxgears with recent kernel

Comment 64 cancan,feng 2013-02-17 03:46:23 UTC

(In reply to comment #62)
> Can you attach the error state with a recent kernel?

I have attached the error info from a recent kernel. Kernel info:

------------------------------------------------
Description :
The Linux Kernel, the operating system core itself
Kernel: (drm-intel-next-queued)e14809c6e68d0b07aa0119affacc3a683c4c48ac
Some additional commit info:
Author: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Date:   Wed Feb 13 22:20:22 2013 +0100

    drm/i915: Set i9xx sdvo clock limits according to specifications

------------------------------------------------
This issue only happens when RC6 is enabled. If I disable RC6 and then run glxgears, the system runs normally.

Comment 65 Chris Wilson 2013-03-19 07:40:43 UTC

Bug 62141 has the bisect that triggers this reoccurrence.

Comment 66 Chris Wilson 2013-04-17 18:02:00 UTC

Treating the reoccurrence as a distinct bug now being tracked in mesa.

*** This bug has been marked as a duplicate of bug 62141 ***

Comment 67 Elizabeth 2017-10-06 14:48:54 UTC

Closing old verified.
