Bug 56916

Summary: [965gm regression] Black screen when under high load on kernel 3.7 (was OK on 3.6)
Product: DRI
Component: DRM/Intel
Status: CLOSED DUPLICATE
Severity: normal
Priority: highest
Version: XOrg git
Hardware: Other
OS: Linux (All)
Reporter: Cedric Godin <cedric>
Assignee: Imre Deak <imre.deak>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: ben, chris, daniel, florian, hugo, jbarnes, mika.kuoppala, syrjala
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
- dmesg of the crash (flags: none)
- i915_error_state after the crash (flags: none)
- disable unbound tracking (flags: none)
- disable cpu relocs completely (flags: none)
- dmesg from crashed 3.7-rc4 with "disable cpu relocs completely" patch (flags: none)
- config file for the 3.7 kernel (flags: none)
- make the shrinker less aggressive (flags: none)
- Overallocate fenced regions (flags: none)
- Align surface sizes to an even tile row (flags: none)

Description Cedric Godin 2012-11-09 10:35:23 UTC
Created attachment 69799 [details]
dmesg of the crash

When my laptop is under heavy load (compile, rsync, ...) the screen goes black.
The dmesg then shows a lot of messages of the form:

[  528.932020] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  528.932025] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  529.028586] ------------[ cut here ]------------
[  529.028598] WARNING: at drivers/gpu/drm/i915/intel_display.c:1049 intel_enable_pipe+0x160/0x1b0()
[  529.028600] Hardware name: HP Compaq 6910p (GB950EA#UUG)
[  529.028601] PLL state assertion failure (expected on, current off)
[  529.028603] Modules linked in: tun i2c_i801 acpi_cpufreq mperf hid_generic usbhid hid arc4 snd_hda_codec_analog snd_hda_intel snd_hda_codec 8250_pci kvm_intel iwl4965 snd_pcm iwlegacy snd_page_alloc mac80211 snd_timer hp_accel 8250_core lis3lv02d e1000e cfg80211 kvm snd hp_wmi serial_core battery psmouse sr_mod cdrom input_polldev sparse_keymap uhci_hcd ac wmi
[  529.028632] Pid: 2605, comm: upowerd Not tainted 3.7.0-rc2-00008-g0390c88 #1
[  529.028634] Call Trace:
[  529.028640]  [<c10306d8>] ? warn_slowpath_common+0x78/0xb0
[  529.028643]  [<c125c1e0>] ? intel_enable_pipe+0x160/0x1b0
[  529.028645]  [<c125c1e0>] ? intel_enable_pipe+0x160/0x1b0
[  529.028648]  [<c10307a3>] ? warn_slowpath_fmt+0x33/0x40
[  529.028650]  [<c125c1e0>] ? intel_enable_pipe+0x160/0x1b0
[  529.028653]  [<c125ebf0>] ? i9xx_crtc_mode_set+0xc40/0x1240
[  529.028656]  [<c12635b5>] ? intel_set_mode+0x525/0x870
[  529.028661]  [<c1264820>] ? intel_get_load_detect_pipe+0x2b0/0x3a0
[  529.028665]  [<c12f1d08>] ? bit_xfer+0x178/0x4c0
[  529.028670]  [<c10eebf4>] ? __d_instantiate_unique+0xe4/0x130
[  529.028673]  [<c10d36a5>] ? kmem_cache_alloc+0x55/0xa0
[  529.028677]  [<c112d0e9>] ? sysfs_open_file+0x179/0x240
[  529.028680]  [<c10a9613>] ? prep_new_page+0x113/0x1d0
[  529.028685]  [<c127ca20>] ? intel_tv_detect+0x80/0x3f0
[  529.028688]  [<c10a9843>] ? get_page_from_freelist+0x173/0x3e0
[  529.028693]  [<c1230e65>] ? status_show+0x35/0x80
[  529.028696]  [<c1230e30>] ? dpms_show+0x50/0x50
[  529.028700]  [<c128a298>] ? dev_attr_show+0x18/0x50
[  529.028702]  [<c112d237>] ? sysfs_read_file+0x87/0x140
[  529.028705]  [<c10dae55>] ? do_sys_open+0x165/0x1c0
[  529.028708]  [<c112d1b0>] ? sysfs_open_file+0x240/0x240
[  529.028710]  [<c10dba5b>] ? vfs_read+0x8b/0x130
[  529.028713]  [<c10dbb4a>] ? sys_read+0x4a/0x90
[  529.028717]  [<c13c55fa>] ? sysenter_do_call+0x12/0x22
[  529.028719] ---[ end trace 1562ac833b2f8043 ]---
[  529.440024] [drm:i915_reset] *ERROR* Failed to reset chip.
[  529.445880] ------------[ cut here ]------------
[  529.445887] WARNING: at drivers/gpu/drm/i915/intel_display.c:1049 intel_enable_pipe+0x160/0x1b0()
[  529.445889] Hardware name: HP Compaq 6910p (GB950EA#UUG)
[  529.445890] PLL state assertion failure (expected on, current off)
[  529.445892] Modules linked in: tun i2c_i801 acpi_cpufreq mperf hid_generic usbhid hid arc4 snd_hda_codec_analog snd_hda_intel snd_hda_codec 8250_pci kvm_intel iwl4965 snd_pcm iwlegacy snd_page_alloc mac80211 snd_timer hp_accel 8250_core lis3lv02d e1000e cfg80211 kvm snd hp_wmi serial_core battery psmouse sr_mod cdrom input_polldev sparse_keymap uhci_hcd ac wmi
[  529.445919] Pid: 2605, comm: upowerd Tainted: G        W    3.7.0-rc2-00008-g0390c88 #1
[  529.445920] Call Trace:

Nothing can resurrect the screen (switching to a console, killing X), but the machine still responds over ssh, so I was able to collect dmesg, i915_error_state, meminfo, mtrr, slabinfo, swaps, vmallocinfo, vmstat and zoneinfo, in case any of them is needed.

I am attaching the dmesg and the i915_error_state.
The latest kernel showing the same problem is 3.7-rc4.
Comment 1 Cedric Godin 2012-11-09 10:36:17 UTC
Created attachment 69800 [details]
i915_error_state after the crash
Comment 2 Chris Wilson 2012-11-09 10:44:01 UTC
The hang is reminiscent of bug 55984.
Comment 3 Daniel Vetter 2012-11-09 19:56:56 UTC
How quickly can you reproduce this? If you can hit this easily, can you please attempt a bisect to root-cause the commit that introduced the problem for you?
Comment 4 Daniel Vetter 2012-11-09 20:21:23 UTC
Can you also please give us your exact mesa version?
Comment 5 Daniel Vetter 2012-11-10 12:20:34 UTC
Also: Do you have any swap partition enabled?
Comment 6 Cedric Godin 2012-11-11 10:49:58 UTC
I can reproduce it easily, so I will attempt a bisect. I have 2G of swap enabled, and my mesa version is as follows (I use Gentoo with their x11 overlay):

for mesa : snb-magic-12553-gb534c39
for drm : libdrm-2.4.39-16-g14db948
for xf86-video-intel : 2.20.9-43-gfb5205a (with uxa, no sna)

If you want me to try other versions, no problem.
Comment 7 Daniel Vetter 2012-11-14 09:29:32 UTC
Ok, two things for you to test please:

- Can you please test Chris' fastboot branch (http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=fastboot). Despite its name, it also contains some trickery with memory barriers which might help here.

- Our QA discovered a random corruption issue (bug #56859) and bisected it to

commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu <wujianguo@huawei.com>
Date:   Mon Oct 8 16:33:06 2012 -0700

    mm: fix-up zone present pages

  Can you please test whether reverting that commit changes anything?
Comment 8 Cedric Godin 2012-11-14 11:58:08 UTC
I tried with the commit reverted, but the problem is still there.
I'm still bisecting. I did it once, but I don't think I did it correctly, because it gave me a merge commit as the first bad one:

commit 9db908806b85c1430150fbafe269a7b21b07d15d
Merge: 4d7127d 72f36d5
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sat Oct 13 13:22:01 2012 -0700

    Merge tag 'md-3.7' of git://neil.brown.name/md

So I am restarting it.

But first I'll test the fastboot branch and report back.
Comment 9 Cedric Godin 2012-11-14 12:17:23 UTC
just to be sure that this is what I have to test; I did:

git clone http://cgit.freedesktop.org/~ickle/linux-2.6/ -b fastboot fastboot

and have :

v2.6.32-rc1-168511-g7da6bfc

Is that OK ?
Comment 10 Chris Wilson 2012-11-14 12:22:57 UTC
(In reply to comment #9)
> just to be sure that this is what I have to test; I did:
> 
> git clone http://cgit.freedesktop.org/~ickle/linux-2.6/ -b fastboot fastboot
> 
> and have :
> 
> v2.6.32-rc1-168511-g7da6bfc
> 
> Is that OK ?

That's commit 7da6bfcd589270bcd35bfcf0b029403c52e5ad06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 13 11:43:29 2012 +0000

    drm/i915: Only preserve the BIOS modes if they are the preferred ones

which is the tip of fastboot, so it should be fine. You are just lacking a few tags. :)
Comment 11 Cedric Godin 2012-11-14 13:12:14 UTC
I tested the fastboot branch and it seems stable. It usually crashes after several minutes, but so far it has not.
Do you still want me to finish my bisect, or should I try something else?
Comment 12 Cedric Godin 2012-11-14 13:25:39 UTC
And of course, just after pressing the send button, it crashed :-S
So the fastboot branch has the problem too. Sorry for answering too quickly.
Comment 13 Daniel Vetter 2012-11-15 13:16:07 UTC
Created attachment 70113 [details] [review]
disable unbound tracking

Silly me just noticed that the unbound tracking has been merged into 3.7, not 3.6. This has a big enough impact to explain all kinds of things. Please try the attached patch, thanks.
Comment 14 Cedric Godin 2012-11-16 11:40:24 UTC
I tested it on a 3.7-rc4 (?) kernel, without success.
Comment 15 Daniel Vetter 2012-11-16 18:24:28 UTC
Created attachment 70169 [details] [review]
disable cpu relocs completely

I'm not completely sure, but I think we haven't ruled this one out yet. Please test, thanks
Comment 16 Daniel Vetter 2012-11-17 10:54:16 UTC
For reference, please attach a full dmesg, thanks.
Comment 17 Daniel Vetter 2012-11-18 16:05:40 UTC
Ping for dmesg - we have similar reports spanning a few different platforms, and we're trying to hunt down common patterns. Kernel version really doesn't matter.
Comment 18 Cedric Godin 2012-11-19 09:12:39 UTC
Sorry for the late answer. I tested the patch with the same result.
I attach the dmesg from this boot.
Comment 19 Cedric Godin 2012-11-19 09:14:17 UTC
Created attachment 70247 [details]
dmesg from crashed 3.7-rc4 with "disable cpu relocs completely" patch
Comment 20 Daniel Vetter 2012-11-19 16:29:57 UTC
Ok, yet another new theory ... please attach your kernel .config, thanks.
Comment 21 Cedric Godin 2012-11-19 16:31:16 UTC
Created attachment 70268 [details]
config file for the 3.7 kernel
Comment 22 Chris Wilson 2012-11-22 09:42:43 UTC
Can you please try the tree from http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug55984 and see if that improves matters?
Comment 23 Cedric Godin 2012-11-22 15:12:50 UTC
Well, after a 3 hours uptime, it didn't yet crash. So it is more stable.
So for me, it's the first 3.7 good kernel. Thanks !
Comment 24 Chris Wilson 2012-11-22 15:16:10 UTC
(In reply to comment #23)
> Well, after a 3 hours uptime, it didn't yet crash. So it is more stable.
> So for me, it's the first 3.7 good kernel. Thanks !

So that I know which of the many branches I labeled as bug55984 today, can you please tell me which commit you are running? Thanks.
Comment 25 Cedric Godin 2012-11-22 15:46:37 UTC
> git describe 
v2.6.32-rc1-157061-g966339d

> uname -r
3.6.0-rc7-157061-g966339d

I hope it's really a 3.7 ;-)
Comment 26 Chris Wilson 2012-11-22 15:55:22 UTC
Oh noes, wrong branch... Sorry.

Interestingly though that is my master branch from just before the merge with 3.7-rc2, so it still has all of the contentious features. However, to focus on the present, presuming you did something like:

$ git remote add ickle -f git://people.freedesktop.org/~ickle/linux-2.6

You want to do a

$ git checkout -b bug56916 ickle/bug59844

build, install, test.
Comment 27 Cedric Godin 2012-11-22 16:06:37 UTC
Just to be sure:

> git remote add ickle -f git://people.freedesktop.org/~ickle/linux-2.6
Updating ickle
remote: Counting objects: 22446, done.
remote: Compressing objects: 100% (7007/7007), done.
remote: Total 21006 (delta 16562), reused 18081 (delta 13994)
Receiving objects: 100% (21006/21006), 3.77 MiB | 175 KiB/s, done.
Resolving deltas: 100% (16562/16562), completed with 653 local objects.
From git://people.freedesktop.org/~ickle/linux-2.6
 * [new branch]      2.6.38     -> ickle/2.6.38
 * [new branch]      845g       -> ickle/845g
 * [new branch]      8xx-cache-coherency -> ickle/8xx-cache-coherency
 * [new branch]      amalgam    -> ickle/amalgam
 * [new branch]      async      -> ickle/async
 * [new branch]      broken-vm  -> ickle/broken-vm
 * [new branch]      bug48652   -> ickle/bug48652
 * [new branch]      bug55984   -> ickle/bug55984
 * [new branch]      derrmr     -> ickle/derrmr
 * [new branch]      direct-gtt -> ickle/direct-gtt
 * [new branch]      drm-intel-fixes -> ickle/drm-intel-fixes
 * [new branch]      drm-intel-next -> ickle/drm-intel-next
 * [new branch]      drm-intel-testing -> ickle/drm-intel-testing
 * [new branch]      fastboot   -> ickle/fastboot
 * [new branch]      fence-pin  -> ickle/fence-pin
 * [new branch]      for-airlied -> ickle/for-airlied
 * [new branch]      for-danvet -> ickle/for-danvet
 * [new branch]      for-imre   -> ickle/for-imre
 * [new branch]      for-jiri   -> ickle/for-jiri
 * [new branch]      gen2-pageflip -> ickle/gen2-pageflip
 * [new branch]      gen3-pageflip -> ickle/gen3-pageflip
 * [new branch]      gtt        -> ickle/gtt
 * [new branch]      intel-next -> ickle/intel-next
 * [new branch]      irq-poll   -> ickle/irq-poll
 * [new branch]      ivb-vsync  -> ickle/ivb-vsync
 * [new branch]      master     -> ickle/master
 * [new branch]      next       -> ickle/next
 * [new branch]      old-queue  -> ickle/old-queue
 * [new branch]      panel-refactor -> ickle/panel-refactor
 * [new branch]      pinleak    -> ickle/pinleak
 * [new branch]      ppgtt      -> ickle/ppgtt
 * [new branch]      reap-mmap-offsets -> ickle/reap-mmap-offsets
 * [new branch]      remove-pipelining -> ickle/remove-pipelining
 * [new branch]      ring-freq  -> ickle/ring-freq
 * [new branch]      scatterlist -> ickle/scatterlist
 * [new branch]      set-cache-level -> ickle/set-cache-level
 * [new branch]      snb        -> ickle/snb
 * [new branch]      stolen     -> ickle/stolen
 * [new branch]      stutter    -> ickle/stutter
 * [new branch]      total-gtt  -> ickle/total-gtt
 * [new branch]      unbound    -> ickle/unbound
 * [new branch]      unbound-cache -> ickle/unbound-cache
 * [new branch]      upstream   -> ickle/upstream
 * [new branch]      vm         -> ickle/vm
 * [new branch]      vmap       -> ickle/vmap
 * [new branch]      wait-seqno -> ickle/wait-seqno
 * [new branch]      xv-overlay -> ickle/xv-overlay
 * [new branch]      xv-pinleak -> ickle/xv-pinleak

> git describe 
v3.7-rc4
> git checkout -b bug56916 ickle/bug59844
fatal: Cannot update paths and switch to branch 'bug56916' at the same time.
Did you intend to checkout 'ickle/bug59844' which can not be resolved as commit?
> git checkout -b bug56916 
M       drivers/gpu/drm/i915/i915_gem_execbuffer.c
Switched to a new branch 'bug56916'
> git describe 
v3.7-rc4
> git branch 
* bug56916
  master
  radeon

Is it ok (yes, I'm a newcomer to "more advanced git usage")?
If so, I'll build, install, test ;-)
Comment 28 Chris Wilson 2012-11-22 16:10:04 UTC
Gah, I meant bug55984. So,

$ git checkout bug56916
$ git reset --hard ickle/bug55984
Comment 29 Cedric Godin 2012-11-22 16:16:25 UTC
> git checkout bug56916
M       drivers/gpu/drm/i915/i915_gem_execbuffer.c
Already on 'bug56916'
> git reset --hard ickle/bug55984
HEAD is now at 889b020 drm/i915: Avoid forcing relocations through the mappable GTT or CPU
> git describe 
v3.7-rc5-209-g889b020

Hope it's ok now, sorry if I'm "slow" :-)
Comment 30 Cedric Godin 2012-11-23 11:52:02 UTC
OK, I tested it and the problem is still there.
Comment 31 Chris Wilson 2012-11-23 12:00:26 UTC
(In reply to comment #30)
> ok I tested it and the problem is still there.

That matches the results I found yesterday as well. So far, ickle/for-imre is the only 3.7 branch that is surviving.
Comment 32 Daniel Vetter 2012-12-19 13:41:53 UTC
Created attachment 71808 [details] [review]
make the shrinker less aggressive

Duct-tape solution if it is one, but imo very much worth a try.
Comment 33 Cedric Godin 2012-12-20 10:07:03 UTC
I will try it ASAP.
I redid several bisects that pointed to

bf7ad8eeab995710c766df49c9c69a8592ca0216 is the first bad commit
commit bf7ad8eeab995710c766df49c9c69a8592ca0216
Author: Michel Lespinasse <walken@google.com>
Date:   Mon Oct 8 16:30:37 2012 -0700

rbtree: move some implementation details from rbtree.h to rbtree.c

rbtree users must use the documented APIs to manipulate the tree
structure.  Low-level helpers to manipulate node colors and parenthood are
not part of that API, so move them to lib/rbtree.c

It does not seem to be the culprit itself, but rather to make the bug easier to hit.
The only problem I can see (I am not a developer) is that it changes

-static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
-{
-       rb->rb_parent_color = (rb->rb_parent_color & 3) | (unsigned long)p;
-}

to :

+#define rb_color(r)   ((r)->__rb_parent_color & 1)

...

+static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
+{
+       rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
+}

so changing the "& 3" to "& 1".

I tried to apply that change to a working kernel but had no crash and reverting it from 3.7 didn't make a stable kernel either.
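
To make the "& 3" versus "& 1" observation concrete, here is a minimal standalone sketch of the pointer-plus-colour packing that the quoted rb_set_parent() relies on. The names demo_node, demo_set_parent_mask3, demo_set_parent_mask1, demo_parent and demo_color are hypothetical and are not kernel API; the only facts taken from the quoted diff are the two masks and that the colour lives in bit 0. Under those assumptions, the two forms can only produce different results if bit 1 of the stored word was already set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct rb_node: the parent pointer
 * and the node colour share one word, with the colour in bit 0. */
struct demo_node {
    uintptr_t parent_color;
};

/* Old form quoted above: keep the two low bits, replace the pointer bits. */
static inline void demo_set_parent_mask3(struct demo_node *n, struct demo_node *p)
{
    assert(((uintptr_t)p & 3) == 0); /* the pointer must leave bits 0-1 free */
    n->parent_color = (n->parent_color & 3) | (uintptr_t)p;
}

/* New form quoted above: keep only the colour bit (bit 0). */
static inline void demo_set_parent_mask1(struct demo_node *n, struct demo_node *p)
{
    assert(((uintptr_t)p & 3) == 0);
    n->parent_color = (n->parent_color & 1) | (uintptr_t)p;
}

/* Recover the parent pointer by clearing the two low bits. */
static inline struct demo_node *demo_parent(const struct demo_node *n)
{
    return (struct demo_node *)(n->parent_color & ~(uintptr_t)3);
}

/* Recover the colour bit. */
static inline int demo_color(const struct demo_node *n)
{
    return (int)(n->parent_color & 1);
}
```

As long as nothing ever sets bit 1, both setters store the same word, which would be consistent with the observation that applying or reverting the change alone did not change the behaviour.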
Comment 34 Daniel Vetter 2012-12-20 10:10:34 UTC
Hm, that's a very strange bisect - at most this should affect code generation a bit and move a few functions around in the compiled kernel. But we already know that this 3.7 regression is most likely a side-effect of some seemingly unrelated change, which then brings a probably pre-existing bug up.
Comment 35 Cedric Godin 2012-12-20 16:18:12 UTC
so far, so good. I tested the 3.7 kernel with this patch and I can still see my screen, so for me this patch may have a Tested-by from my side  :-)
Thanks
Comment 36 Chris Wilson 2012-12-21 09:08:49 UTC
Created attachment 71909 [details] [review]
Overallocate fenced regions

So, that patch just has the effect of changing the eviction order so that cached bo are no longer preferentially thrown out. All pointing towards a latent bug elsewhere.

The error-states in https://bugzilla.redhat.com/show_bug.cgi?id=877461 follow the same pattern as I've observed with invalid surface sizes (an EU is idle waiting for the never-returning sampler, whilst all other EU are busy stalling for the shared resource). So based on that observation, let's attack surface allocation and please try the attached patch for the DDX (UXA).
Comment 37 Chris Wilson 2012-12-21 12:02:09 UTC
Also available for testing: https://patchwork.kernel.org/patch/1896161/

If the suggestion is that memory layout and eviction play a critical role, the above is at least one genuine bug that we can fix.
Comment 38 Chris Wilson 2012-12-21 13:50:50 UTC
Created attachment 71931 [details] [review]
Align surface sizes to an even tile row

A slightly more refined patch.
Comment 39 Cedric Godin 2012-12-26 11:31:41 UTC
(In reply to comment #36)
> Created attachment 71909 [details] [review]
> Overallocate fenced regions

Do you want me to test it with a crashing 3.7 kernel and a 2.20.16-48-g52fd223 + patch intel driver ?
And for the 2 other patches, which one should I test now ? both together, the last one only ?
Comment 40 Chris Wilson 2012-12-26 11:34:28 UTC
(In reply to comment #39)
> (In reply to comment #36)
> > Created attachment 71909 [details] [review]
> > Overallocate fenced regions
> 
> Do you want me to test it with a crashing 3.7 kernel and a
> 2.20.16-48-g52fd223 + patch intel driver ?
> And for the 2 other patches, which one should I test now ? both together,
> the last one only ?

So far, we have a positive report for combining the xf86-video-intel patch and the kernel eviction fix, that is both of the patches from comment 37 and 38. So please try that combination first.
Comment 41 Cedric Godin 2012-12-27 18:05:55 UTC
I successfully tested the intel driver (2.20.16-48-g52fd223) with the patch on a 3.7 kernel without the kernel patch, and that seems to be enough for me. I will patch the kernel and retest.
Comment 42 Chris Wilson 2012-12-30 10:39:14 UTC
xf86-video-intel commit 736b89504a32239a0c7dfb5961c1b8292dd744bd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Dec 30 10:32:18 2012 +0000

    uxa: Align surface allocations to even tile rows
    
    Align surface sizes to an even number of tile rows to cater for sampler
    prefetch. If we read beyond the last page we may catch the PTE in a
    state of flux and trigger a GPU hang. Also detected by enabling invalid
    PTE access checking.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
    References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
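
The rounding described in the commit message above can be sketched as follows. This is an illustrative sketch only, not the code from the xf86-video-intel commit: the helper names demo_align_even_tile_rows and demo_surface_size are hypothetical, and the tile heights (8 pixel rows for X-tiled, 32 for Y-tiled surfaces) are assumptions about the hardware layout that are not stated in this report. The idea is that padding the height to a multiple of twice the tile height leaves the surface with an even number of tile rows, so a sampler prefetch running past the last used row should still land inside the allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tile heights in pixel rows; not taken from this bug report. */
enum { DEMO_ROWS_PER_TILE_X = 8, DEMO_ROWS_PER_TILE_Y = 32 };

/* Round a surface height (in pixel rows) up so that it covers an even
 * number of tile rows, i.e. a multiple of 2 * rows_per_tile. */
static inline uint32_t demo_align_even_tile_rows(uint32_t height,
                                                 uint32_t rows_per_tile)
{
    uint32_t step = 2 * rows_per_tile;

    assert(rows_per_tile > 0);
    return (height + step - 1) / step * step;
}

/* Allocation size in bytes for a given pitch (bytes per pixel row). */
static inline uint64_t demo_surface_size(uint32_t pitch, uint32_t height,
                                         uint32_t rows_per_tile)
{
    return (uint64_t)pitch * demo_align_even_tile_rows(height, rows_per_tile);
}
```

For example, with the assumed X-tile height of 8 rows, a 600-row surface would be padded to 608 rows, while a 768-row surface is already aligned and stays unchanged.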
Comment 43 Cedric Godin 2013-01-03 10:07:55 UTC
Grr, this bug is driving me crazy. I just hit it again today by visiting the web page referenced in http://thread.gmane.org/gmane.comp.video.dri.devel/78328

So I wanted to patch a 3.7 kernel with your patch from #37 but had this compile error:

CC      drivers/gpu/drm/drm_hashtab.o
CC      drivers/gpu/drm/drm_mm.o
drivers/gpu/drm/drm_mm.c: In function ‘drm_mm_scan_remove_block’:
drivers/gpu/drm/drm_mm.c:612:3: error: implicit declaration of function ‘__drm_mm_hole_node_end’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[3]: *** [drivers/gpu/drm/drm_mm.o] Error 1
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make: *** [drivers] Error 2
Comment 44 Chris Wilson 2013-01-03 10:49:44 UTC
Just change the __drm to drm: https://bugs.freedesktop.org/attachment.cgi?id=72022
Comment 45 Cedric Godin 2013-01-07 15:40:50 UTC
Just to let you know that today I had the problem again :-(
Is there a way I can help you ?
Comment 46 Daniel Vetter 2013-01-07 15:41:34 UTC
(In reply to comment #45)
> Just to let you know that today I had the problem again :-(
> Is there a way I can help you ?

Was that with any of the patches discussed applied?
Comment 47 Cedric Godin 2013-01-07 15:46:33 UTC
(In reply to comment #46)
> (In reply to comment #45)
> > Just to let you know that today I had the problem again :-(
> > Is there a way I can help you ?
> 
> Was that with any of the patches discussed applied?

Yes, both the kernel and the intel driver patches were applied.
Comment 48 Daniel Vetter 2013-01-07 15:50:22 UTC
(In reply to comment #47)
> (In reply to comment #46)
> > (In reply to comment #45)
> > > Just to let you know that today I had the problem again :-(
> > > Is there a way I can help you ?
> > 
> > Was that with any of the patches discussed applied?
> 
> Yes, both the kernel and the intel driver patches were applied.

Which one of the two kernel patches? "make the shrinker less aggressive" and/or "drm: Only evict the blocks required to create the requested hole"?
Comment 49 Cedric Godin 2013-01-07 15:55:45 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> > > (In reply to comment #45)
> > > > Just to let you know that today I had the problem again :-(
> > > > Is there a way I can help you ?
> > > 
> > > Was that with any of the patches discussed applied?
> > 
> > Yes, both the kernel and the intel driver patches were applied.
> 
> Which one of the two kernel patches? "make the shrinker less aggressive"
> and/or "drm: Only evict the blocks required to create the requested hole"?

Patchwork drm: Only evict the blocks required to create the requested hole
Comment 50 Daniel Vetter 2013-01-07 16:06:07 UTC
(In reply to comment #49)
> (In reply to comment #48)
> > Which one of the two kernel patches? "make the shrinker less aggressive"
> > and/or "drm: Only evict the blocks required to create the requested hole"?
> 
> Patchwork drm: Only evict the blocks required to create the requested hole

Can you please test the "make shrinker less aggressive" too? Maybe on top of all the current patches.
Comment 51 Cedric Godin 2013-01-07 16:19:50 UTC
(In reply to comment #50)

...

> Can you please test the "make shrinker less aggressive" too? Maybe on top of
> all the current patches.

Sure, will report if any problem.
Comment 52 Daniel Vetter 2013-01-10 17:15:10 UTC
Everyone please retest with latest drm-intel-fixes from

http://cgit.freedesktop.org/~danvet/drm-intel

I've just merged a bunch of duct-tapes for this issue.
Comment 53 Daniel Vetter 2013-01-14 17:35:45 UTC
Consolidating all gen4/5 i/o related hangs.

*** This bug has been marked as a duplicate of bug 55984 ***
Comment 54 Florian Mickler 2013-01-19 23:00:32 UTC
A patch referencing this bug report has been merged in Linux v3.8-rc4:

commit 93927ca52a55c23e0a6a305e7e9082e8411ac9fa
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jan 10 18:03:00 2013 +0100

    drm/i915: Revert shrinker changes from "Track unbound pages"
Comment 55 Jari Tahvanainen 2016-10-07 05:32:01 UTC
Patch merged, closing.
