Bug 68462

Summary: [SNB regression]IGT/gem_evict_alignment system hang
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Ben Widawsky <ben>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: ben, chris
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description        Flags
dmesg              none
dmesg on nightly   none
Fix.               none
Fix, mk2.          none
Fix, mk3.          none

Description lu hua 2013-08-23 05:18:39 UTC
Created attachment 84495 [details]
dmesg

System Environment:
--------------------------
Platform:  Sandybridge/Ivybridge/Haswell
Kernel: (drm-intel-next-nightly)38a7ce25365492682031e0b68d249684f79de3d4(26b32d7 884020bf)

Bug detailed description:
-----------------------------
It causes a system hang on Sandybridge with the -nightly, -queued, and -fixes kernels.
It's a new test case. It fails on Haswell, Ivybridge, and Ironlake.
gem_evict_everything also has this issue.

No test output is produced on Sandybridge.

Call trace:
[  573.747870]  [<f81841c4>] ? i915_gem_free_object+0x57/0x171 [i915]
[  573.747971]  [<f809cc27>] ? drm_gem_object_free+0x1a/0x1b [drm]
[  573.748003]  [<f809ce2c>] ? drm_gem_object_handle_unreference_unlocked+0x8a/0xbc [drm]
[  573.748044]  [<f809cf17>] ? drm_gem_handle_delete+0x86/0x8e [drm]
[  573.748076]  [<f809cfcb>] ? drm_gem_dumb_destroy+0x7/0x7 [drm]
[  573.748107]  [<f809ba6d>] ? drm_ioctl+0x222/0x308 [drm]
[  573.748136]  [<f809cfcb>] ? drm_gem_dumb_destroy+0x7/0x7 [drm]
[  573.748168]  [<c02a5cd5>] ? page_add_new_anon_rmap+0x2f/0x8b
[  573.748199]  [<f809b84b>] ? drm_copy_field+0x47/0x47 [drm]
[  573.749638]  [<c02c051c>] ? vfs_ioctl+0x18/0x21
[  573.751069]  [<c02c0ee8>] ? do_vfs_ioctl+0x3ec/0x42c
[  573.752490]  [<c02b1744>] ? kmem_cache_free+0xb7/0xbe
[  573.753904]  [<c02b1744>] ? kmem_cache_free+0xb7/0xbe
[  573.755297]  [<c02c3520>] ? d_kill+0xb3/0xb9
[  573.756673]  [<c02c3520>] ? d_kill+0xb3/0xb9
[  573.758047]  [<c02c3520>] ? d_kill+0xb3/0xb9
[  573.759413]  [<c02c0f71>] ? SyS_ioctl+0x49/0x74
[  573.760783]  [<c08796fa>] ? sysenter_do_call+0x12/0x22
[  573.762157] Code: f0 e8 60 e6 ff ff 8d 73 68 39 73 68 74 0f ba 60 0a 00 00 b8 cd ba 1c f8 e8 4f 79 0a c8 39 73 68 75 28 8b 43 78 8d 53 74 8b 4b 74 <89> 41 04 89 08 8b 87 70 13 00 00 89 97 70 13 00 00 81 c7 6c 13
[  573.765297] EIP: [<f8183531>] i915_vma_unbind+0x118/0x144 [i915] SS:ESP 0068:f4cdfe20
[  573.766798] CR2: 0000000000100104
[  573.768290] ---[ end trace 4b1741a2c9657822 ]---
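A note on reading the oops above: the faulting instruction marked in the Code line (`<89> 41 04`) decodes to a 32-bit store to offset 4 of the address held in %ecx, and CR2 is 0x00100104. That address equals the kernel's LIST_POISON1 value plus the offset of `prev` in a 32-bit `struct list_head`. This is consistent with a list entry being unlinked after it had already been removed, which fits the title of the later fix, "drm/i915: Fix list corruption in vma_unbind". The sketch below only reproduces the arithmetic. It assumes a 32-bit build with POISON_POINTER_DELTA equal to 0, and the function name is hypothetical, not taken from the kernel.

```c
#include <stdint.h>

/* Poison values from the kernel's include/linux/poison.h.
 * Assumption: 32-bit build, POISON_POINTER_DELTA == 0. */
#define LIST_POISON1 0x00100100u  /* stored in entry->next by list_del() */
#define LIST_POISON2 0x00200200u  /* stored in entry->prev by list_del() */

/* struct list_head on a 32-bit kernel is two 4-byte pointers:
 * next at offset 0, prev at offset 4. */
enum { NEXT_OFFSET = 0, PREV_OFFSET = 4 };

/* Hypothetical helper: the address written by "next->prev = prev"
 * when entry->next already holds LIST_POISON1, i.e. the first store
 * performed when unlinking an entry that was already deleted. */
static uint32_t poisoned_unlink_fault_addr(void)
{
    return LIST_POISON1 + PREV_OFFSET;
}
```

Comparing the returned value with the CR2 line in the trace is the whole check; no kernel code is exercised by this sketch.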

output on ironlake:
Test assertion failure function copy, file gem_evict_alignment.c:117:
Failed assertion: ret == error

Reproduce steps:
----------------------------
1. ./gem_evict_alignment
Comment 1 Chris Wilson 2013-08-23 12:33:02 UTC
We expect this to fail on -nightly at the moment, but it should pass on -fixes.

Can you please give the full details for the failure from -fixes?
Comment 2 lu hua 2013-08-26 05:24:15 UTC
(In reply to comment #1)
> We expect this to fail on -nightly at the moment, but it should pass on
> -fixes.
> 
> Can you please give the full details for the failure from -fixes?

It works well on -fixes kernel.
Comment 3 Chris Wilson 2013-08-26 19:40:10 UTC
Should be fixed by

commit f833c65abf79c2456fe8e8c487e3d78b9c329daa
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Aug 26 11:23:47 2013 +0200

    drm/i915: More vma fixups around unbind/destroy
Comment 4 lu hua 2013-08-27 02:09:22 UTC
The following new cases also fail on PNV, ILK, IVB, and HSW with the -nightly and -queued kernels, and work well on the -fixes kernel.
They also cause a call trace on Sandybridge with the -nightly and -queued kernels.
igt/gem_evict_alignment/minor-interruptible
igt/gem_evict_alignment/minor-normal
igt/gem_evict_everything/minor-interruptible
igt/gem_evict_everything/minor-normal

run ./gem_evict_alignment --run-subtest minor-interruptible
output on ivb:
Test assertion failure function copy, file gem_evict_alignment.c:117:
Failed assertion: ret == error
Subtest minor-interruptible: FAIL
Comment 5 Daniel Vetter 2013-08-27 07:35:19 UTC
With latest -nightly all subcases should work. Please verify.
Comment 6 lu hua 2013-08-28 05:09:10 UTC
It still happens on latest -nightly kernel.
Comment 7 lu hua 2013-08-28 05:09:49 UTC
Created attachment 84765 [details]
dmesg on nightly
Comment 8 Daniel Vetter 2013-08-28 09:07:44 UTC
Assigning to Ben since apparently he can reproduce this.
Comment 9 Chris Wilson 2013-08-28 13:09:56 UTC
Created attachment 84792 [details] [review]
Fix.
Comment 10 Chris Wilson 2013-08-28 16:25:21 UTC
Created attachment 84800 [details] [review]
Fix, mk2.
Comment 11 Chris Wilson 2013-08-28 21:30:43 UTC
Created attachment 84818 [details] [review]
Fix, mk3.

By popular demand, here's the fixed spurious hunk.
Comment 12 Daniel Vetter 2013-08-29 21:05:40 UTC
Please retest with latest drm-intel-nightly:

commit a2420ab56204e8e86b1b56199daf19dfc2b4c43d
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Aug 29 19:50:31 2013 +0200

    drm/i915: Fix list corruption in vma_unbind
Comment 13 lu hua 2013-09-02 07:34:07 UTC
Fixed on latest -nightly branch.
output:
Subtest minor-normal: SUCCESS
Test requirement not met in function major_evictions, file gem_evict_alignment.c:158:
Test requirement: ((uint64_t)count * size / (1024 * 1024) < intel_get_total_ram_mb() * 9 / 10)
Subtest major-normal: SKIP
Subtest minor-interruptible: SUCCESS
Test requirement not met in function major_evictions, file gem_evict_alignment.c:158:
Test requirement: ((uint64_t)count * size / (1024 * 1024) < intel_get_total_ram_mb() * 9 / 10)
Subtest major-interruptible: SKIP
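
For readers wondering why the major-* subtests report SKIP rather than FAIL: the requirement printed above compares the total size of the objects the test wants to allocate, in MiB, against 90% of system RAM. A minimal sketch of that check follows. The function name, the parameter types, and the numbers in the usage note are assumptions for illustration; the real test obtains the RAM size from intel_get_total_ram_mb().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the requirement in major_evictions().
 * total_ram_mb replaces the call to intel_get_total_ram_mb().
 * The cast widens count first, so count * size is computed in
 * 64 bits and cannot overflow a 32-bit int. */
static bool major_eviction_fits(int count, int size, uint64_t total_ram_mb)
{
    return (uint64_t)count * size / (1024 * 1024) < total_ram_mb * 9 / 10;
}
```

With illustrative values, 1024 objects of 1 MiB on a machine with 2048 MiB of RAM satisfy the requirement (1024 < 1843), while 4096 such objects do not, and the subtest would be skipped.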
Comment 14 Daniel Vetter 2013-09-02 08:15:38 UTC
Ben, Chris: Do you still hit this oops or can we close this as fixed on latest -nightly?
Comment 15 Chris Wilson 2013-09-02 11:35:41 UTC
gem_evict_alignment still passes and I have not reproduced the error I saw elsewhere.
Comment 16 Ben Widawsky 2013-09-03 23:51:48 UTC
I haven't reproduced the error either; I think it's safe to close, and QA can reopen if it pops back up.
Comment 17 Daniel Vetter 2013-09-04 09:40:42 UTC
OK, marking this as resolved then.
Comment 18 lu hua 2013-09-05 06:28:28 UTC
Verified. Fixed.
Comment 19 Jari Tahvanainen 2017-08-14 08:30:13 UTC
Moving old bug from Verified to Closed.
