Bug 101791 - [BAT][SKL][GVT-d] igt@drv_module_reload@basic-reload-inject produces a general protection fault
Summary: [BAT][SKL][GVT-d] igt@drv_module_reload@basic-reload-inject produces a genera...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: highest critical
Assignee: Daniel Vetter
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-14 14:48 UTC by Martin Peres
Modified: 2017-07-19 12:43 UTC (History)
1 user (show)

See Also:
i915 platform: SKL
i915 features: power/Other


Attachments

Description Martin Peres 2017-07-14 14:48:58 UTC
On CI_DRM_2826, the machine fi-skl-gvtdvm produced a general protection fault, when running the test igt@drv_module_reload@basic-reload-inject.

Here is the fault:

[  492.780615] general protection fault: 0000 [#1] PREEMPT SMP
[  492.780622] Modules linked in: i915(-) snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem crct10dif_pclmul crc32_pclmul ghash_clmulni_intel prime_numbers e1000 i2c_piix4 [last unloaded: snd_hda_intel]
[  492.780650] CPU: 0 PID: 175 Comm: kworker/0:3 Tainted: G     U  W       4.12.0-CI-CI_DRM_2826+ #1
[  492.780659] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
[  492.780671] Workqueue: events output_poll_execute
[  492.780677] task: ffff88007aee4f40 task.stack: ffffc900001e0000
[  492.780684] RIP: 0010:drm_setup_crtcs+0x144/0xc00
[  492.780690] RSP: 0018:ffffc900001e3ce8 EFLAGS: 00010202
[  492.780697] RAX: 6b6b6b6b6b6b6b6b RBX: 000000000000002b RCX: 0000000000000001
[  492.780704] RDX: 0000000000000384 RSI: 0000000000000640 RDI: 00000000ffffffff
[  492.780711] RBP: ffffc900001e3d80 R08: 0000000000000001 R09: 0000000000000000
[  492.780718] R10: 00000000000002d0 R11: 0000000000000000 R12: 0000000000000002
[  492.780727] R13: 0000000000000640 R14: 0000000000000384 R15: ffff88007571ddd8
[  492.780734] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  492.780744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  492.780750] CR2: 00007f4ddb07d000 CR3: 0000000074c50000 CR4: 00000000003406f0
[  492.780757] Call Trace:
[  492.780764]  drm_fb_helper_hotplug_event.part.15+0x6e/0x90
[  492.780771]  drm_fb_helper_hotplug_event+0x15/0x20
[  492.780802]  intel_fbdev_output_poll_changed+0x15/0x20 [i915]
[  492.780809]  drm_kms_helper_hotplug_event+0x22/0x30
[  492.780815]  output_poll_execute+0x94/0x190
[  492.780823]  process_one_work+0x1fe/0x670
[  492.780830]  worker_thread+0x201/0x3b0
[  492.780836]  kthread+0x10f/0x150
[  492.780841]  ? process_one_work+0x670/0x670
[  492.780847]  ? kthread_create_on_node+0x40/0x40
[  492.780854]  ret_from_fork+0x27/0x40
[  492.780859] Code: 20 31 db 45 31 e4 85 c0 0f 8e aa 06 00 00 44 8b 6d 94 44 8b 75 90 49 8b 47 28 49 63 d4 44 89 ee 41 83 c4 01 48 8b 04 d0 44 89 f2 <48> 8b 38 48 8b 87 60 01 00 00 ff 50 20 01 c3 45 3b 67 20 7c d6 
[  492.780915] RIP: drm_setup_crtcs+0x144/0xc00 RSP: ffffc900001e3ce8
[  492.781914] ---[ end trace 366146d5da34f1ad ]---

Full results: https://intel-gfx-ci.01.org/CI/CI_DRM_2826/fi-skl-gvtdvm/igt@drv_module_reload@basic-reload-inject.html
Comment 1 Chris Wilson 2017-07-14 15:54:57 UTC
It's a use after free (kasan where are you!) in the async fbdev paths.

CI_DRM_2726 was:

mmit 346fb4e0b9660e2fe888f870608d287e1980f665
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jul 6 15:00:20 2017 +0200

    drm/i915: Protect against deferred fbdev setup
    
    We could probably hit this already with our current async fbdev init,
    but it's much easier to hit this with the new deferred fbdev setup
    that I'm working on polishing.
    
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20170706130023.28417-2-daniel.vetter@ffwll.ch

commit 88be58be886f1215cc73dc8c273c985eecd7385c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jul 6 15:00:19 2017 +0200

    drm/i915/fbdev: Always forward hotplug events
    
    With deferred fbdev setup we always need to forward hotplug events,
    even if fbdev isn't fully set up yet. Otherwise the deferred setup
    will neer happen.
    
    Originally this check was added in
    
    commit c45eb4fed12d278d3619f1904885bd0d7bcbf036 (tag: drm-intel-next-fixes-2016-08-05)
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Wed Jul 13 18:34:45 2016 +0100
    
        drm/i915/fbdev: Check for the framebuffer before use
    
    But the specific case of the hotplug function blowing up was fixed in
    
    commit 50c3dc970a09b3b60422a58934cc27a413288bab
    Author: Daniel Vetter <daniel.vetter@ffwll.ch>
    Date:   Fri Jun 27 17:19:22 2014 +0200
    
        drm/fb-helper: Fix hpd vs. initial config races
    
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20170706130023.28417-1-daniel.vetter@ffwll.ch
Comment 2 Daniel Vetter 2017-07-14 15:59:18 UTC
drm_setup_crtcs+0x144 any way to decode that into a line in a source somewhere?
Comment 3 Daniel Vetter 2017-07-14 16:03:35 UTC
Hm, unload kills the fbdev before we stop the poll work. That's not going to end well.
Comment 4 Daniel Vetter 2017-07-14 16:07:59 UTC
Pretty sure this was broken in

commit 2013bfc0238b9a77f7e9223aed03b1cef1b5cc34
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Nov 6 15:08:32 2015 +0200

    drm/i915: Do fbdev fini first during unload

There's full comments explaining even why this is not a good idea.

I'll fix this up properly by unregistering first and cleanup later on.
Comment 5 Chris Wilson 2017-07-14 16:20:54 UTC
Looking at that patch again, it is essentially saying it wants the async_synchronize_full() first (now intel_fbdev_sync()). And indeed, calling fini before unregister is prone to disappoints like this.
Comment 6 Daniel Vetter 2017-07-14 19:15:44 UTC
https://patchwork.freedesktop.org/series/27327/

This shold fix it.
Comment 7 Martin Peres 2017-07-17 09:46:09 UTC
Daniel re-sent an updated patchset: https://patchwork.freedesktop.org/series/27337/

The tests are passing now, waiting for the review to happen.
Comment 8 Daniel Vetter 2017-07-19 12:34:59 UTC
Should be fixed with

commit 4f256d8219f230cb11b49343931dc0b2fa8bd149
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Jul 15 00:46:55 2017 +0200

    drm/i915: Fix fbdev unload sequence
Comment 9 Martin Peres 2017-07-19 12:43:28 UTC
Thanks, looks good!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.