Bug 100375 - forced EDIDs can cause amdgpu to null ptr deref
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/other
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2017-03-24 10:36 UTC by Edward O'Callaghan
Modified: 2017-10-15 08:39 UTC
Description Edward O'Callaghan 2017-03-24 10:36:39 UTC
[  307.570505] [drm] Got external EDID base block and 0 extensions from "edid/768x384.bin" for connector "VGA-1"
[  445.605230] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 60
[  445.605232] Raw EDID:
[  445.605235]          00 ff ff ff ff ff ff 00 39 f6 05 04 16 07 02 00
[  445.605236]          10 17 01 03 81 1e 17 b4 ea c1 e5 a3 57 4e 9c 23
[  445.605237]          1d 50 54 21 08 00 01 01 01 01 01 01 01 01 01 01
[  445.605238]          01 01 01 07 01 01 91 26 4f ff ff ff ff ff ff ff
[  445.605239]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  445.605240]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  445.605240]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  445.605241]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  445.606369] [drm:amdgpu_connector_dvi_detect [amdgpu]] *ERROR* HDMI-A-1: probed a monitor but no|invalid EDID
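The checksum complaint above can be reproduced offline. An EDID base block is 128 bytes and is valid only if all bytes sum to 0 modulo 256; the kernel prints that sum as the "remainder". The following is a minimal sketch with the bytes copied from the dump above; the helper names are made up for illustration. Note that the error line names HDMI-A-1, while the external EDID was loaded for VGA-1, so this block may not be the forced blob at all.

```python
# The 128 bytes from the "Raw EDID" dump in the log above, copied verbatim.
RAW_EDID_HEX = """
00 ff ff ff ff ff ff 00 39 f6 05 04 16 07 02 00
10 17 01 03 81 1e 17 b4 ea c1 e5 a3 57 4e 9c 23
1d 50 54 21 08 00 01 01 01 01 01 01 01 01 01 01
01 01 01 07 01 01 91 26 4f ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
"""

EDID_BLOCK_SIZE = 128
# Fixed 8-byte pattern that starts every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def parse_hex_dump(text: str) -> bytes:
    """Turn whitespace-separated hex byte pairs into a bytes object."""
    return bytes(int(token, 16) for token in text.split())


def checksum_remainder(block: bytes) -> int:
    """Return the sum of all bytes modulo 256; 0 means the checksum is valid."""
    if len(block) != EDID_BLOCK_SIZE:
        raise ValueError(f"expected {EDID_BLOCK_SIZE} bytes, got {len(block)}")
    return sum(block) % 256


block = parse_hex_dump(RAW_EDID_HEX)
remainder = checksum_remainder(block)
print("header ok:", block[:8] == EDID_HEADER)
print("checksum remainder:", remainder)
```

For this dump the header is intact and the remainder comes out as 60, matching the kernel message. The long run of ff bytes after offset 0x39 looks more like an incomplete read than a deliberately built EDID, but that is a guess.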




 # reboot

INIT: Sending processes the KILL signal
[  521.758143] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  521.765999] IP: [<ffffffff8116984d>] set_root+0x1d/0xa0
[  521.771242] PGD 0 [  521.773080] 
[  521.774580] Oops: 0000 [#1] SMP
[  521.777717] Modules linked in: amdgpu blackmagic_io(PO) ttm backlight hid_sony led_class
[  521.785920] CPU: 2 PID: 3694 Comm: hyperflow-engin Tainted: P           O    4.9.6-gentoo-r1 #1
[  521.794610] Hardware name: BIOSTAR Group A68N-5200/A68N-5200, BIOS 4.6.5 09/03/2015
[  521.802255] task: ffff880225698c40 task.stack: ffffc90000db8000
[  521.808165] RIP: 0010:[<ffffffff8116984d>]  [<ffffffff8116984d>] set_root+0x1d/0xa0
[  521.815828] RSP: 0018:ffffc90000dbb688  EFLAGS: 00010202
[  521.821133] RAX: ffff880225698c40 RBX: ffffc90000dbb7c0 RCX: ffff880225a63400
[  521.828256] RDX: ffffffff81c56e48 RSI: 0000000000000041 RDI: ffffc90000dbb7c0
[  521.835381] RBP: ffffc90000dbb698 R08: 000000000001a980 R09: ffff880225a63400
[  521.842505] R10: ffff880225a80026 R11: 0000000000000010 R12: 0000000000000000
[  521.849630] R13: ffff880225a8201c R14: 0000000000000001 R15: ffff880218826d80
[  521.856755] FS:  00007fc3f57fa700(0000) GS:ffff88022ed00000(0000) knlGS:0000000000000000
[  521.864834] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  521.870571] CR2: 0000000000000008 CR3: 0000000001a08000 CR4: 00000000000406e0
[  521.877694] Stack:
[  521.879706]  ffffc90000dbb7c0 0000000000000041 ffffc90000dbb6d8 ffffffff81169b89
[  521.887160]  ffff880220ead600 ffff880225a82000 ffffc90000dbb7c0 ffffc90000dbb8cc
[  521.894612]  0000000000000001 ffff880218826d80 ffffc90000dbb7b0 ffffffff8116c28a
[  521.902067] Call Trace:
[  521.904515]  [<ffffffff81169b89>] path_init+0x1e9/0x330
[  521.909740]  [<ffffffff8116c28a>] path_openat+0x6a/0x1480
[  521.915141]  [<ffffffff81079bdd>] ? default_wake_function+0xd/0x10
[  521.921319]  [<ffffffff8108cddd>] ? __wake_up_common+0x4d/0x80
[  521.927145]  [<ffffffff8116f189>] do_filp_open+0x79/0xd0
[  521.932467]  [<ffffffff8134f298>] ? acpi_driver_match_device+0x3d/0x5d
[  521.938991]  [<ffffffff813d67c4>] ? platform_match+0x24/0xa0
[  521.944644]  [<ffffffff81602d71>] ? klist_next+0x21/0xf0
[  521.949957]  [<ffffffff8115e5df>] file_open_name+0xdf/0x100
[  521.955529]  [<ffffffff8115e62e>] filp_open+0x2e/0x50
[  521.960573]  [<ffffffff81165561>] kernel_read_file_from_path+0x31/0x70
[  521.967092]  [<ffffffff813dffaf>] _request_firmware+0x2ef/0x5a0
[  521.973002]  [<ffffffff813e0292>] request_firmware+0x32/0x50
[  521.978654]  [<ffffffff813a9604>] drm_load_edid_firmware+0x264/0x500
[  521.985001]  [<ffffffff8139e2fc>] drm_helper_probe_single_connector_modes+0x14c/0x4d0
[  521.992826]  [<ffffffff813aa618>] drm_fb_helper_probe_connector_modes.isra.7+0x48/0x70
[  522.000738]  [<ffffffff813ac154>] drm_fb_helper_hotplug_event+0x94/0xd0
[  522.007343]  [<ffffffff813ac34c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x1bc/0x2a0
[  522.015381]  [<ffffffffa00efa50>] ? amdgpu_driver_postclose_kms+0x90/0xd0 [amdgpu]
[  522.022965]  [<ffffffffa01023d5>] amdgpu_fbdev_restore_mode+0x15/0x40 [amdgpu]
[  522.030199]  [<ffffffffa00ef8dd>] amdgpu_driver_lastclose_kms+0xd/0x10 [amdgpu]
[  522.037505]  [<ffffffff813b0286>] drm_lastclose+0x36/0xf0
[  522.042895]  [<ffffffff813b05e5>] drm_release+0x2a5/0x360
[  522.048288]  [<ffffffff81160f7a>] __fput+0xda/0x1e0
[  522.053167]  [<ffffffff811610b9>] ____fput+0x9/0x10
[  522.058039]  [<ffffffff8106e929>] task_work_run+0x79/0xa0
[  522.063438]  [<ffffffff8105731a>] do_exit+0x34a/0xaa0
[  522.068533]  [<ffffffffa00749ed>] ? _ZN10IOWorkLoop8openGateEv+0xd/0x10 [blackmagic_io]
[  522.076524]  [<ffffffff810588d0>] do_group_exit+0x40/0xa0
[  522.081916]  [<ffffffff81062812>] get_signal+0x272/0x5e0
[  522.087246]  [<ffffffffa004093e>] ? _ZN15UserClientClass21getFlushedInputFramesEPcPj+0x1e/0x20 [blackmagic_io]
[  522.097233]  [<ffffffff8101bfd3>] do_signal+0x23/0x5b0
[  522.102395]  [<ffffffffa003683a>] ? _ZN20UserClientClassLinux5ioctlEjm+0x8a/0xa0 [blackmagic_io]
[  522.111193]  [<ffffffffa002d34c>] ? bmio_client_ioctl+0xc/0x10 [blackmagic_io]
[  522.118424]  [<ffffffffa0070af5>] ? __do_global_dtors_aux+0x145/0x540 [blackmagic_io]
[  522.126251]  [<ffffffff81171fab>] ? do_vfs_ioctl+0x8b/0x5a0
[  522.131823]  [<ffffffff810ab5c5>] ? ktime_get_ts64+0x45/0xf0
[  522.137474]  [<ffffffff8100222e>] exit_to_usermode_loop+0x4e/0x80
[  522.143566]  [<ffffffff81002673>] syscall_return_slowpath+0x43/0x50
[  522.149827]  [<ffffffff81608e1f>] entry_SYSCALL_64_fastpath+0x92/0x94
[  522.156264] Code: 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 65 48 8b 04 25 40 c4 00 00 48 89 e5 41 54 53 f6 47 38 40 4c 8b a0 68 05 00 00 74 39 <41> 8b 4c 24 08 f6 c1 01 75 6d 49 8b 54 24 20 49 8b 44 24 18 48
[  522.176216] RIP  [<ffffffff8116984d>] set_root+0x1d/0xa0
[  522.181536]  RSP <ffffc90000dbb688>
[  522.185022] CR2: 0000000000000008
[  522.188333] ---[ end trace d57bf884cf6f4e4c ]---
[  522.192944] Fixing recursive fault but reboot is needed!
Comment 1 Michel Dänzer 2017-03-28 03:09:14 UTC
set_root doesn't look directly related to amdgpu or drm, so this could be memory corruption. KASAN might give more information.

Does this only happen when forcing an invalid EDID?
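Short of a KASAN run, one thing that can be checked from the oops alone is which pointer set_root was reading through. The sketch below decodes the faulting instruction from the Code: line in the description. It handles only the single encoding that appears there (REX prefix, opcode 8b, ModRM with a SIB byte and an 8-bit displacement); the function names are invented for this example and this is not a general disassembler.

```python
# Excerpt of the "Code:" line from the oops; <..> marks the faulting byte.
CODE_EXCERPT = "f6 47 38 40 4c 8b a0 68 05 00 00 74 39 <41> 8b 4c 24 08 f6 c1 01 75 6d"
# Values copied from the register dump of the same oops.
R12 = 0x0000000000000000
CR2 = 0x0000000000000008

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def faulting_bytes(code_line: str) -> bytes:
    """Return the instruction bytes starting at the <..> marker."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_load(insn: bytes):
    """Decode 'REX 8b /r' with mod=01, SIB without index, and disp8.

    Returns (destination register, base register, displacement).
    """
    rex, opcode, modrm, sib, disp = insn[0], insn[1], insn[2], insn[3], insn[4]
    if (rex & 0xF0) != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed 8b load")
    if rex & 0x0E:
        raise ValueError("REX.W/R/X set; not handled by this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm != 4:
        raise ValueError("expected disp8 addressing with a SIB byte")
    index, base = (sib >> 3) & 7, sib & 7
    if index != 4:
        raise ValueError("index register present; not handled by this sketch")
    if rex & 0x01:          # REX.B extends the base register to r8..r15
        base += 8
    if disp >= 0x80:        # disp8 is a signed byte
        disp -= 0x100
    return REG32[reg], REG64[base], disp


dest, base, disp = decode_load(faulting_bytes(CODE_EXCERPT))
print(f"mov {disp:#x}(%{base}),%{dest}")
print("matches CR2:", R12 + disp == CR2)
```

The instruction decodes to a 32-bit load from offset 8 of R12, and R12 is 0 in the register dump, which gives the faulting address 0x8 reported in CR2. So the NULL pointer is whatever set_root loaded into R12 just before; RAX equals the task pointer printed in the oops, so it may well be a field of the current task, though that is an interpretation rather than something confirmed here.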
Comment 2 Edward O'Callaghan 2017-03-28 03:22:33 UTC
(In reply to Michel Dänzer from comment #1)
> set_root doesn't look directly related to amdgpu or drm, so this could be
> memory corruption. KASAN might give more information.
> 
> Does this only happen when forcing an invalid EDID?

Hi Michel,

Yes, it only happens on shutdown with an EDID blob passed at boot. Actually, I don't think the EDID blob I passed is invalid. I don't know where the one in the trace comes from; perhaps it comes from the monitor itself.

Is the kernel trying to open the EDID blob on an unmounted fs there?
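Whether the kernel is opening the EDID blob during teardown can be read off the trace itself. The sketch below pulls the function names out of selected lines of the call trace in the description and drops the entries marked with "?", which the kernel prints for stack addresses it could not verify as real frames. The regular expression and function name are assumptions made for this example.

```python
import re

# Selected lines from the call trace in the description, copied verbatim.
TRACE_EXCERPT = """\
[  521.904515]  [<ffffffff81169b89>] path_init+0x1e9/0x330
[  521.909740]  [<ffffffff8116c28a>] path_openat+0x6a/0x1480
[  521.915141]  [<ffffffff81079bdd>] ? default_wake_function+0xd/0x10
[  521.927145]  [<ffffffff8116f189>] do_filp_open+0x79/0xd0
[  521.949957]  [<ffffffff8115e5df>] file_open_name+0xdf/0x100
[  521.955529]  [<ffffffff8115e62e>] filp_open+0x2e/0x50
[  521.960573]  [<ffffffff81165561>] kernel_read_file_from_path+0x31/0x70
[  521.967092]  [<ffffffff813dffaf>] _request_firmware+0x2ef/0x5a0
[  521.973002]  [<ffffffff813e0292>] request_firmware+0x32/0x50
[  521.978654]  [<ffffffff813a9604>] drm_load_edid_firmware+0x264/0x500
[  522.037505]  [<ffffffff813b0286>] drm_lastclose+0x36/0xf0
[  522.042895]  [<ffffffff813b05e5>] drm_release+0x2a5/0x360
[  522.048288]  [<ffffffff81160f7a>] __fput+0xda/0x1e0
[  522.053167]  [<ffffffff811610b9>] ____fput+0x9/0x10
[  522.058039]  [<ffffffff8106e929>] task_work_run+0x79/0xa0
[  522.063438]  [<ffffffff8105731a>] do_exit+0x34a/0xaa0
"""

# "[<address>] [? ]symbol+0xoffset/0xsize"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\? )?([^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace: str) -> list:
    """Return symbol names in trace order (innermost first), skipping '?' entries."""
    names = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            names.append(match.group(2))
    return names


chain = reliable_frames(TRACE_EXCERPT)
# Reverse so the chain reads from the outermost caller to the innermost frame.
print(" -> ".join(reversed(chain)))
```

Read this way, the file open is issued by request_firmware on behalf of drm_load_edid_firmware, and that in turn is reached from drm_release running as task work inside do_exit. That fits a firmware load being attempted while the process is exiting; whether the filesystem state or the exiting task's own state is what breaks the lookup is not settled by the trace alone.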
Comment 3 Michel Dänzer 2017-03-28 03:28:41 UTC
Sounds plausible, in which case it's probably a core DRM or even lower level kernel issue.
Comment 4 Edward O'Callaghan 2017-04-21 06:31:13 UTC
Actually, I don't believe this has anything to do with the EDID, as not forcing an EDID makes no difference.

The actual root cause is that if a page flip is in progress, something races on that fd and causes the null ptr deref:

[   18.281296] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   18.289158] IP: [<ffffffff81169a8d>] set_root+0x1d/0xa0
[   18.294401] PGD 0 [   18.296239] 
[   18.297739] Oops: 0000 [#1] SMP
[   18.300885] Modules linked in: amdgpu blackmagic_io(PO) ttm backlight hid_sony led_class
[   18.309086] CPU: 2 PID: 3595 Comm: hyperflow-engin Tainted: P           O    4.9.16-gentoo #1
[   18.317605] Hardware name: BIOSTAR Group A68N-5200/A68N-5200, BIOS 4.6.5 09/03/2015
[   18.325248] task: ffff8802255755c0 task.stack: ffffc90008f30000
[   18.331161] RIP: 0010:[<ffffffff81169a8d>]  [<ffffffff81169a8d>] set_root+0x1d/0xa0
[   18.338823] RSP: 0018:ffffc90008f33688  EFLAGS: 00010202
[   18.344127] RAX: ffff8802255755c0 RBX: ffffc90008f337c0 RCX: ffff880218f12e00
[   18.351252] RDX: ffffffff81c55e08 RSI: 0000000000000041 RDI: ffffc90008f337c0
[   18.358376] RBP: ffffc90008f33698 R08: 0000000018f12e01 R09: ffff880218f12e00
[   18.365501] R10: ffff88021432a024 R11: 0000000000000017 R12: 0000000000000000
[   18.372626] R13: ffff88021432f01c R14: 0000000000000001 R15: ffff880218de8200
[   18.379750] FS:  00007fee18f6d740(0000) GS:ffff88022ed00000(0000) knlGS:0000000000000000
[   18.387827] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.393566] CR2: 0000000000000008 CR3: 0000000001a08000 CR4: 00000000000406e0
[   18.400690] Stack:
[   18.402701]  ffffc90008f337c0 0000000000000041 ffffc90008f336d8 ffffffff81169dc9
[   18.410155]  ffff880219f7e300 ffff88021432f000 ffffc90008f337c0 ffffc90008f338cc
[   18.417607]  0000000000000001 ffff880218de8200 ffffc90008f337b0 ffffffff8116c3aa
[   18.425063] Call Trace:
[   18.427510]  [<ffffffff81169dc9>] path_init+0x1e9/0x330
[   18.432735]  [<ffffffff8116c3aa>] path_openat+0x6a/0x1480
[   18.438137]  [<ffffffff81079c3d>] ? default_wake_function+0xd/0x10
[   18.444315]  [<ffffffff8108ce3d>] ? __wake_up_common+0x4d/0x80
[   18.450149]  [<ffffffff8116f3c9>] do_filp_open+0x79/0xd0
[   18.455463]  [<ffffffff8134fba8>] ? acpi_driver_match_device+0x3d/0x5d
[   18.461987]  [<ffffffff813d7164>] ? platform_match+0x24/0xa0
[   18.467639]  [<ffffffff816039f1>] ? klist_next+0x21/0xf0
[   18.472944]  [<ffffffff8115e82f>] file_open_name+0xdf/0x100
[   18.478515]  [<ffffffff8115e87e>] filp_open+0x2e/0x50
[   18.483560]  [<ffffffff811657b1>] kernel_read_file_from_path+0x31/0x70
[   18.490079]  [<ffffffff813e094f>] _request_firmware+0x2ef/0x5a0
[   18.495989]  [<ffffffff813e0c32>] request_firmware+0x32/0x50
[   18.501649]  [<ffffffff813a9f14>] drm_load_edid_firmware+0x264/0x500
[   18.507996]  [<ffffffff8139ec0c>] drm_helper_probe_single_connector_modes+0x14c/0x4d0
[   18.515822]  [<ffffffff813aaf28>] drm_fb_helper_probe_connector_modes.isra.7+0x48/0x70
[   18.523735]  [<ffffffff813aca84>] drm_fb_helper_hotplug_event+0x94/0xd0
[   18.530347]  [<ffffffff813acc7c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x1bc/0x2a0
[   18.538370]  [<ffffffffa01003d5>] amdgpu_fbdev_restore_mode+0x15/0x40 [amdgpu]
[   18.545605]  [<ffffffffa00ed8dd>] amdgpu_driver_lastclose_kms+0xd/0x10 [amdgpu]
[   18.552909]  [<ffffffff813b0bb6>] drm_lastclose+0x36/0xf0
[   18.558300]  [<ffffffff813b0f15>] drm_release+0x2a5/0x360
[   18.563691]  [<ffffffff811611ca>] __fput+0xda/0x1e0
[   18.568561]  [<ffffffff81161309>] ____fput+0x9/0x10
[   18.573435]  [<ffffffff8106e9a9>] task_work_run+0x79/0xa0
[   18.578834]  [<ffffffff8105738a>] do_exit+0x34a/0xaa0
[   18.583886]  [<ffffffff81058940>] do_group_exit+0x40/0xa0
[   18.589277]  [<ffffffff81062892>] get_signal+0x272/0x5e0
[   18.594582]  [<ffffffff8101bfd3>] do_signal+0x23/0x5b0
[   18.599712]  [<ffffffff81061978>] ? do_send_sig_info+0x58/0x70
[   18.605537]  [<ffffffff8100222e>] exit_to_usermode_loop+0x4e/0x80
[   18.611620]  [<ffffffff81002673>] syscall_return_slowpath+0x43/0x50
[   18.617881]  [<ffffffff81609a9f>] entry_SYSCALL_64_fastpath+0x92/0x94
[   18.624327] Code: 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 65 48 8b 04 25 40 c4 00 00 48 89 e5 41 54 53 f6 47 38 40 4c 8b a0 68 05 00 00 74 39 <41> 8b 4c 24 08 f6 c1 01 75 6d 49 8b 54 24 20  
[   18.644280] RIP  [<ffffffff81169a8d>] set_root+0x1d/0xa0
[   18.649600]  RSP <ffffc90008f33688>
[   18.653086] CR2: 0000000000000008
[   18.656398] ---[ end trace 506f9f2a94b80534 ]---
[   18.661007] Fixing recursive fault but reboot is needed!
Comment 5 Michel Dänzer 2017-04-21 08:38:55 UTC
(In reply to Edward O'Callaghan from comment #4)
> The actual root causes is that if a page flip is in progress something races
> on that fd and causes the null ptr deref:

How did you determine that it's related to a page flip (or amdgpu in the first place)? I don't see the connection between that and set_root.
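For comparison, the two oopses in this report come from different kernel builds (4.9.6-gentoo-r1 and 4.9.16-gentoo), so the raw addresses differ, but the symbol+offset of each frame can still be compared. A rough sketch, using a few frames copied from each trace; the names FIRST, SECOND and normalize are made up for this example.

```python
import re

# Selected frames from the trace in the description (4.9.6-gentoo-r1).
FIRST = """\
[  521.904515]  [<ffffffff81169b89>] path_init+0x1e9/0x330
[  521.909740]  [<ffffffff8116c28a>] path_openat+0x6a/0x1480
[  521.927145]  [<ffffffff8116f189>] do_filp_open+0x79/0xd0
[  521.955529]  [<ffffffff8115e62e>] filp_open+0x2e/0x50
[  521.973002]  [<ffffffff813e0292>] request_firmware+0x32/0x50
[  521.978654]  [<ffffffff813a9604>] drm_load_edid_firmware+0x264/0x500
"""

# The same frames from the trace in comment 4 (4.9.16-gentoo).
SECOND = """\
[   18.427510]  [<ffffffff81169dc9>] path_init+0x1e9/0x330
[   18.432735]  [<ffffffff8116c3aa>] path_openat+0x6a/0x1480
[   18.450149]  [<ffffffff8116f3c9>] do_filp_open+0x79/0xd0
[   18.478515]  [<ffffffff8115e87e>] filp_open+0x2e/0x50
[   18.495989]  [<ffffffff813e0c32>] request_firmware+0x32/0x50
[   18.501649]  [<ffffffff813a9f14>] drm_load_edid_firmware+0x264/0x500
"""

# Capture the address and the "symbol+0xoffset/0xsize" part separately.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(?:\? )?(\S+\+0x[0-9a-f]+/0x[0-9a-f]+)")


def normalize(trace: str) -> list:
    """Drop timestamps and addresses, keeping only symbol+offset/size per frame."""
    return [m.group(2) for m in FRAME_RE.finditer(trace)]


def addresses(trace: str) -> list:
    """Return the raw return addresses, for contrast with the normalized form."""
    return [m.group(1) for m in FRAME_RE.finditer(trace)]


print("same symbol+offset path:", normalize(FIRST) == normalize(SECOND))
print("same raw addresses:", addresses(FIRST) == addresses(SECOND))
```

For these frames the symbol+offset sequences are identical while the raw addresses are not, so both crashes appear to take the same path through drm_load_edid_firmware and request_firmware into the path lookup. Nothing in the compared frames refers to page flipping.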
Comment 6 dwagner 2017-08-13 21:53:59 UTC
(I filed bug https://bugs.freedesktop.org/show_bug.cgi?id=102202 on what might be a related issue.)

