Bug 100375 - Forced EDIDs can cause amdgpu to dereference a NULL pointer
Status: RESOLVED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/other
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2017-03-24 10:36 UTC by Edward O'Callaghan
Modified: 2019-10-14 13:20 UTC

Description Edward O'Callaghan 2017-03-24 10:36:39 UTC
[  307.570505] [drm] Got external EDID base block and 0 extensions from "edid/768x384.bin" for connector "VGA-1"
[  445.605230] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 60
[  445.605232] Raw EDID:
[  445.605235]          00 ff ff ff ff ff ff 00 39 f6 05 04 16 07 02 00
[  445.605236]          10 17 01 03 81 1e 17 b4 ea c1 e5 a3 57 4e 9c 23
[  445.605237]          1d 50 54 21 08 00 01 01 01 01 01 01 01 01 01 01
[  445.605238]          01 01 01 07 01 01 91 26 4f ff ff ff ff ff ff ff
[  445.605239]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  445.605240]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  445.605240]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  445.605241]          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  445.606369] [drm:amdgpu_connector_dvi_detect [amdgpu]] *ERROR* HDMI-A-1: probed a monitor but no|invalid EDID
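
The "remainder is 60" message can be reproduced from the raw dump above. An EDID base block is 128 bytes, and the sum of all bytes modulo 256 must be zero for the block to be valid. The sketch below is not the kernel's code; the names `RAW_EDID_HEX` and `edid_block_remainder` are made up for illustration, and the bytes are copied from the log lines above.

```python
# Bytes copied from the "Raw EDID:" dump in the log above.
RAW_EDID_HEX = (
    "00 ff ff ff ff ff ff 00 39 f6 05 04 16 07 02 00 "
    "10 17 01 03 81 1e 17 b4 ea c1 e5 a3 57 4e 9c 23 "
    "1d 50 54 21 08 00 01 01 01 01 01 01 01 01 01 01 "
    "01 01 01 07 01 01 91 26 4f ff ff ff ff ff ff ff "
    + "ff " * 64  # the last four dump lines are all 0xff
)

# Fixed 8-byte pattern that starts every EDID base block.
EDID_HEADER = bytes.fromhex("00ffffffffffff00")


def parse_hex_dump(text):
    """Turn a whitespace-separated hex dump into bytes."""
    return bytes(int(tok, 16) for tok in text.split())


def edid_block_remainder(block):
    """Return sum(block) mod 256; a valid 128-byte EDID block gives 0."""
    if len(block) != 128:
        raise ValueError("an EDID block is exactly 128 bytes")
    return sum(block) % 256


block = parse_hex_dump(RAW_EDID_HEX)
remainder = edid_block_remainder(block)
```

For this dump the header bytes are intact, but the remainder is non-zero, which matches the `*ERROR* EDID checksum is invalid` line.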




 # reboot

INIT: Sending processes the KILL signal
[  521.758143] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  521.765999] IP: [<ffffffff8116984d>] set_root+0x1d/0xa0
[  521.771242] PGD 0 [  521.773080] 
[  521.774580] Oops: 0000 [#1] SMP
[  521.777717] Modules linked in: amdgpu blackmagic_io(PO) ttm backlight hid_sony led_class
[  521.785920] CPU: 2 PID: 3694 Comm: hyperflow-engin Tainted: P           O    4.9.6-gentoo-r1 #1
[  521.794610] Hardware name: BIOSTAR Group A68N-5200/A68N-5200, BIOS 4.6.5 09/03/2015
[  521.802255] task: ffff880225698c40 task.stack: ffffc90000db8000
[  521.808165] RIP: 0010:[<ffffffff8116984d>]  [<ffffffff8116984d>] set_root+0x1d/0xa0
[  521.815828] RSP: 0018:ffffc90000dbb688  EFLAGS: 00010202
[  521.821133] RAX: ffff880225698c40 RBX: ffffc90000dbb7c0 RCX: ffff880225a63400
[  521.828256] RDX: ffffffff81c56e48 RSI: 0000000000000041 RDI: ffffc90000dbb7c0
[  521.835381] RBP: ffffc90000dbb698 R08: 000000000001a980 R09: ffff880225a63400
[  521.842505] R10: ffff880225a80026 R11: 0000000000000010 R12: 0000000000000000
[  521.849630] R13: ffff880225a8201c R14: 0000000000000001 R15: ffff880218826d80
[  521.856755] FS:  00007fc3f57fa700(0000) GS:ffff88022ed00000(0000) knlGS:0000000000000000
[  521.864834] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  521.870571] CR2: 0000000000000008 CR3: 0000000001a08000 CR4: 00000000000406e0
[  521.877694] Stack:
[  521.879706]  ffffc90000dbb7c0 0000000000000041 ffffc90000dbb6d8 ffffffff81169b89
[  521.887160]  ffff880220ead600 ffff880225a82000 ffffc90000dbb7c0 ffffc90000dbb8cc
[  521.894612]  0000000000000001 ffff880218826d80 ffffc90000dbb7b0 ffffffff8116c28a
[  521.902067] Call Trace:
[  521.904515]  [<ffffffff81169b89>] path_init+0x1e9/0x330
[  521.909740]  [<ffffffff8116c28a>] path_openat+0x6a/0x1480
[  521.915141]  [<ffffffff81079bdd>] ? default_wake_function+0xd/0x10
[  521.921319]  [<ffffffff8108cddd>] ? __wake_up_common+0x4d/0x80
[  521.927145]  [<ffffffff8116f189>] do_filp_open+0x79/0xd0
[  521.932467]  [<ffffffff8134f298>] ? acpi_driver_match_device+0x3d/0x5d
[  521.938991]  [<ffffffff813d67c4>] ? platform_match+0x24/0xa0
[  521.944644]  [<ffffffff81602d71>] ? klist_next+0x21/0xf0
[  521.949957]  [<ffffffff8115e5df>] file_open_name+0xdf/0x100
[  521.955529]  [<ffffffff8115e62e>] filp_open+0x2e/0x50
[  521.960573]  [<ffffffff81165561>] kernel_read_file_from_path+0x31/0x70
[  521.967092]  [<ffffffff813dffaf>] _request_firmware+0x2ef/0x5a0
[  521.973002]  [<ffffffff813e0292>] request_firmware+0x32/0x50
[  521.978654]  [<ffffffff813a9604>] drm_load_edid_firmware+0x264/0x500
[  521.985001]  [<ffffffff8139e2fc>] drm_helper_probe_single_connector_modes+0x14c/0x4d0
[  521.992826]  [<ffffffff813aa618>] drm_fb_helper_probe_connector_modes.isra.7+0x48/0x70
[  522.000738]  [<ffffffff813ac154>] drm_fb_helper_hotplug_event+0x94/0xd0
[  522.007343]  [<ffffffff813ac34c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x1bc/0x2a0
[  522.015381]  [<ffffffffa00efa50>] ? amdgpu_driver_postclose_kms+0x90/0xd0 [amdgpu]
[  522.022965]  [<ffffffffa01023d5>] amdgpu_fbdev_restore_mode+0x15/0x40 [amdgpu]
[  522.030199]  [<ffffffffa00ef8dd>] amdgpu_driver_lastclose_kms+0xd/0x10 [amdgpu]
[  522.037505]  [<ffffffff813b0286>] drm_lastclose+0x36/0xf0
[  522.042895]  [<ffffffff813b05e5>] drm_release+0x2a5/0x360
[  522.048288]  [<ffffffff81160f7a>] __fput+0xda/0x1e0
[  522.053167]  [<ffffffff811610b9>] ____fput+0x9/0x10
[  522.058039]  [<ffffffff8106e929>] task_work_run+0x79/0xa0
[  522.063438]  [<ffffffff8105731a>] do_exit+0x34a/0xaa0
[  522.068533]  [<ffffffffa00749ed>] ? _ZN10IOWorkLoop8openGateEv+0xd/0x10 [blackmagic_io]
[  522.076524]  [<ffffffff810588d0>] do_group_exit+0x40/0xa0
[  522.081916]  [<ffffffff81062812>] get_signal+0x272/0x5e0
[  522.087246]  [<ffffffffa004093e>] ? _ZN15UserClientClass21getFlushedInputFramesEPcPj+0x1e/0x20 [blackmagic_io]
[  522.097233]  [<ffffffff8101bfd3>] do_signal+0x23/0x5b0
[  522.102395]  [<ffffffffa003683a>] ? _ZN20UserClientClassLinux5ioctlEjm+0x8a/0xa0 [blackmagic_io]
[  522.111193]  [<ffffffffa002d34c>] ? bmio_client_ioctl+0xc/0x10 [blackmagic_io]
[  522.118424]  [<ffffffffa0070af5>] ? __do_global_dtors_aux+0x145/0x540 [blackmagic_io]
[  522.126251]  [<ffffffff81171fab>] ? do_vfs_ioctl+0x8b/0x5a0
[  522.131823]  [<ffffffff810ab5c5>] ? ktime_get_ts64+0x45/0xf0
[  522.137474]  [<ffffffff8100222e>] exit_to_usermode_loop+0x4e/0x80
[  522.143566]  [<ffffffff81002673>] syscall_return_slowpath+0x43/0x50
[  522.149827]  [<ffffffff81608e1f>] entry_SYSCALL_64_fastpath+0x92/0x94
[  522.156264] Code: 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 65 48 8b 04 25 40 c4 00 00 48 89 e5 41 54 53 f6 47 38 40 4c 8b a0 68 05 00 00 74 39 <41> 8b 4c 24 08 f6 c1 01 75 6d 49 8b 54 24 20 49 8b 44 24 18 48
[  522.176216] RIP  [<ffffffff8116984d>] set_root+0x1d/0xa0
[  522.181536]  RSP <ffffc90000dbb688>
[  522.185022] CR2: 0000000000000008
[  522.188333] ---[ end trace d57bf884cf6f4e4c ]---
[  522.192944] Fixing recursive fault but reboot is needed!
Comment 1 Michel Dänzer 2017-03-28 03:09:14 UTC
set_root doesn't look directly related to amdgpu or drm, so this could be memory corruption. KASAN might give more information.

Does this only happen when forcing an invalid EDID?
Comment 2 Edward O'Callaghan 2017-03-28 03:22:33 UTC
(In reply to Michel Dänzer from comment #1)
> set_root doesn't look directly related to amdgpu or drm, so this could be
> memory corruption. KASAN might give more information.
> 
> Does this only happen when forcing an invalid EDID?

Hi Michel,

Yes, it only happens on shutdown with an EDID blob passed at boot. Actually, I don't think the EDID blob I passed is invalid. I don't know where the kernel is getting the one shown in the trace; perhaps it comes from the monitor itself.

Is the kernel trying to open the EDID blob on an unmounted filesystem there?
Comment 3 Michel Dänzer 2017-03-28 03:28:41 UTC
Sounds plausible, in which case it's probably a core DRM or even lower level kernel issue.
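
The call trace in the description shows where the EDID blob request comes from. In a kernel oops, frames prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so a first pass can drop them. The sketch below does that for lines copied from the first trace; `reliable_frames` and `FRAME_RE` are hypothetical helper names, not part of any kernel tool.

```python
import re

# Lines copied from the first oops in the description.
TRACE = """\
[  521.904515]  [<ffffffff81169b89>] path_init+0x1e9/0x330
[  521.909740]  [<ffffffff8116c28a>] path_openat+0x6a/0x1480
[  521.915141]  [<ffffffff81079bdd>] ? default_wake_function+0xd/0x10
[  521.927145]  [<ffffffff8116f189>] do_filp_open+0x79/0xd0
[  521.944644]  [<ffffffff81602d71>] ? klist_next+0x21/0xf0
[  521.949957]  [<ffffffff8115e5df>] file_open_name+0xdf/0x100
[  521.955529]  [<ffffffff8115e62e>] filp_open+0x2e/0x50
[  521.960573]  [<ffffffff81165561>] kernel_read_file_from_path+0x31/0x70
[  521.967092]  [<ffffffff813dffaf>] _request_firmware+0x2ef/0x5a0
[  521.973002]  [<ffffffff813e0292>] request_firmware+0x32/0x50
[  521.978654]  [<ffffffff813a9604>] drm_load_edid_firmware+0x264/0x500
[  521.985001]  [<ffffffff8139e2fc>] drm_helper_probe_single_connector_modes+0x14c/0x4d0
[  521.992826]  [<ffffffff813aa618>] drm_fb_helper_probe_connector_modes.isra.7+0x48/0x70
[  522.000738]  [<ffffffff813ac154>] drm_fb_helper_hotplug_event+0x94/0xd0
[  522.007343]  [<ffffffff813ac34c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x1bc/0x2a0
[  522.015381]  [<ffffffffa00efa50>] ? amdgpu_driver_postclose_kms+0x90/0xd0 [amdgpu]
[  522.022965]  [<ffffffffa01023d5>] amdgpu_fbdev_restore_mode+0x15/0x40 [amdgpu]
[  522.030199]  [<ffffffffa00ef8dd>] amdgpu_driver_lastclose_kms+0xd/0x10 [amdgpu]
[  522.037505]  [<ffffffff813b0286>] drm_lastclose+0x36/0xf0
[  522.042895]  [<ffffffff813b05e5>] drm_release+0x2a5/0x360
[  522.048288]  [<ffffffff81160f7a>] __fput+0xda/0x1e0
[  522.053167]  [<ffffffff811610b9>] ____fput+0x9/0x10
[  522.058039]  [<ffffffff8106e929>] task_work_run+0x79/0xa0
[  522.063438]  [<ffffffff8105731a>] do_exit+0x34a/0xaa0
"""

# [<address>] optional "? " marker, then symbol+0xoffset/0xsize
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(text):
    """Return symbol names of frames not marked with '?', innermost first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


frames = reliable_frames(TRACE)
```

Read from the bottom up, the remaining frames run from `do_exit` through `task_work_run`, `____fput`, `drm_release` and `drm_lastclose` into the fbdev restore path, which re-probes the connector and calls `request_firmware` for the EDID blob. So the file open happens while the process that held the DRM file descriptor is exiting.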
Comment 4 Edward O'Callaghan 2017-04-21 06:31:13 UTC
Actually, I don't believe this has anything to do with the EDID, as not forcing an EDID makes no difference.

The actual root cause is that if a page flip is in progress, something races on that fd and causes the null pointer dereference:

[   18.281296] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   18.289158] IP: [<ffffffff81169a8d>] set_root+0x1d/0xa0
[   18.294401] PGD 0 [   18.296239] 
[   18.297739] Oops: 0000 [#1] SMP
[   18.300885] Modules linked in: amdgpu blackmagic_io(PO) ttm backlight hid_sony led_class
[   18.309086] CPU: 2 PID: 3595 Comm: hyperflow-engin Tainted: P           O    4.9.16-gentoo #1
[   18.317605] Hardware name: BIOSTAR Group A68N-5200/A68N-5200, BIOS 4.6.5 09/03/2015
[   18.325248] task: ffff8802255755c0 task.stack: ffffc90008f30000
[   18.331161] RIP: 0010:[<ffffffff81169a8d>]  [<ffffffff81169a8d>] set_root+0x1d/0xa0
[   18.338823] RSP: 0018:ffffc90008f33688  EFLAGS: 00010202
[   18.344127] RAX: ffff8802255755c0 RBX: ffffc90008f337c0 RCX: ffff880218f12e00
[   18.351252] RDX: ffffffff81c55e08 RSI: 0000000000000041 RDI: ffffc90008f337c0
[   18.358376] RBP: ffffc90008f33698 R08: 0000000018f12e01 R09: ffff880218f12e00
[   18.365501] R10: ffff88021432a024 R11: 0000000000000017 R12: 0000000000000000
[   18.372626] R13: ffff88021432f01c R14: 0000000000000001 R15: ffff880218de8200
[   18.379750] FS:  00007fee18f6d740(0000) GS:ffff88022ed00000(0000) knlGS:0000000000000000
[   18.387827] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.393566] CR2: 0000000000000008 CR3: 0000000001a08000 CR4: 00000000000406e0
[   18.400690] Stack:
[   18.402701]  ffffc90008f337c0 0000000000000041 ffffc90008f336d8 ffffffff81169dc9
[   18.410155]  ffff880219f7e300 ffff88021432f000 ffffc90008f337c0 ffffc90008f338cc
[   18.417607]  0000000000000001 ffff880218de8200 ffffc90008f337b0 ffffffff8116c3aa
[   18.425063] Call Trace:
[   18.427510]  [<ffffffff81169dc9>] path_init+0x1e9/0x330
[   18.432735]  [<ffffffff8116c3aa>] path_openat+0x6a/0x1480
[   18.438137]  [<ffffffff81079c3d>] ? default_wake_function+0xd/0x10
[   18.444315]  [<ffffffff8108ce3d>] ? __wake_up_common+0x4d/0x80
[   18.450149]  [<ffffffff8116f3c9>] do_filp_open+0x79/0xd0
[   18.455463]  [<ffffffff8134fba8>] ? acpi_driver_match_device+0x3d/0x5d
[   18.461987]  [<ffffffff813d7164>] ? platform_match+0x24/0xa0
[   18.467639]  [<ffffffff816039f1>] ? klist_next+0x21/0xf0
[   18.472944]  [<ffffffff8115e82f>] file_open_name+0xdf/0x100
[   18.478515]  [<ffffffff8115e87e>] filp_open+0x2e/0x50
[   18.483560]  [<ffffffff811657b1>] kernel_read_file_from_path+0x31/0x70
[   18.490079]  [<ffffffff813e094f>] _request_firmware+0x2ef/0x5a0
[   18.495989]  [<ffffffff813e0c32>] request_firmware+0x32/0x50
[   18.501649]  [<ffffffff813a9f14>] drm_load_edid_firmware+0x264/0x500
[   18.507996]  [<ffffffff8139ec0c>] drm_helper_probe_single_connector_modes+0x14c/0x4d0
[   18.515822]  [<ffffffff813aaf28>] drm_fb_helper_probe_connector_modes.isra.7+0x48/0x70
[   18.523735]  [<ffffffff813aca84>] drm_fb_helper_hotplug_event+0x94/0xd0
[   18.530347]  [<ffffffff813acc7c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x1bc/0x2a0
[   18.538370]  [<ffffffffa01003d5>] amdgpu_fbdev_restore_mode+0x15/0x40 [amdgpu]
[   18.545605]  [<ffffffffa00ed8dd>] amdgpu_driver_lastclose_kms+0xd/0x10 [amdgpu]
[   18.552909]  [<ffffffff813b0bb6>] drm_lastclose+0x36/0xf0
[   18.558300]  [<ffffffff813b0f15>] drm_release+0x2a5/0x360
[   18.563691]  [<ffffffff811611ca>] __fput+0xda/0x1e0
[   18.568561]  [<ffffffff81161309>] ____fput+0x9/0x10
[   18.573435]  [<ffffffff8106e9a9>] task_work_run+0x79/0xa0
[   18.578834]  [<ffffffff8105738a>] do_exit+0x34a/0xaa0
[   18.583886]  [<ffffffff81058940>] do_group_exit+0x40/0xa0
[   18.589277]  [<ffffffff81062892>] get_signal+0x272/0x5e0
[   18.594582]  [<ffffffff8101bfd3>] do_signal+0x23/0x5b0
[   18.599712]  [<ffffffff81061978>] ? do_send_sig_info+0x58/0x70
[   18.605537]  [<ffffffff8100222e>] exit_to_usermode_loop+0x4e/0x80
[   18.611620]  [<ffffffff81002673>] syscall_return_slowpath+0x43/0x50
[   18.617881]  [<ffffffff81609a9f>] entry_SYSCALL_64_fastpath+0x92/0x94
[   18.624327] Code: 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 65 48 8b 04 25 40 c4 00 00 48 89 e5 41 54 53 f6 47 38 40 4c 8b a0 68 05 00 00 74 39 <41> 8b 4c 24 08 f6 c1 01 75 6d 49 8b 54 24 20  
[   18.644280] RIP  [<ffffffff81169a8d>] set_root+0x1d/0xa0
[   18.649600]  RSP <ffffc90008f33688>
[   18.653086] CR2: 0000000000000008
[   18.656398] ---[ end trace 506f9f2a94b80534 ]---
[   18.661007] Fixing recursive fault but reboot is needed!
Comment 5 Michel Dänzer 2017-04-21 08:38:55 UTC
(In reply to Edward O'Callaghan from comment #4)
> The actual root cause is that if a page flip is in progress, something races
> on that fd and causes the null pointer dereference:

How did you determine that it's related to a page flip (or amdgpu in the first place)? I don't see the connection between that and set_root.
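
One way to narrow this down is to decode the faulting instruction from the `Code:` line, where `<41>` marks the first byte of the instruction that trapped. The sketch below is a deliberately minimal decoder for the two `mov` loads involved, not a general disassembler; `decode_mov_load` is a hypothetical helper and the register values are copied from the first oops.

```python
# 64-bit register names in x86-64 encoding order.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code):
    """Decode `mov reg, [base + disp]` (opcode 0x8b), optional REX prefix.

    Returns (dest_reg_number, base_reg_name, displacement, operand_bits).
    Index registers and RIP-relative forms are rejected, not decoded.
    """
    i, rex = 0, 0
    if 0x40 <= code[i] <= 0x4F:          # REX prefix: 0100WRXB
        rex = code[i]
        i += 1
    if code[i] != 0x8B:
        raise ValueError("not a 0x8b mov load")
    modrm = code[i + 1]
    i += 2
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:
        raise ValueError("register-to-register form")
    if rm == 4:                          # a SIB byte follows
        sib = code[i]
        i += 1
        if (sib >> 3) & 7 != 4 or rex & 0x2:
            raise ValueError("indexed addressing not handled")
        base = sib & 7
    else:
        base = rm
    if mod == 0 and base == 5:
        raise ValueError("disp32-only / RIP-relative not handled")
    reg |= (rex & 0x4) << 1              # REX.R extends the destination
    base |= (rex & 0x1) << 3             # REX.B extends the base register
    size = {0: 0, 1: 1, 2: 4}[mod]       # displacement width in bytes
    disp = int.from_bytes(code[i:i + size], "little", signed=True) if size else 0
    bits = 64 if rex & 0x8 else 32       # REX.W selects 64-bit operands
    return reg, REGS[base], disp, bits


# Bytes from the Code: line; the second one is the instruction marked <41>.
load_ptr = decode_mov_load(bytes.fromhex("4c8ba068050000"))
faulting = decode_mov_load(bytes.fromhex("418b4c2408"))

# Register values from the first oops.
RAX, TASK, R12, CR2 = 0xFFFF880225698C40, 0xFFFF880225698C40, 0x0, 0x8
```

Under this decoding, `r12` is loaded from offset 0x568 of `rax`, and the trapping instruction reads 32 bits from `8(%r12)`. With `R12` zero that gives exactly the `CR2` value of 8, and `RAX` equals the `task:` pointer in both oopses, so the NULL pointer appears to be a field of the current task rather than anything amdgpu-specific. Which field sits at 0x568 is not established in this thread; `current->fs` is one candidate, but that is an assumption.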
Comment 6 dwagner 2017-08-13 21:53:59 UTC
(I filed bug https://bugs.freedesktop.org/show_bug.cgi?id=102202 on what might be a related issue.)
Comment 7 Martin Peres 2019-10-14 13:20:10 UTC
Hi,

Freedesktop's Bugzilla instance is EOLed and open bugs are about to be migrated to http://gitlab.freedesktop.org.

To avoid migrating out-of-date bugs, I am now closing all bugs that have not seen any activity in the past year. If the issue is still happening, please create a new bug in the relevant project at https://gitlab.freedesktop.org/drm (use misc by default).

Sorry about the noise!

