Bug 106159

Summary: When connecting or disconnecting a DisplayPort monitor to a DP hub with 4.16.2+ kernel, hard freeze with frozen video output
Product: DRI
Reporter: Joel Sass <sass.joel>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
CC: harry.wentland
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (description, with flags in parentheses):
dpkg -l |grep mesa (flags: none)
Xorg.0.log (flags: none)
dmesg output (flags: none)
lshw output (flags: none)
[PATCH 1/2] drm/amd/display: Update MST edid property every time (flags: none)
[PATCH 2/2] drm/amd/display: Check dc_sink every time in MST hotplug (flags: none)
amdgpu_dm.c patch I had to manually apply (flags: none)
amdgpu_dm_mst_types.c patch I had to manually apply (flags: none)
The modified file causing the problem in my comment. (flags: none)
This is the dmesg from an ssh session after attaching a monitor to my MST hub (flags: none)
Here's the kernel .config I used for making the kernel (flags: none)

Description Joel Sass 2018-04-20 18:51:17 UTC
Created attachment 138956 [details]
dpkg -l |grep mesa

When connecting a DisplayPort monitor to an active DP hub that already has working outputs, my workstation hard-freezes and requires a cold reboot. Network services, including ping and SSH, stop responding.

root@nope:~# uname -a
Linux nope 4.16.2+ #1 SMP Fri Apr 13 17:51:14 CEST 2018 x86_64 x86_64 x86_64 GNU/Linux

root@nope:~# lsmod |grep -i amdgpu
amdgpu               2695168  2
chash                  16384  1 amdgpu
i2c_algo_bit           16384  1 amdgpu
gpu_sched              20480  1 amdgpu
ttm                    94208  1 amdgpu
drm_kms_helper        143360  1 amdgpu
drm                   348160  6 amdgpu,gpu_sched,ttm,drm_kms_helper

Mesa drivers are from the padoka PPA.
Comment 1 Joel Sass 2018-04-20 18:52:32 UTC
Created attachment 138957 [details]
Xorg.0.log
Comment 2 Joel Sass 2018-04-20 18:52:58 UTC
Created attachment 138958 [details]
dmesg output
Comment 3 Joel Sass 2018-04-20 18:53:40 UTC
Created attachment 138959 [details]
lshw output
Comment 4 Harry Wentland 2018-04-24 19:30:22 UTC
Created attachment 139069 [details] [review]
[PATCH 1/2] drm/amd/display: Update MST edid property every time
Comment 5 Harry Wentland 2018-04-24 19:31:11 UTC
Created attachment 139070 [details] [review]
[PATCH 2/2] drm/amd/display: Check dc_sink every time in MST hotplug

Can you try patches 1 and 2?
Comment 6 Joel Sass 2018-04-24 20:34:19 UTC
Will do! Sorry, I haven't had much time for testing recently.
Comment 7 dwagner 2018-04-24 21:37:30 UTC
I cannot comment on how useful the above patches are for the topic of this bug report, but they do help with bug report
https://bugs.freedesktop.org/show_bug.cgi?id=103277
Comment 8 Joel Sass 2018-04-25 15:23:10 UTC
Alright! First, I appreciate the help with this.

I decided to go with what I was given and build a kernel after applying the patches you'd recommended. I downloaded the newest unstable source this morning from here: git://kernel.ubuntu.com/ubuntu/unstable.git

I ran make menuconfig to make sure that amdgpu.dc was included, along with custom processor flags for newer Core 2/Xeon processors, and then applied the patches you'd mentioned. Sadly, neither of them applied verbatim because I've taken too long to get back to working on this, so I had to apply them manually.

You'll find my diff patches attached. Nothing is overtly different.

I'm going to start compiling this, and then go to lunch. Thanks!
Comment 9 Joel Sass 2018-04-25 15:24:36 UTC
Created attachment 139100 [details] [review]
amdgpu_dm.c patch I had to manually apply
Comment 10 Joel Sass 2018-04-25 15:25:04 UTC
Created attachment 139101 [details] [review]
amdgpu_dm_mst_types.c patch I had to manually apply
Comment 11 Joel Sass 2018-04-25 20:58:11 UTC
It appears that the patch I created manually isn't working out so hot. During compilation, I'm seeing these errors for amdgpu_dm_mst_types.c:

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c: In function ‘dm_dp_mst_dc_sink_create’:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:205:2: error: ‘dc_sink’ undeclared (first use in this function)
  dc_sink = dc_link_add_remote_sink(
  ^
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:205:2: note: each undeclared identifier is reported only once for each function it appears in
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:209:4: error: ‘init_params’ undeclared (first use in this function)
   &init_params);
    ^
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c: In function ‘dm_dp_mst_get_modes’:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:232:28: warning: unused variable ‘init_params’ [-Wunused-variable]
   struct dc_sink_init_data init_params = {
                            ^
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:231:19: warning: unused variable ‘dc_sink’ [-Wunused-variable]
   struct dc_sink *dc_sink;
                   ^
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.c:251:5: error: request for member ‘sink_signal’ in something not a structure or union
     .sink_signal = SIGNAL_TYPE_DISPLAY_PORT_MST };
     ^
scripts/Makefile.build:332: recipe for target 'drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.o' failed
make[4]: *** [drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_mst_types.o] Error 1
scripts/Makefile.build:606: recipe for target 'drivers/gpu/drm/amd/amdgpu' failed
make[3]: *** [drivers/gpu/drm/amd/amdgpu] Error 2
scripts/Makefile.build:606: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:606: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
make[1]: *** Waiting for unfinished jobs....

Could someone take a look at the file I've attached please?
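
One plausible reading of the errors above (not confirmed by the attached file) is that the call to dc_link_add_remote_sink() ended up in dm_dp_mst_dc_sink_create while the dc_sink and init_params declarations stayed behind in dm_dp_mst_get_modes. That would explain "undeclared" in the first function and "unused variable" in the second. The sketch below shows the general shape of a fix using hypothetical stand-in types and functions (add_remote_sink, mst_sink_create, mst_get_modes); none of these are the real amdgpu/DC definitions or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real amdgpu/DC structures. */
struct dc_link { int sink_count; };
struct dc_sink { int signal; };
struct dc_sink_init_data {
	struct dc_link *link;
	int sink_signal;
};

enum { SIGNAL_TYPE_DISPLAY_PORT_MST = 1 }; /* placeholder value */

static struct dc_sink sink_storage;

/* Stand-in for dc_link_add_remote_sink(); the real call takes more arguments. */
struct dc_sink *add_remote_sink(struct dc_link *link,
				const struct dc_sink_init_data *init)
{
	assert(link != NULL && init != NULL);
	sink_storage.signal = init->sink_signal;
	link->sink_count++;
	return &sink_storage;
}

/* The function that makes the call must declare the locals it uses. */
struct dc_sink *mst_sink_create(struct dc_link *link)
{
	struct dc_sink *dc_sink;
	struct dc_sink_init_data init_params = {
		.link = link,
		.sink_signal = SIGNAL_TYPE_DISPLAY_PORT_MST,
	};

	dc_sink = add_remote_sink(link, &init_params);
	return dc_sink;
}

/* The caller no longer keeps its own dc_sink/init_params locals. */
int mst_get_modes(struct dc_link *link)
{
	return mst_sink_create(link) != NULL ? 1 : 0;
}
```

If this reading is right, moving both declarations together with the call (and deleting the leftovers in the caller) should clear the two errors and the two warnings; the broken `.sink_signal` initializer on line 251 would go away with the leftover block.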
Comment 12 Joel Sass 2018-04-25 21:01:15 UTC
Created attachment 139112 [details]
The modified file causing the problem in my comment.

It looks like some identifiers don't match between the patch you'd suggested and the kernel source I acquired from git://kernel.ubuntu.com/ubuntu/unstable.git
Comment 13 Alex Deucher 2018-04-25 21:15:33 UTC
Try the patches from this branch:
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-4.17
Comment 14 Joel Sass 2018-04-26 13:57:30 UTC
Alex, I just rebooted to this kernel after building. This problem still exists, but it's not a hard freeze!

I'll attach the dmesg showing the error.
Comment 15 Joel Sass 2018-04-26 14:02:06 UTC
Created attachment 139132 [details]
This is the dmesg from an ssh session after attaching a monitor to my MST hub

root@nope:~/errors# uname -a
Linux nope 4.16.0-rc7+ #2 SMP Thu Apr 26 08:45:00 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

Kernel git acquired here: https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-4.17

Key message from dmesg:

[  526.900234] Call Trace:
[  526.900280]  dm_dp_mst_get_modes+0xce/0x120 [amdgpu]
[  526.900288]  drm_helper_probe_single_connector_modes+0x199/0x6c0 [drm_kms_helper]
[  526.900294]  ? jbd2_journal_stop+0xf3/0x3e0
[  526.900297]  ? __ext4_journal_stop+0x37/0xa0
[  526.900309]  drm_mode_getconnector+0x2c4/0x300 [drm]
[  526.900314]  ? _cond_resched+0x16/0x40
[  526.900324]  ? drm_mode_connector_property_set_ioctl+0x60/0x60 [drm]
[  526.900333]  drm_ioctl_kernel+0x67/0xb0 [drm]
[  526.900342]  drm_ioctl+0x2a9/0x350 [drm]
[  526.900352]  ? drm_mode_connector_property_set_ioctl+0x60/0x60 [drm]
[  526.900381]  amdgpu_drm_ioctl+0x46/0x80 [amdgpu]
[  526.900385]  do_vfs_ioctl+0xa2/0x5f0
[  526.900389]  ? vfs_write+0x162/0x1a0
[  526.900391]  SyS_ioctl+0x74/0x80
[  526.900395]  do_syscall_64+0x60/0x110
[  526.900399]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Comment 16 Joel Sass 2018-04-26 14:06:51 UTC
Created attachment 139134 [details]
Here's the kernel .config I used for making the kernel

Key changes:

I added the AMDGPU module and checked all 4 boxes. It looks like the amdgpu.dc switch is gone; I'm assuming that's intentional.

I also checked the optimization for Intel Core 2/Xeon processors instead of generic x86.
Comment 17 Harry Wentland 2018-06-27 15:25:33 UTC
I think we had a fix for that. Can you check if this is still a problem on https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next ?
Comment 18 Martin Peres 2019-11-19 08:35:49 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/348.
