Originally reported within DRM/Intel (https://bugs.freedesktop.org/show_bug.cgi?id=92997#c1), where I was advised to seek support from the Radeon people.

The story began a month or so ago, when I upgraded my Debian testing/sid installation with a new kernel, 4.3.0 (I can dig out the exact previous version later if needed): the laptop started to freeze completely, usually when switching displays (to/from the external display) or resolution. My laptop is an HP ZBook 14" with dual GPUs:

00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Mars [Radeon HD 8730M]

It sits on a docking station (I believe the stalls also happen without docking), which has a display connected via DisplayPort.

Running yesterday's nightly of git://anongit.freedesktop.org/drm-intel, I managed to trigger the stall again: I turned on a second display attached to the docking station (also via DisplayPort), which xrandr doesn't actually see as connected; usually it just shows a clone of the first display (I guess the docking station does this internally). When I turned it off again, X reset the display and had almost come back when it stalled...
After rebooting, journalctl -b -1 showed:

Nov 18 14:00:49 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) intel(0): resizing framebuffer to 1920x1200
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): EDID vendor "HWP", prod id 9977
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Using hsync ranges from config file
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Using vrefresh ranges from config file
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Printing DDC gathered Modelines:
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1920x1200"x0.0 154.00 1920 1968 2000 2080 1200 1203 1209 1235 +hsync -vsync (74.0 kHz eP)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "800x600"x0.0 40.00 800 840 968 1056 600 601 605 628 +hsync +vsync (37.9 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "640x480"x0.0 31.50 640 656 720 840 480 481 484 500 -hsync -vsync (37.5 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "640x480"x0.0 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "720x400"x0.0 28.32 720 738 846 900 400 412 414 449 -hsync +vsync (31.5 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1280x1024"x0.0 135.00 1280 1296 1440 1688 1024 1025 1028 1066 +hsync +vsync (80.0 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1024x768"x0.0 78.75 1024 1040 1136 1312 768 769 772 800 +hsync +vsync (60.0 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1024x768"x0.0 65.00 1024 1048 1184 1344 768 771 777 806 -hsync -vsync (48.4 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "832x624"x0.0 57.28 832 864 928 1152 624 625 628 667 -hsync -vsync (49.7 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "800x600"x0.0 49.50 800 816 896 1056 600 601 604 625 +hsync +vsync (46.9 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1152x864"x0.0 108.00 1152 1216 1344 1600 864 865 868 900 +hsync +vsync (67.5 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1280x960"x0.0 108.00 1280 1376 1488 1800 960 961 964 1000 +hsync +vsync (60.0 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1280x1024"x0.0 108.00 1280 1328 1440 1688 1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1600x1000"x60.0 133.14 1600 1704 1872 2144 1000 1001 1004 1035 -hsync +vsync (62.1 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1600x1200"x0.0 162.00 1600 1664 1856 2160 1200 1201 1204 1250 +hsync +vsync (75.0 kHz e)
Nov 18 14:00:50 hopa /usr/lib/gdm3/gdm-x-session[2354]: (II) RADEON(G0): Modeline "1680x1050"x0.0 119.00 1680 1728 1760 1840 1050 1053 1059 1080 +hsync -vsync (64.7 kHz e)
Nov 18 14:00:50 hopa kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000041
Nov 18 14:00:50 hopa kernel: IP: [<ffffffffa089abeb>] ttm_bo_wait+0x6b/0x170 [ttm]
Nov 18 14:00:50 hopa kernel: PGD 35dba067 PUD 35dbb067 PMD 0
Nov 18 14:00:50 hopa kernel: Oops: 0000 [#1] SMP
Nov 18 14:00:50 hopa kernel: Modules linked in: fuse ctr ccm rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c loop bnep pci_stub vboxpci(
Nov 18 14:00:50 hopa kernel: iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass pcspkr psmouse serio_raw sg i2c_i801 lpc_ich shpchp evdev tpm_infineon mei_me mei snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_
Nov 18 14:00:50 hopa kernel: sd_mod rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul crc32c_intel jitterentropy_rng sha256_ssse3 sha256_generic hmac drbg ansi_cprng ahci libahci aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_h
Nov 18 14:00:50 hopa kernel: CPU: 2 PID: 4571 Comm: kworker/2:0 Tainted: G W O 4.4.0-rc1+ #2
Nov 18 14:00:50 hopa kernel: Hardware name: Hewlett-Packard HP ZBook 14/198F, BIOS L71 Ver. 01.20 07/28/2014
Nov 18 14:00:50 hopa kernel: Workqueue: events ttm_bo_delayed_workqueue [ttm]
Nov 18 14:00:50 hopa kernel: task: ffff8804384d7100 ti: ffff880035e14000 task.ti: ffff880035e14000
Nov 18 14:00:50 hopa kernel: RIP: 0010:[<ffffffffa089abeb>] [<ffffffffa089abeb>] ttm_bo_wait+0x6b/0x170 [ttm]
Nov 18 14:00:50 hopa kernel: RSP: 0018:ffff880035e17d70 EFLAGS: 00010246
Nov 18 14:00:50 hopa kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
Nov 18 14:00:50 hopa kernel: RDX: 0000000000000ea6 RSI: 0000000000000000 RDI: ffff8800a1626068
Nov 18 14:00:50 hopa kernel: RBP: 0000000000000001 R08: ffff8800a5c5cc78 R09: 0000000000000000
Nov 18 14:00:50 hopa kernel: R10: 0000000000000000 R11: ffff8804300f1dc0 R12: 0000000000000000
Nov 18 14:00:50 hopa kernel: R13: 0000000000000001 R14: ffff8804382d76f8 R15: ffff88031b9b6400
Nov 18 14:00:50 hopa kernel: FS: 0000000000000000(0000) GS:ffff88044ea80000(0000) knlGS:0000000000000000
Nov 18 14:00:50 hopa kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 18 14:00:50 hopa kernel: CR2: 0000000000000041 CR3: 000000043018d000 CR4: 00000000001406e0
Nov 18 14:00:50 hopa kernel: Stack:
Nov 18 14:00:50 hopa kernel: 0000000000000ea6 ffff8800a1626068 ffff88044ea959c0 ffff8800a1626068
Nov 18 14:00:50 hopa kernel: 0000000000000001 0000000000000001 ffff880437fcef40 0000000000000000
Nov 18 14:00:50 hopa kernel: 0000000000000001 ffffffffa089b327 0000000000000000 0000000000000001
Nov 18 14:00:50 hopa kernel: Call Trace:
Nov 18 14:00:50 hopa kernel: [<ffffffffa089b327>] ? ttm_bo_cleanup_refs_and_unlock+0x27/0x170 [ttm]
Nov 18 14:00:50 hopa kernel: [<ffffffffa089b52f>] ? ttm_bo_delayed_delete+0xbf/0x200 [ttm]
Nov 18 14:00:50 hopa kernel: [<ffffffffa089b687>] ? ttm_bo_delayed_workqueue+0x17/0x40 [ttm]
Nov 18 14:00:50 hopa kernel: [<ffffffff810856ff>] ? process_one_work+0x19f/0x3d0
Nov 18 14:00:50 hopa kernel: [<ffffffff8108597d>] ? worker_thread+0x4d/0x450
Nov 18 14:00:50 hopa kernel: [<ffffffff81085930>] ? process_one_work+0x3d0/0x3d0
Nov 18 14:00:50 hopa kernel: [<ffffffff8108b47d>] ? kthread+0xbd/0xe0
Nov 18 14:00:50 hopa kernel: [<ffffffff8108b3c0>] ? kthread_create_on_node+0x170/0x170
Nov 18 14:00:50 hopa kernel: [<ffffffff8155984f>] ? ret_from_fork+0x3f/0x70
Nov 18 14:00:50 hopa kernel: [<ffffffff8108b3c0>] ? kthread_create_on_node+0x170/0x170
Nov 18 14:00:50 hopa kernel: Code: 85 ff 74 71 41 8b 47 10 ba a6 0e 00 00 85 c0 74 64 31 db eb 0e 83 c3 01 48 85 d2 7e 4f 41 39 5f 10 76 52 48 63 c3 49 8b 6c c7 18 <48> 8b 45 40 a8 01 75 e2 48 8b 45 08 48 8b 40 18 48 85 c0 74 11
Nov 18 14:00:50 hopa kernel: RIP [<ffffffffa089abeb>] ttm_bo_wait+0x6b/0x170 [ttm]

I'm not sure whether this particular traceback is associated with all the stalls: I think that in the majority of cases the system stalls before any log/journal gets written to the drive, so it usually isn't available after reboot.

I also set up xrandr providers to enable the external displays connected to the docking station:

xrandr --setprovideroffloadsink radeon Intel
xrandr --setprovideroutputsource radeon Intel

but I think I had stalls before doing that (although that was with older kernels etc.).
Might be similar to bug 92258.
Yep, please try Maarten's patch from bug 92258 for additional information. Also, can you narrow down which kernel version/change introduced the problem, ideally using git bisect?
Thank you for your response, Michael! According to my IRC log on #intel-gfx, the problem started with the upgrade to 4.2.0-1-amd64; according to an old copy of the journal, the kernel before that was:

Oct 25 11:22:54 hopa kernel: Linux version 3.17-1-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.3 (Debian 4.8.3-13) ) #1 SMP Debian 3.17-1~exp1 (2014-10-14)

Bisection will have to be the measure of last resort, I guess: this laptop is my main workhorse and the halt is not 100% reproducible.

Patch: applied, rebuilding now. I will report as soon as it halts again (I will do some forceful playing with the external displays tomorrow) or if I can't trigger the halt. Thanks!
Created attachment 119987: journalctl output (slightly anonymized) showing details of the session with the crash
Reporting on "success": after installing the new patched kernel and some upgrades (GNOME, not the kernel, kept crashing, so I had to upgrade), I caused the stall, with a slightly different but overall similar traceback (the full journalctl output for that boot is attached):

Nov 20 10:04:38 hopa kernel: [drm:intel_hdmi_detect] [CONNECTOR:53:HDMI-A-2]
Nov 20 10:04:38 hopa kernel: ffff88043c187858 0000000000000001 ffffffff81555c51 ffff88043c678080
Nov 20 10:04:38 hopa kernel: Call Trace:
Nov 20 10:04:38 hopa kernel: [<ffffffff8108678d>] ? wq_worker_sleeping+0xd/0x90
Nov 20 10:04:38 hopa kernel: [<ffffffff81555835>] ? __schedule+0x505/0x8f0
Nov 20 10:04:38 hopa kernel: [<ffffffff81555c51>] ? schedule+0x31/0x80
Nov 20 10:04:38 hopa kernel: [<ffffffff8107209c>] ? do_exit+0x72c/0xa90
Nov 20 10:04:38 hopa kernel: [<ffffffff810175ec>] ? oops_end+0x9c/0xd0
Nov 20 10:04:38 hopa kernel: [drm:intel_hdmi_detect] Live status not up!
Nov 20 10:04:38 hopa kernel: [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:53:HDMI-A-2] disconnected
Nov 20 10:04:38 hopa kernel: [<ffffffff8155b5d8>] ? general_protection+0x28/0x30
Nov 20 10:04:38 hopa kernel: [<ffffffff8140689f>] ? reservation_object_test_signaled_rcu+0xcf/0x220
Nov 20 10:04:38 hopa kernel: [<ffffffff81406ef9>] ? reservation_object_wait_timeout_rcu+0x219/0x260
Nov 20 10:04:38 hopa kernel: [<ffffffffa0832b29>] ? ttm_bo_wait+0x29/0x50 [ttm]
Nov 20 10:04:38 hopa kernel: [<ffffffffa0833207>] ? ttm_bo_cleanup_refs_and_unlock+0x27/0x170 [ttm]
Nov 20 10:04:38 hopa kernel: [<ffffffffa083340f>] ? ttm_bo_delayed_delete+0xbf/0x200 [ttm]
Nov 20 10:04:38 hopa kernel: [<ffffffffa0833567>] ? ttm_bo_delayed_workqueue+0x17/0x40 [ttm]
Nov 20 10:04:38 hopa kernel: [<ffffffff810856ff>] ? process_one_work+0x19f/0x3d0
Nov 20 10:04:38 hopa kernel: [<ffffffff8108597d>] ? worker_thread+0x4d/0x450
Nov 20 10:04:38 hopa kernel: [<ffffffff81085930>] ? process_one_work+0x3d0/0x3d0
Nov 20 10:04:38 hopa kernel: [<ffffffff8108b47d>] ? kthread+0xbd/0xe0
Nov 20 10:04:38 hopa kernel: [<ffffffff8108b3c0>] ? kthread_create_on_node+0x170/0x170
Nov 20 10:04:38 hopa kernel: [<ffffffff8155984f>] ? ret_from_fork+0x3f/0x70
Nov 20 10:04:38 hopa kernel: [<ffffffff8108b3c0>] ? kthread_create_on_node+0x170/0x170
Nov 20 10:04:38 hopa kernel: Code: 48 c7 c7 b2 07 80 81 e8 83 39 fe ff e9 bf fe ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 40 04 00 00 <48> 8b 40 d8 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f
Nov 20 10:04:38 hopa kernel: RIP [<ffffffff8108ba2c>] kthread_data+0xc/0x20
Nov 20 10:04:38 hopa kernel: RSP <ffff88043c187b98>
Nov 20 10:04:38 hopa kernel: CR2: ffffffffffffffd8
Nov 20 10:04:38 hopa kernel: ---[ end trace 01c0854cd2e7cf2f ]---
Nov 20 10:04:38 hopa kernel: Fixing recursive fault but reboot is needed!

To trigger the stall, I had both displays connected, with the second one just mirroring the first, and then turned off the second display, which caused the whole mess. What could be the next step? ;-)

BTW: since this recent upgrade, the two attached monitors are also seen as an extended desktop (3840x1200), which never happened before and actually works quite nicely. But that setup also crashed with the same trick of turning the second display off (no traceback was recorded, and I didn't have a remote session attached).
Created attachment 119998: cut/paste terminal output for the 2nd crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000042

The beast crashed again... I don't remember whether I had done anything unusual before; the screen just went off due to inactivity (maybe it was also locked, I was away from the laptop), and when I came back it was stalled. I had an ssh session open from another box watching journalctl -f (there was nothing in the logs on the drive after reboot). The last messages were:

Nov 20 12:20:04 hopa kernel: [drm:drm_crtc_helper_set_config] attempting to set mode from userspace
Nov 20 12:20:04 hopa kernel: [drm:drm_mode_debug_printmodeline] Modeline 57:"" 0 296400 3840 3888 3920 4000 1200 1203 1209 1235 0x0 0x5
Nov 20 12:20:04 hopa kernel: [drm:radeon_encoder_set_active_device] setting active device to 00000008 from 00000008 00000008 for encoder 2
Nov 20 12:20:04 hopa kernel: [drm:drm_crtc_helper_set_mode] [CRTC:29]
Nov 20 12:20:04 hopa kernel: [drm:radeon_atom_encoder_dpms] encoder dpms 30 to mode 3, devices 00000080, active_devices 00000000
Nov 20 12:20:04 hopa kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000042
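For reference when reading these logs, a DRM modeline can be decoded with the standard timing arithmetic: horizontal rate = pixel clock / htotal, and vertical refresh = pixel clock / (htotal × vtotal). Below is a minimal Python sketch, assuming the field order printed by drm_mode_debug_printmodeline() in this kernel (id, name, vrefresh, clock in kHz, the four horizontal timings, the four vertical timings, type, flags) and the DRM_MODE_FLAG_* sync-polarity bits from drm_mode.h. The names parse_modeline() and Mode are hypothetical helpers, not part of any kernel or X tool.

```python
import re
from dataclasses import dataclass

# Matches the kernel's drm_mode_debug_printmodeline() output, e.g.
#   Modeline 57:"" 0 296400 3840 3888 3920 4000 1200 1203 1209 1235 0x0 0x5
# Assumed field order: id, name, vrefresh, clock (kHz), hdisplay, hsync_start,
# hsync_end, htotal, vdisplay, vsync_start, vsync_end, vtotal, type, flags.
MODELINE_RE = re.compile(
    r'Modeline (\d+):"([^"]*)" (\d+) (\d+)'
    r' (\d+) (\d+) (\d+) (\d+)'
    r' (\d+) (\d+) (\d+) (\d+)'
    r' (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)'
)

# Sync-polarity bits of DRM_MODE_FLAG_* (include/uapi/drm/drm_mode.h).
SYNC_FLAGS = {0x1: "+hsync", 0x2: "-hsync", 0x4: "+vsync", 0x8: "-vsync"}


@dataclass
class Mode:  # hypothetical container holding only the fields used below
    clock_khz: int
    hdisplay: int
    htotal: int
    vdisplay: int
    vtotal: int
    flags: int

    def hsync_khz(self) -> float:
        # Line rate: pixel clock divided by total pixels per line.
        return self.clock_khz / self.htotal

    def vrefresh_hz(self) -> float:
        # Frame rate: pixel clock divided by total pixels per frame.
        return self.clock_khz * 1000 / (self.htotal * self.vtotal)

    def polarity(self) -> list:
        # Names of the sync-polarity bits set in the flags field.
        return [name for bit, name in SYNC_FLAGS.items() if self.flags & bit]


def parse_modeline(line: str) -> Mode:
    """Extract the timing fields from one drm_mode_debug_printmodeline line."""
    m = MODELINE_RE.search(line)
    if m is None:
        raise ValueError(f"no DRM modeline found in: {line!r}")
    g = m.groups()
    return Mode(
        clock_khz=int(g[3]),
        hdisplay=int(g[4]),
        htotal=int(g[7]),
        vdisplay=int(g[8]),
        vtotal=int(g[11]),
        flags=int(g[13], 16),
    )


if __name__ == "__main__":
    logged = ('[drm:drm_mode_debug_printmodeline] Modeline 57:"" 0 296400 '
              '3840 3888 3920 4000 1200 1203 1209 1235 0x0 0x5')
    mode = parse_modeline(logged)
    # prints: 3840x1200 74.1 kHz 60.00 Hz +hsync +vsync
    print(f"{mode.hdisplay}x{mode.vdisplay} {mode.hsync_khz():.1f} kHz "
          f"{mode.vrefresh_hz():.2f} Hz {' '.join(mode.polarity())}")
```

For the 3840x1200 mode set on CRTC 29 just before this oops, the logged vrefresh field is 0, but the timings work out to about 74.1 kHz and 60 Hz. The same formulas reproduce the "(74.0 kHz eP)" annotation on the X server's 1920x1200 modeline from the first crash (154.00 MHz / 2080 ≈ 74.0 kHz, about 59.95 Hz).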
Aha, I think I found what triggered it, since I did it again and it stalled, probably identically (I didn't have a remote console this time :-/). I ran

DISPLAY=:0 0install run -c http://gfxmonk.net/dist/0install/shellshape.xml --replace

to try shellshape, and as soon as it finished downloading, it did something that triggered the bug, and the screens went blank. It is probably a different, although possibly related, issue, since during the original stalls something still remains on the screens; in this case they just go into suspend mode etc. Do you think I should file a separate report for this one?
I see you have the same laptop as me (ZBook 14). The DisplayPort outputs on the docking station are connected only to the AMD GPU. I have a DRI_PRIME=1 issue with newer kernels (probably starting with 3.19); maybe it is related: https://bugzilla.opensuse.org/show_bug.cgi?id=954783
For me, with DRI_PRIME=1 it sometimes does not render at all... At first I thought it happened only with the external display, but nope: it also happens unpredictably on the laptop screen itself. But no crashes from that so far in my trials.
https://wiki.archlinux.org/index.php/PRIME
DRI_PRIME=1 needs xrandr and compositing. With 4.1 (3.19 and up) it crashes after a few minutes with multiple glmatrix instances running simultaneously, or in a game... With 4.3, glmatrix works fine for a couple of hours, but it crashes randomly during gameplay, same with 4.2. Maybe this is a completely different bug affecting the 4.2/4.3 kernels.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/663.