Bug 93452 - kernel crash DRI_PRIME=1 HP zbook 14
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
 
Reported: 2015-12-19 21:41 UTC by polo
Modified: 2019-11-19 09:10 UTC

Description polo 2015-12-19 21:41:57 UTC
I have an issue with newer kernels, from 3.19 up to 4.1, on openSUSE and Fedora; on Arch the same setup is stable for some reason. The kernel crashes a few minutes (at most 7) after I start using DRI_PRIME=1, and no kdump is created. I tried openSUSE Leap with the old 3.16.7 kernel and it works fine. I also tried an AMD-only configuration (AMD as the primary GPU) and that works fine too.

With 4.2.5/4.3.0, DRI_PRIME=1 behaves differently: it crashes randomly (twice during a day of testing).

More information:
https://bugzilla.opensuse.org/show_bug.cgi?id=954783


The logs below were captured with drm.debug=0x0e and dmesg -n 8.

openSUSE, kernel 4.1.13 (crashes):

[  147.171197] [drm:i915_gem_open] 
[  156.538080] [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[  156.538093] [drm] PCIE gen 2 link speeds already enabled
[  156.542218] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  156.542345] radeon 0000:03:00.0: WB enabled
[  156.542349] radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880036d62c00
[  156.542351] radeon 0000:03:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff880036d62c04
[  156.542354] radeon 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff880036d62c08
[  156.542356] radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880036d62c0c
[  156.542358] radeon 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffff880036d62c10
[  156.543910] radeon 0000:03:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90001435a18
[  156.544585] [drm:radeon_crtc_handle_flip] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[  156.544592] [drm:radeon_crtc_handle_flip] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[  156.781101] [drm] ring test on 0 succeeded in 1 usecs
[  156.781114] [drm] ring test on 1 succeeded in 1 usecs
[  156.781121] [drm] ring test on 2 succeeded in 1 usecs
[  156.781131] [drm] ring test on 3 succeeded in 5 usecs
[  156.781140] [drm] ring test on 4 succeeded in 4 usecs
[  156.956763] [drm] ring test on 5 succeeded in 2 usecs
[  156.956772] [drm] UVD initialized successfully.
[  156.956811] [drm] ib test on ring 0 succeeded in 0 usecs
[  156.956858] [drm] ib test on ring 1 succeeded in 0 usecs
[  156.956901] [drm] ib test on ring 2 succeeded in 0 usecs
[  156.956932] [drm] ib test on ring 3 succeeded in 0 usecs
[  156.956966] [drm] ib test on ring 4 succeeded in 0 usecs
[  157.605812] [drm] ib test on ring 5 succeeded
[  157.614315] snd_hda_intel 0000:03:00.1: Enabling via VGA-switcheroo
[  157.719256] snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
[  157.765449] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  163.932378] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  167.286021] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  171.125632] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  174.384673] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  177.111597] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  180.648221] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  183.863776] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  187.519144] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  191.038127] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!

-- and then the kernel crashes


openSUSE Leap, kernel 3.16.7 (stable):

[   75.808416] [drm:intel_crtc_cursor_set] cursor off
[   76.279634] [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[   76.279644] [drm] PCIE gen 2 link speeds already enabled
[   76.283764] [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
[   76.283904] radeon 0000:03:00.0: WB enabled
[   76.283908] radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880238c1dc00
[   76.283910] radeon 0000:03:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff880238c1dc04
[   76.283912] radeon 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff880238c1dc08
[   76.283915] radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880238c1dc0c
[   76.283917] radeon 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffff880238c1dc10
[   76.285473] radeon 0000:03:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90011135a18
[   76.286084] [drm:radeon_crtc_handle_flip] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[   76.286089] [drm:radeon_crtc_handle_flip] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[   76.481930] [drm] ring test on 0 succeeded in 1 usecs
[   76.481942] [drm] ring test on 1 succeeded in 1 usecs
[   76.481948] [drm] ring test on 2 succeeded in 1 usecs
[   76.481956] [drm] ring test on 3 succeeded in 4 usecs
[   76.481964] [drm] ring test on 4 succeeded in 4 usecs
[   76.667487] [drm] ring test on 5 succeeded in 2 usecs
[   76.667497] [drm] UVD initialized successfully.
[   76.667531] [drm] ib test on ring 0 succeeded in 0 usecs
[   76.667561] [drm] ib test on ring 1 succeeded in 0 usecs
[   76.667589] [drm] ib test on ring 2 succeeded in 0 usecs
[   76.667617] [drm] ib test on ring 3 succeeded in 0 usecs
[   76.667644] [drm] ib test on ring 4 succeeded in 0 usecs
[   76.818256] [drm] ib test on ring 5 succeeded
[   76.827403] snd_hda_intel 0000:03:00.1: Enabling via VGA-switcheroo
[   76.827546] snd_hda_intel 0000:03:00.1: irq 69 for MSI/MSI-X
[   76.861702] [drm:radeon_info_ioctl] Invalid request 37
[   76.861713] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[   79.192677] [drm:radeon_info_ioctl] Invalid request 37
[   79.192690] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[   83.737880] [drm:radeon_info_ioctl] Invalid request 37
[   83.737892] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[   89.062058] [drm:radeon_info_ioctl] Invalid request 37
[   89.062070] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[   93.254184] [drm:radeon_info_ioctl] Invalid request 37
[   93.254197] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[   96.922265] [drm:radeon_info_ioctl] Invalid request 37
[   96.922280] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  101.272998] [drm:radeon_info_ioctl] Invalid request 37
[  101.273015] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  104.574944] [drm:radeon_info_ioctl] Invalid request 37
[  104.574969] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  108.408574] [drm:radeon_info_ioctl] Invalid request 37
[  108.408597] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  111.561355] [drm:radeon_info_ioctl] Invalid request 37
[  111.561369] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  117.931164] [drm:intel_crtc_cursor_set] cursor off
[  123.179592] [drm:intel_crtc_cursor_set] cursor off
[  130.923942] [drm:intel_crtc_cursor_set] cursor off


And 4.1.15 on Arch (stable):

[  451.911996] [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[  451.912004] [drm] PCIE gen 2 link speeds already enabled
[  451.916190] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[  451.916363] radeon 0000:03:00.0: WB enabled
[  451.916366] radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff88023eb98c00
[  451.916368] radeon 0000:03:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff88023eb98c04
[  451.916369] radeon 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff88023eb98c08
[  451.916371] radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff88023eb98c0c
[  451.916373] radeon 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffff88023eb98c10
[  451.917925] radeon 0000:03:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90004435a18
[  451.918490] [drm:radeon_crtc_handle_flip] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[  451.918494] [drm:radeon_crtc_handle_flip] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[  452.156274] [drm] ring test on 0 succeeded in 1 usecs
[  452.156282] [drm] ring test on 1 succeeded in 1 usecs
[  452.156287] [drm] ring test on 2 succeeded in 1 usecs
[  452.156297] [drm] ring test on 3 succeeded in 5 usecs
[  452.156305] [drm] ring test on 4 succeeded in 5 usecs
[  452.333329] [drm] ring test on 5 succeeded in 2 usecs
[  452.333338] [drm] UVD initialized successfully.
[  452.333379] [drm] ib test on ring 0 succeeded in 0 usecs
[  452.333412] [drm] ib test on ring 1 succeeded in 0 usecs
[  452.333443] [drm] ib test on ring 2 succeeded in 0 usecs
[  452.333474] [drm] ib test on ring 3 succeeded in 0 usecs
[  452.333504] [drm] ib test on ring 4 succeeded in 0 usecs
[  452.982823] [drm] ib test on ring 5 succeeded
[  452.989772] snd_hda_intel 0000:03:00.1: Enabling via VGA-switcheroo
[  453.094277] snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
[  453.120392] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
[  453.192020] [drm:drm_mode_addfb2] [FB:75]
[  466.009035] snd_hda_intel 0000:03:00.1: Disabling via VGA-switcheroo
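To compare the logs from the stable and crashing kernels without reading them line by line, something like the following might help. It is only a sketch: it strips the dmesg timestamps, masks hex addresses (they change between boots), and lists the messages that appear in one log but not the other. The function names and the `0xADDR` placeholder are made up for this example, and the two sample strings are short excerpts from the logs above, not the full logs.

```python
import re

# Leading dmesg timestamp such as "[  147.171197] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Hex values such as GPU/CPU addresses, which differ between boots.
HEX = re.compile(r"0x[0-9a-fA-F]+")


def normalize(line):
    """Drop the timestamp and mask hex addresses so lines compare equal."""
    line = TIMESTAMP.sub("", line.strip())
    return HEX.sub("0xADDR", line)


def unique_messages(log_text):
    """Return the set of distinct normalized messages in a log."""
    return {normalize(l) for l in log_text.splitlines() if l.strip()}


def only_in_first(first, second):
    """Messages that occur in `first` but never in `second`, sorted."""
    return sorted(unique_messages(first) - unique_messages(second))


# Short excerpts from the 3.16.7 (stable) and 4.1.13 (crashing) logs.
stable_3_16 = """\
[   76.827403] snd_hda_intel 0000:03:00.1: Enabling via VGA-switcheroo
[   76.827546] snd_hda_intel 0000:03:00.1: irq 69 for MSI/MSI-X
[   76.861702] [drm:radeon_info_ioctl] Invalid request 37
[   76.861713] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
"""
crashing_4_1 = """\
[  157.614315] snd_hda_intel 0000:03:00.1: Enabling via VGA-switcheroo
[  157.719256] snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
[  157.765449] [drm:radeon_info_ioctl] macrotile mode array is cik+ only!
"""

print("only in 3.16.7:", only_in_first(stable_3_16, crashing_4_1))
print("only in 4.1.13:", only_in_first(crashing_4_1, stable_3_16))
```

A difference found this way only shows where the logs diverge; it does not by itself say which message, if any, is related to the crash.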
Comment 1 polo 2016-02-24 15:31:44 UTC
The same bug is present on Ubuntu 15.10.

Debian jessie with the backports kernel and backports Mesa is stable.


But something must be different: Debian and Arch have slightly worse performance than openSUSE/Ubuntu (3/4 FPS), while with an AMD-only configuration or the Intel IGP the performance is the same.
Comment 2 polo 2016-06-08 19:58:26 UTC
It looks like this bug was backported to 3.18.
I tested 3.18 back in November 2015 and it worked without any problem, but now it crashes randomly with DRI_PRIME=1. That is the same behavior as 4.1-4.5, which crash randomly after hours to days of testing. 3.19-4.1 always crash after a few minutes (max. 10), even the latest 4.1.25.
Comment 3 polo 2016-06-08 20:30:42 UTC
Or I have a different bug and that was just a coincidence.
I found this in the journal:

Jun 08 19:47:15 polo kernel: [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
Jun 08 19:47:15 polo kernel: [drm] PCIE gen 2 link speeds already enabled
Jun 08 19:47:15 polo kernel: [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
Jun 08 19:47:15 polo kernel: radeon 0000:03:00.0: WB enabled
Jun 08 19:47:15 polo kernel: radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0
Jun 08 19:47:15 polo kernel: radeon 0000:03:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0
Jun 08 19:47:15 polo kernel: radeon 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0
Jun 08 19:47:15 polo kernel: radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0
Jun 08 19:47:15 polo kernel: radeon 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0
Jun 08 19:47:15 polo kernel: radeon 0000:03:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0
Jun 08 19:47:15 polo kernel: [drm] ring test on 0 succeeded in 1 usecs
Jun 08 19:47:15 polo kernel: [drm] ring test on 1 succeeded in 1 usecs
Jun 08 19:47:15 polo kernel: [drm] ring test on 2 succeeded in 1 usecs
Jun 08 19:47:15 polo kernel: [drm] ring test on 3 succeeded in 4 usecs
Jun 08 19:47:15 polo kernel: [drm] ring test on 4 succeeded in 4 usecs
Jun 08 19:47:15 polo kernel: [drm] ring test on 5 succeeded in 2 usecs
Jun 08 19:47:15 polo kernel: [drm] UVD initialized successfully.
Jun 08 19:47:15 polo kernel: [drm] ib test on ring 0 succeeded in 0 usecs
Jun 08 19:47:15 polo kernel: [drm] ib test on ring 1 succeeded in 0 usecs
Jun 08 19:47:15 polo kernel: [drm] ib test on ring 2 succeeded in 0 usecs
Jun 08 19:47:15 polo kernel: [drm] ib test on ring 3 succeeded in 0 usecs
Jun 08 19:47:15 polo kernel: [drm] ib test on ring 4 succeeded in 0 usecs
Jun 08 19:47:16 polo kernel: [drm] ib test on ring 5 succeeded
Jun 08 19:47:16 polo kernel: snd_hda_intel 0000:03:00.1: Enabling via VGA-switcheroo
Jun 08 19:47:16 polo kernel: snd_hda_intel 0000:03:00.1: irq 48 for MSI/MSI-X
Jun 08 19:47:26 polo dbus-daemon[903]: Activating service name='org.freedesktop.Notifications'
Jun 08 19:47:26 polo dbus-daemon[903]: Successfully activated service 'org.freedesktop.Notifications'
Jun 08 19:47:45 polo smbd[446]: [2016/06/08 19:47:45.168825,  0] ../source3/printing/print_standard.c:69(std_pcap_cache
Jun 08 19:47:45 polo smbd[446]:   Unable to open printcap file /etc/printcap for read!
Jun 08 19:47:57 polo kernel: general protection fault: 0000 [#1] PREEMPT SMP 
Jun 08 19:47:57 polo kernel: Modules linked in: btrfs xor raid6_pq ufs hfsplus hfs minix ntfs vfat msdos fat jfs xfs li
Jun 08 19:47:57 polo kernel:  pps_core i915 snd_hda_codec_hdmi mei_me mei snd_hda_intel drm_kms_helper snd_hda_controll
Jun 08 19:47:57 polo kernel: CPU: 3 PID: 52 Comm: kworker/3:1 Not tainted 3.18.34-1-lts318 #1
Jun 08 19:47:57 polo kernel: Hardware name: Hewlett-Packard HP ZBook 14/198F, BIOS L71 Ver. 01.36 04/25/2016
Jun 08 19:47:57 polo kernel: Workqueue: events ttm_bo_delayed_workqueue [ttm]
Jun 08 19:47:57 polo kernel: task: ffff88024332da90 ti: ffff8802423b0000 task.ti: ffff8802423b0000
Jun 08 19:47:57 polo kernel: RIP: 0010:[<ffffffffa055b0e8>]  [<ffffffffa055b0e8>] ttm_bo_wait+0x38/0x1a0 [ttm]
Jun 08 19:47:57 polo kernel: RSP: 0018:ffff8802423b3cf8  EFLAGS: 00010206
Jun 08 19:47:57 polo kernel: RAX: ffff88009697e6f0 RBX: 5557565c00002000 RCX: 0000000000000001
Jun 08 19:47:57 polo kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880224450858
Jun 08 19:47:57 polo kernel: RBP: ffff8802423b3d48 R08: ffff88024ead2700 R09: 0000000000000001
Jun 08 19:47:57 polo kernel: R10: 00000000f7757bf0 R11: 0000000000000013 R12: 0000000000000000
Jun 08 19:47:57 polo kernel: R13: 0000000000000001 R14: 0000000000000001 R15: 5555555500000001
Jun 08 19:47:57 polo kernel: FS:  0000000000000000(0000) GS:ffff88024eac0000(0000) knlGS:0000000000000000
Jun 08 19:47:57 polo kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 08 19:47:57 polo kernel: CR2: 00002e26fee94000 CR3: 0000000001811000 CR4: 00000000001407e0
Jun 08 19:47:57 polo kernel: Stack:
Jun 08 19:47:57 polo kernel:  0000000000000000 ffff880224450858 ffff88009697e6f0 ffff880243200f58
Jun 08 19:47:57 polo kernel:  ffff8802423b3e38 ffff880224450858 ffff880241df59c0 0000000000000000
Jun 08 19:47:57 polo kernel:  0000000000000001 ffff880224450858 ffff8802423b3d88 ffffffffa055b8ee
Jun 08 19:47:57 polo kernel: Call Trace:
Jun 08 19:47:57 polo kernel:  [<ffffffffa055b8ee>] ttm_bo_cleanup_refs_and_unlock+0x2e/0x200 [ttm]
Jun 08 19:47:57 polo kernel:  [<ffffffffa055bb4d>] ttm_bo_delayed_delete+0x8d/0x240 [ttm]
Jun 08 19:47:57 polo kernel:  [<ffffffffa055bd1f>] ttm_bo_delayed_workqueue+0x1f/0x50 [ttm]
Jun 08 19:47:57 polo kernel:  [<ffffffff8108c88d>] process_one_work+0x14d/0x4b0
Jun 08 19:47:57 polo kernel:  [<ffffffff8108cf68>] worker_thread+0x48/0x4b0
Jun 08 19:47:57 polo kernel:  [<ffffffff8108cf20>] ? init_pwq.part.7+0x10/0x10
Jun 08 19:47:57 polo kernel:  [<ffffffff810921a8>] kthread+0xd8/0xf0
Jun 08 19:47:57 polo kernel:  [<ffffffff810920d0>] ? kthread_worker_fn+0x170/0x170
Jun 08 19:47:57 polo kernel:  [<ffffffff81560f18>] ret_from_fork+0x58/0x90
Jun 08 19:47:57 polo kernel:  [<ffffffff810920d0>] ? kthread_worker_fn+0x170/0x170
Jun 08 19:47:57 polo kernel: Code: 41 55 41 54 41 89 d4 53 41 89 cd 48 83 ec 28 48 8b 87 80 01 00 00 48 89 7d b8 48 8b 
Jun 08 19:47:57 polo kernel: RIP  [<ffffffffa055b0e8>] ttm_bo_wait+0x38/0x1a0 [ttm]
Jun 08 19:47:57 polo kernel:  RSP <ffff8802423b3cf8>
Jun 08 19:47:57 polo kernel: ---[ end trace 075d4025b07533d5 ]---
Jun 08 19:47:57 polo kernel: note: kworker/3:1[52] exited with preempt_count 1
Jun 08 19:47:57 polo kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8
Jun 08 19:47:57 polo kernel: IP: [<ffffffff810926e0>] kthread_data+0x10/0x20
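To pull the faulting function and the call chain out of a journal excerpt like the one above, a small parser along these lines could be used. This is a rough sketch that assumes the frame format shown in this log, `[<address>] symbol+0xoffset/0xsize [module]`; the regular expression and the `parse_oops` helper are made up for this example and are not part of any kernel tooling. Frames marked with `?` are skipped, since the kernel prints that marker for stack entries it considers unreliable.

```python
import re

# One stack frame, e.g. "[<ffffffffa055b0e8>] ttm_bo_wait+0x38/0x1a0 [ttm]".
# The optional "? " marks an entry the kernel considers unreliable.
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def parse_oops(text):
    """Return ((rip_symbol, module), [(symbol, module), ...]) for an oops."""
    rip = None
    frames = []
    in_trace = False
    for line in text.splitlines():
        m = FRAME.search(line)
        if "RIP" in line and m and rip is None:
            rip = (m["symbol"], m["module"])
        elif "Call Trace:" in line:
            in_trace = True
        elif in_trace and m:
            if not m["unreliable"]:
                frames.append((m["symbol"], m["module"]))
        elif in_trace:
            # First non-frame line after the trace ends it.
            in_trace = False
    return rip, frames


# Excerpt from the journal output above.
sample = """\
Jun 08 19:47:57 polo kernel: RIP: 0010:[<ffffffffa055b0e8>]  [<ffffffffa055b0e8>] ttm_bo_wait+0x38/0x1a0 [ttm]
Jun 08 19:47:57 polo kernel: Call Trace:
Jun 08 19:47:57 polo kernel:  [<ffffffffa055b8ee>] ttm_bo_cleanup_refs_and_unlock+0x2e/0x200 [ttm]
Jun 08 19:47:57 polo kernel:  [<ffffffffa055bb4d>] ttm_bo_delayed_delete+0x8d/0x240 [ttm]
Jun 08 19:47:57 polo kernel:  [<ffffffffa055bd1f>] ttm_bo_delayed_workqueue+0x1f/0x50 [ttm]
Jun 08 19:47:57 polo kernel:  [<ffffffff8108c88d>] process_one_work+0x14d/0x4b0
Jun 08 19:47:57 polo kernel:  [<ffffffff8108cf20>] ? init_pwq.part.7+0x10/0x10
"""

rip, frames = parse_oops(sample)
print("faulting function:", rip)
for symbol, module in frames:
    print("  called from:", symbol, f"[{module}]" if module else "")
```

On this excerpt the faulting function and the first three callers are all in the ttm module, which matches the `Workqueue: events ttm_bo_delayed_workqueue [ttm]` line in the trace.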
Comment 4 Martin Peres 2019-11-19 09:10:48 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/674.

