Bug 106859 - [snb] Graphic corruptions, GPU hang and unable to handle kernel paging request
Summary: [snb] Graphic corruptions, GPU hang and unable to handle kernel paging request
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 18.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-08 11:36 UTC by Laurent Bonnaud
Modified: 2019-09-09 17:38 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
Full dmesg (79.57 KB, text/plain)
2018-06-08 11:38 UTC, Laurent Bonnaud
Content from /sys/class/drm/card0/error (39.76 KB, text/plain)
2018-06-08 11:40 UTC, Laurent Bonnaud
Early stages of the corruptions (645.18 KB, image/png)
2018-06-08 11:42 UTC, Laurent Bonnaud
Corruptions amplify (641.10 KB, image/png)
2018-06-08 11:43 UTC, Laurent Bonnaud
More corruption and the clock is now unreadable (426.23 KB, image/png)
2018-06-08 11:44 UTC, Laurent Bonnaud
Full list of "Important Modified Preferences" from about:support (6.57 KB, text/plain)
2018-06-08 18:31 UTC, Laurent Bonnaud
kwin backtrace when run with intel_sanitize_gpu (170.48 KB, image/png)
2018-06-14 16:20 UTC, Laurent Bonnaud

Description Laurent Bonnaud 2018-06-08 11:36:56 UTC
Hi,

I have been using pretty much all mainline Linux kernel versions for several years and most of them were OK with respect to my SandyBridge CPU/GPU, up to and including 4.15.x kernels.

However, with the 4.16.x and 4.17 kernels, I see graphics corruption. In addition:
 - with 4.16.x kernels, my machine ended up freezing completely
 - with the 4.17 kernel, I saw a GPU hang, which I am reporting now

Here is more info:

 - To reproduce this bug: use the Plasma desktop for a few hours or days. It starts with small graphics corruptions that amplify over time (see screenshots below) and ends with either a freeze or a GPU hang.

 - uname -a
Linux vougeot 4.17.0-041700-generic #201806032231 SMP Sun Jun 3 22:33:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

This is the mainline kernel provided by Ubuntu: http://kernel.ubuntu.com/~kernel-ppa/mainline/

 - Linux distribution: Ubuntu 18.04 with Mesa 18.0.0~rc5-1ubuntu1 and later Mesa 18.0.5-0ubuntu0~18.04.1 from Ubuntu proposed-updates

 - Machine or mother board model:
from dmidecode:
System Information
        Manufacturer: Dell Inc.
        Product Name: Latitude E6520
        Version: 01
BIOS Information
        Vendor: Dell Inc.
        Version: A19
        Release Date: 11/14/2013
Base Board Information
        Manufacturer: Dell Inc.
        Product Name: 0NVF5K
        Version: A01

from /proc/cpuinfo:
model name      : Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz

 - Display connector: internal LVDS panel

 - dmesg and other attachments follow below...
Comment 1 Laurent Bonnaud 2018-06-08 11:38:10 UTC
Here is the dmesg excerpt with the error:

[109452.840920] [drm] GPU HANG: ecode 6:0:0x00ffffff, in Web Content [5443], reason: Hang on rcs0, action: reset
[109452.840922] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[109452.840922] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[109452.840922] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[109452.840923] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[109452.840923] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[109452.840961] i915 0000:00:02.0: Resetting chip after gpu hang
Comment 2 Laurent Bonnaud 2018-06-08 11:38:51 UTC
Created attachment 140081 [details]
Full dmesg
Comment 3 Laurent Bonnaud 2018-06-08 11:40:58 UTC
Created attachment 140082 [details]
Content from /sys/class/drm/card0/error
Comment 4 Laurent Bonnaud 2018-06-08 11:42:30 UTC
Created attachment 140083 [details]
Early stages of the corruptions
Comment 5 Laurent Bonnaud 2018-06-08 11:43:25 UTC
Created attachment 140084 [details]
Corruptions amplify
Comment 6 Laurent Bonnaud 2018-06-08 11:44:14 UTC
Created attachment 140085 [details]
More corruption and the clock is now unreadable
Comment 7 Chris Wilson 2018-06-08 11:48:39 UTC
Hmm. I don't think the kernel version plays a significant role here, other than changing the layout of the GTT. What appears to be going on is that the GPU is overwriting random memory, which is indicative of a userspace bug.
Comment 8 Laurent Bonnaud 2018-06-08 11:50:53 UTC
Note that I use the exact same kernel and a similar Ubuntu installation on an Intel NUC7 with a KabyLake CPU/GPU and HDMI output. This system has no GPU problems.
Comment 9 Laurent Bonnaud 2018-06-08 11:55:48 UTC
Thank you for the analysis!  I updated the Mesa version in the bug's metadata.
Comment 10 Lionel Landwerlin 2018-06-08 17:29:36 UTC
It seems the kernel flags "Web Content" (which I'm guessing is Firefox) as the culprit of the GPU hang.

We have a tool in Mesa called intel_sanitize_gpu to detect out-of-bounds memory writes from the driver. I'm not sure how easily you can use it with a multiprocess application like Firefox. If you can give it a try, it should be as simple as launching:


$ intel_sanitize_gpu firefox

What version of firefox are you using?
Any particular setting that isn't the default?

Thanks!
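The general idea behind an out-of-bounds write checker of this kind can be illustrated with guard padding: each buffer gets extra bytes after its usable size, filled with a known pseudo-random pattern, and the pattern is re-checked later; any change means something wrote past the end. The Python sketch below is a toy model of that technique under stated assumptions (the class name, PAD_SIZE and the seeding scheme are hypothetical); it is not intel_sanitize_gpu's actual code.

```python
import random

PAD_SIZE = 4096  # hypothetical guard size in bytes


class PaddedBuffer:
    """Toy buffer with a guard region appended after its usable bytes."""

    def __init__(self, size: int, seed: int = 0) -> None:
        self.size = size
        rng = random.Random(seed)
        # Deterministic pseudo-random pattern, remembered for later checks.
        self._pattern = bytes(rng.randrange(256) for _ in range(PAD_SIZE))
        self._mem = bytearray(size) + self._pattern

    def write(self, offset: int, data: bytes) -> None:
        # Simulates a GPU write: nothing checks it against self.size.
        self._mem[offset:offset + len(data)] = data

    def padding_intact(self) -> bool:
        # Any change to the guard bytes means a write ran past the end.
        return bytes(self._mem[self.size:]) == self._pattern


buf = PaddedBuffer(64)
buf.write(0, b"\x01" * 64)      # stays within the 64 usable bytes
print(buf.padding_intact())     # True
buf.write(60, b"\xff" * 20)     # runs 16 bytes past the end
print(buf.padding_intact())     # False
```

A check like this only reports that an overflow happened by the time the pattern is inspected, not which write caused it, which is why the culprit still has to be narrowed down by other means.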
Comment 11 Chris Wilson 2018-06-08 17:37:32 UTC
(In reply to Lionel Landwerlin from comment #10)
> It seems the kernel flags "Web Content" (which I'm guessing is Firefox) as
> the culprit of the GPU hang.

You have to be wary in this case. This is Sandybridge, and it doesn't have segregated per-process memory, so any client can overwrite the memory of another. In this case, the corruption happened to be in a batch submitted by firefox, but that doesn't guarantee that firefox was the one doing the stray writes. It is a good start, though: to overwrite a batch buffer, the batch must already have been submitted to the GPU and be waiting for execution, and since the execution pipelines are typically short, the number of clients who may have overwritten the batch is small, with the submitter the prime suspect.
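The reasoning above for narrowing down suspects can be expressed as a time-window filter: with no per-process isolation, any client whose GPU work ran while the corrupted batch was queued could have done the stray writes, in addition to the submitter itself. This is a toy sketch assuming hypothetical request records with submit/start/finish times; real GPU scheduling and CPU-side writes through mmap are not modeled, and the client names are only examples.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One GPU submission (hypothetical record; times in arbitrary units)."""
    client: str
    submitted: float  # batch handed to the kernel
    started: float    # GPU began executing it
    finished: float   # GPU finished executing it


def suspects(victim: Request, others: list[Request]) -> set[str]:
    """Clients that could have scribbled over the victim's queued batch."""
    found = {victim.client}  # the submitter is always the prime suspect
    for req in others:
        # The batch is exposed from submission until execution starts;
        # any GPU work overlapping that window could have hit it.
        if req.started < victim.started and req.finished > victim.submitted:
            found.add(req.client)
    return found


victim = Request("Web Content", submitted=10.0, started=12.0, finished=13.0)
others = [
    Request("kwin_x11", 9.0, 10.5, 11.0),    # ran while the batch was queued
    Request("plasmashell", 2.0, 3.0, 4.0),   # finished long before
]
print(sorted(suspects(victim, others)))  # ['Web Content', 'kwin_x11']
```

Short pipelines mean a short exposure window, which is what keeps the suspect set small.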
Comment 12 Laurent Bonnaud 2018-06-08 18:15:33 UTC
On 06/08/2018 07:29 PM, bugzilla-daemon@freedesktop.org wrote:

> It seems the kernel flags "Web Content" (which I'm guessing is Firefox) as the
> culprit of the GPU hang.

I confirm that I use Firefox.

> We have a tool in Mesa called intel_sanitize_gpu to verify out of bounds memory
> writes from the driver. 

Great, thanks a lot for the suggestion!

> I'm not sure how easily you can use it with a
> multiprocess application like Firefox. If you can give a try, it should be as
> simple as launching 
> 
> $ intel_sanitize_gpu firefox

I'll try this on Monday...

I will also try running Plasma without Firefox to check whether I still see graphics corruption. The corruption is my main problem because I do not know how to recover from it without rebooting, whereas the kernel is able to recover by itself from the GPU hang :>.

> What version of firefox are you using?

60.0.1 (Ubuntu build) or 60.0.2 (Mozilla build).  I also sometimes try beta and nightly versions from Mozilla, and snap and flatpak builds.

> Any particular setting that isn't the default?

In about:support I have:

  Compositing	OpenGL

instead of the default

  Compositing	Basic

In about:config that is:

  layers.acceleration.force-enabled : true
  layers.omtp.enabled : true

I've been running Firefox like this for years and never had problems before.

I have many other "modified" settings, too many to list here.

Thanks again,
Comment 13 Laurent Bonnaud 2018-06-08 18:31:23 UTC
Created attachment 140099 [details]
Full list of "Important Modified Preferences" from about:support
Comment 14 Laurent Bonnaud 2018-06-11 18:21:00 UTC
I tried running kernel 4.17 without Firefox. The situation did not improve:

 - I still saw graphic corruptions, both minor and major
 - I even experienced a complete machine freeze

To further reduce GPU use, I disabled compositing in Plasma. With that change, I have been able to use my system for a few hours without graphics corruption, even while running Firefox with OpenGL acceleration.

So it seems that plasmashell or kwin is what would need to run under intel_sanitize_gpu.
Comment 15 Laurent Bonnaud 2018-06-11 18:26:48 UTC
intel_sanitize_gpu is not provided in binary form by Debian or Ubuntu, so I tried to compile it from source:

 - I tried the Mesa Debian source package (18.1.0). It contains the intel_sanitize_gpu.c file, but the build does not compile it.

 - I tried the Mesa 18.1.1 tarball, but it does not contain the intel_sanitize_gpu.c file.

Moreover, intel_sanitize_gpu.c seems to be a wrapper for libc functions that needs to be compiled as a shared library (and preloaded?), not as an executable.

Any advice on compiling and using it would be welcome!
Comment 16 vadym 2018-06-14 09:55:40 UTC
I think you can use meson build with -Dtools=all option. See https://www.mesa3d.org/meson.html
Comment 17 Laurent Bonnaud 2018-06-14 15:32:13 UTC
> I think you can use meson build with -Dtools=all option.

Thanks for the hint!

I could build libintel_sanitize_gpu.so and there is a nice wrapper script that does the SO preload.
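A preload wrapper of this kind essentially prepends the sanitizer library to LD_PRELOAD and then replaces itself with the target program. Below is a minimal Python sketch of that pattern; the library path and the function names are hypothetical, and this is not the script that ships with Mesa.

```python
import os

# Hypothetical install path; the real one depends on the Meson prefix used.
SANITIZER_LIB = "/usr/local/lib/libintel_sanitize_gpu.so"


def preload_env(lib: str, env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with lib placed first in LD_PRELOAD."""
    new_env = dict(env)
    existing = new_env.get("LD_PRELOAD", "")
    # Keep any preloads the user already had, after ours.
    new_env["LD_PRELOAD"] = f"{lib}:{existing}" if existing else lib
    return new_env


def run_with_sanitizer(argv: list[str]) -> None:
    """Replace the current process with argv[0], sanitizer preloaded."""
    os.execvpe(argv[0], argv, preload_env(SANITIZER_LIB, dict(os.environ)))
```

Calling `run_with_sanitizer(["glxgears"])` would then behave like the `intel_sanitize_gpu glxgears` invocation, assuming the library really is at that path. Because the environment is normally inherited, child processes of a multiprocess program also get the library preloaded.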

I tested it on a simple OpenGL program:

$ intel_sanitize_gpu glxgears
INTEL-SANITIZE-GPU: error: missed drm fd 4
302 frames in 5.0 seconds = 60.387 FPS
[...]

I hope that this error is not too serious...
Comment 18 Laurent Bonnaud 2018-06-14 15:34:43 UTC
Since my last report, I upgraded my system:

 - kernel 4.17.1 (no DRM/DRI changes)
 - Mesa 18.1.1 from xorg PPA

When I use Plasma with compositing I still see graphic corruption.
Comment 19 Laurent Bonnaud 2018-06-14 15:54:53 UTC
I also tried to run firefox with intel_sanitize_gpu:

$ intel_sanitize_gpu /usr/local/firefox-64/firefox --no-remote
INTEL-SANITIZE-GPU: error: missed drm fd 3
INTEL-SANITIZE-GPU: error: missed drm fd 45
[...]
INTEL-SANITIZE-GPU: error: missed drm fd 36
[...]

It kind of works, but it loses my saved session (hundreds of tabs), so I will not be able to do real tests with intel_sanitize_gpu.

Other problems are:
 - the hamburger menu does not pop up
 - the default "new tab" page is not displayed properly, and when I tried to interact with it, I got a complete freeze. After rebooting, I found this in the logs:

Jun 14 17:39:36 vougeot kernel: BUG: unable to handle kernel paging request at ffffe80f034b81c0
Jun 14 17:39:36 vougeot kernel: PGD 22d7cb067 P4D 22d7cb067 PUD 22d7ca067 PMD 0
Jun 14 17:39:36 vougeot kernel: Oops: 0000 [#1] SMP PTI
Jun 14 17:39:36 vougeot kernel: Modules linked in: ses enclosure scsi_transport_sas ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid nls_iso8859_1 uas usb_storage rfcomm bnep gpio_ich dm_crypt snd_hda_codec_hdmi snd_hda_codec_idt snd_
Jun 14 17:39:36 vougeot kernel:  mei soundcore shpchp dell_smo8800 mac_hid kvm_intel kvm irqbypass binfmt_misc sch_fq_codel nf_tables nfnetlink parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcr
Jun 14 17:39:36 vougeot kernel: CPU: 0 PID: 42 Comm: kswapd0 Not tainted 4.17.1-041701-generic #201806111730
Jun 14 17:39:36 vougeot kernel: Hardware name: Dell Inc. Latitude E6520/0NVF5K, BIOS A19 11/14/2013
Jun 14 17:39:36 vougeot kernel: RIP: 0010:_vm_normal_page+0xb3/0xe0
Jun 14 17:39:36 vougeot kernel: RSP: 0000:ffffb16cc0e3b730 EFLAGS: 00010286
Jun 14 17:39:36 vougeot kernel: RAX: ffffe80f034b81c0 RBX: 000ffffffffff000 RCX: 0000000000000001
Jun 14 17:39:36 vougeot kernel: RDX: 80000000d2e0722f RSI: 00007f5cb77f2000 RDI: 0000000000000000
Jun 14 17:39:36 vougeot kernel: RBP: ffffb16cc0e3b730 R08: 0000000000000000 R09: ffff9ea9da29f770
Jun 14 17:39:36 vougeot kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5cb77f3000
Jun 14 17:39:36 vougeot kernel: R13: ffff9ea8155fef90 R14: 00007f5cb77f2000 R15: ffffb16cc0e3b860
Jun 14 17:39:36 vougeot kernel: FS:  0000000000000000(0000) GS:ffff9eaa25200000(0000) knlGS:0000000000000000
Jun 14 17:39:36 vougeot kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 17:39:36 vougeot kernel: CR2: ffffe80f034b81c0 CR3: 00000000bc20a005 CR4: 00000000000606f0
Jun 14 17:39:36 vougeot kernel: Call Trace:
Jun 14 17:39:36 vougeot kernel:  unmap_page_range+0x525/0xd00
Jun 14 17:39:36 vougeot kernel:  unmap_single_vma+0x7d/0xf0
Jun 14 17:39:36 vougeot kernel:  zap_page_range_single+0xb7/0x120
Jun 14 17:39:36 vougeot kernel:  ? dma_pte_clear_level+0x130/0x1a0
Jun 14 17:39:36 vougeot kernel:  unmap_mapping_pages+0xfa/0x130
Jun 14 17:39:36 vougeot kernel:  truncate_cleanup_page+0x4b/0xd0
Jun 14 17:39:36 vougeot kernel:  truncate_inode_page+0x1e/0x40
Jun 14 17:39:36 vougeot kernel:  shmem_undo_range+0x378/0x920
Jun 14 17:39:36 vougeot kernel:  shmem_truncate_range+0x16/0x40
Jun 14 17:39:36 vougeot kernel:  i915_gem_object_truncate+0x2d/0x50 [i915]
Jun 14 17:39:36 vougeot kernel:  __i915_gem_object_invalidate+0x42/0x50 [i915]
Jun 14 17:39:36 vougeot kernel:  i915_gem_shrink+0x479/0x4b0 [i915]
Jun 14 17:39:36 vougeot kernel:  i915_gem_shrinker_scan+0x5f/0x130 [i915]
Jun 14 17:39:36 vougeot kernel:  ? i915_gem_shrinker_scan+0x5f/0x130 [i915]
Jun 14 17:39:36 vougeot kernel:  shrink_slab.part.51+0x1a4/0x3d0
Jun 14 17:39:36 vougeot kernel:  shrink_node+0x3ac/0x460
Jun 14 17:39:36 vougeot kernel:  balance_pgdat+0x16e/0x380
Jun 14 17:39:36 vougeot kernel:  kswapd+0x178/0x430
Jun 14 17:39:36 vougeot kernel:  ? wait_woken+0x80/0x80
Jun 14 17:39:36 vougeot kernel:  kthread+0x121/0x140
Jun 14 17:39:36 vougeot kernel:  ? balance_pgdat+0x380/0x380
Jun 14 17:39:36 vougeot kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Jun 14 17:39:36 vougeot kernel:  ret_from_fork+0x35/0x40
Jun 14 17:39:36 vougeot kernel: Code: 5d c3 49 8b 79 50 81 e7 00 04 00 10 75 44 48 39 05 a3 71 44 01 74 3b 48 39 05 92 71 44 01 72 d4 48 c1 e0 06 48 03 05 cd 52 f9 00 <4c> 8b 00 49 c1 e8 33 41 83 e0 07 41 83 f8 04 75 b8 4c 8b 40 20
Jun 14 17:39:36 vougeot kernel: RIP: _vm_normal_page+0xb3/0xe0 RSP: ffffb16cc0e3b730
Jun 14 17:39:36 vougeot kernel: CR2: ffffe80f034b81c0
Jun 14 17:39:36 vougeot kernel: ---[ end trace 271cb02febcc78d6 ]---
Comment 20 Laurent Bonnaud 2018-06-14 16:05:16 UTC
I was able to reproduce this bug with the Ubuntu kernel.  The machine did not freeze immediately, but it froze a few seconds later.

Jun 14 17:56:40 vougeot kernel: BUG: unable to handle kernel paging request at ffffd3e0034a24c0
Jun 14 17:56:40 vougeot kernel: IP: _vm_normal_page+0xb3/0xe0
Jun 14 17:56:40 vougeot kernel: PGD 22d7cb067 P4D 22d7cb067 PUD 22d7ca067 PMD 0
Jun 14 17:56:40 vougeot kernel: Oops: 0000 [#1] SMP PTI
Jun 14 17:56:40 vougeot kernel: Modules linked in: rfcomm bnep dm_crypt dell_rbtn intel_rapl x86_pkg_temp_thermal dell_laptop intel_powerclamp dell_smbios_smm coretemp dcdbas crct10dif_pclmul crc32_pclmul dell_smm_hwmon ghash_clmulni_int
Jun 14 17:56:40 vougeot kernel:  nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 nf_tables parport_pc ppdev nfnetlink lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic usbhid hid i915 i2c_algo_bit drm_kms_he
Jun 14 17:56:40 vougeot kernel: CPU: 1 PID: 11997 Comm: Compositor Not tainted 4.15.0-24-generic #26-Ubuntu
Jun 14 17:56:40 vougeot kernel: Hardware name: Dell Inc. Latitude E6520/0NVF5K, BIOS A19 11/14/2013
Jun 14 17:56:40 vougeot kernel: RIP: 0010:_vm_normal_page+0xb3/0xe0
Jun 14 17:56:40 vougeot kernel: RSP: 0018:ffffa506c326fc38 EFLAGS: 00010286
Jun 14 17:56:40 vougeot kernel: RAX: ffffd3e0034a24c0 RBX: 00003ffffffff000 RCX: 0000000000000001
Jun 14 17:56:40 vougeot kernel: RDX: 80000000d289322f RSI: 00007f02367e1000 RDI: 0000000000000000
Jun 14 17:56:40 vougeot kernel: RBP: ffffa506c326fc38 R08: 0000000000000000 R09: ffff955a90e949c0
Jun 14 17:56:40 vougeot kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00007f02367e2000
Jun 14 17:56:40 vougeot kernel: R13: ffff955a90978f08 R14: 00007f02367e1000 R15: ffffa506c326fda8
Jun 14 17:56:40 vougeot kernel: FS:  00007f024b3bd700(0000) GS:ffff955ba5240000(0000) knlGS:0000000000000000
Jun 14 17:56:40 vougeot kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 17:56:40 vougeot kernel: CR2: ffffd3e0034a24c0 CR3: 00000001108e0001 CR4: 00000000000606e0
Jun 14 17:56:40 vougeot kernel: Call Trace:
Jun 14 17:56:40 vougeot kernel:  unmap_page_range+0x525/0xcf0
Jun 14 17:56:40 vougeot kernel:  unmap_single_vma+0x7d/0xf0
Jun 14 17:56:40 vougeot kernel:  unmap_vmas+0x51/0xb0
Jun 14 17:56:40 vougeot kernel:  unmap_region+0xbd/0x130
Jun 14 17:56:40 vougeot kernel:  ? __vma_rb_erase+0x1a2/0x270
Jun 14 17:56:40 vougeot kernel:  do_munmap+0x27c/0x460
Jun 14 17:56:40 vougeot kernel:  vm_munmap+0x69/0xb0
Jun 14 17:56:40 vougeot kernel:  SyS_munmap+0x22/0x30
Jun 14 17:56:40 vougeot kernel:  do_syscall_64+0x73/0x130
Jun 14 17:56:40 vougeot kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun 14 17:56:40 vougeot kernel: RIP: 0033:0x7f026ff03ab7
Jun 14 17:56:40 vougeot kernel: RSP: 002b:00007f024b3bc438 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
Jun 14 17:56:40 vougeot kernel: RAX: ffffffffffffffda RBX: 00007f02383fe670 RCX: 00007f026ff03ab7
Jun 14 17:56:40 vougeot kernel: RDX: 0000000000000000 RSI: 000000000021e000 RDI: 00007f02367e1000
Jun 14 17:56:40 vougeot kernel: RBP: 00007f0237373901 R08: 0000000000000030 R09: 0000000000000030
Jun 14 17:56:40 vougeot kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 00007f02373739e8
Jun 14 17:56:40 vougeot kernel: R13: 00007f024b3bc770 R14: 00007f024b3bc750 R15: 00007f0265060968
Jun 14 17:56:40 vougeot kernel: Code: 5d c3 49 8b 79 50 81 e7 00 04 00 10 75 41 48 39 05 c3 45 45 01 74 38 48 39 05 b2 45 45 01 72 d4 48 c1 e0 06 48 03 05 15 ef 23 01 <4c> 8b 00 49 c1 e8 33 41 83 e0 07 41 83 f8 04 75 b8 4c 8b 40 20
Jun 14 17:56:40 vougeot kernel: RIP: _vm_normal_page+0xb3/0xe0 RSP: ffffa506c326fc38
Jun 14 17:56:40 vougeot kernel: CR2: ffffd3e0034a24c0
Jun 14 17:56:40 vougeot kernel: ---[ end trace 4381c574fd805b3a ]---

and

Jun 14 17:57:44 vougeot kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [Compositor:4017]
Jun 14 17:57:44 vougeot kernel: Modules linked in: rfcomm bnep dm_crypt dell_rbtn intel_rapl x86_pkg_temp_thermal dell_laptop intel_powerclamp dell_smbios_smm coretemp dcdbas crct10dif_pclmul crc32_pclmul dell_smm_hwmon ghash_clmulni_int
Jun 14 17:57:44 vougeot kernel:  nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 nf_tables parport_pc ppdev nfnetlink lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic usbhid hid i915 i2c_algo_bit drm_kms_he
Jun 14 17:57:44 vougeot kernel: CPU: 0 PID: 4017 Comm: Compositor Tainted: G      D          4.15.0-24-generic #26-Ubuntu
Jun 14 17:57:44 vougeot kernel: Hardware name: Dell Inc. Latitude E6520/0NVF5K, BIOS A19 11/14/2013
Jun 14 17:57:44 vougeot kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x137/0x1a0
Jun 14 17:57:44 vougeot kernel: RSP: 0000:ffffa506c2ee7a08 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
Jun 14 17:57:44 vougeot kernel: RAX: 0000000000000101 RBX: 00003ffffffff000 RCX: 0000000000000001
Jun 14 17:57:44 vougeot kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffd3e004425e30
Jun 14 17:57:44 vougeot kernel: RBP: ffffa506c2ee7a08 R08: 0000000000000101 R09: ffff955a90d91ba0
Jun 14 17:57:44 vougeot kernel: R10: 0000000000000001 R11: ffff955badfd2000 R12: 00007f0236600000
Jun 14 17:57:44 vougeot kernel: R13: ffff955a90978000 R14: 00007f0236600000 R15: ffffa506c2ee7b48
Jun 14 17:57:44 vougeot kernel: FS:  00007f33da3ff700(0000) GS:ffff955ba5200000(0000) knlGS:0000000000000000
Jun 14 17:57:44 vougeot kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 17:57:44 vougeot kernel: CR2: 00007f339d453900 CR3: 00000001a171c001 CR4: 00000000000606f0
Jun 14 17:57:44 vougeot kernel: Call Trace:
Jun 14 17:57:44 vougeot kernel:  _raw_spin_lock+0x21/0x30
Jun 14 17:57:44 vougeot kernel:  unmap_page_range+0x4e3/0xcf0
Jun 14 17:57:44 vougeot kernel:  unmap_single_vma+0x7d/0xf0
Jun 14 17:57:44 vougeot kernel:  zap_page_range_single+0xb7/0x120
Jun 14 17:57:44 vougeot kernel:  unmap_mapping_range+0x10e/0x130
Jun 14 17:57:44 vougeot kernel:  i915_vma_revoke_mmap+0x58/0xb0 [i915]
Jun 14 17:57:44 vougeot kernel:  fence_update+0x172/0x260 [i915]
Jun 14 17:57:44 vougeot kernel:  i915_vma_pin_fence+0x107/0x190 [i915]
Jun 14 17:57:44 vougeot kernel:  i915_gem_fault+0x2ec/0x500 [i915]
Jun 14 17:57:44 vougeot kernel:  ? ttwu_do_wakeup+0x1e/0x140
Jun 14 17:57:44 vougeot kernel:  ? radix_tree_lookup+0xd/0x10
Jun 14 17:57:44 vougeot kernel:  __do_fault+0x24/0xf0
Jun 14 17:57:44 vougeot kernel:  handle_pte_fault+0x20c/0xdb0
Jun 14 17:57:44 vougeot kernel:  ? i915_gem_object_pin_to_display_plane+0x130/0x130 [i915]
Jun 14 17:57:44 vougeot kernel:  __handle_mm_fault+0x47b/0x5c0
Jun 14 17:57:44 vougeot kernel:  handle_mm_fault+0xb1/0x1f0
Jun 14 17:57:44 vougeot kernel:  __do_page_fault+0x250/0x4d0
Jun 14 17:57:44 vougeot kernel:  ? SyS_futex+0x13b/0x180
Jun 14 17:57:44 vougeot kernel:  do_page_fault+0x2e/0xe0
Jun 14 17:57:44 vougeot kernel:  ? page_fault+0x2f/0x50
Jun 14 17:57:44 vougeot kernel:  page_fault+0x45/0x50
Jun 14 17:57:44 vougeot kernel: RIP: 0033:0x7f33ff31f696
Jun 14 17:57:44 vougeot kernel: RSP: 002b:00007f33da3fd878 EFLAGS: 00010206
Jun 14 17:57:44 vougeot kernel: RAX: 00007f339d4538fc RBX: 0000000000001e00 RCX: 00007f339d453f04
Jun 14 17:57:44 vougeot kernel: RDX: 00000000000005d4 RSI: 00007f33b7254940 RDI: 00007f339d453900
Jun 14 17:57:44 vougeot kernel: RBP: 0000000000001e00 R08: fffffffffffffffc R09: 0000000000001401
Jun 14 17:57:44 vougeot kernel: R10: 00000000000000e0 R11: 00007f339d4538fc R12: 0000000000000010
Jun 14 17:57:44 vougeot kernel: R13: 0000000000000618 R14: 00007f33b72548fc R15: 0000000000000000
Jun 14 17:57:44 vougeot kernel: Code: c0 41 39 c0 74 ea 4d 85 c9 c6 07 01 74 2d 41 c7 41 08 01 00 00 00 eb 96 83 fa 01 0f 84 f4 fe ff ff 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f3 90 4c 8b 09 4d 85
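The two paging-request oopses can be cross-checked against their own register dumps. In both, the Code: bytes just before the faulting instruction compute RAX as pfn << 6 plus a base loaded from memory (a 64-byte struct page per page frame), and RDX holds what looks like the PTE being examined. Under those assumptions, subtracting pfn * 64 from the faulting address (CR2) should leave a clean vmemmap base. The Python sketch below does that arithmetic; treating RDX as the PTE and RBX as the PFN mask is a reading of the dump, not something the log states.

```python
PAGE_SHIFT = 12
STRUCT_PAGE_SIZE = 64  # matches "48 c1 e0 06" (shl $0x6,%rax) in the Code: bytes


def pfn_from_pte(pte: int, pfn_mask: int) -> int:
    """Strip flag bits (including NX, bit 63) and return the page frame number."""
    return (pte & pfn_mask) >> PAGE_SHIFT


def vmemmap_base(fault_addr: int, pfn: int) -> int:
    """Base address that would make fault_addr the struct page for pfn."""
    return fault_addr - pfn * STRUCT_PAGE_SIZE


# (PTE from RDX, assumed PFN mask from RBX, fault address from CR2)
OOPSES = {
    "4.17.1": (0x80000000D2E0722F, 0x000FFFFFFFFFF000, 0xFFFFE80F034B81C0),
    "4.15.0-24": (0x80000000D289322F, 0x00003FFFFFFFF000, 0xFFFFD3E0034A24C0),
}

for kernel, (pte, mask, cr2) in OOPSES.items():
    pfn = pfn_from_pte(pte, mask)
    base = vmemmap_base(cr2, pfn)
    print(f"{kernel}: pfn={pfn:#x} phys={pfn << PAGE_SHIFT:#x} vmemmap_base={base:#x}")
# 4.17.1: pfn=0xd2e07 phys=0xd2e07000 vmemmap_base=0xffffe80f00000000
# 4.15.0-24: pfn=0xd2893 phys=0xd2893000 vmemmap_base=0xffffd3e000000000
```

Both derived bases come out 1 GiB aligned, which fits the struct page interpretation. Together with the "PMD 0" in the page-table walk, one reading is that the kernel was looking up the struct page for a physical frame (0xd2e07000 and 0xd2893000 respectively) that has no memmap backing, i.e. not ordinary RAM as the kernel sees it. This is an interpretation of the registers, not a confirmed diagnosis.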
Comment 21 Laurent Bonnaud 2018-06-14 16:06:34 UTC
Since this is now clearly a kernel bug, should this bug be reassigned to the kernel/DRM component?
Comment 22 Laurent Bonnaud 2018-06-14 16:19:14 UTC
I also tried to run kwin with intel_sanitize_gpu:

$ kill $(pidof kwin_x11); sleep 5; intel_sanitize_gpu kwin_x11

and kwin segfaulted immediately. I will attach a screenshot of the backtrace, which shows that the crash is in the preloaded open() function from intel_sanitize_gpu.so.
Comment 23 Laurent Bonnaud 2018-06-14 16:20:55 UTC
Created attachment 140163 [details]
kwin backtrace when run with intel_sanitize_gpu
Comment 24 Laurent Bonnaud 2018-06-14 16:26:10 UTC
When I tried to run plasmashell with intel_sanitize_gpu, I also got an immediate segfault when qt_create_qhash_seed() tried to open /dev/urandom.
Comment 25 Laurent Bonnaud 2019-09-09 17:38:07 UTC
My old laptop died and I cannot reproduce this bug with my new laptop.

