Bug 106237

Summary: Kernel Oops on boot ../drivers/gpu/drm/i915/intel_drv.h:1813 gen8_write32+0x1e7/0x240
Product: DRI
Reporter: omkhar
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard: Triaged
i915 platform: BDW
i915 features:
Attachments:
Description Flags
Dmesg none

Description omkhar 2018-04-25 14:02:05 UTC
Running Clear Linux (Intel distro), version 22010
Kernel: 4.16.3-553.native

[    1.377384] ------------[ cut here ]------------
[    1.377385] RPM wakelock ref not held during HW access
[    1.377403] WARNING: CPU: 0 PID: 196 at ../drivers/gpu/drm/i915/intel_drv.h:1813 gen8_write32+0x1e7/0x240
[    1.377404] Modules linked in: nf_nat_ipv4(+) nf_nat nf_conntrack ipt_REJECT iptable_filter iptable_mangle iptable_raw iptable_security ip_tables intel_spi_platform intel_spi spi_nor mtd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel i2c_i801 snd_hda_codec snd_hwdep lpc_ich igb(+) snd_hda_core snd_pcm shpchp snd soundcore
[    1.377428] CPU: 0 PID: 196 Comm: clr_power Not tainted 4.16.3-553.native #1
[    1.377430] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Q3XXG4-P, BIOS 5.6.5 09/19/2016
[    1.377433] RIP: 0010:gen8_write32+0x1e7/0x240
[    1.377434] RSP: 0018:ffffa3bcc1647d60 EFLAGS: 00010246
[    1.377436] RAX: 0000000000000000 RBX: 000000000e000000 RCX: 0000000000000000
[    1.377437] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    1.377438] RBP: 000000000000a008 R08: 0000000000000000 R09: 0000000000000000
[    1.377439] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e0f14010000
[    1.377441] R13: ffff8e0f14014d30 R14: 0000000000000004 R15: ffff8e0f116e30a8
[    1.377443] FS:  00007f40ff483500(0000) GS:ffff8e0f1f400000(0000) knlGS:0000000000000000
[    1.377444] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.377445] CR2: 00000000010a5448 CR3: 0000000211286002 CR4: 00000000000606f0
[    1.377446] Call Trace:
[    1.377453]  gen6_set_rps+0x173/0x1a0
[    1.377457]  i915_min_freq_set+0x9d/0xf0
[    1.377461]  simple_attr_write+0xbc/0xe0
[    1.377465]  full_proxy_write+0x4e/0x80
[    1.377469]  __vfs_write+0x21/0x160
[    1.377473]  ? SyS_newfstat+0x29/0x40
[    1.377476]  ? _cond_resched+0x14/0x40
[    1.377479]  vfs_write+0xac/0x1a0
[    1.377482]  SyS_write+0x3d/0xa0
[    1.377486]  do_syscall_64+0x69/0x1a0
[    1.377490]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[    1.377492] RIP: 0033:0x7f40ff3a1f14
[    1.377493] RSP: 002b:00007ffd28518878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[    1.377496] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f40ff3a1f14
[    1.377497] RDX: 0000000000000004 RSI: 00000000010a4430 RDI: 0000000000000003
[    1.377498] RBP: 00000000010a4430 R08: 00007f40ff483500 R09: 00007f40ff3fdf20
[    1.377499] R10: 00000000010a4010 R11: 0000000000000246 R12: 00000000010ac510
[    1.377500] R13: 0000000000000004 R14: 00007f40ff4792a0 R15: 00007f40ff478760
[    1.377502] Code: 0f 87 a7 fe ff ff e9 f0 fe ff ff 80 3d 62 72 f0 00 00 0f 85 63 fe ff ff 48 c7 c7 a0 d6 47 82 c6 05 4e 72 f0 00 01 e8 79 f4 8a ff <0f> 0b e9 49 fe ff ff b9 01 00 00 00 31 d2 89 ee 48 89 04 24 4c 
[    1.377541] ---[ end trace 3798a54a72919794 ]---
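
Read bottom-up, the call trace above shows a write() system call from the clr_power process (the "Comm:" field) passing through the debugfs proxy (full_proxy_write) into i915_min_freq_set, and ending in gen8_write32, where the warning fired. The following is a minimal sketch, not part of the original report, that pulls the reliable frames out of such a trace; the function name `reliable_frames` and the regular expression are illustrative assumptions.

```python
import re

# One call-trace line: "[ time]  symbol+0xoff/0xsize", optionally prefixed
# with "? " when the kernel unwinder considers the entry unreliable.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+$"
)


def reliable_frames(trace_text):
    """Return symbol names from a call trace, skipping '?' entries."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME.match(line.rstrip())
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


# Lines copied from the description above.
TRACE = """\
[    1.377453]  gen6_set_rps+0x173/0x1a0
[    1.377457]  i915_min_freq_set+0x9d/0xf0
[    1.377461]  simple_attr_write+0xbc/0xe0
[    1.377465]  full_proxy_write+0x4e/0x80
[    1.377469]  __vfs_write+0x21/0x160
[    1.377473]  ? SyS_newfstat+0x29/0x40
[    1.377476]  ? _cond_resched+0x14/0x40
[    1.377479]  vfs_write+0xac/0x1a0
[    1.377482]  SyS_write+0x3d/0xa0
"""

# Innermost frame first, matching the order the kernel prints.
print(" <- ".join(reliable_frames(TRACE)))
```

The "?" entries are dropped because they are stack leftovers, not necessarily part of the active call chain.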
Comment 1 Chris Wilson 2018-04-25 14:24:07 UTC
Then don't use the debugfs interface. Instead of overriding rps, tell us why and report bugs for underperformance.

https://patchwork.freedesktop.org/patch/218775/
Comment 2 omkhar 2018-04-25 14:27:37 UTC
I didn't "Use the debug interface" - I booted a stock kernel on a Linux distro produced by Intel. This was in dmesg. I filed a defect with the distro team and they directed me to this team instead.

Is it possible for the two Intel teams to chat about who's doing what incorrectly?
Comment 3 omkhar 2018-04-25 14:29:36 UTC
Corresponding Clear Linux defect: https://github.com/clearlinux/distribution/issues/48
Comment 4 Jani Saarinen 2018-04-25 15:15:44 UTC
Could you provide a dmesg log booting with drm.debug=0xe?
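
For reference, drm.debug is a bitmask of DRM debug categories. The sketch below decodes a mask into category names; the bit values follow include/drm/drm_print.h as of 4.16-era kernels, and the helper name `decode_drm_debug` is a hypothetical one chosen for illustration.

```python
# Bit values as defined in include/drm/drm_print.h for 4.16-era kernels.
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [
        name
        for bit, name in sorted(DRM_DEBUG_CATEGORIES.items())
        if mask & bit
    ]


print(decode_drm_debug(0xE))  # ['DRIVER', 'KMS', 'PRIME']
```

So 0xe enables driver, KMS, and PRIME messages while leaving the much noisier core messages off.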
Comment 5 omkhar 2018-04-25 15:27:45 UTC
The oops looks the same. I have added some additional details from dmesg | grep i915:

omkhar@ajaxvpn ~ $ dmesg | grep i915
[    0.320753] calling  i915_init+0x0/0x55 @ 1
[    0.321231] [drm:i915_driver_load] ppgtt mode: 3
[    0.321255] [drm:i915_ggtt_probe_hw] GGTT size = 4096M
[    0.321257] [drm:i915_ggtt_probe_hw] GMADR size = 256M
[    0.321260] [drm:i915_ggtt_probe_hw] DSM size = 32M
[    0.321381] [drm:i915_gem_init_stolen] Memory reserved for graphics device: 32768K, usable: 31744K
[    0.325299] [drm:i915_driver_load] rawclk rate: 24000 kHz
[    0.325310] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    0.326425] [drm:i915_gem_init_ggtt] clearing unused GTT space: [1000, 100000000]
[    0.326449] [drm:i915_gem_contexts_init] logical context support initialized
[    0.327348] [drm] Initialized i915 1.6.0 20171222 for 0000:00:02.0 on minor 0
[    0.329433] i915 device info: pciid=0x1606 rev=0x09 platform=BROADWELL gen=8
[    0.329434] i915 device info: is_mobile: no
[    0.329436] i915 device info: is_lp: no
[    0.329437] i915 device info: is_alpha_support: no
[    0.329438] i915 device info: has_64bit_reloc: yes
[    0.329440] i915 device info: has_aliasing_ppgtt: yes
[    0.329441] i915 device info: has_csr: no
[    0.329442] i915 device info: has_ddi: yes
[    0.329443] i915 device info: has_dp_mst: yes
[    0.329444] i915 device info: has_reset_engine: yes
[    0.329446] i915 device info: has_fbc: yes
[    0.329447] i915 device info: has_fpga_dbg: yes
[    0.329448] i915 device info: has_full_ppgtt: yes
[    0.329455] i915 device info: has_full_48bit_ppgtt: yes
[    0.329461] i915 device info: has_gmch_display: no
[    0.329466] i915 device info: has_guc: no
[    0.329468] i915 device info: has_guc_ct: no
[    0.329469] i915 device info: has_hotplug: yes
[    0.329470] i915 device info: has_l3_dpf: no
[    0.329471] i915 device info: has_llc: yes
[    0.329472] i915 device info: has_logical_ring_contexts: yes
[    0.329474] i915 device info: has_logical_ring_preemption: no
[    0.329475] i915 device info: has_overlay: no
[    0.329476] i915 device info: has_pooled_eu: no
[    0.329477] i915 device info: has_psr: yes
[    0.329479] i915 device info: has_rc6: yes
[    0.329480] i915 device info: has_rc6p: no
[    0.329481] i915 device info: has_resource_streamer: yes
[    0.329482] i915 device info: has_runtime_pm: yes
[    0.329483] i915 device info: has_snoop: no
[    0.329485] i915 device info: unfenced_needs_alignment: no
[    0.329486] i915 device info: cursor_needs_physical: no
[    0.329487] i915 device info: hws_needs_physical: no
[    0.329488] i915 device info: overlay_needs_physical: no
[    0.329490] i915 device info: supports_tv: no
[    0.329491] i915 device info: has_ipc: no
[    0.329492] i915 device info: slice mask: 0001
[    0.329493] i915 device info: slice total: 1
[    0.329494] i915 device info: subslice total: 2
[    0.329496] i915 device info: subslice mask 0003
[    0.329497] i915 device info: subslice per slice: 2
[    0.329498] i915 device info: EU total: 12
[    0.329499] i915 device info: EU per subslice: 6
[    0.329500] i915 device info: has slice power gating: no
[    0.329502] i915 device info: has subslice power gating: no
[    0.329503] i915 device info: has EU power gating: no
[    0.329504] i915 device info: CS timestamp frequency: 12500 kHz
[    0.329540] initcall i915_init+0x0/0x55 returned 0 after 8576 usecs
[    0.330236] [drm:gmbus_xfer] GMBUS [i915 gmbus dpc] NAK for addr: 0050 w(1)
[    0.330239] [drm:gmbus_xfer] GMBUS [i915 gmbus dpc] NAK on first message, retry
[    0.331234] [drm:gmbus_xfer] GMBUS [i915 gmbus dpc] NAK for addr: 0050 w(1)
[    0.331238] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter i915 gmbus dpc
[    0.331244] [drm:intel_gmbus_force_bit] enabling bit-banging on i915 gmbus dpc. force bit now 1
[    0.331998] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter i915 gmbus dpc
[    0.332000] [drm:intel_gmbus_force_bit] disabling bit-banging on i915 gmbus dpc. force bit now 0
[    0.332217] [drm:gmbus_xfer] GMBUS [i915 gmbus dpc] NAK for addr: 0040 w(1)
[    0.332219] [drm:gmbus_xfer] GMBUS [i915 gmbus dpc] NAK on first message, retry
[    0.333238] [drm:gmbus_xfer] GMBUS [i915 gmbus dpc] NAK for addr: 0040 w(1)
[    1.139865] snd_hda_intel 0000:00:03.0: bound 0000:00:02.0 (ops i915_audio_component_bind_ops)
[    1.139867] clr: call_modprobe: i915   2 
[    1.154234] [drm:i915_audio_component_get_eld] Not valid for port B
[    1.154237] [drm:i915_audio_component_get_eld] Not valid for port B
[    1.154239] [drm:i915_audio_component_get_eld] Not valid for port B
[    1.154241] [drm:i915_audio_component_get_eld] Not valid for port C
[    1.154243] [drm:i915_audio_component_get_eld] Not valid for port C
[    1.154245] [drm:i915_audio_component_get_eld] Not valid for port C
[    1.154247] [drm:i915_audio_component_get_eld] Not valid for port D
[    1.154249] [drm:i915_audio_component_get_eld] Not valid for port D
[    1.154251] [drm:i915_audio_component_get_eld] Not valid for port D
[    1.418946] [drm:i915_min_freq_set] Manually setting min freq to 700
[    1.418964] WARNING: CPU: 0 PID: 201 at ../drivers/gpu/drm/i915/intel_drv.h:1813 gen8_write32+0x1e7/0x240
[    1.419021]  i915_min_freq_set+0x9d/0xf0

Looks like the oops occurs *right* after the i915_min_freq_set call
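
The same observation can be checked mechanically. Below is a minimal sketch that reports the message logged immediately before each WARNING line; the function name `lines_before_warnings` is an assumption, and the sample is copied from the filtered output above. On an unfiltered dmesg the preceding line would instead be the "RPM wakelock ref not held" message or the "cut here" marker.

```python
import re

# Splits "[    1.418946] message" into timestamp and message.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def lines_before_warnings(log_text):
    """Return (previous_message, warning_message) for every WARNING line."""
    previous = None
    pairs = []
    for raw in log_text.splitlines():
        match = TIMESTAMP.match(raw)
        if not match:
            continue  # skip lines without a kernel timestamp
        message = match.group(2)
        if message.startswith("WARNING:") and previous is not None:
            pairs.append((previous, message))
        previous = message
    return pairs


SAMPLE = """\
[    1.154251] [drm:i915_audio_component_get_eld] Not valid for port D
[    1.418946] [drm:i915_min_freq_set] Manually setting min freq to 700
[    1.418964] WARNING: CPU: 0 PID: 201 at ../drivers/gpu/drm/i915/intel_drv.h:1813 gen8_write32+0x1e7/0x240
[    1.419021]  i915_min_freq_set+0x9d/0xf0
"""

for before, warning in lines_before_warnings(SAMPLE):
    print(before)
```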
Comment 6 Jani Saarinen 2018-04-26 06:53:00 UTC
Please send the whole dmesg from boot to the failure; do not grep anything.
What system is this?
Comment 7 omkhar 2018-04-26 18:19:09 UTC
Created attachment 139144 [details]
Dmesg
Comment 8 omkhar 2018-04-26 18:22:43 UTC
This system is a whitebox that I'm using as a headless router. Here are the details on Amazon https://www.amazon.com/gp/product/B01N6GSS7Y/ref=oh_aui_detailpage_o04_s00?ie=UTF8&psc=1
Comment 9 Jani Saarinen 2018-04-27 06:03:36 UTC
The dmesg is still not from the beginning of the boot. Can you capture it from the very start? It currently begins at [    0.190165].
Comment 10 Jani Saarinen 2018-04-27 06:31:44 UTC
From that web page: Intel Celeron 3215U processor, dual core, 1.7 GHz
Comment 11 omkhar 2018-04-27 12:16:40 UTC
I'm afraid that is the earliest message dmesg shows right after boot. I've asked the Clear Linux team for the best method of getting earlier information.
Comment 12 omkhar 2018-04-27 12:17:54 UTC
cpuinfo if it's helpful


omkhar@ajaxvpn ~ $ cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel(R) Celeron(R) CPU 3215U @ 1.70GHz
stepping	: 4
microcode	: 0x2a
cpu MHz		: 1621.583
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt xsave rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust erms invpcid rdseed intel_pt xsaveopt dtherm ida arat pln pts
bugs		: cpu_meltdown spectre_v1 spectre_v2
bogomips	: 3392.40
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel(R) Celeron(R) CPU 3215U @ 1.70GHz
stepping	: 4
microcode	: 0x2a
cpu MHz		: 1696.393
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt xsave rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust erms invpcid rdseed intel_pt xsaveopt dtherm ida arat pln pts
bugs		: cpu_meltdown spectre_v1 spectre_v2
bogomips	: 3392.40
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:
Comment 13 Jani Saarinen 2018-05-02 06:50:30 UTC
Any update on getting the whole boot log?
Comment 14 omkhar 2018-05-02 13:50:01 UTC
I've asked the Clear Linux team once again and will advise when I hear back. I've attempted multiple reboots, and for some reason the dmesg always begins at 0.190165.
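
A log whose first timestamp is well above zero has usually lost its earliest lines. One common cause, which is only an assumption here and is not confirmed anywhere in this report, is that the kernel ring buffer filled and wrapped before it was read; the log_buf_len= kernel boot parameter enlarges that buffer. A small sketch for checking a captured log follows; the helper names and the default threshold are hypothetical.

```python
import re

LEADING_TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def first_timestamp(log_text):
    """Return the first kernel timestamp in the log as a float, or None."""
    for line in log_text.splitlines():
        match = LEADING_TIMESTAMP.match(line)
        if match:
            return float(match.group(1))
    return None


def looks_truncated(log_text, threshold=0.0):
    """True if the log starts later than `threshold` seconds after boot."""
    first = first_timestamp(log_text)
    return first is not None and first > threshold


# A log starting at the timestamp mentioned in this thread.
sample = "[    0.190165] some early message\n[    0.320753] calling  i915_init+0x0/0x55 @ 1\n"
print(first_timestamp(sample), looks_truncated(sample))
```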
Comment 15 Jani Saarinen 2018-05-09 05:46:49 UTC
ping.
Comment 16 omkhar 2018-05-09 18:05:47 UTC
I asked the distro team for further information about the missing dmesg output, but nothing yet. I just bumped the request again.
Comment 17 Jani Saarinen 2018-05-11 04:57:31 UTC
OK, thanks.
Comment 18 Jani Saarinen 2018-05-17 10:00:47 UTC
ping.
Comment 19 Jani Saarinen 2018-05-24 07:34:32 UTC
Any luck still?
Comment 20 Jani Saarinen 2018-05-28 06:22:42 UTC
Chris, any idea if still valid?
Comment 21 omkhar 2018-05-28 12:29:57 UTC
Panic seems to have disappeared with the latest Clear Linux Kernel (4.16.9-571.native)
Comment 22 Jani Saarinen 2018-05-28 12:31:46 UTC
Thank you for reporting back, closing.
