104708 – [IGT] igt@gem_exec_flush@* BUG: unable to handle kernel paging request at 0000000100000084

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	Other All

Importance:	high critical
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2018-01-19 20:33 UTC by Elizabeth
Modified:	2018-08-22 13:30 UTC
CC List:	1 user

See Also:
i915 platform:	CNL
i915 features:	GEM/execlists

Attachments
dmesg ff cnl (218.69 KB, text/plain) 2018-01-19 20:33 UTC, Elizabeth	no flags
dmesg with gem_trace (4.67 KB, text/plain) 2018-02-27 22:51 UTC, Armando Antonio	no flags
config file (210.07 KB, text/x-mpsub) 2018-02-27 22:52 UTC, Armando Antonio	no flags

Description Elizabeth 2018-01-19 20:33:40 UTC

Created attachment 136859 [details]
dmesg ff cnl

We hit an oops while running igt@gem_exec_nop@basic-series in CNL's FF cycle. The next test, igt@gem_exec_parse@basic-allowed, then timed out, probably as a consequence of the oops, and igt@gem_exec_parse@basic-rejected ended as Incomplete, probably as a consequence of the two previous failures.

Stdout	
An internal exception that should have been handled was not:
Test run time exceeded timeout value (600 seconds)

[  293.788239] BUG: unable to handle kernel paging request at 0000000100000084
[  293.788281] IP: print_request+0x11/0xb0 [i915]
[  293.788288] Oops: 0000 [#1] SMP PTI
[  293.788291] Dumping ftrace buffer:
[  293.788294]    (ftrace buffer empty)
[  293.788295] Modules linked in: snd_hda_codec_hdmi asix usbnet mii ip6table_filter ip6_tables bnep iptable_filter 8250_dw binfmt_misc snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi nls_iso8859_1 snd_hda_codec_realtek snd_soc_core snd_hda_codec_generic snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_hda_codec x86_pkg_temp_thermal snd_hda_core snd_hwdep intel_powerclamp coretemp snd_pcm kvm_intel kvm snd_seq_midi irqbypass snd_seq_midi_event snd_rawmidi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_seq aesni_intel aes_x86_64 crypto_simd snd_seq_device glue_helper cryptd snd_timer iwlwifi input_leds snd serio_raw wmi_bmof btusb btrtl btbcm btintel soundcore idma64 bluetooth shpchp virt_dma cfg80211 intel_pch_thermal intel_lpss_pci ecdh_generic
[  293.788343]  intel_lpss tpm_crb acpi_pad mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 e1000e ptp i915 pps_core wmi video
[  293.788354] CPU: 3 PID: 36 Comm: kworker/3:1 Tainted: G     U           4.15.0-rc8-drm-intel-qa-ww3-commit-6a58f7b+ #1
[  293.788356] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X114.B04.1711300710 11/30/2017
[  293.788385] Workqueue: events_long i915_hangcheck_elapsed [i915]
[  293.788419] RIP: 0010:print_request+0x11/0xb0 [i915]
[  293.788421] RSP: 0018:ffffb15f80e33cc0 EFLAGS: 00010282
[  293.788424] RAX: 0000000000000017 RBX: ffffb15f80e33e28 RCX: 0000000000000006
[  293.788426] RDX: ffffb15f80e33d30 RSI: 00000000fffffffc RDI: ffffb15f80e33e28
[  293.788427] RBP: ffffb15f80e33e28 R08: 0000000000000003 R09: 0000000000000000
[  293.788429] R10: 0000000000000005 R11: ffffb15f80e33d41 R12: ffffb15f80e33d30
[  293.788431] R13: 00000000fffffffc R14: 0000000000000000 R15: ffff91411b0bc340
[  293.788433] FS:  0000000000000000(0000) GS:ffff914131180000(0000) knlGS:0000000000000000
[  293.788435] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  293.788437] CR2: 0000000100000084 CR3: 00000001c780a006 CR4: 0000000000760ee0
[  293.788439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  293.788441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  293.788442] PKRU: 55555554
[  293.788443] Call Trace:
[  293.788477]  intel_engine_dump+0x8fe/0xc30 [i915]
[  293.788500]  ? engine_stuck+0x1b9/0x240 [i915]
[  293.788524]  ? fwtable_read32+0x83/0x1b0 [i915]
[  293.788556]  i915_hangcheck_elapsed+0x347/0x370 [i915]
[  293.788561]  ? drm_printk+0xc0/0xc0
[  293.788565]  process_one_work+0x154/0x400
[  293.788568]  worker_thread+0x4a/0x440
[  293.788570]  kthread+0xf5/0x130
[  293.788572]  ? process_one_work+0x400/0x400
[  293.788574]  ? kthread_associate_blkcg+0x90/0x90
[  293.788578]  ret_from_fork+0x32/0x40
[  293.788580] Code: b9 01 00 00 00 be 90 40 00 00 89 c2 48 89 df 41 ff d4 e9 f6 fc ff ff 0f 1f 00 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 53 48 89 fd <48> 8b 86 88 00 00 00 48 8b 3d 61 93 29 cb 48 89 f3 48 2b be f0 
[  293.788633] RIP: print_request+0x11/0xb0 [i915] RSP: ffffb15f80e33cc0
[  293.788635] CR2: 0000000100000084
[  293.788637] ---[ end trace 33a847ebe65771c1 ]---
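
The fault address in this trace is consistent with the register dump. The marked bytes in the Code: line, <48> 8b 86 88 00 00 00, decode to mov 0x88(%rsi),%rax, and RSI holds 00000000fffffffc (the 32-bit pattern of -4). Adding the 0x88 displacement to that value gives the CR2 address. A minimal sketch of the arithmetic follows; the helper name fault_address is hypothetical and exists only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: effective address of a base + displacement
 * memory operand, as used by "mov disp(%reg),%rax".
 */
uint64_t fault_address(uint64_t base, int32_t disp)
{
    /* Sign-extend the 32-bit displacement before adding it to the base. */
    return base + (uint64_t)(int64_t)disp;
}
```

With the values from the log, fault_address(0xfffffffc, 0x88) evaluates to 0x100000084, matching CR2. This suggests print_request() dereferenced a request pointer equal to 0xfffffffc rather than a valid kernel address.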

Comment 1 Chris Wilson 2018-01-19 21:17:48 UTC

This is a secondary failure. The oops only happens after the lost interrupt.

Comment 2 Chris Wilson 2018-01-19 21:27:29 UTC

Not a missed interrupt, just the ELSP contains a non-zero value, making it look like we didn't get an interrupt to clear the inflight requests.

Comment 3 Armando Antonio 2018-02-09 21:08:31 UTC

The following test cases show the same issue, also on CNL.


====================================
Test cases
====================================
igt@gem_exec_flush@batch-cpu-bsd-wb
igt@gem_exec_flush@wc-ro-bsd
igt@gem_exec_flush@wb-pro-vebox-interruptible
igt@gem_exec_flush@wc-ro-bsd
igt@gem_ctx_switch@vebox-interruptible


======================================
             Software
======================================
kernel version              : 4.15.0-drm-intel-qa-ww6-commit-7e8e106+
os version                  : Ubuntu 16.10
os codename                 : yakkety
kernel driver               : i915
swap partition              : enabled on (/dev/nvme0n1p3)

======================================
        Graphic drivers
======================================
intel-gpu-tools (tag)       : intel-gpu-tools-1.20-310-g37bd27f
intel-gpu-tools (commit)    : 37bd27f

====================================
Dmesg
====================================

[ 6264.828302] hangcheck        IPEHR: 0x00000000
[ 6264.828305] hangcheck        Execlist status: 0x00000301 00000000
[ 6264.828309] hangcheck        Execlist CSB read 0 [0 cached], write 0 [0 from hws], interrupt posted? no
[ 6264.828318] BUG: unable to handle kernel paging request at 0000000100000084
[ 6264.828389] IP: print_request+0x11/0xb0 [i915]
[ 6264.828394] PGD 0 P4D 0
[ 6264.828400] Oops: 0000 [#1] SMP PTI
[ 6264.828406] Dumping ftrace buffer:
[ 6264.828411]    (ftrace buffer empty)
[

Comment 4 Chris Wilson 2018-02-11 17:36:13 UTC

Note the GEM_TRACE for this bug may be of some use.

Comment 5 Elizabeth 2018-02-12 19:26:14 UTC

(In reply to Chris Wilson from comment #4)
> Note the GEM_TRACE for this bug may be of some use.
Hello Chris, how do I get the GEM_TRACE?

Comment 6 Chris Wilson 2018-02-12 20:37:35 UTC

CONFIG_DRM_I915_TRACE_GEM

Comment 7 Armando Antonio 2018-02-14 22:26:15 UTC

The following test cases have the same issue on CNL

==================================================
Test cases
==================================================
igt@gem_exec_flush@stream-rw-before-vebox
igt@gem_exec_flush@stream-set-vebox

==================================================
Dmesg Summary
==================================================
[ 4416.601500] [IGT] gem_exec_flush: executing
[ 4416.669251] Setting dangerous option reset - tainting kernel
[ 4416.671089] [IGT] gem_exec_flush: starting subtest stream-rw-before-vebox
[ 4442.817481] BUG: unable to handle kernel paging request at 000000010000006c
[ 4442.817525] IP: i915_gem_record_rings+0x390/0xb60 [i915]
[ 4442.817527] PGD 0 P4D 0
[ 4442.817531] Oops: 0000 [#1] SMP PTI
[ 4442.817535] Dumping ftrace buffer:
[ 4442.817537]    (ftrace buffer empty)

Comment 8 Armando Antonio 2018-02-27 22:51:54 UTC

Created attachment 137673 [details]
dmesg with gem_trace

I ran one of the tests that trigger this issue with CONFIG_DRM_I915_TRACE_GEM enabled in the config, but it looks like I got the same information as without CONFIG_DRM_I915_TRACE_GEM.

Attached new dmesg and config file.

Regards

Comment 9 Armando Antonio 2018-02-27 22:52:19 UTC

Created attachment 137674 [details]
config file

Comment 10 Rodrigo Vivi 2018-02-28 23:31:02 UTC

By the way, I believe this might be worth a bisect attempt.

I believe we ran the gem_exec_nop basic tests during power-on, and they worked reliably.

Comment 11 Elizabeth 2018-03-01 16:09:12 UTC

Changing the title, since I checked our IGT results and the igt@gem_exec_nop@basic-series test is working correctly and no longer shows the bug.

Comment 12 Hector Velazquez 2018-03-08 22:18:44 UTC

These tests show timeout/dmesg-fail on CNL QA.
Test list:

igt@gem_ctx_switch@blt-interruptible
igt@gem_exec_flush@wb-rw-blt
igt@gem_exec_flush@wc-pro-blt

dmesg sample:
. . .
[Mar 7 01:20] hangcheck bcs0
[  +0.000006] hangcheck 	current seqno 6ce05a, last 6ce05a, hangcheck 6ce05a [4000 ms], inflight 12490
[  +0.000002] hangcheck 	Reset count: 0 (global 0)
[  +0.000002] hangcheck 	Requests:
[  +0.000004] hangcheck 	RING_START: 0x00e78000
[  +0.000004] hangcheck 	RING_HEAD:  0x00003c48
[  +0.000003] hangcheck 	RING_TAIL:  0x00003c48
[  +0.000005] hangcheck 	RING_CTL:   0x00003000
[  +0.000004] hangcheck 	RING_MODE:  0x00000200 [idle]
[  +0.000004] hangcheck 	RING_IMR: feffffff
[  +0.000006] hangcheck 	ACTHD:  0x00000000_43803c48
[  +0.000006] hangcheck 	BBADDR: 0x00000000_00000004
[  +0.000006] hangcheck 	DMA_FADDR: 0x00000000_00000000
[  +0.000004] hangcheck 	IPEIR: 0x00000000
[  +0.000003] hangcheck 	IPEHR: 0x00000000
[  +0.000005] hangcheck 	Execlist status: 0x00000301 00000000
[  +0.000004] hangcheck 	Execlist CSB read 5 [5 cached], write 5 [5 from hws], interrupt posted? no
[  +0.000011] BUG: unable to handle kernel paging request at 0000000100000084
[  +0.000076] IP: print_request+0x11/0xb0 [i915]
[  +0.000006] PGD 0 P4D 0 
[  +0.000008] Oops: 0000 [#1] SMP PTI
[  +0.000007] Dumping ftrace buffer:
[  +0.000006]    (ftrace buffer empty)
[  +0.000004] Modules linked in: snd_hda_codec_hdmi cmac bnep 8250_dw arc4 nls_iso8859_1 snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi snd_soc_core snd_hda_codec_realtek x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_generic coretemp snd_compress kvm_intel snd_pcm_dmaengine ac97_bus kvm iwlmvm irqbypass mac80211 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hda_intel snd_hda_codec aesni_intel aes_x86_64 snd_hda_core crypto_simd glue_helper snd_hwdep cryptd snd_pcm snd_seq_midi snd_seq_midi_event input_leds snd_rawmidi serio_raw snd_seq wmi_bmof snd_seq_device asix snd_timer usbnet mii iwlwifi snd btusb btrtl btbcm btintel soundcore bluetooth shpchp cfg80211 ecdh_generic idma64 virt_dma intel_pch_thermal mei_me intel_lpss_pci mei intel_lpss
[  +0.000096]  acpi_pad mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 uas usb_storage i915 e1000e wmi prime_numbers video
[  +0.000025] CPU: 0 PID: 48 Comm: kworker/0:2 Tainted: G     U           4.16.0-rc4-drm-intel-qa-ww10-commit-6c6e100+ #1
[  +0.000007] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X124.B02.1802051422 02/05/2018
[  +0.000063] Workqueue: events_long i915_hangcheck_elapsed [i915]
[  +0.000056] RIP: 0010:print_request+0x11/0xb0 [i915]
[  +0.000005] RSP: 0018:ffffb68e40e97c90 EFLAGS: 00010286
[  +0.000006] RAX: 0000000000000017 RBX: 00000000fffffffc RCX: 0000000000000006
[  +0.000006] RDX: ffffb68e40e97cd0 RSI: 00000000fffffffc RDI: ffffb68e40e97e28
[  +0.000005] RBP: ffffb68e40e97e28 R08: 0000000000000003 R09: 0000000000000000
[  +0.000005] R10: 0000000000000005 R11: ffffb68e40e97ce1 R12: ffffb68e40e97cd0
[  +0.000005] R13: ffffb68e40e97e28 R14: ffff91c418cdc2e8 R15: 0000000000000005
[  +0.000006] FS:  0000000000000000(0000) GS:ffff91c42f800000(0000) knlGS:0000000000000000
[  +0.000007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000004] CR2: 0000000100000084 CR3: 000000017a60a002 CR4: 0000000000760ef0
[  +0.000005] PKRU: 55555554
[  +0.000004] Call Trace:
[  +0.000058]  intel_engine_print_registers+0x586/0x7c0 [i915]
[  +0.000056]  intel_engine_dump+0x22e/0x440 [i915]
[  +0.000056]  ? fwtable_read32+0x83/0x1b0 [i915]
[  +0.000052]  i915_hangcheck_elapsed+0x2f2/0x340 [i915]
[  +0.000012]  ? __drm_printfn_info+0x20/0x20
[  +0.000009]  process_one_work+0x147/0x3c0
[  +0.000008]  worker_thread+0x4a/0x440
[  +0.000008]  kthread+0xf8/0x130
[  +0.000007]  ? rescuer_thread+0x360/0x360
[  +0.000006]  ? kthread_associate_blkcg+0x90/0x90
[  +0.000007]  ret_from_fork+0x35/0x40
[  +0.000006] Code: 00 00 00 be 90 40 00 00 89 c2 48 89 df e8 a8 d4 9b d9 e9 c4 fc ff ff 0f 1f 00 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 53 48 89 fd <48> 8b 86 88 00 00 00 48 8b 3d 21 f3 fb d9 48 89 f3 48 2b be e8 
[  +0.000119] RIP: print_request+0x11/0xb0 [i915] RSP: ffffb68e40e97c90
[  +0.000004] CR2: 0000000100000084
[  +0.000006] ---[ end trace 087b49c969bb5e6d ]---
[Mar 7 01:29] kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
. . .

software:
IGT-Version: 1.21-g289202e (x86_64) (Linux: 4.16.0-rc4-drm-intel-qa-ww10-commit-6c6e100+ x86_64)

Comment 13 Hector Velazquez 2018-03-08 22:32:25 UTC

From comment 12: /sys/kernel/debug/kmemleak and the log files are available if needed.

Comment 14 Jani Saarinen 2018-03-29 07:09:59 UTC

First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you also do that to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.

Comment 15 Elizabeth 2018-04-09 20:14:12 UTC

The dmesg-warn still shows up in the igt@gem_exec_flush family, though most of the results are timeouts.

Results for igt@gem_exec_flush@batch-gtt-blt-uc
Result: timeout

Time	603.13 seconds
Out	
An internal exception that should have been handled was not:
Test run time exceeded timeout value (600 seconds)

Dmesg	
[   70.143378] Setting dangerous option reset - tainting kernel
[  151.840267] BUG: unable to handle kernel paging request at 0000000100000004
[  151.840310] IP: print_request+0x11/0xb0 [i915]
[  151.840318] Oops: 0000 [#1] SMP PTI
[  151.840322] Dumping ftrace buffer:
[  151.840326]    (ftrace buffer empty)
[  151.840328] Modules linked in: snd_hda_codec_hdmi cmac snd_hda_codec_realtek snd_hda_codec_generic bnep 8250_dw nls_iso8859_1 arc4 snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi snd_soc_core x86_pkg_temp_thermal snd_compress intel_powerclamp iwlmvm coretemp snd_pcm_dmaengine ac97_bus mac80211 kvm_intel kvm irqbypass crct10dif_pclmul snd_hda_intel crc32_pclmul ghash_clmulni_intel pcbc snd_hda_codec snd_hda_core snd_hwdep aesni_intel snd_pcm aes_x86_64 crypto_simd glue_helper cryptd snd_seq_midi snd_seq_midi_event serio_raw snd_rawmidi wmi_bmof snd_seq snd_seq_device snd_timer btusb btrtl btbcm asix btintel iwlwifi usbnet mii snd bluetooth soundcore input_leds shpchp ecdh_generic idma64 virt_dma cfg80211 mei_me intel_pch_thermal intel_lpss_pci mei intel_lpss
[  151.840384]  acpi_pad mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid dwc3 udc_core ulpi i915 e1000e dwc3_pci prime_numbers wmi video
[  151.840402] CPU: 0 PID: 46 Comm: kworker/0:1 Tainted: G     U           4.16.0-rc7-drm-intel-qa-ww14-commit-c46052c+ #1
[  151.840406] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X124.B02.1802051422 02/05/2018
[  151.840444] Workqueue: events_long i915_hangcheck_elapsed [i915]
[  151.840476] RIP: 0010:print_request+0x11/0xb0 [i915]
[  151.840479] RSP: 0018:ffffbd3c40e73c30 EFLAGS: 00010282
[  151.840482] RAX: 0000000000000017 RBX: 00000000fffffffc RCX: 0000000000000006
[  151.840485] RDX: ffffbd3c40e73c70 RSI: 00000000fffffffc RDI: ffffbd3c40e73dd8
[  151.840488] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[  151.840491] R10: 0000000000000005 R11: ffffbd3c40e73c81 R12: ffffbd3c40e73c70
[  151.840494] R13: ffffbd3c40e73dd8 R14: ffff98c3aa9662e0 R15: 0000000000000005
[  151.840497] FS:  0000000000000000(0000) GS:ffff98c3bf800000(0000) knlGS:0000000000000000
[  151.840501] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  151.840503] CR2: 0000000100000004 CR3: 00000001f1c0a001 CR4: 0000000000760ef0
[  151.840507] PKRU: 55555554
[  151.840509] Call Trace:
[  151.840540]  intel_engine_print_registers+0x5ba/0x7f0 [i915]
[  151.840570]  intel_engine_dump+0x260/0x470 [i915]
[  151.840603]  ? fwtable_read32+0x83/0x1b0 [i915]
[  151.840633]  i915_hangcheck_elapsed+0x3ed/0x580 [i915]
[  151.840640]  ? __drm_printfn_info+0x20/0x20
[  151.840646]  process_one_work+0x147/0x3c0
[  151.840650]  worker_thread+0x4a/0x440
[  151.840653]  kthread+0xf8/0x130
[  151.840657]  ? rescuer_thread+0x360/0x360
[  151.840660]  ? kthread_associate_blkcg+0x90/0x90
[  151.840664]  ret_from_fork+0x35/0x40
[  151.840667] Code: 00 00 00 be 90 40 00 00 89 c2 48 89 df e8 38 4e fa eb e9 c4 fc ff ff 0f 1f 00 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 53 48 89 f3 <48> 8b 46 08 48 89 fd 48 89 f7 48 8b 40 08 e8 ac 4c fa eb 48 8b 
[  151.840733] RIP: print_request+0x11/0xb0 [i915] RSP: ffffbd3c40e73c30
[  151.840736] CR2: 0000000100000004
[  151.840739] ---[ end trace d3cb685d687811aa ]---

Comment 16 Chris Wilson 2018-07-14 16:00:14 UTC

commit 77dfedb5be03779f9a5d83e323a1b36e32090105
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 11 13:11:45 2018 +0100

    drm/i915/execlists: Use rmb() to order CSB reads
    
    We assume that the CSB is written using the normal ringbuffer
    coherency protocols, as outlined in kernel/events/ring_buffer.c:
    
        *   (HW)                              (DRIVER)
        *
        *   if (LOAD ->data_tail) {            LOAD ->data_head
        *                      (A)             smp_rmb()       (C)
        *      STORE $data                     LOAD $data
        *      smp_wmb()       (B)             smp_mb()        (D)
        *      STORE ->data_head               STORE ->data_tail
        *   }
    
    So we assume that the HW fulfils its ordering requirements (B), and so
    we should use a complimentary rmb (C) to ensure that our read of its
    WRITE pointer is completed before we start accessing the data.
    
    The final mb (D) is implied by the uncached mmio we perform to inform
    the HW of our READ pointer.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
    References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
    References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
    Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
    References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
    Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Michał Winiarski <michal.winiarski@intel.com>
    Cc: Rafael Antognolli <rafael.antognolli@intel.com>
    Cc: Michel Thierry <michel.thierry@intel.com>
    Cc: Timo Aaltonen <tjaalton@ubuntu.com>
    Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
    Acked-by: Michel Thierry <michel.thierry@intel.com>
    Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180511121147.31915-1-chris@chris-wilson.co.uk
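
The (DRIVER) column of the diagram in this commit message can be sketched in userspace C11. This is an illustration under stated assumptions, not the i915 code: the struct csb_ring and the functions hw_publish and driver_drain are hypothetical names, atomic_thread_fence(memory_order_acquire) stands in for the kernel's rmb(), and a software producer stands in for the hardware. Steps (A) and (D) are left out, so the caller must not publish more than RING_ENTRIES unread entries.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* hypothetical size, chosen for the example */

struct csb_ring {
    uint32_t data[RING_ENTRIES];
    _Atomic uint32_t head; /* write pointer, advanced by the producer ("HW") */
    uint32_t tail;         /* read pointer, owned by the consumer ("DRIVER") */
};

/* Producer: STORE $data, write barrier (B), STORE ->data_head. */
void hw_publish(struct csb_ring *r, uint32_t event)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);

    r->data[head % RING_ENTRIES] = event;
    atomic_thread_fence(memory_order_release); /* (B) */
    atomic_store_explicit(&r->head, head + 1, memory_order_relaxed);
}

/* Consumer: LOAD ->data_head, read barrier (C), LOAD $data. */
size_t driver_drain(struct csb_ring *r, uint32_t *out, size_t max)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t n = 0;

    /*
     * (C): without this fence the data loads below could be satisfied
     * before the head load, returning entries that are not yet written.
     */
    atomic_thread_fence(memory_order_acquire);

    while (r->tail != head && n < max) {
        out[n++] = r->data[r->tail % RING_ENTRIES];
        r->tail++;
    }
    return n;
}
```

The point of the sketch is the placement of the fence: it sits between the read of the write pointer and the first read of the entries, which is the ordering the commit message describes as (C).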

Comment 17 Lakshmi 2018-08-22 13:30:25 UTC

(In reply to Chris Wilson from comment #16)
> commit 77dfedb5be03779f9a5d83e323a1b36e32090105
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri May 11 13:11:45 2018 +0100
> 
>     drm/i915/execlists: Use rmb() to order CSB reads
>     
>     We assume that the CSB is written using the normal ringbuffer
>     coherency protocols, as outlined in kernel/events/ring_buffer.c:
>     
>         *   (HW)                              (DRIVER)
>         *
>         *   if (LOAD ->data_tail) {            LOAD ->data_head
>         *                      (A)             smp_rmb()       (C)
>         *      STORE $data                     LOAD $data
>         *      smp_wmb()       (B)             smp_mb()        (D)
>         *      STORE ->data_head               STORE ->data_tail
>         *   }
>     
>     So we assume that the HW fulfils its ordering requirements (B), and so
>     we should use a complimentary rmb (C) to ensure that our read of its
>     WRITE pointer is completed before we start accessing the data.
>     
>     The final mb (D) is implied by the uncached mmio we perform to inform
>     the HW of our READ pointer.
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
>     Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
> from the HWSP")
>     References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context
> status buffer")
>     Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>     Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>     Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Cc: Michał Winiarski <michal.winiarski@intel.com>
>     Cc: Rafael Antognolli <rafael.antognolli@intel.com>
>     Cc: Michel Thierry <michel.thierry@intel.com>
>     Cc: Timo Aaltonen <tjaalton@ubuntu.com>
>     Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
>     Acked-by: Michel Thierry <michel.thierry@intel.com>
>     Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>     Link:
> https://patchwork.freedesktop.org/patch/msgid/20180511121147.31915-1-
> chris@chris-wilson.co.uk

Closing this issue, as it has already been resolved and verified.
