Bug 103064 - [IGT guc] igt@drv_missed_irq fails with a GPU hang...
Status: CLOSED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-02 15:52 UTC by Hector Velazquez
Modified: 2018-02-13 16:27 UTC

See Also:
i915 platform: CFL
i915 features: firmware/guc


Attachments
output (4.00 KB, text/plain)
2017-10-02 15:52 UTC, Hector Velazquez
Kernel (256.61 KB, text/plain)
2017-10-02 15:54 UTC, Hector Velazquez
dmesg (12.55 KB, text/plain)
2017-10-02 15:54 UTC, Hector Velazquez
dmesg (skip test) (244.46 KB, text/plain)
2017-12-14 17:07 UTC, Hector Velazquez

Description Hector Velazquez 2017-10-02 15:52:46 UTC
Created attachment 134621
output
Comment 1 Hector Velazquez 2017-10-02 15:52:57 UTC
This test is failing on CFL-S-1 QA

Tests List:

igt@drv_missed_irq


====================================================
Output Sample
====================================================
. . .
(drv_missed_irq:2623) DEBUG: Executing on ring blt [3]
(drv_missed_irq:2623) DEBUG: Executing on ring vebox [4]
(drv_missed_irq:2625) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_missed_irq:2625) igt-debugfs-DEBUG: i915_error_state:
GPU HANG: ecode 9:1:0xfffffffe, reason: Hang on bcs0, vcs0, vecs0, action: reset
Kernel: 4.14.0-rc3-drm-tip-ww40-commit-2f14e31+
Time: 1506929029 s 909737 us
Boottime: 518 s 752115 us
Uptime: 505 s 308188 us
Reset count: 0
Suspend count: 0
Platform: COFFEELAKE
PCI ID: 0x3e92
PCI Revision: 0x00
PCI Subsystem: 8086:2212
IOMMU enabled?: 0
DMC loaded: yes
DMC fw version: 1.1
GT awake: yes
RPM wakelock: yes
PM suspended: no
EIR: 0x00000000
IER: 0x08000000
GTIER[0]: 0x01010101
GTIER[1]: 0x01010101
GTIER[2]: 0x00000070
GTIER[3]: 0x00000101
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00010001
DERRMR: 0x2077efef
CCID: 0x00000000
Missed interrupts: 0x00000017
  fence[0] = 00000000
  fence[1] = 00000000
  fence[2] = 00000000
  fence[3] = 00000000
  fence[4] = 00000000
  fence[5] = 00000000
  fence[6] = 00000000
  fence[7] = 00000000
  fence[8] = 00000000
  fence[9] = 00000000
  fence[10] = 00000000
  fence[11] = 00000000
  fence[12] = 00000000
  fence[13] = 00000000
  fence[14] = 00000000
  fence[15] = 00000000
  fence[16] = 00000000
  fence[17] = 00000000
  fence[18] = 00000000
  fence[19] = 00000000
  fence[20] = 00000000
  fence[21] = 00000000
  fence[22] = 00000000
  fence[23] = 00000000
  fence[24] = 00000000
  fence[25] = 00000000
  fence[26] = 00000000
  fence[27] = 00000000
  fence[28] = 00000000
  fence[29] = 00000000
  fence[30] = 00000000
  fence[31] = 00000000
ERROR: 0x00000000
FAULT_TLB_DATA: 0x00000000 0x67029d0c
DONE_REG: 0xffffffff
render command stream:
  START: 0x0014f000
  HEAD:  0x00001f18 [0x00000000]
  TAIL:  0x00001f18 [0x00000000, 0x00000000]
  CTL:   0x00003000
  MODE:  0x00000200
  HWS:   0xfede6000
  ACTHD: 0x00000000 00001f18
  IPEIR: 0x00000000
  IPEHR: 0x00000000
  INSTDONE: 0xfffffffe
  SC_INSTDONE: 0xffffffff
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  SAMPLER_INSTDONE[0][1]: 0xffffffff
  SAMPLER_INSTDONE[0][2]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xffffffff
  ROW_INSTDONE[0][1]: 0xffffffff
  ROW_INSTDONE[0][2]: 0xffffffff
  BBADDR: 0x00000000_00000004
  BB_STATE: 0x00000020
  INSTPS: 0x00000001
  INSTPM: 0x00000000
  FADDR: 0x00000000 00000000
  RC PSMI: 0x00000010
  FAUL
. . .
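
As a reading aid for the "Missed interrupts: 0x00000017" line in the sample above, here is a minimal sketch that lists which bits are set in such a mask. The bit-to-engine table is an assumption (one bit per engine in the order rcs0, bcs0, vcs0, vcs1, vecs0) and is not confirmed anywhere in this report; the helper name is hypothetical.

```python
# ASSUMPTION: one bit per engine, in this order. Not confirmed by this report.
ASSUMED_ENGINE_BITS = {0: "rcs0", 1: "bcs0", 2: "vcs0", 3: "vcs1", 4: "vecs0"}


def decode_engine_mask(mask):
    """Return the assumed engine name for every bit set in mask."""
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            # Fall back to a generic label for bits outside the assumed table.
            names.append(ASSUMED_ENGINE_BITS.get(bit, f"bit{bit}"))
        bit += 1
    return names


print(decode_engine_mask(0x17))  # ['rcs0', 'bcs0', 'vcs0', 'vecs0']
```

Under that assumed mapping, the mask covers the three engines named in the "Hang on bcs0, vcs0, vecs0" line, plus rcs0.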


This is my configuration:

======================================
        Graphic stack
======================================
Component: drm
    tag: libdrm-2.4.81-57-g1dd84e0
    commit: 1dd84e01a972b1759839a7326009be24ab3e6de2

Component: cairo
    tag: 1.15.6-42-gdccbed7
    commit: dccbed7d78d32bd3b912e8810379451dd94e6a1f

Component: intel-gpu-tools
    tag: intel-gpu-tools-1.19-357-g1e99f8b
    commit: 1e99f8b8d2563d7f5c4e82932bab15abc5eacaef

Component: piglit
    tag: piglit-v1
    commit: 5aa6eea37f44f818632a3dad4c1a7478085bd56d

	
======================================
             Software
======================================
kernel version              : 4.14.0-rc3-drm-tip-ww40-commit-2f14e31+
hostname                    : CFL-S-1
architecture                : x86_64
os version                  : Ubuntu 16.10
os codename                 : yakkety
kernel driver               : i915
bios revision               : 104.3
bios release date           : 09/14/2017
ksc                         : 1.5
hardware acceleration       : disabled
swap partition              : enabled on (/dev/nvme0n1p3 /dev/sda3)

======================================
        Graphic drivers
======================================
grep: /opt/X11R7/var/log/Xorg.0.log: No such file or directory
libdrm                      : 2.4.83
cairo                       : 1.15.9
intel-gpu-tools (tag)       : intel-gpu-tools-1.19-357-g1e99f8b
intel-gpu-tools (commit)    : 1e99f8b

======================================
             Hardware
======================================
motherboard model          : CoffeeLakeClientPlatform
motherboard id             : CoffeeLakeSUDIMMRVP
form factor                : Desktop
manufacturer               : IntelCorporation
cpu family                 : Other
cpu family id              : 6
cpu information            : Genuine Intel(R) CPU 0000 @ 3.60GHz
gpu card                   : Intel Corporation Device 3e92 (prog-if 00 [VGA controller])
memory ram                 : 15.59 GB
max memory ram             : 32 GB
cpu thread                 : 12
cpu core                   : 6
cpu model                  : 158
cpu stepping               : 10
socket                     : Other
hard drive                 : 111GiB (120GB)
current cd clock frequency : 337500 kHz
maximum cd clock frequency : 675000 kHz
displays connected         : eDP-1 DP-1

======================================
             Firmware
======================================
dmc fw loaded             : yes
dmc version               : 1.1
guc fw loaded             : SUCCESS
guc version wanted        : 9.14
guc version found         : 9.14
huc fw loaded             : yes

======================================
             kernel parameters
======================================
quiet drm.debug=0x1e i915.enable_guc_loading=2 i915.enable_guc_submission=2 i915.alpha_support=1 auto panic=1 nmi_watchdog=panic intel_iommu=igfx_off resume=/dev/sda3 fastboot
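
To make the GuC-related settings in the kernel parameters above easier to spot, the following is a minimal sketch that extracts the i915 module options from a command line string. The helper name `module_params` is hypothetical; the command line is copied from this comment.

```python
def module_params(cmdline, module):
    """Collect 'module.key=value' tokens from a kernel command line."""
    params = {}
    prefix = module + "."
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            key, value = token[len(prefix):].split("=", 1)
            params[key] = value
    return params


# Kernel command line as reported above.
CMDLINE = (
    "quiet drm.debug=0x1e i915.enable_guc_loading=2 "
    "i915.enable_guc_submission=2 i915.alpha_support=1 auto panic=1 "
    "nmi_watchdog=panic intel_iommu=igfx_off resume=/dev/sda3 fastboot"
)

params = module_params(CMDLINE, "i915")
for key, value in sorted(params.items()):
    print(f"{key} = {value}")
```

For this configuration it lists alpha_support, enable_guc_loading and enable_guc_submission, i.e. both GuC options were set on the command line.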
Comment 2 Hector Velazquez 2017-10-02 15:54:23 UTC
Created attachment 134622
Kernel
Comment 3 Hector Velazquez 2017-10-02 15:54:46 UTC
Created attachment 134623
dmesg
Comment 4 Elizabeth 2017-10-02 19:38:57 UTC
From dmesg:
[  518.757908] ------------[ cut here ]------------
[  518.757910] kernel BUG at drivers/gpu/drm/i915/i915_gem_gtt.c:3380!
[  518.757921] invalid opcode: 0000 [#1] PREEMPT SMP
[  518.757925] Modules linked in: snd_hda_codec_hdmi asix usbnet mii snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core i915 prime_numbers i2c_hid
[  518.757953] CPU: 1 PID: 62 Comm: kworker/1:1 Tainted: G     U          4.14.0-rc3-drm-tip-ww40-commit-2f14e31+ #1
[  518.757959] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X104.A03.1709140535 09/14/2017
[  518.757978] Workqueue: events_long i915_hangcheck_elapsed [i915]
[  518.757983] task: ffff880459238040 task.stack: ffffc900002d4000
[  518.758020] RIP: 0010:i915_ggtt_enable_guc+0x1e/0x20 [i915]
[  518.758024] RSP: 0018:ffffc900002d7a70 EFLAGS: 00010202
[  518.758029] RAX: 0000000080000000 RBX: ffff88044da04d28 RCX: 0000000000000001
[  518.758033] RDX: 0000000080000001 RSI: ffff880459238978 RDI: ffff88044da00000
[  518.758038] RBP: ffffc900002d7a70 R08: 0000000000000000 R09: 0000000000000001
[  518.758043] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88044da00000
[  518.758063] R13: ffff88044da00ec0 R14: ffffc900002d7b38 R15: ffff88044d8312b8
[  518.758067] FS:  0000000000000000(0000) GS:ffff88045d240000(0000) knlGS:0000000000000000
[  518.758072] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  518.758076] CR2: 00007fd7c291d9d0 CR3: 0000000431126001 CR4: 00000000003606e0
[  518.758081] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  518.758085] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  518.758105] Call Trace:
[  518.758120]  intel_uc_init_hw+0x50/0x340 [i915]
[  518.758135]  i915_gem_init_hw+0x10a/0x2a0 [i915]
[  518.758146]  i915_reset+0x1c5/0x220 [i915]
[  518.758157]  i915_reset_device+0x1e4/0x240 [i915]
[  518.758169]  ? gen8_gt_irq_ack+0x160/0x160 [i915]
[  518.758174]  ? work_on_cpu_safe+0x60/0x60
[  518.758185]  i915_handle_error+0x266/0x400 [i915]
[  518.758191]  ? vsnprintf+0x23b/0x470
[  518.758194]  ? scnprintf+0x3a/0x70
[  518.758209]  hangcheck_declare_hang+0xcd/0xf0 [i915]
[  518.758224]  ? intel_engine_get_active_head+0xaf/0xd0 [i915]
[  518.758238]  i915_hangcheck_elapsed+0x26b/0x2e0 [i915]
[  518.758243]  process_one_work+0x20d/0x6a0
[  518.758247]  worker_thread+0x49/0x3b0
[  518.758251]  kthread+0x14d/0x180
[  518.758253]  ? process_one_work+0x6a0/0x6a0
[  518.758257]  ? kthread_create_on_node+0x40/0x40
[  518.758261]  ret_from_fork+0x27/0x40
[  518.758265] Code: c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 48 81 bf 70 67 00 00 60 f1 06 a0 55 48 89 e5 75 0d 48 c7 87 70 67 00 00 80 f1 06 a0 5d c3 <0f> 0b 48 81 bf 70 67 00 00 80 f1 06 a0 55 48 89 e5 75 0d 48 c7 
[  518.758312] RIP: i915_ggtt_enable_guc+0x1e/0x20 [i915] RSP: ffffc900002d7a70
[  518.758323] ---[ end trace 90615b73efb366df ]---
Comment 5 Hector Velazquez 2017-12-14 17:06:41 UTC
This test is being skipped on CFL QA

igt@drv_missed_irq
 
IGT-Version: 1.20-g103af72 (x86_64) (Linux: 4.15.0-rc3-drm-intel-qa-ww50-commit-91d06d0+ x86_64)

(drv_missed_irq:32492) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(drv_missed_irq:32492) drmtest-DEBUG: Test requirement passed: !(fd<0)
(drv_missed_irq:32492) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_missed_irq:32492) drmtest-DEBUG: Test requirement passed: is_i915_device(fd) && has_known_intel_chipset(fd)
(drv_missed_irq:32492) ioctl-wrappers-DEBUG: Test requirement passed: err == 0
Test requirement not met in function __real_main87, file drv_missed_irq.c:100:

Test requirement: !(gem_has_guc_submission(device))   <<<====

Last errno: 9, Bad file descriptor
(drv_missed_irq:32492) igt-core-DEBUG: Exiting with status code 77
SKIP (0.029s)

--------------------------------------------------
#cat /sys/kernel/debug/dri/0/i915_dmc_info
--------------------------------------------------
fw loaded: yes
path: i915/kbl_dmc_ver1_04.bin
version: 1.4
program base: 0x09004040
ssp base: 0x00002fc0
htp: 0x00b40068
--------------------------------------------------
#cat /sys/kernel/debug/dri/0/i915_guc_load_status
--------------------------------------------------
GuC firmware: i915/kbl_guc_ver9_39.bin
	status: fetch SUCCESS, load SUCCESS
	version: wanted 9.39, found 9.39
	header: offset 0, size 128
	uCode: offset 128, size 147392
	RSA: offset 147520, size 256

GuC status 0x800330ed:
	Bootrom status = 0x76
	uKernel status = 0x30
	MIA Core status = 0x3

Scratch registers:
	 0: 	0xf0000000
	 1: 	0x1
	 2: 	0x0
	 3: 	0x5f5e100
	 4: 	0x600
	 5: 	0xcefd3
	 6: 	0x0
	 7: 	0x8
	 8: 	0x3
	 9: 	0x70a40
	10: 	0x0
	11: 	0x0
	12: 	0x0
	13: 	0x0
	14: 	0x0
	15: 	0x0
--------------------------------------------------
#cat /sys/kernel/debug/dri/0/i915_huc_load_status
--------------------------------------------------
HuC firmware: i915/kbl_huc_ver02_00_1810.bin
	status: fetch SUCCESS, load SUCCESS
	version: wanted 2.0, found 2.0
	header: offset 0, size 128
	uCode: offset 128, size 218304
	RSA: offset 218432, size 256

HuC status 0x00006080:
--------------------------------------------------
Comment 6 Hector Velazquez 2017-12-14 17:07:38 UTC
Created attachment 136177
dmesg (skip test)
Comment 7 Jari Tahvanainen 2018-02-06 13:03:15 UTC
Skip is the expected behavior with GuC loaded, following this change in intel-gpu-tools:
commit 3a52d8c244053cac74839e1cdbea58ebaa5fe470
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Tue Oct 24 16:24:22 2017 +0100
Commit:     Chris Wilson <chris@chris-wilson.co.uk>
CommitDate: Tue Oct 24 19:53:35 2017 +0100

    igt/drv_misssed_irq: Skip on guc

    Since the driver's guc submission method requires the breadcrumbs irq
    for feeding requests to the guc, we cannot simply simulate a missing irq
    by disabling the interrupts.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Michał Winiarski <michal.winiarski@intel.com>
    Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

And GuC is in use here:
#cat /sys/kernel/debug/dri/0/i915_guc_load_status
--------------------------------------------------
GuC firmware: i915/kbl_guc_ver9_39.bin
	status: fetch SUCCESS, load SUCCESS

If you disagree, please reopen this bug.
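
For anyone triaging similar skips, here is a minimal sketch of the check done by eye above: it reads text in the format of i915_guc_load_status and reports whether both fetch and load say SUCCESS. The helper name `guc_firmware_loaded` is hypothetical, and this only looks at the firmware load status; it does not reproduce the gem_has_guc_submission() requirement that the test itself evaluates.

```python
import re


def guc_firmware_loaded(status_text):
    """True if the status text reports 'fetch SUCCESS, load SUCCESS'."""
    match = re.search(r"status:\s*fetch\s+(\w+),\s*load\s+(\w+)", status_text)
    return (
        bool(match)
        and match.group(1) == "SUCCESS"
        and match.group(2) == "SUCCESS"
    )


# Excerpt of the debugfs output quoted in this comment.
SAMPLE = (
    "GuC firmware: i915/kbl_guc_ver9_39.bin\n"
    "\tstatus: fetch SUCCESS, load SUCCESS\n"
)

print(guc_firmware_loaded(SAMPLE))  # True
```

On a live system the text would come from /sys/kernel/debug/dri/0/i915_guc_load_status, which typically needs root to read.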
Comment 8 Octavio 2018-02-09 23:33:07 UTC
This test is skipped on CFL

IGT-Version: 1.21-g94bd67c (x86_64) (Linux: 4.15.0-drm-intel-qa-ww6-commit-6c10ba2+ x86_64)

Test requirement not met in function __real_main87, file drv_missed_irq.c:100:
Test requirement: !(gem_has_guc_submission(device))
Last errno: 9, Bad file descriptor
SKIP (0.057s)

