Summary: | [CI] igt@kms_atomic_transition@plane-all-modeset-[transition|transition-fencing] - Softdog incomplete | ||
---|---|---|---|
Product: | DRI | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | DRM/Intel | Assignee: | Imre Deak <imre.deak> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | major | ||
Priority: | high | CC: | bugs, imre.deak, intel-gfx-bugs |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | GLK | i915 features: | display/Other |
Description
Marta Löfstedt
2017-10-18 09:17:54 UTC
Looks like the test is running properly, doing modesets while the softwachdog (owatch?) triggers. Could be because it's a long running test without any output. For owatch's sake then we could emit more progress info from these tests. There is also this: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3260/shard-glkb5/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-shrfb-draw-mmap-wc.html However, it looks like something is wrong because there is no pstore only an OOPS indicating that the softdog was triggered. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3265/shard-glkb2/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-mmap-cpu.html Note this also has the pattern with only Oops log usually there are 2 logs. (In reply to Jani Saarinen from comment #3) > https://patchwork.freedesktop.org/series/32206/ The above patchset won't help, owatch is only pinged by piglit's output and piglit doesn't forward the test's output itself. The next step would be to check why one modeset takes so long (~1s) and if that can't be fixed consider - increasing owatch's timeout - break the subtests to smaller parts - reduce the number of iterations in the subtests I did some quick comparison with my glk and skl. Something is "wrong" GLK since the execution time for test is wayyyy more than on SKL: GLK - Subtest plane-all-modeset-transition: SUCCESS (3346,877s) SKL - Subtest plane-all-modeset-transition: SUCCESS (12,187s) Test seems to be pass on this case, but on CI timer kicks in. (In reply to Jari Tahvanainen from comment #7) > I did some quick comparison with my glk and skl. Something is "wrong" GLK > since the execution time for test is wayyyy more than on SKL: > GLK - Subtest plane-all-modeset-transition: SUCCESS (3346,877s) > SKL - Subtest plane-all-modeset-transition: SUCCESS (12,187s) > Test seems to be pass on this case, but on CI timer kicks in. Jari, that is the defining criteria for the softdog to trigger. If Imre want's longer softdog timeout he could just ask Tomi. (In reply to Marta Löfstedt from comment #8) > (In reply to Jari Tahvanainen from comment #7) > > I did some quick comparison with my glk and skl. Something is "wrong" GLK > > since the execution time for test is wayyyy more than on SKL: > > GLK - Subtest plane-all-modeset-transition: SUCCESS (3346,877s) > > SKL - Subtest plane-all-modeset-transition: SUCCESS (12,187s) > > Test seems to be pass on this case, but on CI timer kicks in. > > Jari, that is the defining criteria for the softdog to trigger. > > If Imre want's longer softdog timeout he could just ask Tomi. No, I think there is actually something wrong here, ~2 seconds for two modesets+CRC grabbing is too long. Adding also Maarten, but I can take a look at this on my GLK tomorrow. CI_DRM_3284 GLK-shards igt@gem_exec_whisper@normal incomplete: NOTE FROM CI_DRM_3284 the CI system owatch has been updated. This is what the new Softdog looks like: <12>[ 3757.298960] owatch: TIMEOUT! <12>[ 3757.299104] owatch: timeout for /dev/watchdog0 set to 10 (requested 10) <6>[ 3757.299319] sysrq: SysRq : Trigger a crash <1>[ 3757.299428] BUG: unable to handle kernel NULL pointer dereference at (null) <1>[ 3757.299468] IP: sysrq_handle_crash+0x45/0x80 <6>[ 3757.299484] PGD 16ebfb067 P4D 16ebfb067 PUD 164c6a067 PMD 0 <4>[ 3757.299514] Oops: 0002 [#1] PREEMPT SMP <4>[ 3757.299530] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel <4>[ 3757.299675] CPU: 0 PID: 10412 Comm: owatch Tainted: G U W 4.14.0-rc6-CI-CI_DRM_3284+ #1 <4>[ 3757.299702] Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 3757.299733] task: ffff88017a9e4ec0 task.stack: ffffc900007a8000 <4>[ 3757.299755] RIP: 0010:sysrq_handle_crash+0x45/0x80 <4>[ 3757.299771] RSP: 0018:ffffc900007abde0 EFLAGS: 00010282 <4>[ 3757.299791] RAX: ffff88017a9e4ec0 RBX: 0000000000000063 RCX: b946b02c2f5ed020 <4>[ 3757.299814] RDX: 0000000000000001 RSI: 00000000cd19e7cc RDI: ffff88017a9e4ec0 <4>[ 3757.299836] RBP: ffffc900007abde0 R08: ffff88017a9e5780 R09: 0000000000000000 <4>[ 3757.299858] R10: 000000009d08356a R11: 000000006244e854 R12: 0000000000000000 <4>[ 3757.299880] R13: 0000000000000004 R14: ffffffff81ebeb60 R15: ffffc900007abf20 <4>[ 3757.299903] FS: 00007fea1f64c700(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000 <4>[ 3757.299928] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 3757.299947] CR2: 0000000000000000 CR3: 000000017abbb000 CR4: 00000000003406f0 <4>[ 3757.299968] Call Trace: <4>[ 3757.299983] __handle_sysrq+0x132/0x210 <4>[ 3757.300001] write_sysrq_trigger+0x51/0x60 <4>[ 3757.300019] proc_reg_write+0x42/0x70 <4>[ 3757.300036] __vfs_write+0x28/0x130 <4>[ 3757.300052] ? rcu_read_lock_sched_held+0x7a/0x90 <4>[ 3757.300070] ? rcu_sync_lockdep_assert+0x2f/0x60 <4>[ 3757.300088] ? __sb_start_write+0x10c/0x200 <4>[ 3757.300106] vfs_write+0xc8/0x1c0 <4>[ 3757.300121] SyS_write+0x49/0xb0 <4>[ 3757.300140] entry_SYSCALL_64_fastpath+0x1c/0xb1 <4>[ 3757.300157] RIP: 0033:0x7fea1f1788f0 <4>[ 3757.300170] RSP: 002b:00007ffd0f0abfa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 <4>[ 3757.300197] RAX: ffffffffffffffda RBX: ffffffff81491ef3 RCX: 00007fea1f1788f0 <4>[ 3757.300218] RDX: 0000000000000001 RSI: 0000000000f0f250 RDI: 0000000000000003 <4>[ 3757.300240] RBP: ffffc900007abf88 R08: 00007fea1f64c700 R09: 0000000000000410 <4>[ 3757.300262] R10: 00007fea1f441b58 R11: 0000000000000246 R12: 0000000000f0f020 <4>[ 3757.300284] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001 <4>[ 3757.300310] ? __this_cpu_preempt_check+0x13/0x20 <4>[ 3757.300329] Code: 34 e8 90 45 b6 ff 48 c7 c2 2b 76 59 81 be 01 00 00 00 48 c7 c7 80 d5 e4 81 e8 b8 60 b4 ff c7 05 be 47 b7 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 1c 74 b6 ff 84 c0 75 c3 48 c7 <1>[ 3757.300518] RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc900007abde0 <4>[ 3757.300537] CR2: 0000000000000000 <4>[ 3757.300553] ---[ end trace 05a1442557e12508 ]--- <0>[ 3757.404711] Kernel panic - not syncing: Fatal exception <0>[ 3757.404784] Kernel Offset: disabled https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3284/shard-glkb4/igt@gem_exec_whisper@normal.html The reason for modesets taking long: [ 34.061132] [drm:intel_pps_dump_state [i915]] cur t1_t3 0 t8 0 t9 0 t10 500 t11_t12 6000 [ 34.061181] [drm:intel_pps_dump_state [i915]] vbt t1_t3 2000 t8 10 t9 2000 t10 500 t11_t12 6000 [ 34.061220] [drm:intel_dp_init_panel_power_sequencer [i915]] panel power up delay 200, power down delay 50, power cycle delay 600 [ 34.061259] [drm:intel_dp_init_panel_power_sequencer [i915]] backlight on delay 1, off delay 200 So each modeset incurs all the above delays ~1050ms. I think the test should set the number of iterations taking into account the above. (In reply to Jani Saarinen from comment #12) > Reference: https://patchwork.freedesktop.org/series/32905/ Note, the patch is only for the kms_atomic_transition issue. So, if/when Imres fix is merged and fixes the issue. I will remove the kms_atomic_transition and rename this bug. This is my bad I should have renamed the bug as soon as I started filing softdogs on other tests on this bug. (In reply to Marta Löfstedt from comment #13) > (In reply to Jani Saarinen from comment #12) > > Reference: https://patchwork.freedesktop.org/series/32905/ > > Note, the patch is only for the kms_atomic_transition issue. So, if/when > Imres fix is merged and fixes the issue. I will remove the > kms_atomic_transition and rename this bug. This is my bad I should have > renamed the bug as soon as I started filing softdogs on other tests on this > bug. I think for tracking purposes it's better to open a new bug instead of changing an existing one (think of coming back to this bug later from the commit's Bugzilla: tag). The fix covers the issues reported originally here, that is the kms_atomic_transition@plane-all-modeset-* on GLK. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3318/shard-glkb4/igt@kms_flip@flip-vs-fences.html https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3318/shard-glkb4/dmesg-1510055141_Oops_1.log <7>[ 586.496217] [drm:wait_panel_power_cycle [i915]] Wait for panel power cycle <12>[ 586.684918] owatch: TIMEOUT! Should be fixed by the following IGT patch Author: Imre Deak <imre.deak@intel.com> Date: Wed Nov 8 21:33:22 2017 +0200 kms_atomic_transition: Split out modeset tests on internal panels (In reply to Imre Deak from comment #16) > Should be fixed by the following IGT patch > > Author: Imre Deak <imre.deak@intel.com> > Date: Wed Nov 8 21:33:22 2017 +0200 > > kms_atomic_transition: Split out modeset tests on internal panels Thanks Imre, I will start monitoring for this once we get Ci runs with the patch integrated. Imres patch was integrated in: CI_DRM_3326 Now, Imres patch only fixes kms_atomic_transition@plane-all-modeset-[transition|transition-fencing] and I believe that I have filed all GLK Softdogs on this bug. However, since the patch came in there has been no Softdogs on GLK, so I will take a chance and close and archive the whole bug instead of spending time on time consuming bug splitting. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.