Bug 93121

Summary: [BYT BAT IGT] GT register access while GT waking disabled
Product: DRI
Reporter: Daniel Vetter <daniel>
Component: DRM/Intel
Assignee: Kimmo Nikkanen <knikkane>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: highest
CC: intel-gfx-bugs, knikkane, tomi.p.sarvela
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform: BYT
i915 features: power/runtime PM, power/suspend-resume
Attachments:
Trace for rogue register access from hangcheck (flags: none)
4.4.0_pm-rpm_byt-m_kern.log (flags: none)

Description Daniel Vetter 2015-11-26 13:31:20 UTC
[  244.059028] [drm:vlv_check_no_gt_access [i915]] *ERROR* GT register access while GT waking disabled

This is a fairly random failure that mostly just affects the pm_rpm testcases on byt. But it is intermittent, so it causes tons of noise in CI. Hence blocker.
Comment 1 Daniel Vetter 2015-11-26 13:32:40 UTC
Interim solution would be to tune down the error message, but we really should fix the underlying bug. If the bugfix doesn't materialize within 1 week, we need to move ahead with the interim patch though, just to unblock CI infrastructure progress.
Comment 2 Daniel Vetter 2015-12-07 10:32:30 UTC
Anyone working on this? Kimmo?
Comment 3 Joonas Lahtinen 2015-12-07 14:49:26 UTC
I am able to reproduce this (although it only seems to happen once after boot for me). Will inspect it further.
Comment 4 Joonas Lahtinen 2015-12-08 18:56:52 UTC
I added tracking for the ALLOWWAKE bit on VLV and WARN_ONs to vlv_write/write (vlv_write being forked from gen6_write) that fire if the bit is disabled and the register is in the RENDER or MEDIA register ranges. This revealed that at least the bug I am able to reproduce is caused by hangcheck accessing the render ring when waking from runtime suspend. Log attached.
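For readers following along, the debugging approach described in this comment can be sketched in plain userspace C. This is only an illustration, not the patch that was posted: every `sketch_` name is hypothetical, the two register ranges are placeholder values rather than the driver's actual RENDER/MEDIA ranges, and a `fprintf` stands in for the kernel's WARN_ON.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Half-open register range [start, end). */
struct sketch_reg_range {
	uint32_t start;
	uint32_t end;
};

/*
 * ASSUMPTION: placeholder ranges for illustration only. The real
 * RENDER and MEDIA ranges are defined in the i915 driver.
 */
static const struct sketch_reg_range sketch_gt_ranges[] = {
	{ 0x02000, 0x04000 }, /* stands in for a RENDER range */
	{ 0x12000, 0x14000 }, /* stands in for a MEDIA range */
};

/* Shadow copy of the ALLOWWAKE state, updated whenever the bit changes. */
static bool sketch_allowwake;
/* Number of flagged accesses so far. */
static unsigned int sketch_rogue_count;

static void sketch_set_allowwake(bool enabled)
{
	sketch_allowwake = enabled;
}

static bool sketch_is_gt_reg(uint32_t reg)
{
	size_t n = sizeof(sketch_gt_ranges) / sizeof(sketch_gt_ranges[0]);

	for (size_t i = 0; i < n; i++) {
		if (reg >= sketch_gt_ranges[i].start &&
		    reg < sketch_gt_ranges[i].end)
			return true;
	}
	return false;
}

/*
 * Checked write: flags the access if it targets a GT register while
 * ALLOWWAKE is disabled. Returns true when the access was flagged.
 */
static bool sketch_checked_write(uint32_t reg, uint32_t val)
{
	bool rogue = !sketch_allowwake && sketch_is_gt_reg(reg);

	if (rogue) {
		sketch_rogue_count++;
		fprintf(stderr,
			"WARN: write of 0x%08x to GT reg 0x%05x while ALLOWWAKE is disabled\n",
			(unsigned int)val, (unsigned int)reg);
	}
	/* A real driver would perform the MMIO write here. */
	return rogue;
}
```

Calling `sketch_set_allowwake(false)` and then `sketch_checked_write()` on a register inside one of the ranges is flagged, while the same write to a register outside the ranges, or after `sketch_set_allowwake(true)`, passes silently. In the kernel, the warning's backtrace is what identifies the caller, which is how hangcheck showed up in the attached trace.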
Comment 5 Joonas Lahtinen 2015-12-08 18:57:24 UTC
Created attachment 120415 [details]
Trace for rogue register access from hangcheck
Comment 6 Joonas Lahtinen 2015-12-10 08:22:53 UTC
Fix written and posted to mailing list: http://patchwork.freedesktop.org/patch/67517/
Comment 7 Daniel Vetter 2015-12-15 08:23:42 UTC
Since CI build 898 these are back again, reopening :( The same set of testcases seems affected, again on the same byt machine.
Comment 8 Daniel Vetter 2015-12-15 08:36:27 UTC
Tomi, can you please generate the long-term overview for this machine/test so that we can figure out when exactly this started happening again?
Comment 9 Tomi Sarvela 2015-12-15 09:01:58 UTC
Long-term results (last 100 builds) are now created automatically and are reachable at hostname.html. Links to the main page will be added.
Comment 10 Joonas Lahtinen 2015-12-16 12:34:23 UTC
Cannot reproduce just by running the single test by itself (thousands of iterations on two different machines). Attempting to run the whole BAT set in order to bring up the bug.
Comment 11 Joonas Lahtinen 2016-01-07 14:47:52 UTC
Still unable to reproduce. Moving to QA.
Comment 12 Joonas Lahtinen 2016-01-07 14:50:44 UTC
Did multiple test suite runs in succession and power cycled the system occasionally in between suite runs. All this was on the exact same machine on which the CI runs the tests; Tomi can provide access information. But still unable to reproduce.
Comment 13 Kimmo Nikkanen 2016-01-27 09:08:16 UTC
Yann, 
is this still a valid bug? Has QA been able to reproduce this?
Comment 14 yann 2016-01-30 07:48:56 UTC
Christophe, please retest this issue.
Comment 15 cprigent 2016-02-01 17:07:36 UTC
Created attachment 121442 [details]
4.4.0_pm-rpm_byt-m_kern.log

I reproduced it by executing pm_rpm.

Tested with:
Hardware 
Platform : Bay Trail M
CPU : Intel(R) Celeron(R) CPU  N2930  @ 1.83GHz (family: 6, model: 55 stepping: 8)
SoC : VLV C0
CRB : Bayley Bay Fab3 Rev 03
Software
Linux distribution: Ubuntu 15.10 64 bits
Kernel: drm-intel-nightly 4.4.0 8114b00 from http://cgit.freedesktop.org/drm-intel/  
BIOS : BBAYCRB1.X64.0100.R21.1406301530
drm: tag libdrm-2.4.66 e342c0f from http://cgit.freedesktop.org/mesa/drm/
mesa: tag mesa-11.0.8 261daab from http://cgit.freedesktop.org/mesa/mesa/
cairo: tag 1.15.2 db8a7f1 from http://cgit.freedesktop.org/cairo
waffle: master bb29b2a from https://github.com/waffle-gl/waffle
xorg-server-macros: master d7acec2 from git://git.freedesktop.org/git/xorg/util/macros
libva: tag libva-1.6.1 cb418f6 from http://cgit.freedesktop.org/libva/
vaapi-intel-driver: tag 1.6.1 2110b3a from http://cgit.freedesktop.org/vaapi/intel-driver
intel-gpu-tools: tag intel-gpu-tools-1.13 51e965f from http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
Comment 16 Joonas Lahtinen 2016-02-10 09:45:36 UTC
Does it happen every time when running the test? If so, then I'll try to get my hands on BYT-M.
Comment 17 cprigent 2016-03-09 17:30:52 UTC
Tested 5 times with the latest setup (listed below). I can't reproduce it.

Hardware 
Platform : Bay Trail M
CPU : Intel(R) Celeron(R) CPU  N2930  @ 1.83GHz (family: 6, model: 55 stepping: 8)
SoC : VLV C0
CRB : Bayley Bay Fab3 Rev 03
Bios: 100.21
Software
Linux distribution: Ubuntu 15.10 64 bits
Kernel drm-intel-nightly 4.5.0-rc6_59c2aa9 from http://cgit.freedesktop.org/drm-intel/
  commit 59c2aa9790ada24e1c13cd582a91fea33dc75b00
  Author: Jani Nikula <jani.nikula@intel.com>
  Date:   Mon Feb 29 11:15:04 2016 +0200
  drm-intel-nightly: 2016y-02m-29d-09h-14m-18s UTC integration manifest
drm: tag 2.4.67-5 ea07de9 from http://cgit.freedesktop.org/mesa/drm/
mesa: tag 11.1.2 7bcd827 from http://cgit.freedesktop.org/mesa/mesa/
cairo: tag 1.15.2 db8a7f1 from http://cgit.freedesktop.org/cairo
IGT: 1.14 174a06 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git
Comment 18 cprigent 2016-03-09 17:31:05 UTC
So closed.
