Bug 64270 - [SNB/IVB/HSW]I-G-T gem_wait_render_timeout Aborted
Summary: [SNB/IVB/HSW]I-G-T gem_wait_render_timeout Aborted
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-06 09:04 UTC by lu hua
Modified: 2017-09-04 10:13 UTC

See Also:
i915 platform:
i915 features:


Attachments
fix gem_wait_ioctl remaining time when timeout (902 bytes, patch)
2013-05-06 14:21 UTC, Imre Deak
hang picture (1.45 MB, image/jpeg)
2013-05-07 09:13 UTC, lu hua
console log (60.16 KB, text/plain)
2013-05-08 09:13 UTC, lu hua
fix for dpaux false timeouts (2.10 KB, text/plain)
2013-05-08 10:38 UTC, Imre Deak

Description lu hua 2013-05-06 09:04:42 UTC
System Environment:
--------------------------
Arch:             x86_64
Platform:         Ivybridge/Haswell
Kernel:        drm-intel-fixes 3ab9c63705cb7b1b9f83ddce725d8bd9ef7c66a9

Bug detailed description:
-------------------------
It happens on Ivybridge and Haswell with the drm-intel-fixes kernel. It works well on the -queued kernel.
Bisect shows: 3ab9c63705cb7b1b9f83ddce725d8bd9ef7c66a9 is the first bad commit.
commit 3ab9c63705cb7b1b9f83ddce725d8bd9ef7c66a9
Author:     Imre Deak <imre.deak@intel.com>
AuthorDate: Fri May 3 12:57:41 2013 +0300
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Sat May 4 10:24:56 2013 +0200

    drm/i915: hsw: fix link training for eDP on port-A

    According to BSpec the link training sequence for eDP on HSW port-A
    should be as follows:

    1. link training: clock recovery
    2. link training: equalization
    3. link training: set idle transmission mode
    4. display pipe enable
    5. link training: disable (set normal mode)

    Contrary to this at the moment we don't do step 3. and we do step 5.
    before step 4. Fix this by setting idle transmission mode for eDP at
    the end of intel_dp_complete_link_train and adding a new
    intel_dp_stop_link_training function to disable link training. With
    these changes we'll end up with the following functions corresponding
    to the above steps:

    intel_dp_start_link_train    -> step 1.
    intel_dp_complete_link_train -> step 2., step 3.
    intel_dp_stop_link_train     -> step 5.

    For port-A we'll call intel_dp_stop_link_train only after enabling the
    pipe, for everything else we'll call it right after
    intel_dp_complete_link_train to preserve the current behavior.

    Tested on HSW/HSW-ULT.

    In v2:
    - Due to a HW issue we must set idle transmission mode for port-A too
      before enabling the pipe. Thanks for Arthur Runyan for explaining
      this.
    - Update the patch subject to make it clear that it's an eDP fix, DP is
      not affected.

    v3:
    - rename intel_dp_link_train() to intel_dp_set_link_train(), use 'val'
      instead 'l' as var name. (Paulo)

    Signed-off-by: Imre Deak <imre.deak@intel.com>
    Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


output:
32768 iters is enough work
Finished with 577060485 time remaining
gem_wait_render_timeout: gem_wait_render_timeout.c:211: main: Assertion `timeout == 0' failed.
Aborted (core dumped)

dmesg:
[  371.560608] [drm:i915_driver_open],
[  371.560632] [drm:intel_crtc_set_config], [CRTC:3] [FB:30] #connectors=1 (x y) (0 0)
[  371.560637] [drm:intel_modeset_stage_output_state], [CONNECTOR:9:VGA-1] to [CRTC:3]
[  371.560640] [drm:intel_crtc_set_config], [CRTC:5] [NOFB]
[  371.560642] [drm:intel_modeset_stage_output_state], [CONNECTOR:9:VGA-1] to [CRTC:3]
[  371.560644] [drm:intel_crtc_set_config], [CRTC:7] [NOFB]
[  371.560646] [drm:intel_modeset_stage_output_state], [CONNECTOR:9:VGA-1] to [CRTC:3]
[  371.560656] [drm:i915_driver_open],
[  371.560688] [drm:i915_getparam], Unknown parameter 22
[  384.133534] [drm:intel_crtc_set_config], [CRTC:3] [FB:30] #connectors=1 (x y) (0 0)
[  384.133541] [drm:intel_modeset_stage_output_state], [CONNECTOR:9:VGA-1] to [CRTC:3]
[  384.133544] [drm:intel_crtc_set_config], [CRTC:5] [NOFB]
[  384.133546] [drm:intel_modeset_stage_output_state], [CONNECTOR:9:VGA-1] to [CRTC:3]
[  384.133547] [drm:intel_crtc_set_config], [CRTC:7] [NOFB]
[  384.133549] [drm:intel_modeset_stage_output_state], [CONNECTOR:9:VGA-1] to [CRTC:3]


Reproduce steps:
----------------
1. ./gem_wait_render_timeout
Comment 1 Imre Deak 2013-05-06 14:21:20 UTC
Created attachment 78931
fix gem_wait_ioctl remaining time when timeout

I'm guessing the bug is a bit random, I can't see how it can relate to the bisected commit. I noticed that for timeouts from gem_wait_ioctl we don't always return 0, so that can trigger the assert. Attached is a fix that should make sure we always return 0 remaining time for timeouts.
Comment 2 lu hua 2013-05-07 07:11:30 UTC
(In reply to comment #1)
> Created attachment 78931
> fix gem_wait_ioctl remaining time when timeout
> 
> I'm guessing the bug is a bit random, I can't see how it can relate to the
> bisected commit. I noticed that for timeouts from gem_wait_ioctl we don't
> always return 0, so that can trigger the assert. Attached is a fix that
> should make sure we always return 0 remaining time for timeouts.

Tested with this patch; the system fails to boot.
Comment 3 Imre Deak 2013-05-07 07:48:22 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Created attachment 78931
> > fix gem_wait_ioctl remaining time when timeout
> > 
> > I'm guessing the bug is a bit random, I can't see how it can relate to the
> > bisected commit. I noticed that for timeouts from gem_wait_ioctl we don't
> > always return 0, so that can trigger the assert. Attached is a fix that
> > should make sure we always return 0 remaining time for timeouts.
> 
> Tested with this patch; the system fails to boot.

On what platform? Is it 100% reproducible? Could you attach logs about the boot failure?

On IVB/drm-intel-nightly I can easily reproduce the original problem and with the fix applied I can boot okay and can't reproduce the problem. So based on this I think the boot failure is an independent issue, or something revealed by the fix.
Comment 4 lu hua 2013-05-07 09:13:55 UTC
Created attachment 78979
hang picture

Fixed on Haswell by this patch.
This patch causes a system boot failure on Ivybridge and Sandybridge.
Comment 5 Imre Deak 2013-05-07 10:42:51 UTC
(In reply to comment #4)
> Created attachment 78979
> hang picture
> 
> Fixed on Haswell by this patch.
> This patch causes a system boot failure on Ivybridge and Sandybridge.

Hm. From the only error message on the screenshot I assume there was an error while loading the kernel modules. Are you sure you installed the proper ones? 

If you made sure you have the proper modules in place: you have ssh server running, so could you ssh-in and get the dmesg? If not we'd need a netconsole log.
Comment 6 lu hua 2013-05-08 09:13:00 UTC
Created attachment 79019
console log
Comment 7 Imre Deak 2013-05-08 10:38:23 UTC
Created attachment 79021
fix for dpaux false timeouts

(In reply to comment #6)
> Created attachment 79019
> console log

Thanks. This looks like a separate issue from the originally reported bug. Is it also reproducible if you revert 'drm/i915: hsw: fix link training for eDP on port-A'? Could you try the second attached patch, and then, with the 'fix gem_wait_ioctl remaining time when timeout' fix applied, try to trigger the original issue?
Comment 8 lu hua 2013-05-09 03:10:35 UTC
Sorry, the 1st patch doesn't cause the system boot failure, and the bisect result is invalid.
The system fails to boot because of commit 14134f6584212d585b310ce95428014b653dfaf6; with this commit reverted, this issue goes away (Bug 63628).

As you said, this bug is a bit random: running ./gem_wait_render_timeout 10 times, it fails 6 times. I tested the 2nd patch; the result is also unstable.
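
A small helper along these lines could be used to count how often the test fails over repeated runs. It is a sketch, not part of igt: `count_failures` is a hypothetical name, and it treats any non-zero exit status or death by signal (such as the abort from a failed assertion) as a failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the program named by argv[0] `runs` times and return how many runs
 * failed. A run counts as failed if the child did not exit normally with
 * status 0 (this includes being killed by SIGABRT).
 * Returns -1 if fork() or waitpid() fails.
 */
static int count_failures(char *const argv[], int runs)
{
    int failures = 0;

    assert(argv != NULL && argv[0] != NULL);

    for (int i = 0; i < runs; i++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;

        if (pid == 0) {
            /* Child: replace ourselves with the test binary. */
            execvp(argv[0], argv);
            _exit(127); /* only reached if exec itself failed */
        }

        if (waitpid(pid, &status, 0) < 0)
            return -1;

        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }

    return failures;
}
```

Calling it with an argv of `{ "./gem_wait_render_timeout", NULL }` and `runs` set to 10 would produce the kind of count reported above.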
Comment 9 lu hua 2013-05-09 07:28:27 UTC
> this issue goes away(Bug 63628). 
> 

That is, the system boot failure goes away.
Comment 10 Imre Deak 2013-05-09 09:33:37 UTC
(In reply to comment #8)
> Sorry, the 1st patch doesn't cause the system boot failure, and the bisect
> result is invalid.
> The system fails to boot because of commit
> 14134f6584212d585b310ce95428014b653dfaf6; with this commit reverted, this
> issue goes away (Bug 63628).
> 
> As you said, this bug is a bit random: running ./gem_wait_render_timeout 10
> times, it fails 6 times. I tested the 2nd patch; the result is also unstable.

Ok. The original

'gem_wait_render_timeout: gem_wait_render_timeout.c:211: main: Assertion `timeout == 0' failed.'

assertion should be solved by the first patch

'fix gem_wait_ioctl remaining time when timeout'.

There is another gem_wait_render_timeout assertion, that is also random, but unrelated:

'1 iters is enough work
gem_wait_render_timeout: gem_wait_render_timeout.c:181: main: Assertion 'gem_bo_busy(fd, dst2->handle) == 1' failed.'

Where the calibration for the number of iterations was obviously wrong.

Could you verify if you have the same results?
Comment 11 lu hua 2013-05-10 05:23:47 UTC
This issue is fixed by the 1st patch. This test case has another issue, shown below.
It fails 1 in 5 runs.
output:
1 iters is enough work
gem_wait_render_timeout: gem_wait_render_timeout.c:181: main: Assertion `gem_bo_busy(fd, dst2->handle) == 1' failed.
Aborted (core dumped)
dmesg:
[  134.543296] [drm:i915_driver_open],
[  134.543312] [drm:intel_crtc_set_config], [CRTC:3] [FB:30] #connectors=1 (x y) (0 0)
[  134.543317] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  134.543320] [drm:intel_crtc_set_config], [CRTC:5] [NOFB]
[  134.543322] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  134.543329] [drm:i915_driver_open],
[  134.543351] [drm:i915_getparam], Unknown parameter 22
[  147.154353] [drm:intel_crtc_set_config], [CRTC:3] [FB:30] #connectors=1 (x y) (0 0)
[  147.154360] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  147.154363] [drm:intel_crtc_set_config], [CRTC:5] [NOFB]
[  147.154365] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  151.790417] [drm:i915_driver_open],
[  151.790431] [drm:intel_crtc_set_config], [CRTC:3] [FB:30] #connectors=1 (x y) (0 0)
[  151.790436] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  151.790438] [drm:intel_crtc_set_config], [CRTC:5] [NOFB]
[  151.790440] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  151.790446] [drm:i915_driver_open],
[  151.790473] [drm:i915_getparam], Unknown parameter 22
[  166.676302] [drm:intel_crtc_set_config], [CRTC:3] [FB:30] #connectors=1 (x y) (0 0)
[  166.676309] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  166.676312] [drm:intel_crtc_set_config], [CRTC:5] [NOFB]
[  166.676314] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  168.521429] [drm:i915_driver_open],
[  168.521443] [drm:intel_crtc_set_config], [CRTC:3] [FB:30] #connectors=1 (x y) (0 0)
[  168.521449] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  168.521451] [drm:intel_crtc_set_config], [CRTC:5] [NOFB]
[  168.521453] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  168.521459] [drm:i915_driver_open],
[  168.521496] [drm:i915_getparam], Unknown parameter 22
[  173.674679] [drm:intel_crtc_set_config], [CRTC:3] [FB:30] #connectors=1 (x y) (0 0)
[  173.674686] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
[  173.674689] [drm:intel_crtc_set_config], [CRTC:5] [NOFB]
[  173.674691] [drm:intel_modeset_stage_output_state], [CONNECTOR:18:HDMI-A-3] to [CRTC:3]
Comment 12 Imre Deak 2013-05-28 18:59:28 UTC
With a recent igt fix both issues should be fixed now. Could you please check it with updated igt and -nightly?
Comment 13 lu hua 2013-05-29 01:59:53 UTC
(In reply to comment #12)
> With a recent igt fix both issues should be fixed now. Could you please
> check it with updated igt and -nightly?

Fixed.
Comment 14 lu hua 2013-05-29 02:00:35 UTC
Verified. Fixed.
Comment 15 Jari Tahvanainen 2017-09-04 10:13:04 UTC
Closing old verified+fixed.

