Bug 84855 - [ILK regression] igt kms_rotation_crc/sprite-rotation causes system hang
Summary: [ILK regression] igt kms_rotation_crc/sprite-rotation causes system hang
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: All Linux (All)
Importance: low normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-10 03:21 UTC by lu hua
Modified: 2016-08-04 10:12 UTC
CC List: 4 users

See Also:
i915 platform: ILK
i915 features: display/Other


Attachments
dmesg (20.16 KB, text/plain), 2014-10-10 03:21 UTC, lu hua
dmesg(update) (22.29 KB, text/plain), 2014-11-24 07:23 UTC, lu hua
dmesg(fixes kernel) (9.98 KB, text/plain), 2014-12-02 03:25 UTC, lu hua

Description lu hua 2014-10-10 03:21:52 UTC
Created attachment 107643
dmesg

==System Environment==
--------------------------
Regression: Yes. 
Good commit on -next-queued: 6e47e3f097cc6c4cb470a805a3fa07a8e8376dab
bad commit: a128efeaa64609a9672b51bb37bb703e1b0f0128

==kernel==
--------------------------
drm-intel-nightly/ea4bec8e96ea8b33b49a7892c1c7f20041a56da6

==Bug detailed description==
The test hangs the system on ILK with the -nightly and -queued kernels; it works well on the -fixes kernel.

output:
IGT-Version: 1.8-gc7551bf (i686) (Linux: 3.17.0-rc5_drm-intel-next-queued_a128ef_20141010+ i686)

==Reproduce steps==
---------------------------- 
1. ./kms_rotation_crc --run-subtest sprite-rotation
Comment 1 lu hua 2014-10-22 08:40:10 UTC
I bisected it twice; both runs report 8525b5ec90a58b3e56709ffa1667d6593dbe24c3 as the first bad commit. However, the hang still occurs with this commit reverted.
commit 8525b5ec90a58b3e56709ffa1667d6593dbe24c3
Author: YoungJun Cho <yj44.cho@samsung.com>
Date:   Thu Aug 14 11:22:36 2014 +0900

    drm/exynos: dsi: fix exynos_dsi_set_pll() wrong return value

    The type of this function is unsigned long, and it is expected
    to return proper fout value or zero if something is wrong.
    So this patch fixes wrong return value for error cases.

    Signed-off-by: YoungJun Cho <yj44.cho@samsung.com>
    Acked-by: Inki Dae <inki.dae@samsung.com>
    Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
    Acked-by: Andrzej Hajda <a.hajda@samsung.com>
    Signed-off-by: Inki Dae <inki.dae@samsung.com>
Comment 2 Daniel Vetter 2014-11-18 09:25:47 UTC
I think this is fixed in

commit b8bbac1d01397ead65516f11adba1c7baf76a016
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Mon Nov 10 14:47:30 2014 -0200

    drm/i915: use the correct obj when preparing the sprite plane

Please reopen if I guess wrongly.
Comment 3 Guo Jinxian 2014-11-19 06:26:42 UTC
(In reply to Daniel Vetter from comment #2)
> I think this is fixed in
> 
> commit b8bbac1d01397ead65516f11adba1c7baf76a016
> Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Date:   Mon Nov 10 14:47:30 2014 -0200
> 
>     drm/i915: use the correct obj when preparing the sprite plane
> 
> Please reopen if I guess wrongly.

The system still hangs on the latest -nightly (3cb89f9eef2888c848248bf45d6dd0d67c594586), which contains commit b8bbac1d01397ead65516f11adba1c7baf76a016.
Comment 4 Daniel Vetter 2014-11-19 09:13:43 UTC
(In reply to lu hua from comment #1)
> I tried to bisect it twice, bisect shows:
> 8525b5ec90a58b3e56709ffa1667d6593dbe24c3 is the first bad commit. But revert
> this commit, the hang still exists.
> commit 8525b5ec90a58b3e56709ffa1667d6593dbe24c3
> Author: YoungJun Cho <yj44.cho@samsung.com>
> Date:   Thu Aug 14 11:22:36 2014 +0900
> 
>     drm/exynos: dsi: fix exynos_dsi_set_pll() wrong return value
> 
>     The type of this function is unsigned long, and it is expected
>     to return proper fout value or zero if something is wrong.
>     So this patch fixes wrong return value for error cases.
> 
>     Signed-off-by: YoungJun Cho <yj44.cho@samsung.com>
>     Acked-by: Inki Dae <inki.dae@samsung.com>
>     Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
>     Acked-by: Andrzej Hajda <a.hajda@samsung.com>
>     Signed-off-by: Inki Dae <inki.dae@samsung.com>

Ok, so we need the bisect but this patch certainly isn't it - it's in the exynos driver which isn't even compiled. Can you please try to redo the bisect carefully and please make really sure that you're chasing the same bug and not something else.
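
Whether a given driver is part of a build can be checked in that build's `.config`. A minimal sketch, assuming the configuration file of the tested kernel is at hand (the path in the usage comment is a placeholder) and using the Kconfig symbols `CONFIG_DRM_EXYNOS` and `CONFIG_DRM_I915`:

```shell
#!/bin/sh
# Report whether a Kconfig symbol is built in (=y) or modular (=m)
# in a given kernel .config file. Returns 0 if enabled, non-zero otherwise.
# Usage: config_enabled SYMBOL CONFIG_FILE
config_enabled() {
    grep -Eq "^$1=(y|m)\$" "$2"
}

# Example (placeholder path):
#   config_enabled CONFIG_DRM_EXYNOS /path/to/.config || echo "exynos driver not built"
#   config_enabled CONFIG_DRM_I915   /path/to/.config && echo "i915 driver built"
```

If the symbol is not enabled, a change confined to that driver cannot alter the resulting kernel, so a bisect pointing at it most likely followed an unreliable good/bad verdict.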
Comment 5 lu hua 2014-11-24 07:23:14 UTC
Created attachment 109926
dmesg(update)

Tested on the latest igt and kernel; the hang still occurs (dmesg attached). I will bisect later.

output:
IGT-Version: 1.8-gaa63fc7 (x86_64) (Linux: 3.18.0-rc5_drm-intel-nightly_0f8cb1_20141124+ x86_64)
Comment 6 lu hua 2014-11-26 07:08:01 UTC
I am unable to reproduce the pass: the "good commit" now shows this issue as well.
Tested on the latest drm-intel-nightly and drm-intel-next-queued kernels with the latest igt, the hang still occurs. On the latest -fixes kernel it works well.
Comment 7 Daniel Vetter 2014-11-26 08:21:35 UTC
If it works on -fixes but fails on -nightly it's a regression. So please try to bisect between fixes and nightly.
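
If such a bisect is scripted with `git bisect run`, the test's exit status has to be translated into the codes that command expects: 0 for good, 125 for "cannot test, skip", and any other value from 1 to 127 for bad. A minimal sketch of that translation follows; the helper name and the skip status of 77 are assumptions, not taken from this report.

```shell
#!/bin/sh
# Map a test's exit status onto the codes `git bisect run` understands:
#   0   -> commit is good
#   125 -> commit cannot be tested, skip it
#   1   -> commit is bad (any status 1..127 other than 125 means bad)
# SKIP_STATUS=77 is an assumption about how the test reports a skipped
# subtest; adjust it to whatever the test binary actually returns.
SKIP_STATUS=77

to_bisect_status() {
    case "$1" in
        0)              echo 0 ;;
        "$SKIP_STATUS") echo 125 ;;
        *)              echo 1 ;;
    esac
}

# Example use inside a bisect script (run on the machine under test):
#   ./kms_rotation_crc --run-subtest sprite-rotation
#   exit "$(to_bisect_status $?)"
```

A full system hang never returns an exit status, so on its own this only covers the pass and skip outcomes. Detecting the hang would need a watchdog or a second machine driving the test.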
Comment 8 lu hua 2014-12-02 03:17:40 UTC
I am unable to reproduce the pass on the drm-intel-fixes kernel.
Comment 9 lu hua 2014-12-02 03:25:26 UTC
Created attachment 110338
dmesg(fixes kernel)

output:
IGT-Version: 1.8-gddf0f09 (x86_64) (Linux: 3.18.0-rc6_drm-intel-fixes_afa4e5_20141201+ x86_64)
Comment 10 Matt Roper 2015-02-20 23:30:30 UTC
Do you still see this on the latest kernel?  Did we ever manage to get an accurate bisect for this bug?
Comment 11 lu hua 2015-03-02 07:28:22 UTC
Tested on the latest drm-intel-nightly kernel; the hang still occurs.
Comment 12 Jesse Barnes 2015-03-03 20:12:49 UTC
Still waiting on a bisect.
Comment 13 lu hua 2015-03-09 06:50:57 UTC
(In reply to Jesse Barnes from comment #12)
> Still waiting on a bisect.

Hi Jesse,
As noted in comment 1, I could reproduce the pass at that time and bisected twice, but the bisect result was incorrect.
As noted in comment 6 and comment 8, I can no longer reproduce the pass, so I am now unable to find a good commit.
Comment 14 Jesse Barnes 2015-03-09 21:54:44 UTC
Well, I guess we need a developer with ILK access to try to reproduce this then and figure out what's going on.  If it's a system hang, I guess it's related to memory somehow.  Maybe we get the display controller to go crazy and prevent the CPU from getting any more traffic or something.
Comment 15 Daniel Vetter 2015-03-18 11:27:58 UTC
(In reply to lu hua from comment #13)
> (In reply to Jesse Barnes from comment #12)
> > Still waiting on a bisect.
> 
> Hi Jesse,
> As comment 1, I reproduce the pass, then bisect twice, the bisect result is
> incorrect.
> As comment 6 and comment 8, I am unable the pass. Now I am unable to find
> out a good commit.

Well probably -fixes is just broken now too, so you need to dig out old released kernels for the bisect ...
Comment 16 lu hua 2015-03-19 09:01:09 UTC
I bisected it again. On the drm-intel-next-queued kernel built on 2014_11_10 (0b5492d6b5325) the test works well; on the drm-intel-next-queued kernel built on 2014_11_15 (cf3d262e39941d8) it fails. So I selected 0b5492d6b5325 as the good commit and cf3d262e39941d8 as the bad commit.
bisect log:
git bisect start 'drivers/gpu/drm'
# good: [0b5492d6b53251acab99b1906a328fac56e08be3] drm/i915: Add gen to the gpu hang ecode
git bisect good 0b5492d6b53251acab99b1906a328fac56e08be3
# bad: [cf3d262e39941d8f148148e840c00fcbc35a8e6f] drm/i915: Fix comments about CHV snoop behaviour
git bisect bad cf3d262e39941d8f148148e840c00fcbc35a8e6f
# bad: [b900b949674464d6ede123fb352d3a63690e31ab] drm/i915: move rps irq enable/disable to i915_irq.c
git bisect bad b900b949674464d6ede123fb352d3a63690e31ab
# bad: [1f9e14baa9139fce2265206746fe5491be7726e9] Merge tag 'topic/core-stuff-2014-11-05' of git://anongit.freedesktop.org/drm-intel into drm-next
git bisect bad 1f9e14baa9139fce2265206746fe5491be7726e9
# bad: [5a1cbdb0fb6748a52a33f4ccd5d49486d7479fbb] gpu: drm: Fix warning caused by a parameter description in drm_crtc.c
git bisect bad 5a1cbdb0fb6748a52a33f4ccd5d49486d7479fbb
# bad: [32197aab0425dbecc38462a91bc5c8acf70b2036] gpu:drm: Fix typo in Documentation/DocBook/drm.xml
git bisect bad 32197aab0425dbecc38462a91bc5c8acf70b2036
# bad: [a1f1a79c51fd493887bb66d932ee66a23f8b1527] drm: drm_err: Remove unnecessary __func__ argument
git bisect bad a1f1a79c51fd493887bb66d932ee66a23f8b1527
# bad: [bd008e5b2953186fc0c6633a885ade95e7043800] drm: Implement O_NONBLOCK support on /dev/dri/cardN
git bisect bad bd008e5b2953186fc0c6633a885ade95e7043800
# first bad commit: [bd008e5b2953186fc0c6633a885ade95e7043800] drm: Implement O_NONBLOCK support on /dev/dri/cardN
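
A saved bisect log can be summarized to see how many commits were judged good and how many bad. In the log above, every commit tested after the starting pair was marked bad, so the result rests entirely on the single "good" endpoint being reliable. A minimal sketch, assuming the log was saved with `git bisect log > bisect.log` (the file name and the helper name are placeholders):

```shell
#!/bin/sh
# Count the verdicts recorded in a saved `git bisect log`.
# Comment lines (starting with "#") are ignored because only the
# replayable "git bisect good/bad" commands are matched.
# Usage: bisect_verdicts LOGFILE   (prints "good=N bad=M")
bisect_verdicts() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # hence the "|| true".
    good=$(grep -c '^git bisect good ' "$1" || true)
    bad=$(grep -c '^git bisect bad ' "$1" || true)
    echo "good=$good bad=$bad"
}

# Example (placeholder file name):
#   git bisect log > bisect.log
#   bisect_verdicts bisect.log
```

For the log above, saved as-is, this would report one good and seven bad verdicts. A saved log can also be re-applied with `git bisect replay` once a trustworthy good commit is known.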

bisect result shows: bd008e5b2953186fc0c6633a885ade95e7043800 is the first bad commit.
commit bd008e5b2953186fc0c6633a885ade95e7043800
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 7 14:13:51 2014 +0100

    drm: Implement O_NONBLOCK support on /dev/dri/cardN

Then I tested 528a82b41fda78435976c905546c3329c86bb264; the result is skip. After re-building commit 0b5492d6b53251acab99, it fails.

The build of commit 0b5492d6b5325 made on 2014_11_10 works well, but a fresh re-build of the same commit fails. I guess this failure is not caused by the kernel source; it may come from the kernel config or something else that changed between 2014_11_10 and 2014_11_15.
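
One way to test the kernel-config guess is to compare the `.config` of the passing build with that of the failing re-build. A minimal sketch, assuming both configuration files were kept (the file names in the usage comment are placeholders); the kernel tree also ships `scripts/diffconfig` for the same purpose.

```shell
#!/bin/sh
# List Kconfig lines that differ between two kernel .config files.
# Lines only in the first file are prefixed "<", lines only in the
# second are prefixed ">". "# CONFIG_FOO is not set" lines are included.
# Usage: config_diff CONFIG_A CONFIG_B
config_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    grep 'CONFIG_' "$1" | LC_ALL=C sort > "$a"
    grep 'CONFIG_' "$2" | LC_ALL=C sort > "$b"
    # grep exits non-zero when the files are identical; that is not an error.
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}

# Example (placeholder file names):
#   config_diff config-2014_11_10 config-rebuild
```

An empty result would point away from the config and toward other build inputs, such as the toolchain or the userspace the test runs on.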
Comment 17 lu hua 2015-03-19 09:04:53 UTC
As in comment 6 and comment 8, I am unable to reproduce the pass, even with a re-built kernel.
Comment 18 cprigent 2015-11-17 17:18:57 UTC
Bug scrub:
Decreasing priority as this is about ILK.
Comment 19 yann 2016-08-04 10:11:50 UTC
Old regression issue with no real activity for more than 1 year, so closing. If this is still an issue with current drm-intel-nightly, please reopen.

