Bug 81850 - [bisected] regression with kernel 3.15: suspend to RAM makes system unstable
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: unspecified
Hardware: x86 (IA32) All
Importance: medium normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
Reported: 2014-07-28 20:25 UTC by Dominik Kopp
Modified: 2015-04-30 08:04 UTC

Attachments
dmesg with two s2ram_resume cycle (114.31 KB, text/plain)
2014-07-29 06:16 UTC, Dominik Kopp
xorg log (45.51 KB, text/plain)
2014-07-29 06:32 UTC, Dominik Kopp

Description Dominik Kopp 2014-07-28 20:25:33 UTC
I'm using the radeon driver for my Radeon HD 4200 (integrated chipset, laptop).
=> RS880

Suspend to RAM (s2ram) worked well with kernel 3.14.x.
The problem starts with 3.15.
After resuming, the system is unstable.

Symptoms:
- The last 6 pixel rows at the bottom of the screen are black (even the mouse cursor isn't drawn there).
- Playing fullscreen video (vlc) for a few seconds: black screen with only the cursor visible, no keyboard input possible. Freeze.
- Switching to the console with CTRL-ALT-F1: sometimes screen garbage, but the keyboard works (blind typing).

Workarounds:
- Restart X (init 3 or log out of KDE).
- Change the screen resolution (e.g. with kscreen). This makes the system stable until the next reboot. Even multiple s2ram/resume cycles cause no problems afterwards.
Comment 1 Alex Deucher 2014-07-28 20:55:45 UTC
Can you bisect?  Please attach your xorg log and dmesg output.
Comment 2 Dominik Kopp 2014-07-29 06:16:04 UTC
Created attachment 103624
dmesg with two s2ram_resume cycle
Comment 3 Dominik Kopp 2014-07-29 06:32:28 UTC
Created attachment 103625
xorg log

here are the logs.
What I did was:
1. boot
2. s2ram/resume
3. change screen resolution with kscreen
4. s2ram/resume

Bisecting is difficult for me because I last built a kernel by hand 3-4 years ago. I would be happy if you could point me to a good how-to website. There are a lot of websites covering very old kernels and boot loaders out there. :-(
I'm using openSUSE with grub2.
Comment 4 Tom Li 2014-07-29 07:35:57 UTC
You can copy the existing kernel config from /boot or /proc/config.gz instead of starting from scratch.
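
A minimal sketch of that suggestion, assuming a typical distro layout. The helper name fetch_running_config is hypothetical, /proc/config.gz only exists when the running kernel was built with CONFIG_IKCONFIG_PROC, and the /boot/config-$(uname -r) naming is a common convention rather than a guarantee.

```shell
# Sketch: seed a kernel .config from the configuration of the running kernel.
# fetch_running_config is a hypothetical helper name, not an existing tool.
fetch_running_config() {
    dest="${1:-.config}"                       # where to write the config
    proc_cfg="${2:-/proc/config.gz}"           # needs CONFIG_IKCONFIG_PROC
    boot_cfg="${3:-/boot/config-$(uname -r)}"  # distro-installed copy, if any

    if [ -r "$proc_cfg" ]; then
        gzip -dc "$proc_cfg" > "$dest"         # /proc/config.gz is gzip-compressed
    elif [ -r "$boot_cfg" ]; then
        cp "$boot_cfg" "$dest"
    else
        echo "no kernel config found" >&2
        return 1
    fi
}
```

Run from the top of the kernel tree, fetch_running_config writes .config; after that, make oldconfig or make localmodconfig adapts it to the checked-out source.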
Comment 5 Dominik Kopp 2014-07-29 10:07:25 UTC
I started the bisect and used the following commands.
If something is wrong, please let me know! It's my first kernel bisect.

cd /usr/src
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
cd linux-2.6
git bisect start
git bisect good v3.14
git bisect bad v3.15
cp /boot/config-3.15.6-41.gede5ddf-desktop .config
make localmodconfig
make -j 2 bzImage modules
make modules_install install

reboot
do the tests

cd /usr/src/linux-2.6
git bisect <good/bad>
make localmodconfig
make -j 2 bzImage modules
make modules_install install

again reboot and so on...
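
Each build/reboot/test round roughly halves the remaining commit range, so the number of rounds grows with the base-2 logarithm of the number of commits. A rough sketch of that estimate follows; bisect_steps is a hypothetical helper, and git's own "roughly N steps" message may differ slightly because kernel history is not linear.

```shell
# Sketch: estimate how many bisect rounds a range of N commits needs.
# Computes ceil(log2(N)) with integer arithmetic only.
bisect_steps() {
    n="$1"
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # halve the range, rounding up
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}
```

Inside the kernel tree it could be fed with the real commit count, e.g. bisect_steps "$(git rev-list --count v3.14..v3.15)".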
Comment 6 Dominik Kopp 2014-07-29 17:38:10 UTC
bisect says:
8d51a977a4961d3ed6df699aea50bc2dd6bbc5cc is the first bad commit

----------------

commit 8d51a977a4961d3ed6df699aea50bc2dd6bbc5cc
Merge: aa17edf c752308
Author: Dave Airlie <airlied@redhat.com>
Date:   Sat Apr 5 16:07:39 2014 +1000

    Merge tag 'ttm-next-2014-04-04' of git://people.freedesktop.org/~thomash/linux into drm-next
    
    Pull request of 2014-04-04
    
    Currently only a single patch fixing up mixed use of the ttm_bo_reserve and
    ww_mutex APIs
    
    * tag 'ttm-next-2014-04-04' of git://people.freedesktop.org/~thomash/linux:
      drm/ttm: Hide the implementation details of reservation
Comment 7 Dominik Kopp 2014-07-31 11:47:49 UTC
solved. :-)

In comment#6 I could only bisect down to a faulty *merge* commit.
But I looked deeper into the commits and I can say that this is the only commit which causes my problem:
deadcb36f49bee9b3010382ffe4fe4f5c439f1c5 drm/radeon: Use two-ended allocation by size, v2

I've checked it by doing
- git checkout v3.16-rc7   (or v3.15 it doesn't matter...)
- git revert deadcb36f49bee9b3010382ffe4fe4f5c439f1c5
and it works for me.
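
When a bisect stops at a merge commit, one possible way to look deeper is to list the commits that the merge pulled in. This is a sketch only: merge_commits is a hypothetical helper name, and it is not necessarily how the commit above was found, since the culprit may also sit outside the merged branch.

```shell
# Sketch: list the commits reachable from a merge's second parent
# but not from its first parent, i.e. what the merge brought in.
# Must be run inside a git work tree.
merge_commits() {
    merge="$1"                              # merge commit id, tag, or ref
    git log --oneline "$merge^1..$merge^2"
}
```

For the merge from comment 6 this would be called as merge_commits 8d51a977a4961d3ed6df699aea50bc2dd6bbc5cc inside the kernel tree.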
Comment 8 Dominik Kopp 2014-09-30 10:29:52 UTC
Any news?
The solution is provided in comment#7 and it's a simple
git revert deadcb36f49bee9b3010382ffe4fe4f5c439f1c5

My laptop is running fine with this solution, however, I would be glad to see a fix in the normal (upstream) kernel.
Comment 9 Alex Deucher 2014-09-30 13:07:17 UTC
Christian, any ideas?
Comment 10 Christian König 2014-09-30 13:22:13 UTC
(In reply to comment #9)
> Christian, any ideas?

Not at all, the patch should just change the order in which memory is allocated (from the top instead of the bottom).

The only possible reason that sounds logical is faulty memory, but then it would be rather unlikely to only happen on suspend/resume.

Another possibility is that some engine can't access certain addresses, but the dmesg clearly shows that this system has only 256MB of VRAM and that's rather unlikely under those conditions.

Just speculating, but maybe eviction on suspend doesn't work correctly?

I'm using an RS880 based laptop for typing this message and this box is working perfectly fine with 3.15.

Have you tried newer kernels as well? E.g. 3.16 or 3.17-rc?
Comment 11 Alex Deucher 2014-09-30 13:30:36 UTC
maybe flags is never initialized and has inconsistent values?  Does your kernel have this patch?
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e3f202798aaa808e7a38faa8c3a9f0aa93b85cc0
If not, does it help?
Comment 12 Dominik Kopp 2014-09-30 22:28:27 UTC
I'm now running vanilla 3.17-rc7 and the issue still exists (3.16.x was also affected).
I assume the commit e3f202798aaa808e7a38faa8c3a9f0aa93b85cc0 is also included. (git show e3f202798 works.)
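
A successful git show only proves that the commit object exists in the local repository, not that the checked-out kernel contains it. A stricter check, sketched under the assumption that it is run inside the kernel tree (commit_in_ref is a hypothetical helper name):

```shell
# Sketch: report whether a commit is part of the history of a ref.
# Uses git merge-base --is-ancestor, which exits 0 when the commit is
# an ancestor of the ref and non-zero otherwise.
commit_in_ref() {
    commit="$1"
    ref="${2:-HEAD}"      # defaults to the current checkout
    if git merge-base --is-ancestor "$commit" "$ref" 2>/dev/null; then
        echo "included"
    else
        echo "not included"
    fi
}
```

For example, commit_in_ref e3f202798aaa808e7a38faa8c3a9f0aa93b85cc0 v3.17-rc7 would confirm the assumption above. Note that the sketch also prints "not included" when the commit id is unknown to the repository.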

I ran memtest, but it didn't show any errors.
I also swapped the two RAM modules, but that didn't change anything either.
Comment 13 Dominik Kopp 2014-12-07 09:05:24 UTC
(In reply to Christian König from comment #10)

> I'm using an RS880 based laptop for typing this message and this box is
> working perfectly fine with 3.15.

Hello Christian,

interesting news!

I've run some tests with openSUSE 13.2 KDE Live, both as a live medium (booted from SD card) and installed on a free partition.

Here are the results:
- The 64-bit version is not affected. The issue occurs *only* on 32-bit.
- When desktop effects are enabled - no issue.
- When desktop effects are disabled - suspend/resume triggers my issue.

Easiest way to reproduce e.g. on your RS880 machine:
1. Grab a copy of opensuse 13.2 kde live 32-bit e.g. from
http://ftp.uni-erlangen.de/pub/mirrors/opensuse/distribution/13.2/iso/openSUSE-13.2-KDE-Live-i686.iso

2. Burn it to a DVD or copy it to an SD card or USB stick (I'm using "SUSE Studio Imagewriter" for that. "dd" may also work)

3. After booting, press shift-alt-F12 to disable desktop effects.
4. suspend to RAM and resume.
=> last line should be black on screen
(5. pressing shift-alt-F12 again and screen will freeze except for the mouse)
(6. suspend to RAM and resume and everything will be OK again.)
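
Since only the 32-bit build is affected, it can help to confirm which kind of kernel a test system is actually running before trying the steps above. A small sketch, assuming x86 hardware; arch_bits is a hypothetical helper that only classifies the machine string reported by uname -m.

```shell
# Sketch: classify an x86 machine string as 32-bit or 64-bit.
# With no argument, the running kernel's value from uname -m is used.
arch_bits() {
    case "${1:-$(uname -m)}" in
        i?86)   echo 32 ;;        # i386, i586, i686, ...
        x86_64) echo 64 ;;
        *)      echo unknown ;;   # anything that is not x86
    esac
}
```

On the i686 live image from step 1, arch_bits is expected to print 32.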
Comment 14 Dominik Kopp 2014-12-07 09:16:38 UTC
PS: the kernel used on the live medium (comment above) is 3.16.2
Comment 15 Dominik Kopp 2015-04-30 08:04:39 UTC
The commit that introduced my problem has been reverted by commit:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=0cd0c3867310fe333cd1d035693c273983cbe4ed

I've made a test with kernel 3.19.4 and it's working fine.
I don't need my workarounds anymore.
(One workaround was changing the screen resolution after resume. The other workaround was enabling Glamor together with OpenGL 3.1 as compositor type)

