Bug 28342 - When cold-booting gfx is messed up with latest drm-radeon-testing kernel
Summary: When cold-booting gfx is messed up with latest drm-radeon-testing kernel
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-01 04:08 UTC by Magnus Jensen
Modified: 2010-11-05 07:03 UTC (History)


Attachments
dmesg-lockups (3.55 KB, text/plain)
2010-06-02 14:17 UTC, Magnus Jensen
no flags Details
/var/log/dmesg (28.41 KB, text/plain)
2010-06-03 13:43 UTC, Magnus Jensen
no flags Details
output from dmesg (30.20 KB, text/plain)
2010-06-03 13:44 UTC, Magnus Jensen
no flags Details
/var/log//Xorg.0.log (89.45 KB, text/plain)
2010-06-03 13:44 UTC, Magnus Jensen
no flags Details
diable createpixmap2 (566 bytes, patch)
2010-06-03 13:55 UTC, Alex Deucher
no flags Details | Splinter Review
dmesg after patch (30.27 KB, text/plain)
2010-06-03 14:52 UTC, Magnus Jensen
no flags Details
Xorg.0.log after patch (71.65 KB, text/plain)
2010-06-03 14:52 UTC, Magnus Jensen
no flags Details
dmesg before was wrong this is with right kernel (30.29 KB, text/plain)
2010-06-03 14:54 UTC, Magnus Jensen
no flags Details
emit DB_DEPTH_INFO (560 bytes, patch)
2010-06-03 15:34 UTC, Alex Deucher
no flags Details | Splinter Review

Description Magnus Jensen 2010-06-01 04:08:02 UTC
This only occurs when the computer has been off for a long time (e.g. overnight).
It fails both when booting into gdm and when booting to the console and loading the radeon module manually.
If I first boot a vanilla kernel and go into X, then warm-boot into the d-r-t kernel, the problem does not show up again until the next time I cold-boot.
It seems like the drm is having trouble initializing my gfx output.

My gfx card is an AGP HD 3650 (RV635).
Comment 1 Marc Dietrich 2010-06-01 08:23:53 UTC
Me too! But I wouldn't say "cold boot". Here, just booting the distro kernel once and then booting the testing kernel also works. Chip is rs780.
Comment 2 Alex Deucher 2010-06-01 08:56:15 UTC
Can you bisect what change caused the problem?
Comment 3 Magnus Jensen 2010-06-01 11:32:23 UTC
Marc, do you also have patched mesa and ddx? (Add r6xx/r7xx tiling support to mesa   Alex Deucher ; Add r6xx/r7xx tiling support to the ddx   Alex Deucher)

When I replace the packages individually with unpatched versions I still get the error; I have to remove the patches from both mesa and ddx, and then the error goes away.
Tried with latest drm-radeon-testing from git.

At least this is the situation for me.
I came to this conclusion by resetting with REISUB when the trash output appeared, then first reinstalling the ddx without patches, then trying unpatched mesa with patched ddx, then unpatched mesa and ddx, and it worked!

So I am not 100% sure; I will test some more with this.

(By the way, I removed the whole patch set since it seemed a bad idea to run a half-patched driver, but I can also try individual patches if you think I should.)
Comment 4 Marc Dietrich 2010-06-02 02:26:30 UTC
Magnus: yes, I have the tiling patches applied to userspace and I think this is a tiling related bug, but I will check this later.

Alex: I guess the tiling patches are not bisectable enough to get a useful result. Any suggestions?
Comment 5 Marc Dietrich 2010-06-02 10:04:19 UTC
I don't know how many times I rebooted my machine, but now it will definitely die earlier...

Here are my findings:
I kept the userspace patches and tried the kernel with and without tiling patches: no change, still crashes. Then I installed a newer distro kernel (maverick backport of 2.6.34, running Ubuntu lucid) -> works.

To make a long story short, whether it crashes or not depends on whether plymouth is started. Uh! I normally boot my self-compiled kernels with "verbose", while the distro kernels boot with "quiet splash".

I guess plymouth initializes something the ddx doesn't.
Comment 6 Alex Deucher 2010-06-02 10:07:43 UTC
(In reply to comment #5)
> 
> I guess plymouth initializes something the ddx doesn't.

With kms, neither one touches the hw.  It all goes through the drm.
Comment 7 Marc Dietrich 2010-06-02 10:16:05 UTC
So it must be something else. I'll take a look at the plymouth source.
Comment 8 Magnus Jensen 2010-06-02 10:52:21 UTC
I use gdm, so maybe it's something gnome related? Isn't plymouth some continuation of gdm?
It doesn't help to start X straight into gnome with startx either.
I think I even tried with twm, but I guess the gnome stuff could still get started somehow.
I haven't really had any time today to do more testing, but I hope I can test some things later.
Comment 9 Alex Deucher 2010-06-02 10:55:55 UTC
Neither gdm nor plymouth touches the hw.  Perhaps this is a dupe of bug 28327.  Does the patch I posted there help?
Comment 10 Magnus Jensen 2010-06-02 14:17:52 UTC
Created attachment 36021 [details]
dmesg-lockups

After the patch the card at least gets initialized, but I get GPU crashes when running gdm and firefox.
Here's the dmesg output from when gdm and firefox crash.
Comment 11 Magnus Jensen 2010-06-02 14:19:34 UTC
Sorry, to be clearer: the programs don't crash; the GPU just crashes and recovers, with a trashed image as the result.
Comment 12 Marc Dietrich 2010-06-02 14:28:00 UTC
Here it crashes hard as soon as X starts; no dmesg available. Plymouth renders something to the framebuffer during the boot process. Not all distros use it (I know of Ubuntu and Fedora).

Magnus: can you try with tiling patches in kernel + mesa, but without patched ddx?
Comment 13 Magnus Jensen 2010-06-02 15:29:23 UTC
Marc: I did what you suggested and now everything works fine (inited ok, no gpu lockups)
I also updated xorg server to 1.8.1.901 (from 1.8.1)
This is with the patch Alex suggested
Comment 14 Magnus Jensen 2010-06-02 15:46:52 UTC
I decided to do one final test and found it works with just setting "ColorTiling" "off" in xorg.conf, with all patches in both userspace and kernelspace intact.

Much easier than recompiling/switching packages over and over if you just want to work around the problem for now.

(I haven't tried a cold boot yet to see if init works, but I'll try it right away.)
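
A minimal sketch of that workaround, for anyone who wants to reproduce it. Only the driver name and the "ColorTiling" option come from this report; the Identifier value, the CONF_OUT variable and the output file name are assumptions made for illustration. The script writes the section to a scratch file so it can be merged into the real xorg.conf by hand.

```shell
#!/bin/sh
# Write a Device section that turns colour tiling off for the radeon ddx.
# CONF_OUT is a hypothetical scratch path; the script does not touch the
# real xorg.conf. Merge the section into it manually.
CONF_OUT="${CONF_OUT:-./radeon-notiling.conf}"

# The Identifier string is an assumption; use the one already present
# in your own Device section.
cat > "$CONF_OUT" <<'EOF'
Section "Device"
    Identifier "Radeon"
    Driver     "radeon"
    Option     "ColorTiling" "off"
EndSection
EOF

echo "wrote $CONF_OUT"
```

X has to be restarted for the option to take effect. This only hides the problem; it does not fix the tiling patches.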
Comment 15 Alex Deucher 2010-06-02 16:14:22 UTC
(In reply to comment #14)
> I decided to do one final test an found it works with just setting
> "ColorTiling" "off" in xorg.conf and all patches in both userspace and
> kernelspace intact. 
> 

colortiling is off by default and is automatically disabled if your kernel is not new enough.  Let's try to clarify what the problem is.  Try the following configurations (start each one from a cold boot):

1. drm-radeon-testing + the patch from bug 28327. No patches to ddx or mesa. No tiling options in your config.

2. drm-radeon-testing + the patch from bug 28327. ddx and mesa patched with the tiling patches. No tiling options in your config.

And report back what happens.
Comment 16 Marc Dietrich 2010-06-03 02:52:38 UTC
Magnus: disabling tiling is not a real solution. 

Alex:
case 1: stable
case 2: crash

I'm still wondering why running plymouth before seems to cure the problem.
Comment 17 Magnus Jensen 2010-06-03 03:14:59 UTC
Marc: 
Well that's my bad, it's not a solution. But it seems the problem is there even if tiling is turned off.

Alex:

Same results as Marc: stable in case 1, not in case 2.

When it crashes, half the screen shows what looks like the dmesg output from just before fbcon starts (the boot messages that are lost when the radeon module loads). Then the screen goes black and only the pointer is left on screen.
Comment 18 Magnus Jensen 2010-06-03 07:22:27 UTC
I tried building radeon into the kernel, and now case 2 behaves a bit differently.
It crashes just once when starting X, then seems to work fine for the rest of the session, and looks stable after a warm boot (so far, so good).
Comment 19 Alex Deucher 2010-06-03 08:29:38 UTC
In the case 2 crash, does the system hang or do you get a kernel oops?  Can you still access the machine over the network?  Is there any chance you could boot up without loading the radeon kernel module then load it manually after you've booted?  Also, is there any chance you can get the xorg log and dmesg from case 2?
Comment 20 Magnus Jensen 2010-06-03 13:43:17 UTC
(In reply to comment #19)
> In the case 2 crash, does the system hang or do you get a kernel oops?  Can you
> still access the machine over the network?  Is there any chance you could boot
> up without loading the radeon kernel module then load it manually after you've
> booted?  Also, is there any chance you can get the xorg log and dmesg from case
> 2?

OK, I compiled radeon as a module (the kernel still has the patch from bug #28327) so I can blacklist it and load it from the console before starting X.
All patches are in userspace, so this is the case 2 test again.
It does not hang; it crashes randomly, giving GPU lockup messages in dmesg.
And it really crashes RANDOMLY: sometimes when starting X it crashes over and over, making X unusable; sometimes it crashes just once at X startup and then seems stable after that. Very strange.
I think building radeon into the kernel gives far fewer crashes.
I attach what you wanted from this test.
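
A rough sketch of the procedure used for this test, assuming a modprobe.d-style blacklist. MODPROBE_DIR, LOG_DIR and the two helper functions are hypothetical names introduced here. The defaults point at scratch directories so the script is harmless to run as-is; on a real system the blacklist file would normally live in /etc/modprobe.d, and the helpers need root.

```shell
#!/bin/sh
# Hypothetical knobs: set MODPROBE_DIR=/etc/modprobe.d on a real system.
MODPROBE_DIR="${MODPROBE_DIR:-./modprobe.d}"
LOG_DIR="${LOG_DIR:-./case2-logs}"

mkdir -p "$MODPROBE_DIR" "$LOG_DIR"

# Keep radeon from being loaded automatically at boot.
printf 'blacklist radeon\n' > "$MODPROBE_DIR/blacklist-radeon.conf"

# Run from the console after booting without radeon (needs root).
load_radeon() {
    modprobe radeon
}

# Run after X has started and crashed, to collect the logs for the report.
capture_logs() {
    dmesg > "$LOG_DIR/dmesg.txt"
    cp /var/log/Xorg.0.log "$LOG_DIR/Xorg.0.log"
}
```

Neither helper is called by the script itself: load_radeon is meant to be run by hand before starting X, and capture_logs once the lockup messages have shown up in dmesg.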
Comment 21 Magnus Jensen 2010-06-03 13:43:49 UTC
Created attachment 36037 [details]
/var/log/dmesg
Comment 22 Magnus Jensen 2010-06-03 13:44:16 UTC
Created attachment 36038 [details]
output from dmesg
Comment 23 Magnus Jensen 2010-06-03 13:44:47 UTC
Created attachment 36039 [details]
/var/log//Xorg.0.log
Comment 24 Alex Deucher 2010-06-03 13:55:55 UTC
Created attachment 36040 [details] [review]
diable createpixmap2

Does this ddx patch help?  try case 2 with this patch applied to the ddx on top of the tiling patches.
Comment 25 Magnus Jensen 2010-06-03 14:52:26 UTC
Created attachment 36042 [details]
dmesg after patch

This is after the patch; it still crashes.
Comment 26 Magnus Jensen 2010-06-03 14:52:59 UTC
Created attachment 36043 [details]
Xorg.0.log after patch
Comment 27 Magnus Jensen 2010-06-03 14:54:47 UTC
Created attachment 36044 [details]
dmesg before was wrong this is with right kernel
Comment 28 Alex Deucher 2010-06-03 15:34:00 UTC
Created attachment 36045 [details] [review]
emit DB_DEPTH_INFO

Try this ddx patch in case 2 instead of the last patch I attached.
Comment 29 Magnus Jensen 2010-06-03 23:43:31 UTC
(In reply to comment #28)
> Created an attachment (id=36045) [details]
> emit DB_DEPTH_INFO
> 
> Try this ddx patch in case 2 instead of the last patch I attached.

Yes, that fixes it for me!

Thanks!
Comment 30 Marc Dietrich 2010-06-06 09:00:05 UTC
The last patch also fixes X startup here, but now I'm hit by

https://bugs.freedesktop.org/show_bug.cgi?id=28381
Comment 31 Fabio Pedretti 2010-11-05 07:03:37 UTC
The patch was merged to ddx.

