Bug 55692 - [KMS][Cayman] Garbled screen and oops with 6950 with linus git from 20121006 (3.7-rc0)
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2012-10-06 11:48 UTC by Serkan Hosca
Modified: 2013-01-31 13:37 UTC

Attachments
dmesg.3.7.0-rc0 (77.64 KB, text/plain), 2012-10-06 11:48 UTC, Serkan Hosca
dmesg.3.7.0-rc0 with irqpoll (76.56 KB, text/plain), 2012-10-06 11:49 UTC, Serkan Hosca
oops pic (1.52 MB, image/jpeg), 2012-10-06 11:49 UTC, Serkan Hosca
Xorg.0.log with 3.7-rc0 (58.39 KB, text/plain), 2012-10-06 11:53 UTC, Serkan Hosca
Xorg.0.log with 3.6 (69.64 KB, text/plain), 2012-10-06 11:55 UTC, Serkan Hosca
dmesg with 3.6 (70.30 KB, text/plain), 2012-10-06 12:07 UTC, Serkan Hosca
Possible fix. (2.23 KB, text/plain), 2012-10-13 10:31 UTC, Christian König
Possible fix rebased on correct branch. (2.23 KB, text/plain), 2012-10-13 10:37 UTC, Christian König
dmesg linus git with patch (76.96 KB, text/plain), 2012-10-13 12:54 UTC, Serkan Hosca
dmesg with alex's drm-next-3.7 branch with patch (75.76 KB, text/plain), 2012-10-13 19:33 UTC, Serkan Hosca
Test patch. (911 bytes, patch), 2012-10-16 15:09 UTC, Christian König
dmesg.3.7-rc1 with test patch (114.78 KB, text/plain), 2012-10-16 23:38 UTC, Serkan Hosca
dmesg.3.7-rc1 with testpatch with mesa-git (73.22 KB, text/plain), 2012-10-17 22:11 UTC, Serkan Hosca
Possible fix. (2.14 KB, patch), 2012-10-22 09:00 UTC, Christian König
dmesg-3.6+drm-next-3.7 (71.58 KB, text/plain), 2012-10-22 22:39 UTC, Serkan Hosca
dmesg-3.6+drm-next-3.7+patch (68.64 KB, text/plain), 2012-10-22 22:40 UTC, Serkan Hosca

Description Serkan Hosca 2012-10-06 11:48:16 UTC
I booted a freshly compiled Linus git kernel from 20121006. gdm starts, but the screen is all black, and after a couple of seconds it is all garbage.

When I switch to VT 1 and try restarting gdm, I get the oops.

xf86-video-ati git from 20121004
mesa git from 20121004

Arch with kernel 3.6 works fine.
Comment 1 Serkan Hosca 2012-10-06 11:48:45 UTC
Created attachment 68153 [details]
dmesg.3.7.0-rc0
Comment 2 Serkan Hosca 2012-10-06 11:49:17 UTC
Created attachment 68154 [details]
dmesg.3.7.0-rc0 with irqpoll
Comment 3 Serkan Hosca 2012-10-06 11:49:47 UTC
Created attachment 68155 [details]
oops pic
Comment 4 Serkan Hosca 2012-10-06 11:53:30 UTC
Created attachment 68156 [details]
Xorg.0.log with 3.7-rc0
Comment 5 Serkan Hosca 2012-10-06 11:55:36 UTC
Created attachment 68157 [details]
Xorg.0.log with 3.6
Comment 6 Serkan Hosca 2012-10-06 12:07:02 UTC
Created attachment 68159 [details]
dmesg with 3.6
Comment 7 Alex Deucher 2012-10-06 12:51:20 UTC
Can you bisect to locate the problematic commit?
Comment 8 Serkan Hosca 2012-10-06 13:40:34 UTC
Here it is:
2a6f1abbb48f1d90f20b8198c4894c0469468405 is the first bad commit
commit 2a6f1abbb48f1d90f20b8198c4894c0469468405
Author: Christian König <deathsimple@vodafone.de>
Date:   Sat Aug 11 15:00:30 2012 +0200

    drm/radeon: make page table updates async v2
    
    Currently doing the update with the CP.
    
    v2: Rebased on Jeromes bugfix. Make validity comparison
        more human readable.
    
    Signed-off-by: Christian König <deathsimple@vodafone.de>

:040000 040000 3ed3f64bd42f5f1000ab9e957df08f53e81e09d9 c5143cbc30add8e3472366fbdb84756d9cdcd035 M	drivers
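
For readers unfamiliar with the workflow, a bisection like the one that produced the result above can be sketched on a throwaway repository. This is only an illustration under stated assumptions: the temporary repository, the `state` and `counter` files, and the `bisect_demo` function are all hypothetical, and a real kernel bisect replaces the automated `grep` check with building, booting, and marking each step good or bad by hand.

```shell
#!/bin/sh
# Sketch: bisect a throwaway repository (hypothetical, not the kernel tree)
# to find the first commit where a check starts failing.
bisect_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        git config user.name "bisect demo"
        git config user.email "demo@example.invalid"
        # Ten commits; commit 7 introduces the "regression".
        i=1
        while [ "$i" -le 10 ]; do
            if [ "$i" -ge 7 ]; then echo bad > state; else echo good > state; fi
            echo "$i" > counter
            git add state counter
            git commit -q -m "commit $i"
            i=$((i + 1))
        done
        good=$(git rev-list --max-parents=0 HEAD)   # root commit, known good
        # Arguments are: bad revision first, then the good revision.
        git bisect start HEAD "$good" >/dev/null 2>&1
        # "git bisect run" treats exit status 0 as good and 1..127
        # (except 125) as bad; grep -q fits that convention.
        git bisect run grep -q good state >/dev/null 2>&1
        # refs/bisect/bad now points at the first bad commit.
        git log -1 --format=%s refs/bisect/bad
        git bisect reset >/dev/null 2>&1
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}

bisect_demo   # prints "commit 7"
```

On the kernel tree the equivalent start would be `git bisect start`, `git bisect bad` on the broken revision, and `git bisect good v3.6`, followed by one `git bisect good` or `git bisect bad` per tested build.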
Comment 9 Christian König 2012-10-07 13:23:19 UTC
Mhm, interesting. You get a GPU lockup, but not a pagefault.

Need to look deeper into it, but this looks rather strange to me.
Comment 10 Christian König 2012-10-13 10:31:32 UTC
Created attachment 68515 [details]
Possible fix.

Could you try the attached patch on top of Alex's latest drm-next-3.7 branch (git://people.freedesktop.org/~agd5f/linux)?

I'm not 100% sure that it's this problem, but it might be it.

Thanks,
Christian.
Comment 11 Christian König 2012-10-13 10:37:44 UTC
Created attachment 68516 [details]
Possible fix rebased on correct branch.
Comment 12 Serkan Hosca 2012-10-13 12:18:36 UTC
Yes the patch works.
Comment 13 Serkan Hosca 2012-10-13 12:52:50 UTC
(In reply to comment #12)
> Yes the patch works.

I'm sorry, I spoke too soon. Same problem.
Comment 14 Serkan Hosca 2012-10-13 12:54:43 UTC
Created attachment 68519 [details]
dmesg linus git with patch
Comment 15 Serkan Hosca 2012-10-13 19:31:08 UTC
I've tried the patch on the drm-next-3.7 branch of git://people.freedesktop.org/~agd5f/linux and it doesn't work. gdm sets the blue background image and freezes, with no top bar or login dialog. When I ssh in from another computer, dmesg is clean at this point. When I try to stop gdm, it displays some garbage (a mostly black screen with some vertical purple bars about 4 cm thick, about 2 cm from the top of the screen), then the GPU crash messages appear in the log, and then the console comes back.
Comment 16 Serkan Hosca 2012-10-13 19:33:46 UTC
Created attachment 68531 [details]
dmesg with alex's drm-next-3.7 branch with patch
Comment 17 Serkan Hosca 2012-10-14 04:30:59 UTC
It works with Linus git without the patch when using the Arch packages for mesa 9.0-1 and -ati 6.14.6-2.

I tried -ati git with mesa 9 and it worked. Then I tried mesa git and it failed. I started to bisect mesa but got the following:
$ git bisect bad
Bisecting: a merge base must be tested
[2d2f1fd164218eacf2b142bc808be1f25f66e72c] docs: Add some missing features to 9.0 release notes and GL3.txt

$ git bisect bad
The merge base 2d2f1fd164218eacf2b142bc808be1f25f66e72c is bad.
This means the bug has been fixed between 2d2f1fd164218eacf2b142bc808be1f25f66e72c and [e5fdeef1e08b55acd48dc68f0cc8fe213f2820b8].

So I ran git log --graph --oneline --all and checked out the commits between those two one by one: everything from 2d2f1fd to de92b7a is bad, and it started working with commit "ef557ea winsys/radeon: disable virtual memory on Cayman".
Comment 18 Alexandre Demers 2012-10-15 14:27:57 UTC
Is VM enabled or disabled on your system? I'm experiencing a similar bug with kernel 3.7-rc1, but it is working fine with 3.6. VM is enabled on my system. I'll try to disable it when I get home to see if that helps, and I'll also try to bisect the kernel commit that screwed things up for me.
Comment 19 Serkan Hosca 2012-10-15 14:31:06 UTC
(In reply to comment #18)
> Is VM enabled or disabled on your system? I'm experiencing a similar bug
> with kernel 3.7-rc1, but it is working fine with 3.6. VM is enabled on my
> system, I'll try to disable it when I'll get home to see if that helps and
> I'll also try to bisect the kernel commit that screwed things for me.

I don't know. How can I check?
Comment 20 Serkan Hosca 2012-10-15 14:49:08 UTC
mesa-git is working fine on Linux 3.6, and that mesa-git build doesn't have the "ef557ea winsys/radeon: disable virtual memory on Cayman" commit.
Comment 21 Alexandre Demers 2012-10-15 15:22:43 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > Is VM enabled or disabled on your system? I'm experiencing a similar bug
> > with kernel 3.7-rc1, but it is working fine with 3.6. VM is enabled on my
> > system, I'll try to disable it when I'll get home to see if that helps and
> > I'll also try to bisect the kernel commit that screwed things for me.
> 
> I don't know, how can i check?

Use "env" (or "setenv" without arguments in csh) in a terminal and look for "RADEON_VA".
Comment 22 Serkan Hosca 2012-10-15 15:27:26 UTC
(In reply to comment #21)
> (In reply to comment #19)
> > (In reply to comment #18)
> > > Is VM enabled or disabled on your system? I'm experiencing a similar bug
> > > with kernel 3.7-rc1, but it is working fine with 3.6. VM is enabled on my
> > > system, I'll try to disable it when I'll get home to see if that helps and
> > > I'll also try to bisect the kernel commit that screwed things for me.
> > 
> > I don't know, how can i check?
> 
> Use "setenv" in a terminal and look for "RADEON_VA".

Oh, I have nothing like that in env.
Comment 23 Christian König 2012-10-16 15:09:39 UTC
Created attachment 68623 [details] [review]
Test patch.

VM is definitely enabled, otherwise you wouldn't get that error in the first place.

OK, let's try to narrow down that bug a bit more. Please apply the attached test patch and see what happens.

If the GPU hang vanishes, we indeed have a syncing issue, but not the PFP sync.
Comment 24 Serkan Hosca 2012-10-16 23:37:55 UTC
(In reply to comment #23)
> Created attachment 68623 [details] [review] [review]
> Test patch.
> 
> VM is definitely enabled, otherwise you won't got that error in the first
> place.
> 
> Ok let's try to narrow down that bug a bit more, please apply the attached
> test patch and see what happens.
> 
> If the GPU hang vanished we indeed have a syncing issue, but not the PFP
> sync.

With the patch the GPU resets constantly, even without X, on both Linus git and the agd5f drm-next-3.7 branch, with mesa git.
Comment 25 Serkan Hosca 2012-10-16 23:38:31 UTC
Created attachment 68655 [details]
dmesg.3.7-rc1 with test patch
Comment 26 Alexandre Demers 2012-10-17 03:44:21 UTC
(In reply to comment #23)
> Created attachment 68623 [details] [review] [review]
> Test patch.
> 
> VM is definitely enabled, otherwise you won't got that error in the first
> place.
> 
> Ok let's try to narrow down that bug a bit more, please apply the attached
> test patch and see what happens.
> 
> If the GPU hang vanished we indeed have a syncing issue, but not the PFP
> sync.

It is and it is not. What I mean is concerning comment 17 "So i did a git log --graph --oneline --all and started to git checkout between those two commits, starting from 2d2f1fd to de92b7a are bad and with commit "ef557ea winsys/radeon: disable virtual memory on Cayman" it started working."

If the variable "RADEON_VA" is not set or doesn't exist, VM gets disabled from the point where commit "ef557ea" kicks in. Before that commit, VM is always enabled; from that point on, we must be careful. If we want to test after commit "ef557ea" with VM enabled, "RADEON_VA" MUST be set, otherwise VM will be disabled and will hide the bug.
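
A minimal sketch of the check described above, assuming a POSIX shell. It only reports whether RADEON_VA is present in the current shell; the function name `radeon_va_status` is made up for illustration, and which values Mesa accepts for the variable is not stated in this report.

```shell
#!/bin/sh
# Report whether RADEON_VA is set in the current shell.
# ${RADEON_VA+x} expands to "x" when the variable is set (even to an empty
# string) and to nothing when it is unset.
radeon_va_status() {
    if [ -n "${RADEON_VA+x}" ]; then
        echo "RADEON_VA is set to '${RADEON_VA}'"
    else
        echo "RADEON_VA is not set"
    fi
}

radeon_va_status
```

Note that the variable has to be exported in the environment that the X session or GL application is started from; setting it in an unrelated terminal afterwards does not affect processes that are already running.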
Comment 27 Christian König 2012-10-17 14:10:09 UTC
Well that's interesting, according to the logs you are running out of GART memory (which is 512MB in size) just 7 seconds after boot, and that is really odd.

Could you please tell me what the heck you're doing to run out of memory? Is there some kind of animated splash screen running or something like that?

I think that this problem shows up when you're tight on memory AND try to use VM at the same time. Probably we're missing some return value check or something like this.

Anyway, as Alexandre Demers pointed out simply disabling VM should also help.

In the meantime I will try to test the VM implementation under memory pressure, maybe that will yield some results.

Cheers,
Christian.
Comment 28 Serkan Hosca 2012-10-17 14:16:50 UTC
(In reply to comment #27)
> Well that's interesting, according to the logs you are running out of GART
> memory (which is 512MB in size) just 7 seconds after boot, and that is
> really odd.
> 
> Could you please tell me what the heck you're doing to run out of memory? Is
> there some kind of animated splash screen running or something like that?
> 
> I think that this problem shows up when you're tight on memory AND try to
> use VM at the same time. Probably we're missing some return value check or
> something like this.
> 
> Anyway, as Alexandre Demers pointed out simply disabling VM should also help.
> 
> In the meantime I will try to test the VM implementation under memory
> pressure, maybe that will yield some results.
> 
> Cheers,
> Christian.

I don't have anything graphical running during boot. I have radeon in mkinitcpio MODULES, no plymouth or anything, just the console; that sets up the mode and then it goes straight to gdm.
Comment 29 Alexandre Demers 2012-10-17 15:41:30 UTC
I haven't had time to dig into it, but just to let you know, I'm pretty much in the same situation as Serkan with a very similar config. I don't think it has to do with something using too much memory, but rather with memory not being released/allocated correctly in the first place. Otherwise, why would it work with kernel 3.6 and not 3.7 if only the kernel version changes?

I should have time to look at it tonight.
Comment 30 Jerome Glisse 2012-10-17 15:58:05 UTC
Well, the log for comment #25 shows out of memory, which should not happen. It looks like it's the framebuffer that tries to go into GTT, but that doesn't make sense (16M is the fb size according to the log).
Comment 31 Serkan Hosca 2012-10-17 16:02:05 UTC
(In reply to comment #29)
> I haven't had time to dig it, but just to let you know I'm pretty much in
> the same situation as Serkan with a very similar config. I don't think it
> has to do with something using too much memory, but more about not
> releasing/attributing it correctly in the first place. Otherwise, why would
> it work with kernel 3.6 and not 3.7 if only kernel version is in the balance?
> 
> I should have time to look at it tonight.

I think the GART memory issue is because of my recent update to gnome 3.6; I didn't see that with gdm 3.4. The machine also boots very fast now after the systemd upgrade: from grub to gdm I would say it's about 5~7 seconds. Also, when grub starts, the screen stays at the console login prompt with the mouse cursor available, and it takes about 2~3 seconds until gdm starts doing its fading thing to the login prompt.

I will try to revert it and test again when I get home.
Comment 32 Jerome Glisse 2012-10-17 18:15:21 UTC
Another explanation might be that the gdm admin queues a bunch of animations in the form of big BOs and thus fills up the GART before the first GPU lockup has had a chance to be detected.
Comment 33 Serkan Hosca 2012-10-17 18:17:23 UTC
(In reply to comment #32)
> Other explanation might be that the gdm admin queue a bunch of animation in
> form of big bo and thus fill up the gart before the first gpu lockup had a
> chance to be detected.

I'll try lightdm or straight startx from console too
Comment 34 Alexandre Demers 2012-10-17 19:45:35 UTC
If Serkan and I are experiencing the same problem, as I suspect, I would say this is unlikely to be related to Gnome 3.6, because I'm still using 3.4 (with both kernel 3.6 and 3.7-rc1). We have the same GPU and we are not using plymouth. We are experiencing similar visual problems (I can't confirm with a remote connection for now) when moving to kernel 3.7-rcX, but not with 3.6.

I'll bisect the kernel tonight and keep you updated when I'm done.
Comment 35 Alexandre Demers 2012-10-17 22:06:56 UTC
I've been playing a bit (booting and restarting with kernel 3.7-rc1) and strangely, what I see is very similar to what I was observing in bug 43655. It was then merged with bug 42373. At the time, attachment 64759 [details] [review] was proposed and a similar patch ended up being committed that fixed bug 43655 for me (but it never fixed bug 42373 on NI CAICOS).

I'll try the workaround used at the time to see if it is really related to bug 43655 (comments 8 and 10) and I'll begin bisecting kernel right after.
Comment 36 Alex Deucher 2012-10-17 22:10:38 UTC
(In reply to comment #35)
> I've been playing a bit (booting and restarting with kernel 3.7-rc1) and
> strangely, what I see is very similar to what I was observing in bug 43655.
> It was then merged with bug 42373. At the time, attachment 64759 [details] [review]
> [review] was proposed and a similar patch ended up being commited that fixed
> bug 43655 for me (but it never fixed bug 42373 on NI CAICOS).
> 
> I'll try the workaround used at the time to see if it is really related to
> bug 43655 (comments 8 and 10) and I'll begin bisecting kernel right after.

I think what you really want for your caicos is this patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=62444b7462a2b98bc78d68736c03a7c4e66ba7e2
Comment 37 Serkan Hosca 2012-10-17 22:11:14 UTC
Created attachment 68728 [details]
dmesg.3.7-rc1 with testpatch with mesa-git

I removed gdm and installed slim as the login manager. I also installed cinnamon as a replacement for gnome, and it worked fine the first time with Linus git with the test patch and mesa git. I restarted slim and logged in again, and there were some font corruptions; I restarted cinnamon and they were gone. I tried Google Maps with WebGL enabled and it was working fine.

After that I edited my .xinitrc to start gnome, restarted slim, and logged in, but it failed and I got the error window saying "oh no, something has gone wrong" with a log out button. I checked dmesg at that point and saw the TTM GART memory error. I switched back to cinnamon, logged in, and got the same font corruptions; restarting cinnamon fixed them.
Comment 38 Serkan Hosca 2012-10-17 22:43:33 UTC
Using the same linus git kernel with test patch and mesa git, I've reverted gnome to 3.4, kept slim as login manager, logged in to gnome, it worked fine, no errors in dmesg. I stopped slim, installed gdm and started it and logged in without any errors.

I disabled slim, enabled gdm instead, and rebooted the computer. The gdm login came up, I logged in, and it worked fine.
Comment 39 Alexandre Demers 2012-10-17 22:53:53 UTC
(In reply to comment #36)
> (In reply to comment #35)
> > I've been playing a bit (booting and restarting with kernel 3.7-rc1) and
> > strangely, what I see is very similar to what I was observing in bug 43655.
> > It was then merged with bug 42373. At the time, attachment 64759 [details] [review] [review]
> > [review] was proposed and a similar patch ended up being commited that fixed
> > bug 43655 for me (but it never fixed bug 42373 on NI CAICOS).
> > 
> > I'll try the workaround used at the time to see if it is really related to
> > bug 43655 (comments 8 and 10) and I'll begin bisecting kernel right after.
> 
> I think what you really want for your caicos is this patch:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;
> h=62444b7462a2b98bc78d68736c03a7c4e66ba7e2

You misunderstood me. I'm using a 6950 (not CAICOS) and it was working great with commit http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=81ee8fb6b52ec69eeed37fe7943446af1dccecc5 for my Cayman (but not for CAICOS according to reporter of bug 42373) included in kernel 3.6. What I'm saying is that the symptoms I'm now seeing with 3.7-rc1 are similar to what I was seeing at the time, but it was fixed in 3.6.

Now, about the patch you propose, it is already included in kernel 3.7-rc1 according to commit history. Since I'm experiencing bug 55692 with 3.7-rc1, the proposed patch can't be the cure. I'm bisecting right now between kernel 3.6 and 3.7-rc1. If it appears to be a different bug than 55692, I'll open a new one.
Comment 40 Alexandre Demers 2012-10-18 06:29:49 UTC
(In reply to comment #36)
> (In reply to comment #35)
> > I've been playing a bit (booting and restarting with kernel 3.7-rc1) and
> > strangely, what I see is very similar to what I was observing in bug 43655.
> > It was then merged with bug 42373. At the time, attachment 64759 [details] [review] [review]
> > [review] was proposed and a similar patch ended up being commited that fixed
> > bug 43655 for me (but it never fixed bug 42373 on NI CAICOS).
> > 
> > I'll try the workaround used at the time to see if it is really related to
> > bug 43655 (comments 8 and 10) and I'll begin bisecting kernel right after.
> 
> I think what you really want for your caicos is this patch:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;
> h=62444b7462a2b98bc78d68736c03a7c4e66ba7e2

Kernel bisected. Here is the culprit commit from what I see here:
62444b7462a2b98bc78d68736c03a7c4e66ba7e2 is the first bad commit
commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Aug 15 17:18:42 2012 -0400

    drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)
    
    - Stop the displays from accessing the FB
    - Block CPU access
    - Turn off MC client access
    
    This should fix issues some users have seen, especially
    with UEFI, when changing the MC FB location that result
    in hangs or display corruption.
    
    v2: fix crtc enabled check noticed by Luca Tettamanti
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 3e0d33c9b4eda29ced814fe9a863efe63e53f14c 4932561607b160734ec1eade927a9fe18c9f3f1b M	drivers

So I may not be hitting the same bug as Serkan. Where should I track this faulty commit/bug? In the NI CAICOS bug or in a new one?
Comment 41 Christian König 2012-10-18 08:33:51 UTC
(In reply to comment #37)
> Created attachment 68728 [details]
> dmesg.3.7-rc1 with testpatch with mesa-git
> 
> I removed gdm and installed slim as login manager. Also installed cinnamon
> as a replacement for gnome and it works fine the first round with linus git
> with the test patch and mesa git. Restarted slim and logged in again and
> there were some font corruptions, i restarted cinnamon and they were gone. I
> tried google maps with webgl enabled and it was working fine.
> 
> After that i edited my .xinitrc to startup gnome, restarted slim and logged
> in but it failed and got the error window saying oh no something has gone
> wrong and a log out button. I checked dmesg at that point and saw the ttm
> gart memory error. i switched back to cinnamon logged in and got the same
> font corruptions, restarting cinnamon fixed them.

Thanks a lot for your additional testing. As I suspected, we are really facing two problems here:

1. The new gnome/gdm versions seem to trigger an out-of-memory situation in the GART memory area. That's probably because of some miscalculation or memory leak or something like this, and should be handled as a separate bug.

BTW: You can take a look at the current memory allocations with:
    sudo cat /sys/kernel/debug/dri/0/radeon_gtt_mm
and
    sudo cat /sys/kernel/debug/dri/0/radeon_vram_mm

2. Properly updating the page table asynchronously somehow fails under high memory pressure.

I will try to look into problem 2 first, since that got added with my patch. But problem number 1 is equally bad.

I don't think we just spool up a lot of drawing operations like Jerome suspected, because in this case TTM would just block on previous render operations to complete. It looks more like we are submitting a single draw operation with multiple ~16MB chunks of memory that is so big that it just won't fit into the GART memory altogether.
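
The two debugfs files mentioned in this comment can be read in one go with a small helper. This is a sketch only: the function name `dump_radeon_mm` is hypothetical, the paths are the ones given above, and it assumes debugfs is mounted at /sys/kernel/debug and that the script runs as root (for example via sudo).

```shell
#!/bin/sh
# Print the radeon GTT and VRAM allocation tables from debugfs.
# The directory is a parameter so that a different DRI minor can be
# inspected; /sys/kernel/debug/dri/0 is the path given in the comment.
dump_radeon_mm() {
    dir=${1:-/sys/kernel/debug/dri/0}
    for name in radeon_gtt_mm radeon_vram_mm; do
        file=$dir/$name
        echo "== $name =="
        if [ -r "$file" ]; then
            cat "$file"
        else
            echo "cannot read $file (debugfs not mounted, or not running as root?)"
        fi
    done
}

dump_radeon_mm
```

Capturing the output once right after boot and once after the GART error appears should make it easier to see which allocations grew in between.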
Comment 42 Christian König 2012-10-18 08:40:19 UTC
(In reply to comment #40)
[SNIP]
> kernel bisected. Here is the culprit commit from what I see here:
> 62444b7462a2b98bc78d68736c03a7c4e66ba7e2 is the first bad commit
> commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2
> Author: Alex Deucher <alexander.deucher@amd.com>
> Date:   Wed Aug 15 17:18:42 2012 -0400
> 
>     drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)
>     
>     - Stop the displays from accessing the FB
>     - Block CPU access
>     - Turn off MC client access
>     
>     This should fix issues some users have seen, especially
>     with UEFI, when changing the MC FB location that result
>     in hangs or display corruption.
>     
>     v2: fix crtc enabled check noticed by Luca Tettamanti
>     
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
> :040000 040000 3e0d33c9b4eda29ced814fe9a863efe63e53f14c
> 4932561607b160734ec1eade927a9fe18c9f3f1b M	drivers
> 
> So it may not be the same bug I'm hitting as Serkan is. Where should I track
> this faulty commit/bug? In the NI CAICOS bug or in a new one?

That indeed looks like a separate bug to me, so I suggest opening a new bug.
Comment 43 Christian König 2012-10-19 12:46:37 UTC
Good news! I figured out what it is (the crash not the memory problem) and can reproduce it.

A patch fixing this shouldn't be too much of a problem any more, but I don't think I will have time to fix it before Monday.

So please be patient for a couple more days.
Comment 44 Serkan Hosca 2012-10-19 22:39:42 UTC
(In reply to comment #43)
> Good news! I figured out what it is (the crash not the memory problem) and
> can reproduce it.
> 
> A patch fixing this shouldn't be to much of a problem any more, but I don't
> think I will have time to fix it before Monday.
> 
> So please be patient for a couple of more days.

That's cool. I found out what triggers the GART error: I had gtk-redshift in my session startup. After removing it, the TTM error is gone. It redshifts the screen colors so that they are easier on the eyes, and when it starts it applies the redshift gradually.

Also, I have been playing around with the RADEON_VA variable, but I can't trigger the GPU stall anymore. I get some graphical corruption and a couple of these instead:

[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!

After a shell restart, the glitches go away.
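
The -12 in the message above is a negated Linux errno value: 12 is ENOMEM ("Cannot allocate memory"), which is consistent with the GART out-of-memory condition discussed earlier in this bug. A small helper for decoding such values from kernel log lines could look like the sketch below; the function name `errno_name` is made up for illustration and the table covers only a handful of common Linux values.

```shell
#!/bin/sh
# Map a (possibly negated) Linux errno value from a kernel log line to its
# symbolic name. Only a few common values are covered.
errno_name() {
    code=${1#-}          # strip a leading minus sign, e.g. -12 -> 12
    case $code in
        2)  echo ENOENT ;;
        4)  echo EINTR ;;
        11) echo EAGAIN ;;
        12) echo ENOMEM ;;
        14) echo EFAULT ;;
        16) echo EBUSY ;;
        22) echo EINVAL ;;
        *)  echo "unknown ($code)" ;;
    esac
}

errno_name -12   # prints ENOMEM
```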
Comment 45 Serkan Hosca 2012-10-20 18:23:08 UTC
> Thats cool. I found out what triggers the gart error. I had gtk-redshift on
> session start up. After removing that the ttm error is gone. It redshifts
> the screen colors so that it is easy on the eyes and when its started it
> starts the redshifting gradually.
> 

Scratch that. I removed redshift but the GART error happened again. It's not the gdm startup, though; it happens during gnome session startup.
Comment 46 Christian König 2012-10-22 09:00:08 UTC
Created attachment 68906 [details] [review]
Possible fix.

OK, please try the attached patch. It should fix the issue with the original "async page table updates" patch.

Please note that Alex's current drm-fixes-3.7 branch already contains another patch that also masks this problem, so please test with the original drm-next-3.7 branch.

I've submitted a series of patches that should fix and clean up the code.
Comment 47 Serkan Hosca 2012-10-22 22:39:15 UTC
(In reply to comment #46)
> Created attachment 68906 [details] [review] [review]
> Possible fix.
> 
> Ok, please try the attached patch. It should fix the issue with the original
> "async page table updates patch".
> 
> Please note that Alex current drm-fixes-3.7 branch already contains another
> patch that is also masquerading this problem, so please test with the
> original drm-next-3.7 branch.
> 
> I've submitted a series of patches that should fix and cleanup the code.

Yes, the patch works. I checked out v3.6, merged Alex's drm-next-3.7 branch on top, and tested with mesa-git and ati-git. Because of the gnome update I don't get exactly the same dmesg errors, but the result is the same: the GPU just stalls when you try to log in.

After the patch, I am able to log in. I still get a couple of relocation errors and some glitches, which disappear after restarting gnome shell.
Comment 48 Serkan Hosca 2012-10-22 22:39:46 UTC
Created attachment 68932 [details]
dmesg-3.6+drm-next-3.7
Comment 49 Serkan Hosca 2012-10-22 22:40:13 UTC
Created attachment 68933 [details]
dmesg-3.6+drm-next-3.7+patch
Comment 50 Alex Deucher 2012-10-22 22:42:55 UTC
You'll probably want the updated version of the patch here:
http://lists.freedesktop.org/archives/dri-devel/2012-October/029292.html
Comment 51 Alexandre Demers 2013-01-05 19:43:57 UTC
Since the patch was submitted and applied on kernel 3.7, should this bug be closed?
Comment 52 Serkan Hosca 2013-01-05 20:16:36 UTC
Yes, this is fixed.