Bug 58272

Summary: Rv670 AGP drm-next ttm errors
Product: DRI
Component: DRM/Radeon
Reporter: Andy Furniss <adf.lists>
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
Severity: normal
Priority: medium
CC: bugs, florian
Version: XOrg git
Hardware: x86 (IA32)
OS: Linux (All)
Attachments:
Description                                                        Flags
errors in kern log with dma                                        none
errors in kern log before dma changes                              none
fix                                                                none
gpu lock + oops on use async dma for ttm buffer moves on 6xx-SI    none

Description Andy Furniss 2012-12-14 01:03:31 UTC
Created attachment 71476
errors in kern log with dma

I haven't had time to test the latest drm-next, but I'm posting this now as I may be AFK tomorrow.

After finding a point in Mesa history where etqw seems OK with drm-fixes, I am getting errors with drm-next.

On yesterday's head + the wb patch I got attachment 71476.

With the tree reset to before the dma changes which required the patch -

drm/ttm: remove no_wait_reserve, v3

I got attachment 71477, with the last lines repeating for 400k lines and the log also getting filled with junk.
Comment 1 Andy Furniss 2012-12-14 01:04:48 UTC
Created attachment 71477
errors in kern log before dma changes
Comment 2 Andy Furniss 2012-12-14 01:09:45 UTC
Hmm, I see that using the word a t t a c h m e n t does strange things: attachments 1 and 2 are not mine.
Comment 3 Maarten Lankhorst 2012-12-14 07:40:39 UTC
It seems that ttm_mem_evict_first is called in a nested fashion far more often than is healthy. Could you resolve the address in ttm_mem_evict_first from which it ends up calling itself to a specific source line?
Comment 4 Maarten Lankhorst 2012-12-14 07:57:43 UTC
It looks nasty, though. Could you also dump mem_type each time ttm_mem_evict_first is called?
Comment 5 Maarten Lankhorst 2012-12-14 08:51:19 UTC
Do any of your local patches touch radeon_evict_flags or radeon_ttm_placement_from_domain? I don't see why it would recurse so deeply otherwise.

A full public git tree, so I can reproduce the problem and see which patches are applied, would also be nice.
Comment 6 Alex Deucher 2012-12-14 15:04:50 UTC
Created attachment 71507
fix

Should be fixed with this patch.
Comment 7 Andy Furniss 2012-12-14 21:55:52 UTC
(In reply to comment #6)
> Created attachment 71507
> fix
> 
> Should be fixed with this patch.

Probably :-)

It seems that current drm-next head + fix has a different issue, which makes etqw die quite quickly.

drm-next reset onto 

drm/ttm: remove no_wait_reserve, v3 + the fix 

is now stable with etqw.

The head issue is -

EE r600_texture.c:697 r600_texture_transfer_map - failed to create temporary texture to hold untiled copy
Mesa: User error: GL_OUT_OF_MEMORY in glTexSubImage
radeon: The kernel rejected CS, see dmesg for more information.
double fault: 'Segmentation fault', bailing out

in dmesg - 

[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
etqw.x86[2478]: segfault at 0 ip af5142ad sp bff8b310 error 4 in gamex86.so[af23f000+948000]
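The -12 in these kernel messages is a negated errno value, the usual kernel convention for failure returns. As a small sketch (decode_kernel_ret is a hypothetical helper, not part of radeon, TTM or Mesa), Python's standard errno table decodes it as ENOMEM, which is consistent with the GL_OUT_OF_MEMORY that Mesa reports above:

```python
import errno
import os


def decode_kernel_ret(ret: int) -> str:
    """Hypothetical helper: map a negative kernel return value to its errno name and message."""
    code = -ret  # kernel functions return -errno on failure, so negate to get errno
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"-{name} ({os.strerror(code)})"


# Decode the value seen in "Failed to parse relocation -12!"
print(decode_kernel_ret(-12))
```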
Comment 8 Maarten Lankhorst 2012-12-14 22:02:02 UTC
Could you please run a git bisection to see where that error has been introduced, then?
Comment 9 Andy Furniss 2012-12-15 01:01:14 UTC
Created attachment 71530
gpu lock + oops on use async dma for ttm buffer moves on 6xx-SI
Comment 10 Andy Furniss 2012-12-15 01:02:20 UTC
(In reply to comment #8)
> Could you please run a git bisection to see where that error has been
> introduced, then?

It seems that "drm/radeon: use async dma for ttm buffer moves on 6xx-SI" is the first non-working commit, but it fails differently from head. Log attached.
Comment 11 Florian Mickler 2012-12-22 09:23:15 UTC
A patch referencing this bug report has been merged in Linux v3.8-rc1:

commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Dec 14 21:04:46 2012 +1000

    radeon: fix regression with eviction since evict caching changes
Comment 13 Andy Furniss 2012-12-22 20:57:13 UTC
(In reply to comment #12)
> Make sure your kernel has this patch:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=0953e76e91f4b6206cef50bd680696dc6bf1ef99

I tested drm-next head when that went in and got the same results.

I've just rebuilt it to be sure: with etqw I get a segfault after about 10 seconds, and in dmesg -

[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!

I've also managed to reproduce the GPU lock + oops I reported earlier - this time with nexuiz on current drm-next head.

I am not getting ttm errors any more, so I guess this bug should be closed?
Comment 14 Andy Furniss 2013-01-03 11:52:17 UTC
(In reply to comment #13)

FWIW I tried current drm-next + patch -

0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch

etqw still fails after about 10 seconds, but I do get more info.

radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: Failed to allocate a buffer:
radeon:    size      : 7168 bytes
radeon:    alignment : 256 bytes
radeon:    domains   : 2
EE r600_texture.c:697 r600_texture_transfer_map - failed to create temporary texture to hold untiled copy
Mesa: User error: GL_OUT_OF_MEMORY in glTexSubImage
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
double fault: 'Segmentation fault', bailing out
shutdown terminal support
/home/andy/bin/etqw: line 1:  2472 Segmentation fault      /usr/local/games/etqw/etqw

dmesg -

[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
etqw.x86[2472]: segfault at 0 ip af5292ad sp bfbe3250 error 4 in gamex86.so[af254000+948000]
Comment 15 Andy Furniss 2013-01-23 11:17:11 UTC
Current drm-fixes is working for me now.

The remaining etqw issue was fixed by -

Revert "drm/radeon: do not move bo to different placement at each cs"
