Bug 67016

Summary: Lockup on piglit test vs-textureSize-compare with AMD 6950
Product: Mesa
Reporter: Martin Andersson <g02maran>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
CC: serkan, vmerlet
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
Attachments:
  dmesg
  possible fix
  only align the pt base to 32k

Description Martin Andersson 2013-07-18 00:08:38 UTC
I'm running kernel http://cgit.freedesktop.org/~agd5f/linux/?h=drm-fixes-3.11 (latest commit 444bddc4b9b3313a562cd3ba40f780fb82570f7d) and Mesa master (latest commit e4fdf1b008ce29c5b5a52985c586b61f35d31e4c).

When I run spec/glsl-1.30/execution/vs-textureSize-compare, my system locks up. I can't reboot or ssh into it; I have to power cycle the machine to reset it.

Nothing is printed to dmesg before it hangs. I have tried both with and without radeon.dpm=1; it makes no difference.

I did a bisect and it identified this commit:
http://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-3.11&id=1c01103cb90197900beb534911de558d7a43d0b3
Comment 1 Martin Andersson 2013-07-18 11:26:13 UTC
Perhaps Mesa needs to be updated after this commit. I have tried changing Mesa to align to 32768 instead of 4096. With some changes I can make the test pass a few times before it hangs. It does not hard lock every time with the changes, so I managed to get a dmesg. I don't know how useful it is, though, because of my (probably incorrect) changes.
Comment 2 Alex Deucher 2013-07-18 13:02:47 UTC
Does the test work reliably with that kernel commit reverted?  You don't need to adjust the alignment of anything in mesa.  I don't see how that commit would cause any regressions.  It just adjusts the alignment of VM page table blocks in the kernel driver.  Even if cayman doesn't need 32K VM page table alignment, over-aligning shouldn't hurt.
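
To illustrate the over-alignment argument above, here is a minimal sketch of power-of-two alignment arithmetic. It is not code from the radeon kernel driver; `align_up` and the constant names are hypothetical helpers chosen for this example. It shows that an address aligned to 32K is also aligned to 4K, which is why over-aligning would normally be expected to be harmless.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# Hypothetical names for the two alignments discussed in this bug.
ALIGN_4K = 4096
ALIGN_32K = 32768

addr_4k = align_up(0x12345, ALIGN_4K)    # rounds up to 0x13000
addr_32k = align_up(0x12345, ALIGN_32K)  # rounds up to 0x18000

# Any 32K-aligned address is also 4K-aligned, since 32768 is a multiple of 4096.
is_also_4k_aligned = addr_32k % ALIGN_4K == 0
```

Note that this only covers the address arithmetic; as the later comments show, code that derives sizes or increments from the alignment can still be affected.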
Comment 3 Martin Andersson 2013-07-18 13:22:01 UTC
(In reply to comment #2)
> Does the test work reliably with that kernel commit reverted?  You don't
> need to adjust the alignment of anything in mesa.  I don't see how that
> commit would cause any regressions.  It just adjusts the alignment of VM
> page table blocks in the kernel driver.  Even if cayman doesn't need 32K VM
> page table alignment, over-aligning shouldn't hurt.

I ran the test 500 times without fail with RADEON_VA=0 with that kernel commit.

I also ran the test 500 times without fail with that kernel commit reverted and RADEON_VA=1.
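
A repeated run like the ones described above could be scripted roughly as follows. This is a sketch under assumptions: `run_repeatedly` is a hypothetical helper, and the actual piglit command line for vs-textureSize-compare is not given in this report, so it has to be supplied by the caller. Only the `RADEON_VA` environment variable comes from the comment above.

```python
import os
import subprocess


def run_repeatedly(cmd, runs, env_overrides=None):
    """Run cmd up to `runs` times; return the 1-based index of the first
    failing run, or None if every run exits with status 0."""
    env = dict(os.environ)
    env.update(env_overrides or {})
    for i in range(1, runs + 1):
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            return i
    return None


# Example usage (the command is a placeholder, not a real piglit invocation):
#   run_repeatedly(["<piglit test command>"], 500, {"RADEON_VA": "0"})
```

A script like this can only report non-zero exit codes. A hard lockup of the kind described in this bug stops the script along with everything else, so the last completed iteration would have to be logged elsewhere (for example over the network) to be useful.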
Comment 4 Martin Andersson 2013-07-18 13:31:21 UTC
Created attachment 82599
dmesg

I managed to get a dmesg without my patches.
Comment 5 Alex Deucher 2013-07-18 13:33:28 UTC
Do you still get the issue with dpm disabled?
Comment 6 Michel Dänzer 2013-07-18 13:37:02 UTC
That commit causes problems (VM faults, lockups) on my Cape Verde card as well.

I suspect other kernel code needs to be adjusted for the increased buffer sizes, or something like that.
Comment 7 Martin Andersson 2013-07-18 13:37:35 UTC
(In reply to comment #5)
> Do you still get the issue with dpm disabled?

yes
Comment 8 Alex Deucher 2013-07-18 13:45:24 UTC
Ok.  I'll go ahead and revert it for now.
Comment 9 Martin Andersson 2013-07-18 15:29:38 UTC
Created attachment 82616
possible fix

This patch fixes the issue for me.
Comment 10 Alex Deucher 2013-07-18 16:13:20 UTC
I think that patch is correct.  We have to align the PTE block size as well.
Comment 11 Martin Andersson 2013-07-18 16:36:44 UTC
At least it works for me. I have run a complete piglit run (quick.tests) with that patch and dpm enabled, without issues.
Comment 12 Christian König 2013-07-18 17:04:29 UTC
(In reply to comment #9)
> Created attachment 82616
> possible fix
> 
> This patch fixes the issue for me.

The patch itself is correct, but the problem is it shouldn't be necessary!

When the increment is incorrect, we should just make a lot of small page directory updates instead of one big update. That is not as efficient, but it should still work.

Could you try setting the increment to something like 0xffffffff, which disables the accumulation of updates, and see whether it still fails?
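
The accumulation described above can be modelled roughly as follows. This is a simplified sketch, not the radeon driver's code: `coalesce_updates` is a hypothetical function that merges consecutive entries into one batched update whenever each address is exactly `incr` past the previous one. With an increment that never matches, such as 0xffffffff, every entry ends up as its own small update.

```python
def coalesce_updates(addresses, incr):
    """Group addresses into (start, count) runs, where each address in a
    run is exactly `incr` bytes past the previous one."""
    runs = []
    for addr in addresses:
        if runs and runs[-1][0] + runs[-1][1] * incr == addr:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)  # extend the current batched update
        else:
            runs.append((addr, 1))         # start a new update
    return runs


# Illustrative addresses only; not taken from the bug report.
entries = [0x0000, 0x1000, 0x2000, 0x8000]

batched = coalesce_updates(entries, 0x1000)        # three entries merge into one run
unbatched = coalesce_updates(entries, 0xffffffff)  # nothing merges
```

In this model the unbatched case issues more updates but writes the same entries, which matches the expectation that a wrong increment should cost efficiency rather than correctness.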
Comment 13 Alex Deucher 2013-07-18 17:20:40 UTC
Created attachment 82622
only align the pt base to 32k

Does this patch help?  Only the page table base address should need the 32k alignment.
Comment 14 Martin Andersson 2013-07-18 17:46:16 UTC
(In reply to comment #13)
> Created attachment 82622
> only align the pt base to 32k
> 
> Does this patch help?  Only the page table base address should need the 32k
> alignment.

Yes that patch fixes the problem.
Comment 15 Christian König 2013-07-19 11:37:00 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Created attachment 82622
> > only align the pt base to 32k
> > 
> > Does this patch help?  Only the page table base address should need the 32k
> > alignment.
> 
> Yes that patch fixes the problem.

Even if the problem is fixed for now, can you please run the test I suggested?

We are still having some problems with the virtual memory support for NI, and it would be nice if we could narrow those down a bit more.
Comment 16 Martin Andersson 2013-07-19 15:31:01 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > Created attachment 82622
> > > only align the pt base to 32k
> > > 
> > > Does this patch help?  Only the page table base address should need the 32k
> > > alignment.
> > 
> > Yes that patch fixes the problem.
> 
> Even if the problem is fixed for now, can you please run the test I
> suggested?
> 
> We are still having some problems with the virtual memory support for NI, and
> it would be nice if we could narrow those down a bit more.

I reset the branch to drm-fixes-3.11 and set incr to 0xffffffff. The computer booted fine, but when I ran vs-textureSize-compare it locked up.
Comment 17 Alex Deucher 2013-07-19 20:11:46 UTC
*** Bug 67102 has been marked as a duplicate of this bug. ***
Comment 18 GitLab Migration User 2019-09-18 19:05:04 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/454.