Summary: | [GEM] PAT breaks GEM with "*ERROR* Failed to map ringbuffer" | ||
---|---|---|---|
Product: | DRI | Reporter: | Steven Newbury <s_j_newbury> |
Component: | DRM/other | Assignee: | Eric Anholt <eric> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | critical | ||
Priority: | medium | CC: | brian |
Version: | DRI git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Steven Newbury
2008-08-11 14:17:17 UTC
Created attachment 18225 [details]
Xorg.0.log of initialisation failure
Weird. What does the dmesg look like with git://people.freedesktop.org/~anholt/linux-2.6#drm-gem-fd-17085 ? (In reply to comment #2) > Weird. What does the dmesg look like with > git://people.freedesktop.org/~anholt/linux-2.6#drm-gem-fd-17085 ? > I'll test that now.. give me a few minutes... Created attachment 18226 [details]
dmesg output with debugging
Is there any more information or testing I can do to help resolve this? (In reply to comment #5) > Is there any more information or testing I can do to help resolve this? > Has anyone got this working on x86-64 yet? Maybe there is a int where there should be an unsigned long somewhere? (In reply to comment #6) > (In reply to comment #5) > > Is there any more information or testing I can do to help resolve this? > > > > Has anyone got this working on x86-64 yet? Maybe there is a int where there > should be an unsigned long somewhere? > That wasn't quite what I meant to say, but my meaning should be clear enough.. Try the latest git version of libdrm, specifically this commit http://cgit.freedesktop.org/mesa/drm/commit/?id=b0e68829462aad00ce68be998da6313bca754e9a fixes a bug where libdrm overwrites the gem handles of some buffers. (In reply to comment #8) > Try the latest git version of libdrm, specifically this commit > > http://cgit.freedesktop.org/mesa/drm/commit/?id=b0e68829462aad00ce68be998da6313bca754e9a > > fixes a bug where libdrm overwrites the gem handles of some buffers. > No change. (In reply to comment #7) > (In reply to comment #6) > > (In reply to comment #5) > > > Is there any more information or testing I can do to help resolve this? > > > > > > > Has anyone got this working on x86-64 yet? Maybe there is a int where there > > should be an unsigned long somewhere? > > > > That wasn't quite what I meant to say, but my meaning should be clear enough.. > What I was trying to say; the debugging output you put in is expecting the addresses to be in 64bits, which would make sense, but the actual addresses used seem to be only 32bits. Created attachment 18304 [details]
dmesg output with debugging enabled
Linux fu 2.6.27-rc3-00227-g54b288f #22 SMP PREEMPT Fri Aug 15 23:30:39 CEST 2008 x86_64 Intel(R) Core(TM)2 Duo CPU L7500 @ 1.60GHz GenuineIntel GNU/Linux
That is Linus' tree as of Fri Aug 15 plus drm-gem-merge and drm-gem-merge-fd-17085 pulled in.
It looks like the mtrr mapping isn't being set up correctly. mtrr_add() returns 0 and the mapping doesn't show up in /proc/mtrr. Will try to investigate further but any help is appreciated. The drm_core_ioremap() at [1] eventually calls ioremap() in include/asm-x86/io_64.h which calls ioremap_nocache() in arch/x86/mm/ioremap.c which calls __ioremap_caller() in the same file which fails in this block: if (prot_val != new_prot_val) { /* * Do not fallback to certain memory types with certain * requested type: * - request is uc-, return cannot be write-back * - request is uc-, return cannot be write-combine * - request is write-combine, return cannot be write-back */ if ((prot_val == _PAGE_CACHE_UC_MINUS && (new_prot_val == _PAGE_CACHE_WB || new_prot_val == _PAGE_CACHE_WC)) || (prot_val == _PAGE_CACHE_WC && new_prot_val == _PAGE_CACHE_WB)) { printk( "ioremap error for 0x%llx-0x%llx, requested 0x%lx, got 0x%lx\n", (unsigned long long)phys_addr, (unsigned long long)(phys_addr + size), prot_val, new_prot_val); free_memtype(phys_addr, phys_addr + size); return NULL; } prot_val = new_prot_val; } with the following values: ioremap error for 0xe077f000-0xe079f000, requested 0x10, got 0x8 The types are: - requested: _PAGE_CACHE_UC_MINUS (this one comes directly from ioremap_nocache()) - got: _PAGE_CACHE_WC Interesting is also the comment inside ioremap_nocache() (mentions the X drivers and ioremap_wc()). new_prot_val comes from arch/x86/mm/pat.c:reserve_memtype() which calls chk_conflict() because of this conflict: Xorg:3880 memory types e077f000-e079f000 uncached-minus <-> e0000000-f0000000 write-combining So someone must be setting up e0000000-f0000000 as WC and drm can't map that as uncached. Maybe AGP? agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xe0000000 tom [1] http://gitweb.freedesktop.org/?p=users/anholt/anholt/linux-2.6.git;a=blob;h=2394f4568005fb0eda07742a343437df943ec67f;hb=93e91293c9d7a3fcdd3a7874cd43295786e9a614;f=drivers/gpu/drm/i915/i915_gem.c#l2401 During startup of the working TTM DRM/Mesa/Xorg (based off git master with TTM patched back in) I get these messages in dmesg: X:4367 conflicting memory types e0000000-f0000000 write-combining<->uncached-minus reserve_memtype failed 0xe0000000-0xf0000000, track write-combining, req write-combining X:4367 conflicting memory types e0000000-f0000000 write-combining<->uncached-minus reserve_memtype failed 0xe0000000-0xf0000000, track write-combining, req write-combining X:4378 freeing invalid memtype e0000000-f0000000 X:4367 conflicting memory types e0000000-f0000000 write-combining<->uncached-minus reserve_memtype failed 0xe0000000-0xf0000000, track write-combining, req write-combining X:4379 freeing invalid memtype e0000000-f0000000 X:4367 conflicting memory types e0000000-f0000000 write-combining<->uncached-minus reserve_memtype failed 0xe0000000-0xf0000000, track write-combining, req write-combining X:4380 freeing invalid memtype e0000000-f0000000 X:4367 conflicting memory types e0000000-f0000000 write-combining<->uncached-minus reserve_memtype failed 0xe0000000-0xf0000000, track write-combining, req write-combining X:4381 freeing invalid memtype e0000000-f0000000 Seems it may be relevant? I was reading through the intel-agp driver and I was wondering since it uses the mappings as set up during POST, could it be on some systems the BIOS is setting the aperture mapping as WC by default? This is the only logical conclusion I have been able to come to after following on from Tomas Carnecky's tracing of the code paths. Isn't the root cause of this that the intel module is using drm_core_ioremap() when it should in fact be using ioremap_wc() with newer kernels. That would make the conflict go away. /Thomas (In reply to comment #16) > Isn't the root cause of this that the intel module is using > drm_core_ioremap() when it should in fact be using > ioremap_wc() with newer kernels. > That would make the conflict go away. > > /Thomas > Yes it would, if it is indeed acceptable to have the ringbuffer as WC. I'll make this change and see how it goes... (In reply to comment #17) > (In reply to comment #16) > > Isn't the root cause of this that the intel module is using > > drm_core_ioremap() when it should in fact be using > > ioremap_wc() with newer kernels. > > That would make the conflict go away. > > > > /Thomas > > > > Yes it would, if it is indeed acceptable to have the ringbuffer as WC. I'll > make this change and see how it goes... > Yes, the ringbuffer _should_ be WC. Before ioremap_wc() was introduced, drm_core_ioremap() combined with a WC MTRR meant a WC mapping. /Thomas After I located the conflict with the WC mapping, I changed the code to use ioremap_wc() just to see if that improves anything. It didn't. When I started up Xorg, it locked up the computer and I had to do a hard reboot. Maybe I did something wrong, so if you give me a patch or an git url I could pull the needed changes from, I will try again (patch preferably against anholt's linux-2.6/drm-gem-merge tree). Created attachment 18396 [details] [review] use ioremap_wc() in place of ioremap() Okay, this works as far as initialisation. Ringbuffer is allocated and enabled, though there a some errors during server startup. dmesg contains: [drm] Set up GEM GTT of 144896kb [drm] Creating ringbuffer [drm:i915_gem_object_pin] *ERROR* pinning object ffff88007bf53180 size 32768: 1 Also the next attached Xorg log, notice "(WW) intel(0): Existing errors found in hardware state." and "(EE) intel(0): underrun on pipe B!" Furthermore, attempting to run a glxgears results in the machine hard locking. Created attachment 18397 [details]
Xorg log with GEM enabled after ioremap_wc() change
I'm going to be away now for the next week, so maybe it'll all be working when I get back! :) I was experiencing this issue. It went away when I removed CONFIG_X86_PAT from my kernel config. I suspect that option is the only requirement to trigger this bug. The option can be found in the kernel configuration menu under 'Processor type and features.' I'm uncomfortable changing the drm core stuff without reviewing what all the other drivers intended with it. I'll probably just change the intel driver to do what we meant using linux calls directly now that ioremap()'s semantics changed under us. But, looking at it we've been doing things wrong anyway. We want cached access to the status page -- that's kind of the point of having a status page. The newer hardware actually expects you to access it cached and that may be required for coherency. Fixed in drm-gem-merge branch of my tree. commit f2b39cd19b63ec8b82c198004d0e45044abce103 Author: Eric Anholt <eric@anholt.net> Date: Tue Sep 23 14:50:57 2008 -0700 drm: Use ioremap_wc in i915_driver instead of ioremap, since we always want WC. Fixes failure to map the ringbuffer when PAT tells us we don't get to do uncached on something that's already mapped WC, or something along those lines. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.