Bug 25690 - [G35/KMS] DRM failure during boot (linux 2.6.31->2.6.32 regression)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Wang Zhenyu
QA Contact: Xorg Project Team
URL: http://lists.freedesktop.org/archives...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-12-17 09:42 UTC by Jeremy Stanley
Modified: 2017-07-24 23:08 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg booting working linux 2.6.31 kernel (52.45 KB, text/plain)
2009-12-17 09:42 UTC, Jeremy Stanley
dmesg booting broken linux 2.6.32 kernel (52.25 KB, text/plain)
2009-12-17 09:45 UTC, Jeremy Stanley
working linux 2.6.31 kernel configuration (104.08 KB, text/plain)
2009-12-17 09:47 UTC, Jeremy Stanley
broken linux 2.6.32 kernel configuration (108.22 KB, text/plain)
2009-12-17 09:47 UTC, Jeremy Stanley
lspci generated under linux 2.6.32 (2.17 KB, text/plain)
2009-12-17 09:48 UTC, Jeremy Stanley
lspci Thinkpad X61 Tablet (1.97 KB, text/plain)
2009-12-26 14:45 UTC, Jose Gardiazabal
dmesg booting non-PAE linux 2.6.32 kernel (51.45 KB, text/plain)
2009-12-27 12:08 UTC, Jeremy Stanley
non-PAE linux 2.6.32 kernel configuration (107.79 KB, text/plain)
2009-12-27 12:09 UTC, Jeremy Stanley
agp debug patch (977 bytes, patch)
2009-12-28 00:18 UTC, Wang Zhenyu
Corrupted penguins (50.66 KB, image/jpeg)
2009-12-28 01:38 UTC, Jose Gardiazabal
dmesg 2.6.32.2 with agp patch (58.35 KB, text/plain)
2009-12-28 01:40 UTC, Jose Gardiazabal
disable pci dma mapping in non-iommu case (3.08 KB, patch)
2009-12-29 18:25 UTC, Wang Zhenyu
set dma mask in i915 driver (587 bytes, patch)
2009-12-30 20:51 UTC, Wang Zhenyu
dmesg with dma patch (58.85 KB, text/plain)
2009-12-31 02:34 UTC, Jose Gardiazabal
remove dma mask setting in drm_pci_alloc() (4.31 KB, patch)
2010-01-04 04:08 UTC, Wang Zhenyu
dmesg booting 2.6.32 kernel without dma mask setting (52.43 KB, text/plain)
2010-01-04 12:16 UTC, Jeremy Stanley
Xorg.0.log under 2.6.32 kernel without dma mask setting (7.10 KB, text/plain)
2010-01-04 12:17 UTC, Jeremy Stanley

Description Jeremy Stanley 2009-12-17 09:42:48 UTC
Created attachment 32151 [details]
dmesg booting working linux 2.6.31 kernel

Following an upgrade from 2.6.31 to 2.6.32 I'm running into DRM issues at boot. Here's a diff of the dmesg between them (- is 31, + is 32) starting from [drm]:

 [drm] Initialized drm 1.1.0 20060810
 i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
 i915 0000:00:02.0: setting latency timer to 64
 mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
 [drm] MTRR allocation failed.  Graphics performance may suffer.
 i915 0000:00:02.0: irq 27 for MSI/MSI-X
-[drm] TMDS-8: set mode 720x480 d
-Console: switching to colour frame buffer device 90x30
-[drm] fb0: inteldrmfb frame buffer device
-[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
+[drm] set up 7M of stolen space
+nommu_map_sg: overflow 22e000000+4096 of device mask ffffffff
+[drm:drm_agp_bind_pages] *ERROR* Failed to bind AGP memory: -12
+[drm:i915_driver_load] *ERROR* failed to init modeset
+i915: probe of 0000:00:02.0 failed with error -28

I'm attaching the full dmesg output and configs for both kernels, as well as an lspci under 2.6.32.
Comment 1 Jeremy Stanley 2009-12-17 09:45:56 UTC
Created attachment 32152 [details]
dmesg booting broken linux 2.6.32 kernel
Comment 2 Jeremy Stanley 2009-12-17 09:47:18 UTC
Created attachment 32153 [details]
working linux 2.6.31 kernel configuration
Comment 3 Jeremy Stanley 2009-12-17 09:47:59 UTC
Created attachment 32154 [details]
broken linux 2.6.32 kernel configuration
Comment 4 Jeremy Stanley 2009-12-17 09:48:45 UTC
Created attachment 32155 [details]
lspci generated under linux 2.6.32
Comment 5 Wang Zhenyu 2009-12-21 19:49:54 UTC
Given that David's patch
commit ec402ba97a6479dd80488b4404a73275e894289f
Author: David Woodhouse <dwmw2@infradead.org>
Date:   Wed Nov 18 10:22:46 2009 +0000

    agp/intel-agp: Set dma_mask for capable chipsets before agp_add_bridge()

is already in 2.6.32, it seems to have somehow failed to set the DMA mask to 36 bits on a 32-bit system (although dmesg doesn't show that failing...). Can you try
enabling CONFIG_SWIOTLB, or enabling VT-d in the BIOS and forcing "intel_iommu=on"? I'm not quite sure whether there's a bug in the nommu case.
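
For readers following the "overflow 22e000000+4096 of device mask ffffffff" line from the description, here is a minimal C sketch of the arithmetic involved. It is a simplified model, not the kernel's code: `dma_bit_mask()` is only written in the spirit of the kernel's DMA_BIT_MASK() macro, and `fits_dma_mask()` is a hypothetical helper name. It assumes the check amounts to asking whether the last byte of the buffer lies at or below the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an n-bit address mask (illustrative; modelled on DMA_BIT_MASK()). */
uint64_t dma_bit_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* True if every byte of [addr, addr + size) is addressable under mask.
 * Written to avoid overflow when addr + size would wrap. */
bool fits_dma_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    return addr <= mask && size - 1 <= mask - addr;
}
```

Under this model, a 4096-byte page at 0x22e000000 (above the 4 GiB line) does not fit a 32-bit mask but does fit a 36-bit one, which is why the reported mask of ffffffff looks wrong for this chipset.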
Comment 6 Jeremy Stanley 2009-12-23 10:33:19 UTC
The BIOS on the ASUS P2E-VM HDMI board where I'm experiencing this
issue appears to lack any option for enabling VT-d. I'll hopefully
get time to build a custom kernel with CONFIG_SWIOTLB this weekend
and provide some feedback on that front.
Comment 7 Jeremy Stanley 2009-12-25 16:35:32 UTC
Should CONFIG_SWIOTLB be expected to work on 32-bit i386 arch?
There's no make menuconfig option for it where arch/x86/Kconfig
suggests it ought to appear, but the option search shows me:

Symbol: SWIOTLB [=n]
  Selected by: GART_IOMMU [=n] && X86_64 [=n] && PCI [=y] || CALGARY_IOMMU [=n] && X86_64 [=n] && PCI [=y] && EXPERIMENTAL [=y] || AMD_IOMMU [=n] && X86_64 [=n] && PCI [=y] && ACPI [=y]

...and checking those options in the original config:

$ grep -e CONFIG_GART_IOMMU= -e CONFIG_X86_64= -e CONFIG_PCI= -e CONFIG_CALGARY_IOMMU= -e CONFIG_EXPERIMENTAL= -e CONFIG_AMD_IOMMU= -e CONFIG_ACPI= .config
CONFIG_EXPERIMENTAL=y
CONFIG_ACPI=y
CONFIG_PCI=y

I'm at a loss as to why GART_IOMMU [=n] && X86_64 [=n] && PCI [=y]
isn't setting this (or, more likely, I'm misunderstanding the kernel
option conditionals here). What am I missing? Apologies in
advance...
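
One way to read the "Selected by" expression quoted above is as a plain boolean formula in which && binds tighter than ||. The C sketch below is only an illustration under that assumption: it treats every symbol as a simple bool (ignoring Kconfig tristate values), and `struct kconfig` and `swiotlb_selected()` are hypothetical names. Each of the three alternatives includes X86_64, so with X86_64=n none of them can be true, whatever the other options are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative subset of config symbols, modelled as booleans. */
struct kconfig {
    bool gart_iommu, calgary_iommu, amd_iommu;
    bool x86_64, pci, experimental, acpi;
};

/* Mirrors the quoted "Selected by" expression: three AND-groups OR'ed together. */
bool swiotlb_selected(const struct kconfig *c)
{
    assert(c != NULL);
    return (c->gart_iommu && c->x86_64 && c->pci) ||
           (c->calgary_iommu && c->x86_64 && c->pci && c->experimental) ||
           (c->amd_iommu && c->x86_64 && c->pci && c->acpi);
}
```

With the values from the grep output (only PCI, EXPERIMENTAL, and ACPI set) the function returns false, and it stays false on a 32-bit build even if all three IOMMU options were enabled.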
Comment 8 Jose Gardiazabal 2009-12-26 14:45:58 UTC
Created attachment 32308 [details]
lspci Thinkpad X61 Tablet
Comment 9 Jose Gardiazabal 2009-12-26 14:47:06 UTC
Hi,
I'm receiving the same error messages, running 2.6.32 in 64-bit mode.
To narrow the error down a bit, I tried 2.6.31.2, 2.6.31.9, and 2.6.32-rc1; the first two work, but rc1 gives me the same problems.
I checked my config and found CONFIG_SWIOTLB=y.
My only other guess is that 2.6.32 works fine if I have 2 GB of RAM in my computer, but fails if I have 8 GB.
I'm also attaching my lspci.

Thanks.
Comment 10 Jeremy Stanley 2009-12-27 11:25:58 UTC
I, too, am experiencing this on an 8GiB system with PAE support in
the kernel. Switching to a non-PAE generic kernel for the same
version results in this dmesg diff starting from drm (- is PAE, + is
non-PAE):

    [drm] Initialized drm 1.1.0 20060810
    i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
    i915 0000:00:02.0: setting latency timer to 64
    mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
    [drm] MTRR allocation failed.  Graphics performance may suffer.
    i915 0000:00:02.0: irq 27 for MSI/MSI-X
    [drm] set up 7M of stolen space
   -nommu_map_sg: overflow 22e000000+4096 of device mask ffffffff
   -[drm:drm_agp_bind_pages] *ERROR* Failed to bind AGP memory: -12
   -[drm:i915_driver_load] *ERROR* failed to init modeset
   -i915: probe of 0000:00:02.0 failed with error -28
   +[drm] TMDS-8: set mode 1280x720 c
   +Console: switching to colour frame buffer device 160x45
   +fb0: inteldrmfb frame buffer device
   +registered panic notifier
   +[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

I'll attach a full dmesg and config for the non-PAE kernel shortly.
Comment 11 Jeremy Stanley 2009-12-27 12:08:12 UTC
Created attachment 32314 [details]
dmesg booting non-PAE linux 2.6.32 kernel
Comment 12 Jeremy Stanley 2009-12-27 12:09:18 UTC
Created attachment 32315 [details]
non-PAE linux 2.6.32 kernel configuration
Comment 13 Wang Zhenyu 2009-12-27 23:27:50 UTC
Could you try bisecting the kernel? The 32-bit mask in the error log seems weird.
Comment 14 Wang Zhenyu 2009-12-28 00:18:51 UTC
Created attachment 32317 [details] [review]
agp debug patch

Please apply this debug patch to your failing kernel and attach the dmesg log.
Comment 15 Jose Gardiazabal 2009-12-28 01:37:33 UTC
I tried the patch on my computer and it didn't work.
I'm attaching the dmesg output. Also, I forgot to mention another symptom: the penguin logo is corrupted during boot. The corruption looks like black lines on top of the image; to me it looks as if the console output is mapped there. The patterns are (almost) the same on every boot.
Comment 16 Jose Gardiazabal 2009-12-28 01:38:47 UTC
Created attachment 32318 [details]
Corrupted penguins
Comment 17 Jose Gardiazabal 2009-12-28 01:40:46 UTC
Created attachment 32319 [details]
dmesg 2.6.32.2 with agp patch
Comment 18 Wang Zhenyu 2009-12-29 01:22:50 UTC
The current workaround is to disable CONFIG_DMAR on machines without VT-d.

I think agpgart could fall back to the original memory insert function when no VT-d is available.

Comment 19 Jose Gardiazabal 2009-12-29 02:33:47 UTC
Hi,
The DMAR deactivation does the trick! I can boot 2.6.32 without problems. Anyway, since I had it configured in my previous kernels, I understand this is just a temporary fix. I'm currently bisecting the kernel (4 more iterations...) so as soon as I have the result, I'll post it.

Thanks!
Comment 20 Jose Gardiazabal 2009-12-29 04:43:22 UTC
The kernel bisect says that the first bad commit is 176616814d700f19914d8509d9f65dec51a6ebf7, "intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU".
This is consistent with the proposed workaround.
I hope this helps. If you need, I can test patches on my computer.
Thanks for your help!
Comment 21 Wang Zhenyu 2009-12-29 18:25:44 UTC
Created attachment 32369 [details] [review]
disable pci dma mapping in non-iommu case

This one tries to revert to the original behavior in the non-IOMMU case. I'm not quite sure this is the ideal solution for now, but it should fix the issue you're seeing.
Comment 22 Wang Zhenyu 2009-12-30 20:51:24 UTC
Created attachment 32379 [details] [review]
set dma mask in i915 driver

Could you help test this patch instead?

The real problem here might be that we failed to set up the 36-bit DMA mask properly.

thanks.
Comment 23 Jose Gardiazabal 2009-12-31 02:34:51 UTC
Created attachment 32383 [details]
dmesg with dma patch

It looks like the 36-bit DMA mask is not the problem. Anyway, I'm attaching the whole dmesg output.
Comment 24 Jeremy Stanley 2010-01-01 17:40:41 UTC
Unfortunately, the "set dma mask in i915 driver" patch does not
solve my issue. The dmesg booting with this patch is basically
identical to the one I get when booting with Debian/sid's current
linux-image-2.6.32-686-bigmem package. I continue to see the
"nommu_map_sg: overflow XXXX00000+4096 of device mask ffffffff"
error followed by "Failed to bind AGP memory: -12" (and no working
DRM, obviously). I'm in no hurry for a fix, as I can still
adequately test non-PAE 2.6.32 x86 kernels for now (albeit without
access to all the RAM on this machine), but am happy to assist in
whatever way I can until you have access to a suitable machine with
4+GiB RAM.
Comment 25 Jose Gardiazabal 2010-01-02 00:33:10 UTC
Jeremy, try compiling your kernel with PAE enabled and DMAR disabled. I think it should work (in my case, 64 bit and DMAR disabled works).
Comment 26 Jeremy Stanley 2010-01-02 11:47:21 UTC
Confirmed that disabling DMA remapping in the kernel config also
works around this for x86+PAE.
Comment 27 Wang Zhenyu 2010-01-04 04:08:13 UTC
Created attachment 32430 [details] [review]
remove dma mask setting in drm_pci_alloc()

This is the refreshed patch, written after investigating the real cause of the DMA mask setting failure. Please help verify that it fixes your problem.
Comment 28 Jose Gardiazabal 2010-01-04 07:14:05 UTC
Dear Wang Zhenyu,
Your patch works nicely! Thanks for your help!
If you need to try something else, just drop me an email.
Comment 29 Jeremy Stanley 2010-01-04 10:23:02 UTC
Booting an x86+PAE kernel with your new "remove dma mask setting in
drm_pci_alloc()" results in an interesting warning backtrace in
dmesg (and seems to break KMS):

   WARNING: at /tmp/linux-2.6-2.6.32/debian/build/source_i386_none/drivers/gpu/drm/drm_crtc_helper.c:1032 drm_helper_initial_config+0x33/0x4c [drm_kms_helper]()

I'll attach new dmesg and Xorg.0.log examples shortly.
Comment 30 Jeremy Stanley 2010-01-04 12:16:34 UTC
Created attachment 32442 [details]
dmesg booting 2.6.32 kernel without dma mask setting
Comment 31 Jeremy Stanley 2010-01-04 12:17:30 UTC
Created attachment 32443 [details]
Xorg.0.log under 2.6.32 kernel without dma mask setting
Comment 32 Jeremy Stanley 2010-01-04 12:20:24 UTC
Okay, actually, forget my last update. I was testing this remotely
from work on my HTPC at home, and these errors were probably because
the HDTV was switched to a different input or turned off altogether.
I'll have to retest this evening when I get home. Apologies for the
confusion.
Comment 33 Jeremy Stanley 2010-01-04 17:43:36 UTC
Okay, yes, the "remove dma mask setting in drm_pci_alloc()" patch
fixes the regression under Linux 2.6.32 x86+PAE on my 8GiB RAM
machine. The dmesg now looks essentially the same as it did for
2.6.31 (better, in fact, since it now picks a superior default
resolution). Thanks again!
Comment 34 Wang Zhenyu 2010-01-07 17:51:12 UTC
The patch is in Linus's tree. Closing.
commit e6be8d9d17bd44061116f601fe2609b3ace7aa69
Author: Zhenyu Wang <zhenyu.z.wang@intel.com>
Date:   Tue Jan 5 11:25:05 2010 +0800

    drm: remove address mask param for drm_pci_alloc()
    
    drm_pci_alloc() has input of address mask for setting pci dma
    mask on the device, which should be properly setup by drm driver.
    And leave it as a param for drm_pci_alloc() would cause confusion
    or mistake would corrupt the correct dma mask setting, as seen on
    intel hw which set wrong dma mask for hw status page. So remove
    it from drm_pci_alloc() function.
    
    Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
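
The failure mode described in the commit message above can be sketched as a small C model. This is only an illustration under stated assumptions: `struct fake_dev`, `alloc_with_mask()`, and `alloc_keep_mask()` are hypothetical names, not kernel APIs, and the allocation itself is omitted. The point is that an allocation helper which applies its "max address" argument to the device as a side effect can overwrite a wider mask the driver set earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI device; only the DMA mask is modelled. */
struct fake_dev {
    uint64_t dma_mask;
};

/* Old-style helper: applies maxaddr to the device as a side effect,
 * replacing whatever mask the driver configured before the call. */
void alloc_with_mask(struct fake_dev *dev, uint64_t maxaddr)
{
    assert(dev != NULL);
    dev->dma_mask = maxaddr;
    /* ... the allocation itself is omitted in this sketch ... */
}

/* New-style helper: takes no mask and leaves the device mask untouched. */
void alloc_keep_mask(struct fake_dev *dev)
{
    assert(dev != NULL);
    (void)dev;
    /* ... the allocation itself is omitted in this sketch ... */
}
```

In this model, a device that starts with a 36-bit mask ends up with ffffffff after a single `alloc_with_mask(dev, 0xffffffff)` call, which matches the mask printed in the nommu_map_sg error; `alloc_keep_mask()` leaves the 36-bit mask in place.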

