Created attachment 57574: live dmesg before X was started
When I start nouveau with 2D acceleration, the framebuffer is unusable (it typically displays the frozen contents of the last shutdown), but the keyboard is responsive. Once X starts, the display is unreadable, showing anything from a tiled staircase picture to a partially solid grey screen, with no mouse cursor. The keyboard locks up (no response from the Caps/Num/Scroll Lock keys), but it DOES respond to Magic SysRq. The machine itself doesn't seem hard locked, though.
The syslog is spammed with tons of messages from the nouveau driver complaining about PFIFO_CACHE_ERROR, PFIFO_DMA_PUSHER: MEM_FAULT, INVALID_CMD, CALL_SUBR_ACTIVE, etc.; this varies wildly on each boot. Sometimes it's only a few errors, sometimes tons of them, but the end result is the same.
Tried with the nouveau git tree on freedesktop as of 2/23. Using nouveau.nofbaccel=1 clears up the framebuffer corruption, but the X display corruption/lockups still happen.
Display adapter:
00:05.0 VGA compatible controller: nVidia Corporation C51 [GeForce 6150 LE] (rev a2) (prog-if 00 [VGA controller])
    Subsystem: Hewlett-Packard Company Device 2a34
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at fb000000 (64-bit, non-prefetchable) [size=16M]
    [virtual] Expansion ROM at c0000000 [disabled] [size=128K]
    Capabilities: [48] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Kernel driver in use: nouveau
    Kernel modules: nouveau
Created attachment 57575: Xorg logs
Created attachment 61012: Picture from the framebuffer after the nouveau driver loads with acceleration
Using the current (5/2/12) nouveau git: this is a picture of the framebuffer when nouveau loads with acceleration enabled. It displays the shutdown screen from the previous boot, and the display is frozen. It stays this way until either the driver is unloaded or the GPU hangs and the fbcon code switches back to software fbcon (dmesg displays "GPU lockup - switching to software fbcon").
Created attachment 61315: VBIOS from card
VBIOS dump attached; I also sent an mmiotrace via e-mail.
After experimenting with a few kernels: acceleration works normally in both the framebuffer and X on 32-bit (x86) kernels, but not on 64-bit (amd64) kernels.
Created attachment 61969: 64-bit dmesg
Here's the "bad" 64-bit dmesg from kernel 3.4.0.
Created attachment 61970: 32-bit dmesg
This is the "good" 32-bit dmesg for comparison (both dmesgs have drm.debug=0x06).
After playing with the kernel command line, I've found the problem does not occur in a 64-bit kernel if mem=2G is added to the kernel command line (the machine in question has 3G of RAM). The higher mem is set above that, the greater the chance of corruption. It's hard to tell exactly when it becomes an issue because it's intermittent: sometimes it works, sometimes it doesn't. (So far, 5 attempts at 2G gave no failures in either X or the framebuffer, whereas one start at 2G + 80M worked but the second did not.)
I experience something similar on my notebook:
- AMD Turion CPU; 3GiB RAM; nVidia GeForce Go 6100 video controller
- was working fine with 64-bit Ubuntu 10.04 using the nv X video driver
- problems on 64-bit Ubuntu 12.04 with the nouveau driver (the VESA driver works but does not support the native resolution)
Symptoms vary. The simplest, on a fully updated system: LightDM seems to work well enough to allow login, but the Unity desktop does not show up except for the background. dmesg shows a lot of messages like this (and variants):
[drm] nouveau 0000:00:05.0: PFIFO_DMA_PUSHER - Ch 3 Get 0x01256038 Put 0x011c60b0 State 0x4ffe0004 (err: INVALID_MTHD) Push 0x00000000
/var/log/Xorg.0.log shows lots of mayhem after this message:
[mi] EQ overflowing. Additional events will be discarded until existing events are processed.
Adding mem=2g to the kernel command line seems to fix the problem! Thanks Salah for this discovery! Thanks xexaxo on #nouveau for recognizing my problem and pointing me here! I will attach logs.
Created attachment 63133: Hugh's dmesg + lspci + /proc/modules
Created attachment 63134: Hugh's Xorg.0.log
Created attachment 63135: Hugh's mem=2g dmesg + lspci -v + /proc/modules
Created attachment 63136: Hugh's mem=2g Xorg.0.log
Created attachment 63137: ugly workaround
Does this patch help too? It's still a workaround, but it doesn't lose the memory above 2GB.
Marcin: Thanks for the proposed patch. I'm in "dumb Ubuntu user mode". Would testing your patch be valuable enough to the cause for me to learn how to rebuild Ubuntu kernels with patches? (I've rebuilt CentOS kernels and, long ago, built kernel.org kernels, but never Debian or Ubuntu kernels.)
Is there a good test for "broken PCI/AGP" hardware? I take it that a lot of hardware is broken at the 4G boundary, but you suspect mine is broken at the 2G boundary.
Possibly relevant factoid: the notebook is specced to accept 4G of RAM but won't with this BIOS (the latest). It will accept 4G with an older BIOS. The manufacturer (Acer) does not accept that this is a defect.
I had guessed (based on no evidence) that there was a sign-extension bug in the nouveau code. That guess was based on the apparent fact that a 32-bit kernel worked. How do you distinguish a hardware bug from a software bug? Your patch should bypass either. I would have thought that a kernel parameter to set MAX_DMA32_PFN might be useful.
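To make the sign-extension guess concrete, here is a minimal Python sketch of the failure mode being hypothesized. The helper names are made up for illustration and this is not nouveau code. If a 32-bit bus address is widened to 64 bits through a signed type, every address with bit 31 set (i.e. at or above 2 GiB) turns into a bogus 0xffffffff8xxxxxxx value, while addresses below 2 GiB pass through unchanged. That pattern would fit both the 2G threshold and a failure seen only on 64-bit kernels. The thread later points to the nv04-style VM interface on nv4x as the probable cause, so treat this only as an illustration of the guess.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a signed (two's complement) int."""
    value &= MASK32
    return value - (1 << 32) if value & 0x8000_0000 else value


def widen_as_signed(addr32: int) -> int:
    """Hypothetical buggy widening: go through a signed 32-bit type, then
    return the resulting 64-bit unsigned bit pattern."""
    return sign_extend_32(addr32) & MASK64


def widen_as_unsigned(addr32: int) -> int:
    """Correct widening: zero-extend the 32-bit address."""
    return addr32 & MASK32


below_2g = 0x7FFF_F000  # last 4 KiB page below the 2 GiB boundary
above_2g = 0x8000_1000  # a page just above the 2 GiB boundary

print(hex(widen_as_signed(below_2g)))    # 0x7ffff000 (unchanged)
print(hex(widen_as_signed(above_2g)))    # 0xffffffff80001000 (corrupted)
print(hex(widen_as_unsigned(above_2g)))  # 0x80001000
```

Only the high half of the address space is affected, which is why such a bug would stay invisible on machines that never hand the GPU a page above 2 GiB.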
Thinking some more. Some inconclusive evidence that this is not a hardware bug: 64-bit Ubuntu worked fine on it with 3G of RAM. So does MS Windows Vista. This surely included DMAing into the high 1G by disk I/O, DVD writer I/O, and video driver I/O. What's new is nouveau. Perhaps it uses a part of the video controller that nv and Vista do not, a part that does defective DMA, but that isn't obvious to me. (Note: I'm using "DMA" in the computer architecture sense, not the IBM PC clone sense.)
Sorry, I said my notebook had 3G of RAM. I misremembered. It has 2.25G.
The above "workaround" works. The framebuffer is OK, X is good, and glxgears runs without having to specify mem=2G (this is against the nouveau git). I think it is buggy hardware; it's just that the blob and Windows driver know about it and only do 31-bit DMA (or maybe they just get lucky). Attempting to set dma_bits=31 in nouveau_vram_init causes nouveau_sgdma_init to fail to map the page, and attempting to allocate a suitable page using pci_alloc_consistent / dma_alloc_coherent or alloc_page with the GFP_DMA flag causes BUGs and paging faults.
If I specify both nouveau.vram_pushbuf=1 AND nouveau.vram_notify=1 (either one alone does not work), it "sort of" works without mem=2G. The framebuffer is OK. X is still distorted, but not as badly, and isn't locked up; dmesg is no longer filled with errors, but glxgears does not work (it doesn't crash, but just shows crazy flashing triangles).
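The "31-bit DMA" idea comes down to simple address arithmetic: with a 31-bit limit, every buffer the GPU touches must end at or below 0x7fffffff (2 GiB). A minimal Python sketch, assuming that 31-bit limit really is the constraint (the helper names are hypothetical, and `dma_mask` only mirrors the idea behind the kernel's DMA_BIT_MASK(n)), shows which pages fit. It also suggests why mem=2G helped: on x86 that parameter stops the kernel from using RAM above the 2 GiB physical address, so no page can land past the boundary.

```python
GiB = 1 << 30
PAGE = 4096


def dma_mask(bits: int) -> int:
    """Mask covering the low `bits` address bits: 31 -> 0x7fffffff (below 2 GiB)."""
    return (1 << bits) - 1


def fits_mask(addr: int, length: int, bits: int) -> bool:
    """True if every byte of the buffer [addr, addr + length) is reachable
    with `bits` address bits."""
    if length <= 0:
        raise ValueError("length must be positive")
    return addr + length - 1 <= dma_mask(bits)


print(hex(dma_mask(31)))                    # 0x7fffffff
print(fits_mask(2 * GiB - PAGE, PAGE, 31))  # True: last page below 2 GiB
print(fits_mask(2 * GiB, PAGE, 31))         # False: first page at 2 GiB
print(fits_mask(2 * GiB, PAGE, 32))         # True with a 32-bit mask
```

On a 3G machine, roughly the top 1 GiB of RAM fails the 31-bit check, which lines up with the report that the chance of corruption grows as mem is raised above 2G.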
Salah: If the limitation were in the hardware, why would the kernel arch (32 bit vs 64 bit) make a difference? By the time that addresses get to the PCI bus, the architecture should make no difference.
*** Bug 54988 has been marked as a duplicate of this bug. ***
I talked to Ben about this bug at XDC2012, and he told me we are using the nv04-style virtual memory interface because of some then-unknown bugs in the nv4x implementation; this is probably the reason you are seeing this bug. Since XDC, Ben has fixed and enabled the nv4x-style virtual memory in nouveau.git, so please test it!
Created attachment 68024: dmesg of new kernel
It runs FAR better: the distortion and lockups are gone, and the picture is substantially better (on par with the blob). It's not quite 100%: X crashes reliably if I switch to another VT, switch back to X, and then use something with video acceleration. (After it crashes and restarts, it's distorted like before, but not locked up. If I apply the workaround in bug 31961, X still crashes but restarts without distortion.) Regardless of console switching, CACHE_ERRORs start flooding the syslog, though not immediately, but there are no noticeable artifacts or slowdown.
For my system (reported in Bug 54988), it now starts up and works with 3G of RAM. But after logging into the GNOME desktop, the screen becomes blurry, and when I try to open any GNOME menu, the screen gets scrambled. This is a problem even when I boot with mem=2G, which used to work with the current version of the Fedora driver in kernel 3.5.4. It seems the DMA problem is gone, but some new bugs are showing up :-)
The system description is still the same as in 54988. I am running a current Fedora kernel, patched with nouveau from 2nd October:
Linux version 3.5.4-2.localnouveau.fc17.x86_64 (root@baldur) (gcc version 4.7.2 20120921 (Red Hat 4.7.2-2) (GCC) ) #1 SMP Wed Oct 3 11:36:49 CEST 2012
Here is the nouveau output from dmesg:
[ 1.098755] nouveau 0000:00:05.0: >setting latency timer to 64
[ 1.099370] nouveau [ DEVICE][0000:00:05.0] BOOT0 : 0x04e000a2
[ 1.099374] nouveau [ DEVICE][0000:00:05.0] Chipset: C51 (NV4E)
[ 1.099377] nouveau [ DEVICE][0000:00:05.0] Family : NV40
[ 1.100226] nouveau [ VBIOS][0000:00:05.0] checking PRAMIN for image...
[ 1.136693] nouveau [ VBIOS][0000:00:05.0] ... appears to be valid
[ 1.136696] nouveau [ VBIOS][0000:00:05.0] using image from PRAMIN
[ 1.136922] nouveau [ VBIOS][0000:00:05.0] BIT signature found
[ 1.136925] nouveau [ VBIOS][0000:00:05.0] version 05.51.22.33
[ 1.137122] nouveau [ PFB][0000:00:05.0] RAM type: stolen system memory
[ 1.137125] nouveau [ PFB][0000:00:05.0] RAM size: 32 MiB
[ 1.789533] nouveau [ DRM] VRAM: 29 MiB
[ 1.789540] nouveau [ DRM] GART: 512 MiB
[ 1.789546] nouveau [ DRM] BIT BIOS found
[ 1.789550] nouveau [ DRM] Bios version 05.51.22.33
[ 1.789554] nouveau [ DRM] TMDS table version 1.1
[ 1.789557] nouveau [ DRM] DCB version 3.0
[ 1.789560] nouveau [ DRM] DCB outp 00: 02000300 00000023
[ 1.789563] nouveau [ DRM] DCB outp 01: 03011312 00000000
[ 1.789566] nouveau [ DRM] DCB outp 02: 020023f1 0040c080
[ 1.789569] nouveau [ DRM] DCB conn 00: 0000
[ 1.789572] nouveau [ DRM] DCB conn 01: 0131
[ 1.789575] nouveau [ DRM] DCB conn 02: 0210
[ 1.789577] nouveau [ DRM] DCB conn 03: 0211
[ 1.789580] nouveau [ DRM] DCB conn 04: 0213
[ 1.791153] nouveau [ DRM] 0xD186: Parsing digital output script table
[ 1.841924] nouveau [ DRM] 1 available performance level(s)
[ 1.841930] nouveau [ DRM] 0: core 475MHz shader 475MHz fanspeed 100%
[ 1.841932] nouveau [ DRM] c:
[ 1.843560] nouveau [ DRM] MM: using M2MF for buffer copies
[ 1.843567] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 1.843570] nouveau [ DRM] Setting dpms mode 3 on tmds encoder (output 1)
[ 1.843574] nouveau [ DRM] Setting dpms mode 3 on TV encoder (output 2)
[ 1.878032] nouveau [ DRM] Load detected on output B
[ 1.892156] nouveau [ DRM] allocated 1024x768 fb: 0x9000, bo ffff880036fed400
[ 1.892270] fbcon: nouveaufb (fb0) is primary device
[ 1.902714] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 1.902716] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 1.903846] fb0: nouveaufb frame buffer device
[ 1.903853] [drm] Initialized nouveau 1.1.0 20120801 for 0000:00:05.0 on minor 0
[ 1.980036] nouveau [ DRM] Load detected on output B
[ 2.081584] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 2.101978] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 2.101986] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 36.584586] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 36.604973] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 36.604978] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 37.232622] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 37.253031] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 37.253038] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 37.268629] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 37.289024] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 37.289028] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 37.381346] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 37.401716] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 37.401720] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 42.247520] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 42.267901] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 42.267906] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 55.168028] nouveau [ DRM] Load detected on output B
[ 55.185027] nouveau [ DRM] Load detected on output B
[ 58.741447] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 58.761829] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 58.761833] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 59.054034] nouveau [ DRM] Load detected on output B
[ 62.834059] nouveau [ DRM] Load detected on output B
[ 63.022824] nouveau [ DRM] Load detected on output B
[ 67.077190] nouveau E[ DRM] fail ttm_validate
[ 67.077198] nouveau E[ DRM] validate vram_list
[ 67.077208] nouveau E[ DRM] validate: -12
[ 67.137161] nouveau E[ DRM] fail ttm_validate
[ 67.137169] nouveau E[ DRM] validate vram_list
[ 67.137175] nouveau E[ DRM] validate: -12
[ 77.311034] nouveau [ DRM] Load detected on output B
[ 80.197045] nouveau [ DRM] Load detected on output B
[ 80.290070] nouveau [ DRM] Load detected on output B
[ 87.611597] nouveau E[ DRM] fail ttm_validate
[ 87.611605] nouveau E[ DRM] validate vram_list
[ 87.611611] nouveau E[ DRM] validate: -12
[ 87.637939] nouveau E[ DRM] fail ttm_validate
[ 87.637946] nouveau E[ DRM] validate vram_list
[ 87.637950] nouveau E[ DRM] validate: -12
[ 211.914453] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 211.934857] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 211.934862] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 256.428249] nouveau [ DRM] Setting dpms mode 3 on vga encoder (output 0)
[ 256.448634] nouveau [ DRM] Setting dpms mode 0 on vga encoder (output 0)
[ 256.448638] nouveau [ DRM] Output VGA-1 is running on CRTC 0 using output B
[ 256.481072] nouveau [ DRM] Load detected on output B
(In reply to comment #23)
> here is the output for nouveau from dmesg
>
> ...
> [ 1.098755] nouveau 0000:00:05.0: >setting latency timer to 64
> ...
> [ 67.077190] nouveau E[ DRM] fail ttm_validate
> [ 67.077198] nouveau E[ DRM] validate vram_list
> [ 67.077208] nouveau E[ DRM] validate: -12 (ENOMEM)
> ...
You have allocated only 32MB of RAM for the GPU. Try bumping it to 128 or 256MB; that should resolve your issue.
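The -12 above, like the later "ib channel create, -22" message, follows the kernel convention of reporting failures as a negative errno value. A minimal Python sketch (the helper name is made up for illustration) decodes such values with the host's errno table; on Linux, 12 is ENOMEM and 22 is EINVAL.

```python
import errno
import os


def describe_kernel_ret(ret: int) -> str:
    """Decode a kernel-style return value: negative values are -errno."""
    if ret >= 0:
        return f"{ret} (not an error)"
    code = -ret
    name = errno.errorcode.get(code, "unknown")
    return f"{ret} ({name}: {os.strerror(code)})"


print(describe_kernel_ret(-12))  # from "validate: -12"
print(describe_kernel_ret(-22))  # from "ib channel create, -22"
```

The message text comes from the local C library, so the exact wording may differ between systems, but the errno names are what matter when reading these logs.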
Created attachment 69231: dmesg|egrep -i 'drm|agp|fb'
I have regularly encountered a trace similar to the one in the first attachment (attachment 57574) since I went from 1.5G RAM to 4G RAM, using an NV34 [GeForce FX 5200] (linux 3.6.0, xf86-video-nouveau 1.0.2). It happens almost every day, by the end of the day, always during (basic) graphical operations (e.g., opening a PDF viewer).
Judging from the errors, I'd say it can't look up the handle it created. Diving into the old DMA implementation, it seems the handles for VRAM and GART could not be looked up, so I'm guessing an invalid entry was used. Does setting dma_bits = 32 inside drivers/gpu/drm/nouveau/core/subdev/vm/nv44.c help? The old nouveau driver seemed to have commented out the part about 39-bit support for cards < nv50.
Created attachment 70841: nouveau CALL_SUBR_ACTIVE errors using dma_bits=32 kernel on NV34
I tried your suggestion of setting dma_bits to 32 inside drivers/gpu/drm/nouveau/core/subdev/vm/nv44.c, but sadly the same issue arises (dmesg attached). I hope it's still appropriate to post these traces (from my NV34) in this bug report, and I hope the root cause is common. I currently use the drm kernel module with debug=2 (debug=3 dumps too much output), but let me know if that can provide additional useful info.
I should add that I have no problem with 3GB; the problem arises when I add 1 more GB.
Created attachment 72201: limit vm size to 31 bits (nv04-nv40,nv45)
OK, Salah's original issue seems to be fixed. The Xorg crashes and CACHE_ERRORs look like separate bugs; please open new bug reports for them (note that for CACHE_ERRORs I advise running the nouveau git kernel with http://lists.freedesktop.org/archives/nouveau/2012-December/011780.html).
Raphaël Droz: you have an nv34, so changing something in *nv44.c* obviously won't fix anything for you... Does the above patch help?
As of kernel 3.7, xorg-1.13, nouveau DDX 1.0.4 and mesa-9.0, all the errors related to this bug are gone for me: no distortion, no crashes, and no CACHE_ERRORs, even after switching VTs and running accelerated programs for over a week.
I switched to 3.7.0 and I can't reproduce it either. All seems stable with 4GB. I'm confident, but I may need to do longer testing. Note that I can consistently trigger "nouveau: ib channel create, -22" messages (e.g., each time I run glxgears), but they seem harmless (and maybe even unrelated).
Heh, you were probably experiencing a different bug. "ib channel create" messages are not errors; if you turn debugging off, you won't see them again. I'm changing the status of this bug to RESOLVED FIXED.
Created attachment 72253: nouveau CALL_SUBR_ACTIVE errors unpatched 3.7.0
Oops, I spoke too soon. It just happened again with an unpatched 3.7 kernel. I'll come back later after testing your patch heavily.
Created attachment 75747: netconsole log of a NV34 crash when mem > 3GB
Finally, I took some time to dig seriously (with netconsole) into "how" it crashes when I boot using my 4th memory module. Trace attached.