Summary: | MacBook Pro 5,1 with nVidia 9400m and 9600m, scrambled screen | ||||||
---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | Joanand <MacGyver031> | ||||
Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau> | ||||
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> | ||||
Severity: | major | ||||||
Priority: | medium | CC: | john, krestfallen, Markovics.matyas, pierre.morrow, redballoon36 | ||||
Version: | unspecified | ||||||
Hardware: | x86-64 (AMD64) | ||||||
OS: | other | ||||||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=27501 | ||||||
Whiteboard: | |||||||
Attachments: |
|
Description
Joanand
2012-12-20 07:23:51 UTC
I have the exact same symptoms on kernel 3.7.4, and nouveau has been that way for many kernel versions. It has never worked with this system (9600M GT/9400M), whether in-tree or out. My setup is slightly different, as I'm booting with grub2, not the EFI stub, but the behavior is exactly the same: a scrambled, non-updating screen after the handover from efifb. I've never tried using vgaswitcheroo by typing blindly, though.

This is still an issue with Linux 3.9.9 on my MacBook Pro 17" Early 2009 (5,2). According to the Wikipedia article (https://en.wikipedia.org/wiki/Macbook_pro), MBP 5,1 (15" late 2008/early 2009) and MBP 5,2 (17" early 2009) share the same graphics setup. MBP 5,1 5,3 5,4 5,5 (13", 15", 17" mid 2009) look the same too.

I am also experiencing this issue. I am booting the kernel EFI stub using refind on my macbook 5,1. This model is fitted only with nvidia 9400M. I use nouveau.noaccel=1 as a boot parameter. This way I at least get a scrambled screen after the efifb handover; otherwise the screen would just freeze. When I can see that the init script has finished, I blindly log in and type pm-suspend. Then I press the power button to wake from suspend. Now I am presented with a usable screen, but not accelerated. I can also run X. I am running Gentoo. I tried vanilla and git kernels, in and out of tree, without luck, from version 3.7.4 to 3.10.1. Clearly it is a handover problem. If there is any useful information I can provide, let me know how.

Created attachment 82879
Kernel messages with nouveau verbose debug enabled

I remembered that I had nouveau debug on already, so here are the kernel messages. Kernel: vanilla 3.10.0 from Gentoo Packages

Great find Matyas

If you're interested in cutting short the sequence you can forcepost the card, thus it should save you the suspend/resume trick. Use nouveau.config=NvForcePost=1

Curious if the above will give you acceleration or only a working output/monitor (ie. try it with and without noaccel=1)

Cheers

(In reply to comment #5)
> Great find Matyas
>
> If you're interested in cutting short the sequence you can forcepost the
> card, thus it should save you the suspend/resume trick. Use
> nouveau.config=NvForcePost=1
>
> Curious if the above will give you acceleration or only a working
> output/monitor
> (ie. try it with and without noaccel=1)
>
> Cheers

Hi,
On my MBP, if I do not use noaccel=1, then the whole system crashes.

I have tried NvForcePost=1 with noaccel, the result was a switched off screen, but system seems to boot.
With accel, screen switches off and crashes.

Is there any config which would activate accel for 9400 but no accel for 9600? This would be very helpful for me.

Thanks.

Hi Joanand

(In reply to comment #0)
> Resolution for kernels < 3.4.9: I made use of gpupwr to disable the discrete
> adapter and then loaded nouveau (without any parameters). The system was
> "programmed" to use 9400m as it starts. This worked quite fine over long
> time.

"Worked" with or without acceleration ?

> Temporary resolution for 3.7.1: Use MacOSX to change the default adapter to
> the one which is "not" desired. Reboot, load nouveau with noaccel=1 (now
> screen gets scrambled), switch to the other device (in my case "echo DIGD >
> /sys/kernel/debug/vgaswitcheroo/switch"). Voila you have a readable screen.

Did you have the chance to narrow down what caused the change (<3.4.9 vs 3.7.1) ? It may be due to nouveau, vgaswitcheroo and/or other kernel driver

(In reply to comment #6)
> (In reply to comment #5)
> > Great find Matyas
> >
> > If you're interested in cutting short the sequence you can forcepost the
> > card, thus it should save you the suspend/resume trick. Use
> > nouveau.config=NvForcePost=1
> >
> > Curious if the above will give you acceleration or only a working
> > output/monitor
> > (ie. try it with and without noaccel=1)
> >
> > Cheers
>
> Hi,
> On my MBP, if I do not use noaccel=1, then the whole system crashes.
>
> I have tried NvForcePost=1 with noaccel, the result was a switched off
> screen, but system seems to boot.
> With accel, screen switches off and crashes.
>
> Is there any config which would activate accel for 9400 but no accel for
> 9600? This would be very helpful for me.
>
> Thanks.

There has been a brief discussion what is the best way to handle this (passing nouveau params to specific card, disabling certain card etc.) although implementation may be far off ;(

Meanwhile add a hack for your card by checking the PCI and returning early rather than initialising the card - not sure which location is better nouveau_drm_probe or nouveau_drm_load. Keep in mind to keep it symmetric (ie. handle the case in nouveau_drm_remove/nouveau_drm_unload)

Your code will look something similar to

    if ((pdev->bus == xx) &&
        (pdev->dev == xx) &&
        (pdev->func == xx)) {
            return 0; // you can also try return -E*
    }

Cheers
Emil

Same/similar to bug 27501 ? Either way lets link both bugs

(In reply to comment #7)
> Hi Joanand

Hi Emil

> "Worked" with or without acceleration ?

With GPUPWR, the discrete (nVidia 9600M GT) was no longer available. And as nouveau works with 9400 and acceleration, it worked with acceleration.

> Did you have the chance to narrow down what caused the change (<3.4.9 vs
> 3.7.1) ? It may be due to nouveau, vgaswitcheroo and/or other kernel driver

3.4.9 did not have a viable vgaswitcheroo, but 3.7+ did. GPUPWR no longer helped, as vgaswitcheroo reactivated the 9600m. So the sole solution was to use the nouveau driver without acceleration.

> There has been a brief discussion what is the best way to handle this
> (passing nouveau params to specific card, disabling certain card etc.)
> although implementation may be far off ;(
>
> Meanwhile add a hack for your card by checking the PCI and returning early
> rather than initialising the card - not sure which location is better
> nouveau_drm_probe or nouveau_drm_load. Keep in mind to keep it symmetric
> (ie. handle the case in nouveau_drm_remove/nouveau_drm_unload)
>
> Your code will look something similar to
>
> if ((pdev->bus == xx) &&
>     (pdev->dev == xx) &&
>     (pdev->func == xx)) {
>         return 0; // you can also try return -E*
> }
>
> Cheers
> Emil

Thanks for these pointers. I will try this hack on 3.10.2. I will report if I get any further.
BR Joanand

Thanks for the pointers Emil. I have tried the NvForcePost=1 configuration with noaccel=1, it only resulted in the backlight being bumped to 100%. My screen was still messed up. Doing the suspend/resume trick fixed it. Without noaccel I get a system crash too. It would be nice to get acceleration. Joanand, am I understanding it correctly that you could get your 9400 with accel? If so was that prior to 3.7 kernels?

(In reply to comment #10)
> Joanand, am I understanding it correctly that you could get your 9400 with
> accel? If so was that prior to 3.7 kernels?

Kernel 3.4.9 and below has worked with gpupwr and nouveau WITH acceleration. The gpupwr program deactivated the 9600m graphics adapter and nouveau was unable to load the driver for the 9600 ("PCI xxxx has fallen off the bus" was the message).

Hi Joanand,

(In reply to comment #6)
> Is there any config which would activate accel for 9400 but no accel for
> 9600? This would be very helpful for me.

You might also try, in nouveau_accel_init:

    if (device->chipset == 0x96)
        return;

It works without nouveau.noaccel=1 and has no scrambled screen (at least for the GUI and the console, except for a tiny moment of full garbage (boot logo?)
but it gets cleared away after), but it is unstable (hanged up some times at boot) and it spams a lot (more than 1600 lines) of

    nouveau E[PFB][0000:03:00:0] trapped write at 0x0000546000 on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT

Strangely, when connecting an external monitor to the laptop (MacBook Pro mid 2009, same cards), the GUI isn't scrambled any more and the console still is, but in a "better way". I'll try to find why with an external monitor or with acceleration on the 9400, the handover goes well.

There are two long bugs about this same issue, I'm giving the older one precedence.

*** This bug has been marked as a duplicate of bug 27501 ***

Hi Ilia,
It seems to me bug 27501 is about being unable to boot, which is not the "main" problem here, but rather having a garbage screen after a successful boot. I'm bisecting it, and it seems it appeared between kernel 3.4 and 3.5-rc7. I'll post the full bisection here as soon as I can.

Bisected to:

commit 20abd1634a6e2eedb84ca977adea56b8aa06cc3e
Author: Ben Skeggs <bskeggs@redhat.com>
Date: Mon Apr 30 11:33:43 2012 -0500

    drm/nouveau: create real execution engine for software object class

    Just a cleanup more or less, and to remove the need for special handling of software objects.

    This removes a heap of documentation on dma/graph object formats. The info is very out of date with our current understanding, and is far better documented in rnndb in envytools git.

    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

I'll try to look for a patch this week-end.

I patched my kernel with this diff: http://pastebin.com/q3MVep1f to screen the 9400m from being initialized. This: http://pastebin.com/JMrbbFVA is the dmesg output (search "jamie"). With this patch, my command line is *not scrambled* when I use nouveau.noaccel=1, but crashes without that argument.
When I screen the 9600m from being initialized in the same way instead, the console gets scrambled with nouveau.noaccel=1 and crashes without that argument.

But, in the case of screening the 9400m, and using nouveau.noaccel=1, I cannot start X - it gives a "No screens found" error. Using that patched kernel, and trying to use the Nvidia proprietary driver fails as well with a "No screens found" error - but the Nvidia driver works when I use the unpatched kernel.

And now I'm stuck .. any help?

(In reply to comment #16)
> But, in the case of screening the 9400m, and using nouveau.noaccel=1, I
> cannot start X - it gives a "No screens found" error. Using that patched
> kernel, and trying to use the Nvidia proprietary driver fails as well with a
> "No screens found" error - but the Nvidia driver works when I use the
> unpatched kernel.
>
> And now I'm stuck .. any help?

Yes this is "normal". X tries to bind to the PCI with the lowest ID, in our case 9600M has ID=2 and 9400M has ID=3. So you will have to use BusID:

    Section "Device"
        Identifier "NOUVEAU"
        Driver "nouveau"
        BusID "PCI:03:00:0"
        Screen 0
    EndSection

Now if you are screening the 9400m, X should start without problem.
BR.

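For anyone trying the "screen one card by PCI address" hack discussed above, here is a minimal userspace sketch of the address check. It is not the pastebin patch and not driver code: `struct sketch_pci_dev` and the `sketch_*` helpers are hypothetical stand-ins. In the kernel the same information should come from `pdev->bus->number` and `pdev->devfn`, decoded with the `PCI_SLOT()`/`PCI_FUNC()` macros. The address 03:00.0 is taken from the BusID and log lines in this thread; the 02:00.0 address used for the 9600M in the usage example is only an assumption based on "ID=2".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC() macros:
 * devfn packs a 5-bit slot number and a 3-bit function number. */
#define SKETCH_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define SKETCH_PCI_FUNC(devfn) ((devfn) & 0x07)

/* Hypothetical stand-in for the struct pci_dev fields the check needs. */
struct sketch_pci_dev {
	uint8_t bus;   /* pdev->bus->number in the kernel */
	uint8_t devfn; /* pdev->devfn in the kernel */
};

/* Pack slot and function into a devfn value, like PCI_DEVFN(). */
static uint8_t sketch_devfn(unsigned slot, unsigned func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)((slot << 3) | func);
}

/* Return true when pdev is the card that should be left uninitialised. */
static bool sketch_should_skip(const struct sketch_pci_dev *pdev,
			       uint8_t bus, uint8_t slot, uint8_t func)
{
	return pdev->bus == bus &&
	       SKETCH_PCI_SLOT(pdev->devfn) == slot &&
	       SKETCH_PCI_FUNC(pdev->devfn) == func;
}
```

With a device built as `{ 3, sketch_devfn(0, 0) }`, `sketch_should_skip(&dev, 3, 0, 0)` is true, while the same call for a device on bus 2 is false. As noted earlier in the thread, an early return of this kind in the probe/load path would need a matching check in the remove/unload path.
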
It took me some time, but here is a patch correcting commit 20abd1634a6e2eedb84ca977adea56b8aa06cc3e:

---------------------------------------------------------------------------------
diff --git a/drivers/gpu/drm/nouveau/nouveau_software.h b/drivers/gpu/drm/nouveau/nouveau_software.h
index fe30a8f..7adfcb9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_software.h
+++ b/drivers/gpu/drm/nouveau/nouveau_software.h
@@ -20,10 +20,10 @@ struct nouveau_software_chan {
 static inline void
 nouveau_software_vblank(struct drm_device *dev, int crtc)
 {
-	struct nouveau_software_priv *psw = nv_engine(dev, NVOBJ_ENGINE_SW);
+	struct drm_nouveau_private *dev_priv = dev->dev_private;
 	struct nouveau_software_chan *pch, *tmp;
 
-	list_for_each_entry_safe(pch, tmp, &psw->vblank, vblank.list) {
+	list_for_each_entry_safe(pch, tmp, &dev_priv->vbl_waiting, vblank.list) {
 		if (pch->vblank.head != crtc)
 			continue;
 
---------------------------------------------------------------------------------
(The above empty line is needed)

However, the code was later modified, and the patch can't be applied on recent kernels; I'll try to get a new patch for it this week-end.

Hi!
Regarding suspend/resume scrambling screen issues, try this patch:
[PATCH] drm/nouveau/fb: fix suspend/resume fbcon
http://lists.freedesktop.org/archives/nouveau/2013-October/014656.html
chr[]

Here is some news about my small progress. I found out why commit 20abd1634a6e2eedb84ca977adea56b8aa06cc3e introduced a bug: it would init the psw->vblank field only if acceleration is enabled, even though it is used in both cases; calling nv50_software_create even without acceleration solves the problem.

I reverted a few more commits, however I'm stuck on

commit ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69
Author: Ben Skeggs <bskeggs@redhat.com>
Date: Thu, 19 Jul 2012 22:17:34 +0000

It seems like the init issue was fixed, but the screen stays scrambled.
After some testing, it seems some of the work done by nouveau_channel_new (which is only called when acceleration is enabled) is needed, and therefore also nv84_fence_create, but I couldn't find which parts. When enabling nv84_fence_create and nouveau_channel_new for the NVAC card (boot hangs if enabling it for the NV96 card), nv50_disp_intr spams lots of

    nouveau E[PFB][0000:03:00:0] trapped write at 0x0000546000 on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT

but it seems harmless, apart from getting a really big dmesg and, sometimes, hanging on boot.

Booting with 'nouveau.accel=0 nouveau.modeset=0 3' results in a clean console mode, and running startx manually after boot will also give a clean GUI.

The necessary parts from nouveau_channel_new to get a clean screen are:
- nouveau_channel_ind
- nouveau_channel_init, but only the beginning:
  - vram creation
  - gart creation
  - dma variables initialisation

There are some MEM_CACHE errors, but at least it boots and the screen is clean.

I found out that nouveau_abi16_ioctl_channel_alloc was also calling nouveau_channel_new, but with other arguments for vram and gart, and it is called whether or not acceleration is enabled. Is there a specific reason to call nouveau_channel_new at first only when acceleration is enabled, and later on, when starting the GUI, to always call it?

Using commit e18833a518777e249b6badf54f65b37b741b6864 (http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=e18833a518777e249b6badf54f65b37b741b6864) fixes the issue (tested on Git HEAD and on 3.13.5).
Thanks for the pointer Ilia!

(In reply to comment #22)
> Using commit e18833a518777e249b6badf54f65b37b741b6864
> (http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=e18833a518777e249b6badf54f65b37b741b6864)
> fixes the issue (tested on Git HEAD and on 3.13.5).
> Thanks for the pointer Ilia!

Hi Pierre,
I have tried the diff patch (3 changed lines) on Kernel 3.13.5-gentoo, and did a quick test on nVidia 9600m GT: The screen gets scrambled as soon as nouveau is loaded. At startup, EFIFB is used and works with full screen resolution, but the colors are "incorrect". EFI is set up to use the 9600m as default. I am still using nouveau without acceleration.

I will be testing the patched kernel on the nVidia 9400m, by modifying EFI to use the 9400m as default.

Could you do a diff between your patched and unpatched kernels, so that we could find other differences?

Thanks.
BR

Hi Joanand,
My HEAD before applying the patch was commit 34d5950 (http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?h=drm-nouveau-next&id=34d595081812da62b5357579267c4ab5eae64ac1). The HEAD after the patch is just: 34d5950 + e18833a, nothing more. So I'm running with both cards enabled.

I tried Ilia's patch (https://bugs.freedesktop.org/show_bug.cgi?id=27501#c27) to test each card alone, and I got:
* NVAC only: corrupted screen (without e18833a) + acceleration not working;
* NV96 only: no corrupted screen (even without e18833a) + acceleration working.

I wonder why we do have quite different results for the NV96...

Cheers,
Pierre

(In reply to comment #24)
Hi Pierre,
I have applied/changed these lines in drivers/gpu/drm/nouveau/core/subdev/bar/nv50.c:
- line 233: int ret, i;
- line 351: for(i = 0; i < 8; i++)
- line 352: nv_wr32(priv, 0x001900 + (i * 4), 0x00000000);

Booting with both adapters enabled (9600 is the boot adapter), EFI-stub and EFIFB, loading nouveau without acceleration: leads to a scrambled screen. The resolution is native 1440x900.

Booting with both adapters enabled (9600 is the boot adapter), EFI-stub and EFIFB, loading nouveau with acceleration: screen/adapter is frozen. System seems to "hang"/lag.

I am now trying the setup without EFIFB. I will report back once I have tested it.

BR.
PS: Do you have a MacBook Pro 5,1 15.4"?

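To make the three quoted lines easier to follow, here is a small userspace sketch of what the loop does arithmetically: it writes zero to eight consecutive 32-bit registers, at byte offsets 0x001900 through 0x00191c. The register array and the `sketch_*` helpers are hypothetical stand-ins for the MMIO space; only the loop bounds, the offsets and the zero value come from the lines quoted above. Nothing here says what those registers mean to the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MMIO_WORDS 0x1000u /* hypothetical 16 KiB register window */

/* Hypothetical stand-in for the device MMIO space; the driver's nv_wr32()
 * writes to real hardware registers instead. */
static uint32_t sketch_mmio[SKETCH_MMIO_WORDS];

/* Mock 32-bit register write at a byte offset. */
static void sketch_wr32(uint32_t offset, uint32_t value)
{
	assert(offset % 4 == 0 && offset / 4 < SKETCH_MMIO_WORDS);
	sketch_mmio[offset / 4] = value;
}

/* Mock 32-bit register read at a byte offset. */
static uint32_t sketch_rd32(uint32_t offset)
{
	assert(offset % 4 == 0 && offset / 4 < SKETCH_MMIO_WORDS);
	return sketch_mmio[offset / 4];
}

/* Mirror of the quoted loop: zero eight consecutive 32-bit registers
 * starting at byte offset 0x001900. */
static void sketch_clear_0x1900_block(void)
{
	int i;

	for (i = 0; i < 8; i++)
		sketch_wr32(0x001900 + (i * 4), 0x00000000);
}
```

After `sketch_clear_0x1900_block()` runs, the eight words from 0x001900 to 0x00191c read back as zero, and neighbouring offsets such as 0x0018fc and 0x001920 are left untouched.
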
Hi Joanand,
Shouldn't the patch for drivers/gpu/drm/nouveau/core/subdev/bar/nv50.c rather be
+ line 233: int ret, i;
+ line 351: for(i = 0; i < 8; i++)
+ line 352: nv_wr32(priv, 0x001900 + (i * 4), 0x00000000);

By the way, on top of which commit/kernel are you applying the patch?

Yeah, if you try with acceleration you end up with bug 27501 (https://bugs.freedesktop.org/show_bug.cgi?id=27501).

EFIFB should not be the problem: it is removed by nouveau at some point, to be replaced by nouveaufb, which seems to be wrongly configured (or the accesses to it are), bringing screen corruption.

I have a 5,3 (iirc) MacBook Pro (mid 2009) 15.6", with the same graphic cards; the resolution is also 1440x900.

Pierre

(In reply to comment #25)
> Booting with both adapters enabled (9600 is the boot adapter), EFI-stub and
> EFIFB, loading nouveau without acceleration:leads to scrambled screen. The
> resolution is native 1440x900.
>
> Booting with both adapters enabled (9600 is the boot adapter) EFI-stub and
> EFIFB, loading nouveau with acceleration, screen/adapter is frozen. System
> seems to "hang"/lag.

Doesn't efifb want to pick the 9400M? efifb picks the framebuffer base at 0xC0010000. From efifb.c:

    [M_MBP_5_1] = { "mbp5,1", 0xc0010000, 2048 * 4, 1440, 900 }

These are the values for each gpu:
9400M: 0xC0010000
9600M GT: 0xB0030000

I tried nouveau yesterday and couldn't even load the kernel. With the nvidia driver it's even possible to switch to the desired gpu on boot, but with the 9400m screening, and logging out afterwards and back in (no reboot), the screen also gets scrambled, slow and unstable. Perhaps it's something with the gmux values or memory allocation!?

I tested the drm-fixes branch from airlied's repo (http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes) and it works, I can boot with just nouveau.noaccel=1 without having any garbage screen. Grabbing any 3.15-rc* should also work.
Without any parameters seems to be working sometimes, until I launch X, but this is another bug obviously.

A fix went into kernel 3.15. If you're still experiencing this issue with kernel 3.15+, please reopen the bug report.

Is noaccel=1 still needed?

As long as bug 27501 isn't fixed, noaccel=1 is still needed unfortunately.

Pierre's patch from bug 27501, comment 29 (https://bugs.freedesktop.org/show_bug.cgi?id=27501#c29) works on my system. The patch was tested on gentoo-sources 3.16.3. The next step is to check whether Ilia's patch works to deactivate the 9600M on boot up. For the moment I am using gpupwr to shut down the 9600M.
BR |