Description
René Krell
2015-05-25 07:16:54 UTC
For completeness I have to mention once more (because it has been mentioned in another issue): the nouveau driver worked _before_ kernel 3.19.x on the same hardware and stopped working beginning with 3.19.x. The problem reported in this issue applies to the kernel after commit 4195f40685a5f2783b4decece13ed740b61ee038.

(In reply to René from comment #1)
> For completeness I have to mention once more (because it has been
> mentioned in another issue):
> The nouveau driver worked _before_ kernel 3.19.x on the same hardware and
> stopped working beginning with 3.19.x.

Did you mean for one of those version numbers to be different? Can you bisect between a working and non-working kernel to identify the issue? (Is it the same commit as pointed out in #89047?)

Please post logs from a working kernel, as well as your VBIOS image (/sys/kernel/debug/dri/0/vbios.rom from a successful nouveau load).

Alright, trying to break it down regarding HP ZBook 15 and the nouveau driver:
- I'm currently on 3.16.7 and have a working nouveau driver. I will provide the needed attachments for exactly that one.
- Kernel 3.18: This is the last working kernel I found.
- Kernel 3.19: Issue #89047 got introduced, which I could reproduce, and which has apparently been fixed by Ben in commit 4195f40685a5f2783b4decece13ed740b61ee038. I mention it just to make clear that this report is not the same issue, so that it is not marked as a duplicate.
- Kernel 4.1 RC4: This kernel definitely contains the fix committed above and is therefore the next stage for me to test. In kernel 4.1 RC4 I get the problem reported here, which is different from that in issue #89047.

(In reply to René from comment #3)
> Alright, trying to break it down regarding HP ZBook 15 and the nouveau
> driver:
> - I'm currently on 3.16.7 and have a working nouveau driver. I will provide
> the needed attachments for exactly that one.
> - Kernel 3.18: This is the last working kernel I found.
> - Kernel 3.19: Issue #89047 got introduced, which I could reproduce, and
> which has apparently been fixed by Ben in commit
> 4195f40685a5f2783b4decece13ed740b61ee038. I mention it just to make clear
> that this report is not the same issue, so that it is not marked as a
> duplicate.
> - Kernel 4.1 RC4: This kernel definitely contains the fix committed above
> and is therefore the next stage for me to test. In kernel 4.1 RC4 I get the
> problem reported here, which is different from that in issue #89047.

Right, I realize it's a different issue... but could have been caused by the same commit :)

Assuming nothing obvious comes to mind when you post the other info, doing a bisect between 3.18 and 3.19 will be the surest way to get this resolved.

Created attachment 116047 [details]
/sys/kernel/debug/dri/0/vbios.rom loaded on kernel 3.16.7
Created attachment 116048 [details]
/var/log/boot.msg on kernel 3.16.7
Interesting. So the VBIOS you uploaded has:

0xce73: 71 DONE

However the error print from the more recent kernel says "unknown opcode 0xff". That means it's getting read in wrong somehow.

Can I provide something else or help somehow? I'm not a nouveau driver expert, but I can help with testing and reproducing; I just need a HOWTO for some of the steps :-)

Created attachment 116049 [details]
/sys/kernel/debug/dri/0/vbios.rom loaded on kernel 3.18.10
Created attachment 116051 [details]
/var/log/boot.msg on kernel 3.18.10
I also attached vbios.rom and boot.msg for kernel 3.18.10, which is the most recent working kernel version I currently know of for this hardware.

In the future, please avoid gzipping logs. That makes me have to download them instead of being able to read them via the web interface.

Created attachment 116053 [details]
/var/log/boot.omsg on kernel 4.1rc4
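The mismatch noted earlier (the uploaded image has 0x71, DONE, at 0xce73, while the newer kernel reports a different opcode there) can be checked by comparing a known-good dump against the bytes a failing read would produce. The following is only a minimal sketch: the helper names `opcode_at` and `first_difference` are hypothetical, and the two images are synthetic stand-ins, not data from the attachments.

```python
def opcode_at(image: bytes, offset: int) -> int:
    """Return the byte stored at `offset` in a VBIOS image."""
    if not 0 <= offset < len(image):
        raise ValueError(f"offset {offset:#x} is outside the {len(image)}-byte image")
    return image[offset]


def first_difference(a: bytes, b: bytes):
    """Return the first offset at which two images differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Synthetic stand-ins for real dumps (assumption: NOT taken from the attachments).
good = bytearray(0x10000)
good[0xCE73] = 0x71                              # DONE, as seen in the uploaded image
bad = bytearray(good)
bad[0xCE73:] = b"\xff" * (len(bad) - 0xCE73)     # pretend the tail was read as 0xff

good_op = opcode_at(bytes(good), 0xCE73)
bad_op = opcode_at(bytes(bad), 0xCE73)
diff_at = first_difference(bytes(good), bytes(bad))
print(f"good: {good_op:#04x}  bad: {bad_op:#04x}  first difference at {diff_at:#x}")
```

With real data, `good` would be read from a vbios.rom saved on a working kernel; where the first difference falls can hint at how much of the image was fetched correctly.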
Any progress on this bug? Any missing information?

I still get the same thing on a HP ZBook 15, too:

août 30 23:33:05 eyak kernel: nouveau 0000:01:00.0: enabling device (0004 -> 0007)
août 30 23:33:05 eyak kernel: nouveau [ DEVICE][0000:01:00.0] BOOT0 : 0x108390a1
août 30 23:33:05 eyak kernel: nouveau [ DEVICE][0000:01:00.0] Chipset: GK208 (NV108)
août 30 23:33:05 eyak kernel: nouveau [ DEVICE][0000:01:00.0] Family : NVE0
août 30 23:33:05 eyak kernel: nouveau [ VBIOS][0000:01:00.0] using image from ACPI
août 30 23:33:05 eyak kernel: nouveau [ VBIOS][0000:01:00.0] BIT signature found
août 30 23:33:05 eyak kernel: nouveau [ VBIOS][0000:01:00.0] version 80.28.52.00.09
août 30 23:33:05 eyak kernel: nouveau W[ VBIOS][0000:01:00.0] DCB header validation failed
août 30 23:33:05 eyak kernel: nouveau W[ VBIOS][0000:01:00.0] DCB header validation failed
août 30 23:33:05 eyak kernel: nouveau [ DEVINIT][0000:01:00.0] adaptor not initialised
août 30 23:33:05 eyak kernel: nouveau [ VBIOS][0000:01:00.0] running init tables
août 30 23:33:05 eyak kernel: nouveau E[ VBIOS][0000:01:00.0] 0xd075[0]: unknown opcode 0xb1
août 30 23:33:05 eyak kernel: nouveau E[ DEVINIT][0000:01:00.0] init failed, -22
août 30 23:33:05 eyak kernel: nouveau E[ DRM] failed to create 0x00000080, -22
août 30 23:33:05 eyak kernel: nouveau: probe of 0000:01:00.0 failed with error -22

Note that the "0xd075[0]: unknown opcode 0xb1" varies. The previous boot had "0xce73[0]: unknown opcode 0x00". And I upgraded neither the BIOS nor the kernel (4.2.0-rc8 from Debian).

Update: The nouveau driver still doesn't work on my ZBook 15 using kernel 4.2.3. There has been a refactoring of the Nouveau source code in 4.3, but the relevant files seem to be left untouched.

*** Bug 91402 has been marked as a duplicate of this bug. ***

Could one of you post an acpidump from the laptop? Specifically interested in how the _ROM method is defined.

(In reply to Ilia Mirkin from comment #17)
> Could one of you post an acpidump from the laptop? Specifically interested
> in how the _ROM method is defined.

I can do that. Just two questions:
- Can I leave the native nvidia driver loaded (my workaround for the broken nouveau driver) when running acpidump, or do I have to uninstall it and reactivate nouveau for this?
- Are any special command line options to acpidump needed?

Created attachment 119333 [details]
acpidump -b
(In reply to René from comment #18)
> (In reply to Ilia Mirkin from comment #17)
> > Could one of you post an acpidump from the laptop? Specifically interested
> > in how the _ROM method is defined.
>
> I can do that. Just two questions:
> - Can I leave the native nvidia driver loaded (my workaround for the broken
> nouveau driver) when running acpidump, or do I have to uninstall it and
> reactivate nouveau for this?

Running blob is fine, the ACPI tables are invariant of anything the OS does.

> - Are any special command line options to acpidump needed?

Just run 'acpidump' and save the output (e.g. acpidump > zbook15.acpi)

(In reply to Ortwin Glück from comment #19)
> Created attachment 119333 [details]
> acpidump -b

Sorry, I don't know what to do with this. Can you get me the output without -b? These utilities are very finicky. Really I want to read the _ROM function and maybe a few others... normally I do acpidump, then acpixtract, then iasl -d. These tools don't like your output.

Created attachment 119334 [details]
acpidump (HP ZBook 15, nVidia GK208GLM [Quadro K610M], BIOS v31/03/2015)
Ok, here you are. There is already another acpidump.
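The acpidump, acpixtract, iasl -d sequence described above could be scripted roughly as follows. This is a hedged sketch, not a tested recipe: the function names and the output file name are made up for illustration, it assumes the ACPICA tools are installed, that acpidump is run with enough privileges (usually root), and that acpixtract -a writes one .dat file per table into the current directory.

```python
import glob
import shutil
import subprocess


def extraction_plan(dump_file: str) -> list[list[str]]:
    """Commands for the first two steps; the dump itself goes to dump_file."""
    return [
        ["acpidump"],                       # stdout is redirected to dump_file
        ["acpixtract", "-a", dump_file],    # extract all tables from the text dump
    ]


def run_workflow(dump_file: str) -> None:
    """Dump, extract and disassemble the ACPI tables (hypothetical helper)."""
    for tool in ("acpidump", "acpixtract", "iasl"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found in PATH")
    dump_cmd, extract_cmd = extraction_plan(dump_file)
    with open(dump_file, "w") as out:
        subprocess.run(dump_cmd, stdout=out, check=True)
    subprocess.run(extract_cmd, check=True)
    # Disassemble every extracted table; _ROM may live in the DSDT or an SSDT.
    for table in sorted(glob.glob("*.dat")):
        subprocess.run(["iasl", "-d", table], check=True)


plan = extraction_plan("zbook15.acpi")
print(plan)
```

Calling `run_workflow("zbook15.acpi")` on the affected machine should leave .dsl files next to the .dat files, which can then be searched for the _ROM method.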
Created attachment 119336 [details]
acpidump (HP ZBook 15, nVidia GK106GLM [Quadro K2100M], BIOS v08/13/2014)
OK, here is the hexdump version. Slightly different hardware than René's.
Looks like this bios hard-codes a 4K return size for each bios chunk fetch. There is a "fast" and a "slow" method in nouveau... the fast one grabs it all in one go, while the "slow" one does it 4K at a time. In drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c, try commenting out the line

    { 0, &nvbios_acpi_fast },

For some reason we're not detecting that it's fetching a bad bios...

Yes that fixes it for me. Tested with 4.3.0-rc6. The slow version really is slower and causes a noticeable delay during booting, but I don't really mind: work laptop boots only once a day.

FTR, the _ROM function from both dumps (same thing):

    Method (_ROM, 2, NotSerialized)  // _ROM: Read-Only Memory
    {
        Local0 += Arg0 = (VRMB (0x04) + Local0)
        Local1 = (VRMS () - 0x04)
        If ((Arg0 < Local1))
        {
            OperationRegion (OROM, SystemMemory, Local0, 0x1000)
            Field (OROM, AnyAcc, NoLock, Preserve)
            {
                R4KB, 32768
            }

            Return (R4KB) /* \_SB_.PCI0.PEGP.DGFX._ROM.R4KB */
        }
        Else
        {
            Local0 = Buffer (0x01)
            {
                0x00 /* . */
            }
            Return (Local0)
        }
    }

(In reply to Ilia Mirkin from comment #26)
> FTR, the _ROM function from both dumps (same thing):
> ...
> 0x00
> /* . */
> ...

Shouldn't this be also reported to HP? What do you think?

(In reply to René Krell from comment #27)
> (In reply to Ilia Mirkin from comment #26)
> > FTR, the _ROM function from both dumps (same thing):
> > ...
> > 0x00
> > /* . */
> > ...
>
> Shouldn't this be also reported to HP? What do you think?

This bit is fine, it handles the out-of-bounds case. The sad bit is

    OperationRegion (OROM, SystemMemory, Local0, 0x1000)

Which hard-codes a 4K-sized region instead of taking it from Arg1. Feel free to give your contact at HP a call :)

(In reply to Ilia Mirkin from comment #28)
> ...
> Feel free to give your contact at HP a call :)

I will refer to this issue to not spread rumours :-)

(In reply to Ilia Mirkin from comment #28)
> (In reply to René Krell from comment #27)
> > (In reply to Ilia Mirkin from comment #26)
> > > FTR, the _ROM function from both dumps (same thing):
> > > ...
> > > 0x00
> > > /* . */
> > > ...
> >
> > Shouldn't this be also reported to HP? What do you think?
>
> This bit is fine, it handles the out-of-bounds case. The sad bit is
>
> OperationRegion (OROM, SystemMemory, Local0, 0x1000)
>
> Which hard-codes a 4K-sized region instead of taking it from Arg1. Feel free
> to give your contact at HP a call :)

Actually judging from the comments in nouveau, the spec calls for precisely the behavior they implement. However most manufacturers just expose the full bios anyways, not a 4K-sized chunk.

The delays during boot are not caused by the bios shadow code but by this:

[ 5.878439] nouveau E[ PGRAPH][0000:01:00.0] wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
[ 7.877714] nouveau E[ PGRAPH][0000:01:00.0] wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
[ 9.876992] nouveau E[ PGRAPH][0000:01:00.0] wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
[ 11.876268] nouveau E[ PGRAPH][0000:01:00.0] wait for idle timeout (en: 1, ctxsw: 0, busy: 1)

So I think you could simply scrap the "fast" version and always use the 4K "slow" version for compatibility.

(In reply to Ortwin Glück from comment #31)
> The delays during boot are not caused by the bios shadow code but by this:
> [ 5.878439] nouveau E[ PGRAPH][0000:01:00.0] wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
> [ 7.877714] nouveau E[ PGRAPH][0000:01:00.0] wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
> [ 9.876992] nouveau E[ PGRAPH][0000:01:00.0] wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
> [ 11.876268] nouveau E[ PGRAPH][0000:01:00.0] wait for idle timeout (en: 1, ctxsw: 0, busy: 1)

Try nouveau.config=War00C800_0=1 which will enable yet-another workaround.

> So I think you could simply scrap the "fast" version and always use the 4K
> "slow" version for compatibility.

The old code was able to make the fallback just fine, apparently (since it broke when the rewrite happened). The new code should fall back as well, but... doesn't. Very odd.

(In reply to Ilia Mirkin from comment #32)
> nouveau.config=War00C800_0=1

Doesn't help, but don't worry now - different issue.

> The old code was able to make the fallback just fine, apparently (since it
> broke when the rewrite happened).

True. I think the fast/slow versions were also present in the old code.

(In reply to Ortwin Glück from comment #33)
> (In reply to Ilia Mirkin from comment #32)
> > nouveau.config=War00C800_0=1
>
> Doesn't help, but don't worry now - different issue.

Hmmm... did you see a "hw bug workaround enabled" line? Are you using Linux 4.3? [4.2 and older didn't have the workaround logic.]

(In reply to Ilia Mirkin from comment #34)
> Hmmm... did you see a "hw bug workaround enabled" line? Are you using Linux
> 4.3? [4.2 and older didn't have the workaround logic.]

Ok works fine in 4.3!

$ dmesg | grep workaround
[ 4.082516] nouveau 0000:01:00.0: pmu: hw bug workaround enabled
[ 4.194475] nouveau 0000:01:00.0: pmu: hw bug workaround enabled

I suffer from the same bug on HP ZBook.
However, with the latest Debian kernel (i.e. plain 4.3), I still get an 'error -22' and no "pmu: hw bug workaround enabled" log:

$ dmesg | grep nouveau
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.3.0-trunk-amd64 root=/dev/mapper/eyak-root ro nouveau.config=War00C800_0=1 apparmor=1 security=apparmor quiet
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.3.0-trunk-amd64 root=/dev/mapper/eyak-root ro nouveau.config=War00C800_0=1 apparmor=1 security=apparmor quiet
[ 4.459102] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
[ 4.459126] nouveau 0000:01:00.0: NVIDIA GK208 (108390a1)
[ 4.462991] nouveau 0000:01:00.0: bios: version 80.28.52.00.09
[ 4.462994] nouveau 0000:01:00.0: mxm: BIOS version 3.0
[ 4.462995] nouveau 0000:01:00.0: bios: unknown ddc map v00
[ 4.463695] nouveau 0000:01:00.0: devinit: 0xce73[0]: unknown opcode 0x00
[ 4.463699] nouveau 0000:01:00.0: preinit failed with -22
[ 4.463718] nouveau: DRM:dddddddd:00000080: init failed with -22
[ 4.464046] nouveau: probe of 0000:01:00.0 failed with error -22

I will attach the result of acpidump on my machine.

Regards,
Vincent

(In reply to vdanjean@free.fr from comment #36)
> I suffer from the same bug on HP ZBook. However, with the latest Debian kernel
> (i.e. plain 4.3), I still get an 'error -22' and no "pmu: hw bug workaround
> enabled" log:

You need to manually remove the "fast" acpi method. See my instructions in comment 24.

Created attachment 119456 [details]
acpidump (HP ZBook 15, NVidia GK208GLM [Quadro K610M], BIOS v21/07/2015)
I thought that the "nouveau.config=War00C800_0=1" workaround was enough with a plain 4.3 kernel. So I will try with your patch (but this will have to wait until tomorrow, as I will need to recompile the kernel).

Thanks
Vincent

(In reply to vdanjean@free.fr from comment #39)
> I thought that the "nouveau.config=War00C800_0=1" workaround was enough
> with a plain 4.3 kernel.

No, that will do nothing for you -- that's only for GK104/GK106/GK107 boards. Yours is a GK208.

I just want to report that the fix also works for me:

$ xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x8d cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 4 outputs: 3 associated providers: 0 name:Intel
Provider 1: id: 0x66 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 4 outputs: 3 associated providers: 0 name:nouveau

I still have ACPI warnings when enabling or disabling the NVidia card, but it seems to work nevertheless:

[23693.512790] ACPI Warning: \_SB_.PCI0.PEGP.DGFX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150818/nsarguments-95)
[23693.513206] ACPI: \_SB_.PCI0.PEGP.DGFX: failed to evaluate _DSM
[23693.513209] ACPI Warning: \_SB_.PCI0.PEGP.DGFX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150818/nsarguments-95)

Thank you very much
Vincent

I should be receiving hardware with an issue similar to this during the week. Hopefully the cause is the same and I can come up with a fix that lets us keep the "fast" method, as it saves quite a lot of boot time on (particularly) the laptop I primarily use :)

Just for the record: I dropped a support question to HP regarding this issue:
http://h30434.www3.hp.com/t5/Notebook-Display-and-Video/ZBook-15-BIOS-and-nouveau-incompatibility/m-p/5334943
The BIOS vendor should probably know this. Please let me know if there is any inaccuracy in the description.
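The interaction discussed in this thread, a _ROM method that always hands back a 4K region while the "fast" path asks for the whole image in one call, can be modelled in a few lines. This is only an illustrative sketch under stated assumptions: `rom_fixed_4k`, `read_fast` and `read_slow` are hypothetical names, the image is synthetic, and none of this is nouveau's actual code. It merely shows why a single large request comes back short and why comparing the returned length with the requested length is one way to notice that.

```python
CHUNK = 0x1000  # the 4K region size hard-coded in the quoted _ROM method


def rom_fixed_4k(image: bytes, offset: int, length: int) -> bytes:
    """Model of the firmware's _ROM: ignores `length`, returns at most 4K."""
    if offset >= len(image):
        return b"\x00"                      # out-of-bounds case: 1-byte buffer
    return image[offset:offset + CHUNK]


def read_fast(rom, size: int):
    """Ask for everything at once; report whether the full size came back."""
    data = rom(0, size)
    return data, len(data) == size


def read_slow(rom, size: int) -> bytes:
    """Fetch the image 4K at a time, trimming the last chunk to `size`."""
    out = bytearray()
    for offset in range(0, size, CHUNK):
        out += rom(offset, CHUNK)[:size - offset]
    return bytes(out)


# Synthetic 64K image (assumption: stands in for a real VBIOS).
image = bytes(i % 251 for i in range(0x10000))
rom = lambda offset, length: rom_fixed_4k(image, offset, length)

fast_data, fast_complete = read_fast(rom, len(image))
slow_data = read_slow(rom, len(image))
print(len(fast_data), fast_complete, slow_data == image)
```

In this model the fast read returns only the first 4K of the 64K image, so anything past that offset would be whatever happened to be in the caller's buffer, while the chunked read reproduces the image exactly.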
http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=fcd74e81e65aee8a2a33bdca3142a5358dac7582

Does this patch help?

The last patch helps. I tried it when I upgraded my kernel and I checked that it works. Without it, nouveau was failing at load time. With it, I can use my NVidia card together with the Intel one.

Note that I tried the new kernel from Debian (package 4.3-1~exp2); it was less stable than 4.3-1~exp1. So I removed it, then reinstalled and repatched the 4.3-1~exp1 Debian kernel. So I can also certify that the new patch works for the previous kernel, where I had applied the previous patch.

Will these patches go into the main kernel driver?

(In reply to René Krell from comment #46)
> Will these patches go into the main kernel driver?

It got merged in kernel 4.4-rc3 [1]. Please test and report when successful.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1511.3/03588.html

(In reply to Roy from comment #47)
> (In reply to René Krell from comment #46)
> > Will these patches go into the main kernel driver?
>
> It got merged in kernel 4.4-rc3 [1]. Please test and report when successful.
>
> [1] http://lkml.iu.edu/hypermail/linux/kernel/1511.3/03588.html

There were two patches mentioned by Ilia:
- http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=fcd74e81e65aee8a2a33bdca3142a5358dac7582 (This got merged as "drm/nouveau/bios: return actual size of the buffer retrieved via _ROM").
- https://bugs.freedesktop.org/show_bug.cgi?id=90626#c24 - removing the fast loading option (As far as I understand, this one was also necessary to make the HP ZBook 15 work. I can't see it merged officially. Is it obsolete, or should only owners of that special hardware apply this patch?)

(In reply to René Krell from comment #48)
> (In reply to Roy from comment #47)
> > (In reply to René Krell from comment #46)
> > > Will these patches go into the main kernel driver?
> >
> > It got merged in kernel 4.4-rc3 [1]. Please test and report when successful.
> >
> > [1] http://lkml.iu.edu/hypermail/linux/kernel/1511.3/03588.html
>
> There were two patches mentioned by Ilia:
> - http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=fcd74e81e65aee8a2a33bdca3142a5358dac7582
> (This got merged as
> "drm/nouveau/bios: return actual size of the buffer retrieved via _ROM").
> - https://bugs.freedesktop.org/show_bug.cgi?id=90626#c24 - removing the fast
> loading option (As far as I understand, this one was also necessary to
> make the HP ZBook 15 work. I can't see it merged officially. Is it obsolete,
> or should only owners of that special hardware apply this patch?)

Comment #45 seems to indicate that the patch carried upstream should be sufficient to fix this bug, but please test and report when successful.

Hi,

As requested, I can confirm that the upstream kernel works (in my case, these are Debian packaged kernels). I cannot tell exactly from which version it works, but 4.4-rc3 looks possible to me. The Debian bug report (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=772716) marks it fixed in the 4.3.1-1 version (i.e. the patch was probably backported to the stable series).

Recently (kernel > 4.5.x), the nouveau kernel driver is not automatically loaded at startup, so I need to run "modprobe nouveau" manually. But this is another bug that I have neither investigated nor reported for now.

For me, this bug can be closed.

Regards,
Vincent

I was also able to reactivate nouveau on a HP ZBook 15 G2, OpenSUSE Tumbleweed 20160530, kernel 4.5.4-1-default and start X and KDE Plasma 5 on it. The patch seems to work fine. There are some different issues in several applications, I'll try to address them separately. Thank you all.

Just for the record: I'm now affected by https://bugs.freedesktop.org/show_bug.cgi?id=95054 on the same hardware (HP ZBook 15 G2). nouveau freezes while accelerating the Plasma 5 Desktop after a while.
(In reply to René Krell from comment #52)
> Just for the record: I'm now affected by
> https://bugs.freedesktop.org/show_bug.cgi?id=95054 on the same hardware (HP
> ZBook 15 G2). nouveau is freezing in accelerating Plasma 5 Desktop after a
> while.

Thank you for your feedback. We'll discuss the other issues in separate bug reports.