Bug 75511 - [NVE7] Lenovo Y500 with 2x GT650M, first card gets wrong vbios, fails boot on 3.13+
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2014-02-25 22:15 UTC by Claas Lorenz
Modified: 2014-08-21 22:33 UTC

Attachments
dmesg output (63.91 KB, text/plain)
2014-02-25 22:15 UTC, Claas Lorenz
output of "cat /sys/bus/pci/devices/0000:01:00.0/rom > vbios-0.rom" (62.00 KB, application/octet-stream)
2014-02-25 23:11 UTC, Claas Lorenz
output of "nvagetbios -c 1 -s PROM > vbios-1.rom" (1.00 MB, application/octet-stream)
2014-02-25 23:13 UTC, Claas Lorenz
output of "nvagetbios -c 0 -s PRAMIN > vbios-0.ramin" (128.00 KB, application/octet-stream)
2014-02-25 23:14 UTC, Claas Lorenz
output of "nvagetbios -c 1 -s PRAMIN > vbios-1.ramin" (128.00 KB, application/octet-stream)
2014-02-25 23:15 UTC, Claas Lorenz
output of acpidump (258.12 KB, text/plain)
2014-02-26 18:44 UTC, Claas Lorenz

Description Claas Lorenz 2014-02-25 22:15:15 UTC
Created attachment 94735 [details]
dmesg output

During boot (with late module load of nouveau), the screen freezes at about the point where a brief glitch normally occurs and the resolution switches to a much higher mode. It does not recover from this state, and the machine must be restarted or powered off via Ctrl+Alt+Del or the power button. The journald log shows that systemd booted normally. I worked around the problem by downgrading to kernel 3.12, where everything works fine. Here are the Arch Linux package versions of the configuration in which the error occurs:

kernel - 3.13.5-1
xf86-video-nouveau - 1.0.10-2
nouveau-dri - 10.0.3-1
mesa - 10.0.3-1
xorg-server - 1.15.0-5
libdrm - 2.4.52-1


The attached dmesg output was captured during a boot that showed the problem described above (taken at the last systemd target before GNOME).
Comment 1 Ilia Mirkin 2014-02-25 22:29:44 UTC
Am I correct that you have 2 ~identical GK107 cards in there? Unfortunately it appears that the VBIOS for the first one is corrupt. It's being retrieved from PCIROM, which more often than not actually doesn't have what we want (despite it passing the checksum). Note that your 2nd card appears to init correctly, and it gets its VBIOS from the PROM. Not sure where the VBIOS for your first card lives...

I assume that the VBIOS error about the unknown opcode exists in 3.12 as well? Except that 3.13 made that an error. Who knows, maybe it's a legitimately unknown opcode... could you grab a copy of envytools and retrieve the VBIOSes for both of your cards with nvagetbios? i.e.

nvagetbios -c 0 -s PROM > vbios-0.rom
nvagetbios -c 1 -s PROM > vbios-1.rom

If the first one fails, also try PRAMIN instead of PROM (I assume -c 1 will work just fine since nouveau is able to find it without problem). If that still fails, try to get it from the pci rom directly:

echo 1 > /sys/bus/pci/devices/0000:01:00.0/rom
cat /sys/bus/pci/devices/0000:01:00.0/rom > vbios-0.rom
echo 0 > /sys/bus/pci/devices/0000:01:00.0/rom

(all of this as root, obviously.)
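
The checksum mentioned above only covers basic image integrity, which is why a dump can pass it and still hold the wrong data. As a minimal sketch, assuming the usual PCI expansion ROM layout (a 0x55 0xAA signature, the image length in 512-byte units at byte 2, and all image bytes summing to zero modulo 256), a dumped file could be sanity-checked as follows. The function name check_option_rom is hypothetical and is not part of envytools or nouveau.

```python
import sys
from pathlib import Path


def check_option_rom(data: bytes) -> dict:
    """Run basic sanity checks on the first image of a ROM dump."""
    result = {"signature_ok": False, "length": 0, "checksum_ok": False}

    # A PCI expansion ROM image starts with the bytes 0x55 0xAA.
    if len(data) < 3 or data[0] != 0x55 or data[1] != 0xAA:
        return result
    result["signature_ok"] = True

    # Byte 2 holds the image size in 512-byte blocks.
    length = data[2] * 512
    result["length"] = length
    if length == 0 or length > len(data):
        return result

    # All bytes of the image must sum to zero modulo 256.
    result["checksum_ok"] = sum(data[:length]) % 256 == 0
    return result


if __name__ == "__main__":
    # Usage: python check_rom.py vbios-0.rom vbios-1.rom
    for name in sys.argv[1:]:
        print(name, check_option_rom(Path(name).read_bytes()))
```

A passing result here says nothing about whether the init scripts inside the image are valid; it only rules out the most obvious kinds of corruption.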
Comment 2 Claas Lorenz 2014-02-25 23:11:33 UTC
Created attachment 94736 [details]
output of "cat /sys/bus/pci/devices/0000:01:00.0/rom > vbios-0.rom"
Comment 3 Claas Lorenz 2014-02-25 23:13:03 UTC
Created attachment 94737 [details]
output of "nvagetbios -c 1 -s PROM > vbios-1.rom"
Comment 4 Claas Lorenz 2014-02-25 23:14:25 UTC
Created attachment 94738 [details]
output of "nvagetbios -c 0 -s PRAMIN > vbios-0.ramin"
Comment 5 Claas Lorenz 2014-02-25 23:15:34 UTC
Created attachment 94739 [details]
output of "nvagetbios -c 1 -s PRAMIN > vbios-1.ramin"
Comment 6 Claas Lorenz 2014-02-25 23:22:12 UTC
Wow, that was fast. Yes, I do indeed have two cards in there, and yes, the kernel message also occurred in all earlier kernels I used (at least since 3.9.8, when I got my laptop).

As the attachments show, I had to read the PCI ROM directly to get vbios-0.rom.
Comment 7 Ilia Mirkin 2014-02-25 23:33:14 UTC
Well, analyzing the vbios from the second (working) card, what I see is:

Init script 0 at 0x83e0:
0x83e0: 8c                                             UNK8C
0x83e1: 7a 00 02 00 00 20 20 00 00                     ZM_REG   R[0x000200] = 0x00002020
0x83ea: 33 14                                          REPEAT   0x14
0x83ec: 6e 00 00 00 00 ff ff ff ff 00 00 00 00         NV_REG   R[0x000000] &= 0xffffffff |= 0x00000000
0x83f9: 36                                             END_REPEAT
0x83fa: 7a 00 02 00 00 25 21 01 40                     ZM_REG   R[0x000200] = 0x40012125
0x8403: 7a c0 24 12 00 00 00 00 00                     ZM_REG   R[0x1224c0] = 0x00000000
0x840c: 7a 40 26 12 00 00 00 00 00                     ZM_REG   R[0x122640] = 0x00000000
0x8415: 6e 00 24 02 00 ff f7 ff ff 00 08 00 00         NV_REG   R[0x022400] &= 0xfffff7ff |= 0x00000800
and so on

Looking at the pci rom of the first (non-working) card, I see:

Init script 0 at 0x83e0:
0x83e0: 42                                             ???
0x83e1: 66                                             CONFIGURE_MEM
0x83e2: ad                                             ???
0x83e3: 66                                             CONFIGURE_MEM
0x83e4: c1                                             ???
0x83e5: c8                                             ???

As you can see, the bytes are all different. I'm pretty sure this is 16-bit real mode x86 code:

$ udcli -16 -x
42 66 ad 66 c1 c8 10 ee 66 c1 c0 08 ee 66 c1 c0 08 ee e2 ed 1f 66 61
0000000000000000 42               inc dx                  
0000000000000001 66ad             lodsd                   
0000000000000003 66c1c810         ror eax, 0x10           
0000000000000007 ee               out dx, al              
0000000000000008 66c1c008         rol eax, 0x8            
000000000000000c ee               out dx, al              
000000000000000d 66c1c008         rol eax, 0x8            
0000000000000011 ee               out dx, al              
0000000000000012 e2ed             loop 0x1                
0000000000000014 1f               pop ds                  
0000000000000015 6661             popad                   

Otherwise the bioses appear identical... at least the DCB and GPIO tables match up. So I'd recommend simply grabbing that good vbios-1.rom, sticking it in /lib/firmware (in the initrd if that's where nouveau is loaded from), and adding nouveau.config=NvBios=vbios-1.rom to the kernel command line, which will use that file as the vbios instead of trying to read it from the card.

I'm not extremely happy with this solution, of course, but it's the fastest way to get something that works.

Can you provide any relevant details about your system? These both appear to be mobile chips, and it's pretty uncommon to have two of them in a single system. Would you also mind providing an acpidump? Perhaps the vbios for the first card is hiding in ACPI somewhere unexpected.
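
The comparison of the two dumps described in this comment can be approximated with a short script. This is only a sketch and not the tool used for the analysis above; differing_ranges is a hypothetical helper, and the file names are the ones used for the attachments in this report.

```python
import sys
from pathlib import Path


def differing_ranges(a: bytes, b: bytes) -> list:
    """Return (start, end) offset pairs, end exclusive, where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))

    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))

    # Report trailing bytes of the longer image as a separate range.
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python romdiff.py vbios-0.rom vbios-1.rom
    first = Path(sys.argv[1]).read_bytes()
    second = Path(sys.argv[2]).read_bytes()
    for start, end in differing_ranges(first, second):
        print(f"0x{start:06x}-0x{end:06x} ({end - start} bytes)")
```

Because the PCI ROM dump and the PROM dump have different sizes, the last range reported will simply be the tail of the larger file.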
Comment 8 Claas Lorenz 2014-02-26 18:43:00 UTC
I have a Lenovo Y500 with two GeForce GT 650M cards. I think this laptop was intended as a mid-range gaming machine, where two graphics cards make some sense. Here is the lspci output:

01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 650M] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 650M] (rev a1)

Since I do not use it for gaming, I never cared about having a second card. I put the working vbios in /lib/firmware and added the config option to the kernel command line, as you suggested, and it works fine now (also with 3.13). Thank you very much for your help!

I also attached my acpidump, I hope that helps.
Comment 9 Claas Lorenz 2014-02-26 18:44:14 UTC
Created attachment 94782 [details]
output of acpidump
Comment 10 Ilia Mirkin 2014-03-26 21:26:35 UTC
Would you mind trying to edit nouveau_acpi.c:nouveau_acpi_rom_supported and removing the check for dsm_detected and optimus_detected? (And then booting that without the NvBios setting.) I think the situation is that those aren't set, but we should still get the rom from ACPI for that first card.
Comment 11 Ilia Mirkin 2014-08-21 22:33:08 UTC
(In reply to comment #10)
> Would you mind trying to edit nouveau_acpi.c:nouveau_acpi_rom_supported and
> removing the check for dsm_detected and optimus_detected? (And then booting
> that without the NvBios setting.) I think the situation is that those aren't
> set, but we should still get the rom from ACPI for that first card.

A patch to do this was merged a while back, and backported to stable kernels. Pretty sure it will fix the issue for you (without needing the NvBios thing). Feel free to re-open if not.

