Created attachment 98653 [details] kernel log

I have noticed that a Dell Dimension 4600 running Ubuntu 12.04 (32-bit) became very unstable and would hang within minutes after I replaced an AGP Radeon 7000 with a PCI Radeon 9200. It turns out the BIOS does not disable the internal Intel video in the latter case. Blacklisting i915 fixed the problem. Nothing is connected to the Intel video.

I updated to the current mainline kernel (3.15.0-999-generic as of May 7, 2014) and the problem persisted.

To capture the attached logs, I used "init 1" to bring X11 down. Then I ran "modprobe i915". The kernel oopsed about a minute later. I was able to save /sys/class/drm/card1/error before it happened.

Attached are the kernel log after "modprobe i915", the output of "lspci -nvvvxxx", the output of "lsmod" and the error file (compressed).
Created attachment 98654 [details] lsmod output
Created attachment 98655 [details] Contents of /sys/class/drm/card1/error
Created attachment 98656 [details] Output of "lspci -nvvvxxx"
Actually, the system is Ubuntu 14.04, sorry. Also, when I saw the problem originally, the system had BIOS version A07. I upgraded it to version A12 (the latest), but it made no difference. I believe BIOS is supposed to disable the internal graphics, but it only does that for AGP cards. I think i915 should have logic to ignore the Intel video in that case. Alternatively, it should be initialized properly without relying on BIOS.
It basically dies while setting up the GPU:

PGTBL_ER: 0x00000052
source = Reserved System Memory
error = Invalid Memory

It would be interesting to read a drm.debug=0xf dmesg to learn just when that fires. The most likely suspect is that we misdetected the amount of stolen memory reserved for us by the BIOS.
Created attachment 98683 [details] kernel log with drm.debug=0xf
(In reply to comment #6)
> Created attachment 98683 [details]
> kernel log with drm.debug=0xf

Are you sure this debug log is from 3.15-rc kernels? There's an awful lot of stuff missing, or you didn't enable full debug. Or a lot is lost somewhere. Please double-check it all worked out.
The kernel is mainline 3.15-rc, but I see drm.debug=1 in the log. I'll capture the correct log shortly.
Created attachment 99131 [details] Linux 3.15-rc5 log with drm.debug=0xf (this time really with 0xf)
The amount of memory stolen seems genuine. So what is suspect is its location then.
(In reply to comment #10)
> The amount of memory stolen seems genuine. So what is suspect is its
> location then.

My gen2 stolen base patches aren't in yet, so how can there be stolen memory? Or do you suspect it's getting clobbered by something else?

In any case the log is weird since it's missing the printk from the stolen memory early quirk. The gen2 patches for those are definitely in 3.15-rc5, so I would expect to see something, unless I fumbled the 865g part. That's of course possible since I didn't have a machine to test with.
(In reply to comment #11)
> (In reply to comment #10)
> > The amount of memory stolen seems genuine. So what is suspect is its
> > location then.
>
> My gen2 stolen base patches aren't in yet, so how can there be stolen
> memory? Or do you suspect it's getting clobbered by something else?
>
> In any case the log is weird since it's missing the printk from the stolen
> memory early quirk. The gen2 patches for those are definitely in 3.15-rc5,
> so I would expect to see something, unless I fumbled the 865g part. That's
> of course possible since I didn't have a machine to test with.

Just to confirm some of that, I'd like to see what these say:

setpci -s 0:2.0 0xc4.w
intel_reg_read 0x2020
root@dimension4600:~# setpci -s 0:2.0 0xc4.w
5f80
root@dimension4600:~# intel_reg_read 0x2020
0x2020 : 0xFFFFFFFF
root@dimension4600:~# modprobe i915
root@dimension4600:~# setpci -s 0:2.0 0xc4.w
5f80
root@dimension4600:~# intel_reg_read 0x2020
0x2020 : 0xFFFFF001
(In reply to comment #13)
> root@dimension4600:~# setpci -s 0:2.0 0xc4.w
> 5f80

I think that would mean you have 1.5GiB of RAM. Seems to match your dmesg
"Memory: 1520084K/1563708K available"

I'm still a bit puzzled by your dmesg since even the e820 table is missing from it. Did you scrub it somehow?

> root@dimension4600:~# intel_reg_read 0x2020
> 0x2020 : 0xFFFFFFFF

Ouch. That looks entirely bogus. Can you repeat w/o external graphics cards attached?

> root@dimension4600:~# modprobe i915
> root@dimension4600:~# setpci -s 0:2.0 0xc4.w
> 5f80
> root@dimension4600:~# intel_reg_read 0x2020
> 0x2020 : 0xFFFFF001
(In reply to comment #13)
> root@dimension4600:~# setpci -s 0:2.0 0xc4.w

Oh sorry, that was supposed to be 'setpci -s 0:0.0 0xc4.w' but your value still makes sense so perhaps it's the same on 0:2.0. Anyway please check again.
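
For anyone following the arithmetic behind the "1.5GiB" remark, here is a rough sketch in Python. It assumes the 16-bit value read at config offset 0xc4 holds bits 31:16 of a physical address marking the top of usable RAM; that interpretation is not spelled out in this thread, and the function names are made up for illustration.

```python
# Sketch only. ASSUMPTION: the 16-bit value at config offset 0xc4 holds
# bits 31:16 of the physical address where usable RAM ends.

MIB = 1 << 20


def top_of_usable_ram(reg16: int) -> int:
    """Turn the 16-bit register value into a byte address (assumed layout)."""
    return (reg16 & 0xFFFF) << 16


def kib_to_mib(kib: int) -> float:
    """Convert a dmesg-style KiB figure to MiB."""
    return kib / 1024


if __name__ == "__main__":
    top = top_of_usable_ram(0x5F80)  # value reported by setpci above
    print(f"top of usable RAM: {top:#x} = {top // MIB} MiB")
    # Total from the quoted dmesg line "Memory: 1520084K/1563708K available"
    print(f"dmesg total:       {kib_to_mib(1563708):.1f} MiB")
```

Under that assumption the register value works out to 1528 MiB, close to the 1563708K total in the quoted dmesg line and 8 MiB short of 1.5 GiB.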
I'm getting exactly the same thing with setpci on 0:0.0.

Without i915:
5f80
0x2020 : 0xFFFFFFFF

With i915:
5f80
0x2020 : 0xFFFFF001

If there is no other PCI card, I'm getting:
5f80
0x2020 : 0x5FFE0001

It's the same result whether i915 is loaded or not and whether I'm using 0:0.0 or 0:2.0.
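
To make the three 0x2020 readings easier to compare, here is a small Python sketch that splits each value into an enable bit and a page table base. The bit layout (bit 0 = enabled, bits 31:12 = base address) and the 8 MiB window above the 0x5f80 << 16 address are assumptions made for illustration, not something confirmed in this thread, and the helper names are hypothetical.

```python
# Sketch only. ASSUMPTIONS: register 0x2020 uses bit 0 as "enabled" and
# bits 31:12 as the page table base; 0x5f80 from setpci is bits 31:16 of the
# top of usable RAM; the reserved window above it is 8 MiB (a guess, chosen
# because 1.5 GiB - 1528 MiB = 8 MiB).

ENABLE_BIT = 0x1
BASE_MASK = 0xFFFFF000
MIB = 1 << 20


def decode(value: int) -> dict:
    """Split a raw 32-bit reading into its assumed fields."""
    return {
        "all_ones": value == 0xFFFFFFFF,  # typical of an unresponsive device
        "enabled": bool(value & ENABLE_BIT),
        "base": value & BASE_MASK,
    }


def looks_plausible(value: int, window_start: int, window_size: int) -> bool:
    """True if enabled and the base lies inside the given address window."""
    d = decode(value)
    if d["all_ones"] or not d["enabled"]:
        return False
    return window_start <= d["base"] < window_start + window_size


if __name__ == "__main__":
    start = 0x5F80 << 16  # assumed top of usable RAM
    readings = {
        "without i915": 0xFFFFFFFF,
        "with i915": 0xFFFFF001,
        "no extra PCI card": 0x5FFE0001,
    }
    for label, raw in readings.items():
        d = decode(raw)
        ok = looks_plausible(raw, start, 8 * MIB)
        print(f"{label:18} base={d['base']:#010x} plausible={ok}")
```

With these assumptions only the reading taken without the extra PCI card points into the window just above usable RAM; the other two point at the very top of the 32-bit address space.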
Created attachment 99174 [details] kernel log without an extra video card In this case, everything is working fine
Sounds more like your BIOS is leaving the card in a not-really-initialized state. Not much we can do besides trying to detect this and not loading the driver. Any differences in the output of lspci -nn between the working and non-working configuration?
Created attachment 99177 [details] Output of "lspci -nvvvxxx" without external video card
Since I already uploaded the output of "lspci -nvvvxxx", that's the command I used again without an external video card. It should have the same information except the human-readable device names.

I looked at the diff. The addresses are different, but that's not a big deal. This part is interesting:

-00:02.0 0380: 8086:2572 (rev 02)
+00:02.0 0300: 8086:2572 (rev 02) (prog-if 00 [VGA controller])

If the Intel video is not initialized, it has no "prog-if". Also, this appears for 00:02.0:

+ Expansion ROM at <unassigned> [disabled]

Obviously, the Radeon goes away (01:00.0 and 01:00.1).
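
The change from 0380 to 0300 in the diff above is a change in the PCI class code: base class 0x03 is a display controller, subclass 0x00 is VGA-compatible and subclass 0x80 is "other". A minimal Python sketch for pulling those fields out of an "lspci -n" line follows; the regular expression and function name are my own and only cover the simple line format shown here.

```python
import re

# Sketch only: parses the short "lspci -n" header line format seen in the
# diff above (slot, class, vendor:device). Not a general lspci parser.
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
)

# Display-class (0x03) subclasses relevant to this report.
DISPLAY_SUBCLASSES = {0x00: "VGA compatible", 0x80: "other display"}


def parse_lspci_n(line: str) -> dict:
    """Return slot, base class, subclass and IDs from one lspci -n line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized lspci -n line: {line!r}")
    cls = int(m.group("cls"), 16)
    return {
        "slot": m.group("slot"),
        "base_class": cls >> 8,
        "subclass": cls & 0xFF,
        "vendor": int(m.group("vendor"), 16),
        "device": int(m.group("device"), 16),
    }


if __name__ == "__main__":
    for line in (
        "00:02.0 0380: 8086:2572 (rev 02)",
        "00:02.0 0300: 8086:2572 (rev 02) (prog-if 00 [VGA controller])",
    ):
        info = parse_lspci_n(line)
        kind = DISPLAY_SUBCLASSES.get(info["subclass"], "unknown")
        print(info["slot"], f"{info['vendor']:04x}:{info['device']:04x}", kind)
```

The class code alone only says whether the device is presented as VGA-compatible or not; it says nothing about whether the BIOS initialized it.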
Created attachment 99181 [details] "lspci -nvvvxxx" with external video card and Intel video set to primary

This should be much closer to the original. No differences in addresses. The only differences for the Intel video (00:02.0) are prog-if and Expansion ROM at <unassigned>.

Since the BIOS disables the Intel video with an AGP card (no 00:02.0 in lspci output), I think the intention was to disable it with a PCI video card as well. The BIOS fails to do so either due to a bug or for some technical reason. In any case, it's reasonable not to support the Intel video.

It's a very old system. I intend to give it away for educational purposes. Nobody is going to do multihead on it. I just want to get that bug fixed because it was a pain to install Linux on it, and it would be a pain to reinstall it. There were hangs and disk corruption until I blacklisted i915.

The Intel video has VGA output only and there are some minor visual artefacts (that's material for another bug).
On Fri, May 16, 2014 at 11:14 PM, <bugzilla-daemon@freedesktop.org> wrote:
> -00:02.0 0380: 8086:2572 (rev 02)
> +00:02.0 0300: 8086:2572 (rev 02) (prog-if 00 [VGA controller])
>
> If the Intel video is not initialized, it has no "prog-if". Also, this appears
> for 00:02.0:
>
> + Expansion ROM at <unassigned> [disabled]

Hm, we're unlucky - both can also happen on systems with two gpus, it's actually how it's supposed to be. Not sure if there's really anything we can do here on the driver side. I think the only option you really have is to blacklist i915 or hope for a bios upgrade somewhere :(
Closing as not our bug, the BIOS here is pretty terribly broken. I guess we could do an elaborate quirk somewhere in the PCI core layer, but it doesn't really seem worth it. Thanks anyway for reporting this.