Bug 78416 - [gen2 865g] Bad stolen
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Paulo Zanoni
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2014-05-08 01:46 UTC by Pavel Roskin
Modified: 2017-07-24 22:54 UTC


Attachments
kernel log (51.26 KB, text/plain)
2014-05-08 01:46 UTC, Pavel Roskin
lsmod output (2.11 KB, text/plain)
2014-05-08 01:47 UTC, Pavel Roskin
Contents of /sys/class/drm/card1/error (111.57 KB, text/plain)
2014-05-08 01:47 UTC, Pavel Roskin
Output of "lspci -nvvvxxx" (24.07 KB, text/plain)
2014-05-08 01:49 UTC, Pavel Roskin
kernel log with drm.debug=0xf (99.52 KB, text/plain)
2014-05-08 13:08 UTC, Pavel Roskin
Linux 3.15-rc5 log with drm.debug=0xf (this time really with 0xf) (118.00 KB, text/plain)
2014-05-16 03:08 UTC, Pavel Roskin
kernel log without an extra video card (58.88 KB, text/plain)
2014-05-16 20:14 UTC, Pavel Roskin
Output of "lspci -nvvvxxx" without external video card (21.10 KB, text/plain)
2014-05-16 21:04 UTC, Pavel Roskin
"lspci -nvvvxxx" with external video card and Intel video set to primary (24.14 KB, text/plain)
2014-05-16 21:54 UTC, Pavel Roskin

Description Pavel Roskin 2014-05-08 01:46:36 UTC
Created attachment 98653 [details]
kernel log

I have noticed that a Dell Dimension 4600 running Ubuntu 12.04 (32-bit) became very unstable and would hang within minutes after I replaced an AGP Radeon 7000 with a PCI Radeon 9200.  It turns out the BIOS does not disable the internal Intel video in the latter case.  Blacklisting i915 fixed the problem.  Nothing is connected to the Intel video.

I updated to the current mainline kernel (3.15.0-999-generic as of May 7, 2014) and the problem persisted.  To capture the attached logs, I used "init 1" to bring X11 down.  Then I ran "modprobe i915".  The kernel oopsed about a minute later.  I was able to save /sys/class/drm/card1/error before it happened.

Attached are the kernel log after "modprobe i915", the output of "lspci -nvvvxxx", the output of "lsmod" and the error file (compressed).
Comment 1 Pavel Roskin 2014-05-08 01:47:14 UTC
Created attachment 98654 [details]
lsmod output
Comment 2 Pavel Roskin 2014-05-08 01:47:55 UTC
Created attachment 98655 [details]
Contents of /sys/class/drm/card1/error
Comment 3 Pavel Roskin 2014-05-08 01:49:03 UTC
Created attachment 98656 [details]
Output of "lspci -nvvvxxx"
Comment 4 Pavel Roskin 2014-05-08 01:57:59 UTC
Actually, the system is Ubuntu 14.04, sorry.

Also, when I saw the problem originally, the system had BIOS version A07.  I upgraded it to version A12 (the latest), but it made no difference.

I believe BIOS is supposed to disable the internal graphics, but it only does that for AGP cards.  I think i915 should have logic to ignore the Intel video in that case.  Alternatively, it should be initialized properly without relying on BIOS.
Comment 5 Chris Wilson 2014-05-08 05:19:29 UTC
It basically dies during setting up the gpu:

PGTBL_ER: 0x00000052
    source = Reserved System Memory
    error = Invalid Memory

It would be interesting to read a drm.debug=0xf dmesg to learn just when that fires. The most likely suspect is that we misdetected the amount of stolen memory reserved for us by the BIOS.
Comment 6 Pavel Roskin 2014-05-08 13:08:53 UTC
Created attachment 98683 [details]
kernel log with drm.debug=0xf
Comment 7 Daniel Vetter 2014-05-15 21:34:26 UTC
(In reply to comment #6)
> Created attachment 98683 [details]
> kernel log with drm.debug=0xf

Are you sure this debug log is from 3.15-rc kernels? There's an awful lot of stuff missing, or you didn't enable full debug. Or a lot is lost somewhere. Please double-check it all worked out.
Comment 8 Pavel Roskin 2014-05-16 00:37:47 UTC
The kernel is mainline 3.15-rc, but I see drm.debug=1 in the log.  I'll capture the correct log shortly.
Comment 9 Pavel Roskin 2014-05-16 03:08:36 UTC
Created attachment 99131 [details]
Linux 3.15-rc5 log with drm.debug=0xf (this time really with 0xf)
Comment 10 Chris Wilson 2014-05-16 10:09:44 UTC
The amount of memory stolen seems genuine. So what is suspect is its location then.
Comment 11 Ville Syrjala 2014-05-16 10:23:00 UTC
(In reply to comment #10)
> The amount of memory stolen seems genuine. So what is suspect is its
> location then.

My gen2 stolen base patches aren't in yet, so how can there be stolen memory? Or do you suspect it's getting clobbered by something else?

In any case the log is weird since it's missing the printk from the stolen memory early quirk. The gen2 patches for those are definitely in 3.15-rc5, so I would expect to see something, unless I fumbled the 865g part. That's of course possible since I didn't have a machine to test with.
Comment 12 Ville Syrjala 2014-05-16 10:38:39 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > The amount of memory stolen seems genuine. So what is suspect is its
> > location then.
> 
> My gen2 stolen base patches aren't in yet, so how can there be stolen
> memory? Or do you suspect it's getting clobbered by something else?
> 
> In any case the log is weird since it's missing the printk from the stolen
> memory early quirk. The gen2 patches for those are definitely in 3.15-rc5,
> so I would expect to see something, unless I fumbled the 865g part. That's
> of course possible since I didn't have a machine to test with.

Just to confirm some of that, I'd like to see what these say:

setpci -s 0:2.0 0xc4.w
intel_reg_read 0x2020
Comment 13 Pavel Roskin 2014-05-16 13:33:23 UTC
root@dimension4600:~# setpci -s 0:2.0 0xc4.w
5f80
root@dimension4600:~# intel_reg_read 0x2020
0x2020 : 0xFFFFFFFF
root@dimension4600:~# modprobe i915
root@dimension4600:~# setpci -s 0:2.0 0xc4.w
5f80
root@dimension4600:~# intel_reg_read 0x2020
0x2020 : 0xFFFFF001
Comment 14 Ville Syrjala 2014-05-16 13:57:24 UTC
(In reply to comment #13)
> root@dimension4600:~# setpci -s 0:2.0 0xc4.w
> 5f80

I think that would mean you have 1.5GiB of RAM. Seems to match your dmesg
"Memory: 1520084K/1563708K available"

I'm still a bit puzzled by your dmesg since even the e820 table is missing from it. Did you scrub it somehow?

> root@dimension4600:~# intel_reg_read 0x2020
> 0x2020 : 0xFFFFFFFF

Ouch. That looks entirely bogus.

Can you repeat w/o external graphics cards attached?

> root@dimension4600:~# modprobe i915
> root@dimension4600:~# setpci -s 0:2.0 0xc4.w
> 5f80
> root@dimension4600:~# intel_reg_read 0x2020
> 0x2020 : 0xFFFFF001
Comment 15 Ville Syrjala 2014-05-16 14:04:01 UTC
(In reply to comment #13)
> root@dimension4600:~# setpci -s 0:2.0 0xc4.w

Oh sorry, that was supposed to be 'setpci -s 0:0.0 0xc4.w'
but your value still makes sense so perhaps it's the same
on 0:2.0. Anyway please check again.
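
A minimal sketch of the arithmetic behind the 1.5GiB estimate. It assumes the 16-bit value at config offset 0xc4 holds address bits 31:16, i.e. counts in 64 KiB units; the thread gives only the interpretation, not the register layout, so treat that as an assumption. The helper names are illustrative.

```shell
#!/bin/sh
# Convert the 16-bit value printed by setpci (hex, no 0x prefix) into a
# byte address. Assumption: the register holds address bits 31:16.
toud_to_addr() {
    printf '0x%08x\n' $(( 0x$1 << 16 ))
}

# The same boundary in KiB, for comparison with the dmesg "Memory:" line.
toud_to_kib() {
    echo $(( (0x$1 << 16) / 1024 ))
}

toud_to_addr 5f80   # 0x5f800000
toud_to_kib 5f80    # 1564672
```

Under that assumption, 5f80 corresponds to 1564672 KiB (1528 MiB), which is within about 1 MiB of the 1563708K total reported in dmesg.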
Comment 16 Pavel Roskin 2014-05-16 20:09:30 UTC
I'm getting exactly the same thing with setpci on 0:0.0.

Without i915:

5f80
0x2020 : 0xFFFFFFFF

With i915:

5f80
0x2020 : 0xFFFFF001

If there is no other PCI card, I'm getting:

5f80
0x2020 : 0x5FFE0001

It's the same result whether i915 is loaded or not and whether I'm using 0:0.0 or 0:2.0.
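
To compare the three readings of register 0x2020, a rough decoding sketch may help. It assumes bits 31:12 of the register are a page-table base address and bit 0 is an enable flag; that layout is not stated in this report, and the function name is hypothetical.

```shell
#!/bin/sh
# Split a 0x2020 reading into a base address and an enable bit.
# Assumption: bits 31:12 = page-table base, bit 0 = enable.
decode_pgtbl_ctl() {
    val=$(( $1 ))
    # An all-ones value usually means the read did not reach real hardware state.
    if [ "$val" -eq $(( 0xFFFFFFFF )) ]; then
        echo "all ones (read looks bogus)"
        return
    fi
    base=$(( val & 0xFFFFF000 ))
    enabled=$(( val & 1 ))
    printf 'base=0x%08x enable=%d\n' "$base" "$enabled"
}

decode_pgtbl_ctl 0xFFFFFFFF   # all ones (read looks bogus)
decode_pgtbl_ctl 0xFFFFF001   # base=0xfffff000 enable=1
decode_pgtbl_ctl 0x5FFE0001   # base=0x5ffe0000 enable=1
```

If the 0xc4 value 5f80 is read as an address of 0x5f800000, the working configuration's base of 0x5ffe0000 sits a little under 8 MiB above it, while 0xfffff000 is nowhere near the RAM of a 1.5GiB machine.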
Comment 17 Pavel Roskin 2014-05-16 20:14:56 UTC
Created attachment 99174 [details]
kernel log without an extra video card

In this case, everything is working fine.
Comment 18 Daniel Vetter 2014-05-16 20:47:48 UTC
Sounds more like your bios is leaving the card in a not-really-initialized state. Not much we can do besides trying to detect this and not loading the driver.

Any differences in the output of lspci -nn between the working and non-working configurations?
Comment 19 Pavel Roskin 2014-05-16 21:04:16 UTC
Created attachment 99177 [details]
Output of "lspci -nvvvxxx" without external video card
Comment 20 Pavel Roskin 2014-05-16 21:14:56 UTC
Since I already uploaded the output of "lspci -nvvvxxx", that's the command I used again without an external video card.  It should have the same information except the human-readable device names.

I looked at the diff.  The addresses are different, that's not a big deal.

This part is interesting:

-00:02.0 0380: 8086:2572 (rev 02)
+00:02.0 0300: 8086:2572 (rev 02) (prog-if 00 [VGA controller])

If the Intel video is not initialized, it has no "prog-if".  Also, this appears for 00:02.0:

+       Expansion ROM at <unassigned> [disabled]

Obviously, Radeon goes away (01:00.0 and 01:00.1).
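
For reference, the four hex digits after the slot in "lspci -n" output are the PCI class and subclass. A small sketch of how the two values in the diff map to standard PCI class names (the function name is illustrative):

```shell
#!/bin/sh
# Describe the display-related PCI class codes seen in the diff above.
# 03 is the display controller base class; the last two digits are the subclass.
pci_display_class() {
    case "$1" in
        0300) echo "VGA compatible controller" ;;
        0380) echo "other display controller" ;;
        *)    echo "not one of the two codes in this report" ;;
    esac
}

pci_display_class 0380   # other display controller
pci_display_class 0300   # VGA compatible controller
```

So with the Radeon installed, 00:02.0 is still present but is no longer advertised as the VGA-compatible device.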
Comment 21 Pavel Roskin 2014-05-16 21:54:39 UTC
Created attachment 99181 [details]
"lspci -nvvvxxx" with external video card and Intel video set to primary

This should be much closer to the original.  No differences in addresses.  The only differences for Intel video (00:02.0) are prog-if and Expansion ROM at <unassigned>.

Since BIOS disables Intel video with an AGP card (no 00:02.0 in lspci output), I think the intention was to disable it with a PCI video card as well.  BIOS fails to do so either due to a bug or for some technical reason.  In any case, it's reasonable not to support the Intel video.

It's a very old system.  I intend to give it away for educational purposes.  Nobody is going to do multihead on it.  I just want to get that bug fixed because it was a pain to install Linux on it, and it would be a pain to reinstall it.  There were hangs and disk corruption until I blacklisted i915.

The Intel video has VGA output only and there are some minor visual artefacts (that's material for another bug).
Comment 22 Daniel Vetter 2014-05-16 22:03:00 UTC
On Fri, May 16, 2014 at 11:14 PM,  <bugzilla-daemon@freedesktop.org> wrote:
> -00:02.0 0380: 8086:2572 (rev 02)
> +00:02.0 0300: 8086:2572 (rev 02) (prog-if 00 [VGA controller])
>
> If the Intel video is not initialized, it has no "prog-if".  Also, this appears
> for 00:02.0:
>
> +       Expansion ROM at <unassigned> [disabled]

Hm, we're unlucky - both can also happen on systems with two gpus,
it's actually how it's supposed to be. Not sure if there's really
anything we can do here on the driver side.

I think the only option you really have is to blacklist i915 or hope
for a bios upgrade somewhere :(
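
A minimal sketch of the blacklisting workaround, assuming a distribution that reads module configuration from /etc/modprobe.d. The file name blacklist-i915.conf and the function name are hypothetical choices, not something taken from this report.

```shell
#!/bin/sh
# Write a modprobe.d snippet that stops i915 from being auto-loaded.
# $1 is the target directory; it defaults to /etc/modprobe.d (needs root).
write_i915_blacklist() {
    dir=${1:-/etc/modprobe.d}
    printf 'blacklist i915\n' > "$dir/blacklist-i915.conf"
}

# Usage (as root): write_i915_blacklist
```

A blacklist entry only prevents automatic loading; an explicit "modprobe i915" still loads the module. If the driver is loaded from the initramfs, that image may also need to be regenerated before the change takes effect.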
Comment 23 Daniel Vetter 2014-05-16 22:04:49 UTC
Closing as not our bug; the BIOS here is pretty terribly broken. I guess we could do an elaborate quirk somewhere in the PCI core layer, but it doesn't really seem worth it. Thanks anyway for reporting this.

