Bug 50381

Summary: [i915 DRM] [drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!
Product: DRI
Reporter: Ali Bahar <ali_bugs>
Component: DRM/Intel
Assignee: Daniel Vetter <daniel>
Status: CLOSED FIXED
QA Contact:
Severity: major
Priority: medium
CC: ali_bugs, ben, chris, daniel, eugeni, florian, jbarnes
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (description: flags):
  screenshot: none
  Xorg.0.log: none
  xrandr: none
  dmesg: none
  lspci -nn: none
  lsmod: none
  .config: none
  .config: none
  Streamline agp-intel find: none
  Check for another host bridge: none
  syslog, patches applied.: none
  lsmod, patches applied.: none
  dmesg, patches applied.: none
  lspci -vv, patches applied.: none
  Xorg.0.log, patches applied.: none
  startx's stderr & stdin: none

Description Ali Bahar 2012-05-27 00:16:45 UTC
Created attachment 62130 [details]
screenshot

Symptoms:
- when startx is run, the resulting desktop is rendered in one of 2 ways:
  1. the resolution is fairly ok, and the screen dimensions are 1280x1024, but it is shifted to the right such that part of it wraps-around to the LHS! See attached image.
  2. an unshifted screen, but very low resolution.

- syslog always shows
  [drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!

- The graphics performance of the box is far from its best. Video playback runs OK, but far from optimal. A simple 2D test, the old utility 'ico', confirms this: the polygon bounces around like 'pong'!

2) System environment: 
- chipset: i915, as far as I can tell!
- uname -m: x86_64
- uname -r 3.3.7
- cpu: Intel(R) Core(TM) i3 CPU         540  @ 3.07GHz
- mobo: Gigabyte H55M-S2V. In other words, the Intel H55 chipset.
- pkg-config --modversion libdrm: 2.4.27
- glxinfo usually shows Mesa to be Mesa 7.11. 
- xf86-video-intel seems to be 12.0 (as shown by Xorg.0.log)
- X.Org X Server 1.12.1.902 (1.12.2 RC 2)
-- Linux distribution:
  Debian 6.0, but the Testing branch. However, the problem (low resolution and poor performance) is reproducible with Debian Stable as well.

-- Display connector: (e.g. VGA, DVI, HDMI, S-video, ...)
  DVI
3) Reproduce steps. Probability if not 100% reproducible.
  a. run 'startx'. It happens each and every time. 100%.
4) Attachment:
-- Xorg.0.log
-- dmesg output (better with boot option "drm.debug=0x06")
  A few printks were added to the i915 code, denoted by "alidbg".
-- screenshot or photo (optional, a picture is worth a thousand words)
-- output of "xrandr --verbose" for display mode issue


  Other combinations of CPU/mobo have been involved as well. The G41 chipset (Gigabyte G41M-Combo, with a celeron e3400 CPU) used to be fine, when run in Debian 6.0 (Stable branch). Today's Debian Testing (with its Xorg upgrades) seems to have broken that; even glxinfo fails!
A Radeon 5450 card shows the same low resolution, and low 'ico' performance; but no AGP failures in syslog. 
Debian's Stable kernel, 2.6.32-5-amd64, did not fix the problems.
Testing was with, and without, /etc/X11/xorg.conf.
Comment 1 Ali Bahar 2012-05-27 00:18:20 UTC
Created attachment 62131 [details]
Xorg.0.log

Resulting from 'startx'.
Comment 2 Ali Bahar 2012-05-27 00:19:32 UTC
Created attachment 62132 [details]
xrandr

xrandr --verbose
I don't usually run this, so I can't tell you much about any variance in its output.
Comment 3 Ali Bahar 2012-05-27 00:23:24 UTC
Created attachment 62133 [details]
dmesg

To trace behaviour, I had added a few printk statements to 
linux-3.3.7/drivers/char/agp/
linux-3.3.7/drivers/gpu/drm/i915/

You will see these flagged by 'alidbg'.
Comment 4 Ali Bahar 2012-05-27 00:34:38 UTC
Manual loading of the intel_agp module will cause no warnings. But subsequent loading of i915 will cause:
FATAL: Error inserting i915 (/lib/modules/3.3.7_Sat26may2012/kernel/drivers/gpu/drm/i915/i915.ko): No such device

All of the following do get loaded, though not i915 itself:

root@misery Sat May 26 22:07:14 ~$ modinfo i915|egrep -i depends
depends:        drm,drm_kms_helper,fb,i2c-core,cfbfillrect,video,button,cfbimgblt,cfbcopyarea,i2c-algo-bit
root@misery Sat May 26 22:19:55 ~$ 

The following is very similar to my symptoms:
http://lists.freedesktop.org/archives/intel-gfx/2011-August/011791.html

The kernel was built locally, though different kernels (including a debian Stable one) have been tried.
I _have_ had such problems before, with various h/w & s/w including i915, but they'd all been eventually resolved.
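
The module state described in this comment can be checked with a small helper. This is a minimal sketch, not part of the original report: `module_loaded` and `report_modules` are hypothetical function names, and the only assumption about `lsmod` is its usual layout (a header line, then one module per line with the name in the first column).

```shell
#!/bin/sh
# Succeeds if the module named in $1 appears in lsmod-style text on stdin.
# The first line of lsmod output is a header, so it is skipped.
module_loaded() {
    awk -v mod="$1" 'NR > 1 && $1 == mod { found = 1 } END { exit !found }'
}

# Reads lsmod-style text on stdin once, then reports each module given
# as an argument as "loaded" or "not loaded".
report_modules() {
    listing=$(cat)
    for mod in "$@"; do
        if printf '%s\n' "$listing" | module_loaded "$mod"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
        fi
    done
}
```

A typical invocation would be `lsmod | report_modules intel_agp drm i915`, run once after loading intel_agp by hand and again after attempting to load i915.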
Comment 5 Ali Bahar 2012-05-27 00:36:44 UTC
intel_reg_dumper etc can be provided upon request.
Comment 6 Ali Bahar 2012-05-27 17:21:42 UTC
Correction: The Display connector is VGA.

The symptoms toggle, between shifted and unshifted, every time startx is run. That is, one run will be shifted, and the next run will be unshifted.

The G41M + e3400 box used to run perfectly, in resolution and performance, in both Debian Stable and Debian Testing. Its recent failure was due only to another Testing upgrade.
Comment 7 Ali Bahar 2012-05-27 19:38:37 UTC
With Debian 6.0, Stable, the error string
[drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!
does not occur in syslog. However, everything else (shifting, resolution, performance) does. The behaviour has been consistent for months.

- As usual, glxinfo shows:

direct rendering: Yes
OpenGL renderer string: Software Rasterizer

- xvinfo shows:

X-Video Extension version 2.2
screen #0
 no adaptors present


- and xdriinfo shows:

Screen 0: not direct rendering capable.
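
The glxinfo result above ("direct rendering: Yes" together with a software renderer) can be detected mechanically. A minimal sketch, assuming the renderer line has the form shown in this comment; `renderer_is_software` is a hypothetical helper, and the extra names `llvmpipe` and `softpipe` are other Mesa software renderers that do not appear in this report.

```shell
#!/bin/sh
# Succeeds if glxinfo-style text on stdin names a software renderer.
# Only the "OpenGL renderer string:" line is inspected.
renderer_is_software() {
    grep '^OpenGL renderer string:' |
        grep -Eqi 'software rasterizer|llvmpipe|softpipe'
}
```

For example, `glxinfo | renderer_is_software && echo "no hardware acceleration"` prints the warning only when Mesa has fallen back to software rendering.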
Comment 8 Eugeni Dodonov 2012-05-28 07:23:16 UTC
Could you please attach the output of 'lspci -nn' as well?
Comment 9 Ali Bahar 2012-05-28 08:05:38 UTC
(In reply to comment #8)
> Could you please attach the output of 'lspci -nn' as well?

See attached.
Comment 10 Ali Bahar 2012-05-28 08:07:18 UTC
Created attachment 62167 [details]
lspci -nn

lspci -nn
Comment 11 Eugeni Dodonov 2012-05-28 08:58:42 UTC
Looks like you are missing something in your kernel build. Could you please also attach your kernel .config file, and lsmod output?
Comment 12 Ali Bahar 2012-05-28 09:06:38 UTC
Created attachment 62168 [details]
lsmod

lsmod.
Comment 13 Ali Bahar 2012-05-28 09:09:26 UTC
Created attachment 62169 [details]
.config

.config
The kernel was locally built, and has far more stuff than it ought to.
Comment 14 Ali Bahar 2012-05-28 09:14:03 UTC
Created attachment 62170 [details]
.config

.config

The kernel was locally built, and has far more stuff than it ought to.
Comment 15 Chris Wilson 2012-05-28 11:07:30 UTC
Created attachment 62175 [details] [review]
Streamline agp-intel find

Step 1, can you please apply this patch.
Step 2, instrument the find function to print every pdev->vendor / pdev->device / pdev->class / pdev->devfn that it tries to match against.
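
Step 2 asks for printk instrumentation inside the kernel. As a rough userspace counterpart (not a substitute for it), the same vendor, device and class values can be read from sysfs for every host bridge. This is a sketch under assumptions: `list_host_bridges` is a hypothetical helper, and it relies on the standard `vendor`, `device` and `class` files under `/sys/bus/pci/devices`, whose directory names encode bus, device and function.

```shell
#!/bin/sh
# Prints one line per PCI host bridge (class 0x0600xx) found under the
# given sysfs directory, defaulting to the live system's device list.
list_host_bridges() {
    root=${1:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case $class in
            0x0600*) ;;       # PCI base class 06, subclass 00: host bridge
            *) continue ;;
        esac
        echo "${dev##*/} vendor=$(cat "$dev/vendor") device=$(cat "$dev/device") class=$class"
    done
}
```

Calling `list_host_bridges` with no argument lists the bridges that a host-bridge lookup would have to match against.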
Comment 16 Eugeni Dodonov 2012-05-28 11:10:55 UTC
Created attachment 62176 [details] [review]
Check for another host bridge

Additionally, could you please try with this patch and check if it makes things work?
Comment 17 Ali Bahar 2012-05-28 17:27:00 UTC
I'll apply the patches later today, and will update you.
Thanks.
Comment 18 Ali Bahar 2012-05-29 06:49:02 UTC
Bad news. The box died. Testing with different motherboards points to the CPU, though this is not definitive (because mobos are not consistent with respect to their beeps and interaction with the power supply.)
So I can no longer reproduce the problem. The patch had not been applied.
Thanks for your help, though.
Comment 19 Daniel Vetter 2012-05-29 07:23:37 UTC
Ok I'll close this as invalid because the box died. Thanks for reporting this issue and please reopen this bug if the problem happens again (on fixed hw).
Comment 20 Chris Wilson 2012-05-29 07:30:08 UTC
Note, if we can confirm the 0x0069 HostBridge then Eugeni's patch should go upstream. And mine if you don't like O(nm) searches ;-)
Comment 21 Ali Bahar 2012-06-05 21:19:22 UTC
"The re-animation of dead tissue!"
Don't ask me how -- I've given up on the reliability of post-Pentium x86 hardware! -- but the box came back up again!

I applied both patches. Now, the i915 module does get loaded, but startx hangs. I will attach some files, but I have not managed to figure out how to "instrument" the code as Chris Wilson asked. In syslog it shows that it finds the 00:02.0 graphics controller. So please clarify what you want done.
Comment 22 Ali Bahar 2012-06-05 21:21:05 UTC
Created attachment 62621 [details]
syslog, patches applied.

Older, earlier, parts of syslog have been edited out.
Comment 23 Ali Bahar 2012-06-05 21:21:46 UTC
Created attachment 62622 [details]
lsmod, patches applied.
Comment 24 Ali Bahar 2012-06-05 21:22:18 UTC
Created attachment 62623 [details]
dmesg, patches applied.
Comment 25 Ali Bahar 2012-06-05 21:22:53 UTC
Created attachment 62624 [details]
lspci -vv, patches applied.
Comment 26 Ali Bahar 2012-06-05 21:26:36 UTC
Created attachment 62625 [details]
Xorg.0.log, patches applied.

I run 'startx'. It ends up with a non-blinking cursor on the top left-hand side of the screen. The box continues to run fine (when connected-to remotely), but there is no response on the display, and I find no /usr/bin/X process to kill.
Comment 27 Chris Wilson 2012-06-06 00:32:14 UTC
Ok, that sounds like X crashed during startup. Can you also attach your gdm.log (or equivalent kdm.log, xdm.log etc) as the fault doesn't appear in the Xorg.log and so I hope it went to stderr instead...
Comment 28 Ali Bahar 2012-06-06 04:09:41 UTC
(In reply to comment #27)
> Ok, that sounds like X crashed during startup. Can you also attach your gdm.log
> (or equivalent kdm.log, xdm.log etc) as the fault doesn't appear in the
> Xorg.log and so I hope it went to stderr instead...

Good tip. Thanks.
There is no display manager, but I caught the stdout+stderr of startx anyway. See attachment; there seems to be an undefined symbol:

/usr/bin/X: symbol lookup error: /usr/lib/xorg/modules/drivers/intel_drv.so: undefined symbol: drm_intel_bufmgr_gem_set_vma_cache_size

I updated the driver just today, but it did not help.

Package: xserver-xorg-video-intel
Version: 2:2.19.0-1

I am assuming that I may need to contact the Debian crew for this matter.
In the next few days, I will see if I can try this hardware with Debian Stable. As the patches have caused the i915 driver to be loaded, then I suppose there has been some progress. 
Thanks for your help.
Comment 29 Ali Bahar 2012-06-06 04:12:45 UTC
Created attachment 62664 [details]
startx's stderr & stdin

The capture command was something like
  startx > startx_stdin_stderr_W06jun2012 2>&1
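
The redirection in that command sends standard output to the file and then duplicates standard error onto the same destination, so both streams end up in one log. A minimal sketch, with a hypothetical `noisy` function standing in for startx:

```shell
#!/bin/sh
# Hypothetical stand-in for startx: one line on stdout, one on stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

log=$(mktemp)
# Order matters: "> file" comes first, then "2>&1" copies the already
# redirected stdout, so stderr lands in the same file.
noisy > "$log" 2>&1
cat "$log"
rm -f "$log"
```

Writing the redirections the other way round (`2>&1 > file`) would leave stderr on the terminal, which is why the order used in the capture command is the one to copy.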
Comment 30 Chris Wilson 2012-06-06 04:18:10 UTC
If you update libdrm_intel, that should resolve the missing symbol. drm_intel_bufmgr_gem_set_vma_cache_size was introduced in libdrm_intel-2.4.29
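
Whether a given copy of the library actually exports the symbol can be checked from the dynamic symbol table. This is a sketch under assumptions: `exports_symbol` is a hypothetical helper that expects the output of `nm -D --defined-only` on stdin, with the symbol name in the last column.

```shell
#!/bin/sh
# Succeeds if the symbol named in $1 appears as the last field of any
# line of nm-style output read from stdin.
exports_symbol() {
    awk -v sym="$1" '$NF == sym { found = 1 } END { exit !found }'
}
```

Usage would look like `nm -D --defined-only /path/to/libdrm_intel.so.1 | exports_symbol drm_intel_bufmgr_gem_set_vma_cache_size`, where the library path is a placeholder for whichever copy the loader picks up.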
Comment 31 Ali Bahar 2012-06-06 04:29:30 UTC
(In reply to comment #30)
> If you update libdrm_intel, that should resolve the missing symbol.
> drm_intel_bufmgr_gem_set_vma_cache_size was introduced in libdrm_intel-2.4.29

It's already at 

Package: libdrm-intel1
Version: 2.4.33-1

and installed.
Comment 32 Daniel Vetter 2012-06-06 04:33:21 UTC
On Wed, Jun 6, 2012 at 1:29 PM,  <bugzilla-daemon@freedesktop.org> wrote:
> --- Comment #31 from Ali Bahar <ali_bugs@internetdog.org> 2012-06-06 04:29:30 PDT ---
> (In reply to comment #30)
>> If you update libdrm_intel, that should resolve the missing symbol.
>> drm_intel_bufmgr_gem_set_vma_cache_size was introduced in libdrm_intel-2.4.29
>
> It's already at
>
> Package: libdrm-intel1
> Version: 2.4.33-1
>
> and installed.

Well, that missing symbol is in libdrm-intel1 and Xorg can't find it.
So something has gone wrong with your install. Can you please check
what ldd says for your intel_drv.so (i.e. the xf86-video-intel driver
you've manually installed) and whether it indeed links to the libdrm
that you've compiled and installed?
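
The ldd check requested here can be narrowed to libraries that resolve outside the distribution's directories. A minimal sketch: `libs_outside_system_dirs` is a hypothetical helper, and it assumes packaged libraries live under /lib or /usr/lib (including variants such as /lib64), so anything else, for example /usr/local, is reported.

```shell
#!/bin/sh
# Reads ldd output on stdin and prints "name path" for every library
# that resolves to an absolute path outside /lib* and /usr/lib*.
libs_outside_system_dirs() {
    awk '$2 == "=>" && index($3, "/") == 1 && $3 !~ "^/(usr/)?lib" { print $1, $3 }'
}
```

Running `ldd /usr/lib/xorg/modules/drivers/intel_drv.so | libs_outside_system_dirs` should print nothing on a clean install; any libdrm entry in the output points at a stale local copy.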
Comment 33 Ali Bahar 2012-06-06 07:27:44 UTC
(In reply to comment #32)


> Well, that missing symbol is in libdrm-intel1 and Xorg can't find it.
> So something has gone wrong with your install. Can you please check

Yes, I'd thought so. Hence the plan to test Debian Stable.

> what ldd says for your intel_drv.so (i.e. the xf86-video-intel driver
> you've manually installed) and whether it indeed links to the libdrm
> that you've compiled and installed?

You're right on the mark. ldd showed that forgotten libdrm libs from /usr/local/ were being used. Removing them has now restored proper display. No shifting, high resolution, 'ico' runs blazing fast again, and xvinfo/xdriinfo/glxinfo/vainfo seem to find everything. glxgears is running at suspiciously low FPS, but I do not know enough to consider this a problem.

So, the patches were key to the fix, of course. And, on my particular box, old DRM libs (from Nov. 2011) needed to be upgraded.

Thanks very much for all your help. I will close this bug.
Comment 34 Ali Bahar 2012-06-06 07:29:40 UTC
The patches seem to have fixed the problem.
(In my particular case, some stale libdrm files were hiding around as well.)
Comment 35 Florian Mickler 2012-07-01 03:42:12 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc2:

commit 67384fe3fd450536342330f684ea1f7dcaef8130
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date:   Wed Jun 6 11:59:06 2012 -0300

    char/agp: add another Ironlake host bridge
