Bug 44999 - [arrandale] kernel BUG setting external monitor to a higher resolution
Status: CLOSED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Daniel Vetter
QA Contact:
URL:
Whiteboard:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2012-01-20 10:03 UTC by Bryce Harrington
Modified: 2016-11-03 12:23 UTC
CC List: 5 users

See Also:
i915 platform:
i915 features:


Attachments
BUG trace photo (410.67 KB, image/jpeg)
2012-01-20 10:23 UTC, Bryce Harrington
BootDmesg.txt (65.67 KB, text/plain)
2012-01-20 10:25 UTC, Bryce Harrington
CurrentDmesg.txt (4.09 KB, text/plain)
2012-01-20 10:26 UTC, Bryce Harrington
XorgLog.txt (43.93 KB, text/plain)
2012-01-20 10:27 UTC, Bryce Harrington
use private slab for i915 gem objects (3.02 KB, patch)
2012-01-20 16:05 UTC, Daniel Vetter

Description Bryce Harrington 2012-01-20 10:03:19 UTC
Forwarding this bug from Ubuntu reporter Martin Pool:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/906086

[Problem]
If I try to turn the external display up above 1600x1200, the kernel oopses (unrecoverably) inside i915_gem_init_ioctl. See the attached photo showing the full trace.

BUG: unable to handle kernel NULL pointer dereference at 000000000000030
 IP: i915_gem_get_aperture_ioctl+0x67/0xb0 [i915]
 Oops: 0000 [#1] SMP

Process Xorg
Call trace:
 drm_ioctl+0x444/0x510
 ? i915_gem_init_ioctl
 ? do_page_fault
 do_vfs_ioctl
 ? vfs_write
 sys_ioctl
 system_call_fastpath

[Original Description]
Following on from bug 745112 but on precise - similar but different symptoms so a separate bug.

With an external monitor connected through the dock DisplayPort of a ThinkPad X201, running current precise:

 * the internal screen looks ok
 * if I use the display control panel to try to turn off the internal screen and use only the external screen, it looks distorted and flickery, with two panels, as if it's trying to show two desktops on the same display; when the safety timeout expires it recovers ok
 * if I have the displays side by side and the external display at a low resolution, it works ok
 * if I try to turn the external display up above 1600x1200, the kernel oopses (unrecoverably) inside i915_gem_init_ioctl

DistroRelease: Ubuntu 12.04
Package: xserver-xorg 1:7.6+7ubuntu7
ProcVersionSignature: Ubuntu 3.2.0-5.11-generic 3.2.0-rc5
Uname: Linux 3.2.0-5-generic x86_64
.tmp.unity.support.test.0:
 
ApportVersion: 1.90-0ubuntu1
Architecture: amd64
CompizPlugins: [core,bailer,detection,composite,opengl,compiztoolbox,decor,gnomecompat,mousepoll,imgpng,place,regex,session,unitymtgrabhandles,resize,vpswitch,animation,grid,move,snap,wall,expo,workarounds,ezoom,fade,scale,unityshell]
CompositorRunning: compiz
Date: Mon Dec 19 09:56:07 2011
DistUpgraded: Log time: 2011-12-15 10:06:44.057094
DistroCodename: precise
DistroVariant: ubuntu
EcryptfsInUse: Yes
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GraphicsCard:
 Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:215a]
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
MachineType: LENOVO 3249CTO
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-5-generic root=UUID=8aff985d-377a-420d-a38e-62ce8bd54504 ro crashkernel=384M-2G:64M,2G-:128M quiet splash vt.handoff=7
SourcePackage: xorg
UpgradeStatus: Upgraded to precise on 2011-12-15 (3 days ago)
dmi.bios.date: 05/31/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 6QET66WW (1.36 )
dmi.board.name: 3249CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6QET66WW(1.36):bd05/31/2011:svnLENOVO:pn3249CTO:pvrThinkPadX201:rvnLENOVO:rn3249CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 3249CTO
dmi.product.version: ThinkPad X201
dmi.sys.vendor: LENOVO
version.compiz: compiz 1:0.9.6+bzr20110929-0ubuntu8
version.ia32-libs: ia32-libs 20090808ubuntu26
version.libdrm2: libdrm2 2.4.27-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 7.11-0ubuntu4
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.11-0ubuntu4
version.xserver-xorg-core: xserver-xorg-core 2:1.10.4-1ubuntu6
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.6.0-1ubuntu13
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20110811.g93fc084-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.15.901-1ubuntu4
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20111201+b5534a1-1
Comment 1 Bryce Harrington 2012-01-20 10:23:28 UTC
Created attachment 55842 [details]
BUG trace photo
Comment 2 Bryce Harrington 2012-01-20 10:25:38 UTC
Created attachment 55843 [details]
BootDmesg.txt
Comment 3 Bryce Harrington 2012-01-20 10:26:55 UTC
Created attachment 55844 [details]
CurrentDmesg.txt
Comment 4 Bryce Harrington 2012-01-20 10:27:09 UTC
Created attachment 55845 [details]
XorgLog.txt
Comment 5 Daniel Vetter 2012-01-20 15:37:03 UTC
OOPS-decoding for fun and profit:

A reasonable decode of the code from the OOPS

   0x400641 <array+1>:  mov    0x16e0(%r12),%rdx
   0x400649 <array+9>:  lea    0x16e0(%r12),%rcx
   0x400651 <array+17>: cmp    %rdx,%rcx
   0x400654 <array+20>: lea    -0xb0(%rdx),%rax
   0x40065b <array+27>: je     0x400682
   0x40065d <array+29>: nopl   0x0(%rax)
   0x400664 <array+36>: mov    0x88(%rax),%rdx
   0x40066b <array+43>: add    0x30(%rdx),%ebx <- we die here
   0x40066e <array+46>: mov    0xb0(%rax),%rdx
   0x400675 <array+53>: cmp    %rdx,%rcx
   0x400678 <array+56>: lea    -0xb0(%rdx),%rax
   0x40067f <array+63>: add    %ah,0x1000a70(%rip)        # 0x14010f5
   0x400685:    sbb    (%rbx),%eax
   0x400687:    cmp    (%rax),%ebp
   0x400689:    add    %al,(%rax)
   0x40068b:    add    %al,(%rax,%rax,1)
   0x40068e:    add    %al,(%rax)
   0x400690:    rex.WR std 
   0x400692:    (bad)  
   0x400693:    incl   0x0(%rax,%rax,1)

Some comparison with asm from my own tree suggests that

%rdx == gtt_space
0x30(%rdx) == gtt_space->size

%rax == obj
0x88(%rax) == obj->gtt_space

0xb0(%rax) == obj->mm_list.next

We die at NULL+0x30.

Stuff before and after makes less sense, and I'm missing the function exit code which should follow. Probably the add %rip does something fancy out-of-line.

In other news we have an obj on the pinned list with gtt_space = NULL.
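
As a rough illustration of the decode above, the following userspace C sketch mirrors what the loop appears to be doing: it walks a list linked through obj->mm_list, steps back 0xb0 bytes to recover obj (the lea -0xb0(%rdx),%rax), loads obj->gtt_space from offset 0x88 and adds gtt_space->size from offset 0x30. All mock_* names, the padding and the NULL check are assumptions made for illustration. Only the three offsets come from the decode; this is neither the i915 source nor the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical mock of the GTT node: only the 0x30 offset of size is
 * taken from the decode, the padding is invented. */
struct mock_mm_node {
    unsigned char pad[0x30];
    unsigned long size;
};

/* Hypothetical mock of the GEM object: gtt_space at 0x88 and mm_list at
 * 0xb0 as in the decode, everything else is padding. */
struct mock_gem_obj {
    unsigned char pad0[0x88];
    struct mock_mm_node *gtt_space;
    unsigned char pad1[0xb0 - 0x88 - sizeof(struct mock_mm_node *)];
    struct list_head mm_list;
};

_Static_assert(offsetof(struct mock_mm_node, size) == 0x30, "size offset");
_Static_assert(offsetof(struct mock_gem_obj, gtt_space) == 0x88, "gtt_space offset");
_Static_assert(offsetof(struct mock_gem_obj, mm_list) == 0xb0, "mm_list offset");

/* Same idea as the kernel's container_of(): member pointer -> object. */
#define mock_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Append entry at the tail of the circular list rooted at head. */
void mock_list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Address the faulting instruction would read for this object:
 * gtt_space + 0x30. With gtt_space == NULL that is 0x30. */
uintptr_t mock_fault_address(const struct mock_gem_obj *obj)
{
    return (uintptr_t)obj->gtt_space + offsetof(struct mock_mm_node, size);
}

/* Sum the sizes of all objects on the list. The NULL check is a
 * defensive assumption for this sketch; without it the loop would
 * dereference NULL + 0x30 exactly as in the oops. */
size_t mock_sum_pinned(struct list_head *head, size_t *skipped)
{
    size_t pinned = 0;

    assert(head != NULL);
    for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
        struct mock_gem_obj *obj =
            mock_container_of(pos, struct mock_gem_obj, mm_list);

        if (obj->gtt_space == NULL) {
            if (skipped)
                (*skipped)++;
            continue;
        }
        pinned += obj->gtt_space->size;
    }
    return pinned;
}
```

With one object on the list whose gtt_space is NULL, mock_fault_address() yields 0x30 for it, which lines up with the "NULL+0x30" reading above, while mock_sum_pinned() skips that object instead of dereferencing it.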
Comment 6 Daniel Vetter 2012-01-20 16:05:14 UTC
Created attachment 55880 [details] [review]
use private slab for i915 gem objects

Please try to reproduce the issue with this patch applied. Also ensure that you either use the SLAB allocator or, if you're using SLUB, boot with slub_debug on the kernel cmdline.
Comment 7 Martin Pool 2012-01-22 22:01:45 UTC
You are missing one declaration of dev_priv, around line 205.  With that fixed, it does build on the Ubuntu kernel.  I'm going to test it with slub_debug.
Comment 8 Daniel Vetter 2012-01-23 01:17:42 UTC
> --- Comment #7 from Martin Pool <mbp@sourcefrog.net> 2012-01-22 22:01:45 PST ---
> You are missing one declaration of dev_priv, around line 205.  With that fixed,
> it does build on the Ubuntu kernel.  I'm going to test it with slub_debug.

Oops, I've fixed that locally but forgot to amend the patch before attaching it.
Comment 9 Martin Pool 2012-01-23 09:41:09 UTC
Hi Daniel,

With this patch applied, and 'slub_debug' on the kernel command line, I get the same problem I was seeing previously: no oops, but the external screen is blank or can't sync.
Comment 10 Daniel Vetter 2012-01-25 06:42:04 UTC
Just to check: the patch _does_ get rid of the oops?

For the blank screen issue: Please boot with drm.debug=0x4 added to your kernel cmdline, reproduce the problem and then attach the full dmesg.
Comment 11 Daniel Vetter 2012-05-07 05:55:00 UTC
Ping.
Comment 12 Daniel Vetter 2012-06-27 03:47:23 UTC
It looks like the private slab works around foreign memory corruption, and for the remaining issues the reporter has gone AWOL. Also, I don't quite see the evidence for why this is a regression.

Tentatively closing; please reopen if this is still an issue on the latest kernel version.
Comment 13 Jari Tahvanainen 2016-11-03 12:23:42 UTC
Closing resolved+invalid. No activity for >4 years.

