Bug 73261

Summary: [hsw ult regression] first few megabytes of aperture space overwritten after i915.ko loads and before userspace submits first batch
Product: DRI Reporter: Marcos Truchado <marcos.truchado>
Component: DRM/Intel Assignee: Daniel Vetter <daniel>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs, marcos.truchado, rakothedin
Version: XOrg git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
i915_error_state file generated none
Xorg log none
glxinfo none
two i915_error_state files none

Description Marcos Truchado 2014-01-03 15:46:22 UTC
Created attachment 91463 [details]
i915_error_state file generated

System info:

NAME=openSUSE
VERSION="13.1 (Bottle)"
VERSION_ID="13.1"
PRETTY_NAME="openSUSE 13.1 (Bottle) (x86_64)"
ID=opensuse
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:opensuse:13.1"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://opensuse.org/"
ID_LIKE="suse"

00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09)

parameters:

disable_power_well 1
enable_hangcheck Y
enable_ips 1
fbpercrtc 0
i915_enable_fbc -1
i915_enable_ppgtt -1
i915_enable_rc6 -1
invert_brightness 0
lvds_channel_mode 0
lvds_downclock 0
lvds_use_ssc -1
modeset 1
panel_ignore_lid 1
powersave 1
preliminary_hw_support 0
reset Y
semaphores -1
vbt_sdvo_panel_type -1
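
For anyone who wants to collect the same list on their own machine, here is a minimal sketch. It assumes the values are exposed under /sys/module/i915/parameters (the usual sysfs location for module parameters); the function name read_module_parameters is made up for illustration, and this is not necessarily how the list above was produced.

```python
from pathlib import Path


def read_module_parameters(param_dir="/sys/module/i915/parameters"):
    """Return {parameter name: value} for every file in param_dir.

    The default directory is an assumption (standard sysfs layout);
    pass another directory if your system differs.
    """
    params = {}
    for entry in sorted(Path(param_dir).iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by the current user.
            params[entry.name] = "<unreadable>"
    return params
```

Printing each name and value separated by a space gives the same "name value" layout as the list above.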

kernel info:

[   19.704725] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[   19.704751] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
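
The error state mentioned in the message above can be saved to a regular file so it can be attached to a report. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the script runs with enough privileges to read it; save_error_state is a hypothetical helper name, not part of any tool.

```python
from pathlib import Path

# Path taken from the kernel message above; reading it normally
# requires root and a mounted debugfs (assumption).
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(dest, src=ERROR_STATE):
    """Copy the error state from src to dest and return the byte count."""
    data = Path(src).read_bytes()
    Path(dest).write_bytes(data)
    return len(data)
```

For example, save_error_state("i915_error_state.txt") writes a copy into the current directory.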
Comment 1 Marcos Truchado 2014-01-03 15:48:49 UTC
I wasn't doing anything special, just using applications like xchat, firefox, and eclipse, when suddenly everything froze. I had to shut down my laptop manually; after restarting I found that the error had been recorded.
Comment 2 Chris Wilson 2014-01-03 16:21:35 UTC
Let me guess... An old version of mesa?
Comment 3 Marcos Truchado 2014-01-03 16:23:38 UTC
Hi

No idea, but I really doubt it because openSUSE 13.1 is quite new. Here is the requested info:

linux-s0ap:/home/trucmar # rpm -qa | grep Mesa
Mesa-libGL-devel-9.2.3-61.9.1.x86_64
libOSMesa9-9.2.3-61.9.1.x86_64
Mesa-32bit-9.2.3-61.9.1.x86_64
Mesa-libGL1-32bit-9.2.3-61.9.1.x86_64
Mesa-libGLESv2-2-9.2.3-61.9.1.x86_64
Mesa-libEGL1-32bit-9.2.3-61.9.1.x86_64
Mesa-9.2.3-61.9.1.x86_64
DirectFB-Mesa-1.6.3-4.1.3.x86_64
Mesa-libglapi0-9.2.3-61.9.1.x86_64
Mesa-libglapi0-32bit-9.2.3-61.9.1.x86_64
Mesa-libEGL1-9.2.3-61.9.1.x86_64
Mesa-libEGL-devel-9.2.3-61.9.1.x86_64
libOSMesa9-32bit-9.2.3-61.9.1.x86_64
Mesa-libGL1-9.2.3-61.9.1.x86_64
Comment 4 Chris Wilson 2014-01-03 17:56:52 UTC
That shouldn't have the bug that I think corresponds with that error-state (the issue is that some client is overwriting the ring buffers with a stray render). But can you please paste the output of glxinfo to be sure?
Comment 5 Chris Wilson 2014-01-03 17:57:16 UTC
And also attaching Xorg.0.log would be useful.
Comment 6 Marcos Truchado 2014-01-03 18:55:04 UTC
Hi

While doing this, I realized that I don't even have the GLX module loaded. I will change that.

Xorg.0.log attached.

Thanks
Comment 7 Marcos Truchado 2014-01-03 18:56:21 UTC
Created attachment 91467 [details]
Xorg log
Comment 8 Marcos Truchado 2014-01-03 19:05:49 UTC
Hi

More information that could be important.

This laptop has 2 VGA adapters:

00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09)
01:00.0 3D controller: NVIDIA Corporation GK107M [GeForce GT 750M] (rev a1)

Yesterday I tried to enable the NVIDIA 750M, but every attempt to make it work was unsuccessful. I did a cleanup and now have the GLX extension loaded on the system again; the output is attached to this bug.

Thanks.
Comment 9 Marcos Truchado 2014-01-03 19:06:47 UTC
Created attachment 91468 [details]
glxinfo
Comment 10 Chris Wilson 2014-01-10 20:03:02 UTC
Do you have a second error state available? (Just want to see if it is identical or can shed more light on the problem)
Comment 11 Marcos Truchado 2014-01-14 22:26:01 UTC
Hi

I have attached a tar.gz with the two error_state files that I found on my machine. I hope this helps.
Comment 12 Marcos Truchado 2014-01-14 22:27:43 UTC
Created attachment 92097 [details]
two i915_error_state files
Comment 13 Chris Wilson 2014-01-15 11:57:58 UTC
Interesting. It looks like there was a CPU access that overwrote the first few megabytes of video RAM after i915.ko was loaded but before X started. The hang occurs because we write some instructions into the ring early, and these are overwritten before we execute the first batch (the actual instructions for the batches are fine). Since we do not appear to overwrite any batches, that leads me to the conclusion that it is not the GPU (following invalid userspace commands) doing the overwriting, but access by the CPU. Ergo this is a nasty BIOS bug, and would most likely be prevented by a working fastboot or a BIOS update.
Comment 14 Chris Wilson 2014-01-27 10:56:09 UTC
More appropriate for Daniel since it is his regression...
Comment 15 Chris Wilson 2014-02-26 08:57:12 UTC
*** Bug 75514 has been marked as a duplicate of this bug. ***
Comment 16 Daniel Vetter 2014-03-03 09:21:42 UTC
Isn't this simply because we've moved the ring buffers into stolen?

Iirc there's other stuff at the start of stolen we're supposed to reserve but atm don't ...
Comment 17 Chris Wilson 2014-03-20 14:21:53 UTC
This will be fixed by

commit d978ef14456a38034f6c0e94a794129501f89200
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri Mar 7 08:57:51 2014 -0800

    drm/i915: Wrap the preallocated BIOS framebuffer and preserve for KMS fbcon v12

for booting, but I think there may still be an issue on some systems on suspend (depending on whether that BIOS fb is preserved).
Comment 18 Daniel Vetter 2014-03-26 22:03:13 UTC
Let's hope ...
Comment 19 Hohahiu 2014-04-05 22:00:40 UTC
I still have this bug.

My kernel version is 3.14.0, xf86-video-intel is 2.99.911. Mesa and libdrm are from git.
Comment 20 Chris Wilson 2014-04-07 07:36:49 UTC
The aforementioned patch was in the 3.15 queue.
