Created attachment 36603 [details]
large sample of visible corruption on wallpaper

Bug description:
Getting screen corruption in the form of shifted or transposed small tiles. It seems to happen on certain patterns/colors (refreshing a page in Firefox, for example, shows the same image corruption even after re-decoding the image). I'm _always_ running compiz.

System environment:
-- chipset: [ 46.235] (II) intel(0): Integrated Graphics Chipset: Intel(R) GM45
-- system architecture: 64-bit
-- xf86-video-intel:
[ 45.954] (II) Module intel: vendor="X.Org Foundation"
[ 45.954] compiled for 1.8.1.902, module version = 2.12.0
[ 45.954] Module class: X.Org Video Driver
[ 45.954] ABI class: X.Org Video Driver, version 7.0
-- xserver: X.Org X Server 1.8.1.902 (1.8.2 RC 2)
-- mesa:
OpenGL renderer string: Mesa DRI Mobile Intel® GM45 Express Chipset GEM 20100330 DEVELOPMENT
OpenGL version string: 2.1 Mesa 7.9-devel
-- libdrm: $ pkg-config --modversion libdrm
2.4.21
-- kernel: 2.6.32-23-generic
-- Linux distribution: Ubuntu (10.04) using xorg-edgers ppa; but same bug occurs as shipped for 10.04
-- Machine or mobo model: $ cat /sys/class/dmi/id/product_name
Compaq Presario CQ60 Notebook PC
-- Display connector: internal LVDS

Reproducing steps:
Happens after a period of time (2 hours?) without fail, though I suspect my huge Firefox session helps it along.

Additional info:
Created attachment 36604 [details] bug as seen in browser, returns even after image is fetched and decoded again (clean cache)
Created attachment 36605 [details] as seen in nautilus, window drop shadows were also uniquely corrupted (not typical of other patterns)
Probably A17 issues after paging the image out. Assigning to myself, but unlikely to make progress on it.
The idle thought I had was that perhaps the MOVABLE flag is allowing the system to move A17 pages without our realisation. So do things improve after:

commit 985b823b919273fe1327d56d2196b4f92e5d0fae
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri Jul 2 10:04:42 2010 +1000

    drm/i915: fix hibernation since i915 self-reclaim fixes

    Since commit 4bdadb9785696439c6e2b3efe34aa76df1149c83 ("drm/i915: Selectively enable self-reclaim"), we've been passing GFP_MOVABLE to the i915 page allocator where we weren't before due to some over-eager removal of the page mapping gfp_flags games the code used to play.

    This caused hibernate on Intel hardware to result in a lot of memory corruptions on resume. See for example http://bugzilla.kernel.org/show_bug.cgi?id=13811

    Reported-by: Evengi Golov (in bugzilla)
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Tested-by: M. Vefa Bicakci <bicave@superonline.com>
    Cc: stable@kernel.org
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

included with 2.6.35-rc4.
(In reply to comment #4)
> included with 2.6.35-rc4.

It was at rc4 by the time I tried mainline; if there was any change, it was that the corruption happens more rarely and doesn't "spread" as fast (if at all).
Appears I can't just wish this corruption away. Thanks for checking.
i've been told the memory configuration might have something to do with it, the DCC register in particular, and "bit 11" DCC: 0x000f0002 (dual channel interleaved, XOR randomization: enabled, XOR bit: 11) and the full output of intel_reg_dumper DCC: 0x000f0002 (dual channel interleaved, XOR randomization: enabled, XOR bit: 11) CHDECMISC: 0x00000000 (none, ch2 enh disabled, ch1 enh disabled, ch0 enh disabled, flex disabled, ep not present) C0DRB0: 0x000f0002 (0x0002) C0DRB1: 0x0000000f (0x000f) C0DRB2: 0x00000000 (0x0000) C0DRB3: 0x0c000000 (0x0000) C1DRB0: 0x00000000 (0x0000) C1DRB1: 0x00000000 (0x0000) C1DRB2: 0x00000000 (0x0000) C1DRB3: 0x00000000 (0x0000) C0DRA01: 0x00030c00 (0x0c00) C0DRA23: 0x001c0003 (0x0003) C1DRA01: 0x00000000 (0x0000) C1DRA23: 0x00000000 (0x0000) PGETBL_CTL: 0x00000001 VCLK_DIVISOR_VGA0: 0x00031108 (n = 3, m1 = 17, m2 = 8) VCLK_DIVISOR_VGA1: 0x00031406 (n = 3, m1 = 20, m2 = 6) VCLK_POST_DIV: 0x00020002 (vga0 p1 = 4, p2 = 2, vga1 p1 = 2, p2 = 2) DPLL_TEST: 0x00010001 () CACHE_MODE_0: 0x00006820 D_STATE: 0x00000000 DSPCLK_GATE_D: 0x1004000c (clock gates disabled: VRHUNIT DSSUNIT OVRUNIT OVCUNIT) RENCLK_GATE_D1: 0x00000000 RENCLK_GATE_D2: 0x000002c0 SDVOB: 0x0000001c (disabled, pipe A, stall disabled, detected) SDVOC: 0x00000018 (disabled, pipe A, stall disabled, not detected) SDVOUDI: 0x00000000 DSPARB: 0x00000000 DSPFW1: 0x1a901702 DSPFW2: 0x00001f00 DSPFW3: 0x20000000 ADPA: 0x40008c18 (disabled, pipe B, +hsync, +vsync) LVDS: 0xc2308300 (enabled, pipe B, 18 bit, 1 channel) DVOA: 0x00000000 (disabled, pipe A, no stall, -hsync, -vsync) DVOB: 0x0000001c (disabled, pipe A, no stall, +hsync, +vsync) DVOC: 0x00000018 (disabled, pipe A, no stall, +hsync, +vsync) DVOA_SRCDIM: 0x00000000 DVOB_SRCDIM: 0x00000000 DVOC_SRCDIM: 0x00000000 PP_CONTROL: 0x00000001 (power target: on) PP_STATUS: 0xc0000008 (on, ready, sequencing idle) PP_ON_DELAYS: 0x012c07d0 PP_OFF_DELAYS: 0x012c157c PP_DIVISOR: 0x0033f305 PFIT_CONTROL: 0x00000000 PFIT_PGM_RATIOS: 
0x00000000 PORT_HOTPLUG_EN: 0x28000200 PORT_HOTPLUG_STAT: 0x08000300 DSPACNTR: 0x00000000 (disabled, pipe A) DSPASTRIDE: 0x00000000 (0 bytes) DSPAPOS: 0x00000000 (0, 0) DSPASIZE: 0x00000000 (1, 1) DSPABASE: 0x00000000 DSPASURF: 0x00000000 DSPATILEOFF: 0x00000000 PIPEACONF: 0x00000000 (disabled, inactive) PIPEASRC: 0x00000000 (1, 1) PIPEASTAT: 0x00000000 (status:) PIPEA_GMCH_DATA_M: 0x00000000 PIPEA_GMCH_DATA_N: 0x00000000 PIPEA_DP_LINK_M: 0x00000000 PIPEA_DP_LINK_N: 0x00000000 CURSOR_A_BASE: 0x00000000 CURSOR_A_CONTROL: 0x00000000 CURSOR_A_POSITION: 0x00000000 FPA0: 0x00021509 (n = 2, m1 = 21, m2 = 9) FPA1: 0x00021509 (n = 2, m1 = 21, m2 = 9) DPLL_A: 0x04040004 (disabled, non-dvo, VGA, default clock, DAC/serial mode, p1 = 3, p2 = 10) DPLL_A_MD: 0x00000003 HTOTAL_A: 0x00000000 (1 active, 1 total) HBLANK_A: 0x00000000 (1 start, 1 end) HSYNC_A: 0x00000000 (1 start, 1 end) VTOTAL_A: 0x00000000 (1 active, 1 total) VBLANK_A: 0x00000000 (1 start, 1 end) VSYNC_A: 0x00000000 (1 start, 1 end) BCLRPAT_A: 0x00000000 VSYNCSHIFT_A: 0x00000000 DSPBCNTR: 0xd9000400 (enabled, pipe B) DSPBSTRIDE: 0x00001600 (5632 bytes) DSPBPOS: 0x00000000 (0, 0) DSPBSIZE: 0x00000000 (1, 1) DSPBBASE: 0x00000000 DSPBSURF: 0x0588a000 DSPBTILEOFF: 0x00000000 PIPEBCONF: 0xc0000000 (enabled, active) PIPEBSRC: 0x055502ff (1366, 768) PIPEBSTAT: 0x00440100 (status: LBLC_EVENT_ENABLE SVBLANK_INT_ENABLE DLINE_COMPARE_STATUS) PIPEB_GMCH_DATA_M: 0x00000000 PIPEB_GMCH_DATA_N: 0x00000000 PIPEB_DP_LINK_M: 0x00000000 PIPEB_DP_LINK_N: 0x00000000 CURSOR_B_BASE: 0x00000000 CURSOR_B_CONTROL: 0x10000000 CURSOR_B_POSITION: 0x029a0148 FPB0: 0x00021309 (n = 2, m1 = 19, m2 = 9) FPB1: 0x00021309 (n = 2, m1 = 19, m2 = 9) DPLL_B: 0x98046c00 (enabled, non-dvo, spread spectrum clock, LVDS mode, p1 = 3, p2 = 14) DPLL_B_MD: 0x00000000 HTOTAL_B: 0x05c70555 (1366 active, 1480 total) HBLANK_B: 0x05c70555 (1366 start, 1480 end) HSYNC_B: 0x05a50585 (1414 start, 1446 end) VTOTAL_B: 0x030b02ff (768 active, 780 total) VBLANK_B: 0x030b02ff 
(768 start, 780 end) VSYNC_B: 0x03060301 (770 start, 775 end) BCLRPAT_B: 0x00000000 VSYNCSHIFT_B: 0x00000000 VCLK_DIVISOR_VGA0: 0x00031108 VCLK_DIVISOR_VGA1: 0x00031406 VCLK_POST_DIV: 0x00020002 VGACNTRL: 0x80000000 (disabled) TV_CTL: 0x00000010 TV_DAC: 0x70000000 TV_CSC_Y: 0x00000000 TV_CSC_Y2: 0x00000000 TV_CSC_U: 0x00000000 TV_CSC_U2: 0x00000000 TV_CSC_V: 0x00000000 TV_CSC_V2: 0x00000000 TV_CLR_KNOBS: 0x00000000 TV_CLR_LEVEL: 0x00000000 TV_H_CTL_1: 0x00000000 TV_H_CTL_2: 0x00000000 TV_H_CTL_3: 0x00000000 TV_V_CTL_1: 0x00000000 TV_V_CTL_2: 0x00000000 TV_V_CTL_3: 0x00000000 TV_V_CTL_4: 0x00000000 TV_V_CTL_5: 0x00000000 TV_V_CTL_6: 0x00000000 TV_V_CTL_7: 0x00000000 TV_SC_CTL_1: 0x00000000 TV_SC_CTL_2: 0x00000000 TV_SC_CTL_3: 0x00000000 TV_WIN_POS: 0x00000000 TV_WIN_SIZE: 0x00000000 TV_FILTER_CTL_1: 0x00000000 TV_FILTER_CTL_2: 0x00000000 TV_FILTER_CTL_3: 0x00000000 TV_CC_CONTROL: 0x00000000 TV_CC_DATA: 0x00000000 TV_H_LUMA_0: 0x00000000 TV_H_LUMA_59: 0x00000000 TV_H_CHROMA_0: 0x00000000 TV_H_CHROMA_59: 0x00000000 FBC_CFB_BASE: 0x00000000 FBC_LL_BASE: 0x00000000 FBC_CONTROL: 0xe0000404 FBC_COMMAND: 0x08c80034 FBC_STATUS: 0x00000000 FBC_CONTROL2: 0x00000000 FBC_FENCE_OFF: 0x00000000 FBC_MOD_NUM: 0x000000f2 MI_MODE: 0x00000200 MI_ARB_STATE: 0x00000040 MI_RDRET_STATE: 0x00000000 ECOSKPD: 0x00000307 DP_B: 0x0000001c DPB_AUX_CH_CTL: 0x00050000 DPB_AUX_CH_DATA1: 0x00000000 DPB_AUX_CH_DATA2: 0x00000000 DPB_AUX_CH_DATA3: 0x00000000 DPB_AUX_CH_DATA4: 0x00000000 DPB_AUX_CH_DATA5: 0x00000000 DP_C: 0x00000018 DPC_AUX_CH_CTL: 0x00050000 DPC_AUX_CH_DATA1: 0x00000000 DPC_AUX_CH_DATA2: 0x00000000 DPC_AUX_CH_DATA3: 0x00000000 DPC_AUX_CH_DATA4: 0x00000000 DPC_AUX_CH_DATA5: 0x00000000 DP_D: 0x0000001c DPD_AUX_CH_CTL: 0x00050000 DPD_AUX_CH_DATA1: 0x00000000 DPD_AUX_CH_DATA2: 0x00000000 DPD_AUX_CH_DATA3: 0x00000000 DPD_AUX_CH_DATA4: 0x00000000 DPD_AUX_CH_DATA5: 0x00000000 AUD_CONFIG: 0x00000004 AUD_HDMIW_STATUS: 0x00000000 AUD_CONV_CHCNT: 0x00000000 VIDEO_DIP_CTL: 0x20000600 
AUD_PINW_CNTR: 0x00000140 AUD_CNTL_ST: 0x00002000 AUD_PIN_CAP: 0x00000094 AUD_PINW_CAP: 0x004073bd AUD_PINW_UNSOLRESP: 0x00000000 AUD_OUT_DIG_CNVT: 0x00000001 AUD_OUT_CWCAP: 0x00006211 AUD_GRP_CAP: 0x00000004 FENCE START 0: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 0: 0x00000000 ( 0x00000000 end) FENCE START 1: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 1: 0x00000000 ( 0x00000000 end) FENCE START 2: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 2: 0x00000000 ( 0x00000000 end) FENCE START 3: 0x044920ad ( enabled, X tile walk, 5632 pitch, 0x04492000 start) FENCE END 3: 0x04991000 ( 0x04991000 end) FENCE START 4: 0x0588a0ad ( enabled, X tile walk, 5632 pitch, 0x0588a000 start) FENCE END 4: 0x05d89000 ( 0x05d89000 end) FENCE START 5: 0x0636e01d ( enabled, X tile walk, 1024 pitch, 0x0636e000 start) FENCE END 5: 0x06373000 ( 0x06373000 end) FENCE START 6: 0x0632601d ( enabled, X tile walk, 1024 pitch, 0x06326000 start) FENCE END 6: 0x0632b000 ( 0x0632b000 end) FENCE START 7: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 7: 0x00000000 ( 0x00000000 end) FENCE START 8: 0x0d6e701d ( enabled, X tile walk, 1024 pitch, 0x0d6e7000 start) FENCE END 8: 0x0d6ec000 ( 0x0d6ec000 end) FENCE START 9: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 9: 0x00000000 ( 0x00000000 end) FENCE START 10: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 10: 0x00000000 ( 0x00000000 end) FENCE START 11: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 11: 0x00000000 ( 0x00000000 end) FENCE START 12: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 12: 0x00000000 ( 0x00000000 end) FENCE START 13: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 13: 0x00000000 ( 0x00000000 end) FENCE START 14: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) 
FENCE END 14: 0x00000000 ( 0x00000000 end) FENCE START 15: 0x00000000 (disabled, X tile walk, 128 pitch, 0x00000000 start) FENCE END 15: 0x00000000 ( 0x00000000 end) INST_PM: 0x00000000 SDVO phase shift 0 out of range -- probobly not an issue. pipe A dot 100800 n 2 m1 21 m2 9 p1 3 p2 10 pipe B dot 69047 n 2 m1 19 m2 9 p1 3 p2 14
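For anyone trying to read the FENCE START entries in the dump above, here is a minimal sketch of how the annotations appear to be derived from the raw dword. The bit layout (bit 0 = enabled, bit 1 = Y tile walk, bits 2..11 = pitch in 128-byte units minus one, upper bits = page-aligned start) is an assumption inferred from the values in this dump, not something taken from the hardware docs, and decode_fence_start is a hypothetical helper name.

```python
def decode_fence_start(val):
    """Decode a FENCE START dword the way the dump above annotates it.

    The field layout is an assumption inferred from the dump itself.
    """
    return {
        "enabled": bool(val & 0x1),                 # bit 0: fence in use
        "tiling": "Y" if val & 0x2 else "X",        # bit 1: tile walk
        "pitch": (((val >> 2) & 0x3FF) + 1) * 128,  # bits 2..11: pitch in 128-byte units, minus one
        "start": val & 0xFFFFF000,                  # page-aligned start address
    }


# Fence 4 from the dump: its start matches DSPBSURF (0x0588a000) and its
# pitch matches DSPBSTRIDE (5632 bytes), so it covers the scanout buffer.
fence4 = decode_fence_start(0x0588A0AD)
print(fence4["enabled"], fence4["tiling"], fence4["pitch"], hex(fence4["start"]))
```

Under this reading, fence 4 is the fence over the X-tiled framebuffer on pipe B.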
I can easily reproduce this :(. Thinkpad X301, gentoo, kernel 2.6.36, xorg-server 1.9.1, mesa 7.8.2, xf86-video-intel-2.13.0, libdrm 2.4.22. Can I help somehow?
Sorry, in my case this happens after hibernate. Perhaps this is another bug?
yes, the hibernate version is probably a different bug, but might be related
If my analysis of the visual corruptions is correct, this should be fixed by my swizzling branch, specifically http://cgit.freedesktop.org/~danvet/drm/commit/?h=swizzling&id=b337d071edfc0a161667fb5eb67188f6d1e07428
Ok, my analysis is wrong. Looking further at the corruptions on the wallpaper, it looks like bit11/10/9 get swizzled into bit6 for an entire X-tiled page. If the DCC from the intel_reg_dump comes from the same configuration as the screenshot, then that would mean that some of the pages get swizzled (with the detected bit11/10/9 configuration) and some don't. Once the pages go through swap once and get reallocated, all hell breaks loose.
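To make the "bit11/10/9 swizzled into bit6" description concrete, here is a small sketch. It assumes the swizzle is a plain XOR of address bits 9, 10 and 11 into bit 6, and it models a mismatch as data written without the swizzle and read back with it. The function names are made up for illustration; this is a toy model, not driver code.

```python
def bit6_swizzle_x(addr):
    """XOR address bits 9, 10 and 11 into bit 6 (assumed X-tiling swizzle)."""
    flip = ((addr >> 9) ^ (addr >> 10) ^ (addr >> 11)) & 1
    return addr ^ (flip << 6)


def read_with_mismatch(page):
    """Read a page through the swizzle although it was written without it."""
    return bytes(page[bit6_swizzle_x(off)] for off in range(len(page)))


# Fill a 4 KiB page so that every 64-byte chunk holds its own chunk index.
page = bytes((off >> 6) & 0xFF for off in range(4096))
garbled = read_with_mismatch(page)

# Chunks whose bits 9..11 have odd parity trade places with their neighbour.
print(garbled[0x000], garbled[0x200], garbled[0x240])
```

Because bits 9 to 11 are not changed by the XOR, applying the swizzle twice gives the original address back. A mismatch therefore only swaps neighbouring 64-byte chunks within a page, which would fit the "shifted or transposed small tiles" in the screenshots.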
I have a GM45 integrated GPU (Lenovo R500 laptop) and suffer from the same bug. The issue seems correlated with page faults, whether those faults come from memory pressure during normal usage or due to all the shenanigans that suspend and hibernate put on the MM subsystem. Running v1.1 of intel-gpu-tools's make test shows failure of the pread tests designed to find the swizzling bugs (didn't save the log the first time, and I'm in X at the moment so can't rerun). I can attach that log as well as whatever else should someone deem them useful, otherwise I'll keep the noise level down. Regardless, please ask if there are things you wish me to try. After skimming google on the topic of the GM45 and swizzle bugs it sounds like the core issue is the mixed (L-shaped) memory layout? In which case perhaps the swizzle memory region boundary can be detected as per the i-g-t tests? Finally, thanks for all your good work on the i915 driver, despite the hiccups it's still nice to finally have a more modern graphics stack under Linux.
Well, if you can easily reproduce it with the i-g-t test, we might get a handle on this. Please attach the following things:
- anything the test spits out.
- detailed description of your ram dimms (i.e. which slot, total sizes, ranks, anything else you can grab from the bios). Alternatively you can also read out the dimm details using the i2c tool decode-dimm.
- an i915 register dump from i-g-t/tools/intel_reg_dumper.
- full dmesg.

Thanks, Daniel
Created attachment 55310 [details] RAM layout, R500 Lenovo laptop w/GM45 This is actually the full DMI decode for the laptop BIOS, but the RAM section is located almost at the very top. Two slots filled, 3072MB memory.
Created attachment 55311 [details] Output from `make test` minus gem_partial_pwrite_pread test the gem_partial_pwrite_pread test spins at 100% CPU for over 11 minutes, repeatedly (twice, anyway). Killed it and removed the test from tests/Makefile to get the other tests running. If I was merely being impatient, I can rerun it, but looking at the source for that test I'm at a loss as to why it was spinning so long.
Created attachment 55312 [details] Dmesg from the R500, note the warn/oops at the end After the kernel whinged I saved things off, tried to bring X up again in hopes of it being merely a warning, hit a hard lock. Rebooted.
Created attachment 55313 [details] Register dump from within X output of the i-g-t/tools/intel_reg_dumper. Run from within X, but can rerun from outside X if that's important.
On Sun, Jan 08, 2012 at 07:55:42PM +0000, bugzilla-daemon@freedesktop.org wrote: > https://bugs.freedesktop.org/show_bug.cgi?id=28813 > > --- Comment #17 from Ray Lee <ray-bugzilla@madrabbit.org> 2012-01-08 11:55:42 PST --- > Created attachment 55312 [details] > --> https://bugs.freedesktop.org/attachment.cgi?id=55312 > Dmesg from the R500, note the warn/oops at the end > > After the kernel whinged I saved things off, tried to bring X up again in hopes > of it being merely a warning, hit a hard lock. Rebooted. The oops is a known bug in one of the debugfs files. Patch is on track to get merged.
The intel-gpu-tools package also has tools/intel_reg_dumper to read out arbitrary registers. Can you read out the following ones: 0x10200, 0x10204, 0x100e0, 0x11234, 0x11334

Also, do you by chance have another 1G dimm so that you could create a symmetric 2x 1G memory configuration? If that's possible, please check with the i-g-t tests whether that works (it should) and grab the same set of registers.
I suspect you wanted intel_reg_read? If so:

$ for i in 0x10200 0x10204 0x100e0 0x11234 0x11334; do sudo ./intel_reg_read $i; done
0x10200 : 0xF0002
0x10204 : 0x10
0x100E0 : 0x0
0x11234 : 0x910C1800
0x11334 : 0x910C1800

Sadly, I have no matching memory for this machine. Checking online it looks like I could use this as an excuse to cheaply upgrade to 2 x 4G. OTOH I'm running a 32-bit kernel at the moment, and I've no idea how that layout would interact with the kernel and driver, so I'll let you guide me on that one. (I suppose I could try to reproduce this with a 64-bit kernel, to rule that out.)

Anyway, let me know if the above isn't what you were looking for, or if you think you could serve as my excuse for an upgrade :-)
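The 0x10200 value is the DCC register, and the same 0xF0002 shows up in the earlier intel_reg_dumper output with a decoded annotation. Below is a sketch of how that annotation can be reproduced. The bit positions (bits 0..1 = addressing mode, bit 10 = XOR disable, bit 9 = use bit 17 instead of bit 11) are quoted from memory of the i915 register definitions, so treat them as an assumption; decode_dcc is a hypothetical helper.

```python
ADDRESSING_MODES = {
    0: "single channel",
    1: "dual channel asymmetric",
    2: "dual channel interleaved",
}


def decode_dcc(val):
    """Render a mobile DCC value like the register dump does (assumed layout)."""
    mode = ADDRESSING_MODES.get(val & 0x3, "unknown channel layout")
    xor_state = "disabled" if val & (1 << 10) else "enabled"
    xor_bit = 17 if val & (1 << 9) else 11
    return f"{mode}, XOR randomization: {xor_state}, XOR bit: {xor_bit}"


print(decode_dcc(0xF0002))
```

For 0xF0002 this gives the same text as the dump: dual channel interleaved, XOR randomization enabled, XOR bit 11.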
Totally forgotten: Ray, can you please also add a screenshot of typical corruptions? Just to check that the pattern of the corruption is what I expect it to be ... (i.e. some larger picture where it's easy to guess what it should really look like is ideal).
A few other things to test: Can you boot with mem=2G and check with the i-g-t tests that things work as expected?

If you're adventurous and have decent backups, can you try

intel_reg_write 0x10204 0x100010

This will set bit20 in ddc2, which I hope controls the xor decoding of the gpu. If we're unlucky it also controls xor decoding for the cpu, which will result in random-looking corruptions in the top 1G of memory. Which will surely eat any data that's there :( Still, from an in-depth reading of the mch docs that bit is about my only hope, so if you dare to test this (maybe on a new install on an otherwise empty hd), it would be interesting to see what happens.

Maybe attach the screenshot with corruptions and do the mem=2G test first ;-)
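For reference, the value in that intel_reg_write is just the 0x10 previously read from 0x10204 with bit 20 set on top. A trivial sketch of the arithmetic (the constant and function names are made up, and what bit 20 actually controls is only the hope expressed above):

```python
DCC2_BIT20 = 1 << 20  # the bit discussed above; its real effect is unconfirmed


def with_bit20(value):
    """Return the register value with bit 20 set and all other bits kept."""
    return value | DCC2_BIT20


current = 0x10                  # value reported for register 0x10204
proposed = with_bit20(current)  # value to pass to intel_reg_write
print(hex(proposed))
```

This prints 0x100010, so the write leaves the existing low bits untouched.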
Created attachment 55678 [details] Corruption of wallpaper Never a failure around when you need one. This took a while to occur, despite trying to encourage the issue with memory pressure and suspend cycles. Corruption in this case was limited to wallpaper, but on other occasions window borders or fonts would also exhibit corruption in the same way.
> --- Comment #24 from Ray Lee <ray-bugzilla@madrabbit.org> 2012-01-17 07:17:13 PST ---
> Created attachment 55678 [details]
> --> https://bugs.freedesktop.org/attachment.cgi?id=55678
> Corruption of wallpaper

Thanks for the picture, it's definitely a swizzling mismatch failure, exactly like in the screenshot already attached to the bug.
Apologies for the delay, but due to travel constraints I'm going to need a bit more time. I think I can run through the tests you asked, however, by booting from a USB device instead. I'll leave my root partition unmounted which should keep things happy. I am definitely running a backup first, though :-)
On Mon, Jan 30, 2012 at 18:38, <bugzilla-daemon@freedesktop.org> wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=28813
>
> --- Comment #26 from Ray Lee <ray-bugzilla@madrabbit.org> 2012-01-30 09:38:01 PST ---
> Apologies for the delay, but due to travel constraints I'm going to need a bit
> more time. I think I can run through the tests you asked, however, by booting
> from a USB device instead. I'll leave my root partition unmounted which should
> keep things happy. I am definitely running a backup first, though :-)

Someone else already tried it on an i965gm with the same issues and it doesn't work - hw doesn't allow this bit to be flipped after initial setup :(

I'll look into other options, but that might take a while ...
*** Bug 31704 has been marked as a duplicate of this bug. ***
Still reproducible with kernel 3.2.11 mesa 8.0.2 xf86-video-intel 2.17 xorg-server 1.11.4 libdrm 2.4.33 Gentoo ~x86, Thinkpad X301. Please, help!
> --- Comment #29 from Serge Gavrilov <serge@pdmi.ras.ru> 2012-04-01 > Please, help! Well, I now have a gm45 machine myself and I can reproduce this issue with the intel-gpu-tools test. Bad news is that I still have no idea how we could fix this :(
Seems to be related to https://bugzilla.kernel.org/show_bug.cgi?id=37142

They claim there that the tuxonice kernel does not suffer from corruption after hibernate. Trying to check this...
On Fri, May 25, 2012 at 12:02 PM, <bugzilla-daemon@freedesktop.org> wrote:
> --- Comment #31 from Serge Gavrilov <serge@pdmi.ras.ru> 2012-05-25 03:02:49 PDT ---
> Seems to be related with https://bugzilla.kernel.org/show_bug.cgi?id=37142
>
> They claim there that tuxonice kernel does not suffer from corruption after
> hibernate. Trying to check this...

Imo red herring. If you swap out and then swap in at a different place, things can get corrupted. Maybe tuxonice tries harder to swap stuff back in at the same place, but that in no way fixes the underlying bug.
Is there a way to help track and fix this bug ? I have been having it for years, on both Ubuntu and Arch Linux, with kernels from 2.6.32 to 3.6.9 at least and numerous X.org and xorg-video-intel versions. Test system is a Dell Latitude E4200 with GM45.
Created attachment 71911 [details] [review] Hack to keep tiled pages pinned This should prevent tiled buffers from being paged out whilst they are referenced by userspace.
Created attachment 71924 [details] [review] Hack to keep tiled pages pinned
Created attachment 71948 [details] kernel bug screen Hm, do I need some other patch/some updated driver ? I applied this patch to linux-3.7.1 (using Arch's testing linux-3.7.1-2 https://www.archlinux.org/packages/testing/x86_64/linux/) and the kernel died when X started (see attached picture for the (blurry) kernel backtrace). Test setup was running xorg-video-intel 2.20.16 and xorg-server 1.13.1
Created attachment 72183 [details] [review] Hack to keep tiled pages pinned Sorry about that bad patch, hopefully this one is less buggy.
(In reply to comment #37) > Created attachment 72183 [details] [review] [review] > Hack to keep tiled pages pinned > > Sorry about that bad patch, hopefully this one is less buggy. Guillame, please retest with this patch.
(In reply to comment #37) > Created attachment 72183 [details] [review] [review] > Hack to keep tiled pages pinned > > Sorry about that bad patch, hopefully this one is less buggy. I have tested the latest patch applied against kernel 3.7.4 (Gentoo, x86, Lenovo X301). Kernel seems to be stable, and I cannot reproduce the problem. Thank you very much for squashing this very old and ugly bug! Is it possible to include the patch into mainstream kernel? Again, thanks a lot.
That's good to know. The last remaining detail is a way to detect the L shape memory configurations. There should be enough info in the CSPEC that the memory has different swizzle regions (even if not enough info to work out the swizzle) - as we really only want to enable this page pinning as a means of last resort.
Found a potential lead in the 4 series chipset datasheet:

5.2.1 CHDECMISC - Channel Decode Miscellaneous
B/D/F/Type: 0/0/0/MCHBAR
Address Offset: 111h
Default Value: 00h
Access: R/W/L, R/W
Size: 8 bits

Bit 4, written by bios then locked:
L-Shaped GFX Tile Cycle (LGFXTLCYC): This bit forces graphics tiled cycles in L-shaped memory configuration to modify bit 6 of the address. This field should be set to 1 only when L-mode memory configuration is enabled and should be set to 0 for all other memory configurations. This bit is locked by ME stolen Memory lock.
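Checking that bit on a raw register value is a one-liner. A sketch, assuming the 8-bit CHDECMISC layout quoted from the datasheet above; the constant and function names are made up for illustration:

```python
LGFXTLCYC = 1 << 4  # "L-Shaped GFX Tile Cycle", bit 4 per the datasheet excerpt


def l_shaped_tile_cycle_enabled(chdecmisc):
    """True if bit 4 of the 8-bit CHDECMISC register value is set."""
    return bool(chdecmisc & 0xFF & LGFXTLCYC)


# CHDECMISC shows up as 0x00000000 in the register dump earlier in this bug.
print(l_shaped_tile_cycle_enabled(0x00))
print(l_shaped_tile_cycle_enabled(0x10))
```

With the 0x00 from the dump, the bit reads as not set, so on that machine this register alone would not flag an L-shaped configuration.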
Created attachment 81967 [details] [review] Print out more DRAM info Can you please apply this patch and paste the output of /sys/kernel/debug/dri/0/i915_swizzle_info? Thanks.
Hmm, the mobile chipsets are different. DCC and the channel descriptors are at different locations.
Created attachment 81968 [details] [review] Print out more DRAM info (mobiles), step 2
Created attachment 81969 [details] [review] A lead?
Created attachment 81970 [details] [review] Hack, redux
Does anyone still care about this? We kinda can't fix tiled swapping on these boxes, and ickle's hack breaks a few assumptions here and there. So if we still have unhappy users out there, please speak up.
Yes, I do care. Unfortunately, I did not read your messages from July. I will try to patch the kernel and print the necessary output in the evening. Which kernel do I need to patch?
bit6 swizzle for X-tiling = bit9/bit10/bit11
bit6 swizzle for Y-tiling = bit9/bit11
C0DRB0 = 0x0020
C0DRB1 = 0x0040
C0DRB2 = 0x0040
C0DRB3 = 0x0040
C1DRB0 = 0x0000
C1DRB1 = 0x0000
C1DRB2 = 0x0000
C1DRB3 = 0x0000
C1DRB3 = 0x0000
(In reply to comment #48)
> Yes, I am care. Unfortunately, I did not read your messages from July. I
> will try to patch the kernel and print the necessary output in the evening.
> What kernel I need to patch?

If Chris' hunch for the bit is right then we need to know the values of registers 0x10111 and 0x10200 (Chris' patch had a bug for dumping those regs):

# intel_reg_read 0x10111
# intel_reg_read 0x10200

Then I can confirm with my own gm45 (which is also affected iirc) and update the patch.
What should I do? :)
Execute the listed commands as root. You need to install the intel-gpu-tools package to get them.
0x10111 : 0x0 0x10200 : 0xF0002
I have the same: 0x10111 : 0x0 0x10200 : 0xF0002 Are these values helpful?
Could you be so kind to provide the patch for a newer kernel?
Ok, I've finally gotten around to polishing Chris' patch and updating the testcase: http://patchwork.freedesktop.org/patch/37073/

As soon as I have a few tested-by reports I'll pull this in, so please go wild. Patch applies on top of latest drm-intel-nightly.
Workaround is now merged into drm-intel-nightly, should land in 3.19

commit 14a369b6c9bdb40cebdac5a248321a05119fe02b
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Nov 20 09:26:30 2014 +0100

    drm/i915: Pin tiled objects for L-shaped configs

Note that this is v2, v1 was a bit WARNING-happy.
*** Bug 86898 has been marked as a duplicate of this bug. ***
*** Bug 87065 has been marked as a duplicate of this bug. ***
*** Bug 88924 has been marked as a duplicate of this bug. ***
Created attachment 113489 [details] screenshots of possibly related bug Possibly related?
Created attachment 113490 [details] another screenshot of a possibly related bug Possibly related bug?
*** Bug 90725 has been marked as a duplicate of this bug. ***
Actually I still have the issue with kernel 4.0.2 (see Bug 90725, marked as duplicate).

My device is as follows, which is apparently Gen3:
- 00:02.0 VGA compatible controller: Intel Corporation 82946GZ/GL Integrated Graphics Controller (rev 02)
- 00:02.0 0300: 8086:2972 (rev 02)

Since your patch is designed for Gen4, could that explain why I'm still facing the issue? Should I reopen this one, or reopen 90725?
# cat i915_swizzle_info
bit6 swizzle for X-tiling = none
bit6 swizzle for Y-tiling = none
DDC = 0x00200010
DDC2 = 0x00200020
C0DRB3 = 0x0020
C1DRB3 = 0x0010
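If anyone wants to compare these dumps across machines, here is a rough sketch that pulls the hex fields out of i915_swizzle_info text and checks whether the two channel boundary registers agree. The parsing is based only on the output format shown in this bug; whether a C0DRB3/C1DRB3 mismatch is what the driver actually keys on is an assumption, and both function names are hypothetical.

```python
import re


def parse_swizzle_info(text):
    """Collect every 'NAME = 0x...' pair from i915_swizzle_info output."""
    regs = {}
    for name, value in re.findall(r"(\w+) = (0x[0-9a-fA-F]+)", text):
        regs[name] = int(value, 16)
    return regs


def channels_match(regs):
    """True if both channels report the same C?DRB3 value (assumed meaning: symmetric)."""
    return regs["C0DRB3"] == regs["C1DRB3"]


sample = (
    "bit6 swizzle for X-tiling = none bit6 swizzle for Y-tiling = none "
    "DDC = 0x00200010 DDC2 = 0x00200020 C0DRB3 = 0x0020 C1DRB3 = 0x0010"
)
regs = parse_swizzle_info(sample)
print(sorted(regs), channels_match(regs))
```

On the output above the two values differ (0x20 versus 0x10), which is at least consistent with the "none" swizzle that the file reports.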
(In reply to Fab Stz from comment #64) > Since your patch is designed for Gen4 could that explain that I'm still > facing the issue ? No, only that I guessed incorrectly you had gen4 given the tiling artifacts. > Should I reopen this one, or reopen 90725 ? Hmm, reopen bug 90725 and this time add your Xorg.0.log! Can you also please use xf86-video-intel.git to rule out one gen3 swizzling bug in the process.