Created attachment 99556 [details] dmesg System Environment: -------------------------- Platform: Broadwell Libdrm: (master)libdrm-2.4.54-9-g8fc62ca8ac010659023bb63c4759eb683de4f9af Mesa: (master)c524f3ef9155caba6cd4f9fc72485426b1da76fd Xserver:(master)xorg-server-1.15.99.902-91-g01e18af17f8dc91451fbd0902049045afd1cea7e Xf86_video_intel:(master)2.99.911-178-g197ab0cda06c44aa1a2b17bf69ac08612811b107 Libva: (staging)35e70cb9b9c77dfb99fb370e319ed501f0c31b17 Libva_intel_driver: (staging)1d1b8da1284f7f918733db79428f09af38d7e14a Kernel: (drm-intel-nightly) 36765340cb068dec1216342bfcdbf2678ec29860 Bug detailed description: ----------------------------- It fails and causes X server shutdown. It happens on Broadwell with -queued and -nightly kernel, works well on -fixes kernel. The latest known good commit: 229b0489aa75a8c51d2f2e124329d3ac326f326d The latest known bad commit: bc76e320f21f8bd790a72bd5dc06909617432352 output: rendercheck 1.4 Render extension version 0.11 Window format: r8g8b8 Found server-supported format: a8 Found server-supported format: a8r8g8b8 Found server-supported format: x8r8g8b8 Found server-supported format: b8g8r8a8 Found server-supported format: b8g8r8x8 Found server-supported format: r8g8b8 Found server-supported format: b8g8r8 Found server-supported format: r5g5b5 Found server-supported format: b5g5r5 Found server-supported format: x1r5g5b5 Found server-supported format: x1b5g5r5 Found server-supported format: r5g6b5 Found server-supported format: b5g6r5 Found server-supported format: x8b8g8r8 Found server-supported format: x2r10g10b10 Found server-supported format: x2b10g10r10 Beginning dest coords test dst coords test error of 255.0000 at (0, 0) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 2) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (1, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 3) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (2, 1) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (3, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 1) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" xinit: connection to X server lost after 550 requests (550 known processed) with 0 events remaining. xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0" [1]+ Done xinit Reproduce steps: ---------------------------- 1. xinit 2. ./rendercheck -o src,over,overreverse,xor -t dcoords
Please bisect the kernel and file another bug for X crashing and include the Xorg.0.log and gdb stacktrace (bt full).
5cc9ed4b9a7ac579362ccebac67f7a4cdb36de06 is the first bad commit commit 5cc9ed4b9a7ac579362ccebac67f7a4cdb36de06 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri May 16 14:22:37 2014 +0100 Commit: Daniel Vetter <daniel.vetter@ffwll.ch> CommitDate: Fri May 16 19:31:29 2014 +0200 drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Created attachment 99571 [details] Xorg.0.log
gdb --pid=`pidof Xorg` required.
Also please update i-g-t to commit 9911f3f0cf202444f1ef2399f5961605880b7360 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu May 22 10:20:33 2014 +0100 igt/gem_userptr_blits: Fix up last minute API changes When the patch was merged, the ioctl numbers had to be adjusted to leave no holes. Also there was a final piece of munging of the API to downgrade unsynced userptr for export over dma-buf. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> and run igt/gem_userptr_blits.
(In reply to comment #5) > Also please update i-g-t to > > commit 9911f3f0cf202444f1ef2399f5961605880b7360 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Thu May 22 10:20:33 2014 +0100 > > igt/gem_userptr_blits: Fix up last minute API changes > > When the patch was merged, the ioctl numbers had to be adjusted to leave > no holes. Also there was a final piece of munging of the API to > downgrade unsynced userptr for export over dma-buf. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > and run igt/gem_userptr_blits. Run this case, system will lose network connection(BTW, this case takes more than 30 minutes). Then run reboot, system is no response. output: IGT-Version: 1.6-gff3c122 (x86_64) (Linux: 3.15.0-rc5_drm-intel-nightly_f5b0cc_20140522+ x86_64) Aperture size is 4096 MiB Total RAM is 3852 MiB Subtest input-checking: SUCCESS Subtest usage-restrictions: SUCCESS Subtest invalid-mapping: SUCCESS Subtest forbidden-operations: SUCCESS Testing unsynchronized mappings... Subtest create-destroy-unsync: SUCCESS Subtest unsync-overlap: SUCCESS Subtest unsync-unmap: SUCCESS Subtest unsync-unmap-cycles: SUCCESS Subtest unsync-unmap-after-close: SUCCESS Using 2x2730 1MiB buffers Verifying initialisation... Cyclic blits cpu->gpu, forward... Cyclic blits gpu->cpu, backward... Random blits... Subtest coherency-unsync: SUCCESS Subtest dmabuf-unsync: SUCCESS Subtest forked-unsync-normal: SUCCESS Subtest forked-unsync-interruptible: SUCCESS Subtest forked-unsync-swapping-normal: SUCCESS Subtest forked-unsync-swapping-interruptible: SUCCESS Subtest forked-unsync-multifd-normal: SUCCESS Subtest forked-unsync-multifd-interruptible: SUCCESS Subtest forked-unsync-swapping-multifd-normal: SUCCESS Subtest forked-unsync-swapping-multifd-interruptible: SUCCESS Subtest forked-unsync-mempressure-normal: SUCCESS Subtest forked-unsync-mempressure-interruptible: SUCCESS Subtest forked-unsync-swapping-mempressure-normal: SUCCESS Subtest forked-unsync-swapping-mempressure-interruptible: SUCCESS Subtest forked-unsync-multifd-mempressure-normal: SUCCESS Subtest forked-unsync-multifd-mempressure-interruptible: SUCCESS Subtest forked-unsync-swapping-multifd-mempressure-normal: SUCCESS
Created attachment 99603 [details] dmesg(gem_userptr_blits)
(gdb) bt #0 0x00007fda77ae0f77 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007fda77ae45e8 in __GI_abort () at abort.c:90 #2 0x00007fda77b1e4fb in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fda77c32240 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:199 #3 0x00007fda77b2a996 in malloc_printerr (ptr=0x2527c30, str=0x7fda77c32370 "double free or corruption (out)", action=3) at malloc.c:4923 #4 _int_free (av=<optimized out>, p=0x2527c20, have_lock=0) at malloc.c:3779 #5 0x0000000000436cfc in DoGetImage (planemask=<optimized out>, height=5, width=5, y=0, x=<optimized out>, drawable=<optimized out>, format=<optimized out>, client=0x252b490) at dispatch.c:2170 #6 ProcGetImage (client=0x252b490) at dispatch.c:2181 #7 0x0000000000439bde in Dispatch () at dispatch.c:432 #8 0x000000000043db0a in dix_main (argc=2, argv=<optimized out>, envp=<optimized out>) at main.c:296 #9 0x00007fda77acbde5 in __libc_start_main (main=0x428250 <main>, argc=2, ubp_av=0x7fff365f6448, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff365f6438) at libc-start.c:260 #10 0x0000000000428281 in _start ()
Any other platforms affected?
(In reply to comment #9) > Any other platforms affected? Only happens on BDW.
It works well with UXA.
(In reply to comment #11) > It works well with UXA.
(In reply to comment #11) > It works well with UXA. How? UXA doesn't even use this code.
Please recompile the ddx with --enable-debug.
Also note that a bisect result that points to the enabling of a feature is not exhaustative. If you compile the ddx with --enable-userptr you can regression check the ddx over the last couple of years. If you reorder the kernel commits, you can regression check the kernel over the introduction of bdw support. Without taking those steps, it is misleading to point to the feature commit as the *root cause* of the regression.
Re-adding regression tag since it's an overall degradation, so need to track it as such.
(In reply to comment #14) > Please recompile the ddx with --enable-debug. add --enable-debug (gdb) bt #0 0x00007f3d325b2f77 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007f3d325b65e8 in __GI_abort () at abort.c:90 #2 0x00007f3d325f04fb in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f3d32704240 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:199 #3 0x00007f3d325fc996 in malloc_printerr (ptr=0xf31570, str=0x7f3d32704370 "double free or corruption (out)", action=3) at malloc.c:4923 #4 _int_free (av=<optimized out>, p=0xf31560, have_lock=0) at malloc.c:3779 #5 0x0000000000434632 in DoGetImage (planemask=<optimized out>, height=5, width=5, y=0, x=0, drawable=<optimized out>, format=<optimized out>, client=<optimized out>) at dispatch.c:2170 #6 ProcGetImage (client=<optimized out>) at dispatch.c:2181 #7 0x0000000000437427 in Dispatch () at dispatch.c:432 #8 0x000000000043b20a in dix_main (argc=2, argv=0x7fffeaabfb88, envp=<optimized out>) at main.c:296 #9 0x00007f3d3259dde5 in __libc_start_main (main=0x4268f0 <main>, argc=2, ubp_av=0x7fffeaabfb88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffeaabfb78) at libc-start.c:260 #10 0x0000000000426921 in _start () output: rendercheck 1.4 Render extension version 0.11 Window format: r8g8b8 Found server-supported format: a8 Found server-supported format: a8r8g8b8 Found server-supported format: x8r8g8b8 Found server-supported format: b8g8r8a8 Found server-supported format: b8g8r8x8 Found server-supported format: r8g8b8 Found server-supported format: b8g8r8 Found server-supported format: r5g5b5 Found server-supported format: b5g5r5 Found server-supported format: x1r5g5b5 Found server-supported format: x1b5g5r5 Found server-supported format: r5g6b5 Found server-supported format: b5g6r5 Found server-supported format: x8b8g8r8 Found server-supported format: x2r10g10b10 Found server-supported format: x2b10g10r10 Beginning dest coords test dst coords test error of 255.0000 at (0, 0) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 2) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (1, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 3) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (2, 1) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (3, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 1) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000
Hmm, so it passes the sanity checks. Next up: Please test with diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c index f104a69..32f579a 100644 --- a/src/sna/sna_accel.c +++ b/src/sna/sna_accel.c @@ -73,7 +73,7 @@ #define USE_ZERO_SPANS 1 /* -1 force CPU, 1 force GPU */ #define USE_CPU_BO 1 #define USE_USERPTR_UPLOADS 1 -#define USE_USERPTR_DOWNLOADS 1 +#define USE_USERPTR_DOWNLOADS 0 #define USE_COW 1 #define UNDO 1
(In reply to comment #18) > Hmm, so it passes the sanity checks. Next up: > > Please test with > > diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c > index f104a69..32f579a 100644 > --- a/src/sna/sna_accel.c > +++ b/src/sna/sna_accel.c > @@ -73,7 +73,7 @@ > #define USE_ZERO_SPANS 1 /* -1 force CPU, 1 force GPU */ > #define USE_CPU_BO 1 > #define USE_USERPTR_UPLOADS 1 > -#define USE_USERPTR_DOWNLOADS 1 > +#define USE_USERPTR_DOWNLOADS 0 > #define USE_COW 1 > #define UNDO 1 Works well with this patch.
Ok, it looks like bdw is just rendering outside of its target. It could just be a bug with the render setup, or the hardware might genuinely foul bytes within the page surrounding the object.
This should force that operation to use the BLT rather than the render pipeline, please test. diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c index a307d9e..7e6f9c8 100644 --- a/src/sna/sna_accel.c +++ b/src/sna/sna_accel.c @@ -16272,6 +16272,10 @@ sna_get_image__blt(PixmapPtr pixmap, dst_bo->pitch = pitch; kgem_bo_mark_unreusable(dst_bo); + kgem_set_mode(&sna->kgem, KGEM_BLT, dst_bo); + kgem_bo_mark_busy(priv->gpu_bo, KEM_BLT); + kgem_bo_mark_busy(dst_bo, KEM_BLT); + ok = sna->render.copy_boxes(sna, GXcopy, pixmap, priv->gpu_bo, 0, 0, pixmap, dst_bo,
Created attachment 100509 [details] [review] Check offset alignment before RENDER with a userptr If my hunch is correct, this should be the right fix.
Which is consistent with BDW being the first to fail, since that has more severe alignment restrictions than gen7 and earlier.
(In reply to comment #22) > Created attachment 100509 [details] [review] [review] > Check offset alignment before RENDER with a userptr > > If my hunch is correct, this should be the right fix. Test this patch, It still exists. output: rendercheck 1.4 Render extension version 0.11 Window format: r8g8b8 Found server-supported format: a8 Found server-supported format: a8r8g8b8 Found server-supported format: x8r8g8b8 Found server-supported format: b8g8r8a8 Found server-supported format: b8g8r8x8 Found server-supported format: r8g8b8 Found server-supported format: b8g8r8 Found server-supported format: r5g5b5 Found server-supported format: b5g5r5 Found server-supported format: x1r5g5b5 Found server-supported format: x1b5g5r5 Found server-supported format: r5g6b5 Found server-supported format: b5g6r5 Found server-supported format: x8b8g8r8 Found server-supported format: x2r10g10b10 Found server-supported format: x2b10g10r10 Beginning dest coords test dst coords test error of 255.0000 at (0, 0) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 2) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (1, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 3) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (2, 1) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (3, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 1) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" xinit: connection to X server lost after 550 requests (550 known processed) with 0 events remaining. xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X s erver ":0" [1]+ Done xinit (wd: /GFX/build/component/Xf86_video_intel/ xf86-video-intel) (wd now: /GFX/Test/Rendercheck/bin)
Created attachment 100511 [details] [review] Check offset alignment before RENDER with a userptr A more complete version of that patch, worth testing as well.
(In reply to comment #25) > Created attachment 100511 [details] [review] [review] > Check offset alignment before RENDER with a userptr > > A more complete version of that patch, worth testing as well. rendercheck 1.4 Render extension version 0.11 Window format: r8g8b8 Found server-supported format: a8 Found server-supported format: a8r8g8b8 Found server-supported format: x8r8g8b8 Found server-supported format: b8g8r8a8 Found server-supported format: b8g8r8x8 Found server-supported format: r8g8b8 Found server-supported format: b8g8r8 Found server-supported format: r5g5b5 Found server-supported format: b5g5r5 Found server-supported format: x1r5g5b5 Found server-supported format: x1b5g5r5 Found server-supported format: r5g6b5 Found server-supported format: b5g6r5 Found server-supported format: x8b8g8r8 Found server-supported format: x2r10g10b10 Found server-supported format: x2b10g10r10 Beginning dest coords test dst coords test error of 255.0000 at (0, 0) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 2) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (1, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 3) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (2, 1) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (3, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 1) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" xinit: connection to X server lost after 550 requests (550 known processed) with 0 events remaining. xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0" [1]+ Done xinit (wd: /GFX/build/component/Xf86_video_intel/xf86-video-intel) (wd now: /GFX/Test/Rendercheck/bin)
Did you test with comment 21? How about if you use the following xorg.conf snippet? Section "Device" Identifier "Intel" Driver "intel" Option "AccelMethod" "blt" EndSection
(In reply to comment #27) > Did you test with comment 21? > > How about if you use the following xorg.conf snippet? > > Section "Device" > Identifier "Intel" > Driver "intel" > Option "AccelMethod" "blt" > EndSection It still has this issue. output: rendercheck 1.4 Render extension version 0.11 Window format: r8g8b8 Found server-supported format: a8 Found server-supported format: a8r8g8b8 Found server-supported format: x8r8g8b8 Found server-supported format: b8g8r8a8 Found server-supported format: b8g8r8x8 Found server-supported format: r8g8b8 Found server-supported format: b8g8r8 Found server-supported format: r5g5b5 Found server-supported format: b5g5r5 Found server-supported format: x1r5g5b5 Found server-supported format: x1b5g5r5 Found server-supported format: r5g6b5 Found server-supported format: b5g6r5 Found server-supported format: x8b8g8r8 Found server-supported format: x2r10g10b10 Found server-supported format: x2b10g10r10 Beginning dest coords test dst coords test error of 255.0000 at (0, 0) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 2) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (1, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 3) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (2, 1) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (3, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 1) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" after 550 requests (550 known processed) with 0 events remaining. xinit: connection to X server lost xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0" [1]+ Done xinit
Whilst I think this is related to the userptr blit, there is a chance that the corruption is due to a render operation whilst the userptr is still bound in the aperture. So can I ask you retest with the latest stack?
(In reply to comment #29) > Whilst I think this is related to the userptr blit, there is a chance that > the corruption is due to a render operation whilst the userptr is still > bound in the aperture. So can I ask you retest with the latest stack? Test on latest -nightly kernel and X.It still exists. output: rendercheck 1.4 Render extension version 0.11 Window format: r8g8b8 Found server-supported format: a8 Found server-supported format: a8r8g8b8 Found server-supported format: x8r8g8b8 Found server-supported format: b8g8r8a8 Found server-supported format: b8g8r8x8 Found server-supported format: r8g8b8 Found server-supported format: b8g8r8 Found server-supported format: r5g5b5 Found server-supported format: b5g5r5 Found server-supported format: x1r5g5b5 Found server-supported format: x1b5g5r5 Found server-supported format: r5g6b5 Found server-supported format: b5g6r5 Found server-supported format: x8b8g8r8 Found server-supported format: x2r10g10b10 Found server-supported format: x2b10g10r10 Beginning dest coords test dst coords test error of 255.0000 at (0, 0) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 2) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (1, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 3) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (2, 1) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (3, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 1) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" after 550 requests (550 known processed) with 0 events remaining. xinit: connection to X server lost xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0" [1]+ Done xinit (wd: /opt) (wd now: /GFX/Test/Rendercheck/bin)
Chris, based on this latest testing result, do you have any more ideas on this bug?
(In reply to comment #31) > Chris, based on this latest testing result, do you have any more ideas on > this bug? It's a stray write from the render ring (I believe) that clobbers the blt used to transfer the results back. But no, I don't have enough to pin down where or why?
Is it possible to use an ITP, setting a watchpoint on the memory location that is being written to, so you can grab the IP and trace it back to the offending function? Or, is it more like the HW is being programmed incorrectly and therefore scribbles past where it should? Damien has an ITP if needed.
(In reply to comment #33) > Or, is it more like the HW is being programmed > incorrectly and therefore scribbles past where it should? I did a run on fulsim/bdw yesterday now that the internal tree has userptr as well. It does not suffer the same stray writes, or issue any new complaints. But we have already experienced that the hardware behaves differently to the simulator...
Meh, I can do diff --git a/src/sna/kgem.c b/src/sna/kgem.c index dfaf878..6a9543d 100644 --- a/src/sna/kgem.c +++ b/src/sna/kgem.c @@ -1063,6 +1063,9 @@ static bool test_has_userptr(struct kgem *kgem) if (kgem->gen == 040) return false; + if (kgem->gem >= 0100) + return false; /* FIXME */ + if (posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE)) return false; to hide the issue until fixed. In fact, done. Please let me know if they are another issues that show up on bdw that may help explain this. commit 3f4da671b0a570de8734cc668d8e8975c2fdbd90 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 25 08:01:57 2014 +0100 sna: Disable userptr for bdw Something fishy is afoot. But let's kill this particularly worrisome regression so that we can do a full round of testing. References: https://bugs.freedesktop.org/show_bug.cgi?id=79053 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Test on commit ea6b0bb8e1b358d728d80daac936ec6521154302, it works well.
Fixed.
It's not fixed yet - just the immediate crash masked.
Jani, the bug is low priority as it no longer affects the next release.
And the [regression] if that makes you feel happier.
(In reply to comment #40) > And the [regression] if that makes you feel happier. I'm indifferent, but my script is a stubborn little bugger.
I can't make progress on this bug without access to bdw.
Mika tried with userptr re-enabled and with using userptr for the dcoords readback and found it worked... So let's re-enable userptr and cross our fingers. commit 9e397830f5ea56f2e43430666c78ffc574058d45 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Sep 11 17:02:37 2014 +0100 sna/gen8: Re-enable userptr Current testing says that it is stable now... Let's see how long that holds. References: https://bugs.freedesktop.org/show_bug.cgi?id=79053 Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Test on commit 9e397830f5ea56f2e43430666c78ffc574058d45, this issue still exists. output: rendercheck 1.4 Render extension version 0.11 Window format: r8g8b8 Found server-supported format: a8 Found server-supported format: a8r8g8b8 Found server-supported format: x8r8g8b8 Found server-supported format: b8g8r8a8 Found server-supported format: b8g8r8x8 Found server-supported format: r8g8b8 Found server-supported format: b8g8r8 Found server-supported format: r5g5b5 Found server-supported format: b5g5r5 Found server-supported format: x1r5g5b5 Found server-supported format: x1b5g5r5 Found server-supported format: r5g6b5 Found server-supported format: b5g6r5 Found server-supported format: x8b8g8r8 Found server-supported format: x2r10g10b10 Found server-supported format: x2b10g10r10 Beginning dest coords test dst coords test error of 255.0000 at (0, 0) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 2) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (1, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 3) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (2, 1) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (3, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 1) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" xinit: connection to X server lost after 550 requests (550 known processed) with 0 events remaining. xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0" [1]+ Done xinit (wd: /GFX/build/component/Xf86_video_intel/xf86-video-intel) (wd now: /GFX/Test/Rendercheck/bin)
Hmm, the code path that originally failed is still disabled. Could you send me a log file of the crash with xf86-video-intel compiled with --enable-debug=full?
Mika, mind if I add this to the list of things for you to investigate? It should be a nice change of pace...
add --enable-debug=full, make fail. make[4]: Leaving directory `/GFX/build/component/Xf86_video_intel/xf86-video-intel/src/sna/fb' make[4]: Entering directory `/GFX/build/component/Xf86_video_intel/xf86-video-intel/src/sna' CC blt.lo CC kgem.lo CC sna_accel.lo sna_accel.c: In function 'sna_poly_zero_line_blt': sna_accel.c:8920:7: warning: variable 'degenerate' set but not used [-Wunused-but-set-variable] bool degenerate = true; ^ CC sna_acpi.lo CC sna_blt.lo CC sna_composite.lo CC sna_cpu.lo CC sna_damage.lo CC sna_display.lo In file included from sna.h:112:0, from sna_display.c:43: sna_display.c: In function 'sna_output_backlight_off': sna_display.c:630:33: error: 'output' undeclared (first use in this function) DBG(("%s(%s)\n", __FUNCTION__, output->name)); ^ fb/fb.h:43:21: note: in definition of macro 'DBG' #define DBG(x) LogF x ^ sna_display.c:630:33: note: each undeclared identifier is reported only once for each function it appears in DBG(("%s(%s)\n", __FUNCTION__, output->name)); ^ fb/fb.h:43:21: note: in definition of macro 'DBG' #define DBG(x) LogF x ^ sna_display.c: In function 'sna_output_backlight_on': sna_display.c:638:33: error: 'output' undeclared (first use in this function) DBG(("%s(%s)\n", __FUNCTION__, output->name)); ^ fb/fb.h:43:21: note: in definition of macro 'DBG' #define DBG(x) LogF x ^ sna_display.c:642:17: warning: declaration of 'output' shadows previous non-variable [-Wshadow] xf86OutputPtr output = sna_output->base; ^ make[4]: *** [sna_display.lo] Error 1 make[4]: Leaving directory `/GFX/build/component/Xf86_video_intel/xf86-video-intel/src/sna' make[3]: *** [all-recursive] Error 1 make[3]: Leaving directory `/GFX/build/component/Xf86_video_intel/xf86-video-intel/src/sna' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/GFX/build/component/Xf86_video_intel/xf86-video-intel/src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/GFX/build/component/Xf86_video_intel/xf86-video-intel' make: *** [all] Error 2
(In reply to comment #47) > add --enable-debug=full, make fail. commit e3edf2948467ad989590a347ffe687780192be16 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Sep 12 07:53:12 2014 +0100 sna: DBG compile fix Dereference the right output for the name in the new backlight functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Created attachment 106166 [details] Xorg.0.log(debug-full)
(In reply to comment #49) > Created attachment 106166 [details] > Xorg.0.log(debug-full) That's odd, the log just stops. Was there anything on the terminal or in the gdm (or equiv) logs?
[ 2856.227] (II) intel(0): SNA initialized with generic backend Hmm, that's broken.
[ 2856.225] (**) intel(0): Option "AccelMethod" "blt" is why.
Hmm, that log makes the failure path look even simpler: a single BLT copy in the batch to write into the userptr. It would be the same code path on every generation (certainly if Option "AccelMethod" "BLT" is used). Something you can do for a quick check: ickle@nuc-i3427:/usr/src/xf86-video-intel$ git diff diff --git a/src/sna/kgem.c b/src/sna/kgem.c index 66adae8..b1cf92a 100644 --- a/src/sna/kgem.c +++ b/src/sna/kgem.c @@ -2581,7 +2581,7 @@ bool __kgem_ring_is_idle(struct kgem *kgem, int ring) return true; } -#if 0 +#if 1 static void kgem_commit__check_reloc(struct kgem *kgem) { struct kgem_request *rq = kgem->next_request; which enables the sanity check on the kernel relocation values.
(In reply to comment #53) > Hmm, that log makes the failure path look even simpler: a single BLT copy in > the batch to write into the userptr. It would be the same code path on every > generation (certainly if Option "AccelMethod" "BLT" is used). > > Something you can do for a quick check: > > ickle@nuc-i3427:/usr/src/xf86-video-intel$ git diff > diff --git a/src/sna/kgem.c b/src/sna/kgem.c > index 66adae8..b1cf92a 100644 > --- a/src/sna/kgem.c > +++ b/src/sna/kgem.c > @@ -2581,7 +2581,7 @@ bool __kgem_ring_is_idle(struct kgem *kgem, int ring) > return true; > } > > -#if 0 > +#if 1 > static void kgem_commit__check_reloc(struct kgem *kgem) > { > struct kgem_request *rq = kgem->next_request; > > > which enables the sanity check on the kernel relocation values. Test this patch, It still fail. XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" after 550 requests (550 known processed) with 0 events remaining. xinit: connection to X server lost xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0" [1]+ Done xinit (wd: /GFX/build/component/Xf86_video_intel/xf86-video-intel) (wd now: /GFX/Test/Rendercheck/bin)
The idea was that is might change how it failed to reveal more information... Hence a fresh debug log is required with the patch (sorry for not being clear).
Created attachment 106169 [details] Xorg.0.log(patch)
Thanks. One last little debugging technique to try. Please apply diff --git a/src/sna/kgem.c b/src/sna/kgem.c index 66adae8..6e679b0 100644 --- a/src/sna/kgem.c +++ b/src/sna/kgem.c @@ -92,7 +92,7 @@ search_snoop_cache(struct kgem *kgem, unsigned int num_pages, unsigned flags); #endif #define SHOW_BATCH_BEFORE 0 -#define SHOW_BATCH_AFTER 0 +#define SHOW_BATCH_AFTER 1 #if 0 #define ASSERT_IDLE(kgem__, handle__) assert(!__kgem_busy(kgem__, handle__)) and attach the resulting logfile.
Created attachment 106170 [details] Xorg.0.log(patch2) (In reply to comment #57) > Thanks. One last little debugging technique to try. Please apply > > > diff --git a/src/sna/kgem.c b/src/sna/kgem.c > index 66adae8..6e679b0 100644 > --- a/src/sna/kgem.c > +++ b/src/sna/kgem.c > @@ -92,7 +92,7 @@ search_snoop_cache(struct kgem *kgem, unsigned int > num_pages, unsigned flags); > #endif > > #define SHOW_BATCH_BEFORE 0 > -#define SHOW_BATCH_AFTER 0 > +#define SHOW_BATCH_AFTER 1 > > #if 0 > #define ASSERT_IDLE(kgem__, handle__) assert(!__kgem_busy(kgem__, handle__)) > > and attach the resulting logfile. output: rendercheck 1.4 Render extension version 0.11 Window format: r8g8b8 Found server-supported format: a8 Found server-supported format: a8r8g8b8 Found server-supported format: x8r8g8b8 Found server-supported format: b8g8r8a8 Found server-supported format: b8g8r8x8 Found server-supported format: r8g8b8 Found server-supported format: b8g8r8 Found server-supported format: r5g5b5 Found server-supported format: b5g5r5 Found server-supported format: x1r5g5b5 Found server-supported format: x1b5g5r5 Found server-supported format: r5g6b5 Found server-supported format: b5g6r5 Found server-supported format: x8b8g8r8 Found server-supported format: x2r10g10b10 Found server-supported format: x2b10g10r10 Beginning dest coords test dst coords test error of 255.0000 at (0, 0) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 2) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (0, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (1, 1) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 3) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (1, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (2, 1) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (2, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (3, 2) -- R G B A got: 1.000 1.000 1.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 0.000 0.000 1.000 dst coords test error of 255.0000 at (3, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 1) -- R G B A got: 1.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 3) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 dst coords test error of 255.0000 at (4, 4) -- R G B A got: 0.000 0.000 0.000 1.000 expected: 1.000 1.000 1.000 1.000 XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" after 550 requests (550 known processed) with 0 events remaining. xinit: connection to X server lost xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0" [1]+ Done xinit
It also happens on BSW.
triangles sporadically has this issue on BSW. Fail rate: 3/5.
Created attachment 107830 [details] [review] sna/kgem: Gen8 needs blt destination aligned with 32
(In reply to Mika Kuoppala from comment #61) > Created attachment 107830 [details] [review] [review] > sna/kgem: Gen8 needs blt destination aligned with 32 Fixed by this patch.
(In reply to Mika Kuoppala from comment #61) > Created attachment 107830 [details] [review] [review] > sna/kgem: Gen8 needs blt destination aligned with 32 Still feel slightly dubious about this - I think I need a more general check. But I'll wait on an investigative igt to see what the bad pattern is. If we can just get away with checking the origin alignment, it seems a simple enough w/a.
Gavin just asked about this, apparently it blocks qa testing a bit as a high-prio item. So can we pull in the patch as a duct-tape while we try to figure out with igts what might be really wrong.
Mika: We might also want to file an HSD about this if it's confirmed with a more minimal test.
Did this patch get merged?
(In reply to Gavin Hindman from comment #66) > Did this patch get merged? No, because it is not clear that it is the right solution. And the danger is if we do merge it we then ignore the hw issue...
Created attachment 109634 [details] [review] [PATCH] sna/kgem: Gen8 blt broken when dest base offset has bit 4 set Updated patch with more strict restriction to dest offset
The current igt to explore this bug is at: http://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=79053 Please test rendercoords with the new patch attached.
(In reply to Mika Kuoppala from comment #69) > The current igt to explore this bug is at: > http://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=79053 > > Please test rendercoords with the new patch attached. Firstly,I have tried to patch the [PATCH] sna/kgem: Gen8 blt broken when dest base offset has bit 4 set, but failed,it isn't a kernel patch ,right? And then with the new igt patch attached,case can run,and shows as follows: IGT-Version: 1.8-gaa63fc7 (x86_64) (Linux: 3.18.0-rc4_drm-intel-nightly_e49ebf_20141117+ x86_64) Aperture size is 4096 MiB Total RAM is 3795 MiB Subtest input-checking: SUCCESS (0.000s) Subtest usage-restrictions: SUCCESS (0.000s) Subtest invalid-null-pointer: SUCCESS (0.000s) Subtest invalid-gtt-mapping: SUCCESS (0.002s) Subtest forked-access: SUCCESS (0.001s) Subtest forbidden-operations: SUCCESS (0.000s) Testing unsynchronized mappings... Subtest create-destroy-unsync: SUCCESS (5.002s) Subtest unsync-overlap: SUCCESS (0.000s) Subtest unsync-unmap: SUCCESS (0.001s) Subtest unsync-unmap-cycles: SUCCESS (1.744s) Subtest unsync-unmap-after-close: SUCCESS (0.002s) Using 2x2730 1MiB buffers Verifying initialisation... Cyclic blits cpu->gpu, forward... Cyclic blits gpu->cpu, backward... Random blits... Subtest coherency-unsync: SUCCESS (445.795s) Subtest dmabuf-unsync: SUCCESS (0.027s) Subtest forked-unsync-normal: SUCCESS (0.402s) Subtest forked-unsync-interruptible: SUCCESS (0.547s) Subtest forked-unsync-swapping-normal: SUCCESS (0.475s) Subtest forked-unsync-swapping-interruptible: SUCCESS (0.351s) Subtest forked-unsync-multifd-normal: SUCCESS (0.366s) Subtest forked-unsync-multifd-interruptible: SUCCESS (0.609s) Subtest forked-unsync-swapping-multifd-normal: SUCCESS (0.412s) Subtest forked-unsync-swapping-multifd-interruptible: SUCCESS (0.282s) Subtest forked-unsync-mempressure-normal: SUCCESS (19.652s) Subtest forked-unsync-mempressure-interruptible: SUCCESS (19.636s) Subtest forked-unsync-swapping-mempressure-normal: SUCCESS (2.384s) Subtest forked-unsync-swapping-mempressure-interruptible: SUCCESS (2.317s) Subtest forked-unsync-multifd-mempressure-normal: SUCCESS (19.518s) Subtest forked-unsync-multifd-mempressure-interruptible: SUCCESS (19.516s) Subtest forked-unsync-swapping-multifd-mempressure-normal: SUCCESS (2.355s) Subtest forked-unsync-swapping-multifd-mempressure-interruptible: SUCCESS (2.211s) Subtest swapping-unsync-normal: SUCCESS (1792.547s) Estimated that we need 4298117120 bytes for the test, but only have 3903848448 bytes available (RAM) Test requirement not met in function minor_evictions, file eviction_common.c:80: Test requirement: intel_check_memory(total_surfaces, surface_size, CHECK_RAM) Subtest minor-unsync-normal: SKIP (0.284s) Estimated that we need 4613746688 bytes for the test, but only have 3903848448 bytes available (RAM) Test requirement not met in function major_evictions, file eviction_common.c:115: Test requirement: intel_check_memory(nr_surfaces, surface_size, CHECK_RAM) Subtest major-unsync-normal: SKIP (0.001s)
Created attachment 109657 [details] [review] Enforce alignment for new buffers Also we should do something like this... Definitely need the buffer tweak, but we might get away with the general bo as it currently stands, but it seems silly to risk it.
(In reply to Mika Kuoppala from comment #68) > Created attachment 109634 [details] [review] [review] > [PATCH] sna/kgem: Gen8 blt broken when dest base offset has bit 4 set > > Updated patch with more strict restriction to dest offset Woah... Any alignment with bit 4 set is broken? ARGH.
(In reply to Li Xu from comment #70) > (In reply to Mika Kuoppala from comment #69) > > The current igt to explore this bug is at: > > http://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=79053 > > > > Please test rendercoords with the new patch attached. > Firstly,I have tried to patch the [PATCH] sna/kgem: Gen8 blt broken when > dest base offset has bit 4 set, but failed,it isn't a kernel patch ,right? > And then with the new igt patch attached,case can run,and shows as follows: The patch is for xf86-video-intel. Please test.
(In reply to Chris Wilson from comment #72) > (In reply to Mika Kuoppala from comment #68) > > Created attachment 109634 [details] [review] [review] [review] > > [PATCH] sna/kgem: Gen8 blt broken when dest base offset has bit 4 set > > > > Updated patch with more strict restriction to dest offset > > Woah... Any alignment with bit 4 set is broken? ARGH. And it applies to source also. Any alignment on either source or destination base offset with bit 4 set will fail. http://cgit.freedesktop.org/~miku/intel-gpu-tools/commit/?h=79053&id=7f15366454a0c5c409fcdd3e6508e960d09fd365
(In reply to Mika Kuoppala from comment #68) > Created attachment 109634 [details] [review] [review] > [PATCH] sna/kgem: Gen8 blt broken when dest base offset has bit 4 set > > Updated patch with more strict restriction to dest offset It passes with the patch.
commit 3a22b6f6d55a5b1e0a1c0a3d597996268ed439ad Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Wed Nov 19 15:10:05 2014 +0200 sna: gen8 BLT broken when address has bit 4 set With bit 4 set in address, the gen8 blitter fails and blits errorneously into the cacheline preceeding the destination and similarly when reading fro the source, corrupting memory. v2: Update the destination base offset pattern as revealed by igt/tests/gem_userptr_blits/destination-bo-align v3: Check base address as well Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79053 Cc: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: xunx.fang@intel.com [v2] Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> commit 8dee52997891108eec8e4df12dd02f3a060d9cb8 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 19 13:38:20 2014 +0000 sna: Add more checks and asserts for BLT capable bo Before we use the BLT for core acceleration, double check that we can. This should catch the case where we attempt to operate on SHM pixmaps which do not meet the restrictions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> commit a90cc3b3889fafbd91c11c42d068a9d6474e218b Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Nov 18 08:37:25 2014 +0000 sna: Tweak alignment constraints on gen8 to allow BLT The previous commits prevent us from using the BLT if the destination address is misaligned. Honour that restriction when creating buffers as well, so that they are always usuable by the BLT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Verified it on latest xf86-video-intel.
Closing (>2 years) old Verified+fixed.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.