Summary: [i945 page flipping] GPU hang on 2.6.34-45 32-bit PAE kernel with GL compositor
Product: xorg
Component: Driver/intel
Version: unspecified
Hardware: x86 (IA32)
OS: Linux (All)
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Reporter: Simon Farnsworth <simon.farnsworth>
Assignee: Jesse Barnes <jbarnes>
QA Contact: Xorg Project Team <xorg-team>
CC: cfeck
Description
Simon Farnsworth
2010-06-28 02:34:31 UTC
Created attachment 36565
dmesg from the failed unit.
Created attachment 36566
Xorg.0.log from the failed machine
You need to check /sys/kernel/debug/dri/0/i915_error_state as well. My guess is that you've hit one of the i945 page-flipping bugs still lurking in 2.6.34. Try:
https://bugs.freedesktop.org/attachment.cgi?id=36463
https://bugs.freedesktop.org/attachment.cgi?id=36464

Created attachment 36567
intel_error_dump output
Looks like if you attach a large attachment as part of the original submission, Bugzilla silently drops it. Reattaching a gzipped version of the intel_error_dump output.

And I missed Chris's comments in flight; I'll try both those patches together and report back.

The batch buffer dump doesn't correspond to page-flip waits. The only striking error in the dump (which consists of just two ops...) is the DRAWING_RECT off-by-one, so I would update Mesa first.

Created attachment 36575
intel_error_dump output
I've added Chris's recommended kernel patches to the kernel, and updated Mesa to ce7a70b8b48a4dded9b1e29590b5101dacd56e0b. I'm still seeing a GPU hang in dmesg; attaching the intel_error_dump output again.
Created attachment 36576
Xorg.0.log from the failed machine
And new xserver log from the same failure.
Right, that is just a single copy from a 1200x1920 buffer to a 1920x1200 one. There's no obvious reason for failure, and it is waiting for the GPU to finish executing those 2 triangles. This is an instance where it would be useful to check the vertex data...

Not sure what you mean by "check the vertex data": is there something I can do to a hung process to dig it out? (I've got debug symbols, and know how to drive GDB if it's something I can dig out of Mesa's data structures.)

The compositor is aiming to rotate the screen by 90° during the compositing process. The background-image drawing part is what's hanging, and that appears to be slightly buggy (it scales the background image rather than rotating it). Roughly outlined, the GL code that's hanging does:

    /* during initialisation; display, screen, texture, buffers[2] and
       render_background are set up elsewhere */
    Atom actual_type; int actual_format;
    unsigned long nitems, bytes_after; unsigned char *prop = NULL;
    Pixmap background = None;

    if( XGetWindowProperty( display, RootWindow( display, screen ),
                            XInternAtom( display, "_XROOTPMAP_ID", False ),
                            0, 4, False, AnyPropertyType,
                            &actual_type, &actual_format,
                            &nitems, &bytes_after, &prop ) == Success
        && actual_type == XInternAtom( display, "PIXMAP", False )
        && actual_format == 32 && nitems == 1 ) {
        /* Xlib returns format-32 data as an array of long, so read a
           Pixmap rather than memcpy()ing 4 bytes (only right on 32-bit) */
        background = *(Pixmap *)prop;
    }
    if( prop )
        XFree( prop );

    XImage *background_image = NULL;
    if( background != None ) {
        background_image = XGetImage( display, background, 0, 0, 1200, 1920,
                                      AllPlanes, ZPixmap );
    }
    if( background == None || background_image == NULL ) {
        render_background = false;
        return;
    }

    glBindTexture( GL_TEXTURE_2D, texture );
    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGB, 1200, 1920, 0,
                  GL_BGRA, GL_UNSIGNED_BYTE, background_image->data );
    XDestroyImage( background_image ); /* GL now holds its own copy */
    glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
    glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST );

    const GLfloat vertexes[]  = { 0.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f };
    const GLfloat texcoords[] = { 0.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f };
    glGenBuffersARB( 2, buffers );
    glBindBufferARB( GL_ARRAY_BUFFER, buffers[0] );
    glBufferDataARB( GL_ARRAY_BUFFER, sizeof( GLfloat ) * 8, vertexes, GL_STATIC_DRAW );
    glBindBufferARB( GL_ARRAY_BUFFER, buffers[1] );
    glBufferDataARB( GL_ARRAY_BUFFER, sizeof( GLfloat ) * 8, texcoords, GL_STATIC_DRAW );

    /* at time of hang (GL_TEXTURE_2D and the vertex/texcoord client
       arrays are assumed to be enabled elsewhere) */
    glColor4f( 1.0, 1.0, 1.0, 1.0 );
    glBindTexture( GL_TEXTURE_2D, texture );
    glTexEnvi( GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE );
    glBindBufferARB( GL_ARRAY_BUFFER, buffers[0] );
    glVertexPointer( 2, GL_FLOAT, sizeof(GLfloat) * 2, 0 );
    glBindBufferARB( GL_ARRAY_BUFFER, buffers[1] );
    glTexCoordPointer( 2, GL_FLOAT, sizeof(GLfloat) * 2, 0 );
    glDrawArrays( GL_TRIANGLE_STRIP, 0, 4 );

My bug is that it's not rotating the vertexes array to handle the rotated screen, but I don't think this should cause a GPU hang.

I just power-cycled the unit under test (on a hunch), and I'm getting a different failure state. Instead of a nice, clean GPU hang, rendering is stalled waiting for a reply to DRI2GetBuffersWithFormat, and my frame hasn't finished rendering on screen. This happens on the first frame I try to render; I'm not sure what state I can dump that will help with debugging. If I restart X11, I get the same hang as documented already.

I've also not been able to work out which magic INTEL_DEBUG option would cause vertex data to get dumped. It appears that i945 uses the generic Mesa software TnL pipeline, and doesn't provide an option to dump the final transformed vertex data.

intel_gpu_dump tells me that ACTHD is stuck at 0x398, which is an MI_NOOP in the ringbuffer. cat /proc/interrupts shows that I'm still getting interrupts, as does /sys/kernel/debug/dri/0/i915_gem_interrupt.
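The missing rotation mentioned above can be handled on the texture-coordinate side. The following is a minimal sketch, assuming the layout used in the outline (four (s, t) pairs for a GL_TRIANGLE_STRIP) and a rotation about the texture centre (0.5, 0.5). The helper name rotate_texcoords_90 is hypothetical, and which of the two senses matches a given panel orientation has to be checked on screen. It addresses only the application bug, not the hang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rotate n (s, t) pairs by 90 degrees about the
 * texture centre (0.5, 0.5).
 *   reverse == 0: (s, t) -> (1 - t, s)   (points turn counter-clockwise)
 *   reverse != 0: (s, t) -> (t, 1 - s)   (the inverse rotation)
 * Both components are read before writing, so in == out is allowed. */
static void rotate_texcoords_90(const float *in, float *out, size_t n,
                                int reverse)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        float s = in[2 * i];
        float t = in[2 * i + 1];
        if (reverse) {
            out[2 * i] = t;
            out[2 * i + 1] = 1.0f - s;
        } else {
            out[2 * i] = 1.0f - t;
            out[2 * i + 1] = s;
        }
    }
}
```

Applied to the outline's texcoords { 0,1, 0,0, 1,1, 1,0 } with reverse == 0, this yields { 0,0, 1,0, 0,1, 1,1 }. The result would be uploaded into buffers[1] in place of the unrotated array, leaving the vertex positions alone.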
Oddly, neither the current sequence nor the IRQ sequence in i915_gem_interrupt has changed overnight:

    Interrupt enable:    00028c53
    Interrupt identity:  00000000
    Interrupt mask:      fffd73ae
    Pipe A stat:         00020200
    Pipe B stat:         00000000
    Interrupts received: 3352510
    Current sequence:    26
    Waiter sequence:     0
    IRQ sequence:        0

I'm therefore not sure whether the hang is purely CPU-side, or a mix of CPU-side and GPU-side. My understanding is that every so often, the GPU is supposed to execute an MI_STORE_DATA_INDEX that updates the CPU's idea of where the GPU is in the command stream, then an MI_USER_INTERRUPT to get the CPU to check.

The intel_gpu_dump header looks sane:

    ACTHD: 0x00000398
    EIR: 0x00000000
    EMR: 0xffffffed
    ESR: 0x00000000
    PGTBL_ER: 0x00000000
    IPEHR: 0x01000000
    IPEIR: 0x00000000
    INSTDONE: 0x7fffffc0
    Ringbuffer:
    Reminder: head pointer is GPU read, tail pointer is CPU write
    ringbuffer at 0x00000000:

The ringbuffer has a fairly regular pattern at the moment, in the part preceding the HEAD pointer:

    0x00000000: 0x02000000: MI_FLUSH
    0x00000004: 0x00000000: MI_NOOP
    0x00000008: 0x18800080: MI_BATCH_BUFFER_START
    0x0000000c: 0x010ac001:    dword 1
    0x00000010: 0x02000004: MI_FLUSH
    0x00000014: 0x00000000: MI_NOOP
    0x00000018: 0x10800001: MI_STORE_DATA_INDEX
    0x0000001c: 0x00000080:    dword 1
    0x00000020: 0x00000001:    dword 2
    0x00000024: 0x01000000: MI_USER_INTERRUPT
    0x00000028: 0x02000004: MI_FLUSH
    0x0000002c: 0x00000000: MI_NOOP
    0x00000030: 0x18800080: MI_BATCH_BUFFER_START
    0x00000034: 0x010b2001:    dword 1
    0x00000038: 0x02000004: MI_FLUSH
    0x0000003c: 0x00000000: MI_NOOP
    0x00000040: 0x10800001: MI_STORE_DATA_INDEX
    0x00000044: 0x00000080:    dword 1
    0x00000048: 0x00000002:    dword 2
    0x0000004c: 0x01000000: MI_USER_INTERRUPT

repeating with different batch buffers until:

    0x00000340: 0x02000000: MI_FLUSH
    0x00000344: 0x00000000: MI_NOOP
    0x00000348: 0x10800001: MI_STORE_DATA_INDEX
    0x0000034c: 0x00000080:    dword 1
    0x00000350: 0x00000018:    dword 2
    0x00000354: 0x01000000: MI_USER_INTERRUPT
    0x00000358: 0x02000000: MI_FLUSH
    0x0000035c: 0x00000000: MI_NOOP
    0x00000360: 0x18800080: MI_BATCH_BUFFER_START
    0x00000364: 0x010ac001:    dword 1
    0x00000368: 0x02000004: MI_FLUSH
    0x0000036c: 0x00000000: MI_NOOP
    0x00000370: 0x10800001: MI_STORE_DATA_INDEX
    0x00000374: 0x00000080:    dword 1
    0x00000378: 0x00000019:    dword 2
    0x0000037c: 0x01000000: MI_USER_INTERRUPT
    0x00000380: 0x02000000: MI_FLUSH
    0x00000384: 0x00000000: MI_NOOP
    0x00000388: 0x10800001: MI_STORE_DATA_INDEX
    0x0000038c: 0x00000080:    dword 1
    0x00000390: 0x0000001a:    dword 2
    0x00000394: 0x01000000: MI_USER_INTERRUPT
    0x00000398: HEAD 0x00000000: MI_NOOP

I will attach the output of intel_gpu_dump, in case it triggers memories in someone.

Created attachment 36594
Gzipped output from intel_gpu_dump
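The MI_STORE_DATA_INDEX / MI_USER_INTERRUPT "breadcrumb" pattern described above can be checked mechanically against the dump. This is a simplified sketch, not intel_gpu_dump's decoder. It assumes that MI commands carry their opcode in bits 28:23 with bits 31:29 zero, that opcodes below 0x10 are single-dword, and that longer ones encode (length - 2) in bits 5:0. These assumptions are consistent with every header in this dump. The USER_INTERRUPT_BIT value is I915_USER_INTERRUPT as I recall it from i915_reg.h and should be checked against the header. All helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MI_OPCODE(dw)          (((dw) >> 23) & 0x3f)  /* bits 28:23 */
#define MI_CLIENT(dw)          ((dw) >> 29)           /* 0 for MI commands */
#define OP_MI_STORE_DATA_INDEX 0x21                   /* 0x10800001 >> 23 */
#define USER_INTERRUPT_BIT     (1u << 1)              /* assumed I915_USER_INTERRUPT */

/* Simplified length rule (assumption): opcodes < 0x10 are one dword,
 * the others store (total length - 2) in bits 5:0. */
static size_t mi_length(uint32_t dw)
{
    return MI_OPCODE(dw) < 0x10 ? 1 : (size_t)(dw & 0x3f) + 2;
}

/* Walk command headers and return the last value written by
 * MI_STORE_DATA_INDEX (the breadcrumb seqno), or -1 if there is none. */
static long last_breadcrumb(const uint32_t *ring, size_t n)
{
    long seqno = -1;
    for (size_t i = 0; i < n; i += mi_length(ring[i])) {
        assert(MI_CLIENT(ring[i]) == 0);  /* only MI commands expected */
        if (MI_OPCODE(ring[i]) == OP_MI_STORE_DATA_INDEX && i + 2 < n)
            seqno = (long)ring[i + 2];    /* dword 2 holds the value */
    }
    return seqno;
}

/* Interrupt sources that can be delivered: enabled and not masked. */
static uint32_t unmasked_irqs(uint32_t ier, uint32_t imr)
{
    return ier & ~imr;
}
```

Fed the dwords from 0x340 up to HEAD, last_breadcrumb() returns 0x1a = 26, the same value as "Current sequence: 26". In other words, the GPU has written every breadcrumb queued before HEAD. With the enable and mask values above, unmasked_irqs() gives 0x00028c51. That is the enable mask minus bit 1, so under the assumed bit assignment the user interrupt is currently masked, which fits "Waiter sequence: 0".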
That's normal behaviour of a mostly idle GPU.

So it looks like X isn't responding to me because it's waiting in the kernel:

    (gdb) bt
    #0  0x00472424 in __kernel_vsyscall ()
    #1  0x009ce1f9 in ioctl () from /lib/libc.so.6
    #2  0x00152d8f in drm_intel_gem_bo_mrb_exec2 (bo=0x8b17648, used=264, cliprects=0x0, num_cliprects=0, DR4=-1, ring_flag=1) at intel_bufmgr_gem.c:1608
    #3  0x00152fb5 in drm_intel_gem_bo_exec2 (bo=0x8b17648, used=264, cliprects=0x0, num_cliprects=0, DR4=-1) at intel_bufmgr_gem.c:1649
    #4  0x0014e59e in drm_intel_bo_exec (bo=0x8b17648, used=264, cliprects=0x0, num_cliprects=0, DR4=-1) at intel_bufmgr.c:145
    #5  0x00259a1b in intel_batch_submit (scrn=0x8958050, flush=1) at intel_batchbuffer.c:194
    #6  0x002585a5 in I830BlockHandler (i=0, blockData=0x0, pTimeout=0xbfaab5bc, pReadmask=0x81fbe80) at intel_driver.c:704
    #7  0x0810f4fb in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0xbfaab5bc, pReadmask=0x81fbe80) at animcur.c:194
    #8  0x0817f18e in compBlockHandler (i=0, blockData=0x0, pTimeout=0xbfaab5bc, pReadmask=0x81fbe80) at compinit.c:157
    #9  0x08062a28 in BlockHandler (pTimeout=0xbfaab5bc, pReadmask=0x81fbe80) at dixutils.c:385
    #10 0x080a0e8c in WaitForSomething (pClientsReady=0x8af2208) at WaitFor.c:216
    #11 0x0808685e in Dispatch () at dispatch.c:368
    #12 0x08062515 in main (argc=15, argv=0xbfaab724, envp=0xbfaab764) at main.c:289

Time to dig and find out what X is doing here.

This looks to be page-flipping related. I did "echo t > /proc/sysrq-trigger" to get the following call trace:

    Xorg          S 00000015     0  1380      1 0x00400000
     f69fddc0 00203086 64aadc9c 00000015 c0a4fd40 c0a4fd40 c0a4fd40 c0a4fd40
     f5cfa8ec c0a4fd40 c0a4fd40 00034a30 00000000 f6b04800 00000015 f5cfa640
     00000000 f5cfa640 f69fde20 f72b70a4 f69fde40 f80101b3 00203246 80000000
    Call Trace:
     [<f80101b3>] i915_gem_do_execbuffer+0x378/0xbf8 [i915]
     [<f800bce2>] ? list_move_tail+0x18/0x1b [i915]
     [<c04c8f62>] ? __kmalloc+0xfc/0x108
     [<c045212d>] ? autoremove_wake_function+0x0/0x2f
     [<f8010acf>] i915_gem_execbuffer2+0x9c/0xe2 [i915]
     [<f7f85a8c>] drm_ioctl+0x237/0x317 [drm]
     [<f8010a33>] ? i915_gem_execbuffer2+0x0/0xe2 [i915]
     [<c04d198a>] ? fsnotify_modify+0x4f/0x5a
     [<c04dc1dd>] vfs_ioctl+0x27/0x91
     [<f7f85855>] ? drm_ioctl+0x0/0x317 [drm]
     [<c04dc77e>] do_vfs_ioctl+0x48e/0x4cc
     [<c0431dcc>] ? pick_next_task_fair+0xb3/0xbb
     [<c0431df1>] ? pick_next_task+0x1d/0x34
     [<c0786093>] ? schedule+0x585/0x5d9
     [<c04dc7fd>] sys_ioctl+0x41/0x61
     [<c040885f>] sysenter_do_call+0x12/0x28
     [<c0780000>] ? init_intel+0x140/0x355

Disassembling i915.ko in gdb shows me that i915_gem_do_execbuffer+0x378 is in fact part of i915_gem_wait_for_pending_flip, line 3638 (or thereabouts), just after the mutex_lock(&dev->struct_mutex) in:

    static int
    i915_gem_wait_for_pending_flip(struct drm_device *dev,
                                   struct drm_gem_object **object_list,
                                   int count)
    {
            drm_i915_private_t *dev_priv = dev->dev_private;
            struct drm_i915_gem_object *obj_priv;
            DEFINE_WAIT(wait);
            int i, ret = 0;

            for (;;) {
                    prepare_to_wait(&dev_priv->pending_flip_queue,
                                    &wait, TASK_INTERRUPTIBLE);
                    for (i = 0; i < count; i++) {
                            obj_priv = to_intel_bo(object_list[i]);
                            if (atomic_read(&obj_priv->pending_flip) > 0)
                                    break;
                    }
                    if (i == count)
                            break;

                    if (!signal_pending(current)) {
                            mutex_unlock(&dev->struct_mutex);
                            schedule();
                            mutex_lock(&dev->struct_mutex);
                            continue;
                    }
                    ret = -ERESTARTSYS;
                    break;
            }
            finish_wait(&dev_priv->pending_flip_queue, &wait);

            return ret;
    }

I'm getting stuck here; any suggestions will be welcomed.

Assigning to Jesse, as he lives for the thrill of broken page flip on i945. At the least, he may have some additional patches in his tree for this issue.

That trace helps. One of your processes is waiting for flip completion on a buffer that was just queued.
Which means we never decremented the pending_flip count for the buffer, which in turn means one of several things:
- we failed to prepare the flip, which would keep the pending bit from getting set, so intel_finish_page_flip() would never decrement it (no flip pending interrupt?)
- we failed to finish the page flip (no vblank interrupt?)
- we failed to wake up the pending flip queue (somehow)

In the drm repo there's a test called vbltest. Can you run that (you may need to pass -s depending on your output config) and see if it returns a frequency approximately equal to the display's refresh rate? If not, there's something wrong with vblank interrupts on your platform that could cause problems.

Assuming that works, can you try the modetest program? It has a -v flag that lets you check page flipping basics. If that fails, it may be easier to trace than a full stack with your compositor.

If both of those seem OK, then we're failing somewhere else. Tracing the failure points above may shed some light on things...

With X still running, but the world in the failed state, vbltest shows the correct frequency (59.80Hz). modetest -c shows me that the DVI-D connector (the one I'm using) is id 8. modetest -v -s 8:1920x1200 gives me:

    trying to load module i915...success.
    setting mode 1920x1200 on connector 8, crtc 3
    select timed out or error (ret 0)

The select line repeats until I terminate modetest. At the same time, I have a nice colourful picture on screen. Once I've run modetest, vbltest stops working, and gives this output:

    trying to load module i915...success.
    starting count: 0
    select timed out or error (ret 0)

Again, the select line repeats until I terminate it.

After a power cycle, without letting X or my OpenGL compositor run, vbltest works and shows the correct frequency. When I run modetest, I see the colourful picture briefly, then it flips to a grey screen and stalls. I see the same output as I did after the failure. Again, vbltest stops working at this point.
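The wait loop quoted earlier from i915_gem_wait_for_pending_flip, combined with Jesse's point that the pending_flip count was never decremented, can be modelled in user space to show why the waiter never returns. This is a hypothetical sketch: a pthread mutex stands in for dev->struct_mutex, a condition variable for pending_flip_queue, and a plain int for obj_priv->pending_flip. The timeout is my addition so that the stuck case is observable; the kernel loop only bails out when a signal is pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the kernel objects in the quoted loop. */
struct flip_model {
    pthread_mutex_t lock;       /* dev->struct_mutex */
    pthread_cond_t  flip_done;  /* dev_priv->pending_flip_queue */
    int pending_flip;           /* obj_priv->pending_flip */
};

static void flip_model_init(struct flip_model *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->flip_done, NULL);
    m->pending_flip = 0;
}

/* Queueing a flip bumps the count, like atomic_inc() in the flip ioctl. */
static void queue_flip(struct flip_model *m)
{
    pthread_mutex_lock(&m->lock);
    m->pending_flip++;
    pthread_mutex_unlock(&m->lock);
}

/* Completing a flip drops the count and wakes waiters. If this is never
 * called, waiters never see the count reach zero. */
static void complete_flip(struct flip_model *m)
{
    pthread_mutex_lock(&m->lock);
    if (m->pending_flip > 0 && --m->pending_flip == 0)
        pthread_cond_broadcast(&m->flip_done);
    pthread_mutex_unlock(&m->lock);
}

/* Returns 0 once no flip is pending, or ETIMEDOUT after timeout_ms. */
static int wait_for_pending_flip(struct flip_model *m, long timeout_ms)
{
    struct timespec deadline;
    int ret = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&m->lock);
    /* pthread_cond_timedwait() releases the lock while sleeping, as the
     * kernel loop does with mutex_unlock(); schedule(); mutex_lock(). */
    while (m->pending_flip > 0 && ret == 0)
        ret = pthread_cond_timedwait(&m->flip_done, &m->lock, &deadline);
    if (ret == ETIMEDOUT && m->pending_flip == 0)
        ret = 0;  /* the flip completed right at the deadline */
    pthread_mutex_unlock(&m->lock);
    return ret;
}
```

With one flip queued and no completion, the wait times out. This is the user-space analogue of X sitting in execbuffer. Once complete_flip() runs, the same wait returns immediately.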
I should add that when vbltest works, I get output like:

    trying to load module i915...success.
    starting count: 8063
    freq: 60.04Hz
    freq: 59.80Hz

The second freq: line repeats until I terminate vbltest, and the value of the first freq: line is always slightly different, although still around 60Hz. In addition, the starting count when vbltest works appears to vary in line with system uptime (as you would expect), whereas it's always 0 when it fails.

I've just checked vbltest -s for sanity's sake, and that behaves identically to vbltest once I'm in the failure state.

(In reply to comment #19)
> With X still running, but the world in the failed state, vbltest shows the
> correct frequency (59.80Hz). modetest -c shows me that the DVI-D connector (the
> one I'm using) is id 8. modetest -v -s 8:1920x1200 gives me:
>
> trying to load module i915...success.
> setting mode 1920x1200 on connector 8, crtc 3
> select timed out or error (ret 0)
>
> The select line repeats until I terminate modetest. At the same time, I have a
> nice colourful picture on screen.

OK, so that means vblank interrupts work fine until you try to flip, and then interrupts break altogether when we try to queue a flip. If modetest were working, you should see the nice screen alternate with a grey buffer, making it look faded out if the flips are occurring at the right frequency.

Did you run your tests with both the kernel patches Chris pointed you at applied? Is the behavior the same without them?

All tests are currently being run with the patches Chris pointed out in use. modeset functions correctly if I remove those two patches and just use the vanilla kernel, but I get a different failure out of X11. I'll attach a new intel_error_dump from the new failure state. The X server log ends with:

    [   138.045] (EE) intel(0): Detected a hung GPU, disabling acceleration.
    [   138.116] Backtrace:
    [   138.116] 0: /usr/local/x11test/bin/Xorg (xorg_backtrace+0x3b) [0x80a05fb]
    [   138.117] 1: /usr/local/x11test/bin/Xorg (0x8048000+0x54fe5) [0x809cfe5]
    [   138.117] 2: (vdso) (__kernel_rt_sigreturn+0x0) [0xc8540c]
    [   138.117] 3: /lib/libc.so.6 (__libc_malloc+0x5e) [0x44805e]
    [   138.117] 4: /usr/local/x11test/bin/Xorg (AddResource+0x6f) [0x8088c7f]
    [   138.117] 5: /usr/local/x11test/lib/xorg/modules/extensions/libglx.so (0xcc2000+0x315c3) [0xcf35c3]
    [   138.117] 6: /usr/local/x11test/lib/xorg/modules/extensions/libglx.so (0xcc2000+0x33237) [0xcf5237]
    [   138.117] 7: /usr/local/x11test/lib/xorg/modules/extensions/libglx.so (0xcc2000+0x33362) [0xcf5362]
    [   138.117] 8: /usr/local/x11test/lib/xorg/modules/extensions/libglx.so (0xcc2000+0x36392) [0xcf8392]
    [   138.117] 9: /usr/local/x11test/bin/Xorg (0x8048000+0x3eba7) [0x8086ba7]
    [   138.118] 10: /usr/local/x11test/bin/Xorg (0x8048000+0x1a515) [0x8062515]
    [   138.118] 11: /lib/libc.so.6 (__libc_start_main+0xe6) [0x3ebcc6]
    [   138.118] 12: /usr/local/x11test/bin/Xorg (0x8048000+0x1a0f1) [0x80620f1]
    [   138.118] Segmentation fault at address 0x85a79
    [   138.118]
    Fatal server error:
    [   138.118] Caught signal 11 (Segmentation fault). Server aborting
    [   138.119]
    [   138.119]

I've now spent far too long at work for one day, so I'm going to go quiet for the next 14 hours or so. I'll continue working on this at around 10am BST, so anything you come up with in the meantime will get tested.

Created attachment 36610
New error state without the patches Chris pointed at
Oh, it's also possible we're hanging in the kernel somewhere with interrupts disabled, causing subsequent flip or vblank requests to hang. Can you check /proc/<pid>/wchan in the failure case as well (or use echo t > /proc/sysrq-trigger like you did before)?

Out of paranoia, you could also try this:

    diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_d
    index cc8131f..2bfb2b1 100644
    --- a/drivers/gpu/drm/i915/intel_display.c
    +++ b/drivers/gpu/drm/i915/intel_display.c
    @@ -4731,7 +4731,11 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
             atomic_inc(&obj_priv->pending_flip);
             work->pending_flip_obj = obj;
     
    -        BEGIN_LP_RING(4);
    +        BEGIN_LP_RING(8);
    +        OUT_RING(MI_FLUSH);
    +        OUT_RING(MI_FLUSH);
    +        OUT_RING(MI_FLUSH);
    +        OUT_RING(MI_FLUSH);
             OUT_RING(MI_DISPLAY_FLIP |
                      MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
             OUT_RING(fb->pitch);

I would also include https://bugs.freedesktop.org/attachment.cgi?id=35551 in your kernel patchset, as that should reduce the number of spurious hangs.

(In reply to comment #24)
> Oh, it's also possible we're hanging in the kernel somewhere with interrupts
> disabled, causing subsequent flip or vblank requests to hang. Can you check
> /proc/<pid>/wchan in the failure case as well (or use echo t >
> /proc/sysrq-trigger like you did before)?
>

In the new X failure case (without the patches that Chris pointed out), X dies due to acceleration failure, so I don't have a wchan to chase. I'm adding the third patch Chris pointed out in this bug (on top of the original 2 that fail), and I'll add your paranoia patch to xf86-video-intel.

Remind me not to try and do things before coffee; I'll add your patch to the *kernel*.

Your patch conflicts with the second patch Chris pointed at me: https://bugs.freedesktop.org/attachment.cgi?id=36464
Not sure how best to proceed: do I modify your patch to apply on top of 36464, or do I drop the three patches Chris pointed out?
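The paranoia patch raises BEGIN_LP_RING(4) to BEGIN_LP_RING(8) because the four extra MI_FLUSH dwords have to fit in the same reservation as the four flip dwords. The toy model below illustrates that accounting. It is hypothetical throughout: struct ring_model, ring_begin/ring_out/ring_advance and the "keep one slot free" space rule are illustrations, not the i915 macros. MI_FLUSH_DW is the MI_FLUSH value seen in the ring dump, and the flip dwords in the usage are placeholders, not the real MI_DISPLAY_FLIP encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_DWORDS 64
#define MI_FLUSH_DW 0x02000000u  /* MI_FLUSH as it appears in the ring dump */

/* Hypothetical ring with BEGIN/OUT/ADVANCE-style emission. */
struct ring_model {
    uint32_t buf[RING_DWORDS];
    size_t head;      /* GPU read position, in dwords */
    size_t tail;      /* CPU write position, in dwords */
    size_t reserved;  /* dwords promised by ring_begin() */
    size_t emitted;   /* dwords written since ring_begin() */
};

/* Free dwords, keeping one slot empty so head == tail means "empty". */
static size_t ring_space(const struct ring_model *r)
{
    return (r->head + RING_DWORDS - r->tail - 1) % RING_DWORDS;
}

static int ring_begin(struct ring_model *r, size_t n)
{
    if (ring_space(r) < n)
        return -1;  /* a real driver would wait for the GPU here */
    r->reserved = n;
    r->emitted = 0;
    return 0;
}

static void ring_out(struct ring_model *r, uint32_t dw)
{
    assert(r->emitted < r->reserved);  /* writing past the reservation */
    r->buf[(r->tail + r->emitted) % RING_DWORDS] = dw;
    r->emitted++;
}

/* Publish the new tail only if exactly the reserved dwords were written. */
static int ring_advance(struct ring_model *r)
{
    if (r->emitted != r->reserved)
        return -1;
    r->tail = (r->tail + r->emitted) % RING_DWORDS;
    return 0;
}
```

Reserving 8 dwords, emitting four MI_FLUSH dwords plus four flip dwords and then advancing moves the tail by 8. Keeping the old reservation of 4 would trip the assertion in ring_out() on the fifth dword.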
On Chris's advice from IRC, I've rebased your suggestion as:

    --- intel_display.c.orig	2010-06-30 10:22:40.000000000 +0100
    +++ intel_display.c	2010-06-30 10:30:46.274401149 +0100
    @@ -4756,7 +4756,11 @@ static int intel_crtc_page_flip(struct d
             while (I915_READ(ISR) & flip_mask)
                     ;
     
    -        BEGIN_LP_RING(4);
    +        BEGIN_LP_RING(8);
    +        OUT_RING(MI_FLUSH);
    +        OUT_RING(MI_FLUSH);
    +        OUT_RING(MI_FLUSH);
    +        OUT_RING(MI_FLUSH);
             if (IS_I965G(dev)) {
                     OUT_RING(MI_DISPLAY_FLIP |
                              MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));

Adding the extra flushes on top of the other 3 patches that Chris points out is definitely changing behaviour - instead of locking, Xorg dies, and I get the following in the server log:

    [   138.045] (EE) intel(0): Detected a hung GPU, disabling acceleration.
    [   138.116] Backtrace:
    [   138.116] 0: /usr/local/x11test/bin/Xorg (xorg_backtrace+0x3b) [0x80a05fb]
    [   138.117] 1: /usr/local/x11test/bin/Xorg (0x8048000+0x54fe5) [0x809cfe5]
    [   138.117] 2: (vdso) (__kernel_rt_sigreturn+0x0) [0xc8540c]
    [   138.117] 3: /lib/libc.so.6 (__libc_malloc+0x5e) [0x44805e]
    [   138.117] 4: /usr/local/x11test/bin/Xorg (AddResource+0x6f) [0x8088c7f]
    [   138.117] 5: /usr/local/x11test/lib/xorg/modules/extensions/libglx.so (0xcc2000+0x315c3) [0xcf35c3]
    [   138.117] 6: /usr/local/x11test/lib/xorg/modules/extensions/libglx.so (0xcc2000+0x33237) [0xcf5237]
    [   138.117] 7: /usr/local/x11test/lib/xorg/modules/extensions/libglx.so (0xcc2000+0x33362) [0xcf5362]
    [   138.117] 8: /usr/local/x11test/lib/xorg/modules/extensions/libglx.so (0xcc2000+0x36392) [0xcf8392]
    [   138.117] 9: /usr/local/x11test/bin/Xorg (0x8048000+0x3eba7) [0x8086ba7]
    [   138.118] 10: /usr/local/x11test/bin/Xorg (0x8048000+0x1a515) [0x8062515]
    [   138.118] 11: /lib/libc.so.6 (__libc_start_main+0xe6) [0x3ebcc6]
    [   138.118] 12: /usr/local/x11test/bin/Xorg (0x8048000+0x1a0f1) [0x80620f1]
    [   138.118] Segmentation fault at address 0x85a79

(In reply to comment #30)
> Adding the extra flushes on top of the other 3 patches that Chris points out is
> definitely changing behaviour - instead of locking, Xorg dies, and I get the
> following in the server log:
>
> [ 138.045] (EE) intel(0): Detected a hung GPU, disabling acceleration.

We still have a GPU hang. I haven't yet debugged all the error paths that we hit subsequently through dri/glx, so these segfaults are an annoyance. The difference in behaviour, I guess, is that the hang is detected during a flush, so we don't find ourselves with the hang racing against the flip. Is the hang dependent upon the flip path at all? Can you reproduce the hang if you do the buffer rotation without the final swap/page flip?

If I patch xf86-video-intel to not use pageflipping, it works, albeit not smoothly.

    diff --git a/src/drmmode_display.c b/src/drmmode_display.c
    index 17f6541..2f847f7 100644
    --- a/src/drmmode_display.c
    +++ b/src/drmmode_display.c
    @@ -1453,7 +1453,7 @@ Bool drmmode_pre_init(ScrnInfoPtr scrn, int fd, int cpp)
             gp.value = &has_flipping;
             (void)drmCommandWriteRead(intel->drmSubFD, DRM_I915_GETPARAM, &gp,
                                       sizeof(gp));
    -        if (has_flipping) {
    +        if (has_flipping && 0) {
                     xf86DrvMsg(scrn->scrnIndex, X_INFO,
                                "Kernel page flipping support detected, enabling\n");
                     intel->use_pageflipping = TRUE;

is the change I made to prevent page flipping from being used. Enabling pageflipping results in the GPU hang, but no error state is collected as far as intel_error_decode is concerned.
It's better to disable flipping slightly differently:

    diff --git a/src/drmmode_display.c b/src/drmmode_display.c
    index d8b158e..e06a2fc 100644
    --- a/src/drmmode_display.c
    +++ b/src/drmmode_display.c
    @@ -1464,7 +1464,7 @@ Bool drmmode_pre_init(ScrnInfoPtr scrn, int fd, int cpp)
             if (has_flipping) {
                     xf86DrvMsg(scrn->scrnIndex, X_INFO,
                                "Kernel page flipping support detected, enabling\n");
    -                intel->use_pageflipping = TRUE;
    +                intel->use_pageflipping = FALSE;
                     drmmode->flip_count = 0;
                     drmmode->event_context.version = DRM_EVENT_CONTEXT_VERSION;
                     drmmode->event_context.vblank_handler = drmmode_vblank_handler;
    diff --git a/src/i830_dri.c b/src/i830_dri.c
    index 321faf6..d220e3d 100644
    --- a/src/i830_dri.c
    +++ b/src/i830_dri.c
    @@ -1013,7 +1013,7 @@ Bool I830DRI2ScreenInit(ScreenPtr screen)
             info.CopyRegion = I830DRI2CopyRegion;
     
     #if DRI2INFOREC_VERSION >= 4
    -        if (intel->use_pageflipping) {
    +        if (intel->use_pageflipping || 1) {
                     info.version = 4;
                     info.ScheduleSwap = I830DRI2ScheduleSwap;
                     info.GetMSC = I830DRI2GetMSC;

That way you keep the other GL features that require vblank events but disable the flip ioctls.

A discussion with Jesse on IRC resulted in him noticing that the order of the prepare_page_flip and "finish_page_flip" calls in https://bugs.freedesktop.org/attachment.cgi?id=36464 was wrong. I've flipped them round, and now my 945 is page flipping, albeit struggling to sustain any frame rate worth noting at 1920x1200 (it's fine at 1280x720, so I'm happy to call that a 945 limit).
The faulty hunk is:

    diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
    index 2479be0..a846cd8 100644
    --- a/drivers/gpu/drm/i915/i915_irq.c
    +++ b/drivers/gpu/drm/i915/i915_irq.c
    @@ -940,22 +940,30 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
                     if (HAS_BSD(dev) && (iir & I915_BSD_USER_INTERRUPT))
                             DRM_WAKEUP(&dev_priv->bsd_ring.irq_queue);
     
    -                if (iir & I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT)
    +                if (iir & I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT) {
                             intel_prepare_page_flip(dev, 0);
    +                        if (dev_priv->flip_pending_is_done)
    +                                intel_finish_page_flip_plane(dev, 0);
    +                }
     
    -                if (iir & I915_DISPLAY_PLANE_B_FLIP_PENDING_INTERRUPT)
    +                if (iir & I915_DISPLAY_PLANE_B_FLIP_PENDING_INTERRUPT) {
    +                        if (dev_priv->flip_pending_is_done)
    +                                intel_finish_page_flip_plane(dev, 1);
                             intel_prepare_page_flip(dev, 1);
    +                }
     
                     if (pipea_stats & vblank_status) {
                             vblank++;
                             drm_handle_vblank(dev, 0);
    -                        intel_finish_page_flip(dev, 0);
    +                        if (!dev_priv->flip_pending_is_done)
    +                                intel_finish_page_flip(dev, 0);
                     }
     
                     if (pipeb_stats & vblank_status) {
                             vblank++;
                             drm_handle_vblank(dev, 1);
    -                        intel_finish_page_flip(dev, 1);
    +                        if (!dev_priv->flip_pending_is_done)
    +                                intel_finish_page_flip(dev, 1);
                     }
     
                     if ((pipea_stats & I915_LEGACY_BLC_EVENT_STATUS) ||

Changing it so that the "finish_page_flip" calls always come after the prepare_page_flip calls makes it work.

Marking as fixed per IRC comment.
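Why the order inside one flip-pending interrupt matters can be shown with a much-simplified state model. It assumes, as I understand the driver of that era, that the finish step returns early unless the prepare step has already marked the queued flip as pending. The struct and function names are hypothetical stand-ins for intel_prepare_page_flip() / intel_finish_page_flip_plane(), not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-plane flip state. */
struct plane_flip {
    bool work_queued;  /* a flip has been queued */
    bool pending;      /* set by the prepare step */
    int pending_flip;  /* the count execbuffer waiters sleep on */
};

static void queue_flip(struct plane_flip *p)
{
    p->work_queued = true;
    p->pending = false;
    p->pending_flip++;
}

static void prepare_flip(struct plane_flip *p)
{
    if (p->work_queued)
        p->pending = true;
}

/* Finish does nothing until prepare has marked the flip pending. */
static void finish_flip(struct plane_flip *p)
{
    if (!p->work_queued || !p->pending)
        return;
    p->work_queued = false;
    p->pending = false;
    p->pending_flip--;  /* the real driver also wakes pending_flip_queue */
}

/* One flip-pending interrupt on a flip_pending_is_done platform. */
static void flip_pending_irq(struct plane_flip *p, bool finish_first)
{
    if (finish_first) {    /* plane B order in the faulty hunk */
        finish_flip(p);
        prepare_flip(p);
    } else {               /* plane A order, and the fix */
        prepare_flip(p);
        finish_flip(p);
    }
}
```

In this model, with prepare first a single interrupt drops pending_flip back to 0. With finish first, the finish call is a no-op and the count stays at 1 until another flip-pending interrupt arrives for that plane. If none arrives, a task in i915_gem_wait_for_pending_flip sleeps indefinitely, matching the hang seen earlier in the thread.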