Summary: | Recent mesa git revisions cause frequent gpu hangs on radeonsi | ||
---|---|---|---|
Product: | Mesa | Reporter: | José Suárez <j.suarez.agapito> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Full dmesg; dmesg; Sample of dmesg output |
Description
José Suárez
2013-09-13 23:10:45 UTC
Created attachment 85793 [details]
Full dmesg
Full dmesg of the system
Just an update: I have tried building the .deb packages with the lines committed in 395b9410 removed from the source and the hangs are still there, so I am not sure if the problem lies in that commit...

Created attachment 85797 [details]
dmesg
I'm experiencing similar problems with Unigine Tropics. The file attached above is my dmesg.
My specs: Intel HD 4000 + AMD Radeon 7750M
Software is:
OpenSUSE 12.3 x86_64
kernel-3.11
Mesa, libdrm are from git
xserver 1.14

Created attachment 85830 [details]
Sample of dmesg output
I began seeing these symptoms on my HD 7850 PITCAIRN when I attempted to upgrade Mesa from commit 6b5c802c (Sep. 2) to commit 2937d704 (Sep. 6). I had upgraded libdrm from 2.4.46 to commit 58d00888 at the same time. The attached dmesg output looks a lot like the bug reported here by José, so I hope I'm not interfering with an unrelated problem.

I have been having trouble finding time to investigate, which is why I did not report this myself sooner. I am using a stable 3.10 kernel with DRM cherry picks from 3.11 and upcoming 3.12 -- which is not appropriate for use when reporting bugs. I also did not rebuild the entire X stack when I upgraded, but just libdrm and Mesa. There was a lot of due diligence I needed to perform before filing a bug report here... Anyway, I'm glad to see others reporting this -- now I feel less alone.

Not a lot happened in Mesa between 6b5c802c and 2937d704, so if it turns out that my Frankenstein kernel is not to blame, and rebuilding the X stack doesn't help, then I'm going to bisect Mesa. I have "good" and "bad" commits to use, and I'm really interested in seeing whether the one big Radeon change in that interval is the culprit:

commit a81beee37e0dd7b75422448420e8e8b0b4b76c1e
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Fri Sep 6 16:43:34 2013 -0400

    radeon/winsys: pad IBs to a multiple of 8 DWs

(In reply to comment #5)
I forgot to mention... The X server runs fine, with no GPU spew in dmesg. I can use the web browser and other programs which are not very challenging for the GPU. DOSBox uses some OpenGL, and it runs fine; prboom-plus uses a bit more OpenGL, and it runs OK. It's when I tested 'torcs' that everything ground to a halt.
After navigating the torcs menus to start a game, the screen goes black. One time, the screen came back for a moment -- the FPS indicator showed 0.2 frames/sec -- before blacking out again. I was able to get to VT1 with Ctrl-Alt-F1 and kill torcs, and X was running OK again when I attempted to go back to it with Alt-F7.

Update: today's mesa git works with Unigine Tropics for me.

(In reply to comment #7)
Seeing Hohahiu's good news, I thought I would try updating Mesa again (commit 4b3c0a79). Now my desktop manager (lightdm) will not even start!

...
[ 83180.086] (II) [KMS] Kernel modesetting enabled.
[ 83180.086] (==) RADEON(0): Depth 24, (--) framebuffer bpp 32
[ 83180.086] (II) RADEON(0): Pixel depth = 24 bits stored in 4 bytes (32 bpp pixmaps)
[ 83180.086] (==) RADEON(0): Default visual is TrueColor
[ 83180.087] (**) RADEON(0): Option "ColorTiling" "on"
[ 83180.087] (**) RADEON(0): Option "ColorTiling2D" "on"
[ 83180.087] (**) RADEON(0): Option "AccelMethod" "glamor"
[ 83180.087] (**) RADEON(0): Option "SwapbuffersWait" "off"
[ 83180.087] (==) RADEON(0): RGB weight 888
[ 83180.087] (II) RADEON(0): Using 8 bits per RGB (8 bit DAC)
[ 83180.087] (--) RADEON(0): Chipset: "PITCAIRN" (ChipID = 0x6819)
[ 83180.087] (II) Loading sub module "dri2"
[ 83180.087] (II) LoadModule: "dri2"
[ 83180.087] (II) Module "dri2" already built-in
[ 83180.087] (II) Loading sub module "glamoregl"
[ 83180.087] (II) LoadModule: "glamoregl"
[ 83180.087] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[ 83180.087] (II) Module glamoregl: vendor="X.Org Foundation"
[ 83180.087] compiled for 1.14.2.902, module version = 0.5.1
[ 83180.087] ABI class: X.Org ANSI C Emulation, version 0.4
[ 83180.087] (II) glamor: OpenGL accelerated X.org driver based.
[ 83180.097] (II) glamor: EGL version 1.4 (DRI2):
[ 83180.105] (EE)
[ 83180.105] (EE) Backtrace:
[ 83180.105] (EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x57c51d]
[ 83180.105] (EE) 1: /usr/bin/X (0x400000+0x17ffc9) [0x57ffc9]
[ 83180.105] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fec25304000+0xf210) [0x7fec25313210]
[ 83180.105] (EE) 3: /usr/lib/x86_64-linux-gnu/libLLVM-3.4.so.1 (_ZTIN4llvm18format_object_baseE+0x0) [0x7fec1f49d000]
[ 83180.105] (EE)
[ 83180.105] (EE) Segmentation fault at address 0x7fec1f49d000
[ 83180.105] (EE) Fatal server error:
[ 83180.105] (EE) Caught signal 11 (Segmentation fault). Server aborting
[ 83180.105] (EE)
[ 83180.105] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
[ 83180.105] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 83180.105] (EE)
[ 83180.112] (EE) Server terminated with error (1). Closing log file.

Oh happy happy joy joy!! I had just rebuilt Mesa, xorg-server, glamor-egl, and xf86-video-ati... hoping for the best. Unfortunately, I have no time to look into this right now. It seems to be related to LLVM and/or glamor. I had to downgrade to the last working versions of everything.

(In reply to comment #8)
Oops! My LLVM 3.4 is not new enough. Marek's transform feedback stuff went in; I need svn190575 or newer, but I had svn190499 installed. Will try again with svn190655....

I have the GPU lockups too on a 7750 in different apps, for instance xonotic.
I did a bisect and it led to this commit:

e8f9195e5fb34a45783d6491d2e0305a0b137439 is the first bad commit
commit e8f9195e5fb34a45783d6491d2e0305a0b137439
Author: Axel Davy <axel.davy@ens.fr>
Date: Thu Aug 15 12:47:58 2013 +0200

    gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and PIPE_BIND_LINEAR flags to enforce no tiling.

    Signed-off-by: Axel Davy <axel.davy@ens.fr>

And indeed reverting it seems to fix the lockups.
(In reply to comment #10)
> I have the gpu lockups too on an 7750 in different apps, for instance
> xonotic.
> I did a bisect and it led to this commit:
>
> e8f9195e5fb34a45783d6491d2e0305a0b137439 is the first bad commit
> commit e8f9195e5fb34a45783d6491d2e0305a0b137439
> Author: Axel Davy <axel.davy@ens.fr>
> Date: Thu Aug 15 12:47:58 2013 +0200
>
> gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and
> PIPE_BIND_LINEAR flags to enforce no tiling.
>
> Signed-off-by: Axel Davy <axel.davy@ens.fr>
>
> And indeed reverting it seems to fix the lockups.

I can confirm that this commit was the problem in my case. That commit introduced a boolean error in src/gallium/drivers/radeonsi/r600_texture.c which was later fixed in 49f2ba2c. My Mesa build on Sep. 2 was before e8f9195e, and the one on Sep. 6 was after. The new Mesa I built today was only failing because my LLVM 3.4 did not include the necessary patch for Marek's transform feedback work. Once I updated LLVM, no GPU failures were observed and all was well again.

Hopefully José will have no more problems if he gets a new version of Mesa at or after commit 49f2ba2c. If he uses a version after 2b71b3d4, he will need LLVM 3.4 at svn190575 or later.

Well, I am not advanced enough to compile my own mesa et al. from git, but I have rebuilt my mesa .deb packages with those missing parentheses manually applied to the source code, and I can confirm that those GPU hangs / crashes are gone. So the problem is solved in current master. Thanks for the cooperative investigation! ;)
Regards

Closed based on users' feedback. |
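
For readers wondering how a pair of missing parentheses can turn into GPU hangs, here is a minimal sketch of that class of boolean error. It is not the code from r600_texture.c or from commit 49f2ba2c: the flag names, their values, and the `may_tile_*` helpers are all hypothetical and exist only to show how the scope of a negation changes the result when a second flag test is added to a condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bind flags for illustration only; not Mesa's real names or values. */
enum {
    BIND_SCANOUT = 1 << 0,
    BIND_LINEAR  = 1 << 1
};

/* Buggy form (assumed shape of the mistake): the negation covers only the
 * first test, so any surface with BIND_LINEAR set still passes through the
 * right-hand side of the ||. */
bool may_tile_buggy(unsigned bind)
{
    return !(bind & BIND_SCANOUT) || (bind & BIND_LINEAR);
}

/* Intended form: the extra parentheses make the negation cover both tests,
 * so tiling is allowed only when neither flag is set. */
bool may_tile_fixed(unsigned bind)
{
    return !((bind & BIND_SCANOUT) || (bind & BIND_LINEAR));
}

/* Self-check covering all four flag combinations. */
void demo_precedence(void)
{
    /* No flags: both forms agree that tiling is allowed. */
    assert(may_tile_buggy(0) && may_tile_fixed(0));

    /* SCANOUT only: both forms agree that tiling is not allowed. */
    assert(!may_tile_buggy(BIND_SCANOUT) && !may_tile_fixed(BIND_SCANOUT));

    /* LINEAR set: the buggy form wrongly allows tiling, the fixed form does not. */
    assert(may_tile_buggy(BIND_LINEAR) && !may_tile_fixed(BIND_LINEAR));
    assert(may_tile_buggy(BIND_SCANOUT | BIND_LINEAR) &&
           !may_tile_fixed(BIND_SCANOUT | BIND_LINEAR));
}
```

Calling `demo_precedence()` from a small `main()` exercises all four cases. The two forms only disagree when the newly added flag is set, which may be why a slip of this kind can survive light testing and show up only under particular workloads.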