Bug 69340 - Recent mesa git revisions cause frequent gpu hangs on radeonsi
Summary: Recent mesa git revisions cause frequent gpu hangs on radeonsi
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-13 23:10 UTC by José Suárez
Modified: 2013-09-15 12:10 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
Full dmesg (75.48 KB, text/x-log)
2013-09-13 23:22 UTC, José Suárez
dmesg (97.01 KB, text/plain)
2013-09-14 01:13 UTC, Hohahiu
Sample of dmesg output (10.07 KB, text/plain)
2013-09-14 14:35 UTC, Dave Witbrodt

Description José Suárez 2013-09-13 23:10:45 UTC
After installing Mesa git 395b9410 (from oibaf's PPA on Kubuntu Raring), I am experiencing GPU hangs and kernel panics when launching "somewhat complex" 3D games. For example, glxgears and supertuxkart do not trigger the GPU hang, but speed-dreams2 (it hangs when the game should show your car so you can drive), L4D2 (just after the Valve logo video, when the game's intro movie should start playing) and Crusader Kings II (at the very beginning, when the loading screen should come up) do.

The last Mesa git version I had installed was 505fad04, which works correctly.

Moreover, the crashes happen with both radeon.dpm=1 and radeon.dpm=0.

I have managed to get some dmesg outputs of the crashes:

Crash #1

[  334.162270] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  334.162280] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000160ea)
[  334.162289] radeon 0000:01:00.0: failed to get a new IB (-35)
[  334.162291] [TTM] Failed to expire sync object before buffer eviction
[  334.162299] [drm:radeon_cs_ib_vm_chunk] *ERROR* Failed to get ib !
[  334.162378] [TTM] Failed to expire sync object before buffer eviction
[  334.172123] radeon 0000:01:00.0: sa_manager is not empty, clearing anyway
[  334.381742] radeon 0000:01:00.0: Saved 97917 dwords of commands on ring 0.
[  334.381879] radeon 0000:01:00.0: GPU softreset: 0x00000049
[  334.381882] radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5D04028
[  334.381884] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xEE400000
[  334.381886] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xEE400000
[  334.381889] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  334.382000] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  334.382002] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  334.382004] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[  334.382006] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00408002
[  334.382009] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x84038643
[  334.382011] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  334.382013] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  334.382016] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  334.382018] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  334.386528] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
[  334.386582] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  334.387728] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
[  334.387730] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
[  334.387731] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
[  334.387733] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  334.387844] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  334.387846] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  334.387848] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  334.387850] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  334.387852] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  334.387854] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  334.387856] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  334.387981] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  334.415260] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[  334.415264] [drm] PCIE gen 2 link speeds already enabled
[  334.417417] [drm] PCIE GART of 512M enabled (table at 0x0000000000276000).
[  334.417520] radeon 0000:01:00.0: WB enabled
[  334.417522] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff880412af4c00
[  334.417524] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff880412af4c04
[  334.417526] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff880412af4c08
[  334.417528] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff880412af4c0c
[  334.417530] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff880412af4c10
[  334.418521] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90011db5a18
[  334.436919] [drm] ring test on 0 succeeded in 3 usecs
[  334.436924] [drm] ring test on 1 succeeded in 1 usecs
[  334.436929] [drm] ring test on 2 succeeded in 1 usecs
[  334.436992] [drm] ring test on 3 succeeded in 2 usecs
[  334.437003] [drm] ring test on 4 succeeded in 1 usecs
[  334.612443] [drm] ring test on 5 succeeded in 2 usecs
[  334.612447] [drm] UVD initialized successfully.
[  334.657863] [drm] ib test on ring 0 succeeded in 0 usecs
[  334.658379] [drm] ib test on ring 1 succeeded in 0 usecs
[  334.658543] [drm] ib test on ring 2 succeeded in 0 usecs
[  334.658587] [drm] ib test on ring 3 succeeded in 0 usecs
[  334.658627] [drm] ib test on ring 4 succeeded in 1 usecs
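
The first lines of each crash report a CP (command processor) stall longer than 10000 ms while a waiter is blocked on a fence sequence number; crash #2 additionally prints the last fence id the driver saw. As a rough illustration of how those numbers relate, here is a simplified C model. It is not the radeon driver's actual lockup check: the struct and function names are hypothetical, and the only inputs taken from the logs are the 10000 ms threshold and the fence values from crash #2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one ring, for illustration only. */
struct hypothetical_ring_state {
    uint64_t last_signaled_seq; /* newest fence sequence number seen as signaled */
    uint64_t waiting_for_seq;   /* fence sequence number a waiter is blocked on */
    uint64_t ms_since_progress; /* time since the CP was last seen making progress */
};

/* Fences emitted but not yet signaled (0 if the waiter is already satisfied). */
static uint64_t fences_behind(const struct hypothetical_ring_state *s)
{
    assert(s != NULL);
    return s->waiting_for_seq > s->last_signaled_seq
               ? s->waiting_for_seq - s->last_signaled_seq
               : 0;
}

/* Simplified rule: work is outstanding and the CP has stalled past the timeout. */
static bool looks_locked_up(const struct hypothetical_ring_state *s,
                            uint64_t timeout_ms)
{
    return fences_behind(s) > 0 && s->ms_since_progress > timeout_ms;
}
```

With the crash #2 values (waiting for 0x15f1, last fence id 0x15ed), this model reports the ring as four fences behind, and flags a lockup once the stall exceeds 10000 ms.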

Crash #2

[  768.143440] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  768.143452] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000015f1 last fence id 0x00000000000015ed)
[  768.642649] radeon 0000:01:00.0: GPU lockup CP stall for more than 10500msec
[  768.642659] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000073db9)
[  768.642666] radeon 0000:01:00.0: failed to get a new IB (-35)
[  768.642689] BUG: unable to handle kernel paging request at 0000100000000018
[  768.642756] IP: [<ffffffffa014f13d>] radeon_ib_sync_to+0x1d/0x40 [radeon]
[  768.642862] PGD 0
[  768.642883] Oops: 0000 [#1] SMP
[  768.642915] Modules linked in: snd_hrtimer parport_pc ppdev bnep rfcomm bluetooth binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek sp5100_tco eeepc_wmi asus_wmi sparse_keymap video snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc arc4 snd_seq_midi snd_seq_midi_event rt61pci snd_rawmidi rt2x00pci rt2x00mmio rt2x00lib mac80211 snd_seq snd_seq_device snd_timer cfg80211 edac_core snd psmouse eeprom_93cx6 edac_mce_amd crc_itu_t serio_raw fam15h_power k10temp i2c_piix4 ohci_pci soundcore mac_hid it87 hwmon_vid lp parport hid_generic usbhid hid mxm_wmi radeon i2c_algo_bit ttm e1000e drm_kms_helper ahci ptp drm pps_core libahci wmi
[  768.643499] CPU: 4 PID: 3886 Comm: ck2 Not tainted 3.11.0-031100-generic #201309021735
[  768.643565] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Crosshair V Formula, BIOS 1605 09/21/2012
[  768.643650] task: ffff8803c7cac650 ti: ffff88037be68000 task.ti: ffff88037be68000
[  768.643712] RIP: 0010:[<ffffffffa014f13d>]  [<ffffffffa014f13d>] radeon_ib_sync_to+0x1d/0x40 [radeon]
[  768.643830] RSP: 0018:ffff88037be69a08  EFLAGS: 00210206
[  768.643875] RAX: 0000100000000000 RBX: ffff880415251900 RCX: 0000000000000000
[  768.643934] RDX: 0000000000000000 RSI: ffff88037bdca8c0 RDI: ffff88037be69a30
[  768.643992] RBP: ffff88037be69a08 R08: 0000000000000006 R09: 0000000000001000
[  768.644051] R10: 0000000000000005 R11: 0000000000000000 R12: ffff88037c644ea0
[  768.644109] R13: ffff88041218c000 R14: 0000000000000000 R15: 0000000000000000
[  768.644169] FS:  00007ff86a366740(0000) GS:ffff88042ed00000(0000) knlGS:00000000e7a5fb40
[  768.644236] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  768.644283] CR2: 0000100000000018 CR3: 00000003cb38d000 CR4: 00000000000407e0
[  768.644342] Stack:
[  768.644360]  ffff88037be69ad8 ffffffffa013cb35 ffff880400000016 ffff88042ecd4580
[  768.644428]  ffff88037be69a48 0000000000000000 ffff880300000010 ffff8804130ba6e8
[  768.644496]  ffff88037be69a58 ffffffff8106a164 003fe00024200002 0000006000000000
[  768.644563] Call Trace:
[  768.644624]  [<ffffffffa013cb35>] radeon_vm_bo_update_pte+0x165/0x270 [radeon]
[  768.644691]  [<ffffffff8106a164>] ? local_bh_enable+0x94/0xa0
[  768.644776]  [<ffffffffa013cc9b>] radeon_vm_bo_rmv+0x5b/0xf0 [radeon]
[  768.644870]  [<ffffffffa014e203>] radeon_gem_object_close+0xf3/0x110 [radeon]
[  768.644951]  [<ffffffffa0028502>] drm_gem_object_release_handle+0x72/0xf0 [drm]
[  768.645016]  [<ffffffff81368051>] idr_for_each+0xa1/0xf0
[  768.645079]  [<ffffffffa0028490>] ? drm_gem_handle_create+0xf0/0xf0 [drm]
[  768.645140]  [<ffffffff81728c6d>] ? mutex_lock+0x1d/0x41
[  768.645203]  [<ffffffffa0028a14>] drm_gem_release+0x24/0x40 [drm]
[  768.645271]  [<ffffffffa00270e2>] drm_release+0x482/0x520 [drm]
[  768.645326]  [<ffffffff811b3f1a>] __fput+0xba/0x240
[  768.645371]  [<ffffffff811b40ee>] ____fput+0xe/0x10
[  768.645415]  [<ffffffff81085848>] task_work_run+0xc8/0xf0
[  768.645463]  [<ffffffff81067ede>] do_exit+0x19e/0x480
[  768.645510]  [<ffffffff81068254>] do_group_exit+0x44/0xa0
[  768.645558]  [<ffffffff810782a1>] get_signal_to_deliver+0x231/0x480
[  768.645615]  [<ffffffff81013be7>] do_signal+0x47/0x140
[  768.645662]  [<ffffffff81712eda>] ? is_prefetch.isra.12.part.13+0x1a4/0x1ff
[  768.645724]  [<ffffffff8109d134>] ? vtime_account_user+0x74/0x90
[  768.645777]  [<ffffffff81013d68>] do_notify_resume+0x88/0xc0
[  768.645827]  [<ffffffff8172cdbc>] retint_signal+0x48/0x8c
[  768.645873] Code: 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44 00 00 66 66 66 66 90 55 48 85 f6 48 89 e5 74 25 8b 4e 18 89 ca 48 8b 44 d7 40 48 85 c0 74 1b <3b> 48 18 75 1b 48 8b 48 10 48 39 4e 10 48 0f 47 c6 48 89 44 d7
[  768.646149] RIP  [<ffffffffa014f13d>] radeon_ib_sync_to+0x1d/0x40 [radeon]
[  768.646246]  RSP <ffff88037be69a08>
[  768.646276] CR2: 0000100000000018
[  768.666026] ---[ end trace 50e00cc0d778d510 ]---
[  768.666030] Fixing recursive fault but reboot is needed!
[  769.141752] radeon 0000:01:00.0: GPU lockup CP stall for more than 11000msec
[  769.141760] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000073db9)
[  769.141768] radeon 0000:01:00.0: failed to get a new IB (-35)
[  769.141772] [drm:radeon_cs_ib_vm_chunk] *ERROR* Failed to get ib !
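
In the crash #2 oops, the bytes marked `<3b> 48 18` on the Code: line decode to `cmp 0x18(%rax),%ecx`, and RAX holds 0x0000100000000000. The faulting address is therefore RAX + 0x18, which is exactly the CR2 value 0x0000100000000018. The sketch below only reproduces that arithmetic. The struct is hypothetical and merely mirrors the 0x10 and 0x18 displacements visible in the disassembly; it is not the driver's real fence structure. Since valid kernel pointers in this register dump start with 0xffff, a value like RAX suggests a stale or corrupted pointer rather than a NULL dereference, though that is an inference from the dump alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the field offsets matter here. Fixed-width
 * fields keep the offsets the same on every platform. */
struct hypothetical_fence {
    uint64_t pad0; /* 0x00 */
    uint64_t pad1; /* 0x08 */
    uint64_t seq;  /* 0x10, like the 0x10(%rax) load in the oops */
    int32_t  ring; /* 0x18, like the faulting 0x18(%rax) compare */
};

/* Address the CPU would read for base->field; nothing is dereferenced. */
static uint64_t field_address(uint64_t base, size_t offset)
{
    assert(offset <= UINT64_MAX - base); /* no wrap-around */
    return base + offset;
}
```

Calling `field_address(0x0000100000000000, offsetof(struct hypothetical_fence, ring))` yields 0x0000100000000018, matching CR2.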

Crash #3

[  125.411256] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  125.411265] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014894)
[  125.411273] radeon 0000:01:00.0: failed to get a new IB (-35)
[  125.411278] [drm:radeon_cs_ib_vm_chunk] *ERROR* Failed to get ib !
[  125.430578] radeon 0000:01:00.0: sa_manager is not empty, clearing anyway
[  125.640276] radeon 0000:01:00.0: Saved 96301 dwords of commands on ring 0.
[  125.640416] radeon 0000:01:00.0: GPU softreset: 0x000000CD
[  125.640418] radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5D04028
[  125.640420] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xEE400000
[  125.640423] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xEE400000
[  125.640425] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200046C0
[  125.640535] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  125.640538] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  125.640540] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[  125.640542] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00408002
[  125.640544] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x84038643
[  125.640546] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x60C83146
[  125.640548] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  125.640551] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  125.640553] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  125.645027] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
[  125.645080] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00108100
[  125.646226] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
[  125.646228] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
[  125.646230] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
[  125.646232] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200008C0
[  125.646343] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  125.646345] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  125.646347] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  125.646349] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  125.646351] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  125.646353] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  125.646355] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  125.646479] radeon 0000:01:00.0: GPU reset succeeded, trying to resume

Although the last log stated the GPU reset was successful, the system never recovered from the crash.
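
The `GPU softreset:` value is a bitmask of the hardware blocks the kernel decided to reset: crash #1 reports 0x00000049 and crash #3 reports 0x000000CD. The decoder below assumes the bit order of the radeon kernel driver's `RADEON_RESET_*` flags as recalled here (GFX, COMPUTE, DMA, CP, GRBM, DMA1, RLC, SEM, IH, VMC, MC, DISPLAY, from bit 0 upward). Treat that table as an assumption and check it against the kernel headers for the version in use; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMED bit order of the kernel's RADEON_RESET_* flags (bit 0 first). */
static const char *const reset_bit_names[] = {
    "GFX", "COMPUTE", "DMA", "CP", "GRBM", "DMA1",
    "RLC", "SEM", "IH", "VMC", "MC", "DISPLAY",
};

/* Writes the names of the set bits into buf as a space-separated list,
 * truncating safely if buf is too small. */
static void decode_reset_mask(unsigned mask, char *buf, size_t len)
{
    size_t n = sizeof reset_bit_names / sizeof reset_bit_names[0];

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(mask & (1u << i)))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, reset_bit_names[i], len - strlen(buf) - 1);
    }
}
```

Under that assumption, 0x49 decodes to GFX, CP and RLC, while 0xCD adds DMA and SEM, which would fit the non-idle DMA status register (0x60C83146) logged only in crash #3.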

Between those two Mesa commits, judging from the kernel log output, the only possible culprit I can see is a81beee37e0dd7b75422448420e8e8b0b4b76c1e.

My PC specs are as follows (taken from Steam's system information):

Processor Information:
    Vendor:  AuthenticAMD
    CPU Family:  0x15
    CPU Model:  0x1
    CPU Stepping:  0x2
    CPU Type:  0x0
    Speed:  3600 MHz
    Logical processors:  8
    Physical processors:  8
    HyperThreading:  Unsupported
    FCMOV:  Supported
    SSE2:  Supported
    SSE3:  Supported
    SSSE3:  Supported
    SSE4a:  Supported
    SSE41:  Supported
    SSE42:  Supported

Network Information:
    Network Speed:

Operating System Version:
    Ubuntu 13.04 (64 bit)
    Kernel Name:  Linux
    Kernel Version:  3.11.0-031100-generic
    X Server Vendor:  The X.Org Foundation
    X Server Release:  11303000
    X Window Manager:  KWin
    Steam Runtime Version:  steam-runtime-release_2013-09-05

Video Card:
    Driver:  X.Org Gallium 0.4 on AMD PITCAIRN

    Driver Version:  2.1 Mesa 9.3.0-devel (git-505fad0 raring-oibaf-ppa)
    OpenGL Version:  2.1
    Desktop Color Depth:  24 bits per pixel
    Monitor Refresh Rate:  60 Hz
    VendorID:  0x1002
    DeviceID:  0x6818
    Number of Monitors:  1
    Number of Logical Video Cards:  1
    Primary Display Resolution:  1920 x 1080
    Desktop Resolution:  1920 x 1080
    Primary Display Size:  18.78" x 10.55"  (21.54" diag)
                                            47.7cm x 26.8cm  (54.7cm diag)
    Primary VRAM Not Detected

Sound card:
    Audio device:  Realtek ALC889

Memory:
    RAM:  15993 Mb

Miscellaneous:
    UI Language:  Spanish
    LANG:  es_ES.UTF-8
    Microphone:  Not set
    Total Hard Disk Space Available:  469324 MB
    Largest Free Hard Disk Block:  187784 MB

Installed software:

Recent Failure Reports:

The GPU is a Radeon HD 7870 with 2 GB of VRAM. The LLVM version is 3.3-5ubuntu1~r~gd, and libdrm is at version 2.4.46+git1309121700.b6da44, both installed from oibaf's PPA.
Comment 1 José Suárez 2013-09-13 23:22:01 UTC
Created attachment 85793
Full dmesg

Full dmesg of the system
Comment 2 José Suárez 2013-09-13 23:31:50 UTC
Just an update: I have tried building the .deb packages with the lines committed in 395b9410 removed from the source and the hangs are still there, so I am not sure if the problem lies in that commit...
Comment 3 Hohahiu 2013-09-14 01:13:19 UTC
Created attachment 85797
dmesg
Comment 4 Hohahiu 2013-09-14 01:15:42 UTC
I'm experiencing similar problems with Unigine Tropics. The file attached above is my dmesg.

My specs:
Intel HD 4000 + AMD Radeon 7750M

Software is:
OpenSUSE 12.3 x86_64
kernel-3.11
Mesa, libdrm are from git
xserver 1.14
Comment 5 Dave Witbrodt 2013-09-14 14:35:44 UTC
Created attachment 85830
Sample of dmesg output

I began seeing these symptoms on my HD 7850 PITCAIRN when I attempted to upgrade Mesa from commit 6b5c802c (Sep. 2) to commit 2937d704 (Sep. 6).  I had upgraded libdrm from 2.4.46 to commit 58d00888 at the same time.

The attached dmesg output looks a lot like the bug reported here by José, so I hope I'm not interfering with an unrelated problem.

I have been having trouble finding time to investigate, which is why I did not report this myself sooner.  I am using a stable 3.10 kernel with DRM cherry picks from 3.11 and upcoming 3.12 -- which is not appropriate for use when reporting bugs.  I also did not rebuild the entire X stack when I upgraded, but just libdrm and Mesa.  There was a lot of due diligence I needed to perform before filing a bug report here...

Anyway, I'm glad to see others reporting this -- now I feel less alone.  Not a lot happened in Mesa between 6b5c802c and 2937d704, so if it turns out that my Frankenstein kernel is not to blame, and rebuilding the X stack doesn't help, then I'm going to bisect Mesa.  I have "good" and "bad" commits to use, and I'm really interested in seeing whether the one big Radeon change in that interval is the culprit:


commit a81beee37e0dd7b75422448420e8e8b0b4b76c1e
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Sep 6 16:43:34 2013 -0400

    radeon/winsys: pad IBs to a multiple of 8 DWs
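
For readers unfamiliar with the commit title: an IB (indirect buffer) is measured in 32-bit dwords (DWs), and padding it to a multiple of 8 DWs means appending no-op dwords until its length is divisible by 8. A minimal sketch, assuming a hypothetical `hypothetical_cs` buffer type and a PM4 type-2 packet header (0x80000000) as the filler; the real winsys code and the exact NOP encoding it uses may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED filler: a PM4 type-2 packet header (bits 31:30 = 2). */
#define ASSUMED_NOP_DWORD 0x80000000u

/* Hypothetical minimal command stream: cdw dwords used out of max_dw. */
struct hypothetical_cs {
    uint32_t *buf;
    size_t    cdw;
    size_t    max_dw;
};

/* Appends filler dwords until cdw is a multiple of 8; returns how many were added. */
static size_t pad_ib_to_8_dw(struct hypothetical_cs *cs)
{
    size_t added = 0;

    while (cs->cdw & 7) {
        assert(cs->cdw < cs->max_dw); /* caller must reserve room for padding */
        cs->buf[cs->cdw++] = ASSUMED_NOP_DWORD;
        added++;
    }
    return added;
}
```

For a 5-DW IB, three filler dwords are appended, bringing it to 8 DWs; an IB that is already aligned is left unchanged.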
Comment 6 Dave Witbrodt 2013-09-14 14:45:48 UTC
(In reply to comment #5)

I forgot to mention...

The X server runs fine, with no GPU spew in dmesg.  I can use the web browser and other programs that are not very demanding on the GPU.  DOSBox uses some OpenGL, and it runs fine; prboom-plus uses a bit more OpenGL, and it runs OK.  It's when I tested 'torcs' that everything ground to a halt.  After navigating the torcs menus to start a game, the screen goes black.  One time, the screen came back for a moment -- the FPS indicator showed 0.2 frames/sec -- before blacking out again.  I was able to get to VT1 with Ctrl-Alt-F1 and kill torcs, and X was running OK again when I went back to it with Alt-F7.
Comment 7 Hohahiu 2013-09-14 14:55:19 UTC
Update: today's Mesa git works with Unigine Tropics for me.
Comment 8 Dave Witbrodt 2013-09-14 17:13:28 UTC
(In reply to comment #7)

Seeing Hohahiu's good news, I thought I would try updating Mesa again (commit 4b3c0a79).  Now my desktop manager (lightdm) will not even start!

...
[ 83180.086] (II) [KMS] Kernel modesetting enabled.
[ 83180.086] (==) RADEON(0): Depth 24, (--) framebuffer bpp 32
[ 83180.086] (II) RADEON(0): Pixel depth = 24 bits stored in 4 bytes (32 bpp pixmaps)
[ 83180.086] (==) RADEON(0): Default visual is TrueColor
[ 83180.087] (**) RADEON(0): Option "ColorTiling" "on"
[ 83180.087] (**) RADEON(0): Option "ColorTiling2D" "on"
[ 83180.087] (**) RADEON(0): Option "AccelMethod" "glamor"
[ 83180.087] (**) RADEON(0): Option "SwapbuffersWait" "off"
[ 83180.087] (==) RADEON(0): RGB weight 888
[ 83180.087] (II) RADEON(0): Using 8 bits per RGB (8 bit DAC)
[ 83180.087] (--) RADEON(0): Chipset: "PITCAIRN" (ChipID = 0x6819)
[ 83180.087] (II) Loading sub module "dri2"
[ 83180.087] (II) LoadModule: "dri2"
[ 83180.087] (II) Module "dri2" already built-in
[ 83180.087] (II) Loading sub module "glamoregl"
[ 83180.087] (II) LoadModule: "glamoregl"
[ 83180.087] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[ 83180.087] (II) Module glamoregl: vendor="X.Org Foundation"
[ 83180.087] 	compiled for 1.14.2.902, module version = 0.5.1
[ 83180.087] 	ABI class: X.Org ANSI C Emulation, version 0.4
[ 83180.087] (II) glamor: OpenGL accelerated X.org driver based.
[ 83180.097] (II) glamor: EGL version 1.4 (DRI2):
[ 83180.105] (EE) 
[ 83180.105] (EE) Backtrace:
[ 83180.105] (EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x57c51d]
[ 83180.105] (EE) 1: /usr/bin/X (0x400000+0x17ffc9) [0x57ffc9]
[ 83180.105] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fec25304000+0xf210) [0x7fec25313210]
[ 83180.105] (EE) 3: /usr/lib/x86_64-linux-gnu/libLLVM-3.4.so.1 (_ZTIN4llvm18format_object_baseE+0x0) [0x7fec1f49d000]
[ 83180.105] (EE) 
[ 83180.105] (EE) Segmentation fault at address 0x7fec1f49d000
[ 83180.105] (EE) 
Fatal server error:
[ 83180.105] (EE) Caught signal 11 (Segmentation fault). Server aborting
[ 83180.105] (EE) 
[ 83180.105] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[ 83180.105] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 83180.105] (EE) 
[ 83180.112] (EE) Server terminated with error (1). Closing log file.


Oh happy happy joy joy!!  I had just rebuilt Mesa, xorg-server, glamor-egl, and xf86-video-ati... hoping for the best.

Unfortunately, I have no time to look into this right now.  It seems to be related to LLVM and/or glamor.  I had to downgrade to the last working versions of everything.
Comment 9 Dave Witbrodt 2013-09-14 17:48:08 UTC
(In reply to comment #8)

Oops!  My LLVM 3.4 is not new enough.  Marek's transform feedback stuff went in; I need svn190575 or newer, but I had svn190499 installed.  Will try again with svn190655....
Comment 10 hadack 2013-09-14 18:41:33 UTC
I have the GPU lockups too, on a 7750, in different apps, for instance Xonotic.
I did a bisect and it led to this commit:

e8f9195e5fb34a45783d6491d2e0305a0b137439 is the first bad commit
commit e8f9195e5fb34a45783d6491d2e0305a0b137439
Author: Axel Davy <axel.davy@ens.fr>
Date:   Thu Aug 15 12:47:58 2013 +0200

    gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and PIPE_BIND_LINEAR flags to enforce no tiling.
    
    Signed-off-by: Axel Davy <axel.davy@ens.fr>

And indeed reverting it seems to fix the lockups.
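
The bisect in this comment is a binary search over the commits between a known-good and a known-bad build, assuming a linear history and that the bug, once introduced, stays present. A minimal C sketch of that search; the function and predicate names are hypothetical, and `git bisect` itself additionally handles merges and skipped commits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the build at history index i shows the bug. */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/* Commits 0..n-1 in history order; commit 0 is known good, n-1 known bad.
 * Returns the index of the first bad commit; *tests counts builds tested. */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx,
                               size_t *tests)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1; /* invariant: good is good, bad is bad */

    *tests = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*tests;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Toy predicate for simulation: commits at or after *(size_t *)ctx are bad. */
static bool bad_from_threshold(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

With 100 commits (index 0 known good, index 99 known bad), at most 7 test builds are needed, since each test halves the remaining interval.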
Comment 11 Dave Witbrodt 2013-09-14 19:38:33 UTC
(In reply to comment #10)
> I have the gpu lockups too on an 7750 in different apps, for instance
> xonotic.
> I did a bisect and it led to this commit:
> 
> e8f9195e5fb34a45783d6491d2e0305a0b137439 is the first bad commit
> commit e8f9195e5fb34a45783d6491d2e0305a0b137439
> Author: Axel Davy <axel.davy@ens.fr>
> Date:   Thu Aug 15 12:47:58 2013 +0200
> 
>     gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and
> PIPE_BIND_LINEAR flags to enforce no tiling.
>     
>     Signed-off-by: Axel Davy <axel.davy@ens.fr>
> 
> And indeed reverting it seems to fix the lockups.

I can confirm that this commit was the problem in my case.  That commit introduced a boolean error in src/gallium/drivers/radeonsi/r600_texture.c which was later fixed in 49f2ba2c.
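
Comment 12 describes the fix as adding missing parentheses. The exact expression in r600_texture.c is not quoted in this report, so the following is only a hypothetical illustration of how a missing pair of parentheses around a flag test changes a boolean result in C; the flag name and value are made up.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the value is illustrative only. */
#define HYPOTHETICAL_BIND_LINEAR (1u << 5)

/* Buggy form: '!' binds tighter than '&', so this computes (!flags) & FLAG.
 * (!flags) is 0 or 1, and neither has bit 5 set, so it is always false. */
static bool wants_tiling_buggy(unsigned flags)
{
    return !flags & HYPOTHETICAL_BIND_LINEAR;
}

/* Fixed form: test the flag first, then negate the result. */
static bool wants_tiling_fixed(unsigned flags)
{
    return !(flags & HYPOTHETICAL_BIND_LINEAR);
}
```

The buggy form returns false for every input, so a misplaced negation of this kind silently changes which resources take the tiled path; compilers such as GCC can flag it with `-Wlogical-not-parentheses`.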

My Mesa build on Sep. 2 was before e8f9195e, and the one on Sep. 6 was after.  The new Mesa I built today was only failing because my LLVM 3.4 did not include the necessary patch for Marek's transform feedback work.  Once I updated LLVM, no GPU failures were observed and all was well again.

Hopefully José will have no more problems if he gets a new version of Mesa at or after commit 49f2ba2c.  If he uses a version after 2b71b3d4, he will need LLVM 3.4 at svn190575 or later.
Comment 12 José Suárez 2013-09-15 10:51:33 UTC
Well, I am not advanced enough to compile my own Mesa et al. from git, but I have rebuilt my Mesa .deb packages with the missing parentheses manually applied to the source code, and I can confirm that the GPU hangs / crashes are gone. So the problem is solved in current master.

Thanks for the cooperative investigation! ;)

Regards
Comment 13 Marek Olšák 2013-09-15 12:10:28 UTC
Closed based on users' feedback.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.