82628 – bisected: GALLIUM_HUD hangs radeon 7970M (PRIME)

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Mesa core
Version:	git
Hardware:	Other Linux (All)

Importance:	medium normal
Assignee:	mesa-dev

Reported:	2014-08-14 18:14 UTC by Christoph Haag
Modified:	2014-08-16 16:52 UTC
CC List:	0 users

Attachments
journalctl log including the hang in the report (251.86 KB, text/plain) 2014-08-14 18:14 UTC, Christoph Haag

Description Christoph Haag 2014-08-14 18:14:09 UTC

Created attachment 104629
journalctl log including the hang in the report

I'm running with only DRI_PRIME=1 GALLIUM_HUD="fps,VRAM-usage+GTT-usage" set. I have not tested many other combinations.

Tested on linux 3.16 mainline.

I'm not 100% sure, but I think this is my bisect result:
1cfcd0164e1be7d7b05b693f60a262ad735b7565 is the first bad commit

I haven't tested many applications because there is a good chance of a hard lockup of the whole machine. Often, though, it recovers after killing the application that is running with the HUD.

With the Unreal Engine Cave Effects demo I think I can reproduce it every time. When changing mesa versions it is immediately visible whether it will work: when it works, a black window with the HUD is visible right away until the demo loads; when it won't work, a few fat white garbage lines (or something like that) are visible in the window.

The only other application I tried was glxgears, a few times; sometimes it worked and sometimes it produced the lockup.

radeon 0000:01:00.0: ring 0 stalled for more than 10003msec
radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000011 last fence id 0x000000000000000d on ring 0)
radeon 0000:01:00.0: Saved 269 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x0000004D
radeon 0000:01:00.0:   GRBM_STATUS               = 0xB3525028
radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x2F800002
radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x2F800002
radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x40000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00408002
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x84228647
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83146
radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100100
radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
radeon 0000:01:00.0:   SRBM_STATUS               = 0x200400C0
radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
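
To make the dump above easier to read, here is a minimal Python sketch (not part of the original report) that pulls the fence ids out of the lockup line and pairs each register's value before and after the soft reset. The regular expressions and the function names `parse_lockup` and `parse_registers` are assumptions made for illustration; they are not part of the radeon driver or of any existing tool. Treating the difference between the two fence ids as the number of fences still outstanding is an interpretation, not something the log states.

```python
import re

# A few lines copied from the kernel log above (shortened for the example).
LOG = """\
radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000011 last fence id 0x000000000000000d on ring 0)
radeon 0000:01:00.0:   GRBM_STATUS               = 0xB3525028
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x84228647
radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
"""

# Hypothetical patterns, written to match the line layout seen in this log only.
LOCKUP_RE = re.compile(
    r"waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+) on ring (\d+)"
)
REGISTER_RE = re.compile(r"^radeon \S+:\s+(\w+)\s+= (0x[0-9a-fA-F]+)$", re.MULTILINE)


def parse_lockup(log):
    """Return the fence ids and ring from the 'GPU lockup' line, or None."""
    match = LOCKUP_RE.search(log)
    if match is None:
        return None
    return {
        "waiting": int(match.group(1), 16),
        "last_signaled": int(match.group(2), 16),
        "ring": int(match.group(3)),
    }


def parse_registers(log):
    """Map each register name to its values in order of appearance."""
    registers = {}
    for name, value in REGISTER_RE.findall(log):
        registers.setdefault(name, []).append(int(value, 16))
    return registers


info = parse_lockup(LOG)
print("fence id gap on ring", info["ring"], "=", info["waiting"] - info["last_signaled"])

for name, values in parse_registers(LOG).items():
    if len(values) == 2:
        # First value is from before the soft reset, second from after it.
        print(f"{name}: {values[0]:#010x} -> {values[1]:#010x}")
```

For the values in this report the fence id gap is 4 (0x11 minus 0x0d), and R_008680_CP_STAT goes from 0x84228647 to 0x00000000 across the reset.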

Without the HUD it always works.

Oh, and sometimes with the buggy mesa revisions *and* the HUD, the application just won't start, and a message immediately appears saying that the kernel rejected the CS stream (or whatever the exact message is). I have not seen this without the HUD.

Comment 1 Christoph Haag 2014-08-15 23:58:33 UTC

The commit is now reverted, but for the record, so that it's not forgotten:

Whether it makes sense or not, the GPU does not hang if I change the count from one to two:

diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c
index a05d3c4..2d8bdca 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -532,7 +532,7 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex)
    pipe_resource_reference(&hud->text.vbuf.buffer, NULL);
 
    /* draw the rest */
-   cso_set_vertex_elements(cso, 1, hud->velems);
+   cso_set_vertex_elements(cso, 2, hud->velems);
    LIST_FOR_EACH_ENTRY(pane, &hud->pane_list, head) {
       if (pane)
          hud_pane_draw_colored_objects(hud, pane);

Comment 2 Marek Olšák 2014-08-16 16:52:39 UTC

Don't worry. It won't be forgotten. Closing.
