Summary: | [rv620] GPU reset followed by black screen
---|---
Product: | DRI
Reporter: | Stefano Carignano <scary.moo>
Component: | DRM/Radeon
Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED FIXED
QA Contact: |
Severity: | normal
Priority: | medium
CC: | marvin24
Version: | XOrg git
Keywords: | patch
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Whiteboard: |
i915 platform: |
i915 features: |
Attachments: |

Created attachment 37841
Xorg log

Created attachment 37845
rebased V2 blit patch from dri-devel

Can you try this patch on top of d-r-t?

With the above patch the problem is not really cured here. It just happens less often. So there is still something wrong with the blit code in d-r-t.

(In reply to comment #3)
> with the above patch the problem is not really cured here. It just happens
> less often. So there is still something wrong with the blit code in d-r-t.

Oh, that's too bad. I have tried the patch for a couple of days now and it did seem to improve things: I haven't managed to crash the system anymore. (I'm not using it heavily though; is this related to the GPU load or is it somewhat random?)

In fact, the bug is a little different now. Instead of a GPU hang, Xorg just blocks in D-state, with no dmesg output. I did a cat /proc/`pidof X`/stack and got:

    [<ffffffffa01ac7b2>] radeon_fence_wait+0x1d1/0x2ea [radeon]
    [<ffffffffa01acf41>] radeon_sync_obj_wait+0x11/0x13 [radeon]
    [<ffffffffa009295c>] ttm_bo_wait+0xbe/0x153 [ttm]
    [<ffffffffa0095b54>] ttm_bo_move_accel_cleanup+0x8b/0x29f [ttm]
    [<ffffffffa01ad07d>] radeon_move_blit+0x12a/0x148 [radeon]
    [<ffffffffa01ad420>] radeon_bo_move+0x114/0x13c [radeon]
    [<ffffffffa0092da9>] ttm_bo_handle_move_mem+0x1b6/0x2b1 [ttm]
    [<ffffffffa009449e>] ttm_bo_evict+0x2e1/0x34a [ttm]
    [<ffffffffa009467d>] ttm_mem_evict_first+0x176/0x1a4 [ttm]
    [<ffffffffa0094141>] ttm_bo_mem_space+0x3fd/0x479 [ttm]
    [<ffffffffa0094b6e>] ttm_bo_move_buffer+0xb3/0x11b [ttm]
    [<ffffffffa0094c83>] ttm_bo_validate+0xad/0xf6 [ttm]
    [<ffffffffa0094ffe>] ttm_bo_init+0x332/0x36b [ttm]
    [<ffffffffa01ae8e9>] radeon_bo_create+0x17f/0x246 [radeon]
    [<ffffffffa01beac8>] radeon_gem_object_create+0x7d/0xda [radeon]
    [<ffffffffa01beb72>] radeon_gem_create_ioctl+0x4d/0xab [radeon]
    [<ffffffffa002543c>] drm_ioctl+0x255/0x34d [drm]
    [<ffffffff810f719c>] vfs_ioctl+0x32/0xa6
    [<ffffffff810f7aba>] do_vfs_ioctl+0x46a/0x4a3
    [<ffffffff810f7b49>] sys_ioctl+0x56/0x79
    [<ffffffff81002b9b>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

Btw, this often happens when displaying images.google.com (with some images) in firefox and trying to scroll. My screen has 1920x1200 resolution, but my computer at work also crashed today at 1280x1024. Maybe this is not so relevant, just in case...

Are these issues specific to d-r-t, or are you also seeing them on 2.6.36-rc1 or drm-core-next?

It happens on d-r-t since the blit cleanup:
http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commit;h=36dff284447cfd7dce032b760842952eefa7bddf
I also have V2 of the cleanup (see comment #2) applied (and 2.6.35.2 also).

(In reply to comment #8)
> it happens on d-r-t since the blit cleanup
> http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commit;h=36dff284447cfd7dce032b760842952eefa7bddf
> I also have V2 of the cleanup (see comment #2) applied (and 2.6.35.2 also).

That patch is currently busted as is. You need to either revert it, or apply v2 that I posted on dri-devel.

Well, that's what I did. Basically, the patch in comment #2 is an interdiff of blit_V1 and blit_V2, so it should produce the same result as unapplying V1 and applying V2 - correct?
output of interdiff:

    diff -u b/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
    --- b/drivers/gpu/drm/radeon/r600_blit_kms.c
    +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
    @@ -448,19 +448,8 @@
     	int num_packet2s = 0;
     
     	/* pin copy shader into vram if already initialized */
    -	if (rdev->r600_blit.shader_obj) {
    -		r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
    -		if (unlikely(r != 0))
    -			return r;
    -		r = radeon_bo_pin(rdev->r600_blit.shader_obj, RADEON_GEM_DOMAIN_VRAM,
    -				  &rdev->r600_blit.shader_gpu_addr);
    -		radeon_bo_unreserve(rdev->r600_blit.shader_obj);
    -		if (r) {
    -			dev_err(rdev->dev, "(%d) pin blit object failed\n", r);
    -			return r;
    -		}
    -		return 0;
    -	}
    +	if (rdev->r600_blit.shader_obj)
    +		goto done;
     
     	mutex_init(&rdev->r600_blit.mutex);
     	rdev->r600_blit.state_offset = 0;
    @@ -519,6 +508,18 @@
     	memcpy(ptr + rdev->r600_blit.ps_offset, r6xx_ps, r6xx_ps_size * 4);
     	radeon_bo_kunmap(rdev->r600_blit.shader_obj);
     	radeon_bo_unreserve(rdev->r600_blit.shader_obj);
    +
    +done:
    +	r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
    +	if (unlikely(r != 0))
    +		return r;
    +	r = radeon_bo_pin(rdev->r600_blit.shader_obj, RADEON_GEM_DOMAIN_VRAM,
    +			  &rdev->r600_blit.shader_gpu_addr);
    +	radeon_bo_unreserve(rdev->r600_blit.shader_obj);
    +	if (r) {
    +		dev_err(rdev->dev, "(%d) pin blit object failed\n", r);
    +		return r;
    +	}
     	return 0;
     }

Is this still a problem with the current d-r-t?

yes it does

(In reply to comment #12)
> yes it does

eh - is!

Can you bisect to see what commit is causing the problem?

The bug is hard to trigger (10 min scrolling with firefox), so bisecting will take a lot of time. I booted with no_wb=1 just for testing and now it seems to work fine. So maybe the blit and the writeback change have some unhealthy relationship. Also somehow the git history got changed... I'm sure the writeback changes were there many days ago and before the blit change.
Maybe it also helps that the original reporter (Stefano) has an rv620 chip, which is (AFAIK) similar to the rs780/785 chips.

(In reply to comment #15)
> the bug is hard to trigger (10 min scrolling with firefox), so bisecting will
> take a lot of time. I booted with no_wb=1 just for testing and now it seems to
> work fine. So maybe the blit and the writeback change have some unhealthy

Writeback has nothing to do with the blit, but it might cause problems on its own. If no_wb=1 fixes the issue, then writeback might not work well on your system.

> relationship. Also somehow the git history got changed... I'm sure the
> writeback changes were there many days ago and before the blit change. Maybe

The branch was rebased.

(In reply to comment #0)
> Created an attachment (id=37840)
> system dmesg
>
> Using latest git (as of 12/08/2010) of libdrm, mesa(classic),xf86-video-ati and
> drm-radeon-testing (commit drm/radeon/kms: enable writeback on remaing asics ),
> gpu is a hd3470 mobility (rv620), forced to lowest power state
> (echo "low" > /sys/class/drm/card0/device/power_profile).

I managed to get the same with the last d-r-t, using low power + gits like you, but this was on a rv790. It seems like seamonkey was involved, but just to confuse the issue I wasn't running a clean d-r-t or ddx (which may well be irrelevant): d-r-t had tiling fixed + 2 cs parser fixes from the list, ddx had wait for vline FALSE and dri2 sync was off in drirc. I had tested many games, mplayer and mesa demos over the day without issue. The lockup was triggered when I found a seamonkey bug that makes it spawn a new window every 1/2-1 sec. While this was happening, as X was unusable due to the constant new windows, I was switching back and forth between vt2 and 7. Then it locked up and I didn't get the screen back.
After sysrq reboot, the card was still in a bad state: alsa failed to probe hardware. I carried on into X and looked at the kern log in an xterm OK, but as soon as I started seamonkey it went again. The next reboot went OK, and try as hard as I could (triggering the seamonkey bug and switching vts), I couldn't reproduce it. Now running current vanilla d-r-t and ddx I have so far failed to trigger it, but then I ran the other d-r-t for days OK.

Seems the bug I was seeing (see comment #5) is fixed by the v3 fencing patch, so for me it is ready to be closed...

(In reply to comment #18)
> seems the bug I was seeing (see comment #5) is fixed by the v3 fencing patch, so
> for me it is ready to be closed...

I have failed to reproduce the one lockup I had with various d-r-ts, and now am running d-r-t + v3 fence.
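
For anyone trying to reproduce the D-state hang mentioned in this thread, the manual check (cat /proc/`pidof X`/stack) can be extended to every blocked process. The following is only a rough sketch, assuming a Linux procfs and a kernel that exposes /proc/<pid>/stack (reading it usually requires root). The function names parse_stat, blocked_tasks and kernel_stack are made up for this illustration and are not part of any tool used in the report.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every process in uninterruptible sleep ('D')."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file was unreadable
        if state == "D":
            yield pid, comm


def kernel_stack(pid, proc_root="/proc"):
    """Return the kernel stack text of pid, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(f"{pid} ({comm}) is in D-state")
        print(kernel_stack(pid) or "  (stack not readable; try as root)")
```

Run as root while X is stuck; a trace ending in radeon_fence_wait, like the one quoted in this thread, would point at the same wait.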
Created attachment 37840
system dmesg

Using latest git (as of 12/08/2010) of libdrm, mesa(classic),xf86-video-ati and drm-radeon-testing (commit drm/radeon/kms: enable writeback on remaing asics ), gpu is a hd3470 mobility (rv620), forced to lowest power state (echo "low" > /sys/class/drm/card0/device/power_profile).

During normal web browsing, maybe while playing a flash video, the mouse cursor suddenly stops and immediately after I get a black screen from which I cannot recover unless I sysrq-reboot (haven't tried ssh). Upon reboot a check of the system log shows

    [ 2325.450063] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
    [ 2325.450066] ------------[ cut here ]------------
    [ 2325.450083] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:239 radeon_fence_wait+0x35b/0x3c0 [radeon]()
    [ 2325.450085] Hardware name: Satellite A300
    [ 2325.450087] GPU lockup (waiting for 0x0000BDBC last fence id 0x0000BDBA)
    [ 2325.450089] Modules linked in: radeon ttm ath5k drm_kms_helper cfbcopyarea cfbimgblt cfbfillrect i2c_i801 ath
    [ 2325.450099] Pid: 1954, comm: X Not tainted 2.6.35+ #19
    [ 2325.450101] Call Trace:
    [ 2325.450108] [<ffffffff8103a04a>] warn_slowpath_common+0x7a/0xb0
    [ 2325.450111] [<ffffffff8103a121>] warn_slowpath_fmt+0x41/0x50
    [ 2325.450120] [<ffffffffa009ba1b>] radeon_fence_wait+0x35b/0x3c0 [radeon]
    [ 2325.450125] [<ffffffff810521f0>] ? autoremove_wake_function+0x0/0x40
    [ 2325.450134] [<ffffffffa009c1fc>] radeon_sync_obj_wait+0xc/0x10 [radeon]
    [ 2325.450139] [<ffffffffa005ad69>] ttm_bo_wait+0xf9/0x1b0 [ttm]
    [ 2325.450144] [<ffffffffa005e11f>] ttm_bo_move_accel_cleanup+0x9f/0x2e0 [ttm]
    [ 2325.450153] [<ffffffffa009c32f>] radeon_move_blit+0x11f/0x180 [radeon]
    [ 2325.450162] [<ffffffffa009c786>] radeon_bo_move+0xb6/0x1e0 [radeon]
    [ 2325.450166] [<ffffffffa005b1a5>] ttm_bo_handle_move_mem+0x135/0x410 [ttm]
    [ 2325.450170] [<ffffffffa005d2c9>] ttm_bo_evict+0x1b9/0x3f0 [ttm]
    [ 2325.450175] [<ffffffff81090001>] ? __isolate_lru_page+0x81/0xa0
    [ 2325.450179] [<ffffffffa005c6f7>] ttm_mem_evict_first+0x147/0x1e0 [ttm]
    [ 2325.450183] [<ffffffffa005d059>] ttm_bo_mem_space+0x3e9/0x4a0 [ttm]
    [ 2325.450187] [<ffffffffa005d5e7>] ttm_bo_move_buffer+0xe7/0x160 [ttm]
    [ 2325.450192] [<ffffffff81260028>] ? drm_mapbufs+0x318/0x340
    [ 2325.450196] [<ffffffffa005d6f6>] ttm_bo_validate+0x96/0x120 [ttm]
    [ 2325.450199] [<ffffffffa005db35>] ttm_bo_init+0x2e5/0x340 [ttm]
    [ 2325.450209] [<ffffffffa009d198>] radeon_bo_create+0x128/0x220 [radeon]
    [ 2325.450218] [<ffffffffa009cf10>] ? radeon_ttm_bo_destroy+0x0/0xc0 [radeon]
    [ 2325.450228] [<ffffffffa00b1aa4>] radeon_gem_object_create+0x84/0x100 [radeon]
    [ 2325.450232] [<ffffffff810c9030>] ? pollwake+0x0/0x60
    [ 2325.450242] [<ffffffffa00b1f1f>] radeon_gem_create_ioctl+0x4f/0xe0 [radeon]
    [ 2325.450246] [<ffffffff81398e94>] ? sock_aio_read+0x134/0x150
    [ 2325.450249] [<ffffffff8126138c>] drm_ioctl+0x33c/0x410
    [ 2325.450259] [<ffffffffa00b1ed0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
    [ 2325.450262] [<ffffffff810b8bf2>] ? do_sync_read+0xd2/0x110
    [ 2325.450266] [<ffffffff810c7b2c>] vfs_ioctl+0x3c/0xd0
    [ 2325.450268] [<ffffffff810c812c>] do_vfs_ioctl+0x7c/0x520
    [ 2325.450271] [<ffffffff810b9345>] ? vfs_read+0x105/0x140
    [ 2325.450274] [<ffffffff810c861a>] sys_ioctl+0x4a/0x80
    [ 2325.450277] [<ffffffff81004669>] ? do_device_not_available+0x9/0x10
    [ 2325.450280] [<ffffffff8100256b>] system_call_fastpath+0x16/0x1b
    [ 2325.450283] ---[ end trace b2d00ea6bab57761 ]---
    [ 2325.450295] [drm] Disabling audio support
    [ 2325.451428] radeon 0000:01:00.0: GPU softreset
    [ 2325.451431] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xA0003030
    [ 2325.451435] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000003
    [ 2325.451438] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200010C0
    [ 2325.451452] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
    [ 2325.468635] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
    [ 2325.484646] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0x00003030
    [ 2325.484651] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000003
    [ 2325.484654] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
    [ 2325.485660] radeon 0000:01:00.0: GPU reset succeed
    [ 2325.506653] [drm] Clocks initialized !
    [ 2382.495138] SysRq : Emergency Sync
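
When comparing logs like the one above across several lockups, it can help to pull out the two fence ids and the register values before and after the soft reset. The sketch below is a small helper written for this purpose only: the names fence_gap and register_changes are hypothetical, and the regular expressions assume the exact message format shown in this log. Reading the fence difference as "number of fences still outstanding" is an assumption (ids increasing by one per fence, no wraparound), not something the log itself states.

```python
import re

# Matches e.g. "GPU lockup (waiting for 0x0000BDBC last fence id 0x0000BDBA)"
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)\)"
)
# Matches e.g. "R_008010_GRBM_STATUS=0xA0003030"
REG_RE = re.compile(r"(R_[0-9A-F]+_\w+)=(0x[0-9A-Fa-f]+)")


def fence_gap(log_text):
    """Return (waited_for, last_signaled, difference), or None if no match."""
    m = LOCKUP_RE.search(log_text)
    if m is None:
        return None
    waited_for = int(m.group(1), 16)
    last_signaled = int(m.group(2), 16)
    return waited_for, last_signaled, waited_for - last_signaled


def register_changes(log_text):
    """Map register name -> (first value, last value, changed bit positions).

    Note: for R_008020_GRBM_SOFT_RESET the two values are simply the two
    values printed during the reset sequence, not a before/after status pair.
    """
    seen = {}
    for name, value in REG_RE.findall(log_text):
        seen.setdefault(name, []).append(int(value, 16))
    changes = {}
    for name, values in seen.items():
        diff = values[0] ^ values[-1]
        bits = [b for b in range(32) if (diff >> b) & 1]
        changes[name] = (values[0], values[-1], bits)
    return changes


if __name__ == "__main__":
    import sys

    text = sys.stdin.read()  # e.g. dmesg | python3 this_script.py
    print("fence:", fence_gap(text))
    for name, (before, after, bits) in register_changes(text).items():
        print(f"{name}: {before:#010x} -> {after:#010x}, changed bits {bits}")
```

For the log in this report, the fence ids 0x0000BDBC and 0x0000BDBA differ by 2, and R_008010_GRBM_STATUS goes from 0xA0003030 to 0x00003030, i.e. bits 29 and 31 are cleared by the reset.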