Bug 27822

Summary: [REGR] Soft lockup with "[TTM] Buffer eviction failed" on resume
Product: DRI
Reporter: Rafał Miłecki <zajec5>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: astronomo
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
dmesg using "drm/radeon/kms: enable use of unmappable VRAM V2" (flags: none)
dmesg of kernel where resume fails (flags: none)
Don't iomap system memory (flags: none)

Description Rafał Miłecki 2010-04-24 00:22:24 UTC
2.6.33 was fine. With drm-radeon-testing I cannot resume anymore (which may mean that this bug is present in 2.6.34-rc5 as well!).

Notebook, RV620, KMS.

drm-radeon-testing:

commit f2594933df9719bd2b0aaaa8ea9b2b850d6e1c42
Author: Christian Koenig <deathsimple@vodafone.de>
Date:   Sat Apr 10 03:13:16 2010 +0200

    drm/radeon/kms: HDMI irq support
Comment 1 Alex Deucher 2010-04-24 09:21:56 UTC
Any chance you can bisect the problematic commit?  Try drm-linus as well; if the bug is not there it will be easier to track down since it's likely something in drm-radeon-testing
Comment 2 Rafał Miłecki 2010-04-28 11:11:57 UTC
(In reply to comment #1)
> Any chance you can bisect the problematic commit?  Try drm-linus as well; if
> the bug is not there it will be easier to track down since it's likely
> something in drm-radeon-testing

I already tried before reporting the bug, but every older commit I tried didn't boot (it hung at the beginning of drm initialization).

I tried a few recent commits and it looks (quite crazy) like I need:

commit 10fd883ce384706f88554a0b08cc4d63345e7d8b
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Apr 20 16:34:20 2010 +1000

    agp/intel: put back check that we have a driver for the bridge.

to boot successfully. At first I thought it was Intel GPU related, so I didn't take it seriously.

Anyway, trying:

commit 7547a917fa5f3b2406f52c7dcf7ec9ad3c8532eb
Merge: a8089e8 6b8b178
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Apr 20 14:15:09 2010 +1000

    Merge branch 'drm-ttm-unmappable' into drm-core-next

with manually applied "agp/intel: put back check that we have a driver for the bridge." still fails on resume.

I will try to bisect it; now that I know the fix for booting, it should be easier.
Comment 3 Rafał Miłecki 2010-04-28 12:38:29 UTC
I've manually checked out:

commit 6b8b1786a8c29ce6e32298b93ac8d4a18a2b11c4
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Wed Apr 7 10:21:31 2010 +0000

    drm/radeon/kms: enable use of unmappable VRAM V2

which is one commit before "Merge branch 'drm-ttm-unmappable' into drm-core-next". I manually applied "agp/intel: put back check that we have a driver for the bridge." and compiled.

With this kernel I tried suspend and resume. This time I saw "[TTM] Buffer eviction failed" as well, but luckily it didn't lock up! This way I was able to read dmesg for more details. I believe it should give you an idea of what goes wrong.
Comment 4 Rafał Miłecki 2010-04-28 12:40:12 UTC
Created attachment 35324 [details]
dmesg using "drm/radeon/kms: enable use of unmappable VRAM V2"

With "agp/intel: put back check that we have a driver for the bridge." manually applied to make booting possible.
Comment 5 Rafał Miłecki 2010-04-29 14:20:21 UTC
Result of first git bisect:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
82c5da6bf8b55a931b042fb531083863d26c8020
0a2d50e3a8faaf36cde36920431586090411ea15
We cannot bisect more!

Unfortunately I had to:
# skip: [82c5da6bf8b55a931b042fb531083863d26c8020] drm/ttm: ttm_fault callback to allow driver to handle bo placement V6
because this commit caused 3 lockups in a row for me: the first at the KDE login screen, the next two while suspending (just before the machine was expected to power off).


So it seems the suspected commits are:

commit 0a2d50e3a8faaf36cde36920431586090411ea15
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Fri Apr 9 14:39:24 2010 +0200

    drm/radeon/kms: add support for new fault callback V7

and

commit 82c5da6bf8b55a931b042fb531083863d26c8020
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Fri Apr 9 14:39:23 2010 +0200

    drm/ttm: ttm_fault callback to allow driver to handle bo placement V6
Comment 6 Christian Schmidt 2010-04-30 06:51:54 UTC
Hi Rafał,

Can you elaborate on what you see when resume fails? I do have some issues on my RV635, and testing with various kernels has shown me:

== rc5 resume failure
10fd883ce384706f88554a0b08cc4d63345e7d8b agp/intel: put back check that we have a driver for the bridge.
d4b74bf07873da2e94219a7b67a334fc1c3ce649 Revert "drm/i915: Configure the TV sense state correctly on GM45 to make TV detection reliable"
== rc5 resume failure
7547a917fa5f3b2406f52c7dcf7ec9ad3c8532eb Merge branch 'drm-ttm-unmappable' into drm-core-next
== rc2 resume ok (slow) with 10fd883ce384706f88554a0b08cc4d63345e7d8b
6b8b1786a8c29ce6e32298b93ac8d4a18a2b11c4 drm/radeon/kms: enable use of unmappable VRAM V2
0c321c79627189204d7d0bf65ab19f5ac419abed drm/ttm: remove io_ field from TTM V6
== rc2 resume ok with 10fd883ce384706f88554a0b08cc4d63345e7d8b
96bf8b8778976a6e6a4fe4e6e0421d8ed7892798 drm/vmwgfx: add support for new TTM fault callback V5
f32f02fd81f3177cce0c16cc7d210fcc9cad953c drm/nouveau/kms: add support for new TTM fault callback V5
0a2d50e3a8faaf36cde36920431586090411ea15 drm/radeon/kms: add support for new fault callback V7
== rc2 suspend failure (boot with 10fd883ce384706f88554a0b08cc4d63345e7d8b)
82c5da6bf8b55a931b042fb531083863d26c8020 drm/ttm: ttm_fault callback to allow driver to handle bo placement V6
== rc5 resume ok with 10fd883ce384706f88554a0b08cc4d63345e7d8b
a8089e849a32c5b6bfd6c88dbd09c0ea4a779b71 drm/i915: drop pointer to drm_gem_object
62b8b21515065235bd363ad07094d301532e14ce drm/i915: don't use ->driver_private anymore
c397b9084cabdcaae26266bd0bd32ba62e757046 drm/i915: embed the gem object into drm_i915_gem_object
ac52bc56de25535a907ef07f8755f1387b89b0f5 drm/i915: introduce i915_gem_alloc_object
== rc5 resume ok with 10fd883ce384706f88554a0b08cc4d63345e7d8b
fd632aa34c8592fb1d37fc83cbffa827bc7dd42c drm: free core gem object from driver callbacks
1d397043bcc2c8cdccb584a8ef73131f28f18e4c drm: extract drm_gem_object_init
== rc5 resume ok with 10fd883ce384706f88554a0b08cc4d63345e7d8b
153549b8b63d71a9c5d8cbde887097b995c32bd6 Merge branch 'drm-radeon-evergreen-accel' into drm-core-next
7fff400be6fbf64f10abca9939718aaf1d61c255 Merge branch 'drm-fbdev-cleanup' into drm-core-next
0bcb1d844ac638a4c4280f697d5bfac9791e9a70 Merge branch 'drm-radeon-lockup' into drm-core-next
c9c2625ff4fc4ce652e686f895059d2902c01ca0 Merge branch 'drm-edid-fixes' into drm-core-next
c2b41276da65481d36311a13d69020d150861c43 Merge branch 'drm-ttm-pool' into drm-core-next
== rc5 resume ok with 10fd883ce384706f88554a0b08cc4d63345e7d8b
97921a5b03d40681b3aed620a5e719710336c6df Merge remote branch 'anholt/drm-intel-next' of /home/airlied/kernel/drm-next into drm-core-next
== rc5 resume ok
01bf0b64579ead8a82e7cfc32ae44bc667e7ad0f Linux 2.6.34-rc5

Resume failures show up as total screen corruption and an inoperable X. All kernels in the range need the fix from 10fd883ce384706f88554a0b08cc4d63345e7d8b to boot, as otherwise they oops.

I've attached the dmesg that shows:
[   68.840476] resource map sanity check conflict: 0x0 0xfffff 0xa0000 0xbffff PCI Bus 0000:00
...
[   68.841002] [TTM] Buffer eviction failed

and later multiple GPU stalls on resume.
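
For anyone comparing their own logs against this one, here is a minimal sketch (not part of the attached dmesg or of any kernel tooling) that picks the two messages quoted above out of a saved dmesg. The function name `find_symptoms`, the `SYMPTOMS` tuple, and the idea of saving dmesg to a text file are assumptions made for illustration only.

```python
import re

# Symptom strings taken from the dmesg excerpt quoted in this comment.
# The choice of these two strings as "symptoms" is an assumption.
SYMPTOMS = (
    "[TTM] Buffer eviction failed",
    "resource map sanity check conflict",
)

# Matches the "[   68.841002] message" prefix the kernel puts on each line.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_symptoms(lines):
    """Return (timestamp_seconds, message) for each line containing a symptom."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            # Skip lines without a kernel timestamp prefix.
            continue
        timestamp = float(match.group(1))
        message = match.group(2)
        if any(symptom in message for symptom in SYMPTOMS):
            hits.append((timestamp, message))
    return hits
```

It could be used as `find_symptoms(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical file holding the output of `dmesg` captured after a resume attempt.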
Comment 7 Christian Schmidt 2010-04-30 06:53:20 UTC
Created attachment 35344 [details]
dmesg of kernel where resume fails
Comment 8 Jerome Glisse 2010-05-05 02:06:17 UTC
Created attachment 35425 [details] [review]
Don't iomap system memory

Please test whether the attached patch fixes the issue for you. Thanks
Comment 9 Rafał Miłecki 2010-05-05 12:45:30 UTC
(In reply to comment #8)
> Created an attachment (id=35425) [details]
> Don't iomap system memory
> 
> Please test whether the attached patch fixes the issue for you. Thanks

It fixes the problem. Thanks!

Resolving in the hope that it'll be taken into d-r-t soon.
Comment 10 Alex Deucher 2010-05-07 06:18:03 UTC
*** Bug 28016 has been marked as a duplicate of this bug. ***
