Summary: | [regression] [amdgpu] Errors scheduling IBs | |
---|---|---|---
Product: | DRI | Reporter: | Mike Lothian <mike>
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED FIXED | QA Contact: |
Severity: | normal | |
Priority: | medium | CC: | alexdeucher, ckoenig.leichtzumerken, mike
Version: | DRI git | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Attachments: | | |

Description
Mike Lothian
2016-06-19 09:01:41 UTC
Created attachment 124602 [details]
journalctl output
Reverting that commit makes the errors go away.

Could be some kind of race condition. Please provide the output of "journalctl --dmesg -o short-monotonic".

Created attachment 124607 [details]
dmesg output

Ok, clearly not a race condition, but something is wrong here because the driver tries to initialize the ring buffers multiple times. Maybe something is causing a GPU reset, but as far as I remember those should still be turned off by default. Please provide a journalctl output from a boot with the patch in question reverted.

I should probably have specified this is a prime laptop with dynpm. The card initialises each time it's needed: it always does this during boot, again when X starts, and each time I load a game with DRI_PRIME=1.

Ah, enlightenment! Thanks, that was the info I was missing. We probably just forget to wait for all evictions before we turn off the GPU, resulting in the still-running jobs producing this error message. Give me a second to hack together a patch.

Created attachment 124616 [details] [review]
Possible fix

Please test the attached patch; it should fix the issue.

I'll test this tonight when I get home, thanks.

Created attachment 124623 [details]
dmesg

Still seems to happen

I'm running out of ideas. Does that have any other negative results except for the error message?

Alex, any idea what else could cause an eviction during switching of the dGPU?

(In reply to Christian König from comment #11)
> I'm running out of ideas. Does that have any other negative results except
> for the error message?
>
> Alex any idea what else could cause an eviction during switching of the dGPU?

Powering up/down the dGPU should hit the same code as resume/suspend. Are you seeing similar issues with suspend and resume? Maybe the scheduler isn't getting stopped properly on suspend? We recently fixed something like this for GPU reset.

Created attachment 124672 [details]
dmesg

It seems to spam the logs more when I fire up a game

Created attachment 124708 [details] [review]
Additional fix.

Alex's suspend/resume idea was the right approach. I was able to reproduce the issue and found a pretty fundamental bug in one of my recent patches. Please see the additional fix; together with the first patch it should resolve the issue.

The patch already seems to have landed in drm-next-4.8-wip and it does indeed seem to fix it.
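
For readers skimming the thread, the hypothesis raised above (jobs such as evictions still queued when the dGPU is powered down) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyGpu`, `simulate`, and the job names are hypothetical, none of this is amdgpu code, and it does not reproduce what the attached patches actually change.

```python
from collections import deque


class ToyGpu:
    """Toy stand-in for a GPU with a job queue (hypothetical, not amdgpu code)."""

    def __init__(self):
        self.powered = True
        self.pending = deque()  # jobs queued but not yet executed
        self.errors = []        # messages a real driver would log

    def submit(self, job):
        self.pending.append(job)

    def run_pending(self):
        """Execute queued jobs; a job fails if the device is already off."""
        while self.pending:
            job = self.pending.popleft()
            if not self.powered:
                self.errors.append(f"error scheduling job {job!r}")

    def power_down(self, drain_first):
        """Power the device off, optionally draining the queue first."""
        if drain_first:
            self.run_pending()
        self.powered = False


def simulate(drain_first):
    """Queue two eviction-like jobs, power down, then run whatever is left."""
    gpu = ToyGpu()
    gpu.submit("evict-buffer-0")
    gpu.submit("evict-buffer-1")
    gpu.power_down(drain_first)
    gpu.run_pending()  # leftover jobs only exist if the queue was not drained
    return gpu.errors
```

In this model, `simulate(False)` records one error per leftover job, while `simulate(True)` records none because the queue is emptied before the device is switched off.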