Bug 111869 - Navi "divide error" hang
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: Other All
Assignee: Default DRI bug account
Reported: 2019-09-30 12:30 UTC by Doug Ty
Modified: 2019-11-19 09:56 UTC

divide-error_09-20.txt (4.27 KB, text/plain)
2019-09-30 12:30 UTC, Doug Ty
divide-error_09-30.txt (4.26 KB, text/plain)
2019-09-30 12:30 UTC, Doug Ty
divide-error_09-30-x2.txt (4.26 KB, text/plain)
2019-09-30 12:30 UTC, Doug Ty

Description Doug Ty 2019-09-30 12:30:27 UTC
Created attachment 145593 [details]

Occasionally, usually while watching videos in Firefox, my GPU hangs and the screen freezes. Sound and keyboard input still work in the background, and I need to use the REISUB hotkeys to reboot. This is separate from, and in addition to, the sdma and ring gfx_0.0.0 hangs.

Upon rebooting, journalctl shows the attached "divide error". I've included logs from 3 instances of it happening. I'm currently using the Jul 14 firmware from Fedora's linux-firmware package, because the hang appears to occur more often on the newer firmware from https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/ (though this may just be placebo).
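
For anyone else trying to collect the same evidence after a reboot, here is a minimal Python sketch of pulling the relevant lines out of the previous boot's kernel log. It assumes a persistent systemd journal (so that `journalctl -k -b -1` can see the boot that hung). The function names `find_divide_errors` and `previous_boot_kernel_log` are hypothetical helpers made up for this illustration, and the matching is a plain substring search, not a parser for the actual oops format in the attachments.

```python
import subprocess


def find_divide_errors(log_text, context=5):
    """Return one list per hit: the line containing 'divide error'
    followed by up to `context` lines after it."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        # Case-insensitive substring match; the exact message format
        # is an assumption, so keep the test deliberately loose.
        if "divide error" in line.lower():
            hits.append(lines[i:i + 1 + context])
    return hits


def previous_boot_kernel_log():
    """Fetch kernel messages from the previous boot via journalctl.

    Assumes systemd with a persistent journal; raises
    CalledProcessError if journalctl exits with an error.
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `find_divide_errors(previous_boot_kernel_log())` right after a REISUB reboot would then return the error line plus a few lines of trailing context for each occurrence, which can be pasted into a report.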

It occurs with or without the "0-sized IBs" kernel patch from https://bugs.freedesktop.org/show_bug.cgi?id=111481#c33 and on both PCIe 3.0 and 4.0. I'm not using a PCIe riser, and the card works without issue under Windows 10 (dual boot).

CPU: 3700X
GPU: Sapphire 5700XT (reference)
Motherboard: Gigabyte X570-I (BIOS F4)
Kernel: 5.3.0
Mesa: mesa-git 1:19.3.0_devel.115682.3c966fd688c-1
LLVM: llvm-git 10.0.0_r327425.63f6066b53d-1

Please let me know if any more information would be helpful, or if there's anything I can do to troubleshoot. Thanks.
Comment 1 Doug Ty 2019-09-30 12:30:48 UTC
Created attachment 145594 [details]
Comment 2 Doug Ty 2019-09-30 12:30:59 UTC
Created attachment 145595 [details]
Comment 3 Andrew Sheldon 2019-11-01 01:27:40 UTC
I also get this error frequently with amd-staging-drm-next, but not with 5.4-rcX (at least I can't remember getting one with the latter).

Not sure if that suggests there is a regression, or something to do with the 5.3 kernel specifically (I don't remember having the error when amd-staging-drm-next was using 5.2 kernel).
Comment 4 Andrew Sheldon 2019-11-05 00:40:06 UTC
I just want to add that I do still get this bug with 5.4-rcX, unfortunately. It's the only remaining non-Mesa hang that I haven't been able to work around.
Comment 5 Martin Peres 2019-11-19 09:56:45 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate in the new bug on our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/926.
