Bug 111869

Summary: Navi "divide error" hang
Product: DRI
Component: DRM/AMDgpu
Status: RESOLVED MOVED
Severity: not set
Priority: not set
Version: unspecified
Hardware: Other
OS: All
Reporter: Doug Ty <git>
Assignee: Default DRI bug account <dri-devel>
QA Contact:
CC: alexandr.kara, git, popovic.marko, univerz
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                  Flags
divide-error_09-20.txt       none
divide-error_09-30.txt       none
divide-error_09-30-x2.txt    none

Description Doug Ty 2019-09-30 12:30:27 UTC
Created attachment 145593 [details]
divide-error_09-20.txt

Occasionally, usually while watching videos in Firefox, my GPU hangs and the screen freezes. Sound and keyboard input still work in the background, and I need to use the REISUB hotkeys to reboot. This is a separate issue that occurs in addition to the sdma and ring gfx_0.0.0 hangs.

Upon rebooting, journalctl shows the attached "divide error". I've included logs from 3 instances of it happening. I'm currently using the Jul 14 firmware from Fedora's linux-firmware package, as the hang appears to occur more often with the newer firmware from https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/; however, this may just be a placebo effect.

It occurs with or without the "0-sized IBs" kernel patch from https://bugs.freedesktop.org/show_bug.cgi?id=111481#c33 and on both PCIe 3.0 and 4.0. I'm not using a PCIe riser, and the card works without issue under Windows 10 in a dual-boot setup.

CPU: 3700X
GPU: Sapphire 5700XT (reference)
Motherboard: Gigabyte X570-I (BIOS F4)
Kernel: 5.3.0
Mesa: mesa-git 1:19.3.0_devel.115682.3c966fd688c-1
LLVM: llvm-git 10.0.0_r327425.63f6066b53d-1

Please let me know if any more information would be helpful, or if there's anything I can do to troubleshoot. Thanks.
Comment 1 Doug Ty 2019-09-30 12:30:48 UTC
Created attachment 145594 [details]
divide-error_09-30.txt
Comment 2 Doug Ty 2019-09-30 12:30:59 UTC
Created attachment 145595 [details]
divide-error_09-30-x2.txt
Comment 3 Andrew Sheldon 2019-11-01 01:27:40 UTC
I also get this error frequently with amd-staging-drm-next, but not with 5.4-rcX (at least I can't remember getting one with the latter).

Not sure if that suggests there is a regression, or something to do with the 5.3 kernel specifically (I don't remember having the error when amd-staging-drm-next was based on the 5.2 kernel).
Comment 4 Andrew Sheldon 2019-11-05 00:40:06 UTC
I just want to add that I do still get this bug with 5.4-rcX, unfortunately. It's the only remaining non-Mesa hang that I haven't been able to work around.
Comment 5 Martin Peres 2019-11-19 09:56:45 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate in the new issue via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/926.
