Bug 92524 - system hang on "radeon_hwmon_get_pwm1_enable"
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2015-10-18 15:14 UTC by Thomas DEBESSE
Modified: 2015-10-21 01:21 UTC

Attachments
complete dmesg log (95.72 KB, text/plain)
2015-10-18 15:18 UTC, Thomas DEBESSE
verbose lspci output for the GPU (1.06 KB, text/plain)
2015-10-18 15:18 UTC, Thomas DEBESSE
possible fix (1.57 KB, patch)
2015-10-19 13:34 UTC, Alex Deucher
attachment-7674-0.html (694 bytes, text/html)
2015-10-19 15:35 UTC, Thomas DEBESSE

Description Thomas DEBESSE 2015-10-18 15:14:47 UTC
Hi, every time I run "cat /sys/class/hwmon/hwmon0/pwm1_enable" my system crashes, and just before it hangs dmesg prints this:

--8<------
[  340.012644] BUG: unable to handle kernel paging request at 0000000000001659
[  340.012717] IP: [<ffffffffc031ab9e>] ci_fan_ctrl_get_mode+0xe/0x30 [radeon]
[  340.012820] PGD 7e1b0b067 PUD 7e19e0067 PMD 0 
[  340.012865] Oops: 0000 [#1] SMP 
[  340.012896] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables ctr ccm cmac binfmt_misc rfcomm bnep fglrx(POE) eeepc_wmi asus_wmi sparse_keymap video arc4 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ath9k snd_usb_audio ath3k snd_hda_intel ath9k_common btusb hid_logitech_hidpp snd_hda_codec ath9k_hw btrtl snd_usbmidi_lib snd_hda_core btbcm snd_hwdep btintel kvm_amd ath snd_pcm bluetooth kvm snd_seq_midi snd_seq_midi_event mac80211 snd_rawmidi snd_seq input_leds snd_seq_device edac_core serio_raw fam15h_power snd_timer edac_mce_amd
[  340.013643]  k10temp cfg80211 snd i2c_piix4 soundcore 8250_fintek shpchp tpm_infineon mac_hid cuse it87 hwmon_vid parport_pc ppdev lp parport autofs4 drbg ansi_cprng dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear amdgpu hid_logitech_dj hid_logitech ff_memless raid1 hid_generic usbhid hid amdkfd amd_iommu_v2 mxm_wmi radeon crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper i2c_algo_bit ablk_helper ttm cryptd drm_kms_helper r8169 drm mii ahci libahci wmi
[  340.014176] CPU: 5 PID: 5797 Comm: cat Tainted: P           OE   4.2.3-040203-generic #201510030832
[  340.014248] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 2501 04/08/2014
[  340.014359] task: ffff8800b4db9b80 ti: ffff8807e1b20000 task.ti: ffff8807e1b20000
[  340.014418] RIP: 0010:[<ffffffffc031ab9e>]  [<ffffffffc031ab9e>] ci_fan_ctrl_get_mode+0xe/0x30 [radeon]
[  340.014528] RSP: 0018:ffff8807e1b23d00  EFLAGS: 00010246
[  340.014571] RAX: 0000000000000000 RBX: ffff88081533c000 RCX: ffff880813b31e10
[  340.014627] RDX: 0000000000000000 RSI: ffffffffc038eea0 RDI: ffff880813248000
[  340.014683] RBP: ffff8807e1b23d18 R08: ffff880814583818 R09: 0000000000000000
[  340.014739] R10: 0000000000001000 R11: 0000000000000246 R12: ffffffff81873090
[  340.014796] R13: 0000000000000001 R14: ffff8807e1b23f20 R15: ffff880814bb6780
[  340.014853] FS:  00007f4ddf962700(0000) GS:ffff88083ed40000(0000) knlGS:0000000000000000
[  340.014916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  340.014962] CR2: 0000000000001659 CR3: 00000007beccb000 CR4: 00000000000406e0
[  340.015018] Stack:
[  340.015036]  ffffffffc02aaebd ffff8807e1b23d68 ffffffffc038eea0 ffff8807e1b23d48
[  340.015103]  ffffffff814e0be0 0000000000000000 ffffffff817a7ed6 ffff880814061300
[  340.015170]  ffff880814bb6780 ffff8807e1b23d68 ffffffff812626d2 0000000000000000
[  340.015237] Call Trace:
[  340.015296]  [<ffffffffc02aaebd>] ? radeon_hwmon_get_pwm1_enable+0x2d/0x60 [radeon]
[  340.015361]  [<ffffffff814e0be0>] dev_attr_show+0x20/0x50
[  340.015408]  [<ffffffff817a7ed6>] ? mutex_lock+0x16/0x37
[  340.015454]  [<ffffffff812626d2>] sysfs_kf_seq_show+0xc2/0x1a0
[  340.015532]  [<ffffffff81260f23>] kernfs_seq_show+0x23/0x30
[  340.015580]  [<ffffffff8120c5f5>] seq_read+0xe5/0x350
[  340.015623]  [<ffffffff812616dd>] kernfs_fop_read+0x10d/0x170
[  340.015685]  [<ffffffff811e9418>] __vfs_read+0x28/0xe0
[  340.015732]  [<ffffffff8130b293>] ? security_file_permission+0xa3/0xc0
[  340.015785]  [<ffffffff811e9976>] ? rw_verify_area+0x56/0xe0
[  340.015832]  [<ffffffff811e9a86>] vfs_read+0x86/0x130
[  340.015875]  [<ffffffff811ea8a6>] SyS_read+0x46/0xa0
[  340.015917]  [<ffffffff817a9b72>] entry_SYSCALL_64_fastpath+0x16/0x75
[  340.015968] Code: 01 75 af 41 c6 85 59 16 00 00 00 eb a5 e8 ab c3 ff ff 48 83 c4 08 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 48 8b 97 30 1e 00 00 31 c0 <80> ba 59 16 00 00 00 74 01 c3 55 be 6c 00 30 c0 48 89 e5 e8 ea 
[  340.016273] RIP  [<ffffffffc031ab9e>] ci_fan_ctrl_get_mode+0xe/0x30 [radeon]
[  340.016364]  RSP <ffff8807e1b23d00>
[  340.016393] CR2: 0000000000001659
[  340.037044] ---[ end trace 94bb46b2f2c467e6 ]---
--8<------
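
A note for readers decoding the trace: the faulting bytes marked `<80> ba 59 16 00 00 00` in the Code line disassemble to `cmpb $0x0,0x1659(%rdx)`, and the instruction just before them (`48 8b 97 30 1e 00 00`) loads RDX from offset 0x1e30 of RDI. The register dump shows RDX = 0, so the access lands at 0 + 0x1659, which is exactly CR2. The short Python sketch below only redoes that arithmetic from values copied out of the oops. The helper names are hypothetical, and reading this as "a pointer loaded from the device structure was NULL" is an inference from the dump, not something the report states.

```python
import re

# Register lines copied from the oops above.
OOPS_REGISTERS = """
RAX: 0000000000000000 RBX: ffff88081533c000 RCX: ffff880813b31e10
RDX: 0000000000000000 RSI: ffffffffc038eea0 RDI: ffff880813248000
CR2: 0000000000001659 CR3: 00000007beccb000 CR4: 00000000000406e0
"""

# Displacement bytes of the faulting instruction `<80> ba 59 16 00 00 00`:
# opcode 80, ModRM ba, a little-endian 32-bit displacement, then an imm8.
FAULT_DISP_BYTES = bytes.fromhex("59160000")


def parse_registers(text):
    """Hypothetical helper: map three-letter register names to integers."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def fault_address(regs):
    """Effective address of `cmpb $0x0,disp32(%rdx)`: RDX plus displacement."""
    disp = int.from_bytes(FAULT_DISP_BYTES, "little")
    return regs["RDX"] + disp


if __name__ == "__main__":
    regs = parse_registers(OOPS_REGISTERS)
    # Both values should agree if the decoding above is right.
    print(hex(fault_address(regs)), hex(regs["CR2"]))
```

A small offset from a zero base register is the usual signature of a field access through a NULL structure pointer, which would fit fan-control state that was never set up.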

The whole system seems to hang, not only the display: even with a terminal already open as root I cannot run commands such as "reboot", and the keyboard is frozen too.

I need to be able to run "cat /sys/class/hwmon/hwmon0/pwm1_enable" because pwmconfig does it by default (pwmconfig is the script from fancontrol, used with lm-sensors), and I have to use fancontrol to control some fans to reduce noise. I probably don't need to tune the radeon fan itself, but pwmconfig checks every available fan, so it hangs my system.

I'm using daily Mesa 11.1 builds compiled from git by https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers and the 4.2.3 kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/ on Ubuntu 15.04. Here are the versions of some packages that may interest you:

libdrm-radeon1:amd64                2.4.65+git1510161830.304552~gd~v
libdrm-radeon1:i386                 2.4.65+git1510161830.304552~gd~v
libegl1-mesa-drivers:amd64          11.1~git1510171930.006fcc~gd~v
libegl1-mesa-drivers:i386           11.1~git1510171930.006fcc~gd~v
libgl1-mesa-dri:amd64               11.1~git1510171930.006fcc~gd~v
libgl1-mesa-dri:i386                11.1~git1510171930.006fcc~gd~v
libgl1-mesa-glx:amd64               11.1~git1510171930.006fcc~gd~v
libgl1-mesa-glx:i386                11.1~git1510171930.006fcc~gd~v
linux-headers-4.2.3-040203          4.2.3-040203.201510030832
linux-headers-4.2.3-040203-generic  4.2.3-040203.201510030832
linux-image-4.2.3-040203-generic    4.2.3-040203.201510030832
xserver-xorg-video-radeon           1:7.5.99+git1510071932.ce9914~gd~v

uname -srvop says:

Linux 4.2.3-040203-generic #201510030832 SMP Sat Oct 3 12:34:31 UTC 2015 x86_64 GNU/Linux

lspci -nn says this about my graphics card:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon R9 290X] [1002:67b0] (rev 80)

I hope this bug can be fixed!
Comment 1 Thomas DEBESSE 2015-10-18 15:18:06 UTC
Created attachment 118963 [details]
complete dmesg log
Comment 2 Thomas DEBESSE 2015-10-18 15:18:58 UTC
Created attachment 118964 [details]
verbose lspci output for the GPU
Comment 3 Alex Deucher 2015-10-19 13:34:17 UTC
Created attachment 118978 [details] [review]
possible fix

The attached patch should fix it.  pwm control is only available with dpm, and you've disabled dpm via the kernel command line.
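
For anyone who has to run pwmconfig-style tooling on an unpatched kernel, one cautious approach is to check the kernel command line before reading any radeon pwm file. The Python sketch below assumes only that /proc/cmdline is readable. The function names are hypothetical, and treating the last `radeon.dpm=` occurrence as the effective one is an assumption about how repeated parameters are handled. It detects an explicit `radeon.dpm=0` only and says nothing about the driver's default.

```python
def radeon_dpm_disabled(cmdline):
    """Return True if the command line explicitly sets radeon.dpm=0.

    Assumption: when the option appears more than once, the last one wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "radeon.dpm":
            value = val
    return value == "0"


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (hypothetical helper)."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    if radeon_dpm_disabled(read_cmdline()):
        print("radeon.dpm=0 is set: do not read the radeon pwm1_enable file")
    else:
        print("radeon.dpm=0 is not set on the command line")
```

This is only a user-space workaround sketch; the kernel-side fix is the attached patch.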
Comment 4 Thomas DEBESSE 2015-10-19 15:35:25 UTC
Created attachment 118987 [details]
attachment-7674-0.html

> The attached patch should fix it. pwm control is only available with dpm,
> and you've disabled dpm via the kernel command line.

Oh yes, the last grub update (fortuitously made the morning before I changed
my GPU for the one described above) introduced a "radeon.dpm=0" default
kernel command line option, probably to work around this other bug:
https://bugzilla.kernel.org/show_bug.cgi?id=103271

Thank you for your fast answer, I will try both (the patch and the kernel
command line option).

Which source tree should this patch be applied to?

--
Thomas DEBESSE
Comment 5 Alex Deucher 2015-10-19 15:37:14 UTC
(In reply to Thomas DEBESSE from comment #4)
> 
> Which source tree should this patch be applied to?

It was against my drm-next tree, but should apply to any recent kernel.
Comment 6 Thomas DEBESSE 2015-10-19 19:32:12 UTC
So, some news!

I've just tested with radeon.dpm=1, but less than 10 minutes after boot the display hung completely with graphical glitches (the system was still reachable, over ssh for example, but could not reboot). The command "cat /sys/class/hwmon/hwmon0/pwm1_enable" works (I ran it *after* the graphical hang, so the hang was not a consequence of this command). So the glitch is another bug, but I have now verified that "cat /sys/class/hwmon/hwmon0/pwm1_enable" does not cause a crash when radeon.dpm=1.

Also, I tried loading the amdgpu module instead of radeon, blacklisting radeon. I verified that with amdgpu.dpm=0 there is no "/sys/class/hwmon/hwmon0/pwm1_enable" file, and that with amdgpu.dpm=1 there is a "/sys/class/hwmon/hwmon0/pwm1_enable" file and I can "cat" it. So the bug does not reproduce with the amdgpu module, which is good news.
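
The presence check described above can be done without reading the attribute at all, which matters on a kernel where the read itself crashes. A minimal Python sketch, assuming each hwmon directory exposes a `name` file (some drivers on older kernels may place it elsewhere, in which case the name is reported as "?"); the function name is hypothetical:

```python
import os


def scan_pwm_enable(hwmon_root="/sys/class/hwmon"):
    """Map each hwmon entry to (driver name, whether pwm1_enable exists).

    Only the small `name` file is read; pwm1_enable is tested for
    existence and never opened.
    """
    result = {}
    if not os.path.isdir(hwmon_root):
        return result
    for entry in sorted(os.listdir(hwmon_root)):
        device = os.path.join(hwmon_root, entry)
        try:
            with open(os.path.join(device, "name")) as handle:
                name = handle.read().strip()
        except OSError:
            name = "?"
        has_pwm = os.path.exists(os.path.join(device, "pwm1_enable"))
        result[entry] = (name, has_pwm)
    return result


if __name__ == "__main__":
    for entry, (name, has_pwm) in scan_pwm_enable().items():
        print(entry, name, "pwm1_enable present" if has_pwm else "no pwm1_enable")
```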

I've not yet tried to recompile the kernel to test your patch.

I'm now running Ubuntu 15.10. Of course this changes nothing about the bug, but it ships a 4.2 kernel by default, so I can use my distro's tools to recompile the kernel cleanly, applying the patch to the official Ubuntu kernel tree.
Comment 7 Thomas DEBESSE 2015-10-21 01:21:19 UTC
Hi, I just recompiled my kernel with your patch applied and it works, you fixed the issue!

I can no longer "cat /sys/class/hwmon/hwmon0/pwm1_enable" with radeon.dpm=0, so my system does not hang when I run pwmconfig. :-)

Thanks a lot for your fast response and your efficient help.

About the other bug, the unexpected hang when using radeon.dpm=1, I added some comments on bug #68059.

I am closing this bug as RESOLVED/FIXED.

