Bug 100058

Summary: amdgpu/dpm: NULL pointer dereference
Product: DRI
Reporter: Adam Wolk <adam.wolk>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
CC: adam.wolk, funfunctor
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (Description / Flags):
system log from multiple reboots / none
dpm patch / none

Description Adam Wolk 2017-03-04 12:45:06 UTC
Created attachment 130063 [details]
system log from multiple reboots

I noticed my external display constantly turning on and off unless a DRI app is active (ie. running DRI_PRIME=1 glxgears).
It was suggested that I blacklist the `radeon` driver, since I am using `amdgpu`, so I did.

Blacklisting the driver results in the system not being able to boot.

From the first attempts I caught those 2 screenshots:
https://imgur.com/AjG7IgB,xEi2L4B
https://imgur.com/xEi2L4B

My last attempt revealed a kernel null pointer dereference that was logged in journalctl (the other errors and stack traces were not logged):
https://gist.github.com/mulander/6f4d8bfc0fe73af25ee2c95014754822

I'm attaching all journalctl entries from today; search for 'BUG' to find the boot with the null pointer dereference. That boot was started
with radeon blacklisted.

I tried several blacklisting methods including modprobe.conf & regenerating initramfs.
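For anyone reproducing this, a minimal sketch of how one might check which modules a given configuration actually blacklists. It assumes the two usual mechanisms: `blacklist <module>` lines in a modprobe.d-style file and a `modprobe.blacklist=` token on the kernel command line. The sample file content and command line below are hypothetical, not taken from the reporter's machine.

```python
def blacklisted_in_conf(conf_text):
    """Module names from `blacklist <module>` lines in modprobe.d-style text."""
    names = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Ignore blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) == 2 and fields[0] == "blacklist":
            names.append(fields[1])
    return names


def blacklisted_on_cmdline(cmdline):
    """Module names from modprobe.blacklist=a,b tokens on a kernel command line."""
    names = []
    for token in cmdline.split():
        if token.startswith("modprobe.blacklist="):
            names += [n for n in token.split("=", 1)[1].split(",") if n]
    return names


# Hypothetical inputs; on a live system these would come from a file under
# /etc/modprobe.d/ and from /proc/cmdline.
SAMPLE_CONF = "# keep radeon from binding the dGPU\nblacklist radeon\n"
SAMPLE_CMDLINE = "root=/dev/sda1 rw modprobe.blacklist=radeon"

print(blacklisted_in_conf(SAMPLE_CONF))
print(blacklisted_on_cmdline(SAMPLE_CMDLINE))
```

If the blacklist file is also baked into the initramfs, the image has to be regenerated after editing it, which matches the steps described above.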

[mulander@napalm ~]$ uname -a
Linux napalm 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux

[mulander@napalm ~]$ lspci
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
00:14.0 USB controller: Intel Corporation 8 Series USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 3 (rev e4)
00:1c.3 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 4 (rev e4)
00:1c.4 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 5 (rev e4)
00:1d.0 USB controller: Intel Corporation 8 Series USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation 8 Series LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series SMBus Controller (rev 04)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 10)
02:00.0 Network controller: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter (rev 01)
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Sun LE [Radeon HD 8550M / R5 M230]

lsmod
Module                  Size  Used by
ctr                    16384  6
ccm                    20480  3
hid_generic            16384  0
usbhid                 49152  0
joydev                 20480  0
mousedev               20480  0
amdgpu               1499136  0
snd_hda_codec_hdmi     45056  1
amdkfd                122880  1
amd_iommu_v2           20480  1 amdkfd
intel_rapl             20480  0
x86_pkg_temp_thermal    16384  0
intel_powerclamp       16384  0
coretemp               16384  0
kvm                   524288  0
radeon               1478656  4
irqbypass              16384  1 kvm
intel_cstate           16384  0
ttm                    86016  2 amdgpu,radeon
intel_rapl_perf        16384  0
snd_soc_rt5640        110592  0
snd_soc_rl6231         16384  1 snd_soc_rt5640
ppdev                  20480  0
snd_soc_core          188416  1 snd_soc_rt5640
snd_hda_codec_conexant    24576  1
snd_hda_codec_generic    69632  1 snd_hda_codec_conexant
snd_hda_intel          32768  7
snd_hda_codec         106496  4 snd_hda_intel,snd_hda_codec_conexant,snd_hda_codec_hdmi,snd_hda_codec_generic
snd_compress           20480  1 snd_soc_core
evdev                  24576  15
snd_hda_core           65536  5 snd_hda_intel,snd_hda_codec_conexant,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic
snd_pcm_dmaengine      16384  1 snd_soc_core
snd_hwdep              16384  1 snd_hda_codec
psmouse               131072  0
ideapad_laptop         24576  0
pcspkr                 16384  0
input_leds             16384  0
sparse_keymap          16384  1 ideapad_laptop
arc4                   16384  2
mac_hid                16384  0
r8169                  77824  0
ath9k                 131072  0
ath9k_common           32768  1 ath9k
ath9k_hw              442368  2 ath9k,ath9k_common
uvcvideo               86016  0
ath                    28672  3 ath9k_hw,ath9k,ath9k_common
videobuf2_vmalloc      16384  1 uvcvideo
videobuf2_memops       16384  1 videobuf2_vmalloc
ath3k                  20480  0
btusb                  40960  0
videobuf2_v4l2         20480  1 uvcvideo
mac80211              688128  1 ath9k
videobuf2_core         36864  2 uvcvideo,videobuf2_v4l2
btrtl                  16384  1 btusb
btbcm                  16384  1 btusb
btintel                16384  1 btusb
videodev              151552  3 uvcvideo,videobuf2_core,videobuf2_v4l2
rtsx_usb_ms            20480  0
bluetooth             499712  6 btrtl,btintel,btbcm,ath3k,btusb
media                  32768  2 uvcvideo,videodev
memstick               16384  1 rtsx_usb_ms
snd_pcm                90112  9 snd_hda_intel,snd_hda_codec,snd_pcm_dmaengine,snd_hda_core,snd_soc_rt5640,snd_hda_codec_hdmi,snd_soc_core
cfg80211              516096  4 mac80211,ath9k,ath,ath9k_common
mii                    16384  1 r8169
battery                20480  0
rfkill                 20480  5 bluetooth,ideapad_laptop,cfg80211
wmi                    16384  1 ideapad_laptop
ac97_bus               16384  1 snd_soc_core
i2c_hid                20480  0
hid                   114688  3 i2c_hid,hid_generic,usbhid
fjes                   28672  0
elan_i2c               32768  0
i915                 1204224  12
parport_pc             28672  0
parport                40960  2 parport_pc,ppdev
video                  36864  2 i915,ideapad_laptop
mei_me                 36864  0
spi_pxa2xx_platform    24576  0
8250_dw                16384  0
i2c_designware_platform    16384  0
drm_kms_helper        126976  3 amdgpu,radeon,i915
drm                   294912  13 amdgpu,radeon,i915,ttm,drm_kms_helper
snd_soc_sst_acpi       16384  0
intel_gtt              20480  1 i915
snd_soc_sst_match      16384  1 snd_soc_sst_acpi
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
i2c_algo_bit           16384  3 amdgpu,radeon,i915
snd_timer              28672  1 snd_pcm
snd                    69632  22 snd_compress,snd_hda_intel,snd_hwdep,snd_hda_codec_conexant,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_soc_core,snd_pcm
lpc_ich                24576  0
mei                    86016  1 mei_me
shpchp                 32768  0
soc_button_array       16384  0
i2c_i801               24576  0
i2c_designware_core    20480  1 i2c_designware_platform
i2c_smbus              16384  1 i2c_i801
soundcore              16384  1 snd
tpm_tis                16384  0
tpm_tis_core           20480  1 tpm_tis
tpm                    36864  2 tpm_tis,tpm_tis_core
ac                     16384  0
button                 16384  1 i915
sch_fq_codel           20480  5
ip_tables              28672  0
x_tables               28672  1 ip_tables
ext4                  528384  3
crc16                  16384  2 bluetooth,ext4
jbd2                   90112  1 ext4
fscrypto               24576  1 ext4
mbcache                16384  4 ext4
algif_skcipher         20480  0
af_alg                 16384  1 algif_skcipher
dm_crypt               28672  1
dm_mod                106496  12 dm_crypt
sr_mod                 24576  0
sd_mod                 36864  3
cdrom                  53248  1 sr_mod
rtsx_usb_sdmmc         28672  0
rtsx_usb               20480  2 rtsx_usb_sdmmc,rtsx_usb_ms
serio_raw              16384  0
atkbd                  24576  0
libps2                 16384  2 atkbd,psmouse
crct10dif_pclmul       16384  0
crc32_pclmul           16384  0
crc32c_intel           24576  0
ghash_clmulni_intel    16384  0
ahci                   36864  2
libahci                28672  1 ahci
aesni_intel           167936  9
xhci_pci               16384  0
aes_x86_64             20480  1 aesni_intel
lrw                    16384  1 aesni_intel
xhci_hcd              172032  1 xhci_pci
gf128mul               16384  1 lrw
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
cryptd                 20480  4 ablk_helper,ghash_clmulni_intel,aesni_intel
libata                212992  2 ahci,libahci
ehci_pci               16384  0
ehci_hcd               73728  1 ehci_pci
usbcore               208896  9 uvcvideo,usbhid,ehci_hcd,xhci_pci,rtsx_usb,ath3k,btusb,xhci_hcd,ehci_pci
scsi_mod              159744  3 sd_mod,libata,sr_mod
usb_common             16384  1 usbcore
i8042                  28672  1 ideapad_laptop
serio                  20480  6 serio_raw,atkbd,psmouse,i8042
sdhci_acpi             16384  0
sdhci                  40960  1 sdhci_acpi
led_class              16384  4 rtsx_usb_sdmmc,sdhci,input_leds,ath9k
mmc_core              122880  3 rtsx_usb_sdmmc,sdhci,sdhci_acpi
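The listing above shows both `amdgpu` and `radeon` loaded, sharing `ttm`. A small sketch of how that check could be scripted against `lsmod`-style text follows; the function name is made up for illustration and the sample lines are copied from the listing.

```python
def parse_lsmod(text):
    """Map module name -> (size, use_count, users) from lsmod-style text."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        name, size, used = fields[0], int(fields[1]), int(fields[2])
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[name] = (size, used, users)
    return modules


# A few lines taken from the lsmod output in this report.
SAMPLE = """\
Module                  Size  Used by
amdgpu               1499136  0
radeon               1478656  4
ttm                    86016  2 amdgpu,radeon
"""

mods = parse_lsmod(SAMPLE)
both_loaded = "amdgpu" in mods and "radeon" in mods
print("amdgpu and radeon both loaded:", both_loaded)
print("ttm users:", mods["ttm"][2])
```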
Comment 1 Adam Wolk 2017-03-04 13:09:16 UTC
I just tried a newer kernel from Arch Linux [testing].
[mulander@napalm ~]$ uname -a
Linux napalm 4.10.1-1-ARCH #1 SMP PREEMPT Sun Feb 26 21:08:53 UTC 2017 x86_64 GNU/Linux

Same problem. Here is a photo I managed to grab while trying to boot it up.

https://imgur.com/PCC42Bj
Comment 2 Edward O'Callaghan 2017-03-04 13:19:34 UTC
OK so we narrowed the problem down to dpm.

(12:17:01 AM) mulander: yep, blacklisted radeon, amdgpu.dpm=0 and booted to X properly without a crash
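To confirm that a boot really ran with the workaround, one could look for the `amdgpu.dpm=0` token on the kernel command line. A minimal sketch, assuming the parameter was passed on the command line rather than through a modprobe.d `options` line; the sample command line is hypothetical.

```python
def module_params(cmdline, module):
    """Return {param: value} for `module.param=value` tokens on a kernel command line."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            key, value = token[len(prefix):].split("=", 1)
            params[key] = value
    return params


# Hypothetical command line; on a live system read /proc/cmdline instead.
CMDLINE = "root=/dev/sda1 rw quiet amdgpu.dpm=0"

params = module_params(CMDLINE, "amdgpu")
print("amdgpu parameters:", params)
print("dpm disabled:", params.get("dpm") == "0")
```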
Comment 3 Michel Dänzer 2017-03-06 02:31:12 UTC
(In reply to Adam Wolk from comment #0)
> I noticed my external display constantly turning on and off unless a DRI app
> is active (ie. running DRI_PRIME=1 glxgears).

Which GPU is the external display connected to? If you're not sure, attach the output of xrandr.
Comment 4 Adam Wolk 2017-03-06 10:41:35 UTC
Here is the xrandr output.

[mulander@napalm ~]$ xrandr
Screen 0: minimum 8 x 8, current 3286 x 1080, maximum 32767 x 32767
eDP1 connected 1366x768+1920+0 (normal left inverted right x axis y axis) 340mm x 190mm
   1366x768      59.97*+
   1024x768      60.00
   1024x576      60.00
   960x540       60.00
   800x600       60.32    56.25
   864x486       60.00
   640x480       59.94
   720x405       60.00
   680x384       60.00
   640x360       60.00
DP1 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 480mm x 270mm
   1920x1080     60.00*+
   1680x1050     59.95
   1280x1024     75.02    60.02
   1152x864      75.00
   1024x768      75.03    60.00
   800x600       75.00    60.32
   640x480       75.00    59.94
   720x400       70.08
HDMI1 disconnected (normal left inverted right x axis y axis)
HDMI2 disconnected (normal left inverted right x axis y axis)
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
Comment 5 Alex Deucher 2017-03-06 14:11:14 UTC
Possibly a duplicate of bug 99387.
Comment 6 Michel Dänzer 2017-03-07 01:13:17 UTC
FWIW, since all display outputs are connected to the iGPU, it's unlikely that using amdgpu instead of radeon will have any effect on the external display turning on and off. You'd have to bring that up with the iGPU drivers.
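For reference, a rough sketch of pulling the connected outputs out of `xrandr` text like the one in comment #4. It only reports connection state; it does not tell you which GPU drives an output (`xrandr --listproviders` is the usual place to look for that). The function name is illustrative, and the sample is abridged from comment #4.

```python
import re

# "<name> connected ..." or "<name> disconnected ..." at the start of a line.
OUTPUT_RE = re.compile(r"^(\S+) (connected|disconnected)\b")


def outputs_by_state(xrandr_text):
    """Group output names by connection state."""
    states = {"connected": [], "disconnected": []}
    for line in xrandr_text.splitlines():
        m = OUTPUT_RE.match(line)
        if m:
            states[m.group(2)].append(m.group(1))
    return states


SAMPLE = """\
Screen 0: minimum 8 x 8, current 3286 x 1080, maximum 32767 x 32767
eDP1 connected 1366x768+1920+0 (normal left inverted right x axis y axis) 340mm x 190mm
   1366x768      59.97*+
DP1 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 480mm x 270mm
   1920x1080     60.00*+
HDMI1 disconnected (normal left inverted right x axis y axis)
HDMI2 disconnected (normal left inverted right x axis y axis)
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
"""

states = outputs_by_state(SAMPLE)
print("connected:", states["connected"])
print("disconnected:", states["disconnected"])
```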
Comment 7 Adam Wolk 2017-03-07 12:54:23 UTC
Regarding the display flicking on/off (the effect feels like changing resolution - the way it goes out and back). This is completely mitigated by running DRI_PRIME=1 glxgears hence why I thought it might be AMD driver related.

Regardless, the main thing reported here is a null pointer dereference in the kernel and a system unable to boot completely.

I can live with the flicker - I just work around it by running DRI_PRIME=1 glxgears all day...
Comment 8 Alex Deucher 2017-03-07 15:13:46 UTC
Do the patches in bug 99387 help?
Comment 9 Michel Dänzer 2017-03-08 01:38:06 UTC
(In reply to Adam Wolk from comment #7)
> Regarding the display flicking on/off (the effect feels like changing
> resolution - the way it goes out and back). This is completely mitigated by
> running DRI_PRIME=1 glxgears hence why I thought it might be AMD driver
> related.

It can't really be directly related, since the display isn't connected to the AMD GPU. I'd report it against the i915 kernel driver.
Comment 10 Adam Wolk 2017-03-25 13:51:44 UTC
> Do the patches in bug 99387 help?

This is a machine I use for work unfortunately I can't fiddle with it more.

Regarding the flickering issue I reported it as a separate bug.

https://bugs.freedesktop.org/show_bug.cgi?id=100386
Comment 11 Edward O'Callaghan 2017-04-21 01:16:53 UTC
(In reply to Adam Wolk from comment #10)
> > Do the patches in bug 99387 help?
> 
> This is a machine I use for work unfortunately I can't fiddle with it more.
> 
> Regarding the flickering issue I reported it as a separate bug.
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=100386

[edward@skytop linux]$ git tag --contains c10c8f7 -l
v4.10.0
v4.11-rc1
v4.11-rc2
v4.11-rc3
v4.11-rc4
v4.11-rc5
v4.11-rc6
v4.11-rc7

@Adam, can you please try updating to at minimum kernel v4.10.0 and seeing if that fixes the issue for you?
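A quick way to check whether a running kernel meets that minimum is to compare the numeric part of the release string as a tuple, since plain string comparison orders "4.9" after "4.10". A sketch, using the two release strings reported earlier in this bug; it ignores suffixes such as `-rc1` and distribution tags, which is an assumption that may not suit every use.

```python
import re


def kernel_version(release):
    """Leading numeric components of a release string, e.g. '4.9.11-1-ARCH' -> (4, 9, 11)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. "4.11-rc1") is treated as 0.
    return tuple(int(g or 0) for g in m.groups())


MINIMUM = (4, 10, 0)

# Release strings as reported by `uname -a` in the description and comment #1.
for release in ("4.9.11-1-ARCH", "4.10.1-1-ARCH"):
    ok = kernel_version(release) >= MINIMUM
    print(release, "meets the minimum" if ok else "is older than the minimum")
```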
Comment 12 Edward O'Callaghan 2017-04-21 11:46:57 UTC
@Alex Deucher, I confirmed with Adam that even with c10c8f7 he still has the null pointer issue.
Comment 13 Edward O'Callaghan 2017-04-21 12:43:17 UTC
Created attachment 130967 [details] [review]
dpm patch

@Adam, please try applying the attached patch and let me know if it helps with your issue.
Comment 14 Martin Peres 2019-11-19 08:14:44 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/147.
