Bug 99264 - Deterministic crash on RX460 "NULL pointer dereference"
Summary: Deterministic crash on RX460 "NULL pointer dereference"
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2017-01-03 20:44 UTC by Daniël Mantione
Modified: 2019-11-19 08:12 UTC (History)

Attachments
Complete syslog data from boot to crash (223.33 KB, text/plain)
2017-01-03 20:44 UTC, Daniël Mantione
Xorg log file (no weird things are visible here) (45.98 KB, text/plain)
2017-01-03 20:45 UTC, Daniël Mantione

Description Daniël Mantione 2017-01-03 20:44:15 UTC
Created attachment 128734 [details]
Complete syslog data from boot to crash

Hello,

I have been in the process of migrating from a Radeon HD6670 to an RX 460 for quite a few months now. I regularly fit the RX 460, but keep running into issues, crashes among them, that force me to reinstall the HD6670 whenever I need my computer for serious work or even for more stable gaming. However, I am making progress in identifying the issues, and it looks like there are 3 different causes for the crashes. One of them I can now reproduce very easily, and it smells like a real driver bug, so I would like to report it.

My hardware is as follows:
 Xeon E5-2650v2 CPU (once it was an Opteron, but you stopped making new ones :( )
 Supermicro X9SRE-3F mainboard
 32GB RAM
 HIS Radeon RX 460 2GB
 3 * HP LP2065 1600x1200 monitors
  - One connected via active DP to DVI converter
  - One connected via DVI
  - One connected via HDMI to DVI cable

My software configuration is as follows:
 openSUSE 13.1 with the following modifications:
  - amd-staging-4.7 kernel as of 21 December 2016 (compiled myself)
      (DAL is needed to use all 3 of my monitors)
  - Xorg upgraded to 7.6 (via http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_13.1/ )
  - Mesa upgraded to 13.0.1 (via http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_13.1/ )

How to reproduce?

Using the game The Great Whale Road, I have captured the OpenGL command stream with Apitrace:

http://apitrace.github.io/

I have uploaded it here; be warned that this is a 770 MB download:

http://www.freepascal.org/~daniel/greatwhaleroad.trace.bz2

Then:

bunzip2 greatwhaleroad.trace.bz2
apitrace replay greatwhaleroad.trace

At the end of the replay, all monitors lose signal and go black. Because my mainboard has a small onboard Aspeed VGA controller, I can switch a monitor input to that controller, log in on the Linux VGA text console and recover some information. The stack trace below is visible in dmesg. You can also see that the X server and game processes are still running, but they hang inside the kernel, so they cannot be killed.

Best regards,

Daniël Mantione
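
For triage, a trace like the one that follows can be reduced to its reliable call chain. The Python sketch below is not part of the original report; it assumes the dmesg output was saved as text, and the names `call_chain`, `FRAME` and `SAMPLE` are made up for illustration. It skips frames the kernel prints with a leading `?`, which the stack unwinder could not confirm as part of the call chain.

```python
import re

# Matches a call-trace line such as:
#   [ 1632.879008]  [<ffffffffa08c7afb>] amdgpu_ttm_bind+0x5b/0x150 [amdgpu]
# A "?" before the symbol marks a frame the unwinder considers unreliable.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def call_chain(oops_text):
    """Return the reliable frames (innermost first) of the Call Trace section."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME.search(line)
        if match is None:
            break  # the first non-frame line ends the trace
        if not match.group("unreliable"):
            frames.append(match.group("sym"))
    return frames

# A few lines copied from the log in this report, used as sample input.
SAMPLE = """\
[ 1632.769794] Call Trace:
[ 1632.824132]  [<ffffffff810a1799>] ? __might_sleep+0x49/0x80
[ 1632.879008]  [<ffffffffa08c7afb>] amdgpu_ttm_bind+0x5b/0x150 [amdgpu]
[ 1632.932452]  [<ffffffffa08df45d>] amdgpu_vm_update_page_directory+0x7d/0x480 [amdgpu]
[ 1632.978788]  [<ffffffff811b019b>] ? krealloc+0x2b/0xa0
[ 1633.072382]  [<ffffffffa08ce70b>] amdgpu_gem_va_update_vm+0x13b/0x180 [amdgpu]
[ 1633.742299] Code: 00 0f 1f 44 00 00
"""

if __name__ == "__main__":
    print(" <- ".join(call_chain(SAMPLE)))
```

Applied to the full log, this should leave only the path from the ioctl entry point down to amdgpu_ttm_bind, which called the faulting function amdgpu_gtt_mgr_alloc.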

[ 1631.286172] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 1631.333419] IP: [<ffffffffa08e3e7a>] amdgpu_gtt_mgr_alloc+0x2a/0x150 [amdgpu]
[ 1631.367823] PGD b9adf067 PUD b8ef5067 PMD 0
[ 1631.402734] Oops: 0000 [#1] SMP
[ 1631.436707] Modules linked in: ppdev parport zram lz4_compress lz4_decompress fuse af_packet k8temp hwmon_vid sr_mod cdrom amdkfd amd_iommu_v2 amdgpu x86_pkg_temp_thermal intel_powerclamp coretemp snd_seq_dummy snd_seq_oss snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_seq_midi snd_seq_midi_event kvm_intel snd_hda_codec_hdmi snd_emu10k1 snd_hda_intel kvm snd_hda_codec snd_rawmidi snd_hda_core snd_ac97_codec ipmi_ssif ac97_bus snd_pcm_oss snd_pcm irqbypass crct10dif_pclmul isci crc32_pclmul crc32c_intel ttm ghash_clmulni_intel snd_util_mem snd_hwdep snd_seq drbg iTCO_wdt iTCO_vendor_support igb ansi_cprng drm_kms_helper libsas aesni_intel snd_seq_device snd_timer ablk_helper cryptd lrw snd_mixer_oss gf128mul mei_me ptp glue_helper drm snd emu10k1_gp usb_storage mei scsi_transport_sas
[ 1631.592832]  aes_x86_64 pps_core joydev md_mod gameport ioatdma backlight soundcore fb_sys_fops pcspkr serio_raw shpchp sysimgblt i2c_i801 sysfillrect syscopyarea lpc_ich dca i2c_algo_bit mfd_core wmi ipmi_si ipmi_msghandler button binfmt_misc sg dm_mod autofs4 ext4 mbcache jbd2 crc16 hid_generic usbhid ehci_pci ehci_hcd usbcore sd_mod usb_common xenbus_probe_frontend reiserfs fan thermal ahci libahci libata scsi_mod [last unloaded: parport_pc]
[ 1631.721604] CPU: 0 PID: 3948 Comm: glretrace Not tainted 4.7.0-2-default+ #1
[ 1631.764334] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2a 08/31/2015
[ 1631.785241] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=10998, last emitted seq=10998
[ 1631.785245] [drm] IP block:tonga_ih is hung!
[ 1631.785518] [drm] Atomic commit: RESET. crtc id 0:[ffff880816a06000]
[ 1631.785540] [drm] Atomic commit: RESET. crtc id 1:[ffff880817700000]
[ 1631.785559] [drm] Atomic commit: RESET. crtc id 2:[ffff88081c3e4000]
[ 1631.785578] [drm] dc_commit_targets: 0 targets
[ 1632.074160] task: ffff88080564ccc0 ti: ffff8808057c0000 task.ti: ffff8808057c0000
[ 1632.119037] RIP: 0010:[<ffffffffa08e3e7a>]  [<ffffffffa08e3e7a>] amdgpu_gtt_mgr_alloc+0x2a/0x150 [amdgpu]
[ 1632.164304] RSP: 0018:ffff8808057c3a10  EFLAGS: 00010282
[ 1632.210314] RAX: ffff880808fe1970 RBX: ffff880818f1f890 RCX: 7fffffffffffffff
[ 1632.256563] RDX: 0000000000000000 RSI: ffff880818f1f858 RDI: ffff880808fe1970
[ 1632.302370] RBP: ffff8808057c3a70 R08: 0000000000000001 R09: ffff8806ccb7b928
[ 1632.348805] R10: ffff880811ebe540 R11: 0000000000000287 R12: 0000000000000000
[ 1632.394486] R13: ffff880818f1f890 R14: ffff880818f1f800 R15: ffff8807bfaf4d80
[ 1632.439637] FS:  00007fed53c56700(0000) GS:ffff88081f200000(0000) knlGS:0000000000000000
[ 1632.485411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1632.531172] CR2: 0000000000000030 CR3: 00000000355cf000 CR4: 00000000001406f0
[ 1632.577066] Stack:
[ 1632.622940]  ffff8806ccb7b000 0000000000000680 0000000000000246 ffff8808057c3a50
[ 1632.669400]  ffffffff810a1799 ffff880808fe8608 ffff880808fe8608 ffff8807de2293c0
[ 1632.719718]  ffff8807de229470 ffff880818f1f890 ffff880818f1f800 ffff8807bfaf4d80
[ 1632.769794] Call Trace:
[ 1632.824132]  [<ffffffff810a1799>] ? __might_sleep+0x49/0x80
[ 1632.879008]  [<ffffffffa08c7afb>] amdgpu_ttm_bind+0x5b/0x150 [amdgpu]
[ 1632.932452]  [<ffffffffa08df45d>] amdgpu_vm_update_page_directory+0x7d/0x480 [amdgpu]
[ 1632.978788]  [<ffffffff811b019b>] ? krealloc+0x2b/0xa0
[ 1633.025524]  [<ffffffffa040ff54>] ? ttm_eu_reserve_buffers+0x184/0x330 [ttm]
[ 1633.072382]  [<ffffffffa08ce70b>] amdgpu_gem_va_update_vm+0x13b/0x180 [amdgpu]
[ 1633.119681]  [<ffffffffa0409c99>] ? ttm_bo_add_to_lru+0x89/0xe0 [ttm]
[ 1633.167081]  [<ffffffffa08cf7af>] amdgpu_gem_va_ioctl+0x1df/0x2a0 [amdgpu]
[ 1633.215410]  [<ffffffff810a1799>] ? __might_sleep+0x49/0x80
[ 1633.262823]  [<ffffffffa050062d>] drm_ioctl+0x25d/0x510 [drm]
[ 1633.310491]  [<ffffffff8122ea93>] ? touch_atime+0x23/0xa0
[ 1633.358466]  [<ffffffffa08cf5d0>] ? amdgpu_gem_metadata_ioctl+0x1f0/0x1f0 [amdgpu]
[ 1633.406696]  [<ffffffffa08b504b>] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
[ 1633.454146]  [<ffffffff81224896>] do_vfs_ioctl+0x96/0x690
[ 1633.501757]  [<ffffffff81003246>] ? do_audit_syscall_entry+0x66/0x70
[ 1633.549670]  [<ffffffff81003729>] ? syscall_trace_enter_phase1+0xf9/0x110
[ 1633.597796]  [<ffffffff81224f09>] SyS_ioctl+0x79/0x90
[ 1633.645903]  [<ffffffff81003a79>] do_syscall_64+0x69/0x110
[ 1633.694445]  [<ffffffff81600925>] entry_SYSCALL64_slow_path+0x25/0x25
[ 1633.742299] Code: 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 cb 48 83 ec 38 4c 8b 21 48 b9 ff ff ff ff ff ff ff 7f 4c 8b 57 30 <49> 39 4c 24 30 74 11 31 c0 48 83 c4 38 5b 41 5c 41 5d 41 5e 41
[ 1633.842789] RIP  [<ffffffffa08e3e7a>] amdgpu_gtt_mgr_alloc+0x2a/0x150 [amdgpu]
[ 1633.894212]  RSP <ffff8808057c3a10>
[ 1633.944739] CR2: 0000000000000030
[ 1633.994974] ---[ end trace d06de6dc7a13ea3e ]---
[ 1634.047545] [drm] dc_link_handle_hpd_rx_irq: Got short pulse HPD on link 0
[ 1634.047549] amdgpu 0000:04:00.0: SRBM_SOFT_RESET=0x00000400
[ 1635.184392] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.184446] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.184501] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.184543] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.184585] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.184623] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.184673] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :4
[ 1635.184717] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :5
[ 1635.184758] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :6
[ 1635.184799] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :7
[ 1635.184838] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :8
[ 1635.184879] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :9
[ 1635.184917] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :10
[ 1635.184955] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR* amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :11
[ 1635.184998] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.185038] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.185078] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.185116] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.185158] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.185195] [drm:log_to_debug_console [amdgpu]] *ERROR* dal_irq_service_dummy_set: called for non-implemented irq source
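
As a cross-check of the oops above: the faulting bytes in the Code line, `<49> 39 4c 24 30`, decode to `cmp %rcx,0x30(%r12)` in AT&T syntax. R12 is zero in the register dump, so the accessed address is 0x30. That matches CR2 and the reported "NULL pointer dereference at 0000000000000030", and suggests a field at offset 0x30 being read through a NULL structure pointer. Which structure that is has not been established here. The Python sketch below is not part of the original report; it only redoes this arithmetic from the log text. The displacement is decoded by hand and hard-coded as an assumption, and the names `parse_registers` and `fault_matches` are illustrative.

```python
import re

# Displacement and base register of the faulting instruction, decoded by hand
# from the bytes "<49> 39 4c 24 30" (cmp %rcx,0x30(%r12)). The sketch assumes
# them; it does not disassemble the Code line itself.
DISPLACEMENT = 0x30
BASE_REGISTER = "R12"

def parse_registers(oops_text):
    """Collect 'NAME: <16 hex digits>' pairs for general registers and CR2."""
    regs = {}
    pattern = r"\b(R[A-Z0-9]{2}|CR2):\s+([0-9a-f]{16})\b"
    for name, value in re.findall(pattern, oops_text):
        regs.setdefault(name, int(value, 16))  # keep the first occurrence
    return regs

def fault_matches(oops_text):
    """True if base register + displacement equals the fault address in CR2."""
    regs = parse_registers(oops_text)
    return regs[BASE_REGISTER] + DISPLACEMENT == regs["CR2"]

# Lines copied from the register dump in this report.
SAMPLE = """\
[ 1632.348805] R10: ffff880811ebe540 R11: 0000000000000287 R12: 0000000000000000
[ 1632.531172] CR2: 0000000000000030 CR3: 00000000355cf000 CR4: 00000000001406f0
[ 1633.944739] CR2: 0000000000000030
"""

if __name__ == "__main__":
    print(fault_matches(SAMPLE))
```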
Comment 1 Daniël Mantione 2017-01-03 20:45:32 UTC
Created attachment 128735 [details]
Xorg log file (no weird things are visible here)
Comment 2 Martin Peres 2019-11-19 08:12:00 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/118.

