Bug 105284

Summary: Every boot I get an error in dmesg "WARNING: CPU: 2 PID: 1380 at drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:132 generic_reg_update_ex+0x108/0x150 [amdgpu]"
Product: DRI Reporter: mikhail.v.gavrilov
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: harry.wentland, lucas.yamanishi
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg none

Description mikhail.v.gavrilov 2018-02-28 03:40:56 UTC
Created attachment 137679 [details]
dmesg

Every boot I get an error in dmesg "WARNING: CPU: 2 PID: 1380 at drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:132 generic_reg_update_ex+0x108/0x150 [amdgpu]"

[   17.285753] WARNING: CPU: 2 PID: 1380 at drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:132 generic_reg_update_ex+0x108/0x150 [amdgpu]
[   17.285786] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc xfs vfat fat libcrc32c intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp btrfs kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul xor zstd_compress ghash_clmulni_intel hid_logitech_hidpp iTCO_wdt ppdev iTCO_vendor_support intel_cstate raid6_pq snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi intel_uncore
[   17.285890]  snd_usb_audio zstd_decompress snd_hda_intel xxhash intel_rapl_perf snd_hda_codec btusb btrtl snd_usbmidi_lib btbcm snd_rawmidi snd_hda_core btintel huawei_cdc_ncm cdc_wdm snd_hwdep bluetooth gspca_zc3xx snd_seq cdc_ncm gspca_main option v4l2_common usb_wwan snd_seq_device pcspkr snd_pcm videodev cdc_ether joydev hid_logitech_dj usbnet snd_timer media snd ecdh_generic rfkill mei_me soundcore mei i2c_i801 lpc_ich shpchp parport_pc parport video binfmt_misc uas usb_storage amdgpu chash i2c_algo_bit gpu_sched drm_kms_helper ttm drm crc32c_intel r8169 mii
[   17.285987] CPU: 2 PID: 1380 Comm: gnome-shell Not tainted 4.16.0-rc1-amd-vega+ #1
[   17.285989] Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F11 08/12/2014
[   17.286032] RIP: 0010:generic_reg_update_ex+0x108/0x150 [amdgpu]
[   17.286035] RSP: 0018:ffffaca7493fb688 EFLAGS: 00010246
[   17.286041] RAX: ffffaca7493fb6a8 RBX: ffffa080ddc78000 RCX: 0000000000000000
[   17.286044] RDX: 0000000000000007 RSI: 0000000000003aa3 RDI: ffffa080df9c9400
[   17.286046] RBP: ffffaca7493fb6f8 R08: 0000000000000000 R09: 0000000000000000
[   17.286048] R10: ffffaca7493fb710 R11: 0000000000000001 R12: 0000000000000007
[   17.286050] R13: 0000000000000000 R14: 0000000000000000 R15: ffffa080ddc7ae90
[   17.286052] FS:  00007f2e73266ac0(0000) GS:ffffa0811da00000(0000) knlGS:0000000000000000
[   17.286055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.286057] CR2: 00007efd5a0c6d64 CR3: 00000007ab80c001 CR4: 00000000001606e0
[   17.286059] Call Trace:
[   17.286110]  dce110_opp_program_regamma_pwl+0x625/0x850 [amdgpu]
[   17.286153]  dce110_set_output_transfer_func+0x47d/0x700 [amdgpu]
[   17.286202]  dce110_program_front_end_for_pipe+0x1f5/0x2e0 [amdgpu]
[   17.286247]  dce110_apply_ctx_for_surface+0xe9/0x1e0 [amdgpu]
[   17.286286]  dc_commit_updates_for_stream+0x32c/0x520 [amdgpu]
[   17.286327]  dc_commit_planes_to_stream+0x36c/0x420 [amdgpu]
[   17.286385]  amdgpu_dm_atomic_commit_tail+0x790/0xdc0 [amdgpu]
[   17.286407]  commit_tail+0x3d/0x70 [drm_kms_helper]
[   17.286414]  drm_atomic_helper_commit+0xdf/0x150 [drm_kms_helper]
[   17.286423]  drm_atomic_helper_legacy_gamma_set+0x112/0x160 [drm_kms_helper]
[   17.286442]  drm_mode_gamma_set_ioctl+0x183/0x1f0 [drm]
[   17.286464]  ? drm_mode_crtc_set_gamma_size+0xa0/0xa0 [drm]
[   17.286474]  drm_ioctl_kernel+0x5b/0xb0 [drm]
[   17.286486]  drm_ioctl+0x2e2/0x380 [drm]
[   17.286498]  ? drm_mode_crtc_set_gamma_size+0xa0/0xa0 [drm]
[   17.286506]  ? __pm_runtime_resume+0x54/0x90
[   17.286513]  ? trace_hardirqs_on_caller+0xed/0x180
[   17.286541]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[   17.286548]  do_vfs_ioctl+0xa5/0x6e0
[   17.286559]  SyS_ioctl+0x74/0x80
[   17.286566]  do_syscall_64+0x7a/0x220
[   17.286572]  entry_SYSCALL_64_after_hwframe+0x26/0x9b
[   17.286575] RIP: 0033:0x7f2e703168e7
[   17.286577] RSP: 002b:00007ffef3c8a8a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   17.286581] RAX: ffffffffffffffda RBX: 00005634248cc070 RCX: 00007f2e703168e7
[   17.286584] RDX: 00007ffef3c8a8e0 RSI: 00000000c02064a5 RDI: 0000000000000008
[   17.286586] RBP: 00007ffef3c8a8e0 R08: 000056342465bd60 R09: 000056342464d340
[   17.286588] R10: 00005634248cc070 R11: 0000000000000246 R12: 00000000c02064a5
[   17.286590] R13: 0000000000000008 R14: 00007f2e4c0aa190 R15: 00007f2e4c0aa130
[   17.286601] Code: 48 8b 40 38 e8 9a 36 76 da 48 8b 75 a8 65 48 33 34 25 28 00 00 00 89 d8 75 18 48 83 c4 50 5b 41 5a 41 5c 41 5d 5d c3 0f ff eb b3 <0f> ff e9 42 ff ff ff e8 9c 58 c1 d9 41 ba 01 00 00 00 44 89 c0 
[   17.286705] ---[ end trace a1d24a964f852ad1 ]---
Comment 1 Harry Wentland 2018-02-28 16:08:48 UTC
Is this on a Vega or Raven ASIC?

If so it's known and a fix should land shortly.
Comment 2 mikhail.v.gavrilov 2018-02-28 17:09:00 UTC
(In reply to Harry Wentland from comment #1)
> Is this on a Vega or Raven ASIC?
> 
> If so it's known and a fix should land shortly.

This is Sapphire Radeon RX VEGA 56
Comment 3 Harry Wentland 2018-03-27 17:28:38 UTC
Can you try the latest amd-staging-drm-next or drm-next-4.17-wip from https://cgit.freedesktop.org/~agd5f/linux? It should be fixed now.
Comment 4 mikhail.v.gavrilov 2018-03-31 13:42:20 UTC
I couldn't try drm-next-4.17-wip because the RX Vega 56 hangs immediately after gdm starts with this kernel: https://bugs.freedesktop.org/show_bug.cgi?id=105833

But I checked this issue on the mainline kernel: https://git.kernel.org/torvalds/t/linux-4.16-rc7.tar.gz and did not see it happen on 4.16-rc7.
Comment 5 mikhail.v.gavrilov 2018-03-31 13:44:58 UTC
But I see that another issue still happens on the mainline kernel:
[43261.036201] amdgpu 0000:07:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[43261.036205] amdgpu 0000:07:00.0: swiotlb: coherent allocation failed, size=2097152

https://bugs.freedesktop.org/show_bug.cgi?id=104082
Comment 6 Harry Wentland 2018-04-24 18:48:52 UTC
Marking resolved as no longer an issue on recent mainline.
Comment 7 Jon 2018-04-25 23:44:49 UTC
(In reply to Harry Wentland from comment #6)
> Marking resolved as no longer an issue on recent mainline.

Which commit fixes this? I merged in agd5f/drm-fixes-4.17 into linus master:
Merge: 3be4aaf4e2d3 7ad35721e7d5

I still see these crashes on every startup, with the following graphics card:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] (rev f1)

regards
Comment 8 Harry Wentland 2018-04-26 13:54:14 UTC
(In reply to Jon from comment #7)
> (In reply to Harry Wentland from comment #6)
> > Marking resolved as no longer an issue on recent mainline.
> 
> Which commit fixes this? I merged in agd5f/drm-fixes-4.17 into linus master:
> Merge: 3be4aaf4e2d3 7ad35721e7d5
> 

I believe it was this:
8acad1a18a78 drm/amd/display: Add regamma lut write mask to SOC base


> I still see these crashes on every startup, with the following graphics card:
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] (rev f1)
> 

Are you seeing a crash or simply the error log described in this ticket?

> regards
Comment 9 Jon 2018-04-26 19:37:52 UTC
(In reply to Harry Wentland from comment #8)
> (In reply to Jon from comment #7)
> > (In reply to Harry Wentland from comment #6)
> > > Marking resolved as no longer an issue on recent mainline.
> > 
> > Which commit fixes this? I merged in agd5f/drm-fixes-4.17 into linus master:
> > Merge: 3be4aaf4e2d3 7ad35721e7d5
> > 
> 
> I believe it was this:
> 8acad1a18a78 drm/amd/display: Add regamma lut write mask to SOC base
> 
> 
> > I still see these crashes on every startup, with the following graphics card:
> > 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> > Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] (rev f1)
> > 
> 
> Are you seeing a crash or simply the error log described in this ticket?
I'm sorry, I think I've been misled by the warning, so I didn't actually go through the stack properly on my recent boot. What I'm seeing now seems to be the same warning line, but it shows a different stack and hence is most likely a different issue. And of course I was wrong to say crash, as nothing stops after those warnings.

The crash I'm trying to debug is something completely different (every time I lock the screen, the machine hangs, or at least the keyboard/screen etc. do). I'm just trying to filter out the other warnings/errors I see to figure out what might be related. Sorry for the disturbance :)
> 
> > regards
Comment 10 Harry Wentland 2018-04-26 19:44:48 UTC
No worries. Don't hesitate to open a new ticket if your warning/error log seems to indicate amdgpu. I'd be happy to take a brief look.

Even if I won't have time to provide an immediate fix it will still allow me to better understand what problems people have with our driver and where we might need to spend more effort.
Comment 11 burak 2018-05-15 17:25:02 UTC
(In reply to Harry Wentland from comment #10)
> No worries. Don't hesitate to open a new ticket if your warning/error log
> seems to indicate amdgpu. I'd be happy to take a brief look.
> 
> Even if I won't have time to provide an immediate fix it will still allow me
> to better understand what problems people have with our driver and where we
> might need to spend more effort.

Hi, sorry to bother you on a closed issue, but I am getting the exact same error on every boot. I built the kernel from the master head of Linus's repo, on Fedora with a Fury X GPU. Is this related to my driver? I have xorg-x11-drv-amdgpu.x86_64 (18.0.1-1.fc27)

 WARNING: CPU: 8 PID: 288 at drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:132 generic_reg_update_ex+0x108/0x150 [amdgpu]
[    7.598040] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched drm_kms_helper hid_logitech_hidpp ttm drm igb ptp crc32c_intel nvme pps_core hid_logitech_dj hid_microsoft nvme_core dca i2c_algo_bit
[    7.598045] CPU: 8 PID: 288 Comm: kworker/8:1 Tainted: G        W         4.17.0-rc5-1-burak+ #4
[    7.598045] Hardware name: System manufacturer System Product Name/PRIME X399-A, BIOS 0601 03/27/2018
[    7.598047] Workqueue: events work_for_cpu_fn
[    7.598073] RIP: 0010:generic_reg_update_ex+0x108/0x150 [amdgpu]
[    7.598073] RSP: 0018:ffffb82e475d3760 EFLAGS: 00010246
[    7.598074] RAX: ffffb82e475d3780 RBX: ffff937e1efdc000 RCX: 0000000000000000
[    7.598074] RDX: 0000000000012390 RSI: 0000000000000000 RDI: ffff937e21691d80
[    7.598075] RBP: ffffb82e475d37d0 R08: 0000000000000000 R09: 0000000000000000
[    7.598075] R10: ffffb82e475d37e8 R11: 0000000000000001 R12: 0000000000000001
[    7.598075] R13: 0000000000000000 R14: ffff937e1f5fc800 R15: ffff937e1e9ac000
[    7.598076] FS:  0000000000000000(0000) GS:ffff937e3f400000(0000) knlGS:0000000000000000
[    7.598077] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.598077] CR2: 00007f830e34465e CR3: 000000041fb9e000 CR4: 00000000003406e0
[    7.598078] Call Trace:
[    7.598108]  dce110_stream_encoder_update_hdmi_info_packets+0x20e/0x3a0 [amdgpu]
[    7.598135]  dc_stream_adjust_vmin_vmax+0xb3/0xf0 [amdgpu]
[    7.598161]  set_freesync_on_streams.part.6+0x4d/0x250 [amdgpu]
[    7.598187]  mod_freesync_notify_mode_change+0x11e/0x150 [amdgpu]
[    7.598215]  amdgpu_dm_atomic_commit_tail+0x523/0xd00 [amdgpu]
[    7.598240]  ? amdgpu_bo_pin_restricted+0x202/0x2c0 [amdgpu]
[    7.598267]  ? dm_plane_helper_prepare_fb+0x19c/0x260 [amdgpu]
[    7.598271]  commit_tail+0x3d/0x70 [drm_kms_helper]
[    7.598274]  drm_atomic_helper_commit+0xfc/0x110 [drm_kms_helper]
[    7.598277]  restore_fbdev_mode_atomic+0x1ac/0x210 [drm_kms_helper]
[    7.598280]  drm_fb_helper_restore_fbdev_mode_unlocked+0x45/0x90 [drm_kms_helper]
[    7.598283]  drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
[    7.598284]  fbcon_init+0x4d7/0x680
[    7.598285]  visual_init+0xd5/0x130
[    7.598286]  do_bind_con_driver+0x1f4/0x400
[    7.598287]  do_take_over_console+0x7b/0x190
[    7.598288]  do_fbcon_takeover+0x58/0xb0
[    7.598289]  notifier_call_chain+0x47/0x70
[    7.598291]  blocking_notifier_call_chain+0x3e/0x60
[    7.598291]  ? down+0x12/0x50
[    7.598292]  register_framebuffer+0x248/0x350
[    7.598296]  __drm_fb_helper_initial_config_and_unlock+0x221/0x460 [drm_kms_helper]
[    7.598320]  amdgpu_fbdev_init+0xc4/0xf0 [amdgpu]
[    7.598344]  amdgpu_device_init+0xd56/0x1450 [amdgpu]
[    7.598345]  ? kmalloc_order+0x14/0x40
[    7.598368]  amdgpu_driver_load_kms+0x86/0x2b0 [amdgpu]
[    7.598374]  drm_dev_register+0x132/0x1c0 [drm]
[    7.598397]  amdgpu_pci_probe+0x13f/0x200 [amdgpu]
[    7.598398]  local_pci_probe+0x42/0xa0
[    7.598400]  work_for_cpu_fn+0x16/0x20
[    7.598401]  process_one_work+0x175/0x360
[    7.598402]  worker_thread+0x1c6/0x380
[    7.598403]  ? process_one_work+0x360/0x360
[    7.598404]  kthread+0x113/0x130
[    7.598405]  ? kthread_create_worker_on_cpu+0x70/0x70
Comment 12 Harry Wentland 2018-05-15 18:58:47 UTC
Hi burak,

your stack is different. See dce110_opp_program_regamma_pwl vs dce110_stream_encoder_update_hdmi_info_packets on the stack.

Can you open a new ticket?

Thanks,
Harry
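
For anyone else landing here with the same WARNING line: the warning site (dm_services.h:132 in generic_reg_update_ex) is shared by many callers, so the first frame under "Call Trace:" is what tells two reports apart. A rough sketch of that comparison in Python follows. The helper names first_reliable_frame and same_warning_site are made up for illustration and are not part of any kernel or DRM tooling, and the regex assumes dmesg lines with the bracketed timestamp format shown in this ticket.

```python
import re

# Matches a call-trace frame as printed in this ticket, for example:
# "[   17.286110]  dce110_opp_program_regamma_pwl+0x625/0x850 [amdgpu]"
# Group 1 is the optional "? " marker (unreliable frame), group 2 the symbol.
FRAME_RE = re.compile(
    r"^\s*\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def first_reliable_frame(dmesg_text):
    """Return the first frame after 'Call Trace:' not marked '?', or None."""
    in_trace = False
    for line in dmesg_text.splitlines():
        if not in_trace:
            if "Call Trace:" in line:
                in_trace = True
            continue
        match = FRAME_RE.match(line)
        if match is None:
            # First line that is not a frame: the trace has ended.
            break
        if match.group(1) is None:
            return match.group(2)
    return None


def same_warning_site(dmesg_a, dmesg_b):
    """True only if both logs have a trace and share the same first frame."""
    frame_a = first_reliable_frame(dmesg_a)
    frame_b = first_reliable_frame(dmesg_b)
    return frame_a is not None and frame_a == frame_b
```

With the two traces in this ticket, the first frames are dce110_opp_program_regamma_pwl and dce110_stream_encoder_update_hdmi_info_packets, so the comparison reports them as different call sites.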
Comment 13 burak 2018-05-16 14:57:32 UTC
(In reply to Harry Wentland from comment #12)
> Hi burak,
> 
> your stack is different. See dce110_opp_program_regamma_pwl vs
> dce110_stream_encoder_update_hdmi_info_packets on the stack.
> 
> Can you open a new ticket?
> 
> Thanks,
> Harry

Thanks Harry, I am opening one now.
