Bug 110140

Summary: Green bottom half of video frame when using JPEG acceleration (vcn_v1_0_jpeg_ring_emit_fence() WARNING)
Product: Mesa
Reporter: Rafał Miłecki <zajec5>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: boyuan.zhang
Version: unspecified
Hardware: Other
OS: All
Attachments: good frame ("Use hardware acceleration when available" disabled)
bad frame ("Use hardware acceleration when available" enabled)
all green frame (started with the commit 1b25d340b791 ("radeonsi: use compute for resource_copy_region when possible"))

Description Rafał Miłecki 2019-03-16 17:42:46 UTC
I use an HP EliteBook 745 G5 with a Ryzen 5 PRO 2500U.

When using Chromium with an HTML5-based video chat:
https://meet.jit.si/amdtest
I see the image from my webcam corrupted (the bottom half is all green).

It only happens when Chromium's "Use hardware acceleration when available" setting is enabled, which appears to involve JPEG hardware acceleration.

With the above web chat page open, I see about 10 kernel WARNINGs per second:

[  290.169611] WARNING: CPU: 0 PID: 374 at drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1669 vcn_v1_0_jpeg_ring_emit_fence+0xc2f/0xc40 [amdgpu]
[  290.169615] Modules linked in: ccm(E) fuse(E) rfcomm(E) af_packet(E) bnep(E) hid_logitech_hidpp(E) btusb(E) btrtl(E) btbcm(E) btintel(E) hid_logitech_dj(E) hid_generic(E) bluetooth(E) cp210x(E) usbserial(E) ecdh_generic(E) usbhid(E) uvcvideo(E) videobuf2_vmalloc(E) videobuf2_memops(E) videobuf2_v4l2(E) videodev(E) videobuf2_common(E) nf_nat_tftp(E) nf_conntrack_tftp(E) xt_CT(E) xt_tcpudp(E) ip6t_rpfilter(E) ip6t_REJECT(E) nf_reject_ipv6(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_conntrack(E) ebtable_nat(E) ip6table_nat(E) nf_nat_ipv6(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) ip_set(E) nfnetlink(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) bpfilter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_codec_hdmi(E) snd_hda_intel(E)
[  290.169671]  snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) arc4(E) edac_mce_amd(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) msr(E) iwlmvm(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) mac80211(E) iwlwifi(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) realtek(E) cryptd(E) glue_helper(E) hp_wmi(E) joydev(E) pcspkr(E) sparse_keymap(E) wmi_bmof(E) sp5100_tco(E) cfg80211(E) k10temp(E) i2c_piix4(E) ipmi_devintf(E) r8169(E) rfkill(E) ipmi_msghandler(E) libphy(E) ucsi_acpi(E) typec_ucsi(E) thermal(E) typec(E) battery(E) hp_wireless(E) pinctrl_amd(E) ac(E) button(E) pcc_cpufreq(E) acpi_cpufreq(E) amdgpu(E) i2c_algo_bit(E) gpu_sched(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) serio_raw(E) xhci_pci(E) ehci_pci(E) xhci_hcd(E) drm(E) ehci_hcd(E) usbcore(E) wmi(E) video(E) i2c_hid(E) l2tp_ppp(E) l2tp_netlink(E) l2tp_core(E) ip6_udp_tunnel(E) udp_tunnel(E) pppox(E) ppp_generic(E)
[  290.169730]  slhc(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) efivarfs(E)
[  290.169745] CPU: 0 PID: 374 Comm: vcn_jpeg Tainted: G        W   E     5.0.0-rc1+ #6
[  290.169747] Hardware name: HP HP EliteBook 745 G5/83D5, BIOS Q81 Ver. 01.03.01 07/26/2018
[  290.169822] RIP: 0010:vcn_v1_0_jpeg_ring_emit_fence+0xc2f/0xc40 [amdgpu]
[  290.169825] Code: c0 e8 15 74 db ff 48 8b 83 38 02 00 00 e9 ea f4 ff ff 48 c7 c7 20 0f 67 c0 e8 fd 73 db ff 48 8b 83 38 02 00 00 e9 6e f4 ff ff <0f> 0b e9 ee f3 ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[  290.169827] RSP: 0018:ffffab50022fbdc8 EFLAGS: 00010202
[  290.169830] RAX: ffffffffc04f6500 RBX: ffff8fba5655c5a8 RCX: 0000000000000001
[  290.169831] RDX: 00000000000007f9 RSI: 000000000076e040 RDI: ffff8fba5655c5a8
[  290.169833] RBP: ffff8fba56550000 R08: ffffab5001c1d000 R09: ffffab5001c1d000
[  290.169834] R10: ffffab5001c1d000 R11: ffffab5001c1d000 R12: 0000000000000000
[  290.169836] R13: 000000000076e040 R14: 00000000000007f9 R15: ffff8fba5b43ea10
[  290.169839] FS:  0000000000000000(0000) GS:ffff8fba5fc00000(0000) knlGS:0000000000000000
[  290.169840] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  290.169842] CR2: 00007f8cc86e0008 CR3: 00000003f9e76000 CR4: 00000000003406f0
[  290.169844] Call Trace:
[  290.169920]  amdgpu_ib_schedule+0x29d/0x560 [amdgpu]
[  290.169996]  amdgpu_job_run+0xfd/0x170 [amdgpu]
[  290.170004]  drm_sched_main+0xdf/0x250 [gpu_sched]
[  290.170014]  ? wait_woken+0x80/0x80
[  290.170019]  ? drm_sched_stop+0x130/0x130 [gpu_sched]
[  290.170023]  kthread+0x116/0x130
[  290.170027]  ? kthread_create_worker_on_cpu+0x40/0x40
[  290.170034]  ret_from_fork+0x27/0x50
[  290.170040] ---[ end trace ffe1a144a94cb37d ]---

This problem occurs with kernels:
4.20.12
5.0.0
5.0.0-rc1 from agd5f's amd-staging-drm-next (2019-03-16)

I use Mesa 18.3.4.
Comment 1 Rafał Miłecki 2019-03-16 17:48:30 UTC
Created attachment 143694
good frame ("Use hardware acceleration when available" disabled)
Comment 2 Rafał Miłecki 2019-03-16 17:48:48 UTC
Created attachment 143695
bad frame ("Use hardware acceleration when available" enabled)
Comment 3 Rafał Miłecki 2019-03-16 23:20:14 UTC
This is a regression introduced by commit 36258308a794 ("st/va: fix the incorrect max profiles report").

Previously I was using Mesa 18.2 for a month or two without ever seeing this issue.
Comment 4 Rafał Miłecki 2019-03-16 23:51:13 UTC
Another problem caused by the above commit 36258308a794 is that it reintroduces corrupted colors in YouTube playback when using Chromium with "Use hardware acceleration when available" enabled. I originally reported that in bug 109080.

Playing a YouTube video using kernel 5.0.1 and Mesa 18.3.4 results in corrupted colors and the following errors:

[ 6376.312717] gmc_v9_0_process_interrupt: 38 callbacks suppressed
[ 6376.312726] amdgpu 0000:04:00.0: [mmhub] VMC page fault (src_id:0 ring:40 vmid:3 pasid:32775, for process chromium pid 21146 thread chromium -:cs0 pid 21208)
[ 6376.312730] amdgpu 0000:04:00.0:   in page starting at address 0x0000800106198000 from 18
[ 6376.312733] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00340451
[ 6376.312746] amdgpu 0000:04:00.0: [mmhub] VMC page fault (src_id:0 ring:40 vmid:3 pasid:32775, for process chromium pid 21146 thread chromium -:cs0 pid 21208)
[ 6376.312748] amdgpu 0000:04:00.0:   in page starting at address 0x0000800106199000 from 18
[ 6376.312750] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 6376.312762] amdgpu 0000:04:00.0: [mmhub] VMC page fault (src_id:0 ring:174 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[ 6376.312764] amdgpu 0000:04:00.0:   in page starting at address 0x0000000000000000 from 18
[ 6376.312766] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0000073C
[ 6376.312781] amdgpu 0000:04:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[ 6376.312783] amdgpu 0000:04:00.0:   in page starting at address 0x0000000000000000 from 18
[ 6376.312785] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 6386.301258] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=192, emitted seq=194
[ 6386.301398] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[ 6386.301414] [drm] GPU recovery disabled.
[ 6396.541548] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=192, emitted seq=194
[ 6396.541636] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[ 6396.541639] [drm] GPU recovery disabled.
[ 6406.782141] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=192, emitted seq=194
[ 6406.782241] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[ 6406.782244] [drm] GPU recovery disabled.
[ 6417.022165] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=192, emitted seq=194
[ 6417.022268] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[ 6417.022271] [drm] GPU recovery disabled.

My screen freezes shortly after that.

I've verified that reverting 36258308a794 on top of the 18.3.4 release fixes both issues.
Comment 5 leoxsliu 2019-03-17 02:24:57 UTC
This is what the patch does:
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -175,7 +175,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
    ctx->version_minor = 1;
    *ctx->vtable = vtable;
    *ctx->vtable_vpp = vtable_vpp;
-   ctx->max_profiles = PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH - PIPE_VIDEO_PROFILE_UNKNOWN;
+   ctx->max_profiles = PIPE_VIDEO_PROFILE_MAX - PIPE_VIDEO_PROFILE_UNKNOWN - 1;

It just corrects max_profiles, the number of profiles that the HW can support.

Based on the ticket, it doesn't make sense to me that this would cause such an issue, unless the player misuses this value after querying it.

Can this be reproduced with the Mesa master branch, or with any other players?
Comment 6 leoxsliu 2019-03-17 02:42:04 UTC
Another point of confusion is that the patch is a fix for this Chromium browser issue:
https://bugs.freedesktop.org/show_bug.cgi?id=109107
Comment 7 Rafał Miłecki 2019-03-17 22:16:30 UTC
Thanks for looking at this Leo!

Initially I didn't realize the consequences of your patch. I hope I understand them now.

Before the commit 36258308a794 ("st/va: fix the incorrect max profiles report") my Chromium was simply NOT USING any Video Acceleration. Chromium was complaining with the following errors:
[21202:21202:0317/122002.223002:ERROR:vaapi_wrapper.cc(587)] : vaQueryConfigProfiles returned: 14
[21202:21202:0317/122002.223045:ERROR:vaapi_wrapper.cc(587)] : vaQueryConfigProfiles returned: 14

So your change didn't introduce any regression in JPEG hardware decoding. It only exposed an existing bug by allowing Chromium to use Video Acceleration.

*****

Let me provide some new (hopefully useful) info on this problem:

1) This problem is clearly related to JPEG decoding. If I revert commit 55e7de7b1935 ("radeonsi: enable vcn jpeg decode for raven") from any recent branch (18.3, 19.0 or master), the problem disappears.

2) The problem has existed from the beginning. If I run:
git checkout 55e7de7b1935
git cherry-pick 36258308a794
git cherry-pick dafa02c980c1
I get Chromium with working Video Acceleration AND the bottom half of webcam frames being green.

3) I tested a few branches:
18.3: bottom half of webcam frames are green
19.0: whole webcam frames are green
master: whole webcam frames are green

As you can see, things got worse with the 19.0 branch (frames are entirely green instead of half green). That regression was introduced by commit 1b25d340b791 ("radeonsi: use compute for resource_copy_region when possible"). Hopefully this gives some hint about what's wrong and what's causing the original problem.

*****

Leo: I haven't tried any other players; I've yet to learn how to use JPEG hw decoding with another application/player.
Comment 8 Rafał Miłecki 2019-03-17 22:19:00 UTC
Created attachment 143705
all green frame (started with the commit 1b25d340b791 ("radeonsi: use compute for resource_copy_region when possible"))

Webcam frames became all green with the commit 1b25d340b791 ("radeonsi: use compute for resource_copy_region when possible"):
https://cgit.freedesktop.org/mesa/mesa/commit/?id=1b25d340b791ad8350bdfb27f1a91ac79fa17748
Comment 9 leoxsliu 2019-03-18 13:12:15 UTC
It makes more sense now; we'll take a look. The catch is that we don't have a Raven system with a camera, so it will probably be hard to reproduce.
Comment 10 GitLab Migration User 2019-09-25 18:49:14 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1383.
