Created attachment 106051: kernel panic messages

Hi, I am trying to use GPU hardware acceleration (VDPAU + UVD) to play video with mplayer. The machine architecture is sparc64, the video card is a Radeon HD 7450, the OS is Red Hat 7, the kernel version is 3.10.0, the mplayer version is 1.1-21 and the Mesa version is 9.2.5-6. Everything works on x86 platforms with exactly the same versions.

The kernel panic occurs when radeon handles radeon_cs_ioctl to receive the command stream from the libdrm interface; radeon_cs_ib_chunk() then calls radeon_uvd_cs_parse to parse the commands. At first I thought the error might occur at a fixed position, perhaps because the ioctl data or size mismatched between user space and kernel space. I wrote 1 to /sys/module/drm/para... to enable DRM debug output, and the captured messages show the panic happening at varying points: sometimes after RADEON_GEM_WAIT_IDLE, sometimes after RADEON_GEM_CREATE, and so forth. So it is not an error in the data stream passed through the ioctl. I also checked the changelog from 3.10 to 3.15 and found no patch related to this.

Also, if I run rpm -e mesa-vdpau-drivers to uninstall the Mesa VDPAU driver, mplayer plays video stably using the CPU. When I install this package again, the blue screen appears, and I must run mplayer -vo x11 to avoid it. I do not clearly understand the framework of mesa-vdpau-drivers and UVD. Can anyone help me? Thanks.

The error messages are below:
pid=3037, dev=0xe200, auth=1, RADEON_GEM_WAIT_IDLE
Oct 8 21:43:26 localhost kernel: [17318.899669] pid=4483, dev=0xe200, auth=1, RADEON_GEM_WAIT_IDLE
Oct 8 21:43:26 localhost kernel: [17318.902069] Unable to handle kernel paging request at virtual address 000000ca14312000
Oct 8 21:43:26 localhost kernel: [17318.902090] tsk->{mm,active_mm}->context = 0000000000000a2b
Oct 8 21:43:26 localhost kernel: [17318.902104] tsk->{mm,active_mm}->pgd = fffff80008f90000
Oct 8 21:43:26 localhost kernel: [17318.902119] \|/ ____ \|/
Oct 8 21:43:26 localhost kernel: [17318.902119] "@'/ .. \`@"
Oct 8 21:43:26 localhost kernel: [17318.902119] /_| \__/ |_\
Oct 8 21:43:26 localhost kernel: [17318.902119] \__U_/
Oct 8 21:43:26 localhost kernel: [17318.902169] mplayer(4483): Oops [#1]
Oct 8 21:43:26 localhost kernel: [17318.902186] CPU: 28 PID: 4483 Comm: mplayer Not tainted 3.10.0-54.0.1.el7.4ACL.sparc64 #1
Oct 8 21:43:26 localhost kernel: [17318.902201] task: fffff800faa2a200 ti: fffff800fbec4000 task.ti: fffff800fbec4000
Oct 8 21:43:26 localhost kernel: [17318.902220] TSTATE: 0000004411001607 TPC: 00000000007f573c TNPC: 00000000007f5740 Y: 00000000 Not tainted
Oct 8 21:43:26 localhost kernel: [17318.902252] TPC: <radeon_uvd_cs_parse+0x414/0x7b4>
Oct 8 21:43:26 localhost kernel: [17318.902267] g0: 0000000000083a15 g1: 000000ca14312000 g2: 000000ca14312000 g3: 0000000004314000
Oct 8 21:43:26 localhost kernel: [17318.902285] g4: fffff800faa2a200 g5: fffff800fc2fe000 g6: fffff800fbec4000 g7: 0000000010000000
Oct 8 21:43:26 localhost kernel: [17318.902301] o0: 0000000000000000 o1: fffff800fbec7860 o2: 0000000000000000 o3: 0000000004314000
Oct 8 21:43:26 localhost kernel: [17318.902317] o4: 0000000000000800 o5: fffff800f62d4840 sp: fffff800fbec6f91 ret_pc: 00000000007f56f4
Oct 8 21:43:26 localhost kernel: [17318.902336] RPC: <radeon_uvd_cs_parse+0x3cc/0x7b4>
Oct 8 21:43:26 localhost kernel: [17318.902352] l0: fffff80008dbbbc0 l1: 0000000000a213c8 l2: 0000000000000003 l3: 0000000000000001
Oct 8 21:43:26 localhost kernel: [17318.902366] l4: 000000000000ef10 l5: 0000000000d7fb10 l6: 0000000000d7fae8 l7: 0000000000d7fac8
Oct 8 21:43:26 localhost kernel: [17318.902382] i0: fffff800fbec7948 i1: 0000000000000000 i2: 000000000000ec00 i3: fffff800e30b9000
Oct 8 21:43:26 localhost kernel: [17318.902398] i4: 000000ca14312000 i5: 0000000000000000 i6: fffff800fbec7091 i7: 00000000007a1e5c
Oct 8 21:43:26 localhost kernel: [17318.902429] I7: <radeon_cs_ioctl+0x394/0x930>
Oct 8 21:43:26 localhost kernel: [17318.902439] Call Trace:
Oct 8 21:43:26 localhost kernel: [17318.902458] [00000000007a1e5c] radeon_cs_ioctl+0x394/0x930
Oct 8 21:43:26 localhost kernel: [17318.902486] [000000000073d1ac] drm_ioctl+0x2fc/0x420
Oct 8 21:43:26 localhost kernel: [17318.902516] [0000000000518c94] vfs_ioctl+0x24/0x40
Oct 8 21:43:26 localhost kernel: [17318.902537] [0000000000519518] do_vfs_ioctl+0x440/0x4e0
Oct 8 21:43:26 localhost kernel: [17318.902560] [00000000005195e4] SyS_ioctl+0x2c/0x50
Oct 8 21:43:26 localhost kernel: [17318.902588] [00000000004061b4] linux_sparc_syscall+0x34/0x44
Oct 8 21:43:26 localhost kernel: [17318.902607] Disabling lock debugging due to kernel taint
Oct 8 21:43:26 localhost kernel: [17318.902633] Caller[00000000007a1e5c]: radeon_cs_ioctl+0x394/0x930
Oct 8 21:43:26 localhost kernel: [17318.902657] Caller[000000000073d1ac]: drm_ioctl+0x2fc/0x420
Oct 8 21:43:26 localhost kernel: [17318.902681] Caller[0000000000518c94]: vfs_ioctl+0x24/0x40
Oct 8 21:43:26 localhost kernel: [17318.902702] Caller[0000000000519518]: do_vfs_ioctl+0x440/0x4e0
Oct 8 21:43:26 localhost kernel: [17318.902721] Caller[00000000005195e4]: SyS_ioctl+0x2c/0x50
Oct 8 21:43:26 localhost kernel: [17318.902742] Caller[00000000004061b4]: linux_sparc_syscall+0x34/0x44
Oct 8 21:43:26 localhost kernel: [17318.902757] Caller[fffff801086074ac]: 0xfffff801086074ac
Oct 8 21:43:26 localhost kernel: [17318.902765] Instruction DUMP: 913b6000 c25fa7cf b800401c <e0072008> 80a42000 12480007 fa072004 11002884 130035fe
Oct 8 21:43:30 localhost kernel: [17323.141686] pid=3037, dev=0xe200, auth=1, RADEON_GEM_CREATE
(In reply to comment #1) > Hi, I try to using gpu hardware accelerate to play video from mplayer > (vdpau+uvd) . The machine architecture is sparc64,video card is radeon > HD7450,OS version is redhat7,kernel version is 3.10.0, mplayer version is > 1.1-21,mesa version is 9.2.5-6. Please attach the output of dmesg (showing at least all radeon driver related initialization), the /var/log/Xorg.0.log file and the output of vdpauinfo. Can you try newer versions of the kernel and Mesa? P.S. AFAICT the 7450 is Northern Islands generation (Caicos) based, not Southern Islands based, otherwise I'd be very surprised you even got this far, given bug 82455. :)
Created attachment 106479: xorg log
Created attachment 106480: vdpau log
Created attachment 106481: radeon init log
Created attachment 106482: error log
(In reply to comment #1) > (In reply to comment #1) > > Hi, I try to using gpu hardware accelerate to play video from mplayer > > (vdpau+uvd) . The machine architecture is sparc64,video card is radeon > > HD7450,OS version is redhat7,kernel version is 3.10.0, mplayer version is > > 1.1-21,mesa version is 9.2.5-6. > > Please attach the output of dmesg (showing at least all radeon driver > related initialization), the /var/log/Xorg.0.log file and the output of > vdpauinfo. > > Can you try newer versions of the kernel and Mesa? > > P.S. AFAICT the 7450 is Northern Islands generation (Caicos) based, not > Southern Islands based, otherwise I'd be very surprised you even got this > far, given bug 82455. :)
Hi, the attachments are the logs of Xorg, radeon initialization and vdpauinfo; the output seems to be OK.
I traced the code and found that it executes the radeon_uvd_cs_reloc function to get the start address of the relocs in GPU RAM:
reloc = p->relocs_ptr[idx/4] (idx is 0)
start = reloc->lobj.gpu_offset
The start address is start: 0x182c000. The radeon_uvd_cs_msg function remaps the GPU address to a CPU virtual address using radeon_bo_kmap(bo, &ptr) -> ttm_bo_ioremap, and the returned CPU virtual address is:
[zd radeon_uvd_cs_msg] ptr:000000ca1182c000, msg:000000ca1182c000
This is the address the kernel is unable to handle; see the error log.
I tried changing PAGE_SHIFT to 12 (the GPU page shift) in ttm_bo_ioremap (the CPU page size is 8K, the GPU page size is 4K), but it had no effect. I think the BO stored in GPU RAM has the wrong gpu_offset, but I don't know the reason. The only thing that could affect this is the remapping, given the different CPU and GPU page sizes. Can you give me any ideas about this, or is there some way to verify it? Thanks.
By the way, where can I download documentation about TTM and GEM? I am puzzled about the principles of GPU memory management.
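The numbers in this trace can be checked with a little arithmetic. The sketch below uses hypothetical helper names (page_index, page_offset, mapping_base; none of them are radeon or TTM functions) to split gpu_offset 0x182c000 into a page index and in-page offset for 4K pages (shift 12) and 8K pages (shift 13), and to subtract it from the kmap'd pointer 0x000000ca1182c000. Under these assumptions the offset is aligned to both page sizes (page 0x182c with 4K pages, page 0xc16 with 8K pages, offset 0 in both cases), and the pointer minus the offset is 0xca10000000, so the offset appears to be carried into the CPU address unchanged. For this particular offset the 4K/8K difference therefore does not move the start of the buffer, which fits the observation that changing PAGE_SHIFT had no effect.

```c
#include <assert.h>
#include <stdint.h>

/* Page number containing byte offset `off` for pages of size 1 << shift.
 * Hypothetical helper for illustration only. */
static inline uint64_t page_index(uint64_t off, unsigned shift)
{
    assert(shift < 64);
    return off >> shift;
}

/* Byte offset of `off` within its page for pages of size 1 << shift. */
static inline uint64_t page_offset(uint64_t off, unsigned shift)
{
    assert(shift < 64);
    return off & ((UINT64_C(1) << shift) - 1);
}

/* Base of the CPU mapping, assuming (as the trace suggests) that the
 * BO offset is added linearly to a fixed base address. */
static inline uint64_t mapping_base(uint64_t cpu_addr, uint64_t gpu_offset)
{
    return cpu_addr - gpu_offset;
}
```

For example, page_index(0x182c000, 13) yields 0xc16 and mapping_base(0xca1182c000, 0x182c000) yields 0xca10000000.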
Looks like maybe ttm_bo_ioremap() doesn't work correctly. Which path does it take?
(In reply to comment #7) > Looks like maybe ttm_bo_ioremap() doesn't work correctly. Which path does it > take?
I'm not sure what you mean by which path it takes. I get start = reloc->lobj.gpu_offset, and start is 0x182c000. This is a GPU address, with base address 0x0. After ttm_bo_ioremap the CPU virtual address is 0x000000ca1182c000; judging from radeon.log, the address mapping seems good.
By the way, when I run ring_ib_test on ring 5, it also fails with an unhandled request address. Next I will trace ring_ib_test, because it does not depend on libdrm; it only constructs a message and sends it.
I solved that problem by using readl()/writel() to access the CPU virtual address instead of dereferencing it directly, e.g. readl(&msg[1]) instead of msg_type = msg[1], together with little/big-endian conversion. But a new error occurred (error message): radeon: GPU lockup CP stall for ... more than ... I tried making the lockup timeout longer, but it didn't help. I don't know why.
Created attachment 106571: error message
The mplayer UI appears with no output, and then the kernel panics.
(In reply to comment #9) > radeon:GPU lockup cp stall for ... more than... That means the GPU hangs while processing commands, probably because the commands are malformed in some way (e.g. wrong byte order). Please attach the exact changes you've made to get to that point.
Created attachment 106731: solve the problem of unhandled request address
This patch can solve the address problem.
(In reply to comment #13) > This patch can solve the address problem. writel() and readl() already convert to/from little endian. Does it work better if you remove cpu_to_le32() from all lines using those functions? > @@ -751,22 +748,12 @@ > } > > /* stitch together an UVD destroy msg */ > -#if 0 > - msg[0] = cpu_to_le32(0x00000de4); > - msg[1] = cpu_to_le32(0x00000002); > - msg[2] = cpu_to_le32(handle); > - msg[3] = cpu_to_le32(0x00000000); > - for (i = 4; i < 1024; ++i) > - msg[i] = cpu_to_le32(0x0); > -#endif > -#if 1 > writel(cpu_to_le32(0x00000de4),&msg[1]); > writel(cpu_to_le32(0x00000002),&msg[2]); > writel(cpu_to_le32(handle),&msg[3]); > writel(cpu_to_le32(0x00000000),&msg[4]); > for (i = 4; i < 1024; ++i) > writel(cpu_to_le32(0x0),&msg[i]); > -#endif > radeon_bo_kunmap(bo); > radeon_bo_unreserve(bo); Why are you enabling this code?
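For reference, the destroy message in the original (#if 0) code consists of four header dwords (0x00000de4, 0x00000002, the session handle, 0) in msg[0] through msg[3], followed by zeros up to msg[1023]. Note that the replacement hunk writes the header to &msg[1] through &msg[4] instead, so msg[0] is never written and every header field lands one dword later than in the original. The userspace sketch below uses hypothetical helpers (put_le32, build_destroy_msg) on a plain byte buffer rather than a mapped BO, and lays the message out in little-endian byte order. In the kernel that layout is what writel() produces when given the plain host-order values, i.e. without an extra cpu_to_le32().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UVD_MSG_DWORDS 1024                 /* loop bound in the quoted code */
#define UVD_MSG_BYTES  (UVD_MSG_DWORDS * 4)

/* Store a 32-bit value as little-endian bytes, independent of host order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Hypothetical helper: fill a UVD_MSG_BYTES buffer with the destroy message
 * from the original code (header in dwords 0..3, the rest zero). */
static void build_destroy_msg(uint8_t *buf, uint32_t handle)
{
    assert(buf != NULL);
    memset(buf, 0, UVD_MSG_BYTES);          /* dwords 4..1023 stay zero */
    put_le32(buf + 0 * 4, 0x00000de4u);
    put_le32(buf + 1 * 4, 0x00000002u);     /* message type (destroy) */
    put_le32(buf + 2 * 4, handle);
    put_le32(buf + 3 * 4, 0x00000000u);
}
```

On any host, the buffer then starts with the bytes e4 0d 00 00 followed by 02 00 00 00.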
(In reply to comment #14) > (In reply to comment #13) > > This patch can solve the address problem. > > writel() and readl() already convert to/from little endian. Does it work > better if you remove cpu_to_le32() from all lines using those functions? > > > > @@ -751,22 +748,12 @@ > > } > > > > /* stitch together an UVD destroy msg */ > > -#if 0 > > - msg[0] = cpu_to_le32(0x00000de4); > > - msg[1] = cpu_to_le32(0x00000002); > > - msg[2] = cpu_to_le32(handle); > > - msg[3] = cpu_to_le32(0x00000000); > > - for (i = 4; i < 1024; ++i) > > - msg[i] = cpu_to_le32(0x0); > > -#endif > > -#if 1 > > writel(cpu_to_le32(0x00000de4),&msg[1]); > > writel(cpu_to_le32(0x00000002),&msg[2]); > > writel(cpu_to_le32(handle),&msg[3]); > > writel(cpu_to_le32(0x00000000),&msg[4]); > > for (i = 4; i < 1024; ++i) > > writel(cpu_to_le32(0x0),&msg[i]); > > -#endif > > radeon_bo_kunmap(bo); > > radeon_bo_unreserve(bo); > > Why are you enabling this code?
1. At first I did not add cpu_to_le32(), but when I traced the messages with printk, the value of msg_type was byte-reversed. With it added, everything is OK.
2. #if 0 ... #endif is the original kernel code; #if 1 ... #endif is the changed code. The reason is the same as in 1: the cpu_to_le32() conversion is required.
By the way, you said writel() and readl() already convert to/from little endian. Is that based on the x86 implementation?
(In reply to comment #15) > 1. I do not add cpu_to_le32() first ,but when I trace the messages from > printk,the value of msg_type is reversed. That's in radeon_uvd_cs_msg()? Sounds like the Mesa UVD code writes the messages in host byte order, not in little endian. Maybe Christian can clarify which byte order should be used for them. > 2. #if 0 ...#endif is the original code from kernel. #if 1 ...#endif is > changed code. Ah right, never mind, I misread that hunk before. > By the way, u said writel() and readl() already convert to/from little > endian. is based on the X86 arch implement? It's the same on all architectures: writel() takes a datum in host byte order and writes it in little endian. readl() reads a little endian datum and returns it in host byte order. (This means that on little endian hosts such as x86, the datum is transferred unchanged)
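The writel()/readl() rule described above can be modelled in ordinary userspace C. The functions below (model_writel, model_readl, model_cpu_to_le32, swap32, host_is_big_endian) are hypothetical stand-ins that operate on a plain byte array, not the kernel's implementations, which operate on __iomem mappings. They show that writel() always produces little-endian bytes from a host-order value, and that wrapping its argument in cpu_to_le32() swaps the value twice on a big-endian host such as sparc64, while on x86 the extra conversion has no visible effect.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero when the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
    const uint32_t one = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &one, 1);
    return first_byte == 0;
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Model of cpu_to_le32(): swaps on big-endian hosts, no-op on little-endian. */
static uint32_t model_cpu_to_le32(uint32_t v)
{
    return host_is_big_endian() ? swap32(v) : v;
}

/* Model of writel(): value in host order, stored as little-endian bytes. */
static void model_writel(uint32_t v, uint8_t *addr)
{
    assert(addr != NULL);
    addr[0] = (uint8_t)(v & 0xffu);
    addr[1] = (uint8_t)((v >> 8) & 0xffu);
    addr[2] = (uint8_t)((v >> 16) & 0xffu);
    addr[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Model of readl(): little-endian bytes in, host-order value out. */
static uint32_t model_readl(const uint8_t *addr)
{
    return (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
           ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
}
```

With this model, model_writel(model_cpu_to_le32(0x00000de4), reg) leaves the bytes 00 00 0d e4 in reg on a big-endian host, i.e. the device would see the value byte-reversed, whereas plain model_writel(0x00000de4, reg) gives e4 0d 00 00 on every host.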
(In reply to comment #16) > (In reply to comment #15) > > 1. I do not add cpu_to_le32() first ,but when I trace the messages from > > printk,the value of msg_type is reversed. > > That's in radeon_uvd_cs_msg()? Sounds like the Mesa UVD code writes the > messages in host byte order, not in little endian. Maybe Christian can > clarify which byte order should be used for them.
The hardware supports byte swapping for the message and feedback buffer, but I think always writing/reading it in little endian will be simpler to get working. The userspace code currently doesn't support big endian hosts and so will probably write it in the wrong byte order.
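The suggestion above amounts to having the userspace driver store every message dword in little endian, whatever the host byte order. The sketch below uses hypothetical helper names (store_host_be32, store_le32, load_le32; this is not Mesa code) to contrast a native store on a big-endian host with an explicit little-endian store, as seen by a reader that expects little endian (assumed here to be the kernel's message parser and the UVD block). A message type of 2 written in host order on a big-endian host reads back as 0x02000000, which matches the byte-reversed msg_type reported in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* What a plain 32-bit store produces on a big-endian host such as sparc64:
 * most significant byte first. */
static void store_host_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)((v >> 24) & 0xffu);
    p[1] = (uint8_t)((v >> 16) & 0xffu);
    p[2] = (uint8_t)((v >> 8) & 0xffu);
    p[3] = (uint8_t)(v & 0xffu);
}

/* Explicit little-endian store: the same bytes on every host. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* How a consumer that expects little endian interprets four bytes. */
static uint32_t load_le32(const uint8_t *p)
{
    assert(p != 0);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Using explicit byte-wise stores like store_le32() keeps the buffer layout identical on x86 and sparc64, so no kernel-side swapping would be needed for messages written this way.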
(In reply to comment #17) > (In reply to comment #16) > > (In reply to comment #15) > > > 1. I do not add cpu_to_le32() first ,but when I trace the messages from > > > printk,the value of msg_type is reversed. > > > > That's in radeon_uvd_cs_msg()? Sounds like the Mesa UVD code writes the > > messages in host byte order, not in little endian. Maybe Christian can > > clarify which byte order should be used for them. > > The hardware supports byte swapping for the message and feedback buffer, but > I think always writing/reading it in little endian will be simpler to get > working. > > The userspace code currently doesn't supports big endian hosts and so will > probably write it in the wrong byte order.
I think you are right. The patch I posted doesn't address the root cause, so I have some work to do in userspace, such as Mesa. A huge project! By the way, which version of Mesa adds big-endian support, or do you have any plans to do it?
(In reply to comment #18) > > So I have some work to do on userspace, such as mesa. A huge project! Actually, I don't think the relevant code for UVD support is that big. > By the way, which version of mesa add big endian support.or you have some > plans to do it. As Christian said, the UVD code currently doesn't handle big endian hosts properly yet. I don't know of any plans to fix that either, so don't wait for us to do it. :)
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/526.