Bug 83708

Summary:

[vdpau,uvd] kernel oops, Unable to handle kernel paging request at virtual address

Product:

DRI

Reporter:

Jack <zduo006>

Component:

DRM/Radeon

Assignee:

Default DRI bug account <dri-devel>

Status:

RESOLVED MOVED

QA Contact:

Severity:

normal

Priority:

medium

Version:

unspecified

Hardware:

SPARC

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
kernel panic messages	none
xorg log	none
vdpau log	none
radeon init log	none
error log	none
error message	none
solve the problem of unhandle request address	none

Description Jack 2014-09-10 11:02:53 UTC

Created attachment 106051
kernel panic messages

Hi, I am trying to use GPU hardware acceleration (VDPAU + UVD) to play video with mplayer. The machine architecture is sparc64, the video card is a Radeon HD 7450, the OS is Red Hat 7, the kernel version is 3.10.0, the mplayer version is 1.1-21 and the Mesa version is 9.2.5-6. Everything works on x86 platforms with exactly the same versions.
The kernel panic occurs when radeon calls radeon_cs_ioctl() to receive the command stream from the libdrm interface; radeon_cs_ib_chunk() then calls radeon_uvd_cs_parse() to parse the commands.
At first I thought the error might always occur at a fixed point, perhaps because of an ioctl data or size mismatch between user space and kernel space. I wrote 1 to /sys/module/drm/para... to enable DRM debugging, and the captured messages show that the panic can occur anywhere: the preceding ioctl may be RADEON_GEM_WAIT_IDLE, RADEON_GEM_CREATE and so forth. So it is not a data stream error during the ioctl.
I also checked the changelog from 3.10 to 3.15; there is no patch addressing this.
In addition, if I run rpm -e mesa-vdpau-drivers to uninstall the Mesa VDPAU driver, mplayer plays video stably using the CPU. When I install this package again, a blue screen appears, and I must run mplayer -vo x11 to avoid it.
I am not familiar with the architecture of mesa-vdpau-drivers and UVD. Can anyone help me? Thanks.

The error messages are below:
pid=3037, dev=0xe200, auth=1, RADEON_GEM_WAIT_IDLE
Oct  8 21:43:26 localhost kernel: [17318.899669] pid=4483, dev=0xe200, auth=1, RADEON_GEM_WAIT_IDLE
Oct  8 21:43:26 localhost kernel: [17318.902069] Unable to handle kernel paging request at virtual address 000000ca14312000
Oct  8 21:43:26 localhost kernel: [17318.902090] tsk->{mm,active_mm}->context = 0000000000000a2b
Oct  8 21:43:26 localhost kernel: [17318.902104] tsk->{mm,active_mm}->pgd = fffff80008f90000
Oct  8 21:43:26 localhost kernel: [17318.902119]               \|/ ____ \|/
Oct  8 21:43:26 localhost kernel: [17318.902119]               "@'/ .. \`@"
Oct  8 21:43:26 localhost kernel: [17318.902119]               /_| \__/ |_\
Oct  8 21:43:26 localhost kernel: [17318.902119]                  \__U_/
Oct  8 21:43:26 localhost kernel: [17318.902169] mplayer(4483): Oops [#1]
Oct  8 21:43:26 localhost kernel: [17318.902186] CPU: 28 PID: 4483 Comm: mplayer Not tainted 3.10.0-54.0.1.el7.4ACL.sparc64 #1
Oct  8 21:43:26 localhost kernel: [17318.902201] task: fffff800faa2a200 ti: fffff800fbec4000 task.ti: fffff800fbec4000
Oct  8 21:43:26 localhost kernel: [17318.902220] TSTATE: 0000004411001607 TPC: 00000000007f573c TNPC: 00000000007f5740 Y: 00000000    Not tainted
Oct  8 21:43:26 localhost kernel: [17318.902252] TPC: <radeon_uvd_cs_parse+0x414/0x7b4>
Oct  8 21:43:26 localhost kernel: [17318.902267] g0: 0000000000083a15 g1: 000000ca14312000 g2: 000000ca14312000 g3: 0000000004314000
Oct  8 21:43:26 localhost kernel: [17318.902285] g4: fffff800faa2a200 g5: fffff800fc2fe000 g6: fffff800fbec4000 g7: 0000000010000000
Oct  8 21:43:26 localhost kernel: [17318.902301] o0: 0000000000000000 o1: fffff800fbec7860 o2: 0000000000000000 o3: 0000000004314000
Oct  8 21:43:26 localhost kernel: [17318.902317] o4: 0000000000000800 o5: fffff800f62d4840 sp: fffff800fbec6f91 ret_pc: 00000000007f56f4
Oct  8 21:43:26 localhost kernel: [17318.902336] RPC: <radeon_uvd_cs_parse+0x3cc/0x7b4>
Oct  8 21:43:26 localhost kernel: [17318.902352] l0: fffff80008dbbbc0 l1: 0000000000a213c8 l2: 0000000000000003 l3: 0000000000000001
Oct  8 21:43:26 localhost kernel: [17318.902366] l4: 000000000000ef10 l5: 0000000000d7fb10 l6: 0000000000d7fae8 l7: 0000000000d7fac8
Oct  8 21:43:26 localhost kernel: [17318.902382] i0: fffff800fbec7948 i1: 0000000000000000 i2: 000000000000ec00 i3: fffff800e30b9000
Oct  8 21:43:26 localhost kernel: [17318.902398] i4: 000000ca14312000 i5: 0000000000000000 i6: fffff800fbec7091 i7: 00000000007a1e5c
Oct  8 21:43:26 localhost kernel: [17318.902429] I7: <radeon_cs_ioctl+0x394/0x930>
Oct  8 21:43:26 localhost kernel: [17318.902439] Call Trace:
Oct  8 21:43:26 localhost kernel: [17318.902458]  [00000000007a1e5c] radeon_cs_ioctl+0x394/0x930
Oct  8 21:43:26 localhost kernel: [17318.902486]  [000000000073d1ac] drm_ioctl+0x2fc/0x420
Oct  8 21:43:26 localhost kernel: [17318.902516]  [0000000000518c94] vfs_ioctl+0x24/0x40
Oct  8 21:43:26 localhost kernel: [17318.902537]  [0000000000519518] do_vfs_ioctl+0x440/0x4e0
Oct  8 21:43:26 localhost kernel: [17318.902560]  [00000000005195e4] SyS_ioctl+0x2c/0x50
Oct  8 21:43:26 localhost kernel: [17318.902588]  [00000000004061b4] linux_sparc_syscall+0x34/0x44
Oct  8 21:43:26 localhost kernel: [17318.902607] Disabling lock debugging due to kernel taint
Oct  8 21:43:26 localhost kernel: [17318.902633] Caller[00000000007a1e5c]: radeon_cs_ioctl+0x394/0x930
Oct  8 21:43:26 localhost kernel: [17318.902657] Caller[000000000073d1ac]: drm_ioctl+0x2fc/0x420
Oct  8 21:43:26 localhost kernel: [17318.902681] Caller[0000000000518c94]: vfs_ioctl+0x24/0x40
Oct  8 21:43:26 localhost kernel: [17318.902702] Caller[0000000000519518]: do_vfs_ioctl+0x440/0x4e0
Oct  8 21:43:26 localhost kernel: [17318.902721] Caller[00000000005195e4]: SyS_ioctl+0x2c/0x50
Oct  8 21:43:26 localhost kernel: [17318.902742] Caller[00000000004061b4]: linux_sparc_syscall+0x34/0x44
Oct  8 21:43:26 localhost kernel: [17318.902757] Caller[fffff801086074ac]: 0xfffff801086074ac
Oct  8 21:43:26 localhost kernel: [17318.902765] Instruction DUMP: 913b6000  c25fa7cf  b800401c <e0072008> 80a42000  12480007  fa072004  11002884  130035fe 
Oct  8 21:43:30 localhost kernel: [17323.141686] pid=3037, dev=0xe200, auth=1, RADEON_GEM_CREATE

Comment 1 Michel Dänzer 2014-09-11 02:15:28 UTC

(In reply to comment #1)
> Hi, I try to using gpu hardware accelerate to play video from mplayer
> (vdpau+uvd)  . The machine architecture is sparc64,video card is radeon
> HD7450,OS version is redhat7,kernel version is 3.10.0, mplayer version is
> 1.1-21,mesa version is 9.2.5-6.

Please attach the output of dmesg (showing at least all radeon driver related initialization), the /var/log/Xorg.0.log file and the output of vdpauinfo.

Can you try newer versions of the kernel and Mesa?

P.S. AFAICT the 7450 is Northern Islands generation (Caicos) based, not Southern Islands based, otherwise I'd be very surprised you even got this far, given bug 82455. :)

Comment 2 Jack 2014-09-18 06:54:55 UTC

Created attachment 106479
xorg log

Comment 3 Jack 2014-09-18 06:55:22 UTC

Created attachment 106480
vdpau log

Comment 4 Jack 2014-09-18 06:55:43 UTC

Created attachment 106481
radeon init log

Comment 5 Jack 2014-09-18 06:56:05 UTC

Created attachment 106482
error log

Comment 6 Jack 2014-09-18 07:18:45 UTC

(In reply to comment #1)
> (In reply to comment #1)
> > Hi, I try to using gpu hardware accelerate to play video from mplayer
> > (vdpau+uvd)  . The machine architecture is sparc64,video card is radeon
> > HD7450,OS version is redhat7,kernel version is 3.10.0, mplayer version is
> > 1.1-21,mesa version is 9.2.5-6.
> 
> Please attach the output of dmesg (showing at least all radeon driver
> related initialization), the /var/log/Xorg.0.log file and the output of
> vdpauinfo.
> 
> Can you try newer versions of the kernel and Mesa?
> 
> P.S. AFAICT the 7450 is Northern Islands generation (Caicos) based, not
> Southern Islands based, otherwise I'd be very surprised you even got this
> far, given bug 82455. :)

Hi, the attachments are the logs of Xorg, the radeon initialization and vdpauinfo; the output seems to be OK.

I traced the code and found that it executes radeon_uvd_cs_reloc() to get the start address of the relocation in GPU RAM:
 reloc = p->relocs_ptr[idx/4]     (idx is 0)
 start = reloc->lobj.gpu_offset
The start address is 0x182c000. radeon_uvd_cs_msg() then maps the GPU address to a CPU virtual address using radeon_bo_kmap(bo, &ptr) -> ttm_bo_ioremap(); my debug printk shows the returned CPU virtual address as "[zd radeon_uvd_cs_msg] ptr:000000ca1182c000, msg:000000ca1182c000". This is the address that cannot be handled; see the error log.
I tried changing PAGE_SHIFT to 12 (the GPU page shift) in ttm_bo_ioremap() (the CPU uses 8K pages, the GPU 4K pages), but it had no effect. I think the BO stored in GPU RAM has the wrong gpu_offset, but I don't know why. The only thing I can think of that could affect this is the remapping, given the different CPU and GPU page sizes.
Can you give me any ideas about this, or a way to verify it? Thanks.
By the way, where can I find documentation about TTM and GEM? I am puzzled about the principles of GPU memory management.
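The page-size theory can be checked with plain arithmetic on the two numbers reported above. The sketch below (helper names and macros are illustrative, not kernel API) computes the page index and in-page offset of gpu_offset 0x182c000 for 4K and 8K pages, and the base implied by ptr = base + gpu_offset, assuming the kmap is a linear mapping of the aperture.

```c
#include <stdint.h>

/* Numbers reported in comment 6: the BO's GPU offset and the pointer
 * that radeon_bo_kmap() returned for it. */
#define UVD_MSG_GPU_OFFSET 0x182c000ULL
#define UVD_MSG_CPU_PTR    0x000000ca1182c000ULL

#define GPU_PAGE_SHIFT 12 /* 4K GPU pages */
#define CPU_PAGE_SHIFT 13 /* 8K sparc64 pages */

/* Index of the page that contains addr, for the given page shift. */
static uint64_t page_index(uint64_t addr, unsigned shift)
{
    return addr >> shift;
}

/* Byte offset of addr inside its page, for the given page shift. */
static uint64_t page_offset(uint64_t addr, unsigned shift)
{
    return addr & ((1ULL << shift) - 1);
}

/* Base of the CPU mapping, assuming cpu_ptr = base + gpu_offset. */
static uint64_t mapping_base(uint64_t cpu_ptr, uint64_t gpu_offset)
{
    return cpu_ptr - gpu_offset;
}
```

For this BO the offset is 0 within both a 4K page (index 0x182c) and an 8K page (index 0xc16), so the CPU/GPU page-size difference cannot shift this particular mapping, and the implied base is 0xca10000000. That fits the observation in comment 8 that the mapping itself looks sane. As far as I know, on sparc64 ioremap() returns the physical address itself as the __iomem cookie, and readl()/writel() reach it through a physical-bypass ASI; if that holds here, dereferencing the cookie as an ordinary pointer would fault regardless of the offset, which would be consistent with the readl()/writel() fix reported in comment 9.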

Comment 7 Michel Dänzer 2014-09-18 07:28:50 UTC

Looks like maybe ttm_bo_ioremap() doesn't work correctly. Which path does it take?

Comment 8 Jack 2014-09-18 12:41:55 UTC

(In reply to comment #7)
> Looks like maybe ttm_bo_ioremap() doesn't work correctly. Which path does it
> take?

"Which path does it take?" Do you mean...

I get start = reloc->lobj.gpu_offset = 0x182c000; this is a GPU address, and the base address is 0x0. After ttm_bo_ioremap() the CPU virtual address is 0x000000ca1182c000. Judging from radeon.log, the address mapping seems good.
By the way, the IB test on ring 5 (ring_ib_test) also fails with the same unhandled paging request. Next I will trace ring_ib_test, because it does not depend on libdrm; it only constructs a message and submits it.

Comment 9 Jack 2014-09-20 04:09:19 UTC

I solved that problem by using readl()/writel() to access the CPU virtual address instead of dereferencing it directly, e.g. msg_type = readl(&msg[1]) instead of msg_type = msg[1], together with little/big endian conversion. But now a new error occurs (see the error message attachment):
    radeon:GPU lockup cp stall for ... more than...
I tried making the lockup timeout longer, but it did not help.
I don't know why.

Comment 10 Jack 2014-09-20 04:09:51 UTC

Created attachment 106571
error message

Comment 11 Jack 2014-09-20 04:10:34 UTC

The mplayer UI appears with no output, and then the kernel panics.

Comment 12 Michel Dänzer 2014-09-22 08:59:30 UTC

(In reply to comment #9)
>     radeon:GPU lockup cp stall for ... more than...

That means the GPU hangs while processing commands, probably because the commands are malformed in some way (e.g. wrong byte order).

Please attach the exact changes you've made to get to that point.

Comment 13 Jack 2014-09-23 13:00:16 UTC

Created attachment 106731
solve the problem of unhandle request address

This patch can solve the address problem.

Comment 14 Michel Dänzer 2014-09-24 09:28:55 UTC

(In reply to comment #13)
> This patch can solve the address problem.

writel() and readl() already convert to/from little endian. Does it work better if you remove cpu_to_le32() from all lines using those functions?


> @@ -751,22 +748,12 @@
>  	}
>  
>  	/* stitch together an UVD destroy msg */
> -#if 0
> -	msg[0] = cpu_to_le32(0x00000de4);
> -	msg[1] = cpu_to_le32(0x00000002);
> -	msg[2] = cpu_to_le32(handle);
> -	msg[3] = cpu_to_le32(0x00000000);
> -	for (i = 4; i < 1024; ++i)
> -		msg[i] = cpu_to_le32(0x0);
> -#endif
> -#if 1
>  	writel(cpu_to_le32(0x00000de4),&msg[1]);
>          writel(cpu_to_le32(0x00000002),&msg[2]);
>          writel(cpu_to_le32(handle),&msg[3]);
>          writel(cpu_to_le32(0x00000000),&msg[4]);
>          for (i = 4; i < 1024; ++i)
>                  writel(cpu_to_le32(0x0),&msg[i]);
> -#endif
>  	radeon_bo_kunmap(bo);
>  	radeon_bo_unreserve(bo);

Why are you enabling this code?

Comment 15 Jack 2014-09-24 15:18:34 UTC

(In reply to comment #14)
> (In reply to comment #13)
> > This patch can solve the address problem.
> 
> writel() and readl() already convert to/from little endian. Does it work
> better if you remove cpu_to_le32() from all lines using those functions?
> 
> 
> > @@ -751,22 +748,12 @@
> >  	}
> >  
> >  	/* stitch together an UVD destroy msg */
> > -#if 0
> > -	msg[0] = cpu_to_le32(0x00000de4);
> > -	msg[1] = cpu_to_le32(0x00000002);
> > -	msg[2] = cpu_to_le32(handle);
> > -	msg[3] = cpu_to_le32(0x00000000);
> > -	for (i = 4; i < 1024; ++i)
> > -		msg[i] = cpu_to_le32(0x0);
> > -#endif
> > -#if 1
> >  	writel(cpu_to_le32(0x00000de4),&msg[1]);
> >          writel(cpu_to_le32(0x00000002),&msg[2]);
> >          writel(cpu_to_le32(handle),&msg[3]);
> >          writel(cpu_to_le32(0x00000000),&msg[4]);
> >          for (i = 4; i < 1024; ++i)
> >                  writel(cpu_to_le32(0x0),&msg[i]);
> > -#endif
> >  	radeon_bo_kunmap(bo);
> >  	radeon_bo_unreserve(bo);
> 
> Why are you enabling this code?

1. At first I did not add cpu_to_le32(), but when I traced the messages with printk, the value of msg_type was byte-reversed. With cpu_to_le32() added, everything is OK.
2. #if 0 ... #endif is the original kernel code; #if 1 ... #endif is the changed code. The reason is the same as in 1: the cpu_to_le32() conversion is required.

By the way, you said writel() and readl() already convert to/from little endian.
Is that based on the x86 implementation?

Comment 16 Michel Dänzer 2014-09-25 02:18:31 UTC

(In reply to comment #15)
> 1. I do not add cpu_to_le32() first ,but when I trace the messages from
> printk,the value of msg_type is reversed.

That's in radeon_uvd_cs_msg()? Sounds like the Mesa UVD code writes the messages in host byte order, not in little endian. Maybe Christian can clarify which byte order should be used for them.


> 2. #if 0 ...#endif is the original code from kernel. #if 1 ...#endif is
> changed code.

Ah right, never mind, I misread that hunk before.


> By the way, u said writel() and readl() already convert to/from little
> endian. is based on the X86 arch implement?

It's the same on all architectures: writel() takes a datum in host byte order and writes it in little endian. readl() reads a little endian datum and returns it in host byte order. (This means that on little endian hosts such as x86, the datum is transferred unchanged.)
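To make these semantics concrete, here is a small userspace model. The helpers model_writel(), model_readl() and model_cpu_to_le32() are hypothetical stand-ins, not the kernel's implementations: "device memory" is a plain byte array, and host byte order is passed as a parameter so the big endian case can be exercised on any machine.

```c
#include <stdint.h>

/* writel() model: takes a value in host order, stores it as little endian. */
static void model_writel(uint32_t val, uint8_t *addr)
{
    addr[0] = (uint8_t)(val);
    addr[1] = (uint8_t)(val >> 8);
    addr[2] = (uint8_t)(val >> 16);
    addr[3] = (uint8_t)(val >> 24);
}

/* readl() model: reads a little endian value, returns it in host order. */
static uint32_t model_readl(const uint8_t *addr)
{
    return (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
           ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
}

/* cpu_to_le32() model: a byte swap on big endian hosts, a no-op otherwise. */
static uint32_t model_cpu_to_le32(uint32_t val, int host_big_endian)
{
    if (!host_big_endian)
        return val;
    return (val >> 24) | ((val >> 8) & 0x0000ff00u) |
           ((val << 8) & 0x00ff0000u) | (val << 24);
}
```

With host_big_endian set, model_writel(model_cpu_to_le32(0xde4, 1), buf) leaves the bytes 00 00 0d e4 in memory, so a little endian consumer would read 0xe40d0000: the explicit conversion swaps a second time. Conversely, a dword that userspace stored in big endian host order (bytes 00 00 00 02) reads back through model_readl() as 0x02000000, and only an extra swap recovers 2. That would match the reversed msg_type from comment 15, and it points at the byte order of the userspace-written buffer rather than at the accessors.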

Comment 17 Christian König 2014-09-25 07:26:35 UTC

(In reply to comment #16)
> (In reply to comment #15)
> > 1. I do not add cpu_to_le32() first ,but when I trace the messages from
> > printk,the value of msg_type is reversed.
> 
> That's in radeon_uvd_cs_msg()? Sounds like the Mesa UVD code writes the
> messages in host byte order, not in little endian. Maybe Christian can
> clarify which byte order should be used for them.

The hardware supports byte swapping for the message and feedback buffer, but I think always writing/reading it in little endian will be simpler to get working.

The userspace code currently doesn't support big endian hosts and so will probably write it in the wrong byte order.
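A host-independent way to follow that advice is to serialize every message dword explicitly as little endian. The sketch below is hypothetical (put_le32(), get_le32() and build_uvd_destroy_msg() are illustrative names, not Mesa or libdrm API); it reuses the field layout of the kernel's destroy message quoted in comment 14 (msg[0] = 0xde4, msg[1] = 2 as the message type, msg[2] = handle, msg[3] = 0, then zeros up to 1024 dwords) only because that layout appears in this thread. Mesa's decode messages have their own layout.

```c
#include <stdint.h>
#include <string.h>

#define UVD_MSG_DWORDS 1024 /* dword count used by the kernel's destroy msg */

/* Store a 32-bit value as little endian, whatever the host byte order. */
static void put_le32(uint8_t *dst, uint32_t val)
{
    dst[0] = (uint8_t)(val);
    dst[1] = (uint8_t)(val >> 8);
    dst[2] = (uint8_t)(val >> 16);
    dst[3] = (uint8_t)(val >> 24);
}

/* Load a little endian 32-bit value into host order. */
static uint32_t get_le32(const uint8_t *src)
{
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}

/* Hypothetical helper: lay out msg[0..3] like the kernel's destroy
 * message, zero-pad the rest, all in little endian byte order. */
static void build_uvd_destroy_msg(uint8_t *buf, uint32_t handle)
{
    memset(buf, 0, UVD_MSG_DWORDS * 4);
    put_le32(buf + 0, 0x00000de4u);  /* msg[0] */
    put_le32(buf + 4, 0x00000002u);  /* msg[1]: message type (2 = destroy) */
    put_le32(buf + 8, handle);       /* msg[2]: session handle */
    put_le32(buf + 12, 0x00000000u); /* msg[3] */
}
```

Because put_le32() works byte by byte with shifts, it produces the same memory image on sparc64 and x86, so the kernel side could read the fields with le32_to_cpu() (or readl() on an __iomem mapping) without any architecture-specific swapping.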

Comment 18 Jack 2014-09-28 12:01:59 UTC

(In reply to comment #17)
> (In reply to comment #16)
> > (In reply to comment #15)
> > > 1. I do not add cpu_to_le32() first ,but when I trace the messages from
> > > printk,the value of msg_type is reversed.
> > 
> > That's in radeon_uvd_cs_msg()? Sounds like the Mesa UVD code writes the
> > messages in host byte order, not in little endian. Maybe Christian can
> > clarify which byte order should be used for them.
> 
> The hardware supports byte swapping for the message and feedback buffer, but
> I think always writing/reading it in little endian will be simpler to get
> working.
> 
> The userspace code currently doesn't supports big endian hosts and so will
> probably write it in the wrong byte order.

I think you are right. The patch I gave does not address the root cause, so I have some work to do in userspace, i.e. Mesa. A huge project!
By the way, which version of Mesa adds big endian support, or do you have any plans to do it?

Comment 19 Michel Dänzer 2014-09-29 07:16:19 UTC

(In reply to comment #18)
> 
> So I have some work to do on userspace, such as mesa. A huge project!

Actually, I don't think the relevant code for UVD support is that big.


> By the way, which version of mesa add big endian support.or you have some
> plans to do it.

As Christian said, the UVD code currently doesn't handle big endian hosts properly yet. I don't know of any plans to fix that either, so don't wait for us to do it. :)

Comment 20 Martin Peres 2019-11-19 08:55:33 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/526.
