Bug 10432

Summary: DRM git broken for VT8623 rev. 3 (CLE266)
Product: DRI
Reporter: Rafał Bilski <rafalbilski>
Component: DRM/other
Assignee: Thomas Hellström <thomas>
Status: RESOLVED FIXED
QA Contact:
Severity: major    
Priority: medium    
Version: DRI git   
Hardware: x86 (IA32)   
OS: Linux (All)   
Attachments:
Patch that adds a printout. (Flags: none)

Description Rafał Bilski 2007-03-27 12:59:38 UTC
Kernel version where the bug doesn't exist: 2.6.21-rc4-git5
Distribution: Gentoo
Hardware: VIA EPIA M10000, CLE266 chipset with integrated CastleRock graphics
Good "via" module version seems to be: 2.11.0 20061227
Bad "via" module version seems to be: 2.11.1 20070202

Gentoo's x11-drm package replaced the kernel "drm" and "via" modules with versions from git. After a reboot the X screen was trashed: no window backgrounds, no window frames. Only text was visible, and it was printed over previous lines. The problem depends on the "EnableAGPDMA" option being set in xorg.conf, and I can't reproduce it with my current kernel. I can't reach the broken commit with git-bisect or git-reset because I'm using Linux 2.6.21-rc4 and I need the "fix build for 2.6.21-rc1" patch to compile the drm modules. With older commits the bug seems to occur only when OpenGL is used (after about 10 s). Once it hits, the gears from the glxgears command are no longer visible. Switching to a text console and back causes a lockup.

Output of the "lspci -v" command:
01:00.0 VGA compatible controller: VIA Technologies, Inc. VT8623 [Apollo CLE266] integrated CastleRock graphics (rev 03) (prog-if 00 [VGA])
        Subsystem: VIA Technologies, Inc. VT8623 [Apollo CLE266] integrated CastleRock graphics
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 11
        Memory at d0000000 (32-bit, prefetchable) [size=64M]
        Memory at d4000000 (32-bit, non-prefetchable) [size=16M]
        [virtual] Expansion ROM at d5000000 [disabled] [size=64K]

After trying today's git I found these errors in the log (this is the first time; there were no errors earlier):
Mar 27 21:38:01 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 0 next_addr 80100
Mar 27 21:38:01 elke [drm:via_cmdbuf_jump] *ERROR* via_cmdbuf_jump failed
Mar 27 21:38:01 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 30 next_addr 80230
Mar 27 21:38:01 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 100 next_addr 80300
Mar 27 21:38:01 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80302
Mar 27 21:38:01 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80302
Mar 27 21:38:01 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80302
Mar 27 21:38:02 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80348
Mar 27 21:38:02 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80398
Mar 27 21:38:02 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 803e8
Mar 27 21:38:02 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80430
Mar 27 21:38:03 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80478
Mar 27 21:38:03 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 804c0
Mar 27 21:38:03 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80508
Mar 27 21:38:03 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80550
Mar 27 21:38:04 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80598
Mar 27 21:38:04 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 805e0
Mar 27 21:38:04 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80628
Mar 27 21:38:04 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80670
Mar 27 21:38:05 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 806b8
Mar 27 21:38:05 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80700
Mar 27 21:38:05 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80748
Mar 27 21:38:06 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80790
Mar 27 21:38:06 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 807d8
Mar 27 21:38:06 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80820
Mar 27 21:38:06 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80868
Mar 27 21:38:07 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 808b0
Mar 27 21:38:07 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 808f8
Mar 27 21:38:07 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80950
Mar 27 21:38:07 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 809a0
Mar 27 21:38:08 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 809f8
Mar 27 21:38:08 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80a48
Mar 27 21:38:08 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80a98
Mar 27 21:38:08 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80ae0
Mar 27 21:38:09 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80b28
Mar 27 21:38:09 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80b70
Mar 27 21:38:09 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80bb8
Mar 27 21:38:09 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80c00
Mar 27 21:38:10 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80c48
Mar 27 21:38:10 elke [drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 80c90
[...]
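
Editorial sketch: in the log above, cur_addr stays at 200 from 21:38:01 onward while next_addr keeps advancing. To check that pattern across a longer log, the three hex fields can be pulled out of each line. The helper below is only an illustration; the names via_wait_entry and parse_via_wait_line are hypothetical and are not part of the DRM sources. It assumes the fields are printed in hexadecimal, as values such as 803e8 suggest.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three values in a via_cmdbuf_wait error. */
struct via_wait_entry {
	unsigned int hw;
	unsigned int cur_addr;
	unsigned int next_addr;
};

/*
 * Parse one syslog line of the form shown in this report:
 *   "... via_cmdbuf_wait timed out hw 21600 cur_addr 200 next_addr 803e8"
 * Returns 1 and fills *out on success, 0 if the line does not match.
 * Assumption: all three fields are hexadecimal.
 */
static int parse_via_wait_line(const char *line, struct via_wait_entry *out)
{
	const char *p;

	assert(line != NULL && out != NULL);

	/* Skip the timestamp, host name and "[drm:...]" prefix. */
	p = strstr(line, "timed out hw ");
	if (p == NULL)
		return 0;

	return sscanf(p, "timed out hw %x cur_addr %x next_addr %x",
		      &out->hw, &out->cur_addr, &out->next_addr) == 3;
}
```

Calling parse_via_wait_line() on consecutive lines and comparing cur_addr between entries would show where the hardware read pointer stops moving.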
Comment 1 Rafał Bilski 2007-04-04 11:45:14 UTC
I had some time, so I started bisecting with Linux 2.6.20. It looks like commit 6c04185857694b2293046b7ea1d4515404a740c3:
> Author: Thomas Hellstrom <thomas-at-tungstengraphics-dot-com>
> Date:   Fri Feb 2 09:15:44 2007 +0100
>
>    via: Try to improve command-buffer chaining.
>
>    Bump driver date and patchlevel.
broke DRI on my machine. DRI is OK one commit earlier.
Comment 2 Rafał Bilski 2007-04-07 13:25:24 UTC
Hunk #2 of "via: Try to improve command-buffer chaining" seems to be the cause. Reverting hunk #2 solves my problem.
Comment 3 Rafał Bilski 2007-04-08 02:44:40 UTC
The patch below solves my problem or, at least, makes it very hard to reproduce. It reverts part of hunk 2. With this change I can use glxgears again. Tested with 2.6.20 and 2.6.21-rc5. By the way, I don't know what this patch does, so please correct it as needed.

diff --git a/shared-core/via_dma.c b/shared-core/via_dma.c
--- a/shared-core/via_dma.c
+++ b/shared-core/via_dma.c
@@ -419,7 +419,6 @@ static inline uint32_t *via_get_dma(drm_via_private_t * dev_priv)
  * modifying the pause address stored in the buffer itself. If
  * the regulator has already paused, restart it.
  */
-
 static int via_hook_segment(drm_via_private_t *dev_priv,
 			    uint32_t pause_addr_hi, uint32_t pause_addr_lo,
 			    int no_pci_fire)
@@ -430,12 +429,20 @@ static int via_hook_segment(drm_via_private_t *dev_priv,
 
 	paused = 0;
 	via_flush_write_combine();
+	while(! *(via_get_dma(dev_priv)-1));
 	*dev_priv->last_pause_ptr = pause_addr_lo;
 	via_flush_write_combine();
+	/*
+	 * The below statement is inserted to really force the flush.
+	 * Not sure it is needed.
+	 */
+
+	while(! *dev_priv->last_pause_ptr);
 	reader = *(dev_priv->hw_addr_ptr);
 	ptr = ((volatile char *)paused_at - dev_priv->dma_ptr) +
 		dev_priv->dma_offset + (uint32_t) dev_priv->agpAddr + 4;
 	dev_priv->last_pause_ptr = via_get_dma(dev_priv) - 1;
+	while(! *dev_priv->last_pause_ptr);
 
 	if ((ptr - reader) <= dev_priv->dma_diff ) {
 		count = 10000000;
Comment 4 Thomas Hellström 2007-04-09 23:39:45 UTC
Created attachment 9537
Patch that adds a printout.

Hi,

If AGPDMA is enabled, the patch adds a printout to the kernel log at X server start. It looks like "DMA DIFF is " followed by a number.

Can you check that number and report back?

Regards,
Thomas
Comment 5 Thomas Hellström 2007-04-09 23:40:30 UTC
Marking as needinfo
Comment 6 Rafał Bilski 2007-04-10 10:54:22 UTC
Below is the log from my "sure" branch.
X + ion3 - good
glxgears - good
glaxium - good

[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[drm] Initialized via 2.11.1 20070202 on minor 0
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
[drm:via_cmdbuf_start] *ERROR* DMA DIFF is 0x00000000
[drm:via_cmdbuf_start] *ERROR* DMA DIFF is 0x00000000
[drm:via_cmdbuf_start] *ERROR* DMA DIFF is 0x00000000
[drm:via_cmdbuf_start] *ERROR* DMA DIFF is 0x00000000
ACPI: PCI interrupt for device 0000:01:00.0 disabled
[drm] Module unloaded

Below is the log from the "master" branch.
X + ion3 - good
glxgears - bad

[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[drm] Initialized via 2.11.1 20070202 on minor 0
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
[drm:via_cmdbuf_start] *ERROR* DMA DIFF is 0x00000000
Comment 7 Thomas Hellström 2007-04-17 00:01:54 UTC
(In reply to comment #3)
> The patch below solves my problem or, at least, makes it very hard to
> reproduce. It reverts part of hunk 2. With this change I can use glxgears
> again. Tested with 2.6.20 and 2.6.21-rc5. By the way, I don't know what this
> patch does, so please correct it as needed.
> 
> diff --git a/shared-core/via_dma.c b/shared-core/via_dma.c
> --- a/shared-core/via_dma.c
> +++ b/shared-core/via_dma.c
> @@ -419,7 +419,6 @@ static inline uint32_t *via_get_dma(drm_via_private_t *
> dev_priv)
>   * modifying the pause address stored in the buffer itself. If
>   * the regulator has already paused, restart it.
>   */
> -
>  static int via_hook_segment(drm_via_private_t *dev_priv,
>                             uint32_t pause_addr_hi, uint32_t pause_addr_lo,
>                             int no_pci_fire)
> @@ -430,12 +429,20 @@ static int via_hook_segment(drm_via_private_t *dev_priv,
> 
>         paused = 0;
>         via_flush_write_combine();
> +       while(! *(via_get_dma(dev_priv)-1));
>         *dev_priv->last_pause_ptr = pause_addr_lo;
>         via_flush_write_combine();
> +       /*
> +        * The below statement is inserted to really force the flush.
> +        * Not sure it is needed.
> +        */
> +
> +       while(! *dev_priv->last_pause_ptr);
>         reader = *(dev_priv->hw_addr_ptr);
>         ptr = ((volatile char *)paused_at - dev_priv->dma_ptr) +
>                 dev_priv->dma_offset + (uint32_t) dev_priv->agpAddr + 4;
>         dev_priv->last_pause_ptr = via_get_dma(dev_priv) - 1;
> +       while(! *dev_priv->last_pause_ptr);
> 
>         if ((ptr - reader) <= dev_priv->dma_diff ) {
>                 count = 10000000;
> 


Hi, the code you have added (except the last "+" line) is used to flush the write-combining buffers in the processor. I thought DRM_MEMORYBARRIER() would be sufficient for that, but apparently it is not.

I have added some equivalent code in drm_git. Can you try it and see if it runs?

/Thomas
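
Editorial sketch: the idiom discussed in comment 7 is "write a value, then read it back until it is visible", which the patch in comment 3 uses to make sure the pause address really lands in the command buffer. The code below is a minimal user-space illustration of that pattern only. It is not the code that was committed to drm git, the name store_and_confirm is hypothetical, and unlike the loops in the patch (which spin while the word reads as zero and have no limit) it compares against the written value and gives up after a bounded number of polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store 'value' through a volatile pointer, then
 * read it back until the store is observed. Returns the number of extra
 * polls that were needed, or -1 if 'max_polls' was reached first.
 *
 * On ordinary cached memory the first read-back already matches, so the
 * function returns 0. The idiom only matters for write-combined mappings
 * such as the AGP command buffer in this report.
 */
static int store_and_confirm(volatile uint32_t *slot, uint32_t value,
			     int max_polls)
{
	int polls = 0;

	assert(slot != NULL);

	*slot = value;

	/* 'volatile' keeps the compiler from optimizing the read-back away. */
	while (*slot != value) {
		if (++polls >= max_polls)
			return -1;
	}
	return polls;
}
```

A bounded loop is used here so that a value that never becomes visible results in an error return rather than a hang. Whether a bound is appropriate inside the driver is a separate design question that this sketch does not settle.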
Comment 8 Rafał Bilski 2007-04-17 10:02:37 UTC
Yes. It is working for me. Thanks.
Comment 9 Octavian Petre 2007-10-20 09:54:55 UTC
I have a VIA CN700 and the same nightmare.

After trying all the patches on the net, I also tried the latest libdrm/drm from git:
git clone git://anongit.freedesktop.org/git/mesa/drm

and Mesa 7.0.1.

However, I still get the message below after 30 seconds of using Google Earth. After that the system sometimes freezes.

[drm:via_cmdbuf_wait] *ERROR* via_cmdbuf_wait timed out hw c9e00 cur_addr 49e00 next_addr ca000

Any new ideas?

My board is EPIA EN12000EG

lspci -v:
01:00.0 VGA compatible controller: VIA Technologies, Inc. UniChrome Pro IGP (rev 01) (prog-if 00 [VGA])
        Subsystem: VIA Technologies, Inc. UniChrome Pro IGP
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 21
        Memory at f4000000 (32-bit, prefetchable) [size=64M]
        Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
        [virtual] Expansion ROM at fc000000 [disabled] [size=64K]
        Capabilities: [60] Power Management version 2
        Capabilities: [70] AGP version 3.0

Thank you in advance for your answer,
Octavian

P.S. I have been trying to solve this problem for more than half a year.
