Bug 82588

Summary: X fails to start with linus-tip or drm-next
Product: DRI
Reporter: Mike Lothian <mike>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: ckoenig.leichtzumerken, mike
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (Description: Flags):
DRM Next: none
Linux Tip: none
Dmesg Revert: none
dmesg of linux-3.17_rc2: none
Xorg.log: none
Dmesg Linus's Tip: none
Firmware sha1sums: none

Description Mike Lothian 2014-08-13 22:41:45 UTC
Created attachment 104590 [details]
DRM Next

I've not had a chance to diagnose or bisect this issue yet.
Comment 1 Mike Lothian 2014-08-13 22:42:18 UTC
Created attachment 104591 [details]
Linux Tip
Comment 2 Mike Lothian 2014-08-13 22:42:57 UTC
Both were compiled with the new kabini firmware
Comment 4 Mike Lothian 2014-08-14 07:28:08 UTC
Created attachment 104605 [details]
Dmesg Revert

It made no difference
Comment 5 Michel Dänzer 2014-08-18 06:03:30 UTC
If reverting both that commit and http://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-3.17&id=810b73d1909298b67db5c7c047ed99b487ff7341 doesn't help either, can you bisect?
Comment 6 jospezial 2014-08-26 22:05:08 UTC
Created attachment 105305 [details]
dmesg of linux-3.17_rc2

This is what I get on Gentoo Linux with xf86-video-ati-7.4.0 since linux-3.17_rc1.
X starts but GPU acceleration is disabled.

[    6.787592] [drm] Initialized drm 1.1.0 20060810
[    7.007890] [drm] radeon kernel modesetting enabled.
[    7.008441] [drm] initializing kernel modesetting (RS740 0x1002:0x796E 0x105B:0x0E13).
[    7.008456] [drm] register mmio base: 0xFEAF0000
[    7.008457] [drm] register mmio size: 65536
[    7.009087] ATOM BIOS: ATI
[    7.009102] radeon 0000:01:05.0: VRAM: 128M 0x0000000038000000 - 0x000000003FFFFFFF (128M used)
[    7.009104] radeon 0000:01:05.0: GTT: 512M 0x0000000040000000 - 0x000000005FFFFFFF
[    7.009117] [drm] Detected VRAM RAM=128M, BAR=128M
[    7.009118] [drm] RAM width 128bits DDR
[    7.009204] [TTM] Zone  kernel: Available graphics memory: 443784 kiB
[    7.009206] [TTM] Initializing pool allocator
[    7.009212] [TTM] Initializing DMA pool allocator
[    7.009235] [drm] radeon: 128M of VRAM memory ready
[    7.009236] [drm] radeon: 512M of GTT memory ready.
[    7.009251] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    7.025657] [drm] radeon: 1 quad pipes, 1 z pipes initialized.
[    7.025670] [drm] PCIE GART of 512M enabled (table at 0x0000000032F00000).
[    7.025726] radeon 0000:01:05.0: WB enabled
[    7.025730] radeon 0000:01:05.0: fence driver on ring 0 use gpu addr 0x0000000040000000 and cpu addr 0xffff880032ea5000
[    7.025733] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    7.025734] [drm] Driver supports precise vblank timestamp query.
[    7.025744] [drm] radeon: irq initialized.
[    7.025753] [drm] Loading RS690/RS740 Microcode
[    7.071137] [drm] radeon: ring at 0x0000000040001000
[    7.228868] [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
[    7.228875] [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
[    7.228883] radeon 0000:01:05.0: failed initializing CP (-22).
[    7.228888] radeon 0000:01:05.0: Disabling GPU acceleration
[    7.375499] [drm:r100_cp_fini] *ERROR* Wait for CP idle timeout, shutting down CP.
[    7.521222] Failed to wait GUI idle while programming pipes. Bad things might happen.
[    7.521467] [drm] radeon: cp finalized

With linux-3.16.1 it works.
lspci:
01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RS740 [Radeon 2100]
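
The failing lines in a dmesg dump like the one above can be picked out mechanically. The following is only a minimal sketch, assuming the log is available as plain text; the function name find_drm_errors and the matching rules (lines tagged *ERROR*, plus the "Disabling GPU acceleration" notice) are illustrative choices, not part of any radeon or DRM tooling.

```python
import re

# A dmesg line looks like "[    7.228868] message text".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def find_drm_errors(dmesg_text):
    """Return (timestamp, message) pairs for lines that are tagged *ERROR*
    or that report GPU acceleration being disabled."""
    hits = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a timestamped kernel log line
        timestamp = float(match.group(1))
        message = match.group(2)
        if "*ERROR*" in message or "disabling gpu acceleration" in message.lower():
            hits.append((timestamp, message))
    return hits


# Excerpt copied from the dmesg output in this comment.
SAMPLE = """\
[    7.071137] [drm] radeon: ring at 0x0000000040001000
[    7.228868] [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
[    7.228875] [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
[    7.228883] radeon 0000:01:05.0: failed initializing CP (-22).
[    7.228888] radeon 0000:01:05.0: Disabling GPU acceleration
"""

for timestamp, message in find_drm_errors(SAMPLE):
    print(f"{timestamp:.6f}  {message}")
```

Run against the excerpt, this lists the ring test failure, the CP init failure, and the line where acceleration is switched off, which are the lines worth quoting in a report.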
Comment 7 jospezial 2014-08-26 22:13:25 UTC
Created attachment 105306 [details]
Xorg.log

some messages from Xorg.log:

[    24.729] (--) RADEON(0): Chipset: "ATI RS740" (ChipID = 0x796e)
[    24.729] (II) RADEON(0): GPU accel disabled or not working, using shadowfb for KMS

[    24.869] (WW) RADEON(0): Direct rendering disabled
[    24.869] (II) RADEON(0): Acceleration disabled

[    24.871] (WW) RADEON(0): Option "AccelMethod" is not used

[    24.879] (II) AIGLX: Screen 0 is not DRI2 capable
[    24.879] (EE) AIGLX: reverting to software rendering
[    25.578] (II) AIGLX: Loaded and initialized swrast
[    25.578] (II) GLX: Initialized DRISWRAST GL provider for screen 0
Comment 8 Alex Deucher 2014-08-26 22:16:00 UTC
(In reply to comment #6)
> With linux-3.16.1 it works.
> lspci:
> 01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> RS740 [Radeon 2100]

I don't think this is related to the original report since it's a different chip, but it would be nice if both you and Mike could bisect.
Comment 9 jospezial 2014-09-07 22:58:47 UTC
Reverting these two patches did not help. Tested with 3.17.0-rc3.

By the way, reverting the second patch was not easy.

radeon_ring.c was split up, so the patch had to be applied to radeon_ib.c.

The first patch has been reverted here:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/radeon?id=b738ca5d68e4051c86e32f46f67a69f3bb9cee5e
Comment 10 Dieter Nützel 2014-09-08 21:36:08 UTC
Please have a look, here:

https://bugs.freedesktop.org/show_bug.cgi?id=83616
Comment 11 jospezial 2014-09-13 00:22:14 UTC
(In reply to comment #10)
> Please have a look, here:
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=83616

The patch https://bugs.freedesktop.org/attachment.cgi?id=105904 does not help for me.
But I think r600.c is not used for my chip.
It is r100_ring_test that fails.
I'm a bit confused about which files are related to my
RS740 [Radeon 2100] (onboard graphics from the 740G chipset).

I have to find out how I can debug that failing ring test and see why it fails.
Comment 12 Mike Lothian 2014-09-13 16:11:43 UTC
Created attachment 106218 [details]
Dmesg Linus's Tip
Comment 13 Mike Lothian 2014-09-13 16:12:56 UTC
Linus's tree still fails to start X - I've attached the latest output that shows some drm errors

I can't get the drm-next tree to boot at all now
Comment 14 Alex Deucher 2014-09-13 16:44:44 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Please have a look, here:
> > 
> > https://bugs.freedesktop.org/show_bug.cgi?id=83616
> 
> For me patch https://bugs.freedesktop.org/attachment.cgi?id=105904 does not
> help.
> But I think r600.c is not used for my chip.
> It is r100_ring_test that fails.
> I'm a bit confused, which files are related to my
> RS740 [Radeon 2100] (onboard graphic from 740G chipset)

As I said before in comment 8, I don't think your issue is the same as Mike's since your chips are from very different generations.  It would be nice if one of you could bisect.
Comment 15 Mike Lothian 2014-09-13 16:51:49 UTC
I'm bisecting now
Comment 16 Mike Lothian 2014-09-13 18:52:30 UTC
This looks bad:

f2c6b0f452c3804496f55655fda28c2809e1a58b is the first bad commit
commit f2c6b0f452c3804496f55655fda28c2809e1a58b
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Jun 25 19:32:36 2014 -0400

    drm/radeon/cik: Add support for new ucode format (v5)
    
    This adds CIK support for the new ucode format.
    
    v2: add size validation, integrate debug info
    v3: add support for MEC2 on KV
    v4: fix typos
    v4: update to latest format
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 9d67f55d30562a528f15cd21ae4bde4598688525 6e1c473d75fb7bc7598872f9e535c7a074d1fcfd M      drivers
Comment 17 Mike Lothian 2014-09-13 18:54:32 UTC
Created attachment 106226 [details]
Firmware sha1sums
Comment 18 Mike Lothian 2014-09-15 18:43:50 UTC
That's the issue - it seems there was an issue with the firmwares I downloaded

Isn't there new checking in the code to make sure the firmwares are correct?

Now running 3.16-rc5 with no issues
Comment 19 jospezial 2014-09-15 22:00:46 UTC
(In reply to comment #18)
> That's the issue - it seems there was an issue with the firmwares I
> downloaded
> 
> Isn't there new checking in the code to make sure the firmwares are correct?
> 
> Now running 3.16-rc5 with no issues

Did you mean 3.17-rc5?

So is it a problem to use 3.16.2 and 3.17-rc5 side by side, because every kernel installs its firmware files to the same directory and overwrites the other one's?

The kernel installs firmware files, mostly named r(v) plus a number, in /lib/firmware/radeon/.

And the Gentoo package "x11-drivers/radeon-ucode-20140823" installs files named after the chip codenames, plus some numbered files that the kernel does not install, in the same directory.

So much from me, to add to the confusion.

You marked that bug as resolved/invalid.

Would you please tell me in detail how you fixed this? I'd like to reopen this bug.
Reverting "drm/radeon/cik: Add support for new ucode format (v5)" did not help me.

I know I have to bisect, but this is new to me.
Comment 20 Alex Deucher 2014-09-15 22:05:00 UTC
(In reply to comment #19)
> 
> So is it bad to use 3.16.2 and 3.17-rc5 because every kernel installs his
> firmware files to the same directory and overwrites the other one's?
> 
> The kernel installs the firmware with most of the filenames of r(v)+numbers
> in /lib/firmware/radeon/ .
> 
> And the gentoo package "x11-drivers/radeon-ucode-20140823" installs the
> files with the chip codenames and some numbered files which are not
> installed by the kernel in the same dir.
> 
> So much from me now for more confusion.

The firmware is not shipped with the kernel.  It's packaged separately.  The firmware for your card has not changed in years, so it's not likely to be the same issue.
Comment 21 Mike Lothian 2014-09-15 22:06:02 UTC
Yes, 3.17-rc5.

I downloaded the firmware files when they were first made available, back when drm-next added support for the new firmware format at the end of the 3.16 cycle.

Those files must have been damaged - installing the latest linux-firmware (or radeon-ucode) fixed this for me.

You are probably suffering from a different issue and should raise a separate bug.
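
For the damaged firmware files mentioned above, comparing checksums against a known-good list is one way to spot the problem before booting. This is only a rough sketch: the function names sha1_of and find_mismatches are made up for illustration, and the reference checksums are assumed to come from a trusted copy of linux-firmware or radeon-ucode; nothing here is taken from the attached sha1sums file.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(firmware_dir, expected):
    """Compare files in firmware_dir against expected checksums.

    expected maps a file name to its known-good SHA-1 hex digest
    (an assumption: the caller supplies these from a trusted source).
    Returns a sorted list of names that are missing or do not match.
    """
    bad = []
    for name, want in expected.items():
        path = Path(firmware_dir) / name
        if not path.is_file() or sha1_of(path) != want.lower():
            bad.append(name)
    return sorted(bad)


# Hypothetical usage; the file name and digest below are placeholders:
# find_mismatches("/lib/firmware/radeon", {"<firmware file>": "<sha1 hex>"})
```

Any name returned by find_mismatches is either absent from the directory or differs from the reference, which is a reason to reinstall the firmware package before suspecting the kernel.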
Comment 22 jospezial 2014-09-16 04:25:49 UTC
Last night I spent a few hours bisecting between v3.16 and v3.17-rc1.

This is my result:

77497f2735ad6e29c55475e15e9790dbfa2c2ef8 is the first bad commit
commit 77497f2735ad6e29c55475e15e9790dbfa2c2ef8
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Thu Jul 17 19:01:07 2014 +0900

    drm/radeon: Pass GART page flags to radeon_gart_set_page() explicitly
    
    Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 7da27ed892f4ea02ef8e758eda7165ce336d19cc 369d9e0ff185b6e6c9614de87296fc60072f56b9 M      drivers
:040000 040000 c3203bef4546e1781ba218fa5232c12cd2a883a2 b655879d0fefad7b591333930fddfd3cc67afa8d M      include

Will try reverting that patch.
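
For anyone else new to bisecting: the idea behind the result above is a binary search between a known-good and a known-bad kernel. The sketch below shows it on a plain ordered list, under the simplifying assumption of a linear history (real git history is a graph, and git bisect handles that itself). The name first_bad and the is_bad callback are hypothetical; in practice is_bad means "build this commit, boot it, and check whether acceleration works".

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi], tested


# Toy example: 1000 numbered commits, with the regression introduced at 700.
history = list(range(1000))
culprit, tested = first_bad(history, lambda commit: commit >= 700)
print(culprit, tested)
```

Each test halves the remaining range, so the number of kernels to build grows only with the logarithm of the number of commits in between.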
Comment 23 Dieter Nützel 2014-09-16 05:25:52 UTC
(In reply to comment #22)
> This night I spend a few hours for bisecting between v3.16 and v3.17-rc1.
> 
> This is my result:
> 
> 77497f2735ad6e29c55475e15e9790dbfa2c2ef8 is the first bad commit
> commit 77497f2735ad6e29c55475e15e9790dbfa2c2ef8
> Author: Michel Dänzer <michel.daenzer@amd.com>
> Date:   Thu Jul 17 19:01:07 2014 +0900
> 
>     drm/radeon: Pass GART page flags to radeon_gart_set_page() explicitly
>     
>     Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>     Reviewed-by: Christian König <christian.koenig@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
> :040000 040000 7da27ed892f4ea02ef8e758eda7165ce336d19cc
> 369d9e0ff185b6e6c9614de87296fc60072f56b9 M      drivers
> :040000 040000 c3203bef4546e1781ba218fa5232c12cd2a883a2
> b655879d0fefad7b591333930fddfd3cc67afa8d M      include
> 
> Will try reverting that patch.

I tried that, but 'git revert 77497f2' failed.
Tomorrow I'll try it by hand.

Christian König and I are hunting for this:

[   11.264803] radeon 0000:01:00.0: (-1) pin WB bo failed
[   11.264814] radeon 0000:01:00.0: f6107800 unpin not necessary
[   11.264834] radeon 0000:01:00.0: disabling GPU acceleration
[   11.312639] radeon 0000:01:00.0: f2f68000 unpin not necessary
[   11.367419] [TTM] Finalizing pool allocator
[   11.367541] [TTM] Zone  kernel: Used memory at exit: 0 kiB
[   11.367547] [TTM] Zone highmem: Used memory at exit: 0 kiB
[   11.367551] [drm] radeon: ttm finalized
[   11.367556] [drm] Forcing AGP to PCIE mode

My own 'git bisect' pointed to 'drm/radeon: Remove radeon_gart_restore()',
one commit before yours.

author: Michel Dänzer <michel.daenzer@amd.com>, 2014-07-09 18:15:42 (GMT)
committer: Alex Deucher <alexander.deucher@amd.com>, 2014-08-05 12:53:31 (GMT)
commit: a3eb06dbca08e3fdad7039021ae03b46b215f22a
tree: 2086d5a660f3581f0b7fd617d41241873ff07ce8
parent: 380670aebfca998bb67b9cf05fc7f28ebeac4b18

drm/radeon: Remove radeon_gart_restore()

Doesn't seem necessary, the GART table memory should be persistent.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Comment 24 Christian König 2014-09-16 14:03:42 UTC
The good news is that I can reproduce the problem, so I can take a look as soon as I have time. The bad news is that I don't have time right now.
Comment 25 Michel Dänzer 2014-09-17 03:51:06 UTC
The issues of the reporter of this bug have been fixed.

Dieter and jospezial, please stop hijacking this report and file your own.
Comment 26 jospezial 2014-09-17 13:44:35 UTC
My bug is now at https://bugs.freedesktop.org/show_bug.cgi?id=83996
