Bug 107855 - [regression] [amdgpu] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, giving up!!!
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Duplicates: 107910
 
Reported: 2018-09-07 09:05 UTC by Pontus Gråskæg
Modified: 2018-09-12 16:48 UTC

Attachments
dmesg_4.19.0-041900rc2_dc.eq.1_initrdfixed (74.08 KB, text/plain)
2018-09-07 13:35 UTC, Pontus Gråskæg
dmesg_4.19.0-041900rc2_dc.eq.0_initrdfixed (69.92 KB, text/plain)
2018-09-07 13:36 UTC, Pontus Gråskæg

Description Pontus Gråskæg 2018-09-07 09:05:33 UTC
kernel 4.18.0-041800-generic on AMD A10 (Kaveri)
kernel boot parameters:
amdgpu.cik_support=1 
radeon.cik_support=0 
amdgpu.dc=1 
amdgpu.dc_log=1 
drm.debug=0

Laptop has HDMI, VGA and internal eDP.

Using amdgpu.dc=1 to force use of amdgpu instead of radeon driver. This means VGA connection is expected to be inoperative. (See bug 105880)

dmesg shows that the Video Coding Engine (VCE) fails to start:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.18.0-041800-generic root=<redacted> amdgpu.cik_support=1 radeon.cik_support=0 amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=0
[    2.319941] [drm] radeon kernel modesetting enabled.
[    2.321218] fb: switching to radeondrmfb from EFI VGA
[    2.371470] [drm] amdgpu kernel modesetting enabled.
[    2.380860] [drm] initializing kernel modesetting (KAVERI 0x1002:0x130A 0x17AA:0x3988 0x00).
[    2.380878] [drm] register mmio base: 0xF0B00000
[    2.380881] [drm] register mmio size: 262144
[    2.380889] [drm] add ip block number 0 <cik_common>
[    2.380892] [drm] add ip block number 1 <gmc_v7_0>
[    2.380895] [drm] add ip block number 2 <cik_ih>
[    2.380898] [drm] add ip block number 3 <kv_dpm>
[    2.380901] [drm] add ip block number 4 <dm>
[    2.380904] [drm] add ip block number 5 <gfx_v7_0>
[    2.380907] [drm] add ip block number 6 <cik_sdma>
[    2.380909] [drm] add ip block number 7 <uvd_v4_2>
[    2.380912] [drm] add ip block number 8 <vce_v2_0>
[    2.404997] [drm] BIOS signature incorrect 0 0
[    2.405131] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    2.405153] [drm] Detected VRAM RAM=1024M, BAR=1024M
[    2.405155] [drm] RAM width 64bits UNKNOWN
[    2.405409] [drm] amdgpu: 1024M of VRAM memory ready
[    2.405412] [drm] amdgpu: 3072M of GTT memory ready.
[    2.405424] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    2.405460] [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
[    2.405542] [drm] Internal thermal controller without fan control
[    2.405546] [drm] amdgpu: dpm initialized
[    2.406632] [drm] Found UVD firmware Version: 1.55 Family ID: 9
[    2.406987] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[    2.408834] [drm] Unsupported Connector type:5!
[    2.409033] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:3! type 0 expected 3
[    2.409141] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:4! type 0 expected 3
[    2.409248] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:5! type 0 expected 3
[    2.409419] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:6! type 0 expected 3
[    2.420176] [drm] Display Core initialized with v3.1.44!
[    2.426685] [drm] SADs count is: -2, don't need to read it
[    2.427264] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.427271] [drm] Driver supports precise vblank timestamp query.
[    2.446238] [drm] UVD initialized successfully.
[    2.562838] [drm] VCE initialized successfully.
[    2.567812] [drm] fb mappable at 0xA0BD1000
[    2.567821] [drm] vram apper at 0xA0000000
[    2.567824] [drm] size 8294400
[    2.567826] [drm] fb depth is 24
[    2.567828] [drm]    pitch is 7680
[    2.567942] fbcon: amdgpudrmfb (fb0) is primary device
[    2.570418] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 141400
[    2.600980] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[    2.618335] [drm] Initialized amdgpu 3.26.0 20150101 for 0000:00:01.0 on minor 0
[    5.754156] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[    6.774125] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[    7.794092] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[    8.812428] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[    9.829125] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[   10.849149] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[   11.869148] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[   12.888162] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[   13.908161] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[   14.928124] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[   14.948142] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, giving up!!!
[   14.948181] [drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <vce_v2_0> failed -110
[   15.964329] [drm:amdgpu_vce_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[   15.964388] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 12 (-110).
[   15.964425] [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test failed (-110).
[  403.874735] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 141400
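
For readers triaging a log like the one above, the following is a minimal sketch (not part of the original report) that counts amdgpu `*ERROR*` lines per reporting function and records when each function first complained. The regular expression and the `summarize_errors` helper are illustrative assumptions about the line layout seen in this log, and the sample is an excerpt rather than the full log. On Linux, error code -110 corresponds to ETIMEDOUT, which is consistent with the "IB test timed out" message.

```python
import re
from collections import Counter

# Assumed layout: "[   ts] [drm:<function> [amdgpu]] *ERROR* <message>"
ERROR_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] \[drm:([\w.]+) \[amdgpu\]\] \*ERROR\* (.*)$"
)

# Excerpt of the log above (the full log has ten "trying to reset" lines).
SAMPLE = """\
[    2.618335] [drm] Initialized amdgpu 3.26.0 20150101 for 0000:00:01.0 on minor 0
[    5.754156] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[    6.774125] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, trying to reset the ECPU!!!
[   14.948142] [drm:vce_v2_0_start [amdgpu]] *ERROR* VCE not responding, giving up!!!
[   14.948181] [drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <vce_v2_0> failed -110
[   15.964329] [drm:amdgpu_vce_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
""".splitlines()


def summarize_errors(lines):
    """Return (error count per function, first timestamp per function)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match is None:
            continue  # not an amdgpu *ERROR* line
        timestamp, function = float(match.group(1)), match.group(2)
        counts[function] += 1
        first_seen.setdefault(function, timestamp)
    return counts, first_seen


counts, first_seen = summarize_errors(SAMPLE)
for function, count in counts.most_common():
    print(f"{function}: {count} error(s), first at {first_seen[function]:.6f}s")
```

Feeding the full dmesg output to `summarize_errors` instead of the excerpt makes it easy to compare the 4.17 and 4.18 boots side by side.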

The laptop screen remains blank, and the CPU fan ramps up. After using the 'magic SysRq' key combination followed by R, E, I, it was possible to get to the first virtual console (Ctrl-Alt-F1) and save the dmesg ring buffer.

Kernel 4.17.0-041700-generic does not show this behaviour, and the laptop screen works, allowing X-Windows login.

[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.0-041700-generic root=<redacted> amdgpu.cik_support=1 radeon.cik_support=0 amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=0
[    2.356844] [drm] radeon kernel modesetting enabled.
[    2.358519] fb: switching to radeondrmfb from EFI VGA
[    2.403908] [drm] amdgpu kernel modesetting enabled.
[    2.412726] [drm] initializing kernel modesetting (KAVERI 0x1002:0x130A 0x17AA:0x3988 0x00).
[    2.412744] [drm] register mmio base: 0xF0B00000
[    2.412748] [drm] register mmio size: 262144
[    2.412757] [drm] add ip block number 0 <cik_common>
[    2.412760] [drm] add ip block number 1 <gmc_v7_0>
[    2.412763] [drm] add ip block number 2 <cik_ih>
[    2.412766] [drm] add ip block number 3 <kv_dpm>
[    2.412769] [drm] add ip block number 4 <dm>
[    2.412773] [drm] add ip block number 5 <gfx_v7_0>
[    2.412776] [drm] add ip block number 6 <cik_sdma>
[    2.412778] [drm] add ip block number 7 <uvd_v4_2>
[    2.412781] [drm] add ip block number 8 <vce_v2_0>
[    2.436383] [drm] BIOS signature incorrect 0 0
[    2.436506] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    2.436528] [drm] Detected VRAM RAM=1024M, BAR=1024M
[    2.436531] [drm] RAM width 64bits UNKNOWN
[    2.436704] [drm] amdgpu: 1024M of VRAM memory ready
[    2.436708] [drm] amdgpu: 3072M of GTT memory ready.
[    2.436720] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    2.436758] [drm] PCIE GART of 1024M enabled (table at 0x000000F400040000).
[    2.436843] [drm] Internal thermal controller without fan control
[    2.436847] [drm] amdgpu: dpm initialized
[    2.437906] [drm] Found UVD firmware Version: 1.55 Family ID: 9
[    2.438263] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[    2.440362] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:3! type 0 expected 3
[    2.440455] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:4! type 0 expected 3
[    2.440547] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:5! type 0 expected 3
[    2.440674] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:6! type 0 expected 3
[    2.456216] [drm] Display Core initialized with v3.1.38!
[    2.462792] [drm] SADs count is: -2, don't need to read it
[    2.463390] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.463397] [drm] Driver supports precise vblank timestamp query.
[    2.481585] [drm] UVD initialized successfully.
[    2.598081] [drm] VCE initialized successfully.
[    2.605222] [drm] fb mappable at 0xA042A000
[    2.605231] [drm] vram apper at 0xA0000000
[    2.605234] [drm] size 8294400
[    2.605236] [drm] fb depth is 24
[    2.605239] [drm]    pitch is 7680
[    2.605367] fbcon: amdgpudrmfb (fb0) is primary device
[    2.686915] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[    2.706752] [drm] Initialized amdgpu 3.25.0 20150101 for 0000:00:01.0 on minor 0
Comment 1 Michel Dänzer 2018-09-07 09:24:32 UTC
Can you try a 4.19-rc2 or later kernel? Specifically, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d39df146ff12fb5c71634ad135144d5423590ec fixed this for me.
Comment 2 Pontus Gråskæg 2018-09-07 11:53:44 UTC
It doesn't appear to solve it. Instead I get a different problem earlier in the boot sequence, so it does not appear to reach the VCE code - dmesg extract below.

The effect on me is that I don't get a visible prompt for my LUKS password, but if I type it in blind, _eventually_ the laptop display (and only the laptop display) shows an X-windows login. This behaviour occurs with *both* the kernel command line having amdgpu.dc=1 *AND* amdgpu.dc=0 - in other words the old radeon driver is also having problems here.

The key point might be

[    2.619673] gfx7: Failed to load firmware "amdgpu/kaveri_pfp.bin"
[    2.619798] [drm:gfx_v7_0_sw_init [amdgpu]] *ERROR* Failed to load gfx 


+++dmesg extract follows+++

[    0.000000] Linux version 4.19.0-041900rc2-generic (kernel@kathleen) (gcc version 8.2.0 (Ubuntu 8.2.0-4ubuntu1)) #201809022230 SMP Sun Sep 2 22:32:22 UTC 2018
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.19.0-041900rc2-generic root=<redacted> amdgpu.cik_support=1 radeon.cik_support=0 amdgpu.dc=0 amdgpu.dc_log=1 drm.debug=0
...
[    2.573103] amdgpu: unknown parameter 'dc_log' ignored
[    2.573410] [drm] amdgpu kernel modesetting enabled.
[    2.576507] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    2.576512] AMD IOMMUv2 functionality not available on this system
[    2.582189] CRAT table not found
[    2.582197] Virtual CRAT table created for CPU
[    2.582200] Parsing CRAT table with 1 nodes
[    2.582204] Creating topology SYSFS entries
[    2.582219] Topology: Add CPU node
[    2.582221] Finished initializing topology
[    2.582287] kfd kfd: Initialized module
[    2.583151] [drm] initializing kernel modesetting (KAVERI 0x1002:0x130A 0x17AA:0x3988 0x00).
[    2.592261] [drm] register mmio base: 0xF0B00000
[    2.592271] [drm] register mmio size: 262144
[    2.592282] [drm] add ip block number 0 <cik_common>
[    2.592285] [drm] add ip block number 1 <gmc_v7_0>
[    2.592288] [drm] add ip block number 2 <cik_ih>
[    2.592291] [drm] add ip block number 3 <kv_dpm>
[    2.592294] [drm] add ip block number 4 <dce_v8_0>
[    2.592297] [drm] add ip block number 5 <gfx_v7_0>
[    2.592300] [drm] add ip block number 6 <cik_sdma>
[    2.592302] [drm] add ip block number 7 <uvd_v4_2>
[    2.592305] [drm] add ip block number 8 <vce_v2_0>
[    2.615911] [drm] BIOS signature incorrect 0 0
[    2.615938] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000c3fff window]
[    2.615947] caller pci_map_rom+0x71/0x1c0 mapping multiple BARs
[    2.615994] ATOM BIOS: 113-SPEC-X24
[    2.616825] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    2.616842] amdgpu 0000:00:01.0: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)
[    2.616848] amdgpu 0000:00:01.0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[    2.616861] [drm] Detected VRAM RAM=1024M, BAR=1024M
[    2.616863] [drm] RAM width 64bits UNKNOWN
[    2.616996] [TTM] Zone  kernel: Available graphics memory: 3531364 kiB
[    2.617000] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    2.617003] [TTM] Initializing pool allocator
[    2.617010] [TTM] Initializing DMA pool allocator
[    2.617076] [drm] amdgpu: 1024M of VRAM memory ready
[    2.617080] [drm] amdgpu: 3072M of GTT memory ready.
[    2.617094] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    2.617132] [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
[    2.617210] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.617213] [drm] Driver supports precise vblank timestamp query.
[    2.617238] [drm] Internal thermal controller without fan control
[    2.617242] [drm] amdgpu: dpm initialized
[    2.619601] [drm] amdgpu atom DIG backlight initialized
[    2.619607] [drm] AMDGPU Display Connectors
[    2.619609] [drm] Connector 0:
[    2.619611] [drm]   VGA-1
[    2.619613] [drm]   HPD2
[    2.619616] [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
[    2.619618] [drm]   Encoders:
[    2.619620] [drm]     CRT1: INTERNAL_UNIPHY2
[    2.619622] [drm]     CRT1: NUTMEG
[    2.619624] [drm] Connector 1:
[    2.619625] [drm]   HDMI-A-1
[    2.619627] [drm]   HPD3
[    2.619629] [drm]   DDC: 0x1954 0x1954 0x1955 0x1955 0x1956 0x1956 0x1957 0x1957
[    2.619632] [drm]   Encoders:
[    2.619633] [drm]     DFP1: INTERNAL_UNIPHY2
[    2.619635] [drm] Connector 2:
[    2.619637] [drm]   eDP-1
[    2.619639] [drm]   HPD1
[    2.619641] [drm]   DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
[    2.619643] [drm]   Encoders:
[    2.619645] [drm]     LCD1: INTERNAL_UNIPHY
[    2.619668] amdgpu 0000:00:01.0: Direct firmware load for amdgpu/kaveri_pfp.bin failed with error -2
[    2.619673] gfx7: Failed to load firmware "amdgpu/kaveri_pfp.bin"
[    2.619798] [drm:gfx_v7_0_sw_init [amdgpu]] *ERROR* Failed to load gfx firmware!
[    2.619859] [drm:amdgpu_device_init.cold.28 [amdgpu]] *ERROR* sw_init of IP block <gfx_v7_0> failed -2
[    2.619864] amdgpu 0000:00:01.0: amdgpu_device_ip_init failed
[    2.619867] amdgpu 0000:00:01.0: Fatal error during GPU init
[    2.619870] [drm] amdgpu: finishing device.
[    2.620041] [drm] amdgpu atom LVDS backlight unloaded
[    2.620568] ------------[ cut here ]------------
[    2.620573] Memory manager not clean during takedown.
[    2.620660] WARNING: CPU: 1 PID: 186 at drivers/gpu/drm/drm_mm.c:950 drm_mm_takedown+0x23/0x30 [drm]
[    2.620665] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) crct10dif_pclmul chash crc32_pclmul gpu_sched ghash_clmulni_intel pcbc radeon rtsx_usb_sdmmc i2c_algo_bit ttm aesni_intel drm_kms_helper aes_x86_64 syscopyarea ahci crypto_simd sysfillrect sysimgblt cryptd fb_sys_fops glue_helper psmouse rtsx_usb drm libahci r8169 video
[    2.620691] CPU: 1 PID: 186 Comm: systemd-udevd Not tainted 4.19.0-041900rc2-generic #201809022230
[    2.620695] Hardware name: LENOVO 80EC/Lancer 5B3, BIOS A4CN40WW (V 2.09) 08/24/2015
[    2.620712] RIP: 0010:drm_mm_takedown+0x23/0x30 [drm]
[    2.620716] Code: 00 00 00 0f 1f 40 00 0f 1f 44 00 00 48 8b 47 38 48 83 c7 38 48 39 c7 75 01 c3 55 48 c7 c7 88 ca 30 c0 48 89 e5 e8 ef fe 7b ed <0f> 0b 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5
[    2.620721] RSP: 0018:ffffa1b2010ff8f0 EFLAGS: 00010286
[    2.620725] RAX: 0000000000000000 RBX: ffff8b40cb5dbae8 RCX: ffffffffaf0628a8
[    2.620728] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000246
[    2.620731] RBP: ffffa1b2010ff8f0 R08: 0000000000000000 R09: 0720072007200720
[    2.620734] R10: ffff8b40cb604f00 R11: 0720072007200720 R12: ffff8b40cb5dba00
[    2.620737] R13: ffff8b40ca1929a0 R14: 0000000000000000 R15: 0000000000000170
[    2.620741] FS:  00007fdb713528c0(0000) GS:ffff8b40d7a80000(0000) knlGS:0000000000000000
[    2.620744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.620747] CR2: 00007f5a4ee401ab CR3: 000000020b1bc000 CR4: 00000000000406e0
[    2.620751] Call Trace:
[    2.620841]  amdgpu_vram_mgr_fini+0x2d/0x50 [amdgpu]
[    2.620852]  ttm_bo_clean_mm+0xa6/0xc0 [ttm]
[    2.620910]  amdgpu_ttm_fini+0x76/0x110 [amdgpu]
[    2.620968]  amdgpu_bo_fini+0x12/0x40 [amdgpu]
[    2.621032]  gmc_v7_0_sw_fini+0x36/0x60 [amdgpu]
[    2.621104]  amdgpu_device_fini+0x2da/0x4a1 [amdgpu]
[    2.621161]  amdgpu_driver_unload_kms+0x47/0x90 [amdgpu]
[    2.621218]  amdgpu_driver_load_kms+0x153/0x2d0 [amdgpu]
[    2.621233]  drm_dev_register+0x11f/0x160 [drm]
[    2.621289]  amdgpu_pci_probe+0x140/0x1c0 [amdgpu]
[    2.621295]  local_pci_probe+0x46/0x90
[    2.621299]  pci_device_probe+0x18d/0x1a0
[    2.621305]  really_probe+0x243/0x3b0
[    2.621309]  driver_probe_device+0xba/0x100
[    2.621314]  __driver_attach+0xe4/0x110
[    2.621318]  ? driver_probe_device+0x100/0x100
[    2.621322]  bus_for_each_dev+0x74/0xb0
[    2.621327]  ? kmem_cache_alloc_trace+0x1c8/0x1e0
[    2.621332]  driver_attach+0x1e/0x20
[    2.621336]  bus_add_driver+0x159/0x230
[    2.621340]  ? 0xffffffffc01a6000
[    2.621344]  driver_register+0x70/0xc0
[    2.621347]  ? 0xffffffffc01a6000
[    2.621351]  __pci_register_driver+0x57/0x60
[    2.621407]  amdgpu_init+0x87/0x89 [amdgpu]
[    2.621412]  do_one_initcall+0x4a/0x1c4
[    2.621417]  ? _cond_resched+0x19/0x30
[    2.621421]  ? kmem_cache_alloc_trace+0x172/0x1e0
[    2.621425]  ? vfree+0x2e/0x70
[    2.621430]  do_init_module+0x60/0x220
[    2.621434]  load_module+0x16c1/0x1930
[    2.621440]  __do_sys_finit_module+0xbd/0x120
[    2.621444]  ? __do_sys_finit_module+0xbd/0x120
[    2.621449]  __x64_sys_finit_module+0x1a/0x20
[    2.621452]  do_syscall_64+0x5a/0x110
[    2.621457]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    2.621460] RIP: 0033:0x7fdb701a94d9
[    2.621463] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
[    2.621469] RSP: 002b:00007ffdf1b91498 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    2.621473] RAX: ffffffffffffffda RBX: 000055b721f168e0 RCX: 00007fdb701a94d9
[    2.621476] RDX: 0000000000000000 RSI: 000055b721f15180 RDI: 0000000000000016
[    2.621479] RBP: 000055b721f15180 R08: 0000000000000000 R09: 000000000000001b
[    2.621482] R10: 0000000000000016 R11: 0000000000000246 R12: 0000000000000000
[    2.621485] R13: 000055b721f3da50 R14: 0000000000020000 R15: 0000000000000000
[    2.621489] ---[ end trace 273999c88a5eef55 ]---
Comment 3 Christian König 2018-09-07 11:56:34 UTC
(In reply to Pontus Gråskæg from comment #2)
> This behaviour occurs with *both*
> the kernel command line having amdgpu.dc=1 *AND* amdgpu.dc=0 - in other
> words the old radeon driver is also having problems here.

Sounds like a misunderstanding here: amdgpu.dc=0 does NOT select the older radeon driver, instead it just disables the DC code base in amdgpu.
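
To make the distinction concrete, here is a small illustrative sketch (not from the thread) that reads a kernel command line and reports which driver is asked to claim the CIK hardware and which display code amdgpu would use. The helper names are hypothetical. The mapping reflects the logs in this report: with amdgpu.dc=1 the driver registers ip block `<dm>`, with amdgpu.dc=0 it registers `<dce_v8_0>`, and in both cases amdgpu is the driver in use. Any case where the cik_support parameters are not given explicitly is left as "depends on driver defaults" rather than guessed.

```python
def module_params(cmdline):
    """Collect 'module.param=value' tokens from a kernel command line."""
    params = {}
    for token in cmdline.split():
        key, eq, value = token.partition("=")
        module, dot, name = key.partition(".")
        if eq and dot and module and name:
            params.setdefault(module, {})[name] = value
    return params


def describe(cmdline):
    """Return (driver asked to handle CIK parts, amdgpu display code)."""
    params = module_params(cmdline)
    amdgpu = params.get("amdgpu", {})
    radeon = params.get("radeon", {})

    # Driver selection is governed by the cik_support pair, not by amdgpu.dc.
    if amdgpu.get("cik_support") == "1" and radeon.get("cik_support") == "0":
        driver = "amdgpu"
    elif radeon.get("cik_support") == "1" and amdgpu.get("cik_support") == "0":
        driver = "radeon"
    else:
        driver = "depends on driver defaults"

    # amdgpu.dc only switches the display code inside amdgpu.
    display = {
        "1": "DC (ip block <dm>)",
        "0": "legacy (ip block <dce_v8_0>)",
    }.get(amdgpu.get("dc"), "driver default")
    return driver, display


CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-4.19.0-041900rc2-generic root=<redacted> "
    "amdgpu.cik_support=1 radeon.cik_support=0 amdgpu.dc=0 drm.debug=0"
)
print(describe(CMDLINE))
```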
Comment 4 Michel Dänzer 2018-09-07 12:24:21 UTC
(In reply to Pontus Gråskæg from comment #2)
> [    2.619673] gfx7: Failed to load firmware "amdgpu/kaveri_pfp.bin"
> [    2.619798] [drm:gfx_v7_0_sw_init [amdgpu]] *ERROR* Failed to load gfx 

The new kernel reads the microcode from /lib/firmware/amdgpu/kaveri_*. Since you don't have those files yet, you can create symlinks to those in /lib/firmware/radeon/. Don't forget to update the initrd afterwards.
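
The symlink step can be sketched as follows. This is a hedged illustration rather than an official procedure: `link_firmware` is a hypothetical helper, and it assumes the radeon directory holds files named `kaveri_*` (lower case), which should be checked on the actual system. Existing files in the destination are never overwritten.

```python
import os
from pathlib import Path


def link_firmware(src_dir, dst_dir, pattern="kaveri_*"):
    """Create relative symlinks in dst_dir for files in src_dir matching pattern.

    Returns the names of the links that were created.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(src_dir.glob(pattern)):
        dst = dst_dir / src.name
        if dst.exists() or dst.is_symlink():
            continue  # leave real firmware files and existing links alone
        # Relative target, e.g. ../radeon/kaveri_pfp.bin
        dst.symlink_to(os.path.relpath(src, dst_dir))
        created.append(dst.name)
    return created


# Intended use, as root (paths taken from the comment above):
#   link_firmware("/lib/firmware/radeon", "/lib/firmware/amdgpu")
```

After creating the links, the initrd has to be regenerated with the distribution's own tool so that the firmware is available at early boot; on Ubuntu-style systems this is typically `update-initramfs -u`, but that command is an assumption about the reporter's setup.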
Comment 5 Pontus Gråskæg 2018-09-07 12:28:40 UTC
Sorry for the confusion.

"The new kernel reads the microcode from /lib/firmware/amdgpu/kaveri_*. Since you don't have those files yet, you can create symlinks to those in /lib/firmware/radeon/. Don't forget to update the initrd afterwards."

I will create the symlinks and retry, but might not be able to do so until Monday.

Pontus
Comment 6 Pontus Gråskæg 2018-09-07 13:35:25 UTC
Created attachment 141473
dmesg_4.19.0-041900rc2_dc.eq.1_initrdfixed
Comment 7 Pontus Gråskæg 2018-09-07 13:35:57 UTC
OK, symlinks done, initrd rebuilt

Results are:

With amdgpu.dc=0 System boots with normal behaviour. Laptop eDP screen works, VGA output works. I have no HDMI monitor to check if HDMI output works.

With amdgpu.dc=1 I get a blank laptop screen (eDP) and (of course) no VGA, and I don't have an HDMI monitor to test the HDMI output. Given the problem I experienced before, I guessed that a blind input of my LUKS password would get me somewhere, so I enabled sshd beforehand so I could ssh into the laptop and save the dmesg ring buffer to a file. The log shows a kernel oops. I have extracted what I hope is the relevant part below, but have also attached full dmesg output for both the amdgpu.dc=0 and amdgpu.dc=1 cases.

[    2.047839] [TTM] Initializing pool allocator
[    2.047849] [TTM] Initializing DMA pool allocator
[    2.047915] [drm] amdgpu: 1024M of VRAM memory ready
[    2.047919] [drm] amdgpu: 3072M of GTT memory ready.
[    2.047933] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    2.047971] [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
[    2.048069] [drm] Internal thermal controller without fan control
[    2.048074] [drm] amdgpu: dpm initialized
[    2.049295] [drm] Found UVD firmware Version: 1.55 Family ID: 9
[    2.049697] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[    2.055250] [drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB: invalid powerlevel state: 0!
[    2.055339] [drm] Unsupported Connector type:5!
[    2.055459] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:3! type 0 expected 3
[    2.055535] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:4! type 0 expected 3
[    2.055610] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:5! type 0 expected 3
[    2.055686] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:6! type 0 expected 3
[    2.067894] [drm] Display Core initialized with v3.1.59!
[    2.068603] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[    2.068611] PGD 0 P4D 0 
[    2.068616] Oops: 0000 [#1] SMP NOPTI
[    2.068622] CPU: 2 PID: 177 Comm: systemd-udevd Not tainted 4.19.0-041900rc2-generic #201809022230
[    2.068626] Hardware name: LENOVO 80EC/Lancer 5B3, BIOS A4CN40WW (V 2.09) 08/24/2015
[    2.068759] RIP: 0010:dc_link_aux_transfer+0x88/0x160 [amdgpu]
[    2.068764] Code: 00 00 48 c7 45 a8 00 00 00 00 48 c7 45 a0 00 00 00 00 48 8b 16 48 8b 00 8b 52 14 48 8b 80 38 01 00 00 48 8b 9c d0 b0 01 00 00 <48> 8b 43 18 48 89 df 48 8b 40 48 e8 68 29 90 f3 8b 45 10 4c 89 6d
[    2.068770] RSP: 0018:ffffa2dec1133590 EFLAGS: 00010246
[    2.068774] RAX: ffff8fba8a10b400 RBX: 0000000000000000 RCX: ffffa2dec1133708
[    2.068777] RDX: 0000000000000000 RSI: ffff8fba8aa6cd00 RDI: ffff8fba8a10a400
[    2.068780] RBP: ffffa2dec1133600 R08: 0000000000000001 R09: 0000000000000000
[    2.068783] R10: ffffec6ac82a9b00 R11: 0000000000000001 R12: 0000000000000001
[    2.068786] R13: ffffa2dec1133708 R14: 0000000000000000 R15: 0000000000000000
[    2.068790] FS:  00007f3f17e1e8c0(0000) GS:ffff8fba97b00000(0000) knlGS:0000000000000000
[    2.068794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.068797] CR2: 0000000000000018 CR3: 000000020a948000 CR4: 00000000000406e0
[    2.068800] Call Trace:
[    2.068882]  dm_dp_aux_transfer+0x61/0x130 [amdgpu]
[    2.068899]  drm_dp_dpcd_access+0x75/0x110 [drm_kms_helper]
[    2.068908]  drm_dp_dpcd_read+0x33/0xc0 [drm_kms_helper]
[    2.068984]  dm_helpers_dp_read_dpcd+0x2b/0x50 [amdgpu]
[    2.069057]  core_link_read_dpcd+0x23/0x30 [amdgpu]
[    2.069130]  retrieve_link_cap+0x55/0x440 [amdgpu]
[    2.069203]  detect_edp_sink_caps+0x12/0x40 [amdgpu]
[    2.069276]  dc_link_detect+0x54a/0xb30 [amdgpu]
[    2.069286]  ? drm_dp_mst_topology_mgr_init+0x345/0x430 [drm_kms_helper]
[    2.069356]  amdgpu_dm_initialize_drm_device+0x68d/0xb22 [amdgpu]
[    2.069429]  ? setup_x_points_distribution+0x72/0x100 [amdgpu]
[    2.069498]  dm_hw_init.cold.55+0x8c/0x116 [amdgpu]
[    2.069589]  amdgpu_device_init.cold.28+0x113a/0x12e9 [amdgpu]
[    2.069596]  ? kmalloc_order+0x18/0x40
[    2.069600]  ? kmalloc_order_trace+0x24/0xb0
[    2.069647]  amdgpu_driver_load_kms+0x8b/0x2d0 [amdgpu]
[    2.069672]  drm_dev_register+0x11f/0x160 [drm]
[    2.069720]  amdgpu_pci_probe+0x140/0x1c0 [amdgpu]
[    2.069725]  local_pci_probe+0x46/0x90
[    2.069728]  pci_device_probe+0x18d/0x1a0
[    2.069734]  really_probe+0x243/0x3b0
[    2.069737]  driver_probe_device+0xba/0x100
[    2.069741]  __driver_attach+0xe4/0x110
[    2.069744]  ? driver_probe_device+0x100/0x100
[    2.069748]  bus_for_each_dev+0x74/0xb0
[    2.069752]  ? kmem_cache_alloc_trace+0x1c8/0x1e0
[    2.069756]  driver_attach+0x1e/0x20
[    2.069760]  bus_add_driver+0x159/0x230
[    2.069763]  ? 0xffffffffc0530000
[    2.069766]  driver_register+0x70/0xc0
[    2.069769]  ? 0xffffffffc0530000
[    2.069773]  __pci_register_driver+0x57/0x60
[    2.069819]  amdgpu_init+0x87/0x89 [amdgpu]
[    2.069824]  do_one_initcall+0x4a/0x1c4
[    2.069829]  ? _cond_resched+0x19/0x30
[    2.069833]  ? kmem_cache_alloc_trace+0x172/0x1e0
[    2.069836]  ? kfree+0x15b/0x180
[    2.069840]  do_init_module+0x60/0x220
[    2.069844]  load_module+0x16c1/0x1930
[    2.069848]  __do_sys_finit_module+0xbd/0x120
[    2.069852]  ? __do_sys_finit_module+0xbd/0x120
[    2.069856]  __x64_sys_finit_module+0x1a/0x20
[    2.069859]  do_syscall_64+0x5a/0x110
[    2.069863]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    2.069866] RIP: 0033:0x7f3f16c794d9
[    2.069869] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
[    2.069874] RSP: 002b:00007ffecb3898d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    2.069877] RAX: ffffffffffffffda RBX: 000055d589f17be0 RCX: 00007f3f16c794d9
[    2.069880] RDX: 0000000000000000 RSI: 000055d589eb5e90 RDI: 0000000000000016
[    2.069882] RBP: 000055d589eb5e90 R08: 0000000000000000 R09: 000000000000001b
[    2.069885] R10: 0000000000000016 R11: 0000000000000246 R12: 0000000000000000
[    2.069888] R13: 000055d589e9c0c0 R14: 0000000000020000 R15: 0000000000000000
[    2.069891] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel chash gpu_sched pcbc radeon aesni_intel aes_x86_64 i2c_algo_bit crypto_simd ttm cryptd psmouse glue_helper drm_kms_helper syscopyarea sysfillrect ahci sysimgblt fb_sys_fops libahci drm r8169 video rtsx_usb_sdmmc rtsx_usb
[    2.069916] CR2: 0000000000000018
[    2.069920] ---[ end trace 7fedfef671967588 ]---
[    2.069982] RIP: 0010:dc_link_aux_transfer+0x88/0x160 [amdgpu]
[    2.069985] Code: 00 00 48 c7 45 a8 00 00 00 00 48 c7 45 a0 00 00 00 00 48 8b 16 48 8b 00 8b 52 14 48 8b 80 38 01 00 00 48 8b 9c d0 b0 01 00 00 <48> 8b 43 18 48 89 df 48 8b 40 48 e8 68 29 90 f3 8b 45 10 4c 89 6d
[    2.069989] RSP: 0018:ffffa2dec1133590 EFLAGS: 00010246
[    2.069992] RAX: ffff8fba8a10b400 RBX: 0000000000000000 RCX: ffffa2dec1133708
[    2.069995] RDX: 0000000000000000 RSI: ffff8fba8aa6cd00 RDI: ffff8fba8a10a400
[    2.069997] RBP: ffffa2dec1133600 R08: 0000000000000001 R09: 0000000000000000
[    2.070000] R10: ffffec6ac82a9b00 R11: 0000000000000001 R12: 0000000000000001
[    2.070002] R13: ffffa2dec1133708 R14: 0000000000000000 R15: 0000000000000000
[    2.070005] FS:  00007f3f17e1e8c0(0000) GS:ffff8fba97b00000(0000) knlGS:0000000000000000
[    2.070008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.070010] CR2: 0000000000000018 CR3: 000000020a948000 CR4: 00000000000406e0
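
As a reading aid for the oops above (not part of the original comment), the sketch below decodes the faulting instruction marked with `<..>` in the "Code:" line. It handles only one encoding, a 64-bit `MOV r64, [base + disp8]` (bytes `48 8b /r` with mod=01), which is what appears here; it is not a general disassembler, and the helper names are hypothetical. The decoded load reads from `rbx + 0x18`, and since the register dump shows RBX = 0, the computed address matches the reported fault address CR2 = 0x18.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# Tail of the "Code:" line from the oops above (leading bytes omitted).
CODE_LINE = "Code: 48 8b 9c d0 b0 01 00 00 <48> 8b 43 18 48 89 df 48 8b 40 48"
RBX = 0x0000000000000000  # from the register dump
CR2 = 0x0000000000000018  # faulting address from the register dump


def faulting_bytes(code_line):
    """Return the bytes from the <..> marker onward as integers."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_load_disp8(code):
    """Decode 'REX.W 8B /r' with mod=01 into (dest, base, displacement)."""
    if code[0] != 0x48 or code[1] != 0x8B:
        raise ValueError("not a plain 64-bit MOV load")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:  # rm == 4 would mean a SIB byte follows
        raise ValueError("addressing form not handled by this sketch")
    disp = code[3] - 256 if code[3] >= 128 else code[3]  # sign-extend disp8
    return REG64[reg], REG64[rm], disp


dest, base, disp = decode_mov_load_disp8(faulting_bytes(CODE_LINE))
print(f"mov {dest}, [{base} + {disp:#x}]")
print("fault address matches CR2:", RBX + disp == CR2)
```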
Comment 8 Pontus Gråskæg 2018-09-07 13:36:56 UTC
Created attachment 141474
dmesg_4.19.0-041900rc2_dc.eq.0_initrdfixed
Comment 9 Pontus Gråskæg 2018-09-10 10:29:55 UTC
Just to be clear, booting with kernel 4.19rc2 does indeed remove the "*ERROR* VCE not responding, giving up!!!" and associated messages. So that can be marked solved (I think).
However, there is still a regression in behaviour, but I'll raise a separate bug for that, as it appears to be a different issue.

graaskaeg
Comment 10 Michel Dänzer 2018-09-10 10:39:17 UTC
Makes sense, thanks.
Comment 11 Pontus Gråskæg 2018-09-10 13:49:23 UTC
The new bug I have appears already to be documented: Bug 107595
Comment 12 Michel Dänzer 2018-09-12 16:48:05 UTC
*** Bug 107910 has been marked as a duplicate of this bug. ***

