Bug 106225 - Kernel panic after modesetting (not on every boot) on ryzen 5 2400g
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2018-04-24 22:54 UTC by Francisco Pina Martins
Modified: 2018-09-07 07:42 UTC

Attachments
journalctl log file (121.58 KB, text/plain)
2018-04-24 22:54 UTC, Francisco Pina Martins
journalctl log file (90.25 KB, text/plain)
2018-04-25 22:36 UTC, Francisco Pina Martins
journalctl log file with KASAN enabled in kernel (286.19 KB, text/plain)
2018-04-26 23:03 UTC, Francisco Pina Martins
journalctl log with KASAN_OUTLINE (135.00 KB, text/plain)
2018-04-27 15:01 UTC, Francisco Pina Martins
journalctl log with KASAN_OUTLINE and kasan_multi_shot (38.92 KB, application/gzip)
2018-04-27 16:42 UTC, Francisco Pina Martins
journalctl log with KASAN_OUTLINE and kasan_multi_shot using amdgpu.ko compiled with debug info (11.89 KB, application/x-xz)
2018-05-01 00:21 UTC, Francisco Pina Martins
journalctl log with KASAN_OUTLINE and kasan_multi_shot using a kernel compiled with debug info (12.09 KB, application/x-xz)
2018-05-02 22:18 UTC, Francisco Pina Martins
journalctl log file for linux-amd-staging-drm-next (22.97 KB, application/x-xz)
2018-05-26 00:06 UTC, Francisco Pina Martins
relevant kernel 4.17.5 log of the oops (9.55 KB, text/plain)
2018-07-12 15:26 UTC, Andrea Vettorello
relevant kernel 4.17.5 source of the oops (2.24 KB, text/plain)
2018-07-12 15:28 UTC, Andrea Vettorello

Description Francisco Pina Martins 2018-04-24 22:54:33 UTC
Created attachment 139081
journalctl log file

On a new Ryzen 5 2400G build, I am using linux-4.16.3 (Arch Linux).
On some boots (~1 out of every 3) I get a kernel panic after modesetting occurs.
I have attached the relevant systemd logfile (for the entire boot process).
I am not sure what component is failing, but I suspect AMDGPU, since it happens suspiciously after modesetting.
If it matters, the mainboard is a Gigabyte "AB350N-Gaming Wifi", with the latest available BIOS as of writing (BIOS F23d 04/17/2018).
Please advise if more information is required.
Comment 1 Michel Dänzer 2018-04-25 08:11:09 UTC
The journalctl log file shows oopses from the evdev driver, no obvious amdgpu related issues.

Can you attach a picture of the output from the actual kernel panic?
Comment 2 Francisco Pina Martins 2018-04-25 22:36:27 UTC
Created attachment 139114
journalctl log file

Apologies, as this was the wrong logfile (I was getting some journalctl corruption).
But it seems you are correct and this is not a kernel panic, as I still can reboot using the "Magic SysRq key" (which I usually can't during kernel panics), albeit the "Caps Lock" key is unresponsive.

Attached is a log that should show the issue. Note the line:
abr 25 23:23:09 ZenBox systemd-udevd[297]: worker [308] failed while handling '/devices/pci0000:00/0000:00:08.1/0000:09:00.0'

Where '/devices/pci0000:00/0000:00:08.1/0000:09:00.0' seems to be the "graphics card".

Symptoms are the same. During some boots, modesetting "seems" to occur, but I get a black screen instead. System is unresponsive, including "Caps Lock" lights not changing on key press. Magic SysRq key does seem to successfully reboot the system.

How should I edit the title to correctly reflect this?
Comment 3 Michel Dänzer 2018-04-26 08:54:52 UTC
There's a general protection fault within kmem_cache_alloc_trace when dcn10_create_resource_pool calls kzalloc (which looks innocuous). There's another general protection fault in kmem_cache_alloc_trace later, called from cgroup code. Looks like there might be a general memory management related issue.
Comment 4 Michel Dänzer 2018-04-26 08:55:59 UTC
Maybe you can try enabling KASAN and see if that catches anything earlier.
Comment 5 Francisco Pina Martins 2018-04-26 23:03:20 UTC
Created attachment 139154
journalctl log file with KASAN enabled in kernel

I have compiled a new kernel with KASAN module enabled.
However, I am not sure I am getting any KASAN related output in the logs (attached).
Is there any boot option I should be passing to it? Alternatively, can you point me towards a good source of documentation on using KASAN? (It is the first time I am trying this.)
Comment 6 Michel Dänzer 2018-04-27 08:07:24 UTC
Looks like KASAN isn't enabled yet — the lines in dmesg containing "PREEMPT SMP NOPTI" should contain "KASAN" as well.

FWIW, I just enable CONFIG_KASAN and CONFIG_KASAN_INLINE in .config, I don't have to do anything at runtime to enable it.
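
The check described above (oops header lines that contain "PREEMPT SMP NOPTI" should also contain "KASAN") can be sketched as a small log filter. This is only an illustration: the function name and the two sample lines are made up for this example and are not taken from the attached logs.

```python
def kasan_in_oops_headers(log_text: str, marker: str = "PREEMPT SMP") -> list[bool]:
    """Return one flag per oops header line: True if the line also mentions KASAN."""
    return [
        "KASAN" in line
        for line in log_text.splitlines()
        if marker in line
    ]


# Hypothetical header lines, shaped like the ones the comment describes.
sample = "\n".join([
    "kernel: general protection fault: 0000 [#1] PREEMPT SMP NOPTI",
    "kernel: general protection fault: 0000 [#1] PREEMPT SMP KASAN NOPTI",
])
flags = kasan_in_oops_headers(sample)
```

If every flag is False, the running kernel was most likely built without CONFIG_KASAN, which is the situation diagnosed in this comment.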
Comment 7 Francisco Pina Martins 2018-04-27 13:21:41 UTC
I had previously made a mistake in loading the kernel's .config file.
I have now managed to compile the kernel with the options for KASAN set.
However, booting this kernel results in an instant reboot after displaying the message "Loading initial ramdisk...".
I will try to compile another kernel using CONFIG_KASAN_OUTLINE this time and see if I have better luck.
Comment 8 Francisco Pina Martins 2018-04-27 15:01:47 UTC
Created attachment 139178
journalctl log with KASAN_OUTLINE

Ok, so using CONFIG_KASAN_OUTLINE works.
I get a KASAN enabled Kernel, which boots.
Every
Single
Time
I cannot reproduce the error with KASAN enabled after more than 15 reboots (I lost count after that).
Using the "regular" kernel keeps giving me the error ~1 out of 3 boots.
I have attached a KASAN enabled log, but I am not sure how useful it might be.
Comment 9 Michel Dänzer 2018-04-27 16:05:56 UTC
Getting closer, please try again with kasan_multi_shot on the kernel command line, otherwise KASAN only reports the first thing it catches.
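
Before rebooting into a long test cycle, it can be worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming the usual /proc/cmdline interface; the helper names and the example command line are hypothetical:

```python
def cmdline_has_param(cmdline: str, name: str) -> bool:
    """True if `name` appears on the command line, bare or as name=value."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


def running_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Hypothetical command line; on a real system pass running_kernel_cmdline().
example = "BOOT_IMAGE=/vmlinuz-linux-kasan root=/dev/sda2 rw kasan_multi_shot"
multi_shot_enabled = cmdline_has_param(example, "kasan_multi_shot")
```

Matching whole tokens rather than substrings avoids a false positive from an unrelated parameter that merely contains the same text.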
Comment 10 Francisco Pina Martins 2018-04-27 16:42:33 UTC
Created attachment 139181
journalctl log with KASAN_OUTLINE and kasan_multi_shot

Bingo.
Now here is the thing...
With the KASAN enabled kernel and the multi_shot option set, I can **never** boot successfully.
In fact, even mode-setting is not happening. I am uploading 2 log files, since on one of the occasions, I got a black screen when I was expecting modesetting to happen. On the other log, I got a non-modeset text screen with the KASAN dumps.
I hope this is what was needed.
Comment 11 Michel Dänzer 2018-04-27 16:50:41 UTC
Please provide the output of the following in your kernel build tree:

scripts/faddr2line drivers/gpu/drm/amd/amdgpu/amdgpu.ko firmware_parser_create+0xa70/0xd90
Comment 12 Francisco Pina Martins 2018-04-27 17:14:55 UTC
francisco@ZenBox [18:03:53] [/usr/lib/modules/4.16.5-1-kasan/build] 
-> $ scripts/faddr2line drivers/gpu/drm/amd/amdgpu/amdgpu.ko firmware_parser_create+0xa70/0xd90
ERROR: can't find objfile drivers/gpu/drm/amd/amdgpu/amdgpu.ko

The directory "drivers/gpu/drm/amd/amdgpu/" contains only a file named "Kconfig", containing the following:


```
config DRM_AMDGPU_SI
	bool "Enable amdgpu support for SI parts"
	depends on DRM_AMDGPU
	help
	  Choose this option if you want to enable experimental support
	  for SI asics.

	  SI is already supported in radeon. Experimental support for SI
	  in amdgpu will be disabled by default and is still provided by
	  radeon. Use module options to override this:

	  radeon.si_support=0 amdgpu.si_support=1

config DRM_AMDGPU_CIK
	bool "Enable amdgpu support for CIK parts"
	depends on DRM_AMDGPU
	help
	  Choose this option if you want to enable support for CIK asics.

	  CIK is already supported in radeon. Support for CIK in amdgpu
	  will be disabled by default and is still provided by radeon.
	  Use module options to override this:

	  radeon.cik_support=0 amdgpu.cik_support=1

config DRM_AMDGPU_USERPTR
	bool "Always enable userptr write support"
	depends on DRM_AMDGPU
	select MMU_NOTIFIER
	help
	  This option selects CONFIG_MMU_NOTIFIER if it isn't already
	  selected to enabled full userptr support.

config DRM_AMDGPU_GART_DEBUGFS
	bool "Allow GART access through debugfs"
	depends on DRM_AMDGPU
	depends on DEBUG_FS
	default n
	help
	  Selecting this option creates a debugfs file to inspect the mapped
	  pages. Uses more memory for housekeeping, enable only for debugging.

source "drivers/gpu/drm/amd/acp/Kconfig"
source "drivers/gpu/drm/amd/display/Kconfig"
```

I did find amdgpu.ko.xz under "/usr/lib/modules/4.16.5-1-kasan/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz" which I have decompressed using "xz -k -d".
Running the command you requested resulted in the following output:

francisco@ZenBox [18:12:53] [/usr/lib/modules/4.16.5-1-kasan/build] 
-> $ scripts/faddr2line ../kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko firmware_parser_create+0xa70/0xd90 
firmware_parser_create+0xa70/0xd90:
firmware_parser_create at ??:?


PS - Thank you for your patience
Comment 13 Michel Dänzer 2018-04-30 08:55:16 UTC
(In reply to Francisco Pina Martins from comment #12)
> francisco@ZenBox [18:12:53] [/usr/lib/modules/4.16.5-1-kasan/build] 
> -> $ scripts/faddr2line ../kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko
> firmware_parser_create+0xa70/0xd90 
> firmware_parser_create+0xa70/0xd90:
> firmware_parser_create at ??:?

What does

 file ../kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko

say? If its output doesn't say "not stripped", look for an unstripped version of the module. If the file output for that doesn't say "with debug_info", try enabling CONFIG_DEBUG_INFO (and maybe CONFIG_DEBUG_INFO_REDUCED).

If that still doesn't result in better output than ??:? from faddr2line, the best guess so far is that there's an issue somewhere in drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c:bios_parser_construct.
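
The decision steps above can be written down as a small classifier over the text that `file` prints. This is a sketch, not part of any kernel tooling: the function name and the returned messages are invented here, and the sample strings are shortened, hypothetical `file` outputs.

```python
def describe_module(file_output: str) -> str:
    """Map one line of `file` output to the next step suggested in the comment."""
    # Drop the "path: " prefix so words in the path cannot confuse the check.
    description = file_output.split(": ", 1)[-1]
    if "not stripped" not in description:
        return "stripped: look for an unstripped copy of the module"
    if "with debug_info" not in description:
        return "unstripped, no debug info: rebuild with CONFIG_DEBUG_INFO"
    return "unstripped, with debug info: faddr2line should resolve lines"


# Hypothetical, shortened `file` outputs for illustration.
no_debug = "amdgpu.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped"
with_debug = "amdgpu.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped"
advice = describe_module(no_debug)
```

The "not stripped" test comes first because a stripped module cannot carry debug info, so the second check only makes sense once the first one passes.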
Comment 14 Francisco Pina Martins 2018-04-30 09:35:26 UTC
francisco@ZenBox [10:31:49] [~] 
-> $ file /usr/lib/modules/4.16.5-1-kasan/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko
/usr/lib/modules/4.16.5-1-kasan/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=fa6433331b1f8048ce1c9b487d7b67d7a4aa4c31, not stripped

It does not say "with debug_info", so I'm compiling a new kernel with CONFIG_DEBUG_INFO and CONFIG_DEBUG_INFO_REDUCED activated. I will post the results as soon as I have them.
Comment 15 Francisco Pina Martins 2018-04-30 10:15:30 UTC
Here you go:

francisco@ZenBox [11:13:37] [/tmp/build/linux-kasan/src/linux-4.16] 
-> $ scripts/faddr2line drivers/gpu/drm/amd/amdgpu/amdgpu.ko firmware_parser_create+0xa70/0xd90
firmware_parser_create+0xa70/0xd90:
get_integrated_info_v11 at /tmp/build/linux-kasan/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1572
 (inlined by) construct_integrated_info at /tmp/build/linux-kasan/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1714
 (inlined by) bios_parser_create_integrated_info at /tmp/build/linux-kasan/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1755
 (inlined by) bios_parser_construct at /tmp/build/linux-kasan/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1912
 (inlined by) firmware_parser_create at /tmp/build/linux-kasan/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1927

Is this it?
Comment 16 Michel Dänzer 2018-04-30 10:32:12 UTC
That does look helpful, thanks! I'll leave it to the DC folks to take it from here.

Francisco, just one more thing: Can you double-check that KASAN still reports firmware_parser_create+0xa70/0xd90 with the current amdgpu.ko, or otherwise pass the current values to faddr2line?
Comment 17 Francisco Pina Martins 2018-04-30 11:26:55 UTC
I don't think I'll be able to test that, since the kernel with debug info enabled weighs in at 590 MB, which is larger than my 512 MB /boot partition.
Will it work if I just replace the amdgpu.ko file with the debug-enabled version?
Comment 18 Michel Dänzer 2018-04-30 13:00:29 UTC
(In reply to Francisco Pina Martins from comment #17)
> Will it work if I just replace the amdgpu.ko file with the debug-enabled
> version?

Yeah, that could work. If it doesn't, then it's not a big deal, it's unlikely that the value before the slash has changed but the one after the slash hasn't (if the latter had changed, faddr2line should have complained). Just double-checking.
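
For readers unfamiliar with the notation: in `firmware_parser_create+0xa70/0xd90`, the value before the slash is the offset into the function and the value after it is the function's total size. A minimal parsing sketch follows; the class and function names are hypothetical and not part of faddr2line.

```python
import re
from typing import NamedTuple


class SymbolAddress(NamedTuple):
    symbol: str
    offset: int  # bytes from the start of the function
    size: int    # total size of the function in bytes


_PATTERN = re.compile(
    r"^(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)$"
)


def parse_symbol_address(text: str) -> SymbolAddress:
    """Split 'symbol+0xOFFSET/0xSIZE' into its three parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in symbol+offset/size form: {text!r}")
    return SymbolAddress(
        match["symbol"],
        int(match["offset"], 16),
        int(match["size"], 16),
    )


reported = parse_symbol_address("firmware_parser_create+0xa70/0xd90")
```

Comparing the size field across two builds is a cheap way to spot that a function was recompiled differently, which is the sanity check discussed in this comment.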
Comment 19 Francisco Pina Martins 2018-05-01 00:21:14 UTC
Created attachment 139239
journalctl log with KASAN_OUTLINE and kasan_multi_shot using amdgpu.ko compiled with debug info

Here you go. journalctl log after booting with kasan_multi_shot and using the `amdgpu.ko` file compiled with debug info. Now this is a Frankenkernel monster if I ever saw one.
I have taken the liberty of "grepping" for the pattern, and it seems that "firmware_parser_create+0xa70/0xd90" is still there.
Thank you for passing this along to the DC folks. Please let me know when a patch I can test is made available. Also, if you need any more information (or want me to try something), just ask away.
At the very least, with this bug report I have discovered that the KASAN-enabled kernel boots every time, at the expense of an extra second while booting. I did not notice any other performance differences between the KASAN-enabled and the stock kernel (was I supposed to?).
Once again, thank you, Michel, for your patience in guiding me through the tasks. Next time will be easier. _-)
Comment 20 Michel Dänzer 2018-05-02 09:08:53 UTC
Now I notice there's another report for firmware_parser_create+0xa9b/0xd90. What does faddr2line say for that?

Though it's weird that KASAN claims the memory written in firmware_parser_create was freed from rcu_cpu_kthread. If that's accurate[0], it might indicate a lower level issue.

[0] There is some doubt about that due to the "Frankenkernel monster". :) Does enabling CONFIG_DEBUG_INFO_REDUCED as well allow you to keep debugging symbols for everything, or at least for the vmlinuz image?
Comment 21 Francisco Pina Martins 2018-05-02 22:18:59 UTC
Created attachment 139288
journalctl log with KASAN_OUTLINE and kasan_multi_shot using a kernel compiled with debug info

Ok, so here's what I did.
Since I did not have space for the kernel with both KASAN and debug info enabled (the 590 MB was already with CONFIG_DEBUG_INFO_REDUCED), I got my hands dirty and used nconfig to strip a ton of drivers I was pretty sure I didn't need from the build (stuff like nouveau, industrial controllers, game-pads, etc.). I got the kernel down to ~390 MB, which was small enough to install.
I have attached the journalctl log file with this new kernel (linux-kasan-debug-stripped).
As for faddr2line output, here is the original command with the new kernel:

```
francisco@ZenBox [23:09:51] [/usr/lib/modules/4.16.5-1-kasan-debug-stripped/kernel/drivers/gpu/drm/amd/amdgpu] 
-> $ /usr/lib/modules/4.16.5-1-kasan-debug-stripped/build/scripts/faddr2line amdgpu.ko firmware_parser_create+0xa70/0xd90
firmware_parser_create+0xa70/0xd90:
get_integrated_info_v11 at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1572
 (inlined by) construct_integrated_info at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1714
 (inlined by) bios_parser_create_integrated_info at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1755
 (inlined by) bios_parser_construct at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1912
 (inlined by) firmware_parser_create at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1927
```

And here is the "new" command with the new kernel:

```
francisco@ZenBox [23:10:27] [/usr/lib/modules/4.16.5-1-kasan-debug-stripped/kernel/drivers/gpu/drm/amd/amdgpu] 
-> $ /usr/lib/modules/4.16.5-1-kasan-debug-stripped/build/scripts/faddr2line amdgpu.ko firmware_parser_create+0xa9b/0xd90
firmware_parser_create+0xa9b/0xd90:
get_integrated_info_v11 at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1574
 (inlined by) construct_integrated_info at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1714
 (inlined by) bios_parser_create_integrated_info at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1755
 (inlined by) bios_parser_construct at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1912
 (inlined by) firmware_parser_create at /tmp/build/linux-kasan-debug-stripped/src/linux-4.16/drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser2.c:1927
```

This is no longer a "franken-kernel-monster". Does this output reveal anything new?
Comment 22 Michel Dänzer 2018-05-03 09:03:30 UTC
It still says the memory is freed from rcu_cpu_kthread. Weird. It's not clear this actually is an amdgpu issue.
Comment 23 Francisco Pina Martins 2018-05-03 09:11:30 UTC
This is a bit out of my league... but should this issue be filed somewhere else then?
If yes, where, and what information should I provide?
Also, is there a possibility that there are multiple issues at play here?
Comment 24 Michel Dänzer 2018-05-03 10:38:48 UTC
(In reply to Francisco Pina Martins from comment #23)
> [...] should this issue be filed somewhere else then?
> If yes, where,

I'm not sure. :( Maybe start with memory management (https://www.linux-mm.org/, https://bugzilla.kernel.org/describecomponents.cgi?product=Memory%20Management)

> and what information should I provide?

At least all the same information as here, I guess.

> Also, is there a possibility that there are multiple issues at play here?

Quite possibly.
Comment 25 Francisco Pina Martins 2018-05-03 21:46:26 UTC
I have submitted the bug to kernel bugzilla as you have suggested:
https://bugzilla.kernel.org/show_bug.cgi?id=199613

Is there anything else I can do here to help track the eventual problem with AMDGPU? Or do you think the AMDGPU memory problem is being caused by `rcu_cpu_kthread`?

Is there anything else you would recommend I do to disentangle what may be multiple issues?
Comment 26 Michel Dänzer 2018-05-07 09:58:08 UTC
(In reply to Francisco Pina Martins from comment #25)
> Is there anything else I can do here to help track the eventual problem with
> AMDGPU? Or do you think the AMDGPU memory problem is being caused by
> `rcu_cpu_kthread`?

That's what it looks like from the KASAN output.
Comment 27 Jerry Zuo 2018-05-25 16:01:18 UTC
I could reproduce the issue on 4.16.3, but not on 4.16-rc7.

I have verified commit a0f282dcdb1775cbcc0a151570fc01c0aae5ca0f (current top) on amd-staging-drm-next: on Raven, 20 boots completed without showing the issue.

Please try that commit on your setup. Thanks.
Comment 28 Francisco Pina Martins 2018-05-25 23:12:39 UTC
Currently "amd-staging-drm-next" fails to build for me, with the following error:

```
../lib/str_error_r.c: In function ‘str_error_r’:
../lib/str_error_r.c:25:3: error: passing argument 1 to restrict-qualified parameter aliases with argument 5 [-Werror=restrict]
   snprintf(buf, buflen, "INTERNAL ERROR: strerror_r(%d, %p, %zd)=%d", errnum, buf, buflen, err);
   ^~~~~~~~
```

From what I was able to research, it seems to be missing a patch that was applied in March (https://patchwork.kernel.org/patch/10291671/). But maybe I'm doing something wrong here, since I'm not very experienced.


Or did you mean for me to try building linux-4.16, and patch it with commit "a0f282dcdb1775cbcc0a151570fc01c0aae5ca0f" from the "amd-staging-drm-next" tree?
Comment 29 Francisco Pina Martins 2018-05-26 00:06:54 UTC
Created attachment 139779
journalctl log file for linux-amd-staging-drm-next

I was able to compile the "amd-staging-drm-next" tree with the help of a small patch (https://github.com/StuntsPT/linux-amd-staging-drm-next-git/blob/master/PKGBUILD#L50) as of commit "46c04bb3e028217255b578cc6101823e9fbc11bc".

However, using this kernel I can never get a successful boot (10/10 failures).
I have attached the journalctl log for one of these failed boots.
I hope this helps.
Comment 30 Jerry Zuo 2018-05-28 14:29:16 UTC
The commit on amd-staging-drm-next I checked out for verification is:

Author:     Shaoyun Liu <Shaoyun.Liu@amd.com>
AuthorDate: Tue May 22 11:45:41 2018 -0400
Commit:     Alex Deucher <alexander.deucher@amd.com>
CommitDate: Thu May 24 10:28:35 2018 -0500

    drm/amdgpu: Update GFX info structure to match what vega20 used
    
    Update to the latest version from the vbios team.
    
    Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Checking out that commit directly and building it worked for me. It is a 4.16-rc7 build.
Comment 31 Francisco Pina Martins 2018-05-31 21:56:46 UTC
I have used [this PKGBUILD](https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=linux-amd-staging-drm-next-git) to build the kernel.
Judging by the version number, though, it looks more like linux-4.17 than 4.16.
The commit I checked out was 46c04bb3e028217255b578cc6101823e9fbc11bc.

I will investigate further and try checking out at commit a0f282dcdb1775cbcc0a151570fc01c0aae5ca0f and see if there is any other difference in the build. The fact that you did not require the patch makes me think I'm doing something fundamentally different from you.
Comment 32 Luca 2018-06-12 19:08:29 UTC
This happens to me too (kernel 4.17, Ubuntu 18.04). Especially on the first boot of the day, I always get a black screen after modeset. I haven't tried with KASAN and can't get any log, but I suspect it is the same issue.

Asrock ab350m pro4 - firmware 4.7
PSU seasonic s12ii
ram 2x8 gb kingston hyperx fury 2666
CPU ryzen 2200g
Comment 33 Andrea Vettorello 2018-07-12 15:26:43 UTC
Created attachment 140596
relevant kernel 4.17.5 log of the oops
Comment 34 Andrea Vettorello 2018-07-12 15:28:14 UTC
Created attachment 140597
relevant kernel 4.17.5 source of the oops
Comment 35 Andrea Vettorello 2018-07-12 15:33:44 UTC
Sorry for the spam, I thought the two attachments would be inserted in the same comment.

I think I'm afflicted by the same bug on different hardware (Asrock AB350 ixt, Ryzen 5 2400G), Debian Stretch running vanilla kernel 4.17.5. I would say only 1 in 4~5 boot attempts succeeds.

Feel free to request other info.
Comment 36 Andrea Vettorello 2018-08-02 09:35:53 UTC
With recent development kernels from https://cgit.freedesktop.org/~agd5f/ (drm-next-4.19-wip, I think commit ddf74e79a54070f277ae520722d3bab7f7a6c67a) I can consistently complete cold and warm boots on my 2400G; before, only 1 in 4~5 attempts succeeded.

I think this is unrelated to the above, but I still see various asserts with stack traces in the logs, from the "write_i2c_retimer_setting" function in "drivers/gpu/drm/amd/display/dc/core/dc_link.c". They all seem to be write failures, but they don't seem fatal.
Comment 37 Francisco Pina Martins 2018-09-06 21:29:36 UTC
Confirming that the issue seems to be solved with mainline linux-4.19-rc[1,2].
Comment 38 Michel Dänzer 2018-09-07 07:42:22 UTC
(In reply to Francisco Pina Martins from comment #37)
> Confirming that the issue seems to be solved with mainline
> linux-4.19-rc[1,2].

I'm glad to hear that! Resolving accordingly.

