Bug 99353

Summary:

Kaveri 7400K shows random colored noise instead of GUI in X or Wayland

Product:

DRI

Reporter:

Vedran Miletić <vedran>

Component:

DRM/Radeon

Assignee:

Default DRI bug account <dri-devel>

Status:

RESOLVED FIXED

QA Contact:

Severity:

major

Priority:

highest

Version:

unspecified

Hardware:

x86-64 (AMD64)

OS:

All

Whiteboard:

Attachments:

Description	Flags
Xorg.log	none
Piglit test results	none
dmesg during piglit testing	none
glxgears -info	none
umr -c	none
Video BIOS	none
drm.debug=0x1e	none
Gnome desktop	none
Proposed patch	none
dmesg	none
radeonsi: force si_write_harvested_raster_configs	none
radeonsi: force si_write_harvested_raster_configs when we fail to determine enabled backends	none
possible fix for radeon	none
possible fix for amdgpu	none
kaveri.patch	none

Description Vedran Miletić 2017-01-10 21:57:30 UTC

I have a Kaveri 7400K on an ASRock FM2A58M-VG3+. Running Fedora 25, KMS works and it boots nicely, but instead of GDM running on Wayland, I get a screen full of colored noise. Using Xorg makes no difference.

Phoronix claims this APU worked in 2014[1]. It never worked for me with the open source driver (previously not even KMS worked), however it worked perfectly with fglrx last time I tried it.

I can try older Mesa and LLVM if that would be useful, but compiling it will take a while.

[1] https://www.phoronix.com/scan.php?page=article&item=amd_apus_august&num=1

Comment 1 Michel Dänzer 2017-01-11 09:40:16 UTC

Please attach the dmesg output and maybe Xorg log file corresponding to the problem.

You could try running some piglit tests with PIGLIT_PLATFORM=gbm. That could help us narrow down whether the problem is with the rendering or the display.

Comment 2 Vedran Miletić 2017-06-26 12:40:44 UTC

Michel, I missed this reply, sorry! I will test ASAP.

Comment 3 Vedran Miletić 2017-07-07 23:22:39 UTC

Created attachment 132559 [details]
Xorg.log

Comment 4 Vedran Miletić 2017-07-07 23:23:06 UTC

dmesg doesn't show anything suspicious:

[    1.526705] [drm] radeon kernel modesetting enabled.
[    1.537819] fb: switching to radeondrmfb from EFI VGA
[    1.538312] radeon 0000:00:01.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
[    1.538314] radeon 0000:00:01.0: GTT: 2048M 0x0000000020000000 - 0x000000009FFFFFFF
[    1.538406] [drm] radeon: 512M of VRAM memory ready
[    1.538407] [drm] radeon: 2048M of GTT memory ready.
[    1.539953] [drm] radeon: dpm initialized
[    1.570061] radeon 0000:00:01.0: WB enabled
[    1.570073] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff943255643c00
[    1.570076] radeon 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000020000c04 and cpu addr 0xffff943255643c04
[    1.570078] radeon 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000020000c08 and cpu addr 0xffff943255643c08
[    1.570079] radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff943255643c0c
[    1.570081] radeon 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000020000c10 and cpu addr 0xffff943255643c10
[    1.570436] radeon 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000078d30 and cpu addr 0xffffa6a201438d30
[    1.570648] radeon 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000020000c18 and cpu addr 0xffff943255643c18
[    1.570650] radeon 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000020000c1c and cpu addr 0xffff943255643c1c
[    1.570711] radeon 0000:00:01.0: radeon: using MSI.
[    1.570786] [drm] radeon: irq initialized.
[    3.510847] fbcon: radeondrmfb (fb0) is primary device
[    3.588698] radeon 0000:00:01.0: fb0: radeondrmfb frame buffer device
[    3.611102] [drm] Initialized radeon 2.49.0 20080528 for 0000:00:01.0 on minor 0

Comment 5 Vedran Miletić 2017-07-07 23:27:46 UTC

A few more random details:
1) monitor is attached via VGA, the only output on the motherboard
2) different monitors show the same behaviour
3) GDM vs SDDM, no difference
4) in both GDM and SDDM, the mouse cursor is drawn correctly
5) Wayland vs Xorg, no difference
6) with older kernels the system would freeze the moment KMS kicked in, so this is already a big improvement

Comment 6 Vedran Miletić 2017-07-08 11:51:47 UTC

Created attachment 132564 [details]
Piglit test results

$ PIGLIT_PLATFORM=gbm ./piglit run quick results/quick
hang at
[17212/39364] skip: 545, pass: 7373, warn: 3, fail: 9290, crash: 1 /|/
after ^C
Traceback (most recent call last):                                   
  File "./piglit", line 174, in <module>
    main()
  File "./piglit", line 170, in main
    sys.exit(runner(args))
  File "/home/vedranm/workspace/piglit/framework/exceptions.py", line 51, in _inner
    func(*args, **kwargs)
  File "/home/vedranm/workspace/piglit/framework/programs/run.py", line 357, in run
    profile.run(profiles, args.log_level, backend, args.concurrency)
  File "/home/vedranm/workspace/piglit/framework/profile.py", line 445, in run
    pool.join()
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 510, in join
    self._worker_handler.join()
  File "/usr/lib64/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/usr/lib64/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

Comment 7 Vedran Miletić 2017-07-08 11:59:59 UTC

Created attachment 132565 [details]
dmesg during piglit testing

Quite a number of errors, but they could be a separate problem as I don't get any of those upon starting Xorg/GDM/SDDM.

Comment 8 Michel Dänzer 2017-07-10 07:08:52 UTC

(In reply to Vedran Miletić from comment #6)
> $ PIGLIT_PLATFORM=gbm ./piglit run quick results/quick
> hang at

Does it always hang at the same test? If so, can you find out which test it is?


> [17212/39364] skip: 545, pass: 7373, warn: 3, fail: 9290, crash: 1 /|/

Such a large number of failed tests indicates that the problem is on the rendering side, not the display side.
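
A minimal Python sketch of that reasoning, assuming the progress line keeps the format quoted above. It is an illustration only and not part of piglit; the names `parse_summary` and `fail_ratio` are hypothetical, and excluding skipped tests from the denominator is an assumption of this sketch.

```python
import re

# Matches the counters in a piglit progress line such as
# "[17212/39364] skip: 545, pass: 7373, warn: 3, fail: 9290, crash: 1"
_COUNTER = re.compile(r"(skip|pass|warn|fail|crash):\s*(\d+)")


def parse_summary(line):
    """Return a dict of the counters found in one piglit progress line."""
    return {name: int(value) for name, value in _COUNTER.findall(line)}


def fail_ratio(counts):
    """Fraction of executed (non-skipped) tests that failed or crashed."""
    executed = sum(v for k, v in counts.items() if k != "skip")
    if executed == 0:
        return 0.0
    return (counts.get("fail", 0) + counts.get("crash", 0)) / executed


counts = parse_summary(
    "[17212/39364] skip: 545, pass: 7373, warn: 3, fail: 9290, crash: 1 /|/"
)
print(counts)
print(f"{fail_ratio(counts):.1%} of executed tests failed or crashed")
```

For the line quoted above, more than half of the executed tests fail or crash, which is far beyond what a display-only problem would be expected to cause.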


Any chance you can try if the problem also occurs with the amdgpu kernel driver instead of radeon?

Comment 9 Vedran Miletić 2017-07-13 12:47:29 UTC

(In reply to Michel Dänzer from comment #8)
> (In reply to Vedran Miletić from comment #6)
> > $ PIGLIT_PLATFORM=gbm ./piglit run quick results/quick
> > hang at
> 
> Does it always hang at the same test? If so, can you find out which test it
> is?
> 

Not sure, I can retest and probably find out which test that is.

> 
> > [17212/39364] skip: 545, pass: 7373, warn: 3, fail: 9290, crash: 1 /|/
> 
> Such a large number of failed tests indicates that the problem is on the
> rendering side, not the display side.
> 
> 
> Any chance you can try if the problem also occurs with the amdgpu kernel
> driver instead of radeon?

Possibly. I'll check if Fedora's 4.11 has CI support. Will report back soon(ish).

Comment 10 Vedran Miletić 2017-07-15 22:28:56 UTC

Fedora does not enable AMDGPU CIK, but I rebuilt the kernel manually. Same story with Xorg/Wayland with the amdgpu kernel driver.

As for piglit, managed to run till the end this time (with radeon):
[39371/39371] skip: 3237, pass: 14626, warn: 8, fail: 21498, crash: 2

Will retest with amdgpu.

Comment 11 Vedran Miletić 2017-07-15 23:29:48 UTC

dmesg with radeon:

[ 1443.752932] show_signal_msg: 8 callbacks suppressed
[ 1443.752936] glslparsertest[31344]: segfault at 20 ip 00007f10f2a45a25 sp 00007ffd7c7b4ea0 error 4 in radeonsi_dri.so[7f10f2713000+a7f000]
[ 1453.316958] [TTM] Out of kernel memory
[ 1453.317022] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (65536, 6, 4096, -12)
[ 1453.326439] [TTM] Out of kernel memory
[ 1453.326501] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (65536, 6, 4096, -12)
[ 1453.326672] glslparsertest[32001]: segfault at 0 ip           (null) sp 00007ffd440e3e68 error 14 in glslparsertest[400000+3000]
[ 1453.446181] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 2, 4096, -12)
[ 1454.292472] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 2, 4096, -12)
[ 1470.746846] [TTM] Out of kernel memory
[ 1470.746919] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (65536, 2, 4096, -12)
[ 1470.747217] [TTM] Out of kernel memory
[ 1470.747252] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (65536, 2, 4096, -12)
[ 1470.796559] [TTM] Out of kernel memory
[ 1470.796626] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (131072, 6, 16384, -12)
[ 1470.797251] [TTM] Out of kernel memory
[ 1470.797283] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (131072, 6, 16384, -12)
[ 1471.951962] [TTM] Out of kernel memory
[ 1471.952038] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)
[ 1471.952099] [TTM] Out of kernel memory
[ 1471.952126] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)
[ 1471.952453] [TTM] Out of kernel memory
[ 1471.952491] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (131072, 6, 4096, -12)
[ 1471.952543] [TTM] Out of kernel memory
[ 1471.952571] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (131072, 6, 4096, -12)
[ 1471.952820] [TTM] Out of kernel memory
[ 1471.952857] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (131072, 6, 4096, -12)
[ 1514.410139] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 1519.145813] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 1549.734495] abrt-action-ana[2377]: segfault at 20 ip 0000559ddb8a74f3 sp 00007fff5b29c650 error 4 in abrt-action-analyze-c[559ddb8a6000+2000]
[ 1840.953467] glslparsertest[10334]: segfault at 20 ip 00007f4e7a1b5a25 sp 00007ffc99dcd3c0 error 4 in radeonsi_dri.so[7f4e79e83000+a7f000]
[ 1853.794208] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)
[ 1853.794598] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)
[ 1853.805828] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)
[ 1853.806023] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)
[ 1853.806233] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)
[ 1853.806299] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)
[ 1853.806370] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)
[ 1853.806431] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)
[ 1953.982528] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (65536, 2, 4096, -12)
[ 1953.982937] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (65536, 2, 4096, -12)
[ 1953.986720] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1048576, 2, 4096, -12)
[ 1953.986838] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1048576, 2, 4096, -12)
[ 1953.987165] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (258048, 2, 4096, -12)
[ 1953.987661] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (258048, 2, 4096, -12)
[ 2003.316542] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2003.324763] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2003.695833] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2003.784308] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2003.844831] [TTM] Out of kernel memory
[ 2003.844892] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (8192, 6, 16384, -12)
[ 2003.845148] [TTM] Out of kernel memory
[ 2003.845183] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (8192, 6, 16384, -12)
[ 2003.845739] [TTM] Out of kernel memory
[ 2003.845791] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)
[ 2003.845935] [TTM] Out of kernel memory
[ 2003.845963] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)
[ 2003.846668] [TTM] Out of kernel memory
[ 2003.846727] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)
[ 2003.854655] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2003.908303] shader_runner invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0
[ 2003.908308] shader_runner cpuset=/ mems_allowed=0
[ 2003.908313] CPU: 0 PID: 16248 Comm: shader_runner Not tainted 4.11.10-300.fc26.x86_64 #1
[ 2003.908314] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A58M-VG3+ R2.0, BIOS P2.70 01/11/2016
[ 2003.908315] Call Trace:
[ 2003.908322]  dump_stack+0x63/0x84
[ 2003.908325]  dump_header+0x97/0x213
[ 2003.908328]  ? selinux_capable+0x20/0x30
[ 2003.908330]  ? security_capable_noaudit+0x45/0x60
[ 2003.908334]  oom_kill_process+0x202/0x3c0
[ 2003.908336]  out_of_memory+0x2b4/0x4e0
[ 2003.908338]  pagefault_out_of_memory+0x68/0x80
[ 2003.908341]  mm_fault_error+0x90/0x180
[ 2003.908343]  __do_page_fault+0x49a/0x4c0
[ 2003.908344]  do_page_fault+0x30/0x80
[ 2003.908348]  page_fault+0x28/0x30
[ 2003.908350] RIP: 0033:0x7f0935382791
[ 2003.908351] RSP: 002b:00007ffc1e33b808 EFLAGS: 00010287
[ 2003.908352] RAX: 00007f092ab87120 RBX: 0000000000000080 RCX: 0000000000002000
[ 2003.908352] RDX: fffffffffffffff0 RSI: 0000000000a04f98 RDI: 00007f092ab87170
[ 2003.908353] RBP: 0000000000a04f48 R08: 00007f092ab87120 R09: 00007f0935382755
[ 2003.908354] R10: 0000000000a060f8 R11: 00007f09353c0990 R12: 0000000000000001
[ 2003.908354] R13: 00007ffc1e33b860 R14: 0000000000000001 R15: 00007ffc1e33ba00
[ 2003.908356] Mem-Info:
[ 2003.908360] active_anon:46909 inactive_anon:176569 isolated_anon:0
                active_file:17015 inactive_file:7652 isolated_file:0
                unevictable:0 dirty:184 writeback:0 unstable:0
                slab_reclaimable:7217 slab_unreclaimable:16135
                mapped:12421 shmem:163166 pagetables:2653 bounce:0
                free:146481 free_pcp:393 free_cma:0
[ 2003.908364] Node 0 active_anon:187636kB inactive_anon:706276kB active_file:68060kB inactive_file:30608kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:49684kB dirty:736kB writeback:0kB shmem:652664kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
[ 2003.908365] Node 0 DMA free:15740kB min:308kB low:384kB high:460kB active_anon:64kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15904kB mlocked:0kB slab_reclaimable:16kB slab_unreclaimable:12kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 2003.908369] lowmem_reserve[]: 0 2921 3356 3356 3356
[ 2003.908371] Node 0 DMA32 free:555932kB min:58544kB low:73180kB high:87816kB active_anon:128968kB inactive_anon:628564kB active_file:51164kB inactive_file:18392kB unevictable:0kB writepending:464kB present:3086400kB managed:3018344kB mlocked:0kB slab_reclaimable:11860kB slab_unreclaimable:35888kB kernel_stack:820kB pagetables:4008kB bounce:0kB free_pcp:760kB local_pcp:108kB free_cma:0kB
[ 2003.908374] lowmem_reserve[]: 0 0 435 435 435
[ 2003.908376] Node 0 Normal free:14252kB min:8724kB low:10904kB high:13084kB active_anon:58544kB inactive_anon:77676kB active_file:16896kB inactive_file:12216kB unevictable:0kB writepending:272kB present:524288kB managed:449888kB mlocked:0kB slab_reclaimable:16992kB slab_unreclaimable:28640kB kernel_stack:2640kB pagetables:6604kB bounce:0kB free_pcp:812kB local_pcp:152kB free_cma:0kB
[ 2003.908379] lowmem_reserve[]: 0 0 0 0 0
[ 2003.908381] Node 0 DMA: 21*4kB (UM) 23*8kB (UM) 21*16kB (UME) 15*32kB (UME) 15*64kB (UME) 5*128kB (UME) 3*256kB (UE) 2*512kB (UE) 3*1024kB (UE) 2*2048kB (UE) 1*4096kB (M) = 15740kB
[ 2003.908390] Node 0 DMA32: 1991*4kB (UME) 4333*8kB (UME) 8294*16kB (UME) 6366*32kB (UME) 2254*64kB (UME) 249*128kB (UME) 3*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 555940kB
[ 2003.908398] Node 0 Normal: 849*4kB (UMEH) 59*8kB (MEH) 43*16kB (UMEH) 19*32kB (UEH) 120*64kB (UMEH) 7*128kB (UMH) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 14252kB
[ 2003.908407] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 2003.908407] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 2003.908408] 201349 total pagecache pages
[ 2003.908410] 13489 pages in swap cache
[ 2003.908411] Swap cache stats: add 1208005, delete 1194520, find 119384/137678
[ 2003.908411] Free swap  = 2086744kB
[ 2003.908412] Total swap = 3615740kB
[ 2003.908412] 906671 pages RAM
[ 2003.908413] 0 pages HighMem/MovableOnly
[ 2003.908413] 35637 pages reserved
[ 2003.908414] 0 pages cma reserved
[ 2003.908414] 0 pages hwpoisoned
[ 2003.908415] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 2003.908431] [  452]     0   452    26660      916      50       3       95             0 systemd-journal
[ 2003.908434] [  465]     0   465    30331      100      28       3      424             0 lvmetad
[ 2003.908435] [  481]     0   481    14392       66      27       3      884         -1000 systemd-udevd
[ 2003.908438] [  577]     0   577    16008       67      29       3       88         -1000 auditd
[ 2003.908441] [  579]     0   579    21148      113      12       3       20             0 audispd
[ 2003.908443] [  584]     0   584    10189        0      26       3       68             0 sedispatch
[ 2003.908445] [  604]    81   604    19770      406      33       3       77          -900 dbus-daemon
[ 2003.908447] [  605]     0   605    51189       98      36       3      118             0 gssproxy
[ 2003.908449] [  612]     0   612    75562       49      74       4     2645             0 rsyslogd
[ 2003.908451] [  613]     0   613     4255        0      13       3       56             0 alsactl
[ 2003.908453] [  616]   996   616   138144      124      59       4     1301             0 polkitd
[ 2003.908455] [  617]     0   617     6316       36      19       3      178             0 smartd
[ 2003.908456] [  618]     0   618    24956        0      19       3       43             0 irqbalance
[ 2003.908458] [  619]     0   619   115978      470      69       4      213             0 abrtd
[ 2003.908460] [  620]     0   620    69514      150      80       3      207             0 sssd
[ 2003.908462] [  621]     0   621   103779       70      72       4      438             0 ModemManager
[ 2003.908464] [  622]    70   622    18086       27      30       3       77             0 avahi-daemon
[ 2003.908465] [  628]    70   628    18055        8      26       3       81             0 avahi-daemon
[ 2003.908467] [  630]   388   630    30622       73      28       3       88             0 chronyd
[ 2003.908469] [  639]     0   639    86210      167      86       3     4835             0 firewalld
[ 2003.908471] [  645]     0   645    75500       17      70       4      301             0 abrt-dump-journ
[ 2003.908473] [  646]     0   646    85150      123      94       3      295             0 abrt-dump-journ
[ 2003.908474] [  647]     0   647    80500      414      85       4      266             0 abrt-dump-journ
[ 2003.908476] [  649]     0   649    71093       58      84       3      296             0 sssd_be
[ 2003.908477] [  650]     0   650    69339      148      88       3      167             0 sssd_nss
[ 2003.908479] [  651]     0   651    17846       29      33       3      467             0 systemd-logind
[ 2003.908481] [  661]     0   661   163528      324      97       3      403             0 NetworkManager
[ 2003.908483] [  673]     0   673    28070        0      54       3      250         -1000 sshd
[ 2003.908485] [  677]     0   677   241966      149     189       3     1730             0 libvirtd
[ 2003.908486] [  686]     0   686    35826      130      19       4      142             0 crond
[ 2003.908488] [  688]     0   688    10645       61      23       3       55             0 atd
[ 2003.908490] [  689]     0   689    30836       17      11       3       11             0 agetty
[ 2003.908491] [  840]     0   840    19272      142      41       3      474             0 dhclient
[ 2003.908493] [  926]    99   926    15128        8      25       3       80             0 dnsmasq
[ 2003.908494] [  927]     0   927    15121        2      25       3       86             0 dnsmasq
[ 2003.908496] [  975]     0   975    19272      127      39       3      477             0 dhclient
[ 2003.908497] [ 1002]     0  1002    47055       42      84       3      363             0 sshd
[ 2003.908498] [ 1006]  1000  1006    20666      106      40       3      272             0 systemd
[ 2003.908500] [ 1008]  1000  1008    33218      174      54       3      719             0 (sd-pam)
[ 2003.908501] [ 1013]  1000  1013    47092       99      82       3      348             0 sshd
[ 2003.908502] [ 1014]  1000  1014    33176      485      13       3      305             0 bash
[ 2003.908504] [ 3319]  1000  3319   106036     2862     101       4        0             0 tex3d-maxsize
[ 2003.908506] [ 9219]  1000  9219   256907    37838     148       4        0             0 python3
[ 2003.908507] [11652]     0 11652    88565     1842      70       3        0          -900 abrt-dbus
[ 2003.908509] [15393]  1000 15393   106056    11831     103       3        0             0 tex3d-maxsize
[ 2003.908510] [16248]  1000 16248    72245    10810      98       3        0             0 shader_runner
[ 2003.908511] Out of memory: Kill process 9219 (python3) score 21 or sacrifice child
[ 2003.908556] Killed process 15393 (tex3d-maxsize) total-vm:424224kB, anon-rss:10368kB, file-rss:36956kB, shmem-rss:0kB
[ 2003.923161] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2003.993172] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2004.067124] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2004.144203] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2004.216831] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12!
[ 2503.078943] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (1073741824, 6, 16384, -12)

Piglit result with amdgpu:
[39371/39371] skip: 3231, pass: 15456, warn: 13, fail: 20670, crash: 1

dmesg with amdgpu:
[  451.254879] show_signal_msg: 4 callbacks suppressed
[  451.254883] glslparsertest[2278]: segfault at 20 ip 00007f01a17b5a25 sp 00007fffa63ebb90 error 4 in radeonsi_dri.so[7f01a1483000+a7f000]

Let's see how consistent that is, I will retest both a few more times over the next days. Any other suggestion?
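
The radeon log above repeats "Failed to allocate GEM object" with four numbers in parentheses. A minimal Python sketch for decoding such a line follows. The field order (size in bytes, memory domain bitmask, alignment, error code) and the domain bit values are assumptions of this sketch, not something stated in this report; `decode_gem_failure` is a hypothetical helper name.

```python
import errno
import re

# Assumed field order: size in bytes, memory domain bitmask, alignment, error.
_GEM_FAIL = re.compile(
    r"Failed to allocate GEM object \((\d+), (\d+), (\d+), (-?\d+)\)"
)

# Assumed radeon domain bits (illustrative): CPU=1, GTT=2, VRAM=4.
_DOMAINS = {1: "CPU", 2: "GTT", 4: "VRAM"}


def decode_gem_failure(line):
    """Decode one 'Failed to allocate GEM object' line, or return None."""
    match = _GEM_FAIL.search(line)
    if match is None:
        return None
    size, domain, alignment, err = (int(g) for g in match.groups())
    return {
        "size_mib": size / (1024 * 1024),
        "domains": [name for bit, name in _DOMAINS.items() if domain & bit],
        "alignment": alignment,
        # Kernel errors are negative errno values; 12 is ENOMEM on Linux.
        "error": errno.errorcode.get(-err, str(err)),
    }


sample = (
    "[drm:radeon_gem_object_create [radeon]] *ERROR* "
    "Failed to allocate GEM object (1073741824, 2, 4096, -12)"
)
print(decode_gem_failure(sample))
```

Under those assumptions, the largest failing requests are 1 GiB each and every failure is an out-of-memory error, which matches the "[TTM] Out of kernel memory" lines and the oom-killer report in the same log.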

Comment 12 Vedran Miletić 2017-07-16 01:12:26 UTC

Interestingly enough, not exactly the same result, amdgpu again:

[39371/39371] skip: 3231, pass: 15458, warn: 13, fail: 20668, crash: 1

Then on the next try:

[39371/39371] skip: 3231, pass: 15456, warn: 13, fail: 20670, crash: 1

I can check what the diff is, but doubt it's important.

Comment 13 Vedran Miletić 2017-07-30 09:42:15 UTC

Kernel 4.12.4-300.fc26.x86_64, no change. Should I try AMDGPU-PRO? What kind of action would be useful next?

Comment 14 Bong Cosca 2018-01-10 22:59:58 UTC

I have the same setup and I'm experiencing the same symptoms. 7400K on Asus A68HM-K with VGA/DVI-D output on ArchLinux 4.14.12 Xorg 1.19.6, colored noise (sometimes tiled garbage). I tried every possible combination of kernel and R600_DEBUG flags along with xorg.conf options with same result. Both amdgpu and radeon drivers exhibit identical behavior.

Any updates on this issue?

Comment 15 Bong Cosca 2018-01-11 21:39:35 UTC

[39409/39409] skip: 2874, pass: 15671, warn: 12, fail: 20845, crash: 7

Piglit results haven't changed much since, but notably with more crashes.

Comment 16 Vedran Miletić 2018-01-12 14:33:49 UTC

(In reply to Bong Cosca from comment #14)
> I have the same setup and I'm experiencing the same symptoms. 7400K on Asus
> A68HM-K with VGA/DVI-D output on ArchLinux 4.14.12 Xorg 1.19.6, colored
> noise (sometimes tiled garbage). I tried every possible combination of
> kernel and R600_DEBUG flags along with xorg.conf options with same result.
> Both amdgpu and radeon drivers exhibit identical behavior.
> 
> Any updates on this issue?

Look at MrCooper's comments here: https://people.freedesktop.org/~cbrill/dri-log/?channel=radeon&highlight_names=&date=2017-08-23

I haven't had a chance to test it, that Kaveri machine is in use at the moment for a headless server, so I can't easily test things.

Comment 17 Bong Cosca 2018-01-12 22:53:26 UTC

Disabling hardware acceleration isn't exactly the workaround I was looking for. I use modesetting with "AccelMethod" "none" just to make this work and I am forced to use Win10 to do 3D animation stuff which is not ideal for me.

If there's anything I can do to help isolate this bugger, let me know. As it stands right now, the R5 graphics on this APU is practically inutile without acceleration, and more so with the multiple monitors that I plan to set up.

Comment 18 Bong Cosca 2018-01-16 15:15:18 UTC

The only saving grace with all this display corruption is that there's a mouse cursor when I use amdgpu with 2D acceleration disabled.

Comment 19 Bong Cosca 2018-01-26 02:16:58 UTC

If I may add, compositing works by setting DRI_PRIME=1 with KMS and AccelMethod "none" despite having only the APU.

Comment 20 Michel Dänzer 2018-01-26 08:54:08 UTC

(In reply to Bong Cosca from comment #19)
> If I may add, compositing works by setting DRI_PRIME=1 with KMS and
> AccelMethod "none" despite having only the APU.

AccelMethod "none" means Xorg does all its rendering using the CPU. Only direct rendering clients may use the GPU in this setup. Does e.g.

 glxgears -info

work and display correctly? If so, please attach its terminal output.

Comment 21 Bong Cosca 2018-01-29 06:54:30 UTC

Created attachment 137013 [details]
glxgears -info

Thanks for responding, Michel. Here's the output of `glxgears -info` that I managed to grab despite having a corrupted display (I was typing it blindly from a Gnome terminal).

Comment 22 Bong Cosca 2018-01-29 06:57:58 UTC

Created attachment 137014 [details]
umr -c

Here's the output generated from `umr -c` in case it provides anything useful. Again I grabbed the output from a corrupted display.

Comment 23 Bong Cosca 2018-02-06 08:20:17 UTC

Created attachment 137183 [details]
Video BIOS

Attached video BIOS may also prove useful in isolating this bug in the graphics stack.

Comment 24 Bong Cosca 2018-02-08 08:41:51 UTC

My screen width is 1280px. Is there a reason why the amdgpu driver says the GPU has a front buffer pitch of 6144 bytes in non-accelerated mode while the pitch in accelerated mode is 5120?
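
The arithmetic behind the two pitch values can be sketched in Python as follows. The sketch assumes 32 bits (4 bytes) per pixel, and the 2048-byte alignment is purely hypothetical: it is chosen only because it reproduces 6144, and nothing in this report says what alignment the driver actually applies. `pixels_per_row` and `align_up` are illustrative helper names.

```python
def pixels_per_row(pitch_bytes, bytes_per_pixel=4):
    """Number of pixels one scanline of the given pitch can hold."""
    return pitch_bytes // bytes_per_pixel


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


width = 1280
tight_pitch = width * 4  # 32 bpp assumed: the smallest pitch that fits a row
print(tight_pitch, pixels_per_row(5120), pixels_per_row(6144))

# Hypothetical alignment, chosen only because it reproduces 6144.
print(align_up(tight_pitch, 2048))
```

In other words, 5120 bytes is the tightly packed pitch for 1280 pixels, while 6144 bytes corresponds to rows padded to 1536 pixels. A padded pitch on its own is normal for framebuffers and does not necessarily indicate a fault.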

Comment 25 Michel Dänzer 2018-02-08 09:31:23 UTC

(In reply to Bong Cosca from comment #24)
> My screen width is 1280px. Is there a reason why the amdgpu driver says the
> GPU has a front buffer pitch of 6144 bytes in non-accelerated mode while the
> pitch in accelerated mode is 5120?

Where do you see that?

Comment 26 Bong Cosca 2018-02-08 16:03:49 UTC

(In reply to Michel Dänzer from comment #25)
> 
> Where do you see that?

Xorg.0.log

Comment 27 Bong Cosca 2018-02-13 12:38:28 UTC

Created attachment 137313 [details]
drm.debug=0x1e

Output generated by: dmesg | grep drm

Additional trace log to aid perhaps in finding the root cause. I would gladly help out in tracking this bug; already tried looking at the sources but really don't know what to look for. Perhaps a nudge from someone knowledgeable so we can narrow it down.

Comment 28 Bong Cosca 2018-02-13 15:11:21 UTC

I find this message particularly interesting because it occurs several times in the dmesg log:

[drm:radeon_crtc_handle_flip [radeon]] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)

Comment 29 Michel Dänzer 2018-02-13 15:26:53 UTC

(In reply to Bong Cosca from comment #28)
> [drm:radeon_crtc_handle_flip [radeon]] radeon_crtc->flip_status = 0 !=
> RADEON_FLIP_SUBMITTED(2)

This is harmless and not related to the issue this report is about.

Comment 30 Bong Cosca 2018-02-15 14:29:18 UTC

Created attachment 137374 [details]
Gnome desktop

The situation has improved somewhat, but the screen is still unusable. I tried to patch si_state.c with:

/* KV should be 0x00000002, but that causes problems with radeon */
raster_config = 0x00000002; /* 0x00000002 */

How do I remove the checkerboard pattern overlay? And why is 0x00000000 the default value? This appears to be just a workaround to a greater underlying problem that needs to be fixed.

Comment 31 Bong Cosca 2018-02-16 22:00:37 UTC

This fixes the problem for me:

        case CHIP_KAVERI:
                /* KV should be 0x00000002, but that causes problems with radeon */
                raster_config = 0x00000003;
                raster_config_1 = 0x00000000;
                break;

I will come back with the piglit results shortly.

Comment 32 Bong Cosca 2018-02-17 03:17:37 UTC

Piglit results improved with only 70 fails, after stopping at 23705 tests.

This also works:

raster_config = 0x0000000f;
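
To compare the raster_config values tried so far, the low bits can be split into fields with a short Python sketch. It assumes that bits 0-1 and bits 2-3 are two separate 2-bit RB mapping fields (named here after the RB_MAP_PKR0 and RB_MAP_PKR1 fields, which is an assumption about the register layout); `rb_map_fields` is a hypothetical helper.

```python
def rb_map_fields(raster_config):
    """Split the low nibble into two 2-bit fields (assumed PKR0 and PKR1)."""
    return {
        "rb_map_pkr0": raster_config & 0x3,
        "rb_map_pkr1": (raster_config >> 2) & 0x3,
    }


# Values mentioned in this report: default, first attempt, and the two
# values reported to work.
for value in (0x00000000, 0x00000002, 0x00000003, 0x0000000F):
    print(f"{value:#010x} -> {rb_map_fields(value)}")
```

Under that assumption, the two values reported to work (0x3 and 0xf) share the same first field, 3, while the values that still showed corruption have 0 or 2 there.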

Comment 33 Bong Cosca 2018-02-17 13:18:06 UTC

Wine works perfectly with the above change.

Comment 34 Bong Cosca 2018-02-19 06:38:11 UTC

@Michel, are there any impediments in getting this patch merged with mesa-git? I tried this with both 17.3.3 and 17.3.4 without any side effects.

Comment 35 Bong Cosca 2018-02-20 00:19:01 UTC

Created attachment 137448 [details] [review]
Proposed patch

Comment 36 Bong Cosca 2018-02-22 02:51:14 UTC

Mesa 18.1 shows no problems with this patch. Are we going to see this code change in the mainstream anytime soon?

Comment 37 Alex Deucher 2018-02-22 03:10:50 UTC

(In reply to Bong Cosca from comment #36)
> Mesa 18.1 shows no problems with this patch. Are we going to see this code
> change in the mainstream anytime soon?

That will break other people's cards. Apparently the harvest config is wrong for your card. Please attach your full dmesg output.

Comment 38 Alex Deucher 2018-02-22 03:17:30 UTC

Does forcing si_write_harvested_raster_configs() in si_state.c help?

Comment 39 Bong Cosca 2018-02-22 03:54:55 UTC

Created attachment 137521 [details]
dmesg

Comment 40 Bong Cosca 2018-02-22 03:55:42 UTC

I'm afraid forcing si_write_harvested_raster_configs() doesn't help any. We're back to the old tiled garbage.

Comment 41 Bong Cosca 2018-02-22 18:44:40 UTC

cik.c reports the following in cik_setup_rb():

max_rb_num_per_se: 1
enabled_rbs: 0
disabled_rbs: 1
mask: 2
se_num: 1
sh_per_se: 1

Comment 42 Bong Cosca 2018-02-23 00:35:18 UTC

Created attachment 137550 [details] [review]
radeonsi: force si_write_harvested_raster_configs

Here's what finally worked.

Force the call to si_write_harvested_raster_configs() and remove the rb_per_se >= 2 check inside it, so that even cards with rb_per_se = 1 (such as mine) get the rb0/rb1 masking and the subsequent RASTER_CONFIG_RB_MAP computed correctly.

Comment 43 Bong Cosca 2018-02-28 05:07:01 UTC

Created attachment 137682 [details] [review]
radeonsi: force si_write_harvested_raster_configs when we fail to determine enabled backends

Attached patch is less radical than the previous proposal.

Prevent screen corruption by forcing si_write_harvested_raster_configs() when we fail to determine enabled backends so rb0/rb1 masks and raster_config_se are computed correctly for cards with one rb_per_se.

Comment 44 Bong Cosca 2018-03-01 10:53:38 UTC

This bug has been on this tracker for more than a year and it now has a proposed working solution. What's the hold up? Don't we even get the courtesy of a response?

Comment 45 Michel Dänzer 2018-03-01 11:18:40 UTC

(In reply to Bong Cosca from comment #41)
> cik.c reports the following in cik_setup_rb():
> 
> max_rb_num_per_se: 1
> enabled_rbs: 0
> disabled_rbs: 1
> mask: 2
> se_num: 1
> sh_per_se: 1

Looks like there's a bug here in the kernel. These numbers indicate that there's only a single RB, which is disabled.

Maybe this device ID needs to be added to one of the max_backends_per_se = 2 cases in cik_gpu_init?
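
The numbers quoted above can be reproduced with a small Python model of a loop that walks one bit per render backend (RB) and enables every RB whose bit is not set in the disabled bitmap. This is a sketch under that assumption, not a copy of cik_setup_rb(); `enabled_rb_mask` is a hypothetical name, and the second call uses a hypothetical max_rb_num_per_se of 2 with the same bitmap.

```python
def enabled_rb_mask(disabled_rbs, max_rb_num_per_se, se_num):
    """Return (enabled_rbs, final_mask) for a given disabled-RB bitmap."""
    enabled_rbs = 0
    mask = 1
    for _ in range(max_rb_num_per_se * se_num):
        # An RB counts as enabled only if its bit is clear in disabled_rbs.
        if not disabled_rbs & mask:
            enabled_rbs |= mask
        mask <<= 1
    return enabled_rbs, mask


# Values reported in comment 41: one RB slot, and it is marked disabled.
print(enabled_rb_mask(disabled_rbs=1, max_rb_num_per_se=1, se_num=1))

# Hypothetical: same bitmap, but two backends per shader engine.
print(enabled_rb_mask(disabled_rbs=1, max_rb_num_per_se=2, se_num=1))
```

The first call returns (0, 2), matching the reported enabled_rbs: 0 and mask: 2. In the hypothetical second call the result is (2, 4), so a second backend would be left enabled, which is why raising the number of backends per SE is a plausible fix.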

Comment 46 Alex Deucher 2018-03-01 16:07:13 UTC

Created attachment 137721 [details] [review]
possible fix for radeon

Does this kernel patch fix it?

Comment 47 Alex Deucher 2018-03-01 16:07:49 UTC

Created attachment 137722 [details] [review]
possible fix for amdgpu

Same patch for amdgpu as well for anyone using that.

Comment 48 Bong Cosca 2018-03-02 00:40:33 UTC

Created attachment 137740 [details] [review]
kaveri.patch

Alex, the patch you suggested overrides the cu_per_sh and backends_per_se for ALL Kaveri cards/APUs.

The attached alternative patch works on the A6 7400K, which is the case it attempts to solve in the absence of documentation on this.

Comment 49 Bong Cosca 2018-03-02 00:44:58 UTC

I have no objection to the patch you recommended since it works for me. You would be the better resource person on this subject matter. Hoping this gets to the mainline asap.

Comment 50 Bong Cosca 2018-03-04 06:20:57 UTC

Will we see this patch/commit on the stable kernel too?

Comment 51 Alex Deucher 2018-03-04 18:49:21 UTC

(In reply to Bong Cosca from comment #48)
> Created attachment 137740 [details] [review]
> kaveri.patch
> 
> Alex, the patch you suggested overrides the cu_per_sh and backends_per_se
> for ALL Kaveri cards/APUs.

Yes, that is intended.  The previous code was incorrect.

Comment 52 Alex Deucher 2018-03-04 18:49:45 UTC

(In reply to Bong Cosca from comment #50)
> Will we see this patch/commit on the stable kernel too?

Yes, I will CC stable.

Comment 53 Vedran Miletić 2018-03-22 16:29:40 UTC

545b0bcde7fbd3ee408fa842ea0731451dc4bd0a (amdgpu)
0b58d90f89545e021d188c289fa142e5ff9e708b (radeon)
