Bug 60879 - [radeonsi] Tahiti LE: GFX block is not functional, CP is okay
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Default DRI bug account
Duplicates: 70778 70779 71689 74154 79231 87728 92518 93023
Reported: 2013-02-15 08:51 UTC by Hristo Venev
Modified: 2019-09-25 17:50 UTC
CC List: 17 users

Attachments
Screenshot of Xorg (1.76 MB, image/jpeg)
2013-02-15 08:51 UTC, Hristo Venev
Kernel log when starting Xorg (lines containing radeon or drm) (8.11 KB, text/plain)
2013-02-15 08:52 UTC, Hristo Venev
Kernel log when starting weston (lines containing radeon or drm) (5.71 KB, text/plain)
2013-02-15 08:53 UTC, Hristo Venev
Debug output (52.49 KB, text/plain)
2013-02-23 14:36 UTC, Hristo Venev
Mesa test patch (770 bytes, patch)
2013-03-13 15:20 UTC, Michel Dänzer
Screenshots with and without glamor (1.61 MB, application/x-tar)
2013-03-22 21:16 UTC, Hristo Venev
Outputs of avivotool (1.18 KB, application/x-xz)
2013-03-25 18:43 UTC, Hristo Venev
Outputs of avivotool (6.82 KB, application/x-xz)
2013-03-29 14:27 UTC, Hristo Venev
Results of OpenCL test (2.38 KB, application/x-xz)
2013-05-18 19:31 UTC, Hristo Venev
Xorg.log (45.59 KB, text/plain)
2013-10-22 18:18 UTC, Philipp Klein
Possible Fix (5.67 KB, patch)
2014-01-28 20:15 UTC, Tom Stellard
Fix v2 (6.80 KB, patch)
2014-04-30 18:02 UTC, Tom Stellard
Tahiti Fix (1.22 KB, patch)
2014-04-30 18:04 UTC, Tom Stellard
/var/log/Xorg.0.log (84.02 KB, text/plain)
2014-05-02 00:30 UTC, Honza Brázdil
/var/log/Xorg.0.log (84.12 KB, text/plain)
2014-05-03 12:01 UTC, Honza Brázdil
diff between my patch and patch from comment #74 (873 bytes, patch)
2014-08-11 16:01 UTC, Pali Rohár
Fix v3 (7.02 KB, patch)
2014-09-09 20:10 UTC, Tom Stellard
Fix v4 (7.03 KB, text/plain)
2014-09-11 01:19 UTC, Tom Stellard
Another approach (6.98 KB, patch)
2014-09-11 08:49 UTC, Michel Dänzer
Fix v5 (8.62 KB, patch)
2014-09-19 01:21 UTC, Tom Stellard
kernel logs with patch v5 (61.38 KB, text/plain)
2014-09-19 20:44 UTC, Łukasz Krzyżak
xorg.log with mesa-git and patch 5 (39.13 KB, text/plain)
2014-09-19 20:46 UTC, Łukasz Krzyżak
startx out with -verbose 9, fix v5 + printf's (33.91 KB, text/plain)
2014-10-04 13:37 UTC, Łukasz Krzyżak
kernel log with tahiti-fix after starting and killing Xorg.bin (65.21 KB, text/plain)
2014-10-04 13:42 UTC, Łukasz Krzyżak
glxgears output with "Fix v5" in place on 7730 LE (312 bytes, text/plain)
2014-10-25 18:39 UTC, madcatx
Tahiti fix v2 (1.40 KB, patch)
2015-11-20 01:14 UTC, Michel Dänzer
v2 patch dmesg (81.99 KB, text/plain)
2015-11-20 14:55 UTC, daniel.barabasa
dmesg.log with kernel 4.2 with Tahiti fix v2 patch (91.99 KB, text/plain)
2015-11-22 00:44 UTC, Ben
Xorg.0.log with kernel 4.2 with Tahiti fix v2 patch (39.01 KB, text/plain)
2015-11-22 00:45 UTC, Ben
Dmesg Kernel 4.5RC3 (119.98 KB, text/plain)
2016-02-09 20:50 UTC, madmalkav
Xorg.0.log (50.46 KB, text/plain)
2016-02-09 20:50 UTC, madmalkav
attachment-32429-0.html (2.87 KB, text/html)
2016-03-10 10:41 UTC, madmalkav
Mesa debug file after tests with Marek on the IRC (78.90 KB, text/html)
2016-03-10 21:26 UTC, madmalkav
possible fix (2.38 KB, patch)
2016-03-11 14:04 UTC, Marek Olšák
Dump with Marek patch (93.01 KB, text/html)
2016-03-11 21:38 UTC, madmalkav
Dmesg from the boot I took the debug dump (62.51 KB, text/plain)
2016-03-11 22:24 UTC, madmalkav
New Mesa dump with Kernel 4.7rc1, mesa-git, llvm-svn (91.87 KB, text/html)
2016-06-05 02:11 UTC, madmalkav
dmesg after the mesa dump (62.29 KB, text/plain)
2016-06-05 02:23 UTC, madmalkav
Xorg log with lastest 4.9-wip kernel and mesa master branch (21.03 KB, text/plain)
2016-10-11 20:30 UTC, madmalkav
openSuse Tumbleweed - Linux 4.8.10 (225.99 KB, text/x-log)
2016-11-25 07:19 UTC, Ben
dmesg for linux 4.9 amdgpu driver (62.60 KB, text/plain)
2016-12-13 18:27 UTC, Ayhan
dmesg amdgpu kernel 4.10 (73.72 KB, text/plain)
2017-08-03 21:10 UTC, MAD
Xorg.log amdgpu kernel 4.10 (24.86 KB, text/plain)
2017-08-03 21:11 UTC, MAD
dmesg radeon kernel 4.10 (88.80 KB, text/plain)
2017-08-03 21:14 UTC, MAD
dmesg radeon kernel 4.12.13 (69.35 KB, text/plain)
2017-09-20 19:47 UTC, David Verelst
journalctl -k amdgpu with linux-amd-staging-git (74.93 KB, text/plain)
2017-09-30 14:58 UTC, David Verelst
journalctl_radeon_4.14.0-041400rc7 (171.22 KB, text/x-log)
2017-11-04 20:45 UTC, MAD
journalctl_amdgpu_4.14.0-041400rc7 (171.65 KB, text/x-log)
2017-11-04 21:14 UTC, MAD
attachment-10505-0.html (2.77 KB, text/html)
2018-03-20 22:30 UTC, madmalkav
journalctl-b0-radeonsi-4.20.6.log (113.75 KB, text/x-log)
2019-02-03 10:01 UTC, MAD
Xorg-radeonsi-4.20.6.log (36.48 KB, text/x-log)
2019-02-03 10:01 UTC, MAD
journalctl-b0-amdgpu-4.20.6.log (112.46 KB, text/x-log)
2019-02-03 10:02 UTC, MAD
Xorg-amdgpu-4.20.6.log (16.43 KB, text/x-log)
2019-02-03 10:03 UTC, MAD

Description Hristo Venev 2013-02-15 08:51:22 UTC
Created attachment 74857
Screenshot of Xorg

I've recently bought a Radeon 7000 graphics card - Sapphire Radeon HD 7870 XT. I enabled glamor in xorg.conf.

The X server fails to start with glamor enabled. For 10 seconds it shows nothing. I've attached a screenshot taken with my mobile phone after that. After killing X the monitor turns off as if the GPU is off (probably true).
X works if I disable glamor. However this way I'd be forced to use swrast.
xorg-xserver - 1.13.2
xf86-video-ati - git or 7.1.0 (both tried)
glamor - git or 0.5 (both tried)
libdrm - git or 2.4.42 (both tried)
kernel - git or 3.7.7 (both tried)
mesa - git

weston also fails to start. It just shows a black screen with a white stripe on top. Killing it returns me to a tty.

I'll attach parts of the kernel log when starting Xorg and weston.
Comment 1 Hristo Venev 2013-02-15 08:52:19 UTC
Created attachment 74859
Kernel log when starting Xorg (lines containing radeon or drm)
Comment 2 Hristo Venev 2013-02-15 08:53:01 UTC
Created attachment 74860
Kernel log when starting weston (lines containing radeon or drm)
Comment 3 Michel Dänzer 2013-02-15 09:12:17 UTC
glamor doesn't work with xserver 1.13 or newer yet. See also bug 58910.
Comment 4 Hristo Venev 2013-02-15 10:22:00 UTC
glamor builds fine with Xorg 1.13.2. The exact same thing happens with Xorg 1.12.4 - builds fine but doesn't start.
Comment 5 Hristo Venev 2013-02-16 23:08:55 UTC
I don't think it's an Xorg/glamor bug because weston doesn't start either. It may be in the kernel, libdrm, mesa or llvm. When starting weston, nothing renders. When starting X, a shader probably enters an infinite loop. Then the kernel tries to reset the GPU and fails. Reconfirmed with today's git for mesa, llvm and libdrm.
Comment 6 Michel Dänzer 2013-02-22 10:40:33 UTC
(In reply to comment #4)
> glamor builds fine with Xorg 1.13.2.

Yes, but it really can't work with it per bug 58910. You'll have to stick to pre-1.13 X servers for testing glamor.

(In reply to comment #5)
> I don't think it's an Xorg/glamor bug because weston doesn't start either.

How about a simple EGL app, e.g. mesa/demos/src/egl/opengl/egltri_screen ?

This could be a Tahiti specific problem.
Comment 7 Hristo Venev 2013-02-22 18:22:48 UTC
egltri_screen shows the same crap as weston. It segfaults 5 seconds after start.
Comment 8 Michel Dänzer 2013-02-22 19:30:40 UTC
(In reply to comment #7)
> egltri_screen shows the same crap as weston. It segfaults 5 seconds after
> start.

Please attach a backtrace of the segfault and the stderr output from running it with the environment variables

EGL_LOG_LEVEL=debug RADEON_DUMP_SHADERS=1

set.
Comment 9 Hristo Venev 2013-02-23 14:36:49 UTC
Created attachment 75409
Debug output

Here is the debug output and the stack trace (llvm and mesa from yesterday's git/svn)
Comment 10 Michel Dänzer 2013-03-06 11:17:30 UTC
The segfault looks like some kind of LLVM issue on shutdown; we can probably ignore that for now.

The real problem is that your GPU hangs even while running the trivial shaders egltri_screen uses. Alex, could it be we're trying to use a missing/disabled shader engine or something like that?

Hristo, can you also attach the kernel output corresponding to egltri_screen? I expect it'll be basically the same as for starting X, but just in case.
Comment 11 Hristo Venev 2013-03-08 16:27:04 UTC
It's the same as starting weston. egltri_screen doesn't hang the GPU. Only Xorg does. egltri_screen and weston fail to clear the buffer or render anything. With current llvm svn, mesa git, libdrm git and linux 3.8.2 the problem is still there.
Comment 12 Michel Dänzer 2013-03-08 17:01:10 UTC
(In reply to comment #11)
> egltri_screen doesn't hang the GPU. Only Xorg does. egltri_screen and weston
> fail to clear the buffer or render anything.

Because the GPU hangs trying to render anything. :) The kernel output is from resetting the hung GPU.
Comment 13 Michel Dänzer 2013-03-13 15:20:45 UTC
Created attachment 76481
Mesa test patch

Does it work better with this Mesa patch?
Comment 14 Hristo Venev 2013-03-13 20:57:04 UTC
Sadly this patch doesn't fix this bug. egltri_screen does not render anything and does not cause GPU reset. However eglgears_screen and Xorg cause the GPU to reset. Without the patch it's the same.
Comment 15 Hristo Venev 2013-03-22 21:06:48 UTC
Now egl{tri,gears}_screen work. However they don't render properly. Square pixel blocks seem to be "misplaced". I will test with Xorg and attach a screenshot.
Comment 16 Hristo Venev 2013-03-22 21:16:34 UTC
Created attachment 76921
Screenshots with and without glamor

X11 doesn't render properly. The pixels seem to be shuffled. Maybe a shader doesn't write where it's supposed to?

There is no kernel output caused by this.
Comment 17 Alex Deucher 2013-03-22 21:56:30 UTC
(In reply to comment #16)
> Created attachment 76921
> Screenshots with and without glamor
> 
> X11 doesn't render properly. The pixels seem to be shuffled. Maybe a shader
> doesn't write where it's supposed to?

Looks like the tiling configuration is wrong on your system.
Comment 18 Hristo Venev 2013-03-23 12:11:36 UTC
Actually the GPU only works if it has been initialized by fglrx and then the driver is switched to radeon without rebooting. When booting with radeon it doesn't work.

I brute-forced all 24 tiling configs by setting them in evergreen_interpret_tiling in radeonsi_pipe.c and none of them worked for eglgears_screen. They all led to the same result. The tiling config given to evergreen_interpret_tiling is 0x1023. I have no idea what the (1<<12) bit in it means.
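For anyone else trying to read the 0x1023 value from comment 18, here is a minimal C sketch that only splits it into bits and 4-bit groups. The helper names are made up for illustration, and the 4-bit grouping is an assumption: nothing in this thread says how the driver actually lays out the fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if bit number `bit` is set in `value`. */
int bit_is_set(uint32_t value, unsigned bit)
{
    assert(bit < 32);
    return (int)((value >> bit) & 1u);
}

/* Hypothetical helper: returns the 4-bit group starting at bit `shift`.
 * The 4-bit grouping is an assumption, not the documented layout. */
uint32_t nibble_at(uint32_t value, unsigned shift)
{
    assert(shift <= 28);
    return (value >> shift) & 0xfu;
}
```

With these, bit_is_set(0x1023, 12) is 1, and the groups at shifts 0, 4, 8 and 12 are 3, 2, 0 and 1, so the value is (1<<12) on top of 0x023.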
Comment 19 Alex Deucher 2013-03-25 14:39:27 UTC
(In reply to comment #18)
> Actually the GPU only works if it has been initialized by fglrx and then the
> driver is switched to radeon without rebooting. When booting with radeon it
> doesn't work.

Can you dump the registers with avivotool (http://cgit.freedesktop.org/~airlied/radeontool/) from both radeon and fglrx?  E.g., cold boot with radeon and dump the registers:
sudo avivotool regmatch '*' > radeon.regs
then boot with fglrx and switch to radeon:
sudo avivotool regmatch '*' > fglrx.regs
and post the outputs here?
Comment 20 Alex Deucher 2013-03-25 14:44:53 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > Actually the GPU only works if it has been initialized by fglrx and then the
> > driver is switched to radeon without rebooting. When booting with radeon it
> > doesn't work.
> 
> Can you dump the registers with avivotool
> (http://cgit.freedesktop.org/~airlied/radeontool/) from both radeon and
> fglrx?  E.g., cold boot with radeon and dump the registers:
> sudo avivotool regmatch '*' > radeon.regs
> then boot with fglrx and switch to radeon:
> sudo avivotool regmatch '*' > fglrx.regs
> and post the outputs here?

Actually, three outputs would be better:
1. cold boot with radeon
2. cold boot with fglrx
3. cold boot with fglrx, warm boot radeon
Comment 21 Hristo Venev 2013-03-25 18:43:02 UTC
Created attachment 77012
Outputs of avivotool

AVIVO_D2GRPH_{ENABLE,CONTROL} seem to make the difference between lockup and wrong rendering.
Comment 22 Alex Deucher 2013-03-29 13:23:44 UTC
Ugh.  Sorry.  I told you the wrong option for avivotool so it dumped the wrong registers.  It should be:

avivotool regs all
Comment 23 Hristo Venev 2013-03-29 14:27:01 UTC
Created attachment 77211
Outputs of avivotool

avivotool while running fglrx nicely halts the GPU. Xorg wouldn't die so I couldn't switch to radeon. cold-fglrx-warm-radeon was from the next boot.
Comment 24 Hristo Venev 2013-04-20 19:37:44 UTC
Any idea regarding the reason for this behavior?

P.S. I noted that glxgears renders OK on fullscreen with fglrx->radeon. In window it's wrong.
Comment 25 Alex Deucher 2013-04-20 22:33:08 UTC
(In reply to comment #24)
> Any idea regarding the reason for this behavior?
> 
> P.S. I noted that glxgears renders OK on fullscreen with fglrx->radeon. In
> window it's wrong.

Unfortunately, I didn't see anything obvious in the dump.  Can you try my drm-next branch?
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.10
Does that help?
Comment 26 Hristo Venev 2013-04-21 08:54:20 UTC
Sadly the problem still remains.
Comment 27 Alex Deucher 2013-04-21 13:30:10 UTC
(In reply to comment #26)
> Sadly the problem still remains.

Which problem?  Does that kernel act like radeon with fglrx loaded first, or still like radeon without fglrx loaded at all?
Comment 28 Hristo Venev 2013-04-22 17:03:14 UTC
The kernel acts the same way as it did before.
Comment 29 Hristo Venev 2013-04-30 11:59:30 UTC
I just noticed a regression: the GPU halts when cold boot was done by fglrx and driver is switched to radeon.
Comment 30 Hristo Venev 2013-05-12 19:17:06 UTC
Are the golden registers the same in radeonsi and fglrx?
Comment 31 Alex Deucher 2013-05-12 23:01:27 UTC
(In reply to comment #30)
> Are the golden registers the same in radeonsi and fglrx?

Yes.
Comment 32 Hristo Venev 2013-05-18 19:31:20 UTC
Created attachment 79504
Results of OpenCL test

BREAKTHROUGH!

OpenCL works. Kinda. Tried the following kernel:
__kernel void add(__global const uint *a,  __global const uint *b, __global uint *c){
    c[0]=1;
}
Complicated operations such as addition, memory loads, getting global ID, etc. fail with "Cannot select" errors.
I have no idea if this has worked with earlier LLVM/mesa.

After the kernel is run, the 0-th element of c is equal to 1. I've attached full source code and outputs for various kernels.
Comment 33 Tom Stellard 2013-05-18 23:18:02 UTC
(In reply to comment #32)
> Created attachment 79504
> Results of OpenCL test
> 
> BREAKTHROUGH!
> 
> OpenCL works. Kinda. Tried the following kernel:
> __kernel void add(__global const uint *a,  __global const uint *b, __global
> uint *c){
>     c[0]=1;
> }
> Complicated operations such as addition, memory loads, getting global ID,
> etc. fail with "Cannot select" errors.
> I have no idea if this has worked with earlier LLVM/mesa.
> 

All that is supported in the git tree is stores to global memory.  I have global loads, work item functions, and a fair amount of arithmetic operations working in a local branch, and I hope to get that pushed to mainline in the next week or two.

> After the kernel is run, the 0-th element of c is equal to 1. I've attached
> full source code and outputs for various kernels.
Comment 34 Hristo Venev 2013-05-19 10:54:41 UTC
What's the difference between integer addition in OpenGL shaders and OpenCL kernels? Aren't the intrinsics the same?
Comment 35 Hristo Venev 2013-06-04 07:50:45 UTC
OpenCL update: On floating point, addition, subtraction, multiplication, division and pow work. On integer, addition, subtraction and multiplication work. Division and modulo halt the GPU. If they are implemented the same way as in OpenGL, this might be the bug I'm facing.
Comment 36 Michel Dänzer 2013-06-04 17:12:29 UTC
For OpenCL with radeonsi, make sure your LLVM and Mesa SVN/Git snapshots are up to date as of today.

However, I'm afraid your success with OpenCL doesn't necessarily mean anything for the graphics problem, as the latter involves much more complex hardware state setup.
Comment 37 Hristo Venev 2013-06-04 20:33:16 UTC
I updated llvm, clang and mesa. Division and modulo still don't work. Another thing I noticed is that ifs which depend on memory loads cause an LLVM crash:

__kernel void add(__global const uint *a,  __global const uint *b, __global uint *c){
    ulong id=get_global_id(0); // OK
    if(id>10) return; // OK
    if(b[id]==0) return; // crash
    c[id]=a[id]/b[id]; // GPU hang
}

a[id] is id+1
b[id] is 2*id+2

Stack dump:
0.	Running pass 'Function Pass Manager' on module 'radeon'.
1.	Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function '@add'
Segmentation fault

#0  0x00007ffff461c8a7 in ?? () from /usr/lib64/llvm/libLLVM-3.4svn.so
#1  0x00007ffff3e36208 in llvm::SelectionDAGISel::DoInstructionSelection() () from /usr/lib64/llvm/libLLVM-3.4svn.so
#2  0x00007ffff3e3c620 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /usr/lib64/llvm/libLLVM-3.4svn.so
#3  0x00007ffff3e3e0f2 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /usr/lib64/llvm/libLLVM-3.4svn.so
#4  0x00007ffff3e3f421 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /usr/lib64/llvm/libLLVM-3.4svn.so
#5  0x00007ffff3acaeb2 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /usr/lib64/llvm/libLLVM-3.4svn.so
#6  0x00007ffff3acaf4b in llvm::FPPassManager::runOnModule(llvm::Module&) () from /usr/lib64/llvm/libLLVM-3.4svn.so
#7  0x00007ffff3acb195 in llvm::MPPassManager::runOnModule(llvm::Module&) () from /usr/lib64/llvm/libLLVM-3.4svn.so
#8  0x00007ffff3acd1dc in llvm::PassManagerImpl::run(llvm::Module&) () from /usr/lib64/llvm/libLLVM-3.4svn.so
#9  0x00007ffff417c009 in ?? () from /usr/lib64/llvm/libLLVM-3.4svn.so
#10 0x00007ffff417c382 in LLVMTargetMachineEmitToMemoryBuffer () from /usr/lib64/llvm/libLLVM-3.4svn.so
#11 0x00007ffff2ae6ab1 in radeon_llvm_compile () from /usr/lib64/gallium-pipe/pipe_radeonsi.so
#12 0x00007ffff2adc65d in si_compile_llvm () from /usr/lib64/gallium-pipe/pipe_radeonsi.so
#13 0x00007ffff2adef79 in ?? () from /usr/lib64/gallium-pipe/pipe_radeonsi.so
#14 0x00007ffff6d882a7 in _cl_kernel::exec_context::bind(_cl_command_queue*) () from /usr/lib64/libOpenCL.so.1
#15 0x00007ffff6d88e46 in _cl_kernel::launch(_cl_command_queue&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&) () from /usr/lib64/libOpenCL.so.1
#16 0x00007ffff6d847dc in _cl_event::trigger() () from /usr/lib64/libOpenCL.so.1
#17 0x00007ffff6d84e54 in clover::hard_event::hard_event(_cl_command_queue&, unsigned int, std::vector<_cl_event*, std::allocator<_cl_event*> >, std::function<void (_cl_event&)>) ()
   from /usr/lib64/libOpenCL.so.1
#18 0x00007ffff6d9fad5 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
Comment 38 Tom Stellard 2013-06-04 21:49:13 UTC
The OpenCL failures are unrelated to the original bug, so can you please file a separate bug for them.

This bug has been outstanding for a while, and it seems like there are actually several "bugs".  Could you please summarize the problems you are currently having and list the versions or git HEAD commits that you are using for glamor, xf86-video-ati, Xorg server, Linux kernel, Mesa, and LLVM.  Thanks.
Comment 39 Hristo Venev 2013-07-24 09:19:25 UTC
What is the difference between OpenCL integer division and OpenGL shader integer division?
Comment 40 Hristo Venev 2013-09-16 13:54:45 UTC
After some testing I found out that my GPU crashes on big shaders. 67 32-bit words is enough to crash it, 50 isn't. How is udiv implemented?
Comment 41 Hristo Venev 2013-10-21 06:21:38 UTC
I'm really sorry I have to do that.

Bump.
Comment 42 Tom Stellard 2013-10-21 15:49:41 UTC
Does X11 still crash for you on radeonsi?
Comment 43 Philipp Klein 2013-10-21 23:28:58 UTC
Hi!
I'm not 100% sure if this is the same issue but when I try to start with the default settings (without an X conf file) the screen remains black and the gpu-fan speeds up. After 5 minutes I abort it.
According to the logs the X-Server starts and (for me) there are no suspicious messages.

As a workaround I put 'Option "AccelMethod" "EXA"' into a X conf file.

My system:
GPU: XFX Radeon HD 7870 GHz Edition (AMD Tahiti)
kernel: 3.11.6-1-ARCH
xf86-video-ati: 1:7.2.0
mesa: 9.2.2
glamor-egl: 0.5.1
X-server: 1.14.3
llvm: 3.3

Should I attach the Xorg.log? What else can I do to track this down?
Thanks in advance
Comment 44 Hristo Venev 2013-10-22 06:08:54 UTC
(In reply to comment #40)
> After some testing I found out that my GPU crashes on big shaders. 67 32-bit
> words is enough to crash it, 50 isn't. How is udiv implemented?
Comment 45 Tom Stellard 2013-10-22 14:22:22 UTC
(In reply to comment #44)
> (In reply to comment #40)
> > After some testing I found out that my GPU crashes on big shaders. 67 32-bit
> > words is enough to crash it, 50 isn't. How is udiv implemented?

Does this crash happen when X starts or when you are running an OpenCL program?
Comment 46 Tom Stellard 2013-10-22 14:22:46 UTC
(In reply to comment #43)
> Hi!
> I'm not 100% sure if this is the same issue but when I try to start with the
> default settings (without an X conf file) the screen remains black and the
> gpu-fan speeds up. After 5 minutes I abort it.
> According to the logs the X-Server starts and (for me) there are no
> suspicious messages.
> 
> As a workaround I put 'Option "AccelMethod" "EXA"' into a X conf file.
> 
> My system:
> GPU: XFX Radeon HD 7870 GHz Edition (AMD Tahiti)
> kernel: 3.11.6-1-ARCH
> xf86-video-ati: 1:7.2.0
> mesa: 9.2.2
> glamor-egl: 0.5.1
> X-server: 1.14.3
> llvm: 3.3
> 
> Should I attach the Xorg.log? What else can I do to track this down?
> Thanks in advance

Yes, please post your Xorg.log.
Comment 47 Hristo Venev 2013-10-22 17:36:37 UTC
What crashes the GPU:
    - OpenGL
    - OpenCL: big kernels (> 66 words)
What does not crash the GPU:
    - KMS
    - Copying buffers to/from GPU
    - HDMI (sound not tested, shows up with recent kernels)
    - OpenCL: small kernels (< 51 words)
Comment 48 Tom Stellard 2013-10-22 17:52:45 UTC
(In reply to comment #47)
> What crashes the GPU:
>     - OpenGL
>     - OpenCL: big kernels (> 66 words)

A new bug should be opened for these failures.

> What does not crash the GPU:
>     - KMS
>     - Copying buffers to/from GPU
>     - HDMI (sound not tested, shows up with recent kernels)
>     - OpenCL: small kernels (< 51 words)
Comment 49 Philipp Klein 2013-10-22 18:18:44 UTC
Created attachment 88014
Xorg.log
Comment 50 Michel Dänzer 2013-10-23 08:13:13 UTC
*** Bug 70778 has been marked as a duplicate of this bug. ***
Comment 51 Hristo Venev 2013-12-03 19:51:22 UTC
Bump
Comment 52 Hristo Venev 2014-01-08 19:39:07 UTC
Bump
Comment 53 Tom Stellard 2014-01-13 17:10:49 UTC
(In reply to comment #52)
> Bump

What is the PCI ID of your GPU?  If you run:

lspci -nn | grep VGA

The PCI ID will be the number at the end of the line inside the brackets.
Comment 54 Michel Dänzer 2014-01-14 08:39:20 UTC
(In reply to comment #53)
> What is the PCI ID of your GPU?

From attachment 74859:

> [drm] initializing kernel modesetting (TAHITI 0x1002:0x679E 0x174B:0xE246).

So the PCI ID is 0x679E, a harvested Tahiti I think.
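The IDs in comment 54 can also be pulled out of the log line programmatically. Below is a minimal C sketch, assuming the kernel line always has the "(FAMILY 0xVVVV:0xDDDD 0xSSSS:0xSSSS)" shape quoted above. The struct and function names are hypothetical, not part of any driver or tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the four IDs printed by the radeon KMS line. */
struct pci_ids {
    unsigned vendor, device, subvendor, subdevice;
};

/* Parses the parenthesised part of a line such as
 *   [drm] initializing kernel modesetting (TAHITI 0x1002:0x679E 0x174B:0xE246).
 * Returns 1 if the family name and all four IDs were read, 0 otherwise. */
int parse_kms_ids(const char *line, char family[32], struct pci_ids *out)
{
    assert(line != NULL && family != NULL && out != NULL);

    const char *paren = strchr(line, '(');
    if (paren == NULL)
        return 0;

    /* %x accepts an optional 0x prefix, so the values can be read as-is. */
    int n = sscanf(paren, "(%31s %x:%x %x:%x", family,
                   &out->vendor, &out->device,
                   &out->subvendor, &out->subdevice);
    return n == 5;
}
```

For the line from attachment 74859 this yields family "TAHITI", vendor 0x1002 and device 0x679E.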
Comment 55 Pali Rohár 2014-01-22 20:30:42 UTC
I have a similar or the same problem. I posted details in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=71488#c10 If you need more info, let me know and I will try to provide it. I would like to see a working radeonsi driver with my card.
Comment 56 Pali Rohár 2014-01-25 14:06:14 UTC
It happens with an AMD Radeon HD 7730 graphics card. In lspci -nn it is identified as:

05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde LE [Radeon HD 7730/8730] [1002:6837]

So PCI ID should be 0x1002:0x6837.

@Tom Stellard, @Michel Dänzer: Can you look at this problem?
Comment 57 Tom Stellard 2014-01-28 20:15:55 UTC
Created attachment 92941
Possible Fix

Does this patch help?
Comment 58 Hristo Venev 2014-01-29 07:55:27 UTC
OpenCL with big kernels and glxinfo still hang unconditionally.
Default raster_config = 0x2a00126a
rb mask = 255
Final raster_config = 0x2a00126a
Comment 59 Michel Dänzer 2014-01-30 02:31:40 UTC
(In reply to comment #58)
> Default raster_config = 0x2a00126a
> rb mask = 255
> Final raster_config = 0x2a00126a

The patch didn't modify the raster_config value, so either it's not correct yet, or the kernel is providing incorrect information about which backends are enabled.
Comment 60 Tom Stellard 2014-01-30 22:22:37 UTC
(In reply to comment #59)
> (In reply to comment #58)
> > Default raster_config = 0x2a00126a
> > rb mask = 255
> > Final raster_config = 0x2a00126a
> 
> The patch didn't modify the raster_config value, so either it's not correct
> yet, or the kernel is providing incorrect information about which backends
> are enabled.

The RB mask is 255, which means all 8 rbs are enabled, so either the kernel is providing the wrong information or there is something else besides the raster_config that we need to fix.
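The reading of the mask in comment 60 (255 means all 8 render backends are enabled) can be checked with a small bit count. This is a minimal sketch with hypothetical function names. It assumes one bit per backend starting at bit 0, which is how the comment interprets the value.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many of the lowest `num_rbs` bits are set in `rb_mask`.
 * Assumes one bit per render backend, starting at bit 0. */
unsigned count_enabled_rbs(uint32_t rb_mask, unsigned num_rbs)
{
    assert(num_rbs <= 32);
    unsigned count = 0;
    for (unsigned i = 0; i < num_rbs; i++)
        count += (rb_mask >> i) & 1u;
    return count;
}

/* Returns 1 if every one of the `num_rbs` backends is reported enabled. */
int all_rbs_enabled(uint32_t rb_mask, unsigned num_rbs)
{
    return count_enabled_rbs(rb_mask, num_rbs) == num_rbs;
}
```

count_enabled_rbs(255, 8) returns 8, so with that mask the patch has nothing to disable, which matches the unchanged raster_config.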
Comment 61 Pali Rohár 2014-02-05 15:14:10 UTC
Hi Tom Stellard!

I have now updated the kernel to 3.13, drm from git, the radeon X driver from git, and mesa from git with the above patch. The patch really fixed the problem with glamor rendering :-)

Here is output from glxgears

$ glxgears 
Default raster_config = 0x124a
rb mask = 10
Final raster_config = 0x124f
Default raster_config = 0x124a
rb mask = 10
Final raster_config = 0x124f
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
270 frames in 5.0 seconds = 53.990 FPS
^C

Also tested with the 3.14-rc1 kernel and it still works. But with older kernels (3.11) the same problem remains and it does not work.
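To see what the patch changed in the values printed above, the default and final raster_config can be compared bit by bit. This is a minimal C sketch with hypothetical helper names. It makes no claim about what the individual raster_config fields mean, only which bits differ.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a mask of the bits that differ between two register values. */
uint32_t changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

/* Extracts `width` bits starting at bit `shift`. */
uint32_t field_at(uint32_t value, unsigned shift, unsigned width)
{
    assert(width > 0 && width < 32 && shift + width <= 32);
    return (value >> shift) & ((1u << width) - 1u);
}
```

changed_bits(0x124a, 0x124f) is 0x5, so only the lowest 4 bits changed (0xa became 0xf). For the values in comment 58, changed_bits(0x2a00126a, 0x2a00126a) is 0, i.e. the patch left that raster_config untouched.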
Comment 62 Tom Stellard 2014-02-05 22:32:19 UTC
(In reply to comment #61)
> Hi Tom Stellard!
> 
> I have now updated the kernel to 3.13, drm from git, the radeon X driver from
> git, and mesa from git with the above patch. The patch really fixed the
> problem with glamor rendering :-)
> 
> Here is output from glxgears
> 
> $ glxgears 
> Default raster_config = 0x124a
> rb mask = 10
> Final raster_config = 0x124f
> Default raster_config = 0x124a
> rb mask = 10
> Final raster_config = 0x124f
> Running synchronized to the vertical refresh.  The framerate should be
> approximately the same as the monitor refresh rate.
> 270 frames in 5.0 seconds = 53.990 FPS
> ^C
> 
> Also tested with the 3.14-rc1 kernel and it still works. But with older
> kernels (3.11) the same problem remains and it does not work.

Older kernels don't provide an interface to query the enabled backends.  It should work with 3.12.x and newer.
Comment 63 Pali Rohár 2014-02-06 07:35:43 UTC
OK, so do you need me to test anything else? Or will you include that patch in mesa?
Comment 64 Hristo Venev 2014-02-06 19:37:09 UTC
(sh_per_se | 0x1) /* WTF? Shift widths aren't often used that way in a bitmask. */
(1u<<sh_per_se)-1 /* probably what was meant */

Patch + kernel 3.13 + llvm,mesa,libdrm,glamor,xf86-video-ati,wayland,weston git: rb_config=255
    - OpenCL <64 dwords works
    - OpenCL >64 dwords hangs
    - X11 starts
    - glxinfo hangs
    - glxgears hangs
    - weston works

rb_config&=0b00001100:
    - OpenCL <64 dwords works
    - OpenCL >64 dwords hangs
    - X11 starts
    - glxinfo works
    - glxgears works! (~2600 Frames/s)
    - My X session hangs (probably chromium)
    - weston corrupted
    - Any OpenCL (even *a=0;) after OpenGL fails with error about kernel rejecting CS, nothing on dmesg

Also, GPU reset fails. I used S3 sleep in order to reset the GPU. Sleeps for <15 seconds may sometimes fail to reset it.
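The two expressions at the top of comment 64 can be compared directly. Below is a minimal C sketch with hypothetical function names. The first function reproduces the expression as quoted from the patch, and the second is the "lowest N bits set" form that the comment suggests was intended.

```c
#include <assert.h>
#include <stdint.h>

/* The expression as quoted from the patch: ORs the count itself with 1. */
uint32_t mask_as_written(unsigned sh_per_se)
{
    return sh_per_se | 0x1u;
}

/* The form suggested in comment 64: a mask with the lowest
 * `sh_per_se` bits set. */
uint32_t mask_low_bits(unsigned sh_per_se)
{
    assert(sh_per_se < 32);
    return (1u << sh_per_se) - 1u;
}
```

The two agree for sh_per_se values of 1 and 2 (giving 0x1 and 0x3) and diverge from 3 upwards (0x3 versus 0x7), so the difference would not show on a part with at most two shader arrays per engine.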
Comment 65 Michel Dänzer 2014-02-19 03:35:33 UTC
*** Bug 74154 has been marked as a duplicate of this bug. ***
Comment 66 Pali Rohár 2014-02-26 18:28:37 UTC
@Tom Stellard: Will you prepare a new patch for testing? And when will you include this fix in mesa?
Comment 67 Tom Stellard 2014-03-03 19:55:53 UTC
(In reply to comment #66)
> @Tom Stellard: Will you prepare a new patch for testing? And when will you
> include this fix in mesa?

I'm still trying to track down a Tahiti GPU, so I can see what the issue is there.
Comment 68 Pali Rohár 2014-03-14 11:58:02 UTC
@Tom Stellard: Do you have something new about this problem?
Comment 69 Pali Rohár 2014-04-02 10:12:54 UTC
BUMP!
Comment 70 Honza Brázdil 2014-04-06 10:45:47 UTC
Anything new?
Comment 71 Hristo Venev 2014-04-09 15:13:43 UTC
rb_config&=0b00001100; no longer works (i.e. glxgears no longer runs).
Comment 72 Pali Rohár 2014-04-11 10:27:01 UTC
(In reply to comment #57)
> Created attachment 92941
> Possible Fix
> 
> Does this patch help?

@Tom Stellard: That patch does not apply anymore on top of mesa git.
Comment 73 Pali Rohár 2014-04-30 14:24:10 UTC
@Tom Stellard, @Michel Dänzer: ping
Comment 74 Tom Stellard 2014-04-30 18:02:39 UTC
Created attachment 98257
Fix v2

Pali, I have sent this patch to the mailing list for review. Can you confirm that it fixes the issue for you?
Comment 75 Tom Stellard 2014-04-30 18:04:44 UTC
Created attachment 98258
Tahiti Fix

Hristo, can you try this kernel patch?
Comment 76 Alex Deucher 2014-04-30 19:30:42 UTC
(In reply to comment #75)
> Created attachment 98258
> Tahiti Fix

+        for (i = 0; i < rdev->config.si.max_texture_channel_caches; i++)
+                cgts_tcc_disable &= ~(1 << (16 + i));

this should be:

+        for (i = 0; i < rdev->config.cik.max_texture_channel_caches; i++)
+                cgts_tcc_disable &= ~(1 << (16 + i));
Comment 77 Honza Brázdil 2014-05-02 00:30:30 UTC
Created attachment 98320
/var/log/Xorg.0.log

Hi,

I have the same issue with Radeon HD 7870 XT (https://bugs.freedesktop.org/show_bug.cgi?id=74154).

I tried to apply the 0001-radeonsi-Program-RASTER_CONFIG-for-harvested-GPUs-v2 patch to mesa 10.1.1, build and install but it doesn't help. Maybe I did something wrong?


./autogen.sh --prefix=/usr --libdir=/usr/lib64/ --sysconfdir=/etc --enable-selinux --enable-osmesa --enable-egl --disable-gles1 --enable-gles2 --disable-gallium-egl --disable-xvmc --enable-vdpau --with-egl-platforms=x11,drm,wayland --enable-shared-glapi --enable-gbm --enable-opencl --enable-opencl-icd --enable-glx-tls --enable-texture-float=yes --enable-gallium-llvm --with-llvm-shared-libs --enable-dri --enable-xa --with-gallium-drivers=svga,radeonsi,swrast,r600,r300,nouveau --disable-dri3 --with-clang-libdir=/usr/lib/
make
sudo make install
Comment 78 Michel Dänzer 2014-05-02 03:25:58 UTC
(In reply to comment #76)
> this should be:
> 
> +        for (i = 0; i < rdev->config.cik.max_texture_channel_caches; i++)
> +                cgts_tcc_disable &= ~(1 << (16 + i));

Why? This is si_gpu_init().


(In reply to comment #75)
> Created attachment 98258
> Tahiti Fix
[...]
>+        WREG32(CGTS_TCC_DISABLE, cgts_tcc_disable);

My understanding is that this register indicates which TCCs are not functional. So this line should be replaced by

        cgts_tcc_disable |= RREG32(CGTS_TCC_DISABLE);
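
For illustration, a minimal sketch of how the mask could be built so that the hardware's own disable bits are preserved. The helper name build_tcc_disable, the 0xffff0000 starting value, and passing the register contents as a plain argument (in place of RREG32(CGTS_TCC_DISABLE)) are assumptions made for this example; only the loop and the OR are taken from the comments above.

```c
#include <stdint.h>

/*
 * Sketch only: compute a CGTS_TCC_DISABLE value.
 * hw_value stands in for RREG32(CGTS_TCC_DISABLE); max_tcc stands in for
 * rdev->config.si.max_texture_channel_caches. Starting from 0xffff0000
 * (all 16 per-TCC disable bits set) is an assumption.
 */
static uint32_t build_tcc_disable(uint32_t hw_value, unsigned max_tcc)
{
	uint32_t cgts_tcc_disable = 0xffff0000u;
	unsigned i;

	/* Clear the disable bit of every TCC the ASIC config says exists. */
	for (i = 0; i < max_tcc && i < 16; i++)
		cgts_tcc_disable &= ~(UINT32_C(1) << (16 + i));

	/* Keep any TCC that the hardware itself reports as non-functional. */
	cgts_tcc_disable |= hw_value;

	return cgts_tcc_disable;
}
```

With this shape, a TCC that the register already marks as disabled stays disabled even when its index is below max_tcc, which is the behaviour the suggestion above is after.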
Comment 79 Alex Deucher 2014-05-02 11:40:23 UTC
(In reply to comment #78)
> (In reply to comment #76)
> > this should be:
> > 
> > +        for (i = 0; i < rdev->config.cik.max_texture_channel_caches; i++)
> > +                cgts_tcc_disable &= ~(1 << (16 + i));
> 
> Why? This is si_gpu_init().
> 

whoops, I was thinking about CIK at the time.  disregard my comment.
Comment 80 Honza Brázdil 2014-05-03 12:01:08 UTC
Created attachment 98377 [details]
/var/log/Xorg.0.log

I also tried rebuilding the kernel with the Tahiti Fix, but it still doesn't work.
Comment 81 Pali Rohár 2014-05-03 13:02:13 UTC
(In reply to comment #74)
> Created attachment 98257 [details] [review]
> Fix v2
> 
> Pali, I have sent this patch to the mailing list for review, can you confirm
> that it fixes the issue for you.

Hello, I applied this patch on top of mesa, but it is not working :-( The X server shows only a black screen, and in dmesg I see this:

[   31.269778] radeon 0000:05:00.0: GPU fault detected: 147 0x09e25201
[   31.269785] radeon 0000:05:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0E6B9BCF
[   31.269788] radeon 0000:05:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02052001
[   31.269792] VM fault (0x01, vmid 1) at page 241933263, read from CB_CMASK (82)
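
For readers decoding such faults by hand, here is a small sketch that splits VM_CONTEXT1_PROTECTION_FAULT_STATUS into the fields printed on the "VM fault" line. The bit positions are an assumption; they were chosen because they reproduce the log line above, not because they are taken from register documentation. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the fault status word; bit positions are assumed. */
struct vm_fault {
	unsigned protections; /* bits 3:0 */
	unsigned client_id;   /* bits 19:12, memory client ID */
	unsigned is_write;    /* bit 24, 0 = read, 1 = write */
	unsigned vmid;        /* bits 28:25 */
};

static struct vm_fault decode_vm_fault(uint32_t status)
{
	struct vm_fault f;

	f.protections = status & 0xf;
	f.client_id = (status >> 12) & 0xff;
	f.is_write = (status >> 24) & 0x1;
	f.vmid = (status >> 25) & 0xf;
	return f;
}
```

For the status value 0x02052001 this gives protections 0x01, client 82, a read access and vmid 1, which matches the last log line. The page number on that line is simply the fault address in decimal: 0x0E6B9BCF is 241933263.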

There are no errors in Xorg.0.log.

After killing X, dmesg shows these lines:

[  307.388090] radeon 0000:05:00.0: GPU lockup CP stall for more than 276120msec
[  307.388104] radeon 0000:05:00.0: GPU lockup (waiting for 0x0000000000000002 last fence id 0x0000000000000001 on ring 0)
[  312.832194] pci_pm_runtime_suspend(): radeon_pmops_runtime_suspend+0x0/0xc0 [radeon] returns -22
[  320.270503] detected fb_set_par error, error code: -22

When I start X again, it crashes immediately, and Xorg.0.log contains these errors:

[   320.199] drmOpenDevice: node name is /dev/dri/card0
[   320.199] drmOpenDevice: open result is -1, (Invalid argument)
[   320.199] drmOpenByBusid: Searching for BusID pci:0000:05:00.0
[   320.199] drmOpenDevice: node name is /dev/dri/card0
[   320.199] drmOpenDevice: open result is -1, (Invalid argument)
...
[   320.270] (EE) RADEON(0): [drm] Failed to open DRM device for pci:0000:05:00.0: No such file or directory
[   320.270] (EE) RADEON(0): Kernel modesetting setup failed
[   320.270] (II) UnloadModule: "radeon"
[   320.270] (II) Unloading radeon
[   320.270] (EE) Screen(s) found, but none have a usable configuration.
[   320.270] 
Fatal server error:
[   320.270] no screens found

The old patch (which can be applied to older mesa versions) worked fine without any problems. Note that I did not change the kernel; I am still using the same version, 3.14-rc1.
Comment 82 jyliu 2014-05-26 07:55:12 UTC
I tried Ubuntu 14.04 with a 7870 XT and the bug did not occur, so it seems this issue is fixed in the latest Ubuntu release.
Comment 83 Pali Rohár 2014-08-07 14:35:57 UTC
With the latest version of mesa from git plus v2, I'm still getting a black screen with these errors in dmesg:

[   36.661540] radeon 0000:05:00.0: GPU fault detected: 147 0x06625201
[   36.661548] radeon 0000:05:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0A4FB8B3
[   36.661551] radeon 0000:05:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02052001
[   36.661555] VM fault (0x01, vmid 1) at page 172996787, read from CB_CMASK (82)

@Tom Stellard: Can you look at it?
Comment 84 Pali Rohár 2014-08-11 16:01:39 UTC
Created attachment 104448 [details] [review]
diff between my patch and patch from comment #74

I commented out the lines

si_pm4_set_reg(pm4, GRBM_GFX_INDEX, SE_INDEX(i) | SH_BROADCAST_WRITES);

and

si_pm4_set_reg(pm4, GRBM_GFX_INDEX, SE_BROADCAST_WRITES);

in the patch from comment #74, and my Radeon HD 7730 started working :-) glamor and OpenGL 3 are working fine.

The attachment is a diff between my patch and the patch from comment #74.
Comment 85 Alex Deucher 2014-08-11 16:06:24 UTC
*** Bug 79231 has been marked as a duplicate of this bug. ***
Comment 86 madcatx 2014-08-15 10:34:07 UTC
You're the man, Pali! I just tried the modified patch and it works for me too. Glamor, OpenGL and vdpau seem to be working perfectly now!
Comment 87 Tom Stellard 2014-09-09 20:10:51 UTC
Created attachment 106006 [details] [review]
Fix v3

Thanks for tracking down the bug with v2.  Can you try this patch?
Comment 88 madcatx 2014-09-09 21:02:48 UTC
Is this patch supposed to apply cleanly against mesa 10.1.5? I'm getting the following build error:


In file included from ../../../../src/gallium/auxiliary/util/u_inlines.h:41:0,
                 from ../../../../src/gallium/auxiliary/pipebuffer/pb_buffer.h:49,
                 from ../../winsys/radeon/drm/radeon_winsys.h:43,
                 from si_pm4.h:30,
                 from si_state.h:30,
                 from si_pipe.h:29,
                 from si_state.c:27:
si_state.c: In function 'si_init_config':
si_state.c:3291:49: error: 'struct radeon_info' has no member named 'max_sh_per_se'
   unsigned sh_per_se = MAX2(sctx->screen->b.info.max_sh_per_se, 1);
                                                 ^
../../../../src/gallium/auxiliary/util/u_math.h:767:27: note: in definition of macro 'MAX2'
 #define MAX2( A, B )   ( (A)>(B) ? (A) : (B) )
                           ^
si_state.c:3291:49: error: 'struct radeon_info' has no member named 'max_sh_per_se'
   unsigned sh_per_se = MAX2(sctx->screen->b.info.max_sh_per_se, 1);
                                                 ^
../../../../src/gallium/auxiliary/util/u_math.h:767:37: note: in definition of macro 'MAX2'
 #define MAX2( A, B )   ( (A)>(B) ? (A) : (B) )
                                     ^
si_state.c:3292:46: error: 'struct radeon_info' has no member named 'max_sh_per_se'
   unsigned num_se = MAX2(sctx->screen->b.info.max_sh_per_se, 1);
                                              ^
../../../../src/gallium/auxiliary/util/u_math.h:767:27: note: in definition of macro 'MAX2'
 #define MAX2( A, B )   ( (A)>(B) ? (A) : (B) )
                           ^
si_state.c:3292:46: error: 'struct radeon_info' has no member named 'max_sh_per_se'
   unsigned num_se = MAX2(sctx->screen->b.info.max_sh_per_se, 1);
                                              ^
../../../../src/gallium/auxiliary/util/u_math.h:767:37: note: in definition of macro 'MAX2'
 #define MAX2( A, B )   ( (A)>(B) ? (A) : (B) )
Comment 89 Michel Dänzer 2014-09-10 02:30:50 UTC
(In reply to comment #88)
> Is this patch supposed to apply cleanly against mesa 10.1.5?

No, looks like it's for Git master, should probably apply against the 10.3 branch at least though.


(In reply to comment #87)
> Fix v3
[...]
> +		for (i = 0; i < num_se; i++) {
> +			si_pm4_set_reg(pm4, GRBM_GFX_INDEX,
> +				SE_INDEX(i) |
> +				SH_BROADCAST_WRITES |
> +				INSTANCE_BROADCAST_WRITES);
> +			si_pm4_set_reg(pm4, R_028350_PA_SC_RASTER_CONFIG, raster_config);
> +		}

Since this uses the same raster_config value for all SEs, couldn't it just use a single write with SE_BROADCAST_WRITES enabled in GRBM_GFX_INDEX?

If not:

> +		unsigned sh_per_se = MAX2(sctx->screen->b.info.max_sh_per_se, 1);
> +		unsigned num_se = MAX2(sctx->screen->b.info.max_sh_per_se, 1);

sh_per_se and num_se have the same value. Should one of them be calculated differently, or does a single variable suffice?
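
To illustrate the first question, the sketch below models the two write patterns against a mock register file. Everything here is an assumption made for illustration: the GRBM_GFX_INDEX bit positions, the width of the SE index field, the mock_gpu structure, and NUM_SE. It only shows that, when every SE receives the same value, one broadcast write leaves the mock in the same state as the per-SE loop; it says nothing about how real hardware treats these writes from a userspace command stream.

```c
#include <stdint.h>

#define NUM_SE 2 /* hypothetical shader engine count for this mock */

/* Assumed GRBM_GFX_INDEX bit layout. */
#define SE_INDEX(x)               ((uint32_t)(x) << 16)
#define SH_BROADCAST_WRITES       (UINT32_C(1) << 29)
#define INSTANCE_BROADCAST_WRITES (UINT32_C(1) << 30)
#define SE_BROADCAST_WRITES       (UINT32_C(1) << 31)

/* Mock GPU: one PA_SC_RASTER_CONFIG copy per shader engine. */
struct mock_gpu {
	uint32_t gfx_index;
	uint32_t raster_config[NUM_SE];
};

/* Route a register write according to the current gfx_index. */
static void mock_write_raster_config(struct mock_gpu *gpu, uint32_t value)
{
	unsigned se;

	if (gpu->gfx_index & SE_BROADCAST_WRITES) {
		for (se = 0; se < NUM_SE; se++)
			gpu->raster_config[se] = value;
	} else {
		se = (gpu->gfx_index >> 16) & 0xff; /* assumed 8-bit field */
		if (se < NUM_SE)
			gpu->raster_config[se] = value;
	}
}

/* Per-SE loop, shaped like the quoted v3 hunk. */
static void write_per_se(struct mock_gpu *gpu, uint32_t raster_config)
{
	unsigned i;

	for (i = 0; i < NUM_SE; i++) {
		gpu->gfx_index = SE_INDEX(i) | SH_BROADCAST_WRITES |
				 INSTANCE_BROADCAST_WRITES;
		mock_write_raster_config(gpu, raster_config);
	}
}

/* Single write with SE broadcast enabled, as suggested above. */
static void write_broadcast(struct mock_gpu *gpu, uint32_t raster_config)
{
	gpu->gfx_index = SE_BROADCAST_WRITES | SH_BROADCAST_WRITES |
			 INSTANCE_BROADCAST_WRITES;
	mock_write_raster_config(gpu, raster_config);
}
```

The per-SE loop only becomes necessary once different SEs need different raster_config values.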
Comment 90 Tom Stellard 2014-09-11 01:19:13 UTC
Created attachment 106097 [details]
Fix v4

Here is an updated patch that addresses Michel's comments.
Comment 91 Michel Dänzer 2014-09-11 08:49:26 UTC
Created attachment 106113 [details] [review]
Another approach

If Tom's v4 patch doesn't work, you can try this patch on top of it. If that still doesn't work, please provide the stderr debugging output about raster_config.
Comment 92 Tom Stellard 2014-09-19 01:21:29 UTC
Created attachment 106530 [details] [review]
Fix v5

Can you try this patch?  I've merged Michel's patch with mine and it works on my Verde.

Even if this patch works for you could you still post the output when running glxgears?
Comment 93 Łukasz Krzyżak 2014-09-19 20:43:21 UTC
Hello
I've got a Radeon HD 7870 XT running under Arch Linux. It works fine with the Catalyst drivers, but fails with the open source driver.
Without the patches from this bug (patches 3 & 4, or 5) my computer hangs: the signal to the monitor is switched off and the monitor suspends, logging in through ssh is impossible, and the system logs are cut off before X is launched.
After applying the patches (3 & 4, or 5) the screen goes black, but the monitor doesn't suspend. I can log in with ssh, and after killing Xorg.bin the console is visible and operational. After killing X, the kernel logs show

[ 1868.387061] radeon 0000:01:00.0: ring 0 stalled for more than 395470msec
[ 1868.387066] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000005 last fence id 0x0000000000000001 on ring 0)
[ 1868.695731] [drm:si_dpm_set_power_state] *ERROR* si_disable_ulv failed

Stack trace of running X shows it's waiting for some fence inside radeon dri. I'm attaching dmesg and xorg log after starting and killing X server with patch 5.
Comment 94 Łukasz Krzyżak 2014-09-19 20:44:51 UTC
Created attachment 106562 [details]
kernel logs with patch v5
Comment 95 Łukasz Krzyżak 2014-09-19 20:46:05 UTC
Created attachment 106563 [details]
xorg.log with mesa-git and patch 5
Comment 96 Tom Stellard 2014-09-19 21:40:57 UTC
What happens if you try only this patch: https://bugs.freedesktop.org/attachment.cgi?id=106097

Also do you see any output when you start X?  The best way to check is to ssh into the system and then run startx.
Comment 97 Łukasz Krzyżak 2014-09-19 22:33:46 UTC
Mesa-git with patch v4
startx output:

 X.Org X Server 1.16.0
Release Date: 2014-07-16
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.15.5-2-ARCH x86_64
Current Operating System: Linux pecet 3.16.2-1-ARCH #1 SMP PREEMPT Sat Sep 6 13:12:51 CEST 2014 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=b02f6229-160f-4999-9979-82f4e15dacff rw quiet
Build Date: 31 July 2014  11:53:19AM
 
Current version of pixman: 0.32.6
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat Sep 20 02:13:24 2014
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(II) [KMS] Kernel modesetting enabled.
 
Screen stays blank and on

After kill -9 on Xorg.bin:

[root@pecet ~]# kill -9 425
[root@pecet ~]# XIO:  fatal IO error 2 (No such file or directory) on X server ":0"
      after 18 requests (18 known processed) with 0 events remaining.
xset:  unable to open display ":0"
xsetroot:  unable to open display ':0'
startkde: Starting up...
xprop:  unable to open display ':0'
xprop:  unable to open display ':0'
Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)
kdeinit4: Can not connect to the X Server.
kdeinit4: Might not terminate at end of session.
QDBusConnection: session D-Bus connection created before QCoreApplication. Application may misbehave.
QDBusConnection: session D-Bus connection created before QCoreApplication. Application may misbehave.
kded4: cannot connect to X server :0
kded(476): Communication problem with  "kded" , it probably crashed.
Error message was:  "org.freedesktop.DBus.Error.NoReply" : " "Message did not receive a reply (timeout by message bus)" "
 
kcminit_startup: cannot connect to X server :0
unnamed app(481): Cannot connect to the X server
ksmserver: cannot connect to X server :0
startkde: Shutting down...
klauncher: Exiting on signal 1
startkde: Running shutdown scripts...
xprop:  unable to open display ':0'
xprop:  unable to open display ':0'
startkde: Done.
 
and monitor turns off
Dmesg output:
[  243.740994] radeon 0000:01:00.0: ring 0 stalled for more than 211963msec
[  243.741009] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000003 last fence id 0x0000000000000001 on ring 0)

A second run of startx causes a hard hang: the monitor turns on, the startx output ends with "(II) [KMS] Kernel modesetting enabled.", and after a few seconds the monitor switches off and the system stops responding to ssh.

Mesa-git built from master with options:
./autogen.sh --prefix=/usr \
             --sysconfdir=/etc \
             --with-dri-driverdir=/usr/lib/xorg/modules/dri \
             --with-gallium-drivers=radeonsi \
             --with-dri-drivers=radeon \
             --with-egl-platforms=x11,drm,wayland \
             --enable-llvm-shared-libs \
             --disable-gallium-egl \
             --disable-gallium-gbm \
             --enable-egl \
             --enable-gbm \
             --enable-gallium-llvm \
             --enable-shared-glapi \
             --enable-glx-tls \
             --enable-dri \
             --enable-glx \
             --enable-osmesa \
             --enable-gles1 \
             --enable-gles2 \
             --enable-texture-float \
             --enable-xa \
             --enable-vdpau \
             --enable-xvmc \
             --enable-dri3 \
             --enable-omx \
             --enable-opencl \
             --enable-opencl-icd \
             --with-clang-libdir=/usr/lib
Comment 98 Michel Dänzer 2014-09-22 07:06:53 UTC
Łukasz, can you attach (as opposed to paste) the startx output with patch v5?

Tom, BTW, what happened to your kernel patches?
Comment 99 Tom Stellard 2014-09-22 13:30:25 UTC
(In reply to comment #97)
> Mesa-git with patch v4
> startx output:

Are you sure that your X server is using mesa-git and not your system Mesa?
I don't see any of the printfs from the patch in your output.
Comment 100 Tom Stellard 2014-09-22 13:33:18 UTC
(In reply to comment #98)
> Łukasz, can you attach (as opposed to paste) the startx output with patch v5?
> 
> Tom, BTW, what happened to your kernel patches?

I don't think the user with the bad Tahiti ever tested them.
Comment 101 Łukasz Krzyżak 2014-09-22 15:51:39 UTC
I'll verify library paths and attach x logs/outs with patch v5

Should I apply Tahiti Fix (https://bugs.freedesktop.org/attachment.cgi?id=98258) to my kernel ?
Comment 102 Tom Stellard 2014-09-22 15:59:16 UTC
(In reply to comment #101)
> I'll verify library paths and attach x logs/outs with patch v5
> 
> Should I apply Tahiti Fix
> (https://bugs.freedesktop.org/attachment.cgi?id=98258) to my kernel ?

Sure.
Comment 103 Michel Dänzer 2014-09-24 02:23:49 UTC
(In reply to comment #102)
> > Should I apply Tahiti Fix
> > (https://bugs.freedesktop.org/attachment.cgi?id=98258) to my kernel ?
> 
> Sure.

Tom, did you see comment 78?
Comment 104 Łukasz Krzyżak 2014-10-04 13:35:41 UTC
I've tested tahiti-fix.patch with kernel 3.16-3; it does not help. I've added debug output to it:

radeonsi cgts_tcc_disable: -268435456

that's the value of register for my card.
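
The negative number is the register printed as a signed integer. A small sketch of how to read it follows; the helper names are hypothetical, and treating bits 31:16 as one disable bit per TCC is an assumption based on the "1 << (16 + i)" loop in the patch discussed earlier in this thread.

```c
#include <stdint.h>

/* Reinterpret a value that was printed with %d as the raw 32-bit pattern. */
static uint32_t reg_from_signed(int32_t printed)
{
	return (uint32_t)printed;
}

/* Count TCCs left enabled, assuming bits 31:16 are per-TCC disable bits. */
static unsigned enabled_tccs(uint32_t cgts_tcc_disable)
{
	unsigned i, n = 0;

	for (i = 0; i < 16; i++)
		if (!(cgts_tcc_disable & (UINT32_C(1) << (16 + i))))
			n++;
	return n;
}
```

Here -268435456 corresponds to 0xf0000000, which under that assumption means disable bits are set for TCC 12 to 15 and twelve TCCs remain enabled. Printing the value with a hexadecimal format such as 0x%08x makes this easier to see than %d.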

Patch v5 also doesn't resolve the problem. After adding additional printf's, it seems that si_init_config goes into:

if (rb_mask && util_bitcount(rb_mask) >= num_rb) {

so si_write_harvested_raster_configs is not called

After commenting out that if, the Xorg logs from si_write_harvested_raster_configs show:

Original raster_config = 0x2a00126a, rb_mask = 0xff
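
As a sanity check of why that branch is taken, here is a minimal sketch of the condition. bitcount32 stands in for Mesa's util_bitcount, all_rbs_present is a hypothetical name, and num_rb = 8 in the note below is an assumption, since the actual value is not shown in this comment.

```c
#include <stdint.h>

/* Portable population count; stands in for util_bitcount(). */
static unsigned bitcount32(uint32_t v)
{
	unsigned n = 0;

	while (v) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Mirrors the quoted condition: returns 1 when the mask is non-zero and
 * has at least num_rb bits set, i.e. no render backend looks harvested.
 */
static int all_rbs_present(uint32_t rb_mask, unsigned num_rb)
{
	return rb_mask && bitcount32(rb_mask) >= num_rb;
}
```

With rb_mask = 0xff eight bits are set, so for num_rb = 8 the condition is true and the harvested raster config path is skipped. A mask such as 0x0f would make it false.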

attachments:
- startx-0410-2.out - output of startx with -verbose 9, tahiti-fix kernel, mesa-git (c74be01e80fcdd7feabc0f27df4aebe66abb626e) with patchv5 + additional fprintf's
- kernel-0410-2.out - dmesg, additional debug in tahiti-fix: radeonsi cgts_tcc_disable
Comment 105 Łukasz Krzyżak 2014-10-04 13:37:39 UTC
Created attachment 107322 [details]
startx out with -verbose 9, fix v5 + printf's
Comment 106 Łukasz Krzyżak 2014-10-04 13:42:35 UTC
Created attachment 107323 [details]
kernel log with tahiti-fix after starting and killing Xorg.bin
Comment 107 madcatx 2014-10-25 18:38:28 UTC
I finally updated to pre-release Fedora 21 which packages mesa 10.3. The "v5 fix" seems to work OK with my 7730 LE (Verde chip). glxgears output attached.

(Is there any reason why the desktop animations would feel much smoother and snappier than with the old hackofix?)
Comment 108 madcatx 2014-10-25 18:39:52 UTC
Created attachment 108412 [details]
glxgears output with "Fix v5" in place on 7730 LE
Comment 109 Michel Dänzer 2014-10-27 07:46:52 UTC
(In reply to madcatx from comment #107)
> (Is there any reason why would the desktop animations feel much smoother and
> snappier than with the old hackofix?)

Sounds like the 'hackofix' disabled more SEs than necessary, so the card wasn't running as fast as it can.
Comment 110 Pali Rohár 2014-12-05 18:57:59 UTC
Michel Dänzer: is this patch going to be included in mesa git?
Comment 111 Pali Rohár 2014-12-09 16:36:00 UTC
Looks like the patch was committed to mesa git:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=67dcbcd92cb9877a04747d6cf7fef14c2b8af8b3
Comment 112 madcatx 2014-12-09 16:45:36 UTC
Is there any chance of this getting backported to stable 10.3 series? The v5 fix works fine for me with 10.3.3.
Comment 113 Alex Deucher 2014-12-09 16:54:54 UTC
(In reply to madcatx from comment #112)
> Is there any chance of this getting backported to stable 10.3 series? The v5
> fix works fine for me with 10.3.3.

Yes, it should show up in the stable releases.  From the commit message:

CC: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Comment 114 Michel Dänzer 2015-11-20 00:50:50 UTC
*** Bug 93023 has been marked as a duplicate of this bug. ***
Comment 115 Michel Dänzer 2015-11-20 00:51:02 UTC
*** Bug 92518 has been marked as a duplicate of this bug. ***
Comment 116 Michel Dänzer 2015-11-20 00:51:17 UTC
*** Bug 87728 has been marked as a duplicate of this bug. ***
Comment 117 Michel Dänzer 2015-11-20 01:13:11 UTC
Based on the latest duplicates, this still doesn't seem fixed for Tahiti XT. :(

I'll attach a v2 of the Tahiti kernel fix.
Comment 118 Michel Dänzer 2015-11-20 01:14:00 UTC
Created attachment 119959 [details] [review]
Tahiti fix v2

Does this help on 7870 XT?
Comment 119 daniel.barabasa 2015-11-20 14:55:55 UTC
Created attachment 119985 [details]
v2 patch dmesg

No dice with patch, attached another dmesg. Maybe I applied the patch wrong?
Comment 120 Ben 2015-11-21 12:08:57 UTC
Hi, I was the person who reported Bug 92518 and I switched to Fedora 23 to try and compile drm, xf86-video-ati, and mesa from the git master.

I am assuming the "Tahiti fix v2" patch is for mesa, but I have no idea how to apply the patch. After some googling, I guess I am supposed to use:
git-apply --directory=<path to file of interest>

The problem is that in the patch, I am supposed to look for a path to the file called a/drivers/gpu/drm/radeon/si.c, but that file si.c does not exist in the mesa tree when I ran the command:
find <mesa source folder> si.c

I need some help with this so that I can test the patch.
Comment 121 daniel.barabasa 2015-11-21 13:29:44 UTC
AFAIK the patch has to be applied to the linux kernel, not to mesa. That's how I did it after all, but it didn't seem to fix the issue.
I'm using Arch Linux btw, so I had to add a command to the PKGBUILD in order to apply the patch during the build process.
Comment 122 Ben 2015-11-22 00:44:07 UTC
Thanks summerrainbowz, I had no idea that was supposed to be for the Kernel. I can also confirm summerrainbowz's results. My dmesg output looks different though.

My current setup is Fedora 23. I compiled and installed mesa, xf86-video-ati, and drm from git master. I also applied the "Tahiti fix v2" patch to a vanilla 4.2 kernel to test. The kernel config settings were identical to Fedora's config. I also removed nomodeset from the grub boot parameters when testing. I am uploading the Xorg and dmesg logs as attachments.
Comment 123 Ben 2015-11-22 00:44:52 UTC
Created attachment 120010 [details]
dmesg.log with kernel 4.2 with Tahiti fix v2 patch
Comment 124 Ben 2015-11-22 00:45:21 UTC
Created attachment 120011 [details]
Xorg.0.log with kernel 4.2 with Tahiti fix v2 patch
Comment 125 Ben 2015-11-22 01:06:26 UTC
I forgot to link the pictures I took of the graphical corruptions.
First picture:
http://i65.tinypic.com/5xlxj9.jpg

Second Picture:
http://i68.tinypic.com/2yv9k3l.jpg
Comment 126 madmalkav 2016-02-07 00:42:18 UTC
I can confirm same problem with Tahiti LE. Mesa 11.1.1-1 from Manjaro Linux repository.
Comment 127 madmalkav 2016-02-09 20:50:02 UTC
Created attachment 121633 [details]
Dmesg Kernel 4.5RC3

Dmesg Kernel 4.5RC3
Comment 128 madmalkav 2016-02-09 20:50:30 UTC
Created attachment 121634 [details]
Xorg.0.log
Comment 129 daniel.barabasa 2016-02-16 19:01:04 UTC
Just wondering guys, is there really nothing that can be done in order to fix this issue?
Comment 130 John Bridgman 2016-02-16 19:16:31 UTC
madmalkav, your dmesg log had messages from the fglrx kernel driver... not saying that *is* your problem but it definitely can't help... any chance you can test on a vanilla system that hasn't had fglrx installed ?

[    9.614226] <6>[fglrx] Maximum main memory to use for locked dma buffers: 7714 MBytes.
[    9.614510] <6>[fglrx]   vendor: 1002 device: 679e revision: 0 count: 1
[    9.615005] <6>[fglrx] ioport: bar 4, base 0xe000, size: 0x100
[    9.615248] <6>[fglrx] Kernel PAT support is enabled
[    9.615263] <6>[fglrx] module loaded - fglrx 15.20.3 [Sep  8 2015] with 1 minors
Comment 131 madmalkav 2016-02-16 20:41:04 UTC
(In reply to John Bridgman from comment #130)
> madmalkav, your dmesg log had messages from the fglrx kernel driver... not
> saying that *is* your problem but it definitely can't help... any chance you
> can test on a vanilla system that hasn't had fglrx installed ?
> 
> [    9.614226] <6>[fglrx] Maximum main memory to use for locked dma buffers:
> 7714 MBytes.
> [    9.614510] <6>[fglrx]   vendor: 1002 device: 679e revision: 0 count: 1
> [    9.615005] <6>[fglrx] ioport: bar 4, base 0xe000, size: 0x100
> [    9.615248] <6>[fglrx] Kernel PAT support is enabled
> [    9.615263] <6>[fglrx] module loaded - fglrx 15.20.3 [Sep  8 2015] with 1
> minors

I can't at the moment as I need this computer for work. I can tell you the system had problems with the OSS driver from minute 1, i.e. I had to use the option to boot a kernel with proprietary drivers in order to manage to install Linux on this machine.

If any other option is acceptable (booting the installer media with the OSS driver, or installing a kernel without the fglrx module), I'll gladly try that. If not, I will try to format the system and do vanilla tests as soon as possible.
Comment 132 Michel Dänzer 2016-02-17 00:54:47 UTC
I get updates to this report via the dri-devel mailing list, please don't add me to the CC list.
Comment 133 madmalkav 2016-02-27 21:18:28 UTC
I've set up a bounty for fixing this bug. Quantity is quite low, sorry. If anyone else affected by this bug can throw some bucks into this, I think it can help to get a solution sooner.

https://www.bountysource.com/issues/5643054-radeonsi-x11-can-t-start-with-acceleration-enabled
Comment 134 Marek Olšák 2016-03-08 22:44:53 UTC
(In reply to madmalkav from comment #131)
> (In reply to John Bridgman from comment #130)
> > madmalkav, your dmesg log had messages from the fglrx kernel driver... not
> > saying that *is* your problem but it definitely can't help... any chance you
> > can test on a vanilla system that hasn't had fglrx installed ?
> > 
> > [    9.614226] <6>[fglrx] Maximum main memory to use for locked dma buffers:
> > 7714 MBytes.
> > [    9.614510] <6>[fglrx]   vendor: 1002 device: 679e revision: 0 count: 1
> > [    9.615005] <6>[fglrx] ioport: bar 4, base 0xe000, size: 0x100
> > [    9.615248] <6>[fglrx] Kernel PAT support is enabled
> > [    9.615263] <6>[fglrx] module loaded - fglrx 15.20.3 [Sep  8 2015] with 1
> > minors
> 
> I can't at the moment as I need this computer for working. I can tell you
> the system had problems with OSS driver since minute 1, i.e. I had to use
> the option to use a kernel with propietary drivers in order to manage to
> install Linux in this machine.

You should at least set modprobe.blacklist=fglrx on the kernel command line.
Comment 135 Vedran Miletić 2016-03-09 16:35:04 UTC
(In reply to madmalkav from comment #133)
> I've set up a bounty for fixing this bug. Quantity is quite low, sorry. If
> anyone else affected by this bug can throw some bucks into this, I think it
> can help to get a solution sooner.
> 
> https://www.bountysource.com/issues/5643054-radeonsi-x11-can-t-start-with-
> acceleration-enabled

I have upped it to $100. I know this is very low amount of money for the amount of skill required to fix something like this, but I still hope that it will motivate someone.
Comment 136 Ben 2016-03-10 04:53:08 UTC
It is great that there is a bounty, but the developers are actually asking for more information to actually be able to solve the problem.

madmalkav, I think Marek is asking you to blacklist the fglrx driver temporarily at boot up so that the system reverts to the open source drivers if those drivers haven't been blacklisted already.
Comment 137 madmalkav 2016-03-10 08:31:00 UTC
I have been testing and commenting my results on the IRC channel, didn't mention anything here as nothing interesting happened.

Blacklisting the module makes no difference. I can upload logs if needed but they are the same, just without the fglrx lines.

Also, I'm trying to repeat the tests of the user in comment #104 but I'm totally unable to get any debug message I insert in si_state.c to show up in any log. Any tips or a link to some form of "Mesa debugging for dummies" would be greatly appreciated.
Comment 138 Marek Olšák 2016-03-10 10:01:09 UTC
I don't think we should use X if we know that the GPU driver is totally broken. Piglit should be used for such testing.

How to build it:

1) Mesa should be built with:
--with-egl-platforms=x11,drm
This is also required for X acceleration, so it should be set already.

2) Build and install waffle:
git://github.com/waffle-gl/waffle

3) Build piglit (no install):
https://cgit.freedesktop.org/piglit/
Configure it with ccmake and enable waffle.


How to get ready:
1) Boot with the "text" kernel parameter (disables X) and also add "radeon.lockup_timeout=0" to prevent the kernel driver from trying to recover from GPU hangs.
2) Go to the piglit/bin directory.
3) Type: export PIGLIT_PLATFORM=gbm


Tests to run:

1) If this works, most things will work:
./fbo-generatemipmap-formats -auto

2) Something simpler:
./ext_transform_feedback-position -auto

3) You can invoke very simple internal driver tests by setting GALLIUM_TESTS=1. This will exit before the program can do something, so the executable doesn't matter. For example:
GALLIUM_TESTS=1 ./ext_transform_feedback-position


Diagnosing GPU hangs:

If the GPU hangs during these tests, you can see errors in dmesg. I recommend using radeontop for overview of which GPU hw blocks are busy. If some blocks report 100% activity for no reason, they are stuck.

Which blocks are stuck is the first piece of information we need to know. Then, we need to know if any internal driver tests pass if you run something with GALLIUM_TESTS=1 (see above).
Comment 139 madmalkav 2016-03-10 10:41:16 UTC
Created attachment 122201 [details]
attachment-32429-0.html

Thanks for the great explanation, Marek. I will try to start tests today but probably I won't have enough time until weekend.
Comment 140 madmalkav 2016-03-10 10:41:23 UTC
*** Bug 71689 has been marked as a duplicate of this bug. ***
Comment 141 madmalkav 2016-03-10 21:26:31 UTC
Created attachment 122212 [details]
Mesa debug file after tests with Marek on the IRC
Comment 142 Marek Olšák 2016-03-10 22:29:15 UTC
CP works. Shaders don't work. The hardware hangs in the vertex shader. The draw call doesn't even enable the rasterizer.

radeon/si.c:si_setup_spi looks very wrong to me:
- The function sets SPI_STATIC_THREAD_MGMT_3, which only configures CUs for LS and HS stages.
- I don't understand why SPI_STATIC_THREAD_MGMT_3 is set 16 times?
- SPI_STATIC_THREAD_MGMT_1 (PS,VS) and SPI_STATIC_THREAD_MGMT_2 (GS,ES) are not set at all.

It looks like that's the root cause of this bug.
Comment 143 Marek Olšák 2016-03-11 14:04:01 UTC
Created attachment 122225 [details] [review]
possible fix

Can you test the attached patch?

How to build the kernel:
- use "git clone" to get the kernel source
- go to the kernel directory
- git am $patch_filename # apply the patch
- cp /boot/config-`uname -r` .config # copy your current kernel config
- make -j4
- sudo make modules_install
- sudo make install
Comment 144 madmalkav 2016-03-11 21:38:39 UTC
Created attachment 122240 [details]
Dump with Marek patch
Comment 145 Marek Olšák 2016-03-11 21:59:30 UTC
Can you please attach dmesg with the patch?
Comment 146 Marek Olšák 2016-03-11 22:07:55 UTC
You don't have to run the tests. Dmesg after boot is sufficient.
Comment 147 madmalkav 2016-03-11 22:24:23 UTC
Created attachment 122244 [details]
Dmesg from the boot I took the debug dump
Comment 148 Marek Olšák 2016-03-11 23:21:52 UTC
What I had thought was incorrect kernel code is actually correct and hw folks confirmed it. To be completely honest with you, I have absolutely no idea why Tahiti LE driver support is broken.
Comment 149 bhaallord 2016-04-28 19:00:53 UTC
Is there any new information about this bug? I would like to use my GPU again.
Comment 150 madmalkav 2016-06-05 02:11:09 UTC
Created attachment 124323 [details]
New Mesa dump with Kernel 4.7rc1, mesa-git, llvm-svn
Comment 151 madmalkav 2016-06-05 02:23:17 UTC
Created attachment 124324 [details]
dmesg after the mesa dump

No more "Ring stalled..." messages with kernel 4.7rc1
Comment 152 clogged.drainpipe 2016-10-09 10:09:40 UTC
Tahiti LE is still broken, after 3 years. Not trying to be a dick, but FFS, at this point ALL the other cards based on very similar chips(7950, 7970, 280X) work very well, and yet this one is still broken. I mean really, once you have all the work well done for all similar cards, how can it be so hard to bring this one to life?
I just upgraded from a HD4890 to a Tahiti LE card. I did not bother to check how well it is supported under Linux before buying it, because I assumed that all GCN1 cards are very well supported via radeonsi. Now I found out, that my card is probably the only one that is not supported at all, so to say I am mad would be an understatement.
Comment 153 Vedran Miletić 2016-10-09 14:17:51 UTC
(In reply to madmalkav from comment #151)
> Created attachment 124324 [details]
> dmesg after the mesa dump
> 
> No more "Ring stalled..." messages with kernel 4.7rc1

madmalkav, any more recent findings?
Comment 154 madmalkav 2016-10-09 14:35:57 UTC
Nothing. More people doing tests will surely help, and getting one of these cards to a developer would be great, but I can't afford that at the moment (come on, AMD, you surely have one or two in a basement...).
Comment 155 smoki 2016-10-09 17:26:59 UTC
 At this point people might wanna try amdgpu driver too from agd5f tree... i think i saw some commits for harvested chips there maybe week ago.
Comment 156 smoki 2016-10-09 17:54:48 UTC
 Or it was 3 weeks ago :D anyway who knows some magic touches like this might change something:

 https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-4.7&id=d207295db45b576eddf60749c0c24fc8528f3c80
Comment 157 madmalkav 2016-10-11 20:30:58 UTC
Created attachment 127226 [details]
Xorg log with latest 4.9-wip kernel and mesa master branch

I've tried agd5f's 4.9 wip branch with latest Mesa. Wflinfo fails with an "amdgpu: unknown family" error, there is no Gallium dump log, and this error shows up in dmesg:

[  353.579289] wflinfo[1529]: segfault at 8 ip 00007f8260c8dd6a sp 00007ffe2f8d74f0 error 4 in radeonsi_dri.so[7f8260912000+8fc000]

If I try to start X, similar errors appear.
Comment 158 Michel Dänzer 2016-10-12 03:11:57 UTC
(In reply to madmalkav from comment #157)
> I've tried agd5f's 4.9 wip branch with latest Mesa. Wflinfo fails with an
> "amdgpu: unknown family" error, no Gallium dump log, error on dmesg:

Which Mesa Git commit is that exactly? It looks like it's before SI support was added to the amdgpu winsys code. Double-check that you're really building current Git master, and that your self-built radeonsi_dri.so is getting picked up.
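
One way to make the check above concrete is sketched below. It assumes that libGL prints the driver paths it tries when `LIBGL_DEBUG=verbose` is set; the helper name `loaded_radeonsi_paths` is hypothetical, and the exact wording of the debug output may differ between Mesa versions, so the helper only looks for absolute paths ending in `radeonsi_dri.so`.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct absolute radeonsi_dri.so paths
# mentioned in a libGL debug log read from stdin.
loaded_radeonsi_paths() {
    grep -oE '/[^ ]*radeonsi_dri\.so' | sort -u
}

# Possible use (assumes glxinfo is installed; libGL debug output goes to stderr):
#   LIBGL_DEBUG=verbose glxinfo 2>&1 >/dev/null | loaded_radeonsi_paths
```

If the path printed is the distribution's copy rather than the self-built one, pointing `LIBGL_DRIVERS_PATH` at the directory that holds the self-built `radeonsi_dri.so` is one option to try.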
Comment 159 madmalkav 2016-10-15 10:57:04 UTC
(In reply to Michel Dänzer from comment #158)
> Double-check that you're really building current Git master, and that your
> self-built radeonsi_dri.so is getting picked up.

You are probably right, but for personal reasons I won't continue testing things for this bug. I hope some of the other affected users step up and continue the tests.

I will remain subscribed to the bug so I can grant the bounty if someone gets to fix it.
Comment 160 Ben 2016-11-25 07:19:47 UTC
Created attachment 128182 [details]
openSUSE Tumbleweed - Linux 4.8.10

So I tested my card again by reinstalling openSUSE Tumbleweed on my computer, and I don't know what happened, but I am now able to reach the KDE login screen.

While the computer is booting up, there are graphical corruptions so I checked dmesg to see if the card is really ok or not. The driver still has some trouble with the card, but I am able to get into a graphical environment. How can I help to completely fix the issue with this card?
Comment 161 madmalkav 2016-12-03 10:56:15 UTC
Ben, I'm afraid the initialization of the conflicting parts of the card is just delayed in your current install, that's why it fails later. Hope I'm wrong.

If you want support for your tests, #radeon @ irc.freenode.net is always full of people that will give you a hand with that.
Comment 162 John Steele Scott 2016-12-11 09:17:35 UTC
I just bit the bullet and swapped out my Radeon HD 7870 XT for an RX 480 due to this issue. I'll happily donate the 7870 to any developer who would like to put it to use fixing this bug. Get in touch in the next week or so if interested. Otherwise I'll try and sell it before Christmas, it's still a good card for Windows gaming.
Comment 163 Ayhan 2016-12-13 18:27:20 UTC
Created attachment 128452 [details]
dmesg for linux 4.9 amdgpu driver

The attachment is the dmesg from Linux 4.9 with the AMDGPU driver in use. After startx, the colorful scrambled screen happened again, but it stayed still instead of resetting the screen every 5-6 seconds as with the radeon driver. I issued a reboot command via an SSH shell; it returned to the old framebuffer correctly and rebooted successfully, unlike in the radeon driver case.

The dmesg file is the original one, not a journalctl capture; journalctl sometimes omits some messages.

This line was written after startx:

[drm] xxxx: dce_v6_0_afmt_setmode ----no impl !!!!!!!!

Also that line was omitted by journalctl...
Comment 164 Vedran Miletić 2017-03-22 16:05:46 UTC
*** Bug 70779 has been marked as a duplicate of this bug. ***
Comment 165 MAD 2017-08-03 21:06:42 UTC
4.10.0-28-generic
[AMD/ATI] Tahiti LE [Radeon HD 7870 XT]

I have the same problem with this card. 

On amdgpu I get a black screen, but the monitor remains on. I can log in via ssh and all.

On radeon I first get a screen full of colourful pixels, then a black screen, then the monitor says no-signal and goes to standby. After that the PC hangs and I can no longer log in via ssh.

On radeon there are a lot of 
radeon 0000:01:00.0: ring 3 stalled for more than 10036msec
kinda logs in dmesg. But on amdgpu there is nothing interesting really. It seems like it almost works, except for the black screen ;)

I think I'm gonna try fresh kernel from padoka ppa next. When I have some free time.
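
For anyone comparing logs across kernels, the ring-stall lines quoted above can be counted per ring with a small filter. This is only a sketch: the helper name `summarize_ring_stalls` is hypothetical, and it assumes the message format stays as shown in this report ("ring N stalled for more than Xmsec").

```shell
#!/bin/sh
# Hypothetical helper: summarise radeon "ring N stalled" messages from a
# kernel log read on stdin. Prints one line per ring with the message count.
summarize_ring_stalls() {
    grep -E 'ring [0-9]+ stalled for more than [0-9]+msec' |
        sed -E 's/.*ring ([0-9]+) stalled.*/\1/' |
        sort -n | uniq -c |
        awk '{ printf "ring %s: %s stall message(s)\n", $2, $1 }'
}

# Possible use (reading the kernel log may need root on some systems):
#   dmesg | summarize_ring_stalls
```

An empty result means no stall messages matched, which is what the amdgpu logs in this report would be expected to show.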
Comment 166 MAD 2017-08-03 21:10:37 UTC
Created attachment 133231 [details]
dmesg amdgpu kernel 4.10
Comment 167 MAD 2017-08-03 21:11:56 UTC
Created attachment 133232 [details]
Xorg.log amdgpu kernel 4.10
Comment 168 MAD 2017-08-03 21:14:45 UTC
Created attachment 133233 [details]
dmesg radeon kernel 4.10
Comment 169 David Verelst 2017-09-20 19:47:34 UTC
Created attachment 134388 [details]
dmesg radeon kernel 4.12.13

Not sure if it helps, but here's another dmesg output from radeon + kernel 4.12.13. What does work though is the fallback to llvmpipe, and I end up with a working system + graphics.
Comment 170 David Verelst 2017-09-30 14:58:24 UTC
Created attachment 134576 [details]
journalctl -k amdgpu with linux-amd-staging-git

Also no luck with AMDGPU. Here's the kernel output for booting with linux-amd-staging-git 4.12.0-2a69a4b35621, (on Arch Linux, see also https://aur.archlinux.org/packages.php?ID=442065).
Comment 171 MAD 2017-11-04 20:45:17 UTC
Created attachment 135238 [details]
journalctl_radeon_4.14.0-041400rc7

[AMD/ATI] Tahiti LE [Radeon HD 7870 XT]
4.14.0-041400rc7-generic from: http://kernel.ubuntu.com/~kernel-ppa/mainline/
OpenGL version string: 3.0 Mesa 17.3.0-rc2 - padoka PPA

Basically still the same, black screen then:
radeon 0000:01:00.0: ring 0 stalled for more than 10244msec

And then:
radeon 0000:01:00.0: GPU reset succeeded, trying to resume

That's the last line in log file, after this I can no longer connect via ssh.
Comment 172 MAD 2017-11-04 21:14:28 UTC
Created attachment 135239 [details]
journalctl_amdgpu_4.14.0-041400rc7

I've had more luck with amdgpu. Well kinda. It finally boots without nomodeset (starting from 4.13 kernel). But falls back to software rendering (Device: llvmpipe). It also complains that:
amdgpu 0000:01:00.0: SI support provided by radeon.
amdgpu 0000:01:00.0: Use radeon.si_support=0 amdgpu.si_support=1 to override.

But when I start it with:
modprobe.blacklist=radeon radeon.si_support=0 amdgpu.si_support=1 

the screen freezes, gnome doesn't start and it produces the output in the attachment. Something about "dead whales":
lis 04 19:35:39 pc gnome-session-binary[1369]: CRITICAL: We failed, but the fail whale is dead. Sorry....

There is also something about:
lis 04 19:34:09 pc org.gnome.Shell.desktop[1440]: amdgpu_device_initialize: Cannot parse ASIC IDs, 0xffffffea./usr/share/libdrm/amdgpu.ids: No such file or directory
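
Before comparing results, it may be worth confirming that the override parameters really reached the kernel. The sketch below is an illustration only: the helper name `si_override_status` is hypothetical, and it checks nothing beyond the presence of the two parameters named in the amdgpu message above.

```shell
#!/bin/sh
# Hypothetical helper: given a kernel command line on stdin (for example
# the contents of /proc/cmdline), report whether both SI override
# parameters from the amdgpu message are present.
si_override_status() {
    cmdline=$(cat)
    case " $cmdline " in
        *" radeon.si_support=0 "*) radeon_off=yes ;;
        *) radeon_off=no ;;
    esac
    case " $cmdline " in
        *" amdgpu.si_support=1 "*) amdgpu_on=yes ;;
        *) amdgpu_on=no ;;
    esac
    if [ "$radeon_off" = yes ] && [ "$amdgpu_on" = yes ]; then
        echo "override active"
    else
        echo "override not active"
    fi
}

# Possible use:
#   si_override_status < /proc/cmdline
```

`lspci -k` then shows which kernel driver is actually bound to the card, and `ls /usr/share/libdrm/amdgpu.ids` shows whether the file named in the last error exists.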
Comment 173 madmalkav 2018-03-20 22:30:03 UTC
Created attachment 138230 [details]
attachment-10505-0.html

As it has failed to attract any developer attention for two years, I have cancelled the bountysource reward.
Comment 175 MAD 2019-02-03 10:01:21 UTC
Created attachment 143276 [details]
journalctl-b0-radeonsi-4.20.6.log
Comment 176 MAD 2019-02-03 10:01:51 UTC
Created attachment 143277 [details]
Xorg-radeonsi-4.20.6.log
Comment 177 MAD 2019-02-03 10:02:42 UTC
Created attachment 143278 [details]
journalctl-b0-amdgpu-4.20.6.log
Comment 178 MAD 2019-02-03 10:03:15 UTC
Created attachment 143279 [details]
Xorg-amdgpu-4.20.6.log
Comment 180 Alex Deucher 2019-09-20 13:37:11 UTC
Does booting with pci=noats on the kernel command line in grub help?
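
For testers unsure how to add the parameter, a rough sketch follows. It assumes a distribution that keeps its GRUB defaults in `/etc/default/grub` with a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT` line; the helper name `add_kernel_param` is hypothetical, and it only prints the edited file so the result can be reviewed before anything is changed.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT line of the given file unless it is already
# there. The edited text goes to stdout; the input file is not modified.
# Assumes the parameter contains no '|' or '&' characters.
add_kernel_param() {
    param=$1
    file=$2
    if grep -E '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        cat "$file"
    else
        sed -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" "$file"
    fi
}

# Possible use:
#   add_kernel_param pci=noats /etc/default/grub
```

After the reviewed change is written back, the GRUB configuration has to be regenerated (`update-grub` on Debian/Ubuntu, `grub2-mkconfig -o /boot/grub2/grub.cfg` on some other distributions). For a one-off test, pressing `e` in the GRUB menu and appending `pci=noats` to the `linux` line avoids editing any file.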
Comment 181 GitLab Migration User 2019-09-25 17:50:15 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1208.