Bug 80141 - Fails to page flip multiple times; queue overflows waiting for a flip that never finishes, crashing the entire system.
Status: RESOLVED DUPLICATE of bug 79980
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-17 15:22 UTC by Aaron B
Modified: 2014-07-10 07:29 UTC
0 users

See Also:
i915 platform:
i915 features:


Attachments
Xorg crash log of the system. (85.13 KB, text/plain)
2014-06-17 15:22 UTC, Aaron B
no flags
My hardware info. (66.53 KB, text/plain)
2014-06-17 15:25 UTC, Aaron B
no flags
Dmesg output. (69.16 KB, text/plain)
2014-06-17 17:53 UTC, Aaron B
no flags
Crash information from card unresponsive crash. This one recovered from it, though. (201.31 KB, text/plain)
2014-06-19 18:42 UTC, Aaron B
no flags
New crash of DRM that led to kernel panic on 3.16-rc2. (74.48 KB, text/plain)
2014-06-22 22:28 UTC, Aaron B
no flags
New crash with XOrg backtrace. (74.51 KB, text/plain)
2014-06-23 18:50 UTC, Aaron B
no flags

Description Aaron B 2014-06-17 15:22:32 UTC
Created attachment 101236
Xorg crash log of the system.

Events pile up in the queue when the card becomes unresponsive. Apart from one occurrence in Counter-Strike: Source, the only place I've seen this happen is YouTube HTML5 video in Chromium. It is more likely when switching tabs to or from the video while a lot of pixels on screen are changing. The video does not have to be visible to crash, though; it also happens when it is off screen. No other software triggers this bug nearly as consistently. I am attaching the Xorg failure log. I'm using Mesa and firmware from the Oibaf PPA with a 3.16-rc1 kernel, and as far as I remember this has been present in every kernel for a while. The log shows the errors: the queue overflows while the device waits for something that never happens, so this is most likely DRM-related. The screen usually goes blank and stops rendering on the HDMI output I use; sometimes it comes back with badly corrupted video. Both cases leave the system completely unresponsive, except that audio usually keeps playing until the end of the video.

The hardware is a Radeon R9 270X. I only use HDMI video output. I am using the mc2 firmware files even though they are not yet included in any official firmware package; I added them when I moved to the 3.15+ kernels. If you need any more information, just ask and I'll supply it.
Comment 1 Aaron B 2014-06-17 15:25:41 UTC
Created attachment 101237
My hardware info.
Comment 2 Alex Deucher 2014-06-17 16:34:43 UTC
Please attach your dmesg output.  What was the last kernel that was working without problems?
Comment 3 Aaron B 2014-06-17 17:53:03 UTC
Created attachment 101259
Dmesg output.

This is the output from 3.15; I'm currently trying to get this kernel to crash. Give me some time and I'll test back to 3.13 to see whether the problem is in all of them, but it will take a while to find out which ones crash. IIRC this has been a problem since 3.13. Before 3.13 the card was basically unusable, though, so I won't test anything older than that.
Comment 4 Aaron B 2014-06-19 18:42:11 UTC
Created attachment 101374
Crash information from card unresponsive crash. This one recovered from it, though.

Just had a crash on 3.15 generic. I'm going back to 3.13 now to see whether I can trigger it there by doing the same things.
Comment 5 Aaron B 2014-06-22 22:28:32 UTC
Created attachment 101543
New crash of DRM that led to kernel panic on 3.16-rc2.

(In reply to comment #2)
> Please attach your dmesg output.  What was the last kernel that was working
> without problems?

I've added a new file, but I can't find the kernel panic output from 3.16-rc2. It said something was not syncing correctly, so it shut everything down, and from the readout it was a DRM bug. This log runs up to just before that happened, and the timestamp of the page flip failure matches the panic output, so let me try to find it.
Comment 6 Aaron B 2014-06-23 18:50:12 UTC
Created attachment 101605
New crash with XOrg backtrace.

Xorg error reporting file, has a backtrace of the fault.
Comment 7 Michel Dänzer 2014-06-24 02:51:00 UTC
(In reply to comment #6)
> Xorg error reporting file, has a backtrace of the fault.

Thanks, but those backtraces are not useful for doing anything about the problem: They just show that the X server's input queue is overflowing, because the GPU is hanging.

Have you been able yet to find an older kernel where the problem doesn't occur? With 3.15 or newer, might be worth trying the patches from http://lists.freedesktop.org/archives/dri-devel/2014-June/062245.html .
Comment 8 Michel Dänzer 2014-06-24 06:25:25 UTC
(In reply to comment #7)
> With 3.15 or newer, might be worth trying the patches from
> http://lists.freedesktop.org/archives/dri-devel/2014-June/062245.html .

Don't bother trying those patches just yet, they don't seem to be working yet.
Comment 9 Aaron B 2014-06-24 18:22:28 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > With 3.15 or newer, might be worth trying the patches from
> > http://lists.freedesktop.org/archives/dri-devel/2014-June/062245.html .
> 
> Don't bother trying those patches just yet, they don't seem to be working
> yet.

Okay, no problem. I just had a crash that recovered. What commands should I run to collect useful log information for this? I could post more logs, but they would basically be more of the same. I'll fall back to 3.13 for now and try to reproduce it there.
Comment 10 Michel Dänzer 2014-06-25 03:38:01 UTC
The patches from http://lists.freedesktop.org/archives/dri-devel/2014-June/062305.html seem to improve stability with 3.15 for me.
Comment 11 Aaron B 2014-06-27 02:51:01 UTC
(In reply to comment #10)
> The patches from
> http://lists.freedesktop.org/archives/dri-devel/2014-June/062305.html seem
> to improve stability with 3.15 for me.

I'm not currently set up to apply those patches, although I could probably figure it out. In the meantime I'm on 3.14, and I have no YouTube problems there, only the old lack of audio over HDMI that I remember, so the first kernel with the YouTube problem is 3.15. 3.14 does have random crashes from other bugs, but I don't believe they are the same problem, and I don't believe they occur in 3.15 either; there it's just the one Chromium/YouTube bug. A friend of mine has a very similar Linux gaming rig, with a different brand of GPU board and motherboard, yet he has the same problems, so I don't think it's a hardware problem, in case anyone was wondering. If anything needs testing, or if you have ideas for reproducing it more reliably than waiting for a random YouTube video to trigger it, let me know so I can troubleshoot and help out.
Comment 12 Michel Dänzer 2014-06-27 03:06:57 UTC
(In reply to comment #11)
> Anything needs tested [...]

It would be very interesting if those patches help for your problem as well.
Comment 13 Aaron B 2014-06-27 06:49:59 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Anything needs tested [...]
> 
> It would be very interesting if those patches help for your problem as well.

I need to test more, but I've spent the last few hours trying to make it crash by playing games and watching YouTube, and I haven't seen any problems with those DRM patches applied. I applied them to a bare 3.15 from Linus's git tree, and all is well. I'll report any crashes here over the next few days, but the patches seem to help. Game FPS is great too, just as with plain 3.15, only more stable, as the patch descriptions say. I'm on an R9 270X, as mentioned, so a Pitcairn-based (Southern Islands) GPU. So far, all is well.
Comment 14 Aaron B 2014-06-29 04:27:32 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Anything needs tested [...]
> 
> It would be very interesting if those patches help for your problem as well.

Okay, after some more testing, I still see one bug with this version, plus one problem that may not be related.

Bug #1: Cinnamon restarted when I moused over the top panel. It played the login sound, lost my background preferences, and the icons switched to GNOME-style icons. I don't think this is related, but I'm noting it.

Bug #2: This was triggered by Chromium while resizing/dragging the window with the mouse at the top of the screen. This time it made the screen jump, as I've also seen in Steam, but it also produced a black scanline, (very rarely) accompanied by a slight upward tear (1 px vertically) at the row where the black line appears. The lines start at the top but occur more often toward the middle and bottom of the screen. At first it happened about once every 10 seconds; now it happens 1-2 times a second, sometimes with multiple black lines in the same frame, and the frequency keeps increasing over time. This was VERY common with the 3.14 kernel, so the new DRI code I was given to test is probably just delaying the issue rather than fixing it. I don't know whether to open a separate bug report for this, or which components to look at. If you have any ideas or tests, I can run them. Still, the patches seem to have fixed the random total-failure crashes, or maybe, given the black lines, just delayed them; I can't really tell.
Comment 15 Aaron B 2014-06-29 04:36:03 UTC
> Bug #2: This was triggered by Chromium, when resizing the window/dragging
> with the mouse on the top of the screen. This one particular time made the
> screen jump, like I've also seen in Steam, but this one triggers a black
> scanline, (very very rarely) accompanied by a slight (1px vertical) tear
> upwards at the scanline the black line is on. They happen more towards the
> bottom and middle of the screen, although they start at the top. It happened
> once every 10 seconds or so when it started, but now is happening 1-2 times
> a second sometimes with multiple black lines in the same frame it seems. The
> black line occurence is increasing as time goes on. This was VERY common in
> 3.14 kernel, so the issue probably is just being put off by the new dri
> implementation I was given to test, but still exists. I don't really know
> weather to make a new bug report or not specifically for that, I don't
> really know what components to look at either. Any ideas, tests, etc. I can
> do them. But still, this has seemed to fix the random complete failure
> crashes. Or, maybe with the black lines, has just delayed them. I can't
> really tell.

This seems to be triggered by Chromium, but I believe the colors being sent to the screen were affecting it. On my desktop, the lines did not appear. With 3.14 they used to appear even on the desktop, but with this glitch they did not. I opened LibreOffice, which is mostly white pixels on the left side, and rotated the screen 180 degrees; turning the HDMI output off and on (or whatever happens to the display when it rotates) fixed it, and now it's back to normal. As I said, I would wager the glitch is caused by the pixel colors, or maybe the colors are being misinterpreted as commands because the pixel clock frequency is a tad off. I'm not 100% sure about any of this; I'm just describing what fixed it for me in case it helps whoever looks into it. If a separate bug report for this would help, I'll open one.
Comment 16 Aaron B 2014-07-02 05:37:23 UTC
Just had a crash while playing games, though not as bad as before: the system became completely unresponsive, yet the mouse cursor still rendered. I don't know whether that is an Xorg problem or a driver problem; I'm just reporting what happened. However, after this update Xorg fails to render anything correctly, so I won't be able to test anything until that new issue is fixed. It seems the problems I had before aren't fixed at all, just delayed much longer with the custom kernel carrying the DRM patches. So the random crashes and queue overflows are probably not fixed, just delayed or better contained by the new code. All the other issues (black lines, screen jumps, etc.) still exist.
Comment 17 Aaron B 2014-07-09 15:42:31 UTC
Seems to be a duplicate of Bug #79980.

*** This bug has been marked as a duplicate of bug 79980 ***
Comment 18 Aaron B 2014-07-10 07:27:13 UTC
Duplicate of 79980 as well.

*** This bug has been marked as a duplicate of bug 79011 ***
Comment 19 Aaron B 2014-07-10 07:29:40 UTC
Also a duplicate of the bug at https://bugzilla.kernel.org/show_bug.cgi?id=79011. Sorry, I marked it using the bug number from the forum I was looking at without realizing that number belongs to a different Bugzilla instance (kernel.org rather than freedesktop.org).

*** This bug has been marked as a duplicate of bug 79980 ***

