Bug 40886

Summary: Improve our lockup detection, reporting and recovery
Product: xorg
Component: Driver/nouveau
Status: RESOLVED INVALID
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Reporter: Martin Peres <martin.peres>
Assignee: Nouveau Project <nouveau>
QA Contact: Xorg Project Team <xorg-team>
CC: anssi, bryce
Bug Blocks: 40884

Description Martin Peres 2011-09-14 12:38:54 UTC
At the moment, we only output the errors that the GPU reports in the kernel logs. However, these are usually not helpful in any way.

To improve the quality of bug reports, we also need to output meaningful register values and work out roughly where the problem is. If possible, an error code should be generated so that duplicate bug reports can be merged into a single meaningful one.

This task is *very* suitable for students who want to learn about nouveau. If you are considering applying for it, please know that you will have a lot of documentation to read and will also be expected to ask the Nouveau developers many questions. The actual implementation should be quite small.

The Ubuntu xorg team suggested that we improve our bug "reportability". Here is what they have available for the intel driver, which we could try to copy.

# Jesse Barnes on ubuntu-devel@lists.ubuntu.com:
#   You'll get three events, one when the error is detected, one before
#   the reset and one after.  Each has a different environment variable set;
#   the initial error has ERROR=1, the pre-reset event has RESET=1 and the
#   post-reset event has ERROR=0.

# Disable freeze hook.
SUBSYSTEM=="drm", ACTION=="change", ENV{ERROR}=="1", RUN+="/usr/share/apport/apport-gpu-error-intel.py"
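
As a rough illustration of the three events described above: udev passes the event's properties to a RUN program through its environment, so a hook could tell the events apart as sketched here. The function name and the returned labels are hypothetical and are not taken from the apport script.

```python
import os


def classify_event(env):
    """Map the ERROR/RESET variables described above to an event label.

    The labels are made up for this sketch; only the variable names and
    values come from the quoted description.
    """
    if env.get("ERROR") == "1":
        return "error-detected"
    if env.get("RESET") == "1":
        return "pre-reset"
    if env.get("ERROR") == "0":
        return "post-reset"
    return None  # not one of the three GPU error events


if __name__ == "__main__":
    # When run from a udev rule, the event properties are in os.environ.
    print(classify_event(os.environ))
```

The rule above only matches ENV{ERROR}=="1", so a hook started by it would only ever see the first of the three events.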

The python script copies dmesg, Xorg.0.log, and
/sys/kernel/debug/dri/0/i915_error_state.  The latter is an
intel-specific error dump they use to help diagnose bugs.
We also capture a variety of other data and files, but those three seem
to be what the devs want, mostly.
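
A minimal sketch of the collection step described above, not the actual apport script. The Xorg log location and the output file name dmesg.txt are assumptions; a nouveau equivalent would need its own error dump in place of i915_error_state.

```python
import subprocess
from pathlib import Path

# The Xorg log path is an assumed default; the debugfs path is the one
# quoted in the description.
DEFAULT_SOURCES = [
    Path("/var/log/Xorg.0.log"),
    Path("/sys/kernel/debug/dri/0/i915_error_state"),
]


def collect_files(sources, dest_dir):
    """Copy each readable source file into dest_dir; return the names copied."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sources:
        src = Path(src)
        try:
            # Read the content directly: debugfs files report a size of 0.
            data = src.read_bytes()
        except OSError:
            continue  # missing or unreadable files are skipped
        (dest / src.name).write_bytes(data)
        copied.append(src.name)
    return copied


def capture_dmesg(dest_dir):
    """Save the kernel log to dest_dir/dmesg.txt; return False on failure."""
    try:
        result = subprocess.run(["dmesg"], capture_output=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    (dest / "dmesg.txt").write_bytes(result.stdout)
    return True
```

Reading debugfs usually requires root, which a hook started by udev would normally have.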

We extract a couple error codes from the error_state file to use as a
way of automatically detecting dupes.
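
To make the duplicate-detection idea concrete, here is a sketch that pulls "NAME: 0x..." register lines out of an error dump and joins a few of them into a key. Which fields Ubuntu actually extracts is not stated above, so the field names used here (IPEHR, INSTDONE) and the line format are assumptions.

```python
import re

# Matches lines such as "  IPEHR: 0x01000000" (assumed dump format).
REGISTER_LINE = re.compile(
    r"^[ \t]*([A-Za-z0-9_]+):[ \t]*(0x[0-9a-fA-F]+)[ \t]*$", re.MULTILINE
)


def extract_codes(text):
    """Return {register name: hex string}, keeping the first occurrence."""
    codes = {}
    for name, value in REGISTER_LINE.findall(text):
        codes.setdefault(name, value.lower())
    return codes


def dedupe_key(text, fields=("IPEHR", "INSTDONE")):
    """Build a key from the chosen registers; absent fields are skipped."""
    codes = extract_codes(text)
    return " ".join(f"{name}={codes[name]}" for name in fields if name in codes)
```

Two reports whose dumps produce the same key would then be candidates for merging; for nouveau, the registers worth including would have to be chosen by the developers.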

Here are a few examples of the results of all this:

  - https://bugs.freedesktop.org/show_bug.cgi?id=35854
  - https://bugs.freedesktop.org/show_bug.cgi?id=34014
  - https://bugs.freedesktop.org/show_bug.cgi?id=34307

Comment 1 Ilia Mirkin 2013-08-18 18:10:24 UTC
It appears that this bug report has lain dormant for quite a while. Sorry we haven't gotten to it. Since we fix bugs all the time, chances are pretty good that your issue has been fixed with the latest software. Please give it a shot. (Linux kernel 3.10.7, xf86-video-nouveau 1.0.9, mesa 9.1.6, or their git versions.) If upgrading to the latest isn't an option for you, your distro's bugzilla is probably the right destination for your bug report.

In an effort to clean up our bug list, we're pre-emptively closing all bugs that haven't seen updates since 2011. If the original issue remains, please make sure to provide fresh info, see http://nouveau.freedesktop.org/wiki/Bugs/ for what we need to see, and re-open this one.

Thanks,

The Nouveau Team
