Bug 43835

Summary:

System crashes when radeon firmware blob (R520_cp.bin) is installed

Product:

DRI

Reporter:

Camaleón <noelamac>

Component:

DRM/Radeon

Assignee:

Default DRI bug account <dri-devel>

Status:

RESOLVED MOVED

QA Contact:

Severity:

normal

Priority:

medium

CC:

jrnieder

Version:

XOrg git

Hardware:

Other

OS:

All

URL:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=651532

Whiteboard:

Attachments:

Description	Flags
Logs ("dmesg" and "xorg.0.log" for kernels 3.2-rc4 and 3.1 with and without the firmware installed)	none
glxinfo	none
glxinfo (with 3D enable)	none
glxinfo	none
dmesg	none
syslog	none
xorg.0.log	none
Dmesg with "drm.debug=6"	none
Syslog with "drm.debug=6"	none
Syslog with "drm.debug=6" + call trace	none
Output of "xrandr"	none
dmesg with kernel 3.3-rc3	none
syslog with kernel 3.3-rc3	none
Xorg.0.log with kernel 3.3-rc3	none

Description Camaleón 2011-12-14 09:06:57 UTC

Created attachment 54426 [details]
Logs ("dmesg" and "xorg.0.log" for kernels 3.2-rc4 and 3.1 with and without the firmware installed)

1. Steps to reproduce the problem

On Debian Wheezy, after installing the "firmware-linux-nonfree" package, which contains the firmware needed to enable 3D acceleration for the ATI card (M56P Radeon Mobility X1600), the system crashes some time after it starts and the user logs in. No specific action triggers the crash; it is only a matter of time.

2. Symptoms

The user gets a "kernel oops" (kernel 3.1) or a hang with a trace (kernel 3.2-rc4), and the system locks up.

3. Tested kernels

The user has tested kernel 3.1 (Wheezy's stock kernel) and 3.2-rc4 (from Debian's experimental branch). Both kernels show the same result when the firmware is installed. On the other hand, both kernels work fine as soon as the firmware package is uninstalled.

4. Additional information

The crash has been tracked in Debian BTS #651532 (full link available in the URL field).

5. Attached logs (4 files):

- "dmesg" and "Xorg.0.log" for kernels 3.2-rc4 and 3.1 when firmware is installed.

- "dmesg" and "Xorg.0.log" for kernels 3.2-rc4 and 3.1 when firmware is not installed.

6. Other considerations

Please note that I am opening this bug on behalf of another person who is experiencing the crash. For this reason I'm CC'ing him.

Comment 1 Alex Deucher 2011-12-14 09:19:34 UTC

In the future, please attach the dmesg and log files directly.  It looks like it's a problem with acceleration (which is available without the firmware).  I don't see any oops or backtraces in the logs.  Can you attach the oops or get a picture of it?  

Does setting:
Option "NoAccel" "True"
in the device section of your xorg.conf fix the problem?

Comment 2 Alex Deucher 2011-12-14 09:22:37 UTC

which is NOT available without the firmware

Comment 3 Camaleón 2011-12-14 09:56:05 UTC

(In reply to comment #1)
> In the future, please attach the dmesg and log files directly.  

Will do, sorry.

> It looks like it's a problem with acceleration (which is available without 
> the firmware).  I don't see any oops or backtraces in the logs.  Can you 
> attach the oops or get a picture of it?  

The kernel oops and backtrace are available in the Debian bug. Direct links:

- Kernel 3.1

(syslog)
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=20111209_syslog_kernel_oops.txt;att=1;bug=651532

(snapshot)
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=20111209_snapshot_kernel_oops.jpg;att=4;bug=651532

- Kernel 3.2-rc4

(syslog)
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=21;filename=20111212_syslog_kernel_3_2_rc3;att=2;bug=651532

(snapshot)
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=21;filename=20111212_snapshot_kernel_trace.jpg;att=1;bug=651532

I can attach the files to this bug report if you find it convenient.
 
> Does setting:
> Option "NoAccel" "True"
> in the device section of your xorg.conf fix the problem?

I have asked the user to try with this option while having the firmware package installed, will report back as soon as I get the results.

Comment 4 Camaleón 2011-12-15 00:49:15 UTC

(In reply to comment #3)

> I have asked the user to try with this option while having the firmware package
> installed, will report back as soon as I get the results.

The user reported that both kernels do work (no crashes) with "firmware-linux-nonfree" installed and using this "/etc/X11/xorg.conf" file:

***
Section "Device"
    Identifier  "ATI"
    Driver      "radeon"
    Option      "NoAccel" "True"
EndSection

Section "Screen"
    Identifier "Default Screen"
    DefaultDepth     24
EndSection
***

This effectively disables 3D acceleration (which means no "gnome-shell") but the user hasn't experienced any further crash since yesterday.

Comment 5 Alex Deucher 2011-12-15 07:29:00 UTC

What version of the 3D driver is he using?  You might try a newer 3D driver package.  Make sure he is using the r300 gallium driver (r300g).

Comment 6 Camaleón 2011-12-15 07:47:32 UTC

(In reply to comment #5)
> What version of the 3D driver is he using?  

How could we check this?

> You might try a newer 3D driver package.  Make sure he is using the r300 
> gallium driver (r300g).

As he's on Debian Wheezy he has installed "libgl1-mesa-dri (7.11.1-1)" but not sure if this tells you something.

Comment 7 Alex Deucher 2011-12-15 08:43:49 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > What version of the 3D driver is he using?  
> 
> How could we check this?

Please attach the output of glxinfo.

> 
> > You might try a newer 3D driver package.  Make sure he is using the r300 
> > gallium driver (r300g).
> 
> As he's on Debian Wheezy he has installed "libgl1-mesa-dri (7.11.1-1)" but not
> sure if this tells you something.

Just need to find out if they are using the classic or gallium driver.
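
One rough way to tell the two apart is the "OpenGL renderer string" line of the glxinfo output. The following is only a minimal Python sketch under stated assumptions: that gallium drivers mention "Gallium" in that string, that software fallbacks mention "Software Rasterizer", "softpipe" or "llvmpipe", and that a remaining "Mesa DRI" string indicates a classic driver. The function name classify_renderer is made up for illustration and the heuristics may not hold for every Mesa version.

```python
import re

def classify_renderer(glxinfo_text):
    """Guess the Mesa driver type from glxinfo output (heuristic only)."""
    match = re.search(r"^OpenGL renderer string:\s*(.+)$", glxinfo_text, re.MULTILINE)
    if match is None:
        return "unknown"
    renderer = match.group(1).strip().lower()
    # Check software renderers first: softpipe/llvmpipe are gallium-based too.
    if any(name in renderer for name in ("software rasterizer", "softpipe", "llvmpipe")):
        return "software"
    if "gallium" in renderer:
        return "gallium"
    # Assumption: classic DRI drivers identify themselves as "Mesa DRI ...".
    if "mesa dri" in renderer:
        return "classic"
    return "unknown"
```

It can be fed the text of a saved glxinfo output. A result of "software" would suggest that acceleration was disabled when glxinfo was run.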

Comment 8 Alex Deucher 2011-12-15 08:44:55 UTC

Does the system hang if you remove the NoAccel option but don't load gnome-shell?

Comment 9 Camaleón 2011-12-15 13:25:23 UTC

Created attachment 54475 [details]
glxinfo

I'm attaching the full output of "glxinfo".

Comment 10 Camaleón 2011-12-15 13:28:13 UTC

(In reply to comment #8)
> Does the system hang if you remove the NoAccel option but don't load
> gnome-shell?

Yes, the user has reported that after removing that option from the "xorg.conf" file and logging into GNOME fallback mode (now "gnome classical"), the system hung.

Comment 11 Alex Deucher 2011-12-15 13:40:27 UTC

(In reply to comment #9)
> Created attachment 54475 [details]
> glxinfo
> 
> I'm attaching the full output of "glxinfo".

Unfortunately, you'll end up with the software 3D driver if you have acceleration disabled.  You'll have to find out what debian uses on wheezy (r300c vs. r300g).  However, if you still get hangs even without using 3D, there seems to be a problem with acceleration in general on his system.

Comment 12 Camaleón 2011-12-16 00:01:15 UTC

Created attachment 54487 [details]
glxinfo (with 3D enable)

Comment 13 Camaleón 2011-12-16 00:17:14 UTC

(In reply to comment #11)

> Unfortunately, you'll end up with the software 3D driver if you have
> acceleration disabled.  

I have added the full output while 3D accel is enabled, hope this helps.

> You'll have to find out what debian uses on wheezy (r300c vs. r300g). 

I can't tell... maybe Jonathan can shed some light here :-)

> However, if you still get hangs even without using 3D, there seems to be a 
> problem with acceleration in general on his system.

Curiously, the system runs stable (no hangs or crashes) in GNOME fallback mode as soon as the "firmware-linux-nonfree" package is removed, as stated in #c1.

So what we have now is that the system does not crash if:

1/ "firmware-linux-nonfree" package is not installed, or
2/ Option "NoAccel" "True" is set at xorg.conf

Comment 14 Jonathan Nieder 2011-12-16 01:05:48 UTC

Mesa in wheezy ships the gallium r300 driver on all Linux architectures.

Comment 15 Michel Dänzer 2011-12-16 01:51:06 UTC

(In reply to comment #13)
> 1/ "firmware-linux-nonfree" package is not installed, or
> 2/ Option "NoAccel" "True" is set at xorg.conf

These are mostly equivalent, as acceleration is not possible without the microcode with KMS.

What might be interesting would be to try GNOME fallback mode with KMS disabled (radeon.modeset=0) with and without firmware-linux-nonfree installed. Please attach dmesg and Xorg.0.log for both cases again. (The r300g driver doesn't work with KMS disabled)

BTW, does the GNOME fallback mode end up using the same window manager (Metacity?) with and without acceleration being enabled?

Comment 16 Camaleón 2011-12-16 13:23:03 UTC

(In reply to comment #15)
> (In reply to comment #13)
> > 1/ "firmware-linux-nonfree" package is not installed, or
> > 2/ Option "NoAccel" "True" is set at xorg.conf
> 
> These are mostly equivalent, as acceleration is not possible without the
> microcode with KMS.
> 
> What might be interesting would be to try GNOME fallback mode with KMS disabled
> (radeon.modeset=0) with and without firmware-linux-nonfree installed. Please
> attach dmesg and Xorg.0.log for both cases again. (The r300g driver doesn't
> work with KMS disabled)

The user reports that he has tried to disable KMS by all these means:

- Appending "nomodeset" to the kernel line
- Appending "radeon.modeset=0" to the kernel line
- Appending "modeset=0" to the kernel line
- Blacklisting the radeon module to prevent it from loading

But all he gets is a system hang with the following message:

***
Could not update ICEautorithy file /var/lib/gdm3/.ICEauthority
Closing session
***

And that's all. He's now stuck there, forced to boot his Mac OS X partition in order to have a working system.

> BTW, does the GNOME fallback mode end up using the same window manager
> (Metacity?) with and without acceleration being enabled?

I don't know... anyhow, now I have to help the user restore his Debian system to a working state.

Comment 17 Lucas Stach 2011-12-16 14:48:58 UTC

> But all he gets is a system hang with the following message:
> 
> ***
> Could not update ICEautorithy file /var/lib/gdm3/.ICEauthority
> Closing session
> ***

As far as I can tell this has nothing to do with graphics drivers. I hit this too with my home partition improperly tagged for some changes in SELinux. It seems to me this is more a problem of wrong file permissions.

Comment 18 Camaleón 2011-12-17 01:44:01 UTC

(In reply to comment #17)
 
> As far as I can tell this has nothing to do with graphics drivers. I hit this
> too with my home partition improperly tagged for some changes in SELinux. It
> seems to me this is more a problem of wrong file permissions.

Yes, I know, but this problem was caused as a side effect of trying to disable KMS. Anyway, the user can now access his system normally again (the ".ICEauthority" file was manually removed and automatically re-created).

We're ready to run more tests, but please remember that we're not developers, just plain users; we need some guidance on the steps to follow.

Comment 19 Camaleón 2012-01-22 13:46:46 UTC

The user still reports crashes with kernel 3.2.0-rc7-686-pae. I'm attaching the involved files ("syslog" contains the kernel oops).

I can tell the user to run whatever tests you consider useful. He is very interested in solving this because he can't use GNOME3 and gnome-shell at all and has to work with no 3D acceleration.

Comment 20 Camaleón 2012-01-22 13:47:45 UTC

Created attachment 55998 [details]
glxinfo

Comment 21 Camaleón 2012-01-22 13:48:12 UTC

Created attachment 55999 [details]
dmesg

Comment 22 Camaleón 2012-01-22 13:49:03 UTC

Created attachment 56000 [details]
syslog

Comment 23 Camaleón 2012-01-22 13:49:36 UTC

Created attachment 56001 [details]
xorg.0.log

Comment 24 Jonathan Nieder 2012-01-22 14:24:16 UTC

bugzilla-daemon@freedesktop.org wrote:

> The user still reports crashes with kernel 3.2.0-rc7-686-pae. I'm attaching the
> involved files ("syslog" contains the kernel oops).

Summary of log: tests were performed on 22 January.

| 20:37:04 linux dbus[1320]: [system] Activating service name='org.freedesktop.Accounts' (using servicehelper)
| 20:37:04 linux kernel: [  724.400329] dbus-daemon-lau: Corrupted page table at address b817800c
| 20:37:04 linux kernel: [  724.400434] *pdpt = 0000000033666001 *pde = fb274e7ffb274e81 
| 20:37:04 linux kernel: [  724.400530] Bad pagetable: 0009 [#1] SMP 
[... snipping list of modules linked in, because of line wrapping ...]
| 20:37:04 linux kernel: [  724.401727] Pid: 1815, comm: dbus-daemon-lau Not tainted 3.2.0-rc7-686-pae #1 Apple Computer, Inc. iMac5,1/Mac-F4228EC8

The boot was at 20:36.  Maybe those 10 minutes came from NTP or
something.  Next boot:

| 20:41:33 linux dbus[1353]: [system] Activating service name='org.freedesktop.Accounts' (using servicehelper)
| 20:41:33 linux dbus[1353]: [system] Successfully activated service 'org.freedesktop.Accounts'
| 20:41:33 linux accounts-daemon[2275]: started daemon version 0.6.15
| 20:41:34 linux kernel: [   41.669373] ssh: Corrupted page table at address 998f31c
| 20:41:34 linux kernel: [   41.669433] *pdpt = 000000002c80d001 *pde = 000000002f980067 *pte = ff192f4fff192f4f 
| 20:41:34 linux kernel: [   41.669502] Bad pagetable: 0009 [#1] SMP 
[...]
| 20:41:34 linux kernel: [   41.670989] Pid: 2279, comm: ssh Not tainted 3.2.0-rc7-686-pae #1 Apple Computer, Inc. iMac5,1/Mac-F4228EC8

Anyway, the page table seems to get corrupted when X starts.

Could you send a log from booting and starting X with drm.debug=6 on
the kernel command line?

Thanks,
Jonathan
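
For readers unsure how to add such an option: it goes on the kernel command line, either edited at the boot loader prompt or stored in the boot loader configuration. The Python sketch below only shows the string manipulation. The helper name add_kernel_param is hypothetical, and where the resulting string is stored (for example GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub on a Debian system booting through GRUB 2) is an assumption about the user's setup, which this report does not describe.

```python
def add_kernel_param(cmdline, param):
    """Return cmdline with param appended, replacing any earlier value of the same key."""
    key = param.split("=", 1)[0]
    # Drop tokens that set the same key so the new value is the only one left.
    kept = [tok for tok in cmdline.split() if tok.split("=", 1)[0] != key]
    kept.append(param)
    return " ".join(kept)
```

The same helper would cover the other options mentioned in this report, such as "radeon.modeset=0".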

Comment 25 Camaleón 2012-01-23 13:21:32 UTC

Created attachment 56055 [details]
Dmesg with "drm.debug=6"

Comment 26 Camaleón 2012-01-23 13:22:08 UTC

Created attachment 56056 [details]
Syslog with "drm.debug=6"

Comment 27 Camaleón 2012-01-23 13:25:10 UTC

(In reply to comment #24)

> Could you send a log from booting and starting X with drm.debug=6 on
> the kernel command line?

I'm attaching "syslog" and "dmesg" with the above kernel option appended. "Glxinfo" and "Xorg.0.log" seem to provide no additional information.

Comment 28 Jonathan Nieder 2012-01-23 13:25:16 UTC

bugzilla-daemon@freedesktop.org wrote:

> Dmesg with "drm.debug=6"

Hm, no page table corruption/crash this time?

Comment 29 Camaleón 2012-01-23 13:40:19 UTC

(In reply to comment #28)
> bugzilla-daemon@freedesktop.org wrote:
> 
> > Dmesg with "drm.debug=6"
> 
> Hm, no page table corruption/crash this time?

I don't see a kernel oops in the "syslog" either. I have just asked the user whether the system crashed again this time.

Comment 30 Camaleón 2012-01-23 23:58:35 UTC

(In reply to comment #29)

> I don't see a kernel oops in the "syslog" either. I have just asked the user
> whether the system crashed again this time.

The user reported that system crashed after a while.

Comment 31 Jonathan Nieder 2012-01-24 00:00:35 UTC

bugzilla-daemon@freedesktop.org wrote:

> The user reported that system crashed after a while.

Interesting --- so it sounds like there is a random element to this,
too.  Can we have a log of the crash, please?

Comment 32 Camaleón 2012-01-24 01:16:03 UTC

(In reply to comment #31)
> bugzilla-daemon@freedesktop.org wrote:
> 
> > The user reported that system crashed after a while.
> 
> Interesting --- so it sounds like there is a random element to this,
> too.  Can we have a log of the crash, please?

I have asked the user for it. He will have to wait until the system locks.

Comment 33 Camaleón 2012-01-25 09:16:12 UTC

(In reply to comment #32)
> (In reply to comment #31)
> > bugzilla-daemon@freedesktop.org wrote:
> > 
> > > The user reported that system crashed after a while.
> > 
> > Interesting --- so it sounds like there is a random element to this,
> > too.  Can we have a log of the crash, please?
> 
> I have asked the user for it. He will have to wait until the system locks.

Sorry for the delay (the user was a bit busy fighting against his "VAT taxes").

I'm attaching the syslog for the kernel oops (starts at line "3911"). Any hint to bypass this crash would be very welcome, the user is going nuts with this issue :-(

Comment 34 Camaleón 2012-01-25 09:17:55 UTC

Created attachment 56153 [details]
Syslog with "drm.debug=6" + call trace

Comment 35 Jonathan Nieder 2012-01-25 09:37:29 UTC

bugzilla-daemon@freedesktop.org wrote:

> Syslog with "drm.debug=6" + call trace

Summary follows.  Log is from 25 January.

 15:55 bootup
 15:55 [after 22 seconds] drm driver loads
 15:55 [after 27 seconds] consolekit loads
 15:55 [after 33 or so seconds] modesetting again
 15:55 [after 39 seconds] gdm startup (this is where it crashed previously)
 16:34 [2344 seconds]:

| radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
| ------------[ cut here ]------------
| WARNING: at [...]/drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x22e/0x298 [radeon]()
| Hardware name: iMac5,1
| GPU lockup (waiting for 0x0003A3C8 last fence id 0x0003A3C7)
| Modules linked in: hid_magicmouse hidp michael_mic arc4 pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) acpi_cpufreq mperf cpufreq_stats cpufreq_conservative cpufreq_powersave cpufreq_userspace parport_pc ppdev lp parport bnep rfcomm binfmt_misc promethean(O) fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc uvcvideo videodev media ssb mmc_core pcmcia pcmcia_core ndiswrapper(O) loop firewire_sbp2 ir_lirc_codec rc_avermedia_m135a lirc_dev mxl5005s cryptd aes_i586 aes_generic ir_mce_kbd_decoder af9013 ecb btusb ir_sony_decoder bluetooth rfkill ir_jvc_decoder ir_rc6_decoder ir_rc5_decoder isight_firmware dvb_usb_af9015 dvb_usb dvb_core ir_nec_decoder rc_core uas hid_apple usb_storage snd_hda_codec_idt lib80211_crypt_tkip usbhid hid wl(P) snd_hda_intel snd_hda_codec radeon snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device ttm drm_kms_helper drm snd i2c_algo_bit i2c_i801 soundcore applesmc i2c_core snd_page_alloc power_supply iTCO_wdt iTCO_vendor_support 
| processor input_polldev evdev pcspkr lib80211 thermal_sys apple_bl button ext4 mbcache jbd2 crc16 sr_mod cdrom sd_mod crc_t10dif ata_generic firewire_ohci uhci_hcd firewire_core crc_itu_t ata_piix libata ehci_hcd sky2 usbcore scsi_mod [last unloaded: scsi_wait_scan]
| Pid: 1515, comm: Xorg Tainted: P           O 3.1.0-1-686-pae #1
| Call Trace:
|  [<c1037698>] ? warn_slowpath_common+0x68/0x79
|  [<f88b3ffe>] ? radeon_fence_wait+0x22e/0x298 [radeon]
|  [<c1037711>] ? warn_slowpath_fmt+0x29/0x2d
|  [<f88b3ffe>] ? radeon_fence_wait+0x22e/0x298 [radeon]
|  [<c104cf51>] ? add_wait_queue+0x30/0x30
|  [<f872521f>] ? ttm_bo_wait+0xa6/0x153 [ttm]
|  [<f88c32b6>] ? radeon_bo_wait+0x59/0x77 [radeon]
|  [<f88c3719>] ? radeon_gem_wait_idle_ioctl+0x2a/0x50 [radeon]
|  [<f87aedc9>] ? drm_ioctl+0x224/0x2dd [drm]
|  [<f88c36ef>] ? radeon_gem_busy_ioctl+0x6b/0x6b [radeon]
|  [<c10110cc>] ? restore_i387_fxsave+0x63/0x70
|  [<f87aeba5>] ? drm_copy_field+0x47/0x47 [drm]
|  [<c10d33d2>] ? do_vfs_ioctl+0x459/0x48f
|  [<c1011a89>] ? restore_i387_xstate+0x16c/0x1a3
|  [<c10adf54>] ? mmap_region+0x2ef/0x3b7
|  [<c104334d>] ? recalc_sigpending+0x1f/0x2f
|  [<c10d344c>] ? sys_ioctl+0x44/0x68
|  [<c12b2ddf>] ? sysenter_do_call+0x12/0x28
| ---[ end trace 6b5e1f4e74986b70 ]---
| radeon: wait for empty RBBM fifo failed ! Bad things might happen.
| Failed to wait GUI idle while programming pipes. Bad things might happen.
| radeon 0000:01:00.0: (rs600_asic_reset:348) RBBM_STATUS=0xB4116100
| radeon 0000:01:00.0: (rs600_asic_reset:367) RBBM_STATUS=0x94010140
| radeon 0000:01:00.0: (rs600_asic_reset:375) RBBM_STATUS=0x94000140
| radeon 0000:01:00.0: (rs600_asic_reset:383) RBBM_STATUS=0x94000140
| radeon 0000:01:00.0: restoring config space at offset 0x1 (was 0x100403, writing 0x100407)
| radeon 0000:01:00.0: failed to reset GPU
| radeon 0000:01:00.0: GPU reset failed
| BUG: unable to handle kernel paging request at f8982990

This is Debian kernel 3.1.8-2 (close to upstream v3.1.8).  I don't see
any page table corruption this time.
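
Since the interesting lines are buried deep in long logs, a small scanner can help locate them. This is a minimal Python sketch, not a tool used in this report: the marker strings are copied from the log excerpts quoted in this bug, the list is certainly not exhaustive, and the name find_crash_lines is made up for illustration.

```python
# Substrings taken from the kernel messages quoted in this report.
CRASH_MARKERS = (
    "GPU lockup",
    "Corrupted page table",
    "Bad pagetable",
    "BUG: unable to handle kernel paging request",
    "GPU reset failed",
)

def find_crash_lines(lines):
    """Return (line_number, marker) for the first marker found on each matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for marker in CRASH_MARKERS:
            if marker in line:
                hits.append((number, marker))
                break
    return hits
```

Passing an open file object (for example a saved copy of the syslog) works, because the function only iterates over lines.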

Comment 36 Alex Deucher 2012-01-25 09:54:47 UTC

It's a GPU lockup.  Unfortunately, they tend to be really hard to track down.  You might try a newer ddx or mesa.

Comment 37 Camaleón 2012-01-25 10:10:06 UTC

(In reply to comment #36)
> It's a GPU lockup.  Unfortunately, they tend to be really hard to track down. 
> You might try a newer ddx or mesa.

We're open to test anything, whatever... but I don't really know what kind of test to suggest to the user, I'm stuck at this point. All we know for sure is that simply removing the firmware blob makes the system run stable, but the user needs 3D acceleration; otherwise gnome-shell can't run.

How could we test a new ddx (sorry to ask but, what's that? :-?) or an updated mesa without breaking many things?

Comment 38 Jonathan Nieder 2012-01-25 10:17:22 UTC

bugzilla-daemon@freedesktop.org wrote:

> How could we test a new ddx (sorry to ask but, what's that? :-?) or an updated
> mesa without breaking many things?

http://pkg-xorg.alioth.debian.org/reference/squeeze-backports.html

Comment 39 Camaleón 2012-01-25 10:35:48 UTC

(In reply to comment #38)
> bugzilla-daemon@freedesktop.org wrote:
> 
> > How could we test a new ddx (sorry to ask but, what's that? :-?) or an updated
> > mesa without breaking many things?
> 
> http://pkg-xorg.alioth.debian.org/reference/squeeze-backports.html

Thanks! But... mmmm... the user runs "wheezy" which I guess includes an updated version of the packages (btw, what are the packages we would need to update?), so I don't see the point of getting them from backports.

Comment 40 Jonathan Nieder 2012-01-25 10:41:38 UTC

bugzilla-daemon@freedesktop.org wrote:

> Thanks! But... mmmm... the user runs "wheezy" which I guess includes an updated
> version of the packages (btw, what are the packages we would need to update?),
> so I don't see the point of getting them from backports.

Whoops, sorry, I should have remembered.

Mesa is libgl1-mesa-dri and libgl1-mesa-glx.  The DDX driver is[1]
xserver-xorg-video-radeon and libdrm-radeon1, I suppose.  One can get
fairly recent versions of most packages from sid or experimental.

[1] http://www.x.org/wiki/Development/Documentation/Glossary

Comment 41 Camaleón 2012-01-25 10:56:22 UTC

(In reply to comment #40)

> Mesa is libgl1-mesa-dri and libgl1-mesa-glx.  The DDX driver is[1]
> xserver-xorg-video-radeon and libdrm-radeon1, I suppose.  One can get
> fairly recent versions of most packages from sid or experimental.
> 
> [1] http://www.x.org/wiki/Development/Documentation/Glossary

Okay, thanks... wheezy and sid share the same versions of the mentioned packages:

libgl1-mesa-glx (7.11.2-1) 
libgl1-mesa-glx (7.11.2-1 and others) 

xserver-xorg-video-radeon (1:6.14.3-2) 
xserver-xorg-video-radeon (1:6.14.3-2 and others) 

And I can't tell the user to update from experimental, it's too dangerous. Anyway, I understand bugzilla is not the best place to discuss these support matters (though I thank you for your efforts, Jonathan and Xorg people) :-)

I finally have to surrender. If anyone thinks of anything we can try, you can contact me directly or add the information here. I leave this bug's status to your (@xorg devels) consideration.

Comment 42 Jonathan Nieder 2012-01-25 10:57:27 UTC

Jonathan Nieder wrote:

>                                                           One can get
> fairly recent versions of most packages from sid or experimental.

Though at the moment they all match wheezy:

 mesa 7.11.2
 libdrm 2.4.30
 xf86-video-ati 6.14.3

It should be possible to provide instructions to test a snapshot.
Which component in particular has interesting recent changes?

Comment 43 Jonathan Nieder 2012-01-25 12:39:49 UTC

bugzilla-daemon@freedesktop.org wrote:

> I finally have to surrender. If anyone thinks of anything we can try, you can
> contact me directly or add the information here.

Ok, just to fill in the blanks: was this a regression?  Was there a
kernel or X or GNOME upgrade before which the system worked fine and
after which it broke?

X devs: it looks like there are two sets of symptoms here ---
sometimes there are GPU lockups, and other times (e.g., the syslog
from 2012-01-22) page table corruption with no obvious trouble before
that.  Questions:

 - is there any simple way to figure out what exactly is causing the
   regression?  E.g., after starting X without acceleration can we
   explicitly provoke whatever caused trouble?

 - is it normal that after a lockup the GPU fails to reset?  Even if
   the GPU lockup itself is not well understood, is that later failure
   fixable?

Comment 44 Camaleón 2012-01-25 23:41:02 UTC

(In reply to comment #43)

> Ok, just to fill in the blanks: was this a regression?  Was there a
> kernel or X or GNOME upgrade before which the system worked fine and
> after which it broke?

The problem started at some point during the migration from kernel 2.6.38/2.6.39 (in late November 2011) to 3.0.x and has continued until now (3.1.x).

I don't think this is a kernel issue but rather a package update, because every kernel he has tried since then crashes in the same way. Which package exactly started to make noise? I can't tell.

For example, the user reported this trace on November 27th, 2011. He kept kernel 2.6.38 because, since kernel 3.0, his system had become completely unstable, with crashes every day. The crashes persisted with more recent kernels, so while performing several system reinstalls he discovered a pattern that pointed to the source of the problem: the instability appeared as soon as he installed the radeon firmware and enabled acceleration, regardless of the kernel version.

Kernel failure message 1:
------------[ cut here ]------------
WARNING: at /build/buildd-linux-2.6_2.6.38-5~bpo60+1-i386-B7LqDK/linux-2.6-2.6.38/debian/build/source_i386_none/drivers/gpu/drm/radeon/radeon_fence.c:248
radeon_fence_wait+0x251/0x2d7 [radeon]()
Hardware name: iMac5,1
GPU lockup (waiting for 0x0000083B last fence id 0x00000836)
Modules linked in: hid_magicmouse nls_utf8 isofs nls_cp437 udf vfat
fat hidp acpi_cpufreq mperf cpufreq_conservative cpufreq_powersave
cpufreq_userspace cpufreq_stats parport_pc ppdev lp parport sco bridge
stp bnep rfcomm l2cap nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs
binfmt_misc fuse ssb mmc_core pcmcia pcmcia_core loop firewire_sbp2
snd_hda_codec_idt snd_hda_intel btusb bluetooth rfkill radeon
snd_hda_codec snd_hwdep snd_pcm ttm snd_seq snd_timer drm_kms_helper
snd_seq_device drm i2c_algo_bit applesmc power_supply snd
input_polldev button processor video soundcore snd_page_alloc i2c_i801
rng_core i2c_core isight_firmware ndiswrapper(O) pcspkr tpm_tis tpm
tpm_bios thermal_sys hid_apple evdev usb_storage uas usbhid hid ext4
mbcache jbd2 crc16 sg sr_mod cdrom sd_mod crc_t10dif ata_generic
uhci_hcd ata_piix libata ehci_hcd scsi_mod usbcore firewire_ohci
firewire_core sky2 crc_itu_t nls_base [last unloaded: scsi_wait_scan]
Pid: 1340, comm: Xorg Tainted: P        W  O 2.6.38-bpo.2-686 #1
Call Trace:
 [<c102fa51>] ? warn_slowpath_common+0x6a/0x7b
 [<f8523b4d>] ? radeon_fence_wait+0x251/0x2d7 [radeon]
 [<c102fac8>] ? warn_slowpath_fmt+0x28/0x2c
 [<f8523b4d>] ? radeon_fence_wait+0x251/0x2d7 [radeon]
 [<c1044d66>] ? autoremove_wake_function+0x0/0x29
 [<f82844bf>] ? ttm_bo_wait+0xad/0x135 [ttm]
 [<f853505d>] ? radeon_bo_wait+0x5e/0x7c [radeon]
 [<f85350a2>] ? radeon_gem_wait_idle_ioctl+0x27/0x50 [radeon]
 [<f82affa4>] ? drm_ioctl+0x224/0x2d5 [drm]
 [<f853507b>] ? radeon_gem_wait_idle_ioctl+0x0/0x50 [radeon]
 [<c10a3b97>] ? handle_pte_fault+0x2c5/0x80f
 [<c11458f8>] ? prio_tree_insert+0x150/0x1cc
 [<f82afd80>] ? drm_ioctl+0x0/0x2d5 [drm]
 [<c10c8fb4>] ? do_vfs_ioctl+0x494/0x4df
 [<c10a7483>] ? mmap_region+0x328/0x3fb
 [<c10c9043>] ? sys_ioctl+0x44/0x64
 [<c1002f1f>] ? sysenter_do_call+0x12/0x28
---[ end trace 4d111c5bd88900e9 ]---
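
The "GPU lockup (waiting for ... last fence id ...)" line appears in both traces quoted in this bug, and its two hexadecimal values can be compared across logs. The Python sketch below only extracts them. Reading the first value as the fence being waited for and the second as the last fence seen completed is an assumption based on the message wording, and the gap between them is not a diagnosis of anything. The name fence_gap is made up for illustration.

```python
import re

# Pattern modelled on the lockup lines quoted in this report.
FENCE_RE = re.compile(
    r"GPU lockup \(waiting for 0x([0-9A-Fa-f]+) last fence id 0x([0-9A-Fa-f]+)\)"
)

def fence_gap(line):
    """Return (waited_for, last_fence, gap) parsed from a lockup line, or None."""
    match = FENCE_RE.search(line)
    if match is None:
        return None
    waited = int(match.group(1), 16)
    last = int(match.group(2), 16)
    return waited, last, waited - last
```

Applied to the trace above, the two values differ by 5; in the trace from the 3.1 kernel they differ by 1.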

In brief, the last known-good configuration that worked fine with 3D acceleration enabled was kernel 2.6.38/2.6.39 and GNOME 2 (metacity).

Comment 45 Alex Deucher 2012-01-26 05:59:43 UTC

Can you narrow down the packages and bisect?  Unfortunately, I can't reproduce this on any of the 5xx cards I have access to.

Comment 46 Jonathan Nieder 2012-01-26 10:18:08 UTC

bugzilla-daemon@freedesktop.org wrote:

> Can you narrow down the packages and bisect?  Unfortunately, I can't reproduce
> this on any of the 5xx cards I have access to.

If I understand correctly, there is no known-good version of the X/kernel
stack, but the regression the user experienced came with the upgrade to
GNOME 3.

Please forgive my ignorance: is it possible to (temporarily) configure
GNOME 3 not to take advantage of acceleration, or to start X without
starting a GNOME session?  That might be interesting, since then it
might be possible to find some other simpler application that
reproduces the same trouble and helps pinpoint what is provoking
trouble from that end.

Comment 47 Michel Dänzer 2012-02-02 04:04:31 UTC

How are the LVDS and DVI displays arranged in the session? Can you attach the output of xrandr?

Comment 48 Camaleón 2012-02-02 06:23:13 UTC

Created attachment 56518 [details]
Output of "xrandr"

In reply to comment #47, I'm attaching the output of "xrandr".

Comment 49 Michel Dänzer 2012-02-02 06:36:30 UTC

The kernel DESKTOP_HEIGHT fix from bug 43835 might help for the GPU lockups.

Comment 50 Michel Dänzer 2012-02-02 06:36:59 UTC

Argh, I mean bug 45329.

Comment 51 Camaleón 2012-02-02 08:34:48 UTC

(In reply to comment #50)
> Argh, I mean bug 45329.

Thank you, we can try that... what would be the "easy peasy" way to go about it? Mainline kernel 3.3-rc2 contains the mentioned patches? Are there any other packages involved? Reading comment 9 of bug #45329, it looks like "xf86-video-ati" also needs to be patched :-?

Comment 52 Michel Dänzer 2012-02-02 08:57:40 UTC

(In reply to comment #51)
> Mainline kernel 3.3-rc2 contains the mentioned patches?

No. You can try the drm-fixes branch of git://people.freedesktop.org/~airlied/linux.git, but it should be easy to manually apply the patch to any 3.x tree.

> Are there any other packages involved?

No.

Comment 53 Jonathan Nieder 2012-02-09 17:55:11 UTC

(In reply to comment #51)
> Mainline kernel 3.3-rc2 contains the mentioned patches?

3.3-rc3 does.

Comment 54 Camaleón 2012-02-12 03:24:44 UTC

(In reply to comment #53)
> (In reply to comment #51)
> > Mainline kernel 3.3-rc2 contains the mentioned patches?
> 
> 3.3-rc3 does.

Thanks for the pointer.

The user still reports crashes with the latest mainline kernel (3.3-rc3). I'm attaching the logs, though I can't see any error or kernel trace on them.

Comment 55 Camaleón 2012-02-12 03:26:18 UTC

Created attachment 56910 [details]
dmesg with kernel 3.3-rc3

Comment 56 Camaleón 2012-02-12 03:27:00 UTC

Created attachment 56911 [details]
syslog with kernel 3.3-rc3

Comment 57 Camaleón 2012-02-12 03:27:40 UTC

Created attachment 56912 [details]
Xorg.0.log with kernel 3.3-rc3

Comment 58 Martin Peres 2019-11-19 08:24:44 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/238.
