Summary:     Xorg server crashed with AMD Radeon HD 6310

Product:     xorg                  Reporter:    Nemo_bis <federicoleva>
Component:   Driver/Radeon         Assignee:    xf86-video-ati maintainers <xorg-driver-ati>
Status:      RESOLVED INVALID      QA Contact:  Xorg Project Team <xorg-team>
Severity:    normal
Priority:    medium
Version:     unspecified
Hardware:    x86 (IA32)
OS:          Linux (All)
Description
Nemo_bis 2012-12-04 17:12:20 UTC

Michel:
Please attach your dmesg output when X hangs. What version of mesa are you using? Thank you.

Nemo_bis:
I'll post my dmesg output when it happens again; I have rebooted once or twice since, so the current one doesn't contain much of use. I have mesa 8.0.4-1.fc17.

Created attachment 71085 [details]
/var/log/messages
Here's the dmesg for the last 4 days or so.
The period about ten minutes after

Dec 6 12:58:02 e350 kernel: [157010.314171] VFS: file-max limit 371898 reached

and shortly before it seems to be a good example.
$ cat /proc/sys/fs/file-max
371898
$ cat /proc/sys/fs/file-nr
11488 0 371898
$ lsof | wc
41810 411540 4726629
$ ps -ef | wc
177 1602 14372
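As an aside (not part of the original report): the three fields printed by /proc/sys/fs/file-nr are the number of allocated file handles, the number of allocated-but-unused handles (always 0 on modern kernels), and the system-wide maximum (the same value as /proc/sys/fs/file-max). A minimal sketch to read them into named variables:

```shell
# Read /proc/sys/fs/file-nr: fields are <allocated> <unused> <max>.
# On kernels since 2.6 the middle field is always reported as 0.
read -r allocated unused max < /proc/sys/fs/file-nr
echo "allocated=$allocated unused=$unused max=$max"
```

In this report the interesting number is the first one: it starts around 11488 after boot and climbs toward the 371898 limit.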
Michel:
(In reply to comment #3)
> Dec 6 12:58:02 e350 kernel: [157010.314171] VFS: file-max limit 371898 reached

Sounds like something may be leaking file descriptors...

> $ cat /proc/sys/fs/file-nr
> 11488 0 371898

I assume these numbers are from when there's no problem? Can you keep an eye on the first number and, if it grows beyond say 100000, try finding out which process has so many file descriptors open?

Nemo_bis:
Yes, that's right after startup. Now it's only a little higher. How do I check which process is keeping many files open, besides killing them all one by one? I can try to identify it, but everything seems to happen very suddenly and quickly...

Michel:
(In reply to comment #5)
> How do I check which process is keeping many files open, besides killing them all one by one?

E.g. something like

( for i in /proc/[0-9]*/fd; do echo "$(sudo ls $i | wc -l): $i"; done ) | sort -n | tail

does the trick. There might be a simpler way.

Nemo_bis:
Thanks. I didn't manage to use it because this time it happened during the night.
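A hypothetical variant of the one-liner above (not posted in the thread): counting the entries of each /proc/<pid>/fd the same way, but also printing the process name from /proc/<pid>/comm, so the top consumers are identifiable without a second lookup:

```shell
# For each process, count the open descriptors in /proc/<pid>/fd and
# print the count, the fd directory, and the process name; show the
# ten largest. Run as root to see other users' processes.
for i in /proc/[0-9]*/fd; do
    pid=${i#/proc/}; pid=${pid%/fd}
    n=$(ls "$i" 2>/dev/null | wc -l)
    name=$(cat "/proc/$pid/comm" 2>/dev/null)
    echo "$n: $i ($name)"
done | sort -n | tail
```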
Apparently, I have many repetitions of something like this:

Dec 8 06:48:57 e350 dhclient[782]: DHCPREQUEST on em1 to 192.168.1.254 port 67 (xid=0x582009fe)
Dec 8 06:48:57 e350 dhclient[782]: DHCPACK from 192.168.1.254 (xid=0x582009fe)
Dec 8 06:48:58 e350 NetworkManager[645]: <info> (em1): DHCPv4 state changed renew -> renew
Dec 8 06:48:58 e350 NetworkManager[645]: <info>   address 192.168.1.130
Dec 8 06:48:58 e350 NetworkManager[645]: <info>   prefix 24 (255.255.255.0)
Dec 8 06:48:58 e350 NetworkManager[645]: <info>   gateway 192.168.1.254
Dec 8 06:48:58 e350 NetworkManager[645]: <info>   hostname 'e350'
Dec 8 06:48:58 e350 NetworkManager[645]: <info>   nameserver '62.101.93.101'
Dec 8 06:48:58 e350 NetworkManager[645]: <info>   nameserver '83.103.25.250'
Dec 8 06:48:58 e350 NetworkManager[645]: <info>   domain name 'fastwebnet.it'
Dec 8 06:48:58 e350 dbus[706]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
Dec 8 06:48:58 e350 dhclient[782]: bound to 192.168.1.130 -- renewal in 764 seconds.
Dec 8 06:48:58 e350 dbus-daemon[706]: dbus[706]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
Dec 8 06:48:58 e350 dbus-daemon[706]: dbus[706]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec 8 06:48:58 e350 dbus[706]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'

and then, about half a million times:

Dec 8 06:55:07 e350 kernel: [150257.594126] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (4096, 4, 4096, -12)
Dec 8 06:55:09 e350 kernel: [150259.571894] [TTM] Out of kernel memory
Dec 8 06:55:09 e350 kernel: [150259.571925] [TTM] Out of kernel memory

I'll try to post the log again (if I manage to compress it decently... it's 200 MB).

I placed the log at https://docs.google.com/open?id=0B2Eepd4bNt7FeUc5VXc2ZzBrQVE

Sorry, it all happens too quickly, I failed to track down the offending process again.
See the evolution of open files:

 48672 0 371898   Wed 12 Dec 2012, 21:54:20 CET
 66720 0 371898   Wed 12 Dec 2012, 22:04:20 CET
114304 0 371898   Wed 12 Dec 2012, 22:14:20 CET
163520 0 371898   Wed 12 Dec 2012, 22:24:21 CET
180768 0 371898   Wed 12 Dec 2012, 22:34:21 CET
198112 0 371898   Wed 12 Dec 2012, 22:44:21 CET
226016 0 371898   Wed 12 Dec 2012, 22:54:21 CET
269952 0 371898   Wed 12 Dec 2012, 23:04:21 CET
341312 0 371898   Wed 12 Dec 2012, 23:14:21 CET

Michel:
(In reply to comment #9)
> Sorry, it all happens too quickly, I failed to track down the offending process again.
> See the evolution of open files: [...]

I don't think you need to wait for the limit to be reached. If the number of file descriptors is normally around 50K, then 100K or more could already indicate a problem. Or you could run a script, either in the background or from a cron job, which dumps the total number and the top 10 processes to a log file at regular intervals. Be creative. :)

Nemo_bis:
Yes, this time I won't rely on myself checking things regularly. I'm doing this now:

while true; do
    LOG=$(date -Ihours).txt
    for i in /proc/[0-9]*/fd; do
        echo "$(sudo ls $i | wc -l): $i" >> $LOG
    done
    cat /proc/sys/fs/file-nr >> openfiles.log
    date >> openfiles.log
    sleep 600
done

Created attachment 71551 [details]
List of processes with their opened files
No luck: the list of processes with their open files at the time of the last crash doesn't seem to contain the offender: the total comes to about 6 thousand, nowhere near 370 thousand...
Michel:
(In reply to comment #12)
> the total comes to about 6 thousand, nowhere near 370 thousand...

Hmm, then it seems the leaking file descriptors aren't exposed in those /proc files.

Nemo_bis:
The nasty thing is that the keyboard doesn't work at all when this bug happens. I've seen that using a non-PAE kernel doesn't help, and now I'm trying:

while true; do
    LOG=$(date -Ihours).txt
    for i in /proc/[0-9]*/fd; do
        echo "$(sudo ls $i | wc -l): $i" >> $LOG
    done
    OPEN=$(cat /proc/sys/fs/file-nr)
    echo "$OPEN" >> openfiles.log
    if echo "$OPEN" | grep -qE "^[0-9]{6,}"; then
        tput bel
    fi
    date >> openfiles.log
    sleep 600
done

Created attachment 72330 [details]
ps -fax

I can catch it when it happens, but it doesn't help, because even killing all processes one by one sometimes stops the number from increasing but never decreases it. I closed all programs, though not all system processes (and now I opened the browser, of course), but it didn't help. Open files no longer assigned to any process: does that make sense?

This time I have this in dmesg:

[11870.410593] hrtimer: interrupt took 38748 ns
[12294.716262] Peer 2.236.97.130:4662/44274 unexpectedly shrunk window 4096424630:4096441856 (repaired)
[26270.771253] Peer 2.235.219.243:4662/48855 unexpectedly shrunk window 1702546936:1702564256 (repaired)
[39697.640585] [TTM] Out of kernel memory
[39697.640609] radeon 0000:00:01.0: object_init failed for (4096, 0x00000002)
[39697.640614] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)

and then endless repetition of the last three lines.

Does it look like https://bugs.mageia.org/show_bug.cgi?id=1471 ? Should I try radeon.modeset=0 ?

Michel:
(In reply to comment #15)
> [...] even killing all processes one by one sometimes stops the number from increasing, but never decreases it.

Does it decrease after killing kwin or Xorg?

Nemo_bis:
> Does it decrease after killing kwin or Xorg?

Michel, how can I check Xorg? I'm not sure about kwin, I'll check again next time.
Usually what happens is that after the night it takes some minutes to get the dialog to unlock the screen; then, when I unlock it, I get the message that kwin was killed because of a segmentation fault, and dmesg has all those errors about "Out of kernel memory" and "Failed to allocate GEM object" as in comment 7.

Michel:
(In reply to comment #17)
> Michel, how can I check Xorg?

Always the same idea: once /proc/sys/fs/file-nr has increased significantly, kill the process and see if /proc/sys/fs/file-nr decreases back to normal after that.

Nemo_bis:
Sorry, it was http://bugs.winehq.org/show_bug.cgi?id=32617
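The kill-and-compare check suggested above can be sketched as a small helper (a hypothetical illustration, not from the thread; SUSPECT_PID stands for the PID of kwin or Xorg and must be filled in by hand):

```shell
# Print the first field (allocated handles) of /proc/sys/fs/file-nr.
file_nr() { awk '{ print $1 }' /proc/sys/fs/file-nr; }

before=$(file_nr)
# kill "$SUSPECT_PID"   # uncomment with the real PID of kwin or Xorg
sleep 2
after=$(file_nr)
echo "allocated handles: before=$before after=$after"
# If "after" falls back toward the post-boot baseline (~11488 in this
# report), the killed process was the one holding the leaked handles.
```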