Summary: | [RADEON:KMS:MEMCORRUPTION] random memory corruption if using more than 4GiB of RAM on Core i7/P55 |
---|---|
Product: | DRI |
Component: | DRM/Radeon |
Status: | RESOLVED INVALID |
Severity: | normal |
Priority: | highest |
Version: | unspecified |
Hardware: | x86-64 (AMD64) |
OS: | Linux (All) |
Reporter: | Siarhei Siamashka <siarhei.siamashka> |
Assignee: | Default DRI bug account <dri-devel> |

Description
Siarhei Siamashka
2011-01-07 22:57:53 UTC
OK, some more information. I got a few other similar mild failures (mild in the sense that the box is still accessible via ssh). They have different backtraces, but also seem to be memory allocation related. Right now my guess is that radeon KMS is causing some memory corruption in the kernel. I tried switching to the SLUB allocator and enabling SLUB_DEBUG. Unfortunately, this change has not helped me catch any problems yet. On a somewhat positive side, the mild reliability problems have also disappeared (for example, it does not seem to fail easily on browser flash video playback anymore).

Still, there is a testcase which is guaranteed to kill the system for me. It involves launching the gl-117 game (using llvmpipe for 3D) in one window so that it runs its demo, and starting scaled video playback in mplayer in another window, so that both windows are visible on screen at the same time. In less than 15 minutes, and typically a lot faster, the whole system deadlocks. The box is then totally dead: I can't even connect to it with ssh, so I have neither backtraces nor clues about what could have happened. Anyway, I'm going to keep KMS enabled for a while; maybe some other, easier to debug testcases will be discovered and the driver bug(s?) could be found.

It appears that the memory just gets randomly corrupted. When running the gl-117 game demo using mesa 7.9 llvmpipe, and also running the memtester program [1] at the same time so that the rest of the available memory gets tested, memtester is typically able to detect memory corruption before the system goes down in a spectacular way.

The system used to have 4 memory sticks installed, 2GiB each. The memory corruption disappears if only one 2GiB stick is left (tested for more than a week without problems) or when using a 2GiB+2GiB configuration (just started using this, appears to be stable so far). Installing 6GiB or 8GiB of memory in various ways (trying different placement in slots on the motherboard) makes the issue reproducible again.
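For readers unfamiliar with how a userspace memory tester can notice this kind of corruption, here is a minimal sketch of the general write-then-verify idea. It is not memtester's actual algorithm: the pattern set, the function names (`fill_pattern`, `find_mismatches`, `run_pass`) and the buffer size are all assumptions made for illustration, and a Python buffer exercises far less memory, far less aggressively, than a real tester.

```python
from typing import Dict, List

# Illustrative byte patterns chosen for this sketch (not taken from memtester).
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


def fill_pattern(buf: bytearray, pattern: int) -> None:
    """Overwrite every byte of buf with the given pattern byte."""
    buf[:] = bytes([pattern]) * len(buf)


def find_mismatches(buf: bytearray, pattern: int) -> List[int]:
    """Return the offsets whose content no longer matches the pattern."""
    expected = bytes([pattern]) * len(buf)
    if buf == expected:  # fast path: buffer is intact
        return []
    return [i for i, b in enumerate(buf) if b != pattern]


def run_pass(size: int) -> Dict[int, List[int]]:
    """Write and verify each pattern once; map pattern -> corrupted offsets."""
    buf = bytearray(size)
    report: Dict[int, List[int]] = {}
    for pattern in PATTERNS:
        fill_pattern(buf, pattern)
        bad = find_mismatches(buf, pattern)
        if bad:
            report[pattern] = bad
    return report


if __name__ == "__main__":
    # 1 MiB is a hypothetical size; a real tester covers most of free RAM.
    print(run_pass(1024 * 1024) or "no mismatches found")
```

On healthy hardware `run_pass` should return an empty dict. The point of running such a check alongside a heavy workload is that any mismatch means something other than the tester changed the buffer.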
It could be either a problem in the radeon kernel drivers, or just defective hardware (motherboard? PSU? CPU? memory? graphics card?). However, memtest86+ does not detect any problems, and the system appears to be stable when used "headless", even with intensive CPU/RAM usage. Anyway, unless somebody else manages to reproduce the same problem, there is no definite answer to this question. There is nothing else to be added here (other than dmesg and Xorg logs), so this is probably my last comment here unless I somehow manage to narrow down the bug and make a patch. Thanks. The bug can be closed if you want.

[1] http://pyropus.ca/software/memtester/

Created attachment 42325 [details]
Xorg.0.log
Created attachment 42326 [details]
dmesg-4GB.log
After all, it looks like it is a hardware problem on my side. Experimenting with the 'memmap' option to precisely simulate a 4GiB setup does not help when all 8GiB are installed. Also, reducing the RAM speed from DDR3-1333 to DDR3-1066 or DDR3-800 makes the problem significantly harder to reproduce, but does not eliminate it completely. And looking around various hardware forums, reliability with all four DIMM modules installed seems to be a rather common problem. It appears that llvmpipe is just a very good memory checker, putting a lot of stress on all CPU cores and the memory controller, and exposing problems which don't happen in other use cases.