Summary: | Metro: Last Light segfaults very often in level 10 (swamp) on loading last checkpoint | ||
---|---|---|---|
Product: | Mesa | Reporter: | Darius Spitznagel <d.spitznagel> |
Component: | Mesa core | Assignee: | Tapani Pälli <lemody> |
Status: | RESOLVED WONTFIX | QA Contact: | |
Severity: | major | ||
Priority: | medium | ||
Version: | 10.1 | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 77449 | ||
Attachments: |
Metro: LL segfault on loading last checkpoint
Savegame Folder
Replaying apitrace on intel which was made with amd gpu
Apitrace made and replayed on intel IVB
gdb debug log |
I'll go ahead and bisect.

(In reply to comment #1)
> I'll go ahead and bisect

Or rather, let's try to reproduce it first. Would you happen to have this save game available to share? Or could you make an apitrace of it?

The texture corruption is caused by the following commit. Let's try to fix that first and see if it is related to the segfaults.

--- 8< ---
commit 9cd51bb0c4608258199c69bc7738e72f055799d2
Author: Matt Turner <mattst88@gmail.com>
Date:   Tue Mar 11 13:16:37 2014 -0700

    i965/vec4: Eliminate writes that are never read.

    With an awful O(n^2) algorithm that searches previous instructions for dead writes.

(In reply to comment #2)
> Or rather, let's try to reproduce it first. Would you happen to have this save
> game available to share? Or could you make an apitrace of it?

OK, I will upload my full savegame directory as soon as I'm at home. Do you have an FTP server where I can upload it?

(In reply to comment #4)
> OK, I will upload my full savegame directory as soon as I'm at home.
> Do you have an FTP server where I can upload it?

I'm afraid not :/ Let's see if we can find a share.

I was actually able to get one crash, when the alien is seen for the first time. I think it was right at the start of a cutscene. I will try to reproduce it; it could be the same crash.

Created attachment 96467 [details]
Savegame Folder
OK, I have compressed my savegame folder as tar.bz2 and attached it. It isn't that big. I hope you can reproduce my crashes.

(In reply to comment #7)
> OK, I have compressed my savegame folder as tar.bz2 and attached it. It isn't
> that big. I hope you can reproduce my crashes.

Thanks, I'll take a shot.

Yes, I'm able to reproduce the crash. I will try to get a backtrace.

(In reply to comment #9)
> Yes, I'm able to reproduce the crash. I will try to get a backtrace.

I'm not able to get a 'stable' backtrace (or the same one as Darius's), and I'm not sure what is wrong. Either something is trashing memory or there is a problem with the symbols of my libs. Will try some more. The crash itself is now fully reproducible, which is great.

It looks like the crasher has been there for a very long time; it just wasn't noticed until now.

A bit more info on comment #3: it might not be the fault of this exact commit. The commit could simply be revealing a bug elsewhere. Simply returning false from dead_code_eliminate() makes all the artifacts disappear.

(In reply to comment #10)
> A bit more info on comment #3: it might not be the fault of this exact
> commit. The commit could simply be revealing a bug elsewhere. Simply
> returning false from dead_code_eliminate() makes all the artifacts disappear.

I will try testing Mesa from git with commit 9cd51bb0c4608258199c69bc7738e72f055799d2 reverted and report back.

I have also found something interesting. I took a PC with an AMD GPU at work and restored my system onto it via fsarchiver, so the OS is exactly the same.
I started MetroLL and played for many, many minutes, dying many, many times, without a single crash. So we now know that the crashes definitely occur on the Intel iGPU (IVB on my side). I hope this helps to narrow the problem down. The specs of this system are:

Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz (the game ran slowly but worked)
4GB RAM

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350/8350 Series] (prog-if 00 [VGA controller])
	Subsystem: Hightech Information System Ltd. Device 2291
	Flags: bus master, fast devsel, latency 0, IRQ 45
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at e0100000 (64-bit, non-prefetchable) [size=128K]
	I/O ports at 2000 [size=256]
	Expansion ROM at e0140000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: radeon

darius@pc1:~$ glxinfo | grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD CEDAR
OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.1.0
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 10.1.0
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:

After THAT I made an apitrace of the rendering at startup. Apart from the known flash of the outside world (or something else), I saw not a single error or warning during rendering! This is all that apitrace showed:

darius@pc1:~/Downloads$ apitrace replay MetroLL_amd.trace
0 57 glXSwapIntervalMESA(interval = 0) = 0
57: warning: unsupported glXSwapIntervalMESA call
1 143012 glXSwapIntervalMESA(interval = 0) = 0
143012: warning: unsupported glXSwapIntervalMESA call
Rendered 506 frames in 32.006 secs, average of 15.8096 fps

On Intel I get many, many warnings:

6650435: glDebugOutputCallback: Medium severity API performance issue 9, Stalling on the GPU for mapping a busy buffer object
16652177: glDebugOutputCallback: Medium severity API performance issue 12, Flushing before mapping a referenced bo.
16652177: glDebugOutputCallback: Medium severity API performance issue 11, Mapping a busy BO, causing a stall on the GPU.

I will make fresh apitraces on both systems so you can investigate them. Right now I'm short on time, though. Will be back in a few hours.

Created attachment 96571 [details]
Replaying apitrace on intel which was made with amd gpu
Created attachment 96572 [details]
Apitrace made and replayed on intel IVB
I'm back :) I have now replayed the apitrace I made with the AMD GPU (see comment 11) on my IVB PC. The results are attached in https://bugs.freedesktop.org/attachment.cgi?id=96571

After that I made an apitrace of the same kind (rendering from the start of MetroLL up to the game menu) on my Intel IVB. The results are attached in https://bugs.freedesktop.org/attachment.cgi?id=96572

Both have nearly the same output. Solving these issues will definitely speed up MetroLL and may also solve the segfaults.

To be clear: the segfaults are reproducible when loading the last checkpoint during gameplay (NOT on the first load) and happen MOSTLY ON THE SECOND or third reload.

As written in my previous comment, I will try current Mesa with commit 9cd51bb0c4608258199c69bc7738e72f055799d2 reverted and report back later.

As promised, I tried the following:

Mesa git up to commit 4047263cb15e89d23cb145c74fb3f303904e8f14 > broken textures, same segfaults.
Mesa git before commit 9cd51bb0c4608258199c69bc7738e72f055799d2 > textures OK, same segfaults.
Mesa 10.0.x > textures OK, same segfaults.

What the heck causes an OpenGL app to crash on the second or third load of the same data?! Wrong memory or buffer allocation?! I think it's clear that there is something wrong with intel drm, ddx org glx driver.

oooops!
> I think its clear that there is something wrong with intel drm, ddx org glx driver.
I meant intel drm, ddx or dri driver.
For the texture corruption issues: there has been another bug in the same area of the code. The issues might be related to bug #76616.

(In reply to comment #17)
> For the texture corruption issues: there has been another bug in the same
> area of the code. The issues might be related to bug #76616.

Indeed, the patch included in bug #76616 fixed the texture corruption with Mesa git master.

I also have other news for you. At first I didn't want to mention it because it's off topic and the game is in beta, but... Painkiller HD has crashes similar to Metro LL's. It sometimes crashes when loading a save game and more often during gameplay (especially in the train station level). Its segfaults also report "error 4":

[ 350.235524] MetroLL[2652]: segfault at 1 ip 08dd401f sp 9ac20d50 error 4 in MetroLL[8048000+1336000]
[ 645.899404] PKHDGame[2705]: segfault at 0 ip 084493ba sp bfc55f00 error 4 in PKHDGame[8048000+20d3000]
[ 845.409142] PKHDGame[2771]: segfault at 0 ip 084493ba sp bfcc1d20 error 4 in PKHDGame[8048000+20d3000]
[ 1307.553163] PKHDGame[2855]: segfault at 0 ip 09000bf2 sp bfbe8c10 error 4 in PKHDGame[8048000+20d3000]

Besides this, Painkiller HD writes a helpful Launch.log. On a crash it ALWAYS shows:

[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505
[0277.99] Critical: Error reentered: OpenGL error 0x505

I hope these crashes are related to the same problem Metro LL has. If not, I'm sorry for not opening a separate bug report.

Very interesting, look here: https://bugs.freedesktop.org/show_bug.cgi?id=74868

Especially this one:
<<<<<<<<<<<
Mesa: User error: GL_OUT_OF_MEMORY in glCompressedTexSubImage2D
err:d3d:wined3d_debug_callback 0x1c8178: "GL_OUT_OF_MEMORY in glCompressedTexSubImage2D".
err:d3d_surface:surface_upload_data >>>>>>>>>>>>>>>>> GL_OUT_OF_MEMORY (0x505) from glCompressedTexSubImage2DARB @ ../../../wine-1.7.12/dlls/wined3d/surface.c / 1688
EE r600_texture.c:1003 r600_texture_transfer_map - failed to create temporary texture to hold untiled copy
>>>>>>>>>>>

It also mentions the same error code (0x505) as Painkiller. A memory problem with Metro LL sounds likely too, as I already mentioned in comment 15. Also look at comment 14 in bug 74868! Unfortunately, the patch is only for r600 :(

(In reply to comment #18)
> Indeed, the patch included in bug #76616 fixed the texture corruption with
> Mesa git master.

Thanks for testing. In the future please try to keep bug reports separate; otherwise it gets pretty confusing.

I'm now able to reproduce the same backtrace as Darius. The thread that is working with the Mesa stack is waiting on an ioctl, and I cannot see anything bad going on there. Unfortunately, the thread that segfaults has no symbols, and to me it looks like there is a bug in the game itself, possibly related to threading. Disassembling at the address in the backtrace and running 'info registers' shows that the game is accessing something at an offset of 1, maybe a struct member (?). Memory usage is not very high; for me, for example, it is 695396 kB at the time of the crash. I will still verify these observations.

Created attachment 96650 [details]
gdb debug log
Here's some gdb log output. The segfaulting thread always seems to end up in the same place, accessing an array (or struct?) at some offset (stored in eax), and the member is null.
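The dmesg lines collected in this report can be decoded mechanically. On x86, the kernel prints the faulting address, instruction pointer and stack pointer in hex, and `error` is the page-fault error code. Bit 0 is set for a protection violation and clear when the page was not present. Bit 1 is set for a write, and bit 2 is set when the fault happened in user mode. `error 4` is therefore a user-mode read from an unmapped page, and `segfault at 64` means address 0x64.

The C sketch below is illustrative only. `struct segfault_info`, `parse_segfault()`, `describe_error()` and `is_near_null()` are hypothetical helpers, and the 4 KiB "near NULL" threshold is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of an x86 kernel "segfault at ..." message (hypothetical helper type). */
struct segfault_info {
    unsigned long addr;  /* faulting data address */
    unsigned long ip;    /* instruction pointer of the faulting instruction */
    unsigned long error; /* x86 page-fault error code */
};

/* Parse a dmesg line such as
 *   MetroLL[3146]: segfault at 1 ip 08dd401f sp 9a020d50 error 4 in MetroLL[...]
 * The kernel prints addr, ip, sp and error in hex.
 * Returns 1 on success, 0 if the line is not a segfault report. */
int parse_segfault(const char *line, struct segfault_info *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return 0;
    /* %*lx reads the stack pointer but does not store it. */
    return sscanf(p, "segfault at %lx ip %lx sp %*lx error %lx",
                  &out->addr, &out->ip, &out->error) == 3;
}

/* Describe the low three bits of the x86 page-fault error code:
 *   bit 0: 0 = page not present, 1 = protection violation
 *   bit 1: 0 = read access,      1 = write access
 *   bit 2: 0 = kernel mode,      1 = user mode */
void describe_error(unsigned long error, char *buf, size_t len)
{
    snprintf(buf, len, "%s-mode %s, %s",
             (error & 0x4) ? "user" : "kernel",
             (error & 0x2) ? "write" : "read",
             (error & 0x1) ? "protection violation" : "page not present");
}

/* Heuristic (assumption): Linux normally refuses to map the lowest pages
 * (see vm.mmap_min_addr), so a fault address below 4 KiB usually means
 * a NULL base pointer plus a small member or index offset. */
int is_near_null(unsigned long addr)
{
    return addr < 4096UL;
}
```

Given the `segfault at 1` and `segfault at 64` lines from the original report, the sketch yields addresses 0x1 and 0x64 (100 bytes), both user-mode reads of an unmapped page just above NULL. That pattern is consistent with, though not proof of, the gdb observation above that the game reads a member at a small offset from a null pointer.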
@Tapani: Can you clarify this?

http://steamcommunity.com/groups/steamuniverse/announcements/detail/1837773658991804782

<<<<
Fixed "Metro: Last Light" on Intel graphics by backporting GLX support for ARB_create_context from newer X servers
>>>>

???

(In reply to comment #23)
> @Tapani: Can you clarify this?

This is not related to this bug. With this extension, an application can create a GL context in a very fine-grained way, specifying the version and the required features it wants to use. It looks like SteamOS is still using an older version of the X server that does not support the extension, so Valve backported patches to add the support. They did not want to do a full X server upgrade.

This bug report can be closed! I have opened a new one for Metro: Last Light Redux and Metro 2033 Redux: https://bugs.freedesktop.org/show_bug.cgi?id=93599

Thanks, Darius. Closing as WONTFIX because this segfault was in Metro's own code (see comment #22). It would be *very* hard to track down without Metro symbols. |
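Both the Painkiller HD Launch.log and the wined3d output quoted from bug 74868 report OpenGL error 0x505. The following is a minimal C sketch that maps core OpenGL error codes to names, for reading such logs:

- `gl_error_name()` is a hypothetical helper. GLU's `gluErrorString()` does a similar job.
- The values are the standard ones from the GL headers, written out here so that the sketch needs no GL headers.

```c
#include <stddef.h>

/* Map an OpenGL error code (as returned by glGetError()) to its name.
 * Returns NULL for values that are not core GL error codes. */
const char *gl_error_name(unsigned int code)
{
    switch (code) {
    case 0x0000: return "GL_NO_ERROR";
    case 0x0500: return "GL_INVALID_ENUM";
    case 0x0501: return "GL_INVALID_VALUE";
    case 0x0502: return "GL_INVALID_OPERATION";
    case 0x0503: return "GL_STACK_OVERFLOW";
    case 0x0504: return "GL_STACK_UNDERFLOW";
    case 0x0505: return "GL_OUT_OF_MEMORY";
    case 0x0506: return "GL_INVALID_FRAMEBUFFER_OPERATION";
    case 0x0507: return "GL_CONTEXT_LOST";
    default:     return NULL;
    }
}
```

`gl_error_name(0x505)` returns `"GL_OUT_OF_MEMORY"`, the same name wined3d prints next to that code.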
Created attachment 96432 [details]
Metro: LL segfault on loading last checkpoint

Hello Devs,

I very often get segfaults when loading the last checkpoint in the swamp level. This mostly happens on the second or third load of the last checkpoint, and sometimes, though not often, during a fight. dmesg shows this after a crash (I have collected several of them):

MetroLL[3089]: segfault at 64 ip 08dd401f sp 996eccb0 error 4 in MetroLL[8048000+1336000]
MetroLL[3146]: segfault at 1 ip 08dd401f sp 9a020d50 error 4 in MetroLL[8048000+1336000]
MetroLL[2911]: segfault at 1 ip 08dd401f sp 99eedd50 error 4 in MetroLL[8048000+1336000]
MetroLL[3797]: segfault at 1 ip 08dd401f sp 9a6edd50 error 4 in MetroLL[8048000+1336000]
MetroLL[4363]: segfault at 64 ip 08dd401f sp adadace0 error 4 in MetroLL[8048000+1336000]
MetroLL[4416]: segfault at 1 ip 08dd401f sp 9a820d50 error 4 in MetroLL[8048000+1336000]
MetroLL[2840]: segfault at 1 ip 08dd401f sp 95dedd50 error 4 in MetroLL[8048000+1336000]
MetroLL[2862]: segfault at 1 ip 08dd401f sp ada32ce0 error 4 in MetroLL[8048000+1336000]
MetroLL[2739]: segfault at 1 ip 08dd401f sp 95720d50 error 4 in MetroLL[8048000+1336000]
MetroLL[3276]: segfault at 1 ip 08dd401f sp 99f20d50 error 4 in MetroLL[8048000+1336000]
MetroLL[4004]: segfault at 1 ip 08dd401f sp 9a020d50 error 4 in MetroLL[8048000+1336000]
MetroLL[2727]: segfault at 1 ip 08dd401f sp 99f20d50 error 4 in MetroLL[8048000+1336000]
MetroLL[2803]: segfault at 1 ip 08dd401f sp 99f20d50 error 4 in MetroLL[8048000+1336000]
MetroLL[2695]: segfault at 1 ip 08dd401f sp 9aaedd50 error 4 in MetroLL[8048000+1336000]
MetroLL[2757]: segfault at 64 ip 08dd401f sp 9a51fcb0 error 4 in MetroLL[8048000+1336000]

I have also made a backtrace of one crash and attached it to this report. I hope I picked the right threads! If not, let me know and I will make another backtrace. As you can see in the attachment, I started Metro LL with STEAM_RUNTIME enabled, so some debugging symbols are missing.
If it helps, I can of course run Metro LL with the runtime disabled and install more debugging symbols for other non-Mesa libs.

My current specs:

Debian Jessie i386
8 GB total RAM
Mesa 10.1.0
Intel Driver 2.99.911
CPU Intel(R) Core(TM) i3-3225 CPU @ 3.30GHz
Xorg 1.15.0
Kernel 3.12.14

I hope this helps you find the problem, because many people in the Metro LL Steam community have it, though a few do not.

As a note: I also tested today's Mesa from git, but got new segfaults (more often than with 10.1, during fights) and some textures were broken too.