94726 – [Tonga] ARK: Survival Evolved crashes on savegame load. Out of Memory

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Gallium/radeonsi
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Default DRI bug account
QA Contact:	Default DRI bug account

Reported:	2016-03-27 20:13 UTC by thomas.rinsch
Modified:	2016-10-11 01:11 UTC
CC List:	0 users

Attachments
gdb logfile (2.56 KB, text/plain) 2016-03-27 20:13 UTC, thomas.rinsch
gdb log with backtrace (12.32 KB, text/plain) 2016-03-28 11:13 UTC, thomas.rinsch
ARK gdb log on TURKS (17.36 KB, text/plain) 2016-04-09 17:56 UTC, thomas.rinsch
Workaround patch: unmap buffers as soon as possible (1.19 KB, patch) 2016-04-13 16:04 UTC, Nicolai Hähnle

Description thomas.rinsch 2016-03-27 20:13:46 UTC

Created attachment 122589
gdb logfile

ARK: Survival Evolved crashes when I try to load my singleplayer game.

Occurs only on my R9 380 with amdgpu/radeonsi.
Loading the game with LIBGL_ALWAYS_SOFTWARE works.

Joining a server or starting a new game works with radeonsi as well.

Mesa is today's git, but the problem exists with earlier versions as well.
llvm is yesterday's git.

It seems to be a memory allocation issue:
When I start the game with apitrace, my console is spammed with:

Mesa: User error: GL_OUT_OF_MEMORY in glBufferData
Mesa: User error: GL_INVALID_VALUE in glBufferSubData(offset 0 + size 432 > buffer size 0)
amdgpu: Failed to allocate a buffer:
amdgpu:    size      : 4096 bytes
amdgpu:    alignment : 4096 bytes
amdgpu:    domains   : 4
amdgpu: Failed to allocate a buffer:
amdgpu:    size      : 4096 bytes
amdgpu:    alignment : 4096 bytes
amdgpu:    domains   : 4
 
just before the crash.
However, the system monitor shows that ~75% of system memory is free.

Attached the logfile from running the game with gdb, though it doesn't really say much.

Comment 1 thomas.rinsch 2016-03-27 20:26:12 UTC

The game crashes just before leaving the load screen by the way.

I would like to provide the apitrace, however even if compressed it is 230MB in size, containing 605 frames. Can I just upload that here?

The last frame alone is 87MB according to qapitrace.

I'm new to reporting bugs here, so please be forgiving if I did anything wrong.
Tell me if I can deliver any additional info.

Comment 2 Michel Dänzer 2016-03-28 07:48:31 UTC

(In reply to thomas.rinsch from comment #0)
> Attached the logfile from running the game with gdb, though it doesn't
> really say much.

Please run "thread apply all bt full" at the gdb prompt after the SIGSEGVs and the SIGABRT.

(In reply to thomas.rinsch from comment #1)
> I would like to provide the apitrace, however even if compressed it is 230MB
> in size, containing 605 frames. Can I just upload that here?

No, attachments here are limited to 32MB. apitraces are usually uploaded to file sharing services such as Dropbox or Google Drive.

Comment 3 thomas.rinsch 2016-03-28 11:13:10 UTC

Created attachment 122595
gdb log with backtrace

(In reply to Michel Dänzer from comment #2)

Thanks a lot Michael,

> Please run "thread apply all bt full" at the gdb prompt after the SIGSEGVs
> and the SIGABRT.

I attached the new log here.

> No, attachments here are limited to 32MB. apitraces are usually uploaded to
> file sharing services such as Dropbox or Google Drive.

Uploaded the Apitrace to Google Drive:
https://drive.google.com/open?id=0BwKS4-SC1bfybjRPTVVBbVh0YnM

Comment 4 thomas.rinsch 2016-04-09 17:56:03 UTC

Created attachment 122837
ARK gdb log on TURKS

Comment 5 thomas.rinsch 2016-04-09 17:57:47 UTC

Tried to load the save on my laptop and realized it showed the same behaviour on my Turks (HD6770M) after all. Attached the gdb log.
llvmpipe works on the laptop as well.

Comment 6 Nicolai Hähnle 2016-04-13 16:04:35 UTC

Created attachment 122899
Workaround patch: unmap buffers as soon as possible

Hi Thomas, thanks for your report. This game is really stressing our buffer handling :)

The problem is that it tries to create more than 64k buffers, and we keep all of them mapped, and it looks like we're running into some kind of kernel limit.

The attached patch should help you run the game for now, but it's not a proper solution because it can degrade performance quite significantly.

Comment 7 thomas.rinsch 2016-04-14 20:40:13 UTC

(In reply to Nicolai Hähnle from comment #6)

Thanks a lot Nicolai,

I applied the patch, and unfortunately all that has changed is that ARK takes longer to crash now. In addition, it tends to freeze the whole system, or at least crash the DE panel and itself.

As a result I wasn't able to get a new backtrace (application already exited).
The apitrace is twice as big now, but judging from a comparison the only difference is that it contains more frames from loading longer.

During one attempt I also got spammed with

Mesa: User error: GL_OUT_OF_MEMORY in glBufferData
Mesa: User error: GL_INVALID_VALUE in glBufferSubData(offset 0 + size 432 > buffer size 0)
amdgpu: Failed to allocate a buffer:
amdgpu:    size      : 4096 bytes
amdgpu:    alignment : 4096 bytes
amdgpu:    domains   : 4
amdgpu: Failed to allocate a buffer:
amdgpu:    size      : 4096 bytes
amdgpu:    alignment : 4096 bytes
amdgpu:    domains   : 4
again. Interestingly, it seems to behave slightly differently on every try now.

It is nice to know you have a clue what the root of the problem is. :)
I can imagine ARK is behaving a bit odd. It worked with fglrx and obviously llvmpipe somehow, though.

I also noticed I spelled Michel's name wrong in a previous comment. Apologies for that.

Comment 8 Nicolai Hähnle 2016-04-15 00:09:09 UTC

Thanks for testing, and sorry to hear that.

Is there anything in the dmesg? Would you mind uploading another apitrace?

Comment 9 thomas.rinsch 2016-04-15 18:11:21 UTC

(In reply to Nicolai Hähnle from comment #8)
> Thanks for testing, and sorry to hear that.
> 
> Is there anything in the dmesg? Would you mind uploading another apitrace?

Of course, here is a new API trace:
https://drive.google.com/file/d/0BwKS4-SC1bfyQlM4dDNkNTNIOUE/view?usp=sharing

Nothing in dmesg or /var/log/messages.
However, I obviously can't get the dmesg when the system is frozen. Is there any way to log it automatically on change?

I just noticed that the system gets extremely unresponsive shortly before freezing completely if that is of any help.

System Monitor shows normal CPU and Memory usage until the end, so I would imagine it's gpu related. (max 5.2GB used).

Btw., this might be a very good case for improving the driver. However, if you find this to be an error on the application side, note that the developer has announced a major rework of the OpenGL renderer.

Best regards and many thanks again.

Comment 10 Grigori Goronzy 2016-04-25 08:46:31 UTC

Maybe I'm naive here, but shouldn't it be possible to detect when we reach the limit? Then we could unmap some buffers (preferably ones that haven't been accessed for some time) only when it's needed.

Comment 11 Nicolai Hähnle 2016-04-25 17:22:19 UTC

Well, you can open /proc/$pid/maps and read it, but that is inefficient and hackish. I don't know of a nice way to determine the total number of open mmaps in a process.
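
As a rough illustration of the /proc approach mentioned above, here is a minimal Python sketch that counts a process's mappings by counting the lines of /proc/<pid>/maps and reads the vm.max_map_count sysctl for comparison. Treating vm.max_map_count (65530 by default on Linux) as the limit being hit is an assumption; this report only speaks of "some kind of kernel limit". The function names are hypothetical, and the scan reads the whole file on every call, which is the inefficiency noted above.

```python
import os


def count_mappings(pid="self"):
    """Count lines in /proc/<pid>/maps; each line describes one mapping."""
    with open(f"/proc/{pid}/maps") as maps:
        return sum(1 for _ in maps)


def max_map_count():
    """Return vm.max_map_count, or None if it cannot be read.

    Assumption: this sysctl is the relevant kernel limit.
    """
    try:
        with open("/proc/sys/vm/max_map_count") as sysctl:
            return int(sysctl.read())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    used = count_mappings()
    limit = max_map_count()
    print(f"pid {os.getpid()}: {used} mappings, limit {limit}")
```

Run against the game's pid instead of "self", this would show whether the mapping count climbs towards the limit while the savegame loads.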

One possible approach we'd discussed is indeed to maintain an LRU list of open mappings and close the oldest one when a certain threshold number is reached. This might be the easiest way to avoid the crashes, but some applications might be hurt in terms of performance, if they expect to be able to update those small buffers often.
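
The LRU idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the Mesa implementation: anonymous mmap regions stand in for GPU buffers, the class and method names are hypothetical, and the contents of an evicted buffer are copied aside because an anonymous mapping loses its data when closed (a real buffer object keeps its contents without a CPU mapping).

```python
import mmap
from collections import OrderedDict


class LruMappingCache:
    """Keep at most max_mapped buffers mapped; unmap the least recently used."""

    def __init__(self, max_mapped):
        if max_mapped < 1:
            raise ValueError("max_mapped must be at least 1")
        self.max_mapped = max_mapped
        self._mapped = OrderedDict()  # buffer id -> mmap, oldest first
        self._saved = {}              # buffer id -> bytes of evicted buffers

    def map(self, buf_id, size=4096):
        """Return a live mapping for buf_id, remapping it if it was evicted."""
        if buf_id in self._mapped:
            self._mapped.move_to_end(buf_id)  # mark as most recently used
            return self._mapped[buf_id]
        while len(self._mapped) >= self.max_mapped:
            self._evict_oldest()
        saved = self._saved.pop(buf_id, None)
        # Anonymous mapping as a stand-in for mapping a GPU buffer.
        mapping = mmap.mmap(-1, len(saved) if saved is not None else size)
        if saved is not None:
            mapping[:] = saved  # restore contents lost when it was unmapped
        self._mapped[buf_id] = mapping
        return mapping

    def mapped_ids(self):
        """Buffer ids that are currently mapped, oldest first."""
        return list(self._mapped)

    def _evict_oldest(self):
        buf_id, mapping = self._mapped.popitem(last=False)
        self._saved[buf_id] = mapping[:]  # toy-model bookkeeping only
        mapping.close()
```

Every call to map() for an evicted buffer pays for a fresh mapping, which is the performance cost for applications that update many small buffers often. Callers also must not keep using a mapping after it has been evicted.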

Comment 12 thomas.rinsch 2016-10-07 19:11:50 UTC

I kept trying to load the savegame occasionally and today it worked again.

I hadn't tried it in a while, so I don't know if it was an application-side change or a Mesa change.

Comment 13 Michel Dänzer 2016-10-11 01:11:45 UTC

(In reply to thomas.rinsch from comment #12)
> I kept trying to load the savegame occasionally and today it worked again.

Glad to hear that.

> Haven't tried it in a while, so I don't know if it was an application side
> change or a mesa change.

I'd say it's most likely one of Nicolai's recent Mesa changes.
