Summary: | memcpy accessing GPU memory mappings using SSE instructions breaks in KVM | ||
---|---|---|---|
Product: | Mesa | Reporter: | maxamar |
Component: | Other | Assignee: | mesa-dev |
Status: | RESOLVED MOVED | QA Contact: | mesa-dev |
Severity: | critical | ||
Priority: | medium | ||
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Description
maxamar
2019-06-26 12:02:55 UTC
(In reply to maxamar from comment #0)
> [ 131.909] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x139) [0x55f57cf882c9]
> [ 131.909] (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x50) [0x7fbb6e85977f]
> [ 131.910] (EE) 2: /lib/x86_64-linux-gnu/libc.so.6 (memcpy+0x2d7) [0x7fbb6e7263b7]
> [...]
> [ 131.914] (EE) Illegal instruction at address 0x7fbb6e7262f7

This looks like a bug in /lib/x86_64-linux-gnu/libc.so.6, executing an instruction which isn't supported by your CPU.

> After (replace memcpy in mesa libs in radeonsi with custom simple impl):

How exactly did you "replace memcpy"?

> X boots ok but error in amdgpu dmesg (hangs):
> [ 3473.934176] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
> signaled seq=2, emitted seq=3

If this isn't due to an issue with your memcpy replacement, it's probably a Mesa issue or maybe a kernel one, but most certainly not an xf86-video-amdgpu one.

> This looks like a bug in /lib/x86_64-linux-gnu/libc.so.6, executing an instruction which isn't supported by your CPU.

That's really a KVM by-design bug: the offending instruction is SSE movups, which, when it accesses GPU address space, requires KVM to emulate the instruction, and it can't. I insist that this is Mesa's issue, as the standard memcpy would need to know whether it runs inside KVM and whether the address is in GPU space. Mesa should have its own memcpy, at least for accessing GPU memory space.

> How exactly did you "replace memcpy"?

In the source code, replace calls to memcpy with calls to memcpy_new.

> If this isn't due to an issue with your memcpy replacement, it's probably a Mesa issue or maybe a kernel one, but most certainly not an xf86-video-amdgpu one.

These messages are generated by the AMD amdgpu kernel module.

BTW, I solved my issue by changing BIOS to UEFI in KVM; however, the bare-metal version still doesn't work, which is not good. Now my glibc memcpy chooses another path without SSE instructions, I think.
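For illustration, a minimal sketch in C of what a "custom simple impl" along those lines could look like. The name memcpy_new is taken from the comment above, but the body is an assumption and not the reporter's actual code. It copies one byte at a time through volatile pointers, which should keep the compiler from vectorizing the loop into SSE moves or folding it back into a call to the library memcpy.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the "memcpy_new" mentioned above; the body is
 * an assumption, not the code the reporter actually used.
 *
 * The volatile qualifiers force one byte-sized load and one byte-sized
 * store per iteration, so the compiler cannot vectorize the loop or
 * replace it with a call to the library memcpy.
 */
void *memcpy_new(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    /* A zero-length copy is allowed; otherwise both pointers must be valid. */
    assert(n == 0 || (dst != NULL && src != NULL));

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];

    return dst;
}
```

A byte-wise copy like this is likely to be much slower than the optimized glibc memcpy, and it only helps callers that are actually switched over to it.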
The issue reoccurs after changing 10GB RAM -> 30GB RAM in KVM (gdm3 logs). Can't get the Ubuntu VM to boot with all SSE cpuid flags disabled.

(In reply to maxamar from comment #2)
> That's really KVM by-design bug [...]

Please take it up with KVM folks then.

As far as I've checked, at least SSE & SSE2 must be enabled for it to boot. There were other reports on the Proxmox forums which stated that with a GPU it only works with a small amount of RAM.

> Please take it up with KVM folks then.

For what? That's "by-design".

Related bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202643
Fix: https://svnweb.freebsd.org/ports?view=revision&revision=489754

> Disable use of SSE instructions in Xorg's xf86SlowBcopy() function.
>
> When such instructions are used to copy data from/to mapped video
> memory, some hypervisors (e.g. KVM, Microsoft Hyper-V) can generate
> SIGILL or SIGBUS exceptions, causing Xorg to crash.

memcpy & memmove should get the same fix. Other hypervisors should also benefit from this, as real code must be faster than their emulation (that's possibly how they solve this).

Reassigning to Mesa, but TBH I wouldn't expect anything to happen anytime soon (unless you do it yourself). Mesa definitely wants to use an optimized memcpy on bare metal, so replacing memcpy everywhere is probably out of the question, and finding all code in Mesa where this could happen might be tricky. Also note that APIs such as OpenGL or Vulkan expose such GPU mappings to applications directly, so the approach you're suggesting would likely require fixing a lot of application/framework code as well. It would most likely be less painful if this could be solved in KVM somehow, or if you just override the default memcpy implementation (with a known-good one) on your system.

Then maybe it would be better if glibc exposed an API to mark regions of memory as non-SSE.
It seems that support for movups emulation was added in 4.17: https://github.com/torvalds/linux/commit/29916968c48691c94be466a0b47cc9adcea9cb8d

Sorry but this is not a bug at all.

As Michel already noted, core Vulkan as well as some OpenGL/OpenCL extensions mandate that the platform support all well aligned memory accesses to GPU local memory (VRAM).

If your platform (KVM in this case) can't do this for some reason, you simply can't use that platform with this software.

In other words, even if you replace memcpy/memset in Mesa with custom non-SSE versions, it is perfectly valid for an application to use SSE to access VRAM. And you can't change a binary application (which is actually just conforming to a standard).

The only possible workaround I can see in the driver is to not use VRAM at all for CPU mappings. That's actually rather easily doable, but would potentially cripple performance quite a bit.

I can point you to the necessary bits of code if you are interested in that.

(In reply to Christian König from comment #10)
> Sorry but this is not a bug at all.
>
> As Michel already noted core Vulkan as well as some OpenGL/OpenCL extensions
> mandate that the platform support all well aligned memory accesses to GPU
> local memory (VRAM).
>
> If your platform (KVM in this case) can't do this for some reason you simply
> can't use that platform with this software.
>
> In other words even if you replace memcpy/memset in Mesa with custom non SSE
> versions it is perfectly valid for an application to use SSE to access VRAM.
> And you can't change a binary application (which is actually just conforming
> to a standard).
>
> The only possible workaround I can see in the driver is to not use VRAM at
> all for CPU mappings. That's actually rather easily doable, but would
> potentially cripple performance quite a bit.
>
> I can point you to the necessary bits of code if you are interested in that.
Yes, and somehow Mesa uses the "movups" instruction, which is: MOVUPS - Move Unaligned Packed Single-Precision Floating-Point. So it is a bug. The correct version is movaps, which copies aligned data (and has been supported in KVM for a long time). Yes, it is in glibc, and so what? Don't use it then. KVM is part of Linux, so it must be supported.

Anyway, upgrading the kernel to 4.17 seems to solve the problem; this needs a test.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/942.