I see massive display corruption with xf86-video-intel 2.99.917 on a DragonFly 4.1/Ivy Bridge system.
Window contents and text strings are not drawn properly, leaving blank areas. Rectangular areas of the screen stay black or show the root window background, etc.
Sometimes random memory contents are visible as bright, changing pixels in the top few lines of the screen.
I have bisected this issue; it started with commit:
sna: Avoid pwriting large batches
Reverting it fixes the corruption.
That's quite a major kernel bug...
It's quite possible this is a kernel bug indeed.
The drm/i915 GEM code is currently a kind of hybrid between Linux 3.8.13 and some custom code originating from FreeBSD.
Everything appeared to work fine until that commit though.
There are so many potential failures here that, unless DragonFly also has comprehensive error capture, any guess is likely to be a red herring.
For example, it might be that LLC is comprehensively broken and ignoring it is the best approach:
diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index 78ed540..579eb6c 100644
@@ -70,7 +70,7 @@ search_snoop_cache(struct kgem *kgem, unsigned int num_pages,
#define DBG_NO_CREATE2 0
#define DBG_NO_USERPTR 0
#define DBG_NO_UNSYNCHRONIZED_USERPTR 0
-#define DBG_NO_LLC 0
+#define DBG_NO_LLC 1
#define DBG_NO_SEMAPHORES 0
#define DBG_NO_MADV 0
#define DBG_NO_UPLOAD_CACHE 0
Or that may just be fixing the symptoms and not the root cause.
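Assuming DBG_NO_LLC behaves like the other DBG_NO_* switches in that block (a compile-time override that disables a feature regardless of what the kernel reports), the effect of the patch can be sketched as below. The function names are hypothetical and this is not the actual kgem.c code; it only shows why flipping the switch changes behaviour on LLC hardware such as Ivy Bridge.

```c
#include <stdbool.h>

/* Value taken from the patch: 1 forces the non-LLC code paths. */
#ifndef DBG_NO_LLC
#define DBG_NO_LLC 1
#endif

/* HYPOTHETICAL stand-in for the driver's kernel query. It always reports
 * LLC support here, as an Ivy Bridge part would. */
static bool hypothetical_kernel_has_llc(void)
{
    return true;
}

/* Decide whether to rely on the shared last-level cache for CPU/GPU
 * coherency. The compile-time debug switch wins over the kernel's answer. */
static bool sketch_use_llc(void)
{
    if (DBG_NO_LLC)
        return false;
    return hypothetical_kernel_has_llc();
}
```

With the switch at 1 the driver presumably stops relying on CPU and GPU sharing a coherent cache, which is why such a change can hide a kernel-side coherency bug without fixing its cause.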
This patch appears to fix all display corruption issues.
I had a brief look at the parts of the drm/i915 kernel code handling the LLC and they are identical to what's in Linux 3.8.13.
My test hardware is a Xeon E3-1245v2 if it matters.
Sure, it could be anything from a relocation bug to not supporting pwrite out of an mmapped bo. There are a whole lot of corner cases in the kernel, so first make sure that the igt/gem_* tests pass.
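The "pwrite out of an mmapped bo" case means the source pointer handed to pwrite is itself a memory mapping, so the kernel may have to service page faults on that pointer in the middle of the copy; with GEM, that fault can land back in the driver. The sketch below reproduces only the access pattern in userspace, using ordinary files as a stand-in for buffer objects (an assumption; the real path goes through DRM_IOCTL_I915_GEM_PWRITE on a bo handle, which this does not model). The helper name is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL helper: copy len bytes (0 < len <= source file size) from
 * src_path to dst_path by mmapping the source and passing the mapped
 * pointer straight to pwrite(2). Returns 0 on success, -1 on error. */
static int pwrite_from_mapping(const char *src_path, const char *dst_path,
                               size_t len)
{
    if (len == 0)
        return -1; /* mmap() rejects zero-length mappings */

    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, src, 0);
    close(src); /* the mapping stays valid after the fd is closed */
    if (map == MAP_FAILED)
        return -1;

    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (dst < 0) {
        munmap(map, len);
        return -1;
    }

    /* The kernel faults in the source pages while it copies them. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(dst, (const char *)map + done, len - done,
                           (off_t)done);
        if (n <= 0) { /* treat a zero-byte write as an error, too */
            close(dst);
            munmap(map, len);
            return -1;
        }
        done += (size_t)n;
    }
    close(dst);
    munmap(map, len);
    return 0;
}
```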
intel-gpu-tools looks like it could be useful, but the code sadly doesn't even compile. It seems to require a Solaris system.
intel-gpu-tools is written for Linux. The Solaris X11/DRI maintainers make sure
it builds on Solaris as well - presumably the BSD maintainers could do the same
for their platforms.
Whoever wrote the compile-time checks did it wrong:
#elif defined(_SC_PAGESIZE) && defined(_SC_PHYS_PAGES) /* Solaris */
I guess something like #elif defined(__sun) should be more accurate
(In reply to Francois Tigeot from comment #8)
> Whoever wrote the compile-time checks did it wrong:
> #elif defined(_SC_PAGESIZE) && defined(_SC_PHYS_PAGES) /* Solaris */
> I guess something like #elif defined(__sun) should be more accurate
While that code is used on Solaris, it should work on any OS with sysconf
support for those queries - I followed the autoconf standard of checking
for features vs. putting in a billion OS #ifdefs, so it would have a higher
chance of working on OSes like the BSDs, for which I don't know whether they
support those features or not. If something's not working there, please
suggest how to make it work better.
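The feature-test approach described above can be sketched as a standalone function: it checks for the sysconf names themselves rather than for a particular OS such as __sun, so it builds on any platform whose <unistd.h> defines both _SC_PAGESIZE and _SC_PHYS_PAGES. The function name is hypothetical; this is not the intel-gpu-tools code itself.

```c
#include <stdint.h>
#include <unistd.h>

/* HYPOTHETICAL helper: total physical memory in bytes, or 0 if it cannot
 * be determined. The #if tests for the feature, not for the OS. */
static uint64_t total_ram_bytes(void)
{
#if defined(_SC_PAGESIZE) && defined(_SC_PHYS_PAGES)
    long page_size = sysconf(_SC_PAGESIZE);
    long pages = sysconf(_SC_PHYS_PAGES);
    if (page_size > 0 && pages > 0)
        return (uint64_t)page_size * (uint64_t)pages;
#endif
    return 0; /* callers must treat 0 as "unknown" */
}
```

Returning 0 instead of failing at compile time keeps the code building on platforms that lack either name; the caller then decides how to handle an unknown memory size.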
I managed to get some of the intel-gpu-tools binaries to build. What sort of massive kernel bug should I look for?
And what is the right place to report bugs for intel-gpu-tools itself? Many parts of the code need to be changed to build and run on non-Linux operating systems.
Product: DRI, Component: DRM/Intel is the right place for i-g-t bugs.
My first guess would be igt/gem_reloc_vs_gpu.