I am getting a freeze when I attempt to use xrandr. My equipment is a GM45 motherboard with VGA and two SDVO/TMDS displays. "xrandr -q" works, but the server goes into an infinite loop. This does not happen every time, but it does most of the time.

The build is as close as I can make it to the latest git checkouts. The kernel is 2.6.28-rc7-pae, from the anholt/drm-intel-next archive. Kernel version 2.6.28-rc4-pae, with the same user space, does not do this.

I have investigated the problem somewhat with gdb and by loading (and recompiling) the drm module with debugging enabled. The variable master->lock.hw_lock is NULL in the kernel function drm_lock. I get the following stack trace when the loop is interrupted.

****************************************************************************
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xb7c123e9 in ioctl () from /lib/libc.so.6
#2 0xb7a8034f in drmIoctl (fd=16, request=1074291754, arg=0xbfb211d8) at xf86drm.c:183
#3 0xb7a81107 in drmGetLock (fd=16, context=1, flags=0) at xf86drm.c:1297
#4 0xb7a8d7d0 in DRILock (pScreen=0x830ef70, flags=0) at dri.c:2201
#5 0xb7a8a587 in DRIScreenInit (pScreen=0x830ef70, pDRIInfo=0x82f2b38, pDRMFD=0x82620b8) at dri.c:525
#6 0xb7a4623e in I830DRIScreenInit (pScreen=0x830ef70) at i830_dri.c:631
#7 0xb7a0f805 in I830ScreenInit (scrnIndex=0, pScreen=0x830ef70, argc=8, argv=0xbfb21574) at i830_driver.c:3106
#8 0x080699e9 in AddScreen (pfnInit=0xb7a0f63d <I830ScreenInit>, argc=8, argv=0xbfb21574) at main.c:688
#9 0x080bc29e in InitOutput (pScreenInfo=0x824e160, argc=8, argv=0xbfb21574) at xf86Init.c:1245
#10 0x08068d34 in main (argc=8, argv=0xbfb21574, envp=0xbfb21598) at main.c:309
****************************************************************************

When the kernel function drm_lock finds master->lock.hw_lock to be NULL, it returns EINTR, and drmIoctl then retries the ioctl indefinitely, which is the infinite loop.
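To make the loop concrete, here is a minimal user-space sketch of the retry pattern described above. It is not the libdrm or kernel source: fake_master, fake_lock_ioctl, retry_lock and the max_tries bound are all hypothetical names invented for this illustration, and the bound exists only so the demonstration terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-master lock state. */
struct fake_master {
    void *hw_lock;
};

/* Hypothetical stand-in for the kernel side of the lock ioctl: it fails
 * with EINTR for as long as hw_lock is NULL, as described in the report. */
static int fake_lock_ioctl(const struct fake_master *m)
{
    if (m->hw_lock == NULL) {
        errno = EINTR;
        return -1;
    }
    return 0;
}

/* Retry-on-EINTR wrapper of the kind the report attributes to drmIoctl.
 * max_tries is an assumption added for this sketch; the loop seen in the
 * stack trace has no such bound.
 * Returns the number of attempts on success, or -1 if the bound was hit. */
static int retry_lock(const struct fake_master *m, int max_tries)
{
    int tries = 0;
    int ret;

    assert(max_tries > 0);
    do {
        ret = fake_lock_ioctl(m);
        tries++;
    } while (ret == -1 && errno == EINTR && tries < max_tries);

    return ret == 0 ? tries : -1;
}
```

With hw_lock set, retry_lock succeeds on the first attempt. With hw_lock NULL, every attempt fails with EINTR, so an unbounded version of the same loop never exits.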
A nasty side bug is that the DRM_DEBUG line in drm_lock assumes master->lock.hw_lock is non-NULL, and causes an OOPS when debugging is enabled.
When I loaded the i915 module with modeset set to 1, the problem disappeared. The kernel configuration option CONFIG_DRM_I915_KMS is not set. So as long as I load i915 with modeset set to 1, the problem is resolved.
Created attachment 20969 [details] dmesg output with warnings
Created attachment 20970 [details] lspci output
Created attachment 20973 [details] gdb log output illustrating the resetting of variable causing loop.
The kernel variable lock.hw_lock, whose reset causes the infinite loop, is being reset when the screen is closed as part of an xrandr operation. An error message "error setting MTRR (base = 0x20000000, size = 0x10000000, type = 1) Invalid argument (22)" is emitted by both kernel/dmesg and by libpciaccess. Following this, there is a close, and in that close operation the variable is reset. The attached gdb log illustrates this activity. The third PCI region has the failing base address/size.
Created attachment 21036 [details] Code from drm_bufs.c showing DRM_SHM branch. This is the code in drm/drm_bufs.c (kernel module) for the ioctl DRM_IOCTL_ADD_MAP, type DRM_SHM. This mapping must be invoked from user space before the opened dri device can be locked. It normally sets the master->lock.hw_lock variable. However, if the function drm_find_matching_map returns something, then the ioctl returns 0, and the master->lock.hw_lock is never set, resulting in the infinite loop when locking the dri channel is attempted later from user space.
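The two branches described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the code from drm_bufs.c: model_dev, model_map, model_master, find_matching and model_addmap_shm are hypothetical names, and a fixed-size array stands in for dev->maplist.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_MAPS 32

/* Hypothetical simplified types; the real kernel structures differ. */
struct model_master {
    void *hw_lock;
};

struct model_map {
    const struct model_master *master; /* master active when the map was made */
    void *handle;
};

struct model_dev {
    struct model_map maps[MODEL_MAX_MAPS];
    size_t nmaps;
};

/* Plays the role the comment gives to drm_find_matching_map: look for an
 * existing map that belongs to this master. */
static struct model_map *find_matching(struct model_dev *dev,
                                       const struct model_master *m)
{
    for (size_t i = 0; i < dev->nmaps; i++)
        if (dev->maps[i].master == m)
            return &dev->maps[i];
    return NULL;
}

/* Model of the DRM_SHM add-map path. Returns 0 in both branches, but only
 * the "new map" branch sets m->hw_lock. */
static int model_addmap_shm(struct model_dev *dev, struct model_master *m,
                            void *shm)
{
    assert(dev != NULL && m != NULL);

    if (find_matching(dev, m) != NULL)
        return 0;               /* early return: hw_lock is left untouched */

    if (dev->nmaps == MODEL_MAX_MAPS)
        return -1;

    dev->maps[dev->nmaps].master = m;
    dev->maps[dev->nmaps].handle = shm;
    dev->nmaps++;
    m->hw_lock = shm;           /* the only place the lock pointer is set */
    return 0;
}
```

If a master with a NULL hw_lock matches an existing map, model_addmap_shm reports success and the lock pointer stays NULL, which is the state that later makes the lock ioctl fail.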
After a comment by Dave Airlie that if a map has been created already, then that lock should be the primary, I analysed the addition of maps to the "dev->maplist" list in the drm code. Each time randr is run, all drm file descriptors are closed and reopened, and the list accumulates map elements. These elements are never removed: there is code to delete elements from the list, but it is never executed. It would run if the corresponding user space ioctl were invoked, but that does not happen.

The old elements in the list are generally not reused, because they have a "master" field that identifies them with the "master" structure active in the device when the map was created. These master structures are allocated and freed on each xrandr close/reopen cycle. However if, by chance, the dynamic memory allocation function returns the address of an old "master" structure, then one of the old elements matches and is reused, and the code takes the branch that does not set the master hw_lock. So the infinite loop is entered when a later attempt is made to lock the drm.

It seems problematic to free dynamic master structures without also freeing any elements that rely on a reference to that structure for correct operation. Actually it seems problematic having all these map structures hanging around anyway, because I cannot find where they are ever freed, so at the very least they represent a memory leak.
Are you running a bare server (no other clients running) so that the server regenerates after each xrandr call?
(In reply to comment #8)
> Are you running a bare server (no other clients running) so that the server
> regenerates after each xrandr call?

Yes. I am targeting an embedded system. Usually I like to work by remotely logging in using ssh and running xterms on a remote computer, while running a rudimentary display on the target system. However there is a requirement to run a full desktop on the target, for developers. When I debug the xserver, with gdb, I also run only an xorg, with gdb.
Created attachment 21101 [details] Patch with printk's, and dmesg output hopefully illustrating the error. This is the printk output from dmesg with my added printks in a patch. At module removal, there are 25 entries in the maplist.
Created attachment 21105 [details] My shell script for launching problem. Command is sudo stx -min -gdb

To cause the error, I have compiled and installed X11 at prefix /usr/local/x11prefix, and I run this script with
$ sudo stx -min -gdb
On another terminal, I run xrandr, similar to what is in the script, but really just xrandr -q is necessary.
Generally no user environment involves server regens, so it may be in your best interest to avoid running a testing environment involving that code path. (Basically, run an xlogo or xterm or something before playing with xrandr). Still a bug.
I can confirm that the error does not occur while xlogo is also running, because the "master" is not deallocated.
If the /dev/dri/cardN file descriptor is held open by a paused process, then this freeze does not occur either (just as when an X application is running). However, during xrandr operations the i915 heap initialisation is called again, and since no heap deallocation has been invoked, errors of the following kind appear in the dmesg output: [drm:i915_mem_heap_init] *ERROR* heap already initialised? (except that this error currently has no newline, so it runs onto the next dmesg message on output)
Created attachment 21216 [details] kernel patch that addresses error This kernel patch to drm_stubs.c removes all maps in dev->maplist that reference the master when the master structure is being freed, after invocation of the device destroy callback. It doesn't appear to introduce any new quirks. Use at your own risk.
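The idea behind the patch, dropping every map that references a master when that master is freed, can be sketched as below. This is an illustration only and not the attached patch: sketch_dev, sketch_map and drop_maps_for_master are hypothetical names, and an array stands in for the dev->maplist linked list.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_MAPS 32

/* Hypothetical simplified types; the real kernel structures differ. */
struct sketch_map {
    const void *master; /* identity of the master that created the map */
};

struct sketch_dev {
    struct sketch_map maps[SKETCH_MAX_MAPS];
    size_t nmaps;
};

/* Remove every map that references the given master, compacting the
 * remaining entries in place. Meant to be called just before the master
 * structure itself is freed. Returns the number of maps removed. */
static size_t drop_maps_for_master(struct sketch_dev *dev, const void *master)
{
    size_t kept = 0;

    assert(dev != NULL);
    for (size_t i = 0; i < dev->nmaps; i++) {
        if (dev->maps[i].master != master)
            dev->maps[kept++] = dev->maps[i];
    }

    size_t removed = dev->nmaps - kept;
    dev->nmaps = kept;
    return removed;
}
```

Once the stale entries are gone, a newly allocated master that happens to land at a previously used address can no longer match an old map, so the add-map path creates a fresh map and sets the lock pointer.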
Comment on attachment 21216 [details] kernel patch that addresses error What kernel version is the patch against? I tried it vs 2.6.17.10 and 2.6.18-pre9, no joy, 1 of 1 hunks rejected.
[Bug 18967] Xorg freeze after using xrandr, drm debug error. bugzilla-daemon Tue, 16 Dec 2008 15:08:54 -0800 http://bugs.freedesktop.org/show_bug.cgi?id=18967 --- Comment #15 from peter garrone <pgarr...@optusnet.com.au> 2008-12-16 15:08:28 PST --- Created an attachment (id=21216) --> (http://bugs.freedesktop.org/attachment.cgi?id=21216) kernel patch that addresses error

Thank you all for working on this. I suspect this may affect more than just the Intel servers. What kernel version is the diff against? I tried it against 2.6.17.10 and 2.6.18-pre9.
Hi, Freedesktop's Bugzilla instance is EOLed and open bugs are about to be migrated to http://gitlab.freedesktop.org. To avoid migrating out of date bugs, I am now closing all the bugs that did not see any activity in the past year. If the issue is still happening, please create a new bug in the relevant project at https://gitlab.freedesktop.org/drm (use misc by default). Sorry about the noise!