This bug appears to be due to a number of causes: 1. Intermittent TV-out detection causes random changes in detection state for TV-out on my box. Equivalent would be a hotplug event, or some other reason the mode-list changes during the probe. 2. libdrm uses two ioctl calls to get the information from the kernel - the first populates the number of modes available, libdrm then allocates memory for that number of modes, and then a second call to the ioctl fills in the data. If the mode list has changed between the two ioctl calls (and there are now more modes than memory allocated to store the data), the second IOCTL cannot return the required list of modes, but still stores the number of modes available. No check is made to ensure that the second IOCTL succeeded in populating the data for the right numer of modes, so this eventually leads to a crash in the xf86-video-intel driver when it comes to use the data: #0 0x001cc137 in drmmode_ConvertFromKMode (scrn=<value optimised out>, kmode=0x0, mode=0x87b0554) at ../../src/drmmode_display.c:255 255 ../../src/drmmode_display.c: No such file or directory. in ../../src/drmmode_display.c (gdb) bt #0 0x001cc137 in drmmode_ConvertFromKMode (scrn=<value optimised out>, kmode=0x0, mode=0x87b0554) at ../../src/drmmode_display.c:255 #1 0x001cc335 in drmmode_output_get_modes (output=0x87ab5e8) at ../../src/drmmode_display.c:844 #2 0x080cbb56 in xf86ProbeOutputModes (scrn=0x8797c88, maxX=8192, maxY=8192) at ../../../../hw/xfree86/modes/xf86Crtc.c:1607 #3 0x080cc551 in xf86InitialConfiguration (scrn=0x8797c88, canGrow=1) at ../../../../hw/xfree86/modes/xf86Crtc.c:2358 #4 0x001cdfd3 in drmmode_output_set_property (output=0x0, property=142279864, value=0x4) at ../../src/drmmode_display.c:1098 #5 0x001b3455 in I830DrmModeInit (scrn=0x8797c88, flags=<value optimised out>) at ../../src/i830_driver.c:779 #6 I830PreInit (scrn=0x8797c88, flags=<value optimised out>) at ../../src/i830_driver.c:910 #7 0x080b6fa8 in InitOutput (pScreenInfo=0x81fd460, argc=8, argv=0xbfeb0ed4) at ../../../../hw/xfree86/common/xf86Init.c:841 #8 0x08066b3b in main (argc=8, argv=0xbfeb0ed4, envp=0xbfeb0ef8) at ../../dix/main.c:2 05 (gdb) frame 1 #1 0x001cc335 in drmmode_output_get_modes (output=0x87ab5e8) at ../../src/drmmode_display.c:844 844 in ../../src/drmmode_display.c (gdb) print *koutput $1 = {connector_id = 15, encoder_id = 0, connector_type = 6, connector_type_id = 1, connection = DRM_MODE_CONNECTED, mmWidth = 0, mmHeight = 0, subpixel = DRM_MODE_SUBPIXEL_UNKNOWN, count_modes = 4, modes = 0x0, count_props = 7, props = 0x8798438, prop_values = 0x87ab8e8, count_encoders = 1, encoders = 0x87ab280} NB: count_modes is 4, but modes is NULL. If we assume the most likely trigger of the error is a zero modes -> some modes transition, then we could put a NULL check in to the 2D driver (avoiding the segfault on my box). That can't however check for a "n" -> "n+1" modes transition, where the driver would just walk off the end of an incomplete list. That fix would need to be in libdrm, where the ioctls are called.
I believe the most appropriate place to kludge around this bug is libdrm's xf86drmMode.c, in the drmModeGetConnector function - where the bad data is coming from. More fundamentally though, the double ioctl call without some kind of locking against mode-list changes between them is flawed, and the root cause of this bug. Something like this would kludge around it, abiet reporting no modes in the case of error. I guess in theory, the same bug could affect properties, and any other dual-ioctl based query. --- libdrm-2.4.17+git20091230.c5c503b5.orig/xf86drmMode.c +++ libdrm-2.4.17+git20091230.c5c503b5/xf86drmMode.c @@ -373,6 +373,8 @@ { struct drm_mode_get_connector conn; drmModeConnectorPtr r = NULL; + int num_modes_first; + int num_modes_second; conn.connector_id = connector_id; conn.connector_type_id = 0; @@ -388,6 +390,8 @@ if (drmIoctl(fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn)) return 0; + num_modes_first = conn.count_modes; + if (conn.count_props) { conn.props_ptr = VOID2U64(drmMalloc(conn.count_props*sizeof(uint32_t))); conn.prop_values_ptr = VOID2U64(drmMalloc(conn.count_props*sizeof(uint64_t))); @@ -402,6 +406,13 @@ if (drmIoctl(fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn)) goto err_allocs; + num_modes_second = conn.count_modes; + + if (num_modes_first != num_modes_second) { + fprintf (stderr, "DRM PROBLEM: OH DEAR Num modes before (%i) is not equal to that after (%i)\n", + num_modes_first, num_modes_second); + } + if(!(r = drmMalloc(sizeof(*r)))) { goto err_allocs; } @@ -418,7 +429,16 @@ r->count_props = conn.count_props; r->props = drmAllocCpy(U642VOID(conn.props_ptr), conn.count_props, sizeof(uint32_t)); r->prop_values = drmAllocCpy(U642VOID(conn.prop_values_ptr), conn.count_props, sizeof(uint64_t)); - r->modes = drmAllocCpy(U642VOID(conn.modes_ptr), conn.count_modes, sizeof(struct drm_mode_modeinfo)); + + if (num_modes_first >= num_modes_second) { + r->modes = drmAllocCpy(U642VOID(conn.modes_ptr), conn.count_modes, sizeof(struct drm_mode_modeinfo)); + } else { + r->modes = NULL; + } + if (r->count_modes != 0 && r->modes == NULL) { + fprintf (stderr, "DRM PROBLEM: Copying modes failed.. zeroing count\n"); + r->count_modes = 0; + } r->count_encoders = conn.count_encoders; r->encoders = drmAllocCpy(U642VOID(conn.encoders_ptr), conn.count_encoders, sizeof(uint32_t)); r->connector_type = conn.connector_type;
*** Bug 25610 has been marked as a duplicate of this bug. ***
I don't think the properties (and friends) are similarly affected by hotplugging -- i.e. I think they are fixed upon initialisation. commit d1308f4fe7f94aae51ca9f70947aea8e09597f37 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jan 6 15:19:25 2010 +0000 modes: Retry GETRESOURCES if a hotplug event occurs between the two ioctls Peter Clifton hit an issue whereby he had a spurious TV hotplug event that occurred between the two GETRESOURCES ioctls that caused the kernel to skip filling one array and led to a crash (as the size of the allocated and initialised block of memory differed from the reported size). Fixes: http://bugs.freedesktop.org/show_bug.cgi?id=25912 Crash whilst probing modes Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reported-by: Peter Clifton <pcjc2@cam.ac.uk>
Thanks Chris! I'll let you know if I encounter any problems with that patch applied to libdrm. (WIll have to unpatch the drm driver to remove my (possible) fix for the TV-out misdetect issue). Btw, I don't think the issue had anything to do with hotplug events in my case.. the TV is probed on each ioctl call, and either could detect attached / detached. Somewhat un-necessary in terms of time required to probe (and xrandr to return), but there you go.
Created attachment 32480 [details] [review] Additional patch required to fix it for me.. This takes Chris's changes for the GETRESOURCES ioctl, and apply them to the one which was crashing for me, GETCONNECTOR
Re-opening, as the committed fix is not sufficient.
Similar error?
Yes, with just your patch (and my TV-probe fix removed), Xorg crashes like clockwork. Also, I spotted plymouthd hitting a segfault in libdrm. Ubunut's core dump isn't working properly at the moment unfortunately though, so its not so easy to get a retrace. With my patch as well, I haven't seen it crash (not doing xrandr at least... the damned lid is another matter!)
Peter, thanks for the patch. Pushed: commit 04f90a44709a48fb932ea954011cb551659bf246 Author: Peter Clifton <pcjc2@cam.ac.uk> Date: Wed Jan 6 20:44:11 2010 +0000 modes: Retry GETCONNECTOR if a hotplug event occurs between the two ioctls Hopefully this puts the bug to rest...
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.