Bug 25912 - segfault whilst probing modes
Summary: segfault whilst probing modes
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel (show other bugs)
Version: git
Hardware: x86 (IA32) Linux (All)
: medium normal
Assignee: Carl Worth
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
: 25610 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-01-06 06:49 UTC by Peter Clifton
Modified: 2010-01-06 15:48 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Additional patch required to fix it for me.. (4.36 KB, patch)
2010-01-06 13:17 UTC, Peter Clifton
no flags Details | Splinter Review

Description Peter Clifton 2010-01-06 06:49:11 UTC
This bug appears to be due to a number of causes:

1. Intermittent TV-out detection causes random changes in detection state for TV-out on my box.
Equivalent would be a hotplug event, or some other reason the mode-list changes during the probe.

2. libdrm uses two ioctl calls to get the information from the kernel - the first populates the number of modes available, libdrm then allocates memory for that number of modes, and then a second call to the ioctl fills in the data.

If the mode list has changed between the two ioctl calls (and there are now more modes than memory allocated to store the data), the second IOCTL cannot return the required list of modes, but still stores the number of modes available.

No check is made to ensure that the second IOCTL succeeded in populating the data for the right numer of modes, so this eventually leads to a crash in the xf86-video-intel driver when it comes to use the data:
#0  0x001cc137 in drmmode_ConvertFromKMode (scrn=<value optimised out>, 
    kmode=0x0, mode=0x87b0554) at ../../src/drmmode_display.c:255
255	../../src/drmmode_display.c: No such file or directory.
	in ../../src/drmmode_display.c
(gdb) bt
#0  0x001cc137 in drmmode_ConvertFromKMode (scrn=<value optimised out>, 
    kmode=0x0, mode=0x87b0554) at ../../src/drmmode_display.c:255
#1  0x001cc335 in drmmode_output_get_modes (output=0x87ab5e8)
    at ../../src/drmmode_display.c:844
#2  0x080cbb56 in xf86ProbeOutputModes (scrn=0x8797c88, maxX=8192, maxY=8192)
    at ../../../../hw/xfree86/modes/xf86Crtc.c:1607
#3  0x080cc551 in xf86InitialConfiguration (scrn=0x8797c88, canGrow=1)
    at ../../../../hw/xfree86/modes/xf86Crtc.c:2358
#4  0x001cdfd3 in drmmode_output_set_property (output=0x0, property=142279864, 
    value=0x4) at ../../src/drmmode_display.c:1098
#5  0x001b3455 in I830DrmModeInit (scrn=0x8797c88, flags=<value optimised out>)
    at ../../src/i830_driver.c:779
#6  I830PreInit (scrn=0x8797c88, flags=<value optimised out>)
    at ../../src/i830_driver.c:910
#7  0x080b6fa8 in InitOutput (pScreenInfo=0x81fd460, argc=8, argv=0xbfeb0ed4)
    at ../../../../hw/xfree86/common/xf86Init.c:841
#8  0x08066b3b in main (argc=8, argv=0xbfeb0ed4, envp=0xbfeb0ef8)
    at ../../dix/main.c:2
05

(gdb) frame 1
#1  0x001cc335 in drmmode_output_get_modes (output=0x87ab5e8)
    at ../../src/drmmode_display.c:844
844	in ../../src/drmmode_display.c
(gdb) print *koutput
$1 = {connector_id = 15, encoder_id = 0, connector_type = 6, 
  connector_type_id = 1, connection = DRM_MODE_CONNECTED, mmWidth = 0, 
  mmHeight = 0, subpixel = DRM_MODE_SUBPIXEL_UNKNOWN, count_modes = 4, 
  modes = 0x0, count_props = 7, props = 0x8798438, prop_values = 0x87ab8e8, 
  count_encoders = 1, encoders = 0x87ab280}

NB: count_modes is 4, but modes is NULL.

If we assume the most likely trigger of the error is a zero modes -> some modes transition, then we could put a NULL check in to the 2D driver (avoiding the segfault on my box). That can't however check for a "n" -> "n+1" modes transition, where the driver would just walk off the end of an incomplete list. That fix would need to be in libdrm, where the ioctls are called.
Comment 1 Peter Clifton 2010-01-06 07:03:30 UTC
I believe the most appropriate place to kludge around this bug is libdrm's xf86drmMode.c, in the drmModeGetConnector function - where the bad data is coming from.

More fundamentally though, the double ioctl call without some kind of locking against mode-list changes between them is flawed, and the root cause of this bug.

Something like this would kludge around it, abiet reporting no modes in the case of error. I guess in theory, the same bug could affect properties, and any other dual-ioctl based query.

--- libdrm-2.4.17+git20091230.c5c503b5.orig/xf86drmMode.c
+++ libdrm-2.4.17+git20091230.c5c503b5/xf86drmMode.c
@@ -373,6 +373,8 @@
 {
 	struct drm_mode_get_connector conn;
 	drmModeConnectorPtr r = NULL;
+	int num_modes_first;
+	int num_modes_second;
 
 	conn.connector_id = connector_id;
 	conn.connector_type_id = 0;
@@ -388,6 +390,8 @@
 	if (drmIoctl(fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn))
 		return 0;
 
+	num_modes_first = conn.count_modes;
+
 	if (conn.count_props) {
 		conn.props_ptr = VOID2U64(drmMalloc(conn.count_props*sizeof(uint32_t)));
 		conn.prop_values_ptr = VOID2U64(drmMalloc(conn.count_props*sizeof(uint64_t)));
@@ -402,6 +406,13 @@
 	if (drmIoctl(fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn))
 		goto err_allocs;
 
+	num_modes_second = conn.count_modes;
+
+	if (num_modes_first != num_modes_second) {
+		fprintf (stderr, "DRM PROBLEM: OH DEAR Num modes before (%i) is not equal to that after (%i)\n",
+		         num_modes_first, num_modes_second);
+	}
+
 	if(!(r = drmMalloc(sizeof(*r)))) {
 		goto err_allocs;
 	}
@@ -418,7 +429,16 @@
 	r->count_props  = conn.count_props;
 	r->props        = drmAllocCpy(U642VOID(conn.props_ptr), conn.count_props, sizeof(uint32_t));
 	r->prop_values  = drmAllocCpy(U642VOID(conn.prop_values_ptr), conn.count_props, sizeof(uint64_t));
-	r->modes        = drmAllocCpy(U642VOID(conn.modes_ptr), conn.count_modes, sizeof(struct drm_mode_modeinfo));
+
+	if (num_modes_first >= num_modes_second) {
+		r->modes = drmAllocCpy(U642VOID(conn.modes_ptr), conn.count_modes, sizeof(struct drm_mode_modeinfo));
+	} else {
+		r->modes = NULL;
+	}
+	if (r->count_modes != 0 && r->modes == NULL) {
+		fprintf (stderr, "DRM PROBLEM: Copying modes failed.. zeroing count\n");
+		r->count_modes = 0;
+	}
 	r->count_encoders = conn.count_encoders;
 	r->encoders     = drmAllocCpy(U642VOID(conn.encoders_ptr), conn.count_encoders, sizeof(uint32_t));
 	r->connector_type  = conn.connector_type;
Comment 2 Peter Clifton 2010-01-06 07:22:09 UTC
*** Bug 25610 has been marked as a duplicate of this bug. ***
Comment 3 Chris Wilson 2010-01-06 07:45:54 UTC
I don't think the properties (and friends) are similarly affected by hotplugging -- i.e. I think they are fixed upon initialisation.

commit d1308f4fe7f94aae51ca9f70947aea8e09597f37
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 6 15:19:25 2010 +0000

    modes: Retry GETRESOURCES if a hotplug event occurs between the two ioctls
    
    Peter Clifton hit an issue whereby he had a spurious TV hotplug event
    that occurred between the two GETRESOURCES ioctls that caused the kernel
    to skip filling one array and led to a crash (as the size of the
    allocated and initialised block of memory differed from the reported
    size).
    
    Fixes: http://bugs.freedesktop.org/show_bug.cgi?id=25912
           Crash whilst probing modes
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reported-by: Peter Clifton <pcjc2@cam.ac.uk>

Comment 4 Peter Clifton 2010-01-06 08:36:41 UTC
Thanks Chris!

I'll let you know if I encounter any problems with that patch applied to libdrm. (WIll have to unpatch the drm driver to remove my (possible) fix for the TV-out misdetect issue).

Btw, I don't think the issue had anything to do with hotplug events in my case.. the TV is probed on each ioctl call, and either could detect attached / detached. Somewhat un-necessary in terms of time required to probe (and xrandr to return), but there you go.


Comment 5 Peter Clifton 2010-01-06 13:17:34 UTC
Created attachment 32480 [details] [review]
Additional patch required to fix it for me..

This takes Chris's changes for the GETRESOURCES ioctl, and apply them to the one which was crashing for me, GETCONNECTOR
Comment 6 Peter Clifton 2010-01-06 13:18:14 UTC
Re-opening, as the committed fix is not sufficient.
Comment 7 Chris Wilson 2010-01-06 13:42:54 UTC
Similar error?
Comment 8 Peter Clifton 2010-01-06 13:46:10 UTC
Yes, with just your patch (and my TV-probe fix removed), Xorg crashes like clockwork. Also, I spotted plymouthd hitting a segfault in libdrm.

Ubunut's core dump isn't working properly at the moment unfortunately though, so its not so easy to get a retrace.

With my patch as well, I haven't seen it crash (not doing xrandr at least... the damned lid is another matter!)
Comment 9 Chris Wilson 2010-01-06 15:48:29 UTC
Peter, thanks for the patch. Pushed:

commit 04f90a44709a48fb932ea954011cb551659bf246
Author: Peter Clifton <pcjc2@cam.ac.uk>
Date:   Wed Jan 6 20:44:11 2010 +0000

    modes: Retry GETCONNECTOR if a hotplug event occurs between the two ioctls

Hopefully this puts the bug to rest...


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.