Bug 3217

Summary: Computer hangs during X.org shutdown. I get bad page state at __free_pages_ok if I /etc/init.d/xdm stop. i915 glx xorg 6.8.99.3
Product: DRI Reporter: Thomas <thomasa88>
Component: libglx Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: critical    
Priority: high CC: battousai, shrek, stachon, vsu
Version: XOrg git   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments (description, flags):
 - Picture of error (flags: none)
 - errors/oops logs (flags: none)
 - CVS change which seems to introduce the problem (flags: none)
 - patch to fix drm_pci_alloc() so that subsequent drm_pci_free() works (flags: none)
 - patch to fix memory leaks in the PCI consistent memory handling code (flags: none)

Description Thomas 2005-05-06 03:30:56 UTC
Upstreamed from bugs.gentoo.org:

To get direct rendering to work with my card (Intel 915GM) I had to get the
latest xorg (6.8.99.3). I got it working after some testing, but then I
discovered that any time I try to shut down or restart X.org my computer just
hangs.

I also tried logging in on ctrl+alt+f1 and running /etc/init.d/xdm stop. Then I
get a lot of text scrolling by, which seems to be stack traces repeated over and
over again. Finally the scrolling stops. The other time I tried I got a kernel
panic, but I didn't have paper ready and just rebooted; this time I got a
stack trace. I didn't write it all down because it's so much, but it says

"Trying to fix it up, but a reboot is needed
Bad page state at __free_pages_ok (in process 'X' "

The stacktrace seems to be i915 trying to do something.

Edit: I took my time, took a photo and wrote it all down (phew)
 [<e00c846f>] i915_dma_cleanup+0x17f/0x1b0 [i915]
 [<e00c8a89>] i915_dma_init+0xa9/0xb0 [i915]
 [<e00dbba1>] drm_ioctl+0xf1/0x1b8 [drm]
 [<c0170480>] do_ioctl+0x70/0xa0
 [<c01706d5>] vfs_ioctl+0x65/0x1f0
 [<c01708a5>] sys_ioctl+0x45/0x70
 [<c01031cf>] syscall_call+0x7/0xb
Trying to fix it up, but a reboot is needed
Bad page state at __free_pages_ok (in process 'X', page c13f9020)
flags:0x00000000 mapping:00000fa0 mapcount:-1069243967 count:260
Backtrace:
 [<c01411f5>] bad_page+0x75/0xb0
 [<c01414b0>] __free_pages_ok+0x70/0xd0
 [<c01421f7>] __free_pages+0x37/0x50
 [<e00e0603>] drm_pci_free+0x43/0x50 [drm]
 [<e00c846f>] i915_dma_cleanup+0x17f/0x1b0 [i915]
 [<e00c8a89>] i915_dma_init+0xa9/0xb0 [i915]
 [<e00dbba1>] drm_ioctl+0xf1/0x1b8 [drm]
 [<c0170480>] do_ioctl+0x70/0xa0
 [<c01706d5>] vfs_ioctl+0x65/0x1f0
 [<c01708a5>] sys_ioctl+0x45/0x70
 [<c01031cf>] syscall_call+0x7/0xb
Trying to fix it up, but a reboot is needed
Trying to fix it up, but a reboot is needed
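
The mapcount:-1069243967 field above is easier to judge as a raw bit pattern
than as a signed number. The following is a minimal sketch in C, not part of the
original report; the helper name raw_bits_hex() is hypothetical and chosen only
for illustration. It reinterprets the printed signed 32-bit value as the 32 bits
it occupies in memory.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a signed 32-bit value as the raw 32-bit pattern it occupies.
 * "out" must have room for "0x", eight hex digits and the terminator.
 * (raw_bits_hex is a hypothetical helper, not a kernel function.)
 */
void raw_bits_hex(int32_t value, char out[11])
{
    snprintf(out, 11, "0x%08x", (unsigned int)(uint32_t)value);
}
```

Called with -1069243967 this yields "0xc044a1c1". On 32-bit x86 Linux with the
usual 3G/1G split, values at or above 0xc0000000 are kernel virtual addresses,
so the field may well hold a stray pointer rather than a real map count. That
is only a hint, but it would fit a page that was handed to the allocator
without ever having been set up as a normal allocated page.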

There could be misspellings, orig pic:
http://rapidshare.de/files/1602763/xorg_error_100_2669.jpg.html

Reproducible: Always
Steps to Reproduce:
1. Start X with glx enabled and driver i810 (xorg's own version) plus the i915 drm module
2. Shut down X

Actual Results:  
(/etc/init.d/xdm stop gave lots of text then) 
The computer froze. 

Expected Results:  
Continue running 

How I installed the i915(gm)-driver. 
 
downloaded common-20050504-linux.i386 and i915-20050504-linux.i386 from 
http://dri.freedesktop.org/snapshots 
configured kernel not to have drm 
installed the drivers 
reinstalled xorg 6.8.99.3 because the driver installers replaced the xorg-driver
Comment 1 Thomas 2005-05-06 03:33:47 UTC
Created attachment 2625 [details]
Picture of error
Comment 2 Thomas 2005-05-14 05:41:21 UTC
now using gcc 3.4, xorg 6.8.99.5 (from portage) and 20050513 dri-driver. Now I 
can let the install.sh-script install the kernel modules without disturbing 
anything, but my computer still hangs when I try to stop/restart X.
Comment 3 Christoph Fritz 2005-05-19 09:25:12 UTC
I use sarge with a 2.6.12-rc4 Kernel to get my i915GMm-HFS AOpen Motherboard
successfully working for dri/drm/agpgart (onboard graphics card):

Linux agpgart interface v0.101 (c) Dave Jones
agpgart: Detected an Intel 915GM Chipset.
agpgart: Detected 764K stolen memory.
agpgart: AGP aperture is 256M @ 0xc0000000
[drm] Initialized drm 1.0.0 20040925
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
[drm] Initialized i915 1.1.0 20040405 on minor 0: Intel Corporation Mobile
915GM/GMS/910GML Express Graphics Controller

The standard XFree86 4.3.0.dfsg.1-1 is configured by /etc/X11/XF86Config-4
...
Section "Device"
        Identifier       "myMoth"
        Driver "i810"
        Option "DRI"
EndSection
...

/var/log/XFree86.0.log shows:
...
(II) I810(0): [drm] installed DRM signal handler
(II) I810(0): [DRI] installation complete
(II) I810(0): direct rendering: Enabled
...
also some warnings
(WW) I810(0): Bad V_BIOS checksum
(II) I810(0): Primary V_BIOS segment is: 0xc000

But now when I do a 'glxinfo' I get "direct rendering: No"
??? This really confused me.

And how do I get 3d-Hardware-Support to play QuakeIII with full-speed?
Mesa is installed:
$ dpkg -l  | grep mesa
ii  xlibmesa-gl    4.3.0.dfsg.1-1 Mesa 3D graphics library [XFree86]
ii  xlibmesa-gl-de 4.3.0.dfsg.1-1 Mesa 3D graphics library development files
ii  xlibmesa-glu   4.3.0.dfsg.1-1 Mesa OpenGL utility library [XFree86]
ii  xlibmesa-glu-d 4.3.0.dfsg.1-1 Mesa OpenGL utility library development file


It would be great if someone out there could help me.

PS: $( dmesg ) shows a 
"mtrr: base(0xc0020000) is not aligned on a size(0x180000) boundary"
is that due to my gfx-card?
Comment 4 Steven Newbury 2005-05-19 15:22:56 UTC
(In reply to comment #3)
> I use sarge with a 2.6.12-rc4 Kernel to get my i915GMm-HFS AOpen Motherboard
> successfully working for dri/drm/agpgart (onboard graphics card):
> 
> Linux agpgart interface v0.101 (c) Dave Jones
> agpgart: Detected an Intel 915GM Chipset.
> agpgart: Detected 764K stolen memory.
> agpgart: AGP aperture is 256M @ 0xc0000000
> [drm] Initialized drm 1.0.0 20040925
> ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> [drm] Initialized i915 1.1.0 20040405 on minor 0: Intel Corporation Mobile
> 915GM/GMS/910GML Express Graphics Controller
> 
> The standard XFree86 4.3.0.dfsg.1-1 is configured by /etc/X11/XF86Config-4


That is a very old release of X, you might do better with xorg-6.8.2.

> ...
> Section "Device"
>         Identifier       "myMoth"
>         Driver "i810"
>         Option "DRI"
> EndSection
> ...
> 
> /var/log/XFree86.0.log shows:
> ...
> (II) I810(0): [drm] installed DRM signal handler
> (II) I810(0): [DRI] installation complete
> (II) I810(0): direct rendering: Enabled
> ...
> also some warnings
> (WW) I810(0): Bad V_BIOS checksum
> (II) I810(0): Primary V_BIOS segment is: 0xc000
> 
> But now when I do a 'glxinfo' I get "direct rendering: No"
> ??? This really confused me.

Try: export LIBGL_DEBUG=1

> 
> And how do I get 3d-Hardware-Support to play QuakeIII with full-speed?
> Mesa is installed:
> $ dpkg -l  | grep mesa
> ii  xlibmesa-gl    4.3.0.dfsg.1-1 Mesa 3D graphics library [XFree86]
> ii  xlibmesa-gl-de 4.3.0.dfsg.1-1 Mesa 3D graphics library development files
> ii  xlibmesa-glu   4.3.0.dfsg.1-1 Mesa OpenGL utility library [XFree86]
> ii  xlibmesa-glu-d 4.3.0.dfsg.1-1 Mesa OpenGL utility library development file
> 
> 
> It would be great if someone out there could help me.
> 
> PS: $( dmesg ) shows a 
> "mtrr: base(0xc0020000) is not aligned on a size(0x180000) boundary"
> is that due to my gfx-card?
Yes, it will be for the write-combining of the video memory; it shouldn't be
the problem.
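
Why the kernel rejects that range can be sketched in a few lines of C. This is
a simplified illustration, not the kernel's actual validation code: x86
variable-range MTRRs describe a region with a base and a mask, so the size has
to be a power of two and the base a multiple of the size. The function names
below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a nonzero power of two. */
bool is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Simplified stand-in for an MTRR range check (hypothetical helper):
 * the size must be a power of two and the base aligned to that size.
 */
bool mtrr_range_ok(uint32_t base, uint32_t size)
{
    return is_power_of_two(size) && (base & (size - 1)) == 0;
}
```

For the values in the message, mtrr_range_ok(0xc0020000, 0x180000) is false on
both counts: 0x180000 (1.5 MB) is not a power of two, and the base is offset by
0x20000 from a multiple of it. The 256M aperture at 0xc0000000 reported by
agpgart would pass this check.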
Comment 5 Christoph Fritz 2005-05-20 00:33:55 UTC
Hi,

I'm just coming from a "Descent 3" session on my AOpen i915GMm-HFS board. I get
no crash when I log out or do anything else. AFAIK it's stable.

I use a 2.6.12-rc4 Kernel because of the newest implemented DRM, agpgart,
intel_agp and i915 modules. They initialize and work as described above.

Then I built X.Org and DRI from CVS as described in
http://dri.freedesktop.org/wiki/Building. For DRI I just copied the
Mesa/lib/i915_dri.so to /usr/X11R6/lib/modules/dri. To get DRI compiled I
couldn't set the include path to the kernel's DRM directory because of some
differences in the drm.h file, so I also grabbed drm from CVS to let DRI use
those include files.
That's all.

For some games I get glitches (very slow animation), e.g. shooting in Soldier Of
Fortune.

$ glxgears 
libGL warning: 3D driver claims to not support visual 0x22
libGL warning: 3D driver claims to not support visual 0x23
libGL warning: 3D driver claims to not support visual 0x26
libGL warning: 3D driver claims to not support visual 0x27
6458 frames in 5.0 seconds = 1291.525 FPS
6356 frames in 5.0 seconds = 1271.038 FPS
6354 frames in 5.0 seconds = 1270.744 FPS
6358 frames in 5.0 seconds = 1271.442 FPS

libGL is a new one, built by Xorg I think:
/usr/X11R6/lib/libGL.so.1.2

A strange thing is that before glxinfo or glxgears runs, this message is
printed (by what, I don't know):
libGL warning: 3D driver claims to not support visual 0x22
libGL warning: 3D driver claims to not support visual 0x23
libGL warning: 3D driver claims to not support visual 0x26
libGL warning: 3D driver claims to not support visual 0x27

$ glxinfo
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.2
server glx extensions:
    GLX_ARB_multisample, GLX_EXT_visual_info, GLX_EXT_visual_rating, 
    GLX_EXT_import_context, GLX_OML_swap_method, GLX_SGI_make_current_read, 
    GLX_SGIS_multisample, GLX_SGIX_fbconfig
client glx vendor string: SGI
client glx version string: 1.4
client glx extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_allocate_memory, 
    GLX_MESA_swap_control, GLX_MESA_swap_frame_usage, GLX_OML_swap_method, 
    GLX_OML_sync_control, GLX_SGI_make_current_read, GLX_SGI_swap_control, 
    GLX_SGI_video_sync, GLX_SGIS_multisample, GLX_SGIX_fbconfig, 
    GLX_SGIX_pbuffer, GLX_SGIX_visual_select_group
GLX extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_allocate_memory, 
    GLX_OML_swap_method, GLX_SGI_make_current_read, GLX_SGIS_multisample, 
    GLX_SGIX_fbconfig, GLX_SGIX_visual_select_group
OpenGL vendor string: Tungsten Graphics, Inc
OpenGL renderer string: Mesa DRI Intel(R) 915GM 20041217 x86/MMX/SSE2
OpenGL version string: 1.3 Mesa 6.3
OpenGL extensions:
    GL_ARB_depth_texture, GL_ARB_fragment_program, GL_ARB_imaging, 
    GL_ARB_multisample, GL_ARB_multitexture, GL_ARB_point_parameters, 
    GL_ARB_shadow, GL_ARB_texture_border_clamp, GL_ARB_texture_compression, 
    GL_ARB_texture_cube_map, GL_ARB_texture_env_add, 
    GL_ARB_texture_env_combine, GL_ARB_texture_env_dot3, 
    GL_ARB_texture_mirrored_repeat, GL_ARB_texture_rectangle, 
    GL_ARB_transpose_matrix, GL_ARB_vertex_buffer_object, 
    GL_ARB_vertex_program, GL_ARB_window_pos, GL_EXT_abgr, GL_EXT_bgra, 
    GL_EXT_blend_color, GL_EXT_blend_equation_separate, 
    GL_EXT_blend_func_separate, GL_EXT_blend_minmax, GL_EXT_blend_subtract, 
    GL_EXT_clip_volume_hint, GL_EXT_cull_vertex, GL_EXT_compiled_vertex_array, 
    GL_EXT_convolution, GL_EXT_copy_texture, GL_EXT_draw_range_elements, 
    GL_EXT_fog_coord, GL_EXT_histogram, GL_EXT_multi_draw_arrays, 
    GL_EXT_packed_pixels, GL_EXT_point_parameters, GL_EXT_polygon_offset, 
    GL_EXT_rescale_normal, GL_EXT_secondary_color, 
    GL_EXT_separate_specular_color, GL_EXT_shadow_funcs, GL_EXT_stencil_wrap, 
    GL_EXT_subtexture, GL_EXT_texture, GL_EXT_texture3D, 
    GL_EXT_texture_edge_clamp, GL_EXT_texture_env_add, 
    GL_EXT_texture_env_combine, GL_EXT_texture_env_dot3, 
    GL_EXT_texture_filter_anisotropic, GL_EXT_texture_lod_bias, 
    GL_EXT_texture_object, GL_EXT_texture_rectangle, GL_EXT_vertex_array, 
    GL_3DFX_texture_compression_FXT1, GL_APPLE_client_storage, 
    GL_APPLE_packed_pixels, GL_ATI_blend_equation_separate, 
    GL_IBM_rasterpos_clip, GL_IBM_texture_mirrored_repeat, 
    GL_INGR_blend_func_separate, GL_MESA_pack_invert, GL_MESA_ycbcr_texture, 
    GL_MESA_window_pos, GL_NV_blend_square, GL_NV_light_max_exponent, 
    GL_NV_texture_rectangle, GL_NV_texgen_reflection, GL_NV_vertex_program, 
    GL_NV_vertex_program1_1, GL_OES_read_format, GL_SGI_color_matrix, 
    GL_SGI_color_table, GL_SGIS_generate_mipmap, GL_SGIS_texture_border_clamp, 
    GL_SGIS_texture_edge_clamp, GL_SGIS_texture_lod, GL_SGIX_depth_texture, 
    GL_SUN_multi_draw_arrays

   visual  x  bf lv rg d st colorbuffer ax dp st accumbuffer  ms  cav
 id dep cl sp sz l  ci b ro  r  g  b  a bf th cl  r  g  b  a ns b eat
----------------------------------------------------------------------
0x22 16 tc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0x23 16 tc  0 16  0 r  .  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0x24 16 tc  0 16  0 r  y  .  5  6  5  0  0 16  8  0  0  0  0  0 0 Slow
0x25 16 tc  0 16  0 r  .  .  5  6  5  0  0 16  8  0  0  0  0  0 0 Slow
0x26 16 tc  0 16  0 r  y  .  5  6  5  0  0 16  0 16 16 16  0  0 0 Slow
0x27 16 tc  0 16  0 r  .  .  5  6  5  0  0 16  0 16 16 16  0  0 0 Slow
0x28 16 tc  0 16  0 r  y  .  5  6  5  0  0 16  8 16 16 16  0  0 0 Slow
0x29 16 tc  0 16  0 r  .  .  5  6  5  0  0 16  8 16 16 16  0  0 0 Slow


Maybe this info will help for developing and testing. And maybe there is someone
out there who can tell me what's up with these libGL warnings. 
Comment 6 Christoph Fritz 2005-05-20 03:16:05 UTC
When I start quake3, the intro isn't shown - there is just a black screen. When
I start a more difficult level with, I think, more OpenGL effects, quake3 crashes.

To me it seems that this issue is caused by the warnings mentioned above:
libGL warning: 3D driver claims to not support visual 0x22
libGL warning: 3D driver claims to not support visual 0x23
libGL warning: 3D driver claims to not support visual 0x26
libGL warning: 3D driver claims to not support visual 0x27
Comment 7 Ian Romanick 2005-05-20 07:27:13 UTC
You can safely ignore the "3D driver claims to not support visual" messages. 
I'll try to clean those up today.
Comment 8 Ian Romanick 2005-05-20 08:09:09 UTC
*** Bug 3351 has been marked as a duplicate of this bug. ***
Comment 9 Sergey Vlasov 2005-05-21 06:35:05 UTC
Created attachment 2731 [details]
errors/oops logs

The same problem observed in ALT Linux Sisyphus with kernel 2.6.11.10 and DRI
CVS snapshot from 2005-05-13.  Kernel message log is attached.
Comment 10 Sergey Vlasov 2005-05-21 08:05:03 UTC
Created attachment 2732 [details] [review]
CVS change which seems to introduce the problem

Seems that the problem was introduced by this CVS commit:

date: 2005-04-26 05:19:11 +0000;  author: anholt
Convert BSD code to mostly use bus_dma, the dma abstraction for dealing with
IOMMUs and such.  There is one usage of the forbidden vtophys() left in
drm_scatter.c which will be fixed up soon.  This required a KPI change for
drm_pci_alloc/free() to return/use a drm_dma_handle_t that keeps track of
os-specific bits, rather than just passing around the vaddr/busaddr/size.

Submitted by:	Tonnerre Lombard (partially)
Tested on:	FreeBSD: Rage128 AGP/PCI
		Linux:	Savage4 AGP/PCI

	/cvs/dri/drm/bsd-core/ati_pcigart.c	1.10	
	/cvs/dri/drm/bsd-core/drmP.h		1.56	
	/cvs/dri/drm/bsd-core/drm_bufs.c	1.35	
	/cvs/dri/drm/bsd-core/drm_dma.c 	1.35	
	/cvs/dri/drm/bsd-core/drm_pci.c 	1.6	
	/cvs/dri/drm/bsd-core/drm_scatter.c	1.11	
	/cvs/dri/drm/linux-core/drmP.h		1.142	
	/cvs/dri/drm/linux-core/drm_bufs.c	1.55	
	/cvs/dri/drm/linux-core/drm_drv.c	1.112	
	/cvs/dri/drm/linux-core/drm_pci.c	1.8	
	/cvs/dri/drm/linux-core/drm_vm.c	1.51	
	/cvs/dri/drm/shared-core/i915_dma.c	1.20	
	/cvs/dri/drm/shared-core/i915_drv.h	1.11	
	/cvs/dri/drm/shared-core/mach64_dma.c	1.11	
	/cvs/dri/drm/shared-core/mach64_drv.h	1.7	

After reverting that patch (attached) X.org shutdown completes normally - no
error messages or oopses.
Comment 11 Sergey Vlasov 2005-05-21 10:36:32 UTC
Created attachment 2733 [details] [review]
patch to fix drm_pci_alloc() so that subsequent drm_pci_free() works

Seems that the problem is that the drm_pci_alloc() implementation for Linux
forgot to set the dmah->size field in the new drm_dma_handle_t structure. 
Because of this, subsequent drm_pci_free() passed bogus size to
pci_free_consistent(), which was leading to lots of "Bad page state" messages,
and subsequently to massive memory corruption resulting in a hang.

Most drivers do not use drm_pci_alloc() and drm_pci_free(), therefore they were
not hit by this problem.  Of course, the DRM core for Linux uses
drm_pci_free(), but it does not pass the drm_dma_handle_t structure from
drm_pci_alloc() to it - instead it fills that structure directly, setting
->size correctly and thus avoiding the bug.
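
The failure mode can be reproduced in miniature in user space. The following is
only a sketch under stated assumptions, not the DRM code or the attached patch:
mock_dma_handle_t, mock_alloc() and mock_free() are hypothetical stand-ins,
malloc()/free() take the place of the PCI consistent memory routines, and the
0xA5 fill merely simulates whatever stale data an unset field might contain.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a handle that records address and size. */
typedef struct mock_dma_handle {
    void  *vaddr;
    size_t size;
} mock_dma_handle_t;

/*
 * Allocate a buffer plus its handle.  With set_size == 0 the size field
 * is left holding poison bytes, modelling an allocator that forgets to
 * record the size; with set_size != 0 the field is filled in.
 */
mock_dma_handle_t *mock_alloc(size_t size, int set_size)
{
    mock_dma_handle_t *dmah = malloc(sizeof(*dmah));
    if (dmah == NULL)
        return NULL;
    memset(dmah, 0xA5, sizeof(*dmah));  /* simulated stale heap contents */
    dmah->vaddr = malloc(size);
    if (dmah->vaddr == NULL) {
        free(dmah);
        return NULL;
    }
    if (set_size)
        dmah->size = size;              /* the assignment the fix adds */
    return dmah;
}

/*
 * Release buffer and handle; return the size that would have been
 * passed down to the underlying "free N bytes" routine.
 */
size_t mock_free(mock_dma_handle_t *dmah)
{
    size_t size_passed_down = dmah->size;
    free(dmah->vaddr);
    free(dmah);
    return size_passed_down;
}
```

With mock_alloc(4096, 1), mock_free() reports 4096. With mock_alloc(4096, 0) it
reports a garbage value, which is the analogue of drm_pci_free() handing a bogus
size to pci_free_consistent(). In the kernel the consequence is worse than a
wrong number, because the page allocator acts on that size.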

This patch should fix the problem; unfortunately, I cannot get it tested on the
problematic hardware until Monday.
Comment 12 Sergey Vlasov 2005-05-21 10:56:21 UTC
Created attachment 2734 [details] [review]
patch to fix memory leaks in the PCI consistent memory handling code

Apparently the patch mentioned in comment #10, in addition to breaking the i915
driver on Linux, also introduced a memory leak into the PCI consistent memory
handling code for Linux.  There are two problems:

1. drm_pci_alloc() allocates new drm_dma_handle_t structure, but drm_pci_free()
does not free it - this leads to a memory leak in drivers which use
drm_pci_alloc() and drm_pci_free().  (The BSD implementation of drm_pci_free()
_does_ free the drm_dma_handle_t structure passed to it.)

2. drm_addmap() calls drm_pci_alloc(), but does not save the pointer to the
created drm_dma_handle_t structure - instead, it takes the virtual and bus
addresses from it and stores just them; the drm_dma_handle_t structure is
leaked.

The attached patch attempts to fix these problems.

The solution to the first problem is obvious: drm_pci_free() must free the
structure passed to it, so that the Linux and BSD implementation behave in the
same way.  The second problem, however, is worse - the code has calls like

			drm_pci_free(dev, &dmah);

which would obviously break if drm_pci_free() freed its argument.

The real problem is that the Linux DRM core does not have an OS-specific
drm_local_map_t - therefore there is no place to store the drm_dma_handle_t
pointer from drm_pci_alloc() to pass it to drm_pci_free() later.  Introducing
real drm_local_map_t to the Linux DRM core will need major changes, therefore
my patch uses a somewhat hackish approach:

1) drm_addmap() just calls kfree(dmah) on the drm_dma_handle_t pointer from
drm_pci_alloc() to avoid leaking that memory.

2) A new function - __drm_pci_free() - is introduced; this function works just
like drm_pci_free(), but does not free the drm_dma_handle_t structure passed to
it.  This function is used instead of drm_pci_free() in places where the
drm_dma_handle_t structure is constructed on the stack.  Only the DRM core has
such code, therefore __drm_pci_free() does not need to be exported.
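
The split between the two free functions can be modelled in user space. This is
a sketch of the idea only, not the attached patch: every name below is
hypothetical, malloc()/free() stand in for the kernel allocators, and the
live_handles counter exists purely to make a leaked handle visible.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a handle that records address and size. */
typedef struct mock_dma_handle {
    void  *vaddr;
    size_t size;
} mock_dma_handle_t;

/* Number of heap-allocated handle structures not yet released. */
int live_handles;

/* Allocate a buffer and a heap handle describing it. */
mock_dma_handle_t *mock_pci_alloc(size_t size)
{
    mock_dma_handle_t *dmah = malloc(sizeof(*dmah));
    if (dmah == NULL)
        return NULL;
    dmah->vaddr = malloc(size);
    if (dmah->vaddr == NULL) {
        free(dmah);
        return NULL;
    }
    dmah->size = size;
    live_handles++;
    return dmah;
}

/* Release only the buffer; safe for a handle built on the stack. */
void mock_pci_free_buffer(mock_dma_handle_t *dmah)
{
    free(dmah->vaddr);
    dmah->vaddr = NULL;
}

/* Release the buffer and the heap handle that describes it. */
void mock_pci_free(mock_dma_handle_t *dmah)
{
    mock_pci_free_buffer(dmah);
    free(dmah);
    live_handles--;
}

/* Drop only the handle, keeping the buffer (the addmap-style case). */
void mock_discard_handle(mock_dma_handle_t *dmah)
{
    free(dmah);
    live_handles--;
}
```

A driver-style caller pairs mock_pci_alloc() with mock_pci_free() and ends with
live_handles back at zero. A caller that keeps only the addresses copies them
out, calls mock_discard_handle(), and later rebuilds a handle on the stack for
mock_pci_free_buffer(); passing that stack handle to mock_pci_free() instead
would call free() on memory that never came from malloc().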

This patch is only compile tested now, like the previous one.
Comment 13 Bryan Stine 2005-05-21 23:39:00 UTC
Initial testing (aka starting up a session and then stopping it) of your  
patches shows success for me. I don't see any unusual memory leaks or any such  
stuff. 
Comment 14 Martin Stachon 2005-05-23 04:25:58 UTC
The first patch fixed the bug for me.
Comment 15 Eric Anholt 2005-05-28 13:38:10 UTC
Committed.  Thanks for cleaning up after my mess :/
