Bug 35758 - [965Q] repeated EDID checksum errors - screen blanks randomly - kernel oops
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
Reported: 2011-03-29 01:02 UTC by joschibrauchle
Modified: 2016-10-07 10:02 UTC

Attachments
Xorg log (24.53 KB, text/x-log)
2011-03-29 01:03 UTC, joschibrauchle

Description joschibrauchle 2011-03-29 01:02:15 UTC
While working at the machine, I get the following message in /var/log/messages roughly every 22.5 minutes. The remainder and the raw EDID block change every time:
-----
Mar 28 18:06:04 test114 kernel: [23757.099468] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 179
Mar 28 18:06:04 test114 kernel: [23757.099470] [drm:drm_edid_block_valid] *ERROR* Raw EDID:
Mar 28 18:06:04 test114 kernel: [23757.099473] <3>00 ff ff ff ff ff ff 00 1a b3 52 05 2c d0 07 00  ..........R.,...
Mar 28 18:06:04 test114 kernel: [23757.099475] <3>1b 11 01 03 80 26 1e 78 2a ee 95 a3 54 4c 97 ff  .....&.x*...TL..
Mar 28 18:06:04 test114 kernel: [23757.099477] <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
Mar 28 18:06:04 test114 kernel: [23757.099479] <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
Mar 28 18:06:04 test114 kernel: [23757.099480] <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
Mar 28 18:06:04 test114 kernel: [23757.099482] <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
Mar 28 18:06:04 test114 kernel: [23757.099484] <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
Mar 28 18:06:04 test114 kernel: [23757.099486] <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
-----
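
For reference, the "remainder" in this message is the sum of all 128 bytes of the EDID block modulo 256; a valid block sums to 0. The following is a minimal Python sketch, not kernel code, that recomputes the value from the dump above. The helper name `edid_remainder` and the regular expression are made up for illustration, and the syslog prefix is trimmed from the sample rows.

```python
import re

# One dump row: the "<3>" priority marker followed by 16 space-separated hex bytes.
HEX_ROW = re.compile(r"<3>((?:[0-9a-f]{2} ){15}[0-9a-f]{2})")

def edid_remainder(log_text: str) -> int:
    """Sum every dumped byte modulo 256; 0 would mean the checksum is valid."""
    block = [int(b, 16) for row in HEX_ROW.findall(log_text) for b in row.split()]
    if len(block) != 128:
        raise ValueError(f"expected 128 bytes, found {len(block)}")
    return sum(block) % 256

# First dump from this report: two rows of real data, then six rows of ff.
dump = "\n".join(
    [
        "<3>00 ff ff ff ff ff ff 00 1a b3 52 05 2c d0 07 00  ..........R.,...",
        "<3>1b 11 01 03 80 26 1e 78 2a ee 95 a3 54 4c 97 ff  .....&.x*...TL..",
    ]
    + ["<3>" + " ".join(["ff"] * 16)] * 6
)

print(edid_remainder(dump))  # 179, matching the kernel message
```

The long run of ff bytes after the first 30 bytes suggests the transfer stopped delivering data partway through the block, which would be consistent with the I2C errors that follow.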

This is usually followed by several of these:
-----
Mar 28 18:06:04 test114 kernel: [23757.164114] i2c i2c-10: sendbytes: NAK bailout.
Mar 28 18:06:14 test114 kernel: [23767.188782] i2c i2c-10: sendbytes: NAK bailout.
Mar 28 18:06:14 test114 kernel: [23767.198106] i2c i2c-10: sendbytes: NAK bailout.
-----

Sometimes the monitor (or, in a multi-head setup, one of the monitors) blanks and never comes back.

Today I also saw the following kernel oops:
-----
Mar 28 18:06:24 test114 kernel: [23777.387046] ------------[ cut here ]------------
Mar 28 18:06:24 test114 kernel: [23777.388002] kernel BUG at /usr/src/packages/BUILD/kernel-desktop-2.6.37.1/linux-2.6.37/drivers/gpu/drm/i915/i915_gem.c:4201!
Mar 28 18:06:24 test114 kernel: [23777.388002] invalid opcode: 0000 [#1] PREEMPT SMP 
Mar 28 18:06:24 test114 kernel: [23777.388002] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-VGA-1/status
Mar 28 18:06:24 test114 kernel: [23777.388002] CPU 1 
Mar 28 18:06:24 test114 kernel: [23777.388002] Modules linked in: fuse md5 des_generic cbc vboxnetadp vboxnetflt vboxdrv autofs4 snd_pcm_oss snd_mixer_oss edd rpcsec_gss_krb5 nfs lockd fscache nfs_acl auth_rpcgss sunrpc af_packet cpufreq_conservative microcode cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf tg3 floppy container sg sr_mod cdrom iTCO_wdt iTCO_vendor_support fschmd i2c_i801 serio_raw pcspkr snd_hda_codec_realtek tpm_tis tpm snd_hda_intel snd_hda_codec tpm_bios snd_hwdep snd_pcm snd_timer snd shpchp pci_hotplug soundcore snd_page_alloc ext4 jbd2 crc16 linear i915 drm_kms_helper drm i2c_algo_bit button video dm_snapshot dm_mod fan processor thermal thermal_sys
Mar 28 18:06:24 test114 kernel: [23777.388002] 
Mar 28 18:06:24 test114 kernel: [23777.388002] Pid: 27, comm: kworker/1:1 Not tainted 2.6.37.1-1.2-desktop #1 FUJITSU SIEMENS CELSIUS W                     /D2317-A2
Mar 28 18:06:24 test114 kernel: [23777.388002] RIP: 0010:[<ffffffffa00f77c7>]  [<ffffffffa00f77c7>] i915_gem_object_pin+0x187/0x1b0 [i915]
Mar 28 18:06:24 test114 kernel: [23777.388002] RSP: 0018:ffff88007a2558a0  EFLAGS: 00010246
Mar 28 18:06:24 test114 kernel: [23777.388002] RAX: ffff880077c54000 RBX: ffff880037c83c00 RCX: 0000000000000000
Mar 28 18:06:24 test114 kernel: [23777.388002] RDX: 0000000000000000 RSI: 0000000000020000 RDI: ffff880037c83c00
Mar 28 18:06:24 test114 kernel: [23777.388002] RBP: 0000000000020000 R08: ffff88007a254000 R09: 00000000000f73aa
Mar 28 18:06:24 test114 kernel: [23777.388002] R10: 0000000000000001 R11: 00000000ffffffff R12: ffff88003742c000
Mar 28 18:06:24 test114 kernel: [23777.388002] R13: 000000000003c47c R14: 000000000003c000 R15: 0000000000000000
Mar 28 18:06:24 test114 kernel: [23777.388002] FS:  0000000000000000(0000) GS:ffff88007f500000(0000) knlGS:0000000000000000
Mar 28 18:06:24 test114 kernel: [23777.388002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 28 18:06:24 test114 kernel: [23777.388002] CR2: 00007fea72395c60 CR3: 000000007973b000 CR4: 00000000000006e0
Mar 28 18:06:24 test114 kernel: [23777.388002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 28 18:06:24 test114 kernel: [23777.388002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 28 18:06:24 test114 kernel: [23777.388002] Process kworker/1:1 (pid: 27, threadinfo ffff88007a254000, task ffff88007a252700)
Mar 28 18:06:24 test114 kernel: [23777.388002] Stack:
Mar 28 18:06:24 test114 kernel: [23777.388002]  ffff880037c83c00 0000000000000000 0000000000000000 0000000000000000
Mar 28 18:06:24 test114 kernel: [23777.388002]  0000000000000000 ffffffffa0104f98 0000000000000000 dead000000200200
Mar 28 18:06:24 test114 kernel: [23777.388002]  0000000101663c8b ffff88007795e800 ffff880077c54000 ffffffffa0105121
Mar 28 18:06:24 test114 kernel: [23777.388002] Call Trace:
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa0104f98>] intel_pin_and_fence_fb_obj+0x48/0x100 [i915]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa0105121>] intel_pipe_set_base+0xd1/0x290 [i915]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa0109028>] intel_crtc_mode_set+0x938/0x1e70 [i915]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa00d22cd>] drm_crtc_helper_set_mode+0x13d/0x3c0 [drm_kms_helper]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa00d320d>] drm_crtc_helper_set_config+0x83d/0xa00 [drm_kms_helper]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa00d1141>] drm_fb_helper_set_par+0x71/0xe0 [drm_kms_helper]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa00d130e>] drm_fb_helper_single_fb_probe+0x15e/0x2e0 [drm_kms_helper]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa00d10a2>] drm_fb_helper_hotplug_event+0xf2/0x120 [drm_kms_helper]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffffa00d2146>] output_poll_execute+0x1a6/0x1b0 [drm_kms_helper]
Mar 28 18:06:24 test114 kernel: [23777.388002]  [<ffffffff81074630>] process_one_work+0x110/0x490
Mar 28 18:06:24 test114 kernel: [23777.455024]  [<ffffffff81075345>] worker_thread+0x165/0x340
Mar 28 18:06:24 test114 kernel: [23777.455024]  [<ffffffff81079956>] kthread+0x96/0xa0
Mar 28 18:06:24 test114 kernel: [23777.455024]  [<ffffffff81003d74>] kernel_thread_helper+0x4/0x10
Mar 28 18:06:24 test114 kernel: [23777.455024] Code: 00 00 00 00 e8 2b a2 ff ff 89 c5 e9 f4 fe ff ff 0f 1f 40 00 89 ee 48 89 df e8 66 ba ff ff 85 c0 0f 84 0e ff ff ff e9 42 ff ff ff <0f> 0b 41 89 e8 48 c7 c2 e0 f7 12 a0 be 72 10 00 00 48 c7 c7 b0 
Mar 28 18:06:24 test114 kernel: [23777.455024] RIP  [<ffffffffa00f77c7>] i915_gem_object_pin+0x187/0x1b0 [i915]
Mar 28 18:06:24 test114 kernel: [23777.455024]  RSP <ffff88007a2558a0>
-----

Here are some further details:
-- chipset: 965Q
-- system architecture: x86_64
-- intel=2.14.0 xserver=1.9.3 mesa=7.10 libdrm=2.4.23-9.1
-- kernel version: 2.6.37.1-1.2-desktop
-- Linux distribution: openSUSE 11.4
-- Machine or mobo model: Fujitsu-Siemens Celsius W350
-- Display connector: DMS-59 to dual DVI connector

To debug the problem, I set drm.debug=0x06 and captured the following from the logs:
------
Mar 29 09:53:41 test114 kernel: [52202.024240] [drm:intel_sdvo_debug_write], SDVOB: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:41 test114 kernel: [52202.028840] [drm:intel_sdvo_read_response], SDVOB: R: (Success) 01 00
Mar 29 09:53:41 test114 kernel: [52202.032242] [drm:intel_sdvo_detect], SDVO response 1 0 [1]
Mar 29 09:53:41 test114 kernel: [52202.032247] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:41 test114 kernel: [52202.038037] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:41 test114 kernel: [52202.086011] [drm:intel_crt_detect], CRT not detected via hotplug
Mar 29 09:53:41 test114 kernel: [52202.086014] [drm:output_poll_execute], [CONNECTOR:5:VGA-1] status updated from 2 to 2
Mar 29 09:53:41 test114 kernel: [52202.086017] [drm:intel_sdvo_debug_write], SDVOB: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:41 test114 kernel: [52202.088083] [drm:intel_sdvo_write_cmd], I2c transfer returned -6
Mar 29 09:53:41 test114 kernel: [52202.088086] [drm:output_poll_execute], [CONNECTOR:8:DVI-D-1] status updated from 1 to 3
Mar 29 09:53:41 test114 kernel: [52202.088089] [drm:intel_sdvo_debug_write], SDVOC: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:41 test114 kernel: [52202.090152] [drm:intel_sdvo_write_cmd], I2c transfer returned -6
Mar 29 09:53:41 test114 kernel: [52202.090155] [drm:output_poll_execute], [CONNECTOR:10:DVI-D-2] status updated from 2 to 3
Mar 29 09:53:41 test114 kernel: [52202.092102] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 92
Mar 29 09:53:41 test114 kernel: [52202.092104] [drm:drm_edid_block_valid] *ERROR* Raw EDID:
Mar 29 09:53:41 test114 kernel: [52202.092107] <3>00 ff ff ff ff ff ff 00 1a b3 52 05 2c d0 07 00  ..........R.,...
Mar 29 09:53:41 test114 kernel: [52202.092109] <3>1b 11 01 03 80 26 1e 78 2a ee 95 a3 54 4c 99 26  .....&.x*...TL.&
Mar 29 09:53:41 test114 kernel: [52202.092111] <3>0f 50 54 a5 4b 00 81 80 01 01 01 01 01 01 01 01  .PT.K...........
Mar 29 09:53:41 test114 kernel: [52202.092113] <3>01 01 01 01 01 01 30 2a 00 98 51 00 2a 40 30 70  ......0*..Q.*@0p
Mar 29 09:53:41 test114 kernel: [52202.092115] <3>13 00 78 2d 11 00 00 1e 00 00 00 fd 00 38 4c 1e  ..x-.........8L.
Mar 29 09:53:41 test114 kernel: [52202.092117] <3>52 0e 00 0a 20 20 20 20 20 20 00 00 00 fc 00 50  R...      .....P
Mar 29 09:53:41 test114 kernel: [52202.092118] <3>31 39 2d 32 0a 20 20 20 20 20 20 20 00 00 00 ff  19-2.       ....
Mar 29 09:53:41 test114 kernel: [52202.092120] <3>01 80 ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
Mar 29 09:53:41 test114 kernel: [52202.092122] 
Mar 29 09:53:41 test114 kernel: [52202.092123] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:41 test114 kernel: [52202.146071] [drm:intel_sdvo_debug_write], SDVOC: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:41 test114 kernel: [52202.150644] [drm:intel_sdvo_read_response], SDVOC: R: (Success) 00 00
Mar 29 09:53:41 test114 kernel: [52202.154019] [drm:intel_sdvo_detect], SDVO response 0 0 [1]
Mar 29 09:53:41 test114 kernel: [52202.160037] [drm:intel_crt_detect], CRT not detected via hotplug
Mar 29 09:53:41 test114 kernel: [52202.160436] [drm:intel_sdvo_debug_write], SDVOB: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:41 test114 kernel: [52202.165024] [drm:intel_sdvo_read_response], SDVOB: R: (Success) 01 00
Mar 29 09:53:41 test114 kernel: [52202.168397] [drm:intel_sdvo_detect], SDVO response 1 0 [1]
Mar 29 09:53:41 test114 kernel: [52202.168400] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:41 test114 kernel: [52202.174171] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:41 test114 kernel: [52202.228104] [drm:intel_sdvo_debug_write], SDVOC: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:41 test114 kernel: [52202.232676] [drm:intel_sdvo_read_response], SDVOC: R: (Success) 00 00
Mar 29 09:53:41 test114 kernel: [52202.236050] [drm:intel_sdvo_detect], SDVO response 0 0 [1]
Mar 29 09:53:41 test114 kernel: [52202.242037] [drm:intel_crt_detect], CRT not detected via hotplug
Mar 29 09:53:51 test114 kernel: [52212.134016] [drm:intel_crt_detect], CRT not detected via hotplug
Mar 29 09:53:51 test114 kernel: [52212.134022] [drm:output_poll_execute], [CONNECTOR:5:VGA-1] status updated from 2 to 2
Mar 29 09:53:51 test114 kernel: [52212.134028] [drm:intel_sdvo_debug_write], SDVOB: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:51 test114 kernel: [52212.138627] [drm:intel_sdvo_read_response], SDVOB: R: (Success) 01 00
Mar 29 09:53:51 test114 kernel: [52212.142017] [drm:intel_sdvo_detect], SDVO response 1 0 [1]
Mar 29 09:53:51 test114 kernel: [52212.142022] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:51 test114 kernel: [52212.147821] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:51 test114 kernel: [52212.201919] [drm:output_poll_execute], [CONNECTOR:8:DVI-D-1] status updated from 3 to 1
Mar 29 09:53:51 test114 kernel: [52212.201924] [drm:intel_sdvo_debug_write], SDVOC: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:51 test114 kernel: [52212.206513] [drm:intel_sdvo_read_response], SDVOC: R: (Success) 00 00
Mar 29 09:53:51 test114 kernel: [52212.209905] [drm:intel_sdvo_detect], SDVO response 0 0 [1]
Mar 29 09:53:51 test114 kernel: [52212.209909] [drm:output_poll_execute], [CONNECTOR:10:DVI-D-2] status updated from 3 to 2
Mar 29 09:53:51 test114 kernel: [52212.211953] [drm:intel_sdvo_debug_write], SDVOB: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:51 test114 kernel: [52212.216525] [drm:intel_sdvo_read_response], SDVOB: R: (Success) 01 00
Mar 29 09:53:51 test114 kernel: [52212.219900] [drm:intel_sdvo_detect], SDVO response 1 0 [1]
Mar 29 09:53:51 test114 kernel: [52212.219903] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:51 test114 kernel: [52212.225675] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 02                      (SDVO_CMD_SET_CONTROL_BUS_SWITCH)
Mar 29 09:53:51 test114 kernel: [52212.279601] [drm:intel_sdvo_debug_write], SDVOC: W: 0B                         (SDVO_CMD_GET_ATTACHED_DISPLAYS)
Mar 29 09:53:51 test114 kernel: [52212.284181] [drm:intel_sdvo_read_response], SDVOC: R: (Success) 00 00
Mar 29 09:53:51 test114 kernel: [52212.287553] [drm:intel_sdvo_detect], SDVO response 0 0 [1]
Mar 29 09:53:51 test114 kernel: [52212.293038] [drm:intel_crt_detect], CRT not detected via hotplug
------
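
To make the "status updated from X to Y" lines above easier to read: the numbers are values of the kernel's `enum drm_connector_status` (1 = connected, 2 = disconnected, 3 = unknown). The following is a small Python sketch that decodes them; the `decode` helper and its regular expression are illustrative only and not part of any tool.

```python
import re

# Values of enum drm_connector_status in the kernel's DRM headers.
STATUS = {1: "connected", 2: "disconnected", 3: "unknown"}

# Matches e.g. "[CONNECTOR:8:DVI-D-1] status updated from 1 to 3".
TRANSITION = re.compile(r"\[CONNECTOR:\d+:([\w-]+)\] status updated from (\d) to (\d)")

def decode(line: str):
    """Return a readable transition string, or None if the line does not match."""
    m = TRANSITION.search(line)
    if m is None:
        return None
    name, old, new = m.groups()
    return f"{name}: {STATUS[int(old)]} -> {STATUS[int(new)]}"

print(decode("[drm:output_poll_execute], [CONNECTOR:8:DVI-D-1] status updated from 1 to 3"))
# DVI-D-1: connected -> unknown
```

Read this way, the log shows DVI-D-1 going from connected to unknown right after the I2C transfer returns -6 (ENXIO in Linux errno numbering), and back to connected on the next poll ten seconds later.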
Comment 1 joschibrauchle 2011-03-29 01:03:01 UTC
Created attachment 44984
Xorg log
Comment 2 Chris Wilson 2011-03-29 01:22:44 UTC
Both issues (the oops during hotplug polling and the EDID read failures) should have been addressed in 2.6.38+:

commit 007c80a5497a3f9c8393960ec6e6efd30955dcb1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 15 11:40:00 2011 +0000

    drm: Hold the mode mutex whilst probing for sysfs status
    
    As detect will use hw registers and may modify structures, it needs to be
    serialised by use of the dev->mode_config.mutex. Make it so.
    
    Otherwise, we may cause random crashes as the sysfs file is queried
    whilst a concurrent hotplug poll is being run.

and

commit 4819d2e4310796c4e9eef674499af9b9caf36b5a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 15 11:04:41 2011 +0000

    drm: Retry i2c transfer of EDID block after failure
    
    Usually EDID retrieval is fine. However, sometimes, especially when the
    machine is loaded, it fails, but succeeds after a few retries.
    
    Based on a patch by Michael Buesch.
    
    Reported-by: Michael Buesch <mb@bu3sch.de>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
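
The idea behind the second commit can be sketched roughly as follows. This is an illustrative Python model, not the kernel patch: `read_block` stands in for whatever performs the I2C transfer, and the attempt count of 5 is an assumption chosen for the example.

```python
from typing import Callable, Optional

EDID_BLOCK_SIZE = 128

def block_is_valid(block: bytes) -> bool:
    """A block is valid when it is 128 bytes long and its bytes sum to 0 mod 256."""
    return len(block) == EDID_BLOCK_SIZE and sum(block) % 256 == 0

def read_edid_with_retry(
    read_block: Callable[[], Optional[bytes]],  # hypothetical transfer function
    attempts: int = 5,                          # assumed value, for illustration
) -> Optional[bytes]:
    """Retry the transfer until a block with a valid checksum arrives."""
    for _ in range(attempts):
        try:
            block = read_block()
        except OSError:
            continue  # transfer error: try again
        if block is not None and block_is_valid(block):
            return block
    return None  # every attempt failed

# Simulated bus: no data, then a corrupt all-ff block, then a good block.
payload = bytes(range(127))
good = payload + bytes([(-sum(payload)) % 256])
replies = iter([None, b"\xff" * 128, good])
result = read_edid_with_retry(lambda: next(replies))
print(result == good)  # True
```

With a retry in place, a single failed transfer no longer has to surface as a checksum error or a spurious status change.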
Comment 3 Jari Tahvanainen 2016-10-07 10:02:59 UTC
Closing a really old resolved bug.

