Bug 101460 - [GLK] general protection fault in i915_gem_object_info
Summary: [GLK] general protection fault in i915_gem_object_info
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-16 00:35 UTC by Abhay Kumar
Modified: 2017-06-30 21:35 UTC
CC List: 1 user

See Also:
i915 platform: GLK
i915 features: GEM/Other


Attachments
crash log. (32.08 KB, text/plain)
2017-06-16 00:35 UTC, Abhay Kumar
dmesg.log (168.87 KB, text/plain)
2017-06-27 18:39 UTC, Humberto Israel Perez Rodriguez

Description Abhay Kumar 2017-06-16 00:35:10 UTC
Created attachment 131990
crash log.

[    6.992374] Bluetooth: L2CAP socket layer initialized
[    6.992386] Bluetooth: SCO socket layer initialized
[   22.000173] general protection fault: 0000 [#1] PREEMPT SMP
[   22.000181] Modules linked in: uinput acpi_als kfifo_buf industrialio bluetooth ecdh_generic lzo zram fuse cfg80211(O) compat(O) ip6table_filter asix usbnet mii
[   22.000202] CPU: 1 PID: 733 Comm: chrome Tainted: G     U     O    4.12.0-rc4-cros-be-ga02eede86890-dirty #1
[   22.000205] Hardware name: Intel glkrvp/glkrvp, BIOS Intel_glkrvp.9623.0.2017_06_09_1457 06/09/2017
[   22.000208] task: ffffa27fb7ac4100 task.stack: ffffae7001044000
[   22.000216] RIP: 0010:per_file_stats+0x6a/0xc3
[   22.000219] RSP: 0018:ffffae7001047c88 EFLAGS: 00010287
[   22.000222] RAX: deacffffffffff20 RBX: ffffa27fa5a51e60 RCX: dead000000000100
[   22.000224] RDX: ffffae7001047d18 RSI: ffffa27fb79317d0 RDI: ffffa27fb7b71500
[   22.000226] RBP: ffffae7001047c88 R08: ffffa27fb7b713a8 R09: 0000000000000000
[   22.000229] R10: ffffa27fba14c4e0 R11: 00000000000000c0 R12: ffffa27fb66d8c38
[   22.000231] R13: ffffffff8ebf1f08 R14: ffffae7001047d18 R15: ffffa27fba148108
[   22.000234] FS:  00007f5251d8c780(0000) GS:ffffa27fbfc80000(0000) knlGS:0000000000000000
[   22.000236] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.000239] CR2: 00002eb770bd7000 CR3: 000000017794e000 CR4: 00000000003406e0
[   22.000241] Call Trace:
[   22.000248]  idr_for_each+0x4a/0xd1
[   22.000252]  i915_gem_object_info+0x28c/0x36e
[   22.000258]  seq_read+0x1a9/0x38d
[   22.000264]  full_proxy_read+0x5c/0x8b
[   22.000269]  __vfs_read+0x35/0xc0
[   22.000273]  ? fsnotify_perm+0x64/0x6f
[   22.000276]  ? security_file_permission+0x3b/0x42
[   22.000280]  vfs_read+0xa9/0xc5
[   22.000283]  SyS_read+0x5f/0xa3
[   22.000289]  entry_SYSCALL_64_fastpath+0x13/0x94
[   22.000292] RIP: 0033:0x7f52521ddf4d
[   22.000295] RSP: 002b:00007ffdcf82d400 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[   22.000298] RAX: ffffffffffffffda RBX: 00002eb76f1bcd80 RCX: 00007f52521ddf4d
[   22.000300] RDX: 0000000000010000 RSI: 00002eb770bd7000 RDI: 0000000000000072
[   22.000302] RBP: 00007ffdcf82d440 R08: 00007f5251d8c780 R09: 00002eb770bd7000
[   22.000304] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[   22.000306] R13: 0000000000000000 R14: 0000000000000463 R15: ffffffffffffffff
[   22.000309] Code: 48 8b 86 d8 00 00 00 48 01 42 28 48 8b 86 10 01 00 00 48 81 c6 10 01 00 00 48 2d e0 01 00 00 48 8d 88 e0 01 00 00 48 39 f1 74 55 <f6> 80 98 00 00 00 01 74 3d f6 80 e1 00 00 00 01 74 0a 48 8b 48 
[   22.000349] RIP: per_file_stats+0x6a/0xc3 RSP: ffffae7001047c88
[   22.000352] ---[ end trace 9d32ae44854cdd18 ]---
[   22.003475] Kernel panic - not syncing: Fatal exception
[   22.003499] Kernel Offset: 0xd800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   22.006554] ACPI MEMORY or I/O RESET_REG.
Comment 1 Abhay Kumar 2017-06-16 04:40:13 UTC
This happens after 500 cycles of cold reboot.
Comment 2 Elizabeth 2017-06-16 22:02:08 UTC
Hello Abhay, could you please attach dmesg with parameter drm.debug=0xe and kern.log? Is the problem 100% reproducible? Could you, if possible, add more information about software and hardware environment? Thank you.
Comment 3 Chris Wilson 2017-06-16 22:04:54 UTC
(In reply to elizabethx.de.la.torre.mena from comment #2)
> Hello Abhay, could you please attach dmesg with parameter drm.debug=0xe and
> kern.log? Is the problem 100% reproducible? Could you, if possible, add more
> information about software and hardware environment? Thank you.

There's no need. The oops is completely sufficient.
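As a reading aid for why the register dump alone can be enough: RCX holds 0xdead000000000100, which is the x86-64 value of the kernel's LIST_POISON1 (written into an entry's next pointer by list_del()), and RAX is that value minus 0x1e0, matching the "sub $0x1e0,%rax" bytes (48 2d e0 01 00 00) in the Code: line. The sketch below only redoes that arithmetic in userspace. The helper names are hypothetical, and reading 0x1e0 as the offset of the list link inside the containing structure is an assumption about this particular build, not something stated in the report.

```c
#include <stdint.h>

/* x86-64 value of LIST_POISON1: 0x100 plus the default
 * CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000. */
#define LIST_POISON1_X86_64 UINT64_C(0xdead000000000100)

/* Offset subtracted in the oops "Code:" bytes. Treating it as the
 * offset of the list link inside its container is an assumption. */
#define LINK_OFFSET UINT64_C(0x1e0)

/* Same arithmetic as container_of(): link address minus member offset.
 * Hypothetical helper, for illustration only. */
static uint64_t container_from_link(uint64_t link, uint64_t offset)
{
    return link - offset;
}

/* With 4-level paging, bits 63..47 of a canonical address are all
 * equal. Dereferencing a non-canonical address raises a general
 * protection fault instead of a page fault. */
static int is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == UINT64_C(0x1ffff);
}
```

Feeding LIST_POISON1_X86_64 and LINK_OFFSET to container_from_link() gives the RAX value from the dump, and is_canonical() rejects it. That is consistent with the faulting instruction (f6 80 98 00 00 00 01, a byte test at 0x98(%rax)) touching an entry that had already been unlinked from its list.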
Comment 4 Abhay Kumar 2017-06-16 22:07:30 UTC
We had a similar crash a long time ago: https://bugs.freedesktop.org/show_bug.cgi?id=81712

Here it looks like we are locking up in the file permission check.
Comment 5 Chris Wilson 2017-06-21 09:48:44 UTC
commit 0caf81b5c53d9bd332a95dbcb44db8de0b397a7c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 17 12:57:44 2017 +0100

    drm/i915: Hold struct_mutex for per-file stats in debugfs/i915_gem_object
    
    As we walk the obj->vma_list in per_file_stats(), we need to hold
    struct_mutex to prevent alteration of that list.
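The rule stated in the commit message (walk the list only while holding the lock that writers take) can be sketched in userspace as follows. This is not the actual patch: pthread_mutex_t stands in for struct_mutex, a singly linked list stands in for obj->vma_list, and every type and function name here is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry on obj->vma_list. */
struct vma_node {
    struct vma_node *next;
    int bound;
};

/* Hypothetical stand-in for the object owning the list. */
struct object {
    pthread_mutex_t lock;   /* plays the role of struct_mutex */
    struct vma_node *vmas;  /* plays the role of obj->vma_list */
};

/* Reader: count bound entries. The lock is held for the whole walk,
 * so no entry can be unlinked while we are looking at it. */
static size_t count_bound(struct object *obj)
{
    size_t n = 0;

    pthread_mutex_lock(&obj->lock);
    for (struct vma_node *v = obj->vmas; v != NULL; v = v->next)
        if (v->bound)
            n++;
    pthread_mutex_unlock(&obj->lock);
    return n;
}

/* Writer: unlink the first entry under the same lock. */
static struct vma_node *pop_vma(struct object *obj)
{
    struct vma_node *v;

    pthread_mutex_lock(&obj->lock);
    v = obj->vmas;
    if (v != NULL) {
        obj->vmas = v->next;
        v->next = NULL;
    }
    pthread_mutex_unlock(&obj->lock);
    return v;
}
```

If count_bound() skipped the lock, a concurrent pop_vma() could unlink an entry in the middle of the walk, which is the kind of race the commit message describes for per_file_stats().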
Comment 6 Humberto Israel Perez Rodriguez 2017-06-27 18:38:45 UTC
(In reply to Chris Wilson from comment #5)
> commit 0caf81b5c53d9bd332a95dbcb44db8de0b397a7c
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sat Jun 17 12:57:44 2017 +0100
> 
>     drm/i915: Hold struct_mutex for per-file stats in debugfs/i915_gem_object
>     
>     As we walk the obj->vma_list in per_file_stats(), we need to hold
>     struct_mutex to prevent alteration of that list.

Hi,

After testing this kernel commit and running some cold reboots on my GLK, I noticed some relevant kernel messages:


kern  :emerg : [Sun Dec  4 14:33:10 2016] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408
kern  :emerg : [Sun Dec  4 14:33:10 2016] mce: [Hardware Error]: TSC 0 ADDR fef4c9a0
kern  :emerg : [Sun Dec  4 14:33:10 2016] mce: [Hardware Error]: PROCESSOR 0:706a0 TIME 0 SOCKET 0 APIC 0 microcode 1c



kern  :err   : [Sun Dec  4 14:33:10 2016] ACPI Error: Invalid type (RegionField) for target of Scope operator [SSP2] (Cannot override) (20170303/dswload-273)
kern  :err   : [Sun Dec  4 14:33:10 2016] ACPI Exception: AE_AML_OPERAND_TYPE, During name lookup/catalog (20170303/psobject-241)
kern  :err   : [Sun Dec  4 14:33:10 2016] ACPI Exception: AE_AML_OPERAND_TYPE, (SSDT: RVPRtd3) while loading table (20170303/tbxfload-228)
kern  :err   : [Sun Dec  4 14:33:10 2016] ACPI Error: 1 table load failures, 11 successful (20170303/tbxfload-246)
kern  :err   : [Sun Dec  4 14:33:19 2016] uvesafb: Getting VBE info block failed (eax=0x4f00, err=1)
kern  :err   : [Sun Dec  4 14:33:19 2016] uvesafb: vbe_init() failed with -22
kern  :err   : [Sun Dec  4 14:33:19 2016] atkbd serio0: Failed to deactivate keyboard on isa0060/serio0
kern  :err   : [Sun Dec  4 14:33:20 2016] atkbd serio0: Failed to enable keyboard on isa0060/serio0


Please see the attached dmesg.log.

I am not sure whether this is the same failure as in this bug. Please let me know whether I should file a separate bug for it.

BTW, I also tested with the latest drm-intel and saw the same failure:

commit bf26e1dbbba24a7697559f1131d4be99747b7646
Author:     Martin Peres <martin.peres@linux.intel.com>
AuthorDate: Tue Jun 27 16:59:42 2017 +0300
Commit:     Martin Peres <martin.peres@linux.intel.com>
CommitDate: Tue Jun 27 16:59:42 2017 +0300

    drm-tip: 2017y-06m-27d-13h-59m-07s UTC integration manifest
Comment 7 Humberto Israel Perez Rodriguez 2017-06-27 18:39:05 UTC
Created attachment 132287
dmesg.log
Comment 8 Chris Wilson 2017-06-27 18:47:46 UTC
The MCE is just that, ACPI wouldn't be ACPI without a failure, and the circular locking bug has nothing to do with i915.

