Bug 107402 - [Intel GFX CI][BAT] igt@amdgpu_amd_basic@userptr - incomplete - general protection fault: 0000 [#1] PREEMPT SMP PTI, __mmu_notifier_release
Summary: [Intel GFX CI][BAT] igt@amdgpu_amd_basic@userptr - incomplete - general prote...
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: Other All
Importance: high normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-27 12:11 UTC by Martin Peres
Modified: 2018-11-01 17:01 UTC

See Also:
i915 platform: KBL
i915 features: GEM/Other


Description Martin Peres 2018-07-27 12:11:33 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4557/fi-kbl-8809g/igt@amdgpu_amd_basic@userptr.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4547/fi-kbl-8809g/igt@amdgpu_amd_basic@userptr.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4529/fi-kbl-8809g/igt@amdgpu_amd_basic@userptr.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4515/fi-kbl-8809g/igt@amdgpu_amd_basic@userptr.html

<4>[  312.687974] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4>[  312.688007] CPU: 4 PID: 4826 Comm: amd_basic Tainted: G     U            4.18.0-rc6-CI-CI_DRM_4557+ #1
<4>[  312.688054] Hardware name: Intel Corporation NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0040.2018.0516.1521 05/16/2018
<4>[  312.688110] RIP: 0010:__mmu_notifier_release+0x4e/0x100
<4>[  312.688136] Code: 31 c0 31 d2 31 f6 48 c7 c7 80 11 26 82 b9 02 00 00 00 41 89 c4 e8 d2 5c ef ff 48 8b bb 58 05 00 00 58 48 8b 1f 48 85 db 74 27 <48> 8b 43 10 48 8b 40 08 48 85 c0 74 0b 4c 89 f6 48 89 df e8 1a 19 
<4>[  312.688271] RSP: 0018:ffffc90000513db0 EFLAGS: 00010202
<4>[  312.688299] RAX: 0000000000000001 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
<4>[  312.688335] RDX: 0000000000000007 RSI: ffffffff8212855b RDI: ffffffff820d77f7
<4>[  312.688371] RBP: ffffc90000513dc8 R08: 00000000d767fbd3 R09: 0000000000000000
<4>[  312.688407] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[  312.688443] R13: ffff8802723957c0 R14: ffff88026e7f1d40 R15: ffff88026e7f1de8
<4>[  312.688479] FS:  00007efe2347c980(0000) GS:ffff88027ed00000(0000) knlGS:0000000000000000
<4>[  312.688520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  312.688550] CR2: 00007efe2126d2f8 CR3: 0000000005210005 CR4: 00000000003606e0
<4>[  312.688585] Call Trace:
<4>[  312.688601]  exit_mmap+0x140/0x1a0
<4>[  312.688623]  mmput+0x5c/0x120
<4>[  312.688639]  do_exit+0x5be/0xd40
<4>[  312.688659]  do_group_exit+0x34/0xb0
<4>[  312.688679]  __x64_sys_exit_group+0xf/0x10
<4>[  312.688702]  do_syscall_64+0x55/0x190
<4>[  312.688723]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  312.688750] RIP: 0033:0x7efe228e6e06
<4>[  312.688768] Code: Bad RIP value.
<4>[  312.688788] RSP: 002b:00007ffd79bf4598 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
<4>[  312.688827] RAX: ffffffffffffffda RBX: 00007efe22be9740 RCX: 00007efe228e6e06
<4>[  312.688863] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
<4>[  312.688899] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff80
<4>[  312.688936] R10: 00007efe175e90c0 R11: 0000000000000246 R12: 00007efe22be9740
<4>[  312.688972] R13: 0000000000000003 R14: 00007efe22bf2628 R15: 0000000000000000
<4>[  312.689010] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_realtek snd_hda_codec_generic amdgpu x86_pkg_temp_thermal btusb coretemp btrtl btbcm crct10dif_pclmul btintel crc32_pclmul snd_hda_codec_hdmi ghash_clmulni_intel bluetooth snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm ecdh_generic chash gpu_sched igb ttm mei_me mei prime_numbers pinctrl_sunrisepoint pinctrl_intel [last unloaded: i915]
<0>[  313.354059] ---------------------------------
<4>[  313.354866] ---[ end trace bded48d468e3b807 ]---
<4>[  313.541665] RIP: 0010:__mmu_notifier_release+0x4e/0x100
<4>[  313.546806] Code: 31 c0 31 d2 31 f6 48 c7 c7 80 11 26 82 b9 02 00 00 00 41 89 c4 e8 d2 5c ef ff 48 8b bb 58 05 00 00 58 48 8b 1f 48 85 db 74 27 <48> 8b 43 10 48 8b 40 08 48 85 c0 74 0b 4c 89 f6 48 89 df e8 1a 19 
<4>[  313.552656] RSP: 0018:ffffc90000513db0 EFLAGS: 00010202
<4>[  313.557737] RAX: 0000000000000001 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
<4>[  313.562900] RDX: 0000000000000007 RSI: ffffffff8212855b RDI: ffffffff820d77f7
<4>[  313.567959] RBP: ffffc90000513dc8 R08: 00000000d767fbd3 R09: 0000000000000000
<4>[  313.573084] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[  313.578493] R13: ffff8802723957c0 R14: ffff88026e7f1d40 R15: ffff88026e7f1de8
<4>[  313.584442] FS:  00007efe2347c980(0000) GS:ffff88027ec00000(0000) knlGS:0000000000000000
<4>[  313.589789] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  313.595165] CR2: 0000561cc6647ef8 CR3: 0000000005210005 CR4: 00000000003606f0
<1>[  313.600822] Fixing recursive fault but reboot is needed!
Comment 1 Martin Peres 2018-07-27 12:12:25 UTC
Bumping the priority to high since it affects BAT on a new platform.
Comment 2 Chris Wilson 2018-07-27 12:14:02 UTC
Use after free by amdgpu.ko
Comment 3 Martin Peres 2018-07-27 12:17:00 UTC
Thanks Chris, moving the failure to AMDGpu!
Comment 4 Michel Dänzer 2018-07-27 12:21:15 UTC
(In reply to Chris Wilson from comment #2)
> Use after free by amdgpu.ko

How do you know it's a use-after-free? Can you provide more information about that, e.g. from KASAN?
Comment 5 Chris Wilson 2018-07-27 12:23:03 UTC
RBX: 6b6b6b6b6b6b6b6b == POISON_FREE
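
Editorial illustration of the check above (not part of the original comment): 0x6b is the slab POISON_FREE byte defined in the kernel's include/linux/poison.h, so a register made up entirely of 0x6b bytes suggests a pointer read out of an already-freed object. The faulting bytes in the dump, <48> 8b 43 10, decode to a load from RBX+0x10, and 0x6b6b6b6b6b6b6b6b is a non-canonical address, which is consistent with a general protection fault rather than a page fault. A minimal sketch of scanning an oops dump for such values follows; the function name and regular expression are hypothetical, only the two poison constants come from the kernel header.

```python
import re

# Slab poison bytes as defined in include/linux/poison.h (Linux kernel).
POISON_PATTERNS = {
    "POISON_FREE": 0x6B,   # written over an object when it is freed
    "POISON_INUSE": 0x5A,  # written over an object that is allocated but uninitialised
}

# Hypothetical helper regex: matches "RBX: 6b6b6b6b6b6b6b6b" style fields in an
# x86-64 oops. Fields with a segment prefix (RIP, RSP) are deliberately skipped.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}):\s*([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Return (register, poison_name) pairs for registers filled with a poison byte."""
    hits = []
    for name, value in REG_RE.findall(oops_text):
        raw = bytes.fromhex(value)
        for poison_name, poison_byte in POISON_PATTERNS.items():
            if raw == bytes([poison_byte]) * 8:
                hits.append((name, poison_name))
    return hits


if __name__ == "__main__":
    line = "RAX: 0000000000000001 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000"
    print(poisoned_registers(line))  # prints [('RBX', 'POISON_FREE')]
```

The pattern is only visible when slab poisoning is enabled (for example via SLUB debugging), which presumably was the case on the CI kernel here. A match is a strong hint of use-after-free, not proof; KASAN would give the allocation and free stacks.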
Comment 6 Martin Peres 2018-07-27 12:29:20 UTC
Our KASAN runs do not currently run AMDGPU tests. This should be fixed by the next run (triggered manually).
Comment 7 Martin Peres 2018-11-01 17:01:14 UTC
Used to happen at least every week, but nothing since IGT_4640 (1 month, 2 weeks / 660 runs ago). Closing!

