Bug 111273 - crash calling AMDGPU_INFO_READ_MMR_REG with count set to -1
Summary: crash calling AMDGPU_INFO_READ_MMR_REG with count set to -1
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2019-07-31 19:11 UTC by Trek
Modified: 2019-09-13 22:26 UTC

Attachments
possible fix (1.17 KB, patch)
2019-08-31 23:21 UTC, Trek
possible fix v2 (1.13 KB, patch)
2019-09-01 19:26 UTC, Trek

Description Trek 2019-07-31 19:11:59 UTC
Calling this from libdrm_amdgpu:
  amdgpu_read_mm_registers(dev, 0x8010 / 4, -1, 0xffffffff, 0, out)
leads to this kernel dump:

WARNING: CPU: 3 PID: 30278 at mm/page_alloc.c:4377 __alloc_pages_nodemask+0x241/0x2b0
CPU: 3 PID: 30278 Comm: radeontop Not tainted 4.19.0-5-amd64 #1 Debian 4.19.37-5+deb10u1
RIP: 0010:__alloc_pages_nodemask+0x241/0x2b0
Code: 89 f7 89 ee 45 31 f6 e8 bd d5 ff ff e9 fb fe ff ff e8 e3 ac 01 00 e9 cb fe ff ff 45 31 f6 81 e7 00 02 00 00 0f 85 e7 fe ff ff <0f> 0b e9 e0 fe ff ff 31 c0 e9 6a fe ff ff 65 48 8b 04 25 40 5c 01
RSP: 0018:ffffb64a01c27a58 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8b4853df0000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000016 RDI: 0000000000000000
RBP: 00000003fffffffc R08: 0000000000000001 R09: ffffffffc0f01ebf
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000006000c0
R13: ffffb64a01c27d98 R14: 0000000000000000 R15: 0000000000000008
FS:  00007fa12fe5f280(0000) GS:ffff8b4856f80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa12f5f45b0 CR3: 000000010e498000 CR4: 00000000000406e0
Call Trace:
 kmalloc_order+0x14/0x30
 kmalloc_order_trace+0x1d/0xa0
 amdgpu_info_ioctl+0x908/0x1290 [amdgpu]
 ? get_page_from_freelist+0x7be/0x11b0
 ? unix_destruct_scm+0x80/0xa0
 ? select_idle_sibling+0x22/0x3a0
 ? kmem_cache_free+0x1a7/0x1d0
 ? free_unref_page_commit+0x91/0x100
 ? amdgpu_firmware_info.isra.5+0x210/0x210 [amdgpu]
 drm_ioctl_kernel+0xa1/0xf0 [drm]
 drm_ioctl+0x206/0x3a0 [drm]
 ? amdgpu_firmware_info.isra.5+0x210/0x210 [amdgpu]
 ? tlb_finish_mmu+0x1f/0x30
 ? unmap_region+0xdd/0x110
 amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
 do_vfs_ioctl+0xa4/0x630
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x53/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa12faa8427
Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc737ffda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005561b8c625b0 RCX: 00007fa12faa8427
RDX: 00007ffc737ffdf0 RSI: 0000000040206445 RDI: 0000000000000003
RBP: 00007ffc737ffdf0 R08: 0000000000000000 R09: 00005561b8c6a950
R10: fffffffffffffd06 R11: 0000000000000246 R12: 0000000040206445
R13: 0000000000000003 R14: 00007ffc7380002b R15: 0000000000000000
---[ end trace e7c99a8c5897d841 ]---

libdrm's amdgpu_read_mm_registers() calls drmCommandWrite(DRM_AMDGPU_INFO) with the AMDGPU_INFO_READ_MMR_REG query, which in turn reaches the kernel's amdgpu_info_ioctl() in amdgpu_kms.c.
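
The size of the allocation behind the warning can be estimated from the call arguments. The following is only a rough sketch, assuming the count of -1 ends up in an unsigned 32-bit field and is multiplied by the 4-byte register size in 64-bit arithmetic. The helper name requested_alloc_bytes is made up for this illustration and is not a kernel or libdrm function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each memory-mapped register is read as one 32-bit word. */
static_assert(sizeof(uint32_t) == 4, "registers are assumed to be 4 bytes wide");

/*
 * Hypothetical helper: bytes needed to hold `count` 32-bit registers.
 * The multiplication is done in 64 bits, so the result does not wrap.
 */
uint64_t requested_alloc_bytes(uint32_t count)
{
    return (uint64_t)count * sizeof(uint32_t);
}
```

For a count of -1 converted to uint32_t (0xffffffff) this gives 0x3fffffffc bytes, just under 16 GiB. That is the same value shown in RBP in the dump above, which suggests the oversized request comes straight from the unchecked count.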

It is not always reproducible, but it seems I can trigger it once per boot.

The system is Debian 10 buster amd64, Linux 4.19.37, libdrm 2.4.97, chipset KAVERI.

Tell me if you need more info.
Thanks!
Comment 1 Trek 2019-08-31 23:21:54 UTC
Created attachment 145226
possible fix

The proposed fix has been tested on the latest git.

I'm unsure whether 65536 is a good limit. It could be as small as 64, but even though the longest run of consecutive registers is currently 48, that number may grow in the future and no one may remember to raise the limit. In any case, it should not be larger than the PCI BAR area for memory-mapped registers, which on my KAVERI is 256K, thus 65536 registers.
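
The 65536 figure comes from dividing the BAR size by the register width. A small sketch of that arithmetic, assuming 4-byte registers and reading 256K as 256 KiB (the function name is illustrative only and does not exist in the driver):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of 32-bit registers that fit in a
 * memory-mapped register BAR of `bar_bytes` bytes.
 */
size_t max_regs_in_bar(size_t bar_bytes)
{
    /* A register BAR is expected to be a whole number of 32-bit words. */
    assert(bar_bytes % sizeof(uint32_t) == 0);
    return bar_bytes / sizeof(uint32_t);
}
```

Calling max_regs_in_bar(256 * 1024) yields 65536, the upper bound mentioned above.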

ciao!
Comment 2 Trek 2019-09-01 19:26:19 UTC
Created attachment 145229
possible fix v2

Thanks to agd5f_, here is the patch with the limit changed to 128.
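
For readers without access to the attachment, the kind of check being discussed might look roughly like the sketch below. This is not the attached patch: the macro and function names are hypothetical, and returning -EINVAL for an oversized count is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed cap on registers per request; 128 is the limit named in this comment. */
#define MAX_MMR_READ_COUNT 128u

/* The cap should still cover the longest known run of 48 consecutive registers. */
static_assert(MAX_MMR_READ_COUNT >= 48, "limit must cover the longest register run");

/*
 * Hypothetical validation helper: returns 0 when `count` is acceptable,
 * or -EINVAL when it exceeds the cap, so no allocation is attempted.
 */
int check_mmr_read_count(uint32_t count)
{
    if (count > MAX_MMR_READ_COUNT)
        return -EINVAL;
    return 0;
}
```

With such a check in front of the allocation, a count of -1 (0xffffffff once unsigned) is rejected before any memory is requested.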
Comment 3 Alex Deucher 2019-09-13 20:41:35 UTC
Applied. Thanks!