Bug 107319

Summary: [amdgpu] [4.14+] [patch] amdgpu uses raw rlc_hdr values, causing kernel OOPS on big endian architectures
Product: DRI
Reporter: A. Wilcox <awilfox>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: major    
Priority: medium    
Version: XOrg git   
Hardware: PowerPC   
OS: Linux (All)   
Whiteboard:
Attachments:
Description: drm/amdgpu: use processed values for counting
Flags: none

Description A. Wilcox 2018-07-21 05:24:34 UTC
Created attachment 140751
drm/amdgpu: use processed values for counting

[    8.143396] Unable to handle kernel paging request for data at address 0xc00800000615b000
[    8.143396] Faulting instruction address: 0xc008000005f063c8
[    8.143399] Oops: Kernel access of bad area, sig: 11 [#1]
[    8.143429] BE SMP NR_CPUS=256 NUMA PowerNV
[    8.143461] Modules linked in: binfmt_misc amdgpu(+) ast ttm drm_kms_helper sysimgblt syscopyarea sysfillrect fb_sys_fops drm joydev mac_hid tg3 ipmi_powernv ipmi_msghandler agpgart i2c_algo_bit shpchp
[    8.143615] CPU: 0 PID: 2402 Comm: kworker/0:3 Not tainted 4.14.48-mc8-easy #1
[    8.143679] Workqueue: events .work_for_cpu_fn
[    8.143728] task: c0000003e7fd8000 task.stack: c0000003e7fe0000
[    8.143783] NIP:  c008000005f063c8 LR: c008000005f06388 CTR: c00000000027efd0
[    8.143869] REGS: c0000003e7fe3430 TRAP: 0300   Not tainted  (4.14.48-mc8-easy)
[    8.143950] MSR:  9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28002444  XER: 20040000
[    8.144040] CFAR: c008000005f063d0 DAR: c00800000615b000 DSISR: 40000000 SOFTE: 1
[    8.144787] NIP [c008000005f063c8] .gfx_v8_0_sw_init+0x5a8/0x15b0 [amdgpu]
[    8.144911] LR [c008000005f06388] .gfx_v8_0_sw_init+0x568/0x15b0 [amdgpu]
[    8.144976] Call Trace:
[    8.145049] [c0000003e7fe36b0] [c008000005f06388] .gfx_v8_0_sw_init+0x568/0x15b0 [amdgpu] (unreliable)
[    8.145194] [c0000003e7fe37c0] [c008000005e484c4] .amdgpu_device_init+0xf34/0x1750 [amdgpu]
[    8.145302] [c0000003e7fe38f0] [c008000005e4a94c] .amdgpu_driver_load_kms+0x9c/0x2a0 [amdgpu]
[    8.145420] [c0000003e7fe3980] [c008000004f4d200] .drm_dev_register+0x1c0/0x250 [drm]
[    8.145548] [c0000003e7fe3a30] [c008000005e416c4] .amdgpu_pci_probe+0x164/0x1a0 [amdgpu]
[    8.145601] [c0000003e7fe3ac0] [c000000000618ed0] .local_pci_probe+0x60/0x130
[    8.145683] [c0000003e7fe3b60] [c0000000000e8780] .work_for_cpu_fn+0x30/0x50
[    8.145774] [c0000003e7fe3be0] [c0000000000ecea8] .process_one_work+0x2a8/0x550
[    8.145854] [c0000003e7fe3c80] [c0000000000ed430] .worker_thread+0x2e0/0x600
[    8.145924] [c0000003e7fe3d70] [c0000000000f5338] .kthread+0x158/0x1a0
[    8.145973] [c0000003e7fe3e30] [c00000000000bd4c] .ret_from_kernel_thread+0x58/0x8c
[    8.146054] Instruction dump:
[    8.146089] 38db004c 39200000 7d00342c 7d08da14 815b0048 554af0be 7f8a4840 409d004c 792a1764 e8ff46f0 39290001 79290020 <7cc8542c> 7cc7512e 4bffffd8 60000000
[    8.146252] ---[ end trace 8b0ede048bbb20ae ]---

In gfx_v8_0_init_microcode, adev->gfx.rlc already holds the values from rlc_hdr after they have been processed by le32_to_cpu.  Using the raw rlc_hdr values instead causes a kernel Oops on big-endian machines: the count is read byte-swapped (0x24000000 instead of 0x24), so the code writes well outside of the array.  gfx_v9_0 had the same issue and was fixed in the same manner, but that change was not tested locally; I do not have a v9 card.
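To illustrate the failure mode described above, here is a minimal userspace sketch, not the driver code and not the attached patch. The helper names le32_decode, be32_native_load, and word_count are hypothetical. le32_decode stands in for what the kernel's le32_to_cpu achieves, and word_count assumes the byte count is turned into a count of 32-bit words by dividing by four.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for le32_to_cpu(): decode a little-endian 32-bit
 * field byte by byte, so the result is the same on any host.
 */
static uint32_t le32_decode(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/*
 * What a big-endian CPU gets when it loads the same four bytes as a
 * native word without any conversion (the "raw" header value).
 */
static uint32_t be32_native_load(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}

/* Assumption: the loop bound is the size in bytes divided by four. */
static uint32_t word_count(uint32_t size_bytes)
{
	return size_bytes >> 2;
}
```

For a size field stored as the bytes 24 00 00 00, le32_decode gives 0x24 (9 words), while be32_native_load gives 0x24000000 (0x09000000 words), which is the out-of-range count quoted in the report.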
Comment 1 Michel Dänzer 2018-07-23 08:30:26 UTC
Please send patches like this directly to the amd-gfx mailing list for review.
Comment 2 Martin Peres 2019-11-19 08:45:04 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/459.
