Bug 109808 - ROCm OpenCL segfaults on drm-next-5.1-wip
Status: RESOLVED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2019-03-01 21:19 UTC by bmilreu
Modified: 2019-03-04 15:30 UTC

Attachments
attachment-9297-0.html (3.16 KB, text/html)
2019-03-04 09:01 UTC, Michael Eagle

Description bmilreu 2019-03-01 21:19:15 UTC
ROCm clinfo segfaults, with this message in dmesg:

mar 01 15:27:36 mjb kernel: kfd2kgd: init_user_pages: Failed to register MMU notifier: -19


Reverting:
	drm/amdgpu: use HMM callback to replace mmu notifier
	drm/amdgpu: replace get_user_pages with HMM mirror helpers
	drm/amdkfd: avoid HMM change cause circular lock
	drm/amdgpu: use HMM callback to replace mmu notifier

makes it work again, so something might be wrong with those commits.
Comment 1 Philip Yang 2019-03-03 18:39:46 UTC
Error code -19 means ENODEV. Please check whether the running kernel has the config option CONFIG_ZONE_DEVICE enabled (the running kernel's config can be read from /proc/config.gz).
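
For readers who want to decode such a return code themselves, here is a minimal sketch using Python's standard errno table. It assumes the script runs on Linux, where the userspace errno numbers match the kernel's. The helper name describe_kernel_error is made up for illustration.

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a kernel-style negative return code (e.g. -19) to its errno name."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    # The code reported by init_user_pages in the dmesg line above.
    print(describe_kernel_error(-19))
```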

init_user_pages returns -ENODEV if userptr support is not enabled. Userptr support depends on the kernel option CONFIG_HMM_MIRROR/CONFIG_HMM, which in turn depends on the kernel option CONFIG_ZONE_DEVICE.

CONFIG_ZONE_DEVICE and CONFIG_HMM are ON by default, but CONFIG_ZONE_DEVICE may end up unset if the kernel config file was carried over from an older kernel.

If userptr support is not enabled, then clinfo and KFD userptr support will return error -19.

Please correct the kernel config file by adding CONFIG_ZONE_DEVICE=y
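
The check described above can be sketched as a small Python script. This is only an illustration, not part of the driver or of ROCm: the function names are hypothetical, and the list of required options is copied from this comment and may differ between kernel versions. Note that /proc/config.gz is only present when the kernel was built with CONFIG_IKCONFIG_PROC. If it is missing, pass the path to a plain-text config file instead.

```python
import gzip
import sys

# Options named in the comment above; treat this list as an assumption.
REQUIRED = ("CONFIG_ZONE_DEVICE", "CONFIG_HMM", "CONFIG_HMM_MIRROR")


def parse_kernel_config(text):
    """Return {option: value} for each 'CONFIG_X=value' line.

    Comment lines, including '# CONFIG_X is not set', are skipped.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def missing_options(options, required=REQUIRED):
    """List the required options that are not built in (value 'y')."""
    return [name for name in required if options.get(name) != "y"]


def read_config(path="/proc/config.gz"):
    """Read a kernel config file, transparently handling gzip."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/config.gz"
    missing = missing_options(parse_kernel_config(read_config(path)))
    if missing:
        print("Not enabled: " + ", ".join(missing))
        sys.exit(1)
    print("All checked options are enabled.")
```

Run without arguments, it inspects the running kernel. Run with a path such as a build tree's .config, it inspects that file before the kernel is built.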
Comment 2 bmilreu 2019-03-03 20:36:56 UTC
Thanks for the answer, it was indeed a previous outdated config. Tested with CONFIG_ZONE_DEVICE=y and issue is gone, closing this.
Comment 3 Michael Eagle 2019-03-04 09:01:10 UTC
Created attachment 143518
attachment-9297-0.html

Hi Philip,
I was wondering: would it be possible either to make the message more
descriptive, so that the user is informed about this, or to modify the
kernel config so that the dependencies are satisfied automatically?

Comment 4 Philip Yang 2019-03-04 15:30:38 UTC
I will change the error message for this specific case to mention the missing kernel config option.

I cannot add select ZONE_DEVICE to the driver Kconfig file because that would create a circular dependency. An old or weird kernel config file may still leave HMM or ZONE_DEVICE disabled.

Thanks,
Philip

