108850 – i915 "kernel NULL pointer dereference" in kernel-4.18.20

Summary: i915 "kernel NULL pointer dereference" in kernel-4.18.20

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	Triaged
Keywords:

Duplicates (1):	108984
Depends on:
Blocks:

Reported:	2018-11-23 18:43 UTC by William
Modified:	2019-01-11 10:08 UTC
CC List:	2 users

See Also:
i915 platform:
i915 features:

Attachments
/var/log/dmesg (5.27 KB, text/plain) 2018-11-23 18:43 UTC, William

Description William 2018-11-23 18:43:45 UTC

Created attachment 142599
/var/log/dmesg

On boot up with kernel-4.18.20 the screen freezes with the
last line on the console showing :-
  "fb: switching to inteldrmfb from VESA VGA"

The rest of the boot continues with partitions mounted, etc, and
dmesg saved to log file.

Was ok with kernel-4.18.16 (have not tried the ones in between)
and is also ok with 4.19.3 and 4.14.82.

This is repeatable, though on one occasion I got an extra line
on the console (of normal usb related information).

------------------------------------
Hardware:
  Asus P7H55-M SI with Intel Core i5 processor
  and Integrated Graphics [i915 driver].

Distro: Slackware-14.2, but with newer kernel.
------------------------------------

Will attach dmesg extract including trace.

I appreciate 4.18.y is going EOL so this is all a bit academic, 
but I am reporting it in case it affects anything else.

Comment 1 Chris Wilson 2018-11-23 20:43:42 UTC

Missing backport of

commit d9d117e40d4ffc03438177eeac83d96dfeee76be
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jun 11 18:18:25 2018 +0100

    drm/i915/ringbuffer: Serialize load of PD_DIR
    
    After triggering the mm switch with a load of PD_DIR, which may be
    deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands
    with that load by posting a read of PD_DIR (or else those subsequent
    commands may access the stale page tables).
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180611171825.13678-2-chris@chris-wilson.co.uk

or at least the active ingredient of always allocating the engine->scratch.

Comment 2 Lakshmi 2018-11-26 06:36:59 UTC

William, can you apply the above patch and check whether you still see the same message in the log?

Comment 3 William 2018-11-26 11:09:41 UTC

I am not a developer and don't know my way around the "store" so it would
help if you would add the patch as an attachment here  - I can then give it
a try.

Comment 4 Lakshmi 2018-11-26 13:36:26 UTC

(In reply to William from comment #3)
> I am not a developer and don't know my way around the "store" so it would
> help if you would add the patch as an attachment here  - I can then give it
> a try.
Here is the patch.

https://cgit.freedesktop.org/drm-tip/patch/?id=d9d117e40d4ffc03438177eeac83d96dfeee76be

Comment 5 William 2018-11-26 21:17:14 UTC

I first tried patch with options  --dry-run -p1 -i  and got the following :-

  checking file drivers/gpu/drm/i915/intel_engine_cs.c
  checking file drivers/gpu/drm/i915/intel_ringbuffer.c
  Hunk #1 succeeded at 1357 (offset -4 lines).
  Hunk #2 FAILED at 1389.
  Hunk #3 succeeded at 1427 with fuzz 1 (offset -29 lines).
  Hunk #4 FAILED at 1658.
  Hunk #5 succeeded at 2128 (offset -50 lines).
  2 out of 5 hunks FAILED
  checking file drivers/gpu/drm/i915/intel_ringbuffer.h
  Hunk #1 succeeded at 865 (offset -4 lines).

So I assume the patch is for a different base version of intel_ringbuffer.c
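
The failed hunks can be pulled out of that output mechanically, which makes it easier to see which file needs a rebased patch. The following is only a minimal sketch: the function name failed_hunks and the two regular expressions are illustrative assumptions based on the message format shown above, not part of patch itself, and other versions of patch may word their messages differently.

```python
import re

# Output copied from the --dry-run above.
DRY_RUN_OUTPUT = """\
checking file drivers/gpu/drm/i915/intel_engine_cs.c
checking file drivers/gpu/drm/i915/intel_ringbuffer.c
Hunk #1 succeeded at 1357 (offset -4 lines).
Hunk #2 FAILED at 1389.
Hunk #3 succeeded at 1427 with fuzz 1 (offset -29 lines).
Hunk #4 FAILED at 1658.
Hunk #5 succeeded at 2128 (offset -50 lines).
2 out of 5 hunks FAILED
checking file drivers/gpu/drm/i915/intel_ringbuffer.h
Hunk #1 succeeded at 865 (offset -4 lines).
"""

# Assumed message formats, taken from the log above.
FILE_RE = re.compile(r"^(?:checking|patching) file (\S+)")
FAILED_RE = re.compile(r"^Hunk #(\d+) FAILED at (\d+)\.")


def failed_hunks(output):
    """Map each file to the (hunk number, line) pairs that failed to apply."""
    failures = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = FILE_RE.match(line)
        if match:
            # A new "checking file" line starts the report for the next file.
            current = match.group(1)
            continue
        match = FAILED_RE.match(line)
        if match and current is not None:
            hunk = (int(match.group(1)), int(match.group(2)))
            failures.setdefault(current, []).append(hunk)
    return failures


for path, hunks in failed_hunks(DRY_RUN_OUTPUT).items():
    for number, line in hunks:
        print(f"{path}: hunk #{number} failed near line {line}")
```

For the log above this reports only intel_ringbuffer.c, hunks #2 and #4, which is consistent with the patch having been written against a different base version of that file.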

Comment 6 Chris Wilson 2018-12-04 16:03:13 UTC

commit 5179749925933575a67f9d8f16d0cc204f98a29f (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 4 14:15:16 2018 +0000

    drm/i915: Allocate a common scratch page
    
    Currently we allocate a scratch page for each engine, but since we only
    ever write into it for post-sync operations, it is not exposed to
    userspace nor do we care for coherency. As we then do not care about its
    contents, we can use one page for all, reducing our allocations and
    avoid complications by not assuming per-engine isolation.
    
    For later use, it simplifies engine initialisation (by removing the
    allocation that required struct_mutex!) and means that we can always rely
    on there being a scratch page.
    
    v2: Check that we allocated a large enough scratch for I830 w/a
    
    Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181204141522.13640-1-chris@chris-wilson.co.uk
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: <stable@vger.kernel.org> # v4.18.20+

should apply to v4.18 (and be auto-backported) and prevent the NULL deref.

Comment 7 Chris Wilson 2018-12-09 10:17:29 UTC

*** Bug 108984 has been marked as a duplicate of this bug. ***

Comment 8 Francesco Balestrieri 2018-12-28 08:40:35 UTC

Reporter, did this fix your issue?
