Bug 99274 - map_gtt reading/writing data mismatch during the GPU reset
Summary: map_gtt reading/writing data mismatch during the GPU reset
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-04 13:38 UTC by Zhipeng Gong
Modified: 2017-07-24 22:39 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
test case (3.78 KB, text/x-csrc)
2017-01-04 13:38 UTC, Zhipeng Gong
ftrace file (166.14 KB, text/plain)
2017-01-04 13:40 UTC, Zhipeng Gong

Description Zhipeng Gong 2017-01-04 13:38:46 UTC
Created attachment 128752
test case

The issue occurs when an application is reading or writing memory through a map_gtt mapping at the moment the KMD calls i915_reset to reset the GPU.
The fence registers are reinitialized during the GPU reset while the CPU is still reading or writing data through the GTT aperture, and that access depends on those same fence registers.
As a result, the data read back does not match the data written.
Comment 1 Zhipeng Gong 2017-01-04 13:40:51 UTC
Created attachment 128753
ftrace file
Comment 2 Zhipeng Gong 2017-01-04 13:44:36 UTC
The issue may be caused by i915_gem_restore_fences calling i965_write_fence_reg, which writes 0 to the fence register during the GPU reset.
Comment 3 yann 2017-01-04 14:55:05 UTC
Zhipeng Gong, can you try Chris' patchset: https://patchwork.freedesktop.org/series/17498/
Comment 4 Chris Wilson 2017-01-04 20:24:40 UTC
commit b1ed35d9179bc42c5ac7b86548cfae589be17b3e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 4 14:51:10 2017 +0000

    drm/i915: Revoke fenced GTT mmapings across GPU reset
    
    The fence registers are clobbered by a GPU reset. If there is concurrent
    user access to a fenced region via a GTT mmaping, the access will not be
    fenced during the reset (until we restore the fences afterwards). In order
    to prevent invalid access during the reset, before we clobber the fences
    first we must invalidate the GTT mmapings. Access to the mmap will then
    be forced to fault in the page, and in handling the fault, i915_gem_fault()
    will take the struct_mutex and wait upon the reset to complete.
    
    v2: Fix up commentary.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99274
    Testcase: igt/gem_mmap_gtt/hang
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: http://patchwork.freedesktop.org/patch/msgid/20170104145110.1486-1-chris@chris-wilson.co.uk
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
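
The ordering described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the i915 implementation: struct gtt_model, cpu_access, begin_reset, finish_reset and handle_fault are hypothetical names invented for illustration, and the real fault handler sleeps on struct_mutex where this model simply reports that the access is still blocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the race; none of these names are i915 symbols. */
struct gtt_model {
    bool fence_valid;        /* fence register holds valid tiling state */
    bool pte_present;        /* user's GTT mmap still has live page table entries */
    bool reset_in_progress;  /* a GPU reset is currently running */
};

enum access_result {
    ACCESS_FENCED,    /* access went through a valid fence: data is consistent */
    ACCESS_UNFENCED,  /* access bypassed the fence: data mismatch */
    ACCESS_FAULTED    /* access trapped into the fault handler and is held there */
};

/* A CPU read or write through the user's GTT mapping. */
static enum access_result cpu_access(const struct gtt_model *m)
{
    if (!m->pte_present)
        return ACCESS_FAULTED;
    return m->fence_valid ? ACCESS_FENCED : ACCESS_UNFENCED;
}

/* Start of reset. With revoke_first set, the mapping is invalidated
 * before the fence registers are clobbered, as the commit describes. */
static void begin_reset(struct gtt_model *m, bool revoke_first)
{
    m->reset_in_progress = true;
    if (revoke_first)
        m->pte_present = false;
    m->fence_valid = false;  /* the reset clobbers the fence registers */
}

/* End of reset: fences are restored. Revoked mappings stay revoked
 * until the next fault brings them back. */
static void finish_reset(struct gtt_model *m)
{
    m->fence_valid = true;
    m->reset_in_progress = false;
}

/* Fault handler: only re-establishes the mapping once the reset is done.
 * The real handler would block here instead of returning. */
static enum access_result handle_fault(struct gtt_model *m)
{
    assert(!m->pte_present);  /* only reached when the mapping was revoked */
    if (m->reset_in_progress)
        return ACCESS_FAULTED;
    m->pte_present = true;
    return cpu_access(m);
}
```

In this model, calling begin_reset with revoke_first set to false leaves a window in which cpu_access returns ACCESS_UNFENCED, which corresponds to the mismatch reported here. With revoke_first set to true, the same access faults instead, and handle_fault only lets it through once finish_reset has restored the fences.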
Comment 5 Zhipeng Gong 2017-01-05 01:38:56 UTC
I have verified the patch locally and it works. Thanks for the patch.

