Bug 98311

Summary: Bus error (core dumped) 8192x8192 JPEG decode
Product: libva Reporter: U. Artie Eoff <ullysses.a.eoff>
Component: intel Assignee: haihao <haihao.xiang>
Status: ASSIGNED --- QA Contact: Sean V Kelley <seanvk>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: gdb trace w/drm_intel_bufmgr debug on
gdb trace with drm bufmgr debug on
additional gdb data
attachment-31931-0.html
attachment-32256-0.html

Description U. Artie Eoff 2016-10-18 18:56:01 UTC
Decoding an 8192x8192 resolution JPEG image sometimes triggers a SIGBUS bus error (core dump).

This can sometimes be reproduced with `./test_i965_drv_video --gtest_filter=Big/JPEGEncodeInputTest.Full/2` (i.e. the 8192x8192 UYVY test case).

Dmesg does not report any errors, even with the drm.debug=0xe kernel parameter.

See attachments for some gdb trace details.

Environment
-----------
SKL (Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz)
Kernel 4.5.0-302.fc24.x86_64
drm (master) libdrm-2.4.68-0-gfc09c5ab8424 
mesa (master) heads/master-0-ga1e49be71360 
libva (master) heads/master-0-g3b7e4999950a 
intel-driver (master) heads/master-0-ge748bc7f0565
Comment 1 U. Artie Eoff 2016-10-18 18:57:00 UTC
Comment on attachment 127383 [details]
gdb trace w/drm_intel_bufmgr debug on

Starting program: /home/uartie/Work/workspace/media/build/intel-driver/test/test_i965_drv_video --gtest_filter=Big/JPEGEncodeInputTest.Full/2
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.23.1-5.fc24.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Note: Google Test filter = Big/JPEGEncodeInputTest.Full/2
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
libva info: VA-API version 0.39.4
libva info: va_getDriverName() returns 0
libva info: User requested driver 'i965'
libva info: Trying to open /home/uartie/Work/workspace/media/build/intel-driver/src/.libs/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_39
bo_create: buf 1 (batch buffer) 524288b
drm_intel_gem_bo_purge_vma_cache: cached=0, open=1, limit=-1
bo_map: 1 (batch buffer), map_count=1
bo_map: 1 (batch buffer) -> 0x7ffff7f51000
bo_create: buf 2 (batch buffer) 524288b
drm_intel_gem_bo_purge_vma_cache: cached=0, open=2, limit=-1
bo_map: 2 (batch buffer), map_count=1
bo_map: 2 (batch buffer) -> 0x7ffff7ed1000
bo_create: buf 3 (kernel shader) 65328b
drm_intel_gem_bo_purge_vma_cache: cached=0, open=3, limit=-1
bo_map: 3 (kernel shader), map_count=1
bo_map: 3 (kernel shader) -> 0x7ffff7fe6000
drm_intel_gem_bo_purge_vma_cache: cached=1, open=2, limit=-1
bo_create: buf 4 (kernel shader) 5696b
drm_intel_gem_bo_purge_vma_cache: cached=1, open=3, limit=-1
bo_map: 4 (kernel shader), map_count=1
bo_map: 4 (kernel shader) -> 0x7ffff7fe4000
drm_intel_gem_bo_purge_vma_cache: cached=2, open=2, limit=-1
libva info: va_openDriver() returns 0
[----------] 1 test from Big/JPEGEncodeInputTest
[ RUN      ] Big/JPEGEncodeInputTest.Full/2
bo_create: buf 5 (vaapi surface) 134217728b
bo_create: buf 6 (batch buffer) 524288b
drm_intel_gem_bo_purge_vma_cache: cached=2, open=3, limit=-1
bo_map: 6 (batch buffer), map_count=1
bo_map: 6 (batch buffer) -> 0x7ffff7e51000
bo_create: buf 7 (kernel shader) 1744b
drm_intel_gem_bo_purge_vma_cache: cached=2, open=4, limit=-1
bo_map: 7 (kernel shader), map_count=1
bo_map: 7 (kernel shader) -> 0x7ffff7fe3000
drm_intel_gem_bo_purge_vma_cache: cached=3, open=3, limit=-1
bo_create: buf 8 (Buffer) 268460032b
drm_intel_gem_bo_purge_vma_cache: cached=3, open=4, limit=-1
bo_map: 8 (Buffer), map_count=1
bo_map: 8 (Buffer) -> 0x7fffdde7c000
drm_intel_gem_bo_purge_vma_cache: cached=4, open=3, limit=-1
drm_intel_gem_bo_purge_vma_cache: cached=4, open=4, limit=-1
bo_map_gtt: mmap 5 (vaapi surface), map_count=1
bo_map_gtt: 5 (vaapi surface) -> 0x7fffd5e7c000
drm_intel_gem_bo_purge_vma_cache: cached=5, open=3, limit=-1
bo_create: buf 9 (Buffer) 32768b
bo_create: buf 10 (Buffer) 4194304b
bo_create: buf 11 (Buffer) 131072b
bo_create: buf 12 (Buffer) 65536b
bo_create: buf 13 (batch buffer) 4194304b
drm_intel_gem_bo_purge_vma_cache: cached=5, open=4, limit=-1
bo_map: 13 (batch buffer), map_count=1
bo_map: 13 (batch buffer) -> 0x7fffd5a7c000
bo_create: buf 14 (surface state & binding table) 2312b
bo_create: buf 15 (surface state & binding table) 1344b
drm_intel_gem_bo_purge_vma_cache: cached=4, open=5, limit=-1
bo_map: 8 (Buffer) -> 0x7fffdde7c000
drm_intel_gem_bo_purge_vma_cache: cached=5, open=4, limit=-1
drm_intel_gem_bo_purge_vma_cache: cached=6, open=3, limit=-1
BO 5 (vaapi surface) migrated: 0x00000000 00000000 -> 0x00000000 f7fff000
BO 10 (Buffer) migrated: 0x00000000 00000000 -> 0x00000000 f7bff000
BO 9 (Buffer) migrated: 0x00000000 00000000 -> 0x00000000 f7bf7000
BO 11 (Buffer) migrated: 0x00000000 00000000 -> 0x00000000 f7bd7000
BO 8 (Buffer) migrated: 0x00000000 00000000 -> 0x00000000 e7bd1000
BO 12 (Buffer) migrated: 0x00000000 00000000 -> 0x00000000 e7bc1000
BO 6 (batch buffer) migrated: 0x00000000 00000000 -> 0x00000000 e7b41000
 0: 5 (vaapi surface)
 1: 10 (Buffer)
 2: 9 (Buffer)
 3: 11 (Buffer)
 4: 8 (Buffer)
 5: 12 (Buffer)
 6: 6 (batch buffer)@0x00000000 00000058 -> 5 (vaapi surface)@0x00000000 f7fff000 + 0x00000000
 6: 6 (batch buffer)@0x00000000 00000064 -> 10 (Buffer)@0x00000000 f7bff000 + 0x00000000
 6: 6 (batch buffer)@0x00000000 00000070 -> 9 (Buffer)@0x00000000 f7bf7000 + 0x00000000
 6: 6 (batch buffer)@0x00000000 0000007c -> 11 (Buffer)@0x00000000 f7bd7000 + 0x00000000
 6: 6 (batch buffer)@0x00000000 0000010c -> 10 (Buffer)@0x00000000 f7bff000 + 0x00000000
 6: 6 (batch buffer)@0x00000000 00000184 -> 8 (Buffer)@0x00000000 e7bd1000 + 0x00001000
 6: 6 (batch buffer)@0x00000000 00000190 -> 8 (Buffer)@0x00000000 e7bd1000 + 0x10005000
 6: 6 (batch buffer)@0x00000000 0000019c -> 12 (Buffer)@0x00000000 e7bc1000 + 0x00000000
bo_unreference final: 6 (batch buffer)
bo_create: buf 16 (batch buffer) 524288b
drm_intel_gem_bo_purge_vma_cache: cached=6, open=4, limit=-1
bo_map: 16 (batch buffer), map_count=1
bo_map: 16 (batch buffer) -> 0x7fffd59fc000
drm_intel_gem_bo_purge_vma_cache: cached=5, open=5, limit=-1
bo_map: 8 (Buffer) -> 0x7fffdde7c000
drm_intel_gem_bo_purge_vma_cache: cached=6, open=4, limit=-1
bo_create: buf 6 (batch buffer) 524288b
drm_intel_gem_bo_purge_vma_cache: cached=5, open=5, limit=-1
bo_map: 6 (batch buffer) -> 0x7ffff7e51000
bo_create: buf 17 (Buffer) 207060624b
bo_create: buf 18 (vaapi surface) 201326592b
drm_intel_gem_bo_purge_vma_cache: cached=6, open=4, limit=-1
BO 18 (vaapi surface) migrated: 0x00000000 00000000 -> 0x00000000 dbb41000
BO 17 (Buffer) migrated: 0x00000000 00000000 -> 0x00000000 cf5c9000
 0: 18 (vaapi surface)
 1: 17 (Buffer)
 2: 6 (batch buffer)@0x00000000 00000040 -> 18 (vaapi surface)@0x00000000 dbb41000 + 0x00000000
 2: 6 (batch buffer)@0x00000000 00000218 -> 17 (Buffer)@0x00000000 cf5c9000 + 0x00000000
 2: 6 (batch buffer)@0x00000000 00000428 -> 17 (Buffer)@0x00000000 cf5c9000 + 0x00000000
bo_unreference final: 6 (batch buffer)
bo_create: buf 19 (batch buffer) 524288b
drm_intel_gem_bo_purge_vma_cache: cached=6, open=5, limit=-1
bo_map: 19 (batch buffer), map_count=1
bo_map: 19 (batch buffer) -> 0x7fffb4e8b000
drm_intel_gem_bo_purge_vma_cache: cached=6, open=6, limit=-1
bo_map_gtt: mmap 18 (vaapi surface), map_count=1
bo_map_gtt: 18 (vaapi surface) -> 0x7fffa8e8b000

Program received signal SIGBUS, Bus error.
0x000000000044eb58 in JPEG::Encode::JPEGEncodeInputTest::VerifyOutput()::{lambda(unsigned char const&, unsigned char const&)#1}::operator()(unsigned char const&, unsigned char const&) const (__closure=0x7fffffffc5b0, a=@0x7fffa8e8b000: <error reading variable>, b=@0x7fffc1483010: 64 '@') at i965_jpeg_encode_test.cpp:404
404	            return std::abs(int(a)-int(b)) <= 2;
Missing separate debuginfos, use: dnf debuginfo-install libgcc-6.1.1-2.fc24.x86_64 libpciaccess-0.13.4-3.fc24.x86_64 libstdc++-6.1.1-2.fc24.x86_64
>>> bt full
#0  0x000000000044eb58 in JPEG::Encode::JPEGEncodeInputTest::VerifyOutput()::{lambda(unsigned char const&, unsigned char const&)#1}::operator()(unsigned char const&, unsigned char const&) const (__closure=0x7fffffffc5b0, a=@0x7fffa8e8b000: <error reading variable>, b=@0x7fffc1483010: 64 '@') at i965_jpeg_encode_test.cpp:404
No locals.
#1  0x0000000000452345 in std::equal<unsigned char const*, unsigned char const*, JPEG::Encode::JPEGEncodeInputTest::VerifyOutput()::{lambda(unsigned char const&, unsigned char const&)#1}>(unsigned char const*, JPEG::Encode::JPEGEncodeInputTest::VerifyOutput()::{lambda(unsigned char const&, unsigned char const&)#1}, unsigned char const*, JPEG::Encode::JPEGEncodeInputTest::VerifyOutput()::{lambda(unsigned char const&, unsigned char const&)#1}) (__first1=0x7fffa8e8b000 <error: Cannot access memory at address 0x7fffa8e8b000>, __last1=0x7fffa8e8d000 <error: Cannot access memory at address 0x7fffa8e8d000>, __first2=0x7fffc1483010 "@,%\247\272\t\030\347]\203v\221\322J\220/Ϋ\031\223TN\215ـ\350j.\262\364r\261\351\331v\313pG\004\356:q\267\270\337}\315`}D\364ݏN\224\003\026\333\350\234%\366v|\243;\340\031\226\366k'w\213)0\372\237X\345\326V\222\256k\216<?i\343\352\361\250\246\003^+\bY\315\337\302.H\360VЌ݆\302n\272\363\320[A\035\024b\206Qqn\274\231\017\025v.Rk\203\250-\022\061\257\310\026\314H*\037\204\aB\245\036\334I\243_\306V\352D+\305-v&JG\213I%\212\004\311\003f\370\215\367r\266\254\243p\230\237RQ\371\273݇\276\371\347\336\300\070\230\356\375", <incomplete sequence \302>, __binary_pred=...) at /usr/include/c++/6.1.1/bits/stl_algobase.h:1083
No locals.
#2  0x00000000004507a3 in JPEG::Encode::JPEGEncodeInputTest::VerifyOutput (this=0x9cd820) at i965_jpeg_encode_test.cpp:415
        gtest_ar_ = {
          success_ = 160, 
          message_ = {
            ptr_ = 0x7fffffffc7c8
          }
        }
        r = 0
        w = 8192
        h = 8192
        source = 0x7fffc1483010 "@,%\247\272\t\030\347]\203v\221\322J\220/Ϋ\031\223TN\215ـ\350j.\262\364r\261\351\331v\313pG\004\356:q\267\270\337}\315`}D\364ݏN\224\003\026\333\350\234%\366v|\243;\340\031\226\366k'w\213)0\372\237X\345\326V\222\256k\216<?i\343\352\361\250\246\003^+\bY\315\337\302.H\360VЌ݆\302n\272\363\320[A\035\024b\206Qqn\274\231\017\025v.Rk\203\250-\022\061\257\310\026\314H*\037\204\aB\245\036\334I\243_\306V\352D+\305-v&JG\213I%\212\004\311\003f\370\215\367r\266\254\243p\230\237RQ\371\273݇\276\371\347\336\300\070\230\356\375", <incomplete sequence \302>
        result = 0x7fffa8e8b000 <error: Cannot access memory at address 0x7fffa8e8b000>
        i = 0
        image = {
          image_id = 167772160, 
          format = {
            fourcc = 1211249204, 
            byte_order = 1, 
            bits_per_pixel = 16, 
            depth = 0, 
            red_mask = 0, 
            green_mask = 0, 
            blue_mask = 0, 
            alpha_mask = 0
          }, 
          buf = 134217740, 
          width = 8192, 
          height = 8192, 
          data_size = 201326592, 
          num_planes = 3, 
          pitches = {[0] = 8192, [1] = 8192, [2] = 8192}, 
          offsets = {[0] = 0, [1] = 67108864, [2] = 134217728}, 
          num_palette_entries = 0, 
          entry_bytes = 0, 
          component_order = "\000\000\000"
        }
        data = 0x7fffa8e8b000 <error: Cannot access memory at address 0x7fffa8e8b000>
        isClose = {<No data fields>}
        oconfig = 16777217
        ocontext = 33554433
        buffers = std::vector of length 5, capacity 8 = {[0] = 134217735, [1] = 134217736, [2] = 134217737, [3] = 134217738, [4] = 134217739}
        osurfaces = std::vector of length 1, capacity 1 = {[0] = 67108865}
        attribs = std::vector of length 1, capacity 1 = {[0] = {
            type = VAConfigAttribRTFormat, 
            value = 2
          }}
        expect = std::shared_ptr (count 1, weak 1) 0x9cfc80
        pd = std::shared_ptr (count 1, weak 0) 0x9cff80
#3  0x00000000004485e0 in JPEG::Encode::JPEGEncodeInputTest_Full_Test::TestBody (this=0x9cd820) at i965_jpeg_encode_test.cpp:457
        i965 = 0x9c2900
#4  0x00000000005645be in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x9cd820, method=&virtual testing::Test::TestBody(), location=0x6e1bdb "the test body") at ../test/gtest/src/gtest.cc:2402
No locals.
#5  0x000000000055f350 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x9cd820, method=&virtual testing::Test::TestBody(), location=0x6e1bdb "the test body") at ../test/gtest/src/gtest.cc:2438
No locals.
#6  0x000000000054564c in testing::Test::Run (this=0x9cd820) at ../test/gtest/src/gtest.cc:2475
        impl = 0x954e00
#7  0x0000000000545ebc in testing::TestInfo::Run (this=0x990260) at ../test/gtest/src/gtest.cc:2656
        impl = 0x954e00
        repeater = 0x955000
        start = 1476816869608
        test = 0x9cd820
#8  0x00000000005464fd in testing::TestCase::Run (this=0x9802d0) at ../test/gtest/src/gtest.cc:2774
        i = 2
        impl = 0x954e00
        repeater = 0x955000
        start = 1476816869608
#9  0x000000000054d07b in testing::internal::UnitTestImpl::RunAllTests (this=0x954e00) at ../test/gtest/src/gtest.cc:4649
        test_index = 12
        start = 1476816869589
        i = 0
        in_subprocess_for_death_test = false
        should_shard = false
        has_tests_to_run = true
        failed = false
        repeater = 0x955000
        repeat = 1
        forever = false
#10 0x000000000056536f in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x954e00, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x54cdb8 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x6e2418 "auxiliary test code (environments or event listeners)") at ../test/gtest/src/gtest.cc:2402
No locals.
#11 0x000000000055ff58 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x954e00, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x54cdb8 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x6e2418 "auxiliary test code (environments or event listeners)") at ../test/gtest/src/gtest.cc:2438
No locals.
#12 0x000000000054bd7d in testing::UnitTest::Run (this=0x9423e0 <testing::UnitTest::GetInstance()::instance>) at ../test/gtest/src/gtest.cc:4257
        in_death_test_child_process = false
        premature_exit_file = {
          premature_exit_filepath_ = 0x0
        }
#13 0x0000000000468fc5 in RUN_ALL_TESTS () at ../test/gtest/include/gtest/gtest.h:2233
No locals.
#14 0x0000000000468f3a in main (argc=1, argv=0x7fffffffd808) at test_main.cpp:35
No locals.
>>>
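
The `bo_create` lines in the trace above give a quick view of how much memory the test allocates. A minimal sketch for summarizing them follows, assuming the `bo_create: buf N (name) SIZEb` format seen in this log. The function name is illustrative and the sample is a subset of the lines above.

```python
import re

# Matches lines such as "bo_create: buf 5 (vaapi surface) 134217728b"
BO_CREATE = re.compile(r"bo_create: buf (\d+) \((.+)\) (\d+)b")

SAMPLE = """\
bo_create: buf 5 (vaapi surface) 134217728b
bo_create: buf 8 (Buffer) 268460032b
bo_create: buf 17 (Buffer) 207060624b
bo_create: buf 18 (vaapi surface) 201326592b
"""

def largest_buffers(log_text, count=3):
    """Return (size, buf id, name) tuples for the largest allocations, biggest first."""
    found = [(int(m.group(3)), int(m.group(1)), m.group(2))
             for m in BO_CREATE.finditer(log_text)]
    return sorted(found, reverse=True)[:count]

if __name__ == "__main__":
    for size, buf, name in largest_buffers(SAMPLE):
        print(f"buf {buf} ({name}): {size / 2**20:.1f} MiB")
```

The buffer that faults here is buf 18, the 201326592-byte vaapi surface mapped through `bo_map_gtt` just before the SIGBUS.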
Comment 2 U. Artie Eoff 2016-10-18 18:58:42 UTC
Created attachment 127384 [details]
gdb trace with drm bufmgr debug on
Comment 3 U. Artie Eoff 2016-10-18 19:02:25 UTC
Created attachment 127388 [details]
additional gdb data
Comment 4 U. Artie Eoff 2016-10-18 19:08:04 UTC
The bus error is triggered when the test tries to access the data pointer returned by i965_MapBuffer of the derived image.
Comment 5 U. Artie Eoff 2016-10-18 21:07:58 UTC
AFAICT, we are failing to read from the drm mmap'd region... mmap documentation says:

       Use of a mapped region can result in these signals:

       SIGSEGV
              Attempted write into a region mapped as read-only.

       SIGBUS Attempted access to a portion of the buffer that does not
              correspond to the file (for example, beyond the end of the
              file, including the case where another process has truncated
              the file).

If I'm reading it correctly and since the mmap call does not fail, something must have truncated it before we try to use it?  But what/how?
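
The SIGBUS case from the mmap documentation can be reproduced outside the driver. The following is a minimal sketch, assuming Linux and Python 3. It uses an ordinary temporary file rather than a DRM GTT mapping, so it only illustrates the signal semantics, not the i915 behaviour. The names `CHILD` and `run_truncated_mapping` are made up for illustration.

```python
import signal
import subprocess
import sys

# Child program: map a one-page file, truncate the file to zero bytes,
# then read through the stale mapping. The read faults on a page that no
# longer corresponds to the file.
CHILD = r"""
import mmap, os, tempfile
fd, path = tempfile.mkstemp()
os.unlink(path)                      # the file lives only as long as fd
os.write(fd, b"x" * mmap.PAGESIZE)
m = mmap.mmap(fd, mmap.PAGESIZE, prot=mmap.PROT_READ)   # mmap itself succeeds
os.ftruncate(fd, 0)                  # backing storage disappears
m[0]                                 # access beyond the end of the file
"""

def run_truncated_mapping():
    """Run the child and return its exit status as seen by the parent."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode

if __name__ == "__main__":
    rc = run_truncated_mapping()
    if rc < 0:
        print("child killed by", signal.Signals(-rc).name)
    else:
        print("child exited with status", rc)
```

As in the bug, the `mmap` call succeeds and the failure only appears on first access to the mapped page.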
Comment 6 haihao 2016-10-19 13:52:09 UTC
Is the buffer tiled?
Comment 7 U. Artie Eoff 2016-10-19 13:55:35 UTC
(In reply to haihao from comment #6)
> Is the buffer tiled?

dri_bo_get_tiling in i965_MapBuffer returns 2 for tiling
Comment 8 U. Artie Eoff 2016-10-19 14:24:20 UTC
(In reply to U. Artie Eoff from comment #7)
> (In reply to haihao from comment #6)
> > Is the buffer tiled?
> 
> dri_bo_get_tiling in i965_MapBuffer returns 2 for tiling

Furthermore, the returned [mmap'd] buffer comes from intel_bufmgr_gem.c:map_gtt(...)
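
For reference, the tiling value 2 reported above corresponds to Y tiling in the kernel's i915 uAPI header (`I915_TILING_NONE` is 0, `I915_TILING_X` is 1, `I915_TILING_Y` is 2). A small Python sketch of that mapping, with an illustrative enum and helper name:

```python
from enum import IntEnum

class I915Tiling(IntEnum):
    """Tiling modes as numbered in the kernel's i915_drm.h."""
    NONE = 0
    X = 1
    Y = 2

def describe_tiling(value):
    """Return a readable name for a tiling value such as the one from dri_bo_get_tiling."""
    try:
        return I915Tiling(value).name
    except ValueError:
        return f"unknown({value})"
```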
Comment 9 U. Artie Eoff 2016-10-20 16:20:27 UTC
When I execute from within a graphical environment, I always encounter this issue.  I am using gnome for my graphical env.

However, if I vt-switch into a non-graphical console, I *do not* encounter this issue when executing from there.
Comment 10 haihao 2016-10-24 04:56:12 UTC
I can't reproduce it in the graphics env. I ran the test case 100 times without any issue.
Comment 11 haihao 2016-10-24 05:10:38 UTC
Please check the aperture size on your system:

$> sudo cat /sys/kernel/debug/dri/0/i915_gem_objects | grep gtt

map_gtt() uses the aperture space and a tiled 8192x8192 NV12 surface is 96M. I guess your system has a small aperture space.
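
The 96M figure follows from the NV12 layout: a full-resolution luma plane plus a half-size interleaved chroma plane, i.e. 1.5 bytes per pixel. Below is a minimal sketch of that arithmetic. The helper names are illustrative, and the calculation ignores any pitch or height alignment the driver may add for tiling. For comparison, it also computes the size of the three-plane derived image shown in the gdb trace of comment 1 (pitch 8192, height 8192 per plane).

```python
MIB = 1024 * 1024

def nv12_size(width, height):
    """Bytes for an unpadded NV12 surface: Y plane plus interleaved UV at half size."""
    return width * height + (width * height) // 2

def planar_size(pitch, height, num_planes):
    """Bytes for num_planes equally sized planes with the given pitch."""
    return pitch * height * num_planes

if __name__ == "__main__":
    print(nv12_size(8192, 8192) // MIB, "MiB for 8192x8192 NV12")
    # data_size = 201326592 in the gdb trace corresponds to three 64 MiB planes
    print(planar_size(8192, 8192, 3) // MIB, "MiB for the derived image")
```

The image that the failing test actually maps is therefore 192 MiB, twice the NV12 estimate.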
Comment 12 haihao 2016-10-24 05:12:36 UTC
Or you can try a newer Linux kernel; I am using 4.8.0-rc8+.
Comment 13 U. Artie Eoff 2016-10-24 17:54:27 UTC
$ sudo cat /sys/kernel/debug/dri/0/i915_gem_objects | grep gtt
264 [57] objects, 35753984 [35753984] bytes in gtt
4294967296 [268435456] gtt total
Comment 14 U. Artie Eoff 2016-10-24 19:07:46 UTC
This is the state of i915_gem_objects when the SIGBUS occurs on my system...

518 objects, 1447014400 bytes
311 [26] objects, 168669184 [168669184] bytes in gtt
  0 [0] active objects, 0 [0] bytes
  26 [26] inactive objects, 168669184 [168669184] bytes
15 unbound objects, 50593792 bytes
19 purgeable objects, 19304448 bytes
2 pinned mappable objects, 16683008 bytes
4 fault mappable objects, 134746112 bytes
4294967296 [268435456] gtt total

systemd-logind: 56 objects, 105865216 bytes (0 active, 88657920 inactive, 524288 global, 67125248 shared, 1130496 unbound)
Xorg: 404 objects, 263962624 bytes (0 active, 150650880 inactive, 4096 global, 67121152 shared, 74235904 unbound)
test_i965_drv_v: 48 objects, 1135419392 bytes (0 active, 1129111552 inactive, 134217728 global, 0 shared, 0 unbound)
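
To compare these numbers across runs, the summary lines can be parsed mechanically. A minimal sketch follows, assuming the line formats shown in comments 13 and 14. The function name is hypothetical, and reading the bracketed value on the `gtt total` line as the CPU-mappable aperture size is an assumption based on the aperture remark in comment 11, not something the output itself states.

```python
import re

SAMPLE = """\
311 [26] objects, 168669184 [168669184] bytes in gtt
2 pinned mappable objects, 16683008 bytes
4 fault mappable objects, 134746112 bytes
4294967296 [268435456] gtt total
"""

def parse_gem_objects(text):
    """Extract a few byte counts from i915_gem_objects-style text."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.fullmatch(r"(\d+) \[(\d+)\] gtt total", line)
        if m:
            stats["gtt_total"] = int(m.group(1))
            stats["gtt_bracketed"] = int(m.group(2))  # assumed: mappable aperture
            continue
        m = re.fullmatch(r"\d+ (pinned|fault) mappable objects, (\d+) bytes", line)
        if m:
            stats[m.group(1) + "_mappable"] = int(m.group(2))
    return stats

if __name__ == "__main__":
    for key, value in parse_gem_objects(SAMPLE).items():
        print(f"{key}: {value} bytes ({value / 2**20:.1f} MiB)")
```

With the values above, the bracketed figure is 256 MiB, while the derived image that the test maps through the GTT is 201326592 bytes (192 MiB) according to the trace in comment 1.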
Comment 15 haihao 2016-10-25 04:44:18 UTC
I can always reproduce this issue with kernel 4.5.0-rc4+ with/without graphical env. The issue is gone after switching kernel to 4.8.0-rc8+. 

Could you try a recent kernel to see whether the issue is still there? I will mark the bug as resolved if the test case works fine with a recent kernel.
Comment 16 U. Artie Eoff 2016-10-25 20:44:07 UTC
I tried kernel 4.8.0-rc8 and the issue is still there.  I also encounter this issue with and without a graphical session on this kernel.
Comment 17 Sean V Kelley 2016-10-25 23:07:35 UTC
Yes, I was going to say I've been using 4.8 on Arch and the issue is always there.

Sean
Comment 18 U. Artie Eoff 2016-10-25 23:37:06 UTC
(In reply to Sean V Kelley from comment #17)
> Yes, I was going to say I've been using 4.8 on Arch and the issue is always
> there.
> 
> Sean

My last result was from upstream kernel at tag v4.8-rc8, which is not v4.8-rc8+ as Haihao tested.  @Haihao, can you be more specific about which commit id you are on with v4.8-rc8+?  @Sean, is your kernel a 4.8 "final" or an "rc" release?

I'm trying 4.9.0-rc2+ (i.e. 9fe68cad6e74) right now, and so far I am not seeing the issue with this version after ~100 executions.
Comment 19 haihao 2016-10-26 00:45:23 UTC
I am using drm-intel-nightly 

commit aab15c274da587bcab19376d2caa9d6626440335
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Mon Sep 26 15:11:53 2016 +0300

    drm-intel-nightly: 2016y-09m-26d-12h-11m-33s UTC integration manifest
Comment 20 Sean V Kelley 2016-10-26 06:46:19 UTC
Created attachment 127549 [details]
attachment-31931-0.html

That’s all good and well, but for this testing you really should use a stable kernel.  That’s what our customers are using.

Sean
Comment 21 Sean V Kelley 2016-10-26 06:50:41 UTC
Created attachment 127550 [details]
attachment-32256-0.html

I’d prefer not to chase kernel versions and hope the problem goes away; I’d like to know the root cause.  :-)

Arch’s kernel generally tracks releases, not rcs. I am on 4.8.4-1.

And yes, I do still see the issue.

https://www.archlinux.org/packages/core/x86_64/linux/

Thanks,

Sean


Comment 22 haihao 2016-10-26 08:52:49 UTC
It is hard for us to find the root cause if we don't look into i915. If we want to know the root cause, I think a better way is to find a good commit and a bad commit, then bisect.

In addition, we can be sure it is not a driver issue if the test case works well with a recent kernel.
Comment 23 U. Artie Eoff 2016-10-26 15:42:43 UTC
I continuously ran the test overnight with kernel.org 4.9.0-rc2+ (i.e. 9fe68cad6e74) and have not encountered this issue.

Yes, I agree.  We should understand the root-cause even if it's kernel related and not a driver issue.  That way, if this issue shows up again in the future, we can save time by investigating the direct cause.  So far, it appears kernel related.  Or maybe it's a compatibility issue with userspace libdrm and kernel version?

I am not sure what the chronological relationship is between drm-intel-nightly and kernel.org.  So if you're going to bisect i915, use kernel.org since there is a good and bad there.  Unfortunately, 4.8.4 is the latest mainstream stable release which Sean confirmed is bad.  And other distros are on slightly older versions.  So if we can understand the cause, perhaps we can come up with a driver solution to WA it for now.
Comment 24 Sean V Kelley 2016-11-02 17:32:44 UTC
Yes, the most important thing is that we need to be able to isolate the root cause as we can hardly promote the framework if we don't know the cause for the errors.  It is good to bisect the kernel and map that to libdrm versions.  I get very wary of using non stable kernels in this testing.  Try to start from the Quarterly release BKC.

Sean
