Summary: | Invalid data in error state | ||
---|---|---|---|
Product: | DRI | Reporter: | Lionel Landwerlin <lionel.g.landwerlin> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | RESOLVED MOVED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | dmitry.v.rogozhkin, francesco.balestrieri, intel-gfx-bugs |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | Triaged, ReadyForDev | ||
i915 platform: | KBL, SKL | i915 features: | GEM/Other |
Description
Lionel Landwerlin
2018-08-26 11:10:12 UTC
Reproduced using import { gem, gem_ioctls } from '../lib/i915/gem-ioctls.js'; import { drm } from '../lib/drm.js'; const ioctl = igt.os.ioctl; const print = igt.toy.print; const i915 = drm.cards()[0]; var fd; fd = igt.os.open(drm.sysfs(i915) + '/' + 'error'); igt.os.write(fd, Uint32Array.of(0)); igt.os.close(fd); const reloc_sz = gem_ioctls.s_execbuf_reloc.__pack_size__; const reloc = new ArrayBuffer(2 * reloc_sz); const batch = gem.create(i915, 4096); let cs = new Uint32Array(gem.mmap(i915, batch, 4096, 1)); gem.set_domain(i915, batch, 0x80, 0x80); cs[0] = 0x20 << 23 | 2; ioctl.pack(new DataView(reloc, 0, reloc_sz), gem_ioctls.s_execbuf_reloc, new Map([ [ "target_handle", batch ], [ "delta", 4096 - 4 ], [ "offset", 4 ], ])); cs[1] = 4096 - 4; cs[2] = 0; cs[3] = 1; cs[16] = 0x5 << 23; cs[17] = 0x31 << 23 | 1 << 8 | 1; ioctl.pack(new DataView(reloc, reloc_sz, reloc_sz), gem_ioctls.s_execbuf_reloc, new Map([ [ "target_handle", batch ], [ "delta", 64 ], [ "offset", 4 * 17 ], ])); cs[18] = 64; cs[19] = 0; const handle = gem.create(i915, 4096 * 65536); let data = new Uint32Array(gem.mmap(i915, handle, 4096 * 65536)); for (let page = 0; page < 65536; page++) { for (let x = 0; x < 4096/4; x++) { data[page * 4096/4 + x] = page << 16 | x; } } const sz = gem_ioctls.s_execbuf_object.__pack_size__; const objects = new ArrayBuffer(sz * 2); ioctl.pack(new DataView(objects, 0, sz), gem_ioctls.s_execbuf_object, new Map([ [ "handle", handle ], [ "flags", 1 << 7 ], ])); ioctl.pack(new DataView(objects, sz, sz), gem_ioctls.s_execbuf_object, new Map([ [ "handle", batch ], [ "relocation_count", 2 ], [ "relocs_ptr", reloc ], ])); gem.execbuf(i915, ioctl.pack(undefined, gem_ioctls.s_execbuf, new Map([ [ "buffers_ptr", objects], [ "buffer_count", 2 ] ]))); igt.assert(gem.busy(i915, batch)); while (!cs[1023] && gem.busy(i915, batch)) ; fd = igt.os.open(drm.debugfs(i915) + '/' + 'i915_wedged'); igt.os.write(fd, Uint32Array.of(0xffffffff)); igt.os.close(fd); fd = igt.os.open(drm.sysfs(i915) + '/' + 'error'); const re = /--- user =/; let line; do { line = igt.os.readLine(fd) } while (line && !re.test(line)); let error = igt.os.readLine(fd); igt.os.close(fd); let out = igt.ascii85.decode(error, error[0] == ':', 1); for (var page = 0; page < 65536; page++) { let a = new Uint32Array(out, 4096*page, 4096/4); for (var i = 0; i < 4096/4; i++) { if (a[i] != page << 16 | i) { print(`Error detected on page:${page}, byte:${4*i}`); print(new Uint32Array(out, 4096*page + 4*(i&-32), 64)); exit(1); } } } For science! As a further bit of information, sometimes data isn't zero, it's actually other bits of some buffer or other. I've seen several dumps where bits of the first page of the batch end up showing up in the second page of the batch. The oddity is that the corruption came from the misplaced reloc (overwriting the BB_START itself). As a tight spin loop, no errors crop up in the capture. At least in this simple case of checking 65,536 pages, the corruption was an artifact of the hang. Jason, Lionel, are you still seeing the problem? There have been changes in the capture system and we are wondering whether they helped with this issue. (In reply to Francesco Balestrieri from comment #4) > Jason, Lionel, are you still seeing the problem? There have been changes in > the capture system and we are wondering whether they helped with this issue. I'll have a look. In the meantime if you want to check, you can try with drv_hangman test and check the error state produced. #assessing i915_hangman --r error-state-capture-rcs0 --debug 2>&1 | tee ~/tmp/107691.log With latest drm-tip and igt, running above, I do not see an 'rcs0 --- HW context' section in the output/log. I do see data in 'rcs0 --- NULL context' though, looks ascii. Is there a way to interpret the data in this context? Is this a new change in the capture system to show just the default state? (In reply to umesh from comment #6) > #assessing > > i915_hangman --r error-state-capture-rcs0 --debug 2>&1 | tee ~/tmp/107691.log > > With latest drm-tip and igt, running above, I do not see an 'rcs0 --- HW > context' section in the output/log. I do see data in 'rcs0 --- NULL context' > though, looks ascii. Is there a way to interpret the data in this context? > Is this a new change in the capture system to show just the default state? I usually look at it with the mesa aubinator_error_decode tool. Use the --all-bb option to decode the context image (other it just decodes the batches). -- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/137. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.