Bug 111724 - NVE6 (GK106) memory re-clocking breaks GpuTest plot3d benchmark
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: not set normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
 
Reported: 2019-09-17 14:31 UTC by Mark Menzynski
Modified: 2019-09-17 15:29 UTC

Attachments
mmiotrace log when changing directly from 0x7 to 0xf (3.91 MB, application/x-xz)
2019-09-17 14:33 UTC, Mark Menzynski
pmu log from dmesg when changing directly from 0x7 to 0xf (4.86 KB, text/x-log)
2019-09-17 14:34 UTC, Mark Menzynski
pmu log from dmesg with nouveau code modified so the values are the same as with the nvidia driver (5.12 KB, text/x-log)
2019-09-17 14:34 UTC, Mark Menzynski
mmiotrace log when changing from 0x7 to 0xa and then to 0xf, benchmark works in this case (3.81 MB, application/x-xz)
2019-09-17 14:35 UTC, Mark Menzynski
pmu log from dmesg when changing from 0x7 to 0xa and then to 0xf, benchmark working (7.19 KB, text/x-log)
2019-09-17 14:35 UTC, Mark Menzynski
mmiotrace log from nvidia driver (5.50 MB, application/x-xz)
2019-09-17 14:36 UTC, Mark Menzynski
strap_peek (11 bytes, text/plain)
2019-09-17 14:37 UTC, Mark Menzynski
vbios.rom (98.00 KB, application/octet-stream)
2019-09-17 14:37 UTC, Mark Menzynski
part of dmesg when running plot3d fullscreen (2.73 KB, text/plain)
2019-09-17 15:10 UTC, Mark Menzynski
screenshot from the glitchy benchmark (2.35 MB, image/png)
2019-09-17 15:11 UTC, Mark Menzynski

Description Mark Menzynski 2019-09-17 14:31:43 UTC
I have run into a problem with NVE6 (GK106) in the GpuTest (https://www.geeks3d.com/gputest/) plot3d benchmark; it occurs only in plot3d and nowhere else. There are visible glitches, and when the benchmark is left running for a longer time, Nouveau seems to crash.

The GPU has 4 profiles: 
07: core 324 MHz memory 648 MHz
0a: core 324-862 MHz memory 1620 MHz
0d: core 549-1228 MHz memory 6008 MHz
0f: core 549-1228 MHz memory 6008 MHz

The problem occurs when switching the re-clocking profile directly from 648 MHz to 6008 MHz, skipping the 0xA (1620 MHz) profile. When the switch goes through the 0xA profile, everything works fine.
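
The working sequence described above can be sketched as a small helper. This is only an illustration, not part of nouveau: the names `parse_profiles` and `stepwise_path` are hypothetical, and the only input assumed is the profile listing quoted in this report. The helper parses the listing and returns the order of profiles to visit so that no distinct memory clock between the start and the target is skipped.

```python
import re

# Profile listing exactly as quoted in this report.
LISTING = """\
07: core 324 MHz memory 648 MHz
0a: core 324-862 MHz memory 1620 MHz
0d: core 549-1228 MHz memory 6008 MHz
0f: core 549-1228 MHz memory 6008 MHz
"""

# Matches lines such as "0a: core 324-862 MHz memory 1620 MHz".
LINE_RE = re.compile(r"^([0-9a-f]{2}): .*\bmemory (\d+) MHz")


def parse_profiles(text):
    """Map each profile id (as an int) to its memory clock in MHz."""
    profiles = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            profiles[int(match.group(1), 16)] = int(match.group(2))
    return profiles


def stepwise_path(profiles, start, target):
    """Return the profiles to switch to, in order, ending at `target`.

    Every distinct memory clock strictly between the start and target
    clocks is visited once, using the lowest profile id with that clock.
    """
    start_clk, target_clk = profiles[start], profiles[target]
    lo, hi = sorted((start_clk, target_clk))
    between = sorted(
        {clk for clk in profiles.values() if lo < clk < hi},
        reverse=start_clk > target_clk,
    )
    path = [min(p for p, clk in profiles.items() if clk == c) for c in between]
    path.append(target)
    return path


if __name__ == "__main__":
    profiles = parse_profiles(LISTING)
    path = stepwise_path(profiles, 0x07, 0x0F)
    print([f"{p:02x}" for p in path])  # ['0a', '0f']
```

A returned path with more than one entry means a direct switch would skip an intermediate memory clock, which is the failing case here (0x7 straight to 0xf). On a live system the listing would presumably come from nouveau's debugfs pstate file (commonly /sys/kernel/debug/dri/0/pstate; the card index is an assumption).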

If memory re-clocking is disabled, it works fine. If the 0xF profile is set directly (breaking the benchmark) with memory re-clocking enabled, and nouveau is then unloaded and loaded back with memory re-clocking disabled, the benchmark still glitches when re-clocking profiles are changed. This implies that whatever breaks is only touched when memory re-clocking is enabled.

I have gone through all the nouveau PMU script traces and checked every difference between those scripts and the Nvidia driver's; none of the values that differ from Nvidia seemed to affect this problem. The code that changes the values for the 0xf profile to match Nvidia is here: https://github.com/mmenzyns/nouveau/tree/linux-5.2_gk106_memory_issues. The scripts for the highest profile should be almost identical between Nvidia and Nouveau.
Comment 1 Mark Menzynski 2019-09-17 14:33:31 UTC
Created attachment 145395
mmiotrace log when changing directly from 0x7 to 0xf
Comment 2 Mark Menzynski 2019-09-17 14:34:00 UTC
Created attachment 145396
pmu log from dmesg when changing directly from 0x7 to 0xf
Comment 3 Mark Menzynski 2019-09-17 14:34:34 UTC
Created attachment 145397
pmu log from dmesg with nouveau code modified so the values are the same as with the nvidia driver
Comment 4 Mark Menzynski 2019-09-17 14:35:22 UTC
Created attachment 145398
mmiotrace log when changing from 0x7 to 0xa and then to 0xf, benchmark works in this case
Comment 5 Mark Menzynski 2019-09-17 14:35:58 UTC
Created attachment 145399
pmu log from dmesg when changing from 0x7 to 0xa and then to 0xf, benchmark working
Comment 6 Mark Menzynski 2019-09-17 14:36:52 UTC
Created attachment 145400
mmiotrace log from nvidia driver
Comment 7 Mark Menzynski 2019-09-17 14:37:15 UTC
Created attachment 145401
strap_peek
Comment 8 Mark Menzynski 2019-09-17 14:37:28 UTC
Created attachment 145402
vbios.rom
Comment 9 Ilia Mirkin 2019-09-17 14:38:45 UTC
Mark - try using blob ctxsw firmware. Perhaps ours misses something. There are some GK106s which just die immediately with our firmware... (See the VideoAcceleration wiki page for how to extract firmware from blob drivers.)
Comment 10 Mark Menzynski 2019-09-17 15:10:18 UTC
Created attachment 145403
part of dmesg when running plot3d fullscreen
Comment 11 Mark Menzynski 2019-09-17 15:11:07 UTC
Created attachment 145404
screenshot from the glitchy benchmark
Comment 12 Mark Menzynski 2019-09-17 15:29:55 UTC
(In reply to Ilia Mirkin from comment #9)
> Mark - try using blob ctxsw firmware. Perhaps ours misses something. There
> are some GK106s which just die immediately with our firmware... (See the
> VideoAcceleration wiki page for how to extract firmware from blob drivers.)

Doesn't work. Same problem.

