I have run into a problem with NVE6 (GK106) in the plot3d benchmark of GpuTest (https://www.geeks3d.com/gputest/); it occurs only in plot3d and nowhere else. There are visible glitches, and when the benchmark is left running for a longer time Nouveau seems to crash.
The GPU has 4 profiles:
07: core 324 MHz memory 648 MHz
0a: core 324-862 MHz memory 1620 MHz
0d: core 549-1228 MHz memory 6008 MHz
0f: core 549-1228 MHz memory 6008 MHz
The problem occurs when the re-clocking profile is switched directly from 648 MHz memory (0x7) to 6008 MHz memory, skipping the 0xa (1620 MHz) profile. If the switch goes through the 0xa profile, everything works fine.
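The stepped switch that avoids the glitch can be sketched as follows. This is only an illustration of the workaround, not a fix: the function names are hypothetical, the memory clocks are copied from the profile list above, and the path assumes nouveau's debugfs pstate file lives at /sys/kernel/debug/dri/0/pstate (the card index may differ, and writing to it needs root). Only the upward 0x7 to 0xf switch was observed to glitch; stepping on the way down is an assumption.

```python
from pathlib import Path

# Memory clock (MHz) per profile id, in ascending order, from the list above.
MEMORY_MHZ = {"07": 648, "0a": 1620, "0d": 6008, "0f": 6008}

# Assumed location of nouveau's debugfs pstate file; the card index may differ.
PSTATE_PATH = Path("/sys/kernel/debug/dri/0/pstate")


def stepped_path(current, target, memory_mhz=MEMORY_MHZ):
    """Return the profiles to set, in order, to get from current to target.

    Intermediate profiles are visited so that memory is not re-clocked in
    one large jump; intermediates that already use the target's memory
    clock are skipped, since visiting them would not shorten the jump.
    """
    order = list(memory_mhz)  # dicts preserve insertion order
    i, j = order.index(current), order.index(target)
    if i == j:
        return []
    between = order[i + 1:j] if i < j else order[j + 1:i][::-1]
    steps = [p for p in between if memory_mhz[p] != memory_mhz[target]]
    return steps + [target]


def apply_profiles(steps, pstate_path=PSTATE_PATH):
    """Write each profile id to the pstate file, one write per step."""
    for step in steps:
        pstate_path.write_text(step)
```

For the case described here, `stepped_path("07", "0f")` yields the 0xa profile followed by 0xf, which is the sequence that kept the benchmark working.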
If memory re-clocking is disabled, it works fine. However, if the 0xf profile is set directly (breaking the benchmark) with memory re-clocking enabled, and nouveau is then unloaded and loaded back with memory re-clocking disabled, it still glitches when changing re-clocking profiles. This implies that whatever breaks is only touched when memory re-clocking is enabled.
I have gone through all the nouveau PMU script traces and checked every difference between those scripts and the ones from the Nvidia driver; none of the values that differ from Nvidia's seemed to affect this problem. The code that changes the values for the 0xf profile to match Nvidia's is here: https://github.com/mmenzyns/nouveau/tree/linux-5.2_gk106_memory_issues. The scripts for the highest profile should be almost identical between Nvidia and Nouveau.
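For anyone who wants to repeat this kind of comparison on the mmiotrace logs, a rough sketch is below. It is not the method used for the PMU script comparison above; the function names are hypothetical, and it assumes the usual mmiotrace text layout in which write events are lines of the form `W <width> <timestamp> <map id> <address> <value> ...`. It only compares the last value written to each address, so it ignores write ordering and timing, and the addresses are physical, so the two traces must come from the same BAR mapping to be comparable.

```python
def last_writes(lines):
    """Map register address -> last value written, from mmiotrace text lines."""
    writes = {}
    for line in lines:
        fields = line.split()
        # Assumed write-event layout:
        #   W <width> <timestamp> <map id> <address> <value> ...
        if len(fields) >= 6 and fields[0] == "W":
            writes[int(fields[4], 16)] = int(fields[5], 16)
    return writes


def diff_writes(ours, theirs):
    """Return {address: (our value, their value)} where the final values differ.

    A value of None means that trace never wrote to the address.
    """
    a, b = last_writes(ours), last_writes(theirs)
    return {
        addr: (a.get(addr), b.get(addr))
        for addr in sorted(a.keys() | b.keys())
        if a.get(addr) != b.get(addr)
    }
```

Both arguments are iterables of lines, so two open log files can be passed directly, for example `diff_writes(open("nouveau.log"), open("nvidia.log"))` with placeholder file names.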
Created attachment 145395 [details]
mmiotrace log when changing directly from 0x7 to 0xf
Created attachment 145396 [details]
pmu log from dmesg when changing directly from 0x7 to 0xf
Created attachment 145397 [details]
pmu log from dmesg with nouveau code modified so the values are the same as with the nvidia driver
Created attachment 145398 [details]
mmiotrace log when changing from 0x7 to 0xa and then to 0xf, benchmark works in this case
Created attachment 145399 [details]
pmu log from dmesg when changing from 0x7 to 0xa and then to 0xf, benchmark working
Created attachment 145400 [details]
mmiotrace log from nvidia driver
Created attachment 145401 [details]
Created attachment 145402 [details]
Mark - try using blob ctxsw firmware. Perhaps ours misses something. There are some GK106's which just die immediately with our firmware... (See the VideoAcceleration wiki page for how to extract firmware from blob drivers.)
Created attachment 145403 [details]
part of dmesg when running plot3d fullscreen
Created attachment 145404 [details]
screenshot from the glitchy benchmark
(In reply to Ilia Mirkin from comment #9)
> Mark - try using blob ctxsw firmware. Perhaps ours misses something. There
> are some GK106's which just die immediately with our firmware... (See the
> VideoAcceleration wiki page for how to extract firmware from blob drivers.)
Using the blob ctxsw firmware doesn't help. Same problem.