Bug 111724

Summary: NVE6 (GK106) memory re-clocking breaks GpuTest plot3d benchmark
Product: xorg
Reporter: Mark Menzynski <mmenzyns>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: not set    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments (all flags: none):
mmiotrace log when changing directly from 0x7 to 0xf
pmu log from dmesg when changing directly from 0x7 to 0xf
pmu log from dmesg with nouveau code modified so the values are the same as with the nvidia driver
mmiotrace log when changing from 0x7 to 0xa and then to 0xf, benchmark works in this case
pmu log from dmesg when changing from 0x7 to 0xa and then to 0xf, benchmark working
mmiotrace log from nvidia driver
strap_peek
vbios.rom
part of dmesg when running plot3d fullscreen
screenshot from the glitchy benchmark

Description Mark Menzynski 2019-09-17 14:31:43 UTC
I have stumbled upon a problem with NVE6 (GK106) in the plot3d benchmark of GpuTest (https://www.geeks3d.com/gputest/); it occurs only in plot3d and nowhere else. There are visible glitches, and when the benchmark is left running for a longer time, nouveau seems to crash.

The GPU has 4 profiles: 
07: core 324 MHz memory 648 MHz
0a: core 324-862 MHz memory 1620 MHz
0d: core 549-1228 MHz memory 6008 MHz
0f: core 549-1228 MHz memory 6008 MHz

The problem occurs when switching the re-clocking profile directly from 648 MHz to 6008 MHz, skipping the 0xa (1620 MHz) profile. If the switch goes through the 0xa profile first, everything works fine.
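The working sequence can be sketched as a small helper that inserts the 0xa step only for the problematic jump. This is a minimal sketch, not part of nouveau: the function names are made up for illustration, and the debugfs path is an assumption about where nouveau exposes its performance levels (the card index may differ).

```python
from pathlib import Path

# Assumption: nouveau exposes performance levels through this debugfs file.
# The card index (0) may differ on other systems.
PSTATE_PATH = "/sys/kernel/debug/dri/0/pstate"

# Memory clocks (MHz) per profile, copied from the profile list above.
MEMORY_MHZ = {"07": 648, "0a": 1620, "0d": 6008, "0f": 6008}


def plan_steps(current, target, via="0a"):
    """Return the profiles to set, in order, to get from current to target.

    The intermediate profile is inserted only for the jump reported as
    broken: from the 648 MHz level straight to a 6008 MHz level.
    """
    if MEMORY_MHZ[current] == 648 and MEMORY_MHZ[target] == 6008:
        return [via, target]
    return [target]


def apply_steps(steps, pstate_path=PSTATE_PATH):
    """Write each profile id to the pstate file, one write per step."""
    for step in steps:
        Path(pstate_path).write_text(step + "\n")
```

Calling `apply_steps(plan_steps("07", "0f"))` as root with debugfs mounted would set 0a and then 0f. Whether stepping through 0xa avoids the glitches on other GK106 boards is not something this report establishes.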

With memory re-clocking disabled, the benchmark works fine. However, if the 0xf profile is set directly with memory re-clocking enabled (which breaks the benchmark), and nouveau is then unloaded and loaded back with memory re-clocking disabled, the benchmark still glitches when re-clocking profiles are changed. This implies that whatever breaks the benchmark is only touched when memory re-clocking is enabled.

I have gone through all of the nouveau pmu script traces and checked every difference between them and the Nvidia driver's scripts; none of the values that differ from Nvidia's seemed to affect this problem. The code that changes the values for the 0xf profile to match Nvidia's is here: https://github.com/mmenzyns/nouveau/tree/linux-5.2_gk106_memory_issues. With it, the scripts for the highest profile should be almost identical between Nvidia and nouveau.
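A comparison of this kind can be sketched with Python's difflib. This is only an illustration of the approach, not the tooling actually used here: the function name is hypothetical, and real dmesg or mmiotrace output would first need timestamps and other run-specific fields stripped so that only the script contents are compared.

```python
import difflib


def differing_blocks(nouveau_text, nvidia_text):
    """Return (nouveau_lines, nvidia_lines) pairs for every non-matching region."""
    a = nouveau_text.splitlines()
    b = nvidia_text.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep both sides so the differing values can be read side by side.
            blocks.append((a[i1:i2], b[j1:j2]))
    return blocks
```

An empty result means the two cleaned-up traces are line-for-line identical; each returned pair marks one place where the values still differ.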
Comment 1 Mark Menzynski 2019-09-17 14:33:31 UTC
Created attachment 145395 [details]
mmiotrace log when changing directly from 0x7 to 0xf
Comment 2 Mark Menzynski 2019-09-17 14:34:00 UTC
Created attachment 145396 [details]
pmu log from dmesg when changing directly from 0x7 to 0xf
Comment 3 Mark Menzynski 2019-09-17 14:34:34 UTC
Created attachment 145397 [details]
pmu log from dmesg with nouveau code modified so the values are the same as with the nvidia driver
Comment 4 Mark Menzynski 2019-09-17 14:35:22 UTC
Created attachment 145398 [details]
mmiotrace log when changing from 0x7 to 0xa and then to 0xf, benchmark works in this case
Comment 5 Mark Menzynski 2019-09-17 14:35:58 UTC
Created attachment 145399 [details]
pmu log from dmesg when changing from 0x7 to 0xa and then to 0xf, benchmark working
Comment 6 Mark Menzynski 2019-09-17 14:36:52 UTC
Created attachment 145400 [details]
mmiotrace log from nvidia driver
Comment 7 Mark Menzynski 2019-09-17 14:37:15 UTC
Created attachment 145401 [details]
strap_peek
Comment 8 Mark Menzynski 2019-09-17 14:37:28 UTC
Created attachment 145402 [details]
vbios.rom
Comment 9 Ilia Mirkin 2019-09-17 14:38:45 UTC
Mark - try using blob ctxsw firmware. Perhaps ours misses something. There are some GK106's which just die immediately with our firmware... (See VideoAcceleration wiki page for how to extract firmware from blob drivers.)
Comment 10 Mark Menzynski 2019-09-17 15:10:18 UTC
Created attachment 145403 [details]
part of dmesg when running plot3d fullscreen
Comment 11 Mark Menzynski 2019-09-17 15:11:07 UTC
Created attachment 145404 [details]
screenshot from the glitchy benchmark
Comment 12 Mark Menzynski 2019-09-17 15:29:55 UTC
(In reply to Ilia Mirkin from comment #9)
> Mark - try using blob ctxsw firmware. Perhaps ours misses something. There
> are some GK106's which just die immediately with our firmware... (See
> VideoAcceleration wiki page for how to extract firmware from blob drivers.)

Doesn't work. Same problem.
Comment 13 Martin Peres 2019-12-04 09:53:15 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/503.
