Bug 99068 - Screen hangs when running a 3D app. GTX 660
Summary: Screen hangs when running a 3D app. GTX 660
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Nouveau Project
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-13 01:30 UTC by Nicolás Luciano Bértolo
Modified: 2019-09-18 20:44 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
journalctl output (257.42 KB, text/plain)
2016-12-13 01:30 UTC, Nicolás Luciano Bértolo
dmesg output (69.47 KB, text/plain)
2016-12-13 01:31 UTC, Nicolás Luciano Bértolo
mmiotrace (2.20 KB, text/x-log)
2016-12-16 00:02 UTC, Nicolás Luciano Bértolo
complete mmiotrace (2.42 MB, application/x-xz)
2016-12-16 00:57 UTC, Nicolás Luciano Bértolo

Description Nicolás Luciano Bértolo 2016-12-13 01:30:21 UTC
Created attachment 128446
journalctl output

I am running the Fedora 25 Wayland live system, but the bug is also reproducible with other distributions and with Xorg.

When running a 3D app the screen hangs, but the rest of the system keeps working fine.

I can reproduce it like this:
1) Boot a live system.
2) Launch Firefox.
3) Start a WebGL app and let it run for some time. It should hang after a while.

I am attaching the output of "dmesg" and of "journalctl -b".
Comment 1 Nicolás Luciano Bértolo 2016-12-13 01:31:09 UTC
Created attachment 128447
dmesg output
Comment 2 Ilia Mirkin 2016-12-13 03:21:53 UTC
I wonder if this is the same issue as https://bugs.freedesktop.org/show_bug.cgi?id=93629 (and several others). There's something problematic about GTX 660s.
Comment 3 Nicolás Luciano Bértolo 2016-12-13 22:31:55 UTC
It seems similar to https://bugs.freedesktop.org/show_bug.cgi?id=99037 too.

Is there anything I can help you with?
I can run any tests you may want.

Thanks for your hard work.
Comment 4 Ilia Mirkin 2016-12-13 22:38:33 UTC
(In reply to Nicolás Luciano Bértolo from comment #3)
> It seems similar to https://bugs.freedesktop.org/show_bug.cgi?id=99037 too.

That's a very different GPU with very different issues.

> 
> Is there anything I can help you with?
> I can run any tests you may want.
> 
> Thanks for your hard work.

You should try the blob firmware and see if that fixes things for you. It helps for some people but not for others. The main developer who tends to deal with these issues hasn't been able to reproduce any of them on his own GTX 660, unfortunately.
Comment 5 Nicolás Luciano Bértolo 2016-12-14 21:00:20 UTC
I captured the mmiotrace, extracted the files using the provided script, renamed them to their correct names and rebuilt the initramfs.

I can't get it to load the firmware. This is the error message I get:

[    4.908199] fb: switching to nouveaufb from EFI VGA
[    4.908585] nouveau 0000:01:00.0: NVIDIA GK106 (0e6000a1)
[    5.042663] nouveau 0000:01:00.0: bios: version 80.06.10.00.3d
[    5.043341] nouveau 0000:01:00.0: loading /lib/firmware/nvidia/gk106/fecs_inst.bin failed with error -22
[    5.043342] nouveau 0000:01:00.0: Direct firmware load for nvidia/gk106/fecs_inst.bin failed with error -22
[    5.043343] nouveau 0000:01:00.0: gr: failed to load fecs_inst
[    5.043978] nouveau 0000:01:00.0: fb: 2048 MiB GDDR5
[    5.105387] nouveau 0000:01:00.0: DRM: VRAM: 2048 MiB
[    5.105388] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[    5.105391] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    5.105392] nouveau 0000:01:00.0: DRM: DCB version 4.0
[    5.105394] nouveau 0000:01:00.0: DRM: DCB outp 00: 02000f00 00000000
[    5.105396] nouveau 0000:01:00.0: DRM: DCB outp 01: 01000f02 00020030
[    5.105397] nouveau 0000:01:00.0: DRM: DCB outp 03: 02011f62 00020010
[    5.105398] nouveau 0000:01:00.0: DRM: DCB outp 04: 04822fb6 0f420010
[    5.105399] nouveau 0000:01:00.0: DRM: DCB outp 05: 04022f72 00020010
[    5.105400] nouveau 0000:01:00.0: DRM: DCB outp 06: 08033f82 00020030
[    5.105401] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
[    5.105402] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
[    5.105403] nouveau 0000:01:00.0: DRM: DCB conn 02: 00020246
[    5.105404] nouveau 0000:01:00.0: DRM: DCB conn 03: 01000331
[    5.148299] nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
[    5.199540] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 001940 [ !ENGINE ]
[    5.468949] nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffff8db261921c00
[    5.469019] fbcon: nouveaufb (fb0) is primary device
[    6.062618] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[    6.099542] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
Comment 6 Ilia Mirkin 2016-12-14 21:45:39 UTC
(In reply to Nicolás Luciano Bértolo from comment #5)
> I ran the mmiotrace, extracted the files using the script provided, renamed them to their correct name and rebuilt the initramfs.
> 
> I can't get it to load the firmware. This is the error message I get:
> 
> [    4.908199] fb: switching to nouveaufb from EFI VGA
> [    4.908585] nouveau 0000:01:00.0: NVIDIA GK106 (0e6000a1)
> [    5.042663] nouveau 0000:01:00.0: bios: version 80.06.10.00.3d
> [    5.043341] nouveau 0000:01:00.0: loading /lib/firmware/nvidia/gk106/fecs_inst.bin failed with error -22
> [    5.043342] nouveau 0000:01:00.0: Direct firmware load for nvidia/gk106/fecs_inst.bin failed with error -22

The file isn't there. Make sure that it is, at the time the nouveau module loads: if, e.g., nouveau loads from the initrd, the file needs to be in the initrd; if nouveau is built into the kernel, it needs to be in the kernel's extra firmware list (CONFIG_EXTRA_FIRMWARE).
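For reference, error -22 is -EINVAL. As far as we know, the kernel firmware loader of that era returns it for a zero-length file, while a file that is truly absent usually produces -ENOENT (-2), so it is worth checking file sizes as well as presence. Below is a minimal sketch of such a check. check_gr_firmware is a hypothetical helper, and the four file names (fecs_inst.bin, fecs_data.bin, gpccs_inst.bin, gpccs_data.bin) are an assumption extrapolated from the fecs_inst.bin path in the log above.

```shell
#!/bin/bash
# Hypothetical helper: report nouveau graphics firmware files that are
# missing or zero-length. The file names below are an assumption based on
# the fecs_inst.bin path in the log; adjust them for your setup.
check_gr_firmware() {
    local dir=$1    # e.g. /lib/firmware/nvidia/gk106
    local status=0
    local f
    for f in fecs_inst.bin fecs_data.bin gpccs_inst.bin gpccs_data.bin; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            status=1
        elif [ ! -s "$dir/$f" ]; then
            # -s is true only for files larger than zero bytes
            echo "empty: $dir/$f"
            status=1
        fi
    done
    return "$status"
}
```

Usage: "check_gr_firmware /lib/firmware/nvidia/gk106". On Fedora, "lsinitrd" (part of dracut) lists the contents of the current initramfs, so "lsinitrd | grep gk106" shows whether the files actually made it into the initrd after it was rebuilt.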
Comment 7 Nicolás Luciano Bértolo 2016-12-14 22:16:27 UTC
The file is actually there, but it is 0 bytes long.

This is the script that I am using to extract the files:

#! /bin/bash

rmmod nvidia_drm
rmmod nvidia_modeset
rmmod nvidia

echo 64000 > /sys/kernel/debug/tracing/buffer_size_kb
echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace_pipe > mmiotrace.log &

modprobe nvidia

xinit -e sh -c "glxgears & sleep 10"

sleep 5
echo nop > /sys/kernel/debug/tracing/current_tracer
wait

This line fails:
echo nop > /sys/kernel/debug/tracing/current_tracer
It outputs: Resource or device busy.
Comment 8 Ilia Mirkin 2016-12-14 22:19:38 UTC
You should instead just use the extractor I wrote. Follow the instructions at https://nouveau.freedesktop.org/wiki/VideoAcceleration/#firmware (to the letter).
Comment 9 Nicolás Luciano Bértolo 2016-12-14 23:02:13 UTC
It works.
With the firmware from the extractor, I no longer get any lockups.

In case it is necessary, here are some more details about this card.
According to nvflash:

<0> GeForce GTX 660      (10DE,11C0,1458,354E) H:--:NRM S:00,B:01,PCI,D:00,F:00

This BIOS:
NVIDIA Source BIOS Version:80.06.10.00.3D
http://www.gigabyte.com/products/product-page.aspx?pid=4361&kw=GV-N660OC-2GD#bios
Comment 10 Ben Skeggs 2016-12-15 22:57:29 UTC
(In reply to Nicolás Luciano Bértolo from comment #7)
> What happens is that the file is actually there, but it is 0 bytes long.
> 
> This is the script that I am using to extract the files:
> 
> #! /bin/bash
> 
> rmmod nvidia_drm
> rmmod nvidia_modeset
> rmmod nvidia
> 
> echo 64000 > /sys/kernel/debug/tracing/buffer_size_kb
> echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
> cat /sys/kernel/debug/tracing/trace_pipe > mmiotrace.log &
> 
> modprobe nvidia
> 
> xinit -e sh -c "glxgears & sleep 10"
> 
> sleep 5
> echo nop > /sys/kernel/debug/tracing/current_tracer
> wait
> 
> This line fails:
> echo nop > /sys/kernel/debug/tracing/current_tracer
> It outputs: Resource or device busy.

That's normal; you need to kill the 'cat' process first.

Can you share the resulting trace, please? I'll try to see what's different compared to my (working) board.
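Ben's point can be folded into the capture script: remember the PID of the background cat, kill it, and only then switch the tracer back to nop. This is a minimal sketch, assuming the same debugfs path and 64000 KB buffer as the script in comment 7; capture_mmiotrace and the TRACING override variable are hypothetical names, not part of any existing tool. It has to run as root, and per comment 12 the nvidia module should not have been loaded earlier in the session.

```shell
#!/bin/bash
# Hypothetical helper: capture an mmiotrace while a workload runs.
# TRACING defaults to the debugfs path used in comment 7.
capture_mmiotrace() {
    local out=$1
    shift
    local tracing=${TRACING:-/sys/kernel/debug/tracing}

    echo 64000 > "$tracing/buffer_size_kb"
    echo mmiotrace > "$tracing/current_tracer"

    # Stream the trace to a file and remember the reader's PID.
    cat "$tracing/trace_pipe" > "$out" &
    local reader=$!

    # Run the workload, e.g. loading the driver and starting X.
    "$@"

    # Stop the reader first: while trace_pipe is held open, switching the
    # tracer fails as "busy" (comment 10). The reader may already have
    # exited, so failures here are ignored.
    kill "$reader" 2>/dev/null || true
    wait "$reader" 2>/dev/null || true

    echo nop > "$tracing/current_tracer"
}
```

Example: capture_mmiotrace mmiotrace.log sh -c 'modprobe nvidia && xinit -e sh -c "glxgears & sleep 10"'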
Comment 11 Nicolás Luciano Bértolo 2016-12-16 00:02:37 UTC
Created attachment 128495
mmiotrace
Comment 12 Ilia Mirkin 2016-12-16 00:10:50 UTC
(In reply to Nicolás Luciano Bértolo from comment #11)
> Created attachment 128495
> mmiotrace

This trace is, effectively, empty. Among other things, it's not a good idea to have nvidia loaded at any time before grabbing the trace.

A successful trace should be 50-100 MB uncompressed (it compresses quite well, though).
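Those numbers suggest a quick sanity check before attaching a trace. A minimal sketch: check_and_compress_trace is a hypothetical helper, and the 50 MB floor is only the rule of thumb from this comment, not a hard limit. It compresses with xz, the same format as the complete trace attached in comment 13.

```shell
#!/bin/bash
# Hypothetical helper: refuse traces that look too small to be complete,
# otherwise compress them with xz for attaching to a bug report.
# The default floor (50 MB) is a rule of thumb, not a hard limit.
check_and_compress_trace() {
    local log=$1
    local min_bytes=${2:-50000000}
    local size
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$log") ))
    if [ "$size" -lt "$min_bytes" ]; then
        echo "trace is only $size bytes; it is probably incomplete" >&2
        return 1
    fi
    # -k keeps the original and writes "$log.xz" next to it.
    xz -k "$log"
}
```

Example: "check_and_compress_trace mmiotrace.log" either warns that the trace looks incomplete or leaves mmiotrace.log.xz ready to attach.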
Comment 13 Nicolás Luciano Bértolo 2016-12-16 00:57:00 UTC
Created attachment 128496
complete mmiotrace
Comment 14 GitLab Migration User 2019-09-18 20:44:23 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue via this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1120.

