Summary: | DRM: EVO timeout with kernel 4.15.x
---|---
Product: | xorg
Component: | Driver/nouveau
Status: | RESOLVED MOVED
Severity: | major
Priority: | medium
Version: | unspecified
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Reporter: | Sérgio M. Basto <sergio>
Assignee: | Nouveau Project <nouveau>
QA Contact: | Xorg Project Team <xorg-team>
CC: | acizov, dominik, pauloedgarcastro, sergio
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=105173 https://bugs.freedesktop.org/show_bug.cgi?id=105174 https://bugzilla.redhat.com/show_bug.cgi?id=1551401 https://bugzilla.redhat.com/show_bug.cgi?id=1618906

Description
Sérgio M. Basto
2018-03-01 23:05:16 UTC
boots fine with kernel 4.14.x

Can you say a few words about what's connected? Also, any chance you could bisect to the specific commit?

01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce 9300M GS] (rev a1)

(In reply to Sérgio M. Basto from comment #3)
> 01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce 9300M
> GS] (rev a1)

I meant more like what's connected to the card... screens and how they're hooked up.

(In reply to Ilia Mirkin from comment #4)
> (In reply to Sérgio M. Basto from comment #3)
> > 01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce 9300M
> > GS] (rev a1)
>
> I meant more like what's connected to the card... screens and how they're
> hooked up.

Sorry, my previous comment was not a reply. I have a laptop from 2007, 2009 or so (dual core, with a good NVIDIA card for its time). It boots and writes the boot log in VESA mode as usual; when it switches root and should load DRM, it hangs (while loading the DRM kernel module).

About bisecting the kernel: before that, I tried jumping ahead and booting kernel 4.1-rc3 [1], which also doesn't boot. I may try booting kernel-4.15.0-0.rc0.git3 [2]. It is not impossible for me to bisect the kernel, but I haven't built a kernel in a long time. The thing is, I'm not the only one with problems with nvidia/nouveau and kernel 4.15. And yes, it also hangs with the nvidia drivers.

[1] https://fedoraproject.org/wiki/RawhideKernelNodebug
[2] https://koji.fedoraproject.org/koji/buildinfo?buildID=999713

I tried jumping ahead and booting kernel 4.16-rc3 [1], which also doesn't boot.

Sorry for my English; you may ask anything if you don't understand what I wrote: I just tried jumping ahead and booting kernel 4.16-rc3, which also doesn't boot.

One new finding: kernels 4.16-rc3 and 4.15.6 boot fine if I add nouveau.modeset=0. kernel-4.15-git3 and kernel-4.15-git6 don't boot, or hang, even with nouveau.modeset=0, but maybe that is unrelated.

There is only one screen: eDP-1, connected, primary.

I received this comment [1]: "This patch [2] fixed my problem". I will test it. Does this patch make sense to you? Thanks.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1546439#c7
[2] https://github.com/skeggsb/nouveau/pull/1/files

Could you please try 4.16-rc6, which includes the patch you mentioned, and see if that helps?

Created attachment 138329 [details]
./dmesg.txt

(In reply to Pierre Moreau from comment #9)
> Could you please try 4.16-rc6, which includes the patch you mentioned, and
> see if that helps?

No luck, but I found that the laptop does not freeze; I could connect to it via ssh.

I am attaching the full dmesg.txt and Xorg.0.log.

Created attachment 138330 [details]
Xorg.0.log

(In reply to Sérgio M. Basto from comment #10)
> I am attaching the full dmesg.txt and Xorg.0.log.

Thank you for the logs.

I tried 4.15.12 on a G98 (9300 GS) but could not reproduce that issue (I had two screens: one connected over VGA and the other over HDMI). 4.15.12 should not have any new patches over 4.16-rc6, so you don’t need to try 4.15.12. I’ll see if I can find some errors in the patches that went in 4.15. If you are able to bisect the faulty commit that went in 4.15, that would be really helpful.

(In reply to Pierre Moreau from comment #12)
> (In reply to Sérgio M. Basto from comment #10)
> > I am attaching the full dmesg.txt and Xorg.0.log.
>
> Thank you for the logs.
>
> I tried 4.15.12 on a G98 (9300 GS) but could not reproduce that issue (I had
> two screens: one connected over VGA and the other over HDMI). 4.15.12 should
> not have any new patches over 4.16-rc6, so you don’t need to try 4.15.12.
> I’ll see if I can find some errors in the patches that went in 4.15. If you
> are able to bisect the faulty commit that went in 4.15, that would be really
> helpful.

I tried kernel-4.15.0-rc0.git3 and kernel-4.15.0-rc0.git6; they don't boot, or hang, even with nouveau.modeset=0. So it is difficult for me to test kernel-4.15-rc0. I'm convinced the issue started before kernel-4.15-rc1, but if you have the nouveau patches that went into kernel 4.15, I can build a stable kernel with all of them reverted and, if that works, I could bisect from there.

Since it seems the laptop does not hang and I can shut it down without a cold reboot, even better (*).

(*) I'm convinced my issue started before kernel-4.15-rc1

(In reply to Sérgio M. Basto from comment #14)
> (*) I'm convinced my issue started before kernel-4.15-rc1

I tried to bisect the kernel: patch-4.14-git2.xz is the first bad commit. I tried the same commit [1] but with git1 and it boots fine, so I assume it is exclusively a kernel code issue.

I don't know how the numbering works, so here is the summary: kernel 4.15.0-git1 boots and kernel 4.15.0-git2 doesn't boot.

xzdiff -up patch-4.14-git1.xz patch-4.14-git2.xz has 100 thousand lines. Where can I find a git tree with these commits?

[1] https://src.fedoraproject.org/rpms/kernel/c/2ef4e8028f509354fb5a339bd2f8d0d1df8f2e8d?branch=master

kernel-4.14.18 still boots without any problem, and this machine has been booting since 2007. Now, with kernel-4.17.3-100.fc27.x86_64, I still get:

dmesg | grep -i nouv
[ 6.897082] nouveau 0000:01:00.0: NVIDIA G98 (298480a2)
[ 6.924724] nouveau 0000:01:00.0: bios: version 62.98.2e.00.08
[ 6.946862] nouveau 0000:01:00.0: bios: M0203T not found
[ 6.947024] nouveau 0000:01:00.0: bios: M0203E not matched!
[ 6.947173] nouveau 0000:01:00.0: fb: 256 MiB DDR2
[ 7.031599] nouveau 0000:01:00.0: DRM: VRAM: 256 MiB
[ 7.031744] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[ 7.031905] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 7.032073] nouveau 0000:01:00.0: DRM: DCB version 4.0
[ 7.032240] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000323 00010034
[ 7.032394] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011300 00000028
[ 7.032541] nouveau 0000:01:00.0: DRM: DCB outp 02: 04032312 00020010
[ 7.032688] nouveau 0000:01:00.0: DRM: DCB conn 00: 00000040
[ 7.032835] nouveau 0000:01:00.0: DRM: DCB conn 01: 00000100
[ 7.032980] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001261
[ 7.039372] nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
[ 7.117580] nouveau 0000:01:00.0: DRM: allocated 1280x800 fb: 0x60000, bo 0000000065c597bd
[ 7.123536] fbcon: nouveaufb (fb0) is primary device
[ 9.191132] nouveau 0000:01:00.0: DRM: EVO timeout
[ 11.191063] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 13.192324] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 15.266028] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 17.266094] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 17.499507] nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
[ 19.503322] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 19.511588] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[ 19.519163] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
[ 21.564246] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 30.324866] nouveau 0000:01:00.0: DRM: EVO timeout
[ 639.261601] nouveau 0000:01:00.0: DRM: EVO timeout

Compared with a good boot:

dmesg | grep -i nouv
[ 6.731505] nouveau 0000:01:00.0: NVIDIA G98 (298480a2)
[ 6.758367] nouveau 0000:01:00.0: bios: version 62.98.2e.00.08
[ 6.781022] nouveau 0000:01:00.0: bios: M0203T not found
[ 6.781025] nouveau 0000:01:00.0: bios: M0203E not matched!
[ 6.781028] nouveau 0000:01:00.0: fb: 256 MiB DDR2
[ 6.831893] nouveau 0000:01:00.0: DRM: VRAM: 256 MiB
[ 6.832058] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[ 6.832206] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 6.832349] nouveau 0000:01:00.0: DRM: DCB version 4.0
[ 6.832493] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000323 00010034
[ 6.832638] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011300 00000028
[ 6.832783] nouveau 0000:01:00.0: DRM: DCB outp 02: 04032312 00020010
[ 6.832928] nouveau 0000:01:00.0: DRM: DCB conn 00: 00000040
[ 6.833085] nouveau 0000:01:00.0: DRM: DCB conn 01: 00000100
[ 6.833226] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001261
[ 6.869977] nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
[ 6.945008] nouveau 0000:01:00.0: DRM: allocated 1280x800 fb: 0x50000, bo ffff8a5638d90000
[ 6.977103] fbcon: nouveaufb (fb0) is primary device
[ 8.542893] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[ 8.546151] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0

I hit an EVO timeout on one of my machines. If this is a regression I can bisect the kernel and see what it points me to.

kernel 4.15.0-git1 boots and kernel 4.15.0-git2 doesn't boot

The regression started in kernel-4.15.0-git2 and is still not fixed in kernel 4.17.

(In reply to Sérgio M. Basto from comment #15)
> (In reply to Sérgio M. Basto from comment #14)
> > (*) I'm convinced my issue started before kernel-4.15-rc1
>
> I tried to bisect the kernel: patch-4.14-git2.xz is the first bad commit. I
> tried the same commit [1] but with git1 and it boots fine, so I assume it is
> exclusively a kernel code issue.
>
> I don't know how the numbering works, so here is the summary: kernel
> 4.15.0-git1 boots and kernel 4.15.0-git2 doesn't boot.
>
> xzdiff -up patch-4.14-git1.xz patch-4.14-git2.xz has 100 thousand lines.
> Where can I find a git tree with these commits?
>
> [1]
> https://src.fedoraproject.org/rpms/kernel/c/2ef4e8028f509354fb5a339bd2f8d0d1df8f2e8d?branch=master

That diff doesn't really help, and because this is an upstream bug tracker you should rather git bisect the kernel itself, not some packages you installed on your system.

If you can pinpoint a specific git commit inside the kernel, that might be very helpful.

(In reply to Karol Herbst from comment #20)
> (In reply to Sérgio M. Basto from comment #15)
> > (In reply to Sérgio M. Basto from comment #14)
> > > (*) I'm convinced my issue started before kernel-4.15-rc1
> >
> > I tried to bisect the kernel: patch-4.14-git2.xz is the first bad commit. I
> > tried the same commit [1] but with git1 and it boots fine, so I assume it is
> > exclusively a kernel code issue.
> >
> > I don't know how the numbering works, so here is the summary: kernel
> > 4.15.0-git1 boots and kernel 4.15.0-git2 doesn't boot.
> >
> > xzdiff -up patch-4.14-git1.xz patch-4.14-git2.xz has 100 thousand lines.
> > Where can I find a git tree with these commits?
> >
> > [1]
> > https://src.fedoraproject.org/rpms/kernel/c/2ef4e8028f509354fb5a339bd2f8d0d1df8f2e8d?branch=master
>
> That diff doesn't really help, and because this is an upstream bug tracker
> you should rather git bisect the kernel itself, not some packages you
> installed on your system.
>
> If you can pinpoint a specific git commit inside the kernel, that might
> be very helpful.

Where can I find a git tree with these commits (patch-4.14-git1.xz to patch-4.14-git2.xz)?

Looks like I have an affected machine as well. I encountered this when bringing an old installation (F26) up to date. dmesg from 4.17.14-102.fc27.x86_64:

[...]
Aug 18 01:58:15 kernel: nouveau 0000:01:00.0: NVIDIA G98 (298480a2)
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: bios: version 62.98.3c.00.44
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: bios: M0203T not found
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: bios: M0203E not matched!
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: fb: 512 MiB DDR2
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: VRAM: 512 MiB
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB version 4.0
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 01011323 00010034
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000028
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 02022312 00020030
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 00000000
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 00000140
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00002261
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB conn 07: 00000513
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: allocated 1440x900 fb: 0x50000, bo 000000006f9828c3
Aug 18 01:58:28 kernel: fbcon: nouveaufb (fb0) is primary device
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: EVO timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
Aug 18 01:58:30 kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
Aug 18 01:58:30 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:32 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:37 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:53 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:59:10 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:59:11 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:59:13 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:59:52 kernel: nouveau 0000:01:00.0: DRM: EVO timeout
Aug 18 02:04:07 kernel: nouveau 0000:01:00.0: DRM: EVO timeout

4.14.x works fine like in Sérgio's case. Mine is also a single-screen setup (LVDS-1) with no external outputs connected. There are no errors in Xorg log.

I believe this is a different bug than https://bugzilla.redhat.com/show_bug.cgi?id=1547037 and patch https://github.com/skeggsb/nouveau/pull/1/files did not fix this. Sérgio, are you sure the above patch fixes this for you?

Ok, now I got something more interesting. I booted Fedora kernel 4.19.0-0.rc0.git5.1.fc30.x86_64 (commit 1f7a4c73a739a63b3f108d8eda6f947fdc70dd65). I still got a frozen console, but when Xorg started, the following WARNING appeared in kernel log. Does this give any clues?

[ 7.193842] nouveau 0000:01:00.0: NVIDIA G98 (298480a2)
[ 7.253541] nouveau 0000:01:00.0: bios: version 62.98.3c.00.44
[ 7.301129] nouveau 0000:01:00.0: bios: M0203T not found
[ 7.301492] nouveau 0000:01:00.0: bios: M0203E not matched!
[ 7.301669] nouveau 0000:01:00.0: fb: 512 MiB DDR2
[ 7.719129] nouveau 0000:01:00.0: DRM: VRAM: 512 MiB
[ 7.719498] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[ 7.719681] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 7.719851] nouveau 0000:01:00.0: DRM: DCB version 4.0
[ 7.720014] nouveau 0000:01:00.0: DRM: DCB outp 00: 01011323 00010034
[ 7.720182] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000028
[ 7.720387] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022312 00020030
[ 7.720567] nouveau 0000:01:00.0: DRM: DCB conn 00: 00000000
[ 7.720736] nouveau 0000:01:00.0: DRM: DCB conn 01: 00000140
[ 7.720903] nouveau 0000:01:00.0: DRM: DCB conn 02: 00002261
[ 7.721066] nouveau 0000:01:00.0: DRM: DCB conn 07: 00000513
[ 7.738669] nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
[ 7.784149] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 7.813006] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[ 7.828162] nouveau 0000:01:00.0: DRM: allocated 1440x900 fb: 0x50000, bo (____ptrval____)
[ 7.870502] fbcon: nouveaufb (fb0) is primary device
[ 7.885068] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 7.898694] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[ 9.902963] nouveau 0000:01:00.0: DRM: core notifier timeout
[ 11.903058] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 11.908408] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 11.947124] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 11.958855] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[ 11.961609] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[ 11.972641] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
[ 11.978613] #0: (____ptrval____) (drm_connector_list_iter){.+.+}, at: nouveau_backlight_init+0x63/0x450 [nouveau]
[ 22.205362] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 32.445359] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 42.685355] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 52.925595] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 63.165373] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 73.405363] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 83.645378] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 93.890397] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 104.125363] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 107.020185] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 107.032965] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[ 107.074838] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 107.086752] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[ 110.113354] nouveau 0000:01:00.0: DRM: core notifier timeout
[ 110.634595] ------------[ cut here ]------------
[ 110.634608] nouveau 0000:01:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000010c412000] [size=4096 bytes]
[ 110.634630] WARNING: CPU: 1 PID: 1163 at kernel/dma/debug.c:1230 check_sync+0x136/0x670
[ 110.634634] Modules linked in: ip_set nfnetlink ebtable_nat ebtable_broute ccm bridge stp llc ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc arc4 snd_hda_codec_realtek snd_hda_codec_generic ath9k snd_hda_intel ath9k_common snd_hda_codec ath9k_hw snd_hda_core uvcvideo btusb snd_hwdep btrtl snd_seq snd_seq_device btbcm btintel snd_pcm mac80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 ath videobuf2_common cfg80211 videodev media bluetooth snd_timer snd coretemp ecdh_generic joydev r592 soundcore asus_laptop memstick sparse_keymap rfkill input_polldev pcc_cpufreq acpi_cpufreq dm_crypt
[ 110.634775] nouveau ata_generic pata_acpi firewire_ohci firewire_core mxm_wmi wmi i2c_algo_bit drm_kms_helper sdhci_pci cqhci sdhci ttm sis190 serio_raw mmc_core mii crc_itu_t drm sata_sis pata_sis video
[ 110.634823] CPU: 1 PID: 1163 Comm: Xorg Not tainted 4.19.0-0.rc0.git5.1.fc30.x86_64 #1
[ 110.634827] Hardware name: ASUSTeK Computer Inc. X71SL /X71SL , BIOS 206 11/05/2008
[ 110.634832] RIP: 0010:check_sync+0x136/0x670
[ 110.634837] Code: 48 85 ed 75 04 48 8b 68 10 48 8b 3c 24 e8 e2 38 56 00 48 89 c6 4d 89 e8 4c 89 f9 48 89 ea 48 c7 c7 a8 18 30 b1 e8 ee 77 f6 ff <0f> 0b 8b 05 9a 75 85 01 85 c0 0f 84 81 04 00 00 48 83 c4 28 4c 89
[ 110.634841] RSP: 0018:ffffb980412c7a10 EFLAGS: 00010082
[ 110.634847] RAX: 0000000000000000 RBX: ffffffffb2f33410 RCX: 0000000000000006
[ 110.634851] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff9e12fbbd6ba0
[ 110.634855] RBP: ffff9e12f9f82ed0 R08: 0000000000000000 R09: 0000000000000001
[ 110.634859] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000286
[ 110.634863] R13: 0000000000001000 R14: 0000000000010000 R15: 000000010c412000
[ 110.634868] FS: 00007fe0441aeac0(0000) GS:ffff9e12fba00000(0000) knlGS:0000000000000000
[ 110.634873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 110.634877] CR2: 00007fe03c0c8d90 CR3: 0000000114a6e000 CR4: 00000000000006e0
[ 110.634881] Call Trace:
[ 110.634897] debug_dma_sync_single_for_device+0x7b/0x90
[ 110.634915] ? ttm_bo_mem_compat+0x23/0x60 [ttm]
[ 110.634925] ? kfree+0x188/0x320
[ 110.634932] ? krealloc+0x25/0xa0
[ 110.635040] nouveau_bo_sync_for_device+0x6a/0xb0 [nouveau]
[ 110.635098] nouveau_bo_validate+0x71/0x90 [nouveau]
[ 110.635154] nouveau_gem_ioctl_pushbuf+0x8a5/0x1ad0 [nouveau]
[ 110.635222] ? nouveau_gem_ioctl_new+0xe0/0xe0 [nouveau]
[ 110.635240] ? drm_ioctl_kernel+0xa5/0xf0 [drm]
[ 110.635240] ? nouveau_gem_ioctl_new+0xe0/0xe0 [nouveau]
[ 110.635240] drm_ioctl_kernel+0xa5/0xf0 [drm]
[ 110.635240] drm_ioctl+0x1fc/0x390 [drm]
[ 110.635240] ? nouveau_gem_ioctl_new+0xe0/0xe0 [nouveau]
[ 110.635240] nouveau_drm_ioctl+0x65/0xc0 [nouveau]
[ 110.635240] do_vfs_ioctl+0xa5/0x6e0
[ 110.635240] ksys_ioctl+0x60/0x90
[ 110.635240] __x64_sys_ioctl+0x16/0x20
[ 110.635240] do_syscall_64+0x60/0x1f0
[ 110.635240] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 110.635240] RIP: 0033:0x7fe041422ec7
[ 110.635240] Code: 00 00 90 48 8b 05 d9 7f 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 7f 2c 00 f7 d8 64 89 01 48
[ 110.635240] RSP: 002b:00007ffcd424fc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 110.635240] RAX: ffffffffffffffda RBX: 0000000000d5ae98 RCX: 00007fe041422ec7
[ 110.635240] RDX: 00007ffcd424fcd0 RSI: 00000000c0406481 RDI: 000000000000000e
[ 110.635240] RBP: 00007ffcd424fcd0 R08: 0000000000000000 R09: 0000000000d59f20
[ 110.635240] R10: 0000000000d6be98 R11: 0000000000000246 R12: 00000000c0406481
[ 110.635240] R13: 000000000000000e R14: 0000000000d5a070 R15: 0000000000d59f20
[ 110.635240] irq event stamp: 0
[ 110.635240] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 110.635240] hardirqs last disabled at (0): [<ffffffffb00bb817>] copy_process.part.28+0x747/0x1e70
[ 110.635240] softirqs last enabled at (0): [<ffffffffb00bb817>] copy_process.part.28+0x747/0x1e70
[ 110.635240] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 110.635240] ---[ end trace a1450e59d31d3810 ]---
[ 114.365372] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 117.080247] nouveau 0000:01:00.0: DRM: core notifier timeout
[ 119.080664] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 122.562843] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 122.574617] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[ 124.605473] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 134.845626] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 145.085449] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 155.325447] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 165.565443] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 175.805469] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[ 186.045466] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1

BTW, I started bisecting the kernel, but there are at least two phases. In the first phase the computer hangs but graphics still work [1]. This week I hope to start the second phase and find the commit where the EVO timeout really started. I tested the Fedora binaries first, and it is between kernel-4.15.0-0.rc4.git4.1.fc28.x86_64 and kernel-4.15.0-0.rc6.git0.1.fc28.x86_64 (build dates 22 Dec 2017 and 01 Jan 2018). That is the state of my investigation.

Thanks.

[1]
# bad: [b18d62891aaff49d0ee8367d4b6bb9452469f807] Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
# good: [7d58e1c9059eefe0066c5acf2ffa582f6f0180e3] Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect start 'b18d62891aaff49d0ee8367d4b6bb9452469f807' '7d58e1c9059eefe0066c5acf2ffa582f6f0180e3'
# bad: [331b57d14829c49d75076779cdc54d7e4537bbf0] Merge branch 'irq/urgent' into x86/apic
git bisect bad 331b57d14829c49d75076779cdc54d7e4537bbf0
# good: [e43b3b58548051f8809391eb7bec7a27ed3003ea] genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs
git bisect good e43b3b58548051f8809391eb7bec7a27ed3003ea
# good: [ec0f7cd273dc41ab28bba703cac82690ea5f2863] genirq/matrix: Add tracepoints
git bisect good ec0f7cd273dc41ab28bba703cac82690ea5f2863
# skip: [f0cc6ccaf7ba42a1247fe5a9244b6009a3beddd5] x86/vector: Simplify the CPU hotplug vector update
git bisect skip f0cc6ccaf7ba42a1247fe5a9244b6009a3beddd5
# bad: [baab1e84b1124bfd3e40ef6c8e05b2a15136e3d5] x86/apic: Remove unused callbacks
git bisect bad baab1e84b1124bfd3e40ef6c8e05b2a15136e3d5
# good: [83a105229c59e433409e4d86e9bb915ca281235c] x86/apic: Move common APIC callbacks
git bisect good 83a105229c59e433409e4d86e9bb915ca281235c
# bad: [3534be05e4adc303d41fae65901598695adea685] x86/ioapic: Mark legacy vectors at reallocation time
git bisect bad 3534be05e4adc303d41fae65901598695adea685
# bad: [ef9e56d894eab99a33a06b96ba8057afa67d3702] x86/ioapic: Remove obsolete post hotplug update
git bisect bad ef9e56d894eab99a33a06b96ba8057afa67d3702
# good: [c1d1ee9ac1793d939ba1a1322767cc5f77a5b8fe] x86/apic: Get rid of apic->target_cpus
git bisect good c1d1ee9ac1793d939ba1a1322767cc5f77a5b8fe
# good: [7854f82293e99f6bb3df793a2f579db4670ba71b] x86/vector: Rename used_vectors to system_vectors
git bisect good 7854f82293e99f6bb3df793a2f579db4670ba71b
# bad: [fdba46ffb4c203b6e6794163493fd310f98bb4be] x86/apic: Get rid of multi CPU affinity
git bisect bad fdba46ffb4c203b6e6794163493fd310f98bb4be
# first bad commit: [fdba46ffb4c203b6e6794163493fd310f98bb4be] x86/apic: Get rid of multi CPU affinity

I can confirm Sergio's findings so far, though kernel-4.15.0-0.rc4.git4.1.fc28.x86_64 (based on git commit ead68f216110) hangs completely upon switching fbcon to nouveaufb if I have "rhgb" in the kernel command line. I won't be able to provide more details for some time as I have to give back the machine where this can be reproduced. I'll keep my fingers crossed for Sergio.

(In reply to Dominik 'Rathann' Mierzejewski from comment #25)
> I can confirm Sergio's findings so far, though
> kernel-4.15.0-0.rc4.git4.1.fc28.x86_64 (based on git commit ead68f216110)
> hangs completely upon switching fbcon to nouveaufb if I have "rhgb" in the
> kernel command line. I won't be able to provide more details for some time
> as I have to give back the machine where this can be reproduced. I'll keep
> my fingers crossed for Sergio.

Correct, here [1] are the kernels I tested. I just started:

git bisect start v4.15-rc6 ead68f216110
Bisecting: 173 revisions left to test after this (roughly 8 steps)

[1]
kernel-4.15.0-0.rc3.git4.1.fc28
kernel-4.15.0-0.rc4.git0.1.fc28 good graphics, bad interrupts
kernel-4.15.0-0.rc4.git1.1.fc28
kernel-4.15.0-0.rc4.git2.1.fc28
kernel-4.15.0-0.rc4.git3.1.fc28 good graphics, bad interrupts
kernel-4.15.0-0.rc4.git4.1.fc28 good graphics, bad interrupts
kernel-4.15.0-0.rc6.git0.1.fc28 bad graphics
kernel-4.15.0-0.rc6.git0.2.fc28 bad graphics
kernel-4.15.0-0.rc6.git0.3.fc28
kernel-4.15.0-0.rc6.git1.1.fc28 good interrupts (but bad graphics)
kernel-4.15.0-0.rc6.git2.1.fc28

Created attachment 141327 [details] [review]
the commit which started the EVO timeout

And the result of git bisect start v4.15-rc6 ead68f216110 is [1]. This is the commit at which switching graphics at boot starts to fail. I'm thinking of reverting it in kernel-4.15.0-0.rc6.git1.1.fc28, which already has the good interrupts, to see if I can boot correctly again. Or should I look for the commit that fixes interrupt delivery? What do you think?

[1]
# bad: [30a7acd573899fd8b8ac39236eff6468b195ac7d] Linux 4.15-rc6
# good: [ead68f216110170ec729e2c4dec0aad6d38259d7] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Skipping some commits because all are good; almost the last one is the bad commit ...
# good: [64e05d118e357bb52a084b609436acf292ce7944] x86/apic: Update the 'apic=' description of setting APIC driver
# bad: [f39d7d78b70e0f39facb1e4fab77ad3df5c52a35] Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
# bad: [a31e58e129f73ab5b04016330b13ed51fde7a961] x86/apic: Switch all APICs to Fixed delivery mode
# first bad commit: [a31e58e129f73ab5b04016330b13ed51fde7a961] x86/apic: Switch all APICs to Fixed delivery mode

I just noticed that the log of the commit I just attached has this line [1], and my first bisect ended with [2]! They match.

[1] Fixes: fdba46ffb4c2 ("x86/apic: Get rid of multi CPU affinity")
Reported-by: vcaputo@pengaru.com
[2] # first bad commit: [fdba46ffb4c203b6e6794163493fd310f98bb4be] x86/apic: Get rid of multi CPU affinity

In summary:
My first bad commit: [fdba46ffb4c203b6e6794163493fd310f98bb4be] x86/apic: Get rid of multi CPU affinity (in kernel 4.15.0-git2)
My second bad commit: [a31e58e129f73ab5b04016330b13ed51fde7a961] x86/apic: Switch all APICs to Fixed delivery mode (in kernel-4.15.0-0.rc6.git1.1)
The commit message of [1] says that it fixes fdba46ffb4c2 ("x86/apic: Get rid of multi CPU affinity").

[1] https://bugs.freedesktop.org/attachment.cgi?id=141327

I can confirm this nasty bug. I also get error messages like:

nouveau 0000:01:00.0: DRM: base-0: timeout

(a lot of them). The X system becomes very slow, and strange graphical effects occur, but after some seconds it stabilizes again. It happens especially when moving windows. I've got an Arch system, but they don't seem to care there.

My system:

uname -a
Linux ws-001 4.19.8-arch1-1-ARCH #1 SMP PREEMPT Sat Dec 8 13:49:11 UTC 2018 x86_64 GNU/Linux

I've got a dual monitor setup, with

01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)

My intuition is that it's somewhere in the DRI code.

Stef Bon

Relevant lines in dmesg:

[ 3.715523] nouveau 0000:01:00.0: bios: version 86.07.42.00.4a
[ 3.726451] nouveau 0000:01:00.0: fb: 4096 MiB GDDR5
[ 3.775840] nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
[ 3.775842] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
[ 3.775843] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
[ 3.775845] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
[ 3.775846] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 3.775848] nouveau 0000:01:00.0: DRM: DCB version 4.1
[ 3.775850] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f42 04620030
[ 3.775851] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
[ 3.775853] nouveau 0000:01:00.0: DRM: DCB outp 02: 02822f76 04600020
[ 3.775854] nouveau 0000:01:00.0: DRM: DCB outp 03: 02022f72 00020020
[ 3.775855] nouveau 0000:01:00.0: DRM: DCB outp 04: 04033f82 00020030
[ 3.775857] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
[ 3.775858] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
[ 3.775859] nouveau 0000:01:00.0: DRM: DCB conn 02: 00020246
[ 3.775860] nouveau 0000:01:00.0: DRM: DCB conn 03: 01000331

I'm also affected by this bug on Fedora 29. I hopped on the kernel mainline and started poking around and noticed that I see this bug on 4.19 but not on 4.20. So I bisected to find the commit in the 4.20 series that fixes the bug. The fix appears to be:

commit 970a5ee41c72df46e3b0f307528c7d8ef7734a2e
Author: Ben Skeggs <bskeggs@redhat.com>
Date: Wed Dec 12 16:51:17 2018 +1000

    drm/nouveau/kms/nv50-: also flush fb writes when rewinding push buffer

    Should hopefully fix a regression some people have been seeing since
    EVO push buffers were moved to VRAM by default on Pascal GPUs.

    Fixes: d00ddd9da ("drm/nouveau/kms/nv50-: allocate push buffers in vidmem on pascal")
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    Cc: <stable@vger.kernel.org> # 4.19+

I can cherry pick just this commit on top of 4.19 and I get a stable system.

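Editorial aside: the good/bad decision made by hand at each bisect step in this thread can be sketched as a small script. This is only a minimal sketch and not something any commenter ran. The function names `find_symptoms` and `classify_boot` are hypothetical, the symptom strings are copied from the dmesg excerpts posted above, and the pattern assumes plain `dmesg` output with bracketed timestamps (journal-style lines prefixed with a date would need a different pattern).

```python
import re

# Symptom strings copied from the dmesg excerpts in this thread (an assumption:
# other affected machines may log different messages).
SYMPTOMS = ("EVO timeout", "base-0: timeout", "core notifier timeout", "GPU lockup")

# Matches lines such as "[ 9.191132] nouveau 0000:01:00.0: DRM: EVO timeout".
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau\s+\S+\s+DRM:\s+(?P<msg>.*)$"
)


def find_symptoms(dmesg_text):
    """Return (timestamp, message) pairs for nouveau DRM lines showing a symptom."""
    hits = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(s in match.group("msg") for s in SYMPTOMS):
            hits.append((float(match.group("ts")), match.group("msg")))
    return hits


def classify_boot(dmesg_text):
    """Label a boot 'bad' if any symptom line is present, otherwise 'good'."""
    return "bad" if find_symptoms(dmesg_text) else "good"


if __name__ == "__main__":
    # Three lines taken from the failing boot log earlier in the thread.
    sample = (
        "[ 7.123536] fbcon: nouveaufb (fb0) is primary device\n"
        "[ 9.191132] nouveau 0000:01:00.0: DRM: EVO timeout\n"
        "[ 11.191063] nouveau 0000:01:00.0: DRM: base-0: timeout\n"
    )
    print(classify_boot(sample))
```

A label produced this way only reflects the kernel log; a boot that hangs before the log can be read (as reported for some of the early 4.15 builds) still has to be judged by hand.
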
Not ideal, but adding nouveau.noaccel=1 to grub in 4.20.13-200.fc29.x86_64 seems to help.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/411.