Summary: | Kworker using 100% CPU | ||
---|---|---|---|
Product: | xorg | Reporter: | patrick.clara |
Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | normal | ||
Priority: | medium | CC: | daleha, reedlaw |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
patrick.clara
2014-03-28 12:07:06 UTC
Created attachment 96521 [details]
dmesg
Created attachment 96522 [details]
lspci
Created attachment 96523 [details]
dmidecode
Created attachment 96524 [details]
acpidump
Created attachment 96525 [details]
powertop html report
As requested by Ilia Mirkin I tryed booting with nouveau.debug=debug and drm.debug=0xe. Here a chunk of the output it continuously repeats [ 25.732112] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 16 [ 25.732399] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x58565600 [ 25.732404] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30543631 [ 25.732409] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30443031 [ 25.732414] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30 [ 25.732429] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 9: 0x00000000 8 [ 25.732646] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00040a11 [ 25.732652] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00000000 [ 25.732657] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30443031 [ 25.732662] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30 [ 25.732668] nouveau [ DRM] display: 4x270000 dpcd 0x11 [ 25.732669] nouveau [ DRM] encoder: 4x270000 [ 25.732670] nouveau [ DRM] maximum: 4x270000 [ 25.732672] [drm:drm_helper_hpd_irq_event], [CONNECTOR:17:eDP-1] status updated from connected to connected [ 25.733745] [drm:drm_helper_hpd_irq_event], [CONNECTOR:21:HDMI-A-1] status updated from disconnected to disconnected [ 25.733753] nouveau [ DRM] plugged eDP-1 [ 25.733765] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1 [ 25.733786] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.733787] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.733788] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.733790] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.733942] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 1 [ 25.734113] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00040a00 [ 25.734118] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00000000 [ 25.734121] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30443031 [ 25.734127] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30 [ 25.734141] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1 [ 25.734155] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.734156] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.734158] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.734159] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.734313] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 1 [ 25.734475] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00040a00 [ 25.734480] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00000000 [ 25.734486] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30443031 [ 25.734491] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30 [ 25.734505] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1 [ 25.734526] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.734527] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.734528] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.734530] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.734681] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.734967] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xffffff00 [ 25.734973] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00ffffff [ 25.734975] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x96a2a934 [ 25.734981] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00000000 [ 25.734987] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.735273] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x040117ff [ 25.735278] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x781322a0 [ 25.735284] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xa7a11202 [ 25.735289] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x26995354 [ 25.735295] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.735575] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x0054500b [ 25.735581] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x01010000 [ 25.735586] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x01010101 [ 25.735592] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x01010101 [ 25.735607] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.735895] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x01010101 [ 25.735901] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x762a0101 [ 25.735905] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x54b0c440 [ 25.735911] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x042c6014 [ 25.735916] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.736203] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xc1580091 [ 25.736209] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x1e000010 [ 25.736214] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xc440762a [ 25.736220] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x615b54b0 [ 25.736225] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.736507] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x0091042c [ 25.736512] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x0010c158 [ 25.736518] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00001e00 [ 25.736523] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x2a00fd00 [ 25.736529] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.736815] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x1e64603d [ 25.736821] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x20200a01 [ 25.736825] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x20202020 [ 25.736831] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xfc000000 [ 25.736836] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 16 [ 25.737123] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x58565600 [ 25.737129] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30543631 [ 25.737134] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30443031 [ 25.737140] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30 [ 25.737153] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 9: 0x00000000 8 [ 25.737381] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00040a11 [ 25.737390] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00000000 [ 25.737398] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30443031 [ 25.737406] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30 [ 25.737410] nouveau [ DRM] display: 4x270000 dpcd 0x11 [ 25.737411] nouveau [ DRM] encoder: 4x270000 [ 25.737412] nouveau [ DRM] maximum: 4x270000 [ 25.737414] [drm:drm_helper_hpd_irq_event], [CONNECTOR:17:eDP-1] status updated from connected to connected [ 25.738486] [drm:drm_helper_hpd_irq_event], [CONNECTOR:21:HDMI-A-1] status updated from disconnected to disconnected [ 25.738492] nouveau [ DRM] plugged eDP-1 [ 25.738504] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1 [ 25.738525] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.738526] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.738528] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.738529] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.738681] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 1 [ 25.738852] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00040a00 [ 25.738857] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00000000 [ 25.738860] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30443031 [ 25.738866] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30 [ 25.738880] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1 [ 25.738894] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.738896] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.738897] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.738898] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.739052] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 1 [ 25.739214] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00040a00 [ 25.739220] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00000000 [ 25.739225] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x30443031 [ 25.739231] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30 [ 25.739244] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1 [ 25.739265] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.739266] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.739268] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.739269] nouveau D[ I2C][0000:01:00.0] AUXCH(1): wr 0x00000000 [ 25.739421] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.739705] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xffffff00 [ 25.739711] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00ffffff [ 25.739716] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x96a2a934 [ 25.739722] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00000000 [ 25.739727] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.740009] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x040117ff [ 25.740014] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x781322a0 [ 25.740019] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xa7a11202 [ 25.740025] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x26995354 [ 25.740030] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.740317] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x0054500b [ 25.740323] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x01010000 [ 25.740327] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x01010101 [ 25.740333] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x01010101 [ 25.740339] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.740625] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x01010101 [ 25.740631] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x762a0101 [ 25.740636] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x54b0c440 [ 25.740642] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x042c6014 [ 25.740647] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.740929] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xc1580091 [ 25.740934] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x1e000010 [ 25.740940] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xc440762a [ 25.740945] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x615b54b0 [ 25.740951] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.741237] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x0091042c [ 25.741243] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x0010c158 [ 25.741247] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x00001e00 [ 25.741253] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x2a00fd00 [ 25.741258] nouveau D[ I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16 [ 25.741544] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x1e64603d [ 25.741550] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x20200a01 [ 25.741555] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0x20202020 [ 25.741561] nouveau D[ I2C][0000:01:00.0] AUXCH(1): rd 0xfc000000 Can you see if booting with drm_kms_helper.poll=0 helps? Booting with drm_kms_helper.poll=0 does not help. (In reply to comment #8) > Booting with drm_kms_helper.poll=0 does not help. Are you absolutely sure? That should cause drm_helper_hpd_irq_event to not do any work and return immediately... unless that setting is somehow not getting propagated to dev->mode_config.poll_enabled... (In reply to comment #9) > (In reply to comment #8) > > Booting with drm_kms_helper.poll=0 does not help. > > Are you absolutely sure? That should cause drm_helper_hpd_irq_event to not > do any work and return immediately... unless that setting is somehow not > getting propagated to dev->mode_config.poll_enabled... I have tryed other 3 times and checked and rechecked spelling. Nothing does change. I then booted with "modprobe.blacklist=nouveau" and afterwards did "modprobe drm_kms_helper poll=0" and "modprobe nouveau". The problem disappears but also the output on the screen. I can not reproduce it every time and it could be related to when and how i switch between tty. I got this in dmesg (only in this particular case), dont know if it is related. [ 49.265515] [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0 [ 54.910045] pci_pm_runtime_suspend(): nouveau_pmops_runtime_suspend+0x0/0xd0 [nouveau] returns -22 [ 78.572068] detected fb_set_par error, error code: -22 [ 175.690629] detected fb_set_par error, error code: -22 [ 176.447496] systemd-logind[2892]: New session 3 of user Debian-gdm. [ 183.530486] detected fb_set_par error, error code: -22 [ 183.537975] systemd-logind[2892]: Removed session 3. [ 184.246263] systemd-logind[2892]: New session 4 of user Debian-gdm. [ 201.221280] systemd-logind[2892]: Removed session 4. [ 201.224190] systemd-logind[2892]: Removed session 1. [ 201.709503] detected fb_set_par error, error code: -22 Maybe now I can clarify a little bit. If nouveau loads at boot, the problem is always there, even with drm_kms_helper.poll=0. If nouveau is blacklisted at boot i have two possible situations: 1) - i have gdm on the screen - through ssh load drm_kms_helper with poll=0 - load nouveau - the screen goes black, or sometimes it displays some rubbish - in dmesg whis appears: [ 66.938644] pci_pm_runtime_suspend(): nouveau_pmops_runtime_suspend+0x0/0xd0 [nouveau] returns -22 - i press ctrl+alt+F1 and nothing happens but this appears in dmesg: [ 175.868791] detected fb_set_par error, error code: -22 -> CPU load is normal 2) - switch to tty1 - through ssh load drm_kms_helper with poll=0 - load nouveau - screen changes resolution and still works, also changing tty works - CPU load at 100% All the 3 situations are reproducible all the time (In reply to comment #11) > Maybe now I can clarify a little bit. > > > If nouveau loads at boot, the problem is always there, even with > drm_kms_helper.poll=0. > > > If nouveau is blacklisted at boot i have two possible situations: > 1) > - i have gdm on the screen > - through ssh load drm_kms_helper with poll=0 > - load nouveau > - the screen goes black, or sometimes it displays some rubbish > - in dmesg whis appears: > [ 66.938644] pci_pm_runtime_suspend(): > nouveau_pmops_runtime_suspend+0x0/0xd0 [nouveau] returns -22 > - i press ctrl+alt+F1 and nothing happens but this appears in dmesg: > [ 175.868791] detected fb_set_par error, error code: -22 > -> CPU load is normal Yeah, nouveau suspends and is gone forever. You can disable runtime pm with nouveau.runpm=0. A fix for this should be in 3.14-final, I think. > > 2) > - switch to tty1 > - through ssh load drm_kms_helper with poll=0 > - load nouveau > - screen changes resolution and still works, also changing tty works > - CPU load at 100% > > > All the 3 situations are reproducible all the time In one of the cpu load at 100% situations, could you do a perf record -g -a (as root), and attach the resulting perf.data file after like 30s of runtime, along with your kallsyms (which should include nouveau and drm* symbols... grabbing /proc/kallsyms from a running kernel should work). Created attachment 96557 [details]
kallsyms with loaded nouveau
Created attachment 96560 [details]
perf.data.xz
Hm, well either I'm an idiot or I need more stuff to see it properly... In either case, since I can't get it to work, would you mind running perf report And pasting in the first couple of entries? Samples: 121K of event 'cycles', Event count (approx.): 71741963089 + 64.62% kworker/1:1 [kernel.kallsyms] [k] ioread32 + 18.47% kworker/1:1 [kernel.kallsyms] [k] native_read_tsc + 13.71% kworker/1:1 [kernel.kallsyms] [k] delay_tsc + 2.31% swapper [kernel.kallsyms] [k] ioread32 + 0.37% swapper [kernel.kallsyms] [k] intel_idle + 0.05% kworker/1:1 [nouveau] [k] nv94_aux + 0.02% swapper [kernel.kallsyms] [k] cpuidle_enter_state + 0.02% swapper [kernel.kallsyms] [k] irq_entries_start + 0.02% swapper [kernel.kallsyms] [k] _raw_spin_lock + 0.01% kworker/1:1 [kernel.kallsyms] [k] __const_udelay + 0.01% swapper [nouveau] [k] nve0_gpio_intr + 0.01% swapper [kernel.kallsyms] [k] menu_select + 0.01% swapper [kernel.kallsyms] [k] rcu_eqs_enter_common.isra.48 + 0.01% swapper [kernel.kallsyms] [k] ktime_get + 0.01% kworker/1:1 [nouveau] [k] dcb_gpio_table + 0.01% kworker/1:1 [nouveau] [k] dcb_table + 0.01% kworker/1:1 [kernel.kallsyms] [k] iowrite32 + 0.01% kworker/1:1 [i2c_algo_bit] [k] i2c_outb.isra.0 + 0.01% kworker/1:1 [nouveau] [k] nouveau_i2c_setscl + 0.01% swapper [kernel.kallsyms] [k] queue_work_on + 0.01% swapper [kernel.kallsyms] [k] native_write_msr_safe + 0.01% swapper [kernel.kallsyms] [k] cpu_startup_entry + 0.00% kworker/1:1 [nouveau] [k] nv_printk_ + 0.00% kworker/1:1 [kernel.kallsyms] [k] native_write_msr_safe + 0.00% kworker/1:1 [nouveau] [k] nouveau_bios_rd08 + 0.00% kworker/1:1 [kernel.kallsyms] [k] __udelay (In reply to comment #16) > Samples: 121K of event 'cycles', Event count (approx.): 71741963089 > + 64.62% kworker/1:1 [kernel.kallsyms] [k] ioread32 > + 18.47% kworker/1:1 [kernel.kallsyms] [k] native_read_tsc > + 13.71% kworker/1:1 [kernel.kallsyms] [k] delay_tsc > + 2.31% swapper [kernel.kallsyms] [k] ioread32 There was supposed to be a call graph... Perhaps you're using an older version of perf? Or perhaps it really wants you to pass in -g for report? (Or you didn't use -g when recording?) Or maybe you're in the curses TUI and you have to select to expand? I forget how it works :( Created attachment 96570 [details]
perf report
I don't have any idea how perf works. I used -g and -a. Version of iperf is 3.14 rc7. I have attached the output of "perf report -g --stdio". Is this what you are looking for? (In reply to comment #18) > Created attachment 96570 [details] > perf report Hmmm... this is also what I saw as well with your kallsyms. I assumed that I did something wrong -- normally there are deep call stacks, but perhaps there's something else going on. Oh well. I'll think some more on this, sorry for the trouble. I tryed again with nouveau.debug=debug and drm.debug=0xe, but this time I added drm_kms_helper.poll=0. The dmesg output is the same as without drm_kms_helper.poll=0. Even these lines are still present: [ 85.722596] [drm:drm_helper_hpd_irq_event], [CONNECTOR:17:eDP-1] status updated from connected to connected [ 85.723677] [drm:drm_helper_hpd_irq_event], [CONNECTOR:21:HDMI-A-1] status updated from disconnected to disconnected I think I have found something. Commenting out the following line in nouveau/core/core/event.c:nouveau_event_init seems to resolve the problem. The screen output is working but I have no idea if it is going to break something else. list_add_tail(&handler->head, &event->index[index].list); I als saw that nouveau_connector_hotplug is called much more often than nouveau_connector_hotplug_work. If I comment out everything inside nouveau_connector_hotplug_work does not reduce CPU load. Removing INIT_WORK(&nv_connector->hpd_work, nouveau_connector_hotplug_work); from nouveau_connector_create reduces the load, but /proc/interrupts are still increasing very fast. Created attachment 96609 [details]
/proc/interrupts without nouveau_connector_hotplug_work running
A dump_stack() from inside nouveau_connector_hotplug CPU: 0 PID: 0 Comm: swapper/0 Tainted: G R W O 3.14.0-rc8+ #2 [ 3122.983526] Hardware name: ASUSTeK COMPUTER INC. UX51VZH/UX51VZH, BIOS UX51VZH.207 05/23/2013 [ 3122.983527] ffff8802132d5400 ffffffff814b18f2 ffff8802136e9800 ffffffffa0814699 [ 3122.983529] ffff88021176a280 ffffffffa078ad80 0000000000000012 0000000000020000 [ 3122.983531] 0000000000000001 0000000000000000 0000000000020000 ffffffffa07beb39 [ 3122.983532] Call Trace: [ 3122.983533] <IRQ> [<ffffffff814b18f2>] ? dump_stack+0x41/0x51 [ 3122.983543] [<ffffffffa0814699>] ? nouveau_connector_hotplug+0x9/0x30 [nouveau] [ 3122.983548] [<ffffffffa078ad80>] ? nouveau_event_trigger+0x60/0xb0 [nouveau] [ 3122.983554] [<ffffffffa07beb39>] ? nve0_gpio_intr+0xa9/0xe0 [nouveau] [ 3122.983561] [<ffffffffa07c2cf5>] ? nouveau_mc_intr+0x115/0x170 [nouveau] [ 3122.983562] [<ffffffff810afebd>] ? handle_irq_event_percpu+0x2d/0x1b0 [ 3122.983564] [<ffffffff810b0073>] ? handle_irq_event+0x33/0x50 [ 3122.983566] [<ffffffff810b27bd>] ? handle_edge_irq+0x7d/0x130 [ 3122.983568] [<ffffffff810156a8>] ? handle_irq+0x18/0x30 [ 3122.983569] [<ffffffff81014fc4>] ? do_IRQ+0x44/0xe0 [ 3122.983571] [<ffffffff814b746d>] ? common_interrupt+0x6d/0x6d [ 3122.983573] <EOI> [<ffffffff8139759d>] ? cpuidle_enter_state+0x4d/0xc0 [ 3122.983578] [<ffffffff81397593>] ? cpuidle_enter_state+0x43/0xc0 [ 3122.983581] [<ffffffff813976b9>] ? cpuidle_idle_call+0xa9/0x1d0 [ 3122.983583] [<ffffffff8101c695>] ? arch_cpu_idle+0x5/0x30 [ 3122.983584] [<ffffffff810af285>] ? cpu_startup_entry+0x95/0x230 [ 3122.983587] [<ffffffff818c6efc>] ? start_kernel+0x41d/0x428 [ 3122.983589] [<ffffffff818c6904>] ? repair_env_string+0x58/0x58 [ 3122.983592] [<ffffffff818c6120>] ? early_idt_handlers+0x120/0x120 [ 3122.983594] [<ffffffff818c671f>] ? x86_64_start_kernel+0x14d/0x15c Created attachment 96773 [details]
dmesg with working kernel 3.8.rc6-5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96 (pdisp-trace vbios-trace)
Created attachment 96774 [details]
dmesg with broken kernel 3.8.rc6-0a0afd282fd715dd63d64b243299a64da14f8e8d (pdisp-trace vbios-trace)
I dont't know if it is relevant, but in the hope of providing you some useful information I saw that the interrupt storm starts when the last nvbios_exec(&init) in nouveau/core/engine/disp/dport.c:dp_link_train_init is called. nvbios_exec(&init) is called with the initial offset of 0x5d7b and calls following opcodes with respective offsets: 0x6e 0x5d7b 0x6e 0x5d88 0x6e 0x5d95 0x6e 0x5da2 0x6e 0x5daf 0x7a 0x5dbc 0x7a 0x5dc5 0x7a 0x5dce 0x6e 0x5dd7 0x74 0x5de4 0x5b 0x5de7 *** Bug 77073 has been marked as a duplicate of this bug. *** I recently generated this bug: https://bugs.freedesktop.org/show_bug.cgi?id=77073 Which appears to only happen when external, displayport display is attached. I'm going to try rolling back to kernel 3.8 to see if that resolves it. I attached a bunch of debug info to my ticket which I hope will be useful. The main thing of note is that the kworker is doing: kworker/0:3-1128 [000] d.s2 1973.688813: workqueue_queue_work: work struct=ffff88046f2135d8 function=od_dbs_timer [cpufreq_ondemand] workqueue=ffff88046d17ba00 req_cpu=0 cpu=0 kworker/0:3-1128 [000] d.h2 1973.704711: workqueue_queue_work: work struct=ffffffff819b85e0 function=push_to_pool workqueue=ffff88046d17ba00 req_cpu=32 cpu=0 kworker/0:3-1128 [000] d.h3 1973.757128: workqueue_queue_work: work struct=ffff88045db116e0 function=nouveau_connector_hotplug_work workqueue=ffff88046d17ba00 req_cpu=32 cpu=0 kworker/0:3-1128 [000] d.h3 1973.830281: workqueue_queue_work: work struct=ffff88045db116e0 function=nouveau_connector_hotplug_work workqueue=ffff88046d17ba00 req_cpu=32 cpu=0 kworker/0:3-1128 [000] d.s3 1973.838757: workqueue_queue_work: work struct=ffff88046f2135d8 function=od_dbs_timer [cpufreq_ondemand] workqueue=ffff88046d17ba00 req_cpu=0 cpu=0 kworker/0:3-1128 [000] d.h2 1973.859889: workqueue_queue_work: work struct=ffffffff819b85e0 function=push_to_pool workqueue=ffff88046d17ba00 req_cpu=32 cpu=0 kworker/0:3-1128 [000] d.h3 1973.903514: workqueue_queue_work: work struct=ffff88045db116e0 function=nouveau_connector_hotplug_work workqueue=ffff88046d17ba00 req_cpu=32 cpu=0 Over and over again. (In reply to comment #22) > I think I have found something. > > Commenting out the following line in > nouveau/core/core/event.c:nouveau_event_init seems to resolve the problem. > The screen output is working but I have no idea if it is going to break > something else. > > list_add_tail(&handler->head, &event->index[index].list); This also allows drawing for me too, at least on the console. I cannot start Xorg server with this though, and I'm having a hard time getting the log. I can confirm that v3.8-rc6 ( a few commits before the bad commit ) works fine for me. (In reply to comment #0) > Loading the nouveau driver leads to a kworker constantly using 100% of a > CPU core. > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 211 root 20 0 0 0 0 R 100.0 0.0 2:51.00 kworker/1:1 > > Powertop suggests the problem comes from nouveau_connector_hotplug_work. > > I noticed this problem first occurred in kernel 3.9 and still persists in > 3.14 rc8. In kernel 3.8 the problem did not occur. Some time ago, trying to > figure out the cause, i also tried a kernel bisect. > > > git bisect start > # bad: [c1be5a5b1b355d40e6cf79cc979eb66dafa24ad1] Linux 3.9 > git bisect bad c1be5a5b1b355d40e6cf79cc979eb66dafa24ad1 > # good: [19f949f52599ba7c3f67a5897ac6be14bfcb1200] Linux 3.8 > git bisect good 19f949f52599ba7c3f67a5897ac6be14bfcb1200 > # good: [d778df51c09264076fe0208c099ef7d428f21790] mm: vmscan: save work > scanning (almost) empty LRU lists > git bisect good d778df51c09264076fe0208c099ef7d428f21790 > # bad: [ee89f81252179dcbf6cd65bd48299f5e52292d88] Merge branch > 'for-3.9/core' of git://git.kernel.dk/linux-block > git bisect bad ee89f81252179dcbf6cd65bd48299f5e52292d88 > # good: [69086a78bdc973ec0b722be790b146e84ba8a8c4] Merge branch 'for-linus' > of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs > git bisect good 69086a78bdc973ec0b722be790b146e84ba8a8c4 > # bad: [ed5dc2372dba46e0ecd08791b1a0399d313e5cff] Merge tag > 'mmc-updates-for-3.9-rc1' of git:// > git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc > git bisect bad ed5dc2372dba46e0ecd08791b1a0399d313e5cff > # bad: [1f3a574a4bfe86ebf7d51fac37e0668397372fd8] Merge branch > 'drm-nouveau-next' of > git://anongit.freedesktop.org/git/nouveau/linux-2.6into drm-next > git bisect bad 1f3a574a4bfe86ebf7d51fac37e0668397372fd8 > # good: [cd17ef4114ad5c514b17e6a0bb02a309ab90b692] Merge tag > 'drm-intel-next-2013-02-01' of git:// > people.freedesktop.org/~danvet/drm-intel into drm-next > git bisect good cd17ef4114ad5c514b17e6a0bb02a309ab90b692 > # good: [210561ffd72d00eccf12c0131b8024d5436bae95] intel/iommu: force > writebuffer-flush quirk on Gen 4 Chipsets > git bisect good 210561ffd72d00eccf12c0131b8024d5436bae95 > # good: [0f0800661a125ddb038462570c869fe6f8ab5737] drm/nouveau/gpio: pass > number of on-die gpio lines to base > git bisect good 0f0800661a125ddb038462570c869fe6f8ab5737 > # good: [35f8badc1cf652381fa3f82c1fbea39f4dbe87fd] drm: Don't set the > plane->fb to NULL on successfull set_plane > git bisect good 35f8badc1cf652381fa3f82c1fbea39f4dbe87fd > # good: [31a34aa421032cfe3b2b892c929e7539e747a7ac] drm/nouveau/i2c: aux > channels not necessarily on nvio > git bisect good 31a34aa421032cfe3b2b892c929e7539e747a7ac > # bad: [0a0afd282fd715dd63d64b243299a64da14f8e8d] drm/nv50-/disp: move DP > link training to core and train from supervisor > git bisect bad 0a0afd282fd715dd63d64b243299a64da14f8e8d > # good: [ac8cc241a81941932da44993242e68c62e115ec7] drm/nv50/devinit: > reverse the logic for running encoder init scripts > git bisect good ac8cc241a81941932da44993242e68c62e115ec7 > # good: [7dcd060c0e6e4f76ce7eea94c66437ce8f546530] drm/nouveau/i2c: create > proper chipset-specific class implementations > git bisect good 7dcd060c0e6e4f76ce7eea94c66437ce8f546530 > # good: [5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96] drm/nv50-/disp: handle > supervisor tasks from workqueue > git bisect good 5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96 > # first bad commit: [0a0afd282fd715dd63d64b243299a64da14f8e8d] > drm/nv50-/disp: move DP link training to core and train from supervisor > > > > > 0a0afd282fd715dd63d64b243299a64da14f8e8d is the first bad commit > commit 0a0afd282fd715dd63d64b243299a64da14f8e8d > Author: Ben Skeggs <bskeggs@redhat.com> > Date: Mon Feb 18 23:17:53 2013 -0500 > > drm/nv50-/disp: move DP link training to core and train from supervisor > > We need to be able to do link training for PIOR-connected ANX9805 from > the third supervisor handler (due to script ordering in the bios, can't > have the "user" call train because some settings are overwritten from > the modesetting bios scripts). > > This moves link training for SOR-connected DP encoders to the second > supervisor interrupt, *before* we call the modesetting scripts (yes, > different ordering from PIOR is necessary). This is useful since we > should now be able to remove some hacks to workaround races between > the supervisor and link training paths. > > Signed-off-by: Ben Skeggs <bskeggs@redhat.com> > > :040000 040000 8f2f2d8ea35eae1b6e6ff981761be02a8818bac3 > 2994eb54c689056254c241fe78a3af8d0b61d1df M drivers @Ilia Mirkin Is there anything else I can supply that will be helpful for debugging? I have found an other strange behavior. I have used the latest kernel from the torvalds git. With 3.14-rc8 it didn't happen. Loading nouveau the first time results in the usual situation. I unload it with "echo 0 | sudo tee /sys/class/vtconsole/vtcon1/bind" and "sudo rmmod nouveau". I load it again and the CPU is idle and interrupts are normal. Screen and ttys seem to work as they should. I cant start X (either the first time I load nouveau nor the second time) which says Fatal server error: [ 70.848] (EE) no screens found(EE) But this is probably becouse I broke it with some updates. For me at least this is reproducible every time. I will attach both logiles with debug=PDISP=trace,VBIOS=trace of the first and second load. I noticed the following line was present only during the first load: nouveau [ DRM] unplugged eDP-1 This was present only on the second load: nouveau [ DEVINIT][0000:01:00.0] adaptor not initialised I noticed that during the second load, the vbios is somehow executing differently. After executing SUB_DIRECT opcodes, sometines it jums, sometimes not. Maybe becouse it is initializing the gpu on the second load? Probably there are more differences I did not notice. @ Dale Hamel If you have time, could you please try if you can reproduce it? Created attachment 96971 [details]
dmesg on 3.14+, first load
Created attachment 96972 [details]
dmesg on 3.14+, second load
I think I should add that it happens only if I boot in pure EFI mode. If I boot in EFI mode with CSM enabled, on the second load it behaves as usual. I tried loading nouveau at boot with "config=NvForcePost=1". This seems to workaround the problem for now. Also X starts corectly. Will do some more testing... (In reply to comment #37) > I think I should add that it happens only if I boot in pure EFI mode. If I > boot in EFI mode with CSM enabled, on the second load it behaves as usual. > > I tried loading nouveau at boot with "config=NvForcePost=1". This seems to > workaround the problem for now. Also X starts corectly. > > Will do some more testing... Are you booting on a Macbook with something like rEFIt or rEFInd? How are you booting pure CSM if so? I'd like to try to duplicate this. (In reply to comment #38) > (In reply to comment #37) > > I think I should add that it happens only if I boot in pure EFI mode. If I > > boot in EFI mode with CSM enabled, on the second load it behaves as usual. > > > > I tried loading nouveau at boot with "config=NvForcePost=1". This seems to > > workaround the problem for now. Also X starts corectly. > > > > Will do some more testing... > > Are you booting on a Macbook with something like rEFIt or rEFInd? How are > you booting pure CSM if so? I'd like to try to duplicate this. I have an Asus Zenbook UX51VZ and using simply GRUB (The EFI version obviously), no rEFIt or rEFInd. Maybe I could not explain it clear enough. If I boot in pure EFI mode, unloading and loading nouveau again, makes nouveau think the gpu is not initialized, or did not do the post. (Which I think the bios should do, but not sure). Therefore, as you can see by comparing the two last dmesgs i attached, nouveau is executing a big chunk of the vbios. After doing so CPU load is normal and everythink seems to work as it should, at least for now. In the other case, where loading, unloading and loading again doesn't make any difference, i did not boot in pure CSM mode. I booted in EFI mode but with CSM enabled. These things actually go beyond my knowledge, but as far I understand, enabling CSM while still booting in EFI mode, enables legacy functionality like PCIROM, VGA setup and things like that. I now tried als the pure CSM boot but there is no difference between pure CSM and EFI with CSM enabled. In any case the interesting thing in the dmesg of the second load in pure EFI is this line: nouveau [ DEVINIT][0000:01:00.0] adaptor not initialised The behavior of nouveau, if it thinks the gpu is not initialized can be forced by passing this parameter to nouveau: "config=NvForcePost=1". Either in grub with "nouveau.config=NvForcePost=1" or in /etc/modprobe/... So probably it would be more interesting trying this than to replicate the strange behavior of unloading and loading again. "config=NvForcePost=1", for me, works in pure EFI, EFI with CSM and also in pure CSM. One last thing: I suppose that my system is newer running the init scripts in vbios during POST, but I think there is more. As this patch describes (http://www.spinics.net/lists/linux-pci/msg29828.html), the bios should also set up VGA (whatever it means). On my system, booting in pure EFI "cat /sys/class/drm/card0/device/boot_vga" prints 0 and therefore the X server doesn't find my gpu and fails to start. If CSM is enabled, and also in pure CSM mode, it prints 1 as it should. Just for my curiosity, could you please check how your machine behaves regarding this boot_vga? I get the same problem with a kworker pegging the CPU as soon as I plug in an external HDMI monitor. I'm using kernel 3.14.1-1. I'm on Arch Linux using the xf86-video-nouveau 1.0.10-2 package. DP has had a substantial rework targeted at 3.16. Can you guys give http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next a shot and see if it avoids the hpd storm? Absolutely, i'm building right now and i'll report back tomorrow once I can test on a display port monitor. So, the new kernel linked above definitely fixes the problem. Hotplug now works flawlessly! Unfortunately it introduces a fairly minor bug with X11 that seems to cause my modelines to be invalid. I'm running on a retina macbook pro, and I normally downscale the resolution by 50% so that i can actually read text on it, but my xrandr lines in xinitrc now cause it to hang: xrandr --newmode "1440x900_60.00" 106.50 1440 1528 1672 1904 900 903 909 934 -hsync +vsync xrandr --addmode $monitor 1440x900_60.00 xrandr --output $monitor --mode 1440x900_60.00 I'll check to see if these modelines are just invalid, or if this is just some bug related to xrandr. It's clearly got something to do with the kernel though, since it's no longer working. Great work to all the devs who refactored this! One step closer to converting my office to use linux on their MBPs :) So, the new kernel linked above definitely fixes the problem. Hotplug now works flawlessly! Unfortunately it introduces a fairly minor bug with X11 that seems to cause my modelines to be invalid. I'm running on a retina macbook pro, and I normally downscale the resolution by 50% so that i can actually read text on it, but my xrandr lines in xinitrc now cause it to hang: xrandr --newmode "1440x900_60.00" 106.50 1440 1528 1672 1904 900 903 909 934 -hsync +vsync xrandr --addmode $monitor 1440x900_60.00 xrandr --output $monitor --mode 1440x900_60.00 I'll check to see if these modelines are just invalid, or if this is just some bug related to xrandr. It's clearly got something to do with the kernel though, since it's no longer working. Great work to all the devs who refactored this! One step closer to converting my office to use linux on their MBPs :) Sorry, somehow my last comment was double posted. (In reply to comment #45) > Sorry, somehow my last comment was double posted. (In reply to comment #44) > So, the new kernel linked above definitely fixes the problem. > > Hotplug now works flawlessly! Thanks for testing, I'm closing the bug :) > Unfortunately it introduces a fairly minor bug with X11 that seems to cause > my modelines to be invalid. Please fill out another bug report, otherwise, we cannot track bugs easily! A did a bit more testing, and I should say that it actually works with a caveat - this is only for display port monitors, thunderbolt monitors (which uses the same connector) don't seem to be recognized at all now, which is a regression. But that's another bug, the 100% kworker thing is definitely fixed. (In reply to comment #47) > A did a bit more testing, and I should say that it actually works with a > caveat - this is only for display port monitors, thunderbolt monitors (which > uses the same connector) don't seem to be recognized at all now, which is a > regression. A "thunderbolt" monitor should be no different. When you file a bug for it, can you attach debug logs (nouveau.debug=PDISP=trace,VBIOS=trace,I2C=trace log_buf_len=1M) of it working before, and not working now? Thanks! > > But that's another bug, the 100% kworker thing is definitely fixed. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.