Bug 76732

Summary:

Kworker using 100% CPU

Product:

xorg

Reporter:

patrick.clara

Component:

Driver/nouveau

Assignee:

Nouveau Project <nouveau>

Status:

RESOLVED FIXED

QA Contact:

Xorg Project Team <xorg-team>

Severity:

normal

Priority:

medium

CC:

daleha, reedlaw

Version:

unspecified

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
dmesg	none
lspci	none
dmidecode	none
acpidump	none
powertop html report	none
kallsyms with loaded nouveau	none
perf.data.xz	none
perf report	none
/proc/interrupts without nouveau_connector_hotplug_work running	none
dmesg with working kernel 3.8.rc6-5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96 (pdisp-trace vbios-trace)	none
dmesg with broken kernel 3.8.rc6-0a0afd282fd715dd63d64b243299a64da14f8e8d (pdisp-trace vbios-trace)	none
dmesg on 3.14+, first load	none
dmesg on 3.14+, second load	none

Description patrick.clara 2014-03-28 12:07:06 UTC

Loading the nouveau driver leads to a kworker constantly using  100% of a CPU core.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
211 root      20   0       0      0      0 R 100.0  0.0   2:51.00 kworker/1:1

Powertop suggests the problem comes from nouveau_connector_hotplug_work.

I noticed this problem first occurred in kernel 3.9 and still persists in 3.14 rc8. In kernel 3.8 the problem did not occur. Some time ago, trying to figure out the cause, i also tried a kernel bisect.


git bisect start
# bad: [c1be5a5b1b355d40e6cf79cc979eb66dafa24ad1] Linux 3.9
git bisect bad c1be5a5b1b355d40e6cf79cc979eb66dafa24ad1
# good: [19f949f52599ba7c3f67a5897ac6be14bfcb1200] Linux 3.8
git bisect good 19f949f52599ba7c3f67a5897ac6be14bfcb1200
# good: [d778df51c09264076fe0208c099ef7d428f21790] mm: vmscan: save work
scanning (almost) empty LRU lists
git bisect good d778df51c09264076fe0208c099ef7d428f21790
# bad: [ee89f81252179dcbf6cd65bd48299f5e52292d88] Merge branch
'for-3.9/core' of git://git.kernel.dk/linux-block
git bisect bad ee89f81252179dcbf6cd65bd48299f5e52292d88
# good: [69086a78bdc973ec0b722be790b146e84ba8a8c4] Merge branch 'for-linus'
of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect good 69086a78bdc973ec0b722be790b146e84ba8a8c4
# bad: [ed5dc2372dba46e0ecd08791b1a0399d313e5cff] Merge tag
'mmc-updates-for-3.9-rc1' of git://
git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
git bisect bad ed5dc2372dba46e0ecd08791b1a0399d313e5cff
# bad: [1f3a574a4bfe86ebf7d51fac37e0668397372fd8] Merge branch
'drm-nouveau-next' of
git://anongit.freedesktop.org/git/nouveau/linux-2.6into drm-next
git bisect bad 1f3a574a4bfe86ebf7d51fac37e0668397372fd8
# good: [cd17ef4114ad5c514b17e6a0bb02a309ab90b692] Merge tag
'drm-intel-next-2013-02-01' of git://
people.freedesktop.org/~danvet/drm-intel into drm-next
git bisect good cd17ef4114ad5c514b17e6a0bb02a309ab90b692
# good: [210561ffd72d00eccf12c0131b8024d5436bae95] intel/iommu: force
writebuffer-flush quirk on Gen 4 Chipsets
git bisect good 210561ffd72d00eccf12c0131b8024d5436bae95
# good: [0f0800661a125ddb038462570c869fe6f8ab5737] drm/nouveau/gpio: pass
number of on-die gpio lines to base
git bisect good 0f0800661a125ddb038462570c869fe6f8ab5737
# good: [35f8badc1cf652381fa3f82c1fbea39f4dbe87fd] drm: Don't set the
plane->fb to NULL on successfull set_plane
git bisect good 35f8badc1cf652381fa3f82c1fbea39f4dbe87fd
# good: [31a34aa421032cfe3b2b892c929e7539e747a7ac] drm/nouveau/i2c: aux
channels not necessarily on nvio
git bisect good 31a34aa421032cfe3b2b892c929e7539e747a7ac
# bad: [0a0afd282fd715dd63d64b243299a64da14f8e8d] drm/nv50-/disp: move DP
link training to core and train from supervisor
git bisect bad 0a0afd282fd715dd63d64b243299a64da14f8e8d
# good: [ac8cc241a81941932da44993242e68c62e115ec7] drm/nv50/devinit:
reverse the logic for running encoder init scripts
git bisect good ac8cc241a81941932da44993242e68c62e115ec7
# good: [7dcd060c0e6e4f76ce7eea94c66437ce8f546530] drm/nouveau/i2c: create
proper chipset-specific class implementations
git bisect good 7dcd060c0e6e4f76ce7eea94c66437ce8f546530
# good: [5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96] drm/nv50-/disp: handle
supervisor tasks from workqueue
git bisect good 5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96
# first bad commit: [0a0afd282fd715dd63d64b243299a64da14f8e8d]
drm/nv50-/disp: move DP link training to core and train from supervisor




0a0afd282fd715dd63d64b243299a64da14f8e8d is the first bad commit
commit 0a0afd282fd715dd63d64b243299a64da14f8e8d
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Mon Feb 18 23:17:53 2013 -0500

    drm/nv50-/disp: move DP link training to core and train from supervisor

    We need to be able to do link training for PIOR-connected ANX9805 from
    the third supervisor handler (due to script ordering in the bios, can't
    have the "user" call train because some settings are overwritten from
    the modesetting bios scripts).

    This moves link training for SOR-connected DP encoders to the second
    supervisor interrupt, *before* we call the modesetting scripts (yes,
    different ordering from PIOR is necessary).  This is useful since we
    should now be able to remove some hacks to workaround races between
    the supervisor and link training paths.

    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

:040000 040000 8f2f2d8ea35eae1b6e6ff981761be02a8818bac3
2994eb54c689056254c241fe78a3af8d0b61d1df M    drivers

Comment 1 patrick.clara 2014-03-28 12:12:04 UTC

Created attachment 96521 [details]
dmesg

Comment 2 patrick.clara 2014-03-28 12:12:23 UTC

Created attachment 96522 [details]
lspci

Comment 3 patrick.clara 2014-03-28 12:12:42 UTC

Created attachment 96523 [details]
dmidecode

Comment 4 patrick.clara 2014-03-28 12:13:20 UTC

Created attachment 96524 [details]
acpidump

Comment 5 patrick.clara 2014-03-28 12:13:44 UTC

Created attachment 96525 [details]
powertop html report

Comment 6 patrick.clara 2014-03-28 12:22:04 UTC

As requested by Ilia Mirkin I tryed booting with nouveau.debug=debug and drm.debug=0xe.

Here a chunk of the output it continuously repeats

[   25.732112] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 16
[   25.732399] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x58565600
[   25.732404] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30543631
[   25.732409] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30443031
[   25.732414] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30
[   25.732429] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 9: 0x00000000 8
[   25.732646] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00040a11
[   25.732652] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00000000
[   25.732657] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30443031
[   25.732662] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30
[   25.732668] nouveau  [     DRM] display: 4x270000 dpcd 0x11
[   25.732669] nouveau  [     DRM] encoder: 4x270000
[   25.732670] nouveau  [     DRM] maximum: 4x270000
[   25.732672] [drm:drm_helper_hpd_irq_event], [CONNECTOR:17:eDP-1] status updated from connected to connected
[   25.733745] [drm:drm_helper_hpd_irq_event], [CONNECTOR:21:HDMI-A-1] status updated from disconnected to disconnected
[   25.733753] nouveau  [     DRM] plugged eDP-1
[   25.733765] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1
[   25.733786] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.733787] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.733788] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.733790] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.733942] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 1
[   25.734113] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00040a00
[   25.734118] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00000000
[   25.734121] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30443031
[   25.734127] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30
[   25.734141] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1
[   25.734155] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.734156] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.734158] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.734159] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.734313] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 1
[   25.734475] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00040a00
[   25.734480] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00000000
[   25.734486] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30443031
[   25.734491] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30
[   25.734505] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1
[   25.734526] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.734527] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.734528] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.734530] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.734681] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.734967] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xffffff00
[   25.734973] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00ffffff
[   25.734975] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x96a2a934
[   25.734981] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00000000
[   25.734987] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.735273] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x040117ff
[   25.735278] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x781322a0
[   25.735284] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xa7a11202
[   25.735289] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x26995354
[   25.735295] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.735575] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x0054500b
[   25.735581] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x01010000
[   25.735586] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x01010101
[   25.735592] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x01010101
[   25.735607] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.735895] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x01010101
[   25.735901] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x762a0101
[   25.735905] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x54b0c440
[   25.735911] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x042c6014
[   25.735916] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.736203] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xc1580091
[   25.736209] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x1e000010
[   25.736214] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xc440762a
[   25.736220] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x615b54b0
[   25.736225] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.736507] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x0091042c
[   25.736512] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x0010c158
[   25.736518] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00001e00
[   25.736523] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x2a00fd00
[   25.736529] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.736815] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x1e64603d
[   25.736821] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x20200a01
[   25.736825] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x20202020
[   25.736831] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xfc000000
[   25.736836] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 16
[   25.737123] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x58565600
[   25.737129] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30543631
[   25.737134] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30443031
[   25.737140] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30
[   25.737153] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 9: 0x00000000 8
[   25.737381] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00040a11
[   25.737390] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00000000
[   25.737398] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30443031
[   25.737406] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30
[   25.737410] nouveau  [     DRM] display: 4x270000 dpcd 0x11
[   25.737411] nouveau  [     DRM] encoder: 4x270000
[   25.737412] nouveau  [     DRM] maximum: 4x270000
[   25.737414] [drm:drm_helper_hpd_irq_event], [CONNECTOR:17:eDP-1] status updated from connected to connected
[   25.738486] [drm:drm_helper_hpd_irq_event], [CONNECTOR:21:HDMI-A-1] status updated from disconnected to disconnected
[   25.738492] nouveau  [     DRM] plugged eDP-1
[   25.738504] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1
[   25.738525] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.738526] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.738528] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.738529] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.738681] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 1
[   25.738852] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00040a00
[   25.738857] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00000000
[   25.738860] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30443031
[   25.738866] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30
[   25.738880] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1
[   25.738894] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.738896] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.738897] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.738898] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.739052] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 1: 0x00000050 1
[   25.739214] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00040a00
[   25.739220] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00000000
[   25.739225] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x30443031
[   25.739231] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x6d000a30
[   25.739244] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 4: 0x00000050 1
[   25.739265] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.739266] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.739268] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.739269] nouveau D[     I2C][0000:01:00.0] AUXCH(1): wr 0x00000000
[   25.739421] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.739705] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xffffff00
[   25.739711] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00ffffff
[   25.739716] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x96a2a934
[   25.739722] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00000000
[   25.739727] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.740009] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x040117ff
[   25.740014] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x781322a0
[   25.740019] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xa7a11202
[   25.740025] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x26995354
[   25.740030] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.740317] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x0054500b
[   25.740323] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x01010000
[   25.740327] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x01010101
[   25.740333] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x01010101
[   25.740339] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.740625] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x01010101
[   25.740631] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x762a0101
[   25.740636] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x54b0c440
[   25.740642] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x042c6014
[   25.740647] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.740929] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xc1580091
[   25.740934] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x1e000010
[   25.740940] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xc440762a
[   25.740945] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x615b54b0
[   25.740951] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.741237] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x0091042c
[   25.741243] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x0010c158
[   25.741247] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x00001e00
[   25.741253] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x2a00fd00
[   25.741258] nouveau D[     I2C][0000:01:00.0] AUXCH(1): 5: 0x00000050 16
[   25.741544] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x1e64603d
[   25.741550] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x20200a01
[   25.741555] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0x20202020
[   25.741561] nouveau D[     I2C][0000:01:00.0] AUXCH(1): rd 0xfc000000

Comment 7 Ilia Mirkin 2014-03-28 12:32:00 UTC

Can you see if booting with drm_kms_helper.poll=0 helps?

Comment 8 patrick.clara 2014-03-28 12:39:29 UTC

Booting with drm_kms_helper.poll=0 does not help.

Comment 9 Ilia Mirkin 2014-03-28 13:08:28 UTC

(In reply to comment #8)
> Booting with drm_kms_helper.poll=0 does not help.

Are you absolutely sure? That should cause drm_helper_hpd_irq_event to not do any work and return immediately... unless that setting is somehow not getting propagated to dev->mode_config.poll_enabled...

Comment 10 patrick.clara 2014-03-28 13:46:29 UTC

(In reply to comment #9)
> (In reply to comment #8)
> > Booting with drm_kms_helper.poll=0 does not help.
> 
> Are you absolutely sure? That should cause drm_helper_hpd_irq_event to not
> do any work and return immediately... unless that setting is somehow not
> getting propagated to dev->mode_config.poll_enabled...

I have tryed other 3 times and checked and rechecked spelling.
Nothing does change.

I then booted with "modprobe.blacklist=nouveau" and afterwards did "modprobe drm_kms_helper poll=0" and "modprobe nouveau". The problem disappears but also the output on the screen. I can not reproduce it every time and it could be related to when and how i switch between tty. I got this in dmesg (only in this particular case), dont know if it is related.

[   49.265515] [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0
[   54.910045] pci_pm_runtime_suspend(): nouveau_pmops_runtime_suspend+0x0/0xd0 [nouveau] returns -22
[   78.572068] detected fb_set_par error, error code: -22
[  175.690629] detected fb_set_par error, error code: -22
[  176.447496] systemd-logind[2892]: New session 3 of user Debian-gdm.
[  183.530486] detected fb_set_par error, error code: -22
[  183.537975] systemd-logind[2892]: Removed session 3.
[  184.246263] systemd-logind[2892]: New session 4 of user Debian-gdm.
[  201.221280] systemd-logind[2892]: Removed session 4.
[  201.224190] systemd-logind[2892]: Removed session 1.
[  201.709503] detected fb_set_par error, error code: -22

Comment 11 patrick.clara 2014-03-28 15:41:04 UTC

Maybe now I can clarify a little bit.


If nouveau loads at boot, the problem is always there, even with drm_kms_helper.poll=0.


If nouveau is blacklisted at boot i have two possible situations:
1)
- i have gdm on the screen
- through ssh load drm_kms_helper with poll=0
- load nouveau
- the screen goes black, or sometimes it displays some rubbish
- in dmesg whis appears:
[   66.938644] pci_pm_runtime_suspend(): nouveau_pmops_runtime_suspend+0x0/0xd0 [nouveau] returns -22
- i press ctrl+alt+F1 and nothing happens but this appears in dmesg:
[  175.868791] detected fb_set_par error, error code: -22
-> CPU load is normal

2)
- switch to tty1
- through ssh load drm_kms_helper with poll=0
- load nouveau
- screen changes resolution and still works, also changing tty works
- CPU load at 100%


All the 3 situations are reproducible all the time

Comment 12 Ilia Mirkin 2014-03-28 16:50:59 UTC

(In reply to comment #11)
> Maybe now I can clarify a little bit.
> 
> 
> If nouveau loads at boot, the problem is always there, even with
> drm_kms_helper.poll=0.
> 
> 
> If nouveau is blacklisted at boot i have two possible situations:
> 1)
> - i have gdm on the screen
> - through ssh load drm_kms_helper with poll=0
> - load nouveau
> - the screen goes black, or sometimes it displays some rubbish
> - in dmesg whis appears:
> [   66.938644] pci_pm_runtime_suspend():
> nouveau_pmops_runtime_suspend+0x0/0xd0 [nouveau] returns -22
> - i press ctrl+alt+F1 and nothing happens but this appears in dmesg:
> [  175.868791] detected fb_set_par error, error code: -22
> -> CPU load is normal

Yeah, nouveau suspends and is gone forever. You can disable runtime pm with nouveau.runpm=0. A fix for this should be in 3.14-final, I think.

> 
> 2)
> - switch to tty1
> - through ssh load drm_kms_helper with poll=0
> - load nouveau
> - screen changes resolution and still works, also changing tty works
> - CPU load at 100%
> 
> 
> All the 3 situations are reproducible all the time

In one of the cpu load at 100% situations, could you do a

perf record -g -a

(as root), and attach the resulting perf.data file after like 30s of runtime, along with your kallsyms (which should include nouveau and drm* symbols... grabbing /proc/kallsyms from a running kernel should work).

Comment 13 patrick.clara 2014-03-28 17:18:11 UTC

Created attachment 96557 [details]
kallsyms with loaded nouveau

Comment 14 patrick.clara 2014-03-28 17:33:37 UTC

Created attachment 96560 [details]
perf.data.xz

Comment 15 Ilia Mirkin 2014-03-28 17:39:20 UTC

Hm, well either I'm an idiot or I need more stuff to see it properly... In either case, since I can't get it to work, would you mind running

perf report

And pasting in the first couple of entries?

Comment 16 patrick.clara 2014-03-28 18:00:27 UTC

Samples: 121K of event 'cycles', Event count (approx.): 71741963089
+  64.62%      kworker/1:1  [kernel.kallsyms]        [k] ioread32
+  18.47%      kworker/1:1  [kernel.kallsyms]        [k] native_read_tsc
+  13.71%      kworker/1:1  [kernel.kallsyms]        [k] delay_tsc
+   2.31%          swapper  [kernel.kallsyms]        [k] ioread32
+   0.37%          swapper  [kernel.kallsyms]        [k] intel_idle
+   0.05%      kworker/1:1  [nouveau]                [k] nv94_aux
+   0.02%          swapper  [kernel.kallsyms]        [k] cpuidle_enter_state
+   0.02%          swapper  [kernel.kallsyms]        [k] irq_entries_start
+   0.02%          swapper  [kernel.kallsyms]        [k] _raw_spin_lock
+   0.01%      kworker/1:1  [kernel.kallsyms]        [k] __const_udelay
+   0.01%          swapper  [nouveau]                [k] nve0_gpio_intr
+   0.01%          swapper  [kernel.kallsyms]        [k] menu_select
+   0.01%          swapper  [kernel.kallsyms]        [k] rcu_eqs_enter_common.isra.48
+   0.01%          swapper  [kernel.kallsyms]        [k] ktime_get
+   0.01%      kworker/1:1  [nouveau]                [k] dcb_gpio_table
+   0.01%      kworker/1:1  [nouveau]                [k] dcb_table
+   0.01%      kworker/1:1  [kernel.kallsyms]        [k] iowrite32
+   0.01%      kworker/1:1  [i2c_algo_bit]           [k] i2c_outb.isra.0
+   0.01%      kworker/1:1  [nouveau]                [k] nouveau_i2c_setscl
+   0.01%          swapper  [kernel.kallsyms]        [k] queue_work_on
+   0.01%          swapper  [kernel.kallsyms]        [k] native_write_msr_safe
+   0.01%          swapper  [kernel.kallsyms]        [k] cpu_startup_entry
+   0.00%      kworker/1:1  [nouveau]                [k] nv_printk_
+   0.00%      kworker/1:1  [kernel.kallsyms]        [k] native_write_msr_safe
+   0.00%      kworker/1:1  [nouveau]                [k] nouveau_bios_rd08
+   0.00%      kworker/1:1  [kernel.kallsyms]        [k] __udelay

Comment 17 Ilia Mirkin 2014-03-28 18:04:10 UTC

(In reply to comment #16)
> Samples: 121K of event 'cycles', Event count (approx.): 71741963089
> +  64.62%      kworker/1:1  [kernel.kallsyms]        [k] ioread32
> +  18.47%      kworker/1:1  [kernel.kallsyms]        [k] native_read_tsc
> +  13.71%      kworker/1:1  [kernel.kallsyms]        [k] delay_tsc
> +   2.31%          swapper  [kernel.kallsyms]        [k] ioread32

There was supposed to be a call graph... Perhaps you're using an older version of perf? Or perhaps it really wants you to pass in -g for report? (Or you didn't use -g when recording?)

Or maybe you're in the curses TUI and you have to select to expand? I forget how it works :(

Comment 18 patrick.clara 2014-03-28 21:51:44 UTC

Created attachment 96570 [details]
perf report

Comment 19 patrick.clara 2014-03-28 21:55:35 UTC

I don't have any idea how perf works.
I used -g and -a.
Version of iperf is 3.14 rc7.

I have attached the output of "perf report -g --stdio". Is this what you are looking for?

Comment 20 Ilia Mirkin 2014-03-28 21:58:32 UTC

(In reply to comment #18)
> Created attachment 96570 [details]
> perf report

Hmmm... this is also what I saw as well with your kallsyms. I assumed that I did something wrong -- normally there are deep call stacks, but perhaps there's something else going on. Oh well. I'll think some more on this, sorry for the trouble.

Comment 21 patrick.clara 2014-03-28 22:24:02 UTC

I tryed again with nouveau.debug=debug and drm.debug=0xe, but this time I added drm_kms_helper.poll=0.

The dmesg output is the same as without drm_kms_helper.poll=0.
Even these lines are still present:

[   85.722596] [drm:drm_helper_hpd_irq_event], [CONNECTOR:17:eDP-1] status updated from connected to connected
[   85.723677] [drm:drm_helper_hpd_irq_event], [CONNECTOR:21:HDMI-A-1] status updated from disconnected to disconnected

Comment 22 patrick.clara 2014-03-30 10:24:07 UTC

I think I have found something.

Commenting out the following line in nouveau/core/core/event.c:nouveau_event_init seems to resolve the problem. The screen output is working but I have no idea if it is going to break something else.

list_add_tail(&handler->head, &event->index[index].list);

Comment 23 patrick.clara 2014-03-30 11:15:41 UTC

I als saw that nouveau_connector_hotplug is called much more often than nouveau_connector_hotplug_work.

If I comment out everything inside nouveau_connector_hotplug_work does not reduce CPU load.

Removing
INIT_WORK(&nv_connector->hpd_work, nouveau_connector_hotplug_work);
from nouveau_connector_create reduces the load, but /proc/interrupts are still increasing very fast.

Comment 24 patrick.clara 2014-03-30 11:17:01 UTC

Created attachment 96609 [details]
/proc/interrupts without nouveau_connector_hotplug_work running

Comment 25 patrick.clara 2014-03-30 12:01:28 UTC

A dump_stack() from inside nouveau_connector_hotplug

CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  R     W  O 3.14.0-rc8+ #2
[ 3122.983526] Hardware name: ASUSTeK COMPUTER INC. UX51VZH/UX51VZH, BIOS UX51VZH.207 05/23/2013
[ 3122.983527]  ffff8802132d5400 ffffffff814b18f2 ffff8802136e9800 ffffffffa0814699
[ 3122.983529]  ffff88021176a280 ffffffffa078ad80 0000000000000012 0000000000020000
[ 3122.983531]  0000000000000001 0000000000000000 0000000000020000 ffffffffa07beb39
[ 3122.983532] Call Trace:
[ 3122.983533]  <IRQ>  [<ffffffff814b18f2>] ? dump_stack+0x41/0x51
[ 3122.983543]  [<ffffffffa0814699>] ? nouveau_connector_hotplug+0x9/0x30 [nouveau]
[ 3122.983548]  [<ffffffffa078ad80>] ? nouveau_event_trigger+0x60/0xb0 [nouveau]
[ 3122.983554]  [<ffffffffa07beb39>] ? nve0_gpio_intr+0xa9/0xe0 [nouveau]
[ 3122.983561]  [<ffffffffa07c2cf5>] ? nouveau_mc_intr+0x115/0x170 [nouveau]
[ 3122.983562]  [<ffffffff810afebd>] ? handle_irq_event_percpu+0x2d/0x1b0
[ 3122.983564]  [<ffffffff810b0073>] ? handle_irq_event+0x33/0x50
[ 3122.983566]  [<ffffffff810b27bd>] ? handle_edge_irq+0x7d/0x130
[ 3122.983568]  [<ffffffff810156a8>] ? handle_irq+0x18/0x30
[ 3122.983569]  [<ffffffff81014fc4>] ? do_IRQ+0x44/0xe0
[ 3122.983571]  [<ffffffff814b746d>] ? common_interrupt+0x6d/0x6d
[ 3122.983573]  <EOI>  [<ffffffff8139759d>] ? cpuidle_enter_state+0x4d/0xc0
[ 3122.983578]  [<ffffffff81397593>] ? cpuidle_enter_state+0x43/0xc0
[ 3122.983581]  [<ffffffff813976b9>] ? cpuidle_idle_call+0xa9/0x1d0
[ 3122.983583]  [<ffffffff8101c695>] ? arch_cpu_idle+0x5/0x30
[ 3122.983584]  [<ffffffff810af285>] ? cpu_startup_entry+0x95/0x230
[ 3122.983587]  [<ffffffff818c6efc>] ? start_kernel+0x41d/0x428
[ 3122.983589]  [<ffffffff818c6904>] ? repair_env_string+0x58/0x58
[ 3122.983592]  [<ffffffff818c6120>] ? early_idt_handlers+0x120/0x120
[ 3122.983594]  [<ffffffff818c671f>] ? x86_64_start_kernel+0x14d/0x15c

Comment 26 patrick.clara 2014-04-02 09:49:01 UTC

Created attachment 96773 [details]
dmesg with working kernel 3.8.rc6-5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96 (pdisp-trace vbios-trace)

Comment 27 patrick.clara 2014-04-02 09:50:16 UTC

Created attachment 96774 [details]
dmesg with broken kernel 3.8.rc6-0a0afd282fd715dd63d64b243299a64da14f8e8d (pdisp-trace vbios-trace)

Comment 28 patrick.clara 2014-04-02 13:57:27 UTC

I dont't know if it is relevant, but in the hope of providing you some useful information I saw that the interrupt storm starts when the last nvbios_exec(&init) in nouveau/core/engine/disp/dport.c:dp_link_train_init is called.

nvbios_exec(&init) is called with the initial offset of 0x5d7b and calls following opcodes with respective offsets:

0x6e 0x5d7b
0x6e 0x5d88
0x6e 0x5d95
0x6e 0x5da2
0x6e 0x5daf
0x7a 0x5dbc
0x7a 0x5dc5
0x7a 0x5dce
0x6e 0x5dd7
0x74 0x5de4
0x5b 0x5de7

Comment 29 Ilia Mirkin 2014-04-04 23:13:43 UTC

*** Bug 77073 has been marked as a duplicate of this bug. ***

Comment 30 Dale Hamel 2014-04-04 23:19:11 UTC

I recently generated this bug:

https://bugs.freedesktop.org/show_bug.cgi?id=77073

Which appears to only happen when external, displayport display is attached.

I'm going to try rolling back to kernel 3.8 to see if that resolves it.

I attached a bunch of debug info to my ticket which I hope will be useful. The main thing of note is that the kworker is doing:



     kworker/0:3-1128 [000] d.s2 1973.688813: workqueue_queue_work: work struct=ffff88046f2135d8 function=od_dbs_timer [cpufreq_ondemand] workqueue=ffff88046d17ba00 req_cpu=0 cpu=0
     kworker/0:3-1128 [000] d.h2 1973.704711: workqueue_queue_work: work struct=ffffffff819b85e0 function=push_to_pool workqueue=ffff88046d17ba00 req_cpu=32 cpu=0
     kworker/0:3-1128 [000] d.h3 1973.757128: workqueue_queue_work: work struct=ffff88045db116e0 function=nouveau_connector_hotplug_work workqueue=ffff88046d17ba00 req_cpu=32 cpu=0
     kworker/0:3-1128 [000] d.h3 1973.830281: workqueue_queue_work: work struct=ffff88045db116e0 function=nouveau_connector_hotplug_work workqueue=ffff88046d17ba00 req_cpu=32 cpu=0
     kworker/0:3-1128 [000] d.s3 1973.838757: workqueue_queue_work: work struct=ffff88046f2135d8 function=od_dbs_timer [cpufreq_ondemand] workqueue=ffff88046d17ba00 req_cpu=0 cpu=0
     kworker/0:3-1128 [000] d.h2 1973.859889: workqueue_queue_work: work struct=ffffffff819b85e0 function=push_to_pool workqueue=ffff88046d17ba00 req_cpu=32 cpu=0
     kworker/0:3-1128 [000] d.h3 1973.903514: workqueue_queue_work: work struct=ffff88045db116e0 function=nouveau_connector_hotplug_work workqueue=ffff88046d17ba00 req_cpu=32 cpu=0



Over and over again.

Comment 31 Dale Hamel 2014-04-05 00:19:41 UTC

(In reply to comment #22)
> I think I have found something.
> 
> Commenting out the following line in
> nouveau/core/core/event.c:nouveau_event_init seems to resolve the problem.
> The screen output is working but I have no idea if it is going to break
> something else.
> 
> list_add_tail(&handler->head, &event->index[index].list);

This also allows drawing for me too, at least on the console.

I cannot start Xorg server with this though, and I'm having a hard time getting the log.

Comment 32 Dale Hamel 2014-04-05 00:20:47 UTC

I can confirm that v3.8-rc6 ( a few commits before the bad commit ) works fine for me.

(In reply to comment #0)
> Loading the nouveau driver leads to a kworker constantly using  100% of a
> CPU core.
> 
> PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 211 root      20   0       0      0      0 R 100.0  0.0   2:51.00 kworker/1:1
> 
> Powertop suggests the problem comes from nouveau_connector_hotplug_work.
> 
> I noticed this problem first occurred in kernel 3.9 and still persists in
> 3.14 rc8. In kernel 3.8 the problem did not occur. Some time ago, trying to
> figure out the cause, i also tried a kernel bisect.
> 
> 
> git bisect start
> # bad: [c1be5a5b1b355d40e6cf79cc979eb66dafa24ad1] Linux 3.9
> git bisect bad c1be5a5b1b355d40e6cf79cc979eb66dafa24ad1
> # good: [19f949f52599ba7c3f67a5897ac6be14bfcb1200] Linux 3.8
> git bisect good 19f949f52599ba7c3f67a5897ac6be14bfcb1200
> # good: [d778df51c09264076fe0208c099ef7d428f21790] mm: vmscan: save work
> scanning (almost) empty LRU lists
> git bisect good d778df51c09264076fe0208c099ef7d428f21790
> # bad: [ee89f81252179dcbf6cd65bd48299f5e52292d88] Merge branch
> 'for-3.9/core' of git://git.kernel.dk/linux-block
> git bisect bad ee89f81252179dcbf6cd65bd48299f5e52292d88
> # good: [69086a78bdc973ec0b722be790b146e84ba8a8c4] Merge branch 'for-linus'
> of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> git bisect good 69086a78bdc973ec0b722be790b146e84ba8a8c4
> # bad: [ed5dc2372dba46e0ecd08791b1a0399d313e5cff] Merge tag
> 'mmc-updates-for-3.9-rc1' of git://
> git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
> git bisect bad ed5dc2372dba46e0ecd08791b1a0399d313e5cff
> # bad: [1f3a574a4bfe86ebf7d51fac37e0668397372fd8] Merge branch
> 'drm-nouveau-next' of
> git://anongit.freedesktop.org/git/nouveau/linux-2.6into drm-next
> git bisect bad 1f3a574a4bfe86ebf7d51fac37e0668397372fd8
> # good: [cd17ef4114ad5c514b17e6a0bb02a309ab90b692] Merge tag
> 'drm-intel-next-2013-02-01' of git://
> people.freedesktop.org/~danvet/drm-intel into drm-next
> git bisect good cd17ef4114ad5c514b17e6a0bb02a309ab90b692
> # good: [210561ffd72d00eccf12c0131b8024d5436bae95] intel/iommu: force
> writebuffer-flush quirk on Gen 4 Chipsets
> git bisect good 210561ffd72d00eccf12c0131b8024d5436bae95
> # good: [0f0800661a125ddb038462570c869fe6f8ab5737] drm/nouveau/gpio: pass
> number of on-die gpio lines to base
> git bisect good 0f0800661a125ddb038462570c869fe6f8ab5737
> # good: [35f8badc1cf652381fa3f82c1fbea39f4dbe87fd] drm: Don't set the
> plane->fb to NULL on successfull set_plane
> git bisect good 35f8badc1cf652381fa3f82c1fbea39f4dbe87fd
> # good: [31a34aa421032cfe3b2b892c929e7539e747a7ac] drm/nouveau/i2c: aux
> channels not necessarily on nvio
> git bisect good 31a34aa421032cfe3b2b892c929e7539e747a7ac
> # bad: [0a0afd282fd715dd63d64b243299a64da14f8e8d] drm/nv50-/disp: move DP
> link training to core and train from supervisor
> git bisect bad 0a0afd282fd715dd63d64b243299a64da14f8e8d
> # good: [ac8cc241a81941932da44993242e68c62e115ec7] drm/nv50/devinit:
> reverse the logic for running encoder init scripts
> git bisect good ac8cc241a81941932da44993242e68c62e115ec7
> # good: [7dcd060c0e6e4f76ce7eea94c66437ce8f546530] drm/nouveau/i2c: create
> proper chipset-specific class implementations
> git bisect good 7dcd060c0e6e4f76ce7eea94c66437ce8f546530
> # good: [5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96] drm/nv50-/disp: handle
> supervisor tasks from workqueue
> git bisect good 5cc027f6b1ec651c18a4322ed3e30c6e9cf01e96
> # first bad commit: [0a0afd282fd715dd63d64b243299a64da14f8e8d]
> drm/nv50-/disp: move DP link training to core and train from supervisor
> 
> 
> 
> 
> 0a0afd282fd715dd63d64b243299a64da14f8e8d is the first bad commit
> commit 0a0afd282fd715dd63d64b243299a64da14f8e8d
> Author: Ben Skeggs <bskeggs@redhat.com>
> Date:   Mon Feb 18 23:17:53 2013 -0500
> 
>     drm/nv50-/disp: move DP link training to core and train from supervisor
> 
>     We need to be able to do link training for PIOR-connected ANX9805 from
>     the third supervisor handler (due to script ordering in the bios, can't
>     have the "user" call train because some settings are overwritten from
>     the modesetting bios scripts).
> 
>     This moves link training for SOR-connected DP encoders to the second
>     supervisor interrupt, *before* we call the modesetting scripts (yes,
>     different ordering from PIOR is necessary).  This is useful since we
>     should now be able to remove some hacks to workaround races between
>     the supervisor and link training paths.
> 
>     Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
> 
> :040000 040000 8f2f2d8ea35eae1b6e6ff981761be02a8818bac3
> 2994eb54c689056254c241fe78a3af8d0b61d1df M    drivers

Comment 33 Dale Hamel 2014-04-05 00:22:48 UTC

@Ilia Mirkin

Is there anything else I can supply that will be helpful for debugging?

Comment 34 patrick.clara 2014-04-06 10:36:19 UTC

I have found an other strange behavior. I have used the latest kernel from the torvalds git. With 3.14-rc8 it didn't happen.

Loading nouveau the first time results in the usual situation.
I unload it with "echo 0 | sudo tee /sys/class/vtconsole/vtcon1/bind" and "sudo rmmod nouveau".
I load it again and the CPU is idle and interrupts are normal.
Screen and ttys seem to work as they should.

I cant start X (either the first time I load nouveau nor the second time) which says
Fatal server error:
[    70.848] (EE) no screens found(EE)
But this is probably becouse I broke it with some updates.


For me at least this is reproducible every time.

I will attach both logiles with debug=PDISP=trace,VBIOS=trace of the first and second load.

I noticed the following line was present only during the first load:
nouveau  [     DRM] unplugged eDP-1

This was present only on the second load:
nouveau  [ DEVINIT][0000:01:00.0] adaptor not initialised

I noticed that during the second load, the vbios is somehow executing differently. After executing SUB_DIRECT opcodes, sometines it jums, sometimes not. Maybe becouse it is initializing the gpu on the second load?

Probably there are more differences I did not notice.


@ Dale Hamel 
If you have time, could you please try if you can reproduce it?

Comment 35 patrick.clara 2014-04-06 10:38:17 UTC

Created attachment 96971 [details]
dmesg on 3.14+, first load

Comment 36 patrick.clara 2014-04-06 10:38:58 UTC

Created attachment 96972 [details]
dmesg on 3.14+, second load

Comment 37 patrick.clara 2014-04-08 21:20:36 UTC

I think I should add that it happens only if I boot in pure EFI mode. If I boot in EFI mode with CSM enabled, on the second load it behaves as usual.

I tried loading nouveau at boot with "config=NvForcePost=1". This seems to workaround the problem for now. Also X starts corectly.

Will do some more testing...

Comment 38 Dale Hamel 2014-04-09 23:54:32 UTC

(In reply to comment #37)
> I think I should add that it happens only if I boot in pure EFI mode. If I
> boot in EFI mode with CSM enabled, on the second load it behaves as usual.
> 
> I tried loading nouveau at boot with "config=NvForcePost=1". This seems to
> workaround the problem for now. Also X starts corectly.
> 
> Will do some more testing...

Are you booting on a Macbook with something like rEFIt or rEFInd? How are you booting pure CSM if so? I'd like to try to duplicate this.

Comment 39 patrick.clara 2014-04-10 08:37:09 UTC

(In reply to comment #38)
> (In reply to comment #37)
> > I think I should add that it happens only if I boot in pure EFI mode. If I
> > boot in EFI mode with CSM enabled, on the second load it behaves as usual.
> > 
> > I tried loading nouveau at boot with "config=NvForcePost=1". This seems to
> > workaround the problem for now. Also X starts corectly.
> > 
> > Will do some more testing...
> 
> Are you booting on a Macbook with something like rEFIt or rEFInd? How are
> you booting pure CSM if so? I'd like to try to duplicate this.

I have an Asus Zenbook UX51VZ and using simply GRUB (The EFI version obviously), no rEFIt or rEFInd.

Maybe I could not explain it clear enough.
If I boot in pure EFI mode, unloading and loading nouveau again, makes nouveau think the gpu is not initialized, or did not do the post. (Which I think the bios should do, but not sure). Therefore, as you can see by comparing the two last dmesgs i attached, nouveau is executing a big chunk of the vbios. After doing so CPU load is normal and everythink seems to work as it should, at least for now.

In the other case, where loading, unloading and loading again doesn't make any difference, i did not boot in pure CSM mode. I booted in EFI mode but with CSM enabled. These things actually go beyond my knowledge, but as far I understand, enabling CSM while still booting in EFI mode, enables legacy functionality like PCIROM, VGA setup and things like that.

I now tried als the pure CSM boot but there is no difference between pure CSM and EFI with CSM enabled.


In any case the interesting thing in the dmesg of the second load in pure EFI is this line:
nouveau  [ DEVINIT][0000:01:00.0] adaptor not initialised 

The behavior of nouveau, if it thinks the gpu is not initialized can be forced by passing this parameter to nouveau: "config=NvForcePost=1".
Either in grub with "nouveau.config=NvForcePost=1" or in /etc/modprobe/...

So probably it would be more interesting trying this than to replicate the strange behavior of unloading and loading again.

"config=NvForcePost=1", for me, works in pure EFI, EFI with CSM and also in pure CSM.



One last thing:
I suppose that my system is newer running the init scripts in vbios during POST, but I think there is more. As this patch describes (http://www.spinics.net/lists/linux-pci/msg29828.html), the bios should also set up VGA (whatever it means).
On my system, booting in pure EFI "cat /sys/class/drm/card0/device/boot_vga" prints 0 and therefore the X server doesn't find my gpu and fails to start.
If CSM is enabled, and also in pure CSM mode, it prints 1 as it should.

Just for my curiosity, could you please check how your machine behaves regarding this boot_vga?

Comment 40 reedlaw 2014-04-22 15:27:12 UTC

I get the same problem with a kworker pegging the CPU as soon as I plug in an external HDMI monitor. I'm using kernel 3.14.1-1. I'm on Arch Linux using the xf86-video-nouveau 1.0.10-2 package.

Comment 41 Ilia Mirkin 2014-06-11 23:28:23 UTC

DP has had a substantial rework targeted at 3.16. Can you guys give http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next a shot and see if it avoids the hpd storm?

Comment 42 Dale Hamel 2014-06-11 23:50:34 UTC

Absolutely, i'm building right now and i'll report back tomorrow once I can test on a display port monitor.

Comment 43 Dale Hamel 2014-06-12 14:27:08 UTC

So, the new kernel linked above definitely fixes the problem.

Hotplug now works flawlessly!

Unfortunately it introduces a fairly minor bug with X11 that seems to cause my modelines to be invalid. I'm running on a retina macbook pro, and I normally downscale the resolution by 50% so that i can actually read text on it, but my xrandr lines in xinitrc now cause it to hang:

xrandr --newmode  "1440x900_60.00"  106.50  1440 1528 1672 1904  900 903 909 934 -hsync +vsync
xrandr --addmode $monitor 1440x900_60.00
xrandr --output $monitor --mode 1440x900_60.00

I'll check to see if these modelines are just invalid, or if this is just some bug related to xrandr. It's clearly got something to do with the kernel though, since it's no longer working.

Great work to all the devs who refactored this! One step closer to converting my office to use linux on their MBPs :)

Comment 44 Dale Hamel 2014-06-12 14:27:40 UTC

So, the new kernel linked above definitely fixes the problem.

Hotplug now works flawlessly!

Unfortunately it introduces a fairly minor bug with X11 that seems to cause my modelines to be invalid. I'm running on a retina macbook pro, and I normally downscale the resolution by 50% so that i can actually read text on it, but my xrandr lines in xinitrc now cause it to hang:

xrandr --newmode  "1440x900_60.00"  106.50  1440 1528 1672 1904  900 903 909 934 -hsync +vsync
xrandr --addmode $monitor 1440x900_60.00
xrandr --output $monitor --mode 1440x900_60.00

I'll check to see if these modelines are just invalid, or if this is just some bug related to xrandr. It's clearly got something to do with the kernel though, since it's no longer working.

Great work to all the devs who refactored this! One step closer to converting my office to use linux on their MBPs :)

Comment 45 Dale Hamel 2014-06-12 14:28:37 UTC

Sorry, somehow my last comment was double posted.

Comment 46 Martin Peres 2014-06-12 14:31:18 UTC

(In reply to comment #45)
> Sorry, somehow my last comment was double posted.

(In reply to comment #44)
> So, the new kernel linked above definitely fixes the problem.
> 
> Hotplug now works flawlessly!

Thanks for testing, I'm closing the bug :)

> Unfortunately it introduces a fairly minor bug with X11 that seems to cause
> my modelines to be invalid.

Please fill out another bug report, otherwise, we cannot track bugs easily!

Comment 47 Dale Hamel 2014-06-12 14:41:48 UTC

A did a bit more testing, and I should say that it actually works with a caveat - this is only for display port monitors, thunderbolt monitors (which uses the same connector) don't seem to be recognized at all now, which is a regression.

But that's another bug, the 100% kworker thing is definitely fixed.

Comment 48 Ben Skeggs 2014-06-13 00:03:39 UTC

(In reply to comment #47)
> A did a bit more testing, and I should say that it actually works with a
> caveat - this is only for display port monitors, thunderbolt monitors (which
> uses the same connector) don't seem to be recognized at all now, which is a
> regression.
A "thunderbolt" monitor should be no different.  When you file a bug for it, can you attach debug logs (nouveau.debug=PDISP=trace,VBIOS=trace,I2C=trace log_buf_len=1M) of it working before, and not working now?

Thanks!

> 
> But that's another bug, the 100% kworker thing is definitely fixed.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.