Bug 90482 - Xorg take 100% CPU when using multiple independent screen configuration
Summary: Xorg take 100% CPU when using multiple independent screen configuration
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-16 15:14 UTC by dura
Modified: 2016-10-24 09:41 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
my xorg.conf (2.99 KB, text/plain)
2015-05-16 15:14 UTC, dura
no flags Details
quick and dirty fix (1.28 KB, patch)
2015-05-16 15:15 UTC, dura
no flags Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description dura 2015-05-16 15:14:07 UTC
Created attachment 115835 [details]
my xorg.conf

My xorg setup is based on two independent screen (see attached xorg.conf). Most of the time the Xorg process eat 100% of one CPU core.

I will try to explain what I think is the problem and the fix (or workaround) I made to the nouveau code.

Basically, the problem is that nouveau add a socket to the xorg socket list but does not register a handler that will be able to deal with it. When data is received on this socket the Select call in xorg-server/os/WaitFor.c will return but the data will not be read. When Select is called again it returns immediately because data was not read, and again, and again, resulting in 100% CPU core consumption.

Now, some details (based on xf86-video-nouveau-1.0.11 source code)
To add a socket in the xorg server list AddGeneralSocket is used.
In src/drmmode_display.c AddGeneralSocket is called in two places: line 1605 (via drmmode_uevent_init and line 1554) and 1610. If line 1608 condition is not meet, a socket is added without an handler so drmmode_wakeup_handler will never be called with correct data.

My solution is to add a new handler that will only handle the udev part when line 1608 condition is not meet. See attached patch.
Comment 1 dura 2015-05-16 15:15:58 UTC
Created attachment 115836 [details] [review]
quick and dirty fix
Comment 2 Matti Nykyri 2015-05-16 20:42:16 UTC
I also have two screen setup and am facing the same problem since beginning of April.

Strace shows the KOBJENT NETLINK socket in Xorg returning immediately from select. The host gets around 9000 interupts / sec:

cat /proc/interupts
           CPU0       CPU1       CPU2       CPU3       
  0:        151          0          0          0   IO-APIC-edge      timer
  1:          1          0          0          0   IO-APIC-edge      i8042
  8:         20          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          1          0          0          0   IO-APIC-edge      i8042
 16:       8898          0          0          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
 17:       1268          0          0          0   IO-APIC  17-fasteoi   snd_hda_intel
 18:          0          0          0          0   IO-APIC  18-fasteoi   0000:03:00.0, i801_smbus
 19:      70297          0          0          0   IO-APIC  19-fasteoi   pata_jmicron
 23:         33          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:    6912419          0          0          0  HPET_MSI-edge      hpet2
 25:          0    7491688          0          0  HPET_MSI-edge      hpet3
 26:          0          0    4552335          0  HPET_MSI-edge      hpet4
 27:          0          0          0    8398350  HPET_MSI-edge      hpet5
 34:     155556          0          0          0   PCI-MSI-edge      nouveau
 35:      37000          0          0          0   PCI-MSI-edge      0000:00:1f.2
 36:          0          0          0          0   PCI-MSI-edge      0000:09:00.0
 37:     120172          0          0          0   PCI-MSI-edge      enp5s0
 38:        325          0          0          0   PCI-MSI-edge      snd_hda_intel
NMI:      76929      95741      80757      78650   Non-maskable interrupts
LOC:          0        125        121        118   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:      76928      95740      80756      78649   Performance monitoring interrupts
IWI:         11         10         11          5   IRQ work interrupts
RTR:          5          0          0          0   APIC ICR read retries
RES:      83545      76436      81720      73802   Rescheduling interrupts
CAL:        142        146        161         94   Function call interrupts
TLB:        276        150        213        227   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:        251        251        251        251   Machine check polls
ERR:          0
MIS:          0

Kernel function tracer:
cat /sys/kernel/debug/tracing/trace
               X-2956  [000] .... 74409.277687: __fget_light <-__fdget
             zsh-6138  [002] d... 74409.277688: __mod_zone_page_state <-get_page_from_freelist
               X-2956  [000] .... 74409.277688: sock_poll <-do_select
             zsh-6138  [002] d... 74409.277688: zone_statistics <-get_page_from_freelist
               X-2956  [000] .... 74409.277688: unix_poll <-sock_poll
             zsh-6138  [002] d... 74409.277688: __inc_zone_state <-zone_statistics
               X-2956  [000] .... 74409.277688: __fdget <-do_select
             zsh-6138  [002] d... 74409.277688: __inc_zone_state <-zone_statistics
               X-2956  [000] .... 74409.277688: __fget_light <-__fdget
               X-2956  [000] .... 74409.277688: sock_poll <-do_select
               X-2956  [000] .... 74409.277688: unix_poll <-sock_poll
               X-2956  [000] .... 74409.277688: __fdget <-do_select
               X-2956  [000] .... 74409.277688: __fget_light <-__fdget
               X-2956  [000] .... 74409.277688: sock_poll <-do_select
               X-2956  [000] .... 74409.277688: unix_poll <-sock_poll
               X-2956  [000] .... 74409.277688: __fdget <-do_select
               X-2956  [000] .... 74409.277688: __fget_light <-__fdget
               X-2956  [000] .... 74409.277689: sock_poll <-do_select
               X-2956  [000] .... 74409.277689: unix_poll <-sock_poll
               X-2956  [000] .... 74409.277689: __fdget <-do_select
               X-2956  [000] .... 74409.277689: __fget_light <-__fdget
               X-2956  [000] .... 74409.277689: sock_poll <-do_select
               X-2956  [000] .... 74409.277689: unix_poll <-sock_poll
               X-2956  [000] .... 74409.277689: __fdget <-do_select
               X-2956  [000] .... 74409.277689: __fget_light <-__fdget
               X-2956  [000] .... 74409.277689: sock_poll <-do_select
               X-2956  [000] .... 74409.277689: unix_poll <-sock_poll
             zsh-6138  [002] .... 74409.277689: kmemleak_alloc <-kmalloc_order
               X-2956  [000] .... 74409.277689: __fdget <-do_select
               X-2956  [000] .... 74409.277690: __fget_light <-__fdget
               X-2956  [000] .... 74409.277690: sock_poll <-do_select

Perf top:
   PerfTop:    9602 irqs/sec  kernel:86.0%  exact:  0.0% [4000Hz cycles],  (all, 4 CPUs)
------------------------------------------------------------------------------------------------------------------------------------------------------------

    19.24%  [kernel]                 [k] do_select                  
    17.08%  [kernel]                 [k] __fget_light               
    13.34%  [kernel]                 [k] unix_poll                  
    13.18%  [kernel]                 [k] sock_poll                  
     3.00%  [kernel]                 [k] do_raw_spin_lock           
     1.81%  [kernel]                 [k] _raw_spin_unlock_irqrestore
     1.55%  Xorg                     [.] WaitForSomething           
     1.45%  [kernel]                 [k] __fdget                    
     1.38%  [kernel]                 [k] datagram_poll              
     1.37%  [kernel]                 [k] fput                       
     1.37%  [kernel]                 [k] _raw_spin_lock_irqsave     
     1.37%  [kernel]                 [k] __pollwait                 
     1.31%  [vdso]                   [.] __vdso_clock_gettime       
     1.23%  [kernel]                 [k] memset                     
     0.76%  libdrm_nouveau.so.2.0.0  [.] pushbuf_flush              
     0.76%  libdrm_nouveau.so.2.0.0  [.] pushbuf_kref               
     0.72%  nouveau_drv.so           [.] drmmode_wakeup_handler     
     0.72%  [kernel]                 [k] copy_user_generic_string   
     0.64%  [kernel]                 [k] do_raw_spin_unlock         
     0.61%  [kernel]                 [k] hpet_msi_next_event        
     0.57%  libdrm_nouveau.so.2.0.0  [.] nouveau_bo_ref             
     0.55%  [kernel]                 [k] system_call                
     0.54%  [kernel]                 [k] core_sys_select            
     0.51%  [kernel]                 [k] select_estimate_accuracy   
     0.45%  nouveau_drv.so           [.] NVBlockHandler             
     0.44%  Xorg                     [.] GetTimeInMillis            
     0.43%  [kernel]                 [k] poll_freewait              
     0.43%  [kernel]                 [k] ktime_get_ts64             
     0.42%  [kernel]                 [k] remove_wait_queue          
     0.41%  Xorg                     [.] IdleTimeWakeupHandler      
     0.40%  [kernel]                 [k] add_wait_queue             
     0.39%  libexa.so                [.] ExaBlockHandler            
     0.37%  Xorg                     [.] BlockHandler               
     0.35%  Xorg                     [.] WakeupHandler              
     0.32%  Xorg                     [.] xf86Wakeup                 
     0.31%  [kernel]                 [k] nouveau_barobj_wr32        

The bug activates when display of a screen is turned on. Restarting X fixes the issue until a display is cycled.
Comment 3 Matti Nykyri 2015-05-16 20:53:05 UTC
and strace Xorg(PID 2954):
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 56000}) = 1 (in [12], left {264, 55981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 56000}) = 1 (in [12], left {264, 55981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 55000}) = 1 (in [12], left {264, 54982})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 55000}) = 1 (in [12], left {264, 54981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 55000}) = 1 (in [12], left {264, 54981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 54000}) = 1 (in [12], left {264, 53982})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 54000}) = 1 (in [12], left {264, 53982})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 54000}) = 1 (in [12], left {264, 53981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 53000}) = 1 (in [12], left {264, 52981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 53000}) = 1 (in [12], left {264, 52981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 53000}) = 1 (in [12], left {264, 52981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 52000}) = 1 (in [12], left {264, 51981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 52000}) = 1 (in [12], left {264, 51981})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 51000}) = 1 (in [12], left {264, 50982})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 51000}) = 1 (in [12], left {264, 50983})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 51000}) = 1 (in [12], left {264, 50983})
2954  select(256, [1 3 5 10 11 12 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81], NULL, NULL, {264, 51000}) = 1 (in [12], left {264, 50983})

cat /proc/2954/fd:
0 l-wx------ 1 root root 64 May 16 23:47 0 -> /var/log/Xorg.0.log
0 lrwx------ 1 root root 64 May 16 23:47 1 -> socket:[12196]
0 lrwx------ 1 root root 64 May 16 23:47 10 -> /dev/dri/card0
0 lrwx------ 1 root root 64 May 16 23:47 11 -> socket:[1845]
0 lrwx------ 1 root root 64 May 16 23:47 12 -> socket:[1858]
0 lrwx------ 1 root root 64 May 16 23:47 13 -> /dev/dri/card0
0 lrwx------ 1 root root 64 May 16 23:47 14 -> /dev/input/event1
0 lrwx------ 1 root root 64 May 16 23:47 15 -> /dev/input/event0
0 lrwx------ 1 root root 64 May 16 23:47 16 -> /dev/input/event2
0 lrwx------ 1 root root 64 May 16 23:47 17 -> /dev/input/event3
0 lrwx------ 1 root root 64 May 16 23:47 18 -> socket:[2848]
....etc

lsof -p 2954:
X       2954 root    0w      REG                8,2   164675     262153 /var/log/Xorg.0.log
X       2954 root    1u     unix 0xffff880214864840      0t0      12196 @/tmp/.X11-unix/X0
X       2954 root    2w      REG                8,2    56795     262596 /var/log/kdm.log
X       2954 root    3u     unix 0xffff880214864c80      0t0      12197 /tmp/.X11-unix/X0
X       2954 root    4r      REG                8,2    31614     524293 /usr/lib64/xorg/protocol.txt
X       2954 root    5u  netlink                         0t0       2823 KOBJECT_UEVENT
X       2954 root    6w      REG                0,3        0 4026531961 /proc/mtrr
X       2954 root    7w      REG                0,3        0 4026531961 /proc/mtrr
X       2954 root    8u      CHR                4,7      0t0       2080 /dev/tty7
X       2954 root    9u      CHR              10,63      0t0       1026 /dev/vga_arbiter
X       2954 root   10u      CHR              226,0      0t0       1044 /dev/dri/card0
X       2954 root   11u     sock                0,6      0t0       1845 protocol: NETLINK
X       2954 root   12u  netlink                         0t0       1858 KOBJECT_UEVENT
X       2954 root   13u      CHR              226,0      0t0       1044 /dev/dri/card0
X       2954 root   14u      CHR              13,65      0t0       1082 /dev/input/event1
X       2954 root   15u      CHR              13,64      0t0      10912 /dev/input/event0
X       2954 root   16u      CHR              13,66      0t0       2447 /dev/input/event2
X       2954 root   17u      CHR              13,67      0t0       1323 /dev/input/event3
X       2954 root   18u     unix 0xffff8800d9839100      0t0       2848 @/tmp/.X11-unix/X0
Comment 4 Mariusz Białończyk 2016-10-24 09:41:04 UTC
@dura
Thank you for your quick and dirty fix!
I am also suffering the same problem (reported here btw: https://bugs.freedesktop.org/show_bug.cgi?id=98383).

In case when my xinerama is disabled, your patch is working great. No more 100% cpu usage!
But when it is disabled and I power up my TV, then I have still have server crash, but that could be another story (maybe related?).

What is the problem to merge this patch upstream?


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.