Bug 32647 - [945GM] Kworker 300+ wakeups per second after a while
Summary: [945GM] Kworker 300+ wakeups per second after a while
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel (show other bugs)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
: medium normal
Assignee: Chris Wilson
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-25 01:54 UTC by Antonio Orefice
Modified: 2011-08-09 09:33 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
while true ; do powertop -d -t 5|grep kworker ; echo "--" ; done #output (22.43 KB, text/plain)
2010-12-25 01:54 UTC, Antonio Orefice
no flags Details
perf top output while kworker wakes up 300 times per second (3.12 KB, text/plain)
2010-12-25 14:17 UTC, Antonio Orefice
no flags Details
perf top output while kworker wakes up 20 times per second (3.12 KB, text/plain)
2010-12-25 14:19 UTC, Antonio Orefice
no flags Details
perf top output while kworker wakes up 20 times per second (3.09 KB, text/plain)
2010-12-25 14:27 UTC, Antonio Orefice
no flags Details
strace X, perf record and perf timechart record output on 300wakes/sec (535.33 KB, application/x-gzip)
2010-12-26 02:23 UTC, Antonio Orefice
no flags Details
strace X, perf record and perf timechart record output on 20..40 wakes/sec (456.27 KB, application/x-gzip)
2010-12-26 02:24 UTC, Antonio Orefice
no flags Details
perf report -i 300.perf.data-perf_record_f_g_a.txt -g fr,5 | head 500 > perf-300.txt (27.08 KB, text/plain)
2010-12-26 03:08 UTC, Antonio Orefice
no flags Details

Description Antonio Orefice 2010-12-25 01:54:51 UTC
Created attachment 41436 [details]
while true ; do  powertop -d -t 5|grep kworker ; echo "--" ; done #output

Chipset: 945GM
Kernel version: 2.6.36.2
Arch: i686
xorg-server: 1.9.2
mesa / intel-dri: 7.9.0.git20101207
xf86-video-intel: 2.13.0
libdrm version: 2.4.22

Linux distribution: Arch linux
Machine model: Asus 1005HA
Display Connector: LVDS

Reproducible: Always


Step to reproduce:
-------------------
1 start powertop in a terminal
2 observe kworker wakeups per seconds (hopefully you have low values)
3 browse, read some documents... use the pc for a while
4 observe kworker wakeups per seconds (it goes and stays to until 300,xx wakeups per second)
5 kill the processes one by one and see if the problem goes away
6 see that the only way to lower kworker wakeups is to kill/restart X 

The time needed to reproduce the bug appears to be random, sometimes it happens even after 10..15minutes of use, it goes to 300 and stays there.

I've attached the output of this command:
while true ; do  powertop -d -t 5|grep kworker ; echo "--" ; done

As you can see, after about 170 iterations, there's another kworker thread:
45,3% (300,2)   kworker/0:2

I have to say that i strongly use suspend to disk feature, maybe it is related(?)
Comment 1 Chris Wilson 2010-12-25 02:02:40 UTC
A perf top or record would be useful to see what's spinning inside the worker thread. Watching gem_objects, polling intel_gpu_dump, would give some clues if there is continuing rendering. But perf is probably the best source of information here.
Comment 2 Antonio Orefice 2010-12-25 14:17:59 UTC
Created attachment 41445 [details]
perf top output while kworker wakes up 300 times per second
Comment 3 Antonio Orefice 2010-12-25 14:19:06 UTC
Created attachment 41446 [details]
perf top output while kworker wakes up 20 times per second
Comment 4 Antonio Orefice 2010-12-25 14:22:23 UTC
I launched perf top and pasted the output while kworker was at 300wakes/s, next, i killed and started X again and kworker was at about 20wakes/s and pasted the output of perf top in another attachment.

I never used perf before, if you need more info, i will try to provide them,
thanks.
Comment 5 Antonio Orefice 2010-12-25 14:27:34 UTC
Created attachment 41449 [details]
perf top output while kworker wakes up 20 times per second

wrong file
Comment 6 Chris Wilson 2010-12-25 15:02:23 UTC
Judging by that X is frequently polling during the 300 wakeup/s (note the amount of time spent in read_hpet which I associate with poll()). Doesn't seem -intel driver related at all.

strace -p `pidof X` during the high frequency would confirm that and may give another clue. But it doesn't explain why powertop associated the wakeups with kworker/0. A 'perf record -f -g -a' so that the callgraphs are captured as well may also help. Similar 'perf timechart' might help as that is geared towards trying to figure out relationships between context switches and wakeups.
Comment 7 Antonio Orefice 2010-12-26 02:23:40 UTC
Created attachment 41453 [details]
strace X, perf record and perf timechart record output on 300wakes/sec
Comment 8 Antonio Orefice 2010-12-26 02:24:34 UTC
Created attachment 41454 [details]
strace X, perf record and perf timechart record output on 20..40 wakes/sec
Comment 9 Antonio Orefice 2010-12-26 02:27:04 UTC
Made two attachments (one when the wakes per seconds are around 20..40 and the other when they are around 300)with outputs from:
strace -p `pidof X`
timechart record
perf record -f -g -a
Comment 10 Chris Wilson 2010-12-26 02:44:37 UTC
Ah, you need the local debug files to translate the perf *record into something meaningful. Can you run 'perf report -i 300.perf.data-perf_record_f_g_a.txt -g fr,5 | head 500 > perf-300.txt'

The strace suggests a lot of ioctl activity (which of course is impenetrable to strace) but

  ioctl(9, 0xc010645b, 0xbfa73fd0)        = 0
  ioctl(9, 0xc0106464, 0xbfa74054)        = 0
  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0x1259a) = 0xb6609000
  ioctl(9, 0x400c645f, 0xbfa74064)        = 0 
  ioctl(9, 0xc010645b, 0xbfa73fd0)        = 0

is indicative of our driver (and fallback rendering to boot, or if you are lucky pixmap upload). So it does look like our driver is active during the 300 wakeup/s and also that it is rendering at that time.
Comment 11 Antonio Orefice 2010-12-26 03:08:18 UTC
Created attachment 41455 [details]
perf report -i 300.perf.data-perf_record_f_g_a.txt -g fr,5 | head 500 > perf-300.txt

Output from:
perf report -i 300.perf.data-perf_record_f_g_a.txt -g fr,5 | head 500 > perf-300.txt
Comment 12 Chris Wilson 2010-12-26 05:25:00 UTC
This is extremely odd:

24.01%           lspci  libc-2.12.1.so        [.] 0x740af

(with further lspci symbols, i.e. lspci is running and hooging the cpu which is very odd).

The cpu time for kthread/0 is actually spent sending the timer/scheduler tick. So that would seem to just be context switches.

Did you think of just checking top to see what was running at that time? I've no idea what to make of that lspci is running.
Comment 13 Antonio Orefice 2010-12-26 08:21:46 UTC
(In reply to comment #12)
> This is extremely odd:
> 
> 24.01%           lspci  libc-2.12.1.so        [.] 0x740af
> 
> (with further lspci symbols, i.e. lspci is running and hooging the cpu which is
> very odd).
> 
> The cpu time for kthread/0 is actually spent sending the timer/scheduler tick.
> So that would seem to just be context switches.
> 
> Did you think of just checking top to see what was running at that time? I've
> no idea what to make of that lspci is running.

that lspci is just a case, i have powertop in background which calls lspci periodically, that's not the problem.

But down in the tree there's swapper->read_hpet
Comment 14 Antonio Orefice 2010-12-26 08:33:10 UTC
Seems that somehow suspend to disk -> resume (tried both tuxonice and uswsusp methods) is related and necessary for the bug to be reproduced.
Now this netbook is up from a complete boot for about 30 minutes and kworker is still at 30W/s.
Comment 15 Antonio Orefice 2010-12-29 10:32:29 UTC
Switching to new powertop from git, i've more information about this.
it shows this at first place:

Usage       Events/s    Category       Description
2,0 ms/s    289,5       kWork          flush_to_ldisc

As i understood, flush_to_ldisc() has something to do with ttys, i'm not sure anymore this issue is related to Xorg/intel, but still the only way i found to stop it from waking up the cpu is to kill X.

Do you have some clue?

Thank you.
Comment 16 Antonio Orefice 2010-12-31 03:08:03 UTC
I just downgraded to:
intel-dri-7.8.2 (was 7.9.0.git20101207)
xf86-video-intel-2.12.0 (was 2.13)
xorg-server-1.9.0 (was 1.9.2)

Now the netbook is up from hibernation for about 10 Hours and the issue seems to be gone (crossing my fingers...)
So it appears to be something related after all.
Comment 17 Chris Wilson 2011-01-11 16:20:29 UTC
Hmm, don't recognise that at all.

Can you now upgrade and recheck in this order:

intel-dri (mesa),
xf86-video-intel,
xserver.

Given the wakeups seems to be flush_to_ldisc, I would expect the xserver to include the culprit. Then it's just a matter of bisecting the xserver...
Comment 18 Antonio Orefice 2011-01-12 00:28:44 UTC
(In reply to comment #17)
> Hmm, don't recognise that at all.
> 
> Can you now upgrade and recheck in this order:
> 
> intel-dri (mesa),
> xf86-video-intel,
> xserver.
> 
> Given the wakeups seems to be flush_to_ldisc, I would expect the xserver to
> include the culprit. Then it's just a matter of bisecting the xserver...

Now seems to be fine with: 
intel-dri: 7.9.0.git20101207
xf86-video-intel: 2.13.0
xorg-server 1.9.0


For the reference, this gave the problem:
intel-dri: 7.9.0.git20101207
xf86-video-intel: 2.13.0
xorg-server 1.9.2 *


So one would say that the xorg-server is the culprit, but still on different hardware (nvidia-ion chipset with the same packages and using hibernation) issue is not present.

I will try to upgrade xorg-server 1.9.3 in two weeks.

Thanks for your support!
Comment 19 Chris Wilson 2011-01-12 03:08:04 UTC
(In reply to comment #18)
> So one would say that the xorg-server is the culprit, but still on different
> hardware (nvidia-ion chipset with the same packages and using hibernation)
> issue is not present.

Don't forget that the extensions supported (and therefore code paths exercised) between nvidia and the OS drivers are different. The prime example being RandR and hotplug/udev.

But yes, that should narrow down the source of the error.
Comment 20 Antonio Orefice 2011-03-14 09:18:53 UTC
(In reply to comment #17)
> Hmm, don't recognise that at all.
> 
> Can you now upgrade and recheck in this order:
> 
> intel-dri (mesa),
> xf86-video-intel,
> xserver.
> 
> Given the wakeups seems to be flush_to_ldisc, I would expect the xserver to
> include the culprit. Then it's just a matter of bisecting the xserver...

Sorry for the late answer.
Today i fully upgraded my system and i had:
xorg-server 1.9.4
xf86-video-intel 2.14.0
(For the 3d part now i'm using mesa/gallium from git because i've better 3d performance in compiz.)

Same issue as before, and if i just downgrade xorg-server and it's dependancies to 1.9.0 the issue is solved.

Everything points to xorg-server now, do you have some advices to bisect it?
(I never did something like that)

Thanks.
Comment 21 Michael Düll 2011-05-23 10:38:24 UTC
I can confirm this on my T400 with Intel graphics on KDE 4. I am NOT using suspend2disk, only suspend2ram.

akurei@joel: ~ $ yaourt -Qi xorg-server | grep Version
Version        : 1.10.1-1
akurei@joel: ~ $ yaourt -Qi xf86-video-intel | grep Version
Version        : 2.15.0-1
akurei@joel: ~ $ yaourt -Qi intel-dri | grep Version
Version        : 7.10.2-2
akurei@joel: ~ $ yaourt -Qi libdrm | grep Version
Version        : 2.4.25-1
akurei@joel: ~ $ uname -a
Linux joel 2.6.38-ARCH #1 SMP PREEMPT Fri May 13 09:24:47 CEST 2011 x86_64 Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz GenuineIntel GNU/Linux
Comment 22 Antonio Orefice 2011-08-09 09:33:15 UTC
Can't say when, because i didn't tried every version, but now, with xorg-server 1.10.3.901 it seems that the issue is gone.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.