Bug 110950 - RT cpu starvation due to driver threads or memory access
Summary: RT cpu starvation due to driver threads or memory access
Status: RESOLVED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-20 07:41 UTC by ANCELOT Stéphane
Modified: 2019-10-02 19:06 UTC
CC List: 2 users

See Also:
i915 platform: BYT, HSW
i915 features:


Attachments

Description ANCELOT Stéphane 2019-06-20 07:41:24 UTC
Hi, 
I am facing preemption issues involving PREEMPT_RT realtime tasks and the Intel video driver on the following chipsets:

* Intel J1900
* Intel 4300U

Under heavy X11 load (e.g. creating many widgets at application startup, moving windows, hiding/showing windows...), my realtime task is interrupted for about 500us to 1ms.

My realtime task is isolated on a CPU. I suppose the problem is related to memory access: the driver locks/accesses memory and my application cannot access it. (There is only one NUMA node available.)


If I run the same graphics applications through an ssh connection, the problem does not happen.

I would like to know whether there are workarounds, and what the preemption status of the Intel drivers is.

Thanks
Comment 1 Chris Wilson 2019-06-20 08:33:21 UTC
I don't suppose you have a handy recipe for setting up a machine with an isolated RT CPU?

https://gitlab.freedesktop.org/drm/igt-gpu-tools/blob/master/benchmarks/gem_syslatency.c

is a tool we use to try and assess the damage we cause to the system by measuring the latency of an RT thread. But I've never set up an isolated CPU before...

Just to check, do you mean the cpu isolation feature in the kernel, or are you just using cpuset?

And also to confirm: you are talking about CPU preemption and not GPU preemption?
Comment 2 Chris Wilson 2019-06-20 08:44:39 UTC
Intel J1900

https://ark.intel.com/content/www/us/en/ark/products/78867/intel-celeron-processor-j1900-2m-cache-up-to-2-42-ghz.html

So Baytrail. Hmm. That shouldn't be as bad for RT, as we don't need interrupt processing to keep the GPU fed (I automatically assumed the complaint would be about execlists, which have a much more noticeable impact on RT). On the other hand, we don't perform GPU preemption.
Comment 3 Chris Wilson 2019-06-20 09:00:46 UTC
One should also note that memory contention between the GPU and CPU cores is a real issue; there should be some MSR (don't ask me!) for configuring the relative priorities and allotments as tanstaafl.
Comment 4 ANCELOT Stéphane 2019-06-20 09:33:04 UTC
(In reply to Chris Wilson from comment #1)
> I don't suppose you have a handy recipe for setting up a machine with an
> isolated RT CPU?
> 

If I understand the question correctly: the system is started with CPU 0 isolated via the isolcpus=1 kernel boot parameter.

When the RT threads are started, they are migrated to CPU 0 using pthread_setaffinity_np().
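
A minimal sketch of that kind of pinning (illustrative only; the helper name, CPU index and SCHED_FIFO priority are assumptions, not the reporter's actual code; build with -lpthread):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to one (isolated) CPU and switch it to
 * SCHED_FIFO.  Returns 0 on success, an errno value otherwise. */
static int pin_and_make_rt(int cpu, int prio)
{
    cpu_set_t set;
    struct sched_param sp = { .sched_priority = prio };
    int ret;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    ret = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (ret)
        return ret;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}

/* e.g. call pin_and_make_rt(0, 80) from each RT thread once it has started */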



> https://gitlab.freedesktop.org/drm/igt-gpu-tools/blob/master/benchmarks/
> gem_syslatency.c
> 
> is a tool we use to try and assess the damage we cause to the system by
> measuring the latency of an RT thread. But I've never before setup an
> isolated cpu...
>
The problem is easier to observe if 2 RT tasks are used: the first one starts periodically (e.g. every 3ms) and lasts e.g. 300us, and the second one is chained to the first and lasts e.g. 100us. Even adding 50us for each context switch, the chain should finish well under a millisecond, yet when the problem occurs the last task unfortunately finishes at almost 1.3ms.


I can monitor task activation using a serial COM port and an oscilloscope. Measuring times with clock_gettime() inside the threads is accurate as well.
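
A minimal sketch of that clock_gettime() measurement (illustrative only; the task body is a placeholder, not the reporter's code):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Time one cycle of work with CLOCK_MONOTONIC and report it in
 * microseconds. */
static int64_t elapsed_us(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000000LL +
           (b->tv_nsec - a->tv_nsec) / 1000;
}

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... the RT task body would run here ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("cycle took %lld us\n", (long long)elapsed_us(&t0, &t1));
    return 0;
}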
 
> Just to check, do you mean the cpu isolation feature in the kernel, or are
> just using cpuset?
> 
> And also to confirm you are talking about CPU preemption and not GPU
> preeemption?

If I understand correctly, that is CPU preemption; the realtime task does not access the GPU. However, graphics tasks (non-RT tasks) are loading the GPU.
Comment 5 ANCELOT Stéphane 2019-06-20 14:25:05 UTC
I forgot to mention: a 19-inch display at 1280x1024 resolution, in vertical (portrait) orientation.
Comment 6 Chris Wilson 2019-06-20 17:44:07 UTC
Just a quick check on the baseline: the results of our cycletest for measuring RT thread latency, with the maximum latency measured during a 120s period:

x baseline-max.lt
+ i915-max.lt
+------------------------------------------------------------------------------+
| xxx       +                                                                  |
| xxxxx     +  + +                                                             |
| xxxxx     ++ + *                                                             |
| xxxxxx    *+ + *                                                             |
| xxxxxxx   *+ ++*                                                             |
| xxxxxxx  x*+ ++*                                                             |
| xxxxxxx  **+x*+*                                                             |
|xxxxxxxxx **+*****+    +          +              +        +  +         +    + |
|xxxxxxxxxx**+*****+x+  +    ++    + +  +  ++     +  +    ++  ++ + +    +    + |
|xxxxxxxxxx**+******x**x*x++++*+  ++ + ++ ++++ + ++  ++  ++++ +++++++++ +    ++|
| |____M_A_____|                                                               |
|           |___________M_________A____________________|                       |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120            11            47            18     20.533333     7.7924045
+ 120            23           105            39     50.716667     26.207377

in microseconds. So even in this best case, the worst-case impact of submitting nops averages around 50us.

Now, this does not take into account any impact memory contention has on the RT thread, since we are not stressing the GPU in that manner. Nor does it set up an isolated CPU. This is just establishing expectations for the J1900.
Comment 7 ANCELOT Stéphane 2019-06-21 06:46:40 UTC
I should provide some unit tests similar to my RT tasks so that you can measure the impact on them.
Comment 8 Chris Wilson 2019-06-26 11:25:24 UTC
One easy thing for you to check would be how the GPU frequency affects the memory contention:

$ cd /sys/class/drm/card0
$ cat gt_RPn_freq_mhz > gt_min_freq_mhz
$ cat gt_RPn_freq_mhz > gt_max_freq_mhz
$ cat gt_RPn_freq_mhz > gt_boost_freq_mhz
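
(gt_RPn_freq_mhz reports the hardware minimum, RPn, frequency, so the three writes above clamp the GPU's min, max and boost limits to its lowest clock.)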
Comment 9 ANCELOT Stéphane 2019-06-26 12:19:14 UTC
Using the commands you provided, I can observe an impact.

The complete cycle time of my process is normally around 700us when there is no graphics interaction.

When the GPU uses the default settings and there is graphics activity, this time rises to 1.3ms.

Reducing the frequency with the commands you provided brings this time down to 1ms.
Comment 10 Lakshmi 2019-07-29 12:00:56 UTC
(In reply to ANCELOT Stéphane from comment #9)

@Chris, What are the next steps here?
Comment 11 Lakshmi 2019-08-28 12:27:48 UTC
(In reply to ANCELOT Stéphane from comment #9)

The GPU causes memory contention with an RT app. There is nothing we can do to help here, even if the RT app is very important to you.

To me, the system works as expected and no changes are needed in this case. I would like to close this bug as WORKSFORME.
Comment 12 Lakshmi 2019-10-02 19:06:37 UTC
(In reply to Lakshmi from comment #11)

As said, the system behaves as expected here. Resolving this bug as NOTABUG. Thanks!

