Discussion: [Xen-devel] Xen optimization
Milan Boberic
2018-10-09 10:59:13 UTC
Hi,
I'm testing Xen Hypervisor 4.10 performance on UltraZed-EG board with
carrier card.
I created bare-metal application in Xilinx SDK.
In the bm application I (see the sketch below):
- start the triple timer counter (TTC), which generates an
interrupt every 1 us
- turn on the PS LED
- call a function 100 times in a for loop (a function that sets
some values)
- turn off the LED
- stop the triple timer counter
- reset the counter value
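
In outline, the measured sequence looks like this (a minimal sketch,
assuming the Xilinx standalone drivers; the XTtcPs/XGpioPs instances are
initialized elsewhere, and work() stands in for the function under test --
names are illustrative, not my actual code):

/* sketch of the measured sequence, Xilinx standalone BSP assumed */
XTtcPs_Start(&Ttc);                    /* start TTC, IRQ every 1 us      */
XGpioPs_WritePin(&Gpio, LED_PIN, 1);   /* turn on PS LED                 */
for (int i = 0; i < 100; i++)
    work();                            /* function that sets some values */
XGpioPs_WritePin(&Gpio, LED_PIN, 0);   /* turn off LED                   */
XTtcPs_Stop(&Ttc);                     /* stop TTC                       */
XTtcPs_ResetCounterValue(&Ttc);        /* reset counter value            */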

I ran this bare-metal application under the Xen Hypervisor with the
following settings (boot-line sketch below):
- used the null scheduler (sched=null) and vwfi=native
- the bare-metal application has one vCPU and it is pinned to pCPU1
- the domain which is PetaLinux also has one vCPU, pinned to pCPU0;
other pCPUs are unused.
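
(For reference, sched=null and vwfi=native go on Xen's command line; a
sketch of how they are passed via the chosen node of the host device tree --
the console arguments here are illustrative, not my exact build:)

chosen {
        xen,xen-bootargs = "sched=null vwfi=native console=dtuart dtuart=serial0";
        xen,dom0-bootargs = "console=hvc0 earlycon=xen";
};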
Under the Xen Hypervisor I can see 3 us of jitter on the oscilloscope.

When I ran the same bm application via JTAG from Xilinx SDK (without the
Xen Hypervisor, directly on the board) there is no jitter.

I'm curious what causes this 3 us jitter in Xen (which isn't small
jitter at all), and is there any way of decreasing it?

Also, I would gladly accept any suggestions about increasing
performance, decreasing jitter, decreasing interrupt latency, etc.

Thanks in advance, Milan Boberic.
Dario Faggioli
2018-10-09 16:46:17 UTC
Post by Milan Boberic
Hi,
Hi Milan,
Post by Milan Boberic
I'm testing Xen Hypervisor 4.10 performance on UltraZed-EG board with
carrier card.
I created bare-metal application in Xilinx SDK.
- start triple timer counter (ttc) which generates
interrupt every 1us
- turn on PS LED
- call function 100 times in for loop (function that sets
some values)
- turn off LED
- stop triple timer counter
- reset counter value
Ok, I'm adding Stefano, Julien, and a couple of other people interested
in RT/lowlat on Xen.
Post by Milan Boberic
- used null scheduler (sched=null) and vwfi=native
- the bare-metal application has one vCPU and it is pinned to pCPU1
- the domain which is PetaLinux also has one vCPU, pinned to pCPU0;
other pCPUs are unused.
Under the Xen Hypervisor I can see 3 us of jitter on the oscilloscope.
So, this is probably me not being familiar with Xen on Xilinx (and with
Xen on ARM as a whole), but there are a few things I'm not sure I
understand:
- you say you use sched=null _and_ pinning? That should not be
necessary (although, it shouldn't hurt either)
- "domain which is PetaLinux", is that dom0?

In any case, if it's not terribly hard to run this kind of test, I'd say try
without 'vwfi=native', and also with another scheduler, like Credit
(but then do make sure you use pinning).
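
For the pinning part, something like this in the DomU config file should
do (an illustrative fragment, not your actual config):

# one vCPU, pinned to pCPU1
vcpus = 1
cpus = "1"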
Post by Milan Boberic
When I ran same bm application with JTAG from Xilinx SDK (without Xen
Hypervisor, directly on the board) there is no jitter.
Here, when you say "without Xen", do you also mean without any
baremetal OS at all?
Post by Milan Boberic
I'm curious what causes this 3 us jitter in Xen (which isn't small
jitter at all), and is there any way of decreasing it?
Right. So, I'm not sure I've understood the test scenario either. But
yeah, 3us jitter seems significant. Still, if we're comparing with
bare-hw, without even an OS at all, I think it could have been expected
for latency and jitter to be higher in the Xen case.

Anyway, I am not sure anyone has done the kind of analysis that could
help us accurately identify where things like that come from, and in
what proportions.

It would be really awesome to have something like that, so do go ahead
if you feel like it. :-)

I think tracing could help a little (although we don't have a super-
sophisticated tracing infrastructure like Linux's perf and such), but
sadly enough, that's still not available on ARM, I think. :-/

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Milan Boberic
2018-10-10 11:22:40 UTC
Hi,
sorry, my explanation wasn't precise and I missed the point.
I added vCPU pinning on top of sched=null "just in case", because it doesn't hurt.

Yes, PetaLinux domain is dom0.

I tested with the Credit scheduler before (it was just the LED blink
application, but anyway); it results in bigger jitter than the null
scheduler. For example, with the Credit scheduler the LED blinking results in
approximately 3 us of jitter, whereas with the null scheduler there is no jitter.
vwfi=native was giving the domain destruction problem which you fixed
by sending me a patch, approximately 2 weeks ago if you recall :) but I
still didn't test its impact on performance. I will do it ASAP and
share the results (I think that without vwfi=native the jitter will be the
same or even bigger).

When I say "without Xen", yes, I mean without any OS. Just hardware
and this bare-metal app. I do expect latency to be higher in the Xen
case and I'm curious how much exactly (which is the point of my work
and also master thesis for my faculty :D).

Now, the point is that when I set only LED blinking (without the timer) in
my application there is no jitter (in the Xen case), but when I add the
timer, which generates an interrupt every 1 us, 3 us of jitter appears.
The timer I use is the Zynq UltraScale+'s triple timer counter. I suspect
that the timer interrupt is creating that jitter.

For interrupts I use passthrough in the bare-metal application's
configuration file (which works for the GPIO LED, because there is no
jitter; the interrupt can "freely go" from the guest domain directly to the
GPIO LED).

Also, when I create the guest domain (which is this bare-metal
application) I get these messages:

(XEN) printk: 54 messages suppressed.
(XEN) d2v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
root@uz3eg-iocc-2018-2:~# (XEN) d2v0 No valid vCPU found for vIRQ34 in
the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ35 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d2v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it

In the attachments I included dmesg, xl dmesg and the bare-metal
application's configuration file.

Thanks in advance, Milan Boberic.
Milan Boberic
2018-10-10 11:25:16 UTC
Attachments.
Meng Xu
2018-10-10 16:41:05 UTC
[Just adding some thoughts on this.]
Post by Milan Boberic
Hi,
sorry, my explanation wasn't precise and I missed the point.
vCPU pinning with sched=null I put "just in case", because it doesn't hurt.
Yes, PetaLinux domain is dom0.
The jitter may come from Xen or the OS in dom0.
It would be useful to know what the jitter is if you run the test on PetaLinux.
(It's understandable that the jitter is gone without an OS. It is also common
for the OS to introduce various kinds of interference.)

Another thing you might have already done: make sure there is no print
output from either Xen or the OS during your experiment. Printing causes
long delays.

Meng
Milan Boberic
2018-10-11 07:36:45 UTC
Post by Meng Xu
The jitter may come from Xen or the OS in dom0.
It will be useful to know what is the jitter if you run the test on PetaLinux.
(It's understandable the jitter is gone without OS. It is also common
that OS introduces various interferences.)
Hi Meng,
well... I'm using a bare-metal application and I need it exclusively to
run on one CPU as a domU (guest) without an OS (and I'm not sure how
I would make the same app run on the PetaLinux dom0 :D haha).
Is there a chance that PetaLinux as dom0 is creating this jitter, and
how? Is there a way of decreasing it?

Yes, there are no prints.

I'm not sure about this timer interrupt passthrough, because I didn't
find any example of it. In the attachment I included the xen-overlay.dtsi
file which I edited to add the passthrough; in earlier replies there is the
bare-metal configuration file. It would be helpful to know if those
settings are correct. If they are not correct, that would explain the
jitter.

Thanks in advance, Milan Boberic!
Milan Boberic
2018-10-11 12:17:54 UTC
I misunderstood the passthrough concept; it only allows the guest domain
to use certain interrupts and memory. Is there a way to somehow
route an interrupt from the domU (bare-metal app) to hw?
Dario Faggioli
2018-10-11 17:05:19 UTC
Hey,

Be a bit more careful about not top posting, please? :-)
Post by Milan Boberic
I misunderstood the passthrough concept, it only allows guest domain
to use certain interrupts and memory.
I'm afraid we totally rely on people with much more experience than me
(and, I guess, Meng) on how things work on ARM.
Post by Milan Boberic
Is there a way to somehow
route an interrupt from the domU (bare-metal app) to hw?
Don't interrupts _come_ from hardware, and get routed to
hypervisor/os/app?

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Meng Xu
2018-10-11 15:39:02 UTC
Hi Milan,
Post by Milan Boberic
Post by Meng Xu
The jitter may come from Xen or the OS in dom0.
It will be useful to know what is the jitter if you run the test on PetaLinux.
(It's understandable the jitter is gone without OS. It is also common
that OS introduces various interferences.)
Hi Meng,
well... I'm using a bare-metal application and I need it exclusively to
run on one CPU as a domU (guest) without an OS (and I'm not sure how
I would make the same app run on the PetaLinux dom0 :D haha).
Is there a chance that PetaLinux as dom0 is creating this jitter and
how? Is there a way of decreasing it?
I'm not familiar with PetaLinux. :(
From my previous experience measuring rt-tests in a
virtualization environment, I found:
even though the app is the only one running on the CPU, the CPU may
be used to handle other interrupts, and its context (such as TLB and
cache) might be flushed by other components. When these happen, the
interrupt handling latency can vary a lot.
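
One quick way to watch for this from dom0 (a sketch, assuming a standard
Linux /proc layout; it only shows dom0's own interrupts, not Xen's):

# per-CPU interrupt counts; with dom0 pinned to pCPU0, its device
# interrupts should not be landing on the CPU running the bm app
cat /proc/interrupts
watch -n1 'cat /proc/interrupts'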

Hopefully, it helps. :)

Meng
Stefano Stabellini
2018-10-11 22:29:46 UTC
Post by Milan Boberic
Post by Meng Xu
The jitter may come from Xen or the OS in dom0.
It will be useful to know what is the jitter if you run the test on PetaLinux.
(It's understandable the jitter is gone without OS. It is also common
that OS introduces various interferences.)
Hi Meng,
well... I'm using a bare-metal application and I need it exclusively to
run on one CPU as a domU (guest) without an OS (and I'm not sure how
I would make the same app run on the PetaLinux dom0 :D haha).
Is there a chance that PetaLinux as dom0 is creating this jitter and
how? Is there a way of decreasing it?
Yes, there are no prints.
I'm not sure about this timer interrupt passthrough because I didn't
find any example of it, in attachment I included xen-overlay.dtsi file
which I edited to add passthrough, in earlier replies there are
bare-metal configuration file. It would be helpful to know if those
setting are correct. If they are not correct it would explain the
jitter.
Thanks in advance, Milan Boberic!
Hi Milan,

Sorry for taking so long to go back to this thread. But I am here now :)

First, let me ask a couple of questions to understand the scenario
better: is there any interference from other virtual machines while you
measure the jitter? Or is the baremetal app the only thing actively
running on the board?

Second, it would be worth double-checking that Dario's patch to fix
sched=null is not having unexpected side effects. I don't think it is, but it
would be worth testing with it and without it to be sure.

I took a look at your VM configuration. The configuration looks correct.
There are no dtdev settings, but given that none of the devices you are
assigning to the guest does any DMA, it should be OK. You want to make
sure that Dom0 is not trying to use those same devices -- make sure to
add "xen,passthrough;" to each corresponding node in the host device
tree.

The error messages "No valid vCPU found" are due to the baremetal
applications trying to configure as target cpu for the interrupt cpu1
(the second cpu in the system), while actually only 1 vcpu is assigned
to the VM. Hence, only cpu0 is allowed. I don't think it should cause
any jitter issues, because the request is simply ignored. Just to be
safe, you might want to double check that the physical interrupt is
delivered to the right physical cpu, which would be cpu1 in your
configuration, the one running the only vcpu of the baremetal app. You
can do that by adding a printk to xen/arch/arm/vgic.c:vgic_inject_irq,
for example:

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..208fde7 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -591,6 +591,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
out:
spin_unlock_irqrestore(&v->arch.vgic.lock, flags);

+ if (v != current) printk("DEBUG irq slow path!\n");
/* we have a new higher priority irq, inject it into the guest */
vcpu_kick(v);

You don't want "DEBUG irq slow path!" to get printed.

Finally, I would try to set the timer to generate events less frequently
than every 1 us and see what happens, maybe every 5-10 us. In my tests,
the IRQ latency overhead caused by Xen is around 1 us, so injecting 1
interrupt every 1 us, plus 1 us of latency caused by Xen, cannot lead to
good results.

I hope this helps, please keep us updated with your results, they are
very interesting!
Milan Boberic
2018-10-12 15:33:59 UTC
Hi Stefano, glad to have you back :D,
this is my setup:
- dom0 is PetaLinux, has 1 vCPU and it's pinned to pCPU0
- there is only one domU, and this is my bare-metal app that also
has one vCPU and it's pinned to pCPU1
so yeah, there are only dom0 and the bare-metal app on the board.

Jitter is the same with and without Dario's patch.

I'm still not sure about the timer's passthrough, because there is no mention of
the triple timer counter in the device tree, so I added:

&ttc0 {
xen,passthrough = <0x1>;
};

at the end of the xen-overlay.dtsi file, which I included in the attachment.

About the patch you sent: I can't find the function vgic_inject_irq in
the xen/arch/arm/vgic.c file. This is the link of the git repository from
which I build my Xen, so you can take a look at whether that printk can be
put somewhere else.

https://github.com/Xilinx/xen/

I ran some more tests and realized that the results are the same with or
without vwfi=native, which I think again points out that the passthrough
that I need to provide in the device tree isn't valid.

And of course, a higher frequency of interrupts results in higher
jitter. I'm still battling with the Xilinx SDK and the triple timer counter;
that's why I can't figure out what the exact frequency is (I'm just raising
it and lowering it). I'll do my best to solve that ASAP, because we need to
know the exact value of the frequency that is set.

Thanks in advance!

Milan



Julien Grall
2018-10-12 16:36:13 UTC
Hi,

Sorry for the formatting.
Post by Milan Boberic
Hi Stefano, glad to have you back :D,
- dom0 is PetaLinux, has 1 vCPU and it's pinned for pCPU0
- there is only one domU and this is my bare-metal app that also
have one vCPU and it's pinned for pCPU1
so yeah, there is only dom0 and bare-metal app on the board.
Jitter is the same with and without Dario's patch.
I'm still not sure about the timer's passthrough because there is no mention
of the triple timer counter in the device tree, so I added:
&ttc0 {
xen,passthrough = <0x1>;
};
Would you mind explaining what the triple timer counter is?
Post by Milan Boberic
at the end of the xen-overlay.dtsi file which I included in attachment.
About patch you sent, I can't find this funcion void vgic_inject_irq in
/xen/arch/arm/vgic.c file, this is link of git repository from where I
build my xen so you can take a look if that printk can be put somewhere
else.
There was some vGIC rework in Xen 4.11. There is also a new vGIC
(selectable using NEW_VGIC). It might be worth looking at it.
Post by Milan Boberic
https://github.com/Xilinx/xen/
This is not the official Xen repository, and it looks like patches have been
applied on top. I am afraid I am not going to be able to help here. Could you
do the same experiment with Xen 4.11?
Post by Milan Boberic
I ran some more testing and realized that results are the same with or
without vwfi=native, which I think again points out that passthrough that I
need to provide in device tree isn't valid.
This could also mean that wfi is not used by the guest, or that you never go
to the idle vCPU.
Stefano Stabellini
2018-10-12 17:43:42 UTC
Post by Milan Boberic
Hi Stefano, glad to have you back :D,
        - dom0 is PetaLinux, has 1 vCPU and it's pinned for pCPU0
        - there is only one domU and this is my bare-metal app that also have one vCPU and it's pinned for pCPU1
so yeah, there is only dom0 and bare-metal app on the board.
Jitter is the same with and without Dario's patch.
I'm still not sure about the timer's passthrough, so I added:
&ttc0 {
   xen,passthrough = <0x1>;
};
at the end of the xen-overlay.dtsi file which I included in attachment.
This is definitely wrong. Can you please also post the full host device
tree with your modifications that you are using for Xen and Dom0? You
should have something like:


timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};

For each of the nodes of the devices you are assigning to the DomU.
Post by Milan Boberic
About the patch you sent: I can't find the function vgic_inject_irq in the
xen/arch/arm/vgic.c file. This is the link of the git repository from which
I build my Xen, so you can take a look at whether that printk can be put
somewhere else.
https://github.com/Xilinx/xen/
It's here: https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462

BTW you are using a pretty old branch, I suggest you moving to:

https://github.com/Xilinx/xen/tree/xilinx/versal/xen/arch/arm

It will work on your board too and it is based on the much newer Xen
4.11.
Post by Milan Boberic
I ran some more testing and realized that results are the same with or without vwfi=native, which I think again points out that
passthrough that I need to provide in device tree isn't valid.
In reality, the results are the same with and without vwfi=native only
if the baremetal app never issues any wfi instructions.
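
For example (a sketch; irq_fired is a hypothetical flag set by the app's
IRQ handler): vwfi=native only changes behavior for an idle loop like

/* with vwfi=native the wfi below runs natively in the guest; without
 * it, each wfi traps into Xen. A busy-wait loop that just spins on the
 * flag never notices the option either way. */
while (!irq_fired)
    asm volatile("wfi");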
Post by Milan Boberic
 And of course, higher the frequency of interrupts results in higher jitter. I'm still battling with Xilinx SDK and triple timer
counter that's why I can't figure out what is the exact frequency set (I'm just rising it and lowering it), I'll give my best to
solve that ASAP because we need to know exact value of frequency set. 
Yep, that's important :-)
Milan Boberic
2018-10-13 16:01:09 UTC
Hi,
Post by Dario Faggioli
Don't interrupt _come_ from hardware and go/are routed to
hypervisor/os/app?
Yes they do; sorry, I reversed the order because I'm a newbie :).
Post by Julien Grall
Would you mind explaining what the triple timer counter is?
The explanation is on page 342 of this link.
Post by Julien Grall
This is not the official Xen repository, and it looks like patches have been applied on top. I am afraid I am not going to be able to help here. Could you do the same experiment with Xen 4.11?
I think I have to get Xen from Xilinx, because I use a board that has the
Zynq UltraScale+. Stefano sent a branch with Xen 4.11, so I built with it.
Post by Julien Grall
This could also mean that wfi is not used by the guest, or that you never go to the idle vCPU.
Right.
Post by Stefano Stabellini
This is definitely wrong. Can you please also post the full host device
tree with your modifications that you are using for Xen and Dom0? You
should have something like:
timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
For each of the nodes of the devices you are assigning to the DomU.
I put
&ttc0 {
xen,passthrough = <0x1>;
};
because when I was making the bm app I was following this guide. Now I see
it's wrong. When I copied directly:
timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
into the xen-overlay.dtsi file, it resulted in an error during the
device-tree build. I modified it a little bit so I can get a successful
build; all the device-tree files are included in the attachment. I'm not
sure how to set this passthrough properly; if you could take a look at
those files in the attachment I'd be more than grateful.
Post by Stefano Stabellini
It's here: https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462
Oh, about that. I sent you the wrong branch; I was using Xen 4.10. Anyway,
now I moved to Xen 4.11 like you suggested, and applied your patch and
Dario's also.

Okay, now when I want to xl create my domU (bare-metal app) I get this error:

Parsing config from timer.cfg
(XEN) IRQ 68 is already used by domain 0
libxl: error: libxl_create.c:1354:domcreate_launch_dm: Domain 1:failed
give domain access to irq 68: Device or resource busy
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain
1:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain
1:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain
1:Destruction of domain failed

I guess my modifications of:
timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
are not correct. I tried to change the interrupts to:
interrupts = <0x0 0x44 0x4 0x0 0x45 0x4 0x0 0x46 0x4>;
because if you check here, on page 310, the interrupts for TTC0 are 68:70.
But that didn't work either; I still get the same error.

I also tried to replace the xen,passthrough; line with:
xen,passthrough = <0x1>;
but also without success, still the same error.

Are you sure about this line:
reg = <0x0 0xff110000 0x0 0x1000>; ?
Or should it be like this?
reg = <0x0 0xff110000 0x1000>;

I also included xl dmesg and dmesg in the attachments (after the xl create of the bm app).

Thanks in advance!

Milan
Stefano Stabellini
2018-10-14 22:46:27 UTC
Post by Milan Boberic
Post by Stefano Stabellini
This is definitely wrong. Can you please also post the full host device
tree with your modifications that you are using for Xen and Dom0? You
should have something like:
timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
For each of the nodes of the devices you are assigning to the DomU.
I put
&ttc0 {
xen,passthrough = <0x1>;
};
because when I was making the bm app I was following this guide. Now I see
it's wrong. When I copied directly:
timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
in to the xen-overlay.dtsi file it resulted an error during
device-tree build. I modified it a little bit so I can get successful
build, there are all device-tree files included in attachment. I'm not
sure how to set this passthrough properly, if you could take a look at
those files in attachment I'd be more then grateful.
Post by Stefano Stabellini
It's here: https://github.com/Xilinx/xen/blob/xilinx/stable-4.9/xen/arch/arm/vgic.c#L462
Oh, about that. I sent you wrong branch, I was using Xen 4.10. Anyway
now I moved to Xen 4.11 like you suggested and applied your patch and
Dario's also.
Parsing config from timer.cfg
(XEN) IRQ 68 is already used by domain 0
libxl: error: libxl_create.c:1354:domcreate_launch_dm: Domain 1:failed
give domain access to irq 68: Device or resource busy
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain
1:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain
1:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain
1:Destruction of domain failed
That means that the "xen,passthrough" addition to the host device tree went wrong.
Post by Milan Boberic
I guess my modifications of:
timer@ff110000 {
compatible = "cdns,ttc";
interrupt-parent = <0x2>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
are not correct.
Right
Post by Milan Boberic
I tried to change the interrupts to:
interrupts = <0x0 0x44 0x4 0x0 0x45 0x4 0x0 0x46 0x4>;
because if you check here on page 310 interrupts for TTC0 are 68:70.
But that didn't work either I still get same error.
The interrupt numbers specified in the DTS are the real interrupt minus
32: 68-32 = 36 = 0x24. The DTS was correct.
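
Spelled out, with annotations on the same property:

/* GIC SPIs are numbered from 32, so the DTS value = physical IRQ - 32.
 * TTC0 uses physical IRQs 68..70 -> 0x24..0x26. The leading 0x0 marks
 * an SPI, the trailing 0x4 is the trigger type (level, active-high). */
interrupts = <0x0 0x24 0x4    /* IRQ 68 */
              0x0 0x25 0x4    /* IRQ 69 */
              0x0 0x26 0x4>;  /* IRQ 70 */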
Post by Milan Boberic
I also tried to replace the xen,passthrough; line with:
xen,passthrough = <0x1>;
but also without success, still the same error.
Are you sure about this line:
reg = <0x0 0xff110000 0x0 0x1000>; ?
Or should it be like this?
reg = <0x0 0xff110000 0x1000>;
Yes, that could be a problem. The format depends on the #address-cells
and #size-cells parameters. You didn't send me system-conf.dtsi, so I
don't know for sure which one of the two is right. In any case, you
should not duplicate the timer@ff110000 node in the device tree. You should
only add "xen,passthrough;" to the existing timer@ff110000 node, which
is probably in system-conf.dtsi. So, avoid adding a new timer node to
xen-overlay.dtsi, and instead modify system-conf.dtsi.
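
To illustrate the two encodings (a sketch, not your actual tree):

/* with #address-cells = <2> and #size-cells = <2> (typical for ZynqMP) */
reg = <0x0 0xff110000 0x0 0x1000>;   /* 64-bit address, 64-bit size */

/* with #address-cells = <2> and #size-cells = <1> */
reg = <0x0 0xff110000 0x1000>;       /* 64-bit address, 32-bit size */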
Milan Boberic
2018-10-15 12:27:21 UTC
In the attachment are the device-tree files I found in my project:

device-tree.bbappend - under
<path_to_project>/uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/

xen-overlay.dtsi , system-user.dtsi and zunqmp-qemu-arm.dts - under
<path_to_project>/uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/files

zynqmp-qemu-multiarch-arm and zynqmp-qemu-pmu - under
<path_to_project>/uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/files/multi-arch

pcw.dtsi , pl.dtsi , system-conf.dtsi , sistem-top.dts ,
zynqmp-clk-ccf.dtsi and zynqmp.dtsi -
under<path_to_project>/uz3eg_iocc_2018_2/components/plnx_workspace/device-tree/device-tree/

In the system-conf.dtsi file the first line says:
/*
* CAUTION: This file is automatically generated by PetaLinux SDK.
* DO NOT modify this file
*/
and there is no sign of the timer.
If you could take a look at this and the other files in the attachment, it
would be great.

I also tried to run the bare-metal app with these changes and it worked; I added:

&ttc0 {
status = "okay";
compatible = "cdns,ttc";
interrupt-parent = <0x4>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;

};

in the xen-overlay.dtsi file; because it's an overlay it shouldn't duplicate
the timer node, right?
After the build I ran:
dtc -I dtb -O dts -o system.dts system.dtb
and checked ttc0; it seems okay, except interrupt-parent is <0x4>,
not <0x2> like in your example:

timer@ff110000 {
compatible = "cdns,ttc";
status = "okay";
interrupt-parent = <0x4>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
clocks = <0x3 0x1f>;
xen,passthrough;
};
status was "disable" before.
system.dts is also added in attachment.

Is this the working passthrough? Because the jitter is the same.

When a legit, working passthrough is set correctly, the jitter should be
smaller, right?

Thanks in advance!
Milan
Stefano Stabellini
2018-10-16 07:13:37 UTC
Post by Milan Boberic
device-tree.bbappend - under
<path_to_project>/uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/
xen-overlay.dtsi , system-user.dtsi and zunqmp-qemu-arm.dts - under
<path_to_project>/uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/files
zynqmp-qemu-multiarch-arm and zynqmp-qemu-pmu - under
<path_to_project>/uz3eg_iocc_2018_2/project-spec/meta-user/recipes-bsp/device-tree/files/multi-arch
pcw.dtsi , pl.dtsi , system-conf.dtsi , sistem-top.dts ,
zynqmp-clk-ccf.dtsi and zynqmp.dtsi -
under<path_to_project>/uz3eg_iocc_2018_2/components/plnx_workspace/device-tree/device-tree/
/*
* CAUTION: This file is automatically generated by PetaLinux SDK.
* DO NOT modify this file
*/
and there is no sign of the timer.
If you could take a look at this and other files in attachment it
would be great.
The device tree with everything seems to be system.dts, that was enough
:-) I don't need the dtsi files you used to build the final dts, I only
need the one you use in uboot and for your guest.

In system.dts, the timers are all there:

timer@ff110000 {
compatible = "cdns,ttc";
status = "okay";
interrupt-parent = <0x4>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
clocks = <0x3 0x1f>;
xen,passthrough;
};

timer@ff120000 {
compatible = "cdns,ttc";
status = "disabled";
interrupt-parent = <0x4>;
interrupts = <0x0 0x27 0x4 0x0 0x28 0x4 0x0 0x29 0x4>;
reg = <0x0 0xff120000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
clocks = <0x3 0x1f>;
};

timer@ff130000 {
compatible = "cdns,ttc";
status = "disabled";
interrupt-parent = <0x4>;
interrupts = <0x0 0x2a 0x4 0x0 0x2b 0x4 0x0 0x2c 0x4>;
reg = <0x0 0xff130000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3c>;
clocks = <0x3 0x1f>;
};

timer@ff140000 {
compatible = "cdns,ttc";
status = "disabled";
interrupt-parent = <0x4>;
interrupts = <0x0 0x2d 0x4 0x0 0x2e 0x4 0x0 0x2f 0x4>;
reg = <0x0 0xff140000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3d>;
clocks = <0x3 0x1f>;
};

It looks like you set xen,passthrough correctly in system.dts for timer@ff110000:
Post by Milan Boberic
&ttc0 {
status = "okay";
compatible = "cdns,ttc";
interrupt-parent = <0x4>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
xen,passthrough;
};
in the xen-overlay.dtsi file; because it's an overlay it shouldn't duplicate
the timer node, right?
As I wrote, system.dts looks correct.
Post by Milan Boberic
dtc -I dtb -O dts -o system.dts system.dtb
and checked for ttc0, it seems okay except interrupt-parent is <0x4>
I don't know what you are referring to. In the system.dts you attached,
interrupt-parent is <0x4>, which is correct:

timer@ff110000 {
compatible = "cdns,ttc";
status = "okay";
interrupt-parent = <0x4>;
Post by Milan Boberic
compatible = "cdns,ttc";
status = "okay";
interrupt-parent = <0x4>;
interrupts = <0x0 0x24 0x4 0x0 0x25 0x4 0x0 0x26 0x4>;
reg = <0x0 0xff110000 0x0 0x1000>;
timer-width = <0x20>;
power-domains = <0x3b>;
clocks = <0x3 0x1f>;
xen,passthrough;
};
status was "disable" before.
system.dts is also added in attachment.
status is "okay" in the system.dts you attached. That is important
because status = "disable" it means the device cannot be used.
Post by Milan Boberic
Is this the working passthrough? Because the jitter is the same.
When a legit, working passthrough is set correctly, the jitter should be
smaller, right?
If you are not getting any errors anymore when creating your baremetal
guest, then yes, it should be a working passthrough. I would double-check
that everything is working as expected using the DEBUG patch for Xen I
suggested to you in the other email. You might even want to remove the
"if" check and always print something for every interrupt of your guest,
just to get an idea of what's going on. See the attached patch.
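
(The attached patch is presumably something along these lines -- a sketch
of the unconditional version, printing the virq and whether the target vcpu
is the current one, which matches the output reported further down:)

--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     spin_unlock_irqrestore(&v->arch.vgic.lock, flags);

-    if (v != current) printk("DEBUG irq slow path!\n");
+    /* print for every injected interrupt, not just the slow path */
+    printk("DEBUG virq=%u local=%d\n", virq, v == current);
     /* we have a new higher priority irq, inject it into the guest */
     vcpu_kick(v);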

Once everything is as expected, I would change the frequency of the
timer, because 1 us is way too frequent. I think it should be at least
3 us, more like 5 us. Keep in mind that jitter is about having
deterministic IRQ latency, not about having extremely frequent
interrupts.

I would also double check that you are not using any other devices or
virtual interfaces in your baremetal app, because that could negatively
affect the numbers. For instance, Linux by default uses the virtual
timer interface ("arm,armv8-timer"); I would double check that the
baremetal app is not doing the same -- you don't want to be using two
timers when doing your measurements.
Julien Grall
2018-10-15 12:50:50 UTC
(Resending with a different address)
Post by Milan Boberic
Hi,
Hi,
Post by Milan Boberic
Post by Dario Faggioli
Don't interrupt _come_ from hardware and go/are routed to
hypervisor/os/app?
Yes they do, sorry, I reversed the order because I'm a newbie :) .
Post by Julien Grall
Would you mind explaining what the triple timer counter is?
The explanation is on page 342 of this link.
Which link?
Post by Milan Boberic
Post by Julien Grall
This is not the official Xen repository and look like patches have
been applied on top. I am afraid, I am not going to be able help
here. Could you do the same experiment with Xen 4.11?
I think I have to get Xen from Xilinx because I use board that has
Zynq Ultrascale. Stefano sent branch with Xen 4.11 so I built with it.
The board should be fully supported upstream. If Xilinx has more patches
on top, then you would need to seek support from them, because I don't
know what they changed in Xen.
Cheers,
--
Julien Grall
Milan Boberic
2018-10-15 13:01:35 UTC
Which link?
I made hyperlink on "link" word, looks like somehow it got lost, here
is the link:

https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
The board should be fully supported upstreamed. If Xilinx has more patch
on top, then you would need to seek support from them because I don't
know what they changed in Xen.
I think Stefano can help; thanks for the suggestion.

Cheers,
Milan
Julien Grall
2018-10-15 13:03:26 UTC
Post by Milan Boberic
Which link?
I made hyperlink on "link" word, looks like somehow it got lost, here
https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
HTML should be avoided on the mailing list. Most of us are using
text-only clients.

Cheers,
--
Julien Grall
Milan Boberic
2018-10-17 15:19:47 UTC
Hi,
Post by Stefano Stabellini
The device tree with everything seems to be system.dts, that was enough
:-) I don't need the dtsi files you used to build the final dts, I only
need the one you use in uboot and for your guest.
I wasn't sure, so I sent everything; sorry for bombarding you with
all those files. :-)
Post by Stefano Stabellini
It looks like you set xen,passthrough correctly in system.dts for timer@ff110000:
Thank you for taking a look. Now we are sure that the passthrough works
correctly, because there is no error during guest creation and there
are no prints of "DEBUG irq slow path".
Post by Stefano Stabellini
If you are not getting any errors anymore when creating your baremetal
guest, then yes, it should be working passthrough. I would double-check
that everything is working as expected using the DEBUG patch for Xen I
suggested to you in the other email. You might even want to remove the
"if" check and always print something for every interrupt of your guest
just to get an idea of what's going on. See the attached patch.
When I apply this patch it prints forever:
(XEN) DEBUG virq=68 local=1
which is a good thing, I guess, because interrupts are being generated non-stop.
Post by Stefano Stabellini
Once everything is as expected, I would change the frequency of the
timer, because 1 us is way too frequent. I think it should be at least
3 us, more like 5 us.
Okay, about this... I double checked my bare-metal application and
it looks like interrupts weren't actually generated every 1 us. The
shortest interrupt period I can get is 8 us. I checked the interrupt
frequency with an oscilloscope just to be sure (toggling an LED on/off
when interrupts occur; see the sketch after this list). So, when
I set:
- interrupts to be generated every 8 us I get jitter of 6 us
- interrupts to be generated every 10 us I get jitter of 3 us (after
2-3mins it jumps to 6 us)
- interrupts to be generated every 15 us jitter is the same as when
only bare-metal application runs on board (without Xen or any OS)
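For reference, the measurement ISR does essentially this (a minimal
sketch with hypothetical instance/pin names, not the exact Xilinx SDK
example code):

/* Clear the TTC interrupt, then toggle a PS GPIO pin so the
 * period (and its jitter) shows up on the oscilloscope. */
static u32 led_state;

static void ttc_isr(void *CallBackRef)
{
    u32 status = XTtcPs_GetInterruptStatus(&TtcInstance);
    XTtcPs_ClearInterruptStatus(&TtcInstance, status);

    led_state ^= 1;
    XGpioPs_WritePin(&GpioInstance, LED_PIN, led_state);
}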

I want to remind you that a bare-metal application that only blinks the
LED at high speed gives 1 us of jitter; somehow, introducing frequent
interrupts causes this extra jitter, which is why I was unsure about the
timer passthrough. Taking into consideration that you measured a Xen
overhead of 1 us, I have a feeling that I'm missing something. Is there
anything else I could do to get better results besides sched=null,
vwfi=native, hard vCPU pinning (1 vCPU on 1 pCPU) and passthrough (not
sure if it affects the jitter)?
I'm forcing frequent interrupts because I'm testing to see if this
board with Xen on it could be used for real-time simulations,
real-time signal processing, etc. If I could get results like yours (1
us Xen overhead) or even better, that would be great! BTW, how did you
measure Xen's overhead?
Post by Stefano Stabellini
Keep in mind that jitter is about having
deterministic IRQ latency, not about having extremely frequent
interrupts.
Yes, but I want to see exactly where I will lose deterministic IRQ
latency, which is extremely important in real-time signal processing.
So, what causes this jitter: is it a Xen limit, an ARM limit, etc.? It
would be nice to know; I'll share all the results I get.
Post by Stefano Stabellini
I would also double check that you are not using any other devices or
virtual interfaces in your baremetal app because that could negatively
affect the numbers.
I checked the bare-metal app and I think there are no other devices
that the bm app is using.
Post by Stefano Stabellini
Linux by default uses the virtual
timer interface ("arm,armv8-timer"); I would double check that the
baremetal app is not doing the same -- you don't want to be using two
timers when doing your measurements.
Hmm, I'm not sure how to check that. I could send the bare-metal app
if that helps; it's created in Xilinx SDK 2017.4.
Also, should I move to Xilinx SDK 2018.2, since I'm using PetaLinux 2018.2?
I'm also using a hardware description file for the SDK that was created
in Vivado 2017.4.
Could all this be a version-mismatch problem (I don't think so, because
the bm app works)?
Post by Stefano Stabellini
Even though the app. is the only one running on the CPU, the CPU may
be used to handle other interrupts and its context (such as TLB and
cache) might be flushed by other components. When these happen, the
interrupt handling latency can vary a lot.
What do you think about this? I don't know how I would check this.

I also tried using the default scheduler (removed sched=null and
vwfi=native) and the jitter is 10 us when an interrupt is generated
every 10 us.

Thanks in advance!

Milan
Stefano Stabellini
2018-10-19 21:02:44 UTC
Permalink
Post by Milan Boberic
Hi,
Post by Stefano Stabellini
The device tree with everything seems to be system.dts, that was enough
:-) I don't need the dtsi files you used to build the final dts, I only
need the one you use in uboot and for your guest.
I wasn't sure, so I sent everything; sorry for bombarding you with
all those files. :-)
Post by Stefano Stabellini
It looks like you set xen,passthrough correctly in system.dts for
Thank you for taking a look, now we are sure that passthrough works
correctly because there is no error during guest creation and there
are no prints of "DEBUG irq slow path".
Great!
Post by Milan Boberic
Post by Stefano Stabellini
If you are not getting any errors anymore when creating your baremetal
guest, then yes, it should be working passthrough. I would double-check
that everything is working as expected using the DEBUG patch for Xen I
suggested to you in the other email. You might even want to remove the
"if" check and always print something for every interrupt of your guest
just to get an idea of what's going on. See the attached patch.
(XEN) DEBUG virq=68 local=1
which is a good thing I guess because interrupts are being generated non-stop.
Yes, local=1 means that the interrupt is injected to the local vcpu,
which is exactly what we want.
Post by Milan Boberic
Post by Stefano Stabellini
Once everything is as expected I would change the frequency of the
timer, because 1u is way too frequent. I think it should be at least
3us, more like 5us.
Okay, about this... I double checked my bare-metal application and
it looks like interrupts weren't actually generated every 1 us. The
shortest interrupt period I can get is 8 us. I checked the interrupt
frequency with an oscilloscope just to be sure (toggling an LED on/off
when interrupts occur). So, when I set:
- interrupts to be generated every 8 us I get jitter of 6 us
- interrupts to be generated every 10 us I get jitter of 3 us (after
2-3mins it jumps to 6 us)
- interrupts to be generated every 15 us jitter is the same as when
only bare-metal application runs on board (without Xen or any OS)
These are very interesting numbers! Thanks again for running these
experiments. I don't want to jump to conclusions but they seem to verify
the theory that if the interrupt frequency is too high, we end up
spending too much time handling interrupts, the system cannot cope,
hence jitter increases.

However, I would have thought that the threshold should be lower than
15us, given that it takes 2.5us to inject an interrupt. I have a couple
of experiment suggestions below.
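To put rough numbers on it: if injecting one interrupt costs about
2.5us, injection alone consumes 2.5/8 = ~31% of each 8us period, 25%
at 10us, and ~17% at 15us. So the load drops quickly as the period
grows, but on paper the system should still keep up well below 15us.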
Post by Milan Boberic
I want to remind you that a bare-metal application that only blinks the
LED at high speed gives 1 us of jitter; somehow, introducing frequent
interrupts causes this extra jitter, which is why I was unsure about the
timer passthrough. Taking into consideration that you measured a Xen
overhead of 1 us, I have a feeling that I'm missing something. Is there
anything else I could do to get better results besides sched=null,
vwfi=native, hard vCPU pinning (1 vCPU on 1 pCPU) and passthrough (not
sure if it affects the jitter)?
I'm forcing frequent interrupts because I'm testing to see if this
board with Xen on it could be used for real-time simulations,
real-time signal processing, etc. If I could get results like yours (1
us Xen overhead) or even better, that would be great! BTW, how did you
measure Xen's overhead?
When I said overhead, I meant compared to Linux. The overall IRQ latency
with Xen on the Xilinx Zynq MPSoC is 2.5us. When I say "overall", I mean
from the moment the interrupt is generated to the point the interrupt
service routine is run in the baremetal guest. I measure the overhead
using TBM (https://github.com/sstabellini/tbm phys-timer) and a modified
version of Xen that injects the generic physical timer interrupts to the
guest. I think you should be able to reproduce the same number using
the TTC timer like you are doing.
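The idea of the measurement is simple; a minimal sketch (with
hypothetical register-access helpers, not the actual TBM code) looks
like this:

/* Program the ARM generic physical timer to fire at a known
 * deadline; in the IRQ handler, the difference between the
 * current counter value and that deadline is the latency. */
static uint64_t deadline;

static void timer_program(uint64_t period_ticks)
{
    deadline = read_cntpct() + period_ticks;  /* read CNTPCT_EL0 */
    write_cntp_cval(deadline);                /* set CNTP_CVAL_EL0 */
    write_cntp_ctl(1);                        /* enable, unmasked */
}

static void timer_irq_handler(void)
{
    uint64_t latency_ticks = read_cntpct() - deadline;
    record_sample(latency_ticks);             /* track min/avg/max */
    timer_program(PERIOD_TICKS);              /* re-arm */
}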

In addition to sched=null and vwfi=native, I also passed
serrors=panic. This last option further reduces context switch times and
should be safe on your board. You might want to add it, and run the
numbers again.
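For example (illustrative only; the console and dom0 arguments depend
on your boot setup), the Xen command line in the device tree chosen
node would look something like:

xen,xen-bootargs = "console=dtuart dtuart=serial0 dom0_mem=1G sched=null vwfi=native serrors=panic";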
Post by Milan Boberic
Post by Stefano Stabellini
Keep in mind that jitter is about having
deterministic IRQ latency, not about having extremely frequent
interrupts.
Yes, but I want to see exactly where I will lose deterministic IRQ
latency, which is extremely important in real-time signal processing.
So, what causes this jitter: is it a Xen limit, an ARM limit, etc.? It
would be nice to know; I'll share all the results I get.
Post by Stefano Stabellini
I would also double check that you are not using any other devices or
virtual interfaces in your baremetal app because that could negatively
affect the numbers.
I checked the bare-metal app and I think there are no other devices
that the bm app is using.
This should also be confirmed by the fact that you are only getting
"DEBUG virq=68 local=1" messages and nothing else. If other interrupts
were to be injected you should see other lines such as

DEBUG virq=27 local=1

I have an idea to verify this, see below.
Post by Milan Boberic
Post by Stefano Stabellini
Linux by default uses the virtual
timer interface ("arm,armv8-timer"); I would double check that the
baremetal app is not doing the same -- you don't want to be using two
timers when doing your measurements.
Hmm, I'm not sure how to check that. I could send the bare-metal app
if that helps; it's created in Xilinx SDK 2017.4.
Also, should I move to Xilinx SDK 2018.2, since I'm using PetaLinux 2018.2?
I'm also using a hardware description file for the SDK that was created
in Vivado 2017.4.
Could all this be a version-mismatch problem (I don't think so, because
the bm app works)?
Post by Stefano Stabellini
Even though the app. is the only one running on the CPU, the CPU may
be used to handle other interrupts and its context (such as TLB and
cache) might be flushed by other components. When these happen, the
interrupt handling latency can vary a lot.
What do you think about this? I don't know how I would check this.
I think we want to fully understand how many other interrupts the
baremetal guest is receiving. To do that, we can modify my previous
patch to suppress any debug messages for virq=68. That way, we should
only see the other interrupts. Ideally there would be none.

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..b7a8e17 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -577,7 +577,11 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,

/* the irq is enabled */
if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
+ {
gic_raise_guest_irq(v, virq, priority);
+ if ( d->domain_id != 0 && virq != 68 )
+ printk("DEBUG virq=%d local=%d\n",virq,v == current);
+ }

list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
{


Next step would be to verify that there are no other physical interrupts
interrupting the vcpu execution other than irq=68. We should be able to
check that with the following debug patch:


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e524ad5..b34c3e4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -381,6 +381,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
/* Reading IRQ will ACK it */
irq = gic_hw_ops->read_irq();

+ if (current->domain->domain_id > 0 && irq != 68)
+ {
+ local_irq_enable();
+ printk("DEBUG irq=%d\n",irq);
+ local_irq_disable();
+ }
+
if ( likely(irq >= 16 && irq < 1020) )
{
local_irq_enable();
Dario Faggioli
2018-10-19 22:41:01 UTC
Permalink
Post by Stefano Stabellini
Post by Milan Boberic
I checked interrupt frequency with oscilloscope
just to be sure (toggling LED on/off when interrupts occur). So,
- interrupts to be generated every 8 us I get jitter of 6 us
- interrupts to be generated every 10 us I get jitter of 3 us (after
2-3mins it jumps to 6 us)
- interrupts to be generated every 15 us jitter is the same as when
only bare-metal application runs on board (without Xen or any OS)
These are very interesting numbers!
Indeed.
Post by Stefano Stabellini
Thanks again for running these
experiments. I don't want to jump to conclusions but they seem to verify
the theory that if the interrupt frequency is too high, we end up
spending too much time handling interrupts, the system cannot cope,
hence jitter increases.
Yep, this makes a lot of sense.
Post by Stefano Stabellini
However, I would have thought that the threshold should be lower than
15us, given that it takes 2.5us to inject an interrupt. I have a couple
of experiment suggestions below.
FWIW, I know that numbers are always relative (hw platform, workload,
etc), and I'm happy to see that you're quite confident that we can
improve further... but these numbers seem rather good to me. :-)

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Milan Boberic
2018-10-22 15:02:18 UTC
Permalink
Hi,
Post by Stefano Stabellini
I think we want to fully understand how many other interrupts the
baremetal guest is receiving. To do that, we can modify my previous
patch to suppress any debug messages for virq=68. That way, we should
only see the other interrupts. Ideally there would be none.
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..b7a8e17 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -577,7 +577,11 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
/* the irq is enabled */
if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
+ {
gic_raise_guest_irq(v, virq, priority);
+ if ( d->domain_id != 0 && virq != 68 )
+ printk("DEBUG virq=%d local=%d\n",virq,v == current);
+ }
list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
{
When I apply this patch there are no prints or debug messages in xl
dmesg. So the bare-metal guest receives only interrupt 68, which is good.
Post by Stefano Stabellini
Next step would be to verify that there are no other physical interrupts
interrupting the vcpu execution other than irq=68. We should be able to
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e524ad5..b34c3e4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -381,6 +381,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
/* Reading IRQ will ACK it */
irq = gic_hw_ops->read_irq();
+ if (current->domain->domain_id > 0 && irq != 68)
+ {
+ local_irq_enable();
+ printk("DEBUG irq=%d\n",irq);
+ local_irq_disable();
+ }
+
if ( likely(irq >= 16 && irq < 1020) )
{
local_irq_enable();
But when I apply this patch it prints forever:
(XEN) DEBUG irq=1023

Thanks in advance!

Milan
Stefano Stabellini
2018-10-22 17:52:20 UTC
Permalink
Post by Milan Boberic
Hi,
Post by Stefano Stabellini
I think we want to fully understand how many other interrupts the
baremetal guest is receiving. To do that, we can modify my previous
patch to suppress any debug messages for virq=68. That way, we should
only see the other interrupts. Ideally there would be none.
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..b7a8e17 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -577,7 +577,11 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
/* the irq is enabled */
if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
+ {
gic_raise_guest_irq(v, virq, priority);
+ if ( d->domain_id != 0 && virq != 68 )
+ printk("DEBUG virq=%d local=%d\n",virq,v == current);
+ }
list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
{
When I apply this patch there are no prints or debug messages in xl
dmesg. So the bare-metal guest receives only interrupt 68, which is good.
Yes, good!
Post by Milan Boberic
Post by Stefano Stabellini
Next step would be to verify that there are no other physical interrupts
interrupting the vcpu execution other than irq=68. We should be able to
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e524ad5..b34c3e4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -381,6 +381,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
/* Reading IRQ will ACK it */
irq = gic_hw_ops->read_irq();
+ if (current->domain->domain_id > 0 && irq != 68)
+ {
+ local_irq_enable();
+ printk("DEBUG irq=%d\n",irq);
+ local_irq_disable();
+ }
+
if ( likely(irq >= 16 && irq < 1020) )
{
local_irq_enable();
(XEN) DEBUG irq=1023
Thanks in advance!
I know why! It's because we always loop around until we read the
spurious interrupt. Just add an && irq != 1023 to the if check.
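i.e. the condition in gic_interrupt() becomes:

if ( current->domain->domain_id > 0 && irq != 68 && irq != 1023 )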
Milan Boberic
2018-10-23 08:58:14 UTC
Permalink
Post by Stefano Stabellini
Just add an && irq != 1023 to the if check.
Added it, and now when I create the bare-metal guest it prints only once:

(XEN) DEBUG irq=0
(XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
***@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in
the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it


This part always prints only once when I create this bare-metal guest,
like I mentioned in earlier replies, and we said it doesn't do any
harm:

(XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
***@uz3eg-iocc-2018-2:~# (XEN) d1v0 No valid vCPU found for vIRQ35 in
the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it

Now, from this patch I get:

(XEN) DEBUG irq=0

also printed only once.

I forgot to mention in my previous reply that I added serrors=panic;
it didn't make any change, the numbers are the same.

Thanks in advance!

Milan
Stefano Stabellini
2018-10-24 00:24:12 UTC
Permalink
Post by Milan Boberic
Post by Stefano Stabellini
Just add an && irq != 1023 to the if check.
(XEN) DEBUG irq=0
(XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it
This part always prints only once when I create this bare-metal guest,
like I mentioned in earlier replies, and we said it doesn't do any
(XEN) d1v0 No valid vCPU found for vIRQ32 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ33 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ34 in the target list (0x2). Skip it
the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ36 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ37 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ38 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ39 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ40 in the target list (0x2). Skip it
(XEN) d1v0 No valid vCPU found for vIRQ41 in the target list (0x2). Skip it
(XEN) DEBUG irq=0
also printed only once.
I forgot to mention in my previous reply that I added serrors=panic;
it didn't make any change, the numbers are the same.
Thanks in advance!
It is good that there are no physical interrupts interrupting the cpu.
serrors=panic makes the context switch faster. I guess there are not
enough context switches to make a measurable difference.

I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).

Just to be paranoid, we might also want to check the following; again,
it shouldn't get printed:

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..6cf6814 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -532,6 +532,8 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
struct pending_irq *iter, *n;
unsigned long flags;

+ if ( d->domain_id != 0 && virq != 68 )
+ printk("DEBUG virq=%d local=%d\n",virq,v == current);
/*
* For edge triggered interrupts we always ignore a "falling edge".
* For level triggered interrupts we shouldn't, but do anyways.
Milan Boberic
2018-10-25 10:09:29 UTC
Permalink
Hi,
Post by Stefano Stabellini
It is good that there are no physical interrupts interrupting the cpu.
serrors=panic makes the context switch faster. I guess there are not
enough context switches to make a measurable difference.
Yes, when I did:
grep ctxt /proc/2153/status
I got:
voluntary_ctxt_switches: 5
nonvoluntary_ctxt_switches: 3
Post by Stefano Stabellini
I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).
This bare-metal application is the most suspicious part, indeed. Still
waiting for an answer on the Xilinx forum.
Post by Stefano Stabellini
Just to be paranoid, we might also want to check the following, again it
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..6cf6814 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -532,6 +532,8 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
struct pending_irq *iter, *n;
unsigned long flags;
+ if ( d->domain_id != 0 && virq != 68 )
+ printk("DEBUG virq=%d local=%d\n",virq,v == current);
/*
* For edge triggered interrupts we always ignore a "falling edge".
* For level triggered interrupts we shouldn't, but do anyways.
Checked it again, no prints. I hoped I would discover some vIRQs
or pIRQs slowing things down, but no, no prints.
I might try something else instead of this bare-metal application,
because this Xilinx SDK example is very suspicious.

Thank you for your time.

Milan
Julien Grall
2018-10-25 11:30:28 UTC
Permalink
Hi Milan,
Post by Milan Boberic
Hi,
Post by Stefano Stabellini
It is good that there are no physical interrupts interrupting the cpu.
serrors=panic makes the context switch faster. I guess there are not
enough context switches to make a measurable difference.
grep ctxt /proc/2153/status
voluntary_ctxt_switches: 5
nonvoluntary_ctxt_switches: 3
Post by Stefano Stabellini
I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).
This bare-metal application is the most suspicious part, indeed. Still
waiting for an answer on the Xilinx forum.
Sorry if it was already asked. Can you provide your .config for your
test? Do you have DEBUG enabled?

Cheers,
--
Julien Grall
Milan Boberic
2018-10-25 12:36:51 UTC
Permalink
Post by Julien Grall
Hi Milan,
Hi Julien,
Post by Julien Grall
Sorry if it was already asked. Can you provide your .config for your
test?
Yes, of course; the bare-metal's .cfg file is in the attachment (if
that is what you asked :) ).
Post by Julien Grall
Do you have DEBUG enabled?
I'm not sure where exactly I should disable it. If you check line 18
of the xl dmesg file in the attachment, it says debug=n; it's the output
of xl dmesg. I'm not sure if that is the DEBUG you are talking about.
Also, if I add prints somewhere in the code, I can see them; does that
mean that DEBUG is enabled? If yes, can you tell me where exactly
I should disable it?

Thanks in advance!

Milan
Dario Faggioli
2018-10-25 13:44:51 UTC
Permalink
Post by Milan Boberic
Post by Julien Grall
Do you have DEBUG enabled?
I'm not sure where exactly I should disable it. If you check line 18
of the xl dmesg file in the attachment, it says debug=n; it's the output
of xl dmesg. I'm not sure if that is the DEBUG you are talking about.
Yes, this means debug is *not* enabled, which is the correct setup for
doing performance/latency evaluation.

It might, OTOH, be wise to turn it on when investigating the system
behavior (but that's a general remark, I don't know to what Julien was
referring to in this specific case).

To turn it on, in a recent enough Xen, which I think is what you're
using, you can use Kconfig (e.g., `make -C xen/ menuconfig').
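Something like this (if I remember the symbol name correctly, it is
CONFIG_DEBUG):

make -C xen/ menuconfig        # enable debug under "Debugging Options"
grep CONFIG_DEBUG xen/.config  # should then show CONFIG_DEBUG=y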
Post by Milan Boberic
Also, if I add prints somewhere in the code, I can see them; does that
mean that DEBUG is enabled? If yes, can you tell me where exactly
I should disable it?
It depends on the "print". If you add 'printk("bla");', it is correct
that you see "bla" in the log, even with debug=n.
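For instance (from memory, see xen/include/xen/lib.h):

printk("bla\n");                 /* always printed, debug=y or debug=n */
gdprintk(XENLOG_DEBUG, "bla\n"); /* compiled out when debug=n */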

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Julien Grall
2018-10-25 14:00:27 UTC
Permalink
Hi Dario,
Post by Dario Faggioli
Post by Milan Boberic
Post by Julien Grall
Do you have DEBUG enabled?
I'm not sure where exactly I should disable it. If you check line 18
of the xl dmesg file in the attachment, it says debug=n; it's the output
of xl dmesg. I'm not sure if that is the DEBUG you are talking about.
Yes, this means debug is *not* enabled, which is the correct setup for
doing performance/latency evaluation.
It might, OTOH, be wise to turn it on when investigating the system
behavior (but that's a general remark, I don't know to what Julien was
referring to in this specific case).
To narrow down the discrepancies in the measurements, I wanted to
check whether Milan was doing the performance measurement with debug
enabled.

Now I can tick off DEBUG as a potential cause of the latency/performance issue.

Cheers,
--
Julien Grall
Julien Grall
2018-10-25 14:04:10 UTC
Permalink
Post by Milan Boberic
Post by Julien Grall
Hi Milan,
Hi Julien,
Post by Julien Grall
Sorry if it was already asked. Can you provide your .config for your
test?
Yes, of course; the bare-metal's .cfg file is in the attachment (if
that is what you asked :) ).
I was asking for the Xen configuration (xen/.config), to know what you
have enabled in Xen.

Cheers,
--
Julien Grall
Milan Boberic
2018-10-25 14:47:50 UTC
Permalink
Post by Julien Grall
I was asking for the Xen configuration (xen/.config), to know what you
have enabled in Xen.
Oh, sorry; because I'm building Xen from a git repository, here is the
link to it, where you can check the file you mentioned.

https://github.com/Xilinx/xen/tree/xilinx/versal/xen
Post by Dario Faggioli
It might, OTOH, be wise to turn it on when investigating the system
behavior (but that's a general remark, I don't know to what Julien was
referring to in this specific case).
I will definitely try to enable DEBUG.

Milan
Julien Grall
2018-10-25 14:51:14 UTC
Permalink
Post by Milan Boberic
Post by Julien Grall
I was asking for the Xen configuration (xen/.config), to know what you
have enabled in Xen.
Oh, sorry; because I'm building Xen from a git repository, here is the
link to it, where you can check the file you mentioned.
https://github.com/Xilinx/xen/tree/xilinx/versal/xen
I am afraid not: .config is generated at build time. So can you
paste it here, please?

Cheers,
--
Julien Grall
Stefano Stabellini
2018-10-25 16:18:51 UTC
Permalink
Post by Milan Boberic
Post by Julien Grall
I was asking for the Xen configuration (xen/.config), to know what you
have enabled in Xen.
Oh, sorry; because I'm building Xen from a git repository, here is the
link to it, where you can check the file you mentioned.
https://github.com/Xilinx/xen/tree/xilinx/versal/xen
I am afraid not: .config is generated at build time. So can you paste
it here, please?
It most probably is the default kconfig for Xen 4.11.

Milan,

Julien was asking for the file named ".config" you can find under the
xen/ directory.
Julien Grall
2018-10-25 11:09:29 UTC
Permalink
Hi Stefano,
Post by Stefano Stabellini
I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).
Is that the number you measured on the Xilinx-flavored Xen 4.11? Or is it
coming from the blog post [1], which is based on Xen 4.9?

If the latter, then I can't rule out that we may have introduced a slowdown
for good or bad reasons...

To rule out this possibility, I would recommend trying to reproduce the
same number on Xen 4.9 and then trying with Xen 4.11.

Cheers,

[1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/
--
Julien Grall
Stefano Stabellini
2018-10-25 16:15:51 UTC
Permalink
Post by Julien Grall
Hi Stefano,
Post by Stefano Stabellini
I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).
Is that the number you measured on the Xilinx-flavored Xen 4.11? Or is it
coming from the blog post [1], which is based on Xen 4.9?
If the latter, then I can't rule out that we may have introduced a slowdown
for good or bad reasons...
To rule out this possibility, I would recommend trying to reproduce the same
number on Xen 4.9 and then trying with Xen 4.11.
Cheers,
[1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/
I was talking about the old numbers from Xen 4.9. You are right, we
cannot rule out the possibility that we introduced a slowdown.
Julien Grall
2018-10-26 19:12:31 UTC
Permalink
Hi Stefano,
Post by Stefano Stabellini
Post by Julien Grall
Hi Stefano,
Post by Stefano Stabellini
I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).
Is that the number you measured on the Xilinx-flavored Xen 4.11? Or is it
coming from the blog post [1], which is based on Xen 4.9?
If the latter, then I can't rule out that we may have introduced a slowdown
for good or bad reasons...
To rule out this possibility, I would recommend trying to reproduce the same
number on Xen 4.9 and then trying with Xen 4.11.
Cheers,
[1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/
I was talking about the old numbers from Xen 4.9. You are right, we
cannot rule out the possibility that we introduced a slowdown.
Can you try to reproduce those numbers with your setup on Xen 4.11?

Cheers,
--
Julien Grall
Stefano Stabellini
2018-10-26 20:41:13 UTC
Permalink
Post by Julien Grall
Hi Stefano,
Post by Stefano Stabellini
Post by Julien Grall
Hi Stefano,
Post by Stefano Stabellini
I don't have any other things to suggest right now. You should be able
to measure an overall 2.5us IRQ latency (if the interrupt rate is not
too high).
Is that the number you measured on the Xilinx-flavored Xen 4.11? Or is it
coming from the blog post [1], which is based on Xen 4.9?
If the latter, then I can't rule out that we may have introduced a slowdown
for good or bad reasons...
To rule out this possibility, I would recommend trying to reproduce the same
number on Xen 4.9 and then trying with Xen 4.11.
Cheers,
[1] https://blog.xenproject.org/2017/03/20/xen-on-arm-interrupt-latency/
I was talking about the old numbers from Xen 4.9. You are right, we
cannot rule out the possibility that we introduced a slowdown.
Can you try to reproduce those numbers with your setup on Xen 4.11?
Yes, I intend to; it is on my TODO list.
Milan Boberic
2018-10-29 12:29:34 UTC
Permalink
Sorry for the late reply,
Post by Julien Grall
I am afraid not: .config is generated at build time. So can you
paste it here, please?
".config" file is in attachment.

I also tried Xen 4.9 and I got almost the same numbers; the jitter is
smaller by 150 ns, which isn't a significant change at all.

Milan
Julien Grall
2018-10-31 18:59:38 UTC
Permalink
Hi Milan,
Post by Milan Boberic
Sorry for the late reply,
Don't worry, thank you for the testing and sending the .config.
Post by Milan Boberic
Post by Julien Grall
I am afraid no. .config is generated during building time. So can you
paste here please.
".config" file is in attachment.
I also tried Xen 4.9 and I got almost the same numbers; the jitter is
smaller by 150 ns, which isn't a significant change at all.
Interesting. Could you confirm the commit you were using (or the point
release)?

Stefano's number were based on commit "fuzz: update README.afl example"
55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
of Xen.

Cheers,
--
Julien Grall
Milan Boberic
2018-10-31 20:35:17 UTC
Permalink
Hi,
Post by Julien Grall
Interesting. Could you confirm the commit you were using (or the point
release)?
Stefano's number were based on commit "fuzz: update README.afl example"
55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
of Xen.
All the Xen versions I used are from the Xilinx git repository, because I
have an UltraZed-EG board, which has a Zynq UltraScale SoC.
Under branches you can find Xen 4.8, 4.9, etc.
I always used the latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
Here is the link to it:
https://github.com/Xilinx/xen/tree/xilinx/stable-4.9

Best regards.

Milan
Julien Grall
2018-10-31 21:16:43 UTC
Permalink
Post by Milan Boberic
Hi,
Post by Julien Grall
Interesting. Could you confirm the commit you were using (or the point
release)?
Stefano's number were based on commit "fuzz: update README.afl example"
55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
of Xen.
All the Xen versions I used are from the Xilinx git repository, because I
have an UltraZed-EG board, which has a Zynq UltraScale SoC.
Under branches you can find Xen 4.8, 4.9, etc.
I always used the latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
https://github.com/Xilinx/xen/tree/xilinx/stable-4.9
This branch is quite a bit ahead of the branch Stefano used. There are 94
more commits just for Arm-specific code.

What I am interested in is seeing if we are able to reproduce Stefano's
number with the same branch, so we can have a clue whether there is a
slowdown introduced in new code.

Stefano, you mentioned you would look at reproducing the numbers. Do you
have any update on this?

Cheers,
--
Julien Grall
Stefano Stabellini
2018-11-01 20:20:42 UTC
Permalink
Post by Milan Boberic
Hi,
Post by Julien Grall
Interesting. Could you confirm the commit you were using (or the point
release)?
Stefano's number were based on commit "fuzz: update README.afl example"
55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
of Xen.
All the Xen versions I used are from the Xilinx git repository, because I
have an UltraZed-EG board, which has a Zynq UltraScale SoC.
Under branches you can find Xen 4.8, 4.9, etc.
I always used the latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
https://github.com/Xilinx/xen/tree/xilinx/stable-4.9
This branch is quite a bit ahead of the branch Stefano used. There are 94
more commits just for Arm-specific code.
What I am interested in is seeing if we are able to reproduce Stefano's
number with the same branch, so we can have a clue whether there is a
slowdown introduced in new code.
Stefano, you mentioned you would look at reproducing the numbers. Do you
have any update on this?
No, I haven't had any time. Aside from the Xen version, another
difference is the interrupt source. I used the physical timer for
testing.
Julien Grall
2018-11-01 20:35:03 UTC
Permalink
Hi Stefano,
Post by Stefano Stabellini
Post by Milan Boberic
Hi,
Post by Julien Grall
Interesting. Could you confirm the commit you were using (or the point
release)?
Stefano's number were based on commit "fuzz: update README.afl example"
55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a which is an unreleased version
of Xen.
All the Xen versions I used are from the Xilinx git repository, because I
have an UltraZed-EG board, which has a Zynq UltraScale SoC.
Under branches you can find Xen 4.8, 4.9, etc.
I always used the latest commit: c227fe68589bdfb36b85f7b78c034a40c95b9a30
https://github.com/Xilinx/xen/tree/xilinx/stable-4.9
This branch is quite a bit ahead of the branch Stefano used. There are 94
more commits just for Arm-specific code.
What I am interested in is seeing if we are able to reproduce Stefano's
number with the same branch, so we can have a clue whether there is a
slowdown introduced in new code.
Stefano, you mentioned you would look at reproducing the numbers. Do you
have any update on this?
No, I haven't had any time. Aside from the Xen version, another
difference is the interrupt source. I used the physical timer for
testing.
I would actually be surprised if the interrupt latency under
virtualization varied depending on the interrupt source...

If that were the case, then measuring the latency on the physical timer
interrupt (unlikely to be used by a virtualized guest) would have been
quite pointless.

Cheers,
--
Julien Grall
Andrii Anisov
2018-11-20 11:33:31 UTC
Permalink
Hello Stefano,
Post by Stefano Stabellini
No, I haven't had any time. Aside from the Xen version, another
difference is the interrupt source. I used the physical timer for
testing.
Could you share your approach for interrupt latency measurement? Are
you using any HW specifics, or is it SoC-independent?

I would like to get more evidence for the optimizations of the
gic/vgic/gic-v2 code I did for our customer (it's about the old vgic;
we are still on Xen 4.10).
--
Sincerely,
Andrii Anisov.
Stefano Stabellini
2018-11-27 21:27:33 UTC
Permalink
Post by Andrii Anisov
Hello Stefano,
Post by Stefano Stabellini
No, I haven't had any time. Aside from the Xen version, another
difference is the interrupt source. I used the physical timer for
testing.
Could you share your approach for interrupt latency measurement? Are you
using any HW specifics, or is it SoC-independent?
I would like to get more evidence for the optimizations of the gic/vgic/gic-v2
code I did for our customer (it's about the old vgic; we are still on Xen 4.10).
Hi Andrii,

See the following:

https://marc.info/?l=xen-devel&m=148668817704668

The numbers have improved now thanks to vwfi=native and other
optimizations, but the mechanics of setting up the experiment are the same.

Cheers,

Stefano
Andrii Anisov
2018-11-29 08:19:44 UTC
Permalink
Hello Stefano,
Post by Stefano Stabellini
Hi Andrii,
https://marc.info/?l=xen-devel&m=148668817704668
Thank you for the pointer. I remember this email, but I missed that it
also gives details on setting up the experiment. It looks like the
bare-metal app is not SoC-specific, so I'm going to start using it.
Post by Stefano Stabellini
The numbers have improved now thanks to vwfi=native and other
optimizations, but the mechanics of setting up the experiment are the same.
I know about `vwfi=native`, but it does not fit our requirements :(
--
Sincerely,
Andrii Anisov.
Andrii Anisov
2018-12-10 12:23:46 UTC
Permalink
Hello Julien,
Post by Julien Grall
What are the numbers without Xen?
Good question. Didn't try. At least putchar would need to be implemented for that.
Post by Julien Grall
Which version of Xen are you using?
This morning's staging, commit-id 58eb90a9650a8ea73533bc2b87c13b8ca7bbe35a.
Post by Julien Grall
This also tells you that in the trap case the vGIC is not the biggest overhead.
Indeed, not the biggest. But it is significant even in this trivial case (receiving an interrupt twice a second).
Post by Julien Grall
This is with all your series applied but [4], correct?
Right.
Post by Julien Grall
Did you try to see the performance improvement patch by patch?
No. Not yet.
--
Sincerely,
Andrii Anisov.
Julien Grall
2018-11-07 13:14:55 UTC
Permalink
Hi Dario,
Post by Dario Faggioli
Post by Milan Boberic
Hi,
Hi Milan,
Post by Milan Boberic
I'm testing Xen Hypervisor 4.10 performance on UltraZed-EG board with
carrier card.
I created bare-metal application in Xilinx SDK.
- start triple timer counter (ttc) which generates
interrupt every 1us
- turn on PS LED
- call function 100 times in for loop (function that sets
some values)
- turn off LED
- stop triple timer counter
- reset counter value
Ok, I'm adding Stefano, Julien, and a couple of other people interested
in RT/lowlat on Xen.
Post by Milan Boberic
- used null scheduler (sched=null) and vwfi=native
- bare-metal application have one vCPU and it is pinned for pCPU1
- domain which is PetaLinux also have one vCPU pinned for pCPU0,
other pCPUs are unused.
Under Xen Hypervisor I can see 3us jitter on oscilloscope.
So, this is probably me not being familiar with Xen on Xilinx (and with
Xen on ARM as a whole), but there's a few things I'm not sure I
- you say you use sched=null _and_ pinning? That should not be
necessary (although, it shouldn't hurt either)
- "domain which is PetaLinux", is that dom0?
IAC, if it's not terrible hard to run this kind of test, I'd say, try
without 'vwfi=native', and also with another scheduler, like Credit,
(but then do make sure you use pinning).
Post by Milan Boberic
When I ran same bm application with JTAG from Xilinx SDK (without Xen
Hypervisor, directly on the board) there is no jitter.
Here, when you say "without Xen", do you also mean without any
baremetal OS at all?
Post by Milan Boberic
I'm curios what causes this 3us jitter in Xen (which isn't small
jitter at all) and is there any way of decreasing it?
Right. So, I'm not sure I've understood the test scenario either. But
yeah, 3us jitter seems significant. Still, if we're comparing with
bare-hw, without even an OS at all, I think it could have been expected
for latency and jitter to be higher in the Xen case.
Anyway, I am not sure anyone has done a kind of analysis that could
help us identify accurately from where things like that come, and in
what proportions.
It would be really awesome to have something like that, so do go ahead
if you feel like it. :-)
I think tracing could help a little (although we don't have a super-
sophisticated tracing infrastructure like Linux's perf and such), but
sadly enough, that's still not available on ARM, I think. :-/
FWIW, I just posted a series [1] to add xentrace support on Arm. Hopefully
we can get this merged for Xen 4.12.
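Once it is in, the usual workflow from x86 should apply, something along
these lines (event mask from memory, do check the xentrace manpage):

xentrace -D -e 0x0002f000 /tmp/trace.bin   # capture, e.g., scheduler events
xenalyze /tmp/trace.bin                    # post-process the trace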

Cheers,

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg00563.html
Post by Dario Faggioli
Regards,
Dario
--
Julien Grall