[Xen-devel] xen/arm: Software Step ARMv8

Discussion:

[Xen-devel] xen/arm: Software Step ARMv8 - PC stuck on instruction

Florian Jakobsmeier

2017-07-04 12:30:46 UTC

Hello all,

I'm trying to implement a single step functionality for XEN on ARMv8 using
"Software Step Exceptions". My problem with this is, that after taking the
exception the PC will stay on the same instruction.

By adding a "singlestep_enabled" flag in the "struct arch_domain" (based on
the single step mechanism for x86), I'm able to set the needed registers
(namely MDSCR_EL1.SS , SPSR_EL2.SS, MDCR_EL2.TDE ) for each vcpu that is
used by a given domain (referenced by its domain_id).
Within the "arch/arm/traps.c:leave_hypervisor_tail()" function, which is
called when exiting the hypervisor (according to /arch/arm/arm64/entry.S),
I am checking the singlestep_enabled flag and set the registers (by this i
can assure that each register is set on every vm entry). Also I'm checking
that the registers are set for the correct domain and vcpu (by examining
current->domain)

In comparison with the ARM ArchManual State machine(ARM DDI 0487B.a: Page
1957) an instruction that should be single-stepped, will be executed when
"ERET setting PSTATE.SS to 1".
For this to happen, specific conditions should be met. Table D2-22 on page
1959 defines which Table sets this condition (in my case :{MDSCR_EL1.SS=1,
Lock=False, NS=1, TDE=1} ).

Because I'm routing the exception from EL1 to EL2 (because of their naming
convention "From EL = EL2" and "Target EL=EL1, according to Page 1959) with
KDE=1, PSTATE.D=1 (monitored by printing MDSCR_EL1 and cpu_user_regs.cpsr)
the system should copy SPSR_EL2.SS to PSTATE.SS when executing ERET.

The state machine dictates, that when PSTATE.SS=1 the system should be in
the "Active-not-pending" state and after this should execute the
single-stepped instruction, which should increase the PC.
But because my PC stays constant, the state in which the system is, should
be the Active-Pending state.

By printing the PC value within the Exception handler
(xen/arch/arm/traps.c:do_trap_guest_sync()) I can see the exceptions will
be generated (otherwise there would be no prints) and the PC stays on the
same value, which results in a not working VM.

Following is the code, that is use to setup single stepping:

--- original_xen/xen/xen/arch/arm/traps.c 2017-07-04 13:58:09.526280389

+0200
+++ xen/xen/arch/arm/traps.c 2017-07-04 13:48:48.146066332 +0200
@@ -1247,6 +1247,7 @@
asmlinkage void do_trap_guest_sync(struct cpu_user_regs *regs)
{
const union hsr hsr = { .bits = regs->hsr };
enter_hypervisor_head(regs);
switch (hsr.ec) {
/*
@@ -2917,6 +2931,7 @@
#endif
+ do_trap_software_step(regs);
+ break;
gprintk(XENLOG_WARNING,
"Unknown Guest Trap. HSR=0x%x EC=0x%x IL=%x
Syndrome=0x%"PRIx32"\n",
Extended the Switch case in trap_guest_sync_handler to support singlestep

on ARMv8. Defined "HSR_EC_SOFTSTEP_LOWER_EL"=0x32 in
/xen/include/asm/processor.h

+asmlinkage void do_trap_software_step(struct cpu_user_regs *regs)
+{
+ /*inform dom0*/
+ //PC to next instruction
+ gprintk(XENLOG_ERR, "SPSR_EL2 = 0x%lx Regs.SPSR = 0x%x\n",
READ_SYSREG(SPSR_EL2) ,regs->cpsr);
+}
+

Handler method that will be called when a software step exception is
catched by the hypervisor (currently just prints various information). This
is also the function, which allows me to check whether or not the PC was
increased.

asmlinkage void leave_hypervisor_tail(void)
{
+ /*This methode will be called after the 'guest_entry' macro in
/arch/arm64/entry.S set guest registers
+ Check single_step_enabled flag in domain struct here and set needed
registers
+
+ */
+
+ struct vcpu *v = current;
+
+ if ( unlikely(v->domain->arch.monitor.singlestep_enabled ) )
+ {
+
+ WRITE_SYSREG(READ_SYSREG(MDCR_EL2) | HDCR_TDE, MDCR_EL2);
+ WRITE_SYSREG(READ_SYSREG(SPSR_EL2) | 0x200000, SPSR_EL2 );
+ WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) | 0x1, MDSCR_EL1);
+
+ if (!(v->arch.single_step ))
+ {
+ gprintk(XENLOG_ERR, "Setting vcpu=%d for
domain=%d\n",v->vcpu_id,v->domain->domain_id);
+
+ gprintk(XENLOG_ERR, "[Set_singlestep] MDSCR_EL1 0x%lx\n",
READ_SYSREG(MDSCR_EL1));
+ gprintk(XENLOG_ERR, "[Set_singlestep] SPSR_EL2 0x%lx\n",
READ_SYSREG(SPSR_EL2));
+ gprintk(XENLOG_ERR, "[Set_singlestep] MDCR_EL2 0x%lx\n",
READ_SYSREG(MDCR_EL2));
+ v->arch.single_step = 1;
+
+ return;
+ }else
+ {
+ //gprintk(XENLOG_ERR, "Register for vcpu=%d for domain=%d
already set\n",v->vcpu_id,v->domain->domain_id);
+ }
+ }

As mentioned, this function will set the needed registers.
"monitor.singlestep_enabled" is the domain SS flag which is used to
determine if the registers should be set. "arch.single_step" is the vcpu
flag to check if the register were already set once (not really in use as
for now). "HDCR_TDE" is the same value as "MDCR_EL2_TDE" would be, but this
one is not implemented yet, thats why I'm using HDCR_TDE. "SPSR_EL2 |
0x200000" sets the SS bit for EL2 (because our exception will be taken to
the hypervisor). "MDSCR_EL1 | 0x1" to enable the SS bit.
Because I'm checking the domain in this function, every vcpu that will be
used, will be set with the values above. By this I can assure that each
vcpu will trigger these exceptions.

--- original_xen/xen/xen/arch/arm/monitor.c 2017-07-04

13:58:09.522280302 +0200
+++ xen/xen/arch/arm/monitor.c 2017-07-04 10:37:09.553642139 +0200
@@ -28,6 +28,10 @@

int arch_monitor_domctl_event(struct domain *d,
struct xen_domctl_monitor_op *mop)

{
struct arch_domain *ad = &d->arch;
bool_t requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op);
switch ( mop->event )
@@ -45,6 +49,168 @@
break;
}
+ {
+ /*Adapted from x8/singlestepping*/
+
+ bool_t old_status = ad->monitor.singlestep_enabled;
+
+ if ( unlikely(old_status == requested_status) )
+ return -EEXIST;
+ gprintk(XENLOG_ERR, "Setting singlestep enabled to %x\n",
requested_status);
+ gprintk(XENLOG_ERR, "Anzahl VCPUs=%d in Domain %d\n",
d->domain_id, d->max_vcpus);
+ gprintk(XENLOG_ERR, "Setting singlestep Flag for Domain=%x\n",
d->domain_id);
+
+ domain_pause(d);
+ ad->monitor.singlestep_enabled = requested_status;
+ domain_unpause(d);

This method will be called through the /tools/tests/xen-access tool test
and sets the domain flag in order to enable single step.

My guess is that (in relation to the state machine of software stepping) my
implementation misses something for the ERET instruction to copy the
correct value to PSTATE.SS, even though the table D2-24 (page 1961) should
indicate that the SPSR_EL2.SS bit will be written.

I would be thankful if somebody who is familiar with the ARM debug
architecture could help me find the necessary information to resolve this
problem

Greetings Florian

Julien Grall

2017-07-04 18:37:12 UTC

Permalink

Post by Florian Jakobsmeier
Hello all,

Hi Florian,

Post by Florian Jakobsmeier
asmlinkage void leave_hypervisor_tail(void)
{
+ /*This methode will be called after the 'guest_entry' macro in
/arch/arm64/entry.S set guest registers
+ Check single_step_enabled flag in domain struct here and set
needed registers
+
+ */
+
+ struct vcpu *v = current;
+
+ if ( unlikely(v->domain->arch.monitor.singlestep_enabled ) )
+ {
+
+ WRITE_SYSREG(READ_SYSREG(MDCR_EL2) | HDCR_TDE, MDCR_EL2);
+ WRITE_SYSREG(READ_SYSREG(SPSR_EL2) | 0x200000, SPSR_EL2 );
+ WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) | 0x1, MDSCR_EL1);
+
+ if (!(v->arch.single_step ))
+ {
+ gprintk(XENLOG_ERR, "Setting vcpu=%d for
domain=%d\n",v->vcpu_id,v->domain->domain_id);
+
+ gprintk(XENLOG_ERR, "[Set_singlestep] MDSCR_EL1
0x%lx\n", READ_SYSREG(MDSCR_EL1));
+ gprintk(XENLOG_ERR, "[Set_singlestep] SPSR_EL2
0x%lx\n", READ_SYSREG(SPSR_EL2));
+ gprintk(XENLOG_ERR, "[Set_singlestep] MDCR_EL2
0x%lx\n", READ_SYSREG(MDCR_EL2));
+ v->arch.single_step = 1;
+
+ return;
+ }else
+ {
+ //gprintk(XENLOG_ERR, "Register for vcpu=%d for
domain=%d already set\n",v->vcpu_id,v->domain->domain_id);
+ }
+ }
As mentioned, this function will set the needed registers.
"monitor.singlestep_enabled" is the domain SS flag which is used to
determine if the registers should be set. "arch.single_step" is the vcpu
flag to check if the register were already set once (not really in use
as for now). "HDCR_TDE" is the same value as "MDCR_EL2_TDE" would be,
but this one is not implemented yet, thats why I'm using HDCR_TDE.
"SPSR_EL2 | 0x200000" sets the SS bit for EL2 (because our exception
will be taken to the hypervisor). "MDSCR_EL1 | 0x1" to enable the SS bit.
Because I'm checking the domain in this function, every vcpu that will
be used, will be set with the values above. By this I can assure that
each vcpu will trigger these exceptions.

SPSR_EL2 is saved/restored on entry and exit of a trap to the hypervisor
(see arch/arm/arm*/entry.S). So the value you wrote in the register is
overridden afterwards.

If you want to set the SS bit, you need to do in the save registered
cpsr. You can access using:

guest_cpu_user_regs()->cpsr |= 0x200000;

Cheers,

--
Julien Grall

Florian Jakobsmeier

2017-07-05 14:03:00 UTC

Permalink

Post by Julien Grall

Post by Florian Jakobsmeier
Hello all,

Hi Florian,
asmlinkage void leave_hypervisor_tail(void)

Post by Florian Jakobsmeier
{
+ /*This methode will be called after the 'guest_entry' macro in
/arch/arm64/entry.S set guest registers
+ Check single_step_enabled flag in domain struct here and set
needed registers
+
+ */
+
+ struct vcpu *v = current;
+
+ if ( unlikely(v->domain->arch.monitor.singlestep_enabled ) )
+ {
+
+ WRITE_SYSREG(READ_SYSREG(MDCR_EL2) | HDCR_TDE, MDCR_EL2);
+ WRITE_SYSREG(READ_SYSREG(SPSR_EL2) | 0x200000, SPSR_EL2 );
+ WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) | 0x1, MDSCR_EL1);
+
+ if (!(v->arch.single_step ))
+ {
+ gprintk(XENLOG_ERR, "Setting vcpu=%d for
domain=%d\n",v->vcpu_id,v->domain->domain_id);
+
+ gprintk(XENLOG_ERR, "[Set_singlestep] MDSCR_EL1
0x%lx\n", READ_SYSREG(MDSCR_EL1));
+ gprintk(XENLOG_ERR, "[Set_singlestep] SPSR_EL2
0x%lx\n", READ_SYSREG(SPSR_EL2));
+ gprintk(XENLOG_ERR, "[Set_singlestep] MDCR_EL2
0x%lx\n", READ_SYSREG(MDCR_EL2));
+ v->arch.single_step = 1;
+
+ return;
+ }else
+ {
+ //gprintk(XENLOG_ERR, "Register for vcpu=%d for
domain=%d already set\n",v->vcpu_id,v->domain->domain_id);
+ }
+ }
As mentioned, this function will set the needed registers.
"monitor.singlestep_enabled" is the domain SS flag which is used to
determine if the registers should be set. "arch.single_step" is the vcpu
flag to check if the register were already set once (not really in use as
for now). "HDCR_TDE" is the same value as "MDCR_EL2_TDE" would be, but this
one is not implemented yet, thats why I'm using HDCR_TDE. "SPSR_EL2 |
0x200000" sets the SS bit for EL2 (because our exception will be taken to
the hypervisor). "MDSCR_EL1 | 0x1" to enable the SS bit.
Because I'm checking the domain in this function, every vcpu that will be
used, will be set with the values above. By this I can assure that each
vcpu will trigger these exceptions.

SPSR_EL2 is saved/restored on entry and exit of a trap to the hypervisor
(see arch/arm/arm*/entry.S). So the value you wrote in the register is
overridden afterwards.
If you want to set the SS bit, you need to do in the save registered cpsr.
guest_cpu_user_regs()->cpsr |= 0x200000;
This solved the problem. Thank you
Cheers,
--
Julien Grall

Greetings
Florian

James Morse

2017-08-03 09:49:06 UTC

Permalink

Hi Florian, Julien,

i was just testing the single step implementation and realized that the
before mentioned solution is not fully working. I'm still trying to
enable SS for a VM on Xen.
To test my implementation i wrote a small Kernel Module and started it
in the DomU. The module only contains a loop which increments a counter
and prints its value.
Right after loading the moduleI start the single step mechanism in the
Dom0 for the VM (again with xen-access).
As soon as i start the SS the VM will stop working.
The ARM ARM (D2-1956 - ARM DDI 0487B.a ID033117) states that, in order
A debugger enables MDSCR_EL1.SS = 1
Executes an ERET

...with SPSR.SS = 1, and you need to ERET with PSTATE.D disabled (which I assume
Xen always has).

This then becomes the guest's PSTATE.SS bit, which suppresses the single-step
exception until it has stepped one instruction.

The PE executes the instruction to be single-stepped
Takes a software step exception on the next instruction
My guess is that by setting the needed SS registers ever time when we
leave the guest, the configuration won't allow the guest to execute the
"to be single stepped instruction"
Before executing the (first) instruction the VM will generate the SS
exception (as desired). In the hypervisor we will set the SS registers
again, which could hinder the VM to execute the instruction (which we
want because we already generated an SS exception for this instruction)
and instead generate a second SS exception for it. This will lead to the
second PC print in the single step handler
But I'm not able to find any proof for this.

I'm afraid the ARM-ARM could be clearer about how this works. (It's had me
confused all week!).

The state machine in figure D2-4 (D2.12.3 the software step state machine)
should help. I haven't looked at Xen (or your patch), but from your description
it sounds like you are entering the guest with the debug state machine in
active-pending state, then taking a single-step exception immediately. You never
actually execute the instruction-to-be-stepped.

Instead you want to enter the guest in active-not-pending state, the rules for
this are in 'D2.12.4 Entering the active-not-pending state'.

Thanks,

James

Florian Jakobsmeier

2017-08-03 10:16:29 UTC

Permalink

Hey Julien,

Would you mind sharing the latest version of your code?
Of course not. This is the current version:

asmlinkage void leave_hypervisor_tail(void)

{
+ /*This methode will be called after the 'guest_entry' macro in
/arch/arm64/entry.S set guest registers
+ Check single_step_enabled flag in domain struct here and set needed
registers
+ */
+
+ struct vcpu *v = current;
+
+ if ( unlikely(v->domain->arch.monitor.singlestep_enabled ) )
+ {
+
+
+
+ if(!(guest_cpu_user_regs()->cpsr & 0b1000))
+ {
+ WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) | 0x1, MDSCR_EL1);
+ WRITE_SYSREG(READ_SYSREG(MDCR_EL2) | HDCR_TDE, MDCR_EL2);
+ guest_cpu_user_regs()->cpsr = guest_cpu_user_regs()->cpsr |
0x200000;
+ WRITE_SYSREG( READ_SYSREG(DAIF) & ~0x200, DAIF);
+ isb();
+ v->arch.single_step = 1;
+
+ }
+
+
+ }else
+ {
+ //single_step Domain flag not set
+ if( v->arch.single_step )
+ {
+ gprintk(XENLOG_ERR, "Domain flag not set, but vcpu flag is
set\n");
+ WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) & ~0x1, MDSCR_EL1);
+ guest_cpu_user_regs()->cpsr = guest_cpu_user_regs()->cpsr &
~0x200000;
+ //WRITE_SYSREG(READ_SYSREG(SPSR_EL2) | 0x200000, SPSR_EL2 );
+ WRITE_SYSREG( READ_SYSREG(DAIF) & ~0x200, DAIF);
+ v->arch.single_step = 0;
+ }
+
+ }
+
while (1)
{
local_irq_disable();

This code does still the same. Check if domain flag is set, if so check if
SS bit has to be set. If Domain flag is not set, clear SS related bits.

Did you look where the PCs point to? Is it your kernel module?

Yes. I compared it with my Linux Image (using objdump) and found that

these instructions are within a spinlock (the function is called
_raw_spin_lock to be exact). My module is always loaded around the address
0xffff0000008e0000. Also I could see that its not the case that the
addresses are printed twice but the VM just keeps within the spinlock
(which created problems with printing).

By tracing every single step I could determine the function sequence that
is called. It starts out with an <el1_irq> and stops with an
<hrtimer_interrupt> that keeps getting locked in <_raw_spin_lock>.
In order to solve this, I compared my solution with the KVM one, where I
saw (at least for my understanding) that they disable Interrupts for the
VM. In the KVM file: /kvm/virt/arm/arm.c the function
"kvm_arch_vcpu_ioctl_run" handles the running of the VM. The
"kvm_arm_setup_debug" function does the same steps as I do in order to
enable software step exceptions.
So I can't see any difference there.

I also modified xen-access so that the singlestepp will be startet with an
SMC from the Guest. Additionally i wrote a second test kernel module which
only executes an SMC and than will be stopped. I can trace the two SMCs and
than the course to the spinlock (I put the trace below). While in the
spinlock the VM won't response to anything. But after disabling singlestep
it starts to work again.
Enabled singlestep directly (not with an SMC) results in the VM to be
locked in the spinlock immediately.
There is also a problem with using printk within my module for the same
reason. It will always end in the spinlock.

One reason I could imaging is that because I'm singlestepping everything,
including timer interrupts, there will be problems with the scheduling of
the VM. This results in, not meeting the conditions to exit the spinlock.

I hope i made the situation as clear as possible.

Thank you for your help
Florian

This is the function trace obtained by singlestepping every instruction
until the system reached the spinlock below. The two SMCs from my Module
are needed for my xen-access implementation to ensure that the SS starts
withing my module

Singlestep request found
xenaccess init
max_gpfn = 60000
Run singlestep with commands? (y/n):y
Starting loop
Privileged call: pc=ffff0000008e0000 (vcpu 0)
Privileged call: pc=ffff0000008e0004 (vcpu 0)
Singlestep: PC=ffff000008081a80 b ffff000008082700 <el1_irq>
Singlestep: PC=ffff000008082700 <el1_irq>
[...]
[...]
[...]
[...]
[...]
Singlestep: PC=ffff0000080c1714 return to <irq_enter>
[...]
[...]
[...]
[...]
[...]
[...]
[...]
[...]
[...]
Singlestep: PC=ffff000008105b18 handle_percpu_devid_irq
[...]
/linux/driver/clocksource/arm_arch_timer.c
[...]
[...]
Singlestep: PC=ffff0000088cbd54
Singlestep: PC=ffff0000088cbd58
Singlestep: PC=ffff0000088cbd5c
Singlestep: PC=ffff0000088cbd60
Singlestep: PC=ffff0000088cbd64
Singlestep: PC=ffff0000088cbd68 This code block gets repeated until
SS disabled
Singlestep: PC=ffff0000088cbd6c
Singlestep: PC=ffff0000088cbd70
Singlestep: PC=ffff0000088cbd64
Singlestep: PC=ffff0000088cbd68
Singlestep: PC=ffff0000088cbd6c
Singlestep: PC=ffff0000088cbd70

James Morse

2017-08-03 10:46:17 UTC

Permalink

Hi Florian,

asmlinkage void leave_hypervisor_tail(void)

This tests SPSR.M[4], which is set for exits from AARCH32.

+ {
+ WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) | 0x1, MDSCR_EL1);

Sets MDSCR.EL1.SS to enable the state machine.

+ WRITE_SYSREG(READ_SYSREG(MDCR_EL2) | HDCR_TDE, MDCR_EL2);

Sets TDE to trap debug exceptions to EL2 from lower exception levels. I'm
surprised this isn't always set for Xen. Do you allow guests to use the debug
features for their own purposes?

+ guest_cpu_user_regs()->cpsr = guest_cpu_user_regs()->cpsr |
0x200000;

Sets SPSR.SS to suppress the step exception in the guest until its executed an
instruction.

+ WRITE_SYSREG( READ_SYSREG(DAIF) & ~0x200, DAIF);

Here I'm confused. This looks like you are clearing PSTATE.D from the DAIF
register for EL2. This means debug exceptions are unmasked for exceptions from
Xen at EL2.

If you ERET with PSTATE.D clear the SPSR.SS bit will be discarded instead. I
thin this is your bug. The rules in 'D2.12.4 Entering the active-not-pending
state' require 'Debug exceptions are disabled from the current Exception level'.

You've set MDSCR_EL1.SS, if you also have MDSCR_EL1.KDE set you will start
single-stepping Xen once the register writes take effect, (which may be before
or after this:)

+ isb();

What are you synchronising here? (Unless you want to single-step Xen I suspect
you don't need this at all.)

Do you still see the issue if you remove the PSTATE.D write?

Thanks,

James

Julien Grall

2017-08-03 11:08:00 UTC

Permalink

Post by Julien Grall
Hi Florian,

asmlinkage void leave_hypervisor_tail(void)

This tests SPSR.M[4], which is set for exits from AARCH32.

+ {
+ WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) | 0x1, MDSCR_EL1);

Sets MDSCR.EL1.SS to enable the state machine.

+ WRITE_SYSREG(READ_SYSREG(MDCR_EL2) | HDCR_TDE, MDCR_EL2);

Sets TDE to trap debug exceptions to EL2 from lower exception levels. I'm
surprised this isn't always set for Xen. Do you allow guests to use the debug
features for their own purposes?

No. We already trap debug exceptions to EL2 by default and MDCR_EL2
should already be configured correctly for that.

Post by Julien Grall

+ guest_cpu_user_regs()->cpsr = guest_cpu_user_regs()->cpsr |
0x200000;

Sets SPSR.SS to suppress the step exception in the guest until its executed an
instruction.

+ WRITE_SYSREG( READ_SYSREG(DAIF) & ~0x200, DAIF);

Here I'm confused. This looks like you are clearing PSTATE.D from the DAIF
register for EL2. This means debug exceptions are unmasked for exceptions from
Xen at EL2.
If you ERET with PSTATE.D clear the SPSR.SS bit will be discarded instead. I
thin this is your bug. The rules in 'D2.12.4 Entering the active-not-pending
state' require 'Debug exceptions are disabled from the current Exception level'.
You've set MDSCR_EL1.SS, if you also have MDSCR_EL1.KDE set you will start
single-stepping Xen once the register writes take effect, (which may be before
or after this:)

+ isb();

What are you synchronising here? (Unless you want to single-step Xen I suspect
you don't need this at all.)

Xen is been over cautious with isb at the moment :). I think this one is
not necessary because you will affect EL1/EL0 context and it will be
synchronized on the eret.

--
Julien Grall

Florian Jakobsmeier

2017-08-03 12:29:39 UTC

Permalink

Post by James Morse
Sets TDE to trap debug exceptions to EL2 from lower exception levels. I'm

Post by James Morse
surprised this isn't always set for Xen. Do you allow guests to use the debug
features for their own purposes?

No. We already trap debug exceptions to EL2 by default and MDCR_EL2 should
already be configured correctly for that.
Ok my thoughts were that i just want to ensure that this bit is set.
+ WRITE_SYSREG( READ_SYSREG(DAIF) & ~0x200, DAIF);

Post by James Morse
Here I'm confused. This looks like you are clearing PSTATE.D from the DAIF
register for EL2. This means debug exceptions are unmasked for exceptions from
Xen at EL2.
If you ERET with PSTATE.D clear the SPSR.SS bit will be discarded instead. I
thin this is your bug. The rules in 'D2.12.4 Entering the
active-not-pending
state' require 'Debug exceptions are disabled from the current Exception level'.
I'm definitely checking this once again.

You've set MDSCR_EL1.SS, if you also have MDSCR_EL1.KDE set you will start

Post by James Morse
single-stepping Xen once the register writes take effect, (which may be before
or after this:)
+ isb();
What are you synchronising here? (Unless you want to single-step Xen I suspect
you don't need this at all.)

Xen is been over cautious with isb at the moment :). I think this one is
not necessary because you will affect EL1/EL0 context and it will be
synchronized on the eret.

This was, once again, just to make sure that everything is synchronized.
But I'm going to remove it if its redundant.

So as far as I understood both of you don't see a general problem with
(timer) interrupts or the scheduler while being single stepped? Because in
my opinion after enabling singlestep the system will go into a "spinlock"
routine.

Adapting your recommendations doesn't change the behavior.
I'm still able to step over each instruction, but the control flow does not
follow my module but rather executes my SMC to start SS and then enters the
before mentioned procedure.

James Morse

2017-08-03 13:02:52 UTC

Permalink

Hi Florian,

Post by Florian Jakobsmeier
So as far as I understood both of you don't see a general problem with
(timer) interrupts or the scheduler while being single stepped? Because in
my opinion after enabling singlestep the system will go into a "spinlock"
routine.

Interrupts taken to EL2 will cause PSTATE.SS to be saved in SPSR_EL2.SS. This is
then restored by the ERET (provided Xen's PSTATE.D bit is set).

If its a virtual interrupt taken to EL1, you will end up stepping the interrupt
handler.

Post by Florian Jakobsmeier
Adapting your recommendations doesn't change the behavior.
I'm still able to step over each instruction, but the control flow does not
follow my module but rather executes my SMC to start SS and then enters the
before mentioned procedure.

again?

SMC... Xen runs at EL2 so you must be trapping this. If the SMC is taken as trap
the ELR isn't updated to point to the instruction after the SMC, you have to do
this yourself. (See the 'note' for HCR_EL2.TSC in 'D1.15.3 EL2 configurable
controls')

SMC is also a corner case for single step. The PSTATE.SS bit isn't saved in the
SPSR. See Table D2-25 in 'D2.12.5 Behaviour in the active-not-pending state'.

Thanks,

James

Florian Jakobsmeier

2017-08-03 16:00:46 UTC

Permalink

Hello James, Julien,

regarding your previous mails. I was able to single step every instruction
of my module. The problem (or rather the solution) was to _disable_ the IRQ
interrupts from within my guest module. This solves the problem of
singlestepping a module which previously ended in a spinlock. But it does
not solve the problem with system that is singlestepped with enabled IRQ's,
as it still will be locked within a spinlock.

My module starts the singlestep routine by executing an SMC instruction.
This will be catched from within my xen-access in the Dom0 (which also will
increase the PC accordingly). Xen-access will then trigger an event to set
the domain flag for my VM (the Domain flag determines if SS should be
enabled when entering the guest) and by this start singlestepping. Every SS
exception is also caught within xen-access which then lets me decide how to
continue the execution.

This is my module code which is executed in the DomU:

int init_module()

{
printk(KERN_INFO "### Init address 0x%lx\n", &init_module);
printk(KERN_INFO " Set function hook\n");
patch_function_hook();
printk(KERN_INFO " Starting singlestep\n");

This following part is the importand bit. I disable the local_irq's and
execute two SMCs.

local_irq_disable();
__asm__ __volatile__ ("SMC 1");
__asm__ __volatile__ ("SMC 1");
if(!already_trapped)
{
__asm__ __volatile__ ("NOP");
__asm__ __volatile__ ("NOP");
__asm__ __volatile__ ("NOP");
//Just for keeping module busy while singlestep
for ( c = 0 ; c < n ; c++ )

When executing these exact steps, it is possible to singlestep the whole
module. Without the local_irq_disable() the system will stop the module
execution right after the first SMC.

My Xen Access version got, just like the original one, a main loop where

printf("Starting loop\n" );
for (;;)
{
if ( interrupted )
{
[...]
}
while ( RING_HAS_UNCONSUMED_REQUESTS(&xenaccess->vm_event.back_ring)
)
{
[...]
switch (req.reason) {
pc=%"PRIx64" (vcpu %d)\n",
req.data.regs.arm.pc,
req.vcpu_id);
if (!ss_started)
{
ss_started = 1;
toggle_single_step_domain(1);

This part enables singlestepping the VM with the first SMC. (The function
toggle_single_step_domain just sets the domain flag i mentioned in earlier
mails)

}else
{
printf("Signlestep already activated\n");
}
rsp.data.regs.arm = req.data.regs.arm;
rsp.data.regs.arm.pc += 4;

This is the part which sets the VM.PC to the next instruction after the SMC
instruction within the guest module.

printf("Singlestep: PC=%016"PRIx64", vcpu %d, \n",
req.data.regs.arm.pc,req.vcpu_id);
if(!run)
{
printf("Enter command: s:step; c:end; r:regs:;x:end :"
);
do
{
command = getchar();
}
while (isspace(command));
switch(command)
{
case 'c': printf("Singlestep weiterlaufen \n");
// rsp.flags |= VM_EVENT_FLAG_TOGGLE_
SINGLESTEP;
run = 1;
break;
[More possible commands here]
default: printf("Wrong command \"%c\"!\n",
command );
break;
}

The above part is the handling of software step exceptions. As you can
see, i print the PC value and read a command from the user in order to let
the guest execute further or print its registers.
These two are the main control flow of xen-access.

Finally my Xen Code within the leave_hypervisor_tail looks like this. The
leave_hypervisor_tail function is called right before the control flow
switches to the guest. Because of this, every time when the guest gets to
run, we set the bits needed to singlestep.

asmlinkage void leave_hypervisor_tail(void)

{
struct vcpu *v = current;
if ( unlikely(v->domain->arch.monitor.singlestep_enabled ) )
{
if(!(guest_cpu_user_regs()->cpsr & 0b1000))
{
WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) | 0x1, MDSCR_EL1);
WRITE_SYSREG(READ_SYSREG(MDCR_EL2) | HDCR_TDE, MDCR_EL2);
guest_cpu_user_regs()->cpsr = guest_cpu_user_regs()->cpsr |
0x200000;
v->arch.single_step = 1;
}
}else
{
//single_step Domain flag not set
if( v->arch.single_step )
{
gprintk(XENLOG_ERR, "Domain flag not set, but vcpu flag is
set\n");
WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) & ~0x1, MDSCR_EL1);
guest_cpu_user_regs()->cpsr = guest_cpu_user_regs()->cpsr &
~0x200000;
v->arch.single_step = 0;
}
}
while (1)
{
local_irq_disable();

[...]
So, I adapted your comments and as I mentioned. The general SingleStep
functionality works fine.

But: It's not possible to singlestep a system as long as the VM IRQ's are
enabled. If we would activate single stepping with enabled interrupts, we
will be locked in the mentioned spinlock.
Because of this it is not possible to singlestep other application.
Additionally it is not possible to print anything while singlestepping
because, as far as I understood, the system will wait within a spinlock
until the terminal is free to print.

Do you have any idea why it's not possible to escape the lock while
singlestepping? Like I mentioned, my guess is on timer interrupts, which
should unlock the spinlock but generate problems with singlestep enabled at
the same time. This would also explain why i can observe the control flow
of my guest module with IRQ's being disabled.

Below is an abstract of the program flow. The first one is with disabled
IRQ's. You can see that every instruction is from my module (as the
addresses are around 0xffff0000008e0000). The second one shows instructions
which, directly after the SMC, starts the "go to spinlock" routine because
of enabled IRQ's.

Kind regards,
Florian

Singlestep request found
Starting loop
++++++++++++++++++++++++++++Privileged call: pc=ffff0000008e0618 (vcpu 0)
Seeting Singlestep to 1
SingleStep succesfull rc=0
++++++++++++++++++++++++++++Privileged call: pc=ffff0000008e061c (vcpu 0)
Singlestep: PC=ffff0000008e0624, vcpu 0,
Singlestep: PC=ffff0000008e0628, vcpu 0,
Singlestep: PC=ffff0000008e062c, vcpu 0,
Singlestep: PC=ffff0000008e0630, vcpu 0,
Singlestep: PC=ffff0000008e0634, vcpu 0,
Singlestep request found
Starting loop

++++++++++++++++++++++++++++Privileged call: pc=ffff0000008e0000 (vcpu 0)
(vcpu 0)

Seeting Singlestep to 1
SingleStep succesfull rc=0

Singlestep: PC=ffff000008081a80
[...]
Singlestep: PC=ffff000008082700

[...]

Singlestep: PC=ffff0000080814e8

[...]

Singlestep: PC=ffff000008100f60

James Morse

2017-08-07 17:05:36 UTC

Permalink

Hi Florian,

Post by Florian Jakobsmeier
regarding your previous mails. I was able to single step every instruction
of my module. The problem (or rather the solution) was to _disable_ the IRQ
interrupts from within my guest module. This solves the problem of
singlestepping a module which previously ended in a spinlock. But it does
not solve the problem with system that is singlestepped with enabled IRQ's,
as it still will be locked within a spinlock.

If I understand correctly: if you try to single-step a spinlock then you get
stuck in a loop. The code you want to single-step doesn't take any spinlocks,
but if you take an IRQ, the IRQ handler does.

[...]

Post by Florian Jakobsmeier
int init_module()

{
printk(KERN_INFO "### Init address 0x%lx\n", &init_module);
printk(KERN_INFO " Set function hook\n");
patch_function_hook();
printk(KERN_INFO " Starting singlestep\n");
local_irq_disable();
__asm__ __volatile__ ("SMC 1");
__asm__ __volatile__ ("SMC 1");

(Back-to-back SMCs, this explains why I thought you were missing the PC-advance
logic).

Post by Florian Jakobsmeier

if(!already_trapped)
{
__asm__ __volatile__ ("NOP");
__asm__ __volatile__ ("NOP");
__asm__ __volatile__ ("NOP");
//Just for keeping module busy while singlestep
for ( c = 0 ; c < n ; c++ )

When executing these exact steps, it is possible to singlestep the whole
module. Without the local_irq_disable() the system will stop the module
execution right after the first SMC.

What do you mean by 'stop'? Options are:
Fail to make progress by taking IRQs all the time instead?
Fail to make progress, instead taking a single step exception on the same
instruction forever.
(or something else)

[...]

Post by Florian Jakobsmeier
But: It's not possible to singlestep a system as long as the VM IRQ's are
enabled. If we would activate single stepping with enabled interrupts, we
will be locked in the mentioned spinlock.
Because of this it is not possible to singlestep other application.
Additionally it is not possible to print anything while singlestepping
because, as far as I understood, the system will wait within a spinlock
until the terminal is free to print.
Do you have any idea why it's not possible to escape the lock while
singlestepping? Like I mentioned, my guess is on timer interrupts, which
should unlock the spinlock but generate problems with singlestep enabled at
the same time. This would also explain why i can observe the control flow
of my guest module with IRQ's being disabled.

I suspect you are trying to single-step Linux's spinlocks which use
load-exclusive/store-exclusive. (There is an annotated example in the ARM-ARM
'K9.3.1 Acquiring a lock').
LDAXR sets the 'exclusive monitor' and STXR only succeeds if the exclusive
monitor is still set. If another CPU accesses the memory protected by the
exclusive monitor, the monitor is cleared. This is how the spinlock code knows
it has to re-read its value and try to take the lock again.
Changing exception level also clears the exclusive monitor, so taking
single-step exception between a LDAXR/STXR pair means the loop has to be retried.

Ideally, you should avoid single-stepping atomic instruction sequences, as the
single-step mechanism has made them in-atomic, the CPU detects this and the code
then retries.

As a workaround (for just the case you are looking at), you could ask Xen's vgic
to empty the LR registers so that there are no pending guest interrupts, but
this doesn't guarantee you don't try and single-step atomic code.

The best way to fix this is for Xen to emulate the load/store exclusives. You
will need to ensure that no other vcpu for that domain is running between the
load and store. (This stops them from modifying the protected value, and means
they must retry their own load/store exclusives if they were in the same code).

You also have the problem that another vcpu may be holding the lock, so if the
single-stepped vcpu is not making progress with the emulated-and-single-stepped
sequence, you should run each other vcpu in the domain to see if they release
the lock.

Linux enters a low-power state by issuing a 'wfe' from the waiting CPU and a
'sev' from the releasing CPU, you may end up trapping these as you try and step
them.

This is going to be tricky to get right, I'm not aware of anything that allows
single-stepping of atomic regions like this today.

Thanks,

James