Discussion:
[RFH]: AMD CR intercept for lmsw/clts
Mukesh Rathor
2014-08-05 01:33:37 UTC
Permalink
Hi,

On AMD, clts/lmsw will cause "mov cr" vmexit, but unlike intel, they
can't be handled via svm_vmexit_do_cr_access and are emulated thru
handle_mmio() which is a problem for pvh because of:

handle_mmio():
..
ASSERT(!is_pvh_vcpu(curr));

AMD CR intercepts in svm.c:
case VMEXIT_CR0_READ ... VMEXIT_CR15_READ:
case VMEXIT_CR0_WRITE ... VMEXIT_CR15_WRITE:
if ( cpu_has_svm_decode && (vmcb->exitinfo1 & (1ULL << 63)) )
svm_vmexit_do_cr_access(vmcb, regs);
else if ( !handle_mmio() ) <==========
hvm_inject_hw_exception(TRAP_gp_fault, 0);
break;

Soooo, this leaves no choice but to make the ASSERT conditional
for intel only, and let handle_mmio go thru x86_emulate and let
x86_emulate fail for anything other than lmsw/clts? I was thinking
something like:

x86_emulate()
int fail_pvh_emul = 1;
...
case lmsw/clts:
.....
fail_pvh_emul = 0;

then
done:
if (fail_pvh_emul)
rc = X86EMUL_UNHANDLEABLE;
return rc;

Or, should I just create a new function for clts/lmsw and call it
directly from vmexit switch itself?

Can't think of any other clever way to do this... thoughts/suggestions?

thanks,
Mukesh
Jan Beulich
2014-08-05 07:46:20 UTC
Permalink
Post by Mukesh Rathor
Hi,
On AMD, clts/lmsw will cause "mov cr" vmexit, but unlike intel, they
can't be handled via svm_vmexit_do_cr_access and are emulated thru
..
ASSERT(!is_pvh_vcpu(curr));
if ( cpu_has_svm_decode && (vmcb->exitinfo1 & (1ULL << 63)) )
svm_vmexit_do_cr_access(vmcb, regs);
else if ( !handle_mmio() ) <==========
hvm_inject_hw_exception(TRAP_gp_fault, 0);
break;
Soooo, this leaves no choice but to make the ASSERT conditional
for intel only, and let handle_mmio go thru x86_emulate and let
x86_emulate fail for anything other than lmsw/clts? I was thinking
x86_emulate()
int fail_pvh_emul = 1;
...
.....
fail_pvh_emul = 0;
then
if (fail_pvh_emul)
rc = X86EMUL_UNHANDLEABLE;
return rc;
I'd strongly recommend against adding any such to x86_emulate():
There's nothing precluding the emulator to be used for PVH guests.
Post by Mukesh Rathor
Or, should I just create a new function for clts/lmsw and call it
directly from vmexit switch itself?
I'd prefer this - it seems pretty ugly to me that handle_mmio()/
x86_emulate() gets used for this purpose - but am not certain this
will actually work out nicely for other than CLTS: All the
instructions currently handled specially are ones with fixed
operands, and only CLTS fits that.

You'll btw have the same problem with SMSW and DRx accesses,
string I/O instructions, as well as (on older CPUs) with moves to/from
CRx and INVLPG.

In the case this doesn't turn out reasonable, rather than
manipulating handle_mmio() any further, I'd suggest to investigate
bypassing that function in favor of a more direct access to the x86
emulator. After all you're not after any MMIO emulation here, just
bare instructions (many of which without memory operands at all).

Jan
Andrew Cooper
2014-08-05 11:16:16 UTC
Permalink
This post might be inappropriate. Click to display it.
Jan Beulich
2014-08-05 12:11:43 UTC
Permalink
Post by Andrew Cooper
Post by Jan Beulich
I'd prefer this - it seems pretty ugly to me that handle_mmio()/
x86_emulate() gets used for this purpose - but am not certain this
will actually work out nicely for other than CLTS: All the
instructions currently handled specially are ones with fixed
operands, and only CLTS fits that.
You'll btw have the same problem with SMSW and DRx accesses,
string I/O instructions, as well as (on older CPUs) with moves to/from
CRx and INVLPG.
SMSW certainly, but what is wrong with the others? For those, the exit
info appears to give fully decoded information.
All the decoding info is conditional upon decoding assists being
available. And for string I/O handle_mmio() is being used kind of
unconditionally.
Post by Andrew Cooper
Post by Jan Beulich
In the case this doesn't turn out reasonable, rather than
manipulating handle_mmio() any further, I'd suggest to investigate
bypassing that function in favor of a more direct access to the x86
emulator. After all you're not after any MMIO emulation here, just
bare instructions (many of which without memory operands at all).
In the AMD System manual (vol 2), there are special cases listed for
lmsw, clts and smsw in section 15.8.1. In each case, bit 63 of
exitinfo1 will be clear (which is checked using magic numbers in the
code). It would appear that under these circumstances, the only
information provided is "it wasn't a mov instruction".
Right, except that at least for CLTS (not having any operands) the
light weight decoding could still be used. I'd even think decoding the
CRx/DRx moves would be okay as they only permit register
operands. But with LMSW and SMSW having memory operands it
would indeed start to become like re-implementing x86_emulate().
Question is whether one couldn't simply forbid PVH guests to use
them, as moves to/from CR0 provide all the needed functionality.

Jan
Andrew Cooper
2014-08-05 13:00:25 UTC
Permalink
Post by Jan Beulich
Post by Andrew Cooper
Post by Jan Beulich
I'd prefer this - it seems pretty ugly to me that handle_mmio()/
x86_emulate() gets used for this purpose - but am not certain this
will actually work out nicely for other than CLTS: All the
instructions currently handled specially are ones with fixed
operands, and only CLTS fits that.
You'll btw have the same problem with SMSW and DRx accesses,
string I/O instructions, as well as (on older CPUs) with moves to/from
CRx and INVLPG.
SMSW certainly, but what is wrong with the others? For those, the exit
info appears to give fully decoded information.
All the decoding info is conditional upon decoding assists being
available. And for string I/O handle_mmio() is being used kind of
unconditionally.
The decode assists apply strictly to mov CRx/DRx, INTn, INVLPG and
(nested) #PF. In contrast, the IOIO intercept appears to
unconditionally fill the decode information in exitinfo1, so I believe
the current code is correct.
Post by Jan Beulich
Post by Andrew Cooper
Post by Jan Beulich
In the case this doesn't turn out reasonable, rather than
manipulating handle_mmio() any further, I'd suggest to investigate
bypassing that function in favor of a more direct access to the x86
emulator. After all you're not after any MMIO emulation here, just
bare instructions (many of which without memory operands at all).
In the AMD System manual (vol 2), there are special cases listed for
lmsw, clts and smsw in section 15.8.1. In each case, bit 63 of
exitinfo1 will be clear (which is checked using magic numbers in the
code). It would appear that under these circumstances, the only
information provided is "it wasn't a mov instruction".
Right, except that at least for CLTS (not having any operands) the
light weight decoding could still be used. I'd even think decoding the
CRx/DRx moves would be okay as they only permit register
operands. But with LMSW and SMSW having memory operands it
would indeed start to become like re-implementing x86_emulate().
Question is whether one couldn't simply forbid PVH guests to use
them, as moves to/from CR0 provide all the needed functionality.
I think that is a very dangerous route to take.

Despite the current limitations, I firmly believe that PVH should be HVM
- device model, rather than PV + VMX/SVM. Restricting the use of
certain instructions would certainly move PVH into the latter category.

Fundamentally, the end goal of PVH needs deciding ASAP, and documenting,
to help guide decisions like this.

~Andrew
Jan Beulich
2014-08-05 13:15:21 UTC
Permalink
Post by Andrew Cooper
Post by Jan Beulich
Post by Andrew Cooper
Post by Jan Beulich
I'd prefer this - it seems pretty ugly to me that handle_mmio()/
x86_emulate() gets used for this purpose - but am not certain this
will actually work out nicely for other than CLTS: All the
instructions currently handled specially are ones with fixed
operands, and only CLTS fits that.
You'll btw have the same problem with SMSW and DRx accesses,
string I/O instructions, as well as (on older CPUs) with moves to/from
CRx and INVLPG.
SMSW certainly, but what is wrong with the others? For those, the exit
info appears to give fully decoded information.
All the decoding info is conditional upon decoding assists being
available. And for string I/O handle_mmio() is being used kind of
unconditionally.
The decode assists apply strictly to mov CRx/DRx, INTn, INVLPG and
(nested) #PF. In contrast, the IOIO intercept appears to
unconditionally fill the decode information in exitinfo1, so I believe
the current code is correct.
And I didn't say anything to the contrary. All I said was that there is
another use of handle_mmio() lurking.
Post by Andrew Cooper
Post by Jan Beulich
Post by Andrew Cooper
Post by Jan Beulich
In the case this doesn't turn out reasonable, rather than
manipulating handle_mmio() any further, I'd suggest to investigate
bypassing that function in favor of a more direct access to the x86
emulator. After all you're not after any MMIO emulation here, just
bare instructions (many of which without memory operands at all).
In the AMD System manual (vol 2), there are special cases listed for
lmsw, clts and smsw in section 15.8.1. In each case, bit 63 of
exitinfo1 will be clear (which is checked using magic numbers in the
code). It would appear that under these circumstances, the only
information provided is "it wasn't a mov instruction".
Right, except that at least for CLTS (not having any operands) the
light weight decoding could still be used. I'd even think decoding the
CRx/DRx moves would be okay as they only permit register
operands. But with LMSW and SMSW having memory operands it
would indeed start to become like re-implementing x86_emulate().
Question is whether one couldn't simply forbid PVH guests to use
them, as moves to/from CR0 provide all the needed functionality.
I think that is a very dangerous route to take.
Despite the current limitations, I firmly believe that PVH should be HVM
- device model, rather than PV + VMX/SVM. Restricting the use of
certain instructions would certainly move PVH into the latter category.
Generally I would agree. For those two specific instructions though,
I don't think any reasonably modern OS would be using them
anyway outside of 16-bit boot code. After adding to this the fact
that PVH, just like PV, implies a modified OS, putting on such a
restriction doesn't seem very harmful to me.

Jan
Mukesh Rathor
2014-08-05 22:30:25 UTC
Permalink
On Tue, 05 Aug 2014 14:00:25 +0100
...
Post by Andrew Cooper
Despite the current limitations, I firmly believe that PVH should be HVM
- device model, rather than PV + VMX/SVM.
I think that might be a dangerous route to take, classifying upfront
whether it's that way or the other. Eg, if we say it's former, then
anyone adding any feature would not examine the best approach, but just
take hvm approach.
Post by Andrew Cooper
Fundamentally, the end goal of PVH needs deciding ASAP, and
documenting, to help guide decisions like this.
I think it's decided somewhat. Evolve to one of three approaches: PV,
HVM, or alternate, picking the easiest and fastest. IMO, at the very
least, pvh should retain "guest modified" characteristic, that would be
good for xen future imho.

Mukesh
Andrew Cooper
2014-08-06 09:34:21 UTC
Permalink
Post by Mukesh Rathor
On Tue, 05 Aug 2014 14:00:25 +0100
...
Post by Andrew Cooper
Despite the current limitations, I firmly believe that PVH should be HVM
- device model, rather than PV + VMX/SVM.
I think that might be a dangerous route to take, classifying upfront
whether it's that way or the other. Eg, if we say it's former, then
anyone adding any feature would not examine the best approach, but just
take hvm approach.
There are many PV-isms which already exist for HVM. Saying "HVM -
device model" does not preclude further PVism from being introduced and
used. It does however means that PV-aware HVM guests get equal
opportunity at these improvements. Fundamentally, having PVH closer to
HVM than PV means fewer modifications required to turn a native kernel
into a PVH kernel, which is a *very* good thing from the point of view
of the kernel authors.

But as I said, this is only my opinion.
Post by Mukesh Rathor
Post by Andrew Cooper
Fundamentally, the end goal of PVH needs deciding ASAP, and
documenting, to help guide decisions like this.
I think it's decided somewhat. Evolve to one of three approaches: PV,
HVM, or alternate, picking the easiest and fastest. IMO, at the very
least, pvh should retain "guest modified" characteristic, that would be
good for xen future imho.
It clearly is not decided, or even semi-certain, by virtue of having
this conversation.

There are currently many opinions (some of which certainly can't
coexist, many which can), a lot of semi-baked code with many
restrictions (and repeated breaking of PVH/PVHdom0 by making seemingly
innocent code changes elsewhere), and no concrete plan of what PVH is or
what it should be.

What needs to happen urgently is for someone to make a firm decision,
and prepare a document for /docs/specs/pvh. A document like that is not
immutable in the future if hindsight shows otherwise, but it will
provide solid guidance as to how to proceed in matters like this.

~Andrew
Konrad Rzeszutek Wilk
2014-08-15 21:04:20 UTC
Permalink
Post by Andrew Cooper
Post by Mukesh Rathor
On Tue, 05 Aug 2014 14:00:25 +0100
...
Post by Andrew Cooper
Despite the current limitations, I firmly believe that PVH should be HVM
- device model, rather than PV + VMX/SVM.
I think that might be a dangerous route to take, classifying upfront
whether it's that way or the other. Eg, if we say it's former, then
anyone adding any feature would not examine the best approach, but just
take hvm approach.
There are many PV-isms which already exist for HVM. Saying "HVM -
device model" does not preclude further PVism from being introduced and
used. It does however means that PV-aware HVM guests get equal
opportunity at these improvements. Fundamentally, having PVH closer to
HVM than PV means fewer modifications required to turn a native kernel
into a PVH kernel, which is a *very* good thing from the point of view
of the kernel authors.
Right. I would like to stress that the x86 maintainers are excited
about this as it would remove the pvops that don't have clear
semantic.
Post by Andrew Cooper
But as I said, this is only my opinion.
Post by Mukesh Rathor
Post by Andrew Cooper
Fundamentally, the end goal of PVH needs deciding ASAP, and
documenting, to help guide decisions like this.
I think it's decided somewhat. Evolve to one of three approaches: PV,
HVM, or alternate, picking the easiest and fastest. IMO, at the very
least, pvh should retain "guest modified" characteristic, that would be
good for xen future imho.
It clearly is not decided, or even semi-certain, by virtue of having
this conversation.
HA!
Post by Andrew Cooper
There are currently many opinions (some of which certainly can't
coexist, many which can), a lot of semi-baked code with many
restrictions (and repeated breaking of PVH/PVHdom0 by making seemingly
innocent code changes elsewhere), and no concrete plan of what PVH is or
what it should be.
What needs to happen urgently is for someone to make a firm decision,
and prepare a document for /docs/specs/pvh. A document like that is not
immutable in the future if hindsight shows otherwise, but it will
provide solid guidance as to how to proceed in matters like this.
That could certainly be done but I think we are all tied in fixing
code and trying to get features in Xen 4.5 before the feature
freeze gates are shut.

It should be fairly easy as most of it is 'runs like HVM' with
some HVM-ism disabled (so point to Intel SDM and AMD). And then
going through the hypercalls and seeing which are enabled.

Then there is the business of the startup which is complex, but
fortunatly there is a Wiki page to rip:
http://wiki.xenproject.org/wiki/X86_Paravirtualised_Memory_Management

Andrew, that nice template you used for the migrationv2 - where can
one find it?
Post by Andrew Cooper
~Andrew
_______________________________________________
Xen-devel mailing list
http://lists.xen.org/xen-devel
Andrew Cooper
2014-08-15 21:48:14 UTC
Permalink
Post by Konrad Rzeszutek Wilk
Post by Andrew Cooper
Post by Mukesh Rathor
On Tue, 05 Aug 2014 14:00:25 +0100
...
Post by Andrew Cooper
Despite the current limitations, I firmly believe that PVH should be HVM
- device model, rather than PV + VMX/SVM.
I think that might be a dangerous route to take, classifying upfront
whether it's that way or the other. Eg, if we say it's former, then
anyone adding any feature would not examine the best approach, but just
take hvm approach.
There are many PV-isms which already exist for HVM. Saying "HVM -
device model" does not preclude further PVism from being introduced and
used. It does however means that PV-aware HVM guests get equal
opportunity at these improvements. Fundamentally, having PVH closer to
HVM than PV means fewer modifications required to turn a native kernel
into a PVH kernel, which is a *very* good thing from the point of view
of the kernel authors.
Right. I would like to stress that the x86 maintainers are excited
about this as it would remove the pvops that don't have clear
semantic.
Post by Andrew Cooper
But as I said, this is only my opinion.
Post by Mukesh Rathor
Post by Andrew Cooper
Fundamentally, the end goal of PVH needs deciding ASAP, and
documenting, to help guide decisions like this.
I think it's decided somewhat. Evolve to one of three approaches: PV,
HVM, or alternate, picking the easiest and fastest. IMO, at the very
least, pvh should retain "guest modified" characteristic, that would be
good for xen future imho.
It clearly is not decided, or even semi-certain, by virtue of having
this conversation.
HA!
Post by Andrew Cooper
There are currently many opinions (some of which certainly can't
coexist, many which can), a lot of semi-baked code with many
restrictions (and repeated breaking of PVH/PVHdom0 by making seemingly
innocent code changes elsewhere), and no concrete plan of what PVH is or
what it should be.
What needs to happen urgently is for someone to make a firm decision,
and prepare a document for /docs/specs/pvh. A document like that is not
immutable in the future if hindsight shows otherwise, but it will
provide solid guidance as to how to proceed in matters like this.
That could certainly be done but I think we are all tied in fixing
code and trying to get features in Xen 4.5 before the feature
freeze gates are shut.
It should be fairly easy as most of it is 'runs like HVM' with
some HVM-ism disabled (so point to Intel SDM and AMD). And then
going through the hypercalls and seeing which are enabled.
Then there is the business of the startup which is complex, but
http://wiki.xenproject.org/wiki/X86_Paravirtualised_Memory_Management
Andrew, that nice template you used for the migrationv2 - where can
one find it?
For the pdf spec? That is just completely standard pandoc.

See
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=commitdiff;h=ba4c1c9072c623ffb795310e538ea6eed81bd658
for how I expect it to be committed when migration v2 is accepted.

~Andrew

Mukesh Rathor
2014-08-05 22:22:10 UTC
Permalink
On Tue, 05 Aug 2014 08:46:20 +0100
Post by Jan Beulich
Post by Mukesh Rathor
Hi,
On AMD, clts/lmsw will cause "mov cr" vmexit, but unlike intel, they
can't be handled via svm_vmexit_do_cr_access and are emulated thru
.....
Post by Jan Beulich
Post by Mukesh Rathor
Or, should I just create a new function for clts/lmsw and call it
directly from vmexit switch itself?
I'd prefer this - it seems pretty ugly to me that handle_mmio()/
x86_emulate() gets used for this purpose - but am not certain this
will actually work out nicely for other than CLTS: All the
instructions currently handled specially are ones with fixed
operands, and only CLTS fits that.
You'll btw have the same problem with SMSW and DRx accesses,
string I/O instructions, as well as (on older CPUs) with moves to/from
CRx and INVLPG.
I see, I'd be duplicating from x86_emulate more than just couple cases, so
I think the best thing would be to just call x86_emulate directly, which
is what should've been done in the first place for the CR intercepts.

BTW, as regards INVLPG, I'm *requiring* svm decode for pvh.

thanks,
Mukesh
Loading...