[Rust-VMM] [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos

Mon Oct 4 21:53:26 UTC 2021

On Sat, 2 Oct 2021, Oleksandr Tyshchenko wrote:
> On Sat, Oct 2, 2021 at 2:58 AM Stefano Stabellini <sstabellini at kernel.org> wrote:
> 
> Hi Stefano, all
> 
> [Sorry for the possible format issues]
> [I have CCed Julien]
> 
> 
>       On Tue, 28 Sep 2021, Oleksandr Tyshchenko wrote:
>       > On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini <sstabellini at kernel.org> wrote:
>       >
>       > Hi Stefano, all
>       >
>       > [Sorry for the possible format issues]
>       >
>       >
>       >       On Mon, 27 Sep 2021, Christopher Clark wrote:
>       >       > On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <stratos-dev at op-lists.linaro.org> wrote:
>       >       >
>       >       >       Marek Marczykowski-Górecki <marmarek at invisiblethingslab.com> writes:
>       >       >
>       >       >       > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
>       >       >       >> Hi,
>       >       >       >
>       >       >       > Hi,
>       >       >       >
>       >       >       >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
>       >       >       >> ───────────────────────────────────────────────────────────────
>       >       >       >>
>       >       >       >>   Currently the foreign memory mapping support only works for dom0 due
>       >       >       >>   to reference counting issues. If we are to support backends running in
>       >       >       >>   their own domains this will need to get fixed.
>       >       >       >>
>       >       >       >>   Estimate: 8w
>       >       >       >>
>       >       >       >>
>       >       >       >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
>       >       >       >
>       >       >       > I'm pretty sure it was discussed before, but I can't find the relevant
>       >       >       > (part of the) thread right now: does your model assume the backend (running
>       >       >       > outside of dom0) will gain the ability to map (or otherwise access)
>       >       >       > _arbitrary_ memory pages of a frontend domain? Or worse: any domain?
>       >       >
>       >       >       The aim is for some DomU's to host backends for other DomU's instead of
>       >       >       all backends being in Dom0. Those backend DomU's would have to be
>       >       >       considered trusted because as you say the default memory model of VirtIO
>       >       >       is to have full access to the frontend domains memory map.
>       >       >
>       >       >
>       >       > I share Marek's concern. I believe that there are Xen-based systems that will want to run guests using VirtIO devices
>       >       > without extending this level of trust to the backend domains.
>       >
>       >       From a safety perspective, it would be challenging to deploy a system
>       >       with privileged backends. From a safety perspective, it would be a lot
>       >       easier if the backend were unprivileged.
>       >
>       >       This is one of those times where safety and security requirements are
>       >       actually aligned.
>       >
>       >
>       > Well, the foreign memory mapping has one advantage in the context of the Virtio use-case,
>       > which is that the Virtio infrastructure in the Guest doesn't require any modifications to run on top of Xen.
>       > The only issue with foreign memory here is that Guest memory is actually mapped without its agreement,
>       > which doesn't perfectly fit into the security model (although there is one more issue with XSA-300,
>       > but I think it will go away sooner or later; at least there are some attempts to eliminate it).
>       > While the ability to map any part of Guest memory is not an issue for a backend running in Dom0
>       > (which we usually trust), it will certainly violate the Xen security model if we want to run it in another
>       > domain, so I completely agree with the existing concern.
> 
>       Yep, that's what I was referring to.
> 
> 
>       > It was discussed before [1], but I couldn't find any decisions regarding that. As I understand it,
>       > one of the possible ideas is to have some entity in Xen (PV IOMMU/virtio-iommu/whatever)
>       > that works in protection mode, so it denies all foreign mapping requests from a backend running in DomU
>       > by default and only allows requests for mappings which were *implicitly* granted by the Guest before.
>       > For example, Xen could be informed which MMIOs hold the queue PFN and notify registers
>       > (as it traps the accesses to these registers anyway) and could theoretically parse the frontend request
>       > and retrieve descriptors to decide which GFNs are actually *allowed*.
>       >
>       > I can't say for sure (sorry, not familiar enough with the topic), but by implementing the virtio-iommu device
>       > in Xen we could probably avoid Guest modifications altogether. Of course, for this to work
>       > the Virtio infrastructure in the Guest should use the DMA API as mentioned in [1].
>       >
>       > Would the “restricted foreign mapping” solution retain the Xen security model and be accepted
>       > by the Xen community? I wonder, has someone already looked in this direction, are there any
>       > pitfalls here or is this even feasible?
>       >
>       > [1] https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@arm.com/
> 
>       The discussion that went further is actually one based on the idea that
>       there is a pre-shared memory area and the frontend always passes
>       addresses from it. For ease of implementation, the pre-shared area is
>       the virtqueue itself so this approach has been called "fat virtqueue".
>       But it requires guest modifications and it probably results in
>       additional memory copies.
> 
>  
> I got it. Although we would need to map that pre-shared area anyway (I presume it could be done at once during initialization), I think it is
> much better than mapping arbitrary pages at runtime.

Yeah, that's the idea.

> If there is a way for Xen to know the pre-shared area location in advance it will be able to allow mapping
> this region only and deny other attempts.

No, but there are patches (not yet upstream) to introduce a way to
pre-share memory regions between VMs using xl:
https://github.com/Xilinx/xen/commits/xilinx/release-2021.1?after=4bd2da58b5b008f77429007a307b658db9c0f636+104&branch=xilinx%2Frelease-2021.1

So I think it would probably be the other way around: xen/libxl
advertises on device tree (or ACPI) the presence of the pre-shared
regions to both domains. Then frontend and backend would start using it.
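
To make the "allow this region only and deny other attempts" policy a bit
more concrete, here is a minimal sketch in Rust of the range check involved.
All names (SharedRegion, MapPolicy, is_allowed) are hypothetical and only
for illustration. This is not the Xen or libxl interface, and it assumes the
pre-shared regions are already known as guest-physical base/size pairs.

```rust
/// A pre-shared region in guest-physical address space (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SharedRegion {
    pub base: u64, // first guest-physical address of the region
    pub size: u64, // length of the region in bytes
}

impl SharedRegion {
    /// True if [addr, addr + len) lies entirely inside this region.
    pub fn contains(&self, addr: u64, len: u64) -> bool {
        if len == 0 {
            return false; // reject empty requests
        }
        // checked_add guards against wrap-around on hostile inputs.
        match (addr.checked_add(len), self.base.checked_add(self.size)) {
            (Some(req_end), Some(reg_end)) => addr >= self.base && req_end <= reg_end,
            _ => false,
        }
    }
}

/// Deny-by-default policy: a mapping request passes only if it falls
/// completely within one of the advertised regions.
pub struct MapPolicy {
    regions: Vec<SharedRegion>,
}

impl MapPolicy {
    pub fn new(regions: Vec<SharedRegion>) -> Self {
        MapPolicy { regions }
    }

    pub fn is_allowed(&self, addr: u64, len: u64) -> bool {
        self.regions.iter().any(|r| r.contains(addr, len))
    }
}
```

With a single 1 MiB region at 0x40000000, a one-page request at the start or
at the very end of the region is accepted, while a request that straddles the
end of the region is denied.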

>       I am not sure if the approach you mentioned could be implemented
>       completely without frontend changes. It looks like Xen would have to
>       learn how to inspect virtqueues in order to verify implicit grants
>       without frontend changes.
> 
>  
> I looked through the virtio-iommu specification and corresponding Linux driver but I am sure I don't see all the challenges and pitfalls.
> Having a limited knowledge of IOMMU infrastructure in Linux, below is just my guess, which might be wrong.
> 
> 1. I think that if we want to avoid frontend changes, the backend in Xen would need to fully conform to the specification. I am afraid that,
> besides just inspecting virtqueues, the backend needs to properly and completely emulate the virtio device, handle shadow page tables, etc.
> Otherwise we might break the guest. I expect a huge amount of work to implement this properly.

Yeah, I think we would want to stay away from shadow pagetables unless
we are really forced to go there.

> 2. Also, if I understood things correctly, it looks like when virtio-iommu is enabled, all addresses passed in requests to the virtio devices
> behind the virtio-iommu will be in guest virtual address space (IOVA). So we would need to find a way for userspace (if the backend is
> an IOREQ server) to translate them to guest physical addresses (IPA) via these shadow page tables in the backend before mapping them via
> foreign memory map calls. So I expect Xen, toolstack and Linux privcmd driver changes, and additional complexity given how the
> data structures could be accessed (data structures that are contiguous in IOVA could be discontiguous in IPA, indirect table descriptors,
> etc).
> I am wondering, would it be possible to have an identity IOMMU mapping (IOVA == GPA) on the guest side but without bypassing the IOMMU, as we
> need the virtio-iommu frontend to send map/unmap requests? Can we control this behaviour somehow?
> I think this would simplify things.

None of the above looks easy. I think you are right that we would need
IOVA == GPA to make the implementation feasible and with decent
performance. But if we need a spec change, then I think Juergen's
proposal of introducing a new transport that uses grant table references
instead of GPAs is worth considering.
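
To illustrate why IOVA == GPA matters here, below is a rough sketch in Rust
of the translation step from point 2. It assumes the backend keeps a flat
table of IOVA -> GPA mappings built from the frontend's map requests. The
types and the translate() function are hypothetical and are not taken from
the virtio-iommu spec or from any existing backend.

```rust
/// One IOVA -> GPA mapping, as a map request might establish it (hypothetical).
#[derive(Clone, Copy, Debug)]
pub struct Mapping {
    pub iova: u64,
    pub gpa: u64,
    pub size: u64,
}

/// A contiguous run of guest-physical memory.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Segment {
    pub gpa: u64,
    pub len: u64,
}

/// Translate [iova, iova + len) into guest-physical segments.
/// Returns None if any byte of the range is unmapped or an address overflows.
pub fn translate(maps: &[Mapping], iova: u64, len: u64) -> Option<Vec<Segment>> {
    let mut out: Vec<Segment> = Vec::new();
    let mut cur = iova;
    let mut left = len;
    while left > 0 {
        // Find the mapping that covers the current IOVA.
        let m = maps
            .iter()
            .find(|m| cur >= m.iova && cur - m.iova < m.size)?;
        let off = cur - m.iova;
        let chunk = left.min(m.size - off);
        let gpa = m.gpa.checked_add(off)?;
        // Merge with the previous segment when physically adjacent.
        let merged = match out.last_mut() {
            Some(prev) if prev.gpa.checked_add(prev.len) == Some(gpa) => {
                prev.len += chunk;
                true
            }
            _ => false,
        };
        if !merged {
            out.push(Segment { gpa, len: chunk });
        }
        cur = cur.checked_add(chunk)?;
        left -= chunk;
    }
    Some(out)
}
```

In this sketch a buffer that crosses two mappings with unrelated GPAs comes
back as two segments, each of which would need its own foreign mapping. With
identity mappings the same request collapses into a single segment whose GPA
equals the IOVA, so no per-request splitting is needed.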

> 3. Also, we would probably want to have a single virtio-iommu device instance per guest, so that all virtio devices which belong to this guest
> share the IOMMU mapping for optimization purposes. For this to work, all virtio devices inside a guest should be attached to the
> same IOMMU domain. Probably, we could control that, but I am not 100% sure.