[Rust-VMM] [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
sstabellini at kernel.org
Mon Oct 4 21:53:26 UTC 2021
On Sat, 2 Oct 2021, Oleksandr Tyshchenko wrote:
> On Sat, Oct 2, 2021 at 2:58 AM Stefano Stabellini <sstabellini at kernel.org> wrote:
> Hi Stefano, all
> [Sorry for the possible format issues]
> [I have CCed Julien]
> On Tue, 28 Sep 2021, Oleksandr Tyshchenko wrote:
> > On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini <sstabellini at kernel.org> wrote:
> > Hi Stefano, all
> > [Sorry for the possible format issues]
> > On Mon, 27 Sep 2021, Christopher Clark wrote:
> > > On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <stratos-dev at op-lists.linaro.org> wrote:
> > >
> > > Marek Marczykowski-Górecki <marmarek at invisiblethingslab.com> writes:
> > >
> > > > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
> > > >> Hi,
> > > >
> > > > Hi,
> > > >
> > > >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
> > > >> ───────────────────────────────────────────────────────────────
> > > >>
> > > >> Currently the foreign memory mapping support only works for dom0 due
> > > >> to reference counting issues. If we are to support backends running in
> > > >> their own domains this will need to get fixed.
> > > >>
> > > >> Estimate: 8w
> > > >>
> > > >>
> > > >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
> > > >
> > > > I'm pretty sure it was discussed before, but I can't find relevant
> > > > > (part of) thread right now: does your model assume the backend (running
> > > > outside of dom0) will gain ability to map (or access in other way)
> > > > _arbitrary_ memory page of a frontend domain? Or worse: any domain?
> > >
> > > The aim is for some DomU's to host backends for other DomU's instead of
> > > all backends being in Dom0. Those backend DomU's would have to be
> > > considered trusted because as you say the default memory model of VirtIO
> > > is to have full access to the frontend domain's memory map.
> > >
> > >
> > > I share Marek's concern. I believe that there are Xen-based systems that will want to run guests using VirtIO devices without extending
> > > this level of trust to the backend domains.
> > From a safety perspective, it would be challenging to deploy a system
> > with privileged backends. From a safety perspective, it would be a lot
> > easier if the backend were unprivileged.
> > This is one of those times where safety and security requirements are
> > actually aligned.
> > Well, the foreign memory mapping has one advantage in the context of the Virtio use case,
> > which is that the Virtio infrastructure in the Guest doesn't require any modifications to run on top of Xen.
> > The only issue with foreign memory here is that Guest memory is actually mapped without its agreement,
> > which doesn't perfectly fit into the security model. (although there is one more issue with XSA-300,
> > but I think it will go away sooner or later, at least there are some attempts to eliminate it).
> > While the ability to map any part of Guest memory is not an issue for the backend running in Dom0
> > (which we usually trust), this will certainly violate Xen security model if we want to run it in other
> > domain, so I completely agree with the existing concern.
> Yep, that's what I was referring to.
> > It was discussed before, but I couldn't find any decisions regarding that. As I understand,
> > one of the possible ideas is to have some entity in Xen (PV IOMMU/virtio-iommu/whatever)
> > that works in protection mode, so it denies all foreign mapping requests from the backend running in DomU
> > by default and only allows requests for mappings which were *implicitly* granted by the Guest before.
> > For example, Xen could be informed which MMIOs hold the queue PFN and notify registers
> > (as it traps the accesses to these registers anyway) and could theoretically parse the frontend request
> > and retrieve descriptors to make a decision which GFNs are actually *allowed*.
> > I can't say for sure (sorry not familiar enough with the topic), but implementing the virtio-iommu device
> > in Xen we could probably avoid Guest modifications at all. Of course, for this to work
> > the Virtio infrastructure in Guest should use DMA API as mentioned in [1].
> > Would the “restricted foreign mapping” solution retain the Xen security model and be accepted
> > by the Xen community? I wonder, has someone already looked in this direction, are there any
> > pitfalls here or is this even feasible?
> > [1] https://firstname.lastname@example.org/
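
A rough sketch of the deny-by-default idea described above, in Rust. Everything here is hypothetical and for illustration only: `Gfn` and `MappingPolicy` are made-up names, not Xen or rust-vmm types, and the sketch says nothing about how Xen would actually learn which GFNs the guest has implicitly granted (for example by parsing virtqueue descriptors).

```rust
use std::collections::HashSet;

/// Guest frame number (hypothetical newtype, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Gfn(pub u64);

/// Per-frontend mapping policy: every foreign mapping request is denied
/// unless all requested GFNs were previously granted by the guest.
#[derive(Default)]
pub struct MappingPolicy {
    allowed: HashSet<Gfn>,
}

impl MappingPolicy {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record a GFN as granted (assumption: derived from a frontend request).
    pub fn grant(&mut self, gfn: Gfn) {
        self.allowed.insert(gfn);
    }

    /// Withdraw a grant, e.g. once the request has completed.
    pub fn revoke(&mut self, gfn: Gfn) {
        self.allowed.remove(&gfn);
    }

    /// Check a mapping request from the backend. The request is
    /// all-or-nothing: the first GFN that was not granted is returned
    /// as the error and nothing should be mapped.
    pub fn check_request(&self, gfns: &[Gfn]) -> Result<(), Gfn> {
        match gfns.iter().copied().find(|g| !self.allowed.contains(g)) {
            Some(denied) => Err(denied),
            None => Ok(()),
        }
    }
}
```

With an empty policy every request fails, which is the "protection mode" default; a request succeeds only after each GFN in it has been passed to `grant`.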
> The discussion that went further is actually one based on the idea that
> there is a pre-shared memory area and the frontend always passes
> addresses from it. For ease of implementation, the pre-shared area is
> the virtqueue itself so this approach has been called "fat virtqueue".
> But it requires guest modifications and it probably results in
> additional memory copies.
> I got it. Although we would need to map that pre-shared area anyway (I presume it could be done at once during initialization), I think it is
> much better than mapping arbitrary pages at runtime.
Yeah that's the idea
> If there is a way for Xen to know the pre-shared area location in advance it will be able to allow mapping
> this region only and deny other attempts.
No, but there are patches (not yet upstream) to introduce a way to
pre-share memory regions between VMs using xl:
So I think it would probably be the other way around: xen/libxl
advertises on device tree (or ACPI) the presence of the pre-shared
regions to both domains. Then frontend and backend would start using it.
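
To illustrate the pre-shared region approach: if the frontend only ever passes addresses from the advertised region, the check on the other side reduces to a range test. The sketch below is a minimal Rust illustration under that assumption; `SharedRegion` is a hypothetical type, and the base and size would come from whatever device tree or ACPI description is eventually used.

```rust
/// A pre-shared memory region in guest physical address space
/// (hypothetical type; base and size are assumed to be advertised
/// to both domains by the toolstack).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SharedRegion {
    pub base: u64,
    pub size: u64,
}

impl SharedRegion {
    /// Returns true only if the buffer [addr, addr + len) is non-empty
    /// and lies entirely inside the region. Overflowing ranges are
    /// rejected rather than allowed to wrap around.
    pub fn contains(&self, addr: u64, len: u64) -> bool {
        if len == 0 {
            return false;
        }
        let end = match addr.checked_add(len) {
            Some(e) => e,
            None => return false,
        };
        let region_end = match self.base.checked_add(self.size) {
            Some(e) => e,
            None => return false,
        };
        addr >= self.base && end <= region_end
    }
}
```

Any descriptor whose buffer fails `contains` would simply be rejected, so no mapping outside the pre-shared area is ever attempted.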
> I am not sure if the approach you mentioned could be implemented
> completely without frontend changes. It looks like Xen would have to
> learn how to inspect virtqueues in order to verify implicit grants
> without frontend changes.
> I looked through the virtio-iommu specification and corresponding Linux driver but I am sure I don't see all the challenges and pitfalls.
> Having a limited knowledge of IOMMU infrastructure in Linux, below is just my guess, which might be wrong.
> 1. I think, if we want to avoid frontend changes the backend in Xen would need to fully conform to the specification, I am afraid that
> besides just inspecting virtqueues, the backend needs to properly and completely emulate the virtio device, handle shadow page tables, etc.
> Otherwise we might break the guest. I expect a huge amount of work to implement this properly.
Yeah, I think we would want to stay away from shadow pagetables unless
we are really forced to go there.
> 2. Also, if I got the things correctly, it looks like when enabling virtio-iommu, all addresses passed in requests to the virtio devices
> behind the virtio-iommu will be in guest virtual address space (IOVA). So we would need to find a way for userspace (if the backend is
> IOREQ server) to translate them to guest physical addresses (IPA) via these shadow page tables in the backend before mapping them via
> foreign memory map calls. So I expect Xen, toolstack and Linux privcmd driver changes and additional complexity taking into account how the
> data structures could be accessed (data structures that are contiguous in IOVA could be discontiguous in IPA, indirect table descriptors, etc.).
> I am wondering, would it be possible to have identity IOMMU mapping (IOVA == GPA) at the guest side but without bypassing an IOMMU, as we
> need the virtio-iommu frontend to send map/unmap requests, can we control this behaviour somehow?
> I think this would simplify things.
None of the above looks easy. I think you are right that we would need
IOVA == GPA to make the implementation feasible and with decent
performance. But if we need a spec change, then I think Juergen's
proposal of introducing a new transport that uses grant table references
instead of GPAs is worth considering.
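
To make the translation problem from point 2 concrete, here is a small Rust sketch of what a backend would have to do if addresses arrive as IOVAs: walk a shadow copy of the guest's IOMMU mappings page by page and split one contiguous IOVA buffer into possibly several IPA chunks. `ShadowMap`, the 4 KiB `PAGE_SIZE`, and the flat page-number map are assumptions for illustration, not part of virtio-iommu or Xen.

```rust
use std::collections::BTreeMap;

/// Assumed page size for the sketch.
pub const PAGE_SIZE: u64 = 4096;

/// Hypothetical shadow of the guest's IOMMU mappings:
/// IOVA page number -> IPA page number.
#[derive(Default)]
pub struct ShadowMap {
    pages: BTreeMap<u64, u64>,
}

impl ShadowMap {
    /// Record one page mapping (as if from a guest map request).
    pub fn map(&mut self, iova_pfn: u64, ipa_pfn: u64) {
        self.pages.insert(iova_pfn, ipa_pfn);
    }

    /// Translate [iova, iova + len) into a list of (ipa, len) chunks.
    /// Chunks that happen to be adjacent in IPA space are merged.
    /// Returns None if any page in the range is unmapped.
    pub fn translate(&self, iova: u64, len: u64) -> Option<Vec<(u64, u64)>> {
        let end = iova.checked_add(len)?;
        let mut chunks: Vec<(u64, u64)> = Vec::new();
        let mut cur = iova;
        while cur < end {
            let offset = cur % PAGE_SIZE;
            let ipa_pfn = *self.pages.get(&(cur / PAGE_SIZE))?;
            let ipa = ipa_pfn.checked_mul(PAGE_SIZE)?.checked_add(offset)?;
            // Bytes covered by this page, clipped to the end of the buffer.
            let n = (PAGE_SIZE - offset).min(end - cur);
            cur += n;
            if let Some(last) = chunks.last_mut() {
                if last.0.checked_add(last.1) == Some(ipa) {
                    last.1 += n; // physically adjacent: extend previous chunk
                    continue;
                }
            }
            chunks.push((ipa, n));
        }
        Some(chunks)
    }
}
```

With an identity mapping (IOVA == GPA) every buffer would come back as a single chunk at the same address, which is why that restriction would simplify the backend so much.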
> 3. Also, we would probably want to have a single virtio-iommu device instance per guest, so all virtio devices which belong to this guest
> will share the IOMMU mapping for optimization purposes. For this to work all virtio devices inside a guest should be attached to the
> same IOMMU domain. Probably, we could control that, but I am not 100% sure.