From kendall at openstack.org Wed Oct 7 20:41:03 2020
From: kendall at openstack.org (Kendall Waters)
Date: Wed, 7 Oct 2020 15:41:03 -0500
Subject: [Rust-VMM] vPTG Oct 2020 Registration & Schedule
Message-ID: <150AAE97-2D6D-4765-973E-D5E3CEEAF204@openstack.org>

Hey everyone,

The October 2020 Project Teams Gathering is right around the corner! The official schedule has now been posted on the PTG website [1], the PTGbot has been updated [2], and we have also attached it to this email.

Friendly reminder, if you have not already registered, please do so [3]. It is important that we get everyone to register for the event as this is how we will contact you about tooling information/passwords and other event details.

Please let us know if you have any questions.

Cheers,
The Kendalls (diablo_rojo & wendallkaters)

[1] PTG Website: www.openstack.org/ptg
[2] PTGbot: http://ptg.openstack.org/ptg.html
[3] PTG Registration: https://october2020ptg.eventbrite.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PTG2-Oct26-30-2020_Schedule (1).pdf
Type: application/pdf
Size: 706133 bytes
Desc: not available
URL:

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanha at redhat.com Fri Oct 9 16:18:15 2020
From: stefanha at redhat.com (Stefan Hajnoczi)
Date: Fri, 9 Oct 2020 17:18:15 +0100
Subject: [Rust-VMM] Requirements for out-of-process device emulation
Message-ID: <20201009161815.GA321402@stefanha-x1.localdomain>

I just posted the following on my blog to outline the requirements that have been discussed over the past few months around out-of-process device emulation (vhost-user, vfio-user, etc). I hope it's helpful for covering various angles of out-of-process device emulation.

It's long, so no worries if you don't want to join the discussion.

Stefan

---

Requirements for out-of-process device emulation
================================================
Over the past months I have participated in discussions about out-of-process device emulation. This post describes the requirements that have become apparent. I hope this will be a useful guide to understanding the big picture about out-of-process device emulation.

What is out-of-process device emulation?
----------------------------------------
Device emulation is traditionally implemented in the program that executes guest code. This approach is natural because accesses to device registers are trapped as part of the CPU run loop that sits at the core of an emulator or virtual machine monitor (VMM).

In some use cases it is advantageous to perform device emulation in separate processes. For example, software-defined network switches can minimize data copies by emulating network cards directly in the switch process. Out-of-process device emulation also enables privilege separation and tighter sandboxing for security.

Why are these requirements important?
-------------------------------------
When emulated devices are implemented in the VMM they use common VMM APIs. Adding new devices is relatively easy because the APIs are already there and the developer can focus on the device specifics. Out-of-process device emulation potentially leaves developers without APIs since the device emulation program is a separate program that literally starts from main(). Developers want to focus on implementing their specific device, not on solving general problems related to out-of-process device emulation infrastructure.
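
To make the "starts from main()" problem concrete, here is a rough Rust sketch of the scaffolding such a program currently has to build for itself. Everything in it (the socket path, the request layout, the wire format) is invented purely for illustration; the point is that none of this boilerplate has anything to do with the device being emulated:

// Hypothetical skeleton of an out-of-process device emulation program.
// The socket path and wire format are made up for illustration only.
use std::io::{Read, Write};
use std::os::unix::net::UnixListener;

/// A made-up register access request: offset into the device's register
/// space, read or write, and the value for writes.
struct MmioRequest {
    offset: u64,
    is_write: bool,
    value: u32,
}

fn handle_access(req: &MmioRequest) -> u32 {
    // The device-specific logic lives here; everything around it is generic
    // plumbing that each out-of-process device currently reinvents.
    if req.is_write { 0 } else { 0xffff_ffff }
}

fn main() -> std::io::Result<()> {
    // Accept a connection from the VMM on a UNIX domain socket.
    let listener = UnixListener::bind("/tmp/my-device.sock")?;
    let (mut conn, _) = listener.accept()?;

    // Trivial request/reply loop: 13-byte requests, 4-byte replies.
    let mut buf = [0u8; 13];
    while conn.read_exact(&mut buf).is_ok() {
        let mut off = [0u8; 8];
        off.copy_from_slice(&buf[0..8]);
        let mut val = [0u8; 4];
        val.copy_from_slice(&buf[9..13]);
        let req = MmioRequest {
            offset: u64::from_le_bytes(off),
            is_write: buf[8] != 0,
            value: u32::from_le_bytes(val),
        };
        conn.write_all(&handle_access(&req).to_le_bytes())?;
    }
    Ok(())
}
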
It is not only a lot of work to implement an out-of-process device completely from scratch, but there is also a risk of developing the wrong solution because some subtleties of device emulation are not obvious at first glance. I hope sharing these requirements will help in the creation of common infrastructure so it's easy to implement high-quality out-of-process devices.

Not all use cases have the full set of requirements. Therefore it's best if requirements are addressed in separate, reusable libraries so that device implementors can pick the ones that are relevant to them.

Device emulation
----------------
Device resources
````````````````
Devices provide resources that drivers interact with such as hardware registers, memory, or interrupts. The fundamental requirement of out-of-process device emulation is exposing device resources.

The following types of device resources are needed:

Synchronous MMIO/PIO accesses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most basic device emulation operation is the hardware register access. This is a memory-mapped I/O (MMIO) or programmed I/O (PIO) access to the device. A read loads a value from a device register. A write stores a value to a device register. These operations are synchronous because the vCPU is paused until completion.

Asynchronous doorbells
~~~~~~~~~~~~~~~~~~~~~~
Devices often have doorbell registers, allowing the driver to inform the device that new requests are ready for processing. The vCPU does not need to wait since the access is a posted write.

The kvm.ko ioeventfd mechanism can be used to implement asynchronous doorbells.

Shared device memory
~~~~~~~~~~~~~~~~~~~~
Devices may have memory-like regions that the CPU can access (such as PCI Memory BARs). The device emulation process therefore needs to share a region of its memory space with the VMM so the guest can access it. This mechanism also allows device emulation to busy wait (poll) instead of using synchronous MMIO/PIO accesses or asynchronous doorbells for notifications.

Direct Memory Access (DMA)
~~~~~~~~~~~~~~~~~~~~~~~~~~
Devices often require read and write access to a memory address space belonging to the CPU. This allows network cards to transmit packet payloads that are located in guest RAM, for example.

Early out-of-process device emulation interfaces simply shared guest RAM. This allowed DMA to any guest physical memory address. More advanced IOMMU and address space identifier mechanisms are now becoming ubiquitous. Therefore, new out-of-process device emulation interfaces should incorporate IOMMU functionality.

The key requirement for IOMMU mechanisms is allowing the VMM to grant access to a region of memory so the device emulation process can read from and/or write to it.

Interrupts
~~~~~~~~~~
Devices notify the CPU using interrupts. An interrupt is simply a message sent by the device emulation process to the VMM. Interrupt configuration is flexible on modern devices, meaning the driver may be able to select the number of interrupts and a mapping (using one interrupt with multiple event sources). This can be implemented using the Linux eventfd mechanism or via in-band device emulation protocol messages, for example.

Extensibility for new bus types
```````````````````````````````
It should be possible to support multiple bus types. vhost-user only supports vhost devices. VFIO is more extensible but currently focussed on PCI devices. It is likely that QEMU SysBus devices will be desirable for implementing ad-hoc out-of-process devices (especially for System-on-Chip target platforms).
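
To make the resource types above concrete, here is one way a reusable library might present them to a device implementor in Rust. This is only a sketch: the trait, type, and method names are invented for illustration and are not taken from any existing rust-vmm, vhost-user, or vfio-user crate:

// Hypothetical device-side API covering the resource types listed above.
// All names are invented; error handling is reduced to io::Result.
use std::fs::File;
use std::io;

/// Guest memory region the VMM has granted DMA access to (IOMMU-style).
pub struct DmaRegion {
    pub iova: u64, // I/O virtual address the device uses
    pub len: u64,
    pub writable: bool,
}

/// Handle the infrastructure gives the device for raising interrupts,
/// backed by eventfds or in-band protocol messages as appropriate.
pub struct Interrupt(File);

impl Interrupt {
    pub fn raise(&self) -> io::Result<()> {
        use std::io::Write;
        // An eventfd is signalled by writing a non-zero 8-byte counter value.
        (&self.0).write_all(&1u64.to_le_bytes())
    }
}

/// What a device implementor fills in; the surrounding library handles the
/// socket, file descriptor passing, and the wire protocol.
pub trait Device {
    /// Synchronous MMIO/PIO: the vCPU is paused until these return.
    fn read(&mut self, offset: u64, data: &mut [u8]);
    fn write(&mut self, offset: u64, data: &[u8]);

    /// Asynchronous doorbell: a posted write at `offset` signals `doorbell`
    /// (for example via the kvm.ko ioeventfd mechanism) without pausing the
    /// vCPU; the device waits on or polls the file descriptor.
    fn register_doorbell(&mut self, offset: u64, doorbell: File);

    /// Shared device memory, e.g. a PCI Memory BAR backed by the device
    /// process that the VMM maps into the guest.
    fn shared_memory(&mut self) -> Option<(File, u64 /* length in bytes */)>;

    /// DMA: the VMM grants or revokes access to guest memory regions.
    fn dma_map(&mut self, region: DmaRegion);
    fn dma_unmap(&mut self, iova: u64, len: u64);

    /// Interrupt resources negotiated with the driver/VMM.
    fn set_interrupts(&mut self, irqs: Vec<Interrupt>);
}

Whether doorbells and interrupts end up backed by eventfds or by in-band protocol messages is exactly the kind of decision such a library could hide from the device implementor.
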
Bus-level APIs, not protocol bindings
`````````````````````````````````````
Developers should not need to learn the out-of-process device emulation protocol (vfio-user, etc). APIs should focus on bus-level concepts such as defining VIRTIO or PCI devices rather than protocol bindings for dealing with protocol messages, file descriptor passing, and shared memory.

In other words, developers should be thinking in terms of the problem domain, not worrying about how out-of-process device emulation is implemented. The protocol should be hidden behind bus-level APIs.

Multi-threading support from the beginning
``````````````````````````````````````````
Threading issues arise often in device emulation because asynchronous requests or multi-queue devices can be implemented using threads. Therefore it is necessary to clearly document what threading models are supported and how device lifecycle operations like reset interact with in-flight requests.

Live migration, live upgrade, and crash recovery
------------------------------------------------
There are several related issues around device state and restarting the device emulation program without disrupting the guest.

Live migration
``````````````
Live migration transfers the state of a device from one device emulation process to another (typically running on another host). This requires the following functionality:

Quiescing the device
~~~~~~~~~~~~~~~~~~~~
Some devices can be live migrated at any point in time without any preparation, while others must be put into a quiescent state to avoid issues. An example is a storage controller that has a write request in flight. It is not safe to live migrate until the write request has completed or been canceled. Failure to wait might result in data corruption if the write takes effect after the destination has resumed execution.

Therefore it is necessary to quiesce a device. After this point there is no further device activity and no guest-visible changes will be made by the device.

Saving/loading device state
~~~~~~~~~~~~~~~~~~~~~~~~~~~
It must be possible to save and load device state. Device state includes the contents of hardware registers as well as device-internal state necessary for resuming operation.

It is typically necessary to determine whether the device emulation processes on the migration source and destination are compatible before attempting migration. This avoids migration failure when the destination tries to load the device state and discovers it doesn't support it. It may be desirable to support loading device state that was generated by a different implementation of the same device type (for example, two virtio-net implementations).

Dirty memory logging
~~~~~~~~~~~~~~~~~~~~
Pre-copy live migration starts with an iterative phase where dirty memory pages are copied from the migration source to the destination host. Devices need to participate in dirty memory logging so that all written pages are transferred to the destination and no pages are "missed".

Crash recovery
``````````````
If the device emulation process crashes it should be possible to restart it and resume device emulation without disrupting the guest (aside from a possible pause during reconnection).

Doing this requires maintaining device state (contents of hardware registers, etc) outside the device emulation process. This way the state remains even if the process crashes and emulation can be resumed when a new process starts.
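
The lifecycle requirements in this section could be captured by a similarly small interface. Again this is only a hypothetical sketch (invented names, simplified error handling); a real design would also need versioned state formats and a way to negotiate compatibility before migration starts:

// Hypothetical lifecycle interface for live migration and crash recovery.
use std::io::{self, Read, Write};
use std::ops::Range;

pub trait Migratable {
    /// Quiesce the device: complete or cancel in-flight requests so that no
    /// further guest-visible changes will be made.
    fn quiesce(&mut self) -> io::Result<()>;

    /// Save device state (hardware registers plus the device-internal state
    /// needed to resume operation) as an opaque, versioned byte stream.
    fn save_state(&mut self, out: &mut dyn Write) -> io::Result<()>;

    /// Load previously saved state. This fails if the state was produced by
    /// an incompatible device version, which is why compatibility should be
    /// checked before migration is attempted.
    fn load_state(&mut self, input: &mut dyn Read) -> io::Result<()>;

    /// Start dirty memory logging: from now on the device must record every
    /// guest memory range it writes to via DMA so that pre-copy migration
    /// does not miss pages.
    fn start_dirty_log(&mut self) -> io::Result<()>;

    /// Drain the set of guest memory ranges written since the last call.
    fn drain_dirty_log(&mut self) -> io::Result<Vec<Range<u64>>>;
}

Crash recovery adds the further requirement that the equivalent of save_state happens continuously, for example into a file or shared memory area that survives the process, rather than once at migration time.
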
Live upgrade
````````````
It must be possible to upgrade the device emulation process and the VMM without disrupting the guest. Upgrading the device emulation process is similar to crash recovery in that the process terminates and a new one resumes with the previous state.

Device versioning
`````````````````
The guest-visible aspects of the device must be versioned. In the simplest case the device emulation program would have a --compat-version=N command-line option that controls which version of the device the guest sees. When guest-visible changes are made to the program the version number must be increased.

By keeping the guest-visible device behavior under explicit version control it is possible to save/load and live migrate reliably. Otherwise loading device state in a newer device emulation program could affect the running guest. Guest drivers typically are not prepared for the device to change underneath them and doing so could result in guest crashes or data corruption.

Security
--------
The trust model
```````````````
The VMM must not trust the device emulation program. This is key to implementing privilege separation and the principle of least privilege. If a compromised device emulation program is able to gain control of the VMM then out-of-process device emulation has failed to provide isolation between devices.

The device emulation program must not trust the VMM to the extent that this is possible. For example, it must validate inputs so that the VMM cannot gain control of the device emulation process through memory corruptions or other bugs. This way, even if the VMM has been compromised, access to device resources and the associated system calls still requires compromising the device emulation process as well.

Unprivileged operation
``````````````````````
The device emulation program should run unprivileged to the extent that this is possible. If special permissions are required to access hardware resources then these resources can sometimes be provided via file descriptor passing by a more privileged parent process.

Sandboxing
``````````
Operating system sandboxing mechanisms can be applied to device emulation processes more effectively than to monolithic VMMs. Seccomp can limit the Linux system calls that may be invoked. SELinux can restrict access to system resources.

Sandboxing is a common task that most device emulation programs need. Therefore it is a good candidate for a library or launcher tool that is shared by device emulation programs.

Management
----------
Command-line interface
``````````````````````
A common command-line interface should be defined where possible. For example, vhost-user's standard --socket-path=PATH argument makes it easy to launch any vhost-user device backend. Protocol-specific options (e.g. socket path) and device type-specific options (e.g. virtio-net) can be standardized.

Some options are necessarily specific to the device emulation program and therefore cannot be standardized. The advantage of standard options is that management tools like libvirt can launch the device emulation programs without further user configuration.
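
As a sketch of what the standard options might look like from a backend author's point of view, the following Rust fragment parses the --socket-path option mentioned above and the --compat-version option from the device versioning section using only the standard library. The program itself and the exact option set are hypothetical:

// Hypothetical command-line handling for a device emulation program.
// --socket-path follows the vhost-user convention described above and
// --compat-version comes from the device versioning section; everything
// else about this program is invented.
use std::os::unix::net::UnixListener;

struct Options {
    socket_path: String, // where the VMM connects
    compat_version: u32, // which guest-visible device version to present
}

fn parse_args() -> Options {
    let mut opts = Options {
        socket_path: String::new(),
        compat_version: 1,
    };
    for arg in std::env::args().skip(1) {
        if let Some(path) = arg.strip_prefix("--socket-path=") {
            opts.socket_path = path.to_string();
        } else if let Some(v) = arg.strip_prefix("--compat-version=") {
            opts.compat_version = v.parse().expect("invalid --compat-version");
        } else {
            eprintln!("unknown option: {}", arg);
            std::process::exit(1);
        }
    }
    opts
}

fn main() -> std::io::Result<()> {
    let opts = parse_args();
    // A management tool like libvirt only needs to choose a socket path and
    // point the VMM at it; other options can keep their defaults.
    let _listener = UnixListener::bind(&opts.socket_path)?;
    // ... accept the VMM connection and serve the device protocol here ...
    Ok(())
}
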
RPC interface
`````````````
It may be necessary to issue commands at runtime. Examples include adjusting throttling limits, enabling/disabling logging, etc. These operations can be performed over an RPC interface.

Various RPC interfaces are used throughout open source virtualization software. Adopting a widely-used RPC protocol and standardizing commands is beneficial because it makes the software easy to communicate with and management tools can support it relatively easily.

Conclusion
----------
This was largely a brain dump but I hope it is useful food for thought as out-of-process device emulation interfaces are designed and developed. There is a lot more to it than simply implementing a protocol for device register accesses and guest RAM DMA. Developing open source libraries in Rust and C that can be used as needed will ensure that out-of-process devices are high-quality and easy for users to deploy.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL:

From alex.williamson at redhat.com Fri Oct 9 19:44:49 2020
From: alex.williamson at redhat.com (Alex Williamson)
Date: Fri, 9 Oct 2020 13:44:49 -0600
Subject: [Rust-VMM] Requirements for out-of-process device emulation
In-Reply-To: <20201009161815.GA321402@stefanha-x1.localdomain>
References: <20201009161815.GA321402@stefanha-x1.localdomain>
Message-ID: <20201009134449.041b5e71@x1.home>

On Fri, 9 Oct 2020 17:18:15 +0100
Stefan Hajnoczi wrote:

> Device emulation
> ----------------
> Device resources
> ````````````````
> Devices provide resources that drivers interact with such as hardware registers, memory, or interrupts. The fundamental requirement of out-of-process device emulation is exposing device resources.
>
> The following types of device resources are needed:
>
> Synchronous MMIO/PIO accesses
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> The most basic device emulation operation is the hardware register access. This is a memory-mapped I/O (MMIO) or programmed I/O (PIO) access to the device. A read loads a value from a device register. A write stores a value to a device register. These operations are synchronous because the vCPU is paused until completion.
> Asynchronous doorbells
>
> Devices often have doorbell registers, allowing the driver to inform the device that new requests are ready for processing. The vCPU does not need to wait since the access is a posted write.
>
> The kvm.ko ioeventfd mechanism can be used to implement asynchronous doorbells.
>
> Shared device memory
> ~~~~~~~~~~~~~~~~~~~~
> Devices may have memory-like regions that the CPU can access (such as PCI Memory BARs). The device emulation process therefore needs to share a region of its memory space with the VMM so the guest can access it. This mechanism also allows device emulation to busy wait (poll) instead of using synchronous MMIO/PIO accesses or asynchronous doorbells for notifications.
>
> Direct Memory Access (DMA)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
> Devices often require read and write access to a memory address space belonging to the CPU. This allows network cards to transmit packet payloads that are located in guest RAM, for example.
>
> Early out-of-process device emulation interfaces simply shared guest RAM. The allowed DMA to any guest physical memory address. More advanced IOMMU and address space identifier mechanisms are now becoming ubiquitous. Therefore, new out-of-process device emulation interfaces should incorporate IOMMU functionality.
>
> The key requirement for IOMMU mechanisms is allowing the VMM to grant access to a region of memory so the device emulation process can read from and/or write to it.
>
> Interrupts
> ~~~~~~~~~~
> Devices notify the CPU using interrupts. An interrupt is simply a message sent by the device emulation process to the VMM. Interrupt configuration is flexible on modern devices, meaning the driver may be able to select the number of interrupts and a mapping (using one interrupt with multiple event sources). This can be implemented using the Linux eventfd mechanism or via in-band device emulation protocol messages, for example.
>
> Extensibility for new bus types
> ```````````````````````````````
> It should be possible to support multiple bus types. vhost-user only supports vhost devices. VFIO is more extensible but currently focussed on PCI devices.

Wait a sec, the vfio API essentially deconstructs devices into exactly the resources you've outlined above. We not only have a vfio-pci device convention within vfio, but we've defined vfio-platform, vfio-amba, vfio-ccw, vfio-ap, and we'll likely be adding vfio-fsl-mc in the next kernel. The core device, group, and container model within vfio is completely device/bus agnostic. So while it's true that vfio-pci is the most mature and featureful convention, that's largely a reflection that PCI is the most ubiquitous device interface currently available. Thanks,

Alex

From stefanha at redhat.com Mon Oct 12 15:39:53 2020
From: stefanha at redhat.com (Stefan Hajnoczi)
Date: Mon, 12 Oct 2020 16:39:53 +0100
Subject: [Rust-VMM] Requirements for out-of-process device emulation
In-Reply-To: <20201009134449.041b5e71@x1.home>
References: <20201009161815.GA321402@stefanha-x1.localdomain> <20201009134449.041b5e71@x1.home>
Message-ID: <20201012153953.GC145304@stefanha-x1.localdomain>

On Fri, Oct 09, 2020 at 01:44:49PM -0600, Alex Williamson wrote:
> On Fri, 9 Oct 2020 17:18:15 +0100
> Stefan Hajnoczi wrote:
> > Extensibility for new bus types
> > ```````````````````````````````
> > It should be possible to support multiple bus types. vhost-user only supports vhost devices. VFIO is more extensible but currently focussed on PCI devices.
>
> Wait a sec, the vfio API essentially deconstructs devices into exactly the resources you've outlined above. We not only have a vfio-pci device convention within vfio, but we've defined vfio-platform, vfio-amba, vfio-ccw, vfio-ap, and we'll likely be adding vfio-fsl-mc in the next kernel. The core device, group, and container model within vfio is completely device/bus agnostic. So while it's true that vfio-pci is the most mature and featureful convention, that's largely a reflection that PCI is the most ubiquitous device interface currently available. Thanks,

Hi Alex,

Yes, I don't mean to say that VFIO cannot support new bus types. The most likely new bus type I can foresee is QEMU's SysBus, which would allow moving ISA, System-on-Chip, etc devices into a separate process.

We'll need to figure out whether vfio-user evolves independently from the kernel VFIO ioctl interface or whether efforts are made to keep the two in sync. The kernel may not need SysBus, but as the vfio-user protocol diverges from the kernel VFIO ioctl interface it becomes harder to share the commands and avoid duplication.

Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL:

From alex.bennee at linaro.org Mon Oct 12 17:16:18 2020
From: alex.bennee at linaro.org (Alex Bennée)
Date: Mon, 12 Oct 2020 18:16:18 +0100
Subject: [Rust-VMM] Requirements for out-of-process device emulation
In-Reply-To: <20201009161815.GA321402@stefanha-x1.localdomain>
References: <20201009161815.GA321402@stefanha-x1.localdomain>
Message-ID: <87ft6jz7od.fsf@linaro.org>

Stefan Hajnoczi writes:

> I just posted the following on my blog to outline the requirements that have been discussed over the past few months around out-of-process device emulation (vhost-user, vfio-user, etc). I hope it's helpful for covering various angles of out-of-process device emulation.
>
> It's long, so no worries if you don't want to join the discussion.
>

Nice post.

> Security
> --------
> The trust model
> ```````````````
> The VMM must not trust the device emulation program. This is key to implementing privilege separation and the principle of least privilege. If a compromised device emulation program is able to gain control of the VMM then out-of-process device emulation has failed to provide isolation between devices.
>
> The device emulation program must not trust the VMM to the extent that this is possible. For example, it must validate inputs so that the VMM cannot gain control of the device emulation process through memory corruptions or other bugs. This makes it so that even if the VMM has been compromised, access to device resources and associated system calls still requires further compromising the device emulation process.

However in this model the guest intrinsically trusts device emulation because it currently has full access to the guest's address space. It would probably be worth making that explicit. There are security models where the guest doesn't need to trust the VMM or particular device emulations.

> Conclusion
> ----------
> This was largely a brain dump but I hope it is useful food for thought as out-of-process device emulation interfaces are designed and developed. There is a lot more to it than simply implementing a protocol for device register accesses and guest RAM DMA. Developing open source libraries in Rust and C that can be used as needed will ensure that out-of-process devices are high-quality and easy for users to deploy.

A useful exercise ;-)

-- 
Alex Bennée

From kendall at openstack.org Fri Oct 16 19:58:17 2020
From: kendall at openstack.org (Kendall Waters)
Date: Fri, 16 Oct 2020 14:58:17 -0500
Subject: [Rust-VMM] Your Virtual PTG Checklist
Message-ID: <454C8BA0-BCD0-4AF0-AC69-00247FDC1E76@openstack.org>

REGISTRATION

If you haven't done so, please register for the PTG! This is how we will be able to provide you with the tooling information and passwords. Register here: https://october2020ptg.eventbrite.com

FINAL SCHEDULE

The final schedule [1] for the event is set here and in the PTGBot [2].

IRC

The main form of synchronous communication between attendees during the PTG is on IRC. If you are not on IRC, learn how to get started here [3]. The main PTG IRC channel is #openstack-ptg on Freenode. It's used to interact with the PTGbot, and Foundation staff will be present to help answer questions.

PTGBOT

The PTGbot [2] is an open source tool that PTG track moderators use to surface what's currently happening at the event.
Track moderators will send messages to the bot via IRC, and from that information, the bot publishes a webpage with several sections of information:

- The discussion topics currently discussed in the room ("now")
- An indicative set of discussion topics coming up next ("next")
- The schedule for the day with available extra slots you can book

Learn more about the ptgbot via the documentation here [4].

HELP DESK

We are here to help! If you have any questions during the event week, we encourage you to join the #openstack-ptg IRC channel and ask them there. You can ping Kendall Waters (wendallkaters) and Kendall Nelson (diablo_rojo) directly on IRC.

We will also have a dedicated Zoom room, but ONLY FOR MONDAY, October 26, where an OSF staff member will be available to answer your event-related questions. You can also always reach someone at ptg at openstack.org if you are unable to connect to IRC.

FEEDBACK

We have preemptively created an etherpad [5] to collect all of your feedback throughout the event. Please add your thoughts as the week goes on.

GAME TIME!

Since we all know that coming together as a community to hang out and play games is part of what makes the PTG great, we are going to try something new this time around. This PTG, we are going to have two 'game nights' - obviously time zones are hard and it might not be night for everyone, but the times are as follows:

- Thursday, October 29 at 19:00 UTC
- Friday, October 30 at 10:00 UTC
- Friday, October 30 at 23:00 UTC

If you are interested in participating, write your name in the etherpad [6]. We will use it to coordinate games for each of the timeslots.

[1] PTG schedule: https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/Uploads/PTG2-Oct26-30-2020-Schedule-1.pdf
[2] PTGbot: http://ptg.openstack.org/ptg.html
[3] How to get started on IRC: https://docs.openstack.org/contributors/common/irc.html
[4] PTGbot documentation: https://github.com/openstack/ptgbot/blob/master/README.rst
[5] Feedback Etherpad: https://etherpad.opendev.org/p/October2020-PTG-Feedback
[6] Game Time Etherpad: https://etherpad.opendev.org/p/October2020-PTG-Games

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From claire at openstack.org Mon Oct 19 15:17:25 2020
From: claire at openstack.org (Claire Massey)
Date: Mon, 19 Oct 2020 10:17:25 -0500
Subject: [Rust-VMM] Introducing the Open Infrastructure Foundation
Message-ID: 

Hi everyone,

If you’re not watching the Open Infrastructure Summit keynotes right now, you just missed some big news (and it’s not too late to tune in!). Jonathan Bryce just announced the launch of the Open Infrastructure Foundation (OIF) as the successor to the OpenStack Foundation (OSF). The name change reflects all of our community’s work building open infrastructure with dozens of open source components. We look forward to continuing to collaborate with you over the next decade to build open source communities that write software for production.

The Summit keynotes are still going, so log in now to hear more about this announcement and how Rust-VMM and other open source projects fit into the next decade of open infrastructure. And if you haven’t already, join OIF as an individual member so you can get involved!

Thanks,
Claire

-------------- next part --------------
An HTML attachment was scrubbed...
URL: