[Rust-VMM] [PTG] Meeting Notes
jenny.mankin at crowdstrike.com
Wed May 8 03:26:32 UTC 2019
Thanks for the detailed summary of everything discussed at the PTG meetup!
Regarding the vmm-vcpu crate, I'd provided more detail on the PR as a reply to Zach's comment, but it's not a very visible location that gets a lot of traffic so I thought I'd solicit feedback here as well.
In that thread, I've provided what I think is a technical justification for a vCPU abstraction crate, regardless of its ultimate utility in a full hypervisor-agnostic or Hyper-V implementation of Firecracker or Crosvm. Full explanation below (feel free to reply here, or on the comment thread itself at https://github.com/rust-vmm/vmm-vcpu/pull/3#issuecomment-489174754).
I'm curious about the community's thoughts on whether this is sufficient justification for the crate, or whether demonstrable integration into Crosvm or Firecracker is actually a prerequisite for a rust-vmm abstraction crate such as this one (e.g., as requested, proving that Firecracker/Crosvm can support different hypervisors).
**** Original comment below (thread: https://github.com/rust-vmm/vmm-vcpu/pull/3#issuecomment-489174754) ****
You are certainly right in that the differences in the VcpuExit structure (due to the underlying vCPU exits exposed by each hypervisor) make it such that any code making use of the run() function would need to specialize its processing of the exits based on hypervisor. This would need to either be accomplished directly at the layer performing the vCPU.run(), or might be itself abstracted within a higher-level crate. For example, a hypervisor-agnostic VM crate might utilize the trait generic (with VMM-specific references providing implementation of those operations). See, for example, the proposed issue to provide additional abstractions of a VM and a VMM that makes use of abstracted vCPU functionality.
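To make the shape of that layering concrete, here is a minimal sketch, assuming a trait with an associated exit type. Every name in it (VcpuRun, ExitHandler, KvmLikeExit, WhpLikeExit, ScriptedVcpu, SerialLog) is hypothetical and is not taken from the vmm-vcpu PR; the exit enums are tiny illustrative subsets, not the real KVM or Hyper-V exit lists.

```rust
// Hypothetical sketch: none of these names come from the vmm-vcpu crate.

/// Illustrative subset of exits a KVM-style backend might report.
#[derive(Debug, Clone, PartialEq)]
pub enum KvmLikeExit {
    IoOut { port: u16, data: u8 },
    Hlt,
}

/// Illustrative subset of exits a Hyper-V-style backend might report.
#[derive(Debug, Clone, PartialEq)]
pub enum WhpLikeExit {
    IoPortWrite { port: u16, value: u32 },
    Halt,
}

/// What the run loop should do once an exit has been processed.
#[derive(Debug, PartialEq)]
pub enum Action {
    Continue,
    Stop,
}

/// Shared vCPU interface: `run` is common, the exit type is per hypervisor.
pub trait VcpuRun {
    type Exit;
    fn run(&mut self) -> Self::Exit;
}

/// Hypervisor-specific exit processing lives behind this trait.
pub trait ExitHandler<E> {
    fn handle(&mut self, exit: E) -> Action;
}

/// Hypervisor-agnostic loop; returns how many exits were processed.
pub fn run_until_stop<V, H>(vcpu: &mut V, handler: &mut H) -> usize
where
    V: VcpuRun,
    H: ExitHandler<V::Exit>,
{
    let mut exits = 0;
    loop {
        exits += 1;
        if handler.handle(vcpu.run()) == Action::Stop {
            return exits;
        }
    }
}

/// Fake vCPU that replays a scripted list of exits, then reports `fallback`.
pub struct ScriptedVcpu<E: Clone> {
    pub script: std::collections::VecDeque<E>,
    pub fallback: E,
}

impl<E: Clone> VcpuRun for ScriptedVcpu<E> {
    type Exit = E;
    fn run(&mut self) -> E {
        self.script.pop_front().unwrap_or_else(|| self.fallback.clone())
    }
}

/// Collects the low byte of every port write and stops on a halt.
#[derive(Default)]
pub struct SerialLog {
    pub bytes: Vec<u8>,
}

impl ExitHandler<KvmLikeExit> for SerialLog {
    fn handle(&mut self, exit: KvmLikeExit) -> Action {
        match exit {
            KvmLikeExit::IoOut { data, .. } => {
                self.bytes.push(data);
                Action::Continue
            }
            KvmLikeExit::Hlt => Action::Stop,
        }
    }
}

impl ExitHandler<WhpLikeExit> for SerialLog {
    fn handle(&mut self, exit: WhpLikeExit) -> Action {
        match exit {
            WhpLikeExit::IoPortWrite { value, .. } => {
                self.bytes.push(value as u8);
                Action::Continue
            }
            WhpLikeExit::Halt => Action::Stop,
        }
    }
}
```

In this sketch the loop in run_until_stop never inspects an exit itself; only the two ExitHandler implementations know each backend's exit layout, which is the specialization point described above.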
Getting crosvm/Firecracker to achieve parity with Hyper-V in addition to KVM is an ambitious goal, and it's true that doing so will require more layers than just swapping in a vCPU implementation of a generic trait. The specifics of what this would look like are something we'd like to look at, and focusing on/POCing the handling of the VcpuExit is a good suggestion.
Stepping back from these more-ambitious goals, I think the vCPU crate still offers opportunity for abstraction for common VMM-related operations in higher-level crates that utilize common vCPU functionality. The arch crate comes to mind. In development of the Hyper-V-based libwhp crate, some of the arch functionality had to be duplicated, stripped of KVM-specific objects and APIs, and imported as a separate libwhp-specific crate. The duplication was one of the motivations behind my proposal of the arch crate here for rust-vmm: it naturally lends itself to a hypervisor-agnostic solution that can be easily imported into different VMM projects. And as we discussed a couple weeks ago on the rust-vmm call, since those APIs accept the Vcpu trait generic as an input parameter, there is "zero cost" to the abstraction due to the static dispatch.
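As a rough illustration of the static-dispatch point, here is a minimal sketch of an arch-style helper that takes the vCPU as a generic parameter. The trait shape, the Regs struct, setup_boot_regs, and MockVcpu are all assumptions made for this example and do not reflect the actual vmm-vcpu or arch crate APIs.

```rust
// Hypothetical sketch: the trait and register layout are illustrative only.

/// Minimal register file shared by the illustrative backends.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Regs {
    pub rip: u64,
    pub rsp: u64,
    pub rflags: u64,
}

/// Assumed shape of a hypervisor-agnostic vCPU trait.
pub trait Vcpu {
    type Error: std::fmt::Debug;
    fn set_regs(&mut self, regs: &Regs) -> Result<(), Self::Error>;
    fn get_regs(&self) -> Result<Regs, Self::Error>;
}

/// Arch-style helper with no hypervisor-specific types in its signature.
/// `V` is a generic parameter, so the compiler monomorphizes this function
/// for each concrete vCPU type and the trait calls are statically dispatched.
pub fn setup_boot_regs<V: Vcpu>(vcpu: &mut V, entry: u64, stack: u64) -> Result<(), V::Error> {
    let regs = Regs {
        rip: entry,
        rsp: stack,
        rflags: 0x2, // bit 1 of RFLAGS is reserved and always set on x86
    };
    vcpu.set_regs(&regs)
}

/// In-memory stand-in for a real KVM- or Hyper-V-backed vCPU.
#[derive(Default)]
pub struct MockVcpu {
    regs: Regs,
}

impl Vcpu for MockVcpu {
    type Error = ();

    fn set_regs(&mut self, regs: &Regs) -> Result<(), ()> {
        self.regs = regs.clone();
        Ok(())
    }

    fn get_regs(&self) -> Result<Regs, ()> {
        Ok(self.regs.clone())
    }
}
```

Had the helper taken `&mut dyn Vcpu` instead, each call would go through a vtable; with the generic bound, a VMM that only ever links one backend pays nothing for the abstraction at run time.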
That is one example where the generic abstraction provided by the vCPU crate benefits other hypervisor-agnostic crates; I think it's reasonable to assume others exist. For example, we are also currently researching and developing a Windows loader crate; this makes use of these same vCPU APIs and abstraction implementations to set up the VM.
So independent of our goals to achieve interchangeable VMMs in ambitious projects like crosvm and Firecracker, I think that having generic crates abstracting lower-level functionality provides benefits to smaller-scale projects, like those that might be using rust-vmm crates as building blocks to their own VMMs.
From: Boeuf, Sebastien <sebastien.boeuf at intel.com>
Sent: Tuesday, May 7, 2019 6:44 AM
To: rust-vmm at lists.opendev.org
Subject: [External] [Rust-VMM] [PTG] Meeting Notes
Here are some notes about the PTG meeting that we had in Denver:
The purpose of dual licensing is to make sure that Apache2 will not conflict with GPLv2 licensed projects such as QEMU, which could eventually use rust-vmm. The decision is to move from a dual MIT+Apache2 proposal to a dual 3-clause BSD+Apache2. 3-clause BSD is compatible with GPLv2, just as MIT is. The benefit of choosing 3-clause BSD over MIT is that it does not conflict with the existing Crosvm code, which uses 3-clause BSD.
We currently have Buildkite running on kvm-ioctls crate. Buildkite runs on x86-64 and aarch64. We need some Windows testing, to test the abstraction patches proposed by Crowdstrike folks. Cloudbase will provide the Windows server to run the Windows CI.
Proposal about having a dedicated “test” repo in the rust-vmm organization. This would allow every crate to rely on this common “test” repo to centralize the tests.
Also, we talked about creating a “dummy” VMM that would be a superset VMM since it would pull every rust-vmm crate. This VMM would allow full integration testing, in addition to the unit tests already running on each crate.
The CI should use the top-of-tree version of every crate on every pull request, as we want to test the latest master version. Because a pull request is merged only if the CI passes, we can be sure that master will never be broken. This is the reason why we can safely run integration tests based on the master branch of each crate.
Testing will be slightly different on a “release” pull request because it will modify the Cargo.toml file to make sure we’re testing the right version of every crate before we give the green light for publishing.
How will the crates.io publishing be done?
“All at once” vs “each crate can be updated at any time”. Because we want to keep the CI simple, which means it will always test on top of tree, we chose to go with the “all at once” solution. Releases are cheap, hence we will bump the versions of every crate every time we want to update one or more crates.
We took the decision not to have any stable branches on our repositories. The project is not mature enough to increase the complexity of having one or more stable branches. With the same idea in mind, we took the decision not to have any stable releases to crates.io.
How to publish on crates.io with some human gatekeeping?
We didn’t take a decision regarding this question, but here are the two discussed approaches:
We would create a bot having a crates.io key stored on Github, with a set of maintainers that can push the button to let the bot do the work.
Any maintainer from the set of maintainers holding keys would publish manually.
The concern regarding the bot is about the key which needs to be stored on Github (security concern about having the key being stolen).
Crosvm/Firecracker consuming rust-vmm crates
Both projects expect to consume crates directly from crates.io, because this means the crates are mature enough to be consumed.
The criteria for a crate to be published on crates.io are proper documentation, tests, … as documented here:
QEMU’s interest in rust-vmm
QEMU could benefit from low level parts such as vm-memory crate to make QEMU core parts more secure.
The other aspect is about vhost-user backends, as they should be able to consume rust-vmm crates to be implemented in Rust, and be reused by any VMM.
The first PR is ready and waiting for more review from Firecracker folks. From Crosvm perspective, the PR is alright, but internal projects (not only about VMM) are using sys-utils too. That’s the reason why it’s not straightforward to replace their sys-utils crate with the vm-sys-utils one, as they will have to write a sys-utils wrapper on top of vm-sys-utils, so that other internal projects can still consume the right set of utility functions.
Vhost and vhost-user
Vhost pull request has been submitted by Gerry and needs to be reviewed. We didn’t spend time reviewing it during the PTG.
The vhost-user implementation needs to cover both the slave and the master sides of the protocol. This way, the master can be consumed from the VMM side, and the slave can be consumed from any vhost-user daemon (interesting for any VMM that could directly reuse the vhost-user backend). In particular, there is some ongoing work from Red Hat on writing a virtiofsd vhost-user daemon in Rust. This should be part of the rust-vmm project, pulling the vhost-user-protocol.
The remaining question is to determine if the vhost backends, including vhost-user (hence including the protocol itself) should live under a single crate?
We had a discussion about using a vm-allocator crate as a helpful component to decide about memory ranges (MMIO or PIO) for any memory region related to a device.
Based on the feedback from Paolo, Alex and Stefan, we need to design this allocator carefully if we want to be able to support PCI BAR programming from the firmware or the guest OS. This means we should be able to handle any sort of PCI reprogramming to update the ranges chosen by the vm-allocator, since this is how the PCI spec is defined.
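One way to picture that requirement is an allocator whose allocations can be moved after the fact. The following is only a minimal sketch under that assumption: AddressAllocator, Range, relocate, and RelocateError are hypothetical names and not the vm-allocator design, and the address arithmetic assumes the window sits far from u64::MAX.

```rust
// Hypothetical sketch: not the vm-allocator API.

/// Half-open address range [base, base + size).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Range {
    pub base: u64,
    pub size: u64,
}

impl Range {
    fn end(&self) -> u64 {
        self.base + self.size
    }
    fn overlaps(&self, other: &Range) -> bool {
        self.base < other.end() && other.base < self.end()
    }
}

#[derive(Debug, PartialEq)]
pub enum RelocateError {
    NotAllocated,
    OutOfWindow,
    Conflict,
}

pub struct AddressAllocator {
    window: Range,
    used: Vec<Range>, // kept sorted by base, never overlapping
}

fn align_up(value: u64, align: u64) -> u64 {
    (value + align - 1) & !(align - 1)
}

impl AddressAllocator {
    pub fn new(base: u64, size: u64) -> Self {
        Self { window: Range { base, size }, used: Vec::new() }
    }

    fn insert(&mut self, range: Range) {
        let pos = self.used.partition_point(|r| r.base < range.base);
        self.used.insert(pos, range);
    }

    /// First-fit allocation of `size` bytes aligned to `align` (a power of two).
    pub fn allocate(&mut self, size: u64, align: u64) -> Option<Range> {
        if size == 0 || !align.is_power_of_two() {
            return None;
        }
        let mut candidate = align_up(self.window.base, align);
        for r in &self.used {
            if candidate + size <= r.base {
                break; // the gap before `r` is large enough
            }
            candidate = align_up(r.end().max(candidate), align);
        }
        let range = Range { base: candidate, size };
        if range.end() > self.window.end() {
            return None;
        }
        self.insert(range);
        Some(range)
    }

    /// Move an existing allocation to a base chosen by firmware or the guest,
    /// for example after a BAR has been reprogrammed. The size is unchanged.
    pub fn relocate(&mut self, old_base: u64, new_base: u64) -> Result<Range, RelocateError> {
        let idx = self
            .used
            .iter()
            .position(|r| r.base == old_base)
            .ok_or(RelocateError::NotAllocated)?;
        let moved = Range { base: new_base, size: self.used[idx].size };
        if moved.base < self.window.base || moved.end() > self.window.end() {
            return Err(RelocateError::OutOfWindow);
        }
        if self.used.iter().enumerate().any(|(i, r)| i != idx && r.overlaps(&moved)) {
            return Err(RelocateError::Conflict);
        }
        self.used.remove(idx);
        self.insert(moved);
        Ok(moved)
    }
}
```

The point of the sketch is that the allocator's first choice is only a default: relocate lets the guest's choice win as long as it stays inside the window and does not collide with another region, and the old range becomes free again.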
Summary of the priorities
We could maintain a list of priorities in an etherpad (or any sort of public/shared document), and at the end of each week, send the list of hot topics and priorities to make sure everyone from the community is on the same page.
Which tool should we use to control code coverage?
Kcov is one option, but it seems that it does not run on aarch64, which is a blocker since the project wants to support multiple CPU architectures. Kcov might be the one picked at first, but we need to investigate other solutions.
Do we need to gate pull requests based on the coverage result?
The discussion went both ways on this topic, but I think the solution we agreed upon was to gate based on a code coverage value. An important point is that this value is not immutable: based on manual review by the maintainers, we can lower it if that makes sense.
For instance, if some new piece of code is being added, it does not mean that we have to implement tests for the sake of keeping the coverage at the same level. If maintainers, as they are smart people, realize it makes no sense to test this new piece of code, then the decision will be made to reduce the threshold.
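The policy above could be modelled roughly as follows. This is a minimal sketch with hypothetical names (CoverageGate, GateError, lower_to) and an assumed percentage scale; the notes do not specify any tooling or how the threshold would actually be stored.

```rust
// Hypothetical sketch: names and policy details are assumptions.

#[derive(Debug, PartialEq)]
pub enum GateError {
    BelowThreshold { measured: f64, required: f64 },
    NotALowering,
}

/// Coverage gate with a threshold expressed in percent.
pub struct CoverageGate {
    required: f64,
}

impl CoverageGate {
    pub fn new(required: f64) -> Self {
        Self { required }
    }

    pub fn required(&self) -> f64 {
        self.required
    }

    /// CI step: reject the pull request when coverage is below the threshold.
    pub fn check(&self, measured: f64) -> Result<(), GateError> {
        if measured >= self.required {
            Ok(())
        } else {
            Err(GateError::BelowThreshold { measured, required: self.required })
        }
    }

    /// Maintainer decision after manual review: lower the threshold.
    /// Raising it through this path is refused.
    pub fn lower_to(&mut self, new_required: f64) -> Result<(), GateError> {
        if new_required >= 0.0 && new_required < self.required {
            self.required = new_required;
            Ok(())
        } else {
            Err(GateError::NotALowering)
        }
    }
}
```

Keeping the lowering step as an explicit, separate operation mirrors the idea that the threshold only moves down when maintainers decide so, never implicitly because a pull request happened to reduce coverage.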
As part of the regular CI running on every pull request, we want to run multiple linters to maintain good code quality.
We want some fuzzing on the rust-vmm crates. Now the question is to identify which crates are the most unsafe. For instance, being able to fuzz the virtqueues (part of the virtio crate) should be very interesting to validate their proper behavior.
Also, fuzzing vhost-user backends once they are part of the rust-vmm project will be a very important task if we want to provide secure backends for any VMM that could reuse them.
At some point, the project will run into some nasty bugs considered as real security threats. To prepare for that day, we should define a clear process on how to limit the impact on rust-vmm users, and describe how to handle such an issue (quick fix, long term plan, etc...).
After a lot of discussion about the feasibility of having a trait for Vcpu, we came to the conclusion that, without further proof and justification that the trait will provide any benefit, we should simply split HyperV and KVM into separate packages. The reason is that we don’t think those two hypervisors have a lot in common, and it might take more effort to find similarities than to split them into distinct pieces.
One interesting data point that we would like to look at in the context of this discussion is about the work that Alessandro has been doing to port Firecracker on HyperV. Being able to look at his code might be helpful in understanding the fundamental differences between HyperV and KVM.
Pull request #10 splits out the mmap functionality coming from Linux and adds support for the Windows mmap equivalent. The code has been acknowledged by everybody as ready to be merged once the comments about squashing and reworking the commit message are addressed.
We discussed the vm-device issue that has been open for some time now. Some mentioned that it is important to keep the Bus trait generic so that any VMM could still reuse it, adapting some wrappers for devices if necessary.
Based on the comments on the issue, it was pretty confusing where things will go with this crate, and that’s why we agreed on waiting for the pull request to be submitted before going further into hypothetical reviews and comments.
Samuel will take care of submitting the pull request for this.
Community README about rust-vmm goals
We listed the main points we wanted to mention on the README from the community repository. Andreea took the AR to write the documentation describing the goals and motivation behind the project, based on the defined skeleton.
We also mentioned that having a github.io webpage for the project would be a better way to promote the project. We will need to create a dedicated repo for that, as part of the rust-vmm Github organization.
We will need to put some effort into putting this webpage together at some point, the first step being to duplicate more or less the content of the README.
Rust-vmm mailing list
Rust-vmm at lists.opendev.org