[Rust-VMM] vhost reply_ack negotiation (a.k.a differences in vhost-user behaviour with libvhost-user and vhost-user-backend.rs)

Alex Bennée alex.bennee at linaro.org
Sat Feb 27 12:23:08 UTC 2021


Raphael Norwitz <raphael.s.norwitz at gmail.com> writes:

> As an afterthought - if VHOST_USER_F_PROTOCOL_FEATURES is indeed
> unset, the issue may well be caused by QEMU reading an uninitialized
> value for dev->protocol_features. Some device types like cryptodev
> explicitly zero it out. As I said, it isn't set anywhere else in the
> source and If dev->protocol_features had REPLY_ACK set when the
> vhost_dev device is initialized, it would exactly explain the behavior
> you are seeing.

Can I point you at the proposed clarification of the vhost-user
specification:

  Subject: [VHOST USER SPEC PATCH] vhost-user.rst: add clarifying language about protocol negotiation
  Date: Fri, 26 Feb 2021 11:16:19 +0000
  Message-Id: <20210226111619.21178-1-alex.bennee at linaro.org>


>
> On Fri, Feb 26, 2021 at 9:58 AM Raphael Norwitz
> <raphael.s.norwitz at gmail.com> wrote:
>>
>> There are two sets of features being negotiated - virtio and
>> vhost-user.  Based on what you've posted here, I suspect the
>> VHOST_USER_F_PROTOCOL_FEATURES virtio feature may not be negotiated by
>> the backend, preventing the vhost-user protocol feature negotiation
>> from happening at all. I'm not 100% sure why this would cause QEMU to
>> assume that REPLY_ACK was negotiated though.
>>
>> some questions:
>>
>> On Mon, Feb 22, 2021 at 3:26 AM Alex Bennée <alex.bennee at linaro.org> wrote:
>> >
>> >
>> > Dr. David Alan Gilbert <dgilbert at redhat.com> writes:
>> >
>> > > * Alex Bennée (alex.bennee at linaro.org) wrote:
>> > >> Hi,
>> > >>
>> > >> I finally got a chance to get down into the guts of vhost-user while
>> > >> attempting to port my original C RPMB daemon to Rust using the
>> > >> vhost-user-backend and related crates. I ended up with this hang during
>> > >> negotiation:
>> > >>
>> > >>   startup
>> > >>
>> > >>   vhost_user_write req:1 flags:0x1
>> > >>   vhost_user_read_start
>> > >>   vhost_user_read req:1 flags:0x5
>> > >>   vhost_user_backend_init: we got 170000000
>> >
>> > GET_FEATURES
>>
>> Do we also see a GET_PROTOCOL_FEATURES and a SET_PROTOCOL_FEATURES
>> message here? If so can you confirm what flags they contained?
>>
>> vhost-user feature negotiation works as follows (see vhost_user_backend_init()):
>>
>> err = vhost_user_get_features(dev, &features);
>> if (err < 0) {
>>     return err;
>> }
>>
>> if (virtio_has_feature(features, VHOST_USER_F_PROTOCOL_FEATURES)) {
>>     dev->backend_features |= 1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
>>
>>     err = vhost_user_get_u64(dev, VHOST_USER_GET_PROTOCOL_FEATURES,
>>                                               &protocol_features);
>>     if (err < 0) {
>>         return err;
>>     }
>>
>>     dev->protocol_features =
>>         protocol_features & VHOST_USER_PROTOCOL_FEATURE_MASK;
>> ...
>>
>>     err = vhost_user_set_protocol_features(dev, dev->protocol_features);
>>     if (err < 0) {
>>          return err;
>>      }
>> }
>>
>> So we first get the virtio features and check if the backend
>> advertises VHOST_USER_F_PROTOCOL_FEATURES. If it does, we proceed to
>> negotiate vhost-user features, in which case we should see
>> GET_PROTOCOL_FEATURES and a SET_PROTOCOL_FEATURES. Otherwise it looks
>> like the function just returns, and we leave the vhost-user features
>> uninitialized (presumably zeroed out?), and the backend will never
>> even receive a GET/SET_PROTOCOL_FEATURES.
>>
>> dev->protocol_features is not touched anywhere else, and, if
>> VHOST_USER_F_PROTOCOL_FEATURES is negotiated, comes directly to the
>> backend from the protocol_features the backend &ed with
>> VHOST_USER_PROTOCOL_FEATURE_MASK. Therefore if
>> VHOST_USER_F_PROTOCOL_FEATURES is indeed negotiated here I'm not sure
>> what could cause QEMU to think REPLY_ACK was negotiated while the
>> backend does not, spare something obvious like the backend mishandling
>> the GET/SET_PROTOCOL_FEATURES messages. I briefly checked the rustvmm
>> code for that and didn't see anything obvious.
>>
>> mst - are backend devices meant to function if
>> VHOST_USER_F_PROTOCOL_FEATURES is not advertised? Do we know of any
>> functioning backend which does not advertise this virtio feature? If
>> not, maybe we consider failing out here?
>>
>> alex - Are you sure QEMU gets stuck waiting on a reply_ack message,
>> and not somewhere else in the setup path? I trust a SET_MEM_TABLE
>> message was actually received by the backend. Did you confirm that
>> QEMU was indeed stuck waiting for a reply and not somewhere else later
>> on?
>>
>> >
>> > >>   vhost_user_write req:15 flags:0x1
>> > >>   vhost_user_read_start
>> > >>   vhost_user_read req:15 flags:0x5
>> > >>   vhost_user_set_protocol_features: 2008
>> > >>   vhost_user_write req:16 flags:0x1
>> > >>   vhost_user_write req:3 flags:0x1
>> > >>   vhost_user_write req:1 flags:0x1
>> > >>   vhost_user_read_start
>> > >>   vhost_user_read req:1 flags:0x5
>> > >>   vhost_user_write req:13 flags:0x1
>> > >>
>> > >>   kernel initialises device
>> > >>
>> > >>   virtio_rpmb virtio1: init done!
>> > >>   vhost_user_write req:13 flags:0x1
>> > >>   vhost_dev_set_features: 130000000
>> > >>   vhost_user_set_features: 130000000
>> >
>> > SET_FEATURES
>>
>> This is setting virtio features - should have nothing to do with REPLY_ACK.
>>
>> >
>> > >>   vhost_user_write req:2 flags:0x1
>> > >>   vhost_user_write req:5 flags:0x9
>> > >>   vhost_user_read_start
>> > >>
>> > <snip>
>> > >>
>> > >>  - Should QEMU have preserved VhostUserVirtioFeatures::PROTOCOL_FEATURES
>> > >>    when doing the eventual VHOST_USER_SET_FEATURES reply?
>> > >>
>> > >>  - Is vhost.rs being to strict or libvhost-user too lax in interpreting
>> > >>    the negotiated features before processing the ``need_reply`` [Bit 3]
>> > >>    field of the messages?
>> > >
>> > > I think vhost.rs is being correctly strict - but there would be no harm
>> > > in it flagging that you'd hit an inconsistency if it finds a need_reply
>> > > without the feature.
>> >
>> > But the feature should have been negotiated. So unless the slave can
>> > assume it is enabled because it asked I think QEMU is in the wrong by
>> > not preserving the feature bits in it's SET_FEATURES reply. We just gets
>> > away with it with libvhostuser being willing to reply anyway.
>> >
>> > >
>> > >>  - are VHOST_USER_SET_MEM_TABLE to VHOST_USER_SET_INFLIGHT_FD included
>> > >>    in the "list of the ones that do" require replies or do they only
>> > >>    reply when REPLY_ACK has been negotiated as the ambiguous "seealso::"
>> > >>    box out seems to imply?
>> > >
>> > > set_mem_table gives a reply when postcopy is enabled (and then qemu
>> > > replies to the reply!) but otherwise doesn't.
>> > > (Note there's an issue opened for .rs to support ADD_MEM_REGION
>> > > since it's a lot better than SET_MEM_TABLE which has a fixed size table
>> > > that's small).
>> >
>> > Thanks for the heads up.
>> >
>> > >
>> > > Dave
>> > >
>> > >> Currently I have some hacks in:
>> > >>
>> > >>   https://github.com/stsquad/vhost/tree/my-hacks
>> > >>
>> > >> which gets my daemon booting up to the point we actually need to do a
>> > >> transaction. However I won't submit a PR until I've worked out exactly
>> > >> where the problems are.
>> > >>
>> > >> --
>> > >> Alex Bennée
>> > >>
>> >
>> >
>> > --
>> > Alex Bennée
>> >


-- 
Alex Bennée



More information about the Rust-vmm mailing list