[Edge-computing] [openstack-dev] [ironic][edge] Notes from the PTG
Csatari, Gergely (Nokia - HU/Budapest)
gergely.csatari at nokia.com
Fri Sep 28 09:54:11 UTC 2018
Thanks for sharing your notes.
One note about jumping to the autonomous control plane requirement.
This requirement was already identified during the Dublin PTG workshop [1] (https://wiki.openstack.org/w/index.php?title=OpenStack_Edge_Discussions_Dublin_PTG). It is needed for two reasons: the edge cloud instance should stay operational even if there is a network break towards other edge cloud instances, and the edge cloud instance should work together with other edge cloud instances running other versions of the control plane. In Denver we decided to leave these requirements out of the MVP architecture discussions.
From: Jim Rollenhagen <jim at jimrollenhagen.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Wednesday, September 19, 2018 at 10:49 AM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [ironic][edge] Notes from the PTG
I wrote up some notes from my perspective at the PTG for some internal teams and figured I may as well share them here. They're primarily from the ironic and edge WG rooms. Fairly raw, very long, but hopefully useful to someone. Enjoy.
Edge WG (IMHO) has historically just talked about use cases, hand-waved a bit, and jumped to requiring an autonomous control plane per edge site - thus spending all of their time talking about how they will make glance and keystone sync data between control planes.
penick described roughly what we do with keystone/athenz and how that can be used in a federated keystone deployment to provide autonomy for any control plane, but also a single view via a global keystone.
penick and I both kept pushing for people to define a real architecture, and we ended up with 10-15 people huddled around an easel for most of the afternoon. Of note:
- Windriver (and others?) refuse to budge on the many control plane thing
- This means that they will need some orchestration tooling up top in the main DC / client machines to even come close to reasonably managing all of these sites
- They will probably need some syncing tooling
- glance->glance isn’t a thing, no matter how many people say it is.
- Glance PTL recommends syncing metadata outside of glance process, and a global(ly distributed?) glance backend.
- We also defined the single pane of glass architecture that Oath plans to deploy
- Okay with losing connectivity from central control plane to single edge site
- Each edge site is a cell
- Each far edge site is just compute nodes
- Still may want to consider image distribution to edge sites so we don’t have to go back to main DC?
- Keystone can be distributed the same as first architecture
- Nova folks may start investigating putting API hosts at the cell level to get the best of both worlds - if there’s a network partition, can still talk to cell API to manage things
- Need to think about removing the need for rabbitmq between edge and far edge
- Kafka was suggested in the edge room for oslo.messaging in general
- Etcd watchers may be another option for an o.msg driver
- Other options are more invasive into nova - involve changing how nova-compute talks to conductor (etcd, etc) or even putting REST APIs in nova-compute (and nova-conductor?)
- Neutron is going to work on an OVS “superagent” - superagent does the RPC handling, talks some other way to child agents. Intended to scale to thousands of children. Primary use case is smart nics but seems like a win for the edge case as well.
penick took an action item to draw up the architecture diagrams in a digestible format.
Wednesday: ironic things
Started with a retrospective. See https://etherpad.openstack.org/p/ironic-stein-ptg-retrospective for the notes - there weren’t many surprising things here. We did discuss trying to target some quick wins for the beginning of the cycle, so that we didn’t have all of our features trying to land at the end. Using wsgi with the ironic-api was mentioned as a potential regression, but we agreed it’s a config/documentation issue. I took an action to make a task to document this better.
Next we quickly reviewed our vision doc, and people didn’t have much to say about it.
Metalsmith: it’s a thing, it’s being included into the ironic project. Dmitry is open to optionally supporting placement. Multiple instances will be a feature in the future. Otherwise mostly feature complete, goal is to keep it simple.
Networking-ansible: redhat building tooling that integrates with upstream ansible modules for networking gear. Kind of an alternative to n-g-s. Not really much on plans here, RH just wanted to introduce it to the community. Some discussion about it possibly replacing n-g-s later, but no hard plans.
Deploy steps/templates: we talked about what the next steps are, and what an MVP looks like. Deploy templates are triggered by the traits that nodes are scheduled against, and can add steps before or after (or in between?) the default deploy steps. We agreed that we should add a RAID deploy step, with standing questions for how arguments are passed to that deploy step, and what the defaults look like. mgoddard and I took an action item to open an RFE for this. We also agreed that we should start thinking about how the current (only) deploy step should be split into multiple steps.
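To make the trait-to-template idea concrete, here is a minimal sketch of how template steps could be merged with the default deploy step. This is not Ironic code: the Step class, the trait names, the step names, and the "higher priority runs first" ordering are all assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    priority: int  # assumption: a higher priority runs earlier


# Hypothetical default: the single deploy step that exists today.
DEFAULT_STEPS = [Step("deploy", 100)]

# Hypothetical deploy templates, keyed by the trait that triggers them.
TEMPLATES = {
    "CUSTOM_RAID1": [Step("apply_raid", 150)],             # before deploy
    "CUSTOM_FIRMWARE_CHECK": [Step("verify_firmware", 50)],  # after deploy
}


def steps_for(requested_traits):
    """Return the default steps plus any template steps, in run order."""
    steps = list(DEFAULT_STEPS)
    for trait in requested_traits:
        steps.extend(TEMPLATES.get(trait, []))
    return sorted(steps, key=lambda step: -step.priority)


if __name__ == "__main__":
    for step in steps_for(["CUSTOM_RAID1", "CUSTOM_FIRMWARE_CHECK"]):
        print(step.priority, step.name)
```

A priority above or below the default step's is what places a template step before or after it; how arguments reach a step such as the RAID one is left out here, since that was still an open question.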
Graphical console: we discussed what the next steps are for this work. We agreed that we should document the interface and what is returned (a URL), and also start working on a redfish driver for graphical consoles. We also noted that we can test in the gate with qemu, but we only need to test that a correct URL is returned, not that the console actually works (because we don’t really care that qemu’s console works).
Python 3: we talked about the changes to our jobs that are needed. We agreed to use the base name of the jobs for Python 3 (as those will be used for a long time), and add a “python2” prefix for the Python 2 jobs. We also discussed dropping certain coverage for Python 2, as our CI jobs tend to mostly test the same codepaths with some config differences. Last, we talked about mixed environment Python 2 and 3 testing, as this will be a thing people doing rolling upgrades of Python versions will hit. I sent an email to the ML asking if others had done or thought about this, and it sounds like we can limit that testing to oslo.messaging, and a task was reported there.
Pre-upgrade checks: Not much was discussed here; TheJulia is going to look into it. One item of note is that there is an oslo project being proposed that can carry some of the common code for this.
Performance improvements: We first discussed our virt driver’s performance. It was found that Nova’s power sync loop makes a call to Ironic for each instance that the compute service is managing. We do some node caching in our driver that would be useful for this. I took an action item to look into it, and have a WIP patch: https://review.openstack.org/#/c/602127/ . That patch just needs a bug filed and unit tests written. On Thursday, we talked with Nova about other performance things, and agreed we should implement a hook in Nova that Ironic can call to say “power changed” and “deploy done” and other things like this. This will help reduce or eliminate polling from our virt driver to Ironic, and also allow Nova to notice these changes faster. More on that later?
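The caching idea for the power sync loop can be sketched roughly as follows. This is not the WIP patch: the NodeCache class, the list_nodes callable, the dictionary keys, and the 60 second TTL are assumptions made up for the example. The point is only that one bulk listing can answer many per-instance lookups.

```python
import time


class NodeCache:
    """Answer per-instance power state lookups from one cached node listing."""

    def __init__(self, list_nodes, ttl=60.0, clock=time.monotonic):
        self._list_nodes = list_nodes  # hypothetical bulk "list nodes" call
        self._ttl = ttl
        self._clock = clock
        self._nodes = {}
        self._fetched_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            # One API call for all nodes instead of one call per instance.
            self._nodes = {n["instance_uuid"]: n for n in self._list_nodes()}
            self._fetched_at = now

    def power_state(self, instance_uuid):
        self._refresh_if_stale()
        node = self._nodes.get(instance_uuid)
        return node["power_state"] if node else None
```

The trade-off is staleness: a power change made within the TTL is not seen until the next refresh, which is part of why a callback from Ironic was also discussed.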
Splitting the conductor: we discussed the many tasks the conductor is responsible for, and pondered if we could or should split things up. This has implications (good and bad) for operability, scalability, and security. Splitting the conductor to multiple workers would allow operators to use different security models for different tasks (e.g. only allowing an “OOB worker” access to the OOB network). It would also allow folks to scale out workers that do lots of work (like the power status loop) separately from those that do minimal work (writing PXE configs). I intend to investigate this more during this cycle and lay out a plan for doing the work. This also may require better distributed locking, which TheJulia has started investigating.
Changing boot mode defaults: Apparently Intel is going to stop shipping hardware that is capable of legacy BIOS booting in 2020. We agreed that we should work toward changing the default boot mode to UEFI to better prepare our users, but we can’t drop legacy BIOS mode until all of the old hardware in the world is gone. TheJulia is going to dig through the code and make a task list.
UEFI HTTPClient booting: This is a DHCP class that allows the DHCP server to return a URL instead of a “next-server” (TFTP location) response. This is a clear value add, and TheJulia is going to work on it as she is already neck deep in that area of code. We also need to ensure that Neutron supports this. It should, as it’s just more DHCP options, but we need to verify.
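As a rough illustration of the difference between the two responses, the sketch below picks DHCP reply fields based on the vendor class the client sent. It is not Ironic or Neutron code: the function name, the dictionary keys, the addresses, and the file names are placeholders, and it assumes the client identifies itself with a vendor class string starting with "HTTPClient".

```python
def boot_reply(vendor_class,
               tftp_server="192.0.2.10",
               http_base="http://192.0.2.10:8080"):
    """Return illustrative DHCP reply fields for a network boot client."""
    if vendor_class.startswith("HTTPClient"):
        # HTTP boot: the boot file name carries a full URL, and the reply
        # echoes the HTTPClient vendor class back to the client.
        return {
            "vendor-class-identifier": "HTTPClient",
            "bootfile-name": f"{http_base}/boot/ipxe.efi",
        }
    # Classic PXE: point the client at a TFTP server plus a file name.
    return {
        "next-server": tftp_server,
        "bootfile-name": "ipxe.efi",
    }


if __name__ == "__main__":
    print(boot_reply("HTTPClient:Arch:00016:UNDI:003001"))
    print(boot_reply("PXEClient:Arch:00007:UNDI:003016"))
```

Whether Neutron can carry these extra options through to its DHCP server is exactly the part that still needs verifying.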
SecureBoot: I presented Oath’s secureboot model, which doesn’t depend on a centralized attestation server. It made sense to people, and we discussed putting the driver in tree. The process does rely on some enhancements to iPXE, so Oath is going to investigate upstreaming those changes and publishing more documentation, and then an in-tree driver should be no problem. We also discussed Ironic’s current SecureBoot (TrustedBoot?) implementations. Currently it only works with PXE, not iPXE or Grub2. TheJulia is going to look into adding this support. We should be able to do CI jobs for it, as TPM 1.2 and 2.0 emulation both seem to be supported in QEMU as of 2.11.
NIC PXE configuration as a clean step: the DRAC driver team has a desire to configure NICs for PXE or not, and sync with the ironic database’s pxe_enabled field. This has gone back and forth in IRC. We were able to resolve some of the issues with it, and rpioso is going to write a small spec to make sure we get the details right.
Thursday: more ironic things
Neutron cross-project discussion: we discussed SmartNICs, which the Neutron team had also discussed the previous day. In short, SmartNICs are NICs that run OVS. The Neutron team discussed the scalability of their OVS agent running across thousands of machines, and are planning to make some sort of “superagent”. This superagent essentially owns a group of OVS agents. It will talk to Neutron over rabbit as usual, but then use some other protocol to talk to the OVS agents it is managing. This should help with rabbit load even in “standard” OpenStack environments, and is especially useful (to me) for minimizing rabbitmq connections from far edge sites. The catch with SmartNICs and Ironic is that the NICs must have power to be configured (and thus the machine must be on). This breaks our general model of “only configure networking with the machine off, to make sure we don’t cross streams between tenants and control plane”. We came to a decent compromise (I think), and agreed to continue in the ironic spec, and revisit the topic in Berlin.
Federation: we discussed federation and people seemed interested, however I don’t believe we made any real progress toward getting it done. There’s still a debate whether this should be something in Ironic itself, or if there should just be some sort of proxy layer in front of multiple Ironic environments. To be continued in the spec.
Agent polling: we discussed the spec to drop communication from IPA to the conductor. It seems like nobody has major issues with it, and the spec just needs some polishing before landing.
L3 deployments: We brought this up, and again there seems to be little contention. I ended up approving the spec shortly after.
Neutron event processing: This work has been hanging for years and not getting done. Some folks wondered if we should just poll Neutron, if that gets the work done more quickly. Others wondered if we should even care about it at all (we should). TheJulia is going to follow up with dtantsur and vdrok to see if we can get someone to mainline some caffeine and just get it done.
CMDB: Oath and CERN presented their work toward speccing out a CMDB application that can integrate with Ironic. We discussed the problems that they are trying to solve and agreed they need solving. We also agreed that strict schema is better than blobjects (© jaypipes). We agreed it probably doesn’t need to be in Ironic governance, but could be one day. The next steps are to start hacking in a new repo in the OpenStack infrastructure, and propose specs for any Ironic integration that is needed. Red Hat and Dell contributors also showed interest in the project and volunteered to help. Some folks are going to try and talk to the wider OpenStack community to find out if there’s interest or needs from projects like Nova/Neutron/Cinder, etc.
Stein goals: We put together a list of goals and voted on them. Julia has since proposed the patch to document them: https://review.openstack.org/#/c/603161/
Last thing Thursday: Cross-project discussions with Nova. Summarized here, but lots of detail in the etherpad under the Ironic section: https://etherpad.openstack.org/p/nova-ptg-stein
Power sync: We discussed some problems CERN has with the instance power sync (Rackspace also saw these problems). In short, nova asserts power state if the instance “should” be off but the power is turned on out-of-band. Operators definitely need to be aware of this when doing maintenance on active machines, but we also discussed Ironic calling back to Nova when Ironic knows that the power state has been updated (via Ironic API, etc). I volunteered to look at this, and dansmith volunteered to help out.
API heaviness: We discussed how many API calls our virt driver does. As mentioned earlier, I proposed a patch to make the power sync loop more lightweight. There’s also lots of polling for tasks like deploy and rescue, which we can dramatically reduce with a callback from Ironic to Nova. I also volunteered to investigate this, and dansmith again agreed to help.
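The callback approach amounts to blocking on a notification instead of repeatedly asking Ironic for the node state. Below is a minimal sketch of that pattern. The DeployWaiter class and its method names are hypothetical, and the real mechanism between Ironic and Nova was still to be designed at this point.

```python
import threading


class DeployWaiter:
    """Let a caller block until a "deploy done" notification arrives."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}

    def _event_for(self, node_uuid):
        # Create the per-node event on first use, whichever side gets here first.
        with self._lock:
            return self._events.setdefault(node_uuid, threading.Event())

    def notify_deploy_done(self, node_uuid):
        """Would be invoked by the (hypothetical) callback from Ironic."""
        self._event_for(node_uuid).set()

    def wait_for_deploy(self, node_uuid, timeout):
        """Block without polling; returns False if the timeout expires."""
        return self._event_for(node_uuid).wait(timeout)
```

A timeout is still needed because a callback can be lost, so a real implementation would probably fall back to one status check when the wait expires.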
Compute host grouping: Ironic now has a mechanism for grouping conductors to nodes, and we want to mirror that in Nova. We discussed how to take the group as a config option and be able to find the other compute services managing that group, so we can build the hash ring correctly. We concluded that it’s a really hard problem (TM), and agreed to also add a config option like “peer_list” that can be used to list other compute services in the same group. This can be read dynamically each time we build the hash ring, or can be a mutable config with updates triggered by a SIGHUP. We’ll hash out the details in a blueprint or spec. Again, I agreed to begin the work, and dansmith agreed to help.
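For readers less familiar with the hash ring, here is a minimal sketch of building one from a configured peer list. It is not the Nova or Ironic implementation: the HashRing class, the build_ring helper, the replica count, and the use of SHA-256 are assumptions for illustration, and "peer_list" is only the option name floated in the discussion.

```python
import bisect
import hashlib


def _hash(key):
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring that maps node UUIDs to compute hosts."""

    def __init__(self, hosts, replicas=64):
        # Each host gets several points on the ring to even out the spread.
        self._ring = sorted(
            (_hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    def get_host(self, node_uuid):
        if not self._ring:
            raise ValueError("hash ring has no hosts")
        # The first ring point after the node's hash owns it, wrapping around.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


def build_ring(my_host, peer_list):
    """Build the ring from this host plus its configured peers."""
    return HashRing(sorted(set(peer_list) | {my_host}))
```

Every compute service in a group has to end up with the same set of hosts, otherwise two services will disagree about who manages a node; that is why the peer list must be kept consistent across the group, whether it is re-read each time or reloaded on SIGHUP.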
Capabilities filter: This was the last topic. It’s been on the chopping block for ages, but we are just now reaching the point where it can be properly deprecated. We discussed the plan, and mostly agreed it was good enough. johnthetubaguy is going to send the plan wider and make sure it will work for folks. We also discussed modeling countable resources on Ironic resource providers, which will work as long as there is still some resource class with an inventory of one, like we have today. Some folks may investigate doing this, but it’s fuzzy how much people care or if we really need/want to do it.
Friday: kind of bummed around the Ironic and TC rooms. Lots of interesting discussions, but nothing I feel like writing about here (as Ironic conversations were things like code deep-dives not worth communicating widely, and the TC topics have been written about to death).