From ildiko at openstack.org Thu May 2 14:50:37 2019 From: ildiko at openstack.org (Ildiko Vancsa) Date: Thu, 2 May 2019 08:50:37 -0600 Subject: [Edge-computing] Zoom link for Edge Forum sessions today and tomorrow at the PTG Message-ID: <37305046-14D9-4E98-AAF4-924851E83876@openstack.org> Hi, As I mentioned earlier I will open a Zoom call for the Edge WG PTG session today and the StarlingX sessions this afternoon and Friday. PTG session etherpads: https://wiki.openstack.org/wiki/PTG/Train/Etherpads Dial-in info is here: https://zoom.us/j/642623527 One tap mobile +16699006833,,642623527# US (San Jose) +16468769923,,642623527# US (New York) Dial by your location +1 669 900 6833 US (San Jose) +1 646 876 9923 US (New York) Meeting ID: 642 623 527 Find your local number: https://zoom.us/u/achaHVeO9b Please let me know if you have any questions. Thanks and Best Regards, Ildikó From David.Paterson at dell.com Sat May 4 17:06:03 2019 From: David.Paterson at dell.com (David.Paterson at dell.com) Date: Sat, 4 May 2019 17:06:03 +0000 Subject: [Edge-computing] - Airship 2.0 support for edge Message-ID: <3c3a9f62c2ba48e6b9ae706ca7d7ffb5@ausx13mpc124.AMER.DELL.COM> I am in the last Airship PTG day and the Edge Working Group came up in the discussion. It would be really good to have someone from Airship attend one of our calls to go over Airship's plan for the edge in their next release. I would suggest sometime this summer would be good. Thanks! dp From allison at lohutok.net Sat May 4 22:01:36 2019 From: allison at lohutok.net (Allison Randal) Date: Sat, 4 May 2019 16:01:36 -0600 Subject: [Edge-computing] proceedings of EdgeSys'19 at EuroSys Message-ID: The paper list of the EdgeSys Workshop is available in the ACM library, click the "Table of Contents" for an overview: https://dl.acm.org/citation.cfm?id=3301418 Allison From beth.cohen at verizon.com Sun May 5 19:06:46 2019 From: beth.cohen at verizon.com (beth.cohen at verizon.com) Date: Sun, 5 May 2019 19:06:46 +0000 Subject: [Edge-computing] [E] - Airship 2.0 support for edge In-Reply-To: <3c3a9f62c2ba48e6b9ae706ca7d7ffb5@ausx13mpc124.AMER.DELL.COM> References: <3c3a9f62c2ba48e6b9ae706ca7d7ffb5@ausx13mpc124.AMER.DELL.COM> Message-ID: Agreed. -------- Original Message -------- From: "David.Paterson at dell.com" > Date: Sat, May 4, 2019, 1:08 PM To: "edge-computing at lists.openstack.org" > Subject: [E] [Edge-computing] - Airship 2.0 support for edge I am in the last Airship PTG day and the Edge Working Group came up in the discussion. It would be really good to have someone from Airship attend one of our calls to go over Airship's plan for the edge in their next release. I would suggest sometime this summer would be good. Thanks! dp _______________________________________________ Edge-computing mailing list Edge-computing at lists.openstack.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_cgi-2Dbin_mailman_listinfo_edge-2Dcomputing&d=DwIGaQ&c=udBTRvFvXC5Dhqg7UHpJlPps3mZ3LRxpb6__0PomBTQ&r=zSPYRgoBij7eDdloAJpktY4pyBQRGnvgrz_9Wy3fyA4&m=4OrN8vdz2ETDotcnsPKOB-C4fabyq2eFiWqyCy_ZTKs&s=pVm9ZIfUeJ9fAf-X4OGoNQ9VgG7mur2_AA1NgTZrXTw&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ildiko at openstack.org Tue May 7 14:04:11 2019 From: ildiko at openstack.org (Ildiko Vancsa) Date: Tue, 7 May 2019 16:04:11 +0200 Subject: [Edge-computing] Edge weekly call is on Message-ID: <2E126081-18FF-4602-9553-93A35E762057@openstack.org> Hi, I missed to update the meeting invite, but if any of you is around the edge call is on: https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings We will have the Summit and PTG as main topics for this week and next week as well. Thanks, Ildikó From ildiko at openstack.org Mon May 13 09:08:17 2019 From: ildiko at openstack.org (Ildiko Vancsa) Date: Mon, 13 May 2019 11:08:17 +0200 Subject: [Edge-computing] Project EVE presentation on the Edge Group call tomorrow Message-ID: Hi, It is a friendly reminder that we are having our next weekly call tomorrow (Tuesday, May 14) at 7am PST / 1400 UTC where we will have Roman Shaposhnik from Zededa to talk about Project EVE. You can find the full meeting agenda and dial-in details on the wiki: https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings Thanks and Best Regards, Ildikó From ildiko at openstack.org Tue May 14 14:03:40 2019 From: ildiko at openstack.org (Ildiko Vancsa) Date: Tue, 14 May 2019 16:03:40 +0200 Subject: [Edge-computing] Project EVE presentation on the Edge Group call tomorrow In-Reply-To: References: Message-ID: <08527552-34F9-4DAE-8A8B-EA737CA05DA5@openstack.org> Friendly reminder that the project EVE overview presentation is running now on the weekly call! :) Zoom link: https://zoom.us/j/879678938 Further meeting info: https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings Thanks, Ildikó > On 2019. May 13., at 11:08, Ildiko Vancsa wrote: > > Hi, > > It is a friendly reminder that we are having our next weekly call tomorrow (Tuesday, May 14) at 7am PST / 1400 UTC where we will have Roman Shaposhnik from Zededa to talk about Project EVE. > > You can find the full meeting agenda and dial-in details on the wiki: https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings > > Thanks and Best Regards, > Ildikó > > > > _______________________________________________ > Edge-computing mailing list > Edge-computing at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing From ildiko at openstack.org Mon May 20 20:03:03 2019 From: ildiko at openstack.org (Ildiko Vancsa) Date: Mon, 20 May 2019 22:03:03 +0200 Subject: [Edge-computing] Use cases sub-group call Message-ID: <4624BFAF-F340-43FD-8538-005444378E7C@openstack.org> Hi, Friendly reminder that the next Use cases sub-group call is today according to the calendar: https://wiki.openstack.org/wiki/Edge_Computing_Group#Use_cases If you’re interested in joining the call details are up on the above wiki. The latest calendar invite file is available here: https://www.openstack.org/edge-computing/ Thanks, Ildikó From bdobreli at redhat.com Tue May 21 08:13:08 2019 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Tue, 21 May 2019 10:13:08 +0200 Subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production In-Reply-To: References: Message-ID: <08cb8294-04c8-e4ba-78c0-dec00f87156a@redhat.com> [CC'ed edge-computing at lists.openstack.org] On 20.05.2019 18:33, Arne Wiebalck wrote: > Dear all, > > One of the discussions at the PTG in Denver raised the need for > a mechanism to take ironic nodes out of production (a task for > which the currently available 'maintenance' flag does not seem > appropriate [1]). 
> > The use case there is an unhealthy physical node in state 'active', > i.e. associated with an instance. The request is then to enable an > admin to mark such a node as 'faulty' or 'in quarantine' with the > aim of not returning the node to the pool of available nodes once > the hosted instance is deleted. > > A very similar use case which came up independently is node > retirement: it should be possible to mark nodes ('active' or not) > as being 'up for retirement' to prepare the eventual removal from > ironic. As in the example above, ('active') nodes marked this way > should not become eligible for instance scheduling again, but > automatic cleaning, for instance, should still be possible. > > In an effort to cover these use cases by a more general > "quarantine/retirement" feature: > > - are there additional use cases which could profit from such a >   "take a node out of service" mechanism? There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware. [0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr-r0/edit#heading=h.xf8mdv7zexgq [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge > > - would these use cases put additional constraints on how the >   feature should look like (e.g.: "should not prevent cleaning") > > - are there other characteristics such a feature should have >   (e.g.: "finding these nodes should be supported by the cli") > > Let me know if you have any thoughts on this. > > Cheers, >  Arne > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360 > -- Best regards, Bogdan Dobrelya, Irc #bogdando From christopher.price at est.tech Tue May 21 08:26:25 2019 From: christopher.price at est.tech (Christopher Price) Date: Tue, 21 May 2019 08:26:25 +0000 Subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production In-Reply-To: <08cb8294-04c8-e4ba-78c0-dec00f87156a@redhat.com> References: <08cb8294-04c8-e4ba-78c0-dec00f87156a@redhat.com> Message-ID: <6A205BFA-881E-4D2D-9A7D-E35935F6631B@est.tech> I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario". Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) / Chris On 2019-05-21, 09:16, "Bogdan Dobrelya" wrote: [CC'ed edge-computing at lists.openstack.org] On 20.05.2019 18:33, Arne Wiebalck wrote: > Dear all, > > One of the discussions at the PTG in Denver raised the need for > a mechanism to take ironic nodes out of production (a task for > which the currently available 'maintenance' flag does not seem > appropriate [1]). > > The use case there is an unhealthy physical node in state 'active', > i.e. associated with an instance. 
The request is then to enable an > admin to mark such a node as 'faulty' or 'in quarantine' with the > aim of not returning the node to the pool of available nodes once > the hosted instance is deleted. > > A very similar use case which came up independently is node > retirement: it should be possible to mark nodes ('active' or not) > as being 'up for retirement' to prepare the eventual removal from > ironic. As in the example above, ('active') nodes marked this way > should not become eligible for instance scheduling again, but > automatic cleaning, for instance, should still be possible. > > In an effort to cover these use cases by a more general > "quarantine/retirement" feature: > > - are there additional use cases which could profit from such a > "take a node out of service" mechanism? There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware. [0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr-r0/edit#heading=h.xf8mdv7zexgq [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge > > - would these use cases put additional constraints on how the > feature should look like (e.g.: "should not prevent cleaning") > > - are there other characteristics such a feature should have > (e.g.: "finding these nodes should be supported by the cli") > > Let me know if you have any thoughts on this. > > Cheers, > Arne > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360 > -- Best regards, Bogdan Dobrelya, Irc #bogdando _______________________________________________ Edge-computing mailing list Edge-computing at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing From Arkady.Kanevsky at dell.com Tue May 21 12:55:06 2019 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Tue, 21 May 2019 12:55:06 +0000 Subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production In-Reply-To: <6A205BFA-881E-4D2D-9A7D-E35935F6631B@est.tech> References: <08cb8294-04c8-e4ba-78c0-dec00f87156a@redhat.com> <6A205BFA-881E-4D2D-9A7D-E35935F6631B@est.tech> Message-ID: <09e4bfaa95404bcfba37ee63f6bf1189@AUSX13MPS304.AMER.DELL.COM> Let's dig deeper into requirements. I see three distinct use cases: 1. put node into maintenance mode. Say to upgrade FW/BIOS or any other life-cycle event. It stays in ironic cluster but it is no longer in use by the rest of openstack, like Nova. 2. Put node into "fail" state. That is remove from usage, remove from Ironic cluster. What cleanup, operator would like/can do is subject to failure. Depending on the node type it may need to be "replaced". 3. Put node into "available" to other usage. What cleanup operator wants to do will need to be defined. This is very similar step as used for Baremetal as a Service as node is reassigned back into available pool. Depending on the next usage of a node it may stay in the Ironic cluster or may be removed from it. Once removed it can be "retired" or used for any other purpose. 
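To make use case 1 concrete, here is a rough sketch of the flow with the openstacksdk baremetal proxy (method and field names assume a recent SDK release; the cloud name "edge-site-1" and the node UUID are placeholders, not anything from a real environment):

    # Use case 1: take a node out of the schedulable pool for a FW/BIOS
    # window, then return it. Nova will not schedule to nodes that are in
    # maintenance, but the node stays in the Ironic cluster throughout.
    import openstack

    conn = openstack.connect(cloud='edge-site-1')   # placeholder cloud name
    node = '7e3b6f4a-1111-2222-3333-444455556666'   # placeholder node UUID

    conn.baremetal.set_node_maintenance(node, reason='BIOS/firmware upgrade window')

    # ... run the life-cycle operation out of band ...

    conn.baremetal.unset_node_maintenance(node)

    # Use cases 2 and 3 have no dedicated state today; the closest we get is
    # filtering on the maintenance flag and reading back maintenance_reason.
    for n in conn.baremetal.nodes(details=True, maintenance=True):
        print(n.name, n.provision_state, n.maintenance_reason)
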
Thanks, Arkady -----Original Message----- From: Christopher Price Sent: Tuesday, May 21, 2019 3:26 AM To: Bogdan Dobrelya; openstack-discuss at lists.openstack.org; edge-computing at lists.openstack.org Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production [EXTERNAL EMAIL] I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario". Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) / Chris On 2019-05-21, 09:16, "Bogdan Dobrelya" wrote: [CC'ed edge-computing at lists.openstack.org] On 20.05.2019 18:33, Arne Wiebalck wrote: > Dear all, > > One of the discussions at the PTG in Denver raised the need for > a mechanism to take ironic nodes out of production (a task for > which the currently available 'maintenance' flag does not seem > appropriate [1]). > > The use case there is an unhealthy physical node in state 'active', > i.e. associated with an instance. The request is then to enable an > admin to mark such a node as 'faulty' or 'in quarantine' with the > aim of not returning the node to the pool of available nodes once > the hosted instance is deleted. > > A very similar use case which came up independently is node > retirement: it should be possible to mark nodes ('active' or not) > as being 'up for retirement' to prepare the eventual removal from > ironic. As in the example above, ('active') nodes marked this way > should not become eligible for instance scheduling again, but > automatic cleaning, for instance, should still be possible. > > In an effort to cover these use cases by a more general > "quarantine/retirement" feature: > > - are there additional use cases which could profit from such a > "take a node out of service" mechanism? There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware. [0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr-r0/edit#heading=h.xf8mdv7zexgq [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge > > - would these use cases put additional constraints on how the > feature should look like (e.g.: "should not prevent cleaning") > > - are there other characteristics such a feature should have > (e.g.: "finding these nodes should be supported by the cli") > > Let me know if you have any thoughts on this. > > Cheers, > Arne > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 
360 > -- Best regards, Bogdan Dobrelya, Irc #bogdando _______________________________________________ Edge-computing mailing list Edge-computing at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing _______________________________________________ Edge-computing mailing list Edge-computing at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing From Arkady.Kanevsky at dell.com Tue May 21 19:00:41 2019 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Tue, 21 May 2019 19:00:41 +0000 Subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production In-Reply-To: References: <08cb8294-04c8-e4ba-78c0-dec00f87156a@redhat.com> <6A205BFA-881E-4D2D-9A7D-E35935F6631B@est.tech> <09e4bfaa95404bcfba37ee63f6bf1189@AUSX13MPS304.AMER.DELL.COM> Message-ID: Inline response -----Original Message----- From: Julia Kreger Sent: Tuesday, May 21, 2019 12:33 PM To: Kanevsky, Arkady Cc: Christopher Price; Bogdan Dobrelya; openstack-discuss; edge-computing at lists.openstack.org Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production [EXTERNAL EMAIL] On Tue, May 21, 2019 at 5:55 AM wrote: > > Let's dig deeper into requirements. > I see three distinct use cases: > 1. put node into maintenance mode. Say to upgrade FW/BIOS or any other life-cycle event. It stays in ironic cluster but it is no longer in use by the rest of openstack, like Nova. > 2. Put node into "fail" state. That is remove from usage, remove from Ironic cluster. What cleanup, operator would like/can do is subject to failure. Depending on the node type it may need to be "replaced". Or troubleshooted by a human, and could be returned to a non-failure state. I think largely the only way we as developers could support that is allow for hook scripts to be called upon entering/exiting such a state. That being said, At least from what Beth was saying at the PTG, this seems to be one of the most important states. > 3. Put node into "available" to other usage. What cleanup operator wants to do will need to be defined. This is very similar step as used for Baremetal as a Service as node is reassigned back into available pool. Depending on the next usage of a node it may stay in the Ironic cluster or may be removed from it. Once removed it can be "retired" or used for any other purpose. Do you mean "unprovision" a node and move it through cleaning? I'm not sure I understand what your trying to get across. There is a case where a node would have been moved to a "failed" state, and could be "unprovisioned". If we reach the point where we are able to unprovision, it seems like we might be able to re-deploy, so maybe the option is to automatically move to state which is kind of like bucket for broken nodes? AK: Before node is removed from Ironic some level of cleanup is expected. Especially if node is to be reused as Chris stated. I assume that that cleanup will be done by Ironic. What you do with the node after it is outside of Ironic is out of scope. > > Thanks, > Arkady > > -----Original Message----- > From: Christopher Price > Sent: Tuesday, May 21, 2019 3:26 AM > To: Bogdan Dobrelya; openstack-discuss at lists.openstack.org; > edge-computing at lists.openstack.org > Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of > production > > > [EXTERNAL EMAIL] > > I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. 
It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario". > > Further, at least in my imagination, one might be reallocating > hardware from one Ironic domain to another which may have implications > on how we best bring a new node online. (or not, I'm no expert) end dubious thought stream> > > / Chris > > On 2019-05-21, 09:16, "Bogdan Dobrelya" wrote: > > [CC'ed edge-computing at lists.openstack.org] > > On 20.05.2019 18:33, Arne Wiebalck wrote: > > Dear all, > > > > One of the discussions at the PTG in Denver raised the need for > > a mechanism to take ironic nodes out of production (a task for > > which the currently available 'maintenance' flag does not seem > > appropriate [1]). > > > > The use case there is an unhealthy physical node in state 'active', > > i.e. associated with an instance. The request is then to enable an > > admin to mark such a node as 'faulty' or 'in quarantine' with the > > aim of not returning the node to the pool of available nodes once > > the hosted instance is deleted. > > > > A very similar use case which came up independently is node > > retirement: it should be possible to mark nodes ('active' or not) > > as being 'up for retirement' to prepare the eventual removal from > > ironic. As in the example above, ('active') nodes marked this way > > should not become eligible for instance scheduling again, but > > automatic cleaning, for instance, should still be possible. > > > > In an effort to cover these use cases by a more general > > "quarantine/retirement" feature: > > > > - are there additional use cases which could profit from such a > > "take a node out of service" mechanism? > > There are security related examples described in the Edge Security > Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the > chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used > to force a shutdown of an edge node". So a node may be taken out of > service as an indicator of a particular condition of edge hardware. > > [0] > https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr-r0/edit#heading=h.xf8mdv7zexgq > [1] > https://github.com/kubernetes/community/tree/master/wg-iot-edge > > > > > - would these use cases put additional constraints on how the > > feature should look like (e.g.: "should not prevent cleaning") > > > > - are there other characteristics such a feature should have > > (e.g.: "finding these nodes should be supported by the cli") > > > > Let me know if you have any thoughts on this. > > > > Cheers, > > Arne > > > > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 
360 > > > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > _______________________________________________ > Edge-computing mailing list > Edge-computing at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing > > > _______________________________________________ > Edge-computing mailing list > Edge-computing at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing From gergely.csatari at nokia.com Tue May 28 14:51:01 2019 From: gergely.csatari at nokia.com (Csatari, Gergely (Nokia - HU/Budapest)) Date: Tue, 28 May 2019 14:51:01 +0000 Subject: [Edge-computing] Minutes of todays meeting Message-ID: Are here: http://eavesdrop.openstack.org/meetings/weekly_edge_computing_group_call/2019/weekly_edge_computing_group_call.2019-05-28-14.03.html Br, Gerg0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gergely.csatari at nokia.com Tue May 28 15:05:43 2019 From: gergely.csatari at nokia.com (Csatari, Gergely (Nokia - HU/Budapest)) Date: Tue, 28 May 2019 15:05:43 +0000 Subject: [Edge-computing] Lab requirements collection Message-ID: Hi, During the PTG sessions we agreed that we will try to build and verify the minimal reference architectures (formally known as MVP architectures). We also discovered, that we might need to have some hardware for this. Some companies were kind enough to promise some hardware resource for us if we can define the "lab requirements" for these. There was someone in the room who volunteered for this task, but unfortunatelly I forgot the name. Can someone please remind me who was the kind person to volunteer for this task? Thanks, Gerg0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gergely.csatari at nokia.com Tue May 28 15:31:12 2019 From: gergely.csatari at nokia.com (Csatari, Gergely (Nokia - HU/Budapest)) Date: Tue, 28 May 2019 15:31:12 +0000 Subject: [Edge-computing] [ironic][edge]: Recap of PTG discussions Message-ID: Hi, There was a one hour discussion with Julia from Ironic with the Edge Computing Group [1]. In this mail I try to conclude what was discussed and ask some clarification questions. Current Ironic uses DHCP for hardware provisioning, therefore it requires DHCP relay enabled on the whole path to the edge cloud instances. There are two alternatives to solve this: 1) Virtual media support [2] where the ip configuration is embedded into a virtual image what is booted via the board management interface 2) Redfish support, however the state and support of redfish for host management is not clear. Is there already a specification has been added for redfish support? Upgrade of edge cloud infrastructures: - Firmware upgrade should be supported by Ironic. Is this something on its way or is this a new need? - Operating system and infra update can be solved using Fenix [3], however handling several edge cloud instances from a central location needs new features. Handling of failed servers: - A monitoring system or the operator should provide the input to mark a server as failed - Ironic can power down the failed servers and have the definition of a maintenance state - Discussed in [4] Additional ideas what we half discussed: - Running Ironic containers in a switch with the images hosted by Swift somewhere else. Are there any concerns about this idea? Any missing features from somewhere? 
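On the failed-server point, the pieces that exist today can already be chained together from a monitoring hook; a hedged sketch with openstacksdk (the hook name, cloud name and alarm detail below are made up purely for illustration):

    # Sketch of the "mark a server as failed" flow: a monitoring system calls
    # this hook, the node is powered down and quarantined with a reason so an
    # operator can find it later. openstacksdk method names assume a recent
    # release; 'edge-site-1' is a placeholder entry in clouds.yaml.
    import openstack

    def on_node_failed(node_uuid, detail):
        conn = openstack.connect(cloud='edge-site-1')
        conn.baremetal.set_node_power_state(node_uuid, 'power off')
        conn.baremetal.set_node_maintenance(
            node_uuid, reason='quarantined by monitoring: %s' % detail)

    # e.g. wired to an alarm webhook:
    # on_node_failed('7e3b6f4a-...', 'ECC error rate above threshold')
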
[1]: https://etherpad.openstack.org/p/edge-wg-ptg-preparation-denver-2019 [2]: https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/L3-based-deployment.html [3]: https://wiki.openstack.org/wiki/Fenix [4]: http://lists.openstack.org/pipermail/edge-computing/2019-May/000582.html Br, Gerg0 From gergely.csatari at nokia.com Tue May 28 15:36:24 2019 From: gergely.csatari at nokia.com (Csatari, Gergely (Nokia - HU/Budapest)) Date: Tue, 28 May 2019 15:36:24 +0000 Subject: [Edge-computing] [edge][neutron]: PTG conclusions Message-ID: Hi, According to my best memories we agreed on the PTG, that Ian will propose a neutron specification for "Segment ranges in tenant networks configureble by a tenant using an API extension" [1]. Do I remember correctly? [1]: https://photos.app.goo.gl/hGzBA2Nzu2dfG3if8 Thanks, Gerg0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsneddon at redhat.com Tue May 28 17:23:40 2019 From: dsneddon at redhat.com (Dan Sneddon) Date: Tue, 28 May 2019 10:23:40 -0700 Subject: [Edge-computing] [ironic][edge]: Recap of PTG discussions In-Reply-To: References: Message-ID: On Tue, May 28, 2019 at 8:33 AM Csatari, Gergely (Nokia - HU/Budapest) < gergely.csatari at nokia.com> wrote: > Hi, > > There was a one hour discussion with Julia from Ironic with the Edge > Computing Group [1]. In this mail I try to conclude what was discussed and > ask some clarification questions. > > Current Ironic uses DHCP for hardware provisioning, therefore it requires > DHCP relay enabled on the whole path to the edge cloud instances. There are > two alternatives to solve this: > 1) Virtual media support [2] where the ip configuration is embedded into a > virtual image what is booted via the board management interface > 2) Redfish support, however the state and support of redfish for host > management is not clear. Is there already a specification has been added > for redfish support? > > Upgrade of edge cloud infrastructures: > - Firmware upgrade should be supported by Ironic. Is this something on > its way or is this a new need? > - Operating system and infra update can be solved using Fenix [3], > however handling several edge cloud instances from a central location needs > new features. > > Handling of failed servers: > - A monitoring system or the operator should provide the input to mark a > server as failed > - Ironic can power down the failed servers and have the definition of a > maintenance state > - Discussed in [4] > > Additional ideas what we half discussed: > - Running Ironic containers in a switch with the images hosted by Swift > somewhere else. Are there any concerns about this idea? Any missing > features from somewhere? > > [1]: https://etherpad.openstack.org/p/edge-wg-ptg-preparation-denver-2019 > [2]: > https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/L3-based-deployment.html > [3]: https://wiki.openstack.org/wiki/Fenix > [4]: > http://lists.openstack.org/pipermail/edge-computing/2019-May/000582.html > > Br, > Gerg0 > I have researched putting Ironic into a container on a switch. However, Ironic has historically required DHCP services, which would also be running inside the same container. In order to respond to DHCP requests, the container must be able to listen on the network for DHCP requests. Not all switches will allow a container which is attached directly to the VLAN interfaces and can receive traffic to the broadcast MAC address. 
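One quick way to find out whether a given switch/container combination can see those broadcasts at all is to sniff for DHCP traffic from inside the candidate container; a rough sketch with scapy (needs root/CAP_NET_RAW, and "eth0" is a placeholder for the provisioning interface):

    # Can this container see DHCP DISCOVER/REQUEST broadcasts on the
    # provisioning VLAN? If nothing prints while a node PXE-boots, the
    # switch is not delivering broadcast frames to the container.
    from scapy.all import sniff
    from scapy.layers.dhcp import DHCP

    def show(pkt):
        if pkt.haslayer(DHCP):
            print(pkt.summary())

    sniff(iface='eth0', filter='udp and (port 67 or 68)', prn=show, timeout=60)
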
If you have a switch which allows you to listen to MAC broadcasts inside a container then this should be feasible. Also note that Ironic is not monolithic. There are separate functions for API (which would not live on the switch), Ironic Inspector, and Ironic Conductor. When using DHCP for Ironic Inspection, Ironic provides DHCP using its own dnsmasq process. When using DHCP for deploying a node, the DHCP services are provided by Neutron. It would be best to avoid DHCP in this scenario so that neither inspection nor deployment required a DHCP server. However, as you note we are working on booting without DHCP, which will make it much easier to host an Ironic service inside a container on a switch or router. Without DHCP, the Ironic container must still be reachable from the outside, but only by its IP address. -- Dan Sneddon | Senior Principal Software Engineer dsneddon at redhat.com | redhat.com/cloud dsneddon:irc | @dxs:twitter -------------- next part -------------- An HTML attachment was scrubbed... URL: From David.Paterson at dell.com Thu May 30 21:50:47 2019 From: David.Paterson at dell.com (David.Paterson at dell.com) Date: Thu, 30 May 2019 21:50:47 +0000 Subject: [Edge-computing] [ironic][edge]: Recap of PTG discussions In-Reply-To: References: Message-ID: <9734b5a6cf23459890adae6ac4bab7c7@AUSX13MPC106.AMER.DELL.COM> There is an ironic redfish driver but it's still a WIP. Right now, I believe you can control power and manage boot mode (pxe, media...) with Redfish driver but implementing BIOS and RAID support is still ongoing. Re: firmware, there is an ironic spec for Dell EMC hardware here: https://github.com/openstack/ironic-specs/blob/master/specs/approved/drac-firmware-update-spec.rst Thanks, dp -----Original Message----- From: Csatari, Gergely (Nokia - HU/Budapest) Sent: Tuesday, May 28, 2019 11:31 AM To: edge-computing at lists.openstack.org; openstack-discuss at lists.openstack.org Subject: [Edge-computing] [ironic][edge]: Recap of PTG discussions [EXTERNAL EMAIL] Hi, There was a one hour discussion with Julia from Ironic with the Edge Computing Group [1]. In this mail I try to conclude what was discussed and ask some clarification questions. Current Ironic uses DHCP for hardware provisioning, therefore it requires DHCP relay enabled on the whole path to the edge cloud instances. There are two alternatives to solve this: 1) Virtual media support [2] where the ip configuration is embedded into a virtual image what is booted via the board management interface 2) Redfish support, however the state and support of redfish for host management is not clear. Is there already a specification has been added for redfish support? Upgrade of edge cloud infrastructures: - Firmware upgrade should be supported by Ironic. Is this something on its way or is this a new need? - Operating system and infra update can be solved using Fenix [3], however handling several edge cloud instances from a central location needs new features. Handling of failed servers: - A monitoring system or the operator should provide the input to mark a server as failed - Ironic can power down the failed servers and have the definition of a maintenance state - Discussed in [4] Additional ideas what we half discussed: - Running Ironic containers in a switch with the images hosted by Swift somewhere else. Are there any concerns about this idea? Any missing features from somewhere? 
[1]: https://etherpad.openstack.org/p/edge-wg-ptg-preparation-denver-2019 [2]: https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/L3-based-deployment.html [3]: https://wiki.openstack.org/wiki/Fenix [4]: http://lists.openstack.org/pipermail/edge-computing/2019-May/000582.html Br, Gerg0 _______________________________________________ Edge-computing mailing list Edge-computing at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing From ANDREAS.FLORATH at TELEKOM.DE Fri May 31 09:54:06 2019 From: ANDREAS.FLORATH at TELEKOM.DE (ANDREAS.FLORATH at TELEKOM.DE) Date: Fri, 31 May 2019 09:54:06 +0000 Subject: [Edge-computing] Lab requirements collection In-Reply-To: References: Message-ID: Hello! We are also waiting that somebody asks for hardware ;-) IMHO Greg volunteered to collect requirements: https://etherpad.openstack.org/p/edge-wg-ptg-preparation-denver-2019 > ACTION(gwaines): Put together requirements and start collecting an inventory of hardware that can be used for a testing lab > Requirements for both distributed and centralized MVPs > Greg Waines, greg.waines at windriver.com, GregWaines, in person Kind regards Andre ________________________________ From: Csatari, Gergely (Nokia - HU/Budapest) Sent: Tuesday, May 28, 2019 17:05 To: edge-computing at lists.openstack.org Subject: [Edge-computing] Lab requirements collection Hi, During the PTG sessions we agreed that we will try to build and verify the minimal reference architectures (formally known as MVP architectures). We also discovered, that we might need to have some hardware for this. Some companies were kind enough to promise some hardware resource for us if we can define the “lab requirements” for these. There was someone in the room who volunteered for this task, but unfortunatelly I forgot the name. Can someone please remind me who was the kind person to volunteer for this task? Thanks, Gerg0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Waines at windriver.com Fri May 31 10:47:16 2019 From: Greg.Waines at windriver.com (Waines, Greg) Date: Fri, 31 May 2019 10:47:16 +0000 Subject: [Edge-computing] Lab requirements collection Message-ID: <9F42D42F-8BCF-4437-B026-CD102212AB33@windriver.com> Agreed I did volunteer. I can put something together for next week’s meeting. Greg. From: "ANDREAS.FLORATH at TELEKOM.DE" Date: Friday, May 31, 2019 at 5:58 AM To: "gergely.csatari at nokia.com" , "edge-computing at lists.openstack.org" Cc: "matthias.britsch at telekom.de" Subject: Re: [Edge-computing] Lab requirements collection Hello! We are also waiting that somebody asks for hardware ;-) IMHO Greg volunteered to collect requirements: https://etherpad.openstack.org/p/edge-wg-ptg-preparation-denver-2019 > ACTION(gwaines): Put together requirements and start collecting an inventory of hardware that can be used for a testing lab > Requirements for both distributed and centralized MVPs > Greg Waines, greg.waines at windriver.com, GregWaines, in person Kind regards Andre ________________________________ From: Csatari, Gergely (Nokia - HU/Budapest) Sent: Tuesday, May 28, 2019 17:05 To: edge-computing at lists.openstack.org Subject: [Edge-computing] Lab requirements collection Hi, During the PTG sessions we agreed that we will try to build and verify the minimal reference architectures (formally known as MVP architectures). We also discovered, that we might need to have some hardware for this. 
Some companies were kind enough to promise some hardware resource for us if we can define the “lab requirements” for these. There was someone in the room who volunteered for this task, but unfortunatelly I forgot the name. Can someone please remind me who was the kind person to volunteer for this task? Thanks, Gerg0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gergely.csatari at nokia.com Fri May 31 10:59:32 2019 From: gergely.csatari at nokia.com (Csatari, Gergely (Nokia - HU/Budapest)) Date: Fri, 31 May 2019 10:59:32 +0000 Subject: [Edge-computing] Lab requirements collection In-Reply-To: <9F42D42F-8BCF-4437-B026-CD102212AB33@windriver.com> References: <9F42D42F-8BCF-4437-B026-CD102212AB33@windriver.com> Message-ID: Hi, Great. Thanks. Br, Gerg0 From: Waines, Greg Sent: Friday, May 31, 2019 12:47 PM To: ANDREAS.FLORATH at TELEKOM.DE; Csatari, Gergely (Nokia - HU/Budapest) ; edge-computing at lists.openstack.org Cc: matthias.britsch at telekom.de Subject: Re: [Edge-computing] Lab requirements collection Agreed I did volunteer. I can put something together for next week’s meeting. Greg. From: "ANDREAS.FLORATH at TELEKOM.DE" > Date: Friday, May 31, 2019 at 5:58 AM To: "gergely.csatari at nokia.com" >, "edge-computing at lists.openstack.org" > Cc: "matthias.britsch at telekom.de" > Subject: Re: [Edge-computing] Lab requirements collection Hello! We are also waiting that somebody asks for hardware ;-) IMHO Greg volunteered to collect requirements: https://etherpad.openstack.org/p/edge-wg-ptg-preparation-denver-2019 > ACTION(gwaines): Put together requirements and start collecting an inventory of hardware that can be used for a testing lab > Requirements for both distributed and centralized MVPs > Greg Waines, greg.waines at windriver.com, GregWaines, in person Kind regards Andre ________________________________ From: Csatari, Gergely (Nokia - HU/Budapest) > Sent: Tuesday, May 28, 2019 17:05 To: edge-computing at lists.openstack.org Subject: [Edge-computing] Lab requirements collection Hi, During the PTG sessions we agreed that we will try to build and verify the minimal reference architectures (formally known as MVP architectures). We also discovered, that we might need to have some hardware for this. Some companies were kind enough to promise some hardware resource for us if we can define the “lab requirements” for these. There was someone in the room who volunteered for this task, but unfortunatelly I forgot the name. Can someone please remind me who was the kind person to volunteer for this task? Thanks, Gerg0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Tue May 21 17:28:12 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 21 May 2019 17:28:12 -0000 Subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production In-Reply-To: <6A205BFA-881E-4D2D-9A7D-E35935F6631B@est.tech> References: <08cb8294-04c8-e4ba-78c0-dec00f87156a@redhat.com> <6A205BFA-881E-4D2D-9A7D-E35935F6631B@est.tech> Message-ID: On Tue, May 21, 2019 at 9:34 AM Christopher Price wrote: > > I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. 
From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario". > > Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) You raise a really good point and we've had some past discussions from a standpoint of leasing hardware between clusters. One was ultimately to allow for a federated model where ironic could talk to ironic, however... that wasn't a very well received idea because it would mean ironic could become aware of other ironics... And soon ironic takes over the rest of the world. > > / Chris > [trim] From juliaashleykreger at gmail.com Tue May 21 17:33:22 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 21 May 2019 17:33:22 -0000 Subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production In-Reply-To: <09e4bfaa95404bcfba37ee63f6bf1189@AUSX13MPS304.AMER.DELL.COM> References: <08cb8294-04c8-e4ba-78c0-dec00f87156a@redhat.com> <6A205BFA-881E-4D2D-9A7D-E35935F6631B@est.tech> <09e4bfaa95404bcfba37ee63f6bf1189@AUSX13MPS304.AMER.DELL.COM> Message-ID: On Tue, May 21, 2019 at 5:55 AM wrote: > > Let's dig deeper into requirements. > I see three distinct use cases: > 1. put node into maintenance mode. Say to upgrade FW/BIOS or any other life-cycle event. It stays in ironic cluster but it is no longer in use by the rest of openstack, like Nova. > 2. Put node into "fail" state. That is remove from usage, remove from Ironic cluster. What cleanup, operator would like/can do is subject to failure. Depending on the node type it may need to be "replaced". Or troubleshooted by a human, and could be returned to a non-failure state. I think largely the only way we as developers could support that is allow for hook scripts to be called upon entering/exiting such a state. That being said, At least from what Beth was saying at the PTG, this seems to be one of the most important states. > 3. Put node into "available" to other usage. What cleanup operator wants to do will need to be defined. This is very similar step as used for Baremetal as a Service as node is reassigned back into available pool. Depending on the next usage of a node it may stay in the Ironic cluster or may be removed from it. Once removed it can be "retired" or used for any other purpose. Do you mean "unprovision" a node and move it through cleaning? I'm not sure I understand what your trying to get across. There is a case where a node would have been moved to a "failed" state, and could be "unprovisioned". If we reach the point where we are able to unprovision, it seems like we might be able to re-deploy, so maybe the option is to automatically move to state which is kind of like bucket for broken nodes? > > Thanks, > Arkady > > -----Original Message----- > From: Christopher Price > Sent: Tuesday, May 21, 2019 3:26 AM > To: Bogdan Dobrelya; openstack-discuss at lists.openstack.org; edge-computing at lists.openstack.org > Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production > > > [EXTERNAL EMAIL] > > I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. 
From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario". > > Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) > > / Chris > > On 2019-05-21, 09:16, "Bogdan Dobrelya" wrote: > > [CC'ed edge-computing at lists.openstack.org] > > On 20.05.2019 18:33, Arne Wiebalck wrote: > > Dear all, > > > > One of the discussions at the PTG in Denver raised the need for > > a mechanism to take ironic nodes out of production (a task for > > which the currently available 'maintenance' flag does not seem > > appropriate [1]). > > > > The use case there is an unhealthy physical node in state 'active', > > i.e. associated with an instance. The request is then to enable an > > admin to mark such a node as 'faulty' or 'in quarantine' with the > > aim of not returning the node to the pool of available nodes once > > the hosted instance is deleted. > > > > A very similar use case which came up independently is node > > retirement: it should be possible to mark nodes ('active' or not) > > as being 'up for retirement' to prepare the eventual removal from > > ironic. As in the example above, ('active') nodes marked this way > > should not become eligible for instance scheduling again, but > > automatic cleaning, for instance, should still be possible. > > > > In an effort to cover these use cases by a more general > > "quarantine/retirement" feature: > > > > - are there additional use cases which could profit from such a > > "take a node out of service" mechanism? > > There are security related examples described in the Edge Security > Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the > chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used > to force a shutdown of an edge node". So a node may be taken out of > service as an indicator of a particular condition of edge hardware. > > [0] > https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr-r0/edit#heading=h.xf8mdv7zexgq > [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge > > > > > - would these use cases put additional constraints on how the > > feature should look like (e.g.: "should not prevent cleaning") > > > > - are there other characteristics such a feature should have > > (e.g.: "finding these nodes should be supported by the cli") > > > > Let me know if you have any thoughts on this. > > > > Cheers, > > Arne > > > > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 
360 > > > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > _______________________________________________ > Edge-computing mailing list > Edge-computing at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing > > > _______________________________________________ > Edge-computing mailing list > Edge-computing at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing From miguel at mlavalle.com Tue May 28 16:21:17 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Tue, 28 May 2019 16:21:17 -0000 Subject: [Edge-computing] [edge][neutron]: PTG conclusions In-Reply-To: References: Message-ID: HI, Yes, that is my recollection as well Regards On Tue, May 28, 2019 at 10:36 AM Csatari, Gergely (Nokia - HU/Budapest) < gergely.csatari at nokia.com> wrote: > Hi, > > > > According to my best memories we agreed on the PTG, that Ian will propose > a neutron specification for “Segment ranges in tenant networks > configureble by a tenant using an API extension” [1]. > > > > Do I remember correctly? > > > > [1]: https://photos.app.goo.gl/hGzBA2Nzu2dfG3if8 > > > > Thanks, > > Gerg0 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Fri May 31 16:10:06 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Fri, 31 May 2019 16:10:06 -0000 Subject: [Edge-computing] [ironic][edge]: Recap of PTG discussions In-Reply-To: References: Message-ID: On Tue, May 28, 2019 at 8:31 AM Csatari, Gergely (Nokia - HU/Budapest) wrote: > > Hi, > > There was a one hour discussion with Julia from Ironic with the Edge Computing Group [1]. In this mail I try to conclude what was discussed and ask some clarification questions. > > Current Ironic uses DHCP for hardware provisioning, therefore it requires DHCP relay enabled on the whole path to the edge cloud instances. There are two alternatives to solve this: > 1) Virtual media support [2] where the ip configuration is embedded into a virtual image what is booted via the board management interface > 2) Redfish support, however the state and support of redfish for host management is not clear. Is there already a specification has been added for redfish support? I don't quite remember discussing redfish in this regard. Is there something your expecting redfish to be able to provide in this regard? Redfish virtual media is something we're working on. > > Upgrade of edge cloud infrastructures: > - Firmware upgrade should be supported by Ironic. Is this something on its way or is this a new need? The ilo hardware type supports OOB firmware upgrades. iRMC has code up in review, and Dell has a posted specification. Work is going into the redfish libraries to help support this so we will likely see something for redfish at some point, but it may also be that because of vendor differences, that we may not be able to provide a generic surface through which to provide firmware update capabilities through the generic hardware type. > - Operating system and infra update can be solved using Fenix [3], however handling several edge cloud instances from a central location needs new features. 
> > Handling of failed servers: > - A monitoring system or the operator should provide the input to mark a server as failed > - Ironic can power down the failed servers and have the definition of a maintenance state > - Discussed in [4] > > Additional ideas what we half discussed: > - Running Ironic containers in a switch with the images hosted by Swift somewhere else. Are there any concerns about this idea? Any missing features from somewhere? > > [1]: https://etherpad.openstack.org/p/edge-wg-ptg-preparation-denver-2019 > [2]: https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/L3-based-deployment.html > [3]: https://wiki.openstack.org/wiki/Fenix > [4]: http://lists.openstack.org/pipermail/edge-computing/2019-May/000582.html > > Br, > Gerg0 > From juliaashleykreger at gmail.com Fri May 31 16:18:40 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Fri, 31 May 2019 16:18:40 -0000 Subject: [Edge-computing] [ironic][edge]: Recap of PTG discussions In-Reply-To: <9734b5a6cf23459890adae6ac4bab7c7@AUSX13MPC106.AMER.DELL.COM> References: <9734b5a6cf23459890adae6ac4bab7c7@AUSX13MPC106.AMER.DELL.COM> Message-ID: On Thu, May 30, 2019 at 3:37 PM wrote: > > There is an ironic redfish driver but it's still a WIP. Right now, I believe you can control power and manage boot mode (pxe, media...) with Redfish driver but implementing BIOS and RAID support is still ongoing. Power, boot mode, boot device (pxe, disk), inspection, and bios settings are present in Ironic today for the redfish hardware type. Sensor data collection, RAID, virtual media, and firmware management are hopefully things we evolve in the next cycle or two. > > Re: firmware, there is an ironic spec for Dell EMC hardware here: https://github.com/openstack/ironic-specs/blob/master/specs/approved/drac-firmware-update-spec.rst > > Thanks, > dp
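For anyone who wants to see what a particular BMC actually exposes before relying on the redfish hardware type, a rough exploratory sketch with sushy, the library ironic's redfish support is built on (the BMC URL and credentials are placeholders; verify=False is only for lab BMCs with self-signed certificates):

    # Exploratory only: walk one BMC's Redfish service with sushy and print
    # the power state plus the reset actions it advertises, which gives a
    # quick hint of what the ironic redfish hardware type can drive on it.
    import sushy

    root = sushy.Sushy('https://bmc.example.com/redfish/v1',
                       username='admin', password='secret', verify=False)

    systems = root.get_system_collection()
    for identity in systems.members_identities:
        system = root.get_system(identity)
        print(identity, system.power_state)
        print(system.get_allowed_reset_system_values())
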