[Openinfralabs] Open Infra Labs/Operate Monitoring Architecture Discussion
Daitzman, Michael S
msd at bu.edu
Tue May 12 20:43:19 UTC 2020
9 AM to 10 AM EST on Wednesday (tomorrow) was the time that the most people could attend. Looking forward to seeing you all!
Join Zoom Meeting
1. Frequency of Requirements Discussion
2. Discuss https://gitlab.com/open-infrastructure-labs/nerc-architecture/-/issues/4
3. Notes and Agenda here: https://etherpad.opendev.org/p/May_13_2020Monitoring
4. Zoom notes to be uploaded
5. Identify Volunteers/Victims to convert notes to GitLab stories ☺.
See you then!
Michael and Bill
Michael Daitzman (He/Him)
msd at bu.edu
From: "Daitzman, Michael S" <msd at bu.edu>
Date: Monday, May 11, 2020 at 4:29 PM
To: Justin Riley <justinriley at g.harvard.edu>, "Abaris, Augustine" <augustin at bu.edu>, Jacob Chappel <jacob.chappell at uky.edu>, Bill Burns <bburns at redhat.com>, Chris Long <chlong at redhat.com>, Jeremy Stanley <fungi at yuggoth.org>, Mohammed Naser <mnaser at vexxhost.com>, "Ansari, Mohhamad Naved" <naved001 at bu.edu>, Lars Kellogg-Stedman <lars at redhat.com>, "openinfralabs at lists.opendev.org" <openinfralabs at lists.opendev.org>, "rrackow at redhat.com" <rrackow at redhat.com>, Stig Telfer <stig at stackhpc.com>, "operate-first at redhat.com" <operate-first at redhat.com>
Subject: Open Infra Labs/Operate Monitoring Architecture Discussion
Initially we planned to do this via IRC, but after more thought we decided to start the ball rolling with a Zoom call, which we will share publicly along with notes.
The seeds for the discussion are in this Epic: https://gitlab.com/open-infrastructure-labs/nerc-architecture/-/issues/4
what are the things desired from a monitoring platform?
what kind of information is expected (is just metrics enough? what other data types do you require?)
what kind of presentation layer is expected? do you just need a dashboard that shows graphs, or do you need more detailed analysis, and what might that look like if you do?
how can the operational knowledge be conveyed? is there a default set of rules that can be provided out of the box, as it were? what defines a "healthy cloud"? what defines a cloud that is degraded? what is the remediation path, such as alerting a human, or are there systems that can fix the cloud automatically?
are closed-loop systems different from monitoring? are they different planes? are they similar but have different scopes?
what is the expected resolution of metrics? sub-second? 5 seconds? 30 seconds? minutes?
Feel free to add topics you want to be discussed in comments in the Epic.
To help us pick a time, please let us know which of the slots in this Doodle poll work for you: https://doodle.com/poll/qrztwefpvebd53gv
Bill and I will begin scheduling these more regularly and will do a Doodle poll to find timeslots for recurring meetings.
If there are people you feel should attend who may not be on the associated mailing lists, please forward this note.