[Openinfralabs] Open Infra Labs/Operate Monitoring Architecture Discussion

Daitzman, Michael S msd at bu.edu
Mon May 11 20:29:53 UTC 2020

Initially we planned to do this via IRC, but after more thought we decided to start the ball rolling with a Zoom call, which we will share publicly along with notes.

The seeds for the discussion are in this Epic: https://gitlab.com/open-infrastructure-labs/nerc-architecture/-/issues/4

  what are the things desired from a monitoring platform?
  what kind of information is expected (is just metrics enough? what other data types do you require?)
  what kind of presentation layer is expected? do you just need a dashboard that shows graphs, or do you need more detailed analysis, and what might that look like if you do?
  how can the operational knowledge be conveyed? is there a default set of rules that can be provided out of the box, as it were? what defines a "healthy cloud"? what defines a cloud that is degraded? what is the remediation path, such as alerting a human, or are there systems that can fix the cloud automatically?
  are closed-loop systems different from monitoring? are they different planes? are they similar but have different scopes?
  what is the expected resolution of metrics? sub-second? 5 seconds? 30 seconds? minutes?

Feel free to add topics you would like discussed as comments on the Epic.

To help us pick a time, please let us know which of the slots in this Doodle poll work for you: https://doodle.com/poll/qrztwefpvebd53gv

Bill and I will begin scheduling these more regularly and will do a Doodle poll to find time slots for recurring meetings.

If there are people you feel should attend who may not be on the associated mailing lists, please forward this note.

