[Openinfralabs] What operational data and data streams should be gathered?
mhild at redhat.com
Thu Apr 30 16:19:22 UTC 2020
Wow, that's some good open data.
Is there a single entry page, from which I could've discovered that myself?
And is that all related to a single OpenStack cluster that's being run by
the community, or is it build-related infrastructure?
On Thu, Apr 30, 2020 at 6:02 PM Jeremy Stanley <fungi at yuggoth.org> wrote:
> On 2020-04-16 12:52:28 +0200 (+0200), Marcel Hild wrote:
> > I would start with monitoring data, i.e. all prometheus metrics
> > being collected.
> I agree that's a good route to take initially. Using OpenDev as an
> example, while obviously not the same systems and not necessarily
> constrained by the same risks, we try to keep all non-sensitive
> monitoring and trending data public, like:
> > But those in themselves are not really useful without a picture of the
> > infrastructure and access to the infrastructure.
> One thing we've done in that vein is to try to keep all our
> deployment and systems management notes (runbooks or whatever you
> like to term them) on a public site:
> We also try to publish logs for some services of interest if we can
> be certain they won't leak things like PII or credentials:
> https://nb01.openstack.org/ (sorry, self-signed cert on that one)
> > And you might need access to ticket systems, since you want to
> > correlate the metrics with incidents.
> Absolutely, though relying on a public incident/defect tracker does
> also mean you need to train your users to leverage privacy features
> for anything which may contain sensitive data, or to avoid putting
> such information in tickets and instead forwarding it via more
> confidential channels. It also means getting better about redacting
> certain classes of data either automatically or on request (and then
> dealing with fallout for the latter, treating that material as
> though it has leaked even if you've masked or deleted it after the
> fact to limit the damage).
> > My point is, we need to open up _all_ aspects of operations to
> > make it actually useful, otherwise it'll be just a pile of data.
> For more general day-to-day operations, as well as scheduled
> maintenance or similar change activity, we've found it's useful to
> keep discussion on publicly-archived mailing lists and in publicly
> logged IRC channels so that they're easy to refer back to later.
> We're also using an IRC bot which our sysadmins can command to log
> important status information to a Web page:
> That allows us to easily publish notes about what we're doing which
> might have public impact, and we also tie it into a notification
> system which can echo messages in subscribed IRC channels or
> temporarily update their channel topics to reflect important service
> status information.
> Another major choice we've made is to perform installation and
> life-cycle management of a lot of our services through continuous
> deployment jobs in a public-facing CI/CD system:
> It does require careful thought, however, to make sure you're not
> exposing anything which could compromise the integrity or security
> of those systems.
> Jeremy Stanley
> Openinfralabs mailing list
> Openinfralabs at lists.opendev.org