[service-announce] October 20 Gerrit Outage Update

Thu Oct 22 06:53:11 UTC 2020

On Tue, Oct 20, 2020 at 10:50:42PM -0400, Mohammed Naser wrote:
> - There is no need for us to have anyone with admin powers to Gerrit
> at all times,

Totally agree; I mentioned in the remediation notes that we should
look at the way we handle Gerrit administrators.  I would say it's
mostly a convenience for adding initial members to groups, and the
occasional case where we need to force-merge something.

I think we should discuss this as part of the upcoming PTG [1].

> - If the above is not possible, anyone who is part of this group
> should have 2FA enabled inside Launchpad's SSO.

I agree with this too.  It's not that obvious how to enable this, but
it can be done via [2].  I would recommend that everyone enable it.

Longer term, we also know we want to move away from relying solely on
Launchpad.

> Thank you for this.  I'd also like to raise the question of moving
> forward, how to be able to track these things.  We had a user that
> had full root access to our Gerrit installation for ~2 weeks without
> our knowledge entirely, only uncovered when they did something
> (that, in the grand scheme of things, was relatively trivial,
> compared to what could have happened).

Yeah, without going into great detail, this access could not be
"upgraded" to either the on-disk repos or, importantly, the logs.  And
it's not just luck that ensures such separation :)

The major (potential for) escalation here happened because our version
of Gerrit keeps plain-text HTTP API keys.  So this is an example of
defence-in-depth success and failure all at once.  We are well on the
way to replacing that, so we are not sweeping that one under the rug.

There are a few other thoughts I have, but TBH I'm hesitant to start
broadcasting them in public mails.  I am of course for transparency
and participation -- I mean the entire infra is driven by completely
public git-ops CI and CD; can anyone else say that?!

What we don't have is a formalised process for security discussions.
I think we should:

a) more clearly describe how to responsibly communicate infra security
   issues; I don't think we have anything like that documented.

b) start a closed list where we can have free-form discussions about
   security issues.  I think our track record of transparency would
   ensure that doesn't turn into a "star chamber", and anyone
   interested who has a modicum of trust from the project could join
   (e.g. our cloud providers and others who are clearly invested in
   the system).

   This could also serve as our responsible disclosure channel, which
   would help because people wouldn't have to sign up for full
   accounts to post StoryBoard issues, etc. just to alert us to
   problems.

-i

[1] https://etherpad.opendev.org/p/opendev-ptg-planning-oct-2020
[2] https://help.ubuntu.com/community/SSO/FAQs/2FA