[service-announce] October 20 Gerrit Outage Update

Wed Oct 21 13:06:22 UTC 2020

On 2020-10-20 22:50:42 -0400 (-0400), Mohammed Naser wrote:
[...]
> So, just to be clear, someone who had root access to our Gerrit
> installation had their account compromised which resulted in this
> (and not something that occurred as a by-product of some other
> service -- say storyboard -- leaking some sort of information?)

Yes, to be clear, it was the Launchpad/UbuntuOne SSO ID which was
compromised; the attacker then used that ID to log into the Gerrit
service. Those OpenIDs aren't trusted to authenticate SSH access to
our servers. The compromised account was then used to grant
Administrators group membership to a newly created account, which
the attacker used to probe database records, add itself to a review
group and approve a change (which was spotted and did not merge).
They also proposed a change to one project's configuration, which
couldn't merge, but even if it had, it would subsequently have been
overwritten by our project management.

> I see two issues in this at the moment:
> 
> - There is no need for us to have anyone with admin powers to
> Gerrit at all times, we've done enough automation to sustain us
> and a manual 'circuit breaker' of adding a user *IF* necessary
> should be put in place.

Yes, we've discussed this before. Our use of OpenID makes it harder
to switch between different Gerrit accounts in the web UI (though
maybe less so now that browser containers are a thing). Alternatively,
separate accounts with no OpenIDs at all could be used to perform
routine administrative tasks such as adding initial users to new
groups. It's certainly looking like a compelling option.

> - If the above is not possible, anyone who is part of this group
> should have 2FA enabled inside Launchpad's SSO.

Or we could switch to an SSO solution with broader 2FA support, which
is also under discussion (2FA on UbuntuOne SSO is available only by
request, has a sizeable backlog of people waiting to be added, and
has been in beta for six years).

> I would very much prefer the first option rather than the second
> one.

I concur, for what it's worth.

> If it was an individual's account that was accessed and not a
> system account, have we audited that there are not other things
> that might have been accessed such as resources relating to Zuul,
> other systems and potentially rotating/auditing all our
> infrastructure?
[...]

The only systems of ours which those OpenIDs could be used to log
into are Gerrit, StoryBoard and MediaWiki. Obviously Gerrit was our
primary concern, but we've been looking through the other two in
case we need to clean up or reset anything in them.

For that account to alter our automation, it would have needed to do
so through merged changes in Gerrit. The team has been reviewing
recent systems configuration changes for any suspicious alterations
made under an impersonated identity, just as we recommend all teams
do for their own changes merged since the first of the month.

> I saw this artifact, I have no idea if it was put into consideration, but,
> food for thought:
> 
> https://review.opendev.org/#/c/758881/
[...]

Yes, that's how Gerrit normally expects project configurations to be
altered (through change proposal, review and approval). For a
variety of reasons we don't rely on that mechanism, but Gerrit still
allows any user to propose such changes.

> Thank you for this.  I'd also like to raise the question of moving
> forward, how to be able to track these things.  We had a user that
> had full root access to our Gerrit installation for ~2 weeks
> without our knowledge entirely, only uncovered when they did
> something (that, in the grand scheme of things, was relatively
> trivial, compared to what could have happened).
> 
> What can we do to set up the necessary infrastructure to ensure
> that these things are monitored.  OpenDev is considered to be
> critical infrastructure for this entire community and there's not
> much that an outsider can do other than the 'keyholders' for the
> resources.
> 
> We've historically refused to have any monitoring and now things
> like this have slipped up, I'm just worried that we have a big
> looming thing coming up ahead of us that will catch us off guard
> and we'll be completely unprepared for it...

I understand the desire, but what monitoring solution do you
recommend which would identify when an SSO OpenID account isn't
being operated by its rightful owner?

I do think that if Gerrit sent E-mail notifications to group owners
whenever group membership changed, it would have helped us spot the
secondary escalation (we'll look into whether the newer Gerrit
version we've been working on moving to supports this), though that
escalation came a couple of weeks after the initial intrusion.
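
As a purely hypothetical illustration of that kind of notification
(not something OpenDev currently runs), a periodic job could snapshot
a group's membership through Gerrit's REST API (the
GET /groups/{group-id}/members/ endpoint, whose JSON responses begin
with the )]}' anti-XSSI line) and report any difference since the
previous run. The helper names, the assumption of anonymous read
access and the idea of mailing the report to group owners are all
assumptions made for this sketch.

```python
import json
import urllib.parse
import urllib.request

# Gerrit prefixes JSON responses with this line to prevent XSSI.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Strip Gerrit's anti-XSSI prefix and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_group_members(base_url: str, group: str) -> set[int]:
    """Return the account IDs of a group's direct members.

    Assumption: the group is readable anonymously. A real deployment
    would likely need authenticated requests under the /a/ prefix.
    """
    quoted = urllib.parse.quote(group, safe="")
    url = f"{base_url}/groups/{quoted}/members/"
    with urllib.request.urlopen(url) as resp:
        members = parse_gerrit_json(resp.read().decode("utf-8"))
    # Each entry is an AccountInfo object carrying an _account_id field.
    return {m["_account_id"] for m in members}


def diff_members(previous: set[int], current: set[int]) -> tuple[set[int], set[int]]:
    """Return (added, removed) account IDs between two snapshots."""
    return current - previous, previous - current


def describe_changes(group: str, previous: set[int], current: set[int]) -> list[str]:
    """Build human-readable report lines, e.g. for a hypothetical
    E-mail to the group's owners."""
    added, removed = diff_members(previous, current)
    lines = [f"{group}: added account {a}" for a in sorted(added)]
    lines += [f"{group}: removed account {r}" for r in sorted(removed)]
    return lines
```

A job would store each snapshot from fetch_group_members() and, on
the next run, send describe_changes() output to the owners whenever
the list is non-empty; an unexpected "added account" line for the
Administrators group is exactly the signal that was missing here.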

Ultimately, monitoring for "compromised" accounts and
differentiating them from accounts which are being operated by their
legitimate owners is nontrivial, so assistance or suggestions there
are welcome.
-- 
Jeremy Stanley