[service-announce] October 20 Gerrit Outage Update
mnaser at vexxhost.com
Wed Oct 21 02:50:42 UTC 2020
I'm happy to see that things are back in order; however, I *hate* to
be that person, but I think there are still some hard questions that
we need to answer together. I am especially concerned about how this
affects the overall workflow of developers, but also about the security
measures we have in place, should something like Zuul be targeted instead.
I've added some questions inline below.
On Tue, Oct 20, 2020 at 8:33 PM Ian Wienand <iwienand at redhat.com> wrote:
> As of this mail, Gerrit access has been restored. Please read on for
> important information, especially around change verification.
> On 2020-10-20 at 01:30 a user unexpectedly added a workflow approval
> to a change that they were not expected to have access to. At 02:06
> UTC an alert was raised via IRC. Administrators found the account had
> added themselves to a core group and made the +W vote. The account
> was disabled, and removed from the groups it had added itself to by
> 02:55 UTC. Administrators began to analyse the situation and Gerrit
> was taken offline at 04:02 UTC to preserve state and allow for
> From this time, administrators were working on log collection and
> analysis, along with restoring backups for comparison purposes.
> By around 08:45 UTC it was clear that the privilege escalation had
> been achieved by gaining control of a Launchpad SSO account with
> Gerrit administrator privileges. By this time, we had ruled out
> software vulnerabilities. Logs showed the first unauthorized access
> of the administrator account in Gerrit on 2020-10-06. Communication
> with Launchpad admins agrees with this analysis. We saw one session
> opened as the administrator user to StoryBoard on this same day, but
> logs show no data was modified or hidden stories viewed.
So, just to be clear: someone who had root access to our Gerrit installation
had their account compromised, which resulted in this, and it was not a
by-product of some other service (say, StoryBoard) leaking some sort of
information?
I see two ways to address this at the moment:
- There is no need for anyone to hold admin powers in Gerrit at all times.
  We've done enough automation to sustain us, and a manual 'circuit breaker'
  for adding a user *IF* necessary should be put in place.
- If the above is not possible, anyone who is part of this group should have
  2FA enabled in Launchpad's SSO.
I would very much prefer the first option over the second one.
If it was an individual's account that was accessed and not a system account,
have we audited whether other things might have been accessed, such as
resources relating to Zuul, other systems and potentially
> Analysis has been performed on the Gerrit database and git trees from
> October 1st, pre-dating any known unauthorized access.
> Access was restored at around 2020-10-21 00:00 UTC.
> The following has been verified:
> The administrator account used has been disabled and credentials
> We have verified that all group and user addition/removals since
> Oct 1 are valid. The only invalid additions were made by the
> compromised administrator account to add a single user account to
> the Administrators group; and then that account added itself to
> another known group.
> The account given administrator privilege has been removed from
> the groups it added itself to and is disabled.
> There is no evidence of any unauthorized access via methods other
> than Gerrit HTTP and Gerrit SSH access.
> No commits have been pushed to git trees bypassing code review.
> Every git tree has been compared to the Oct 1 version and all
> commits have been correctly inserted via Gerrit changes.
I saw this artifact; I have no idea whether it was taken into consideration,
but, as food for thought:
> The version of Gerrit we use stores HTTP API passwords in
> plain-text. We know that a limited number of passwords were
> gathered via the HTTP API and it is possible passwords were
> gathered via the database. We thus have assumed that all HTTP API
> passwords have been disclosed. This password needs to be
> explicitly enabled by users, and many users do not have it
> This leaves us with the following remediation actions:
> Users should double-check their Launchpad recent activity at
> https://login.launchpad.net/activity for any suspicious logins. If
> found, please notify the OpenDev admins in Freenode #opendev and
> Launchpad admins in #launchpad immediately.
> All HTTP API passwords have been cleared. If you push changes via
> HTTPS (instead of typical SSH), are a gertty user, or run a CI
> system or something else that communicates with the Gerrit HTTP
> API, you will need to regenerate a password.
> Any SSH keys added to accounts since 2020-10-01 have been removed.
> This affects only a limited number of accounts. This is done in
> an abundance of caution, and we do not believe any accounts had
> unauthorized SSH keys added.
> We should audit all changes for projects since 2020-10-01.
> We have no evidence that any account had its ssh keys compromised,
> thus we can rule out any unauthorized changes being uploaded via SSH.
> However, we cannot conclusively rule out that compromised HTTP API
> passwords were used to push a change through Gerrit. For example, a
> change could be uploaded that looks like it came from a user, or the
> API key of a core team member may have been used to approve a change
> without authorization.
> Given our extensive analysis we consider it exceedingly unlikely that
> this vector was used. We have had no notifications of users seeing
> unexpected changes either uploaded by them, or approved by them in
> projects they work on. This said, we believe it is important to
> inform the community of this very unlikely, but still possible,
> vulnerability of the source code.
> To this end, we have prepared a list of all changes from the known
> affected period which should be audited for correctness. These are
> available at
> Team members should browse these changes and make sure they were
> correctly approved in Gerrit. If any change looks suspicious you
> should notify OpenDev administrators in Freenode #opendev immediately.
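For anyone who would rather script this audit than browse the list by hand, here is a minimal sketch against the Gerrit REST `/changes/` query endpoint. It assumes the OpenDev instance lives at review.opendev.org, that anonymous read access is used (so only publicly visible changes come back), and that the deployed Gerrit supports the `after:` search operator, which matches changes *updated* on or after the date rather than strictly merged since then. The helper names (`fetch_all_merged_since`, `audit_rows`, and so on) are hypothetical, not part of any OpenDev tooling.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the Gerrit deployment being audited.
GERRIT_URL = "https://review.opendev.org"

# Gerrit prefixes REST JSON responses with this guard line to defeat XSSI.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip Gerrit's XSSI guard line (if present) and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_merged_since(date, start=0, limit=100):
    """Fetch one page of merged changes updated on or after `date` (YYYY-MM-DD)."""
    query = urllib.parse.urlencode({
        "q": f"status:merged after:{date}",
        "o": "DETAILED_ACCOUNTS",  # include owner username/name/email
        "n": limit,                # page size
        "S": start,                # offset into the result set
    })
    with urllib.request.urlopen(f"{GERRIT_URL}/changes/?{query}") as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))


def fetch_all_merged_since(date, page=100):
    """Follow Gerrit's `_more_changes` flag on the last entry to page through results."""
    changes, start = [], 0
    while True:
        batch = fetch_merged_since(date, start, page)
        changes.extend(batch)
        if not batch or not batch[-1].get("_more_changes"):
            return changes
        start += len(batch)


def audit_rows(changes):
    """Reduce ChangeInfo records to the fields a reviewer would check by eye."""
    return [
        (c["project"], c["_number"], c["owner"].get("username", "?"), c["subject"])
        for c in changes
    ]
```

Something like `for row in audit_rows(fetch_all_merged_since("2020-10-01")): print(*row, sep="\t")` would produce a tab-separated list that teams could filter by project. Checking who actually voted +2/+W would need extra query options and is left out of this sketch.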
> Further actions
> We are planning the following for the short-term future:
> The OpenDev administrators will be looking at alternative models
> for Gerrit admin account management.
> We are already well into planning and testing a coming upgrade to
> a version of Gerrit which does not store plain-text API keys.
> Longer term, we've written a spec for replacing Launchpad SSO as
> our authentication provider.
> We thank you for your patience during this trying time, and we look
> forward to returning to supporting the community doing what it does
> best -- working together to create great things.
Thank you for this. I'd also like to raise the question of how, moving forward,
we can track these things. We had a user with full root access to our Gerrit
installation for ~2 weeks entirely without our knowledge, and it was only
uncovered when they did something that, in the grand scheme of things, was
relatively trivial compared to what could have happened.
What can we do to set up the necessary infrastructure to ensure that
are monitored. OpenDev is considered critical infrastructure for this entire
community, and there's not much that an outsider can do other than the
for the resources.
We've historically refused to have any monitoring, and now things like this have
slipped through. I'm just worried that something big is looming ahead of us
that will catch us off guard, and we'll be completely unprepared for it...
> service-announce mailing list
> service-announce at lists.opendev.org
More information about the service-discuss