Re: [service-announce] October 20 Gerrit Outage Update
Hi everyone,

I'm happy to see that things are back in order. However, I *hate* to be that person, but I think there are still some hard questions that we need to answer together transparently. I am especially concerned because of how this affects the workflow of developers overall, but also the security measures we have in place, should something like Zuul be targeted instead.

I've added some questions in-line below.

Thanks,
Mohammed

On Tue, Oct 20, 2020 at 8:33 PM Ian Wienand <iwienand@redhat.com> wrote:
As of this mail, Gerrit access has been restored. Please read on for important information, especially around change verification.
Background
----------
On 2020-10-20 at 01:30 UTC a user unexpectedly added a workflow approval to a change that they were not expected to have access to. At 02:06 UTC an alert was raised via IRC. Administrators found the account had added itself to a core group and made the +W vote. The account was disabled, and removed from the groups it had added itself to, by 02:55 UTC. Administrators began to analyse the situation and Gerrit was taken offline at 04:02 UTC to preserve state and allow for analysis.
From this time, administrators were working on log collection and analysis, along with restoring backups for comparison purposes.
By around 08:45 UTC it was clear that the privilege escalation had been achieved by gaining control of a Launchpad SSO account with Gerrit administrator privileges. By this time, we had ruled out software vulnerabilities. Logs showed the first unauthorized access of the administrator account in Gerrit on 2020-10-06. Communication with Launchpad admins agrees with this analysis. We saw one session opened as the administrator user to StoryBoard on this same day, but logs show no data was modified or hidden stories viewed.
So, just to be clear, someone who had root access to our Gerrit installation had their account compromised which resulted in this (and not something that occurred as a by-product of some other service -- say storyboard -- leaking some sort of information?)

I see two issues in this at the moment:

- There is no need for us to have anyone with admin powers to Gerrit at all times, we've done enough automation to sustain us and a manual 'circuit breaker' of adding a user *IF* necessary should be put in place.
- If the above is not possible, anyone who is part of this group should have 2FA enabled inside Launchpad's SSO.

I would very much prefer the first option rather than the second one.

If it was an individual's account that was accessed and not a system account, have we audited that there are not other things that might have been accessed such as resources relating to Zuul, other systems and potentially rotating/auditing all our infrastructure?
Analysis has been performed on the Gerrit database and git trees from October 1st, pre-dating any known unauthorized access.
Access was restored at around 2020-10-21 00:00 UTC.
Outcomes
--------
The following has been verified:
The administrator account used has been disabled and credentials updated
We have verified that all group and user addition/removals since Oct 1 are valid. The only invalid additions were made by the compromised administrator account to add a single user account to the Administrators group; and then that account added itself to another known group.
The account given administrator privilege has been removed from the groups it added itself to and is disabled.
There is no evidence of any unauthorized access via methods other than Gerrit HTTP and Gerrit SSH access.
No commits have been pushed to git trees bypassing code review. Every git tree has been compared to the Oct 1 version and all commits have been correctly inserted via Gerrit changes.
I saw this artifact, I have no idea if it was put into consideration, but, food for thought: https://review.opendev.org/#/c/758881/
The version of Gerrit we use stores HTTP API passwords in plain-text. We know that a limited number of passwords were gathered via the HTTP API and it is possible passwords were gathered via the database. We thus have assumed that all HTTP API passwords have been disclosed. This password needs to be explicitly enabled by users, and many users do not have it enabled.
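The tree comparison described in the outcomes above (every commit added since the October 1 snapshot must correspond to a Gerrit change) amounts to a set difference. The following is a minimal sketch of that idea, not the tooling the administrators actually ran; the function name `find_unreviewed_commits` and the sample commit IDs are hypothetical.

```python
def find_unreviewed_commits(baseline_commits, current_commits, reviewed_commits):
    """Return commits added since the baseline that no Gerrit change accounts for.

    All three arguments are iterables of commit SHA strings. They are
    hypothetical inputs for this sketch, e.g. collected with
    `git rev-list --all` on each tree and from Gerrit's change records.
    """
    # Commits present now that were not in the pre-incident snapshot.
    added = set(current_commits) - set(baseline_commits)
    # Anything added that did not arrive through a reviewed change is suspect.
    return sorted(added - set(reviewed_commits))


# Illustrative data only; these are not real commit IDs.
baseline = ["a1", "b2"]
current = ["a1", "b2", "c3", "d4"]
reviewed = ["c3"]

suspicious = find_unreviewed_commits(baseline, current, reviewed)
print(suspicious)  # ['d4']
```

An empty result for every repository corresponds to the statement that no commits were pushed bypassing code review.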
Remediation
-----------
This leaves us with the following remediation actions:
Users should double-check their Launchpad recent activity at https://login.launchpad.net/activity for any suspicious logins. If found, please notify the OpenDev admins in Freenode #opendev and Launchpad admins in #launchpad immediately.
All HTTP API passwords have been cleared. If you push changes via HTTPS (instead of typical SSH), are a gertty user, or run a CI system or something else that communicates with the Gerrit HTTP API, you will need to regenerate a password.
Any SSH keys added to accounts since 2020-10-01 have been removed. This affects only a limited number of accounts. This is done in an abundance of caution, and we do not believe any accounts had unauthorized SSH keys added.
We should audit all changes for projects since 2020-10-01.
We have no evidence that any account had its SSH keys compromised, thus we can rule out any unauthorized changes being uploaded via SSH. However, we cannot conclusively rule out that compromised HTTP API passwords were used to push a change through Gerrit. For example, a change could be uploaded that looks like it came from a user, or the API key of a core team member may have been used to approve a change without authorization.
Given our extensive analysis we consider it exceedingly unlikely that this vector was used. We have had no notifications of users seeing unexpected changes either uploaded by them, or approved by them in projects they work on. This said, we believe it is important to inform the community of this very unlikely, but still possible, vulnerability of the source code.
To this end, we have prepared a list of all changes from the known affected period which should be audited for correctness. These are available at
https://static.opendev.org/project/opendev.org/gerrit-diffs/
Team members should browse these changes and make sure they were correctly approved in Gerrit. If any change looks suspicious you should notify OpenDev administrators in Freenode #opendev immediately.
Further actions
---------------
We are planning the following for the short term future:
The OpenDev administrators will be looking at alternative models for Gerrit admin account management.
We are already well into planning and testing a coming upgrade to a version of Gerrit which does not store plain-text API keys.
Longer term, we've written a spec for replacing Launchpad SSO as our authentication provider.
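To illustrate why moving away from plain-text API keys matters, here is a minimal sketch of salted, hashed credential storage. It is an assumption-laden example and not a description of how any Gerrit version stores credentials: the function names, the PBKDF2 choice, and the iteration count are all hypothetical and chosen only for this sketch.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch only


def hash_api_password(password):
    """Return (salt, digest); only these are stored, never the password itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_api_password(password, salt, expected):
    """Recompute the digest for a presented password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_api_password("example-api-password")
print(verify_api_password("example-api-password", salt, digest))  # True
print(verify_api_password("wrong-guess", salt, digest))           # False
```

With storage like this, someone who can read the stored records obtains salts and digests rather than reusable passwords, which is the property the plain-text scheme lacks.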
We thank you for your patience during this trying time, and we look forward to returning to supporting the community doing what it does best -- working together to create great things.
Thank you for this. I'd also like to raise the question of moving forward, how to be able to track these things. We had a user that had full root access to our Gerrit installation for ~2 weeks without our knowledge entirely, only uncovered when they did something (that, in the grand scheme of things, was relatively trivial, compared to what could have happened).

What can we do to set up the necessary infrastructure to ensure that these things are monitored. OpenDev is considered to be critical infrastructure for this entire community and there's not much that an outsider can do other than the 'keyholders' for the resources.

We've historically refused to have any monitoring and now things like this have slipped up, I'm just worried that we have a big looming thing coming up ahead of us that will catch us off guard and we'll be completely unprepared for it...
_______________________________________________ service-announce mailing list service-announce@lists.opendev.org http://lists.opendev.org/cgi-bin/mailman/listinfo/service-announce
-- Mohammed Naser VEXXHOST, Inc.
On 2020-10-20 22:50:42 -0400 (-0400), Mohammed Naser wrote: [...]
So, just to be clear, someone who had root access to our Gerrit installation had their account compromised which resulted in this (and not something that occurred as a by-product of some other service -- say storyboard -- leaking some sort of information?)
Yes, just to be clear it was the Launchpad/UbuntuOne SSO ID which was compromised, the attacker then used that ID to log into the Gerrit service. Those OpenIDs aren't trusted to authenticate SSH into our servers. That account was then used to convey Administrators group membership to another new account which the attacker used to probe database records and also add itself to a review group and approve a change (which was spotted and did not merge). They also proposed a change to one project's configuration, which couldn't merge but if it had would have been subsequently overwritten by our project management.
I see two issues in this at the moment:
- There is no need for us to have anyone with admin powers to Gerrit at all times, we've done enough automation to sustain us and a manual 'circuit breaker' of adding a user *IF* necessary should be put in place.
Yes, we've discussed this already in the past. Our use of OpenID makes it harder to switch between different Gerrit accounts with the WebUI (though maybe less so now that browser containers are a thing). But also, alternative accounts with no OpenIDs at all could be used to perform routine administrative tasks like adding initial users to new groups. It's certainly looking like a compelling option.
- If the above is not possible, anyone who is part of this group should have 2FA enabled inside Launchpad's SSO.
Or we switch to an SSO solution with broader 2FA support, also under discussion (the 2FA on UbuntuOne SSO is by request, with a sizeable backlog of folks wanting to be added, and has been in beta for 6 years).
I would very much prefer the first option rather than the second one.
I concur, for what it's worth.
If it was an individual's account that was accessed and not a system account, have we audited that there are not other things that might have been accessed such as resources relating to Zuul, other systems and potentially rotating/auditing all our infrastructure? [...]
The only systems of ours that OpenID had access to were Gerrit, StoryBoard and MediaWiki. Obviously Gerrit was our primary concern, though we've been looking through the other two in case we need to clean up or reset anything in them.

For that account to alter our automation it would need to have done so through merged changes in Gerrit, and the team has been reviewing recent systems configuration changes for any impersonated suspicious alterations, just as we recommend all teams do for their changes since the first of the month.
I saw this artifact, I have no idea if it was put into consideration, but, food for thought:
Yes, that's how Gerrit normally expects project configurations to be altered (through change proposal, review and approval). For a variety of reasons we don't rely on those, but Gerrit allows any user to propose them.
Thank you for this. I'd also like to raise the question of moving forward, how to be able to track these things. We had a user that had full root access to our Gerrit installation for ~2 weeks without our knowledge entirely, only uncovered when they did something (that, in the grand scheme of things, was relatively trivial, compared to what could have happened).
What can we do to set up the necessary infrastructure to ensure that these things are monitored. OpenDev is considered to be critical infrastructure for this entire community and there's not much that an outsider can do other than the 'keyholders' for the resources.
We've historically refused to have any monitoring and now things like this have slipped up, I'm just worried that we have a big looming thing coming up ahead of us that will catch us off guard and we'll be completely unprepared for it...
I understand the desire, but what monitoring solution do you recommend which would identify when an SSO OpenID account isn't being operated by its rightful owner?

I do think if Gerrit had E-mail notifications to group owners any time group membership was altered, that would have helped us spot the secondary escalation (and it's something we'll look into finding out if the newer Gerrit we've been working on moving to supports), but that was a couple of weeks after the initial intrusion.

Ultimately, monitoring for "compromised" accounts and differentiating them from accounts which are being operated by their legitimate owners is nontrivial, so assistance or suggestions there are welcome.

-- Jeremy Stanley
---- On Wed, 21 Oct 2020 08:06:22 -0500 Jeremy Stanley <fungi@yuggoth.org> wrote ----
[...]
I understand the desire, but what monitoring solution do you recommend which would identify when an SSO OpenID account isn't being operated by its rightful owner?
I do think if Gerrit had E-mail notifications to group owners any time group membership was altered, that would have helped us spot the secondary escalation (and it's something we'll look into finding out if the newer Gerrit we've been working on moving to supports), but that was a couple of weeks after the initial intrusion.
Ultimately, monitoring for "compromised" accounts and differentiating them from accounts which are being operated by their legitimate owners is nontrivial, so assistance or suggestions there are welcome.
Enabling email notification to all existing members of a core group whenever that group's membership changes would help with this. Alternatively, developers can enable review comment emails so that suspicious activity like this is caught quickly. The volume of review comment email can be huge :) but it works for me.

-gmann
-- Jeremy Stanley
On 2020-10-21 10:06:07 -0500 (-0500), Ghanshyam Mann wrote: [...]
Enabling the email notification to all the existing members of any core groups if there is any change in that group can help this. [...]
Yes, like I said, that doesn't seem to be a feature of Gerrit 2.13. It may have been added in a later version, but someone will need to check. We could also add our own auditing tools which anyone can run, for example group membership information can be queried from the REST API even by non-administrators. Something like this:

<URL: https://opendev.org/opendev/system-config/src/commit/b5ee5e6eb8c30ff6e8a9ef9... >

I wrote that some years back as an example for the folks who were regularly organizing "core reviewer parties" at summits, but it could be turned to more useful endeavors. Note that it probably needs some updating, I haven't tried running it in ages. Let's call that an exercise for the reader. ;)

Just remember, as I've said already, while notification of suspicious group membership changes would be handy, this particular incident started with a compromised admin identity and the group escalation was really an unnecessary/secondary event weeks later. While it might help us catch future breaches, it wouldn't on its own have caught the initial intrusion for this one.

-- Jeremy Stanley
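As a rough illustration of the kind of auditing tool described in the message above, the following sketch reads a group's members from Gerrit's REST API (`GET /groups/{group-id}/members/`) and compares them with a previously recorded list. It is not the script linked in that message. The function names, the base URL passed by the caller, and the sample account IDs are hypothetical, and whether an unauthenticated request succeeds depends on how the group's visibility is configured.

```python
import json
import urllib.parse
import urllib.request

# Gerrit prefixes JSON responses with this line to guard against XSSI.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip the XSSI guard prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_group_members(base_url, group):
    """Fetch the direct members of a group (requires network access)."""
    quoted = urllib.parse.quote(group, safe="")
    url = f"{base_url.rstrip('/')}/groups/{quoted}/members/"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))


def diff_members(known_ids, members):
    """Return (added, removed) account IDs relative to a recorded baseline."""
    current = {m["_account_id"] for m in members}
    known = set(known_ids)
    return sorted(current - known), sorted(known - current)


# Offline demonstration with a canned response; the account IDs are made up.
sample = ")]}'\n" + json.dumps([{"_account_id": 1000096}, {"_account_id": 1000123}])
added, removed = diff_members({1000096, 1000001}, parse_gerrit_json(sample))
print(added, removed)  # [1000123] [1000001]
```

Run periodically against a saved baseline, a non-empty `added` list for a core group would be the signal to investigate. As noted in the thread, this would help spot a group escalation but not the initial account compromise.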
On 2020-10-21 13:06:22 +0000 (+0000), Jeremy Stanley wrote:
On 2020-10-20 22:50:42 -0400 (-0400), Mohammed Naser wrote: [...]
There is no need for us to have anyone with admin powers to Gerrit at all times, we've done enough automation to sustain us and a manual 'circuit breaker' of adding a user *IF* necessary should be put in place.
Yes, we've discussed this already in the past. Our use of OpenID makes it harder to switch between different Gerrit accounts with the WebUI (though maybe less so now that browser containers are a thing). But also, alternative accounts with no OpenIDs at all could be used to perform routine administrative tasks like adding initial users to new groups. It's certainly looking like a compelling option. [...]
An implementation for this is up for review, if anyone's interested in taking a look: https://review.opendev.org/760051 -- Jeremy Stanley
On Tue, Oct 20, 2020 at 10:50:42PM -0400, Mohammed Naser wrote:
- There is no need for us to have anyone with admin powers to Gerrit at all times,
Totally agree; I mentioned in the remediation we should look at the way we handle Gerrit administrators. I would say it's mostly a convenience for adding initial members to groups, and the occasional case where we need to force-merge something.

I think we should discuss as part of the upcoming PTG.
- If the above is not possible, anyone who is part of this group should have 2FA enabled inside Launchpad's SSO.
I agree with this too. It's not that obvious how to enable this, but it can be done via [2]. I would probably just recommend everyone does it.

We know longer term we want to move away from Launchpad only as well.
Thank you for this. I'd also like to raise the question of moving forward, how to be able to track these things. We had a user that had full root access to our Gerrit installation for ~2 weeks without our knowledge entirely, only uncovered when they did something (that, in the grand scheme of things, was relatively trivial, compared to what could have happened).
Yeah, not to go into great detail but this wasn't able to be "upgraded" to either the on-disk repos or, importantly, the logs. And it's not just luck that ensures such separation :)

The major (potential for) escalation here happened because our version of Gerrit keeps plain-text HTTP API keys. So both an example of defence-in-depth success and failure all at once. We are well on the way to replacing that, so we are not sweeping that one under the rug.

There's a few other thoughts I have, but TBH I'm hesitant to start broadcasting them in public mails. I am of course for transparency and participation -- I mean the entire infra is driven by completely public git-ops CI and CD; can anyone else say that?!

What we don't have is a formalised way for security discussions. I think we should

a) more clearly describe how to responsibly communicate infra security issues; I don't think we have anything like that documented.

b) start a closed list where we can have free-form discussions about security issues. I think we have a track record of transparency that would ensure that doesn't turn into a "star chamber" and anyone who was interested with a modicum of trust from the project could join (e.g. our cloud providers and others who are clearly invested in the system).

This could also be part of the responsible disclosure, which would be helpful so that people don't have to sign up for full accounts to post storyboard issues, etc. to alert us to issues.

-i

[1] https://etherpad.opendev.org/p/opendev-ptg-planning-oct-2020
[2] https://help.ubuntu.com/community/SSO/FAQs/2FA
On Thu, Oct 22, 2020 at 2:53 AM Ian Wienand <iwienand@redhat.com> wrote:
[...]
b) start a closed list where we can have free-form discussions about security issues. I think we have a track record of transparency that would ensure that doesn't turn into a "star chamber" and anyone who was interested with a modicum of trust from the project could join (e.g. our cloud providers and others who are clearly invested in the system).
+100 this. I am 100% for transparency but I do think there are things that need to be better discussed in private, IMHO.
-- Mohammed Naser VEXXHOST, Inc.
On 2020-10-22 10:36:11 -0400 (-0400), Mohammed Naser wrote:
On Thu, Oct 22, 2020 at 2:53 AM Ian Wienand <iwienand@redhat.com> wrote: [...]
start a closed list where we can have free-form discussions about security issues. I think we have a track record of transparency that would ensure that doesn't turn into a "star chamber" and anyone who was interested with a modicum of trust from the project could join (e.g. our cloud providers and others who are clearly invested in the system).
+100 this. I am 100% for transparency but I do think there are things that need to be better discussed in private, IMHO. [...]
I have proposed https://review.opendev.org/759293 to this end. We can continue to discuss its merits and possible uses here or in review comments on that change. -- Jeremy Stanley
Participants (4): Ghanshyam Mann, Ian Wienand, Jeremy Stanley, Mohammed Naser