From cboylan at sapwetik.org Mon Oct 5 22:40:14 2020
From: cboylan at sapwetik.org (Clark Boylan)
Date: Mon, 05 Oct 2020 15:40:14 -0700
Subject: Team Meeting Agenda for October 6, 2020
Message-ID:

We will meet at 19:00 UTC in #opendev-meeting on October 6, 2020 with this agenda:

== Agenda for next meeting ==

* Announcements
** OpenStack Release next week.
** Rax hosted db outages around 03:00-05:00 UTC Friday including those for review and grafana
* Actions from last meeting
* Specs approval
* Priority Efforts (Standing meeting agenda items. Please expand if you have subtopics.)
** [http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-management.html Update Config Management]
*** topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev
*** review-test is now a mimic of production Gerrit as of October 1
**** Testing of the upgrade process to begin shortly.
*** Luca has offered to do a conference call with us. Let me know if interested and I'll include you for scheduling if/when that happens.
* General topics
** PTG Planning (clarkb 20200929)
*** October PTG registration is now open: https://www.openstack.org/ptg/
*** OpenDev planning stats here: https://etherpad.opendev.org/opendev-ptg-planning-oct-2020
** Rehoming tarballs (ianw 20200929)
*** large selection of things in tarballs.openstack.org/openstack/... that should be under tenant dirs opendev/ x/ zuul/ etc.
*** https://review.opendev.org/#/c/754257/ -- script to generate script that generates a script to clean up
*** http://paste.openstack.org/show/798368/ -- resulting moves
*** symlink old directories? apache redirects (files gone on afs)? just move but notify lists?
** Splitting puppet else into specific infra-prod jobs (clarkb 20200929)
*** Should be mostly mechanical
*** Does it make sense to try and sprint this? Have several people work on getting it done in a short period of time?
** Bup and Borg Backups (clarkb 20200929)
*** https://review.opendev.org/741366 is ready to land when we are ready.
** Trusty Upgrade Progress (clarkb 20200929)
*** Wiki updates
* Open discussion

From cboylan at sapwetik.org Tue Oct 6 23:41:32 2020
From: cboylan at sapwetik.org (Clark Boylan)
Date: Tue, 06 Oct 2020 16:41:32 -0700
Subject: Testing the results of Gerrit upgrades on review-test.opendev.org
Message-ID:

Hello everyone,

Recently fungi and I managed to make review-test.opendev.org a copy of
production Gerrit (review.opendev.org) as of October 1. We have since
upgraded that installation to Gerrit 2.16. The 2.16 version is a
potential upgrade target for our production Gerrit as we have to stop
there and do a required notedb migration step. What isn't yet clear is
if we'll expose the 2.16 installation or continue on to upgrade to a
3.x version. In order to figure that out we'll be doing the notedb
migration and the 3.x upgrades in the near future. Once we've gone
through the process we should have a good idea of what makes sense as
far as upgrades go for us.

In any case the server is up for testing now if people want to take a
look at 2.16 and offer feedback. I'll followup on this thread when we
get upgraded to 3.x and ask for feedback on that version too. Head to
https://review-test.opendev.org and check it out.

There are some minor known issues. The biggest that people will
probably notice is that our hacky javascript CI results table does not
work in the old gerrit UI or the new polygerrit UI. The commentlinks
for zuul results are also not working on polygerrit UI. We're still
investigating, but we're going to be careful that we don't worry about
every single issue like that pre upgrade otherwise we'll never get the
upgrade done. Instead we'll gather feedback and prioritize then sort
out what needs fixing now and what we can sort out later.

Let us know what you think,
Clark

From cboylan at sapwetik.org Fri Oct 9 18:46:03 2020
From: cboylan at sapwetik.org (Clark Boylan)
Date: Fri, 09 Oct 2020 11:46:03 -0700
Subject: Testing the results of Gerrit upgrades on review-test.opendev.org
In-Reply-To:
References:
Message-ID:

On Tue, Oct 6, 2020, at 4:41 PM, Clark Boylan wrote:
> Hello everyone,
>
> Recently fungi and I managed to make review-test.opendev.org a copy of
> production Gerrit (review.opendev.org) as of October 1. We have since
> upgraded that installation to Gerrit 2.16. The 2.16 version is a
> potential upgrade target for our production Gerrit as we have to stop
> there and do a required notedb migration step. What isn't yet clear is
> if we'll expose the 2.16 installation or continue on to upgrade to a
> 3.x version. In order to figure that out we'll be doing the notedb
> migration and the 3.x upgrades in the near future. Once we've gone
> through the process we should have a good idea of what makes sense as
> far as upgrades go for us.
>
> In any case the server is up for testing now if people want to take a
> look at 2.16 and offer feedback. I'll followup on this thread when we
> get upgraded to 3.x and ask for feedback on that version too. Head to
> https://review-test.opendev.org and check it out.
>
> There are some minor known issues. The biggest that people will
> probably notice is that our hacky javascript CI results table does not
> work in the old gerrit UI or the new polygerrit UI. The commentlinks
> for zuul results are also not working on polygerrit UI. We're still
> investigating, but we're going to be careful that we don't worry about
> every single issue like that pre upgrade otherwise we'll never get the
> upgrade done. Instead we'll gather feedback and prioritize then sort
> out what needs fixing now and what we can sort out later.
>
> Let us know what you think,
> Clark
>
>

Quick update. Fungi and I got the server upgraded from 2.16 to 3.2 over
the last day and a half.
Unfortunately the notedb migration is a significant chunk of time
which will prolong the upgrade process, but it succeeded and the server
is back up and running on the latest release. Feel free to test the
server out and give feedback.

As far as upgrade planning goes, we have a few important things to test
like replication behavior and sizes post notedb migration. I'm working
on spinning up a test gitea server to replicate into from review-test
for that. Otherwise basic functionality seems to work: I can log in,
git review -s is able to grab the hook script, git review can push
changes, I am able to leave top level and inline comments on changes,
and so on.

The upgrade itself will likely take about two days with Gerrit being
inaccessible the whole time. On day one we'll upgrade from 2.13 to 2.16
pre notedb migration. This will act as a checkpoint which we can
fallback to if the 3.x upgrade has any problems. Then between day one
and day two we can run the long notedb migration. When we return to the
upgrading on day two we can upgrade from 2.16 to 3.2, then apply config
updates to services like zuul as well as other related cleanups. Actual
details to be determined after more testing. I don't want to commit to
such exact plans just yet.

Clark

From cboylan at sapwetik.org Mon Oct 12 21:02:24 2020
From: cboylan at sapwetik.org (Clark Boylan)
Date: Mon, 12 Oct 2020 14:02:24 -0700
Subject: Meeting Agenda for October 13, 2020
Message-ID: <1f347c8a-7398-42de-b576-ce6181585638@www.fastmail.com>

We will meet in #opendev-meeting at 19:00 UTC on October 13, 2020 with this agenda:

== Agenda for next meeting ==

* Announcements
** OpenStack Release October 14
** Summit next week. PTG the week after.
* Actions from last meeting
* Specs approval
* Priority Efforts (Standing meeting agenda items. Please expand if you have subtopics.)
** [http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-management.html Update Config Management]
*** topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev
*** Preparing to upgrade Gerrit from 2.13 to 3.2
**** review-test.opendev.org is an upgraded snapshot of production from October 1. Please check it out
**** Basic functionality seems to be working
***** logging in, git review -s, git review to push, commenting on changes, ICLA signing, replication, change searching, and so on.
**** jeepyb bug/spec update hooks and the welcome message hook rely on database access and will need to be updated or sunsetted
**** Upgrade Process
***** Backup then upgrade from 2.13 to 2.16. This is our fallback midpoint checkpoint
***** Backup again then migrate to notedb on 2.16
***** Upgrade to 3.2
***** Upgrade to 2.16 along with backups should be doable in a day. Then notedb migration can happen overnight with 3.2 upgrade happening on day two.
**** Unknowns
***** Storyboard integration
**** Can we start talking about scheduling the outage and upgrade?
*** Luca has offered to do a conference call with us. Let me know if interested and I'll include you for scheduling if/when that happens.
* General topics
** PTG Planning (clarkb 20200929)
*** October PTG registration is now open: https://www.openstack.org/ptg/
*** OpenDev planning stats here: https://etherpad.opendev.org/opendev-ptg-planning-oct-2020
** Bup and Borg Backups (clarkb 20200929)
*** Ethercalc to be the first borg backed up service
** Splitting puppet else into specific infra-prod jobs (clarkb 20200929)
*** Should be mostly mechanical
*** Does it make sense to try and sprint this? Have several people work on getting it done in a short period of time?
** Trusty Upgrade Progress (clarkb 20200929)
*** Wiki updates
* Open discussion

From cboylan at sapwetik.org Mon Oct 19 23:09:44 2020
From: cboylan at sapwetik.org (Clark Boylan)
Date: Mon, 19 Oct 2020 16:09:44 -0700
Subject: Cancelling the team meeting October 20, 2020
Message-ID:

Hello everyone,

I've been swamped with summit things. Initially I thought that we could
do a quieter meeting during the summit, but I've been too distracted to
even look at an agenda. For that reason I think we should cancel the
meeting this week.

We've got the PTG next week. Please register and check the schedule if
you plan to attend. We'll see you there as well as at next week's
meeting (I hope).

Clark

From jbryce at jbryce.com Tue Oct 20 18:18:04 2020
From: jbryce at jbryce.com (Jonathan Bryce)
Date: Tue, 20 Oct 2020 13:18:04 -0500
Subject: [service-announce] October 20 Gerrit Outage
In-Reply-To: <2ee83d02-f6eb-4ea9-916c-5b5558da862c@www.fastmail.com>
References: <2ee83d02-f6eb-4ea9-916c-5b5558da862c@www.fastmail.com>
Message-ID: <175473b8360.27de.eb5fa01e01bf15c6e0d805bdb1ad935e@jbryce.com>

Thanks Clark and the rest of the OpenDev infra crew for literally
working around the clock on this issue! Appreciate the effort to verify
everything.

Also wanted to share that the updates are being posted here if people
want to see the history for completeness:
https://review.opendev.org/maintenance.html

Jonathan

On October 20, 2020 12:50:32 "Clark Boylan" wrote:
> Hello everyone,
>
> By now most of you have probably noticed that we took Gerrit offline
> recently. The reason for that is we believe an admin account in Gerrit was
> compromised allowing an attacker to escalate privileges within Gerrit.
>
> Around 02:00 UTC October 20 suspicious review activity was noticed, and we
> were made aware of it shortly afterwards. The involved account was disabled
> and removed from privileged Gerrit groups.
> After further investigation we
> decided that we needed to stop the service, this happened at about 04:00 UTC.
>
> After the service was stopped we shifted focus to identifying the source of
> the issue as well as investigating impact. We believe this originated on
> October 6th with at least two compromised Ubuntu One accounts. One of which
> was a Gerrit admin account. These accounts, like the one that initially
> tipped us off, have been dealt with at this point.
>
> In order to evaluate impact we are using backups from October 1 to find
> configuration, database, and git repo changes that have been made. We have
> identified 97 accounts that updated ssh keys after that point in time.
> These ssh keys are being removed as we can't be sure the changes were valid
> changes made by the user. If you are one of these users you will need to
> add your key(s) back in. We will also attempt to reach out to the affected
> users directly by email soon. We will be checking openid urls and group
> membership changes as well. We will determine what actions make sense for
> these items once we have evaluated the impact to them.
>
> All Gerrit HTTP API tokens will be deleted. You will need to generate new
> ones if you are an API user. Sorry, gertty fans.
>
> On the git repo side of things there are a few things that we will need to
> check. Using our October 1 state we will generate lists of commits that
> have landed since then for each branch on each repo. We will verify that
> the latest commit can reach the last known good commit in the git DAG. For
> non merge commits we will also correlate these to Gerrit changes. We will
> then ask that you help us by verifying the commits on your projects are as
> reviewed and not malicious. We will also need to check git tags which
> should all be signed and can be verified that way.
>
> This is a good reminder to check activity on your online accounts and
> identities for anything unexpected.
>
> We understand that an inaccessible Gerrit is not fun. We are trying to go
> as quickly as we can while also not sacrificing caution and care.
>
> Clark
>
> _______________________________________________
> service-announce mailing list
> service-announce at lists.opendev.org
> http://lists.opendev.org/cgi-bin/mailman/listinfo/service-announce

From mnaser at vexxhost.com Wed Oct 21 02:50:42 2020
From: mnaser at vexxhost.com (Mohammed Naser)
Date: Tue, 20 Oct 2020 22:50:42 -0400
Subject: [service-announce] October 20 Gerrit Outage Update
In-Reply-To: <20201021003314.GB1695651@fedora19.localdomain>
References: <20201021003314.GB1695651@fedora19.localdomain>
Message-ID:

Hi everyone,

I'm happy to see that things are back in order, however, I *hate* to be
that person, but I think there are still some hard questions that we
need to answer together transparently.

I am especially concerned because of how this affects the workflow of
developers overall but also the security measures we have in place,
should something like Zuul be targeted instead.

I've added some questions in-line below

Thanks,
Mohammed

On Tue, Oct 20, 2020 at 8:33 PM Ian Wienand wrote:
>
> As of this mail, Gerrit access has been restored. Please read on for
> important information, especially around change verification.
>
> Background
> -----------
>
> On 2020-10-20 at 01:30 a user unexpectedly added a workflow approval
> to a change that they were not expected to have access to. At 02:06
> UTC an alert was raised via IRC. Administrators found the account had
> added themselves to a core group and made the +W vote. The account
> was disabled, and removed from the groups it had added itself to by
> 02:55 UTC. Administrators began to analyse the situation and Gerrit
> was taken offline at 04:02 UTC to preserve state and allow for
> analysis.
>
> From this time, administrators were working on log collection and
> analysis, along with restoring backups for comparison purposes.
>
> By around 08:45 UTC it was clear that the privilege escalation had
> been achieved by gaining control of a Launchpad SSO account with
> Gerrit administrator privileges. By this time, we had ruled out
> software vulnerabilities. Logs showed the first unauthorized access
> of the administrator account in Gerrit on 2020-10-06. Communication
> with Launchpad admins agrees with this analysis. We saw one session
> opened as the administrator user to StoryBoard on this same day, but
> logs show no data was modified or hidden stories viewed.

So, just to be clear, someone who had root access to our Gerrit
installation had their account compromised which resulted in this
(and not something that occurred as a by-product of some other
service -- say storyboard -- leaking some sort of information?)

I see two issues in this at the moment:

- There is no need for us to have anyone with admin powers to Gerrit
at all times, we've done enough automation to sustain us and a manual
'circuit breaker' of adding a user *IF* necessary should be put in
place.
- If the above is not possible, anyone who is part of this group
should have 2FA enabled inside Launchpad's SSO.

I would very much prefer the first option rather than the second one.

If it was an individual's account that was accessed and not a system
account, have we audited that there are not other things that might
have been accessed such as resources relating to Zuul, other systems
and potentially rotating/auditing all our infrastructure?

> Analysis has been performed on the Gerrit database and git trees from
> October 1st, pre-dating any known unauthorized access.
>
> Access was restored at around 2020-10-21 00:00 UTC
>
> Outcomes
> -----------
>
> The following has been verified:
>
> The administrator account used has been disabled and credentials
> updated
>
> We have verified that all group and user addition/removals since
> Oct 1 are valid. The only invalid additions were made by the
> compromised administrator account to add a single user account to
> the Administrators group; and then that account added itself to
> another known group.
>
> The account given administrator privilege has been removed from
> the groups it added itself to and is disabled.
>
> There is no evidence of any unauthorized access via methods other
> than Gerrit HTTP and Gerrit SSH access.
>
> No commits have been pushed to git trees bypassing code review.
> Every git tree has been compared to the Oct 1 version and all
> commits have been correctly inserted via Gerrit changes.

I saw this artifact, I have no idea if it was put into consideration, but,
food for thought:

https://review.opendev.org/#/c/758881/

> The version of Gerrit we use stores HTTP API passwords in
> plain-text. We know that a limited number of passwords were
> gathered via the HTTP API and it is possible passwords were
> gathered via the database. We thus have assumed that all HTTP API
> passwords have been disclosed. This password needs to be
> explicitly enabled by users, and many users do not have it
> enabled.
>
> Remediation
> -----------
>
> This leaves us with the following remediation actions:
>
> Users should double-check their Launchpad recent activity at
> https://login.launchpad.net/activity for any suspicious logins. If
> found, please notify the OpenDev admins in Freenode #opendev and
> Launchpad admins in #launchpad immediately.
>
> All HTTP API passwords have been cleared.
> If you push changes via
> HTTPS (instead of typical SSH), are a gertty user, or run a CI
> system or something else that communicates with the Gerrit HTTP
> API, you will need to regenerate a password.
>
> Any SSH keys added to accounts since 2020-10-01 have been removed.
> This affects only a limited number of accounts. This is done in
> an abundance of caution, and we do not believe any accounts had
> unauthorized SSH keys added
>
> We should audit all changes for projects since 2020-10-01.
>
> We have no evidence that any account had its ssh keys compromised,
> thus we can rule out any unauthorized changes being uploaded via SSH.
> However we can not conclusively rule out that compromised HTTP API
> passwords were used to push a change through Gerrit. For example, a
> change could be uploaded that looks like it came from a user, or the
> API key of a core team member may have been used to approve a change
> without authorization.
>
> Given our extensive analysis we consider it exceedingly unlikely that
> this vector was used. We have had no notifications of users seeing
> unexpected changes either uploaded by them, or approved by them in
> projects they work on. This said, we believe it is important to
> inform the community of this very unlikely, but still possible,
> vulnerability of the source code.
>
> To this end, we have prepared a list of all changes from the known
> affected period which should be audited for correctness. These are
> available at
>
> https://static.opendev.org/project/opendev.org/gerrit-diffs/
>
> Team members should browse these changes and make sure they were
> correctly approved in Gerrit. If any change looks suspicious you
> should notify OpenDev administrators in Freenode #opendev immediately.
>
> Further actions
> ----------------
>
> We are planning the following for the short term future:
>
> The Opendev administrators will be looking at alternative models
> for Gerrit admin account management.
>
> We are already well into planning and testing a coming upgrade to
> a version of Gerrit which does not store plain-text API keys.
>
> Longer term, we've written a spec for replacing Launchpad SSO as
> our authentication provider.
>
> We thank you for your patience during this trying time, and we look
> forward to returning to supporting the community doing what it does
> best -- working together to create great things.
>

Thank you for this. I'd also like to raise the question of moving
forward, how to be able to track these things. We had a user that
had full root access to our Gerrit installation for ~2 weeks
without our knowledge entirely, only uncovered when they did
something (that, in the grand scheme of things, was relatively
trivial, compared to what could have happened).

What can we do to set up the necessary infrastructure to ensure
that these things are monitored. OpenDev is considered to be
critical infrastructure for this entire community and there's not
much that an outsider can do other than the 'keyholders' for the
resources.

We've historically refused to have any monitoring and now things
like this have slipped up, I'm just worried that we have a big
looming thing coming up ahead of us that will catch us off guard
and we'll be completely unprepared for it...

--
Mohammed Naser
VEXXHOST, Inc.

From fungi at yuggoth.org Wed Oct 21 13:06:22 2020
From: fungi at yuggoth.org (Jeremy Stanley)
Date: Wed, 21 Oct 2020 13:06:22 +0000
Subject: [service-announce] October 20 Gerrit Outage Update
In-Reply-To:
References: <20201021003314.GB1695651@fedora19.localdomain>
Message-ID: <20201021130621.a2jriwx4qwp3ajf7@yuggoth.org>

On 2020-10-20 22:50:42 -0400 (-0400), Mohammed Naser wrote:
[...]
> So, just to be clear, someone who had root access to our Gerrit
> installation had their account compromised which resulted in this
> (and not something that occurred as a by-product of some other
> service -- say storyboard -- leaking some sort of information?)

Yes, just to be clear it was the Launchpad/UbuntuOne SSO ID which
was compromised, the attacker then used that ID to log into the
Gerrit service. Those OpenIDs aren't trusted to authenticate SSH
into our servers. That account was then used to convey
Administrators group membership to another new account which the
attacker used to probe database records and also add itself to a
review group and approve a change (which was spotted and did not
merge). They also proposed a change to one project's configuration,
which couldn't merge but if it had would have been subsequently
overwritten by our project management.

> I see two issues in this at the moment:
>
> - There is no need for us to have anyone with admin powers to
> Gerrit at all times, we've done enough automation to sustain us
> and a manual 'circuit breaker' of adding a user *IF* necessary
> should be put in place.

Yes, we've discussed this already in the past. Our use of OpenID
makes it harder to switch between different Gerrit accounts with the
WebUI (though maybe less so now that browser containers are a
thing). But also, alternative accounts with no OpenIDs at all could
be used to perform routine administrative tasks like adding initial
users to new groups. It's certainly looking like a compelling
option.

> - If the above is not possible, anyone who is part of this group
> should have 2FA enabled inside Launchpad's SSO.

Or we switch to an SSO solution with broader 2FA support, also under
discussion (the 2FA on UbuntuOne SSO is by request, with a sizeable
backlog of folks wanting to be added, and has been in beta for 6
years).

> I would very much prefer the first option rather than the second
> one.

I concur, for what it's worth.

> If it was an individual's account that was accessed and not a
> system account, have we audited that there are not other things
> that might have been accessed such as resources relating to Zuul,
> other systems and potentially rotating/auditing all our
> infrastructure?
[...]

The only systems of ours that OpenID had access to were Gerrit,
StoryBoard and MediaWiki. Obviously Gerrit was our primary concern,
though we've been looking through the other two in case we need to
clean up or reset anything in them.

For that account to alter our automation it would need to have done
so through merged changes in Gerrit, and the team has been reviewing
recent systems configuration changes for any impersonated suspicious
alterations, just as we recommend all teams do for their changes
since the first of the month.

> I saw this artifact, I have no idea if it was put into consideration, but,
> food for thought:
>
> https://review.opendev.org/#/c/758881/
[...]

Yes, that's how Gerrit normally expects project configurations to be
altered (through change proposal, review and approval). For a
variety of reasons we don't rely on those, but Gerrit allows any
user to propose them.

> Thank you for this. I'd also like to raise the question of moving
> forward, how to be able to track these things. We had a user that
> had full root access to our Gerrit installation for ~2 weeks
> without our knowledge entirely, only uncovered when they did
> something (that, in the grand scheme of things, was relatively
> trivial, compared to what could have happened).
>
> What can we do to set up the necessary infrastructure to ensure
> that these things are monitored. OpenDev is considered to be
> critical infrastructure for this entire community and there's not
> much that an outsider can do other than the 'keyholders' for the
> resources.
>
> We've historically refused to have any monitoring and now things
> like this have slipped up, I'm just worried that we have a big
> looming thing coming up ahead of us that will catch us off guard
> and we'll be completely unprepared for it...

I understand the desire, but what monitoring solution do you
recommend which would identify when an SSO OpenID account isn't
being operated by its rightful owner?

I do think if Gerrit had E-mail notifications to group owners any
time group membership was altered, that would have helped us spot
the secondary escalation (and it's something we'll look into finding
out if the newer Gerrit we've been working on moving to supports),
but that was a couple of weeks after the initial intrusion.

Ultimately, monitoring for "compromised" accounts and
differentiating them from accounts which are being operated by their
legitimate owners is nontrivial, so assistance or suggestions there
are welcome.
--
Jeremy Stanley

From gmann at ghanshyammann.com Wed Oct 21 15:06:07 2020
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Wed, 21 Oct 2020 10:06:07 -0500
Subject: [service-announce] October 20 Gerrit Outage Update
In-Reply-To: <20201021130621.a2jriwx4qwp3ajf7@yuggoth.org>
References: <20201021003314.GB1695651@fedora19.localdomain> <20201021130621.a2jriwx4qwp3ajf7@yuggoth.org>
Message-ID: <1754bb22532.121243b7f149104.7362489015264007054@ghanshyammann.com>

---- On Wed, 21 Oct 2020 08:06:22 -0500 Jeremy Stanley wrote ----
> On 2020-10-20 22:50:42 -0400 (-0400), Mohammed Naser wrote:
> [...]
> > So, just to be clear, someone who had root access to our Gerrit
> > installation had their account compromised which resulted in this
> > (and not something that occurred as a by-product of some other
> > service -- say storyboard -- leaking some sort of information?)
>
> Yes, just to be clear it was the Launchpad/UbuntuOne SSO ID which
> was compromised, the attacker then used that ID to log into the
> Gerrit service. Those OpenIDs aren't trusted to authenticate SSH
> into our servers. That account was then used to convey
> Administrators group membership to another new account which the
> attacker used to probe database records and also add itself to a
> review group and approve a change (which was spotted and did not
> merge). They also proposed a change to one project's configuration,
> which couldn't merge but if it had would have been subsequently
> overwritten by our project management.
>
> > I see two issues in this at the moment:
> >
> > - There is no need for us to have anyone with admin powers to
> > Gerrit at all times, we've done enough automation to sustain us
> > and a manual 'circuit breaker' of adding a user *IF* necessary
> > should be put in place.
>
> Yes, we've discussed this already in the past. Our use of OpenID
> makes it harder to switch between different Gerrit accounts with the
> WebUI (though maybe less so now that browser containers are a
> thing). But also, alternative accounts with no OpenIDs at all could
> be used to perform routine administrative tasks like adding initial
> users to new groups. It's certainly looking like a compelling
> option.
>
> > - If the above is not possible, anyone who is part of this group
> > should have 2FA enabled inside Launchpad's SSO.
>
> Or we switch to an SSO solution with broader 2FA support, also under
> discussion (the 2FA on UbuntuOne SSO is by request, with a sizeable
> backlog of folks wanting to be added, and has been in beta for 6
> years).
>
> > I would very much prefer the first option rather than the second
> > one.
>
> I concur, for what it's worth.
>
> > If it was an individual's account that was accessed and not a
> > system account, have we audited that there are not other things
> > that might have been accessed such as resources relating to Zuul,
> > other systems and potentially rotating/auditing all our
> > infrastructure?
> [...]
>
> The only systems of ours that OpenID had access to were Gerrit,
> StoryBoard and MediaWiki. Obviously Gerrit was our primary concern,
> though we've been looking through the other two in case we need to
> clean up or reset anything in them.
>
> For that account to alter our automation it would need to have done
> so through merged changes in Gerrit, and the team has been reviewing
> recent systems configuration changes for any impersonated suspicious
> alterations, just as we recommend all teams do for their changes
> since the first of the month.
>
> > I saw this artifact, I have no idea if it was put into consideration, but,
> > food for thought:
> >
> > https://review.opendev.org/#/c/758881/
> [...]
>
> Yes, that's how Gerrit normally expects project configurations to be
> altered (through change proposal, review and approval). For a
> variety of reasons we don't rely on those, but Gerrit allows any
> user to propose them.
>
> > Thank you for this. I'd also like to raise the question of moving
> > forward, how to be able to track these things. We had a user that
> > had full root access to our Gerrit installation for ~2 weeks
> > without our knowledge entirely, only uncovered when they did
> > something (that, in the grand scheme of things, was relatively
> > trivial, compared to what could have happened).
> >
> > What can we do to set up the necessary infrastructure to ensure
> > that these things are monitored. OpenDev is considered to be
> > critical infrastructure for this entire community and there's not
> > much that an outsider can do other than the 'keyholders' for the
> > resources.
> > > > We've historically refused to have any monitoring and now things > > like this have slipped up, I'm just worried that we have a big > > looming thing coming up ahead of us that will catch us off guard > > and we'll be completely unprepared for it... > > I understand the desire, but what monitoring solution do you > recommend which would identify when an SSO OpenID account isn't > being operated by its rightful owner? > > I do think if Gerrit had E-mail notifications to group owners any > time group membership was altered, that would have helped us spot > the secondary escalation (and it's something we'll look into finding > out if the newer Gerrit we've been working on moving to supports), > but that was a couple of weeks after the initial intrusion. > > Ultimately, monitoring for "compromised" accounts and > differentiating them from accounts which are being operated by their > legitimate owners is nontrivial, so assistance or suggestions there > are welcome. Enabling the email notification to all the existing members of any core groups if there is any change in that group can help this. Or the developer can enable the review comment email so that you can catch such suspicious activity very soon but review comments email can be huge :) but works for me. -gmann > -- > Jeremy Stanley > From fungi at yuggoth.org Wed Oct 21 15:20:04 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 21 Oct 2020 15:20:04 +0000 Subject: [service-announce] October 20 Gerrit Outage Update In-Reply-To: <1754bb22532.121243b7f149104.7362489015264007054@ghanshyammann.com> References: <20201021003314.GB1695651@fedora19.localdomain> <20201021130621.a2jriwx4qwp3ajf7@yuggoth.org> <1754bb22532.121243b7f149104.7362489015264007054@ghanshyammann.com> Message-ID: <20201021152004.yebc5gy5kdm5yg57@yuggoth.org> On 2020-10-21 10:06:07 -0500 (-0500), Ghanshyam Mann wrote: [...] 
> Enabling the email notification to all the existing members of any > core groups if there is any change in that group can help this. [...] Yes, like I said, that doesn't seem to be a feature of Gerrit 2.13. It may have been added in a later version, but someone will need to check. We could also add our own auditing tools which anyone can run, for example group membership information can be queried from the REST API even by non-administrators. Something like this: I wrote that some years back as an example for the folks who were regularly organizing "core reviewer parties" at summits, but it could be turned to more useful endeavors. Note that it probably needs some updating, I haven't tried running it in ages. Let's call that an exercise for the reader. ;) Just remember, as I've said already, while notification of suspicious group membership changes would be handy, this particular incident started with a compromised admin identity and the group escalation was really an unnecessary/secondary event weeks later. While it might help us catch future breaches, it wouldn't on its own have caught the initial intrusion for this one. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From iwienand at redhat.com Thu Oct 22 06:53:11 2020 From: iwienand at redhat.com (Ian Wienand) Date: Thu, 22 Oct 2020 17:53:11 +1100 Subject: [service-announce] October 20 Gerrit Outage Update In-Reply-To: References: <20201021003314.GB1695651@fedora19.localdomain> Message-ID: <20201022065311.GA1731784@fedora19.localdomain> On Tue, Oct 20, 2020 at 10:50:42PM -0400, Mohammed Naser wrote: > - There is no need for us to have anyone with admin powers to Gerrit > at all times, Totally agree; I mentioned in the remediation we should look at the way we handle Gerrit administrators. 
I would say it's mostly a convenience for adding initial members to groups, and the occasional case where we need to force-merge something. I think we should discuss as part of the upcoming PTG > - If the above is not possible, anyone who is part of this group > should have 2FA enabled inside Launchpad's SSO. I agree with this too. It's not that obvious how to enable this, but it can be done via [2]. I would probably just recommend everyone does it. We know longer term we want to move away from Launchpad only as well. > Thank you for this. I'd also like to raise the question of moving > forward, how to be able to track these things. We had a user that > had full root access to our Gerrit installation for ~2 weeks without > our knowledge entirely, only uncovered when they did something > (that, in the grand scheme of things, was relatively trivial, > compared to what could have happened). Yeah, not to go into great detail but this wasn't able to be "upgraded" to either the on-disk repos or, importantly, the logs. And it's not just luck that ensures such separation :) The major (potential for) escalation here happened because our version of gerrit keeps plain-text HTTP API keys. So both an example of defence-in-depth success and failure all at once. We are well on the way to replacing that, so we are not sweeping that one under the rug. There's a few other thoughts I have, but TBH I'm hesitant to start broadcasting them in public mails. I am of course for transparency and participation -- I mean the entire infra is driven by completely public git-ops CI and CD; can anyone else say that?! What we don't have is a formalised way for security discussions. I think we should a) more clearly describe how to responsibly communicate infra security issues; I don't think we have anything like that documented. b) start a closed list where we can have free-form discussions about security issues. 
I think we have a track record of transparency that would ensure that doesn't turn into a "star chamber" and anyone who was interested with a modicum of trust from the project could join (e.g. our cloud providers and others who are clearly invested in the system). This could also be part of the responsible disclosure, which would be helpful so that people don't have to sign up for full accounts to post storyboard issues, etc. to alert us to issues. -i [1] https://etherpad.opendev.org/p/opendev-ptg-planning-oct-2020 [2] https://help.ubuntu.com/community/SSO/FAQs/2FA From mnaser at vexxhost.com Thu Oct 22 14:36:11 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 22 Oct 2020 10:36:11 -0400 Subject: [service-announce] October 20 Gerrit Outage Update In-Reply-To: <20201022065311.GA1731784@fedora19.localdomain> References: <20201021003314.GB1695651@fedora19.localdomain> <20201022065311.GA1731784@fedora19.localdomain> Message-ID: On Thu, Oct 22, 2020 at 2:53 AM Ian Wienand wrote: > > On Tue, Oct 20, 2020 at 10:50:42PM -0400, Mohammed Naser wrote: > > - There is no need for us to have anyone with admin powers to Gerrit > > at all times, > > Totally agree; I mentioned in the remediation we should look at the > way we handle Gerrit administrators. I would say it's mostly a > convenience for adding initial members to groups, and the occasional > case where we need to force-merge something. > > I think we should discuss as part of the upcoming PTG > > > - If the above is not possible, anyone who is part of this group > > should have 2FA enabled inside Launchpad's SSO. > > I agree with this too. It's not that obvious how to enable this, but > it can be done via [2]. I would probably just recommend everyone does > it. > > We know longer term we want to move away from Launchpad only as well. > > > Thank you for this. I'd also like to raise the question of moving > > forward, how to be able to track these things. 
We had a user that > > had full root access to our Gerrit installation for ~2 weeks without > > our knowledge entirely, only uncovered when they did something > > (that, in the grand scheme of things, was relatively trivial, > > compared to what could have happened). > > Yeah, not to go into great detail but this wasn't able to be > "upgraded" to either the on-disk repos or, importantly, the logs. And > it's not just luck that ensures such separation :) > > The major (potential for) escalation here happened because our version > of gerrit keeps plain-text HTTP API keys. So both an example of > defence-in-depth success and failure all at once. We are well on the > way to replacing that, so we are not sweeping that one under the rug. > > There's a few other thoughts I have, but TBH I'm hesitant to start > broadcasting them in public mails. I am of course for transparency > and participation -- I mean the entire infra is driven by completely > public git-ops CI and CD; can anyone else say that?! > > What we don't have is a formalised way for security discussions. I > think we should > > a) more clearly describe how to responsibly communicate infra security > issues; I don't think we have anything like that documented. > > b) start a closed list where we can have free-form discussions about > security issues. I think we have a track record of transparency > that would ensure that doesn't turn into a "star chamber" and > anyone who was interested with a modicum of trust from the project > could join (e.g. our cloud providers and others who are clearly > invested in the system). +100 this. I am 100% for transparency but I do think there are things that need to be better discussed in private, IMHO. > This could also be part of the responsible disclosure, which would > be helpful so that people don't have to sign up for full accounts > to post storyboard issues, etc. to alert us to issues. 
> > -i > > [1] https://etherpad.opendev.org/p/opendev-ptg-planning-oct-2020 > [2] https://help.ubuntu.com/community/SSO/FAQs/2FA > -- Mohammed Naser VEXXHOST, Inc. From fungi at yuggoth.org Thu Oct 22 16:20:25 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 22 Oct 2020 16:20:25 +0000 Subject: [service-announce] October 20 Gerrit Outage Update In-Reply-To: References: <20201021003314.GB1695651@fedora19.localdomain> <20201022065311.GA1731784@fedora19.localdomain> Message-ID: <20201022162025.kjiedviv2vxsanff@yuggoth.org> On 2020-10-22 10:36:11 -0400 (-0400), Mohammed Naser wrote: > On Thu, Oct 22, 2020 at 2:53 AM Ian Wienand wrote: [...] > > start a closed list where we can have free-form discussions > > about security issues. I think we have a track record of > > transparency > > that would ensure that doesn't turn into a "star chamber" and > > anyone who was interested with a modicum of trust from the project > > could join (e.g. our cloud providers and others who are clearly > > invested in the system). > > +100 this. I am 100% for transparency but I do think there are > things that need to be better discussed in private, IMHO. [...] I have proposed https://review.opendev.org/759293 to this end. We can continue to discuss its merits and possible uses here or in review comments on that change. -- Jeremy Stanley From qizhangapp at gmail.com Thu Oct 22 22:01:37 2020 From: qizhangapp at gmail.com (App Support) Date: Thu, 22 Oct 2020 15:01:37 -0700 Subject: Some of your OpenDev Gerrit SSH keys have been deleted In-Reply-To: <20201022213629.eingacoophae8iew@yuggoth.org> References: <20201022213629.eingacoophae8iew@yuggoth.org> Message-ID: <9682F3F2-653B-4998-AD26-E6BE3C8B8202@gmail.com> Hi Jeremy, Would the deletion affect the login to https://review.opendev.org/?
Somehow I got an error when trying to login. I am new to OpenStack and just started contributing to documentations. Thank you, Qi Zhang > On Oct 22, 2020, at 2:36 PM, Jeremy Stanley wrote: > > In response to a security breach for the Gerrit Code Review service > at https://review.opendev.org/ I have deleted one or more SSH public > keys uploaded to your account between October 1 and October 19, > 2020. If you added any SSH public keys during that time, you will > need to reupload them in your user preferences before they can be > used to authenticate to the service. For more information on the > breach, see this announcement: > > http://lists.opendev.org/pipermail/service-announce/2020-October/000011.html > > If you have any questions, feel free to reach out to OpenDev's > systems administrators on the service-discuss at lists.opendev.org > mailing list or in the #opendev channel on the Freenode IRC network. > -- > Jeremy Stanley, OpenDev Systems Administrator -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Oct 22 22:15:14 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 22 Oct 2020 22:15:14 +0000 Subject: Some of your OpenDev Gerrit SSH keys have been deleted In-Reply-To: <9682F3F2-653B-4998-AD26-E6BE3C8B8202@gmail.com> References: <20201022213629.eingacoophae8iew@yuggoth.org> <9682F3F2-653B-4998-AD26-E6BE3C8B8202@gmail.com> Message-ID: <20201022221514.jcmjxeitpxrqioq7@yuggoth.org> On 2020-10-22 15:01:37 -0700 (-0700), App Support wrote: > Would the deletion affect the login to > https://review.opendev.org/? > Somehow I got an error when trying to login. > > I am new to OpenStack and just started contributing to > documentations. [...] It should not affect your ability to log into the Web interface at https://review.opendev.org/ only your ability to push changes over SSH on port 29418/tcp (for example, using the git-review tool). 
If you're having trouble logging into the Web interface, we're still more than happy to help. You presumably had a working login for the Web interface at some point, or else you wouldn't have been able to add an SSH public key. When you visit the site, does it show your name in the top-right corner or does it say "Sign In" there? If it says "Sign In" and you click on those words, it should send your browser to the login.ubuntu.com site prompting you to click a "Yes, log me in" button. Once you click that, you should be returned to the review site and see your name in the top-right corner. Is that not working? Do you receive an error at some point in that process? -- Jeremy Stanley From fungi at yuggoth.org Thu Oct 22 22:33:26 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 22 Oct 2020 22:33:26 +0000 Subject: Some of your OpenDev Gerrit SSH keys have been deleted In-Reply-To: <20201022221514.jcmjxeitpxrqioq7@yuggoth.org> References: <20201022213629.eingacoophae8iew@yuggoth.org> <9682F3F2-653B-4998-AD26-E6BE3C8B8202@gmail.com> <20201022221514.jcmjxeitpxrqioq7@yuggoth.org> Message-ID: <20201022223326.ecqqhywqlbot572v@yuggoth.org> On 2020-10-22 22:15:14 +0000 (+0000), Jeremy Stanley wrote: > On 2020-10-22 15:01:37 -0700 (-0700), App Support wrote: > > Would the deletion affect the login to > > https://review.opendev.org/? > > Somehow I got an error when trying to login. > > > > I am new to OpenStack and just started contributing to > > documentations. > [...] > > It should not affect your ability to log into the Web interface at > https://review.opendev.org/ only your ability to push changes over > SSH on port 29418/tcp (for example, using the git-review tool). If > you're having trouble logging into the Web interface, we're still > more than happy to help.
For closure on this thread, Qi Zhang joined us in #opendev and after some troubleshooting, worked out that a cookie blocker browser extension was causing "Invalid OpenID transaction" errors when clicking the Sign In link. -- Jeremy Stanley From cboylan at sapwetik.org Tue Oct 27 17:01:26 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 27 Oct 2020 10:01:26 -0700 Subject: Cancelling the team meeting today Message-ID: Hello everyone, We'll cancel the team meeting today since many of us are distracted by the PTG. We'll see you next week! Clark From radoslaw.piliszek at gmail.com Wed Oct 28 13:26:15 2020 From: radoslaw.piliszek at gmail.com (Radosław Piliszek) Date: Wed, 28 Oct 2020 14:26:15 +0100 Subject: [service-announce] review.opendev.org Gerrit outage and upgrade 15:00UTC November 20 to 01:00UTC November 23, 2020 In-Reply-To: <410ac6e8-6381-49d8-836e-302fe166b47a@www.fastmail.com> References: <410ac6e8-6381-49d8-836e-302fe166b47a@www.fastmail.com> Message-ID: On Tue, Oct 27, 2020 at 10:17 PM Clark Boylan wrote: > The OpenDev team is planning a long weekend Gerrit outage on review.opendev.org starting 15:00UTC November 20 and running to 01:00UTC November 23, 2020 in order to upgrade to Gerrit 3.2. <3 Finally! > * Gerrit 3.2 requires git 2.2.0 or newer to use the Change ID commit hook. This may be a problem for RHEL/CentOS 7 users. I thought it depends on git-review itself rather than a particular Gerrit version? > How can I help? > Once the upgrade is complete you'll want to confirm the basic functionality you rely on is there. We know there will be differences or missing features. Patience as we figure out how to address those on a new Gerrit installation is much appreciated.
If you're interested in hacking on Java and Javascript we'd love help with the plugins necessary to address the known problems. You should be able to build this out locally without any special access. Please let us know if you are interested and we can help you bootstrap. Count me in if there are things to be done, might be able to spare some time for Gerrit. :-) We might want to port the live Zuul status reporter as well. -yoctozepto From cboylan at sapwetik.org Wed Oct 28 17:54:48 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Wed, 28 Oct 2020 10:54:48 -0700 Subject: Re: [service-announce] review.opendev.org Gerrit outage and upgrade 15:00UTC November 20 to 01:00UTC November 23, 2020 In-Reply-To: References: <410ac6e8-6381-49d8-836e-302fe166b47a@www.fastmail.com> Message-ID: <539d95e5-7694-4299-b01d-5660f4443343@www.fastmail.com> On Wed, Oct 28, 2020, at 6:26 AM, Radosław Piliszek wrote: > On Tue, Oct 27, 2020 at 10:17 PM Clark Boylan wrote: > > The OpenDev team is planning a long weekend Gerrit outage on review.opendev.org starting 15:00UTC November 20 and running to 01:00UTC November 23, 2020 in order to upgrade to Gerrit 3.2. > > <3 Finally! > > > > > * Gerrit 3.2 requires git 2.2.0 or newer to use the Change ID commit hook. This may be a problem for RHEL/CentOS 7 users. > > I thought it depends on git-review itself rather than a particular > Gerrit version? The `git review -s` step downloads the commit hook script from the Gerrit server you are interacting with. There was some discussion that if the git 2.2.0 requirement is a problem we can probably bundle a version of the script in git review that is known to work with older git. > > > > > How can I help? > > Once the upgrade is complete you'll want to confirm the basic functionality you rely on is there. We know there will be differences or missing features. Patience as we figure out how to address those on a new Gerrit installation is much appreciated.
If you're interested in hacking on Java and Javascript we'd love help with the plugins necessary to address the known problems. You should be able to build this out locally without any special access. Please let us know if you are interested and we can help you bootstrap. > > Count me in if there are things to be done, might be able to spare > some time for Gerrit. :-) > We might want to port the live Zuul status reporter as well. We expect most of these sorts of features will need to become polygerrit plugins. Upstream docs on developing those can be found here: https://gerrit-review.googlesource.com/Documentation/pg-plugin-dev.html. I've also set up a simple testing platform on my desktop using docker-compose which works pretty well. You should be able to do similar in order to test any plugin development that happens. We can coordinate off list on what that looks like if you are interested. > > -yoctozepto > > From radoslaw.piliszek at gmail.com Wed Oct 28 18:42:10 2020 From: radoslaw.piliszek at gmail.com (Radosław Piliszek) Date: Wed, 28 Oct 2020 19:42:10 +0100 Subject: [service-announce] review.opendev.org Gerrit outage and upgrade 15:00UTC November 20 to 01:00UTC November 23, 2020 In-Reply-To: <539d95e5-7694-4299-b01d-5660f4443343@www.fastmail.com> References: <410ac6e8-6381-49d8-836e-302fe166b47a@www.fastmail.com> <539d95e5-7694-4299-b01d-5660f4443343@www.fastmail.com> Message-ID: On Wed, Oct 28, 2020 at 6:55 PM Clark Boylan wrote: > The `git review -s` step downloads the commit hook script from the Gerrit server you are interacting with. There was some discussion that if the git 2.2.0 requirement is a problem we can probably bundle a version of the script in git review that is known to work with older git. Ack, thanks, never realized it did this much. On CentOS 7 I was just using git from SCL which was 2.18. So it's easy to work around still. > > > > > > > > > How can I help?
> > > Once the upgrade is complete you'll want to confirm the basic functionality you rely on is there. We know there will be differences or missing features. Patience as we figure out how to address those on a new Gerrit installation is much appreciated. If you're interested in hacking on Java and Javascript we'd love help with the plugins necessary to address the known problems. You should be able to build this out locally without any special access. Please let us know if you are interested and we can help you bootstrap. > > > > Count me in if there are things to be done, might be able to spare > > some time for Gerrit. :-) > > We might want to port the live Zuul status reporter as well. > > We expect most of these sorts of features will need to become polygerrit plugins. Upstream docs on developing those can be found here: https://gerrit-review.googlesource.com/Documentation/pg-plugin-dev.html. I've also set up a simple testing platform on my desktop using docker-compose which works pretty well. You should be able to do similar in order to test any plugin development that happens. We can coordinate off list on what that looks like if you are interested. This week is obviously a no-go but I would be eager to coordinate. Could you share your workflow? -yoctozepto From fungi at yuggoth.org Thu Oct 29 23:24:38 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 29 Oct 2020 23:24:38 +0000 Subject: [service-announce] October 20 Gerrit Outage Update In-Reply-To: <20201021130621.a2jriwx4qwp3ajf7@yuggoth.org> References: <20201021003314.GB1695651@fedora19.localdomain> <20201021130621.a2jriwx4qwp3ajf7@yuggoth.org> Message-ID: <20201029232437.7orfa5feohj42yqb@yuggoth.org> On 2020-10-21 13:06:22 +0000 (+0000), Jeremy Stanley wrote: > On 2020-10-20 22:50:42 -0400 (-0400), Mohammed Naser wrote: [...] 
> > There is no need for us to have anyone with admin powers to > > Gerrit at all times, we've done enough automation to sustain us > > and a manual 'circuit breaker' of adding a user *IF* necessary > > should be put in place. > > Yes, we've discussed this already in the past. Our use of OpenID > makes it harder to switch between different Gerrit accounts with the > WebUI (though maybe less so now that browser containers are a > thing). But also, alternative accounts with no OpenIDs at all could > be used to perform routine administrative tasks like adding initial > users to new groups. It's certainly looking like a compelling > option. [...] An implementation for this is up for review, if anyone's interested in taking a look: https://review.opendev.org/760051 -- Jeremy Stanley
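The auditing idea raised earlier in this thread (group membership can be queried from the Gerrit REST API, so anyone could watch for unexpected changes) can be sketched roughly as below. This is not the script referred to in that message, which does not appear in this archive text. It is a minimal illustration that assumes the group is visible to the caller, that anonymous access is enough (authenticated access would normally go through the /a/ URL prefix), and that each returned member record carries an "_account_id" field. The function names are made up for the example.

```python
import json
import urllib.parse
import urllib.request

# Gerrit prepends this line to JSON responses to guard against XSSI.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Drop the anti-XSSI prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_members(base_url, group):
    """Return the member list for one group (performs a network request).

    Assumes the group is readable without authentication.
    """
    url = "%s/groups/%s/members/" % (
        base_url.rstrip("/"),
        urllib.parse.quote(group, safe=""),
    )
    with urllib.request.urlopen(url) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))


def index_members(members):
    """Map account ID to display name for a list of member records."""
    return {m["_account_id"]: m.get("name", "") for m in members}


def diff_membership(baseline, current):
    """Compare two index_members() results; return (added, removed)."""
    added = {k: v for k, v in current.items() if k not in baseline}
    removed = {k: v for k, v in baseline.items() if k not in current}
    return added, removed
```

Run periodically, one would save the result of index_members(fetch_members(...)) for each group of interest and compare it with the previous run using diff_membership(), reporting anything in "added". As noted in the thread, this would only help spot a later group escalation, not the initial account compromise.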