Re: [service-announce] review.opendev.org Gerrit outage and upgrade 15:00UTC November 20 to 01:00UTC November 23, 2020
Hello,

Wikimedia upgraded from 2.16 to 3.2 in June 2020; the upgrade was conducted by Christian Aistleitner. He wrote a nice report on the upstream mailing list: https://groups.google.com/g/repo-discuss/c/G5wucKJg9Ag

On 27/10/2020 22:16, Clark Boylan wrote:
The OpenDev team is planning a long weekend Gerrit outage on review.opendev.org starting 15:00UTC November 20 and running to 01:00UTC November 23, 2020 in order to upgrade to Gerrit 3.2.
The upgrade has two major portions. First we will incrementally move from our current version 2.13 to 2.16. Each point release requires a database migration and git indexing operations that take the majority of the time.
From Christian's report, you should be able to skip the git indexing between each minor upgrade and only do a full reindexing once you are upgraded to 3.2.

The git indexing had an issue: the changes to index were split by repository. In the worst case scenario, if you have a thousand small repositories and one very large one, the latter was only processed by a single thread. Christian added code to chunk by changes regardless of the repository, which has dramatically sped up the indexing.

If I am not mistaken, the commit is 20784548c3fb and it has been released in 2.16.22, 3.0.12, 3.1.8 and 3.2.3.

Given that, you should be able to skip reindexing between upgrades. Make sure to use 3.2.3 to benefit from the faster indexing.
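To illustrate why the split matters, here is a minimal Python sketch. It is not Gerrit's actual code; the function names and the workload numbers are made up for illustration. It compares the largest unit of work when tasks are cut per repository against fixed-size chunks of changes cut across repositories. The largest task bounds how long a single thread stays busy while the others sit idle.

```python
def tasks_by_repository(changes_per_repo):
    """One task per repository: a huge repository becomes one huge task."""
    return [list(changes) for changes in changes_per_repo.values()]


def tasks_by_chunk(changes_per_repo, chunk_size):
    """Flatten all changes, then cut fixed-size chunks regardless of repository."""
    all_changes = [
        (repo, change)
        for repo, changes in changes_per_repo.items()
        for change in changes
    ]
    return [
        all_changes[i:i + chunk_size]
        for i in range(0, len(all_changes), chunk_size)
    ]


# Hypothetical workload: a thousand small repositories and one very large one.
workload = {f"small-{i}": range(10) for i in range(1000)}
workload["huge"] = range(500_000)

largest_by_repo = max(len(task) for task in tasks_by_repository(workload))
largest_by_chunk = max(len(task) for task in tasks_by_chunk(workload, 1000))

print(largest_by_repo)   # 500000: one thread handles the whole large repository
print(largest_by_chunk)  # 1000: the work can be spread evenly across threads
```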
The second major part, once at version 2.16, is to convert Gerrit to use the new NoteDB backend. This stores reviews together with code in the git trees, rather than in a separate database. There are known problems converting to NoteDB from any version prior to 2.16, which is why we need the initial upgrade steps. This is the slowest portion of the upgrade process, and the one most prone to unforeseen issues despite extensive pre-testing. We will be creating snapshots to facilitate quick fallback if required. Once this is complete, we can move to the 3.x series and incrementally upgrade from 3.0 to 3.2.
2.16 has received improvements for the NoteDB migration. Again make sure to use the latest patch release (2.16.23 at this time). <snip>
Some important things to know:

* Gerrit's web UI will be changing. We don't have a lot of control over this. As a result we'll lose some existing CI result rendering niceness that we have on 2.13 (in particular the summary table and CI results toggle will go away). We hope that once we've upgraded we can investigate solving this through Gerrit plugins.
summary table
-------------

The new Gerrit web UI is entirely JavaScript driven (using https://www.polymer-project.org/ ) and relies on the REST API to fetch data. For the summary table, Wikimedia has a light version of the one you have, which was borrowed from somewhere else. See "gr-test-result-table-module" at: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/head...

It is hopefully not too complicated to implement.

CI results toggle
-----------------

That is now built into Gerrit: the UI will have an "Only comments" toggle which filters out any message with a tag prefixed by "autogenerated:". Thus in Wikimedia's good old Zuul 2.5 we have simply tagged with "autogenerated:ci" by using:

    success:
      gerrit:
        verified: 2
        tag: autogenerated:ci

Result: https://phabricator.wikimedia.org/T48148#6294913 (there are a few more details on that task).

Another one is that the SUCCESS/FAILURE message can no longer be highlighted by rewriting the message and using CSS. The way text is parsed has completely changed and it is now impossible to do it via the commentlinks Gerrit settings. As a result, on Zuul 2.5 we have set job_name_in_report = false and lack the nice formatting and green/red coloring. I am not sure whether that still applies to Zuul 3.x though. Wikimedia task: https://phabricator.wikimedia.org/T256575

That being said, it is definitely possible to write a Gerrit JavaScript plugin that would parse the message and prettify the results comment on the client side.

<snip>
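As a rough illustration of the "Only comments" behaviour described above, here is a small Python sketch. It is not Gerrit's implementation: the message dictionaries are simplified stand-ins for what the REST API returns, and the function name only_comments is made up. The only rule it applies is the one stated above: hide any message whose tag starts with "autogenerated:".

```python
AUTOGENERATED_PREFIX = "autogenerated:"


def only_comments(messages):
    """Keep messages whose tag is absent or lacks the autogenerated prefix."""
    return [
        message
        for message in messages
        if not (message.get("tag") or "").startswith(AUTOGENERATED_PREFIX)
    ]


# Simplified, hypothetical change messages.
messages = [
    {"author": "reviewer", "message": "Looks good to me."},
    {"author": "zuul", "tag": "autogenerated:ci", "message": "Build succeeded."},
    {"author": "third-party-ci", "message": "Build failed."},
]

visible = only_comments(messages)
print([message["author"] for message in visible])
# ['reviewer', 'third-party-ci']
```

Note that a CI system which does not tag its messages stays visible when the toggle is on, since the filter relies on the tag alone.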
* Q&A

  * If this upgrade isn't perfect why are we doing it anyway? We've come to the realization that if we don't make imperfect progress we'll never make any progress. We have decided that the benefits outweigh the known drawbacks and we'll do our best to work on those issues after the upgrade.
I would add a couple of user-facing improvements:

The new UI alone is definitely worth the upgrade. It is way more pleasant than the old one. It takes a few days to adjust, but you will surely never look back :]

<snip>

The cherry on the cake is that with the upgrade comes support for Git protocol version 2. In short, when enabled, it makes fetches from big repositories an order of magnitude faster. I wrote a quick blog post about git protocol v2 at: https://phabricator.wikimedia.org/J199

And the Google announcement with a lot more details: https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2...

I wish you the very best for the maintenance. Thank you for making it happen!

--
Antoine "hashar" Musso
On Thu, Nov 5, 2020, at 7:16 AM, Antoine Musso wrote:
Hello,
Wikimedia upgraded from 2.16 to 3.2 in June 2020; the upgrade was conducted by Christian Aistleitner. He wrote a nice report on the upstream mailing list: https://groups.google.com/g/repo-discuss/c/G5wucKJg9Ag
On 27/10/2020 22:16, Clark Boylan wrote:
The OpenDev team is planning a long weekend Gerrit outage on review.opendev.org starting 15:00UTC November 20 and running to 01:00UTC November 23, 2020 in order to upgrade to Gerrit 3.2.
The upgrade has two major portions. First we will incrementally move from our current version 2.13 to 2.16. Each point release requires a database migration and git indexing operations that take the majority of the time.
From Christian's report, you should be able to skip the git indexing between each minor upgrade and only do a full reindexing once you are upgraded to 3.2.
The git indexing had an issue: the changes to index were split by repository. In the worst case scenario, if you have a thousand small repositories and one very large one, the latter was only processed by a single thread. Christian added code to chunk by changes regardless of the repository, which has dramatically sped up the indexing.
If I am not mistaken, the commit is 20784548c3fb and it has been released in 2.16.22, 3.0.12, 3.1.8 and 3.2.3.
Given that, you should be able to skip reindexing between upgrades. Make sure to use 3.2.3 to benefit from the faster indexing.
The actual process is:

* Stop Gerrit
* git gc --aggressive all repos
* gerrit init on 2.14
* gerrit init on 2.15
* gerrit init on 2.16
* Perform complete offline reindex. We do this here in order to have a working midpoint snapshot, and it doesn't take too long as an individual step.
* Stop gerrit
* git gc --aggressive
* Perform offline notedb migration
* git gc --aggressive
* gerrit init on 3.0
* gerrit init on 3.1
* gerrit init on 3.2
* Perform complete offline reindex.
* Start gerrit

We can probably optimize a few of those git gc steps away but they appear to have a large impact on steps like reindexing so we're just doing them. But every little step of db migrations, reindexing, gc'ing, and doing the notedb migration adds up.

Note I tested doing gerrit init to skip versions but it doesn't seem to work properly when you do that.
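For readers who want the sequence as something they can review or dry-run, here is a hedged Python sketch that only builds and prints the ordered plan; it executes nothing. It is not the OpenDev tooling. The site path, the per-version war file names and the build_plan helper are placeholders, and the argument lists assume the stock Gerrit war programs (init, reindex, migrate-to-note-db) are invoked directly. Stopping and starting Gerrit is left without a command because that depends on the deployment.

```python
def build_plan(site="/path/to/gerrit-site", wars="/path/to/wars"):
    """Return the ordered upgrade steps as (label, argv-or-None) pairs."""

    def gerrit(version, *args):
        # Hypothetical layout: one war file per Gerrit release.
        return ["java", "-jar", f"{wars}/gerrit-{version}.war", *args, "-d", site]

    gc = ["git", "gc", "--aggressive"]  # to be run in every repository

    plan = [("stop Gerrit", None), ("gc all repos", gc)]
    # Versions are walked one by one: skipping versions did not work in testing.
    for version in ("2.14", "2.15", "2.16"):
        plan.append((f"init {version}", gerrit(version, "init", "--batch")))
    plan.append(("offline reindex (midpoint snapshot)", gerrit("2.16", "reindex")))
    plan.append(("stop Gerrit", None))
    plan.append(("gc all repos", gc))
    plan.append(("offline NoteDB migration", gerrit("2.16", "migrate-to-note-db")))
    plan.append(("gc all repos", gc))
    for version in ("3.0", "3.1", "3.2"):
        plan.append((f"init {version}", gerrit(version, "init", "--batch")))
    plan.append(("offline reindex", gerrit("3.2", "reindex")))
    plan.append(("start Gerrit", None))
    return plan


plan = build_plan()
for number, (label, argv) in enumerate(plan, start=1):
    command = " ".join(argv) if argv else "(deployment specific)"
    print(f"{number:2d}. {label}: {command}")
```

Keeping the plan as data makes it easy to time each step separately during test runs, since every small step adds up over the outage window.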
The second major part, once at version 2.16, is to convert Gerrit to use the new NoteDB backend. This stores reviews together with code in the git trees, rather than in a separate database. There are known problems converting to NoteDB from any version prior to 2.16, which is why we need the initial upgrade steps. This is the slowest portion of the upgrade process, and the one most prone to unforeseen issues despite extensive pre-testing. We will be creating snapshots to facilitate quick fallback if required. Once this is complete, we can move to the 3.x series and incrementally upgrade from 3.0 to 3.2.
2.16 has received improvements for the NoteDB migration. Again make sure to use the latest patch release (2.16.23 at this time).
We actually build our images off the tip of the branches and not from release tags. But this is a good reminder to rebuild to ensure we've got the latest releases.
<snip>
<snip> Thanks for the feedback, it helps to have reminders like this as well as general input ensuring we're on the right change.
On 2020-11-05 16:16:49 +0100 (+0100), Antoine Musso wrote: [...]
CI results toggle
-----------------

That is now built-in Gerrit, the UI will have a "Only comments" toggle which filters out any message with a tag prefixed by "autogenerated:".
[...]
Yep, this behavior became the default for Zuul's Gerrit connection driver in https://review.opendev.org/682473 over a year ago, but the larger challenge and why we relied on a different toggle is that there are dozens of third-party CI systems reporting on changes for some projects using a variety of different CI implementations, and not all (probably most in fact) tag their messages that way currently.

--
Jeremy Stanley
participants (3)
- Antoine Musso
- Clark Boylan
- Jeremy Stanley