We will meet with this agenda on May 12, 2026 at 19:00 UTC in #opendev-meeting:
== Agenda for next meeting ==
* Announcements
* Actions from last meeting
* Specs Review
* Topics
** Upgrading Old Servers (clarkb 20230627)
*** https://etherpad.opendev.org/p/opendev-server-upgrade-planning Central tracking document which may link to more host specific documents
*** Next on the list are graphite and backup servers
*** We can probably spin up new backup servers alongside the old ones, migrate the old volumes from the old servers to the new ones, and finally delete the old servers. We just need to double check the borg version support matrix details and what adding new backup servers will do to our cron job setups for backups.
*** mnasiadka has replaced our older mirror nodes and is now looking at other low-hanging upgrade fruit. Please help review these changes to migrate servers.
*** Remember to use launch-node's --config-drive flag when booting new Noble nodes in Rax Classic
** Dealing with web crawlers (clarkb 20251216)
*** We have deployed anubis to lists.opendev.org and it seems to be working well enough there
*** We have also deployed anubis on the gitea servers. This was after spending much of last week fighting the crawler flood
**** Would be good to look at getting PROXY protocol support working with gitea to make debugging easier
*** Should we be looking at adding anubis to other services like static?
** Deploying a Prometheus for Server Metrics (clarkb 20260331)
*** https://review.opendev.org/c/opendev/system-config/+/980840
*** This change and its child deploy Prometheus with node exporter to collect server metrics
*** Napkin math says that a 1TB volume should get us about 60 days of metrics. mnasiadka also indicates that Prometheus doesn't handle longer term metrics super well
*** Ideally we would collect at least a year's worth of data. Can we make that happen with Prometheus?
*** Do we need to look at Prometheus-adjacent tools like Mimir or Thanos?
**** Both of these solutions seem to use Prometheus as the data collection system, then store the data in a different system that handles long term storage better. For queries they speak PromQL and the Prometheus APIs, so you can point tools like Grafana at them as if they were Prometheus.
** Upgrade Ansible to v9 (clarkb 20260310)
*** https://docs.ansible.com/projects/ansible/latest/reference_appendices/relea…
*** https://review.opendev.org/c/opendev/system-config/+/976282
*** Based on Ansible's Python support matrix, Ansible 9 gives us a good deal of flexibility for bridge and remote nodes
*** Ansible 9 also fixes problems with the use of pkg_resources in the Ansible pip module
*** Any concerns with proceeding with the upgrade since tests look good?
** Gerrit Account Cleanups (clarkb 20260317)
*** Since the upgrade to Gerrit NoteDb we've had account inconsistencies that prevent us from pushing to the external ids ref/table directly.
*** clarkb did a bunch of work to get the number down from hundreds to about 33 consistency errors before stalling out.
*** The tail was the most difficult as it wasn't clear what the most appropriate fix for each account would be
*** It has been years since then, and those accounts are likely inactive and unused. We can rerun the Gerrit consistency check, feed the info back through our audit script, then decide if we need to be careful with any of these accounts
*** Chances are we can simply disable them all and remove the conflicting external ids.
*** If we take good notes we can reconstruct the accounts as appropriate after the fact without Gerrit downtime should one of these users show up and wonder what happened.
** Gerrit 3.13 Upgrade Planning (clarkb 20260414)
*** The 3.12 upgrade seems to have gone well
**** H2 v2 caches appear to be no worse than H2 v1 caches
**** Old H2 v1 caches have been removed from review03 at this point
*** Clarkb would like to target a 3.13 upgrade for the end of May/early June
*** https://review.opendev.org/c/opendev/system-config/+/987375 Update 3.12 and 3.13 images to the latest versions
**** This includes a fix for the previously unconfigurable AI review button
*** Gerrit 3.13 removes support for Robot comments so Zuul will start making normal inline comments
*** This also means that the Zuul restarts performed as part of the upgrade process are actually required when we upgrade to 3.13 to get Zuul's Gerrit version detection sorted out.
*** https://etherpad.opendev.org/p/gerrit-upgrade-3.13 Beginnings of an upgrade plan document
** Etherpad 2.7.3 Upgrade
*** https://review.opendev.org/c/opendev/system-config/+/985843 Etherpad 2.7.3 is out and this change will upgrade us to it
*** Held node looks good to clarkb
**** Default text on new pads inherits pad creator color
**** Pad creator gets a pad deletion token that is shown when they first create the pad
** Ubuntu Resolute Test Nodes (clarkb 20260331)
*** Images, labels, nodesets, etc have been added for Ubuntu Resolute
*** Image uploads ran into problems with the new intermediate swift container which should be resolved now but uploads will take some time to catch up
*** Still need to figure out our plan for mirroring the Resolute packages
** Noble Docker Not Talking to Podman Socket for all Operations (clarkb 20260414)
*** During the Gerrit 3.12 upgrade we noticed that `docker image ls` doesn't work on Noble nodes anymore due to API version support mismatches between Docker and Podman
*** `podman image ls` does work just fine and is what we used
*** The problem appears to be due to noble-updates upgrading the docker.io package compared to noble proper
*** The problem does not appear to affect all docker subcommands; `docker ps -a` works just fine.
*** Please keep an eye out for problems in configuration management caused by this.
* Open discussion
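
For the "Noble Docker Not Talking to Podman Socket" topic, the check described there (some `docker` subcommands fail while `podman` works) could be scripted roughly as follows. This is only a minimal sketch: the helper name `probe_commands` and the timeout value are hypothetical, and it only reports whether each command exits successfully, not why it failed.

```python
import subprocess
from typing import Dict, List, Sequence


def probe_commands(commands: Sequence[List[str]], timeout: float = 30.0) -> Dict[str, bool]:
    """Run each command and record whether it exited with status 0.

    A missing binary or a timeout is recorded as a failure.
    """
    results: Dict[str, bool] = {}
    for argv in commands:
        label = " ".join(argv)
        try:
            completed = subprocess.run(
                argv,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
                check=False,
            )
            results[label] = completed.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            results[label] = False
    return results


if __name__ == "__main__":
    # The three commands mentioned in the agenda item.
    checks = [
        ["docker", "image", "ls"],
        ["docker", "ps", "-a"],
        ["podman", "image", "ls"],
    ]
    for label, ok in probe_commands(checks).items():
        print(f"{'ok  ' if ok else 'FAIL'} {label}")
```

Running this on a Noble node would show at a glance which subcommands are affected, assuming the invoking user has access to the relevant socket.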
After a couple of weeks away we are back for the OpenDev team meeting this week. We will meet on May 5, 2026 at 19:00 UTC in #opendev-meeting with this agenda:
== Agenda for next meeting ==
* Announcements
* Actions from last meeting
* Specs Review
* Topics
** Upgrading Old Servers (clarkb 20230627)
*** https://etherpad.opendev.org/p/opendev-server-upgrade-planning Central tracking document which may link to more host specific documents
*** Next on the list are graphite and backup servers
*** We can probably spin up new backup servers alongside the old ones, migrate the old volumes from the old servers to the new ones, and finally delete the old servers. We just need to double check the borg version support matrix details and what adding new backup servers will do to our cron job setups for backups.
*** mnasiadka has replaced our older mirror nodes and is now looking at other low-hanging upgrade fruit. Please help review these changes to migrate servers.
*** Remember to use launch-node's --config-drive flag when booting new Noble nodes in Rax Classic
** Dealing with web crawlers (clarkb 20251216)
*** We have deployed anubis to lists.opendev.org and it seems to be working well enough there
*** We have also deployed anubis on the gitea servers. This was after spending much of last week fighting the crawler flood
**** Would be good to look at getting PROXY protocol support working with gitea to make debugging easier
*** Should we be looking at adding anubis to other services like static?
** Deploying a Prometheus for Server Metrics (clarkb 20260331)
*** https://review.opendev.org/c/opendev/system-config/+/980840
*** This change and its child deploy Prometheus with node exporter to collect server metrics
*** Napkin math says that a 1TB volume should get us about 60 days of metrics. mnasiadka also indicates that Prometheus doesn't handle longer term metrics super well
*** Ideally we would collect at least a year's worth of data. Can we make that happen with Prometheus?
*** Do we need to look at Prometheus-adjacent tools like Mimir or Thanos?
**** Both of these solutions seem to use Prometheus as the data collection system, then store the data in a different system that handles long term storage better. For queries they speak PromQL and the Prometheus APIs, so you can point tools like Grafana at them as if they were Prometheus.
** Upgrade Ansible to v9 (clarkb 20260310)
*** https://docs.ansible.com/projects/ansible/latest/reference_appendices/relea…
*** https://review.opendev.org/c/opendev/system-config/+/976282
*** Based on Ansible's Python support matrix, Ansible 9 gives us a good deal of flexibility for bridge and remote nodes
*** Ansible 9 also fixes problems with the use of pkg_resources in the Ansible pip module
*** Any concerns with proceeding with the upgrade since tests look good?
** Gerrit Account Cleanups (clarkb 20260317)
*** Since the upgrade to Gerrit NoteDb we've had account inconsistencies that prevent us from pushing to the external ids ref/table directly.
*** clarkb did a bunch of work to get the number down from hundreds to about 33 consistency errors before stalling out.
*** The tail was the most difficult as it wasn't clear what the most appropriate fix for each account would be
*** It has been years since then, and those accounts are likely inactive and unused. We can rerun the Gerrit consistency check, feed the info back through our audit script, then decide if we need to be careful with any of these accounts
*** Chances are we can simply disable them all and remove the conflicting external ids.
*** If we take good notes we can reconstruct the accounts as appropriate after the fact without Gerrit downtime should one of these users show up and wonder what happened.
** Gerrit 3.13 Upgrade Planning (clarkb 20260414)
*** The 3.12 upgrade seems to have gone well
**** H2 v2 caches appear to be no worse than H2 v1 caches
**** Old H2 v1 caches have been removed from review03 at this point
*** Clarkb would like to target a 3.13 upgrade for the end of May/early June
*** Gerrit 3.13 made the AI review button configurable but I don't think that is in a release yet
*** Gerrit 3.13 removes support for Robot comments so Zuul will start making normal inline comments
*** This also means that the Zuul restarts performed as part of the upgrade process are actually required when we upgrade to 3.13 to get Zuul's Gerrit version detection sorted out.
*** Clarkb will start on an upgrade document soon
** Gitea 1.26 Upgrade
*** https://review.opendev.org/c/opendev/system-config/+/985834 Gitea 1.26 is out and this change will upgrade us to it
*** Should hold a node and do a quick sanity check before upgrading
** Etherpad 2.7.2 Upgrade
*** https://review.opendev.org/c/opendev/system-config/+/985843 Etherpad 2.7.2 is out and this change will upgrade us to it
*** Should probably hold a node and test things. The changelog refers to fixes to chat as well as the time slider undo functionality
** Ubuntu Resolute Test Nodes (clarkb 20260331)
*** https://review.opendev.org/c/opendev/zuul-providers/+/982182 Add Resolute images to Zuul
*** Ubuntu Bionic mirror content has been removed. We can probably start the process of mirroring Resolute packages.
** Noble Docker Not Talking to Podman Socket for all Operations (clarkb 20260414)
*** During the Gerrit 3.12 upgrade we noticed that `docker image ls` doesn't work on Noble nodes anymore due to API version support mismatches between Docker and Podman
*** `podman image ls` does work just fine and is what we used
*** The problem appears to be due to noble-updates upgrading the docker.io package compared to noble proper
*** The problem does not appear to affect all docker subcommands; `docker ps -a` works just fine.
*** Please keep an eye out for problems in configuration management caused by this.
* Open discussion
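
For the Prometheus retention question, the napkin math from the agenda (a 1TB volume for about 60 days) can be extended to the one year target. The sketch below assumes disk usage scales linearly with retention time and that 1TB means 10^12 bytes; the function names are hypothetical, and real usage depends on the number of series and the scrape interval.

```python
TB = 10**12  # assumption: decimal terabytes


def daily_usage_bytes(volume_bytes: float, retention_days: float) -> float:
    """Bytes of metrics written per day, implied by a volume size and its retention."""
    return volume_bytes / retention_days


def volume_needed_bytes(daily_bytes: float, target_days: float) -> float:
    """Volume size needed to keep target_days of metrics, assuming linear growth."""
    return daily_bytes * target_days


# Figures from the agenda: 1TB holds roughly 60 days of metrics.
daily = daily_usage_bytes(1 * TB, 60)
needed = volume_needed_bytes(daily, 365)

print(f"Implied ingest: {daily / 10**9:.1f} GB/day")
print(f"Volume for 365 days: {needed / TB:.2f} TB")
```

Under these assumptions a year of data would need roughly six times the storage, which is one reason the long term storage tools mentioned in the agenda are worth considering.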