Team Meeting Agenda for November 4, 2025
We will meet with this agenda on November 4, 2025 at 19:00 UTC in #opendev-meeting:

== Agenda for next meeting ==

* Announcements
** Clarkb out early Friday and all of Monday. Expect a meeting Tuesday with a possibly delayed agenda.

* Actions from last meeting

* Specs Review

* Topics
** Gerrit 3.11 Upgrade Planning (clarkb 20250401)
*** https://www.gerritcodereview.com/3.11.html
*** Please check this for any concerns with the way we use Gerrit.
*** New held nodes are now available.
*** There is discussion about replication and reindexing bugs in 3.11.4/3.12.1 that don't affect us yet because we are still on 3.10. We should try to test these problems to see if they would affect us after upgrading.
**** Clarkb tested this with held nodes on admittedly trivial test data sets but was unable to reproduce the problems
*** Shutting down Gerrit can race the indexing of new changes. If this happens, new changes can be created without being indexed, which later leads to multiple changes with the same changeid. To mitigate this we can/should reindex changes on Gerrit startup.
*** Gerrit shutdowns remain plagued by large h2 cache files. It seems removing our long timeout was not enough to prevent shutdowns from timing out.
*** https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade
*** Will need to select a date once we know more about the upgrade itself.
** Gerrit Spontaneous Shutdown During Summit (clarkb 20251028)
*** The Gerrit server was discovered to be in a shutdown state with no recorded errors
*** The Nova team indicates this could happen if the hypervisor is unable to allocate memory. The host would be shut down and Nova would notice but not record an error
*** It's possible the underlying cause is something else; either way we need to work with Vexxhost to identify the source of the problem and mitigate it
*** Is someone willing to figure out how to file a ticket with Vexxhost? This was their suggested approach last time we had problems.
** Upgrading Old Servers (clarkb 20230627)
*** https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
*** https://etherpad.opendev.org/p/opendev-focal-server-upgrades
*** https://etherpad.opendev.org/p/opendev-server-replacement-sprint
**** wiki.openstack.org: https://etherpad.opendev.org/p/opendev-mediawiki-upgrade
**** tonyb looking at cacti after wiki
*** Next on the list are graphite and backup servers
*** Can probably spin up new backup servers alongside the old ones, then migrate the old volumes off the old servers to the new ones, and finally delete the old servers. Just need to double check borg version support matrix details and also what adding new backup servers will do to our cron job setups for backups.
*** Remember to use launch-node's --config-drive flag when booting new Noble nodes in Rax Classic
** AFS mirror content updates (clarkb 20250916)
*** https://review.opendev.org/c/opendev/system-config/+/965334 Mirror trixie packages
*** Rocky Linux didn't hit these mirror issues because we don't mirror any Rocky Linux releases. Debian expects mirror content for all releases.
** Zuul Launcher Updates (clarkb 20250923)
*** Nodes can be from mixed providers again due to a bug when reassigning unassigned nodes.
*** https://review.opendev.org/c/zuul/zuul/+/965954 Fix assignment of unassigned nodes.
** Moving OpenDev Synchronous Communication to Matrix (clarkb 20250520)
*** The spec (954826) has merged.
*** clarkb will start working on this as soon as time permits.
** Etherpad 2.5.2 Upgrade (clarkb 20250805)
*** https://github.com/ether/etherpad-lite/blob/v2.5.2/CHANGELOG.md
*** https://review.opendev.org/c/opendev/system-config/+/956593
*** This fixes our CSS issues and clarkb believes we are ready to upgrade.
** Gitea 1.25.0 Upgrade (clarkb 20251028)
*** https://review.opendev.org/c/opendev/system-config/+/965960 Upgrade Gitea to 1.25.0
** Gitea Performance (clarkb 20251028)
*** https://review.opendev.org/c/opendev/system-config/+/964728 Don't allow direct backend access
**** Some performance issues seem related to crawlers hitting backends directly, so their traffic isn't balanced properly
**** Most recent issue was via traffic to the load balancer frontend so this won't fix everything
**** Means we'll have to take extra steps to test backends directly (ssh port forwards)
*** https://review.opendev.org/c/opendev/system-config/+/965420 Increase memcached cache size to mitigate effect of crawlers poisoning the cache
** Raxflex DFW3 Disabled (clarkb 20251104)
*** We discovered the raxflex dfw3 mirror had AFS cache errors. fs flushall appeared to get stuck and the kernel oopsed. After forcing a shutdown via OpenStack, the server refuses to boot
*** We've asked dan_with what our next steps should be via IRC
*** Worst case we rebuild the mirror with a new cinder volume and replace the whole thing.

* Open discussion
participants (1)
-
Clark Boylan