Hello Fellow OpenStack and OpenDev Folks!
TL;DR click on [3] and enjoy.
I am starting this thread to not hijack the discussion happening on [1].
First of all, I would like to thank gibi (Balazs Gibizer) for hacking
a way to get the place to render the table in the first place (pun
intended).
I have been a long-time user of [2].
I have improved and customised it for myself but never really got to
share back the changes I made.
The new Gerrit obviously broke the whole script, so there was no point
in sharing it in that state.
However, inspired by gibi's work, I decided to finally sit down and
fix it to work with Gerrit 3 and here it comes: [3].
It works well on Chrome with Tampermonkey; I have not tested other
browsers.
I hope you will enjoy this little helper (I do).
I know the script looks super fugly, but that is mostly because it
mixes the styles of three people and works around Gerrit's funky UI
rendering.
Finally, I'd also like to thank hrw (Marcin Juszkiewicz) for pointing
me to Michel's original script back in 2019.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2020-November/019051…
[2] https://opendev.org/x/coats/src/commit/444c95738677593dcfed0cfd9667d4c4f0d5…
[3] https://gist.github.com/yoctozepto/7ea1271c299d143388b7c1b1802ee75e
Kind regards,
-yoctozepto
Hi,
one of our jobs (python-tempestconf project) is frequently failing with
POST_FAILURE [1] during the following task:
export-devstack-journal : Export journal
I'm bringing this to a broader audience as we're not sure where exactly the
issue might be.
Have you encountered a similar issue recently or in the past?
[1]
https://zuul.opendev.org/t/openstack/builds?job_name=python-tempestconf-tem…
Thanks for any advice,
--
Martin Kopec
We will meet on April 27, 2021 at 19:00UTC in #opendev-meeting with this agenda:
== Agenda for next meeting ==
* Announcements
* Actions from last meeting
* Specs approval
* Priority Efforts (Standing meeting agenda items. Please expand if you have subtopics.)
** [http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-… Update Config Management]
*** topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev
*** Gerrit account inconsistencies
**** All "preferred email lacks external ids" issues have been corrected. All group loops have been corrected.
**** The workaround is that we can stop Gerrit, push to external ids directly, reindex accounts (and groups?), start Gerrit, then clear the accounts caches (and groups caches?)
**** Next steps
***** A more "dangerous" list has been generated. It should still be safe-ish, particularly if we disable the accounts first.
*** Configuration tuning
**** Reduce the number of ssh threads. Possibly create bot/batch user groups and thread counts as part of this.
**** https://groups.google.com/g/repo-discuss/c/BQKxAfXBXuo Upstream conversation with people struggling with similar problems.
* General topics
** Picking up steam on Puppet -> Ansible rewrites (clarkb 20210427)
*** Enable Xenial -> Bionic/Focal system upgrades
*** https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades Start capturing TODO list here
*** Zuul service host updates in progress now. Scheduler and Zookeeper cluster remaining. Will focus on ZK first.
**** https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021 discussion of options for zk upgrades
** survey.openstack.org (clarkb 20210427)
*** We're getting friendly reminders that this SSL cert is about to expire. Would be good to clean up.
** Debian Bullseye Images (clarkb 20210427)
*** Need some DIB updates to hack around Debian versioning and how Ansible turns that info into facts.
** Minor git-review release to support --no-thin (clarkb 20210427)
* Open discussion
Hello,
In short, Ansible reports "n/a" for ansible_distribution_release on
our new bullseye nodes. This screws up our mirror setup. This has
turned into quite an adventure.
Currently, Debian is frozen to create the "bullseye" release. This
means that "bullseye" is really an alias for "testing", which will
turn into the release after the freeze period.
So currently Debian bullseye reports itself in /etc/debian_version or
/etc/os-release as "bullseye/sid". This sort of makes sense if you
consider that you don't commit things to "testing" directly, they go
into unstable ("sid") and then migrate after a period of stability.
So you can't have a "base-files" package in bullseye that hasn't gone
through unstable/sid. You can read "bullseye/sid" as "we've chosen
the name bullseye and packages going through unstable are destined for
it".
Now, you might see a problem: "unstable" and "bullseye" (testing)
both report themselves in these version files as the same thing
(because the unstable packages that provide these files move into
testing).
"lsb_release -c" tries to be a bit smart about this, and looks at the
output of "apt-cache policy" to try and see if you are actually
pulling the .deb files from a bullseye repo or an unstable one.
Interestingly, this relies on a "Label" being present in the mirror
release files. Since we use reprepro to build our own mirrors, we do
not have this (and why nobody else who doens't use our mirrors seems
to notice this problem). A fix is proposed with
https://review.opendev.org/c/opendev/system-config/+/787661
So "lsb_release -c" doesn't report anything, leaving Ansible in the
dark as to what repo it uses.
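To make that concrete, here is a rough Python sketch (an approximation,
not the actual lsb_release implementation) of the kind of check
involved: only "release" entries from "apt-cache policy" that carry
both a label (l=) and a codename (n=) are usable, so a reprepro mirror
with no Label yields nothing at all.

    # Approximation only: the real lsb_release is more involved, but the
    # Label dependence looks roughly like this.
    import subprocess

    def guess_codename():
        out = subprocess.run(["apt-cache", "policy"],
                             capture_output=True, text=True,
                             check=True).stdout
        for line in out.splitlines():
            line = line.strip()
            if not line.startswith("release "):
                continue
            fields = dict(kv.split("=", 1)
                          for kv in line[len("release "):].split(",")
                          if "=" in kv)
            # Only entries with a Label ("l=") count, which is why our
            # reprepro mirrors (no Label) produce no answer at all.
            if "l" in fields and "n" in fields:
                return fields["n"]   # e.g. "bullseye" or "sid"
        return None

    print(guess_codename())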
When "lsb_release -c" doesn't return anything helpful, Ansible tries
to do it's own parsing of the release files. I started hacking on
these, but the point raised in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845651
gave me pause. It is a fair point that you cannot really know if
you're on bullseye or sid by examining these files. N/A is probably
actually the correct answer from Ansible's POV. Anyway, that is
https://github.com/ianw/ansible/commit/847817a82ed86b5f39a4ccc3ffbff0e0cd63…
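To illustrate the ambiguity (a hypothetical sketch, not Ansible's
actual fact-gathering code): a released system carries a numeric
/etc/debian_version that can be mapped to a codename, while both
testing and unstable carry the literal "bullseye/sid", so there is
nothing in the file to distinguish them.

    import re

    def release_from_debian_version(path="/etc/debian_version"):
        content = open(path).read().strip()
        if re.fullmatch(r"\d+(\.\d+)*", content):
            # Released system: the number maps to a codename.
            return {"10": "buster",
                    "11": "bullseye"}.get(content.split(".")[0])
        if content.endswith("/sid"):
            # "bullseye/sid": could be testing or unstable -- genuinely
            # ambiguous, so "n/a" is arguably the right answer here.
            return None
        return content

    print(release_from_debian_version())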
Now, even more annoyingly, setting the label in our mirrors may not be
sufficient for "lsb_release -c" to work on our images, because we have
cleared out the apt repositories. You would need to run "apt-get
update" before Ansible tries to run "lsb_release" to populate its
facts. The problem is that we're trying to use Ansible's fact about
the distro name to set up apt to point to our mirrors -- so we can't
apt-get update before we have that written out! Classic chicken and
egg.
The only other idea I have is to hack dib/early setup to overwrite
/etc/debian_version with "11.0" so that we look like the upcoming
release has already happened. "lsb_release -c" will then report
"bullseye". However, there is some possibility this will confuse other
things, as this release technically hasn't happened yet. I've proposed
that with
https://review.opendev.org/c/openstack/diskimage-builder/+/787665
I'm open to suggestions!
-i
The PTG is next week, and OpenDev is participating alongside the OpenStack TaCT SIG. We are going to try something a bit different this time around, which is to treat the time as office hours rather than time for our own projects. We will be meeting on April 22 from 14:00 - 16:00 UTC and 22:00 - 00:00 UTC in https://meetpad.opendev.org/apr2021-ptg-opendev.
Join us if you would like to:
* Start contributing to either OpenDev or the TaCT SIG.
* Debug a particular job problem.
* Learn how to write and review Zuul jobs and related configs.
* Learn about specific services or how they are deployed.
* And anything else related to OpenDev and our project infrastructure.
Feel free to add your topics and suggest preferred times for those topics here: https://etherpad.opendev.org/p/apr2021-ptg-opendev. This etherpad corresponds to the document that will be auto loaded in our meetpad room above.
I will also be around next week and will try to keep a flexible schedule. Feel free to reach out if you would like us to join discussions as they happen.
See you there,
Clark
We will meet with this agenda on April 13, 2021 at 19:00 UTC in #opendev-meeting:
== Agenda for next meeting ==
* Announcements
** OpenStack completing release April 14. Airship 2.0 doesn't seem to exist yet, so we will assume they are still working on it.
* Actions from last meeting
* Specs approval
* Priority Efforts (Standing meeting agenda items. Please expand if you have subtopics.)
** [http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-… Update Config Management]
*** topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev
*** Gerrit upgrade to 3.2.8
**** https://review.opendev.org/c/opendev/system-config/+/784152
*** Gerrit account inconsistencies
**** All "preferred email lacks external ids" issues have been corrected. All group loops have been corrected.
**** The workaround is that we can stop Gerrit, push to external ids directly, reindex accounts (and groups?), start Gerrit, then clear the accounts caches (and groups caches?)
**** Next steps
***** ~224 accounts were cleaned up. The next batch of ~56 has been started. We will clean their external IDs after letting the retired users sit for a few days.
***** Email sent to two Third Party CI groups about correcting external id conflicts among their accounts. These accounts will not be retired (for the most part).
*** Configuration tuning
**** Reduce the number of ssh threads. Possibly create bot/batch user groups and thread counts as part of this.
**** https://groups.google.com/g/repo-discuss/c/BQKxAfXBXuo Upstream conversation with people struggling with similar problems.
* General topics
** Picking up steam on Puppet -> Ansible rewrites (clarkb 20210413)
*** Enable Xenial -> Bionic/Focal system upgrades
*** https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades Start capturing TODO list here
*** Zuul service host updates in progress now. Scheduler and Zookeeper cluster remaining. Will focus on ZK first.
**** https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021 discussion of options for zk upgrades
** planet.openstack.org (ianw 20210413)
*** Strong preference from clarkb to retire it
*** Superuser appears to be a major blog showing up there, along with a couple of others. Maybe we reach out to them and double-check they don't want to help? (fungi and clarkb reached out to Superuser and they seem ok.)
** survey.openstack.org (clarkb 20210413)
*** Can we go ahead and clean this service up? I don't think it ever got much use (maybe one or two surveys total).
** docs-old volume cleanup (ianw 20210413)
*** We were going to double check with Ajaeger, then proceed with cleanup if no one has a reason to keep it.
** PTG Planning (clarkb 20210413)
*** Next PTG April 19-23
**** Thursday April 22 1400-1600UTC and 2200-0000UTC
* Open discussion
Hi,
I recently spent some time trying to figure out why a job worked as
expected during one run and then failed due to limited memory on the
following run. It turns out that back in February this change was
merged on an emergency basis, which caused us to start occasionally
providing nodes with 32G of RAM instead of the typical 8G:
https://review.opendev.org/773710
Nodepool labels are designed to represent the combination of an image
and set of resources. To the best of our ability, the images and
resources they provide should be consistent across different cloud
providers. That's why we use DIB to create consistent images and that's
why we use "-expanded" labels to request nodes with additional memory.
It's also the case that when we add new clouds, we generally try to
benchmark performance and adjust flavors as needed.
Unfortunately, providing such disparate resources under the same
Nodepool labels makes it impossible for job authors to reliably design
jobs.
To be clear, it's fine to provide resources of varying size, we just
need to use different Nodepool labels for them so that job authors get
what they're asking for.
The last time we were in this position, we updated our Nodepool images
to add the mem= Linux kernel command line parameter in order to limit
the total available RAM. I suspect that is still possible, but due to
the explosion of images and flavors, doing so will be considerably more
difficult this time.
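For reference, a hypothetical sketch of what that kind of image-build
tweak could look like (the grub path and the 8G cap are assumptions for
illustration, not the element we previously used):

    import re

    def cap_memory(path="/etc/default/grub", cap="mem=8G"):
        # Append the cap to GRUB_CMDLINE_LINUX so the booted node only
        # sees the intended amount of RAM, regardless of flavor size.
        text = open(path).read()

        def add_cap(match):
            args = match.group(2).split()
            if cap not in args:
                args.append(cap)
            return '%s"%s"' % (match.group(1), " ".join(args))

        text = re.sub(r'(GRUB_CMDLINE_LINUX=)"([^"]*)"', add_cap,
                      text, count=1)
        open(path, "w").write(text)
        # update-grub (or grub2-mkconfig) still needs to run afterwards
        # for the change to take effect in the built image.

    cap_memory()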
We now also have the ability to reboot nodes in jobs after they come
online, but doing that would add additional run time for every job.
I believe we need to address this. Despite the additional work, it
seems like the "mem=" approach is our best bet, unless anyone has other
ideas?
-Jim
Hi,
We have a large server provided by Vexxhost, review02.openstack.org,
up and running in a staging capacity to replace the current review
server.
I have started to track some things at [1].
There are a couple of things:
1) Production database
Currently, we use a hosted db. Since NoteDB, this only stores "review
seen" flags. We've been told that other sites treat this data as
ephemeral; they use an H2 db on disk and don't worry about backing up
or restoring across upgrades.
I have proposed storing this in a mariadb sibling container with [2].
We know how to admin, back up, and restore that. That would be my
preference, but I'm not terribly fussed. If I could request some
reviews on that: I'll take +2's as a sign we should use a container;
otherwise we can leave it with the H2 db it has now.
2) IPv6 issues
We've seen a couple of cases that are looking increasingly like stray
RAs are somehow assigning extra addresses, similar to [3]. Our
mirror in the same region has managed to acquire 50+ default routes
somehow.
It seems like inbound traffic keeps working (perhaps that is why we
haven't seen issues with other production servers?). But I feel it's
a little bit troubling to have this undiagnosed before we switch our
major service to it. I'm running some tracing (noted in the
etherpad), trying to at least catch a stray RA while the server is
quiet. Suggestions here are welcome.
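For the record, the sort of tracing I mean could look something like
this (just an assumption about one way to catch RAs, not necessarily
what is set up on the host; it needs root and scapy, and the interface
name is a placeholder):

    from scapy.all import sniff, ICMPv6ND_RA, IPv6

    def log_ra(pkt):
        # Print the source and advertised router lifetime of each RA.
        ra = pkt[ICMPv6ND_RA]
        print("RA from %s: router lifetime %ss"
              % (pkt[IPv6].src, ra.routerlifetime))

    # "ens3" is a placeholder interface name.
    sniff(iface="ens3", lfilter=lambda p: ICMPv6ND_RA in p, prn=log_ra)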
-i
[1] https://etherpad.opendev.org/p/gerrit-upgrade-2021
[2] https://review.opendev.org/c/opendev/system-config/+/775961
[3] https://launchpad.net/bugs/1844712
We will meet with this agenda on April 6, 2021 at 19:00UTC in #opendev-meeting:
== Agenda for next meeting ==
* Announcements
** OpenStack producing final RCs this week. Airship also working on a release.
* Actions from last meeting
* Specs approval
* Priority Efforts (Standing meeting agenda items. Please expand if you have subtopics.)
** [http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-… Update Config Management]
*** topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev
*** Gerrit upgrade to 3.2.8
**** https://review.opendev.org/c/opendev/system-config/+/784152
*** Gerrit account inconsistencies
**** All "preferred email lacks external ids" issues have been corrected. All group loops have been corrected.
**** The workaround is that we can stop Gerrit, push to external ids directly, reindex accounts (and groups?), start Gerrit, then clear the accounts caches (and groups caches?)
**** Next steps
***** Cleaning external IDs for the last batch of retired users.
*** Configuration tuning
**** Using strong refs for jgit caches
**** Batch user groups and threads
* General topics
** Picking up steam on Puppet -> Ansible rewrites (clarkb 20210406)
*** Enable Xenial -> Bionic/Focal system upgrades
*** https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades Start capturing TODO list here
*** Zuul service host updates in progress now. Scheduler and Zookeeper cluster remaining. Will focus on ZK first.
**** https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021 discussion of options for zk upgrades
** PTG Planning (ianw 20210406)
*** Next PTG April 19-23
*** Clarkb filled out the survey and requested a few hours for us. Likely to be spent in more of an office-hours type setup.
**** Thursday April 22 1400-1600UTC and 2200-0000UTC
** docs-old volume cleanup (ianw 20210406)
*** We were going to double check with Ajaeger, then proceed with cleanup if no one has a reason to keep it.
** planet.openstack.org (ianw 20210406)
*** Strong preference from clarkb to retire it
*** Superuser appears to be a major blog showing up there, along with a couple of others. Maybe we reach out to them and double-check they don't want to help? (fungi and clarkb reached out to Superuser and they seem ok.)
** tarballs ORD replication (ianw 20210406)
*** This has been done. Other than the long initial sync, is this happy day to day?
* Open discussion