From iwienand at redhat.com Thu Apr 1 02:27:16 2021
From: iwienand at redhat.com (Ian Wienand)
Date: Thu, 1 Apr 2021 13:27:16 +1100
Subject: Next steps with new review server
Message-ID:

Hi,

We have a large server provided by Vexxhost up and running in a
staging capacity to replace the current server at
review02.openstack.org.

I have started to track some things at [1]

There's a couple of things:

1) Production database

Currently, we use a hosted db. Since NoteDB this only stores review
seen flags. We've been told that other sites treat this data as
ephemeral; they use an H2 db on disk and don't worry about backing up
or restoring across upgrades.

I have proposed storing this in a mariadb sibling container with [2].
We know how to admin, back up and restore that. That would be my
preference, but I'm not terribly fussed. I'd appreciate some
reviews on that; I'll take +2's as a sign we should use a container,
otherwise we can leave it with the H2 db it has now.

2) IPv6 issues

We've seen a couple of cases that are looking increasingly like stray
RA's are somehow assigning extra addresses, similar to [3]. Our
mirror in the same region has managed to acquire 50+ default routes
somehow.

It seems like inbound traffic keeps working (which may be why we haven't
seen issues with other production servers?). But I feel like it's a little
bit troubling to leave this undiagnosed before we switch our major service
to it. I'm running some tracing, trying to at least catch a stray RA
while the server is quiet; details are in the etherpad. But suggestions
here are welcome.
-i

[1] https://etherpad.opendev.org/p/gerrit-upgrade-2021
[2] https://review.opendev.org/c/opendev/system-config/+/775961
[3] https://launchpad.net/bugs/1844712

From cboylan at sapwetik.org Thu Apr 1 15:20:31 2021
From: cboylan at sapwetik.org (Clark Boylan)
Date: Thu, 01 Apr 2021 08:20:31 -0700
Subject: Next steps with new review server
In-Reply-To:
References:
Message-ID:

On Wed, Mar 31, 2021, at 7:27 PM, Ian Wienand wrote:
> Hi,
>
> We have a large server provided by Vexxhost up and running in a
> staging capacity to replace the current server at
> review02.openstack.org.
>
> I have started to track some things at [1]
>
> There's a couple of things:
>
> 1) Production database
>
> Currently, we use a hosted db. Since NoteDB this only stores review
> seen flags. We've been told that other sites treat this data as
> ephemeral; they use a H2 db on disk and don't worry about backing up
> or restoring across upgrades.
>
> I have proposed storing this in a mariadb sibling container with [2].
> We know how to admin, backup and restore that. That would be my
> preference, but I'm not terribly fussed. If I could request some
> reviews on that; I'll take +2's as a sign we should use a container,
> otherwise we can leave it with H2 it has now.

Agreed, sticking with known DB tooling seems like a good idea for ease of operator interaction. I'll try to review this change today.

>
> 2) IPv6 issues
>
> We've seen a couple of cases that are looking increasingly like stray
> RA's are somehow assigning extra addresses, similar to [3]. Our
> mirror in the same region has managed to acquire 50+ default routes
> somehow.
>
> It seems like inbound traffic keeps working (why we haven't seen
> issues with other production servers?). But I feel like it's a little
> bit troubling to have undiagnosed before we switch our major service
> to it. I'm running some tracing, trying to at least catch a stray RA
> while the server is quiet, in the etherpad. But suggestions here are
> welcome.

Agreed, ideally we would sort this out before any migration completes. I want to say we saw something similar with the mirror in vexxhost and the "solution" there was to disable RAs and create a static yaml config for ubuntu using its new network management config file? That seems less than ideal from a cloud perspective as we can't be the only ones noticing this (in fact some of our CI jobs may indicate they suffer from something similar, causing some jobs to run long when reaching network resources). I know when we brought this up with the mirror mnaser suggested static config was fine, but maybe we need to reinforce that this is problematic as a cloud user and see if we can help debug (network traces seem like a good start there).

>
> -i
>
>
> [1] https://etherpad.opendev.org/p/gerrit-upgrade-2021
> [2] https://review.opendev.org/c/opendev/system-config/+/775961
> [3] https://launchpad.net/bugs/1844712

From cboylan at sapwetik.org Thu Apr 1 21:35:32 2021
From: cboylan at sapwetik.org (Clark Boylan)
Date: Thu, 01 Apr 2021 14:35:32 -0700
Subject: Next steps with new review server
In-Reply-To:
References:
Message-ID:

On Thu, Apr 1, 2021, at 8:20 AM, Clark Boylan wrote:
> On Wed, Mar 31, 2021, at 7:27 PM, Ian Wienand wrote:

snip

> >
> > 2) IPv6 issues
> >
> > We've seen a couple of cases that are looking increasingly like stray
> > RA's are somehow assigning extra addresses, similar to [3]. Our
> > mirror in the same region has managed to acquire 50+ default routes
> > somehow.
> >
> > It seems like inbound traffic keeps working (why we haven't seen
> > issues with other production servers?). But I feel like it's a little
> > bit troubling to have undiagnosed before we switch our major service
> > to it. I'm running some tracing, trying to at least catch a stray RA
> > while the server is quiet, in the etherpad. But suggestions here are
> > welcome.
>
> Agreed, ideally we would sort this out before any migration completes.
> I want to say we saw something similar with the mirror in vexxhost and the
> "solution" there was to disable RAs and create a static yaml config for
> ubuntu using its new network management config file? That seems less
> than ideal from a cloud perspective as we can't be the only ones
> noticing this (in fact some of our CI jobs may indicate they suffer
> from something similar, causing some jobs to run long when reaching network
> resources). I know when we brought this up with the mirror mnaser
> suggested static config was fine, but maybe we need to reinforce that
> this is problematic as a cloud user and see if we can help debug
> (network traces seem like a good start there).

I ended up double checking the mirror node and in mirror.ca-ymq-1.vexxhost.opendev.org:/etc/netplan/50-cloud-init.yaml you can see what we did there. Essentially we set dhcp6 and accept-ra to false, then set an address and routes. We should be able to do the same thing with the new review host if we can't figure anything else out.

If we do go this route maybe we should consider updating launch-node to do it for us automatically when launching focal nodes on vexxhost (I don't think bionic does netplan?), or at the very least document this somewhere. We should also double check that the address and routes are static and can be configured statically like this (the address should not change but I suppose the routes could at some point?).

Ideally though we would sort this out properly and avoid these workarounds.
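
[Editorial sketch] The static configuration described above can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the contents of the real 50-cloud-init.yaml: the interface name `ens3` and the addresses (taken from the 2001:db8::/32 documentation prefix) are hypothetical, and `static_netplan` is an illustrative helper. The output is JSON, which YAML parsers generally accept as flow-style YAML; converting it to block-style YAML is left out to avoid a third-party dependency.

```python
import json


def static_netplan(interface, address, gateway):
    """Build a netplan-style mapping with DHCPv6 and RA processing disabled.

    All argument values used below are hypothetical placeholders.
    """
    return {
        "network": {
            "version": 2,
            "ethernets": {
                interface: {
                    "dhcp6": False,       # do not request addresses via DHCPv6
                    "accept-ra": False,   # ignore router advertisements
                    "addresses": [address],
                    "routes": [{"to": "::/0", "via": gateway}],
                }
            },
        }
    }


config = static_netplan("ens3", "2001:db8::10/64", "2001:db8::1")
print(json.dumps(config, indent=2))
```

With RAs ignored, the single default route comes only from the `routes` entry, which is why the message above asks whether the gateway can be relied on to stay fixed.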
>
> >
> > -i
> >
> >
> > [1] https://etherpad.opendev.org/p/gerrit-upgrade-2021
> > [2] https://review.opendev.org/c/opendev/system-config/+/775961
> > [3] https://launchpad.net/bugs/1844712
> >

From cboylan at sapwetik.org Mon Apr 5 22:30:01 2021
From: cboylan at sapwetik.org (Clark Boylan)
Date: Mon, 05 Apr 2021 15:30:01 -0700
Subject: Team Meeting Agenda for April 6, 2021
Message-ID:

We will meet with this agenda on April 6, 2021 at 19:00UTC in #opendev-meeting:

== Agenda for next meeting ==

* Announcements
** OpenStack producing final RCs this week. Airship also working on a release.

* Actions from last meeting

* Specs approval

* Priority Efforts (Standing meeting agenda items. Please expand if you have subtopics.)
** [http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-management.html Update Config Management]
*** topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev
*** Gerrit upgrade to 3.2.8
**** https://review.opendev.org/c/opendev/system-config/+/784152
*** Gerrit account inconsistencies
**** All preferred emails lack external ids issues have been corrected. All group loops have been corrected.
**** Workaround is we can stop Gerrit, push to external ids directly, reindex accounts (and groups?), start gerrit, then clear accounts caches (and groups caches?)
**** Next steps
***** Cleaning external IDs for the last batch of retired users.
*** Configuration tuning
**** Using strong refs for jgit caches
**** Batch user groups and threads

* General topics
** Picking up steam on Puppet -> Ansible rewrites (clarkb 20210406)
*** Enable Xenial -> Bionic/Focal system upgrades
*** https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades Start capturing TODO list here
*** Zuul service host updates in progress now. Scheduler and Zookeeper cluster remaining. Will focus on ZK first.
**** https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021 discussion of options for zk upgrades
** PTG Planning (ianw 20210406)
*** Next PTG April 19-23
*** Clarkb filled out the survey and requested a few hours for us. Likely to be spent in more office hours type setup.
**** Thursday April 22 1400-1600UTC and 2200-0000UTC
** docs-old volume cleanup (ianw 20210406)
*** We were going to double check with Ajaeger if we can then proceed to cleanup if no one had a reason to keep it.
** planet.openstack.org (ianw 20210406)
*** Strong preference from clarkb to retire it
*** Superuser appears to be a major blog showing up there as well as a couple of others. Maybe we reach out to them and double check they don't want to help? (fungi and clarkb reached out to Superuser and they seem ok.)
** tarballs ORD replication (ianw 20210406)
*** This has been done. Other than long initial sync is this happy day to day?

* Open discussion

From mkopec at redhat.com Tue Apr 6 11:21:17 2021
From: mkopec at redhat.com (Martin Kopec)
Date: Tue, 6 Apr 2021 13:21:17 +0200
Subject: [devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
Message-ID:

Hi,

one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1]
during the following task:

export-devstack-journal : Export journal

I'm bringing this to a broader audience as we're not sure where exactly the issue might be.

Did you encounter a similar issue lately or in the past?

[1] https://zuul.opendev.org/t/openstack/builds?job_name=python-tempestconf-tempest-devstack-admin-plugins&project=osf/python-tempestconf

Thanks for any advice,
--
Martin Kopec

From radoslaw.piliszek at gmail.com Tue Apr 6 15:14:02 2021
From: radoslaw.piliszek at gmail.com (Radosław Piliszek)
Date: Tue, 6 Apr 2021 17:14:02 +0200
Subject: [devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
In-Reply-To:
References:
Message-ID:

I am testing whether replacing xz with gzip would solve the problem [1] [2].

[1] https://review.opendev.org/c/openstack/devstack/+/784964
[2] https://review.opendev.org/c/osf/python-tempestconf/+/784967

-yoctozepto

On Tue, Apr 6, 2021 at 1:21 PM Martin Kopec wrote:
>
> Hi,
>
> one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1]
> during the following task:
>
> export-devstack-journal : Export journal
>
> I'm bringing this to a broader audience as we're not sure where exactly the issue might be.
>
> Did you encounter a similar issue lately or in the past?
>
> [1] https://zuul.opendev.org/t/openstack/builds?job_name=python-tempestconf-tempest-devstack-admin-plugins&project=osf/python-tempestconf
>
> Thanks for any advice,
> --
> Martin Kopec
>
>
>

From cboylan at sapwetik.org Tue Apr 6 15:51:19 2021
From: cboylan at sapwetik.org (Clark Boylan)
Date: Tue, 06 Apr 2021 08:51:19 -0700
Subject: Re: [devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
In-Reply-To:
References:
Message-ID:

On Tue, Apr 6, 2021, at 8:14 AM, Radosław Piliszek wrote:
> I am testing whether replacing xz with gzip would solve the problem [1] [2].

The reason we used xz is that the files are very large and gz compression is very poor compared to xz for these files and these files are not really human readable as is (you need to load them into journald first). Let's test it and see what the gz file sizes look like but if they are still quite large then this is unlikely to be an appropriate fix.

>
> [1] https://review.opendev.org/c/openstack/devstack/+/784964
> [2] https://review.opendev.org/c/osf/python-tempestconf/+/784967
>
> -yoctozepto
>
> On Tue, Apr 6, 2021 at 1:21 PM Martin Kopec wrote:
> >
> > Hi,
> >
> > one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1]
> > during the following task:
> >
> > export-devstack-journal : Export journal
> >
> > I'm bringing this to a broader audience as we're not sure where exactly the issue might be.
> >
> > Did you encounter a similar issue lately or in the past?
> >
> > [1] https://zuul.opendev.org/t/openstack/builds?job_name=python-tempestconf-tempest-devstack-admin-plugins&project=osf/python-tempestconf
> >
> > Thanks for any advice,
> > --
> > Martin Kopec

From fungi at yuggoth.org Tue Apr 6 16:02:48 2021
From: fungi at yuggoth.org (Jeremy Stanley)
Date: Tue, 6 Apr 2021 16:02:48 +0000
Subject: [devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
In-Reply-To:
References:
Message-ID: <20210406160247.gevud2hlvodg7jzt@yuggoth.org>

On 2021-04-06 13:21:17 +0200 (+0200), Martin Kopec wrote:
> one of our jobs (python-tempestconf project) is frequently failing with
> POST_FAILURE [1]
> during the following task:
>
> export-devstack-journal : Export journal
>
> I'm bringing this to a broader audience as we're not sure where exactly the
> issue might be.
>
> Did you encounter a similar issue lately or in the past?
>
> [1]
> https://zuul.opendev.org/t/openstack/builds?job_name=python-tempestconf-tempest-devstack-admin-plugins&project=osf/python-tempestconf

Looking at the error, I strongly suspect memory exhaustion. We could
try tuning xz to use less memory when compressing.
--
Jeremy Stanley

From radoslaw.piliszek at gmail.com Tue Apr 6 16:11:41 2021
From: radoslaw.piliszek at gmail.com (Radosław Piliszek)
Date: Tue, 6 Apr 2021 18:11:41 +0200
Subject: [devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
In-Reply-To: <20210406160247.gevud2hlvodg7jzt@yuggoth.org>
References: <20210406160247.gevud2hlvodg7jzt@yuggoth.org>
Message-ID:

On Tue, Apr 6, 2021 at 6:02 PM Jeremy Stanley wrote:
> Looking at the error, I strongly suspect memory exhaustion. We could
> try tuning xz to use less memory when compressing.

That was my hunch as well, hence why I test using gzip.

On Tue, Apr 6, 2021 at 5:51 PM Clark Boylan wrote:
>
> On Tue, Apr 6, 2021, at 8:14 AM, Radosław Piliszek wrote:
> > I am testing whether replacing xz with gzip would solve the problem [1] [2].
>
> The reason we used xz is that the files are very large and gz compression is very poor compared to xz for these files and these files are not really human readable as is (you need to load them into journald first). Let's test it and see what the gz file sizes look like but if they are still quite large then this is unlikely to be an appropriate fix.

Let's see how bad the file sizes are.
If they are acceptable, we can keep gzip and be happy.
Otherwise we try to tune the params to make xz a better citizen as
fungi suggested.
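
[Editorial sketch] The size trade-off being weighed here can be explored with Python's standard `gzip` and `lzma` modules. This is a minimal sketch, not what the export-devstack-journal role does: the sample data is synthetic log-like text rather than a real journal export, and `compare` is an illustrative helper. Lower xz presets use smaller dictionaries and therefore need less memory while compressing, which is the kind of tuning suggested above.

```python
import gzip
import lzma


def compare(data):
    """Return the compressed size in bytes for gzip and two xz presets."""
    return {
        "gzip": len(gzip.compress(data)),
        "xz-preset-1": len(lzma.compress(data, preset=1)),
        "xz-preset-6": len(lzma.compress(data, preset=6)),
    }


# Synthetic, repetitive log-like lines standing in for a journal export.
sample = b"".join(
    b"Apr 06 16:00:%02d host devstack@n-cpu[%d]: DEBUG message %d\n" % (i % 60, i, i)
    for i in range(20000)
)

sizes = compare(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(sample):.1%} of original)")
```

Real journal exports will compress differently from this synthetic input, so the ratios printed here say nothing about the actual job artifacts.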

-yoctozepto

From radoslaw.piliszek at gmail.com Tue Apr 6 16:15:28 2021
From: radoslaw.piliszek at gmail.com (Radosław Piliszek)
Date: Tue, 6 Apr 2021 18:15:28 +0200
Subject: [devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
In-Reply-To:
References: <20210406160247.gevud2hlvodg7jzt@yuggoth.org>
Message-ID:

On Tue, Apr 6, 2021 at 6:11 PM Radosław Piliszek wrote:
> On Tue, Apr 6, 2021 at 5:51 PM Clark Boylan wrote:
> >
> > On Tue, Apr 6, 2021, at 8:14 AM, Radosław Piliszek wrote:
> > > I am testing whether replacing xz with gzip would solve the problem [1] [2].
> >
> > The reason we used xz is that the files are very large and gz compression is very poor compared to xz for these files and these files are not really human readable as is (you need to load them into journald first). Let's test it and see what the gz file sizes look like but if they are still quite large then this is unlikely to be an appropriate fix.
>
> Let's see how bad the file sizes are.

devstack.journal.gz 23.6M

Less than all the other logs together, I would not mind.
I wonder how it is in other jobs (this is from the failing one).

-yoctozepto

From cboylan at sapwetik.org Tue Apr 6 16:39:04 2021
From: cboylan at sapwetik.org (Clark Boylan)
Date: Tue, 06 Apr 2021 09:39:04 -0700
Subject: Re: [devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
In-Reply-To:
References: <20210406160247.gevud2hlvodg7jzt@yuggoth.org>
Message-ID:

On Tue, Apr 6, 2021, at 9:15 AM, Radosław Piliszek wrote:
> On Tue, Apr 6, 2021 at 6:11 PM Radosław Piliszek
> wrote:
> > On Tue, Apr 6, 2021 at 5:51 PM Clark Boylan wrote:
> > >
> > > On Tue, Apr 6, 2021, at 8:14 AM, Radosław Piliszek wrote:
> > > > I am testing whether replacing xz with gzip would solve the problem [1] [2].
> > >
> > > The reason we used xz is that the files are very large and gz compression is very poor compared to xz for these files and these files are not really human readable as is (you need to load them into journald first). Let's test it and see what the gz file sizes look like but if they are still quite large then this is unlikely to be an appropriate fix.
> >
> > Let's see how bad the file sizes are.
>
> devstack.journal.gz 23.6M
>
> Less than all the other logs together, I would not mind.
> I wonder how it is in other jobs (this is from the failing one).

There does seem to be a range (likely due to how much the job workload causes logging to happen in journald) from a few megabytes to eighty-something MB [3]. This is probably acceptable. Just keep an eye out for jobs that end up with much larger file sizes and we can reevaluate if we notice them.

[3] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_038/784964/1/check/tempest-multinode-full-py3/038bd51/controller/logs/index.html

From cboylan at sapwetik.org Tue Apr 6 16:46:33 2021
From: cboylan at sapwetik.org (Clark Boylan)
Date: Tue, 06 Apr 2021 09:46:33 -0700
Subject: Re: [devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
In-Reply-To:
References: <20210406160247.gevud2hlvodg7jzt@yuggoth.org>
Message-ID: <7626869f-dab3-41df-a40b-dafa20dcfaf4@www.fastmail.com>

On Tue, Apr 6, 2021, at 9:11 AM, Radosław Piliszek wrote:
> On Tue, Apr 6, 2021 at 6:02 PM Jeremy Stanley wrote:
> > Looking at the error, I strongly suspect memory exhaustion. We could
> > try tuning xz to use less memory when compressing.

Worth noting that we continue to suspect memory pressure, and in particular diving into swap, for random failures that appear timing or performance related.
I still think it would be a helpful exercise for OpenStack to look at its memory consumption (remember end users will experience this too) and see if there are any unexpected areas of memory use. I think the last time I skimmed logs the privsep daemon was a large consumer because a separate instance is run for each service and they all add up.

>
> That was my hunch as well, hence why I test using gzip.
>
> On Tue, Apr 6, 2021 at 5:51 PM Clark Boylan wrote:
> >
> > On Tue, Apr 6, 2021, at 8:14 AM, Radosław Piliszek wrote:
> > > I am testing whether replacing xz with gzip would solve the problem [1] [2].
> >
> > The reason we used xz is that the files are very large and gz compression is very poor compared to xz for these files and these files are not really human readable as is (you need to load them into journald first). Let's test it and see what the gz file sizes look like but if they are still quite large then this is unlikely to be an appropriate fix.
>
> Let's see how bad the file sizes are.
> If they are acceptable, we can keep gzip and be happy.
> Otherwise we try to tune the params to make xz a better citizen as
> fungi suggested.
>
> -yoctozepto
>
>

From jim at acmegating.com Wed Apr 7 01:55:27 2021
From: jim at acmegating.com (James E. Blair)
Date: Tue, 06 Apr 2021 18:55:27 -0700
Subject: Recent nodepool label changes
Message-ID: <87blaqn9io.fsf@fuligin>

Hi,

I recently spent some time trying to figure out why a job worked as
expected during one run and then failed due to limited memory on the
following run. It turns out that back in February this change was
merged on an emergency basis, which caused us to start occasionally
providing nodes with 32G of ram instead of the typical 8G:

https://review.opendev.org/773710

Nodepool labels are designed to represent the combination of an image
and set of resources. To the best of our ability, the images and
resources they provide should be consistent across different cloud
providers.
That's why we use DIB to create consistent images and that's
why we use "-expanded" labels to request nodes with additional memory.
It's also the case that when we add new clouds, we generally try to
benchmark performance and adjust flavors as needed.

Unfortunately, providing such disparate resources under the same
Nodepool labels makes it impossible for job authors to reliably design
jobs.

To be clear, it's fine to provide resources of varying size, we just
need to use different Nodepool labels for them so that job authors get
what they're asking for.

The last time we were in this position, we updated our Nodepool images
to add the mem= Linux kernel command line parameter in order to limit
the total available RAM. I suspect that is still possible, but due to
the explosion of images and flavors, doing so will be considerably more
difficult this time.

We now also have the ability to reboot nodes in jobs after they come
online, but doing that would add additional run time for every job.

I believe we need to address this. Despite the additional work, it
seems like the "mem=" approach is our best bet; unless anyone has other
ideas?

-Jim

From cboylan at sapwetik.org Wed Apr 7 16:20:55 2021
From: cboylan at sapwetik.org (Clark Boylan)
Date: Wed, 07 Apr 2021 09:20:55 -0700
Subject: Recent nodepool label changes
In-Reply-To: <87blaqn9io.fsf@fuligin>
References: <87blaqn9io.fsf@fuligin>
Message-ID:

On Tue, Apr 6, 2021, at 6:55 PM, James E. Blair wrote:
> Hi,
>
> I recently spent some time trying to figure out why a job worked as
> expected during one run and then failed due to limited memory on the
> following run. It turns out that back in February this change was
> merged on an emergency basis, which caused us to start occasionally
> providing nodes with 32G of ram instead of the typical 8G:
>
> https://review.opendev.org/773710
>
> Nodepool labels are designed to represent the combination of an image
> and set of resources.
> To the best of our ability, the images and
> resources they provide should be consistent across different cloud
> providers. That's why we use DIB to create consistent images and that's
> why we use "-expanded" labels to request nodes with additional memory.
> It's also the case that when we add new clouds, we generally try to
> benchmark performance and adjust flavors as needed.
>
> Unfortunately, providing such disparate resources under the same
> Nodepool labels makes it impossible for job authors to reliably design
> jobs.
>
> To be clear, it's fine to provide resources of varying size, we just
> need to use different Nodepool labels for them so that job authors get
> what they're asking for.
>
> The last time we were in this position, we updated our Nodepool images
> to add the mem= Linux kernel command line parameter in order to limit
> the total available RAM. I suspect that is still possible, but due to
> the explosion of images and flavors, doing so will be considerably more
> difficult this time.
>
> We now also have the ability to reboot nodes in jobs after they come
> online, but doing that would add additional run time for every job.
>
> I believe we need to address this. Despite the additional work, it
> seems like the "mem=" approach is our best bet; unless anyone has other
> ideas?

This change was made at the request of mnaser to better support resource allocation in vexxhost (the flavors we use now use their standard ratio for memory:cpu). One (likely bad) option would be to select a flavor based on memory rather than cpu count. In this case I think we would go from 8vcpu + 32GB memory to 2vcpu + 8GB of memory.

At the time I was surprised the change merged so quickly and asked if anyone was starting work on setting the kernel boot parameters again:

http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-02-02.log.html#t2021-02-02T18:04:23

I suspect that the kernel limit is our best option.
We can set this via DIB_BOOTLOADER_DEFAULT_CMDLINE [0] which I expect will work in many cases across the various distros. The problem with this approach is that we would need different images for the places we want to boot with more memory (the -expanded labels for example).

For completeness other possibilities are:
* Convince the clouds that the nova flavor is the best place to control this and set them appropriately
* Don't use clouds that can't set appropriate flavors
* Accept Fungi's argument in the IRC log above and accept that memory as with other resources like disk iops and network will be variable
* Kernel module that inspects some attribute at boot time and sets mem appropriately

[0] https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/bootloader/README.rst

>
> -Jim

From smooney at redhat.com Wed Apr 7 16:30:28 2021
From: smooney at redhat.com (Sean Mooney)
Date: Wed, 7 Apr 2021 17:30:28 +0100
Subject: Recent nodepool label changes
In-Reply-To:
References: <87blaqn9io.fsf@fuligin>
Message-ID: <41ed1949-f638-eee1-2421-9840750a5c01@redhat.com>

On 07/04/2021 17:20, Clark Boylan wrote:
> On Tue, Apr 6, 2021, at 6:55 PM, James E. Blair wrote:
>> Hi,
>>
>> I recently spent some time trying to figure out why a job worked as
>> expected during one run and then failed due to limited memory on the
>> following run. It turns out that back in February this change was
>> merged on an emergency basis, which caused us to start occasionally
>> providing nodes with 32G of ram instead of the typical 8G:
>>
>> https://review.opendev.org/773710
>>
>> Nodepool labels are designed to represent the combination of an image
>> and set of resources. To the best of our ability, the images and
>> resources they provide should be consistent across different cloud
>> providers. That's why we use DIB to create consistent images and that's
>> why we use "-expanded" labels to request nodes with additional memory.
>> It's also the case that when we add new clouds, we generally try to
>> benchmark performance and adjust flavors as needed.
>>
>> Unfortunately, providing such disparate resources under the same
>> Nodepool labels makes it impossible for job authors to reliably design
>> jobs.
>>
>> To be clear, it's fine to provide resources of varying size, we just
>> need to use different Nodepool labels for them so that job authors get
>> what they're asking for.
>>
>> The last time we were in this position, we updated our Nodepool images
>> to add the mem= Linux kernel command line parameter in order to limit
>> the total available RAM. I suspect that is still possible, but due to
>> the explosion of images and flavors, doing so will be considerably more
>> difficult this time.
>>
>> We now also have the ability to reboot nodes in jobs after they come
>> online, but doing that would add additional run time for every job.
>>
>> I believe we need to address this. Despite the additional work, it
>> seems like the "mem=" approach is our best bet; unless anyone has other
>> ideas?
> This change was made at the request of mnaser to better support resource allocation in vexxhost (the flavors we use now use their standard ratio for memory:cpu). One (likely bad) option would be to select a flavor based on memory rather than cpu count. In this case I think we would go from 8vcpu + 32GB memory to 2vcpu + 8GB of memory.
>
> At the time I was surprised the change merged so quickly and asked if anyone was starting work on setting the kernel boot parameters again:
>
> http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-02-02.log.html#t2021-02-02T18:04:23
>
> I suspect that the kernel limit is our best option. We can set this via DIB_BOOTLOADER_DEFAULT_CMDLINE [0] which i expect will work in many cases across the various distros.
> The problem with this approach is that we would need different images for the places we want to boot with more memory (the -expanded labels for example).
>
> For completeness other possibilities are:
> * Convince the clouds that the nova flavor is the best place to control this and set them appropriately
> * Don't use clouds that can't set appropriate flavors
> * Accept Fungi's argument in the IRC log above and accept that memory as with other resources like disk iops and network will be variable
> * Kernel module that inspects some attribute at boot time and sets mem appropriately

I'm not sure what the issue is with allowing vms to have 32GB of ram.
As job authors we should basically tailor our jobs to fit the minimum
available, and if we get more ram then that's a bonus.
We should not be writing tempest jobs in particular in such a way that
more ram would break things, outside of very specific jobs.
For example, the whitebox tempest plugin that literally sshes into the host
vms to validate things in the libvirt xml makes some assumptions about
the env, but I would consider it a bug in our plugin if it could not
work with more ram.

With less ram we may have issues, but more should not break any of our tests,
or we should fix them.

I think we should be able to just have the vexxhost flavor labeled twice:
once with the normal labels and once with the -expanded one.

I would hope that we do not go down the path of hardcoding a kernel mem
limit of 8G for all labels;
it seems very wasteful to me to boot a 32G vm and only use 8G of it.

>
> [0]
> https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/bootloader/README.rst
>
>> -Jim

From fungi at yuggoth.org Wed Apr 7 16:39:46 2021
From: fungi at yuggoth.org (Jeremy Stanley)
Date: Wed, 7 Apr 2021 16:39:46 +0000
Subject: Recent nodepool label changes
In-Reply-To:
References: <87blaqn9io.fsf@fuligin>
Message-ID: <20210407163945.mjcz7l75kimktxed@yuggoth.org>

On 2021-04-07 09:20:55 -0700 (-0700), Clark Boylan wrote:
[...]
> This change was made at the request of mnaser to better support
> resource allocation in vexxhost (the flavors we use now use their
> standard ratio for memory:cpu). One (likely bad) option would be
> to select a flavor based on memory rather than cpu count. In this
> case I think we would go from 8vcpu + 32GB memory to 2vcpu + 8GB
> of memory.
>
> At the time I was surprised the change merged so quickly
[...]

Based on the commit message and the fact that we were pinged in IRC
to review, I got the impression it was relatively urgent.

> I suspect that the kernel limit is our best option. We can set
> this via DIB_BOOTLOADER_DEFAULT_CMDLINE [0] which i expect will
> work in many cases across the various distros. The problem with
> this approach is that we would need different images for the
> places we want to boot with more memory (the -expanded labels for
> example).
>
> For completeness other possibilities are:
> * Convince the clouds that the nova flavor is the best place to
> control this and set them appropriately
> * Don't use clouds that can't set appropriate flavors
> * Accept Fungi's argument in the IRC log above and accept that
> memory as with other resources like disk iops and network will be
> variable

To be clear, this was mostly a "devil's advocate" argument, and not
really my opinion. We saw first hand that disparate memory sizing in
HPCloud was allowing massive memory usage jumps to merge in
OpenStack, and took action back then to artificially limit the
available memory at boot. We now have fresh evidence from the Zuul
community that this hasn't ceased to be a problem.

On the other hand, we also see projects merge changes which
significantly increase disk utilization and then can't run on some
environments where we get smaller disks (or depend on having
multiple network interfaces, or specific addressing schemes, or
certain CPU flags, or...), so the heterogeneity problem isn't limited
exclusively to memory.
> * Kernel module that inspects some attribute at boot time and > sets mem appropriately [...] Not to downplay the value of the donated resources, because they really are very much appreciated, but these currently account for less than 5% of our aggregate node count so having to maintain multiple nearly identical images or doing a lot of additional engineering work seems like it may outweigh any immediate benefits. With the increasing use of special node labels like expanded, nested-virt and NUMA, it might make more sense to just limit this region to not supplying standard nodes, which sidesteps the problem for now. -- Jeremy Stanley From jim at acmegating.com Wed Apr 7 17:33:22 2021 From: jim at acmegating.com (James E. Blair) Date: Wed, 07 Apr 2021 10:33:22 -0700 Subject: Recent nodepool label changes In-Reply-To: <41ed1949-f638-eee1-2421-9840750a5c01@redhat.com> (Sean Mooney's message of "Wed, 7 Apr 2021 17:30:28 +0100") References: <87blaqn9io.fsf@fuligin> <41ed1949-f638-eee1-2421-9840750a5c01@redhat.com> Message-ID: <877dlem23h.fsf@fuligin> Sean Mooney writes: > I'm not sure what the issue is with allowing VMs to have 32GB of ram. > As job authors we should basically tailor our jobs to fit the minimum > available, and if we get more ram then that's a bonus. > We should not be writing tempest jobs in particular in such a way that > more ram would break things, outside of very specific jobs. > For example, the whitebox tempest plugin that literally sshes into the host > VMs to validate things in the libvirt XML makes some assumptions about > the env, but I would consider it a bug in our plugin if it could not > work with more ram. I tried really hard to make it clear I have no problem with the idea that we could have flavors with more ram. I absolutely don't object to that. 
What I am saying is that there is definitely a problem with using a label that has different amounts of ram in different providers. It causes jobs to behave differently. Jobs that pass in one provider will fail in another because of the ram difference. I agree with you that as job authors we should tailor our jobs to fit the minimum available ram. The problem is that this is nearly impossible if Nodepool randomly gives us nodes with more ram. We won't realize we have exceeded the minimum ram until we hit a job on a provider with less ram after having exceeded it on a provider with more ram. This is not a theoretical issue -- you are reading this message because I hit this problem after two test runs on a recently started project. > With less ram we may have issues, but more should not break any of our tests, > or we should fix them. There is an inherent contradiction in saying that more ram is okay but less ram is not. They are two sides of the same coin. A job will not break because it had more ram the first time, it will break because it had less ram the second time. The fundamental issue is that a Nodepool label describes an image plus a flavor. That flavor must be as consistent as possible across providers if we expect job authors to be able to write predictable jobs. > It seems very wasteful to me to boot a 32G VM and only use 8G of it. It may seem that way, but the infrastructure provider has told us that they have tuned their hardware purchases to that ratio of CPU/RAM, and so we're helping out by doing this. The more wasteful thing is people issuing rechecks because their jobs pass in some providers and not others. 
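The point about a label needing the same flavor in every provider can be checked mechanically. What follows is only a minimal sketch under stated assumptions: the provider names, label names and RAM figures are hypothetical placeholders, not values taken from any real Nodepool configuration, and the inventory dictionary is an invented stand-in for whatever source would actually supply that data.

```python
from collections import defaultdict

# Hypothetical inventory: provider -> label -> RAM in MB.
# None of these names or numbers come from a real configuration.
PROVIDER_LABEL_RAM_MB = {
    "provider-a": {"example": 8192, "example-expanded": 16384},
    "provider-b": {"example": 32768, "example-expanded": 16384},
}


def inconsistent_labels(inventory):
    """Return {label: {provider: ram_mb}} for labels whose RAM differs
    between the providers that offer them."""
    by_label = defaultdict(dict)
    for provider, labels in inventory.items():
        for label, ram_mb in labels.items():
            by_label[label][provider] = ram_mb
    # A label is consistent only if every provider reports the same RAM.
    return {
        label: rams
        for label, rams in by_label.items()
        if len(set(rams.values())) > 1
    }


if __name__ == "__main__":
    for label, rams in sorted(inconsistent_labels(PROVIDER_LABEL_RAM_MB).items()):
        print(f"{label}: {rams}")
```

A check of this kind would flag the mismatched label before a job author discovers it through a failed run on the smaller provider.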
-Jim From iwienand at redhat.com Thu Apr 8 05:43:33 2021 From: iwienand at redhat.com (Ian Wienand) Date: Thu, 8 Apr 2021 15:43:33 +1000 Subject: Next steps with new review server In-Reply-To: References: Message-ID: On Thu, Apr 01, 2021 at 02:35:32PM -0700, Clark Boylan wrote: > I ended up double checking the mirror node and in > mirror.ca-ymq-1.vexxhost.opendev.org:/etc/netplan/50-cloud-init.yaml > you can see what we did there. Essentially we set dhcpv6 and > accept-ra to false then set an address and routes. We should be able > to do the same thing with the new review host if we can't figure > anything else out. > [3] https://launchpad.net/bugs/1844712 So we have a workaround in production, but [3] is also marked as an open security bug. Are we happy that ignoring RAs is sufficient to overcome the issues discussed in [3] for this service? The concern mostly seemed to be a targeted MITM attack; something which ssh host keys and SSL certificates should cover? -i From fungi at yuggoth.org Thu Apr 8 19:48:35 2021 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 8 Apr 2021 19:48:35 +0000 Subject: Next steps with new review server In-Reply-To: References: Message-ID: <20210408194835.ma5xr6cm5enegnab@yuggoth.org> On 2021-04-08 15:43:33 +1000 (+1000), Ian Wienand wrote: > On Thu, Apr 01, 2021 at 02:35:32PM -0700, Clark Boylan wrote: > > I ended up double checking the mirror node and in > > mirror.ca-ymq-1.vexxhost.opendev.org:/etc/netplan/50-cloud-init.yaml > > you can see what we did there. Essentially we set dhcpv6 and > > accept-ra to false then set an address and routes. We should be able > > to do the same thing with the new review host if we can't figure > > anything else out. > > > [3] https://launchpad.net/bugs/1844712 > > So we have a workaround in production, but [3] is also marked as an > open security bug. > > Are we happy that ignoring RAs is sufficient to overcome the issues > discussed in [3] for this service? 
The concern mostly seemed to be a > targeted MITM attack; something which ssh host keys and SSL > certificates should cover? Yes, I think ignoring RAs is probably sufficient. Nobody seems to have yet figured out how the leak happens or what else could be leaked, but as you note the fact that a MitM couldn't usefully spoof a viable HTTPS or SSH connection endpoint is sufficient insurance against anything worse, so we can just focus on mitigating the stability problem arising from stray leaks for now. -- Jeremy Stanley
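For readers unfamiliar with the workaround discussed in this thread (disable DHCPv6 and router advertisements, then pin an address and routes), here is a rough sketch of what such a netplan stanza might contain. It is not the contents of the actual 50-cloud-init.yaml on the mirror, which the thread does not show. The interface name and the 2001:db8:: documentation addresses are placeholders, the function name is invented for illustration, and the mapping of "dhcpv6" in the message to netplan's `dhcp6` key is an assumption. The output is JSON, which netplan's YAML parser should be able to read since JSON is a subset of YAML.

```python
import ipaddress
import json


def static_ipv6_netplan(interface, address, gateway):
    """Build a netplan-style dict that ignores RAs and DHCPv6 and sets a
    static IPv6 address plus a default route (hypothetical helper)."""
    iface = ipaddress.ip_interface(address)  # raises ValueError if malformed
    gw = ipaddress.ip_address(gateway)
    if iface.version != 6 or gw.version != 6:
        raise ValueError("expected an IPv6 address and an IPv6 gateway")
    return {
        "network": {
            "version": 2,
            "ethernets": {
                interface: {
                    "dhcp6": False,       # no DHCPv6 leases
                    "accept-ra": False,   # ignore router advertisements
                    "addresses": [str(iface)],
                    "routes": [{"to": "::/0", "via": str(gw)}],
                }
            },
        }
    }


if __name__ == "__main__":
    # Placeholder values from the IPv6 documentation prefix.
    config = static_ipv6_netplan("eth0", "2001:db8::10/64", "2001:db8::1")
    print(json.dumps(config, indent=2))
```

With router advertisements ignored, a stray RA can no longer add addresses or default routes to the interface, which is the stability problem the thread is trying to mitigate.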