From iwienand at redhat.com Fri Jul 1 01:35:11 2022
From: iwienand at redhat.com (Ian Wienand)
Date: Fri, 1 Jul 2022 11:35:11 +1000
Subject: New approaches for grafana.opendev.org
Message-ID:

Hello,

Currently all graphs pushed to grafana.opendev.org are defined in
project-config/grafana as YAML files consumed by grafyaml [1].

The fundamental tension for grafyaml is that the upstream Grafana
project does not document or publish a defined schema for dashboards
and their components. grafyaml implements a subset of the upstream
data model that is incomplete, in some cases buggy, and, perhaps most
importantly, undocumented [1].

We have over a million data points of interesting information in
graphite, but I feel there are significant barriers to new and
interesting visualisations. With no clear documentation, either
upstream or in grafyaml, where is somebody supposed to start?

A series of changes have been reviewed and landed in grafyaml that
allow it to upload dashboards exported directly from the Grafana UI in
its native .json format. I would like to reach some consensus on using
this feature in the OpenDev environment.

I will leave aside the issues with the schema encoded in grafyaml;
it's possible these might be fixed. AIUI the main reason for
duplicating the schema in grafyaml was that it produced more
reviewable YAML files. To this I would say:

1) Layout of the page, i.e. the rows, panels, nesting, etc.

My argument here is that making reviewers build a mental model of
what a dashboard will look like, from either YAML or JSON, does not
make for thorough reviews, especially if you're not already intimately
familiar with the desired output.

To this end, I have added a new job "project-config-grafana" which
produces a "Screenshots" artifact: it loads changed graphs into a
Grafana instance and stores screenshots captured with a headless
browser.
I believe this is a much more effective way to review proposed layout
changes for both input formats.

2) The data graphed.

This comes down to the metric selected and any functions applied.

My argument here is that, firstly, the screenshot is a good way to
evaluate this; for example, if you've accidentally treated
milliseconds as seconds, the wrong graph axis will make it obvious.

As to the actual data (ensuring we have the right metric, etc.), I
would say that the "raw" output of the exported graphs just isn't that
hard to parse. It is unobfuscated and reads logically. I have proposed
some examples:

 https://review.opendev.org/c/openstack/project-config/+/833213/6/grafana/infra-prod-deployment.json
 https://review.opendev.org/c/openstack/project-config/+/848212/3/grafana/nodepool-dib-status.json

I think you can clearly see the metrics chosen and the functions
applied.

3) Generally more confusing.

This is true, as the .json file is meant for Grafana to read. However,
for better or worse, this is the actual data model of your graph page.

To this end, I have proposed documentation and a helper script to
start a Grafana instance in a local container and load it with the
defined dashboards:

 https://review.opendev.org/c/openstack/project-config/+/833214/

This is useful for interactive editing sessions when developing new
dashboards. If a reviewer wishes to examine a change more closely than
the CI screenshots allow, they can simply pull the change from gerrit
and load it into a live instance using this simple method.

I think this significantly lowers the barrier for people developing
new and interesting things against the data provided.

I'm not proposing that any existing graph needs to change [2], and
grafyaml's features to set up datasources and load the graphs are
still used. I'm not proposing we remove, or even stop, any development
of the YAML schema if people want to work on that and prefer to keep
their graphs that way.
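For reviewers who want a quick textual summary of "the metric selected and any functions applied" alongside the screenshots, a small script could list every Graphite query in an exported dashboard. This is a hypothetical helper, not part of grafyaml or project-config; it assumes the Graphite datasource stores each query as a `target` string inside a panel's `targets` list, that collapsed rows nest their children under `panels`, and that older dashboards use a top-level `rows` list.

```python
import json
from typing import Iterable, Iterator, Tuple


def _walk_panels(panels: Iterable[dict]) -> Iterator[dict]:
    """Yield each panel, descending into panels nested under collapsed rows."""
    for panel in panels:
        yield panel
        yield from _walk_panels(panel.get("panels", []))


def graphite_targets(dashboard: dict) -> Iterator[Tuple[str, str]]:
    """Yield (panel title, Graphite target expression) for every query.

    Assumes the dashboard JSON layout described above; panels without a
    Graphite "target" string (e.g. text or row panels) are skipped.
    """
    panels = list(dashboard.get("panels", []))
    # Older dashboards group panels under a top-level "rows" list.
    for row in dashboard.get("rows", []):
        panels.extend(row.get("panels", []))
    for panel in _walk_panels(panels):
        for query in panel.get("targets", []):
            expr = query.get("target")
            if expr:
                yield panel.get("title", "<untitled>"), expr


def print_dashboard_targets(path: str) -> None:
    """Print one "title: expression" line per query in a dashboard file."""
    with open(path) as f:
        doc = json.load(f)
    # A dashboard fetched via Grafana's HTTP API is wrapped as {"dashboard": {...}}.
    dashboard = doc.get("dashboard", doc)
    for title, expr in graphite_targets(dashboard):
        print(f"{title}: {expr}")
```

For example, running `print_dashboard_targets()` on a locally checked-out dashboard file would print each panel title next to its query, which makes a units mistake such as a missing `scale()` easy to spot in review.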
I think that there is a great resource here that is underutilised, and
my hope is we have a path to greatly reduce the barriers to new
contributions.

Sorry for the long mail,

-i

[1] Grafana does a good job of backwards compatibility, so "old"
dashboards work in new releases. Hence our extant graphs, though
producing output that looks very different from what the UI produces
now, generally work. The exceptions are some bugs where the "update"
process doesn't work (thresholds was one I found), deprecated features
that will disappear (cf. time-series graphs), and the many panel types
that are completely unsupported.

[2] Though most extant graphs use deprecated panel types that will
have to be updated one day; but that's an issue for another time.

From cboylan at sapwetik.org Tue Jul 5 17:16:00 2022
From: cboylan at sapwetik.org (Clark Boylan)
Date: Tue, 05 Jul 2022 10:16:00 -0700
Subject: Team Meeting Agenda for July 5, 2022
Message-ID: <6819655f-a2d6-4fb0-a211-af2258a879d2@www.fastmail.com>

Hello,

We will meet on July 5, 2022 at 19:00 UTC in #opendev-meeting with this agenda:

== Agenda for next meeting ==

* Announcements
** clarkb missing July 12 meeting

* Actions from last meeting

* Specs Review

* Topics
** Improving OpenDev's CD throughput (clarkb 20220705)
*** Bootstrapping bridge via Zuul is now a complicated subject. Can use zuul secrets to make it happen. Are we comfortable with this?
*** https://review.opendev.org/c/opendev/infra-specs/+/821645 -- spec outlining some of the issues with secrets
*** https://review.opendev.org/c/opendev/system-config/+/821155 -- sample of secret writing; more info in changelog
*** Auto upgrades of Zuul are in place now.
** Improving Grafana management tooling (clarkb 20220705)
*** Grafyaml doesn't properly support setting the color thresholds on graphs anymore (this makes failed states show red and happy states show green; we always see green now)
*** https://lists.opendev.org/pipermail/service-discuss/2022-July/000342.html
*** https://review.opendev.org/q/topic:grafana-json
** Run a custom URL shortener service (frickler 20220705)
*** Many people use bit.ly or similar in IRC channel topics and elsewhere
*** https://opensource.com/article/18/7/apache-url-shortener shows an easy solution that could be git-based
*** Should be easy to do with a new DNS record on static.o.o
*** Data could be managed in a single file (maybe in project-config) or one file per URL
** Zuul job POST_FAILURES (clarkb 20220705)
*** TripleO and OSA are both seeing a higher than usual number of POST_FAILURES
*** https://review.opendev.org/c/opendev/base-jobs/+/848027 Add remote log store location debug info to base jobs.
** Bastion host (ianw 20220628)
*** worth moving ansible/openstacksdk to a venv? system-config jobs first then production
*** cf. https://review.opendev.org/c/opendev/system-config/+/847700
*** bastion host OS upgrade. in-place? new host? wait until we have time to return to some of the bootstrapping/parallel job work?

* Open discussion

Apologies for getting this agenda sent out late. Yesterday was a holiday, and I had less time indoors than anticipated.
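If the short links for the proposed URL shortener were kept in a single file in git, one option is to generate a plain-text map for Apache mod_rewrite's `RewriteMap`, whose `txt:` map type reads one whitespace-separated `key value` pair per line and treats lines starting with `#` as comments. The sketch below is hypothetical: the `{short_name: url}` input, the `render_rewrite_map` name, and the validation rules are assumptions for illustration, not an agreed design.

```python
import re
from typing import Dict
from urllib.parse import urlparse

# Hypothetical naming rule for short links: letters, digits, '-' and '_' only.
SHORT_NAME = re.compile(r"[A-Za-z0-9_-]+")


def render_rewrite_map(links: Dict[str, str]) -> str:
    """Render {short_name: target_url} as the text of an Apache txt: RewriteMap."""
    lines = ["# Generated file; edit the source data instead."]
    # Sorting keeps the output stable, so reviews show minimal diffs.
    for name in sorted(links):
        url = links[name]
        if not SHORT_NAME.fullmatch(name):
            raise ValueError(f"invalid short name: {name!r}")
        parsed = urlparse(url)
        # txt: maps split each line on whitespace, so a URL containing
        # whitespace would be truncated; also require an absolute http(s) URL.
        if (parsed.scheme not in ("http", "https") or not parsed.netloc
                or any(ch.isspace() for ch in url)):
            raise ValueError(f"invalid URL for {name!r}: {url!r}")
        lines.append(f"{name} {url}")
    return "\n".join(lines) + "\n"
```

The generated file would then be referenced from the web server configuration with a `RewriteMap` directive of type `txt:` and a matching `RewriteRule`; how that would be wired into static.o.o is left open here. Validating at generation time means a bad entry fails in CI rather than producing a broken redirect.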
From iwienand at redhat.com Mon Jul 11 02:43:36 2022
From: iwienand at redhat.com (Ian Wienand)
Date: Mon, 11 Jul 2022 12:43:36 +1000
Subject: [service-announce] Updating Zuul's Default Ansible Version to Ansible v5
In-Reply-To: <8f869fba-10b8-488c-8f58-065115822555@www.fastmail.com>
References: <8f869fba-10b8-488c-8f58-065115822555@www.fastmail.com>
Message-ID:

On Wed, Jun 15, 2022 at 12:11:00PM -0700, Clark Boylan wrote:
> The OpenDev team will be updating the default Ansible version in our
> Zuul tenants from Ansible 2.9 to Ansible 5 on June 30, 2022. Zuul
> itself will eventually update its default, but making the change in
> our tenant configs allows us to control exactly when this happens.

Note this has been merged with

 https://review.opendev.org/c/openstack/project-config/+/849120

Just for visibility I've cc'd this to openstack-discuss; but please
subscribe to service-announce [1] if you're interested in such OpenDev
infra updates.

Thanks,

-i

[1] https://lists.opendev.org/cgi-bin/mailman/listinfo/service-announce

From cboylan at sapwetik.org Mon Jul 18 21:47:41 2022
From: cboylan at sapwetik.org (Clark Boylan)
Date: Mon, 18 Jul 2022 14:47:41 -0700
Subject: Team Meeting Agenda for July 19, 2022
Message-ID:

Hello,

We will meet on July 19, 2022 at 19:00 UTC in #opendev-meeting with this agenda:

== Agenda for next meeting ==

* Announcements

* Actions from last meeting

* Specs Review

* Topics
** Improving OpenDev's CD throughput (clarkb 20220719)
*** Bootstrapping bridge via Zuul is now a complicated subject. Can use zuul secrets to make it happen. Are we comfortable with this?
*** https://review.opendev.org/c/opendev/infra-specs/+/821645 -- spec outlining some of the issues with secrets
*** https://review.opendev.org/c/opendev/system-config/+/821155 -- sample of secret writing; more info in changelog
** Improving Grafana management tooling (clarkb 20220719)
*** Grafyaml doesn't properly support setting the color thresholds on graphs anymore (this makes failed states show red and happy states show green; we always see green now)
*** https://lists.opendev.org/pipermail/service-discuss/2022-July/000342.html
*** https://review.opendev.org/q/topic:grafana-json
** Bastion host (ianw 20220719)
*** worth moving ansible/openstacksdk to a venv? system-config jobs first then production
*** cf. https://review.opendev.org/c/opendev/system-config/+/847700
*** bastion host OS upgrade. in-place? new host? wait until we have time to return to some of the bootstrapping/parallel job work?
** Upgrading Bionic servers to Focal/Jammy (clarkb 20220719)
*** https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
** Zuul + Ansible v5 + Glibc deadlock (clarkb 20220719)
*** Using backported Debian testing glibc on Zuul images now.
** Zuul job POST_FAILURES (clarkb 20220719)
*** TripleO and OSA are both seeing a higher than usual number of POST_FAILURES
*** https://review.opendev.org/c/opendev/base-jobs/+/848881 Add remote log store location debug info to prod base job.
** New Gerrit 3.5 caches (clarkb 20220719)
*** Gerrit 3.5 added some new caches that consume a fair bit more disk space.
*** We've grown the disk for Gerrit to accommodate this change with newer Gerrit.
*** https://review.opendev.org/c/opendev/system-config/+/849886 is our fallback, which will disable the caches should their growth continue to be unsustainable.
* Open discussion

From cboylan at sapwetik.org Mon Jul 25 22:30:24 2022
From: cboylan at sapwetik.org (Clark Boylan)
Date: Mon, 25 Jul 2022 15:30:24 -0700
Subject: Team Meeting Agenda for July 26, 2022
Message-ID:

We will meet on July 26, 2022 at 19:00 UTC in #opendev-meeting with this agenda:

== Agenda for next meeting ==

* Announcements

* Actions from last meeting

* Specs Review

* Topics
** Improving OpenDev's CD throughput (clarkb 20220726)
*** Bootstrapping bridge via Zuul is now a complicated subject. Can use zuul secrets to make it happen. Are we comfortable with this?
*** https://review.opendev.org/c/opendev/infra-specs/+/821645 -- spec outlining some of the issues with secrets
*** https://review.opendev.org/c/opendev/system-config/+/821155 -- sample of secret writing; more info in changelog
** Improving Grafana management tooling (clarkb 20220726)
*** https://lists.opendev.org/pipermail/service-discuss/2022-July/000342.html
*** https://review.opendev.org/q/topic:grafana-json
** Bastion host (ianw 20220726)
*** worth moving ansible/openstacksdk to a venv? system-config jobs first then production
*** cf. https://review.opendev.org/c/opendev/system-config/+/847700
*** bastion host OS upgrade. in-place? new host? wait until we have time to return to some of the bootstrapping/parallel job work?
** Upgrading Bionic servers to Focal/Jammy (clarkb 20220726)
*** https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
** Zuul job POST_FAILURES (clarkb 20220726)
*** TripleO and OSA are both seeing a higher than usual number of POST_FAILURES
*** We now log the target swift region before uploading logs. Next step is collecting info on where this occurs to determine if it is consistent.
** OpenDev Service coordinator elections beginning soon (clarkb 20220726)
*** In February I stated August 2 - 16, 2022 would work as a nomination period.
*** Will send email about the process after our meeting.
* Open discussion

From cboylan at sapwetik.org Tue Jul 26 23:26:34 2022
From: cboylan at sapwetik.org (Clark Boylan)
Date: Tue, 26 Jul 2022 16:26:34 -0700
Subject: Service Coordinator Election August 2022 Edition
Message-ID:

It is almost that time again. Back in February I said that we'd have a
Service Coordinator Election nomination period running from August 2,
2022 to August 16, 2022 [0]. If you'd like to run (I'm more than happy
for someone else to do it), now is the time to start thinking about
it. I'm giving everyone a heads-up a week in advance today, and will
send a notice on August 2nd when the nomination period begins.

If you'd like to take this on, you can send your nomination as a reply
to this email thread or in a new email thread to the service-discuss
list.

[0] https://lists.opendev.org/pipermail/service-discuss/2022-February/000318.html

Clark