On Fri, Jan 15, 2021 at 12:42:05PM -0800, Clark Boylan wrote:
Given the new information we have about the bug ceasing to be a problem in a couple of weeks, and also the ability to switch freely between and mix 1.6 and 1.8 servers, it sounds like a rollback won't necessarily be intractable if we decide it's warranted.
Yes, rather than doing it all in one go with a downtime, maybe we should instead start with afs01.ord or afs02.dfw (the more secondary servers), ensure that server is happy, then work through the rest in a rolling fashion? We have documented that general process here: https://docs.opendev.org/opendev/system-config/latest/afs.html#no-outage-ser.... It does appear that the existing releases may complicate this process, though, as we typically want to have the RW volume on active servers?
I agree with this; we can start with one server manually to validate that 1.6 and 1.8 do actually co-exist as happily as advertised. As mentioned in the etherpad I can work on things like new key distribution first as well, which is a good first thing to start running ansible with. I think we need to keep the manual upgrade plan in our pocket in case of an unexpected restart before the end of January.

-i