OpenAFS timestamp rollover issues, discussion and plan

Jeremy Stanley fungi at yuggoth.org
Fri Jan 15 19:24:22 UTC 2021


On 2021-01-15 16:56:39 +1100 (+1100), Ian Wienand wrote:
[...]
> We have been told there are no fixes for the 1.6 servers planned.
> We do not have a wonderful answer here, unfortunately.
[...]

Per subsequent discussion in IRC today, the problem (at least for
1.6) stems from the number of zero bits in the timestamp, which leads
to weak/repeating ID generation, and it should cease to be an issue
around the end of the month.
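
To make the zero-bits point a bit more concrete, here is a rough
Python sketch that prints a few timestamps in binary and counts their
zero bits. It is only an illustration: it assumes the rollover in
question is 32-bit Unix time crossing 0x60000000 (not stated in this
thread), the helper names are made up for the example, and it does
not reproduce how OpenAFS actually derives its IDs.

```python
from datetime import datetime, timezone

# ASSUMPTION: the "rollover" is Unix time reaching 0x60000000.
# This value is not stated in the thread; it is used here only
# to illustrate what "many zero bits" looks like.
ASSUMED_ROLLOVER = 0x60000000


def to_epoch(year, month, day, hour=0, minute=0, second=0):
    """Return the Unix timestamp (seconds) for a UTC date and time."""
    moment = datetime(year, month, day, hour, minute, second,
                      tzinfo=timezone.utc)
    return int(moment.timestamp())


def zero_bits(ts, width=32):
    """Count the zero bits in the low `width` bits of ts."""
    masked = ts & ((1 << width) - 1)
    return width - bin(masked).count("1")


if __name__ == "__main__":
    samples = [
        ("assumed rollover", ASSUMED_ROLLOVER),
        ("this message", to_epoch(2021, 1, 15, 19, 24, 22)),
        ("end of month", to_epoch(2021, 1, 31)),
    ]
    for label, ts in samples:
        # Hex, binary, and zero-bit count for each sample timestamp.
        print(f"{label:17} {ts:#010x} {ts:032b} zeros={zero_bits(ts)}")
```

Right at the assumed rollover nearly every bit is zero, and the
low-order bits fill back in as the clock advances, though this sketch
says nothing about exactly when the IDs become strong enough again.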

> It seems our best option here is to take some down-time of the AFS
> services and perform a manual, in-place upgrade of the existing
> servers to 1.8 code.  We will freeze these from ongoing automated
> config management (puppet).  This will allow us to deploy and evaluate
> 1.8 servers and keep with the principle of "change one thing at a
> time".  Once we are in a steady state, we should have enough
> redundancy that we can replace the servers one-at-a-time with no, or
> very little, downtime.  This gives us time to write, test and review
> Ansible based deployment as usual.  When we do switch in new Focal
> based servers, we have the advantage we are not also trying to change
> the AFS version too.  In some ways, it works out well (something about
> lemons and lemonade).

It turns out 1.6 and 1.8 share the same protocols and on-disk volume
format, and the old and new keystores for them can coexist
side-by-side as well, so we very well may be able to just do this
in-place or with a rolling/piecemeal upgrade if we want.

> We usually try to have a rollback plan with any upgrades.  Since 1.6
> is not getting updates, and as far as I know we still do not have the
> technology to move reality backwards in the time continuum before the
> problem rollover time (if I do figure out time travel, I will be sure
> to pre-respond to this message that it can be ignored) our revert
> plans seem limited.
[...]

Given the new information we have about the bug ceasing to be a
problem in a couple of weeks, and also the ability to switch freely
between and mix 1.6 and 1.8 servers, it sounds like a rollback won't
necessarily be intractable if we decide it's warranted.

> I'd suggest that I could start looking at this around 2021-01-17
> 21:00 UTC, following the plan laid out in [2], which interested
> parties should definitely review.  This should hopefully be a quiet
> time, and give us a decent runway if we do hit issues.
[...]

This sounds reasonable time-wise, but it also seems like we might be
able to reduce the impact/outage and take it slower if we need to.
-- 
Jeremy Stanley