Next steps with new review server

Clark Boylan cboylan at sapwetik.org
Thu Apr 1 15:20:31 UTC 2021


On Wed, Mar 31, 2021, at 7:27 PM, Ian Wienand wrote:
> Hi,
> 
> We have a large server provided by Vexxhost up and running in a
> staging capacity to replace the current server at
> review02.openstack.org.
> 
> I have started to track some things at [1]
> 
> There's a couple of things:
> 
> 1) Production database
> 
> Currently, we use a hosted db.  Since NoteDB this only stores review
> seen flags.  We've been told that other sites treat this data as
> ephemeral; they use a H2 db on disk and don't worry about backing up
> or restoring across upgrades.
> 
> I have proposed storing this in a mariadb sibling container with [2].
> We know how to admin, backup and restore that.  That would be my
> preference, but I'm not terribly fussed.  Could I request some
> reviews on that?  I'll take +2's as a sign we should use a container;
> otherwise we can leave it with the H2 it has now.

Agreed, sticking with known DB tooling seems like a good idea for ease of operator interaction. I'll try to review this change today.
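
For anyone following along, the rough shape of a sibling container here would be a mariadb service next to gerrit in the compose file, with Gerrit's accountPatchReviewDb url pointed at it. This is only a sketch with made-up names, paths and credentials; the real layout is whatever lands in [2]:

    # hypothetical fragment of the gerrit docker-compose file,
    # not the actual change proposed in [2]
    services:
      mariadb:
        image: mariadb:10.4
        network_mode: host
        environment:
          MYSQL_ROOT_PASSWORD: CHANGE_ME
          MYSQL_DATABASE: reviewdb
          MYSQL_USER: gerrit
          MYSQL_PASSWORD: CHANGE_ME
        volumes:
          - /home/gerrit2/reviewdb:/var/lib/mysql

    # gerrit.config would then point the reviewed-flag store at it,
    # something along the lines of (see the accountPatchReviewDb docs
    # for the exact jdbc url):
    # [accountPatchReviewDb]
    #   url = jdbc:mariadb://localhost:3306/reviewdb

That keeps backups and restores on the same mysql tooling we already use for other services, per Ian's point above.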

> 
> 2) IPv6 issues
> 
> We've seen a couple of cases that are looking increasingly like stray
> RAs are somehow assigning extra addresses, similar to [3].  Our
> mirror in the same region has managed to acquire 50+ default routes
> somehow.
> 
> It seems like inbound traffic keeps working (which may be why we
> haven't seen issues with other production servers?).  But I feel like
> it's a little bit troubling to have this undiagnosed before we switch
> our major service to it.  I'm running some tracing, noted in the
> etherpad, trying to at least catch a stray RA while the server is
> quiet.  Suggestions here are welcome.

Agreed, ideally we would sort this out before any migration completes. I want to say we saw something similar with the mirror in Vexxhost, and the "solution" there was to disable RAs and create a static YAML config for Ubuntu using its new network management config file (netplan)? That seems less than ideal from a cloud perspective, as we can't be the only ones noticing this (in fact, some of our CI jobs may be suffering from something similar, causing some jobs to run long when reaching network resources). I know when we brought this up with the mirror, mnaser suggested static config was fine, but maybe we need to reinforce that this is problematic as a cloud user and see if we can help debug (network traces seem like a good start there).
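
On the tracing side, something like the below should be enough to catch any stray RAs on the wire and see what has accumulated; the interface name is a guess, adjust to the server's actual primary interface:

    # capture ICMPv6 router advertisements (type 134) for later inspection
    sudo tcpdump -i ens3 -vv -w stray-ra.pcap 'icmp6 and ip6[40] == 134'

    # check what addresses and default routes have piled up in the meantime
    ip -6 addr show dev ens3
    ip -6 route show default

And if we do end up falling back to static config again, the netplan side would presumably look something like this sketch (addresses here are placeholders); the key bit is accept-ra: false so stray RAs are ignored:

    # /etc/netplan/50-static.yaml -- hypothetical static config sketch
    network:
      version: 2
      ethernets:
        ens3:
          accept-ra: false
          addresses:
            - 203.0.113.10/24
            - 2001:db8::10/64
          gateway4: 203.0.113.1
          gateway6: 2001:db8::1
          nameservers:
            addresses: [203.0.113.53, 2001:db8::53]

But as noted above, that papers over the cloud-side problem rather than fixing it.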

> 
> -i
> 
> 
> [1] https://etherpad.opendev.org/p/gerrit-upgrade-2021
> [2] https://review.opendev.org/c/opendev/system-config/+/775961
> [3] https://launchpad.net/bugs/1844712


