Parallel production jobs changes

Ian Wienand iwienand at redhat.com
Tue Dec 7 05:58:44 UTC 2021


Thank you for prior reviews and sticking with this rather complicated
fiddling; we ended up with some failures and reverts that I hope have
been addressed:

Firstly, we have a typo in matching install playbooks for our current
roles that has propagated as we copy-paste; a perfunctory fix

 https://review.opendev.org/c/opendev/system-config/+/820281

I am then proposing we rename the job we currently call
infra-prod-install-ansible to infra-prod-bootstrap-bridge.  This is a
back-to-the-future situation, as it was originally called something
like "install-bridge".  I think this better reflects where it will end
up, as hopefully revealed in the following steps

 https://review.opendev.org/c/opendev/system-config/+/820282

After this, we need to run infra-prod-bootstrap-bridge as the base job
before all other production jobs.  To recap -- this will be the
synchronisation point that puts Zuul's checkout of system-config on
the bastion host (bridge) that is then used to deploy all production
systems (eventually, in as parallel a way as possible).
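
As a rough illustration of that ordering only (this is not how Zuul
schedules jobs; run_buildset and the callables passed to it are
hypothetical names made up for this sketch), the idea is one blocking
bootstrap step followed by production jobs that no longer need to wait
for each other:

```python
from concurrent.futures import ThreadPoolExecutor


def run_buildset(bootstrap, production_jobs):
    """Run the bootstrap step once, then the production jobs in parallel.

    Both arguments are hypothetical stand-ins: `bootstrap` is a callable
    for the synchronisation job, `production_jobs` a list of callables.
    """
    # The bootstrap step must finish before anything else starts.
    bootstrap()
    # Only then are the production jobs free to run side by side.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda job: job(), production_jobs))
```

In the real system the ordering comes from Zuul job dependencies, not
from a thread pool; the sketch only shows the shape we are aiming for.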

This revealed what I think is a problem with the original job -- it
runs the install-ansible role via
playbooks/zuul/run-production-playbook.yaml.  This is a
chicken-and-egg problem -- run-production-playbook.yaml uses the
Ansible installed on bridge to run playbooks in system-config to
... install Ansible on bridge.  Addressed with:

 https://review.opendev.org/c/opendev/system-config/+/820320/

this makes infra-prod-bootstrap-bridge a stand-alone job that should

 * install the required version of Ansible on bridge
 * set up system-config to match Zuul's checkout for the buildset

For sanity, we keep the current parent of the infra-prod-* jobs the
same -- this means each infra-prod-* job will still be re-checking-out
system-config as it runs.  We should validate

 * the install of ansible/openstacksdk/etc. is actually idempotent
   (i.e. every run of infra-prod-bootstrap-bridge isn't reinstalling
   everything).

 * infra-prod-bootstrap-bridge always runs first in a deployment
   buildset (i.e. dependencies are correct)

 * infra-prod-bootstrap-bridge correctly puts the right checkout of
   system-config on bridge (as mentioned, for now the other infra-prod
   jobs will continue to overwrite it)
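
For the dependency point, one way to sanity-check it offline might
look like the sketch below.  It assumes the job dependencies have
already been extracted into a plain dict of job name -> list of job
names it depends on; that dict format, the function name and any job
names other than infra-prod-bootstrap-bridge are assumptions for
illustration, not something Zuul or system-config provides.

```python
def jobs_missing_bootstrap(deps, bootstrap="infra-prod-bootstrap-bridge"):
    """Return infra-prod-* jobs that never reach the bootstrap job.

    `deps` is a hypothetical mapping of job name -> list of the jobs it
    depends on.  A job passes if it depends on the bootstrap job either
    directly or through a chain of other jobs.
    """
    def reaches(job, seen):
        if job == bootstrap:
            return True
        if job in seen:
            # Already explored (or part of a cycle); nothing new here.
            return False
        seen.add(job)
        return any(reaches(dep, seen) for dep in deps.get(job, ()))

    return sorted(
        job
        for job in deps
        if job.startswith("infra-prod-")
        and job != bootstrap
        and not reaches(job, set())
    )
```

An empty result would suggest every production job is ordered after
infra-prod-bootstrap-bridge; anything it lists is a job to look at.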

Once we have validated infra-prod-bootstrap-bridge is running as we
like, we can drop the other jobs checking-out code with:

  https://review.opendev.org/c/opendev/system-config/+/820651

and also clean up base-jobs

 https://review.opendev.org/c/opendev/base-jobs/+/820652

At this point, we should be ready to run in parallel (touch wood...)

-i


On Wed, Nov 17, 2021 at 04:04:33PM +1100, Ian Wienand wrote:
> Hi,
> 
> To recap: currently production deployment jobs run sequentially.  Zuul
> starts the job on an executor, which is setup to log into the bastion
> host.  The job sets up the system-config playbooks on the bastion host
> and Ansible is run from there against the production server.
> 
> To run in parallel, each job needs to not assume it owns the
> system-config playbooks on the bastion host.
> 
> Each Zuul *buildset* can use the same system-config playbook checkout
> though.  To achieve this we need to rework the dependencies; each
> production job needs to depend on a common source-setup job.  Once the
> source is setup on the bastion host, the actual production jobs can
> run in parallel.
> 
> To the changes...
> 
> Firstly, I believe we're doing the setup steps for the executor to log
> into bridge twice:
> 
>  https://review.opendev.org/c/opendev/system-config/+/818190
> 
> removes this duplication, and should be safe to merge.
> 
> As pointed out in prior reviews, when running in the periodic or hourly
> pipelines each job overrides that bastion host checkout to master.
> 
>  https://review.opendev.org/c/opendev/base-jobs/+/818189
> 
> moves this step into base-jobs, in preparation for only being done
> once by the separate source-setup job.  I believe this will be safe to
> merge; system-config will just do it again in an idempotent way,
> until:
> 
>  https://review.opendev.org/c/opendev/system-config/+/818191
> 
> merges, which drops this step from system-config.
> 
> We can then merge the system-config job dependency updates in
> 
>  https://review.opendev.org/c/opendev/system-config/+/807672
> 
> This should mean that all jobs not only rely on the correct base jobs,
> but jobs that need certificates, etc. will be relying on the
> letsencrypt job, etc.  This should be safe to merge as nothing should
> actually change, we just have stricter dependencies.
> 
> After this, I think we are ready to refactor the base jobs into the
> two separate steps -- firstly setup the keys on the executor to log
> into the bastion host, then setup the source to use on the bastion
> host:
> 
>   https://review.opendev.org/c/opendev/base-jobs/+/807807
> 
> This initial refactor should be safe to merge as it creates two new
> jobs, but the existing base job keeps running both steps as-is.
> 
> Then we are ready for the penultimate change:
> 
>   https://review.opendev.org/c/opendev/system-config/+/807808
> 
> This updates the system-config jobs to all depend on
> "infra-prod-setup-src" which will be the canonical job that sets up
> the source repository on bridge.o.o.  All other jobs in the buildset
> will depend on this job, ensuring consistency for a run.
> 
> This should also be safe, as it again doesn't actually change
> ordering.
> 
> Once all this is in, we need the final change to enable parallel
> running (and think about correct semaphores between periodic/hourly
> and regular runs).  That is yet to be written, but we have enough to
> get to that point!
> 
> -i



