On Mon, Apr 3, 2023, at 3:16 AM, Ian Wienand wrote:
Hi there,
The wheel cache builds are becoming harder and harder to maintain, so I think we need to re-evaluate what we're doing.
To summarise: currently, for every platform, every day:
* job starts with zuul clone of requirements
* run bindep (for master only? ... probably wrong) and do some further, now rather dubious-looking, setup [1]
* we iterate over master + stable/* and "pip wheel" build each item in requirements, putting the results into a local wheel cache [2] (a rough sketch of this loop follows this list)
* except for arm64, where we only take the latest two branches (the selection of which was recently broken when the "YYYY.X" release naming changed the sort ordering)
* then we grep the build logs to find which .whl files were downloaded from PyPI rather than built locally, and delete them from the local cache
* then we move to the publish step, where we copy the wheels to AFS. This step never removes files, but it does overwrite them (so a .whl is very likely to change every day, as timestamps etc. mean .whl builds are not reproducible) [3]
Rebuilding on some platforms is important, as .so files can move due to library updates and incompatibilities; I have this problem with libre2 on Tumbleweed. That said, for the CI system I don't think any of the platforms have this problem, except maybe CentOS Stream (and that's theoretical -- I suspect they won't do this to Stream either).
* then we make a PyPI index from the files in AFS [4].
* we wait for all the publishing jobs to complete successfully, then release the AFS volumes. If any job fails, we don't publish that day [5]
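To make the build loop above concrete, here is a rough sketch (in Python, not the actual Ansible/shell role) of what the per-branch step does conceptually. The paths, the branch list and the use of upper-constraints.txt are illustrative assumptions rather than the real job's variables:

    # Rough sketch only: check out each requirements branch, "pip wheel"
    # everything it lists, and collect the results in a local cache that is
    # later copied to AFS. Paths and branch names are placeholders.
    import subprocess
    from pathlib import Path

    REQS_REPO = Path("/opt/requirements")   # zuul-cloned openstack/requirements
    CACHE_DIR = Path("/opt/wheel-cache")    # local cache, published to AFS later
    BRANCHES = ["master", "stable/zed", "stable/2023.1"]  # really: master + all stable/*

    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    for branch in BRANCHES:
        subprocess.run(["git", "checkout", branch], cwd=REQS_REPO, check=True)
        # Individual package build failures are tolerated, hence check=False.
        subprocess.run(
            ["pip", "wheel",
             "-r", str(REQS_REPO / "upper-constraints.txt"),
             "--wheel-dir", str(CACHE_DIR)],
            check=False,
        )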
This started a long time ago, when we had a few platforms and a few branches. We now have newton->2023.1 branches in requirements, and we currently do this for 15 different platforms. When you multiply that out, it's not sustainable. Daily build jobs are timing out now, which holds up all publishing (I think the latest release pushed us over the edge).
For some years, we were not pruning the wheels we downloaded from PyPI [6]. If a .whl is already built and published on PyPI, we should get it directly from upstream -- we have a caching proxy set up for CI jobs. I have written a small tool to help us clean up our caches [7]. It would be good if we could audit this tool, and when we're happy with its output we can look at clearing out our caches.
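For reference, the kind of check such a cleanup can do looks roughly like the sketch below. This is not the tool in [7]; it just asks PyPI's JSON API whether it already hosts a file with the same name as each cached wheel. The filename parsing and the exact-filename comparison are deliberate simplifications (a real tool also has to think about platform tags):

    # Minimal sketch, not the actual cleanup tool: drop cached wheels whose
    # exact filename is already hosted on PyPI.
    import json
    import urllib.error
    import urllib.request
    from pathlib import Path

    CACHE_DIR = Path("/opt/wheel-cache")  # illustrative path

    for whl in CACHE_DIR.glob("*.whl"):
        dist, version = whl.name.split("-")[:2]
        project = dist.replace("_", "-").lower()  # approximate name normalisation
        url = f"https://pypi.org/pypi/{project}/{version}/json"
        try:
            with urllib.request.urlopen(url) as resp:
                data = json.load(resp)
        except urllib.error.HTTPError:
            continue  # unknown to PyPI; keep our locally built wheel
        upstream = {f["filename"] for f in data.get("urls", [])}
        if whl.name in upstream:
            print(f"{whl.name} is already on PyPI; removing local copy")
            whl.unlink()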
But that still leaves the question of what goes into them every day. Iterating over every branch is fairly useless. Ideally, we'd have a matrix of platforms vs. branches that gave us an exact mapping of which platforms run jobs on which branches. This does not generally exist; we all have some vague ideas, and the extremes are obvious (we are not running Zed jobs on centos-8, and we are not running Newton jobs on Ubuntu Jammy), but the middle is fuzzy.
I'd like to solicit opinions on what we want this cache to do.
Thinking out loud here: what if we switched to maintaining an explicit list of packages we care about having wheels for: libvirt-python, cryptography, cffi, lxml, etc.? Some of these will already have wheels on PyPI, some will only have wheels for x86 and not ARM, and some won't have wheels at all. We could build the ones that are not available for the current platform and publish those. I suggest this because I suspect that the vast majority of wheels we are building are for sdist-only, pure-Python projects that don't really need wheels published to speed up installation. If this assumption is correct, we might end up with a very small list to maintain that builds quickly and doesn't consume much mirror space.

Another upside to this approach is that we could decouple the wheel mirrors from OpenStack and allow other projects to request wheels. The downside is that this will probably require occasional maintenance on the OpenDev side to approve new packages onto the list. We will also need to maintain a bindep file to build that subset of packages. In some ways we would be duplicating work OpenStack is already doing; this is probably the biggest downside and worth considering.
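As a strawman, the allow-list job could be as small as the sketch below. The list contents and the detection method (try a binary-only download, and build only if that fails) are my assumptions, not anything that exists today:

    # Sketch of the explicit allow-list idea: only build packages that don't
    # already ship a usable wheel for this platform/interpreter.
    import subprocess
    import tempfile
    from pathlib import Path

    ALLOW_LIST = ["libvirt-python", "cryptography", "cffi", "lxml"]  # hypothetical
    CACHE_DIR = Path("/opt/wheel-cache")
    CACHE_DIR.mkdir(parents=True, exist_ok=True)

    for pkg in ALLOW_LIST:
        with tempfile.TemporaryDirectory() as tmp:
            # If pip can fetch a binary wheel, upstream already provides one
            # and there is nothing for us to build or publish.
            have_wheel = subprocess.run(
                ["pip", "download", "--only-binary", ":all:", "--no-deps",
                 "--dest", tmp, pkg],
                capture_output=True,
            ).returncode == 0
        if not have_wheel:
            subprocess.run(
                ["pip", "wheel", "--no-deps", "--wheel-dir", str(CACHE_DIR), pkg],
                check=False,
            )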
One compelling option is to just build master requirements into the cache. The theory is that, as branches are made, each requirement must have passed through master; ergo, since the cache is additive, we will already have the wheels built.
This seems OK, but it also seems that we need a cut-off point. It doesn't seem useful to build "master" on centos-8/xenial, as the requirements there all pin things for Python versions way in advance of what those platforms ship. If we do this, how do we record where a platform stops building master? stable/* requirements shouldn't change much; but if they do, we should push new .whls into the cache -- how do we do that in this model? This also makes our cache "precious", in that we are never rebuilding old branches -- if we lose AFS for some reason, we have a job ahead of us to restore all the old wheels.
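One way to keep that cut-off from living only in people's heads would be a small, explicit mapping in the job configuration. The platforms and values below are purely illustrative:

    # Hypothetical sketch: in a master-only, additive-cache model each
    # platform either still follows master or has been frozen. Unknown
    # platforms default to not building, so additions are explicit,
    # reviewable changes rather than accidents.
    PLATFORM_BUILDS_MASTER = {
        "ubuntu-jammy": True,
        "debian-bullseye": True,
        "centos-8-stream": False,  # master requirements have moved past its Python
        "ubuntu-xenial": False,
    }

    def should_build_master(platform: str) -> bool:
        return PLATFORM_BUILDS_MASTER.get(platform, False)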
I think a perfect solution here might involve making the entire publishing pipeline driven by changes to openstack/requirements. Firstly, there is a non-trivial amount of work in moving the release process from "everything builds and releases or nothing does" to individual builds. I think we can do this with Zuul semaphores; there's a decent chance it was written the current way because mutexes/semaphores weren't available at the time. This would also mean handing off a significant amount of what has traditionally been an infra job to the requirements project. Is anyone interested in working on this?
My only concern with having openstack/requirements drive this is that we have talked in the past about decoupling OpenStack from these builds so that other projects can more easily take advantage of them; driving this from requirements would probably kill those dreams. I'm also not sure I understand how Zuul semaphores help with handling build failures. Instead, maybe we should just always publish, since any wheel we do build should be valid. If we don't build a wheel for X, and Y depends on X, the PyPI indexes upstream of us will already cause X to be used from there. We aren't really gaining much by withholding the wheels we did manage to build from the downstream published wheel cache index.
I welcome any and all suggestions on what we want out of the wheel cache and how we can achieve it :)
-i
[1] https://opendev.org/openstack/openstack-zuul-jobs/src/commit/699e811cb8fd3f0...
[2] https://opendev.org/openstack/openstack-zuul-jobs/src/commit/699e811cb8fd3f0...
[3] https://opendev.org/openstack/project-config/src/branch/master/roles/copy-wh...
[4] https://opendev.org/openstack/project-config/src/commit/6e4748ca35008a4c25e5...
[5] https://opendev.org/openstack/project-config/src/branch/master/playbooks/whe...
[6] https://review.opendev.org/c/openstack/project-config/+/703487
[7] https://review.opendev.org/c/opendev/system-config/+/879239