Hi there,

The wheel cache builds are becoming harder and harder to maintain, so I think we need to re-evaluate what we're doing.

To summarise, currently for every platform, every day:

* the job starts with a Zuul clone of requirements
* we run bindep (for master only? ... probably wrong) and do some more (now-looking-a-bit-dubious) setup [1]
* we iterate over master + stable/* and "pip wheel" build each item in requirements, putting it into a local wheel cache [2]
* except for arm64, where we take only the latest two branches (the choosing of which was recently broken by the change in sort ordering that came with the "YYYY.X" release format)
* we then grep the build logs to find out which .whl files were downloaded from PyPI, and delete them from the local cache
* we then move to the publish step, where we copy the wheels to AFS. This never removes anything, but it does overwrite; so each .whl is very likely to change every day, as timestamps, etc. mean .whl builds are not reproducible [3]
* we then make a PyPI-style index from the files in AFS [4]
* we wait for all the publishing jobs to complete successfully, then we release the AFS volumes. If any job fails, we don't publish that day [5]

This started a long time ago, when we had a few platforms and a few branches. We now have newton->2023.1 branches in requirements, and we currently do this for 15 different platforms. When you multiply that out, it's not sustainable. Daily build jobs are timing out now, which holds up all publishing (I think the latest release pushed us over the edge).

For some years, we were not pruning wheels we downloaded from PyPI [6]. If a .whl is built and on PyPI, we should get it directly from upstream -- we have a caching proxy set up for CI jobs. I have written a small tool to help us clean up our caches [7]. It would be good if we could audit this tool, and when we're happy with its output we can look at clearing out our caches.

But that still leaves what is going into them every day.

Iterating every branch is fairly useless. Ideally, we'd have a matrix of platforms vs. branches that gave us an exact mapping of which platforms run jobs on which branches. This does not generally exist; we all have some vague ideas, and the extremes are obvious (we are not running Zed jobs on centos-8, and we are not running newton jobs on Ubuntu Jammy), but the middle is fuzzy.

I'd like to solicit opinions on what we want this cache to do.

One compelling option is to build only master requirements into the cache. The theory is that as branches are made, every requirement must have passed through master; ergo, since we have an additive cache, we will already have the wheels built. This seems OK, but it also seems that we need a cut-off point. It doesn't seem useful to keep building "master" on centos-8/xenial, as the requirements are all pinning things for Python versions well in advance of what's there. If we do this, how do we track where a platform stops building master? stable/* requirements shouldn't change much; but if they do, we should push new .whls into the cache -- how do we do that in this model? This also makes our cache "precious", in that we are never rebuilding old branches -- if we lose AFS for some reason, we have a job ahead of us to restore all the old wheels.

I think a perfect solution here might involve making the entire publishing pipeline driven by changes to openstack/requirements. Firstly, we have a non-trivial amount of work to figure out how to modify the release process from "everything builds and releases or nothing does" to individual builds. I think we can do this with Zuul semaphores, and there's a decent chance the jobs were written the way they are because mutexes/semaphores weren't available at the time; a rough sketch of what this could look like follows.
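To make that concrete, here is a very rough sketch of what I have in mind -- every name below is made up and nothing here is tested; it's just to illustrate the shape of per-platform jobs, triggered by requirements changes, serialized against the AFS volume with a semaphore:

  # Hypothetical sketch only; job, nodeset and semaphore names are
  # invented for illustration.
  - semaphore:
      name: wheel-cache-afs
      max: 1

  - job:
      name: publish-wheel-cache-ubuntu-jammy
      # Only one publish job writes to the AFS volume at a time.
      semaphore: wheel-cache-afs
      nodeset: ubuntu-jammy
      required-projects:
        - openstack/requirements

  - project:
      name: openstack/requirements
      # Run on merged requirements changes rather than daily;
      # one such job per platform we still care about.
      post:
        jobs:
          - publish-wheel-cache-ubuntu-jammy

Each platform's job could then succeed or fail independently, with the semaphore serializing writes to AFS, instead of the current all-or-nothing daily release.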
This would be a non-trivial amount of work, and would also hand off a significant amount of this from what has traditionally been an infra job to the requirements project. Is anyone interested in working on this?

I welcome any and all suggestions on what we want out of the wheel cache and how we can achieve it :)

-i

[1] https://opendev.org/openstack/openstack-zuul-jobs/src/commit/699e811cb8fd3f0...
[2] https://opendev.org/openstack/openstack-zuul-jobs/src/commit/699e811cb8fd3f0...
[3] https://opendev.org/openstack/project-config/src/branch/master/roles/copy-wh...
[4] https://opendev.org/openstack/project-config/src/commit/6e4748ca35008a4c25e5...
[5] https://opendev.org/openstack/project-config/src/branch/master/playbooks/whe...
[6] https://review.opendev.org/c/openstack/project-config/+/703487
[7] https://review.opendev.org/c/opendev/system-config/+/879239