I'm taking the liberty of contacting you following the email below,
which Clark sent me. I want to add a new volume driver to Cinder, and
for that I need a working CI that validates the Tempest tests against
the Lustre filesystem. I spent two weeks trying to set up this CI with
Software Factory, but I couldn't get it working. However, since the
Lustre FS is 100% open source, I was advised to use the upstream CI
directly.
In Gerrit I have already created a service user with the username
lustreci. Could you tell me how to set up and run the tests directly
in the upstream CI?
For reference, I already have Ansible playbooks that set up a Lustre
FS on CentOS 8.
Thanks in advance,
On 19/04/2023 at 18:01, Clark Boylan wrote:
> I wanted to followup on your questions in
> https://review.opendev.org/c/openstack/cinder/+/853785 as well as your
> query submitted to https://openinfra.dev/projects/contact/.
> As mentioned by Sean on the Gerrit change the Cinder team requires
> working CI for all in tree Cinder drivers. Many of the systems that
> Cinder integrates with are proprietary storage systems which
> necessitates the use of external (to OpenDev) third party CI as
> specialized hardware and licensing requirements don't allow us to run
> these upstream.
> To make this happen you need a CI system that is capable of listening
> to Gerrit events, triggering builds, and reporting the results back to
> Gerrit (for example Zuul/Jenkins/etc). Both OpenDev and the Cinder
> team attempt to provide documentation, but this will always be
> incomplete as we won't be aware of your local network policies,
> hardware peculiarities and so on. The OpenDev team can help with
> connection and account details for Gerrit, and the Cinder team should
> be able to help with test specific needs (like appropriate logging,
> service behavior, etc). I cannot speak to Software Factory as I have
> never personally used it.
> The good news is that Lustre is open source software which can be
> deployed without proprietary licensing restrictions, and there don't
> appear to be specialized hardware needs either. In this case I would
> test Lustre + Cinder in the upstream CI system. It looks like Cinder
> is already doing this with Ceph. My recommendation would be that
> you shift focus from attempting to run a third party CI system to
> adding a new job that runs against Cinder changes to test Cinder +
> Lustre integration.
> This is something that the Cinder team should be able to help with as
> they have a number of Zuul jobs already including the one that tests
> against Ceph. The OpenDev team can help with higher level concerns
> like Gerrit accounts, general Zuul behaviors/syntax, and CI system
> limitations. You can reach out to the OpenDev team either via
> service-discuss@lists.opendev.org or in #opendev on the OFTC IRC
> network (all of this info can be found in the footer of
> https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers)
OSSA Technical Lead
Tel: 07 85 55 35 11
The wheel cache builds are becoming harder and harder to maintain, so
I think we need to re-evaluate what we're doing.
To summarise: currently, for every platform, every day:
* job starts with zuul clone of requirements
* run bindep (for master only? ... probably wrong) and do some more
(now-looking-a-bit-dubious) setup 
* we iterate over master + stable/* and "pip wheel" build each item in
requirements, putting it into a local wheel cache 
* except for arm64, where we take the latest two branches (the
choosing of which was recently broken by the change in sort
ordering from the "YYYY.X" release format)
* then we grep the build logs to find out which .whl files were
downloaded from PyPI, and delete them from the local cache
* then we move to the publish step, where we copy the wheels to AFS.
This never removes, but it does overwrite (so the .whl is very
likely to change every day, as timestamps, etc. mean .whl builds are
not reproducible).
* then we make a PyPI index from the files in AFS.
* We wait for all the publishing jobs to complete successfully, then
we release the AFS volumes. If any fail, we don't publish that day 
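As a rough sketch of the pruning step in the list above (the pip log
format matched here is an assumption, not the job's actual grep):

```python
import re
from pathlib import Path

# pip logs a "Downloading <file>.whl" line for wheels it fetched from
# PyPI rather than built locally (the exact format is an assumption).
DOWNLOADED = re.compile(r"Downloading (\S+\.whl)")

def downloaded_wheels(log_text: str) -> set[str]:
    """Wheel filenames that pip downloaded rather than built."""
    return {Path(match).name for match in DOWNLOADED.findall(log_text)}

def prune_cache(wheelhouse: Path, log_text: str) -> list[str]:
    """Delete wheels that came straight from PyPI, keeping only local builds."""
    removed = []
    for filename in downloaded_wheels(log_text):
        candidate = wheelhouse / filename
        if candidate.exists():
            candidate.unlink()
            removed.append(filename)
    return sorted(removed)
```

In the real jobs this happens once per platform, after all the
branches have been iterated, and before the publish step.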
This started a long time ago, when we had a few platforms and a few
branches. We now have newton->2023.1 branches in requirements, and we
currently do this for 15 different platforms. When you multiply that
out, it's not sustainable. Daily build jobs are timing out now, which
holds up all publishing (I think the latest release pushed us over the
edge).
For some years, we were not pruning wheels we downloaded from PyPI.
If a .whl is built and on PyPI we should get it directly from
upstream -- we have a caching proxy setup for CI jobs. I have written
a small tool to help us clean up our caches. It would be good if
we could audit this tool, and when we're happy with its output we can
look at clearing out our caches.
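For illustration only (this is not the actual cleanup tool), an audit
along these lines could check whether a cached wheel also exists
upstream via PyPI's JSON API; the filename parsing below is
deliberately naive:

```python
import json
from urllib.request import urlopen

def wheel_name_version(filename: str) -> tuple[str, str]:
    """Naively split 'foo_bar-1.2.3-py3-none-any.whl' into ('foo-bar', '1.2.3')."""
    name, version = filename.split("-")[:2]
    return name.replace("_", "-").lower(), version

def on_pypi(filename: str) -> bool:
    """True if PyPI serves this exact wheel file (makes a network call)."""
    name, version = wheel_name_version(filename)
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urlopen(url) as resp:
        data = json.load(resp)
    return any(entry["filename"] == filename for entry in data["urls"])
```

Any cached wheel for which on_pypi() is true is a candidate for
removal, since the caching proxy will serve it anyway.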
But that still leaves what is going into them every day. Iterating
every branch is fairly useless. Ideally, we'd have a matrix of
platforms vs. branches that gave us an exact mapping of which
platforms run jobs on which branches. This does not generally exist;
we all have
some vague ideas and the extremes are obvious (we are not running Zed
jobs on centos-8, and we are not running newton jobs on Ubuntu Jammy)
but the middle is fuzzy.
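To make the idea concrete, such a matrix could be as simple as a
dict; the platform/branch assignments below are invented for
illustration:

```python
# Hypothetical platform -> supported-branches mapping; the real one
# does not exist yet, which is the problem described above.
SUPPORTED = {
    "ubuntu-jammy": {"master", "stable/2023.1", "stable/zed"},
    "centos-8-stream": {"stable/xena", "stable/yoga"},
    "ubuntu-xenial": {"stable/newton", "stable/ocata"},
}

def branches_to_build(platform: str) -> set[str]:
    """Which requirements branches get wheels built on this platform."""
    return SUPPORTED.get(platform, set())
```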
I'd like to solicit opinions on what we want this cache to do.
One compelling option is to just build master requirements into the
cache. The theory being that as branches are made, the requirement
must have passed through master; ergo as we have an additive cache we
will have wheels built.
This seems OK, but it also seems that we need a cut-off point. It
doesn't seem useful to keep building "master" on centos-8/xenial, as
the requirements all pin things for Python versions way in advance of
what's there. If we do this, how do we maintain where a platform
stops building master? stable/* requirements shouldn't change much;
but if they do, we should push new .whls into the cache -- how do we
do that in this model? This also makes our cache "precious" in that
we are never building old branches -- if we lose AFS for some reason,
we have a job ahead of us to restore all the old wheels.
I think a perfect solution here might involve making the entire
publishing pipeline driven by changes to openstack/requirements.
Firstly, we have a non-trivial amount of work to figure out how to
move the release process from "everything builds and releases or
nothing does" to individual builds. I think we can do this with Zuul
semaphores, and there's a decent chance it was written the way it is
because mutexes/semaphores weren't available at the time. This would
also mean handing off a significant amount of this from what has
traditionally been an infra job to the requirements project. Is
anyone interested in working on this?
I welcome any and all suggestions on what we want out of the wheel
cache and how we can achieve it :)