elastic-recheck maintenance takeover

Tue Aug 11 23:18:42 UTC 2020

On Tue, Aug 11, 2020, at 2:42 PM, Sean Mooney wrote:
> does the tool actually need maintenance, or do we just need to write
> more queries and submit them to the tool when we hit a bug? We used to
> have a strong recommendation that you never recheck a patch without
> first inspecting why it failed, and that when you leave a recheck you
> add a reference to a bug. There was then a further recommendation that
> if we see the same bug being rechecked, we add an elastic-recheck query
> referencing the bug (which you were meant to file if there was not
> already one), so that if others hit the same error, elastic-recheck
> comments to let them know about the known issue.

It's a bit of both. The deployment of the tool could be modernized to Python 3, along with the related code updates. Ideally we'd also adjust it to be less OpenStack specific. A big part of that would be loading its queries from configs somewhere else rather than from the same repo. This way we can host queries for Airship, StarlingX and Zuul too if we want. On the query side of things, OpenStack's gate has a >50% unclassified status according to e-r right now.

I'm happy if those who find it useful continue to keep it running, whether that is OpenStack specific or something different.

> 
> for example, if you look at the neutron docs
> https://docs.openstack.org/neutron/pike/contributor/policies/gerrit-recheck.html
> 
> "Please, do not recheck without providing the bug number for the failed 
> job. For example, do not just put an empty
> “recheck” comment but find the related bug number and put a “recheck 
> bug ######” comment instead. If a bug does not
> exist yet, create one so other team members can have a look. It helps 
> us maintain better visibility of gate failures"

We stopped this convention because e-r was working better. Humans were constantly identifying the wrong bugs or not attempting to be accurate, and that data ended up being incredibly noisy. What we found instead was that the elastic-recheck data was far more accurate, and it was better to invest in that. This worked really well as long as people were investing in it. I would not recommend we go back to human recheck tracking, as it will just be noisy and lead to bad assumptions.

> 
> nova used to have a similar doc but I can't find it currently. I think
> we might have removed it and directed people to the centralised
> contributor guide at some point. The main contributor guide docs
> describe how to add a new recheck query:
> https://docs.openstack.org/contributors/code-and-documentation/elastic-recheck.html
> 
> But I don't think it is common knowledge that we should add new
> queries, file bugs, or at a minimum state the reason for the recheck
> before rechecking, as at least in nova we have not pushed contributors
> to do this consistently for 2-3 years. When I first started working on
> OpenStack it was common knowledge, and the core teams and others
> explained that you should avoid doing an empty recheck, but the last
> time I brought this up downstream internally, only about half of my
> team knew that the convention used to exist.
> 
> 
> On Tue, 2020-08-11 at 20:16 +0100, Sorin Sbarnea wrote:
> > To be clear, the goal is to drive maintenance of the tool, not to "control" it. Working in sync with opendev-infra is
> > something I assumed.
> > 
> > As you rightly said, there is a significant deployment / operational part, which is currently done with Puppet and is for
> > good reason managed by the opendev team. One of my goals was to simplify that part and migrate to Ansible, with container
> > conversion being part of it.
> > 
> > This is highly dependent on opendev and I am confident that I will get the needed help on achieving that goal.
> > 
> > Sorin
> > 
> > > On 11 Aug 2020, at 19:53, Jeremy Stanley <fungi at yuggoth.org> wrote:
> > > 
> > > service relies represent a substantial chunk of our overall "control