elastic-recheck maintenance takeover

Wed Aug 12 00:05:33 UTC 2020

On Tue, 2020-08-11 at 16:18 -0700, Clark Boylan wrote:
> On Tue, Aug 11, 2020, at 2:42 PM, Sean Mooney wrote:
> > Does the tool actually need maintenance, or do we just need to write more
> > queries and submit them to the tool when we hit
> > a bug? We used to have a strong recommendation that you never recheck a
> > patch without first inspecting why it
> > failed, and that when you leave a recheck you add a reference to a bug. There was
> > then a further recommendation that if we
> > see the same bug being rechecked we add an elastic-recheck query
> > referencing the bug (which you were meant to file if there
> > was not already one) so that if others hit the same error, elastic-recheck
> > comments letting them know about the known issue.
> 
> It's a bit of both. The deployment of the tool could be modernized to Python 3, along with related code updates. Ideally
> we'd also adjust it to be less openstack specific. A big part of that would be loading its queries from configs
> somewhere and not in the same repo. This way we can host queries for airship and starlingx and zuul too if we want. On
> the query side of things OpenStack's gate has a >50% unclassified status according to e-r right now.
> 
> I'm happy if those that find it useful continue to keep it running. Whether that is openstack specific or something
> different.
> 
> > 
> > For example, if you look at the Neutron docs:
> > https://docs.openstack.org/neutron/pike/contributor/policies/gerrit-recheck.html
> > 
> > "Please, do not recheck without providing the bug number for the failed 
> > job. For example, do not just put an empty
> > “recheck” comment but find the related bug number and put a “recheck 
> > bug ######” comment instead. If a bug does not
> > exist yet, create one so other team members can have a look. It helps 
> > us maintain better visibility of gate failures"
> 
> We stopped this convention because e-r was working better. Humans were constantly identifying the wrong bugs or not
> attempting to be accurate and that data ended up being incredibly noisy. What we found instead was that the elastic-
> recheck data was far more accurate and it was better to invest in that. This worked really well as long as people were
> investing in it. I would not recommend we go back to human recheck tracking as it will just be noisy and lead to bad
> assumptions.

Well, the downside I have found is that people now often don't look at why it failed and just recheck.

I try to at least do:
recheck <job name> failed because <whatever failed>
 
I don't necessarily reference a bug or always file one,
but I try to at least flag why it failed.
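
To make the three styles of recheck comment discussed here concrete (an empty "recheck", "recheck bug ######", and "recheck <reason>"), here is a minimal sketch of how such comments could be told apart. The pattern and the classify_recheck helper are hypothetical, written only for illustration; this is not how Zuul or elastic-recheck actually parse comments.

```python
import re

# Hypothetical pattern: the word "recheck", optionally followed by either
# "bug <number>" or free-form text giving the reason for the recheck.
RECHECK_RE = re.compile(
    r"^\s*recheck\b\s*(?:bug\s+#?(?P<bug>\d+)|(?P<reason>.+))?\s*$",
    re.IGNORECASE,
)


def classify_recheck(comment):
    """Return (kind, detail) for a single-line review comment.

    kind is one of "not-a-recheck", "empty", "bug", or "reason".
    """
    match = RECHECK_RE.match(comment)
    if match is None:
        return ("not-a-recheck", None)
    if match.group("bug"):
        return ("bug", match.group("bug"))
    reason = match.group("reason")
    if reason and reason.strip():
        return ("reason", reason.strip())
    return ("empty", None)


if __name__ == "__main__":
    for text in (
        "recheck",
        "recheck bug 1234567",
        "recheck tempest-full failed because of a volume attach timeout",
    ):
        print(text, "->", classify_recheck(text))
```

Something like this could be used to count how many rechecks on a project carry no explanation at all, which is the habit described above.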

> 
> > 
> > Nova used to have a similar doc, but I can't find it currently. I think
> > we might have removed it and directed people to
> > the centralised contributor guide at some point. The main contributor
> > guide docs describe how to add a new recheck query:
> > https://docs.openstack.org/contributors/code-and-documentation/elastic-recheck.html
> > 
> > But I don't think it is common knowledge that we should add new queries,
> > file bugs, or at a minimum state the reason for
> > the recheck before rechecking, as at least in Nova we have not pushed
> > contributors to do this consistently for 2-3 years.
> > When I first started working on OpenStack it was common knowledge, and
> > the core teams and others explained that you should
> > avoid doing an empty recheck, but the last time I brought this up
> > downstream internally only about half of my team knew
> > that that convention used to exist.
> > 
> > 
> > On Tue, 2020-08-11 at 20:16 +0100, Sorin Sbarnea wrote:
> > > To be clear, the goal is to drive maintenance of the tool, not to "control" it. Working in sync with opendev-infra
> > > is something I assumed.
> > > 
> > > As you rightly said, there is a significant deployment / operational part which is currently done with puppet and
> > > which is, for good reason, managed by the opendev team. One of my goals was to simplify that part and migrate to
> > > ansible, with container conversion being part of it.
> > > 
> > > This is highly dependent on opendev and I am confident that I will get the needed help on achieving that goal.
> > > 
> > > Sorin
> > > 
> > > > On 11 Aug 2020, at 19:53, Jeremy Stanley <fungi at yuggoth.org> wrote:
> > > > 
> > > > service relies represent a substantial chunk of our overall "control
> > > 
> > > 
> > 
> > 
> > 
> 
>