Opendev review crawler

Ian Wienand iwienand at redhat.com
Tue Nov 3 23:45:16 UTC 2020


On Tue, Nov 03, 2020 at 06:42:15PM +0000, Jeremy Stanley wrote:
> If you still have a lot left to query, you might try to introduce a
> bit of a pause between each batch of paginated results to give the
> server some time to catch up freeing memory for caching other
> requests.

I'd also suggest updating your script to send a User-Agent header with
your contact details, similar to other robots.
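
To give a rough idea, a polite paginated loop might look something
like the sketch below.  This is just an illustration; the "S" offset
parameter and the "_more_changes" marker are from my memory of the
Gerrit REST API docs, and the query, page size, pause and User-Agent
string are placeholders, so double-check against the documentation:

  import json
  import time

  import requests

  BASE = "https://review.opendev.org/changes/"
  HEADERS = {
      # Identify the crawler and give operators a way to contact you.
      "User-Agent": "my-review-crawler/0.1 (mailto:you@example.org)",
  }
  PAGE_SIZE = 100   # results per request
  PAUSE = 5         # seconds between requests, to go easy on the server

  def fetch_all(query="status:merged"):
      offset = 0
      while True:
          resp = requests.get(
              BASE,
              headers=HEADERS,
              params={"q": query, "n": PAGE_SIZE, "S": offset},
              timeout=60,
          )
          resp.raise_for_status()
          # Gerrit prefixes JSON responses with a ")]}'" anti-XSSI line.
          changes = json.loads(resp.text.split("\n", 1)[1])
          yield from changes
          # The last element carries _more_changes when more pages exist.
          if not changes or not changes[-1].get("_more_changes"):
              break
          offset += len(changes)
          time.sleep(PAUSE)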

> Also if obtaining the data in a different form would be
> useful to you, we might be able to look into performing a dump of
> specific database tables so you don't have to slowly trickle it out
> of the API.

I'd agree with this.  Note that we are still using Gerrit 2.16, which
stores all this information in SQL, but we will soon (~20th Nov)
upgrade to the 3.0 branch, which uses NoteDb to store the change
information in the git trees themselves.  This might be easier for you
to work with once the upgrade is complete; you can find the project
list at [1].
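
Once we're on 3.0, pulling a change's review metadata straight out of
git might look roughly like the sketch below.  The
refs/changes/<last-two-digits>/<number>/meta layout is my recollection
of how NoteDb shards change refs, and the repo URL and change number
are placeholders, so treat it as a starting point only:

  import subprocess
  import tempfile

  def fetch_change_meta(repo_url, change_number):
      # NoteDb keeps review metadata as a commit history on a "meta" ref.
      shard = "%02d" % (change_number % 100)
      ref = "refs/changes/%s/%d/meta" % (shard, change_number)
      with tempfile.TemporaryDirectory() as workdir:
          subprocess.run(["git", "init", "-q", workdir], check=True)
          subprocess.run(["git", "-C", workdir, "fetch", repo_url, ref],
                         check=True)
          # FETCH_HEAD now points at the meta ref; its log is the review
          # history (votes, comment metadata, status changes).
          subprocess.run(["git", "-C", workdir, "log", "--patch",
                          "FETCH_HEAD"], check=True)

  # e.g. fetch_change_meta("https://review.opendev.org/opendev/system-config",
  #                        12345)   # placeholder change number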

Probably it would be easiest for us if you gave us a SQL query you'd
like us to run; we can double-check it and provide the results.  I'm
not really sure where to point you for the schema; I don't think
Gerrit has explicit documentation for it.  You could probably use our
docker image at [2] to stand it up and then fiddle with SQL queries
directly?
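
If it helps as a starting point, the sort of thing I mean is below.
The table and column names are only my guesses at the 2.16 ReviewDB
schema, and the connection details are placeholders, so please verify
against a locally stood-up instance before sending us anything:

  import pymysql  # third-party driver: pip install pymysql

  # Guessed ReviewDB layout: a "changes" table with per-change metadata.
  QUERY = """
  SELECT change_id, dest_project_name, status, created_on, last_updated_on
  FROM changes
  WHERE created_on >= '2020-01-01'
  """

  def run_query(host, user, password, database="reviewdb"):
      conn = pymysql.connect(host=host, user=user, password=password,
                             database=database)
      try:
          with conn.cursor() as cur:
              cur.execute(QUERY)
              for row in cur.fetchall():
                  print(row)
      finally:
          conn.close()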

-i

[1] https://github.com/openstack/project-config/blob/master/gerrit/projects.yaml
[2] https://registry.hub.docker.com/r/opendevorg/gerrit



