Dear Jeremy, Ian,

Thank you for your kind support and for removing the firewall rules that blocked my access. I will add more delay and use a User-Agent header with my contact details in my future requests, as you recommended. I also agree that a dump of specific database tables would be great: it could encourage other researchers to conduct more studies on code review and benefit from the rich review data available on OpenDev projects.

Kind regards,
Moataz

________________________________
From: Jeremy Stanley
Sent: Wednesday, November 4, 2020 12:12 PM
To: service-discuss@lists.opendev.org
Cc: Chouchen, Moataz; Ouni, Ali; Laurin, François
Subject: Re: Opendev review crawler

On 2020-11-04 00:02:33 +0000 (+0000), Chouchen, Moataz wrote:
[...]
For the purpose of this study, we need to collect the whole OpenDev data set. So far we have collected data from 2020 back to 2015 and still need the remaining years. As requested, we will add more waiting time to our process to let the server catch up. If you have any other suggestions that would help, we will be happy to try them.
[...]
Just adding that delay/throttle will probably suffice, but Ian's suggestion to set a custom user agent string with contact info on your API requests can also help avoid confusion and make it easier for site administrators to reach you if there's a problem with your queries. I've gone ahead and removed the firewall rules we had temporarily blocking your IP addresses as well. If you have any questions, please don't hesitate to get in touch through our service-discuss mailing list. Good luck with the research too, I can't wait to read your findings!

--
Jeremy Stanley
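For readers following the thread, the two recommendations above (a delay between requests and a user agent string carrying contact info) might look roughly like the sketch below. It is not the crawler discussed here: the user agent text, the contact address, the 5-second default delay and the function names are all illustrative assumptions, and only Python standard-library calls are used.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator

# Hypothetical value: replace with your own project name and contact address.
USER_AGENT = "review-crawler/0.1 (research project; contact: researcher@example.org)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Return a GET request that identifies the crawler and how to reach its operator."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def throttled(urls: Iterable[str], delay_seconds: float,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield URLs one at a time, pausing delay_seconds between consecutive items."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # give the server time to catch up
        yield url


def crawl(urls: Iterable[str], delay_seconds: float = 5.0) -> Iterator[bytes]:
    """Fetch each URL sequentially with the identifying header and a fixed delay.

    The 5-second default is an assumption, not a value given in this thread.
    """
    for url in throttled(urls, delay_seconds):
        with urllib.request.urlopen(build_request(url), timeout=30) as response:
            yield response.read()
```

Keeping the requests strictly sequential, with no parallel workers, is the simplest way to make sure the delay actually bounds the load on the server. The sleep function is passed in as a parameter only so the pacing can be checked without waiting.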
Thanks again for implementing our earlier recommendations: setting a recognizable user agent string on your queries and spacing out requests by adding a delay. Unfortunately, we've needed to temporarily block connections from your IP address again. Following our upgrade from Gerrit 2.13 to 3.2, we're seeing an overwhelming increase in memory pressure, which seems to coincide with the times at which your queries started again on Monday. We're going to be working on improving some aspects of memory management for the service, so we'd appreciate it if you could put those Gerrit queries on hold while we do so. In the meantime, we should discuss what alternative methods we have for providing the data you're seeking. Thanks for understanding, and apologies for the inconvenience.

--
Jeremy Stanley
Participants (2): Chouchen, Moataz; Jeremy Stanley