Use timestamp in aggregate table#2
Conversation
aaronharsh
left a comment
There was a problem hiding this comment.
@zach - Sorry to retract my approval, but I just realized we're still using max_https_crawl_id in the internal repo. Is there a ddg.git PR to go along with this one?
Also, CONTRIBUTING.md in this repo still references max_https_crawl_id.
|
We will alter table and restart the service(s). Shouldn't need an additional PR. Updated CONTRIBUTING. |
|
My point was that some internal code still works with the removed field. If you drop https://dub.duckduckgo.com/duckduckgo/ddg/blob/bttf/components/https/create-https-queue.pl#L75 |
|
That specific script is on deck for an update beyond the ID. We can run it prior to merging the internal PR to give us a little leeway but, if necessary, the queue can be populated manually. |
|
It sounds like it's expected that this change makes the schema incompatible with the internal scripts like I get the motivation. It would be nice if the code was in a working state, at the very least because it makes it tougher for me to review. Would you like help cleaning it up? In particular, I'm thinking things like:
|
|
@aaronharsh Let me see if I can clear up the confusion.
|
No longer need the crawl ID since we are aggregating directly. We usually need to decode it into a timestamp anyhow.