Brainstorm time. Express server that has 2 endpoints:
/register
- Receives the serialized function (the callback), evals it once, keeps it in memory, and returns an id.
- This only exists so /request doesn't have to send the whole function every time: /request takes the id and calls the already-evaled function.
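Here is a minimal sketch of the /register side. It assumes the request body is the raw function source (for example `fn.toString()`). Using a SHA-256 hash of the source as the id is an assumption: it makes re-registering the same code return the same id. `registerFunction` and `getFunction` are hypothetical names, and the Express route wiring is left out. This evaluates arbitrary code with the full privileges of the process, so it only makes sense for trusted callers.

```javascript
const crypto = require('node:crypto');

// In-memory registry: id -> { source, fn }. Lost on restart/migration.
const registry = new Map();

// Turn serialized source (e.g. the output of fn.toString()) back into a function.
// WARNING: this runs arbitrary code with the full privileges of the process.
function compile(source) {
  // The newline before ')' keeps a trailing `// comment` in the source harmless.
  const fn = new Function(`"use strict"; return (${source}\n);`)();
  if (typeof fn !== 'function') {
    throw new TypeError('registered payload must evaluate to a function');
  }
  return fn;
}

// Content-addressed id (an assumption): the same source always maps to the
// same id, so registering it twice does not create a second entry.
function registerFunction(source) {
  const id = crypto.createHash('sha256').update(source).digest('hex');
  if (!registry.has(id)) {
    registry.set(id, { source, fn: compile(source) });
  }
  return id;
}

// Look up a previously registered function by id (what /request would use).
function getFunction(id) {
  const entry = registry.get(id);
  if (!entry) throw new Error(`unknown function id: ${id}`);
  return entry.fn;
}
```

With Express this could sit behind something like `app.post('/register', express.text(), (req, res) => res.json({ id: registerFunction(req.body) }))`. Functions can't be posted to worker_threads (structured clone rejects them), so a thread pool would still need the stored `source` and would compile its own copy in each worker.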

/request
- Queue each request in memory, FIFO, in the same RequestOptions format.
- Runs on a 32 GB instance with a worker_threads pool of 7-8 threads, in a work-stealing manner.
- Use a SharedArrayBuffer holding the JSON as a string, and JSON.parse it inside the worker?
- Two-way communication can work with a MessageChannel. Call Apify.pushData when the main thread receives a result message, or inside the worker? (might overload the dataset endpoint)
- On migration, persist the remaining requests to RL, keyed by the same id from /register, so we know which function each request belongs to and can load them back from the KV store.
- How are retries done? Do failed requests go back to the queue, or keep retrying until they give up?
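- The "main" process (i.e. another scraper/task) can register the code by serializing the function with toString(). Only the function's own source is sent, so it can't rely on closure variables or imports from the caller:
  ```js
  const res = await fetch(registerUrl /* hypothetical: container url + /register */, {
    method: 'POST', // fetch rejects a GET request that has a body
    headers: { 'Content-Type': 'text/plain' },
    body: myWorkerCode.toString(),
  });
  const { id } = await res.json(); // assumes /register replies with JSON { id }
  ```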
- Since it's fire-and-forget (the data is pushed to the designated dataset), how could de-duplication work?
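This is a sketch for the FIFO queue and thread pool bullets, with one caveat: it is not work stealing. It swaps in a simpler pull-based scheme. There is one shared FIFO queue in the main thread, and idle workers take the next job, which balances load in a similar way when jobs are independent. `WorkerPool`, the inline `WORKER_SOURCE` and the message shapes are hypothetical. Each worker compiles a registered function the first time it sees its id, so the source crosses threads once per worker rather than once per request. On the 32 GB box each worker gets its own V8 heap, and the `Worker` constructor's `resourceLimits` option can cap it.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body, run with { eval: true } so the sketch stays in one file.
// Functions can't be posted between threads, so each worker compiles the
// source once per fn id and caches it.
const WORKER_SOURCE = `
const { parentPort } = require('node:worker_threads');
const compiled = new Map();
parentPort.on('message', async ({ jobId, fnId, source, payload }) => {
  try {
    if (source !== undefined) {
      compiled.set(fnId, new Function('"use strict"; return (' + source + '\\n);')());
    }
    const result = await compiled.get(fnId)(payload);
    parentPort.postMessage({ jobId, result });
  } catch (err) {
    parentPort.postMessage({ jobId, error: String(err) });
  }
});
`;

// Pull-based pool: one shared FIFO queue, idle workers take the next job.
// Not handled here: worker crashes ('error'/'exit' events) and timeouts.
class WorkerPool {
  constructor(size, getSource) {
    this.getSource = getSource; // fnId -> source string (from /register)
    this.queue = []; // FIFO; shift() is O(n), acceptable for a sketch
    this.idle = [];
    this.known = new Map(); // worker -> Set of fn ids it has compiled
    this.pending = new Map(); // jobId -> { resolve, reject }
    this.nextJobId = 0;
    this.workers = [];
    for (let i = 0; i < size; i++) {
      const worker = new Worker(WORKER_SOURCE, { eval: true });
      this.known.set(worker, new Set());
      worker.on('message', ({ jobId, result, error }) => {
        const { resolve, reject } = this.pending.get(jobId);
        this.pending.delete(jobId);
        if (error === undefined) resolve(result);
        else reject(new Error(error));
        this.idle.push(worker); // worker is free again: pull the next job
        this.drain();
      });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Enqueue a job; resolves with the function's return value.
  run(fnId, payload) {
    return new Promise((resolve, reject) => {
      const jobId = this.nextJobId++;
      this.pending.set(jobId, { resolve, reject });
      this.queue.push({ jobId, fnId, payload });
      this.drain();
    });
  }

  // Hand queued jobs to idle workers in arrival order.
  drain() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const job = this.queue.shift();
      const known = this.known.get(worker);
      // Send the source only the first time this worker sees the fn id.
      const source = known.has(job.fnId) ? undefined : this.getSource(job.fnId);
      known.add(job.fnId);
      worker.postMessage({ ...job, source });
    }
  }

  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}
```

Results must be structured-cloneable to come back over `postMessage`. A function or similar value is turned into a rejected job by the worker's `catch`.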
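On the SharedArrayBuffer question, here is a sketch of a single slot. The writer stores the JSON as UTF-8 bytes behind a 4-byte length header and publishes the length with `Atomics.store`. The reader, which would run in the worker, reads the length with `Atomics.load`, copies the bytes out and calls `JSON.parse`. The layout and the function names are assumptions. A real queue would need a ring buffer and a handshake before a slot is reused. A string sent through `postMessage` is copied by structured clone, and `JSON.parse` costs the same either way, so whether the shared buffer actually wins is worth measuring first.

```javascript
// Assumed layout: bytes 0-3 hold the payload length (Int32),
// the UTF-8 encoded JSON starts at byte 4.
const HEADER_BYTES = 4;

function createJsonBuffer(capacityBytes) {
  return new SharedArrayBuffer(HEADER_BYTES + capacityBytes);
}

// Writer side (main thread).
function writeJson(sab, value) {
  const bytes = Buffer.from(JSON.stringify(value), 'utf8');
  const body = new Uint8Array(sab, HEADER_BYTES);
  if (bytes.length > body.length) {
    throw new RangeError(`payload of ${bytes.length} bytes exceeds buffer capacity`);
  }
  body.set(bytes);
  const header = new Int32Array(sab, 0, 1);
  // Publish the length after the bytes: per the JS memory model, a reader whose
  // Atomics.load observes this store also observes the bytes written before it.
  Atomics.store(header, 0, bytes.length);
  // Wake a worker blocked in Atomics.wait(header, 0, ...), if there is one.
  Atomics.notify(header, 0);
}

// Reader side (would run inside the worker).
function readJson(sab) {
  const length = Atomics.load(new Int32Array(sab, 0, 1), 0);
  // Copy into a regular Buffer before decoding. Reusing the slot while a
  // reader is still copying would need a handshake (not shown).
  const bytes = Buffer.from(new Uint8Array(sab, HEADER_BYTES, length));
  return JSON.parse(bytes.toString('utf8'));
}
```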
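For the pushData question, one option is to keep pushData on the main thread and batch. Workers send results over a `MessageChannel` port, and a sink buffers them and flushes one array per batch. `Apify.pushData` accepts an array of objects, so a batch costs one dataset call instead of one per item. `BatchingSink`, `connectSink` and the `{ type: 'result', item }` message shape are hypothetical. The push function is injected so the sketch runs without the SDK.

```javascript
const { MessageChannel } = require('node:worker_threads');

// Buffers result items and flushes them in batches, so the dataset endpoint
// gets one call per batch instead of one per item. `pushBatch` would be e.g.
// (items) => Apify.pushData(items); it is injected so this runs without the SDK.
class BatchingSink {
  constructor(pushBatch, { maxItems = 100, maxDelayMs = 1000 } = {}) {
    this.pushBatch = pushBatch;
    this.maxItems = maxItems;
    this.maxDelayMs = maxDelayMs;
    this.buffer = [];
    this.timer = null;
  }

  add(item) {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxItems) return this.flush();
    // Flush a partial batch after maxDelayMs so results don't wait forever.
    if (!this.timer) {
      this.timer = setTimeout(() => {
        this.flush().catch((err) => console.error('dataset flush failed', err));
      }, this.maxDelayMs);
    }
    return Promise.resolve();
  }

  flush() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.buffer.length === 0) return Promise.resolve();
    const batch = this.buffer;
    this.buffer = [];
    // On failure this batch is lost; a real version would retry or persist it.
    return Promise.resolve().then(() => this.pushBatch(batch));
  }
}

// Main-thread end of the channel. The other port would be transferred to a
// worker, e.g. worker.postMessage({ port: port2 }, [port2]).
function connectSink(port, sink) {
  port.on('message', (msg) => {
    if (msg && msg.type === 'result') {
      sink.add(msg.item).catch((err) => console.error('dataset flush failed', err));
    }
  });
}
```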
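For migration there is one catch: the evaled functions are only in memory too. The function source has to be persisted next to the pending requests, or the ids can't be resolved after the restart. The sketch below groups pending jobs by function id and stores each group together with its source. `kv` is assumed to expose async `setValue(key, value)` / `getValue(key)` in the shape of `Apify.setValue` / `Apify.getValue`. The `PENDING-*` key names are made up.

```javascript
// Persist pending jobs grouped by function id, each group stored together with
// the function source so both can be restored after a migration.
async function persistState(kv, pendingJobs, sources) {
  const byFn = new Map(); // fnId -> payloads, in queue order
  for (const job of pendingJobs) {
    if (!byFn.has(job.fnId)) byFn.set(job.fnId, []);
    byFn.get(job.fnId).push(job.payload);
  }
  for (const [fnId, payloads] of byFn) {
    await kv.setValue(`PENDING-${fnId}`, { source: sources.get(fnId), payloads });
  }
  // Write the index last, so a restore never points at a group not yet written.
  await kv.setValue('PENDING-INDEX', [...byFn.keys()]);
}

// On startup: read the index, then each group. The caller would re-register
// `source` under `fnId` and re-enqueue `payloads` in their original order.
async function restoreState(kv) {
  const fnIds = (await kv.getValue('PENDING-INDEX')) || [];
  const restored = [];
  for (const fnId of fnIds) {
    const group = await kv.getValue(`PENDING-${fnId}`);
    if (group) restored.push({ fnId, source: group.source, payloads: group.payloads });
  }
  return restored;
}
```

This could be hooked to the SDK's migration event, e.g. `Apify.events.on('migrating', () => persistState(...))`. The exact event API depends on the SDK version in use.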
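On retries, here is one possible answer, which is an assumption and not a decision made in these notes. A failed job goes back to the end of the FIFO queue with an attempt counter and a backoff timestamp, so a failing request doesn't hold a worker while it retries. It is given up after `maxRetries`. `onJobFailed`, `attempts` and `notBefore` are hypothetical names. The dispatcher would have to skip jobs whose `notBefore` is still in the future.

```javascript
// Retry policy sketch: re-enqueue at the back of the queue instead of retrying
// in place. `attempts` counts failures so far; after more than maxRetries
// failures the job is moved to `failed` and dropped from the queue.
function onJobFailed(job, error, queue, failed, { maxRetries = 3, baseDelayMs = 1000 } = {}) {
  const attempts = (job.attempts || 0) + 1;
  if (attempts > maxRetries) {
    failed.push({ ...job, attempts, error: String(error) });
    return false; // given up
  }
  // Exponential backoff: 1x, 2x, 4x ... baseDelayMs after each failure.
  const notBefore = Date.now() + baseDelayMs * 2 ** (attempts - 1);
  queue.push({ ...job, attempts, notBefore });
  return true; // re-enqueued
}
```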
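For de-duplication, one approach is to derive a deterministic key from the function id plus a canonical JSON form of the request options, with object keys sorted. Requests whose key was already seen are dropped. This is similar in spirit to a request's `uniqueKey` in Apify. The hashing scheme, `canonicalJson` and `Deduper` are assumptions. The seen-set is in memory, so it would need persisting on migration as well. It also de-dups requests, not the dataset items they produce.

```javascript
const crypto = require('node:crypto');

// Stable JSON: object keys sorted recursively, so {a, b} and {b, a} give the
// same string. Assumes plain JSON-safe values (no undefined, functions, Dates).
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

// Key = hash of fn id + canonical request options.
function dedupKey(fnId, requestOptions) {
  return crypto.createHash('sha256').update(`${fnId}\n${canonicalJson(requestOptions)}`).digest('hex');
}

class Deduper {
  constructor() {
    this.seen = new Set();
  }

  // Returns true the first time a (fnId, options) pair is seen, false afterwards.
  admit(fnId, requestOptions) {
    const key = dedupKey(fnId, requestOptions);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```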