Linkhut.Archiving.Pipeline
(linkhut v0.1.4)
Orchestrates the archiving pipeline from URL validation through crawler dispatch. Called by the Archiver worker.
Functions
@spec run(Linkhut.Archiving.CrawlRun.t(), keyword()) :: {:ok, map()} | {:error, term()}
Runs the full archiving pipeline for the given crawl run.
- Validates the URL (SSRF check)
- Sends a preflight request to obtain content_type, final_url, and status
- Repeats the SSRF check on final_url
- Selects eligible crawlers via can_handle?/2
- Dispatches crawler jobs and creates pending snapshots atomically
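
The SSRF check in the first and third steps can be illustrated with a minimal sketch. This is not Linkhut's implementation: the module name `SsrfSketch`, the error atoms, and the list of blocked address ranges are all assumptions made for illustration, and the range list is deliberately incomplete. The sketch rejects URLs without an http(s) scheme, then resolves the host and refuses loopback, private, and link-local addresses.

```elixir
defmodule SsrfSketch do
  # Hypothetical helper for illustration only; not part of Linkhut.
  @allowed_schemes ["http", "https"]

  # Returns :ok when the URL looks safe to fetch, {:error, reason} otherwise.
  def validate(url) when is_binary(url) do
    case URI.parse(url) do
      %URI{scheme: nil} -> {:error, :invalid_url}
      %URI{scheme: scheme} when scheme not in @allowed_schemes -> {:error, :unsupported_scheme}
      %URI{host: host} when host in [nil, ""] -> {:error, :invalid_url}
      %URI{host: host} -> check_host(host)
    end
  end

  defp check_host(host) do
    charlist = String.to_charlist(host)

    # IP literals are checked directly; hostnames are resolved to IPv4 first.
    resolved =
      case :inet.parse_address(charlist) do
        {:ok, ip} -> {:ok, ip}
        {:error, _} -> :inet.getaddr(charlist, :inet)
      end

    case resolved do
      {:ok, ip} -> if blocked?(ip), do: {:error, :ssrf_blocked}, else: :ok
      {:error, _} -> {:error, :unresolvable_host}
    end
  end

  # Assumed (incomplete) list of ranges that should never be crawled.
  defp blocked?({0, _, _, _}), do: true
  defp blocked?({10, _, _, _}), do: true
  defp blocked?({127, _, _, _}), do: true
  defp blocked?({169, 254, _, _}), do: true
  defp blocked?({172, b, _, _}) when b in 16..31, do: true
  defp blocked?({192, 168, _, _}), do: true
  defp blocked?({0, 0, 0, 0, 0, 0, 0, 1}), do: true
  defp blocked?(_), do: false
end
```

Running the same check again on final_url matters because a public URL can redirect to an internal address; validating only the original URL would miss that case.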
Always-dispatch crawlers (third-party) are selected before the preflight
request and dispatched alongside the target crawlers. Not-archivable outcomes
(invalid URL, unsupported scheme, no eligible crawlers, file too large)
are finalized as :not_archivable and are not retried.
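
The selection and finalization rules described above might be sketched as follows. Everything here is hypothetical: `PipelineSketch`, the crawler maps with `always_dispatch` and `can_handle` keys (the latter standing in for can_handle?/2), the injected `preflight_fun`, and the shape of the returned maps are assumptions, not the actual Linkhut code. The sketch also assumes that a run with at least one always-dispatch crawler still counts as having an eligible crawler, which the documentation does not state.

```elixir
defmodule PipelineSketch do
  # Hypothetical reasons that end the run without a retry.
  @terminal [:invalid_url, :unsupported_scheme, :file_too_large]

  def run(url, crawlers, preflight_fun) do
    # Always-dispatch crawlers are picked before the preflight request.
    {always, targeted} = Enum.split_with(crawlers, & &1.always_dispatch)

    with {:ok, preflight} <- preflight_fun.(url),
         eligible = Enum.filter(targeted, & &1.can_handle.(url, preflight)),
         [_ | _] = selected <- always ++ eligible do
      {:ok, %{status: :dispatched, crawlers: Enum.map(selected, & &1.name)}}
    else
      # Nothing can handle this URL: terminal outcome.
      [] ->
        {:ok, %{status: :not_archivable, reason: :no_eligible_crawlers}}

      # Known terminal failures are finalized rather than retried.
      {:error, reason} when reason in @terminal ->
        {:ok, %{status: :not_archivable, reason: reason}}

      # Anything else is surfaced as an error so the caller may retry.
      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

A usage example under the same assumptions:

```elixir
crawlers = [
  %{name: :third_party, always_dispatch: true, can_handle: fn _url, _pf -> false end},
  %{name: :html, always_dispatch: false,
    can_handle: fn _url, pf -> pf.content_type == "text/html" end}
]

preflight = fn url -> {:ok, %{content_type: "text/html", final_url: url, status: 200}} end

PipelineSketch.run("https://example.com", crawlers, preflight)
```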
Options:
:recrawl - boolean, whether this is a re-crawl attempt
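
As a small illustration of how a keyword option like this is typically read, the sketch below uses `Keyword.get/3`. The module name `OptionsSketch` and the default of `false` are assumptions; the documentation does not state what happens when :recrawl is omitted.

```elixir
defmodule OptionsSketch do
  # Hypothetical helper; the false default is an assumption, not documented behavior.
  def recrawl?(opts \\ []) when is_list(opts) do
    Keyword.get(opts, :recrawl, false) == true
  end
end
```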