Linkhut.Archiving
(linkhut v0.1.4)
Manages link archiving — creating snapshots of bookmarked pages, storing them, and generating time-limited tokens to view them.
Crawling is handled by Linkhut.Archiving.Workers.Archiver and
Linkhut.Archiving.Workers.Crawler, which call back into this context
to persist results.
Summary
Functions
Returns the list of content types accepted for snapshot uploads.
Returns comprehensive archive statistics for the admin dashboard.
Returns archive statistics for a user.
Returns the number of available slots in the archiver queue. Counts jobs in active states and subtracts from the queue limit.
Returns true if the user can create new archives.
Returns true if the user can view/download existing archives. Any active user can view when archiving isn't disabled.
Cleans up older snapshots of the same (link_id, type) that are superseded
by a newly-terminal snapshot.
Creates a new crawl run record.
Creates a new snapshot for a link.
Deletes a single snapshot's storage and database record.
Returns :ok on success, {:error, reason} on failure.
Returns a MapSet of domain strings that have had a crawl run created
within the cooldown window. Used by the scheduler to skip domains that
were recently crawled.
Returns the list of users eligible for archiving based on the current mode.
Enqueues a SnapshotDeleter job for each snapshot in pending_deletion state.
Also deletes orphaned crawl runs (terminal state with no remaining snapshots).
Generates a short-lived token for serving a snapshot.
Returns all snapshots for a link (any state), newest first.
Returns a complete snapshot by ID, or {:error, :not_found}.
Returns all complete snapshots for a link, newest first, with crawl_run preloaded.
Returns all crawl runs for a link (excluding pending_deletion), with preloaded snapshots (also excluding pending_deletion), newest first.
Returns all non-deleted snapshots for a link, ordered by format locality then recency.
Returns the latest complete snapshot of a given format for a link.
Returns the latest complete snapshot of a given format and source for a link.
Returns a snapshot by link_id and job_id, or nil.
Gets a snapshot by its ID.
Returns links that have configured sources not covered by a current snapshot with matching version, excluding links with in-flight crawl runs.
Lists unarchived links for a user (links without completed snapshots and without an existing archive).
Marks all snapshots and crawl runs for a link as pending deletion.
Transitions a :processing crawl run to :complete when all its snapshots
have reached a terminal state (:complete, :not_available, :failed,
or :pending_deletion).
Returns the archiving mode.
Atomically recomputes the total_size_bytes for a single crawl run
from its complete snapshots.
Atomically recomputes the total_size_bytes for a crawl run by ID.
Uses a single UPDATE ... SET ... = (SELECT ...) statement — no locks needed.
Marks a single snapshot as pending deletion.
Schedules a re-crawl for a link by enqueueing a new Archiver job with the recrawl flag.
Transitions a :pending crawl run to :processing.
Idempotent for already-processing crawl runs (safe for Oban retries).
Returns {:error, :not_found} if the crawl run doesn't exist or is in
an unexpected state.
Returns steps relevant to a single snapshot: orchestration steps (no snapshot_id) plus steps matching the given snapshot_id. Sorted by timestamp.
Returns total storage bytes used across all users (complete snapshots only).
Returns total storage bytes used by a specific user (complete snapshots only).
Updates a crawl run's attributes.
Updates a snapshot's attributes.
Uploads a user-provided snapshot for a link.
Verifies a snapshot serving token, returning the snapshot_id or an error.
Functions
Returns the list of content types accepted for snapshot uploads.
Returns comprehensive archive statistics for the admin dashboard.
Returns archive statistics for a user.
Returns the number of available slots in the archiver queue. Counts jobs in active states and subtracts from the queue limit.
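The arithmetic described above can be sketched as follows. This is an illustration only: the module name, the map-based jobs, and the clamp at zero are assumptions, and the set of "active" states is passed in because the documentation does not list them.

```elixir
defmodule QueueSlotsSketch do
  # jobs: plain maps with a :state field (hypothetical stand-in for queue jobs).
  # active_states and limit are parameters because the source does not name them.
  def available_slots(jobs, active_states, limit) do
    active = Enum.count(jobs, fn job -> job.state in active_states end)

    # Clamping at zero is an assumption, so an over-full queue reports 0 slots.
    max(limit - active, 0)
  end
end
```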
@spec can_create_archives?(Linkhut.Accounts.User.t()) :: boolean()
Returns true if the user can create new archives.
@spec can_view_archives?(Linkhut.Accounts.User.t()) :: boolean()
Returns true if the user can view/download existing archives. Any active user can view when archiving isn't disabled.
Cleans up older snapshots of the same (link_id, type) that are superseded
by a newly-terminal snapshot.
Quality ordering — a new state supersedes older snapshots in these states:
:complete → :complete, :not_available, :failed
:not_available → :not_available, :failed
:failed → :failed
Also marks crawl runs that end up with zero remaining non-deleted snapshots
as :pending_deletion.
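The quality ordering can be read as a lookup from the new snapshot's state to the older states it supersedes. A minimal sketch, assuming snapshots are plain maps; the module and function names are hypothetical and this is not the linkhut implementation:

```elixir
defmodule SupersessionSketch do
  # Which older states a newly-terminal state supersedes, per the ordering above.
  def superseded_states(:complete), do: [:complete, :not_available, :failed]
  def superseded_states(:not_available), do: [:not_available, :failed]
  def superseded_states(:failed), do: [:failed]
  # Non-terminal states supersede nothing (assumption).
  def superseded_states(_other), do: []

  # Returns the older snapshots of the same (link_id, type) that `new` supersedes.
  def superseded(new, older) do
    states = superseded_states(new.state)

    Enum.filter(older, fn s ->
      s.id != new.id and s.link_id == new.link_id and
        s.type == new.type and s.state in states
    end)
  end
end
```

Under this reading, a new :not_available snapshot never removes an older :complete one, so the best available copy is kept.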
Creates a new crawl run record.
Creates a new snapshot for a link.
Deletes a single snapshot's storage and database record.
Returns :ok on success, {:error, reason} on failure.
Returns a MapSet of domain strings that have had a crawl run created
within the cooldown window. Used by the scheduler to skip domains that
were recently crawled.
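A sketch of the cooldown check, assuming crawl runs are plain maps with hypothetical :url and :inserted_at fields and a cooldown expressed in seconds. The real function presumably queries the database instead:

```elixir
defmodule CooldownSketch do
  # Returns a MapSet of hosts whose crawl runs were created at or after
  # `now - cooldown_seconds`.
  def recently_crawled_domains(crawl_runs, now, cooldown_seconds) do
    cutoff = DateTime.add(now, -cooldown_seconds, :second)

    crawl_runs
    |> Enum.filter(fn run -> DateTime.compare(run.inserted_at, cutoff) != :lt end)
    |> Enum.map(fn run -> URI.parse(run.url).host end)
    # URLs without a host are skipped.
    |> Enum.reject(&is_nil/1)
    |> MapSet.new()
  end
end
```

A scheduler could then skip any link whose host satisfies MapSet.member?/2 against the result.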
Returns the list of users eligible for archiving based on the current mode.
:disabled → empty list
:limited → users with an active supporter subscription
:enabled → all active users
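The three modes map naturally onto function clauses. A minimal sketch, assuming users are plain maps with hypothetical boolean :active and :supporter fields:

```elixir
defmodule EligibilitySketch do
  # :disabled -> nobody is eligible.
  def eligible_users(:disabled, _users), do: []

  # :limited -> active users with a supporter subscription.
  def eligible_users(:limited, users) do
    Enum.filter(users, fn u -> u.active and u.supporter end)
  end

  # :enabled -> every active user.
  def eligible_users(:enabled, users) do
    Enum.filter(users, fn u -> u.active end)
  end
end
```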
Enqueues a SnapshotDeleter job for each snapshot in pending_deletion state.
Also deletes orphaned crawl runs (terminal state with no remaining snapshots).
Generates a short-lived token for serving a snapshot.
Returns all snapshots for a link (any state), newest first.
Returns a complete snapshot by ID, or {:error, :not_found}.
Returns all complete snapshots for a link, newest first, with crawl_run preloaded.
Returns all crawl runs for a link (excluding pending_deletion), with preloaded snapshots (also excluding pending_deletion), newest first.
Returns all non-deleted snapshots for a link, ordered by format locality then recency.
Returns the latest complete snapshot of a given format for a link.
Returns the latest complete snapshot of a given format and source for a link.
Returns a snapshot by link_id and job_id, or nil.
Gets a snapshot by its ID.
Returns links that have configured sources not covered by a current snapshot with matching version, excluding links with in-flight crawl runs.
Returns a list of {link, remaining_sources} tuples where remaining_sources
is a MapSet of crawler source type strings not yet covered by any snapshot.
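How remaining_sources might be derived for one link can be sketched as a set difference. The input shapes are assumptions: configured sources as a map of source type to version, and snapshots as maps with :source and :version fields.

```elixir
defmodule RemainingSourcesSketch do
  # configured: %{source_type => version} (hypothetical shape).
  # snapshots: maps with :source and :version (hypothetical fields).
  def remaining_sources(configured, snapshots) do
    covered = MapSet.new(snapshots, fn s -> {s.source, s.version} end)

    configured
    # A source counts as covered only when a snapshot matches its version too.
    |> Enum.reject(fn {source, version} -> MapSet.member?(covered, {source, version}) end)
    |> MapSet.new(fn {source, _version} -> source end)
  end
end
```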
Lists unarchived links for a user (links without completed snapshots and without an existing archive).
Marks all snapshots and crawl runs for a link as pending deletion.
Transitions a :processing crawl run to :complete when all its snapshots
have reached a terminal state (:complete, :not_available, :failed,
or :pending_deletion).
Uses atomic UPDATE ... WHERE state = :processing to prevent race conditions
when concurrent crawlers finish simultaneously.
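The completion rule can be sketched in memory as below. The real function relies on an atomic UPDATE in the database; here the guard on :processing plays that role, and the module name and map shapes are hypothetical.

```elixir
defmodule CompletionSketch do
  @terminal [:complete, :not_available, :failed, :pending_deletion]

  # Only a :processing run can move to :complete, mirroring the
  # WHERE state = :processing condition described above.
  def maybe_complete(%{state: :processing} = run, snapshots) do
    # Note: an empty snapshot list counts as "all terminal" here (assumption).
    if Enum.all?(snapshots, fn s -> s.state in @terminal end) do
      %{run | state: :complete}
    else
      run
    end
  end

  # Any other state is left untouched, so a second caller is a no-op.
  def maybe_complete(run, _snapshots), do: run
end
```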
Returns the archiving mode.
:disabled - no archiving features
:enabled - archiving for all active users
:limited - archiving only for active paying users
Atomically recomputes the total_size_bytes for a single crawl run
from its complete snapshots.
Atomically recomputes the total_size_bytes for a crawl run by ID.
Uses a single UPDATE ... SET ... = (SELECT ...) statement — no locks needed.
Marks a single snapshot as pending deletion.
Returns {:ok, snapshot} on success, {:error, :active} if the snapshot
is in an active state (pending/crawling/retryable), {:error, :not_found}
if the snapshot doesn't exist or doesn't belong to the user.
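The three documented outcomes can be illustrated with pattern matching. This sketch assumes snapshots are plain maps with hypothetical :user_id and :state fields, and that ownership is checked before state; the source does not specify that order.

```elixir
defmodule PendingDeletionSketch do
  @active [:pending, :crawling, :retryable]

  # Missing snapshot, or one owned by another user: both report :not_found.
  def mark(nil, _user_id), do: {:error, :not_found}
  def mark(%{user_id: owner}, user_id) when owner != user_id, do: {:error, :not_found}

  # Snapshots still being worked on cannot be marked.
  def mark(%{state: state}, _user_id) when state in @active, do: {:error, :active}

  def mark(snapshot, _user_id), do: {:ok, %{snapshot | state: :pending_deletion}}
end
```

Returning :not_found for another user's snapshot avoids revealing that the snapshot exists.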
Schedules a re-crawl for a link by enqueueing a new Archiver job with the recrawl flag.
Transitions a :pending crawl run to :processing.
Idempotent for already-processing crawl runs (safe for Oban retries).
Returns {:error, :not_found} if the crawl run doesn't exist or is in
an unexpected state.
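The idempotent transition can be sketched with three clauses. The {:ok, run} success shape is an assumption (the documentation only states the error return), and the run is a plain map here:

```elixir
defmodule StartProcessingSketch do
  # :pending -> :processing
  def start_processing(%{state: :pending} = run), do: {:ok, %{run | state: :processing}}

  # Already processing: succeed without changes, so a retried job is harmless.
  def start_processing(%{state: :processing} = run), do: {:ok, run}

  # Missing run (nil) or any other state.
  def start_processing(_other), do: {:error, :not_found}
end
```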
Returns steps relevant to a single snapshot: orchestration steps (no snapshot_id) plus steps matching the given snapshot_id. Sorted by timestamp.
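A sketch of the filter, assuming each step is a map with an optional :snapshot_id key and a hypothetical :at timestamp field holding a DateTime:

```elixir
defmodule StepsSketch do
  def steps_for_snapshot(steps, snapshot_id) do
    steps
    # Keep orchestration steps (no snapshot_id) and steps for this snapshot.
    |> Enum.filter(fn step -> Map.get(step, :snapshot_id) in [nil, snapshot_id] end)
    # Passing the DateTime module makes sort_by compare with DateTime.compare/2.
    |> Enum.sort_by(fn step -> step.at end, DateTime)
  end
end
```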
Returns total storage bytes used across all users (complete snapshots only).
Returns total storage bytes used by a specific user (complete snapshots only).
Updates a crawl run's attributes.
Updates a snapshot's attributes.
Uploads a user-provided snapshot for a link.
Validates file size, detects format from content type, stores the file, creates a snapshot record, and runs supersession cleanup.
Returns {:ok, snapshot} on success, {:error, reason} on failure.
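The validation half of this pipeline can be sketched with a `with` chain. Everything specific here is hypothetical: the size limit, the content-type table, the error atoms, and the shape of the returned map. Storage, the database insert, and supersession cleanup are omitted.

```elixir
defmodule UploadSketch do
  # Hypothetical limit and content-type table, for illustration only.
  @max_bytes 10_000_000
  @formats %{"text/html" => "html", "application/pdf" => "pdf"}

  def validate(content_type, size) do
    with :ok <- check_size(size),
         {:ok, format} <- detect_format(content_type) do
      {:ok, %{format: format, size: size}}
    end
  end

  defp check_size(size) when size > 0 and size <= @max_bytes, do: :ok
  defp check_size(_size), do: {:error, :invalid_size}

  defp detect_format(content_type) do
    case Map.fetch(@formats, content_type) do
      {:ok, format} -> {:ok, format}
      :error -> {:error, :unsupported_content_type}
    end
  end
end
```

Because `with` returns the first non-matching value, the caller sees a single {:error, reason} regardless of which step failed.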
Verifies a snapshot serving token, returning the snapshot_id or an error.
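The generate/verify pair can be illustrated with an HMAC-signed, expiring token. This is not linkhut's implementation (a Phoenix application would more likely rely on Phoenix.Token); the module name, token layout, and error atoms are assumptions.

```elixir
defmodule TokenSketch do
  # Token layout (hypothetical): "<snapshot_id>.<expires_unix>.<hmac>"
  def generate(secret, snapshot_id, now_unix, ttl_seconds) do
    payload = "#{snapshot_id}.#{now_unix + ttl_seconds}"
    payload <> "." <> sign(secret, payload)
  end

  def verify(secret, token, now_unix) do
    with [id, expires, mac] <- String.split(token, "."),
         true <- constant_time_equal?(mac, sign(secret, "#{id}.#{expires}")),
         {snapshot_id, ""} <- Integer.parse(id),
         {expires_at, ""} <- Integer.parse(expires) do
      if now_unix < expires_at, do: {:ok, snapshot_id}, else: {:error, :expired}
    else
      _ -> {:error, :invalid}
    end
  end

  # URL-safe Base64 never contains ".", so splitting on "." is unambiguous.
  defp sign(secret, payload) do
    :crypto.mac(:hmac, :sha256, secret, payload) |> Base.url_encode64(padding: false)
  end

  # Compares every byte so timing does not reveal the first mismatch position.
  defp constant_time_equal?(a, b) when byte_size(a) == byte_size(b) do
    a
    |> :binary.bin_to_list()
    |> Enum.zip(:binary.bin_to_list(b))
    |> Enum.reduce(0, fn {x, y}, acc -> :erlang.bor(acc, :erlang.bxor(x, y)) end)
    |> Kernel.==(0)
  end

  defp constant_time_equal?(_a, _b), do: false
end
```

The signature is checked before the expiry is parsed, so a tampered expiry is rejected as invalid rather than trusted.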