Archiving

Linkhut can capture point-in-time snapshots of bookmarked pages using SingleFile. Snapshots can be stored on the local filesystem or in S3-compatible object storage, and can be viewed, downloaded, or listed per bookmark.

Configuration

Archiving is configured under config :linkhut, Linkhut.Archiving, [...].

| Key | Type | Default | Description |
|---|---|---|---|
| mode | atom | :disabled | Controls who can use archiving (see below). |
| data_dir | string | "" | Directory where snapshot files are stored. |
| serve_host | string | nil | Hostname used when serving snapshots. Falls back to the request host. |
| storage | module | Storage.Local | Storage backend module. |
| legacy_data_dirs | list | [] | Additional directories to accept when resolving or deleting existing snapshots (useful during data directory migrations). |
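
As an illustration of the keys above, a minimal config fragment might look like the following. The paths and hostname are placeholder assumptions, not values shipped with Linkhut; the module name follows the Linkhut.Archiving.Storage.Local naming used elsewhere on this page.

```elixir
# Sketch of a config file entry; all values are illustrative.
import Config

config :linkhut, Linkhut.Archiving,
  # One of :disabled, :limited, or :enabled
  mode: :limited,
  # Hypothetical directory for snapshot files
  data_dir: "/var/lib/linkhut/archive",
  # Hypothetical dedicated host; leave as nil to fall back to the request host
  serve_host: "snapshots.example.com",
  storage: Linkhut.Archiving.Storage.Local,
  # Hypothetical previous location, still accepted for resolve and delete
  legacy_data_dirs: ["/srv/old-archive"]
```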

Local storage

Configuration under config :linkhut, Linkhut.Archiving.Storage.Local, [...]:

| Key | Type | Default | Description |
|---|---|---|---|
| compression | atom | :gzip | Compression algorithm for new snapshots (:none or :gzip). |

S3 storage

Configuration under config :linkhut, Linkhut.Archiving.Storage.S3, [...]:

| Key | Type | Default | Description |
|---|---|---|---|
| bucket | string | (required) | S3 bucket name. |
| region | string | "eu-central-1" | AWS region. |
| endpoint | string | (required) | S3 endpoint hostname (e.g. s3.eu-central-1.amazonaws.com or a MinIO host). |
| access_key_id | string | (required) | AWS access key ID. |
| secret_access_key | string | (required) | AWS secret access key. |
| scheme | string | "https://" | URL scheme for the endpoint. |
| port | integer | 443 | Port for the endpoint. |
| presign_ttl | integer | 900 | Presigned URL expiry in seconds. |
| compression | atom | :gzip | Compression algorithm for new snapshots (:none or :gzip). |

To use S3 storage, set storage: Linkhut.Archiving.Storage.S3 in the Linkhut.Archiving config. Both backends can coexist: the dispatch layer routes resolve and delete operations by storage key prefix, regardless of which backend is currently active.
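
A sketch of what switching to S3 could look like, assuming it is written in config/runtime.exs so that credentials are read at boot. The bucket name is a placeholder, and the fragment simply reuses the environment variable names documented on this page rather than hard-coding secrets.

```elixir
# Illustrative only; bucket name and endpoint are placeholder assumptions.
import Config

# Select S3 as the backend for new snapshots.
config :linkhut, Linkhut.Archiving,
  storage: Linkhut.Archiving.Storage.S3

config :linkhut, Linkhut.Archiving.Storage.S3,
  bucket: "linkhut-snapshots",
  region: "eu-central-1",
  endpoint: "s3.eu-central-1.amazonaws.com",
  # fetch_env!/1 raises at boot if the variable is missing
  access_key_id: System.fetch_env!("ARCHIVING_S3_ACCESS_KEY_ID"),
  secret_access_key: System.fetch_env!("ARCHIVING_S3_SECRET_ACCESS_KEY"),
  scheme: "https://",
  port: 443,
  # Presigned URLs expire after 15 minutes
  presign_ttl: 900,
  compression: :gzip
```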

Modes

  • :disabled — Archiving is completely off. No snapshots are created or served. This is the default.
  • :limited — Archiving is available only to paying users.
  • :enabled — Archiving is available to all active users.

Environment variables

In runtime.exs, the following environment variables are read:

General:

| Variable | Config key | Type | Default | Description |
|---|---|---|---|---|
| ARCHIVING_MODE | :mode | "enabled", "limited", or "disabled" | "disabled" | Controls who can use archiving. |
| ARCHIVING_STORAGE | :storage | "local" or "s3" | "local" | Storage backend for new snapshots. |
| ARCHIVING_DATA_DIR | :data_dir | path string | (none) | Directory where local snapshot files are stored. |
| ARCHIVING_STAGING_DIR | :staging_dir | path string | (none) | Temporary directory for crawler output before storage. Defaults to data_dir if unset. |
| ARCHIVING_SERVE_HOST | :serve_host | hostname string | (none) | Dedicated hostname for serving archived HTML. |
| ARCHIVING_MAX_FILE_SIZE | :max_file_size | integer (bytes) | 70000000 | Maximum size of archived files. |
| ARCHIVING_USER_AGENT_SUFFIX | :user_agent_suffix | string | (none) | Appended to crawler User-Agent. |
| ARCHIVING_CRAWLER_CONCURRENCY | Oban :crawler queue limit | integer | 5 | Max concurrent crawler jobs. Lower this on resource-constrained hosts. |

Local storage:

| Variable | Config key | Type | Default | Description |
|---|---|---|---|---|
| ARCHIVING_LOCAL_COMPRESSION | :compression | "none" or "gzip" | "gzip" | Compression for new local snapshots. |

S3 storage:

| Variable | Config key | Type | Default | Description |
|---|---|---|---|---|
| ARCHIVING_S3_BUCKET | :bucket | string | (none) | S3 bucket name. Enables S3 config when set. |
| ARCHIVING_S3_ENDPOINT | :endpoint | hostname string | (required) | S3 endpoint hostname. |
| ARCHIVING_S3_REGION | :region | string | "eu-central-1" | AWS region. |
| ARCHIVING_S3_ACCESS_KEY_ID | :access_key_id | string | (none) | AWS access key ID. |
| ARCHIVING_S3_SECRET_ACCESS_KEY | :secret_access_key | string | (none) | AWS secret access key. |
| ARCHIVING_S3_SCHEME | :scheme | "https://" or "http://" | "https://" | URL scheme for the endpoint. |
| ARCHIVING_S3_PORT | :port | integer | 443 | Port for the endpoint. |
| ARCHIVING_S3_PRESIGN_TTL | :presign_ttl | integer (seconds) | 900 | Presigned URL expiry. |
| ARCHIVING_S3_COMPRESSION | :compression | "none" or "gzip" | "gzip" | Compression for new S3 snapshots. |
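
To make the mapping from environment variables to config keys concrete, the following is a rough sketch of how a runtime.exs could translate a few of them. It is not a copy of Linkhut's actual runtime.exs; in particular, the fallback to :disabled for unrecognised values is an assumption made for the example.

```elixir
# Illustrative sketch of env-to-config mapping; not Linkhut's actual file.
import Config

# Map the string value onto one of the three documented mode atoms.
mode =
  case System.get_env("ARCHIVING_MODE", "disabled") do
    "enabled" -> :enabled
    "limited" -> :limited
    # Assumption: anything else is treated as disabled
    _other -> :disabled
  end

# Pick the backend module used for new snapshots.
storage =
  case System.get_env("ARCHIVING_STORAGE", "local") do
    "s3" -> Linkhut.Archiving.Storage.S3
    _other -> Linkhut.Archiving.Storage.Local
  end

config :linkhut, Linkhut.Archiving,
  mode: mode,
  storage: storage,
  data_dir: System.get_env("ARCHIVING_DATA_DIR", ""),
  max_file_size:
    String.to_integer(System.get_env("ARCHIVING_MAX_FILE_SIZE", "70000000"))

# The S3 block is only configured when a bucket name is present.
if bucket = System.get_env("ARCHIVING_S3_BUCKET") do
  config :linkhut, Linkhut.Archiving.Storage.S3,
    bucket: bucket,
    endpoint: System.fetch_env!("ARCHIVING_S3_ENDPOINT"),
    region: System.get_env("ARCHIVING_S3_REGION", "eu-central-1"),
    port: String.to_integer(System.get_env("ARCHIVING_S3_PORT", "443"))
end
```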

Compression

When compression is set to :gzip, new snapshots with compressible content types are gzip-compressed at rest.

  • Compressed files are served with Content-Encoding: gzip, so browsers decompress them transparently.
  • For local storage, downloads are decompressed before sending. For S3 storage, downloads are served via presigned URL.
  • Changing the setting does not affect existing snapshots. To compress existing local snapshots retroactively, run mix linkhut.storage local.compress.

Security considerations

SSRF protection

Before crawling a URL, Linkhut resolves its hostname and checks the resulting IP against a comprehensive list of reserved/non-routable address ranges (RFC 1918, link-local, loopback, CGN, cloud metadata, etc.). This prevents bookmarks from being used to probe internal services.

However, SingleFile runs as a separate process and performs its own DNS resolution. This leaves a small window for DNS rebinding attacks, in which a hostname resolves to a public IP during validation but to an internal IP by the time SingleFile fetches it.

Recommendation: Run crawler workers in a network-isolated environment (e.g., a container or network namespace with no access to internal services or cloud metadata endpoints like 169.254.169.254). This is the primary defense against DNS rebinding and other SSRF bypass techniques.
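
The pre-crawl check described in this section can be sketched roughly as follows. This is a simplified illustration, not Linkhut's implementation: the module name SsrfSketch and its functions are hypothetical, the range list covers only a handful of IPv4 blocks, and IPv6 is ignored entirely.

```elixir
defmodule SsrfSketch do
  @moduledoc "Hypothetical pre-crawl address check (IPv4 only, partial range list)."

  import Bitwise

  # {network, prefix_length} pairs for a few reserved IPv4 ranges.
  @blocked [
    {{0, 0, 0, 0}, 8},       # "this network"
    {{10, 0, 0, 0}, 8},      # RFC 1918
    {{100, 64, 0, 0}, 10},   # CGN (RFC 6598)
    {{127, 0, 0, 0}, 8},     # loopback
    {{169, 254, 0, 0}, 16},  # link-local, includes 169.254.169.254
    {{172, 16, 0, 0}, 12},   # RFC 1918
    {{192, 168, 0, 0}, 16}   # RFC 1918
  ]

  # True when the IPv4 address falls inside any blocked range.
  def blocked?({_, _, _, _} = ip) do
    Enum.any?(@blocked, fn {net, len} -> in_cidr?(ip, net, len) end)
  end

  # Resolves the host and rejects it if any returned address is blocked
  # or if resolution fails.
  def safe_host?(host) when is_binary(host) do
    case :inet.getaddrs(String.to_charlist(host), :inet) do
      {:ok, addrs} -> addrs != [] and not Enum.any?(addrs, &blocked?/1)
      {:error, _reason} -> false
    end
  end

  # Compare the leading `len` bits of the address and the network.
  defp in_cidr?(ip, net, len) do
    shift = 32 - len
    (to_int(ip) >>> shift) == (to_int(net) >>> shift)
  end

  defp to_int({a, b, c, d}), do: (a <<< 24) ||| (b <<< 16) ||| (c <<< 8) ||| d
end
```

A check like this only validates the addresses seen at validation time. Because SingleFile resolves the hostname again on its own, the network isolation recommended above remains necessary.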

Snapshot serving

Snapshots are served in a sandboxed iframe with a restrictive Content Security Policy. Access to snapshot content requires a short-lived token (15 minutes) generated per view. The token is verified independently of the user session, allowing the snapshot to be served from a separate host if configured via serve_host.
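
To illustrate why a token verified independently of the session allows serving from a separate host, here is a minimal sketch of a self-contained, expiring token built on an HMAC. It is an assumption-laden example, not Linkhut's actual token scheme: the module SnapshotTokenSketch, its functions, and the token layout are all hypothetical, and the snapshot ID is assumed to contain no "." characters.

```elixir
defmodule SnapshotTokenSketch do
  @moduledoc "Hypothetical signed, expiring token; not Linkhut's real scheme."

  # 15 minutes, matching the lifetime described above.
  @ttl_seconds 15 * 60

  # Builds "<id>.<expires_at>.<signature>".
  def sign(secret, snapshot_id, now \\ System.system_time(:second)) do
    payload = "#{snapshot_id}.#{now + @ttl_seconds}"
    payload <> "." <> mac(secret, payload)
  end

  # Returns {:ok, id} only if the signature matches and the token is unexpired.
  # Needs nothing but the shared secret, so any host holding it can verify.
  def verify(secret, token, now \\ System.system_time(:second)) do
    with [id, exp, sig] <- String.split(token, "."),
         {expires_at, ""} <- Integer.parse(exp),
         true <- secure_equal?(sig, mac(secret, "#{id}.#{exp}")),
         true <- now < expires_at do
      {:ok, id}
    else
      _ -> :error
    end
  end

  defp mac(secret, payload) do
    :crypto.mac(:hmac, :sha256, secret, payload)
    |> Base.url_encode64(padding: false)
  end

  # Constant-time comparison; hash_equals/2 requires equal-length binaries.
  defp secure_equal?(a, b) do
    byte_size(a) == byte_size(b) and :crypto.hash_equals(a, b)
  end
end
```

Because verification depends only on the shared secret and the clock, the serving host does not need access to the user's session cookie.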