Archiving

Linkhut can capture point-in-time snapshots of bookmarked pages using SingleFile. Snapshots are stored locally and can be viewed, downloaded, or listed per bookmark.

Configuration

Archiving is configured under config :linkhut, Linkhut.Archiving, [...].

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| mode | atom | :disabled | Controls who can use archiving (see below). |
| data_dir | string | "" | Directory where snapshot files are stored. |
| serve_host | string | nil | Hostname used when serving snapshots. Falls back to the request host. |
| storage | module | Storage.Local | Storage backend module. |
| legacy_data_dirs | list | [] | Additional directories to accept when resolving or deleting existing snapshots (useful during data directory migrations). |

Additional configuration under config :linkhut, Linkhut.Archiving.Storage.Local, [...]:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| compression | atom | :none | Compression algorithm for new snapshots (:none or :gzip). |
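
As an illustration of the two configuration blocks above, a config file might contain something like the following. This is a minimal sketch: the directory paths and the hostname are made-up placeholders, not defaults or recommendations from the project.

```elixir
import Config

# Paths and hostname below are hypothetical placeholders.
config :linkhut, Linkhut.Archiving,
  mode: :enabled,
  data_dir: "/var/lib/linkhut/snapshots",
  serve_host: "snapshots.example.com",
  # Old location kept resolvable during a data directory migration.
  legacy_data_dirs: ["/srv/linkhut/old-snapshots"]

config :linkhut, Linkhut.Archiving.Storage.Local,
  compression: :gzip
```

Keys that are left out keep the defaults listed in the tables.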

Modes

  • :disabled — Archiving is completely off. No snapshots are created or served. This is the default.
  • :limited — Archiving is available only to paying users.
  • :enabled — Archiving is available to all active users.
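
The three modes can be read as a simple eligibility rule. The sketch below only illustrates that rule; the module name and the `active` and `paying` user fields are assumptions made for the example and are not Linkhut's actual data model.

```elixir
defmodule ArchivingModeSketch do
  # `user` is a hypothetical map with boolean :active and :paying keys.

  # :disabled - nobody can archive.
  def can_archive?(:disabled, _user), do: false

  # :limited - only paying users.
  def can_archive?(:limited, user), do: user.paying

  # :enabled - every active user.
  def can_archive?(:enabled, user), do: user.active
end
```

For example, `ArchivingModeSketch.can_archive?(:limited, %{active: true, paying: false})` evaluates to `false`.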

Environment variables

In runtime.exs, the following environment variables are read:

| Variable | Config key | Type | Default | Description |
| --- | --- | --- | --- | --- |
| ARCHIVING_MODE | :mode | "enabled", "limited", or "disabled" | "disabled" | Controls who can use archiving. |
| ARCHIVING_DATA_DIR | :data_dir | path string | (none) | Directory where snapshot files are stored. |
| ARCHIVING_SERVE_HOST | :serve_host | hostname string | (none) | Dedicated hostname for serving archived HTML. |
| ARCHIVING_MAX_FILE_SIZE | :max_file_size | integer (bytes) | 70000000 | Maximum size of archived files. |
| ARCHIVING_USER_AGENT_SUFFIX | :user_agent_suffix | string | (none) | Appended to crawler User-Agent. |
| ARCHIVING_STORAGE_COMPRESSION | :compression (under Storage.Local) | "none" or "gzip" | "none" | Compression for new local snapshots. |
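
To make the mapping from string-valued environment variables to config values concrete, here is one way such parsing could be written. It is a sketch and not a copy of the project's runtime.exs; in particular, falling back to the default when a value is unrecognized is an assumption.

```elixir
import Config

# Hypothetical parsing; the real runtime.exs may validate differently.
mode =
  case System.get_env("ARCHIVING_MODE", "disabled") do
    "enabled" -> :enabled
    "limited" -> :limited
    _ -> :disabled
  end

# Environment variables are strings, so the byte limit is converted.
max_file_size =
  "ARCHIVING_MAX_FILE_SIZE"
  |> System.get_env("70000000")
  |> String.to_integer()

compression =
  case System.get_env("ARCHIVING_STORAGE_COMPRESSION", "none") do
    "gzip" -> :gzip
    _ -> :none
  end

config :linkhut, Linkhut.Archiving,
  mode: mode,
  max_file_size: max_file_size

config :linkhut, Linkhut.Archiving.Storage.Local,
  compression: compression
```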

Compression

When compression is set to :gzip, new snapshots with compressible content types are gzip-compressed at rest.

  • Compressed files are served with Content-Encoding: gzip, so browsers decompress them transparently.
  • Downloads are decompressed before sending.
  • Existing snapshots are not affected by the setting change. Use mix linkhut.storage local.compress to compress them retroactively.
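
The behaviour in the list above can be sketched with Erlang's built-in `:zlib` module. The module name, the shape of the stored record, and the list of compressible content types are all assumptions made for illustration; Linkhut's storage backend may be organized differently.

```elixir
defmodule GzipAtRestSketch do
  # Hypothetical set of content types worth compressing.
  @compressible ["text/html", "text/plain", "application/json"]

  # Compress at rest only when gzip is configured and the type qualifies.
  def store(body, content_type, compression) do
    if compression == :gzip and content_type in @compressible do
      %{data: :zlib.gzip(body), encoding: :gzip}
    else
      %{data: body, encoding: :none}
    end
  end

  # Viewing: stored bytes are sent as-is and labelled, so the browser
  # decompresses them.
  def serve_headers(%{encoding: :gzip}), do: [{"content-encoding", "gzip"}]
  def serve_headers(%{encoding: :none}), do: []

  # Downloading: bytes are decompressed before sending.
  def download(%{data: data, encoding: :gzip}), do: :zlib.gunzip(data)
  def download(%{data: data, encoding: :none}), do: data
end
```

Because each stored record carries its own encoding in this sketch, compressed and uncompressed snapshots can coexist, which matches the note that changing the setting leaves existing snapshots untouched.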

Security considerations

SSRF protection

Before crawling a URL, Linkhut resolves its hostname and checks the resulting IP against a comprehensive list of reserved and non-routable address ranges (RFC 1918 private ranges, link-local, loopback, carrier-grade NAT, cloud metadata, etc.). This prevents bookmarks from being used to probe internal services.

However, SingleFile runs as a separate process and performs its own DNS resolution. This leaves a small window for DNS rebinding attacks, in which a hostname resolves to a public IP during validation but to an internal IP by the time SingleFile fetches it.

Recommendation: Run crawler workers in a network-isolated environment (e.g., a container or network namespace with no access to internal services or cloud metadata endpoints like 169.254.169.254). This is the primary defense against DNS rebinding and other SSRF bypass techniques.
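
The pre-crawl check described in this section could look roughly like the sketch below. It is not Linkhut's implementation: the module name is invented, only IPv4 is handled, and the range list is a small illustrative subset. As noted above, a check like this does not close the DNS rebinding window on its own.

```elixir
defmodule SsrfSketch do
  import Bitwise

  # A few reserved IPv4 ranges as {base_address, prefix_length}.
  # Illustrative only; a real list would be much longer and cover IPv6.
  @reserved [
    {{10, 0, 0, 0}, 8},      # RFC 1918
    {{172, 16, 0, 0}, 12},   # RFC 1918
    {{192, 168, 0, 0}, 16},  # RFC 1918
    {{127, 0, 0, 0}, 8},     # loopback
    {{169, 254, 0, 0}, 16},  # link-local, includes 169.254.169.254
    {{100, 64, 0, 0}, 10}    # carrier-grade NAT
  ]

  # True when the IPv4 tuple falls inside any reserved range.
  def reserved?(ip) when tuple_size(ip) == 4 do
    Enum.any?(@reserved, fn {base, len} -> in_range?(ip, base, len) end)
  end

  # Resolve the hostname, then reject it if the address is reserved.
  def check_host(host) when is_binary(host) do
    case :inet.getaddr(String.to_charlist(host), :inet) do
      {:ok, ip} ->
        if reserved?(ip), do: {:error, :reserved_address}, else: {:ok, ip}

      {:error, reason} ->
        {:error, reason}
    end
  end

  # Two addresses share a prefix when their top `len` bits are equal.
  defp in_range?(ip, base, len) do
    shift = 32 - len
    to_int(ip) >>> shift == to_int(base) >>> shift
  end

  defp to_int({a, b, c, d}), do: a <<< 24 ||| b <<< 16 ||| c <<< 8 ||| d
end
```

With this sketch, `SsrfSketch.check_host("127.0.0.1")` returns `{:error, :reserved_address}`, since a literal loopback address needs no DNS lookup.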

Snapshot serving

Snapshots are served in a sandboxed iframe with a restrictive Content Security Policy. Access to snapshot content requires a short-lived token, valid for 15 minutes and generated per view. The token is verified independently of the user session, which allows the snapshot to be served from a separate host when one is configured via serve_host.
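
To show why a signed, expiring token can be verified without a session, here is a minimal HMAC-based sketch. It is an assumption-laden illustration rather than Linkhut's token format: the module name, the token layout, and the use of a shared secret are all invented for the example, and `:crypto.hash_equals/2` requires OTP 25 or later.

```elixir
defmodule SnapshotTokenSketch do
  # 15 minutes, matching the lifetime described above.
  @max_age_seconds 15 * 60

  # Token layout (hypothetical): "<snapshot_id>.<issued_at>.<signature>"
  def sign(secret, snapshot_id, now \\ System.system_time(:second)) do
    payload = "#{snapshot_id}.#{now}"
    payload <> "." <> mac(secret, payload)
  end

  # Needs only the secret and the token, no session state.
  def verify(secret, token, now \\ System.system_time(:second)) do
    with [id, issued, sig] <- String.split(token, "."),
         {issued_at, ""} <- Integer.parse(issued),
         true <- valid_mac?(sig, mac(secret, id <> "." <> issued)),
         true <- (now - issued_at) in 0..@max_age_seconds do
      {:ok, id}
    else
      _ -> {:error, :invalid_or_expired}
    end
  end

  defp mac(secret, payload) do
    :crypto.mac(:hmac, :sha256, secret, payload)
    |> Base.url_encode64(padding: false)
  end

  # Constant-time comparison; sizes are checked first because
  # hash_equals raises on binaries of different length.
  defp valid_mac?(a, b) do
    byte_size(a) == byte_size(b) and :crypto.hash_equals(a, b)
  end
end
```

Any host that holds the secret can verify such a token, which is what makes serving from a separate serve_host possible in this kind of design.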