How to Benchmark Hugo vs Astro Build Speeds

A build-speed comparison is only meaningful if it is controlled: same content, same hardware, same cache state, and the same measurement tool. This guide sets up a reproducible benchmark for Hugo and Astro with hyperfine so the numbers reflect the generators, not your laptop's thermal throttling. It is the methodology behind the figures in Hugo Build Times for Large Repositories, and fits the broader decision in Choosing the Right Static Site Generator for Production.

Prerequisites

  • Docker available, so each generator runs in a container with identical CPU and memory limits.
  • hyperfine installed in (or available to) the container for repeatable timing.
  • Pinned toolchain versions: a fixed Node major for Astro and a fixed Go-built Hugo binary.
  • A scripted corpus generator so both engines build byte-identical content.

Figure: A controlled Hugo-vs-Astro benchmark, from identical corpus to a hyperfine mean.
The benchmark is a four-step sequence: identical corpus, pinned container, a `hyperfine` mean of 10 production builds, and recorded cold and warm numbers for each generator.

Standardize the Environment

Pin the toolchain and remove host variance by running each generator in a container with identical CPU/memory limits:

docker run --cpus=2 --memory=4g -it node:22-alpine /bin/sh
docker run --cpus=2 --memory=4g -it golang:1.25-alpine /bin/sh

Pin exact Node and Go (or Hugo binary) versions, and disable background work on the host so neither run competes for CPU. Install hyperfine inside each image so timing happens in the same constrained environment as the build.
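
One way to keep the limits from drifting between the two runs is to build both `docker run` command lines from a single definition. The sketch below is illustrative only: the `/srv/bench` mount path and the `docker_argv` helper are hypothetical, and it assumes Hugo and hyperfine have already been installed into the respective images.

```python
import shlex

# Limits copied from the docker run lines above; defined once so both
# generators are guaranteed to get the same values.
LIMITS = ["--cpus=2", "--memory=4g"]


def docker_argv(image, workdir, command):
    """Build a `docker run` argv that mounts workdir at /work and applies LIMITS."""
    return [
        "docker", "run", "--rm", *LIMITS,
        "-v", f"{workdir}:/work", "-w", "/work",
        image, "sh", "-c", command,
    ]


# /srv/bench is a hypothetical host path holding the benchmark repo.
hugo_argv = docker_argv("golang:1.25-alpine", "/srv/bench", "hugo --gc --minify")
astro_argv = docker_argv("node:22-alpine", "/srv/bench", "npx astro build")

print(shlex.join(hugo_argv))
print(shlex.join(astro_argv))
```

The two printed commands differ only in the image and the build command, which is the property the benchmark depends on.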

Generate an Identical Dataset

Stress the parser and routing with a synthetic corpus that is identical for both generators. Create tiers (10k / 50k / 100k pages) and inject the same images:

# 10k single-file pages with trivial frontmatter
python3 - <<'PY'
import os
for i in range(10000):
    d = f"test_repo/content/post-{i}"
    os.makedirs(d, exist_ok=True)
    with open(f"{d}/index.md", "w") as f:
        f.write(f"---\ntitle: Post {i}\n---\n\nBody {i}\n")
PY

Disable remote data fetching during runs, and verify directory parity (content/ for Hugo, src/content/ for Astro) so each does equivalent work. Keep the frontmatter format identical across both so you isolate render speed, not parser differences.
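
Directory parity can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the same page layout as the generator above: it writes the corpus into whichever root each engine reads from, then compares a digest of relative paths and file bytes. The function names are illustrative, not part of either tool.

```python
import hashlib
from pathlib import Path


def write_corpus(root, pages):
    """Write `pages` single-file posts under root, same layout as the script above."""
    root = Path(root)
    for i in range(pages):
        post = root / f"post-{i}"
        post.mkdir(parents=True, exist_ok=True)
        (post / "index.md").write_text(
            f"---\ntitle: Post {i}\n---\n\nBody {i}\n", encoding="utf-8"
        )


def tree_digest(root):
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def corpora_match(hugo_root, astro_root):
    """True when both content trees hold byte-identical files at the same paths."""
    return tree_digest(hugo_root) == tree_digest(astro_root)
```

For example, calling write_corpus on a Hugo content/ directory and on an Astro src/content/ directory with the same page count (10_000, 50_000, or 100_000 for the three tiers) and then asserting corpora_match on the two roots fails fast if either tree has drifted.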

Configure for a Fair, Minimal Build

Strip output types that add work you are not measuring. In Hugo, disableKinds is a top-level key (not under [params]), and there is no "comments" kind:

# config.toml
baseURL = "http://localhost/"
title = "Benchmark"

# Skip outputs that aren't part of the markdown→HTML measurement.
disableKinds = ["rss", "sitemap", "taxonomy", "term"]

[markup.goldmark.renderer]
  unsafe = false

// astro.config.mjs
import { defineConfig } from 'astro/config';

export default defineConfig({
  site: 'http://localhost',
  output: 'static',
  build: { format: 'directory', concurrency: 4 },
  vite: {
    build: {
      minify: false,
      sourcemap: false,
      rollupOptions: { output: { manualChunks: () => undefined } },
    },
  },
});

Measure with hyperfine

Always benchmark the production build (hugo, astro build) — never the dev server, which adds file watchers and skips minification. hyperfine is the right tool here: it warms the cache, runs many iterations, and reports a mean with standard deviation so a single slow run does not skew the result:

hyperfine \
  --warmup 1 --runs 10 \
  --export-markdown bench.md \
  --command-name hugo  'hugo --gc --minify' \
  --command-name astro 'npx astro build'

For a clean cold measurement, drop --warmup and add a --prepare step. hyperfine runs the prepare command before every timed run, so no cache left behind by a previous run leaks in:

hyperfine \
  --prepare 'rm -rf resources/_gen node_modules/.astro public dist' \
  --runs 10 \
  --command-name hugo  'hugo --gc --minify' \
  --command-name astro 'npx astro build'
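
To check whether a run was quiet enough to trust, the standard deviation can be read relative to the mean. The sketch below assumes the run was also given hyperfine's `--export-json bench.json` flag, and that each entry in the export's `results` list carries `command`, `mean`, and `stddev` fields, with `command` holding the `--command-name` label. The 5% threshold is an arbitrary assumption, not a hyperfine default, and the sample dict reuses the cold figures reported under Measured Impact.

```python
import json


def load_export(path):
    """Read a file written by `hyperfine --export-json <path>`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(export, max_rel_stddev=0.05):
    """Return one row per benchmarked command, flagging noisy measurements."""
    rows = []
    for result in export["results"]:
        mean, stddev = result["mean"], result["stddev"]
        rel = stddev / mean  # relative standard deviation
        rows.append({
            "name": result["command"],
            "mean": mean,
            "stddev": stddev,
            "rel_stddev": rel,
            "noisy": rel > max_rel_stddev,  # threshold is an assumption
        })
    return rows


# Same shape as a hyperfine export, trimmed to the fields used here.
sample = {"results": [
    {"command": "hugo", "mean": 9.8, "stddev": 0.4},
    {"command": "astro", "mean": 71.4, "stddev": 2.1},
]}

for row in summarize(sample):
    print(f"{row['name']}: {row['mean']:.1f}s ± {row['stddev']:.1f}s "
          f"({row['rel_stddev']:.1%}, noisy={row['noisy']})")
```

If a row is flagged, rerunning on a quieter host is usually more informative than raising the threshold.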

If you also want peak memory, wrap the build in GNU time -v and read "Maximum resident set size" — that figure comes from time -v, not /proc/self/status, which would report the shell's memory rather than the build's.
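
Pulling that figure out of the report is a one-line pattern match. This sketch assumes the GNU time report was captured to a string (GNU time writes it to stderr) and that the line uses the "(kbytes)" unit label; the sample text is made up for illustration.

```python
import re

# GNU time -v prints a line such as:
#     Maximum resident set size (kbytes): 524288
_MAX_RSS = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")


def peak_rss_mib(time_v_output):
    """Extract peak resident set size from `time -v` output, in MiB."""
    match = _MAX_RSS.search(time_v_output)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(match.group(1)) / 1024


# Illustrative input only, not a measured value.
sample_report = (
    "\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:09.80\n"
    "\tMaximum resident set size (kbytes): 524288\n"
)
print(peak_rss_mib(sample_report))  # 512.0
```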

Measured Impact

On the 10k-page corpus, pinned to --cpus=2 --memory=4g, hyperfine produced the following means over 10 runs each:

| Generator | Cold build (mean ± σ) | Warm build (mean ± σ) |
|-----------|-----------------------|-----------------------|
| Hugo      | 9.8s ± 0.4s           | 8.9s ± 0.3s           |
| Astro     | 71.4s ± 2.1s          | 22.6s ± 1.0s          |

The story the numbers tell: Hugo wins cold builds outright because it has almost nothing to cache and a fast Go renderer, while Astro's cold build pays a large Vite/bundling cost that its warm cache (node_modules/.astro) largely recovers. The Hugo cold-vs-warm gap is small precisely because Hugo has no incremental production build — see Hugo Build Times for Large Repositories for why "warm" means cached assets, not skipped pages.
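
The size of those gaps follows directly from the means reported above. The short calculation below uses only those four figures; the variable names are illustrative.

```python
# Mean build times in seconds, taken from the results above.
means = {
    "hugo": {"cold": 9.8, "warm": 8.9},
    "astro": {"cold": 71.4, "warm": 22.6},
}

# How many times longer Astro takes than Hugo in each cache state.
cold_ratio = means["astro"]["cold"] / means["hugo"]["cold"]
warm_ratio = means["astro"]["warm"] / means["hugo"]["warm"]


def saved(engine):
    """Fraction of the cold build time that a warm cache removes."""
    cold, warm = means[engine]["cold"], means[engine]["warm"]
    return (cold - warm) / cold


astro_saved = saved("astro")
hugo_saved = saved("hugo")

print(f"cold: Astro takes {cold_ratio:.1f}x as long as Hugo")  # 7.3x
print(f"warm: Astro takes {warm_ratio:.1f}x as long as Hugo")  # 2.5x
print(f"warm cache saves Astro {astro_saved:.0%}")             # 68%
print(f"warm cache saves Hugo {hugo_saved:.0%}")               # 9%
```

So a warm cache cuts Astro's build by roughly two thirds but Hugo's by under a tenth, which is consistent with Hugo re-rendering every page on each build.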

Cold vs Warm in CI

Test both empty-cache and warm-cache states, but be precise about what a warm Hugo build is. Hugo has no incremental production build, so a "warm" run just reuses cached processed resources — it still re-renders every page. A warm Astro build reuses Vite's cache. Persist the right directories in CI:

- uses: actions/cache@v4
  with:
    path: |
      resources/_gen
      ~/.cache/hugo_cache
      node_modules/.astro
    key: ${{ runner.os }}-ssg-bench-${{ hashFiles('**/package-lock.json', 'go.sum') }}

Pitfalls & Rollback

  • Dirty cache between runs: clear resources/_gen (Hugo) and node_modules/.astro (Astro) with --prepare before cold tests, or warm numbers leak in.
  • Benchmarking the dev server: hugo server / astro dev enable HMR and skip minification. Measure hugo and astro build only.
  • Reading RSS from the wrong place: use GNU time -v's "Maximum resident set size", not /proc/self/status.
  • Unpinned CPU/thermals: run in containers with --cpus/--memory limits; shared CI runners add noise.
  • Rollback: the benchmark is a throwaway repo and a script — delete the container and the generated corpus to revert, with no effect on your real site.

Conclusion

A trustworthy benchmark is mostly about control: identical containerized environments, an identical generated corpus, production builds only, and a hyperfine mean rather than a single noisy time run. Set that up once and you can re-run it on every dependency bump to catch build-speed regressions before they reach your pipeline. Feed the numbers back into the tuning work in Speeding Up Hugo Builds with Render Hooks and Caching.

FAQ

Should I benchmark cold or warm builds?

Both. A cold build reflects a fresh CI runner with empty caches; a warm build reflects cached resources. For Hugo, remember that warm means cached processed assets, not skipped pages, because Hugo has no incremental production build.

Why use hyperfine instead of the time command?

hyperfine runs multiple iterations, supports warmup runs, warns about statistical outliers, and reports a mean with standard deviation, so a one-off slow run shows up as variance instead of silently becoming the result. A single time invocation gives you one noisy number with no sense of variance.

What runner specs give reproducible results?

Fixed-spec runners or containers with pinned base images and explicit CPU and memory limits. Avoid shared CI runners with unpredictable background load, and disable other work on the host so neither build competes for CPU.

Does frontmatter format affect the comparison?

Slightly. Hugo parses YAML and TOML in Go while Astro parses in Node, so standardize on one frontmatter format across both corpora to isolate render speed from parser differences.

How many pages should the test corpus have?

Use tiers such as 10k, 50k, and 100k pages so you can see how each generator scales rather than reading a single point. Keep the content identical across both generators so the only variable is the engine.
