How to Benchmark Hugo vs Astro Build Speeds
A build-speed comparison is only meaningful if it is controlled: same content, same hardware, same cache state, and the same measurement tool. This guide sets up a reproducible benchmark for Hugo and Astro with hyperfine so the numbers reflect the generators, not your laptop's thermal throttling. It is the methodology behind the figures in Hugo Build Times for Large Repositories, and fits the broader decision in Choosing the Right Static Site Generator for Production.
Prerequisites
- Docker available, so each generator runs in a container with identical CPU and memory limits.
- hyperfine installed in (or available to) the container for repeatable timing.
- Pinned toolchain versions: a fixed Node major for Astro and a fixed Go-built Hugo binary.
- A scripted corpus generator so both engines build byte-identical content.
Standardize the Environment
Pin the toolchain and remove host variance by running each generator in a container with identical CPU/memory limits:
docker run --cpus=2 --memory=4g -it node:22-alpine /bin/sh
docker run --cpus=2 --memory=4g -it golang:1.25-alpine /bin/sh
Pin exact Node and Go (or Hugo binary) versions, and disable background work on the host so neither run competes for CPU. Install hyperfine inside each image so timing happens in the same constrained environment as the build.
Generate an Identical Dataset
Stress the parser and routing with a synthetic corpus that is identical for both generators. Create tiers (10k / 50k / 100k pages) and inject the same images:
# 10k single-file pages with trivial frontmatter
python3 - <<'PY'
import os
for i in range(10000):
    # One leaf bundle per post: test_repo/content/post-<i>/index.md
    d = f"test_repo/content/post-{i}"
    os.makedirs(d, exist_ok=True)
    with open(f"{d}/index.md", "w") as f:
        f.write(f"---\ntitle: Post {i}\n---\n\nBody {i}\n")
PY
Disable remote data fetching during runs, and verify directory parity (content/ for Hugo, src/content/ for Astro) so each does equivalent work. Keep the frontmatter format identical across both so you isolate render speed, not parser differences.
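The tiering and parity check described above can be sketched as follows. This is a minimal sketch, not the script used for the published figures: the function names, the hugo-<n> / astro-<n> directory names, and the choice of SHA-256 are all assumptions made for illustration. It writes the same posts into a Hugo-style content/ tree and an Astro-style src/content/ tree, then compares a digest of every relative path and file body.

```python
import hashlib
from pathlib import Path


def write_corpus(root: Path, pages: int) -> None:
    """Write `pages` single-file posts with trivial frontmatter under `root`."""
    for i in range(pages):
        post_dir = root / f"post-{i}"
        post_dir.mkdir(parents=True, exist_ok=True)
        (post_dir / "index.md").write_text(
            f"---\ntitle: Post {i}\n---\n\nBody {i}\n", encoding="utf-8"
        )


def corpus_digest(root: Path) -> str:
    """Hash every file's relative path and bytes, visiting files in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def build_tier(base: Path, pages: int) -> bool:
    """Generate one tier for both engines; return True if the trees match."""
    # Hypothetical layout: one project directory per engine and tier.
    hugo_root = base / f"hugo-{pages}" / "content"
    astro_root = base / f"astro-{pages}" / "src" / "content"
    write_corpus(hugo_root, pages)
    write_corpus(astro_root, pages)
    return corpus_digest(hugo_root) == corpus_digest(astro_root)
```

Calling build_tier(Path("bench_corpus"), 10_000), and then again with 50_000 and 100_000, would produce the three tiers. A False return means the two trees have drifted apart and the timings would no longer be comparable.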
Configure for a Fair, Minimal Build
Strip output types that add work you are not measuring. In Hugo, disableKinds is a top-level key (not under [params]), and there is no "comments" kind:
# config.toml
baseURL = "http://localhost/"
title = "Benchmark"
# Skip outputs that aren't part of the markdown→HTML measurement.
disableKinds = ["RSS", "sitemap", "taxonomy", "term"]
[markup.goldmark.renderer]
unsafe = false
// astro.config.mjs
import { defineConfig } from 'astro/config';

export default defineConfig({
  site: 'http://localhost',
  output: 'static',
  build: { format: 'directory', concurrency: 4 },
  vite: {
    build: {
      minify: false,
      sourcemap: false,
      rollupOptions: { output: { manualChunks: () => undefined } },
    },
  },
});
Measure with hyperfine
Always benchmark the production build (hugo, astro build), never the dev server, which adds file watchers and skips minification. hyperfine is the right tool here: it can run warmup iterations first, repeats each command many times, and reports a mean with standard deviation, so a single slow run shows up as variance instead of passing for the result:
hyperfine \
--warmup 1 --runs 10 \
--export-markdown bench.md \
--command-name hugo 'hugo --gc --minify' \
--command-name astro 'npx astro build'
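If you would rather post-process the timings than read the Markdown table, hyperfine can also write its results as JSON via --export-json. The sketch below assumes that export contains a results array whose entries carry command, mean, and stddev fields. The function names and the 5% noise threshold are hypothetical choices for illustration, not part of hyperfine.

```python
import json


def summarize(export_text: str) -> list[dict]:
    """Summarize a hyperfine JSON export: one row per benchmarked command."""
    rows = []
    for result in json.loads(export_text)["results"]:
        mean = result["mean"]
        stddev = result.get("stddev") or 0.0  # may be null with a single run
        rows.append(
            {
                "command": result["command"],
                "mean_s": mean,
                "stddev_s": stddev,
                # Coefficient of variation: spread relative to the mean.
                "cv_percent": 100.0 * stddev / mean,
            }
        )
    return rows


def noisy(rows: list[dict], threshold_percent: float = 5.0) -> list[str]:
    """Names of commands whose run-to-run spread exceeds the threshold."""
    return [r["command"] for r in rows if r["cv_percent"] > threshold_percent]
```

With --export-json bench.json added to the command (the file name is an assumption), summarize(open("bench.json").read()) returns one row per generator. A command that noisy() flags is a hint to re-run on a quieter host before trusting its mean.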
For a clean cold measurement, add a --prepare step, which hyperfine runs before every timed run, and drop --warmup so that no run starts from a warm cache:
hyperfine \
--prepare 'rm -rf resources/_gen node_modules/.astro public dist' \
--runs 10 \
--command-name hugo 'hugo --gc --minify' \
--command-name astro 'npx astro build'
If you also want peak memory, wrap the build in GNU time -v and read "Maximum resident set size" — that figure comes from time -v, not /proc/self/status, which would report the shell's memory rather than the build's.
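Where GNU time is not installed, the same kind of figure can be read from Python's standard resource module. This is a substitute technique, not the time -v approach itself, and the function name is hypothetical. It relies on getrusage(RUSAGE_CHILDREN), which reports usage for child processes that have finished and been waited for. On Linux ru_maxrss is in kilobytes, while on macOS it is in bytes.

```python
import resource
import subprocess


def peak_rss_of(command: list[str]) -> int:
    """Run `command` to completion and return the children's peak RSS.

    The value is ru_maxrss from RUSAGE_CHILDREN: kilobytes on Linux,
    bytes on macOS. It covers every child this interpreter has waited
    for, so use a fresh interpreter per build for a clean reading.
    """
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

For example, peak_rss_of(["hugo", "--gc", "--minify"]) in one interpreter and peak_rss_of(["npx", "astro", "build"]) in another would give one number per generator. As with time -v, the reading describes the build process rather than the shell that launched it.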
Measured Impact
On the 10k-page corpus, pinned to --cpus=2 --memory=4g, hyperfine produced the following means over 10 runs each:
| Generator | Cold build (mean ± σ) | Warm build (mean ± σ) |
|---|---|---|
| Hugo | 9.8s ± 0.4s | 8.9s ± 0.3s |
| Astro | 71.4s ± 2.1s | 22.6s ± 1.0s |
The story the numbers tell: Hugo wins cold builds outright because it has almost nothing to cache and a fast Go renderer, while Astro's cold build pays a large Vite/bundling cost that its warm cache (node_modules/.astro) largely recovers. The Hugo cold-vs-warm gap is small precisely because Hugo has no incremental production build — see Hugo Build Times for Large Repositories for why "warm" means cached assets, not skipped pages.
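To make the gaps concrete, the ratios can be worked out directly from the means in the table above. The snippet uses only those four published means; the variable names are illustrative.

```python
# Mean build times in seconds, copied from the table above.
means = {
    "hugo": {"cold": 9.8, "warm": 8.9},
    "astro": {"cold": 71.4, "warm": 22.6},
}

# How many times longer Astro takes than Hugo in each cache state.
cold_ratio = means["astro"]["cold"] / means["hugo"]["cold"]
warm_ratio = means["astro"]["warm"] / means["hugo"]["warm"]

# Fraction of the cold build time that a warm cache removes.
astro_saving = 1 - means["astro"]["warm"] / means["astro"]["cold"]
hugo_saving = 1 - means["hugo"]["warm"] / means["hugo"]["cold"]

print(f"cold: Astro takes {cold_ratio:.1f}x as long as Hugo")  # 7.3x
print(f"warm: Astro takes {warm_ratio:.1f}x as long as Hugo")  # 2.5x
print(f"warm cache saves Astro {astro_saving:.0%}")            # 68%
print(f"warm cache saves Hugo {hugo_saving:.0%}")              # 9%
```

The warm cache narrows the gap from roughly 7.3x to 2.5x, and it does so almost entirely by cutting Astro's time: Hugo's own saving is about 9%.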
Cold vs Warm in CI
Test both empty-cache and warm-cache states, but be precise about what a warm Hugo build is. Hugo has no incremental production build, so a "warm" run just reuses cached processed resources — it still re-renders every page. A warm Astro build reuses Vite's cache. Persist the right directories in CI:
- uses: actions/cache@v4
  with:
    path: |
      resources/_gen
      ~/.cache/hugo_cache
      node_modules/.astro
    key: ${{ runner.os }}-ssg-bench-${{ hashFiles('**/package-lock.json', 'go.sum') }}
Pitfalls & Rollback
- Dirty cache between runs: clear resources/_gen (Hugo) and node_modules/.astro (Astro) with --prepare before cold tests, or warm numbers leak in.
- Benchmarking the dev server: hugo server / astro dev enable HMR and skip minification. Measure hugo and astro build only.
- Reading RSS from the wrong place: use GNU time -v's "Maximum resident set size", not /proc/self/status.
- Unpinned CPU/thermals: run in containers with --cpus / --memory limits; shared CI runners add noise.
- Rollback: the benchmark is a throwaway repo and a script; delete the container and the generated corpus to revert, with no effect on your real site.
Conclusion
A trustworthy benchmark is mostly about control: identical containerized environments, an identical generated corpus, production builds only, and a hyperfine mean rather than a single noisy time run. Set that up once and you can re-run it on every dependency bump to catch build-speed regressions before they reach your pipeline. Feed the numbers back into the tuning work in Speeding Up Hugo Builds with Render Hooks and Caching.
FAQ
Should I benchmark cold or warm builds?
Both. A cold build reflects a fresh CI runner with empty caches; a warm build reflects cached resources. For Hugo, remember that warm means cached processed assets, not skipped pages, because Hugo has no incremental production build.
Why use hyperfine instead of the time command?
hyperfine runs multiple iterations, can warm the cache first, warns when it detects statistical outliers, and reports a mean with standard deviation, so a one-off slow run is visible as variance instead of being mistaken for the result. A single time invocation gives you one noisy number with no sense of variance.
What runner specs give reproducible results?
Fixed-spec runners or containers with pinned base images and explicit CPU and memory limits. Avoid shared CI runners with unpredictable background load, and disable other work on the host so neither build competes for CPU.
Does frontmatter format affect the comparison?
Slightly. Hugo parses YAML and TOML in Go while Astro parses in Node, so standardize on one frontmatter format across both corpora to isolate render speed from parser differences.
How many pages should the test corpus have?
Use tiers such as 10k, 50k, and 100k pages so you can see how each generator scales rather than reading a single point. Keep the content identical across both generators so the only variable is the engine.
Related
- Parent: Hugo Build Times for Large Repositories — the tuning guide these numbers feed.
- Speeding Up Hugo Builds with Render Hooks and Caching — apply the wins this benchmark reveals.
- Astro vs Eleventy for Documentation Sites — the build-speed trade-off in a docs context.
- Choosing the Right Static Site Generator for Production — where build speed sits among the selection criteria.