SSG Framework Selection Matrix

A selection matrix turns "which SSG feels nicer" into a scored decision you can defend in a review. Weight the criteria that matter for your project, run the same realistic build on each candidate, and let the numbers narrow the field. The output isn't false precision — it's an explicit, comparable judgment that survives a year of hindsight. For the underlying trade-offs behind each criterion, see Choosing the Right Static Site Generator for Production.

Figure: A weighted scoring matrix for static site generators. Four candidates are scored one to five per criterion, each score is multiplied by the criterion's weight, and the products are summed.

| Criterion (weight) | Astro | Eleventy | Hugo | Jekyll |
| --- | --- | --- | --- | --- |
| Build speed (30%) | 3 | 4 | 5 | 2 |
| Authoring (25%) | 5 | 3 | 3 | 4 |
| Ecosystem (20%) | 4 | 3 | 4 | 5 |
| Routing (15%) | 5 | 4 | 4 | 3 |
| Integrations (10%) | 4 | 3 | 3 | 4 |
| Weighted total | 4.10 | 3.45 | 3.95 | 3.45 |

Change the weights and the winner changes; that is the matrix doing its job.
The same four candidates rank differently depending on the weights: tilt build speed up and Hugo leads; tilt authoring up and Astro leads. The matrix makes the trade-off explicit instead of aesthetic.

Define Criteria & Weighting

Pick the categories that actually drive your project and assign weights that sum to 100% — typically build velocity, developer/authoring experience, routing flexibility, ecosystem health, and headless-CMS/integration support. Drop any criterion that every candidate passes equally; it only dilutes the others. Account for the language each framework lives in: Hugo (Go) wins raw speed, Astro and Eleventy (Node) offer flexible component and hydration models, and Jekyll (Ruby) trades some speed for a long-stable ecosystem.

Score each candidate 1–5 per category, multiply by the weight, and rank. As the diagram shows, the ranking is sensitive to weights by design: at the diagram's weights (build speed 30%, authoring 25%) Astro edges out Hugo, but a docs team that raises build speed to around 40% surfaces Hugo, while a marketing team that raises authoring to 30% keeps Astro ahead. Map required integrations (search, analytics, i18n) to native capabilities, since anything you have to bolt on with an unmaintained plugin is a future liability, not a checkmark.
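The multiply-and-sum step is small enough to script, which also makes re-weighting cheap. The following is a minimal sketch, not a prescribed tool: the weights and per-criterion scores are copied from the diagram, while the function name `weighted_totals` and the "docs team" re-weighting are illustrative assumptions.

```python
from typing import Dict

# Weights (whole-number percentages) and 1-5 scores copied from the diagram.
WEIGHTS: Dict[str, int] = {
    "build_speed": 30,
    "authoring": 25,
    "ecosystem": 20,
    "routing": 15,
    "integrations": 10,
}

SCORES: Dict[str, Dict[str, int]] = {
    "Astro": {"build_speed": 3, "authoring": 5, "ecosystem": 4, "routing": 5, "integrations": 4},
    "Eleventy": {"build_speed": 4, "authoring": 3, "ecosystem": 3, "routing": 4, "integrations": 3},
    "Hugo": {"build_speed": 5, "authoring": 3, "ecosystem": 4, "routing": 4, "integrations": 3},
    "Jekyll": {"build_speed": 2, "authoring": 4, "ecosystem": 5, "routing": 3, "integrations": 4},
}


def weighted_totals(
    weights: Dict[str, int], scores: Dict[str, Dict[str, int]]
) -> Dict[str, float]:
    """Return {candidate: weighted total on the 1-5 scale}, highest first."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100%")
    totals = {
        name: sum(weights[c] * per_criterion[c] for c in weights) / 100
        for name, per_criterion in scores.items()
    }
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


baseline = weighted_totals(WEIGHTS, SCORES)

# Hypothetical "docs team" weighting: move 10 points from authoring to build speed.
docs_weights = {**WEIGHTS, "build_speed": 40, "authoring": 15}
docs_team = weighted_totals(docs_weights, SCORES)

print("baseline leader:", next(iter(baseline)))
print("docs-team leader:", next(iter(docs_team)))
```

Working in whole-number percentages and dividing once at the end keeps the arithmetic easy to check by hand, and the sum check catches the most common spreadsheet mistake: weights that quietly add up to 95% or 110%.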

Write a one-line definition of what a 1 and a 5 mean for each criterion before you score, or the numbers drift between candidates and reviewers. "Build speed: 5 = cold full build under 10s on our content set; 1 = over two minutes" is defensible; an unanchored 1–5 is just a feeling with a number on it. Where two candidates score within a fraction of a point, treat it as a tie and break it on something the matrix can't capture — who you can hire for, what your platform team already runs, which community you'd rather ask for help. The matrix exists to eliminate the clearly-wrong options and surface the close ones, not to manufacture a winner out of rounding noise.
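One way to keep an anchor from drifting is to encode it, so the score is derived from the measurement rather than typed in. This sketch assumes the build-speed anchors quoted above (5 = under 10s, 1 = over two minutes); the intermediate cut-offs of 30s and 60s and the function name `build_speed_score` are hypothetical and should be replaced with your own definitions.

```python
def build_speed_score(cold_build_seconds: float) -> int:
    """Map a measured cold build time to a 1-5 score using written anchors.

    The 10 s and 120 s anchors come from the rubric text; the 30 s and 60 s
    cut-offs are hypothetical intermediate steps.
    """
    if cold_build_seconds < 0:
        raise ValueError("build time cannot be negative")
    if cold_build_seconds < 10:
        return 5
    if cold_build_seconds < 30:  # hypothetical cut-off
        return 4
    if cold_build_seconds < 60:  # hypothetical cut-off
        return 3
    if cold_build_seconds <= 120:
        return 2
    return 1  # over two minutes


# Illustrative measurements only, not benchmark results.
for seconds in (6, 45, 150):
    print(f"{seconds}s -> score {build_speed_score(seconds)}")
```

Two reviewers who feed the same measurement into the same function cannot disagree about the score, which moves any argument back to where it belongs: the anchor definitions themselves.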

Run an Identical Evaluation Build

Scaffold your top candidates with the same content and measure a production-grade build on each. Note that Eleventy is installed into a project rather than scaffolded with a create command:

# Astro: interactive scaffold
npm create astro@latest astro-eval

# Hugo: new site skeleton
hugo new site hugo-eval

# Eleventy: add to a fresh project (no `create-eleventy` package exists), then add the build script that `npm run build` expects
mkdir eleventy-eval && cd eleventy-eval && npm init -y && npm install @11ty/eleventy && npm pkg set scripts.build=eleventy

Load each with the same realistic dataset and confirm routing, template inheritance, and asset handling behave before you compare build numbers. Time the cold build with hyperfine so the numbers are repeatable rather than eyeballed:

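# Clear the output directory before each timed run so every build starts cold (dist is Astro's default; adjust per candidate)
hyperfine --runs 5 --prepare 'rm -rf dist' 'npm run build'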

On an identical 2,000-page Markdown corpus, a representative run looks like this — capture your own, but the shape is predictable:

| Candidate | Cold build (2,000 md pages) | Default JS shipped |
| --- | --- | --- |
| Hugo | 6s | 0 KB |
| Eleventy | 38s | 0 KB |
| Astro | 74s | 0 KB (until islands) |
| Jekyll | 2m 40s | 0 KB |

For the component and hydration dimension specifically, Astro vs Eleventy for Documentation Sites goes deeper on what those build seconds buy you.

Capture two numbers per candidate, not one: the cold build and the cached incremental rebuild, because they predict different costs. The cold build is what CI pays on a clean runner; the incremental rebuild is what an author pays on every save. A framework can win the cold build and lose the dev loop, or vice versa. While you're scaffolding, also note the output size and the default JavaScript payload — a generator that builds fast but ships a heavy runtime has moved the cost from your CI bill to your readers' devices, which is the worse place for it to live.
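Where hyperfine isn't available, for example on a locked-down runner, the same two measurements can be sketched with the standard library. This is an assumption-laden illustration rather than a replacement for a proper benchmarking tool: `time_command` and `cold_and_cached` are hypothetical helper names, and "cached" here simply means a repeat build with the previous output and caches left in place.

```python
import statistics
import subprocess
import time
from typing import Dict, List, Optional, Sequence


def time_command(
    command: Sequence[str],
    runs: int = 3,
    prepare: Optional[Sequence[str]] = None,
) -> List[float]:
    """Run `command` `runs` times and return wall-clock seconds for each run.

    If `prepare` is given, it runs untimed before every timed run. That is
    how a cold build is forced: the prepare step deletes the previous output.
    """
    durations = []
    for _ in range(runs):
        if prepare is not None:
            subprocess.run(prepare, check=True)
        start = time.perf_counter()
        subprocess.run(command, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def cold_and_cached(
    build: Sequence[str], clean: Sequence[str], runs: int = 3
) -> Dict[str, float]:
    """Return the median cold build and median cached rebuild, in seconds."""
    cold = time_command(build, runs=runs, prepare=clean)
    # No prepare step: each run reuses whatever the previous build left behind.
    cached = time_command(build, runs=runs)
    return {
        "cold_median_s": statistics.median(cold),
        "cached_median_s": statistics.median(cached),
    }
```

From inside a candidate's directory, a call such as `cold_and_cached(["npm", "run", "build"], ["rm", "-rf", "dist"])` would produce both numbers; the output directory name is an assumption and differs per generator.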

Run Candidates in a CI Matrix

Cache dependencies and run candidates in a matrix so the comparison is apples-to-apples — same runner, same hardware, same job:

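# .github/workflows/ssg-matrix-build.yml
name: SSG Matrix Build
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-24.04  # pinned image so every run uses the same runner
    strategy:
      fail-fast: false  # let every candidate finish even if one build fails
      matrix:
        candidate: [astro-eval, eleventy-eval]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
          # Each candidate has its own lockfile, so point the cache at it
          cache-dependency-path: ${{ matrix.candidate }}/package-lock.json
      - run: npm ci
        working-directory: ${{ matrix.candidate }}
      - run: npm run build
        working-directory: ${{ matrix.candidate }}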

Dependency caching (cache: npm) is the reliable, framework-agnostic speedup. Incremental flags vary by framework, so add them per-candidate rather than assuming a shared --incremental. Isolating each candidate in its own working-directory keeps caches from colliding so the build times stay comparable. Run the comparison on the same runner size, too — a candidate that looks faster only because it happened to land on a warmer cache or a larger machine has told you nothing. Pin the runner, pin the Node version, and run each candidate two or three times so a single noisy build doesn't decide the score; report the median, not the best or worst run.
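Reporting the median can be sketched in a few lines. The timings below are hypothetical, chosen to show one noisy run; `summarize_runs` is an illustrative helper name, not part of any CI tooling.

```python
from statistics import median
from typing import Dict, List


def summarize_runs(run_seconds: List[float]) -> Dict[str, float]:
    """Return the median alongside the best and worst run for context."""
    if not run_seconds:
        raise ValueError("need at least one run")
    ordered = sorted(run_seconds)
    return {"median": median(ordered), "best": ordered[0], "worst": ordered[-1]}


# Hypothetical build times in seconds from three CI runs per candidate.
runs = {
    "astro-eval": [74.1, 71.8, 96.4],  # the third run hit a noisy runner
    "eleventy-eval": [38.2, 37.5, 39.0],
}

for name, times in runs.items():
    summary = summarize_runs(times)
    print(f"{name}: median {summary['median']}s "
          f"(best {summary['best']}s, worst {summary['worst']}s)")
```

In this made-up data the noisy 96.4s run would inflate a mean by several seconds, while the median stays at 74.1s, which is why the median is the number to carry into the matrix.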

Validate the Content Model Before You Commit

The most expensive mistake a matrix can hide is a content model that breaks at scale. Before scoring is final, build a few genuinely nested collections — versioned docs, multi-author posts, tagged guides — on each finalist and confirm routing and template inheritance hold. A framework that scores well on a flat blog can still force a painful refactor once you add a second content type, a relationship between content types, or a cross-reference that has to stay valid as pages move.

Push on the awkward cases deliberately, because they're the ones that surface a framework's real limits: a page that belongs to two collections, a taxonomy with thousands of terms, a redirect that must survive a slug change. If a candidate makes any of those genuinely hard now, it will make them harder at ten times the size. If your content spans languages, the i18n routing model deserves its own scoring row — translation fallbacks, per-locale URLs, and language switchers behave very differently across generators; that's covered in Picking an SSG for a Multi-Language Documentation Site.
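The "redirect that must survive a slug change" case is easy to check mechanically once each finalist has produced a build. The sketch below assumes you can list the built URL paths and export the redirect map as old-path to new-path pairs; how you obtain either differs per generator, and the names `resolve` and `broken_redirects` plus all sample paths are hypothetical.

```python
from typing import Dict, Optional, Set


def resolve(path: str, redirects: Dict[str, str]) -> Optional[str]:
    """Follow redirects from `path` to its final target; None means a loop."""
    seen = set()
    while path in redirects:
        if path in seen:
            return None
        seen.add(path)
        path = redirects[path]
    return path


def broken_redirects(built_paths: Set[str], redirects: Dict[str, str]) -> Dict[str, str]:
    """Return {old_path: problem} for every redirect that no longer lands on a page."""
    problems = {}
    for old_path in redirects:
        final = resolve(old_path, redirects)
        if final is None:
            problems[old_path] = "redirect loop"
        elif final not in built_paths:
            problems[old_path] = f"target {final} is not in the build"
    return problems


# Hypothetical build output and redirect map.
built = {"/docs/v2/install/", "/guides/setup/"}
redirect_map = {
    "/docs/install/": "/docs/v2/install/",  # survives the slug change
    "/old/": "/gone/",                      # target was removed
    "/a/": "/b/",
    "/b/": "/a/",                           # loop
}

print(broken_redirects(built, redirect_map))
```

Running a check like this against each finalist's output after renaming a few pages shows quickly whether the generator keeps old URLs alive or leaves that bookkeeping to you.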

Production Hardening

Once you've chosen, enforce the standard you scored for. Gate deploys on a Lighthouse budget — LCP < 2.5s, CLS < 0.1, TBT < 200ms:

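# lighthouserc.yml
ci:
  collect:
    numberOfRuns: 3
    settings:
      preset: desktop
  assert:
    assertions:
      "categories:performance": ["error", { "minScore": 0.9 }]
      "categories:accessibility": ["error", { "minScore": 0.95 }]
      # Metric budgets stated above (milliseconds, except CLS, which is unitless)
      "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }]
      "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }]
      "total-blocking-time": ["error", { "maxNumericValue": 200 }]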

Turn the same rigor on the human process: a written rubric keeps the team from re-litigating the choice every quarter. The SSG Selection Checklist for Engineering Teams packages the criteria into a reusable list. For teams whose authors aren't developers, weight onboarding heavily and start from Best SSG for Technical Writers Without Coding Experience.

Score Total Cost of Ownership, Not Just Day One

A matrix that only measures launch-day fit will reward the framework that demos best and punish you eighteen months later. The criteria that predict long-term cost are dull on day one and decisive over the project's life: how often the framework ships breaking changes, how large the upgrade is each time, how active the maintenance community is, and how many people you can realistically hire who already know it.

Add an explicit row for upgrade burden and score it from the framework's own history. A generator that ships a major version every year with a migration guide is a known, plannable cost; one that breaks templates on minor releases is a recurring tax you can't schedule. Check the changelog and the issue tracker for the candidates, not the marketing site — the gap between "we value stability" and the actual cadence of breaking changes is where the real cost hides.
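The distinction between plannable and unplannable breakage can be tallied from notes taken while reading a changelog. This is a rough sketch under stated assumptions: the release list is hand-collected, the `breaking` flag is your own reading of each entry, versions follow MAJOR.MINOR.PATCH, and the `Release` type, `upgrade_burden` function, and sample history are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Release(NamedTuple):
    version: str    # "MAJOR.MINOR.PATCH"
    breaking: bool  # noted by hand while reading the changelog


def is_major(version: str) -> bool:
    """True for X.0.0 releases, where breaking changes are expected."""
    _major, minor, patch = (int(part) for part in version.split("."))
    return minor == 0 and patch == 0


def upgrade_burden(releases: List[Release]) -> Dict[str, List[str]]:
    """Split breaking releases into planned (majors) and unplanned (the rest)."""
    planned = [r.version for r in releases if r.breaking and is_major(r.version)]
    unplanned = [r.version for r in releases if r.breaking and not is_major(r.version)]
    return {"planned": planned, "unplanned": unplanned}


# Hypothetical history for one candidate, not taken from any real project.
history = [
    Release("2.0.0", True),
    Release("2.1.0", False),
    Release("2.2.0", True),  # broke templates in a minor release
    Release("3.0.0", True),
]

print(upgrade_burden(history))
```

A candidate whose "unplanned" list stays empty across several years earns a high score on the upgrade-burden row; a growing list is the recurring tax described above.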

Hiring and knowledge transfer belong in the score too. The language the framework lives in is a proxy for both: a Node-based generator draws from the largest frontend talent pool, Go and Ruby from smaller ones. If the site will outlive its original author — and most production sites do — the relevant question isn't "can our best engineer use this," it's "can the engineer who inherits this in two years figure it out from the docs without us." A framework with thorough first-party documentation and a large, answer-rich community scores high on a criterion that never shows up in a build benchmark but dominates the maintenance years. Capture the durable version of these criteria in a written rubric — the SSG Selection Checklist for Engineering Teams is built for exactly this — so the same standard applies the next time the question comes up.

Common Pitfalls

  • Ignoring incremental/build-speed needs: a framework that only does full rebuilds becomes a CI bottleneck at scale. Weight build velocity for large repos.
  • Counting plugins as a positive: a long plugin list often means reliance on unmaintained community code. Prefer native capabilities and official integrations.
  • Skipping content-modeling validation: not testing nested collections and custom routing early forces a painful mid-project refactor.
  • Over-precise scores: reporting totals to two decimal places implies certainty you don't have. Keep weights to whole-number percentages and let clear gaps, not 0.05 differences, decide.
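The last pitfall can be guarded against in code by declaring a margin below which totals count as tied. The sketch assumes a margin of 0.2 points, which is an arbitrary illustrative choice, and uses placeholder candidates A, B, and C rather than real scores; `near_ties` is a hypothetical helper name.

```python
from typing import Dict, List


def near_ties(totals: Dict[str, float], margin: float = 0.2) -> List[str]:
    """Return every candidate within `margin` of the leader, leader first."""
    if not totals:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    leader_score = ranked[0][1]
    return [name for name, score in ranked if leader_score - score <= margin]


# Placeholder totals for three unnamed candidates.
totals = {"A": 4.1, "B": 4.0, "C": 3.4}
finalists = near_ties(totals)

if len(finalists) > 1:
    print("Tie between", finalists, "- break it on hiring, platform fit, or community.")
else:
    print("Clear leader:", finalists[0])
```

Anything the function returns alongside the leader goes to the qualitative tie-breakers rather than being ranked on hundredths of a point.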

Conclusion

A selection matrix is only as good as its inputs: weight what your project actually needs, anchor each score to a written definition, run the same realistic build on each candidate, and prove the routing and content model before you commit. Score the durable costs — upgrade burden, hiring, documentation — alongside the day-one fit, because those are what you pay for over the life of the site. Decide deliberately and you avoid the far more expensive decision, the one no matrix can soften: migrating frameworks after launch, with real content and real readers already depending on the URLs.

Key Takeaways

  • Weight what your project actually needs; the ranking is meant to move when the weights move.
  • Run the same realistic build on each candidate and time it with hyperfine so the numbers are repeatable.
  • Use a CI matrix with isolated working directories so build times are produced on identical hardware.
  • Validate nested collections and routing before committing — a flat-blog winner can fail at scale.
  • A written rubric keeps the decision from being re-argued every quarter.

FAQ

How do I weight criteria for documentation versus marketing sites?

Documentation favors build speed, native Markdown and MDX support, and search indexing, so weight those highest. Marketing favors component flexibility, headless-CMS integration, and Core Web Vitals. The same matrix works for both — only the weights change, which is exactly the point of scoring instead of guessing.

What build-time threshold should I target for 10,000 or more pages?

Aim for full builds in the tens of seconds. If you are into minutes, look at resource caching, scoped templates, or a faster engine such as Hugo before accepting it. A multi-minute cold build will eventually collide with CI timeouts and slow every release as the repository grows.

Can I evaluate multiple SSGs in one CI pipeline?

Yes. Use a build matrix and isolate each candidate in its own working directory or container so caches do not collide. Run the identical content set through each, capture the build time and output size, and compare them in the same job so the numbers are produced on identical hardware.

How many criteria should a selection matrix have?

Usually five to seven. Fewer than four hides important trade-offs; more than eight makes the weights so small that the ranking gets noisy. Pick the criteria that genuinely drive your project, drop the ones every candidate passes, and make the weights sum to one hundred percent.

Should plugin count be a scoring criterion?

Not as a positive on its own. A long plugin list often signals reliance on unmaintained community code rather than capability. Score native, first-party support for the integrations you actually need, and treat heavy plugin dependence as a maintenance risk rather than a feature.
