Case Study · SEO & AI Discovery

Bonihua

A dataset-driven SEO/GEO build and how we made it crawlable for AI search.

Last updated: 2026-02-25

TL;DR

Bonihua is a bilingual-in-spirit project running on two production domains (.ru and .by) from one codebase.

Don’t “write SEO pages”. Build datasets first, then render indexable pages from them.

The site ships as a static export (Netlify-friendly) but stays domain-aware throughout, so canonicals from the two domains never mix.

What we were building

  • A data-driven learning resource about Chinese tutoring.
  • Pages useful to humans and highly legible to AI crawlers.
  • A scalable system: add JSONL records → pages + links appear.

The Stack

  • Next.js App Router with static export.
  • React 19, TypeScript, Tailwind v4.
  • JSONL + Zod schemas for dataset validation.
  • Netlify build pipelines.
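
A minimal sketch of the JSONL validation step. The project uses Zod schemas for this; to keep the example dependency-free, a hand-written type guard stands in for the schema. The record shape (slug, title, tags) and the function name parseJsonl are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical record shape; the real dataset fields are not shown in this case study.
interface DatasetRecord {
  slug: string;
  title: string;
  tags: string[];
}

// Hand-written type guard standing in for a Zod schema.
function isDatasetRecord(value: unknown): value is DatasetRecord {
  if (typeof value !== "object" || value === null) return false;
  const { slug, title, tags } = value as Record<string, unknown>;
  return (
    typeof slug === "string" &&
    slug.length > 0 &&
    typeof title === "string" &&
    Array.isArray(tags) &&
    tags.every((t) => typeof t === "string")
  );
}

// Parse JSONL text: one JSON object per line, blank lines skipped.
// Bad lines are reported with their line number instead of being dropped silently.
function parseJsonl(text: string): { records: DatasetRecord[]; errors: string[] } {
  const records: DatasetRecord[] = [];
  const errors: string[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      const parsed: unknown = JSON.parse(line);
      if (isDatasetRecord(parsed)) records.push(parsed);
      else errors.push(`line ${index + 1}: does not match schema`);
    } catch {
      errors.push(`line ${index + 1}: invalid JSON`);
    }
  });
  return { records, errors };
}
```

Failing the build when errors is non-empty is one way to keep malformed records from ever becoming pages.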

The “don’t hardcode domains” rule

Two domains, one build. To prevent SEO chaos, every page derives its base URL dynamically.

Centralized in lib/host.ts:

  • resolveBaseFromEnvOrHost() resolves the base URL at runtime by reading the request headers.
  • During static builds, where there is no request to read, it falls back to env vars.
  • The output is normalized to the active domain (e.g., bonihua.ru).

In page metadata, for example for /datasets:

// Resolve the canonical base once, then let relative URLs hang off metadataBase.
const { CANONICAL_BASE } = await resolveBaseFromEnvOrHost();
return {
  metadataBase: new URL(CANONICAL_BASE),
  alternates: await buildAlternatesAuto("/datasets"),
  openGraph: { url: "/datasets" },
};
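
The body of resolveBaseFromEnvOrHost() is not shown here, so the following is only a sketch of the resolution order described above, written as a pure function so it can be tested without a request. The name resolveBase, the SITE_HOST variable, the default domain, and the exact spelling bonihua.by (inferred from the ".by" mention) are all assumptions.

```typescript
// bonihua.ru appears in the case study; bonihua.by is an assumed spelling of the .by domain.
const KNOWN_HOSTS = ["bonihua.ru", "bonihua.by"] as const;
type KnownHost = (typeof KNOWN_HOSTS)[number];

// Lowercase, strip the port and a leading "www.", and accept only known domains.
function normalizeHost(raw: string | null | undefined): KnownHost | null {
  if (!raw) return null;
  const host = raw.trim().toLowerCase().replace(/:\d+$/, "").replace(/^www\./, "");
  return (KNOWN_HOSTS as readonly string[]).indexOf(host) !== -1 ? (host as KnownHost) : null;
}

// Hypothetical resolution order: request host first, then the env value
// (SITE_HOST is an assumed variable name), then an explicit default so a
// build never emits an empty or foreign canonical.
function resolveBase(
  hostHeader: string | null,
  env: { SITE_HOST?: string },
  fallback: KnownHost = "bonihua.ru",
): { CANONICAL_BASE: string } {
  const host = normalizeHost(hostHeader) ?? normalizeHost(env.SITE_HOST) ?? fallback;
  return { CANONICAL_BASE: `https://${host}` };
}
```

Rejecting unknown hosts matters as much as the fallback: a spoofed Host header should never end up in a canonical tag.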

Data Layer First

We treat datasets as the product core:

  • Source of truth: .jsonl files.
  • Metadata & grouping via a registry.
  • Loaders manage tags and cross-linking dynamically.
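
One way the tag-based cross-linking could work, sketched under assumptions: records carry a slug and a list of tags, and "related" means sharing at least one tag. The names buildTagIndex and relatedSlugs are hypothetical, as is the ranking rule.

```typescript
interface LinkableRecord {
  slug: string;
  tags: string[];
}

// Build a tag -> slugs index once; adding a record updates every page that shares a tag.
function buildTagIndex(records: LinkableRecord[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const record of records) {
    for (const tag of record.tags) {
      const slugs = index.get(tag) ?? [];
      slugs.push(record.slug);
      index.set(tag, slugs);
    }
  }
  return index;
}

// Related pages: other records sharing at least one tag, ranked by the number
// of shared tags, with ties broken alphabetically so output is deterministic.
function relatedSlugs(
  record: LinkableRecord,
  index: Map<string, string[]>,
  limit = 5,
): string[] {
  const scores = new Map<string, number>();
  for (const tag of record.tags) {
    for (const slug of index.get(tag) ?? []) {
      if (slug === record.slug) continue;
      scores.set(slug, (scores.get(slug) ?? 0) + 1);
    }
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([slug]) => slug);
}
```

Deterministic ordering keeps static builds reproducible, so a rebuild with unchanged data produces unchanged link blocks.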

Key Note: Filtered views (e.g., those driven by query params) explicitly set robots: noindex, so they don't bloat the index with duplicate content.
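
A sketch of the decision itself, assuming a fixed list of filter params. The param names and the function robotsFor are hypothetical; the returned shape uses the index/follow booleans that the Next.js metadata robots field accepts. How the result is wired into a page depends on how filtered views are rendered in the deployment.

```typescript
// Index/follow booleans, as accepted by the `robots` field of Next.js metadata.
interface RobotsDirective {
  index: boolean;
  follow: boolean;
}

type SearchParams = Record<string, string | string[] | undefined>;

// Hypothetical list of query params that produce filtered variants of a page.
const FILTER_PARAMS = ["tag", "level", "q", "page"];

function robotsFor(searchParams: SearchParams): RobotsDirective {
  const isFiltered = FILTER_PARAMS.some((name) => {
    const value = searchParams[name];
    return Array.isArray(value) ? value.length > 0 : value !== undefined && value !== "";
  });
  // Filtered views stay followable for link discovery but are kept out of the index.
  return { index: !isFiltered, follow: true };
}
```

Keeping follow: true is a design choice in this sketch, not something the case study states.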

Schema.org Hygiene

We emit schemas that match reality, not generic blocks:

  • Hubs: DataCatalog, Dataset, ItemList.
  • Items: Article (isPartOf Dataset), DefinedTerm, or Course.
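
To illustrate the item side, here is a sketch of JSON-LD for an Article that declares its parent Dataset through isPartOf. The field selection, the helper names, and the example paths are assumptions; only the schema.org types come from the list above.

```typescript
interface DatasetInfo {
  name: string;
  description: string;
  path: string; // site-relative, e.g. "/datasets/tones" (hypothetical path)
}

interface ItemInfo {
  headline: string;
  path: string;
}

// Join a base like "https://bonihua.ru" with a site-relative path.
function absoluteUrl(base: string, path: string): string {
  return base.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Article that points back at the Dataset it belongs to.
function articleJsonLd(base: string, dataset: DatasetInfo, item: ItemInfo) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: item.headline,
    url: absoluteUrl(base, item.path),
    isPartOf: {
      "@type": "Dataset",
      name: dataset.name,
      description: dataset.description,
      url: absoluteUrl(base, dataset.path),
    },
  };
}
```

The object would typically be serialized with JSON.stringify into a script tag of type application/ld+json, with the base taken from the same domain-aware resolver as the canonicals.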

"That’s the difference between a page with text and a page machines can classify."

GEO & AI-Search Discoverability

Beyond standard SEO, we built specific crawler endpoints:

  • GET /llms.txt: Domain-aware prompt feeding.
  • GET /llms-datasets.txt: Prioritized dataset list.
  • GET /ai: DataCatalog landing page.
  • public/ai/catalog.json: Static payload dumps.
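
A sketch of how a domain-aware, prioritized llms.txt body might be assembled. The layout (a title, a one-line summary, then a linked list) follows the common llms.txt convention; the entry fields, the priority ordering, and the function name buildLlmsTxt are assumptions rather than the project's actual generator.

```typescript
interface LlmsEntry {
  title: string;
  path: string; // site-relative path, hypothetical examples only
  summary: string;
  priority: number; // higher values are listed first
}

// Render plain-text markdown with every link built from the active domain's base.
function buildLlmsTxt(
  base: string,
  siteName: string,
  tagline: string,
  entries: LlmsEntry[],
): string {
  const lines = [`# ${siteName}`, "", `> ${tagline}`, "", "## Datasets", ""];
  // Copy before sorting so the caller's array is left untouched.
  const sorted = entries.slice().sort((a, b) => b.priority - a.priority);
  for (const entry of sorted) {
    lines.push(`- [${entry.title}](${base}${entry.path}): ${entry.summary}`);
  }
  return lines.join("\n") + "\n";
}
```

Because the base is passed in, the same generator can serve both domains without either file ever linking to the other.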

Get in touch

Interested in learning more about how we build crawlable data systems, or want to discuss a similar project? We're always happy to chat.
