Bonihua
A dataset-driven SEO/GEO build and how we made it crawlable for AI search.
Last updated: 2026-02-25
TL;DR
Bonihua is a bilingual-in-spirit project running on two production domains (.ru and .by) from one codebase.
Don’t “write SEO pages”. Build datasets first, then render indexable pages from them.
It ships as a static export (Netlify-friendly) yet stays fully domain-aware, so canonical URLs never mix between the two domains.
What we were building
- A data-driven learning resource about Chinese tutoring.
- Pages that are useful to humans and highly legible to AI crawlers.
- A scalable system: add JSONL records, and the corresponding pages and internal links appear automatically.
The Stack
- Next.js App Router with static export.
- React 19, TypeScript, Tailwind v4.
- JSONL + Zod schemas for dataset validation.
- Netlify build pipelines.
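Static export is a Next.js config switch. A minimal sketch of what that looks like; the project's actual config is not shown, and any other options it sets are omitted here:

```typescript
// next.config.ts: minimal static-export setup (other options omitted).
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // `next build` emits plain HTML/CSS/JS into `out/`, which Netlify can serve as a static site.
  output: "export",
};

export default nextConfig;
```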
The “don’t hardcode domains” rule
Two domains, one build. To keep canonical URLs from mixing across domains, every page derives its base URL dynamically instead of hardcoding it.
Centralized in lib/host.ts:
- resolveBaseFromEnvOrHost() resolves the domain at runtime by reading request headers.
- It falls back to environment variables during static builds.
- It normalizes output to the active domain (e.g., bonihua.ru).
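A minimal sketch of that resolution order, written as a pure function so it can run outside Next.js. The name resolveBase, the ALLOWED_HOSTS list, and the SITE_URL environment variable are illustrative assumptions, not the contents of lib/host.ts:

```typescript
// Hypothetical sketch; the real lib/host.ts is not shown.
// ALLOWED_HOSTS and the SITE_URL env var name are assumptions.
const ALLOWED_HOSTS: readonly string[] = ["bonihua.ru", "bonihua.by"];

// Lowercase, drop any port, and strip a leading "www." so host variants collapse.
function normalizeHost(raw: string): string {
  return raw.trim().toLowerCase().replace(/:\d+$/, "").replace(/^www\./, "");
}

// The caller passes the Host header (null during static builds) and the
// environment, which keeps the logic testable outside Next.js.
export function resolveBase(
  hostHeader: string | null | undefined,
  env: Record<string, string | undefined>,
): { CANONICAL_BASE: string } {
  // 1. Runtime: use the request's host, but only if it is a known domain.
  if (hostHeader) {
    const host = normalizeHost(hostHeader);
    if (ALLOWED_HOSTS.includes(host)) return { CANONICAL_BASE: `https://${host}` };
  }
  // 2. Static build: no request exists, so read the per-deploy env var.
  const fromEnv = env.SITE_URL;
  if (fromEnv) {
    const host = normalizeHost(new URL(fromEnv).host);
    if (ALLOWED_HOSTS.includes(host)) return { CANONICAL_BASE: `https://${host}` };
  }
  // 3. Refuse to guess: a wrong canonical is worse than a failed build.
  throw new Error("Cannot resolve a canonical base for this deployment");
}
```

At runtime the host would come from the incoming request (in the App Router, `(await headers()).get("host")` from next/headers); in a static build there is no request, so null is passed and the env var decides. Checking against an allowlist keeps an unexpected Host header from leaking into canonicals.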
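Pages consume the resolved base in their metadata, e.g. for /datasets:

```typescript
import type { Metadata } from "next";
// Both helpers assumed to be exported from lib/host.ts ("@/" path alias assumed)
import { resolveBaseFromEnvOrHost, buildAlternatesAuto } from "@/lib/host";

export async function generateMetadata(): Promise<Metadata> {
  const { CANONICAL_BASE } = await resolveBaseFromEnvOrHost();
  return {
    metadataBase: new URL(CANONICAL_BASE), // per-domain base, never hardcoded
    alternates: await buildAlternatesAuto("/datasets"),
    openGraph: { url: "/datasets" }, // relative URLs resolve against metadataBase
  };
}
```

Data Layer First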
We treat datasets as the product core:
- Source of truth: .jsonl files.
- Metadata and grouping via a registry.
- Loaders manage tags and cross-linking dynamically.
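A minimal sketch of a JSONL loader plus tag-based cross-linking. The project validates records with Zod; to stay dependency-free this sketch swaps in a hand-written type guard, and the DatasetRecord fields (slug, title, tags) are assumed for illustration:

```typescript
// Hypothetical record shape; the real Zod schemas are not shown.
interface DatasetRecord {
  slug: string;
  title: string;
  tags: string[];
}

// Dependency-free stand-in for a Zod schema's validation step.
function isDatasetRecord(v: unknown): v is DatasetRecord {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return (
    typeof r.slug === "string" && r.slug.length > 0 &&
    typeof r.title === "string" &&
    Array.isArray(r.tags) && r.tags.every((t) => typeof t === "string")
  );
}

// Parse JSONL text: one JSON object per non-empty line. Errors carry the
// line number so a bad record fails the build instead of a page.
export function parseJsonl(text: string): DatasetRecord[] {
  const records: DatasetRecord[] = [];
  text.split("\n").forEach((line, i) => {
    if (line.trim() === "") return;
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      throw new Error(`Malformed JSON on line ${i + 1}`);
    }
    if (!isDatasetRecord(value)) throw new Error(`Invalid record on line ${i + 1}`);
    records.push(value);
  });
  return records;
}

// Cross-linking: each record links to every other record sharing a tag.
export function relatedBySharedTag(records: DatasetRecord[]): Map<string, string[]> {
  const byTag = new Map<string, string[]>();
  for (const r of records) {
    for (const t of r.tags) byTag.set(t, [...(byTag.get(t) ?? []), r.slug]);
  }
  const related = new Map<string, string[]>();
  for (const r of records) {
    const peers = new Set<string>();
    for (const t of r.tags) {
      for (const s of byTag.get(t) ?? []) {
        if (s !== r.slug) peers.add(s);
      }
    }
    related.set(r.slug, Array.from(peers).sort());
  }
  return related;
}
```

Adding a line to the .jsonl file is then enough to produce a new record, and every record sharing one of its tags gains a link to it.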
Key note: filtered views (e.g., via query params) explicitly set robots: noindex so duplicate content doesn't bloat the index.
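A hedged sketch of that noindex rule. The filter parameter names are hypothetical, and the return value mirrors the { index, follow } shape that Next.js accepts for the robots metadata field:

```typescript
// Hypothetical list of query params that produce filtered views.
const FILTER_PARAMS = ["tag", "level", "sort", "q"] as const;

// Same shape as the `robots` field of Next.js Metadata.
type Robots = { index: boolean; follow: boolean };

// Filtered views duplicate the hub's content, so keep them out of the index
// while still letting crawlers follow links through to item pages.
export function robotsForParams(params: URLSearchParams): Robots {
  const filtered = FILTER_PARAMS.some((key) => params.has(key));
  return { index: !filtered, follow: true };
}
```

Keeping follow: true means crawlers can still reach item pages through a filtered view even though the view itself stays out of the index.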
Schema.org Hygiene
We emit Schema.org types that match each page's actual content, not generic boilerplate blocks:
- Hubs: DataCatalog, Dataset, ItemList.
- Items: Article (isPartOf Dataset), DefinedTerm, or Course.
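A sketch of JSON-LD builders for the two page kinds. The input shapes and the /datasets/{dataset}/{item} URL layout are assumptions; the types and properties used (DataCatalog.dataset, Article.isPartOf) are standard Schema.org vocabulary:

```typescript
// Hypothetical inputs; the URL layout below is assumed for illustration.
interface HubInput { base: string; name: string; datasets: { slug: string; name: string }[] }
interface ItemInput { base: string; datasetSlug: string; datasetName: string; slug: string; headline: string }

// Hub page: a DataCatalog whose `dataset` property lists Dataset nodes.
export function hubJsonLd(h: HubInput) {
  return {
    "@context": "https://schema.org",
    "@type": "DataCatalog",
    name: h.name,
    url: `${h.base}/datasets`,
    dataset: h.datasets.map((d) => ({
      "@type": "Dataset",
      name: d.name,
      url: `${h.base}/datasets/${d.slug}`,
    })),
  };
}

// Item page: an Article tied back to its parent Dataset via isPartOf.
export function itemJsonLd(i: ItemInput) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: i.headline,
    url: `${i.base}/datasets/${i.datasetSlug}/${i.slug}`,
    isPartOf: {
      "@type": "Dataset",
      name: i.datasetName,
      url: `${i.base}/datasets/${i.datasetSlug}`,
    },
  };
}
```

Each object would be serialized with JSON.stringify into a `<script type="application/ld+json">` tag on the page. Passing base in keeps the structured data on the same domain as the canonical.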
"That’s the difference between a page with text and a page machines can classify."
GEO & AI-Search Discoverability
Beyond standard SEO, we built dedicated endpoints for AI crawlers:
- GET /llms.txt: domain-aware prompt feeding.
- GET /llms-datasets.txt: prioritized dataset list.
- GET /ai: DataCatalog landing page.
- public/ai/catalog.json: static payload dumps.
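A sketch of how a prioritized list like /llms-datasets.txt could be generated, following the llms.txt proposal's layout (an H1 title, a blockquote summary, then Markdown link lists). The DatasetEntry fields, including priority, are hypothetical:

```typescript
// Hypothetical registry entry; `priority` (lower = more important) is assumed.
interface DatasetEntry { slug: string; name: string; summary: string; priority: number }

// Builds an llms.txt-style Markdown document with absolute links to `base`,
// ordered by priority and then by slug for a stable output.
export function buildLlmsDatasetsTxt(base: string, siteName: string, entries: DatasetEntry[]): string {
  const sorted = [...entries].sort(
    (a, b) => a.priority - b.priority || a.slug.localeCompare(b.slug),
  );
  const lines = [
    `# ${siteName}`,
    "",
    `> Datasets published on ${base}, most important first.`,
    "",
    "## Datasets",
    "",
    ...sorted.map((e) => `- [${e.name}](${base}/datasets/${e.slug}): ${e.summary}`),
  ];
  return lines.join("\n") + "\n";
}
```

In the App Router, the string could be returned from a GET route handler with a text/plain content type; because base is a parameter, each domain gets links pointing at itself.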
Get in touch
Interested in learning more about how we build crawlable data systems, or want to discuss a similar project? We're always happy to chat.