Case Study · SEO & AI Discovery

Bonihua

A dataset-driven SEO/GEO build and how we made it crawlable for AI search.

Last updated: 2026-02-25

TL;DR

Bonihua is a bilingual-in-spirit project running on two production domains (.ru and .by) from one codebase.

Don’t “write SEO pages”. Build datasets first, then render indexable pages from them.

The site deploys as a static export (Netlify-friendly) while staying domain-aware, so canonical URLs never mix between the two domains.

What we were building

• A data-driven learning resource about Chinese tutoring.
• Pages useful to humans and highly legible to AI crawlers.
• A scalable system: add JSONL records → pages + links appear.

The Stack

• Next.js App Router with static export.
• React 19, TypeScript, Tailwind v4.
• JSONL + Zod schemas for dataset validation.
• Netlify build pipelines.

The “don’t hardcode domains” rule

Two domains, one build. To prevent SEO chaos, every page derives its base URL dynamically.

Centralized in lib/host.ts:

→ resolveBaseFromEnvOrHost() reads the request headers at runtime.
→ During static builds, it falls back to env vars.
→ It normalizes the result to the active domain (e.g., bonihua.ru).

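import type { Metadata } from "next";
// Import path assumed; buildAlternatesAuto may live in another module.
import { resolveBaseFromEnvOrHost, buildAlternatesAuto } from "@/lib/host";

export async function generateMetadata(): Promise<Metadata> {
  const { CANONICAL_BASE } = await resolveBaseFromEnvOrHost();
  return {
    metadataBase: new URL(CANONICAL_BASE),
    alternates: await buildAlternatesAuto("/datasets"),
    openGraph: { url: "/datasets" }, // relative; resolved against metadataBase
  };
}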
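
The resolution order described above (request header first, env var fallback, then normalization) could look roughly like the sketch below. This is not the project's lib/host.ts: the function name resolveBase, the SITE_HOST env key, the allow-list, and the bonihua.by hostname are all assumptions made for illustration.

```typescript
// Hypothetical sketch; names, env key, and host list are assumptions.
type BaseInfo = { HOST: string; CANONICAL_BASE: string };

const ALLOWED_HOSTS: readonly string[] = ["bonihua.ru", "bonihua.by"];
const DEFAULT_HOST = "bonihua.ru";

/** Lowercase, strip a port and a leading "www.", and reject unknown hosts. */
function normalizeHost(raw: string | null | undefined): string | null {
  if (!raw) return null;
  const host = raw
    .trim()
    .toLowerCase()
    .replace(/:\d+$/, "")
    .replace(/^www\./, "");
  return ALLOWED_HOSTS.indexOf(host) !== -1 ? host : null;
}

/** Prefer the runtime Host header, then the env var, then a fixed default. */
function resolveBase(
  hostHeader: string | null | undefined,
  env: Record<string, string | undefined>,
): BaseInfo {
  const host =
    normalizeHost(hostHeader) ?? normalizeHost(env.SITE_HOST) ?? DEFAULT_HOST;
  return { HOST: host, CANONICAL_BASE: `https://${host}` };
}
```

Checking the host against an allow-list means an unexpected Host header can never end up in a canonical URL; the function falls through to the env var or the default instead.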

Data Layer First

We treat datasets as the product core:

• Source of truth: .jsonl files.
• Metadata & grouping via registry.
• Loaders manage tags and cross-linking dynamically.
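
As a rough illustration of that flow, the sketch below parses JSONL, validates each record, and derives tag-based cross-links. The project validates with Zod; to stay dependency-free this sketch swaps in a hand-written type guard. The record shape (slug, title, tags) is hypothetical.

```typescript
// Hypothetical record shape; the real schemas are defined with Zod.
interface DatasetRecord {
  slug: string;
  title: string;
  tags: string[];
}

function isDatasetRecord(value: unknown): value is DatasetRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.slug === "string" &&
    v.slug.length > 0 &&
    typeof v.title === "string" &&
    Array.isArray(v.tags) &&
    v.tags.every((t) => typeof t === "string")
  );
}

/** Parse one JSON object per line; blank lines are skipped, bad records throw. */
function parseJsonl(text: string): DatasetRecord[] {
  const records: DatasetRecord[] = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === "") return;
    const parsed: unknown = JSON.parse(line); // throws on malformed JSON
    if (!isDatasetRecord(parsed)) {
      throw new Error(`Invalid record on line ${i + 1}`);
    }
    records.push(parsed);
  });
  return records;
}

/** Map each tag to the slugs of the records that carry it. */
function buildTagIndex(records: DatasetRecord[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const record of records) {
    for (const tag of record.tags) {
      const slugs = index.get(tag) ?? [];
      slugs.push(record.slug);
      index.set(tag, slugs);
    }
  }
  return index;
}

/** Cross-links: every other record that shares at least one tag. */
function relatedSlugs(record: DatasetRecord, index: Map<string, string[]>): string[] {
  const related = new Set<string>();
  for (const tag of record.tags) {
    for (const slug of index.get(tag) ?? []) {
      if (slug !== record.slug) related.add(slug);
    }
  }
  return Array.from(related);
}
```

Failing the build on an invalid record keeps broken data from silently becoming a broken page.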

Key note: filtered views (e.g., those driven by query params) explicitly set robots: noindex to prevent duplicate-content bloat.
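
One way to express that rule is a small helper that inspects the query and returns a robots directive. The parameter names in FILTER_PARAMS are hypothetical, and the sketch assumes only known filter params trigger noindex; the returned shape mirrors the index/follow booleans accepted by the Next.js metadata robots field.

```typescript
// Hypothetical filter param names; adjust to whatever the listing pages accept.
type RobotsDirective = { index: boolean; follow: boolean };
type Query = Record<string, string | string[] | undefined>;

const FILTER_PARAMS = ["tag", "level", "q"];

/** noindex when any known filter param carries a value; links stay followable. */
function robotsForQuery(query: Query): RobotsDirective {
  const filtered = FILTER_PARAMS.some((key) => {
    const value = query[key];
    return Array.isArray(value)
      ? value.length > 0
      : typeof value === "string" && value !== "";
  });
  return { index: !filtered, follow: true };
}
```

Keeping follow set to true lets crawlers still reach the canonical item pages through a filtered list, even though the list itself stays out of the index.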

Schema.org Hygiene

We emit schemas that match reality, not generic blocks:

• Hubs: DataCatalog, Dataset, ItemList.
• Items: Article (isPartOf Dataset), DefinedTerm, or Course.
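
A minimal sketch of what the hub and item markup could look like as JSON-LD builders. The schema.org types and the isPartOf relation come from the list above; the function names, the input shapes, and the /datasets/<dataset>/<item> URL layout are assumptions.

```typescript
// Hypothetical shapes and URL layout; only the schema.org types come from the text.
type JsonLd = Record<string, unknown>;
interface HubItem {
  slug: string;
  title: string;
}

/** Hub page: a Dataset inside the DataCatalog, plus an ItemList of its entries. */
function datasetHubJsonLd(base: string, datasetSlug: string, name: string, items: HubItem[]): JsonLd {
  const datasetUrl = `${base}/datasets/${datasetSlug}`;
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "Dataset",
        "@id": datasetUrl,
        name,
        url: datasetUrl,
        includedInDataCatalog: { "@type": "DataCatalog", url: `${base}/datasets` },
      },
      {
        "@type": "ItemList",
        itemListElement: items.map((item, i) => ({
          "@type": "ListItem",
          position: i + 1,
          name: item.title,
          url: `${datasetUrl}/${item.slug}`,
        })),
      },
    ],
  };
}

/** Item page: an Article that points back at its parent Dataset via isPartOf. */
function articleJsonLd(base: string, datasetSlug: string, item: HubItem): JsonLd {
  const datasetUrl = `${base}/datasets/${datasetSlug}`;
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: item.title,
    url: `${datasetUrl}/${item.slug}`,
    isPartOf: { "@type": "Dataset", "@id": datasetUrl },
  };
}
```

The output of either function can be serialized with JSON.stringify into a script tag of type application/ld+json. Because both builders take the base URL as an argument, the same record yields domain-correct markup on each domain.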

"That’s the difference between a page with text and a page machines can classify."

GEO & AI-Search Discoverability

Beyond standard SEO, we built specific crawler endpoints:

GET /llms.txt
Domain-aware site summary for LLM crawlers.

GET /llms-datasets.txt
Prioritized dataset list.

GET /ai
DataCatalog landing page.

public/ai/catalog.json
Static payload dumps.
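
To make the text endpoints concrete, here is a sketch of a generator for a prioritized dataset list in the layout the llms.txt proposal describes (an H1 title, a blockquote summary, then sections of Markdown links). The DatasetEntry shape, the numeric priority field, and the /datasets/<slug> URL layout are assumptions; the actual endpoints may be built differently.

```typescript
// Hypothetical entry shape; priority is an assumed numeric field (higher = listed first).
interface DatasetEntry {
  slug: string;
  title: string;
  summary: string;
  priority: number;
}

/** Render an llms.txt-style document with absolute, domain-specific links. */
function buildLlmsTxt(
  base: string,
  siteName: string,
  siteSummary: string,
  datasets: DatasetEntry[],
): string {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = datasets.slice().sort((a, b) => b.priority - a.priority);
  const links = sorted.map(
    (d) => `- [${d.title}](${base}/datasets/${d.slug}): ${d.summary}`,
  );
  const lines = [`# ${siteName}`, "", `> ${siteSummary}`, "", "## Datasets", ""].concat(links);
  return lines.join("\n") + "\n";
}
```

Passing the base URL in, rather than hardcoding it, lets one generator serve both domains with links that always point at the domain being crawled.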
