Japanese Full-Text Search on D1 (SQLite): FTS5 Trigram with Layered Fallbacks
# How do you search Japanese text without a morphological analyzer? Using FTS5's trigram tokenizer on Cloudflare D1, with LIKE for short queries and prefix search for English — a three-stage fallback design.
Turnstile site key が未設定のため、このフォームは送信できません。管理者は NEXT_PUBLIC_TURNSTILE_SITE_KEY を設定してください。
[ ]まだコメントはありません。
Why Japanese full-text search is hard
English words are separated by spaces; Japanese has no separators. The usual answer is a morphological analyzer like MeCab, but you cannot ship one to Cloudflare Workers + D1 — the dictionaries are far too large.
So this blog uses FTS5's trigram tokenizer: every 3-character window becomes a token, which makes Japanese searchable with no dictionary at all.
Schema
sql
CREATE VIRTUAL TABLE posts_fts_ja USING fts5( title, excerpt, content, tags, slug, tokenize = 'trigram');-- keep it in sync with the posts table via triggersCREATE TRIGGER posts_fts_ja_insert AFTER INSERT ON posts BEGIN INSERT INTO posts_fts_ja(rowid, title, excerpt, content, tags, slug) VALUES (NEW.rowid, NEW.title, NEW.excerpt, NEW.content, NEW.tags, NEW.slug);END;CREATE TRIGGER posts_fts_ja_update AFTER UPDATE ON posts BEGIN DELETE FROM posts_fts_ja WHERE slug = OLD.slug; INSERT INTO posts_fts_ja(rowid, title, excerpt, content, tags, slug) VALUES (NEW.rowid, NEW.title, NEW.excerpt, NEW.content, NEW.tags, NEW.slug);END;
The important decision: sync with database triggers, not application code. If the app is responsible for updating the index, a forgotten write will eventually rot it. Let the database guarantee it.
Query escaping
FTS5's MATCH syntax chokes on terms with symbols, like Next.js. Always wrap the query in double quotes:
ts
function toFtsQuery(query: string) { const clean = query.replace(/["']/g, " ").replace(/\s+/g, " ").trim(); if (!clean) return ""; return `"${clean.replace(/"/g, '""')}"`;}
The search itself is rank-ordered:
sql
SELECT posts.* FROM posts_fts_jaJOIN posts ON posts.rowid = posts_fts_ja.rowidWHERE posts.status = 'published' AND posts_fts_ja MATCH ?ORDER BY rank LIMIT 30;
The trap: trigram cannot match queries under 3 characters
Because trigram tokenizes 3-character windows, one- or two-character queries — extremely common in Japanese ("AI", 検索) — return nothing. You only discover this in production.
The fix is to branch on query length:
ts
const useShortQueryLike = Array.from(clean).length < 3; // Array.from for surrogate pairsconst sql = useShortQueryLike ? `SELECT * FROM posts WHERE status = 'published' AND (title LIKE ? OR excerpt LIKE ? OR content LIKE ? OR tags LIKE ?) ORDER BY datetime(created_at) DESC LIMIT 30` : `SELECT posts.* FROM posts_fts_ja ... MATCH ?`;
Short queries do fine with LIKE '%xx%'. A full scan over a few hundred posts takes single-digit milliseconds. Don't assume everything must go through FTS.
The full three-stage fallback
The final search flow:
1. **Query under 3 chars** → LIKE scan
2. **3+ chars** → trigram FTS (the main Japanese path)
3. **Zero hits** → prefix search for English words (`token*` against a separate FTS table)
4. **Still zero** → LIKE scan as the last resort
Each stage is wrapped in try/catch so an FTS syntax error can never take down search entirely. For search, *degrading gracefully* matters more than precision.
Migration note
When adding an FTS table to a database that already has data, backfill inside the migration — idempotently:
sql
INSERT INTO posts_fts_ja(rowid, title, excerpt, content, tags, slug)SELECT rowid, title, excerpt, content, tags, slug FROM postsWHERE NOT EXISTS (SELECT 1 FROM posts_fts_ja WHERE posts_fts_ja.slug = posts.slug);
Takeaways
• Trigram FTS5 is the practical answer for Japanese search without an analyzer
• Sync the FTS table with DB triggers, never application code
• Trigram misses sub-3-character queries — branch to LIKE
• Design search as layered fallbacks that never die
Turnstile site key が未設定のため、このフォームは送信できません。管理者は NEXT_PUBLIC_TURNSTILE_SITE_KEY を設定してください。