Context & History
Netflixs engineering culture has long emphasized scalability, reliability, and developer autonomy. The Graph Search platform emerged to allow internal teams to query federated data sets across content, business, and analytics domains using a proprietary Filter Domain Specific Language (DSL). Early releases prioritized high throughput and configurability, but required users to master a structured query syntax that mirrored GraphQL field definitions. Over several years, the platform expanded to support dozens of micro‑services and thousands of searchable fields, yet the interaction model remained rooted in UI widgets and manual filter construction. As large language models (LLMs) became readily available, Netflix identified an opportunity to lower the barrier for non‑technical stakeholders by translating everyday language into the existing DSL, preserving the platforms performance characteristics while simplifying the user experience.
Implementation & Best Practices
Embedding LLM‑driven translation involves three disciplined stages: context engineering, prompt design, and post‑generation validation. First, the system extracts the GraphQL schema for each index, converting field names, types, and controlled vocabularies into a concise metadata payload. This payload is injected into the LLM prompt so the model can generate syntactically correct expressions (e.g., year >= 1990 AND year < 2000 AND genre IN ["Sci‑Fi"]). Second, prompts are crafted to emphasize intent preservation, requesting the model to output only the filter clause without explanatory text. Including examples of natural language paired with DSL snippets improves consistency across diverse query styles. Third, after generation the output passes through a parser that checks grammar, validates field names against the schema, and ensures enumerated values belong to the defined controlled vocabularies. If mismatches are detected, the system falls back to a clarification loop, asking the user to refine ambiguous terms. This loop reduces hallucination risk and builds trust. For production readiness, Netflix adopts the following best practices:
- Version the schema snapshot used for prompt generation and update it on a weekly cadence.
- Cache LLM responses for identical queries to minimize latency and cost.
- Instrument end‑to‑end latency and success metrics, alerting on regression beyond a 200 ms threshold.
- Document the prompt template and validation rules alongside the service code to facilitate hand‑offs.