Netflix's engineering team has transformed its internal search capabilities by moving from a rigid Graph Search Filter DSL to a natural‑language interface powered by Large Language Models. This shift reduces development effort, improves user experience across product suites, and creates a reusable platform that can be managed by internal teams through automation.
Background of Graph Search at Netflix
Netflix built its original Graph Search system to allow engineers and analysts to query federated data sources spread across many services. The platform relied on a custom filter language that described graph traversals, property filters, and aggregation steps. While the system could scale to billions of nodes, the learning curve for new users was steep, and many teams built adapters to translate UI actions into DSL statements.
Challenges with Structured Query Interfaces
Using a structured language required developers to understand schema details, syntax rules, and execution semantics. Mistakes in query formation often resulted in empty results or performance bottlenecks. Additionally, the need to maintain separate parsers for each service increased operational overhead and slowed feature rollout across the organization.
Adoption of Large Language Models for Search
The rise of Large Language Models (LLMs) offered a way to interpret free‑form text and generate syntactically correct DSL statements. Netflix experimented with several pre‑trained models, fine‑tuning them on internal query logs and schema documentation. The models learned to map natural language intents to graph traversal patterns, dramatically simplifying the user interaction model.
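One way to picture the fine‑tuning data is as pairs of natural‑language requests and the DSL statements they should produce. The sketch below is an assumption about the shape of such data, not a description of Netflix's actual pipeline: the record fields (`question`, `dsl`), the example filter syntax, the prompt layout, and the function `to_training_examples` are all hypothetical.

```python
import json

def to_training_examples(log_records, schema_doc):
    """Turn logged (question, dsl) pairs into prompt/completion records.

    Both the record layout and the prompt template are illustrative
    assumptions; the article does not describe the real format.
    """
    examples = []
    for record in log_records:
        question = record.get("question", "").strip()
        dsl = record.get("dsl", "").strip()
        if not question or not dsl:
            continue  # skip incomplete log entries
        prompt = f"Schema:\n{schema_doc}\n\nRequest: {question}\nFilter:"
        examples.append({"prompt": prompt, "completion": " " + dsl})
    return examples

# Hypothetical schema documentation and query-log entries.
schema_doc = "Movie(title: string, releaseYear: int, genre: string)"
logs = [
    {"question": "dramas released since 2020",
     "dsl": "genre == 'drama' AND releaseYear >= 2020"},
    {"question": "", "dsl": "title == 'untitled'"},  # dropped: no question
]

examples = to_training_examples(logs, schema_doc)
# One JSON object per line (JSON Lines), a common fine-tuning input layout.
jsonl = "\n".join(json.dumps(example) for example in examples)
```

Putting the schema documentation into every prompt is one possible design choice. It keeps each example self‑contained, at the cost of longer inputs.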
Text‑to‑Query Translation Architecture
The production pipeline consists of three layers: a front‑end service that captures user input, an inference engine running the fine‑tuned LLM, and a validation module that checks generated DSL against schema constraints. If validation fails, the system returns a clarification request instead of running the query, so only queries that pass the schema checks reach the execution engine.
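The validation layer can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Netflix's implementation: the article does not show the real Graph Search Filter DSL, so the `field operator value` clause shape, the `AND` separator, the `SCHEMA` table, and the names `ValidationResult` and `validate_filter` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical schema: entity type -> field name -> allowed operators.
SCHEMA = {
    "Movie": {
        "title": {"==", "!="},
        "releaseYear": {"==", "!=", "<", ">", "<=", ">="},
        "genre": {"==", "!="},
    }
}

@dataclass
class ValidationResult:
    ok: bool
    message: str  # "ok", or a clarification request for the user

# Matches one clause in the assumed syntax, e.g.: releaseYear >= 2020
CLAUSE = re.compile(r"^\s*(\w+)\s*(==|!=|<=|>=|<|>)\s*(.+?)\s*$")

def validate_filter(entity: str, dsl: str) -> ValidationResult:
    """Check a generated filter against the schema before execution."""
    fields = SCHEMA.get(entity)
    if fields is None:
        return ValidationResult(False, f"Unknown entity '{entity}'.")
    for clause in dsl.split(" AND "):
        match = CLAUSE.match(clause)
        if match is None:
            return ValidationResult(
                False, f"Could not parse '{clause.strip()}'. Please rephrase.")
        field, op, _value = match.groups()
        if field not in fields:
            return ValidationResult(
                False, f"'{field}' is not a field of {entity}. "
                       f"Did you mean one of: {', '.join(sorted(fields))}?")
        if op not in fields[field]:
            return ValidationResult(
                False, f"Operator '{op}' is not supported for '{field}'.")
    return ValidationResult(True, "ok")

good = validate_filter("Movie", "releaseYear >= 2020 AND genre == 'drama'")
bad = validate_filter("Movie", "rating > 4")
```

In this sketch a failed check produces a message that can be shown to the user as the clarification request, and the filter is passed on for execution only when `ok` is true.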
Performance Evaluation and Metrics
Netflix measured latency, accuracy, and error rates across a set of benchmark queries. Average response time dropped from 350 ms to 180 ms after introducing the LLM layer, while the percentage of successful translations rose to 92 %. The team also tracked developer time saved, reporting a reduction of roughly 30 % in effort required to implement new search features.
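The reported latency figures imply a relative improvement of a little under half. The short calculation below uses only the numbers quoted above; the helper name `relative_reduction` is an illustrative choice.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional reduction when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before

latency_before_ms = 350.0  # average response time before the LLM layer
latency_after_ms = 180.0   # average response time after the LLM layer

reduction = relative_reduction(latency_before_ms, latency_after_ms)
print(f"Latency reduction: {reduction:.1%}")  # prints "Latency reduction: 48.6%"
```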
Self‑Managed Platform Deployment
To avoid reliance on external services, Netflix packaged the inference engine into a containerized microservice that runs on its internal Kubernetes cluster. This approach gives the engineering organization full control over model versions, scaling policies, and security settings, allowing rapid iteration without external dependencies.
Impact on Engineering Culture and Product Development
The new search experience has encouraged cross‑team collaboration, as product owners can now specify requirements in plain language without deep technical knowledge of the DSL. Engineering squads report faster prototyping cycles, and the platform's open API has become a standard component in many upcoming Netflix products.