AgentKit provides an integrated suite that lets developers create, manage, and improve AI agents without juggling disconnected tools.
Agent Builder – visual workflow designer
Agent Builder offers a drag‑and‑drop canvas where teams can map out multi‑agent logic, add guardrails, and version each iteration. The interface promotes rapid prototyping and clear communication across product, legal, and engineering groups.
- Canvas supports nodes for tool calls, conditionals, and loop constructs.
- Built‑in preview runs let you test a workflow instantly.
- Full version history enables safe rollback and A/B testing.
- Template library accelerates common patterns such as support bots and research assistants.
- Integrates with large language models via the Responses API.
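To make the node types above concrete, here is a minimal sketch of how a workflow built from tool-call and conditional nodes could be represented and executed, with a loop formed by a conditional that points back to an earlier node. This is not Agent Builder's actual schema or runtime; the names `ToolNode`, `ConditionalNode`, and `run_workflow` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union


# Hypothetical node types; Agent Builder's real schema is not described here.
@dataclass
class ToolNode:
    name: str
    tool: Callable[[dict], dict]  # takes the workflow state, returns a new state
    next: Optional[str] = None    # name of the next node, or None to stop


@dataclass
class ConditionalNode:
    name: str
    predicate: Callable[[dict], bool]
    if_true: Optional[str]
    if_false: Optional[str]


Node = Union[ToolNode, ConditionalNode]


def run_workflow(nodes: Dict[str, Node], start: str, state: dict,
                 max_steps: int = 100) -> dict:
    """Walk the graph from `start` until a node points to None."""
    current: Optional[str] = start
    steps = 0
    while current is not None:
        if steps >= max_steps:
            # Simple safeguard against loops that never terminate.
            raise RuntimeError("step limit exceeded")
        node = nodes[current]
        if isinstance(node, ToolNode):
            state = node.tool(state)
            current = node.next
        else:
            current = node.if_true if node.predicate(state) else node.if_false
        steps += 1
    return state


# A loop: keep calling the "increment" tool until the counter reaches 3.
nodes = {
    "increment": ToolNode(
        "increment", lambda s: {**s, "count": s["count"] + 1}, next="check"
    ),
    "check": ConditionalNode(
        "check", lambda s: s["count"] < 3, if_true="increment", if_false=None
    ),
}
result = run_workflow(nodes, "increment", {"count": 0})
print(result)  # {'count': 3}
```

The step limit is one possible safeguard of the kind a visual designer might apply to loop constructs; whether Agent Builder enforces anything similar is not stated in this overview.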
Connector Registry – centralized data and tool management
The Connector Registry aggregates external services into a single admin panel, allowing administrators to control access and configure connections for all agents in an organization.
- Pre‑built connectors for Dropbox, Google Drive, SharePoint, Microsoft Teams, and more.
- Custom connector framework supports third‑party MCP (Model Context Protocol) servers.
- Global Admin Console governs domains, SSO, and multi‑org API keys.
- Guardrails can be attached at the connector level to mask PII or block jailbreak attempts.
- Works within the guidelines from “Choosing the right AI model.”
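As a rough illustration of a connector-level guardrail that masks PII, the sketch below wraps a connector's fetch function so that every result is scrubbed before an agent sees it. The regular expressions are deliberately simple and will miss many real-world formats, and `mask_pii`, `guarded`, and `fetch_document` are hypothetical names, not part of the Connector Registry.

```python
import re
from typing import Callable

# Simplified patterns for illustration only; production PII detection needs more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def guarded(fetch: Callable[..., str]) -> Callable[..., str]:
    """Wrap a connector fetch function so its output is always masked."""
    def wrapper(*args, **kwargs) -> str:
        return mask_pii(fetch(*args, **kwargs))
    return wrapper


@guarded
def fetch_document(doc_id: str) -> str:
    # Hypothetical connector call; a real one would contact an external service.
    return "Contact jane.doe@example.com or 555-123-4567."


print(fetch_document("doc-1"))  # Contact [EMAIL] or [PHONE].
```

Attaching the guardrail as a wrapper keeps the masking in one place, so every agent that uses the connector gets the same protection without changing its own logic.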
ChatKit – embeddable chat‑based agent UI
ChatKit reduces the effort required to integrate conversational agents into web or mobile products, handling streaming responses, thread management, and UI theming.
- SDKs for JavaScript and React simplify embedding.
- Customizable themes align the chat appearance with brand guidelines.
- Supports real‑time “thinking” indicators to improve user trust.
- Built‑in analytics capture interaction metrics.
- Compatible with cloud computing architectures for scalable deployments.
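The streaming and thread management that ChatKit takes over can be pictured with a small sketch. The code below is not the ChatKit SDK (which targets JavaScript and React); it is a hypothetical in-memory `ThreadStore` that shows the bookkeeping involved: messages are grouped by thread, and an assistant reply grows chunk by chunk while a UI would re-render the partial text.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class Message:
    role: str
    text: str = ""
    complete: bool = False


@dataclass
class Thread:
    thread_id: str
    messages: List[Message] = field(default_factory=list)


class ThreadStore:
    """Hypothetical in-memory store; a real deployment would persist threads."""

    def __init__(self) -> None:
        self._threads: Dict[str, Thread] = {}

    def get(self, thread_id: str) -> Thread:
        # Create the thread on first access.
        return self._threads.setdefault(thread_id, Thread(thread_id))

    def add_user_message(self, thread_id: str, text: str) -> None:
        self.get(thread_id).messages.append(Message("user", text, complete=True))

    def stream_assistant_reply(self, thread_id: str,
                               chunks: Iterable[str]) -> Iterator[str]:
        """Append chunks to one assistant message, yielding the partial text."""
        msg = Message("assistant")
        self.get(thread_id).messages.append(msg)
        for chunk in chunks:
            msg.text += chunk
            yield msg.text  # what a UI could render at this moment
        msg.complete = True


store = ThreadStore()
store.add_user_message("t1", "hi")
partials = list(store.stream_assistant_reply("t1", ["Hel", "lo"]))
print(partials)  # ['Hel', 'Hello']
```

The `complete` flag is one simple way a client could decide when to stop showing a “thinking” indicator.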
Evals Enhancements – measuring and improving agent performance
New eval capabilities let developers create datasets, run trace grading, automate prompt refinement, and benchmark third‑party models, all from a single interface.
- Dataset creator with automatic grader attachment.
- Trace grading visualizes end‑to‑end workflow execution.
- Prompt optimizer generates higher‑quality prompts based on human feedback.
- Third‑party model support expands evaluation beyond OpenAI offerings.
- Integrates with best practices for secure development environments.
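The idea of a dataset with an attached grader can be sketched in a few lines. The example below is an assumption-laden illustration rather than the Evals API: `Example`, `exact_match`, and `run_eval` are hypothetical names, and the "agent" is a stand-in function that always returns the same answer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    prompt: str
    expected: str


# A grader decides whether one output is acceptable for one dataset example.
Grader = Callable[[str, Example], bool]


def exact_match(output: str, example: Example) -> bool:
    """Case-insensitive, whitespace-tolerant comparison with the expected text."""
    return output.strip().lower() == example.expected.strip().lower()


def run_eval(agent: Callable[[str], str], dataset: List[Example],
             grader: Grader) -> float:
    """Return the fraction of examples whose output passes the grader."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    passed = sum(grader(agent(ex.prompt), ex) for ex in dataset)
    return passed / len(dataset)


dataset = [
    Example("Capital of France?", "paris"),
    Example("Capital of Germany?", "Berlin"),
]


def fake_agent(prompt: str) -> str:
    # Stand-in for a real agent call; always answers "Paris".
    return "Paris"


pass_rate = run_eval(fake_agent, dataset, exact_match)
print(pass_rate)  # 0.5
```

Keeping the grader as a separate function means the same dataset can be scored against different criteria, or against different models, without being rebuilt.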
Reinforcement Fine‑Tuning – custom reasoning for specific tasks
Reinforcement fine‑tuning (RFT) allows teams to adapt reasoning models to call the right tools at the right moment and to enforce custom success criteria.
- Custom tool‑call training improves workflow efficiency.
- Custom graders let you define task‑specific evaluation metrics.
- Available on o4‑mini and in private beta for GPT‑5.
- Beta feedback loop informs future model releases.
- Works in conjunction with Agent Builder guardrails for safe operation.
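A custom grader for tool-calling behavior might score how closely an agent's sequence of tool calls follows an expected sequence. The sketch below is one possible scoring rule, not the grader format RFT actually requires: `grade_tool_calls` is a hypothetical function that gives partial credit for expected calls made in order and penalizes extra calls.

```python
from typing import List


def grade_tool_calls(expected: List[str], actual: List[str]) -> float:
    """Score a tool-call trace between 0.0 and 1.0.

    Counts how many leading expected calls appear in `actual` in order,
    then divides by the longer of the two sequences so that both missing
    and superfluous calls lower the score.
    """
    if not expected:
        return 1.0 if not actual else 0.0
    matched = 0
    for call in actual:
        if matched < len(expected) and call == expected[matched]:
            matched += 1
    return matched / max(len(expected), len(actual))


expected = ["search", "fetch", "summarize"]

perfect = grade_tool_calls(expected, ["search", "fetch", "summarize"])
redundant = grade_tool_calls(expected, ["search", "search", "fetch", "summarize"])

print(perfect)    # 1.0
print(redundant)  # 0.75
```

A graded score rather than a pass/fail result gives a training signal that distinguishes a nearly correct trace from a completely wrong one, which is the kind of task-specific success criterion the bullets above describe.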