Just‑in‑Time Tests (JiTTests) for Agentic Software Development

21 February 2026 by

Suraj Barman

Context & History of Just‑in‑Time Tests

The software industry has shifted toward AI‑augmented coding, where large language models assist in writing, reviewing, and merging code. This acceleration exposed the limits of traditional static test suites, which need continual maintenance and often produce noisy results. In response, developers introduced Just‑in‑Time Tests (JiTTests), a method that generates a change‑specific test when a pull request is opened. Using the same language model that suggested the change, JiTTests can infer the intended behavior and generate a focused test that runs immediately, with the aim of catching regressions before they reach production.

Implementation & Best Practices

Implementing JiTTests involves three phases: capture, generate, and validate. First, the system intercepts a code change event, extracts the diff, and supplies it to an LLM. Second, the model crafts a test that targets the modified functionality, using prompt engineering techniques to steer the output toward high‑signal checks. Third, the generated test is executed in a sandbox, and the results are reported back to the developer. Keeping these phases separate helps JiTTests stay lightweight, relevant, and safe for continuous integration pipelines.

Capturing Code Changes

Integrate a webhook or Git hook that triggers on pull‑request creation. The hook should collect the full diff, relevant metadata (author, branch, issue ID), and any existing documentation. Providing this context helps the LLM understand the purpose of the change.
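As a rough sketch of the capture step, the snippet below bundles a diff and its metadata into one object. The payload keys (author, branch, title) are a made‑up, simplified shape rather than any real platform's webhook schema, and ChangeContext and build_context are hypothetical names. The file list is read from the "+++ b/<path>" header lines that git's unified diff format emits.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeContext:
    author: str
    branch: str
    issue_id: Optional[str]
    diff: str
    changed_files: List[str]


def changed_files_from_diff(diff: str) -> List[str]:
    """Collect paths from unified-diff '+++ b/<path>' header lines."""
    prefix = "+++ b/"
    return [line[len(prefix):] for line in diff.splitlines() if line.startswith(prefix)]


def build_context(payload: dict, diff: str) -> ChangeContext:
    # Assumption: the issue ID appears in the title as '#<number>'.
    match = re.search(r"#(\d+)", payload.get("title", ""))
    return ChangeContext(
        author=payload["author"],
        branch=payload["branch"],
        issue_id=match.group(1) if match else None,
        diff=diff,
        changed_files=changed_files_from_diff(diff),
    )


sample_diff = """diff --git a/calc.py b/calc.py
--- a/calc.py
+++ b/calc.py
@@ -1,2 +1,2 @@
 def add(a, b):
-    return a - b
+    return a + b
"""
ctx = build_context(
    {"author": "dev1", "branch": "fix/add", "title": "Fix add (#42)"}, sample_diff
)
print(ctx.changed_files, ctx.issue_id)
```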

Prompt Engineering for Test Generation

Design prompts that ask the model to produce a single, deterministic test that verifies the new behavior and explicitly checks for regressions. Refer to best practices in prompt engineering for small language models to keep prompts concise and reproducible.
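One way to keep prompts concise and reproducible is to fix the instructions in a template and vary only the change intent and the diff. The wording of the template, the build_prompt helper, and the 4000‑character cutoff below are illustrative assumptions, not a recommended standard.

```python
PROMPT_TEMPLATE = """You are writing one pytest test for a code change.
Rules:
- Output exactly one test function and nothing else.
- The test must be deterministic: no network, clock, or randomness.
- Assert the new behavior and one behavior that must not regress.

Change intent: {intent}

Diff:
{diff}
"""


def build_prompt(intent: str, diff: str, max_diff_chars: int = 4000) -> str:
    """Fill the template, truncating very large diffs to keep the prompt short."""
    if len(diff) > max_diff_chars:
        diff = diff[:max_diff_chars] + "\n[diff truncated]"
    return PROMPT_TEMPLATE.format(intent=intent.strip(), diff=diff)


prompt = build_prompt("Fix add() so it sums its arguments", "+    return a + b")
print(prompt)
```

Because the rules never change between requests, differences in the generated tests can be traced to the diff or the model rather than to prompt drift.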

Sandbox Execution and Reporting

Run the generated test in an isolated environment matching the project's runtime. Capture pass/fail outcomes, execution time, and any exception details. Post the results to the pull request as a comment, highlighting only genuine failures so that developers are not distracted by false alarms.
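The sketch below shows the shape of the execution step using only the Python standard library: the generated test is written to a temporary directory, run in a separate interpreter with a timeout, and summarized. It is an assumption‑laden stand‑in for a real sandbox. A subprocess does not isolate the filesystem or network, so a production setup would more likely use a container or VM. RunReport, run_in_subprocess, and the runner footer are hypothetical helpers.

```python
import subprocess
import sys
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path

# Appended to the generated file so every test_* function is called when run as a script.
RUNNER_FOOTER = """
if __name__ == "__main__":
    for _name, _fn in list(globals().items()):
        if _name.startswith("test_") and callable(_fn):
            _fn()
"""


@dataclass
class RunReport:
    passed: bool
    seconds: float
    details: str


def run_in_subprocess(test_source: str, timeout_s: float = 10.0) -> RunReport:
    with tempfile.TemporaryDirectory() as tmp:
        test_file = Path(tmp) / "test_generated.py"
        test_file.write_text(test_source + RUNNER_FOOTER, encoding="utf-8")
        start = time.monotonic()
        try:
            proc = subprocess.run(
                # -I: isolated mode, ignores PYTHON* env vars and user site-packages.
                [sys.executable, "-I", str(test_file)],
                capture_output=True, text=True, timeout=timeout_s, cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return RunReport(False, time.monotonic() - start, "timed out")
        elapsed = time.monotonic() - start
    # Keep only the last stderr line (usually the exception) for the PR comment.
    lines = proc.stderr.strip().splitlines()
    return RunReport(proc.returncode == 0, elapsed, lines[-1] if lines else "")


ok = run_in_subprocess("def test_ok():\n    assert 1 + 1 == 2\n")
bad = run_in_subprocess("def test_bad():\n    assert 1 + 1 == 3\n")
print(ok.passed, bad.passed, bad.details)
```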

Best‑Practice Checklist

Limit test scope to the modified files to keep execution fast.
Validate LLM output against a safety policy to block malicious code.
Cache successful test templates for similar changes to improve consistency.
Monitor false‑positive rates and adjust prompts regularly.

Choosing the Right LLM for JiTTests

Selecting a model means balancing cost, latency, and capability. For many codebases, a 7B‑parameter model offers sufficient reasoning while staying affordable. Larger models may provide deeper semantic understanding but increase response time. See the discussion on agentic AI for guidance on aligning model choice with development workflow.

Deployment Strategies

Deploy the model as a hosted API within your CI/CD platform, or run it on‑premises for tighter security. Ensure rate limits align with the volume of pull requests to prevent bottlenecks.
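If the hosted model enforces a request quota, a client‑side limiter can smooth out bursts of pull requests before they hit it. The token bucket below is one common technique for this, shown as a sketch: the class name, the rate, and the capacity are placeholder assumptions to be replaced by whatever the actual service allows.

```python
import time


class TokenBucket:
    """Allow about `rate` requests per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the behavior is reproducible.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]  # third call exceeds the burst size
fake_now[0] += 1.0                                # one second later, one token is back
after_wait = bucket.try_acquire()
print(burst, after_wait)
```

Requests that are refused can be queued and retried, which delays test generation for a pull request instead of dropping it.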

Continuous Improvement

Collect telemetry on test generation success, execution duration, and developer feedback. Use this data to refine prompts, adjust model parameters, and expand the test‑generation library.
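A small aggregation like the one below is often enough to start tracking these signals. The JitEvent fields and the metric definitions are assumptions made for illustration. In particular, the false‑alarm share is defined here as the fraction of reported failures that a developer did not confirm as a real bug, which is only one possible working definition.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class JitEvent:
    generated_ok: bool    # the model returned a usable test
    failed: bool          # the test reported a failure
    confirmed_bug: bool   # a developer confirmed the failure was real
    seconds: float        # execution time of the test


def summarize(events: List[JitEvent]) -> Dict[str, float]:
    usable = [e for e in events if e.generated_ok]
    failures = [e for e in usable if e.failed]
    false_alarms = [e for e in failures if not e.confirmed_bug]
    return {
        "generation_success_rate": len(usable) / len(events) if events else 0.0,
        "false_alarm_share": len(false_alarms) / len(failures) if failures else 0.0,
        "mean_seconds": mean(e.seconds for e in usable) if usable else 0.0,
    }


events = [
    JitEvent(True, False, False, 2.0),   # passed cleanly
    JitEvent(True, True, True, 4.0),     # failure confirmed as a real bug
    JitEvent(True, True, False, 3.0),    # failure rejected as a false alarm
    JitEvent(False, False, False, 0.0),  # generation produced nothing usable
]
summary = summarize(events)
print(summary)
```

A rising false‑alarm share is a signal to revisit the prompt, while a falling generation success rate points at the model or the context being supplied.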