Back to TDD in ages of Agentic AI development
The risky thing about tool usage for the agents is that they are very smart at handling tool failure. If one tool fails they can call another tool to make up for the failure. But in actual solution...
Test-Driven Development (TDD) is a software development approach where tests are written before the actual code. Although TDD is strongly associated with agile methodologies and Extreme Programming (XP), it also gained importance in more traditional (plan-driven, waterfall-like) software development environments.
Why TDD?
By writing tests first, developers clarify the behavior of the code and identify potential edge cases early on.
In traditional development, late-stage changes or refactoring could be risky and expensive, as defects discovered during integration or testing phases could derail schedules. With TDD, a robust suite of tests provides a safety net, making it easier to modify and improve the code reducing fear of introducing breaking changes.
Writing tests first forces developers, business analysts, and stakeholders to articulate requirements more precisely.
TDD Promotes Iterative Mindset in a Non-Iterative Process. Even within a traditional, phase-gated process, TDD promotes small, incremental steps. Each test/code cycle helps validate that new functionality works as intended before proceeding.
TDD helps maintainability and Long-Term Benefits. A well-maintained test suite fosters maintainable and extensible code. When requirements change or features are added, existing tests help ensure that legacy functionality remains correct. This is crucial in long-running traditional projects where the software is expected to operate for years (or even decades).
TDD for Agentic systems
TDD was gaining popularity even back in 2007 and onwards. I can remember from that time when we were adopting TDD, some of our team were very annoyed by the fact that we have to write the test code even before the actual code. In my observation this simple contract is the most valuable aspect in agentic AI development. LLM calls particularly for a unit task/job are fundamentally nondeterministic.
If you put your LLM calls in the unit-test your unit tests will not always pass. It will be a non-deterministic test, but the level of certainty you increase for passing a unit test directly translates into confidence that your application will function as intended.
Tools are another important piece of code that are deterministic in nature and very crucial to the agent to function properly. The risky thing about tool usage for the agents is that they are very smart at handling tool failure. If one tool fails they can call another tool to compensate for the failure. But in actual solution it may not be adequate to swap tools, tools swapping may degrade actual performance without even revealing that the wrong tool is being used.
Unit tests can be used very efficiently to safeguard the tool functionality. As long as the tools function properly the LLM can call the right tools successfully and the agentic system is less prone to hallucinate.
Here in this article (How Small Tools Like Pytest and Poetry Can Be Valuable to a Data Scientist) I have described how you can use unit tests in python projects. I would also appreciate your thoughts, likes and dislikes about the topic.