Production failures are expensive. A bug that costs R100 to fix during development costs R1,500 to fix during testing and R15,000 to fix in production -- once you account for incident response, root cause analysis, hotfix development, emergency deployment, customer communication, and reputation damage. A comprehensive testing strategy is not overhead. It is insurance. At Pepla, we have built testing practices that give our teams and clients confidence that what we deploy works correctly under real-world conditions.
The Testing Pyramid
The testing pyramid, popularised by Mike Cohn, is the foundational model for structuring a test suite. It prescribes a large number of fast, focused unit tests at the base; a moderate number of integration tests in the middle; and a small number of slow, comprehensive end-to-end tests at the top. The shape is intentional: each level provides a different kind of confidence, and the ratio between levels optimises for both coverage and speed.
Unit Tests: The Foundation
Unit tests verify individual functions, methods, or classes in isolation. They are the fastest tests to write and run -- a suite of 1,000 unit tests typically completes in seconds. They test business logic, data transformations, validation rules, and algorithmic correctness without involving databases, file systems, APIs, or user interfaces.
Effective unit tests follow several principles. They test one thing -- a single behaviour or scenario. They are independent -- each test can run in any order without affecting others. They are fast -- if a unit test takes more than a few milliseconds, it is probably not a unit test. They are deterministic -- they produce the same result every time.
The mocking and stubbing patterns used in unit testing isolate the code under test from its dependencies. If a service method calls a database repository, the unit test replaces that repository with a mock that returns predetermined data. This isolation means unit test failures point directly to the code that is broken, without ambiguity about whether the failure is in the code, the database, or the network.
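As a minimal sketch of this pattern using Python's standard `unittest.mock` module -- the `OrderService` and `OrderRepository` names are illustrative, not from any particular codebase:

```python
from unittest.mock import Mock

# Hypothetical service under test; the names are illustrative.
class OrderService:
    def __init__(self, repository):
        self.repository = repository

    def total_for_customer(self, customer_id):
        orders = self.repository.find_by_customer(customer_id)
        return sum(order["amount"] for order in orders)

# Replace the repository with a mock that returns predetermined data,
# so the test exercises only the service's own logic.
repository = Mock()
repository.find_by_customer.return_value = [
    {"amount": 100},
    {"amount": 250},
]
service = OrderService(repository)

assert service.total_for_customer("c-1") == 350
repository.find_by_customer.assert_called_once_with("c-1")
```

If this test fails, the fault is in `OrderService` itself -- the mock removes the database from the list of suspects.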
At Pepla, we target 80% or higher code coverage for business logic. Coverage alone is not a quality metric -- you can have 100% coverage with meaningless tests -- but it serves as a safety net that catches regressions when code is modified. The most valuable unit tests are the ones that test edge cases: null inputs, empty collections, boundary values, error conditions, and concurrent access.
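A small illustration of edge-case coverage -- the `normalise_quantity` function and its clamping rules are invented for the example:

```python
# Hypothetical validation function; the rules are illustrative.
def normalise_quantity(value, maximum=100):
    """Clamp an order quantity to [1, maximum]; reject missing input."""
    if value is None:
        raise ValueError("quantity is required")
    return max(1, min(int(value), maximum))

# Edge cases are where unit tests earn their keep.
assert normalise_quantity(5) == 5          # happy path
assert normalise_quantity(0) == 1          # below lower boundary
assert normalise_quantity(100) == 100      # exact upper boundary
assert normalise_quantity(101) == 100      # above upper boundary
try:
    normalise_quantity(None)               # null input
    assert False, "expected ValueError"
except ValueError:
    pass
```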
Integration Tests: The Glue
Integration tests verify that components work together correctly. Where unit tests isolate components from their dependencies, integration tests intentionally include those dependencies to test the connections between them.
Common integration test scenarios include testing that a service layer correctly reads from and writes to a real database (or an in-memory equivalent), testing that an API controller correctly serialises responses and handles HTTP errors, testing that message producers and consumers communicate correctly through a queue, and testing that authentication and authorisation work end-to-end through the middleware stack.
Integration tests are slower than unit tests -- they may involve starting databases, HTTP servers, or message brokers. A typical integration test suite runs in minutes rather than seconds. But they catch a category of bugs that unit tests cannot: serialisation errors, configuration mismatches, database query errors, and network protocol issues.
The key discipline with integration tests is scope management. An integration test that exercises the entire application from API to database is really an end-to-end test wearing integration test clothing. True integration tests focus on specific integration points -- the boundary between two components -- and verify that the contract between them is honoured.
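A focused integration test at one boundary might look like this sketch, using Python's built-in SQLite as the in-memory database -- the `CustomerRepository` class and its schema are illustrative:

```python
import sqlite3

# Integration test at one boundary: a repository talking to a real
# (in-memory) SQLite database. Names and schema are illustrative.
class CustomerRepository:
    def __init__(self, conn):
        self.conn = conn

    def save(self, name):
        cur = self.conn.execute(
            "INSERT INTO customers (name) VALUES (?)", (name,))
        return cur.lastrowid

    def find(self, customer_id):
        row = self.conn.execute(
            "SELECT name FROM customers WHERE id = ?",
            (customer_id,)).fetchone()
        return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

repo = CustomerRepository(conn)
customer_id = repo.save("Thandi")

# The round trip through real SQL catches query and schema errors
# that a mocked repository would hide.
assert repo.find(customer_id) == "Thandi"
assert repo.find(9999) is None
```

The test exercises exactly one contract -- repository to database -- rather than dragging in the API layer above it.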
End-to-End Tests: The User's Perspective
End-to-end (E2E) tests simulate real user workflows through the complete application. They launch a browser (or API client), navigate through screens, fill in forms, click buttons, and verify that the application produces the correct outcomes. They test the system as a whole, including all integrations, configurations, and infrastructure.
E2E tests provide the highest confidence but at the highest cost. They are slow (minutes per test), brittle (sensitive to UI changes, timing issues, and test data), and expensive to maintain. A flaky E2E test that fails intermittently erodes team trust in the test suite and trains developers to ignore failures.
Because of these costs, E2E tests should be limited to critical user journeys -- the workflows where failure would have the most significant business impact. For an e-commerce application, critical journeys might include: user registration, product search, adding items to cart, checkout and payment, and order status tracking. These five scenarios cover the core business value. Secondary features can be covered by cheaper tests lower in the pyramid.
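A real E2E test would drive a browser with a tool like Playwright; the in-process stand-in below only illustrates the *shape* of a journey test -- one scenario walking several steps in sequence and asserting the final business outcome. The `Shop` class and its prices are invented for illustration:

```python
# Simplified in-process stand-in for a real application; a genuine E2E
# test would exercise the deployed system through a browser or API client.
class Shop:
    def __init__(self):
        self.users, self.carts, self.orders = {}, {}, {}

    def register(self, email):
        self.users[email] = True
        self.carts[email] = []

    def add_to_cart(self, email, item, price):
        self.carts[email].append((item, price))

    def checkout(self, email):
        total = sum(price for _, price in self.carts[email])
        self.orders[email] = {"total": total, "status": "PAID"}
        return self.orders[email]

shop = Shop()

# Critical journey: register -> add items -> checkout.
shop.register("user@example.com")
shop.add_to_cart("user@example.com", "kettle", 499)
shop.add_to_cart("user@example.com", "mug", 89)
order = shop.checkout("user@example.com")

assert order["total"] == 588
assert order["status"] == "PAID"
```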
TDD vs BDD
Test-Driven Development (TDD) is a development practice where you write the test before writing the code. The cycle is: write a failing test that describes the desired behaviour, write the minimum code to make it pass, then refactor the code while keeping the test green. This "red-green-refactor" cycle drives design toward modular, testable code and ensures that every piece of functionality has a corresponding test.
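The cycle in miniature -- the `vat_inclusive` function and the 15% rate are illustrative assumptions:

```python
# Red-green-refactor in miniature.

# Step 1 (red): write the test first. With no implementation yet,
# running this test fails -- which proves the test *can* fail.
def test_vat_inclusive():
    assert vat_inclusive(100) == 115.0

# Step 2 (green): the minimum code that makes the test pass.
def vat_inclusive(net):
    return round(net * 1.15, 2)

# Step 3 (refactor): improve the code while the test stays green.
test_vat_inclusive()
```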
TDD works best for well-understood business logic where the inputs, outputs, and rules are clear. It is less effective for exploratory work where the design is still emerging, or for integration-heavy code where setting up test infrastructure dominates the effort.
Behaviour-Driven Development (BDD) extends TDD by expressing tests in natural language using the Given-When-Then format. "Given a customer has items in their cart, when they proceed to checkout, then the order summary displays the correct total including VAT." BDD scenarios serve as both tests and documentation, making them readable by non-technical stakeholders.
BDD frameworks like Cucumber, SpecFlow, and behave parse these natural-language scenarios and map them to test code. This enables collaboration between BAs, testers, and developers on the same artefact -- the BA writes the scenario, the developer implements it, and the automated test verifies it. At Pepla, we use BDD for acceptance criteria on complex business rules where stakeholder validation is critical.
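The checkout scenario above, expressed as a plain Python test structured Given-When-Then -- a framework like behave would map each clause to a decorated step function, and the `Cart` class and 15% VAT rate are illustrative assumptions:

```python
# Plain-Python sketch of a BDD scenario; the Cart class is illustrative.
class Cart:
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def checkout_summary(self, vat_rate=0.15):
        net = sum(price for _, price in self.items)
        return {"net": net, "total": round(net * (1 + vat_rate), 2)}

# Given a customer has items in their cart
cart = Cart()
cart.add("notebook", 60)
cart.add("pen", 40)

# When they proceed to checkout
summary = cart.checkout_summary()

# Then the order summary displays the correct total including VAT
assert summary["net"] == 100
assert summary["total"] == 115.0
```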

Test Automation Frameworks
Choosing the right test automation framework depends on your technology stack and the type of testing you need.
For unit and integration testing: Jest and Vitest dominate the JavaScript ecosystem. xUnit and NUnit are standard for .NET. JUnit and TestNG serve Java. pytest is the go-to for Python. Each provides test runners, assertion libraries, and mocking capabilities.
For API testing: REST Assured (Java), Supertest (Node.js), and Postman/Newman provide tools for testing HTTP APIs. They support request construction, response validation, and chaining requests to test multi-step workflows.
For E2E testing: Playwright has emerged as the leading browser automation framework, supporting Chromium, Firefox, and WebKit with a single API. Cypress remains popular for its developer experience and debugging capabilities. Selenium, while older, has the broadest browser and language support.
At Pepla, we select frameworks based on the project's technology stack and team familiarity. The framework itself matters less than the discipline of writing and maintaining tests consistently. A team that writes comprehensive tests in any framework delivers better software than a team with a cutting-edge framework and no tests.
Performance Testing
Functional tests verify that the software works correctly. Performance tests verify that it works correctly under load, stress, and sustained use.
Load testing simulates the expected number of concurrent users to verify that the system meets its performance requirements under normal conditions. If the requirement specifies sub-second response times for 1,000 concurrent users, a load test generates 1,000 concurrent requests and measures response times, throughput, and error rates.
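A real load test would use one of the tools discussed below against a deployed system; this sketch shows only the core measurement loop -- issue concurrent requests, record latencies, and check them against a target. `handle_request` is a stand-in for a real HTTP call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real HTTP request to the system under test.
def handle_request():
    time.sleep(0.01)  # simulate 10 ms of server work
    return 200

def timed_call():
    start = time.perf_counter()
    status = handle_request()
    return status, time.perf_counter() - start

# Issue 200 requests across 50 concurrent workers.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda _: timed_call(), range(200)))

latencies = sorted(elapsed for _, elapsed in results)
errors = sum(1 for status, _ in results if status != 200)
p95 = latencies[int(len(latencies) * 0.95) - 1]

assert errors == 0   # error rate under load
assert p95 < 1.0     # the sub-second requirement from the scenario
```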
Stress testing pushes beyond expected load to find the breaking point. How does the system behave at 2x, 5x, 10x expected load? Does it degrade gracefully (slower response times but no errors) or catastrophically (crashes, data loss, cascading failures)? Stress testing reveals bottlenecks -- the database connection pool, the memory allocation, the network bandwidth -- that limit scalability.
Soak testing (also called endurance testing) runs the system at expected load for an extended period -- hours or days. It reveals issues that only emerge over time: memory leaks, connection pool exhaustion, log file growth, cache bloat, and gradual performance degradation. A system that performs well in a 10-minute load test might fail after 8 hours if it leaks memory on every request.
Tools like k6, Gatling, JMeter, and Locust enable teams to write performance test scenarios and execute them at scale. At Pepla, we integrate performance testing into the CI/CD pipeline for projects with significant performance requirements, running abbreviated load tests on every deployment to staging and full suite tests on a scheduled basis.
Security Testing
Security testing verifies that the application is resistant to common attack vectors. It includes static analysis (scanning source code for security anti-patterns like SQL injection vulnerabilities and hardcoded credentials), dependency scanning (checking third-party libraries for known vulnerabilities), dynamic analysis (probing the running application for vulnerabilities like XSS, CSRF, and authentication bypass), and penetration testing (simulating real attacks by skilled testers).
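Real static analysis relies on dedicated tools; the toy scanner below only illustrates the underlying idea -- pattern-matching source text for security anti-patterns. The patterns are deliberately simplistic:

```python
import re

# Toy static-analysis scanner; patterns are simplistic by design and
# illustrate the concept, not a production ruleset.
PATTERNS = {
    "hardcoded credential": re.compile(
        r"(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]", re.I),
    "possible SQL injection": re.compile(
        r"execute\(\s*[\"'].*%s.*[\"']\s*%", re.I),
}

def scan(source):
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

sample = 'db_password = "hunter2"\nquery = safe_lookup(user_id)\n'
findings = scan(sample)

assert findings == [(1, "hardcoded credential")]
```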
Automated security scanning should run in the CI/CD pipeline. Tools like Snyk, SonarQube, and OWASP ZAP can be integrated into the build process, failing the build when critical vulnerabilities are detected. Manual penetration testing supplements automation for critical applications, particularly before initial launch and after significant changes.
Test Data Management
Tests are only as good as the data they run against. Test data management ensures that tests have access to realistic, consistent, and independent data.
Test data isolation prevents tests from interfering with each other. Each test creates the data it needs, operates on it, and cleans up afterward. Tests that share data are fragile -- they break when run in different orders or in parallel.
Factories and builders programmatically generate test data with sensible defaults. Instead of maintaining static test data files that become stale, factories create fresh data on demand. The developer specifies only the attributes relevant to the test, and the factory fills in the rest with realistic defaults.
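A minimal factory sketch -- the field names and defaults are illustrative:

```python
import itertools

# Each generated record gets a unique id so tests stay independent.
_seq = itertools.count(1)

def make_customer(**overrides):
    """Create a customer record with sensible defaults; override
    only the attributes the test cares about."""
    n = next(_seq)
    customer = {
        "id": n,
        "name": f"Customer {n}",
        "email": f"customer{n}@example.com",
        "active": True,
    }
    customer.update(overrides)
    return customer

# The test specifies only what matters to it.
inactive = make_customer(active=False)
assert inactive["active"] is False
assert "@example.com" in inactive["email"]

# Each call yields fresh, independent data.
a, b = make_customer(), make_customer()
assert a["id"] != b["id"]
```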
Data anonymisation is essential when using production-like data for testing. Personal information, financial data, and other sensitive content must be anonymised or synthesised to comply with privacy regulations like POPIA and GDPR. At Pepla, we maintain data anonymisation pipelines that produce realistic test datasets from production data with all sensitive fields replaced.
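As a sketch of field-level anonymisation, sensitive fields can be replaced with deterministic pseudonyms so records remain linkable across tables without exposing real values. The field names are illustrative, and a production pipeline would also handle salting and key management:

```python
import hashlib

# Fields to replace; illustrative, not a complete POPIA/GDPR inventory.
SENSITIVE_FIELDS = {"name", "email", "id_number"}

def pseudonym(value):
    """Deterministic pseudonym: the same input always maps to the
    same token, preserving joins between anonymised tables."""
    digest = hashlib.sha256(str(value).encode()).hexdigest()[:10]
    return f"anon-{digest}"

def anonymise(record):
    return {
        key: pseudonym(value) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }

source = {"name": "Thandi Nkosi", "email": "thandi@example.com",
          "order_total": 1499.00}
clean = anonymise(source)

assert clean["order_total"] == 1499.00       # non-sensitive data kept
assert clean["name"].startswith("anon-")     # sensitive data replaced
assert clean["name"] != source["name"]
# Deterministic: the same input maps to the same pseudonym.
assert anonymise(source)["email"] == clean["email"]
```

Note that an unsalted hash like this is a simplification: it keeps datasets consistent, but a real pipeline would add a secret salt so pseudonyms cannot be reversed by hashing guessed inputs.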
A test suite you do not trust is worse than no test suite at all. It creates a false sense of security while training the team to ignore test results. Invest in test reliability as seriously as you invest in test coverage.
When to Test Manually
Automation is powerful, but it has limits. Some testing is best done by humans.
Exploratory testing involves a skilled tester using the application without a script, following their intuition about where problems might hide. Experienced testers develop a sense for fragile areas -- complex state transitions, unusual input combinations, race conditions -- that scripted tests do not cover. Exploratory testing finds categories of bugs that automation misses.
Usability testing evaluates whether the application is intuitive and efficient for real users. Automated tests can verify that a button exists and is clickable. They cannot assess whether users understand what the button does, whether it is where they expect it, or whether the workflow it initiates makes sense.
Visual testing catches rendering issues -- broken layouts, overlapping elements, incorrect colours, truncated text -- that functional tests ignore. While tools like Percy and Chromatic automate visual regression testing, initial visual QA requires human judgement about what "looks right."
The practical approach at Pepla combines automated tests for regression protection (ensuring that existing functionality continues working) with manual testing for new features, exploratory investigation, and user experience validation. Automation handles the repetitive verification. Humans handle the judgement calls.




