Performance is not a feature you bolt on at the end of a project. It is an architectural concern that spans every layer of your application, from the queries hitting your database to the pixels rendering on a user's screen. In over a decade of building production software at Pepla, we have learned that the difference between a system that feels fast and one that feels sluggish usually comes down to a handful of decisions made early in the development cycle — and a handful of oversights discovered far too late.
This guide walks through the full stack, layer by layer, with practical techniques you can apply to your own projects. No silver bullets. Just methodical engineering.
Start Where the Data Lives: Database Query Analysis
Most performance problems start at the database. It is the one component in your architecture that cannot be trivially scaled horizontally, and it is the one component where a single poorly written query can bring an entire system to its knees during peak load.
The first tool you should reach for is EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL). This command shows you the query execution plan — how the database engine intends to retrieve your data. You are looking for sequential scans on large tables, nested loops where hash joins would be more efficient, and estimated row counts that diverge wildly from actual row counts.
Here is a practical example. Suppose you have a query joining orders to customers with a filter on date range:
EXPLAIN ANALYZE
SELECT c.name, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at BETWEEN '2026-01-01' AND '2026-03-01';
If the output shows a sequential scan on the orders table, you are missing an index on created_at. If the join produces a nested loop with an estimated 1 row but an actual 50,000 rows, your table statistics are stale and you need to run ANALYZE. These are not theoretical concerns. On a recent Pepla project, a single missing composite index on a three-column filter was the difference between a 12-second query and a 40-millisecond one.
Beyond individual queries, consider your indexing strategy holistically. Every index speeds up reads but slows down writes. Covering indexes — where the index includes all columns needed by a query — can eliminate table lookups entirely. Partial indexes on frequently filtered subsets can keep index size manageable. And always monitor your slow query log. The queries that cause problems in production are rarely the ones you expected.
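The effect of an index on a query plan is easy to see for yourself. The sketch below uses an in-memory SQLite database as a stand-in for a production system (the orders schema is illustrative, not a real one); SQLite's EXPLAIN QUERY PLAN is a lightweight analogue of PostgreSQL's EXPLAIN, reporting SCAN for a sequential scan and SEARCH for an index lookup:

```python
import sqlite3

# In-memory SQLite stands in for a production database here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "created_at TEXT, total REAL)"
)

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the detail column says SCAN (sequential) or SEARCH (index).
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE created_at BETWEEN '2026-01-01' AND '2026-03-01'"

before = plan(query)   # no index yet: expect a full-table SCAN
conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")
after = plan(query)    # now: SEARCH ... USING INDEX idx_orders_created_at

print(before)
print(after)
```

The same habit transfers directly to PostgreSQL: run the plan before and after the index, and confirm the access path actually changed rather than assuming it did.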
The N+1 Query Problem
If there is one anti-pattern that appears in nearly every ORM-based application we audit, it is the N+1 query. The pattern is deceptively simple: you load a list of parent records (1 query), then for each parent you lazily load a child record (N queries). With 10 records, nobody notices. With 10,000 records, your page takes thirty seconds to load and your database connection pool is exhausted.
The fix depends on your ORM and context. In Entity Framework, use .Include() for eager loading. In Django, use select_related() for foreign keys and prefetch_related() for many-to-many relationships. In raw SQL, a single JOIN or a subquery with IN replaces hundreds of round trips with one.
Detection is straightforward. Enable query logging in your development environment and look at any page that generates more than a dozen queries. Tools like MiniProfiler (for .NET), Django Debug Toolbar, or the built-in query counter in Rails make this trivially visible. At Pepla, we treat any endpoint generating more than 10 queries as a code review flag.
API Response Compression and Payload Optimisation
Once your data leaves the database, it hits your API layer. Two common mistakes here: returning more data than the client needs, and failing to compress what you do return.
Start with payload size. If your endpoint returns a user object with 40 fields but the client only uses 5, you are wasting bandwidth and serialisation time. GraphQL solves this architecturally, but you do not need GraphQL to apply the principle. Sparse fieldsets (e.g., ?fields=name,email,avatar) or dedicated response DTOs tailored to specific views accomplish the same goal with REST.
Compression is non-negotiable for any API serving responses over a few kilobytes. Enable gzip or Brotli compression at your web server or reverse proxy level. Brotli typically achieves 15-25% better compression ratios than gzip on text-based payloads. A JSON response that weighs 200KB uncompressed drops to roughly 20KB with Brotli. Over mobile networks in South Africa, where latency and bandwidth are real constraints, that difference is the gap between a responsive app and one that feels broken.
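You can verify the order of magnitude with the standard library. Brotli needs a third-party package, so this sketch uses gzip from the Python stdlib on a deliberately repetitive JSON payload of the kind a list endpoint returns; exact ratios will vary with your data:

```python
import gzip
import json

# Repetitive JSON (repeated keys, repeated string values) compresses
# especially well; this payload is synthetic, for illustration only.
payload = json.dumps([
    {"id": i, "status": "shipped", "currency": "ZAR", "total": i * 10.5}
    for i in range(2000)
]).encode("utf-8")

compressed = gzip.compress(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes "
      f"({len(compressed) / len(payload):.0%} of original)")
```

In production you would not compress in application code at all: turn it on at the reverse proxy (nginx, Caddy, or your CDN) and let it negotiate gzip versus Brotli per client.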
Also consider HTTP caching headers. Responses that do not change between requests — reference data, configuration, feature flags — should carry appropriate Cache-Control and ETag headers. This eliminates requests entirely, which is always faster than making them faster.
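One common ETag scheme is simply a hash of the serialised response body; when the client's If-None-Match header matches, the server answers 304 Not Modified with no body at all. A minimal sketch, with an invented feature-flag payload:

```python
import hashlib
import json

def make_etag(body: bytes) -> str:
    # Strong ETag derived from the response bytes; any stable
    # serialisation works, truncated hash keeps the header short.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

body = json.dumps({"feature_flags": {"dark_mode": True}}).encode("utf-8")
etag = make_etag(body)

# Next request: the client echoes the tag in If-None-Match.
if_none_match = etag
status = 304 if if_none_match == etag else 200
print(etag, status)
```

The 304 path skips serialisation and transfer entirely, which is why reference data and configuration endpoints benefit so much from it.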
Content Delivery Networks
A CDN is not just for static files. Modern CDNs like Cloudflare, Azure Front Door, and AWS CloudFront can cache API responses at edge locations, serve stale content while revalidating, and terminate TLS closer to the user — all of which reduce perceived latency significantly.
For South African businesses, CDN configuration deserves particular attention. If your application serves users primarily in South Africa and your origin server is in Azure South Africa North (Johannesburg), a CDN still helps by caching at multiple edge points within the country and handling DDoS protection. If you serve international users, the latency reduction from edge caching is dramatic — the round trip from Cape Town to a European origin can add 150-200ms per request.
Static assets — images, CSS, JavaScript bundles, fonts — should always be served through a CDN with long cache lifetimes and content-hashed filenames. This is a solved problem. If you are still serving static files from your application server, you are leaving the easiest performance win on the table.
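Content hashing is what makes those long cache lifetimes safe: the filename changes whenever the bytes change, so a stale cached copy can never be served under the new name. Bundlers do this for you; the sketch below just shows the principle with invented file contents:

```python
import hashlib
from pathlib import Path

def hashed_name(path: str, content: bytes) -> str:
    # app.js -> app.<8-hex-digest>.js, derived from the file's bytes.
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = Path(path)
    return f"{p.stem}.{digest}{p.suffix}"

v1 = hashed_name("app.js", b"console.log('v1');")
v2 = hashed_name("app.js", b"console.log('v2');")
print(v1, v2)  # two different filenames, so both can be cached forever
```

With names like these you can set Cache-Control: public, max-age=31536000, immutable and let the CDN keep every version indefinitely; deploying new code just changes which filename the HTML references.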
Frontend Rendering Performance
The frontend is where performance becomes perception. A page can load all its data in 200ms, but if it takes another 800ms to parse, compile, and execute JavaScript before anything is visible, the user perceives a full second of delay.
Lazy loading is the most impactful quick win. Images below the fold, third-party widgets, and non-critical JavaScript modules should all load on demand rather than upfront. The native loading="lazy" attribute handles images. For JavaScript, dynamic import() expressions let you split your bundle and load modules only when the user navigates to the relevant feature.
Code splitting takes this further at the build level. If you are shipping a single JavaScript bundle that includes the code for every route in your application, most of that code is wasted on any given page load. Modern bundlers — Vite, Webpack, esbuild — support route-based splitting out of the box. A typical single-page application can reduce its initial bundle by 60-80% with proper splitting, with the remaining code loaded asynchronously as the user navigates.
Beyond splitting, scrutinise your dependency tree. It is common to find applications importing an entire utility library for a single function, or bundling multiple icon libraries when a handful of SVGs would suffice. Tools like webpack-bundle-analyzer or source-map-explorer visualise your bundle composition and make these waste patterns obvious.
Core Web Vitals: Measuring What Users Feel
Google's Core Web Vitals provide a standardised framework for measuring user-perceived performance. The three metrics that matter most are:
- Largest Contentful Paint (LCP) — how long until the largest visible element renders. Target: under 2.5 seconds. Improve it by optimising your critical rendering path, preloading key resources, and ensuring your server responds quickly.
- Interaction to Next Paint (INP) — how long the browser takes to respond to user interactions. Target: under 200 milliseconds. Improve it by breaking up long tasks, debouncing event handlers, and minimising main thread work.
- Cumulative Layout Shift (CLS) — how much the page layout shifts unexpectedly during loading. Target: under 0.1. Fix it by setting explicit dimensions on images and embeds, avoiding dynamically injected content above the fold, and using the CSS contain property to isolate layout changes.
These are not vanity metrics. They directly affect search rankings, and more importantly, they correlate strongly with user engagement and conversion rates. At Pepla, we include Core Web Vitals targets in our project requirements and monitor them continuously in production using tools like Google Lighthouse CI and web-vitals.js.
Profiling: Finding the Real Bottleneck
The most common performance mistake is optimising the wrong thing. You can spend a week shaving 50ms off a database query that runs once per page load while ignoring a JavaScript function that blocks the main thread for 300ms on every scroll event. Profiling prevents this.
On the backend, use application performance monitoring (APM) tools — Azure Application Insights, Datadog, New Relic, or the open-source alternative OpenTelemetry. These tools trace requests through your entire stack, showing you exactly where time is spent. A flame graph that shows 80% of request time in a single database call tells a different story than one that shows time distributed across dozens of microservice calls.
On the frontend, Chrome DevTools remains the gold standard. The Performance panel records a timeline of everything the browser does during a page load or interaction: script evaluation, style recalculation, layout, paint, and compositing. The key insight is usually in the long tasks — any JavaScript execution block longer than 50ms is a candidate for optimisation.
For network analysis, the Network panel sorted by waterfall reveals sequential request chains that could be parallelised, resources loaded without compression, and third-party scripts that block rendering. Lighthouse audits combine all of these perspectives into a single actionable report.
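The same profile-first discipline applies to backend code, and Python ships the tooling in the standard library. This sketch builds a deliberately lopsided workload, where one innocent-looking helper dominates, and lets cProfile surface it; the function names are invented for the demonstration:

```python
import cProfile
import io
import pstats

def serialise(rows):
    # Looks like a plausible suspect, but is actually cheap.
    return [",".join(map(str, r)) for r in rows]

def hot_path():
    # The real bottleneck: a long, pure-Python loop.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def render():
    serialise([(i, i) for i in range(1000)])
    return hot_path()

profiler = cProfile.Profile()
profiler.enable()
render()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time answers the question that matters: not "which function was called most" but "where did the request actually spend its life". Guessing at either answer is how the wrong thing gets optimised.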
Putting It All Together
Performance optimisation is a discipline, not a task. It requires measurement before intervention, hypothesis before action, and verification after change. The stack we have walked through — database queries, ORM patterns, API payloads, CDN configuration, frontend rendering, and Core Web Vitals — forms a complete picture of where time goes in a modern web application.
The order matters too. Start at the database, because a slow query affects every user on every request. Move to the API layer, because payload size and compression affect every response. Configure your CDN, because caching eliminates work entirely. Then address frontend rendering, because this is where marginal gains compound into perceptible differences.
At Pepla, we build performance budgets into our project planning from day one. We define targets for response time, bundle size, and Core Web Vitals before writing the first line of code. When performance degrades — and it always does, incrementally, as features accumulate — we have a baseline to measure against and a methodology to diagnose the cause.
Performance is not about making things fast. It is about not making things slow. Every architectural decision, every dependency, every query is an opportunity to either preserve the speed you have or give it away.
Start with measurement. Identify the bottleneck. Fix it. Measure again. That loop, applied consistently across every layer of the stack, is the entire discipline of performance engineering.