Estimating Software Projects: Beyond Story Points

March 16, 2026 | 9 min read

Software estimation is one of the most contentious topics in the industry. On one end, stakeholders and executives demand precise forecasts: "When will it be done? How much will it cost?" On the other end, developers know from painful experience that software is inherently uncertain and that precise estimates are, at best, educated guesses and, at worst, fiction presented as fact.

The tension is real but not irresolvable. At Pepla, our consulting team helps clients navigate this challenge daily. The answer is not to estimate better — it is to estimate differently, using techniques that acknowledge uncertainty rather than hiding it, and to communicate forecasts in ways that give stakeholders useful information without false precision.

Story Points Explained

Story points are a unit of relative effort, not absolute time. A story estimated at 5 points is roughly 2.5 times the effort of a 2-point story — but neither maps to a specific number of hours or days. This abstraction was intentional. Story points were designed to separate estimation (how much effort?) from scheduling (how long will it take?), allowing teams to estimate without the anchoring bias that comes from thinking in hours.

Story points measure relative effort, not absolute time. A 5-point story is not "2.5 days" -- it is 2.5 times the effort of a 2-point story.

The Fibonacci sequence (1, 2, 3, 5, 8, 13, 21) is the most common scale. The increasing gaps between numbers reflect a fundamental truth about estimation: the larger the work, the less precise the estimate. The difference between a 1-point and a 2-point story is meaningful. The difference between a 13-point and a 21-point story is fuzzy at best. The scale's structure forces estimators to acknowledge this increasing uncertainty rather than pretending they can distinguish between a 14-point and a 16-point story.

Story points work best when the team has a shared set of reference stories — previously completed stories of known sizes that serve as calibration points. "This new story feels similar in complexity to the user authentication story we did last sprint, which was a 5. But it also requires a new API integration, so maybe it's an 8." Reference stories anchor the conversation in shared experience rather than abstract numbers.

Planning Poker

Planning poker is the most widely used estimation technique in agile teams. The process is straightforward: the product owner describes a story, the team discusses it briefly, and then each member simultaneously reveals their estimate (typically using cards or a digital tool). If estimates converge, the team accepts the consensus. If they diverge, the highest and lowest estimators explain their reasoning, the team discusses, and they re-estimate.

The value of planning poker is not in the numbers it produces. It is in the conversation. When one developer estimates a story at 3 points and another estimates it at 13, the ensuing discussion almost always reveals that they have different assumptions about scope, technical approach, or acceptance criteria. That conversation is where alignment happens. The estimate is a side effect of shared understanding.

Common pitfalls to avoid:

Anchoring. If estimates are revealed sequentially rather than simultaneously, the first number anchors everyone else. Always reveal simultaneously.
Estimation by seniority. If the team consistently defers to the most senior developer's estimate, the process provides no value. Every team member should estimate independently.
Endless debate. If the team cannot converge after two rounds of discussion, take the higher estimate and move on. The precision of any individual estimate matters less than the overall accuracy across the sprint.
Estimating in hours. Teams that translate story points into hours defeat the purpose. If a 1-point story is "half a day" and a 2-point story is "one day," you are estimating in hours with extra steps.

The value of planning poker is in the conversation, not the cards. Divergent estimates reveal hidden assumptions about scope and approach.

T-Shirt Sizing

T-shirt sizing (XS, S, M, L, XL) is a coarser estimation technique used for high-level planning and roadmapping. Where story points are used for sprint-level planning, T-shirt sizes are appropriate when you need a rough sense of effort across a large backlog without the precision of individual story estimates.

The typical mapping is:

XS: Trivial change, less than a day of work.
S: Small, well-understood work. One to two days.
M: Moderate complexity. Three to five days.
L: Significant effort with some uncertainty. One to two weeks.
XL: Large and complex. Needs to be broken down before it can be meaningfully estimated.

T-shirt sizing is fast — a team can size 30-40 items in an hour — and its coarseness is a feature, not a bug. At the roadmap level, knowing that a feature is "Large" versus "Small" is sufficient for prioritisation and capacity planning. Spending an hour debating whether it is a 13 or a 21 is not a productive use of time at that stage.

The #NoEstimates Movement

The #NoEstimates movement, associated with Woody Zuill and Vasco Duarte, challenges the fundamental premise that estimation is necessary. The argument is not that estimation is inaccurate (though it often is), but that the time spent estimating could be better spent delivering, and that there are alternative approaches that provide forecasting without estimation.

The core idea: if you break work into small, similarly-sized items and track how many items the team completes per week (throughput), you can forecast delivery dates without estimating individual items. If the team completes an average of 6 items per week and the backlog contains 30 items, the work will take approximately 5 weeks — no story points required.

This approach works well under specific conditions: the team has a stable throughput, work items are consistently small (ideally completable within 1-2 days), and the backlog is well-defined. When these conditions hold, throughput-based forecasting is often more accurate than story-point-based forecasting because it eliminates estimation error entirely.

The approach works less well when work items vary dramatically in size, when the team's composition or context changes frequently, or when stakeholders require estimates for individual features rather than the entire backlog. In these cases, some form of estimation remains useful.

Always provide a range, never a single date. Communicate confidence levels so stakeholders can manage their own risk.

Reference Stories

Whatever estimation technique a team uses, reference stories are the single most effective calibration tool. A reference story set is a collection of 5-8 previously completed stories, spread across the estimation scale, that the team uses as benchmarks.

For example:

1 point: "Add a new field to the customer form with validation" — well-understood, minimal complexity.
3 points: "Implement password reset flow with email verification" — straightforward but involves multiple components.
5 points: "Build the invoice generation module with PDF export" — moderate complexity with some design decisions.
8 points: "Integrate with the third-party payment gateway" — significant complexity with external dependencies.
13 points: "Implement the multi-tenant data isolation layer" — high complexity with architectural implications.

When estimating a new story, the team compares it to the reference set: "This feels bigger than the payment gateway integration but smaller than the data isolation layer — so it's probably a 13." This comparative approach is more reliable than abstract estimation because human beings are better at relative comparison than absolute sizing.

Update your reference stories periodically. As the team's capabilities evolve and the codebase changes, a story that was an 8 last year might be a 3 today because the team has built better tooling or deeper domain knowledge.

Velocity-Based Forecasting

Velocity — the average number of story points completed per sprint — is the traditional tool for converting a sized backlog into a delivery forecast. If the team averages 35 points per sprint and the remaining backlog is 140 points, the forecast is 4 sprints.

The problem with single-point velocity forecasts is that they imply a precision that does not exist. Velocity varies from sprint to sprint due to holidays, sick leave, production incidents, underestimated stories, and countless other factors. A team that averages 35 points might have a range of 25 to 50 across recent sprints.

A better approach is to use a velocity range. Instead of "the team will complete this in 4 sprints," say "based on the team's velocity range of 25-50 points per sprint, this work will take between 3 and 6 sprints, with a most likely completion at sprint 4." This communicates the same information while honestly representing the uncertainty.

Monte Carlo Simulations

Monte Carlo simulation is the most sophisticated and most accurate forecasting technique available. It uses historical data — either throughput (items per sprint) or velocity (points per sprint) — to run thousands of simulated futures and produce a probability distribution of completion dates.

The process works like this: take the team's throughput data from the last 10-20 sprints. For each simulation run, randomly sample a throughput value from that historical data for each future sprint. Run 10,000 simulations. The result is a distribution: "There is a 50% chance the work will be complete by sprint 4, an 85% chance by sprint 5, and a 95% chance by sprint 6."

Monte Carlo has several advantages over deterministic forecasting:

It naturally accounts for variability in team performance without requiring the team to explain why some sprints are faster than others.
It produces probability-based forecasts that communicate uncertainty explicitly.
It can account for scope changes by adjusting the remaining backlog size and re-running the simulation.
It does not require story-point estimates — it can work with item counts alone, making it compatible with #NoEstimates approaches.

Several tools now make Monte Carlo accessible without requiring statistical expertise. Jira plugins, ActionableAgile, and even spreadsheet templates can run these simulations from a team's existing data.

At Pepla, we use Monte Carlo forecasting for client delivery commitments because it allows us to provide honest, data-backed timelines rather than guesses dressed up as precision.

Communicating Uncertainty to Stakeholders

The best estimation technique in the world is useless if the forecast is communicated poorly. Stakeholders who hear "it will be done in 6 sprints" hear a commitment. When sprint 6 arrives and the work is not done, trust is damaged — even if the original estimate was clearly stated as an approximation.

Effective communication of estimates follows these principles:

Always provide a range, never a single date. "We expect completion between mid-May and early June" is more honest and more useful than "June 1st."
Use confidence levels. "We are 85% confident we will deliver by June 15th" gives stakeholders the information they need to manage their own risk.
Explain what drives the uncertainty. "The main source of uncertainty is the third-party integration, which we have not yet prototyped" gives stakeholders context and allows them to help reduce the uncertainty (for example, by accelerating access to the third-party environment).
Update regularly. Estimates should narrow over time as uncertainty is resolved. A forecast that was a 4-week range at the start of the project should be a 1-week range when 80% of the work is complete. Regular updates demonstrate that the team is tracking progress and managing risk.
Separate estimation from commitment. An estimate is the team's best prediction. A commitment is a promise. These are different things with different accountability levels. Make sure stakeholders understand which one they are receiving.

Monte Carlo simulation turns historical data into probability forecasts -- "85% chance by sprint 5" is more honest than a single-point date.

The goal of estimation is not to predict the future with certainty — that is impossible for any non-trivial software project. The goal is to provide stakeholders with enough information to make good decisions under uncertainty. The techniques in this article, applied thoughtfully, achieve that goal far better than gut-feel estimates or manufactured precision.

Need Help Planning Your Software Project?

Pepla uses data-driven estimation techniques to provide honest, reliable delivery forecasts. Let's discuss your project.

Get in Touch