In many applied ML projects, the story goes something like this.
A model is trained. Offline metrics look good. Accuracy improves, RMSE drops, AUC climbs. There is a quiet sense of relief: the hard part is done.
And yet, once the model is deployed, the business impact is often disappointing. Sometimes it is even negative.
This is not because the model is “wrong” in a statistical sense. It is because a good prediction is not the same thing as a good decision.
Prediction and decision are different problems
A prediction answers a narrow question:
What is likely to happen?
A decision answers a very different one:
What should we do, given that several things might happen?
Machine learning models are usually trained to solve the first problem. Businesses, however, live entirely in the second.
Decisions depend on things that rarely appear in a loss function:
- asymmetric costs
- operational constraints
- service-level targets
- risk tolerance
- downstream consequences
None of these are automatically encoded in “accuracy”.
Why accuracy is a weak business objective
Accuracy treats all errors as equal. Business never does.
Predicting demand slightly too low may lead to a temporary stockout. Predicting it slightly too high may lock capital in slow-moving inventory for months.
From a pure accuracy perspective, these errors might look symmetric. From a business perspective, they are not.
This is why teams often end up with models that are technically strong but operationally misaligned.
Accuracy optimizes the model’s comfort. Business metrics optimize outcomes.
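To make the asymmetry concrete, here is a minimal Python sketch with made-up unit costs: two forecasts with identical MAE, but very different business cost, because a unit of unmet demand is priced higher than a unit of excess stock.

```python
import numpy as np

# Illustrative, made-up unit costs: a unit of unmet demand is assumed
# to cost five times as much as a unit of excess stock.
UNDERAGE_COST = 5.0   # lost margin and goodwill per unit short
OVERAGE_COST = 1.0    # holding cost per unit of excess inventory

def business_cost(actual, forecast):
    """Asymmetric cost: the two error directions are priced differently."""
    shortfall = np.maximum(actual - forecast, 0)  # demand we failed to cover
    excess = np.maximum(forecast - actual, 0)     # stock we bought but did not need
    return float(UNDERAGE_COST * shortfall.sum() + OVERAGE_COST * excess.sum())

actual = np.array([100.0, 100.0, 100.0, 100.0])
under = actual - 10   # consistently forecasts 10 units too low
over = actual + 10    # consistently forecasts 10 units too high

# Identical MAE in both directions...
print(np.abs(actual - under).mean(), np.abs(actual - over).mean())  # 10.0 10.0
# ...but a fivefold difference in business cost.
print(business_cost(actual, under), business_cost(actual, over))    # 200.0 40.0
```

Any metric that collapses both error directions into a single number hides exactly this difference.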
Metrics and KPIs are related — but not interchangeable
This brings up a recurring confusion in ML projects: the assumption that improving model metrics will automatically improve business KPIs.
In practice, the relationship is indirect and domain-specific.
If the goal is to reduce warehouse CAPEX, the relevant modeling questions are not just “How accurate is the forecast?” but also:
- How uncertain is it?
- Which SKUs are more expensive to be wrong about?
- Are errors symmetric across cost, volume, or criticality?
Model metrics matter — but only insofar as they map meaningfully to business objectives. That mapping is a design choice, not an emergent property.
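As a toy illustration of that mapping, the sketch below (with hypothetical unit costs and criticality weights) re-ranks the same forecast errors by how expensive each SKU is to be wrong about.

```python
import numpy as np

# Hypothetical SKU-level data: similar errors in units, very different stakes.
abs_error = np.array([12.0, 11.0, 13.0])    # mean absolute error per SKU, in units
unit_cost = np.array([2.0, 80.0, 5.0])      # capital tied up per unit
criticality = np.array([1.0, 1.0, 3.0])     # e.g. contractual penalties at risk

# A plain average error treats every SKU the same...
print(abs_error.mean())                      # 12.0

# ...while a cost-weighted view says the second SKU is where accuracy pays.
print(abs_error * unit_cost * criticality)   # [ 24. 880. 195.]
```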
A simple inventory thought experiment
Consider a single SKU.
Your model predicts an average monthly demand of 100 units. The forecast is unbiased and historically accurate.
Case 1: low variance
Demand typically fluctuates between 90 and 110 units.
Using this forecast, you can hold modest safety stock, achieve a high service level, and keep the capital tied up in inventory under control.
Business KPIs look healthy.
Case 2: same mean, high variance
Now consider another SKU — similar price, similar volume — but demand fluctuates between 30 and 170 units.
The average prediction is still 100. Point accuracy is unchanged.
But the business reality is completely different:
- safety stock requirements explode,
- stockouts become frequent, or
- capital usage increases dramatically
From the model’s perspective, nothing changed. From the business perspective, everything did.
The difference is not the mean. It is the uncertainty around it.
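A quick back-of-the-envelope calculation makes this concrete. Assuming roughly normal demand and reading each fluctuation band as about two standard deviations on either side of the mean (so sigma of about 5 in Case 1 and about 35 in Case 2), the safety stock needed for a 95% service level differs by a factor of seven:

```python
from statistics import NormalDist

SERVICE_LEVEL = 0.95                     # a policy choice, not a model property
z = NormalDist().inv_cdf(SERVICE_LEVEL)  # ≈ 1.645

mean_demand = 100
sigma_low = 5.0    # Case 1: demand roughly 90 to 110, read as ±2 sigma
sigma_high = 35.0  # Case 2: demand roughly 30 to 170, read as ±2 sigma

for name, sigma in [("low variance", sigma_low), ("high variance", sigma_high)]:
    safety_stock = z * sigma
    print(f"{name}: safety stock ≈ {safety_stock:.0f} units on a mean of {mean_demand}")
# low variance:  safety stock ≈ 8 units
# high variance: safety stock ≈ 58 units
```

Same mean, same point accuracy, a very different balance sheet.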
Thresholds and policies hide in plain sight
Inventory decisions eventually come down to thresholds: reorder points, safety stock levels, service-level targets.
These are often treated as technical details, tuned after the model is “done”.
In reality, they encode policy decisions:
- how much risk the business is willing to accept
- how costly stockouts are relative to excess inventory
- how constrained capital really is
When teams argue about thresholds, they are usually arguing about business trade-offs — without explicitly framing them as such.
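The standard reorder-point formula makes this explicit. In the sketch below (assuming i.i.d. normal daily demand and a fixed lead time), the service level is a named input: changing it is a decision about how much stockout risk the business accepts, not a modeling detail.

```python
from math import sqrt
from statistics import NormalDist

def reorder_point(daily_mean, daily_sigma, lead_time_days, service_level):
    """Reorder point under i.i.d. normal daily demand and a fixed lead time.

    `service_level` is the policy knob: it encodes accepted stockout risk,
    which is a business trade-off, not a technical parameter.
    """
    z = NormalDist().inv_cdf(service_level)
    expected_demand = daily_mean * lead_time_days
    safety_stock = z * daily_sigma * sqrt(lead_time_days)
    return expected_demand + safety_stock

# Same forecast, two different risk policies:
print(reorder_point(10, 3, 7, service_level=0.90))  # ≈ 80 units
print(reorder_point(10, 3, 7, service_level=0.99))  # ≈ 88 units
```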
A decision-first way of thinking
A more robust approach reverses the usual workflow:
- Start from the decision and its KPIs
- Make trade-offs explicit
- Design model objectives that support those trade-offs
Sometimes that leads to point forecasts. Often, it requires uncertainty estimates, asymmetric losses, or stress scenarios.
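The classic newsvendor result is a compact example of this reversal: start from the unit costs of under- and over-stocking (illustrative values below), and they determine which quantile of demand the model should forecast, rather than the mean.

```python
# Newsvendor critical fractile: the costs determine the target quantile.
# The unit costs below are illustrative assumptions.
underage_cost = 5.0  # per unit of unmet demand
overage_cost = 1.0   # per unit of leftover stock

critical_fractile = underage_cost / (underage_cost + overage_cost)
print(critical_fractile)  # 0.833... -> forecast roughly the 83rd percentile
                          # of demand, not the mean
```

A quantile model trained with pinball loss at that fractile optimizes the decision directly, instead of hoping that accuracy translates.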
The model is not the product. The decision system is.
Closing thoughts
Many ML projects struggle not because the models are weak, but because the connection between predictions and decisions is left implicit.
Accuracy is useful — but only in context. Uncertainty is inconvenient — but ignoring it is expensive.
This is not a purely technical problem. It is a coordination problem across data science, operations, and business teams.
When everyone involved understands that prediction is only one input into a larger decision process, ML systems become not just accurate — but genuinely useful.