
Monday, 6 April 2026

Design Failures Caused by a Fundamental Misunderstanding of What MCP Actually Is (And Why It Breaks in Production)

Most MCP implementation failures are design failures. And they come from a simple but critical misunderstanding of what MCP actually is. I see this repeatedly across teams—developers, architects, even experienced engineering orgs.

MCP gets treated like:

  • a better API layer
  • a structured version of function calling
  • or just another integration pattern

On the surface, this seems to work.

  • Demos look clean
  • Test cases pass
  • Early results feel promising

But none of that reflects real usage.

Then the system hits production.

And the cracks start showing:

  • the wrong tools get selected
  • behavior becomes inconsistent across similar queries
  • latency increases without clear reason
  • debugging becomes guesswork

At this point, the blame usually shifts to the model:

“The LLM is unreliable.”

It isn’t.

What’s actually happening is more fundamental.

MCP is being introduced into systems that are still designed as if execution is deterministic.

But MCP changes that completely.

It introduces a decision-making layer into your architecture.

Which means:

  • execution is no longer fully controlled by code
  • flows are not fixed—they are interpreted at runtime
  • correctness depends on how well you define tools, schemas, and boundaries

This is the shift most teams underestimate.

Because of that, they carry forward design habits from traditional systems:

  • overlapping responsibilities
  • loosely defined interfaces
  • implicit assumptions about flow

In a deterministic system, this might still hold.

In an MCP system, it creates ambiguity.

And ambiguity at design time becomes:

  • incorrect tool selection
  • inconsistent execution paths
  • silent failures that look “correct”

These issues rarely show up in controlled testing.

They surface under:

  • real user variability
  • scale
  • and unpredictable inputs

Which is why MCP systems often appear to “break” only in production.

Not because MCP is flawed—

but because the system around it was never designed for how MCP actually works.

That’s also why the same questions keep coming up across teams.

Not as isolated problems, but as symptoms of the same root cause.

So instead of addressing them one by one in conversations, I’ve put together the most common misconceptions, questions, and failure patterns I see—and how to approach them correctly in real systems.


MCP Deep-Dive FAQ

1) When would I need MCP instead of just using APIs directly?

APIs are designed to execute predefined operations. When you use APIs directly, you are responsible for deciding when to call them, how to sequence them, and how to combine their results.

MCP introduces a decision-making layer on top of this. Instead of hardcoding the flow, the LLM interprets the user’s intent and decides which tool (and therefore which capability) should be used.

If your workflow is fixed, predictable, and does not depend on interpretation, APIs are sufficient. However, if your workflow needs to adapt dynamically to user input, MCP provides flexibility and reduces hardcoded logic.

2) Is MCP just function calling, or does it solve a broader problem?

MCP is not just function calling. Function calling is only the mechanism that lets a model invoke a tool.

MCP is a broader architectural pattern. It defines how tools are:

  • described (using schemas)
  • exposed (via MCP servers)
  • selected and orchestrated (via MCP clients and the LLM)

In other words, function calling is one implementation detail, while MCP is about structuring how reasoning connects to execution across a system.

3) How should I think about the number of MCP clients and servers in a real system?

In most real-world systems, you should begin with a single MCP client and a single MCP server.

The MCP client is responsible for interacting with the LLM and orchestrating tool usage. The MCP server exposes a set of tools that the LLM can access.

You should only introduce multiple MCP servers when there is a clear separation of concerns, such as:

  • different business domains (e.g., pricing vs analytics)
  • separate teams owning different toolsets
  • security or scaling requirements

Multiple MCP clients are typically only needed in advanced setups such as multi-agent systems or separate applications with independent reasoning flows.

4) How do APIs, microservices, and MCP fit together in a real architecture?

These components operate at different layers and should not be confused as alternatives.

MCP is responsible for deciding what action should be taken next based on user intent.

Tools act as the interface layer. They receive structured input from the LLM and translate it into executable operations.

APIs are used for simple, well-defined operations such as fetching data or triggering a specific action.

Microservices handle more complex responsibilities, including business logic, data processing, and scalable backend operations.

In short: MCP determines the action, tools translate that decision, and APIs or microservices perform the actual work.

5) How does the system decide which tool or capability to use?

The LLM does not have true understanding of tools; it performs pattern matching using the signals you provide.

It relies on three primary inputs:

  • the tool description (what the tool claims to do and when to use it)
  • the input schema (how to call it and what arguments look like)
  • the user’s query (intent expressed in natural language)

At runtime, the model compares the user’s query with available tool descriptions and selects the tool whose description best matches the intent.

Common failure modes:

  • vague or generic descriptions → wrong tool selected
  • overlapping descriptions → inconsistent selection
  • missing usage cues ("when to use") → tool is ignored

Best practices:

  • write descriptions as decision rules (e.g., “Use this when you need X with Y constraints”)
  • include examples of when to use vs not use
  • keep names and descriptions unambiguous and domain-specific
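To make the "decision rule" idea concrete, here is a minimal sketch of a vague tool definition versus a sharpened one. The tool name `get_invoice_status` and its fields are hypothetical, chosen only for illustration; they are not from any specific MCP server.

```python
# A generic description invites misselection: almost any billing query "matches".
VAGUE_TOOL = {
    "name": "billing_helper",
    "description": "Helps with billing.",
}

# A description written as a decision rule: it states when to use the tool,
# when NOT to use it, and what input it expects.
SHARP_TOOL = {
    "name": "get_invoice_status",
    "description": (
        "Use this when the user asks about the payment status of a specific "
        "invoice and provides (or implies) an invoice ID. Do NOT use this for "
        "creating invoices or for general billing questions."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {"invoice_id": {"type": "string"}},
        "required": ["invoice_id"],
    },
}
```

The difference is not cosmetic: the sharp version gives the model both a positive trigger and an explicit exclusion, which is exactly the signal it pattern-matches against at selection time.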

6) Why do we need schemas for tools, and what problem do they solve?

Schemas define a strict contract between the LLM and tools. They specify exactly what inputs are valid and what outputs will be returned.

Without schemas:

  • the LLM may generate malformed or incomplete inputs
  • tools may return free-form outputs that cannot be consumed by subsequent steps
  • multi-step workflows break silently due to shape mismatches

With schemas:

  • inputs are validated before execution (type, required fields)
  • outputs are predictable and machine-readable
  • tools can be chained reliably across steps

Practical guidance:

  • define explicit required fields and types
  • keep outputs minimal and structured (avoid prose)
  • maintain consistency across tools to enable composition
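A small sketch of what "inputs are validated before execution" can look like. This is a hand-rolled validator over a JSON-Schema-style dict, stdlib only; in a real system you would likely use a proper schema library, so treat this as an assumption-laden illustration of the idea, not a recommended implementation.

```python
def validate_input(schema: dict, payload: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    # Check required fields are present.
    for field in schema.get("required", []):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    # Check declared types for fields that are present.
    type_map = {"string": str, "number": (int, float), "integer": int,
                "boolean": bool, "object": dict, "array": list}
    for field, spec in schema.get("properties", {}).items():
        if field in payload:
            expected = type_map.get(spec.get("type"))
            if expected and not isinstance(payload[field], expected):
                errors.append(f"wrong type for {field}: expected {spec['type']}")
    return errors
```

Running this check before every tool execution turns "the LLM generated a malformed input" from a silent downstream failure into an explicit, loggable error.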

7) What happens if multiple tools can solve the same problem?

When two tools can solve similar problems, the LLM faces ambiguity during selection.

Symptoms:

  • different tools chosen for the same query across runs
  • unstable behavior when prompts or context change slightly

Root cause:

  • overlapping responsibility or similar descriptions

Fixes:

  • enforce single-responsibility per tool
  • differentiate descriptions with clear boundaries (inputs, constraints, outcomes)
  • remove or merge redundant tools

8) Should the system be allowed to use tools freely without restrictions?

No. Fully autonomous tool access often leads to inefficient or incorrect behavior.

Risks:

  • unnecessary tool calls (cost and latency)
  • wrong tool selection due to ambiguity

Control strategies:

  • limit the set of tools exposed for a given context or task
  • add routing hints (e.g., “for billing queries, only these tools are allowed”)
  • enforce max tool calls per request

Goal: Provide guided autonomy so the model can choose, but within well-defined boundaries.
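The three control strategies above can be sketched in a few lines. The context names and tool names here are hypothetical; the point is the shape: an allowlist per context plus a hard call budget.

```python
# Hypothetical routing table: which tools are exposed for which task context.
TOOLS_BY_CONTEXT = {
    "billing": ["get_invoice_status", "refund_payment"],
    "analytics": ["run_report"],
}
MAX_TOOL_CALLS = 3  # hard budget per request

def tools_for(context: str) -> list[str]:
    """Expose only the allowlisted tools for this context; default to none."""
    return TOOLS_BY_CONTEXT.get(context, [])

def within_budget(calls_so_far: int) -> bool:
    """Enforce the per-request tool-call limit."""
    return calls_so_far < MAX_TOOL_CALLS
```

The model still chooses freely, but only among tools that are legitimate for the current context, and only up to the budget.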

9) Why does the system sometimes avoid using tools even when they exist?

By default, the LLM prefers to answer directly if it believes it can.

Common reasons for not calling tools:

  • the tool does not clearly outperform direct reasoning
  • the description does not signal when it should be used
  • the query appears solvable without external data

How to fix:

  • ensure the tool provides capabilities the LLM cannot reliably do (real-time data, deterministic checks, actions)
  • make the description explicit about triggers ("Use this when…")
  • optionally bias toward tool use via instructions or constraints

10) How do I know if a tool is actually needed or just adding complexity?

A tool is unnecessary if it does not add unique capability beyond the LLM.

If the LLM alone can produce equivalent results, the tool only adds overhead (latency, cost, complexity).

Valid tool use cases:

  • accessing external or real-time data
  • enforcing deterministic logic (validation, scoring)
  • performing actions (sending emails, writing to DB)

Heuristic: If LLM(prompt) ≈ Tool + LLM, the tool is redundant.

11) Should tools be allowed to call other tools internally?

Technically possible, but discouraged.

Problems introduced:

  • hidden execution paths (hard to trace end-to-end)
  • compounded failures across nested calls
  • difficult observability and debugging

Recommended pattern:

  • keep tools atomic (single responsibility)
  • let the MCP client orchestrate sequences explicitly

This keeps control centralized and behavior observable.

12) Is it a good idea for tools to use LLMs internally?

Yes, but only when the problem is inherently non-deterministic.

Good uses:

  • parsing or summarizing unstructured text
  • extracting signals from logs or documents

Avoid using an LLM for:

  • calculations, validation, rule checks
  • simple data retrieval or formatting

Guideline: Use an LLM inside tools only when deterministic rules cannot reliably solve the task.

13) Where should system state be managed in an MCP architecture?

State should live outside tools, in dedicated systems such as databases or memory layers.

Tools should be stateless:

  • accept input
  • read/write external state as needed
  • return output

Why this matters:

  • stateless tools are easier to scale and test
  • avoids hidden coupling and side effects
  • enables reuse outside MCP contexts
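The accept-input / touch-external-state / return-output pattern looks like this in miniature. The `store` dict here is a stand-in for a real database or memory layer; the tool itself holds nothing between calls.

```python
def record_order(store: dict, order_id: str, amount: float) -> dict:
    """A stateless tool: all state lives in the external store passed in."""
    store[order_id] = {"amount": amount, "status": "recorded"}  # write external state
    return {"order_id": order_id, "status": "recorded"}         # structured output
```

Because the tool carries no internal state, it can be scaled horizontally, unit-tested with an empty dict, and reused outside the MCP context unchanged.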

14) Why do MCP-based systems sometimes feel slower?

MCP introduces multi-step execution for a single user request.

A request may involve:

  • an LLM decision step
  • one or more tool calls
  • additional reasoning steps between calls

Each step adds network and processing overhead, and chaining multiplies the delay.

Mitigations:

  • reduce the number of tool calls per request
  • avoid deep chains unless necessary
  • cache frequent results
  • design tools to return complete, minimal outputs

15) What is the most challenging part of designing an MCP system?

Tool design is the most challenging and most impactful aspect.

You must get right:

  • clear, narrow responsibility (no overlap)
  • precise input schema (what’s required, types)
  • consistent, minimal output schema (machine-first)

Consequences of poor design:

  • incorrect tool selection
  • brittle multi-step flows
  • hard-to-debug behavior

Reality: Most MCP issues stem from tool design, not the LLM itself.

16) Why do MCP systems often become more complex than expected?

Complexity in MCP systems usually comes from over-engineering too early rather than real necessity.

Common causes include:

  • introducing too many tools before understanding real usage patterns
  • splitting MCP servers or domains prematurely
  • trying to model every possible capability upfront

This leads to:

  • higher cognitive load for the LLM when selecting tools
  • more ambiguity and overlap between tools
  • harder debugging and slower iteration

A better approach is incremental:

  • start with a minimal set of high-quality tools
  • observe how they are used in real scenarios
  • split or expand only when clear bottlenecks or boundaries emerge

The goal is not architectural purity, but operational clarity.

17) What should we track or monitor to understand how the system is behaving?

Effective debugging in MCP requires visibility into both decisions and execution.

At a minimum, you should log:

  • which tool was selected
  • the exact input passed to the tool
  • the output returned by the tool

For production systems, also include:

  • number of tool calls per request
  • failures and retries
  • timestamps for each step

This allows you to answer critical questions such as:

  • Why was this tool chosen?
  • Where did the system deviate from expectation?
  • Which part of the flow is slow or failing?

Without proper logging, MCP systems behave like black boxes, making debugging guesswork.
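At its simplest, the minimum log record described above is one structured entry per tool call. This sketch appends to a list standing in for a real log sink (file, database, tracing backend).

```python
import time

def log_tool_call(tool: str, tool_input: dict, tool_output: dict,
                  sink: list) -> None:
    """Append one structured record per tool call: what was selected,
    the exact input, the output, and a timestamp for latency analysis."""
    sink.append({
        "ts": time.time(),
        "tool": tool,
        "input": tool_input,
        "output": tool_output,
    })
```

With records like these, "why was this tool chosen with this input?" becomes a query over logs instead of guesswork.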

18) What is the most common hidden failure in MCP systems that is hard to detect?

The most dangerous failure mode is silent misrouting.

In this case, the system appears to work correctly, but it:

  • selects the wrong tool
  • produces suboptimal results
  • still returns a plausible answer

Because there is no obvious error, these issues often go unnoticed and accumulate over time.

This can degrade system quality significantly without triggering alerts.

To detect this, you need:

  • evaluation metrics focused on tool selection
  • monitoring of tool usage patterns
  • periodic audits of decision quality

19) How should we evaluate whether an MCP system is actually working well?

Evaluation must go beyond checking whether the final answer is correct.

You should assess:

  • whether the correct tool was selected
  • how many tools were used to reach the answer
  • latency per request
  • failure and retry rates
  • cost per query

For example, two systems may produce the same answer, but one may require multiple unnecessary tool calls, making it inefficient and expensive.

A strong evaluation framework focuses on both correctness and efficiency.

20) Does MCP replace microservices or work alongside them?

MCP works alongside them; the two serve fundamentally different roles.

Microservices are responsible for executing business logic, handling data, and scaling backend operations.

MCP sits above this layer and focuses on deciding:

  • which capability to use
  • when to use it
  • how to sequence multiple capabilities

In practice, MCP orchestrates calls to tools, which in turn interact with APIs or microservices.

This separation ensures that business logic remains reusable and independent of the AI layer.

21) Why won’t most SaaS platforms directly expose themselves as MCP servers?

SaaS providers have strong incentives to maintain control over how their systems are accessed and used.

Key reasons include:

  • security and data protection
  • pricing and rate limiting control
  • ownership of user experience

Exposing full MCP interfaces would reduce this control.

As a result, most SaaS platforms will continue to provide APIs, and MCP systems will act as a layer that integrates and orchestrates those APIs.

22) What is the most common misunderstanding about what MCP actually does?

A common misconception is that MCP makes AI systems more intelligent.

In reality, MCP does not improve the reasoning capability of the LLM.

Instead, it improves how that reasoning is applied by:

  • structuring interactions with external systems
  • enforcing consistency through schemas
  • enabling controlled execution of actions

This leads to systems that are more reliable and predictable, even if the underlying intelligence remains the same.

23) What characteristics define a well-designed MCP system?

A strong MCP system is defined by clarity, not complexity.

Key characteristics include:

  • clearly defined tools with a single responsibility
  • strict and consistent schemas for all tools
  • minimal overlap between tool capabilities
  • predictable and structured outputs

Additionally, strong systems exhibit:

  • controlled tool usage (not excessive or random)
  • good observability for debugging
  • efficient execution with minimal unnecessary steps

The focus should always be on making the system easy for both the LLM and engineers to understand and reason about.

24) What does deploying an MCP system look like in a real-world setup?

Deploying an MCP system is less about introducing new infrastructure and more about placing each component correctly in your existing stack.

In a typical deployment:

  • the MCP client lives inside your application backend (or agent layer) and interacts with the LLM
  • the MCP server runs as a service that exposes tools over a standard interface
  • tools internally call APIs or microservices that contain the actual business logic

From an infrastructure perspective:

  • MCP servers can be deployed like any backend service (container, serverless, etc.)
  • tools should be stateless so they scale easily
  • backend services remain unchanged and reusable

Key deployment considerations:

  • keep MCP server lightweight (no heavy logic inside)
  • secure tool access (authentication, rate limiting)
  • monitor tool usage and latency
  • ensure network reliability between client, server, and services

Common mistake: Trying to redesign your entire backend for MCP. In reality, MCP should sit on top of your existing APIs and microservices, not replace them.

Production-Level Realities

25) Why do MCP systems behave differently in production compared to testing?

In testing environments, inputs are usually clean, limited, and predictable. You are effectively validating the system against a narrow slice of reality.

In production, the system encounters:

  • a wide variety of user phrasing
  • incomplete or ambiguous inputs
  • edge cases you did not anticipate

Because tool selection is probabilistic, small differences in wording or context can lead the LLM to choose a different tool or skip tools entirely. This creates behavior that appears inconsistent even though the system is “working as designed.”

To mitigate this, you should:

  • test with real-world queries, not curated examples
  • tighten tool descriptions and schemas
  • add constraints or routing hints where necessary

26) Why does the system sometimes overuse a particular tool?

A tool becomes overused when it is either too broadly defined or consistently produces acceptable outputs. The LLM learns that this tool is a “safe default” and begins to prefer it over more specific tools.

This typically happens when:

  • the tool description is generic
  • multiple tools overlap in responsibility
  • one tool rarely fails compared to others

Over time, this leads to degraded system quality because the LLM stops exploring better alternatives.

To fix this, you should:

  • narrow the scope of the overused tool
  • make competing tools more clearly differentiated
  • explicitly describe when each tool should be used

27) Why is it difficult to predict the cost of running an MCP system?

In MCP systems, cost is not tied to a single operation. Instead, it depends on how many steps are taken to resolve a query.

A single user request may involve:

  • one or more LLM calls
  • multiple tool invocations
  • retries if something fails or is unclear

Because the number of steps varies per query, cost becomes difficult to predict.

To control this, you should:

  • limit the maximum number of tool calls per request
  • monitor cost at a per-query level
  • reduce unnecessary chaining of tools

28) Why do tools work individually but fail when used together?

Tools are often designed and tested in isolation, where they behave correctly. Problems arise when they are composed into multi-step workflows.

Failures occur because:

  • the output format of one tool does not match the expected input of another
  • assumptions made by one tool are not valid for the next
  • minor inconsistencies compound across steps

This is not a failure of individual tools, but of integration.

To prevent this:

  • enforce consistent schemas across all tools
  • validate outputs before passing them to the next step
  • design tools with composability in mind
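One way to "validate outputs before passing them to the next step" is a thin orchestration loop that checks each tool's output shape before chaining. This is a generic sketch; real chains would validate against full schemas rather than key lists.

```python
def run_chain(steps, payload):
    """Run tools in sequence, validating each output's shape before the next step.

    Each step is a pair (tool_fn, required_output_keys). A shape mismatch
    fails loudly at the boundary instead of corrupting the next step silently.
    """
    for tool_fn, required_keys in steps:
        payload = tool_fn(payload)
        missing = [k for k in required_keys if k not in payload]
        if missing:
            raise ValueError(f"{tool_fn.__name__} output missing keys: {missing}")
    return payload
```

The failure now happens at the integration boundary, with the name of the offending tool attached, which is exactly where composition bugs are easiest to diagnose.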

29) Why do small changes in prompts affect tool usage so much?

Prompts directly influence how the LLM interprets user intent and decides which tool to call.

Even small changes in wording can:

  • shift the perceived meaning of a query
  • alter which tool is selected
  • change how arguments are constructed

Because of this, prompts should be treated as part of your system logic, not as informal text.

Best practices include:

  • versioning prompts
  • testing prompt changes before deployment
  • monitoring their impact on tool usage

30) Why can the same query produce different results each time?

LLM-based systems are inherently non-deterministic. The same input can produce slightly different internal reasoning paths.

In MCP systems, this variability is amplified because:

  • different tool selection paths may be taken
  • intermediate outputs may differ
  • timing and context may influence decisions

To reduce variability:

  • lower randomness in model settings where possible
  • constrain tool selection
  • ensure tools produce consistent outputs

31) Why do some tools rarely or never get used?

A tool may be ignored if:

  • its purpose is unclear from its description
  • it overlaps with a more general or dominant tool
  • it does not appear necessary to solve common queries

The LLM will naturally prefer tools that are easier to match or more frequently successful.

To address this:

  • simplify and clarify tool descriptions
  • remove redundant tools
  • ensure each tool has a clear and distinct role

32) Why does adding more tools sometimes reduce system performance?

Adding more tools increases the decision complexity for the LLM.

With more options available:

  • ambiguity increases
  • overlap becomes more likely
  • selection accuracy decreases

Instead of improving capability, excessive tools often degrade performance.

The better approach is to:

  • keep the tool set minimal
  • ensure each tool has a well-defined purpose
  • expand only when there is a clear gap

33) Why does system performance degrade over time?

MCP systems are sensitive to change. Over time, multiple factors can introduce drift, including:

  • updates to prompts
  • modifications to tools
  • changes in input patterns

These changes accumulate and can gradually reduce system performance without obvious failure.

To manage this, you should:

  • implement continuous evaluation
  • track key metrics over time
  • run regression tests after changes

34) Why can retrying a request sometimes make results worse?

Retries do not guarantee the same execution path.

When a request is retried:

  • the LLM may choose a different tool
  • intermediate steps may change
  • outputs may become inconsistent

This can lead to worse results instead of improvements.

To handle retries effectively:

  • control retry behavior explicitly
  • limit the number of retries
  • use fallback strategies instead of blind retries
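A minimal sketch of "capped retries, then fallback" rather than blind re-execution. Here the fallback might be an alternative tool or an LLM-only answer; the names are placeholders.

```python
def call_with_fallback(tool_fn, args: dict, fallback_fn, max_retries: int = 2):
    """Try the tool a bounded number of times; on repeated failure,
    switch to an explicit fallback instead of retrying indefinitely."""
    for _ in range(max_retries):
        try:
            return tool_fn(**args)
        except Exception:
            continue  # a real system would log the failure here
    return fallback_fn(**args)
```

The key property is that the retry count and the fallback path are decided by your code, not left to the model's non-deterministic re-planning.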

35) Why is observability essential in MCP systems?

MCP systems involve multiple layers of decision-making and execution, many of which are not visible by default.

Without observability, you cannot:

  • understand why a tool was selected
  • trace where a failure occurred
  • identify inefficiencies

A production system should log:

  • tool selection decisions
  • inputs and outputs
  • execution paths

This visibility is essential for debugging and continuous improvement.

36) Why do MCP systems need guardrails?

LLMs can behave unpredictably when given full freedom.

Without guardrails, the system may:

  • call tools unnecessarily
  • misuse tools
  • generate invalid inputs

Guardrails help enforce boundaries by:

  • restricting which tools can be used
  • validating inputs before execution
  • limiting usage patterns

They ensure the system remains safe, efficient, and aligned with intended behavior.

37) Why are continuous evaluation pipelines important?

In MCP systems, correctness is not binary. A response may be technically correct but achieved inefficiently or through the wrong path.

Evaluation pipelines allow you to:

  • track tool selection accuracy
  • measure consistency across queries
  • detect degradation over time

Without continuous evaluation, issues accumulate silently and become harder to fix.

38) Why is caching especially important in MCP systems?

Many queries in real systems repeat similar patterns.

Without caching, the system repeatedly:

  • calls the LLM
  • invokes tools
  • recomputes results

This increases both latency and cost.

Caching allows you to reuse previous results, improving performance and reducing resource usage.
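A toy version of result reuse: cache keyed on a normalized query so trivially different phrasings of the same request skip the LLM and tool calls. Real systems need eviction, TTLs, and smarter key normalization; this only shows the shape.

```python
def cached_call(cache: dict, tool_fn, query: str):
    """Reuse a previous result for an identical (normalized) query."""
    key = query.strip().lower()   # naive normalization, for illustration only
    if key not in cache:
        cache[key] = tool_fn(query)  # only pay the cost on a miss
    return cache[key]
```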

39) Why do we need stopping conditions in MCP workflows?

Because MCP systems can involve iterative decision-making, there is a risk of excessive or even infinite tool usage.

Without stopping conditions, the system may:

  • continue calling tools unnecessarily
  • enter inefficient loops

Stopping conditions enforce limits such as:

  • maximum number of tool calls
  • termination rules based on outcomes
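Both stopping conditions fit into one small orchestration loop: a hard cap on tool calls, plus an outcome-based rule where the decision function signals "done" by returning nothing further to do. This is a generic sketch, not any particular framework's API.

```python
def run_until_done(decide_next, max_calls: int = 5):
    """Iterate tool decisions under two stopping conditions:
    a hard call budget, and outcome-based termination (decide_next
    returns None when no further action is needed)."""
    results = []
    for _ in range(max_calls):        # stopping condition 1: call budget
        action = decide_next(results)
        if action is None:            # stopping condition 2: outcome rule
            break
        results.append(action())
    return results
```

Without the budget, a decision function that never returns None would loop forever; without the outcome rule, the system would always burn the full budget even after the answer is already in hand.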

40) Why is evolving schemas over time challenging?

Schemas define the contract between the LLM and tools. Changing them affects both how tools are called and how outputs are interpreted.

If schemas are modified without care:

  • existing integrations may break
  • the LLM may generate incorrect inputs

To manage schema evolution:

  • version schemas
  • maintain backward compatibility
  • test changes thoroughly

41) Why is it difficult to decide the right level of tool granularity?

Choosing the right level of granularity for tools is challenging.

If tools are too broad:

  • their purpose becomes unclear
  • selection accuracy decreases

If tools are too narrow:

  • the number of tools increases
  • orchestration becomes complex

The goal is to design tools with a single clear responsibility that is neither too vague nor too fragmented.

42) Why are fallback strategies critical in production systems?

In production systems, failures are inevitable. Tools may fail, APIs may be unavailable, and routing may be incorrect.

Fallback strategies ensure that the system can still provide a response when something goes wrong.

Examples include:

  • using an alternative tool
  • returning partial results
  • defaulting to LLM-only responses

43) Why is testing MCP systems more complex than traditional systems?

Testing MCP systems is harder than testing traditional systems because you are not only testing code, but also behavior.

You must evaluate:

  • how the LLM makes decisions
  • how tools are orchestrated
  • how components interact under different conditions

This requires system-level testing rather than isolated unit tests.

Final Insight

MCP is not about increasing intelligence. It is about structuring how intelligence interacts with systems in a reliable, observable, and controllable way.
