What I write about

Showing posts with label AI Development. Show all posts

Thursday, 12 March 2026

The Reality of Building AI Systems Today

In today’s AI ecosystem, many capabilities that once required deep machine learning expertise have become widely accessible. Powerful APIs, pre-trained models, and developer platforms allow engineers to build sophisticated prototypes in very little time. As a result, it has become increasingly important to distinguish between what looks impressive and what actually requires substantial engineering skill.

Understanding this distinction is essential for anyone working with modern AI systems.

What Looks Impressive but Is Now Relatively Easy

Many demonstrations that appear technically advanced are largely integrations of existing tools rather than deeply engineered systems.

Calling AI APIs

Today, developers can access powerful language, vision, and multimodal models with only a few lines of code. A typical workflow might involve:

  • Capturing input (text, image, audio, or video)
  • Sending it to an AI API
  • Receiving a structured or descriptive response
  • Displaying or processing the result

The heavy lifting—perception, reasoning, and pattern recognition—is handled by the model provider. The surrounding application often acts primarily as a thin integration layer around these services.
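The workflow above amounts to a thin wrapper around the model provider. As a rough sketch (the endpoint, model name, and response shape here are hypothetical placeholders, not any specific provider's API):

```python
import json

# Hypothetical endpoint -- substitute your provider's actual URL and schema.
API_URL = "https://api.example.com/v1/chat"

def build_request(user_input: str, system_prompt: str = "Describe the scene.") -> str:
    """Package captured input into the JSON body a typical chat-style API expects."""
    payload = {
        "model": "some-large-model",  # placeholder model name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    }
    return json.dumps(payload)

def extract_answer(response_body: str) -> str:
    """Pull the model's text out of a typical structured response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]
```

The application code reduces to capture → build_request → POST → extract_answer; everything difficult happens on the provider's side.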

Prompt Engineering

Crafting prompts that produce structured, detailed, or highly contextual outputs can appear sophisticated. In practice, prompt design is often an iterative process of experimentation and refinement.

For example, instructing a model to:

  • Describe actions occurring in a scene
  • Extract entities from a document
  • Summarize key ideas from a conversation

can produce highly convincing results. However, most of the intelligence resides within the model itself rather than in the surrounding system logic.

Rapid Prototyping

Combining multiple capabilities—such as multimodal input, model reasoning, and conversational interfaces—can quickly produce demonstrations that appear complex.

A prototype might integrate:

  • Live data input
  • A large AI model
  • A conversational interface
  • A simple decision rule

Such systems can look remarkably advanced, but the complexity often lies within the underlying models rather than in the application architecture.

The Real Challenges in Modern AI Systems

The most difficult problems today are usually not about building models. Instead, they involve designing reliable systems around those models.

These challenges remain difficult even for experienced engineering teams.

Architecture Design

Modern AI systems typically involve multiple interconnected components. A robust architecture often includes layers such as:

  • Data ingestion
  • Event processing
  • State management
  • Reasoning or decision logic
  • Storage systems
  • Monitoring and alerting
  • Analytics and reporting

Designing how these components interact reliably under real-world conditions is one of the most important skills in AI engineering.

Poor architectural choices can lead to systems that are fragile, expensive to operate, or difficult to scale.

Event Pipelines

Real-world environments produce continuous streams of data. Transforming these streams into meaningful signals is a core engineering challenge.

A common pipeline might involve:

data stream
→ filtering or sampling
→ signal detection
→ classification or analysis
→ event generation

Designing event pipelines requires careful consideration of latency, accuracy, noise, and system stability.

Small design errors can lead to systems that generate excessive false signals or miss critical information entirely.
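The pipeline above can be sketched as a chain of small functions. The thresholds and the classification rule here are illustrative stand-ins for whatever model or heuristic a real system would use:

```python
from typing import Iterable, Iterator

def sample(stream: Iterable[float], every: int = 2) -> Iterator[float]:
    # Filtering/sampling: keep every Nth reading to bound downstream load
    for i, reading in enumerate(stream):
        if i % every == 0:
            yield reading

def detect(readings: Iterable[float], threshold: float = 0.5) -> Iterator[float]:
    # Signal detection: pass through only readings above a noise threshold
    for r in readings:
        if r > threshold:
            yield r

def classify(signals: Iterable[float]) -> Iterator[dict]:
    # Classification: a placeholder rule standing in for a model call
    for s in signals:
        yield {"value": s, "label": "high" if s > 0.8 else "medium"}

def run_pipeline(stream: Iterable[float]) -> list[dict]:
    # Event generation: materialize the classified signals as events
    return [{"event": "signal", **c} for c in classify(detect(sample(stream)))]
```

Note how each stage's parameters (sampling rate, detection threshold) trade latency and cost against the risk of missing signals, which is exactly where the design errors mentioned above creep in.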

Data Flow and System Efficiency

AI systems often process large volumes of data. Efficient system design requires deciding what data should be processed, when, and where.

Instead of processing everything, systems typically include filtering stages such as:

detect relevant activity
↓
capture relevant data
↓
analyze only selected inputs

Optimizing these flows is essential for controlling cost, latency, and system performance.
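One minimal version of this gating pattern, assuming a cheap byte-level change detector as a stand-in for real activity detection:

```python
def changed(prev: bytes, frame: bytes, min_diff: int = 1) -> bool:
    # Cheap activity check: count differing bytes between consecutive frames
    return sum(a != b for a, b in zip(prev, frame)) >= min_diff

def process_stream(frames: list[bytes], analyze) -> tuple[list, int]:
    """Run the expensive `analyze` step only when a frame differs from the last one.

    Returns the analysis results and how many times `analyze` was actually called.
    """
    results, calls, prev = [], 0, None
    for frame in frames:
        if prev is None or changed(prev, frame):
            results.append(analyze(frame))
            calls += 1
        prev = frame
    return results, calls
```

The call count makes the cost saving explicit: identical consecutive frames never reach the expensive analysis stage.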

Temporal Reasoning

Most models analyze individual inputs in isolation. Real-world understanding, however, often requires reasoning across sequences of events.

For example, a system might need to interpret a sequence such as:

event A occurs
event B follows
event C occurs later

The meaning of the sequence may depend on the relationship between these events over time.

Designing systems that can maintain context and interpret temporal patterns is significantly more challenging than analyzing isolated inputs.
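As a toy illustration, consider a made-up "enter"/"exit" rule in which the elapsed time between the same two events changes their meaning:

```python
def interpret(events: list[tuple[str, float]], max_gap: float = 5.0) -> str:
    """Interpret a sequence of (name, timestamp) events.

    Hypothetical rule: 'enter' followed quickly by 'exit' is a pass-through;
    a long gap between the same two events means a visit.
    """
    names = [name for name, _ in events]
    if names == ["enter", "exit"]:
        gap = events[1][1] - events[0][1]
        return "pass-through" if gap <= max_gap else "visit"
    return "unknown"
```

Even this trivial rule requires the system to buffer events and compare timestamps, which a per-input model call cannot do on its own.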

Reliability and Error Handling

AI models are probabilistic systems and can produce incorrect outputs. Production systems must therefore account for uncertainty.

Robust systems often include mechanisms such as:

  • Confidence thresholds
  • Multiple observations before triggering actions
  • Validation layers
  • Fallback logic

Balancing sensitivity with reliability is a non-trivial engineering problem.
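A minimal sketch of the "multiple observations" mechanism: a debouncer that only triggers after several consecutive high-confidence detections. The threshold and count here are arbitrary examples:

```python
class Debouncer:
    """Trigger only after `required` consecutive observations above `threshold`."""

    def __init__(self, threshold: float = 0.8, required: int = 3):
        self.threshold = threshold
        self.required = required
        self.streak = 0

    def observe(self, confidence: float) -> bool:
        # Count consecutive high-confidence observations; reset on any low one
        self.streak = self.streak + 1 if confidence >= self.threshold else 0
        return self.streak >= self.required
```

Raising `required` suppresses false positives at the cost of added latency before a genuine event is reported, which is the sensitivity/reliability trade-off in miniature.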

Tracking State Over Time

Many systems must track entities, objects, or conditions across time. This requires maintaining consistent state information even when data is incomplete or noisy.

Challenges include:

  • Maintaining identity across observations
  • Handling temporary loss of signal
  • Managing partial information
  • Reconciling conflicting signals

Reliable state tracking is critical for many real-world applications.
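One simple way to tolerate temporary signal loss is to age out tracked entities only after several consecutive misses. A minimal sketch, with the miss budget as an arbitrary parameter:

```python
class Tracker:
    """Maintain identity across noisy observations, tolerating short signal gaps."""

    def __init__(self, max_misses: int = 2):
        self.max_misses = max_misses
        self.tracks: dict[str, int] = {}  # entity id -> consecutive missed frames

    def update(self, observed_ids: set[str]) -> set[str]:
        # Reset the miss count for ids seen this frame
        for oid in observed_ids:
            self.tracks[oid] = 0
        # Age unseen tracks and drop those missing for too long
        for tid in list(self.tracks):
            if tid not in observed_ids:
                self.tracks[tid] += 1
                if self.tracks[tid] > self.max_misses:
                    del self.tracks[tid]
        return set(self.tracks)
```

Real trackers add re-identification and conflict resolution on top, but the core idea is the same: state outlives any single observation.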

Pattern and Behavior Analysis

More advanced systems move beyond detecting individual events to identifying patterns across time.

This may involve:

  • Identifying recurring sequences
  • Detecting unusual activity
  • Analyzing long-term trends
  • Generating insights from historical data

This level of reasoning requires both data infrastructure and analytical logic beyond simple model inference.
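As a small illustration of recurring-sequence detection, one can count event n-grams over a history and keep those that repeat. The event names and thresholds here are invented:

```python
from collections import Counter

def recurring_sequences(events: list[str], n: int = 2, min_count: int = 2) -> dict:
    """Find event n-grams that occur at least `min_count` times in the history."""
    grams = Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}
```

Production systems would run this kind of aggregation over a proper data store rather than an in-memory list, but the analytical step sits entirely outside model inference either way.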

Why Hardware Is Often Not the Limiting Factor

Many assume that powerful GPUs are the primary requirement for building advanced AI systems. In practice, the hardest problems often occur outside the model itself.

Architecture design, data pipelines, and system orchestration are primarily software engineering challenges.

Because large models are typically available through cloud services, the limiting factor is rarely raw compute power. The real challenge lies in designing systems that use those capabilities effectively.

Where Experience Really Shows

Experienced engineers tend to focus on aspects that are rarely visible in demonstrations:

  • System reliability
  • Fault tolerance
  • Data consistency
  • Cost efficiency
  • Monitoring and observability
  • Operational maintenance

Prototypes often perform well under controlled conditions, but production systems must handle unpredictable inputs, failures, and edge cases.

Building systems that remain stable under real-world conditions is what distinguishes experimentation from professional engineering.

The Key Insight

Modern AI development has shifted significantly.

The difficult part is no longer building powerful models.

Instead, the challenge lies in designing systems that can interpret, coordinate, and act on the outputs those models produce.

In other words: the intelligence of modern AI systems increasingly comes from the architecture surrounding the model, not the model itself.

Tuesday, 31 December 2024

Optimizing Azure Document Intelligence for Performance and Cost Savings: A Case Study

As a developer working with Azure Document Intelligence, I've found that optimizing document processing is crucial for reducing processing time without compromising output quality. In this post, I'll share how I improved the performance of my text analytics code, cutting the processing time from 10 seconds to just 3 seconds.

Original Code vs Optimized Code

Initially, the document processing took around 10 seconds, which was decent but could be improved for better scalability and faster execution. After optimization, the processing time was reduced to just 3 seconds by applying several techniques, all without affecting the quality of the results.

Original Processing Time

  • Time taken to process: 10 seconds

Optimized Processing Time

  • Time taken to process: 3 seconds

Steps Taken to Optimize the Code

Here are the key changes I made to optimize the document processing workflow:

1. Preprocessing the Text

Preprocessing the text before passing it to Azure's API is essential for cleaning and normalizing the input data. This helps remove unnecessary characters, stop words, and any noise that could slow down processing. A simple preprocessing function was added to clean the text before calling the Azure API. Additionally, preprocessing reduces the number of tokens sent to Azure's API, directly lowering the associated costs since Azure charges based on token usage.

import re

def preprocess_text(text):
    # Implement text cleaning: remove unnecessary characters, normalize text, etc.
    cleaned_text = text.lower()  # Example: convert to lowercase
    cleaned_text = re.sub(r'[^\w\s]', '', cleaned_text)  # Remove punctuation
    return cleaned_text

2. Specifying the Language Parameter

Azure Text Analytics API automatically detects the language of the document, but specifying the language parameter in API calls can skip this detection step, thereby saving time.

For example, by specifying language="en" when calling the API for recognizing PII entities, extracting key phrases, or recognizing named entities, we can directly process the text and skip language detection.

# Recognize PII entities
pii_responses = text_analytics_client.recognize_pii_entities(documents, language="en")

# Extract key phrases
key_phrases_responses = text_analytics_client.extract_key_phrases(documents, language="en")

# Recognize named entities
entities_responses = text_analytics_client.recognize_entities(documents, language="en")

This reduces unnecessary overhead and speeds up processing, especially when dealing with a large number of documents in a specific language.

3. Batch Processing

Another performance optimization technique is to batch multiple documents together and process them in parallel. This reduces the overhead of making multiple individual API calls. By sending a batch of documents, Azure can process them in parallel, which leads to faster overall processing time.

# Example of sending multiple documents in one call
documents = ["Document 1 text", "Document 2 text", "Document 3 text"]
key_phrases_responses = text_analytics_client.extract_key_phrases(documents, language="en")

4. Parallel API Calls

If you’re working with a large dataset, consider using parallel API calls for independent tasks. For example, you could recognize PII entities in one set of documents while extracting key phrases from another set. This parallel processing can be achieved using multi-threading or asynchronous calls.
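This pattern can be sketched with Python's concurrent.futures. The two task functions below are stubs standing in for the real API calls, which are I/O-bound and therefore well suited to threads:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the real Azure calls shown earlier; each takes a list of documents.
def recognize_pii(docs):
    return [f"pii:{d}" for d in docs]

def extract_phrases(docs):
    return [f"phrases:{d}" for d in docs]

def run_in_parallel(pii_docs, phrase_docs):
    # Submit the two independent tasks so their network round-trips overlap
    with ThreadPoolExecutor(max_workers=2) as pool:
        pii_future = pool.submit(recognize_pii, pii_docs)
        phrase_future = pool.submit(extract_phrases, phrase_docs)
        return pii_future.result(), phrase_future.result()
```

Because each task spends most of its time waiting on the network, running them concurrently lets the slowest call, rather than the sum of all calls, determine total latency.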

Performance Gains

After applying these optimizations, the processing time dropped from 10 seconds to just 3 seconds per execution, which represents a 70% reduction in processing time. This performance boost is particularly valuable when dealing with large-scale document processing, where speed is critical.

Conclusion

Optimizing document processing with Azure Document Intelligence not only improves performance but also reduces operational costs. By incorporating preprocessing steps, specifying the language parameter, and utilizing batch and parallel processing, you can achieve significant performance improvements while maintaining output quality and minimizing costs by reducing token usage.

If you’re facing similar challenges, try out these optimizations and see how they work for your use case. I’d love to hear about any additional techniques you’ve used to speed up your document processing workflows while saving costs.
