Tuesday, 31 December 2024

Optimizing Azure Document Intelligence for Performance and Cost Savings: A Case Study

    As a developer working with Azure Document Intelligence, optimizing document processing is crucial for reducing processing time without compromising output quality. In this post, I will share how I improved the performance of my text analytics code, cutting processing time from 10 seconds to just 3 seconds with no impact on the quality of the output.

Original Code vs Optimized Code

Initially, the document processing took around 10 seconds, which was decent but could be improved for better scalability and faster execution. After optimization, the processing time was reduced to just 3 seconds by applying several techniques, all without affecting the quality of the results.

Original Processing Time

  • Time taken to process: 10 seconds

Optimized Processing Time

  • Time taken to process: 3 seconds

Steps Taken to Optimize the Code

Here are the key changes I made to optimize the document processing workflow:

1. Preprocessing the Text

Preprocessing the text before passing it to Azure's API is essential for cleaning and normalizing the input data. This removes unnecessary characters, stop words, and other noise that could slow down processing. A simple preprocessing function was added to clean the text before calling the Azure API. Additionally, preprocessing shrinks the input sent to Azure's API, directly lowering the associated costs, since Azure bills by the amount of text processed.

import re

def preprocess_text(text):
    # Implement text cleaning: remove unnecessary characters, normalize text, etc.
    cleaned_text = text.lower()  # Example: convert to lowercase
    cleaned_text = re.sub(r'[^\w\s]', '', cleaned_text)  # Remove punctuation
    return cleaned_text
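
For instance, the cleaned text can then be passed to the API in place of the raw input. Below is a minimal sketch; raw_documents is a hypothetical placeholder for your own document list:

# Clean every document before sending it to the API
# (raw_documents is a hypothetical placeholder).
raw_documents = ["Some RAW text, with punctuation!", "Another noisy document..."]
documents = [preprocess_text(doc) for doc in raw_documents]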

2. Specifying the Language Parameter

The Azure Text Analytics API automatically detects the language of each document, but specifying the language parameter in API calls skips this detection step, thereby saving time.

For example, by specifying language="en" when calling the API for recognizing PII entities, extracting key phrases, or recognizing named entities, we can directly process the text and skip language detection.

# Recognize PII entities
pii_responses = text_analytics_client.recognize_pii_entities(documents, language="en")

# Extract key phrases
key_phrases_responses = text_analytics_client.extract_key_phrases(documents, language="en")

# Recognize named entities
entities_responses = text_analytics_client.recognize_entities(documents, language="en")

This reduces unnecessary overhead and speeds up processing, especially when dealing with a large number of documents in a specific language.

3. Batch Processing

Another performance optimization is to batch multiple documents together instead of making one API call per document. Each Text Analytics client method accepts a list of documents, so a whole batch can be submitted in a single request. This reduces per-call overhead and lets Azure process the documents together, leading to faster overall processing time.

# Example of sending multiple documents in one batch;
# each client method accepts a list, so one call covers the whole batch
documents = ["Document 1 text", "Document 2 text", "Document 3 text"]
batch_responses = text_analytics_client.recognize_entities(documents, language="en")

4. Parallel API Calls

If you’re working with a large dataset, consider using parallel API calls for independent tasks. For example, you could recognize PII entities in one set of documents while extracting key phrases from another set. This parallel processing can be achieved using multi-threading or asynchronous calls.
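
Here is a minimal sketch of this idea using Python's concurrent.futures, assuming the same text_analytics_client used above; the two document lists are hypothetical placeholders:

from concurrent.futures import ThreadPoolExecutor

# Two independent workloads (hypothetical placeholders).
pii_docs = ["Document A text", "Document B text"]
phrase_docs = ["Document C text", "Document D text"]

with ThreadPoolExecutor(max_workers=2) as executor:
    # Submit both independent API calls so they run concurrently.
    pii_future = executor.submit(
        text_analytics_client.recognize_pii_entities, pii_docs, language="en")
    phrases_future = executor.submit(
        text_analytics_client.extract_key_phrases, phrase_docs, language="en")
    # Wait for both calls to finish and collect the results.
    pii_results = pii_future.result()
    key_phrase_results = phrases_future.result()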

Performance Gains

After applying these optimizations, the processing time dropped from 10 seconds to just 3 seconds per execution, which represents a 70% reduction in processing time. This performance boost is particularly valuable when dealing with large-scale document processing, where speed is critical.

Conclusion

Optimizing document processing with Azure Document Intelligence not only improves performance but also reduces operational costs. By incorporating preprocessing steps, specifying the language parameter, and utilizing batch and parallel processing, you can achieve significant performance improvements while maintaining output quality and lowering costs by reducing the amount of text sent to the API.

If you’re facing similar challenges, try out these optimizations and see how they work for your use case. I’d love to hear about any additional techniques you’ve used to speed up your document processing workflows while saving costs.

Wednesday, 20 November 2024

Building BloomBot: A Comprehensive Guide to Creating an AI-Powered Pregnancy Companion Using Gemini API

Solution approach for BloomBot

1. Problem Definition and Goals

Objective:

  • Develop BloomBot, an AI-powered chatbot tailored for expecting mothers to provide:
    • Pregnancy tips
    • Nutrition advice by week
    • Emotional support resources
    • A conversational interface for queries

Key Requirements:

  • AI-Powered Chat: Leverage Gemini for generative responses.
  • User Interface: Interactive and user-friendly chatbot interface.
  • Customization: Adapt responses based on pregnancy stages.
  • Scalability: Handle concurrent user interactions efficiently.

2. Architecture Overview

Key Components:

  1. Frontend:
    • Tool: Tkinter for desktop GUI.
    • Features: Buttons, dropdowns, text areas for interaction.
  2. Backend:
    • Role: Acts as a bridge between the frontend and Gemini API.
    • Tech Stack: Python with google.generativeai for Gemini API integration.
  3. Gemini API:
    • Purpose: Generate responses for user inputs.
    • Capabilities Used: Content generation, chat handling.
  4. Environment Configuration:
    • Secure API key storage using a .env file and dotenv.

3. Solution Workflow

Frontend Interaction:

  • Users interact with BloomBot via a Tkinter-based GUI:
    • Buttons for specific tasks (e.g., pregnancy tips, nutrition advice).
    • A dropdown for selecting pregnancy weeks.
    • A text area for displaying bot responses.

Backend Processing:

  1. Task-Specific Prompts:
    • Predefined prompts for tasks like fetching pregnancy tips or emotional support.
    • Dynamic prompts (e.g., week-specific nutrition advice).
  2. Free-Form Queries:
    • Use the chat feature of Gemini to handle user inputs dynamically.
  3. Response Handling:
    • Parse and return Gemini's response to the frontend.

Gemini API Integration:

  • Models Used: gemini-1.5-flash.
  • Methods Used: generate_content for static prompts and start_chat for conversational queries.

4. Implementation Details

Backend Implementation

Key Features:

  1. Pregnancy Tip Generator:
    • Prompt: "Give me a helpful tip for expecting mothers."
    • Method: generate_content.
  2. Week-Specific Nutrition Advice:
    • Dynamic prompt: "Provide nutrition advice for week {week} of pregnancy."
    • Method: generate_content.
  3. Emotional Support Resources:
    • Prompt: "What resources are available for emotional support for expecting mothers?"
    • Method: generate_content.
  4. Chat Handler:
    • Start a conversation: start_chat.
    • Handle free-form queries.

Code Snippet:


import google.generativeai as genai


class ExpectingMotherBotBackend:
    def __init__(self, api_key):
        # Configure the Gemini client and load the model once at startup.
        self.api_key = api_key
        genai.configure(api_key=self.api_key)
        self.model = genai.GenerativeModel("models/gemini-1.5-flash")

    def get_pregnancy_tip(self):
        # Static prompt answered with a single generate_content call.
        prompt = "Give me a helpful tip for expecting mothers."
        result = self.model.generate_content(prompt)
        return result.text if result.text else "Sorry, I couldn't fetch a tip right now."

    def get_nutrition_advice(self, week):
        # Dynamic, week-specific prompt.
        prompt = f"Provide nutrition advice for week {week} of pregnancy."
        result = self.model.generate_content(prompt)
        return result.text if result.text else "I couldn't fetch nutrition advice at the moment."

    def get_emotional_support(self):
        prompt = "What resources are available for emotional support for expecting mothers?"
        result = self.model.generate_content(prompt)
        return result.text if result.text else "I'm having trouble fetching emotional support resources."

    def chat_with_bot(self, user_input):
        # Free-form queries go through Gemini's chat interface.
        chat = self.model.start_chat()
        response = chat.send_message(user_input)
        return response.text if response.text else "I'm here to help, but I didn't understand your query."

Frontend Implementation

Key Features:

  1. Buttons and Inputs:
    • Fetch pregnancy tips, nutrition advice, or emotional support.
  2. Text Area:
    • Display bot responses with a scrollable interface.
  3. Dropdown:
    • Select pregnancy week for tailored nutrition advice.

Code Snippet:


import tkinter as tk
from tkinter import ttk


class ExpectingMotherBotFrontend:
    def __init__(self, backend):
        self.backend = backend
        self.window = tk.Tk()
        self.window.title("BloomBot: Pregnancy Companion")
        self.window.geometry("500x650")
        self.create_widgets()

    def create_widgets(self):
        title_label = tk.Label(self.window, text="BloomBot: Your Pregnancy Companion")
        title_label.pack()
        # Buttons for functionalities
        tip_button = tk.Button(self.window, text="Get Daily Pregnancy Tip",
                               command=self.show_pregnancy_tip)
        tip_button.pack()
        # Dropdown for selecting the pregnancy week
        self.week_dropdown = ttk.Combobox(self.window,
                                          values=[str(i) for i in range(1, 51)],
                                          state="readonly")
        self.week_dropdown.pack()
        nutrition_button = tk.Button(self.window, text="Get Nutrition Advice",
                                     command=self.show_nutrition_advice)
        nutrition_button.pack()
        support_button = tk.Button(self.window, text="Emotional Support",
                                   command=self.show_emotional_support)
        support_button.pack()
        # Text area for displaying bot responses
        self.response_text = tk.Text(self.window)
        self.response_text.pack()

    def show_pregnancy_tip(self):
        tip = self.backend.get_pregnancy_tip()
        self.display_response(tip)

    def show_nutrition_advice(self):
        week = self.week_dropdown.get()
        if not week:
            # Guard against clicking the button before choosing a week.
            self.display_response("Please select a pregnancy week first.")
            return
        advice = self.backend.get_nutrition_advice(int(week))
        self.display_response(advice)

    def show_emotional_support(self):
        support = self.backend.get_emotional_support()
        self.display_response(support)

    def display_response(self, response):
        # Replace the previous response with the new one.
        self.response_text.delete(1.0, tk.END)
        self.response_text.insert(tk.END, response)

5. Deployment

Steps:

  1. Environment Setup:
    • Install required packages: pip install google-generativeai python-dotenv. (Tkinter ships with standard Python installations, so it does not need a pip install.)
    • Set up .env with the Gemini API key (see the launch sketch after this list).
  2. Testing:
    • Ensure prompt-response functionality works as expected.
    • Test UI interactions and Gemini API responses.
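
For reference, here is a minimal launch sketch. It assumes the key is stored in .env under the name GEMINI_API_KEY; that variable name is an assumption, so match it to whatever your .env file defines.

# Minimal launch sketch: load the API key from .env and start the GUI.
# GEMINI_API_KEY is an assumed variable name; match it to your .env file.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment
backend = ExpectingMotherBotBackend(os.getenv("GEMINI_API_KEY"))
frontend = ExpectingMotherBotFrontend(backend)
frontend.window.mainloop()  # start the Tkinter event loop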

6. Monitoring and Maintenance

  • Usage Analytics: Track interactions for feature improvements.
  • Error Handling: Implement better fallback mechanisms for API failures (a starting-point sketch follows this list).
  • Feedback Loop: Regularly update prompts based on user feedback.
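
As a starting point for the fallback idea, something like the sketch below could wrap the Gemini calls. retry_generate is a hypothetical helper, not part of the backend above; it simply retries transient failures before returning a canned reply.

import time

def retry_generate(model, prompt, retries=3, delay=2.0,
                   fallback="Sorry, I'm having trouble answering right now."):
    # Try the Gemini call a few times before giving up with a canned reply.
    for attempt in range(retries):
        try:
            result = model.generate_content(prompt)
            if result.text:
                return result.text
        except Exception:
            pass  # swallow the transient error and retry
        time.sleep(delay * (attempt + 1))  # simple linear backoff
    return fallback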



Monday, 20 May 2024

Amazon's Digital Transformation: A Journey of Innovation and Expansion


Amazon.com is a well-known online e-commerce platform that has transformed the way we shop. Its Marketplace platform, built on superior data analytics and logistics efficiency, has enabled it to create a pseudo-perfectly competitive market, optimize and reduce product costs, and offer personalized recommendations to customers.


Amazon's Core Competency: The Marketplace Platform

Amazon's business model is driven by data and technology, allowing it to innovate and transform existing business models across multiple industries and domains. Its core competency lies in finding service-improvement opportunities in those business models and driving innovation with technology and data.


Amazon's Next Growth Engine: AWS

In the short term, Amazon's next growth engine is its cloud computing service, AWS. AWS is a critical part of Amazon's core business because it processes and churns through massive amounts of data to drive innovation. It also generates a large portion of Amazon's revenue and profit.


Amazon's Expansion to Growing Economies

As Amazon expands into growing economies, it faces competition from regional players who understand the local market better and may receive stronger support and protection from local governments. To succeed, Amazon needs to learn these local markets, possibly by partnering with or acquiring local businesses, and to adapt to the government and regulatory policies of each region.


In conclusion, Amazon's digital transformation journey shows a company that is constantly innovating and expanding, using data and technology to drive growth and success.


Saturday, 13 April 2024

Copilot & GPT for product managers & owners


GPT, or Generative Pre-trained Transformer, is a type of AI language model that can generate human-like text. This technology has the potential to revolutionize the way we create content, making the process faster, easier, and more efficient.

I am a product owner and manager, and in my role I need to:

  • Create a lot of content fast.
  • Analyze a lot of data.
  • Find documents quickly in my system.
  • Discover documents to reference across the organization.
  • Churn out POCs very quickly.

For all of these, GPT has been a godsend. Thanks to my organization, MS Copilot has been made available to me; Microsoft 365 Copilot internally utilizes GPT-4.

Do not consider this the ultimate reference, but here are some of the ways I have used GPT and Copilot:

  • Using Copilot to create meeting summaries, sharing them with the team, and keeping them for personal reference.
  • Creating reference documents on a particular subject: downloading the transcriptions of all previously recorded meetings, combining them with all my notes, generating a summarized document, and then structuring it manually.
  • Recently, I have rarely created a Word document from scratch without first getting a draft from Copilot in Word.
  • Creating notes that summarize multiple documents, distilling large amounts of data into information that is consumable for a wider audience.
  • Creating presentations and slides, and organizing slides, from unstructured documents, making them consumable for a wider audience.
  • Building your own chatbot: put your tabular data in an Excel table and use Copilot in Excel for question answering over it.
  • Discovering documents across the organization for reference using Microsoft 365 Copilot.
  • Using Microsoft 365 Copilot to ask questions and find documents when searching for particular content within my organizational OneDrive.
  • As an AI product owner, using Copilot to demonstrate the concept of sentiment analysis over Word and Excel files, something I never thought would be possible.


I estimate that Copilot saves me 30-45 minutes on an average day, roughly a 5-7% improvement in my daily productivity.