Tuesday, 31 December 2024

Optimizing Azure Document Intelligence for Performance and Cost Savings: A Case Study

As a developer working with Azure Document Intelligence, I care about cutting document processing time without compromising output quality. In this post, I share how I improved the performance of my text analytics code, reducing processing time from 10 seconds to 3 seconds with no impact on output quality.

Original Code vs Optimized Code

Initially, document processing took around 10 seconds, which was acceptable but left room for better scalability and faster execution. After applying the techniques below, processing time dropped to 3 seconds without affecting the quality of the results.

Original Processing Time

  • Time taken to process: 10 seconds

Optimized Processing Time

  • Time taken to process: 3 seconds

Steps Taken to Optimize the Code

Here are the key changes I made to optimize the document processing workflow:

1. Preprocessing the Text

Preprocessing the text before passing it to Azure's API cleans and normalizes the input. It removes unnecessary characters, stop words, and other noise that could slow down processing. I added a simple preprocessing function that cleans the text before each Azure API call. Preprocessing also reduces the amount of text sent to Azure's API, which can lower cost: the Azure AI Language service bills per text record (a unit of up to 1,000 characters), so shorter input means fewer billable records.

import re

def preprocess_text(text):
    # Clean the text: normalize case and strip characters we don't need.
    cleaned_text = text.lower()  # Example: convert to lowercase
    cleaned_text = re.sub(r'[^\w\s]', '', cleaned_text)  # Remove punctuation
    return cleaned_text

2. Specifying the Language Parameter

Rather than leaving the language of each document to be detected or defaulted, pass the language parameter explicitly in your Azure Text Analytics API calls. This skips any detection step and saves time.

For example, by specifying language="en" when calling the API for recognizing PII entities, extracting key phrases, or recognizing named entities, we can directly process the text and skip language detection.

# Recognize PII entities
pii_responses = text_analytics_client.recognize_pii_entities(documents, language="en")

# Extract key phrases
key_phrases_responses = text_analytics_client.extract_key_phrases(documents, language="en")

# Recognize named entities
entities_responses = text_analytics_client.recognize_entities(documents, language="en")

This reduces unnecessary overhead and speeds up processing, especially when dealing with a large number of documents in a specific language.

3. Batch Processing

Another optimization is to batch multiple documents into a single request. This removes the overhead of making many individual API calls, and the service can work through the batch on its side, which shortens overall processing time.

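# Example of sending multiple documents in one batch.
# Each client method accepts a list of documents, so one call covers the whole batch.
documents = ["Document 1 text", "Document 2 text", "Document 3 text"]
batch_response = text_analytics_client.extract_key_phrases(documents, language="en")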
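
The service caps how many documents a single request may carry, and the cap depends on the feature, so a large input usually has to be split into batches first. Here is a minimal sketch of that splitting; the helper name chunk_documents and the batch size of 5 are illustrative assumptions, so check the current limits for the feature you call.

```python
from typing import Iterator, List


def chunk_documents(documents: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `documents`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# Hypothetical input: seven short documents, batches of at most five.
docs = [f"Document {i} text" for i in range(1, 8)]
batches = list(chunk_documents(docs, batch_size=5))
print([len(batch) for batch in batches])  # [5, 2]
```

Each batch can then be passed to a client method in place of the full list.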

4. Parallel API Calls

If you’re working with a large dataset, consider using parallel API calls for independent tasks. For example, you could recognize PII entities in one set of documents while extracting key phrases from another set. This parallel processing can be achieved using multi-threading or asynchronous calls.
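
As a rough sketch of the multi-threading option, the snippet below runs independent tasks on separate threads with Python's standard concurrent.futures module. The two fake_* functions are stand-ins I made up so the example runs without an Azure client; in real code they would be calls such as text_analytics_client.extract_key_phrases.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_tasks_in_parallel(
    tasks: Dict[str, Callable[[List[str]], list]],
    documents: List[str],
) -> Dict[str, list]:
    """Run each independent task on its own thread and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(task, documents) for name, task in tasks.items()}
        # .result() blocks until that task finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


# Stand-ins for the real client calls (hypothetical, for illustration only).
def fake_key_phrases(docs: List[str]) -> list:
    return [doc.split()[:2] for doc in docs]


def fake_entities(docs: List[str]) -> list:
    return [[word for word in doc.split() if word.istitle()] for doc in docs]


results = run_tasks_in_parallel(
    {"key_phrases": fake_key_phrases, "entities": fake_entities},
    ["Contoso opened an office in Seattle"],
)
print(results["entities"])  # [['Contoso', 'Seattle']]
```

Threads suit this workload because the time is spent waiting on network responses rather than on local computation.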

Performance Gains

After applying these optimizations, the processing time dropped from 10 seconds to just 3 seconds per execution, which represents a 70% reduction in processing time. This performance boost is particularly valuable when dealing with large-scale document processing, where speed is critical.
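
For anyone who wants to reproduce this kind of comparison, here is a small sketch of how the numbers can be measured and the reduction computed. The helper names time_call and percent_reduction are my own for illustration; they are not part of any Azure SDK.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to `fn`."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def percent_reduction(before_s: float, after_s: float) -> float:
    """Percentage by which `after_s` is lower than `before_s`."""
    return (before_s - after_s) * 100 / before_s


# The figures from this post: 10 seconds before, 3 seconds after.
print(percent_reduction(10.0, 3.0))  # 70.0
```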

Conclusion

Optimizing document processing on Azure not only improves performance but also reduces operational cost. By adding preprocessing, specifying the language parameter, and using batch and parallel processing, you can achieve significant speedups while maintaining output quality, and sending less text to the API keeps costs down.

If you’re facing similar challenges, try out these optimizations and see how they work for your use case. I’d love to hear about any additional techniques you’ve used to speed up your document processing workflows while saving costs.

Wednesday, 20 November 2024

Building BloomBot: A Comprehensive Guide to Creating an AI-Powered Pregnancy Companion Using Gemini API

Solution approach for BloomBot

1. Problem Definition and Goals

Objective:

  • Develop BloomBot, an AI-powered chatbot tailored for expecting mothers to provide:
    • Pregnancy tips
    • Nutrition advice by week
    • Emotional support resources
    • A conversational interface for queries

Key Requirements:

  • AI-Powered Chat: Leverage Gemini for generative responses.
  • User Interface: Interactive and user-friendly chatbot interface.
  • Customization: Adapt responses based on pregnancy stages.
  • Scalability: Handle concurrent user interactions efficiently.

2. Architecture Overview

Key Components:

  1. Frontend:

    • Tool: Tkinter for desktop GUI.
    • Features: Buttons, dropdowns, text areas for interaction.
  2. Backend:

    • Role: Acts as a bridge between the frontend and Gemini API.
    • Tech Stack: Python with google.generativeai for Gemini API integration.
  3. Gemini API:

    • Purpose: Generate responses for user inputs.
    • Capabilities Used: Content generation, chat handling.
  4. Environment Configuration:

    • Secure API key storage using .env file and dotenv.
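
A minimal sketch of reading the key at startup is shown below. The variable name GEMINI_API_KEY and the helper load_api_key are assumptions for illustration; use whatever name your .env file defines.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    # With python-dotenv installed, calling dotenv.load_dotenv() before this
    # copies entries from a local .env file into os.environ.
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return api_key


# Demonstration only: set a placeholder so the sketch runs standalone.
os.environ.setdefault("GEMINI_API_KEY", "placeholder-key")
api_key = load_api_key()
```

Failing at startup with a clear message is easier to debug than a failed API call later on.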

3. Solution Workflow

Frontend Interaction:

  • Users interact with BloomBot via a Tkinter-based GUI:
    • Buttons for specific tasks (e.g., pregnancy tips, nutrition advice).
    • A dropdown for selecting pregnancy weeks.
    • A text area for displaying bot responses.

Backend Processing:

  1. Task-Specific Prompts:
    • Predefined prompts for tasks like fetching pregnancy tips or emotional support.
    • Dynamic prompts (e.g., week-specific nutrition advice).
  2. Free-Form Queries:
    • Use the chat feature of Gemini to handle user inputs dynamically.
  3. Response Handling:
    • Parse and return Gemini's response to the frontend.

Gemini API Integration:

  • Models Used: gemini-1.5-flash.
  • API methods like generate_content for static prompts and start_chat for conversational queries.

4. Implementation Details

Backend Implementation

Key Features:

  1. Pregnancy Tip Generator:
    • Prompt: "Give me a helpful tip for expecting mothers."
    • Method: generate_content.
  2. Week-Specific Nutrition Advice:
    • Dynamic prompt: "Provide nutrition advice for week {week} of pregnancy."
    • Method: generate_content.
  3. Emotional Support Resources:
    • Prompt: "What resources are available for emotional support for expecting mothers?"
    • Method: generate_content.
  4. Chat Handler:
    • Start a conversation: start_chat.
    • Handle free-form queries.

Code Snippet:


import google.generativeai as genai


class ExpectingMotherBotBackend:
    def __init__(self, api_key):
        self.api_key = api_key
        genai.configure(api_key=self.api_key)
        self.model = genai.GenerativeModel("models/gemini-1.5-flash")

    def get_pregnancy_tip(self):
        prompt = "Give me a helpful tip for expecting mothers."
        result = self.model.generate_content(prompt)
        return result.text if result.text else "Sorry, I couldn't fetch a tip right now."

    def get_nutrition_advice(self, week):
        prompt = f"Provide nutrition advice for week {week} of pregnancy."
        result = self.model.generate_content(prompt)
        return result.text if result.text else "I couldn't fetch nutrition advice at the moment."

    def get_emotional_support(self):
        prompt = "What resources are available for emotional support for expecting mothers?"
        result = self.model.generate_content(prompt)
        return result.text if result.text else "I'm having trouble fetching emotional support resources."

    def chat_with_bot(self, user_input):
        # A new chat is started on every call, so no history carries over between queries.
        chat = self.model.start_chat()
        response = chat.send_message(user_input)
        return response.text if response.text else "I'm here to help, but I didn't understand your query."

Frontend Implementation

Key Features:

  1. Buttons and Inputs:
    • Fetch pregnancy tips, nutrition advice, or emotional support.
  2. Text Area:
    • Display bot responses with a scrollable interface.
  3. Dropdown:
    • Select pregnancy week for tailored nutrition advice.

Code Snippet:


import tkinter as tk
from tkinter import ttk


class ExpectingMotherBotFrontend:
    def __init__(self, backend):
        self.backend = backend
        self.window = tk.Tk()
        self.window.title("BloomBot: Pregnancy Companion")
        self.window.geometry("500x650")
        self.create_widgets()

    def create_widgets(self):
        title_label = tk.Label(self.window, text="BloomBot: Your Pregnancy Companion")
        title_label.pack()

        # Buttons for functionalities
        tip_button = tk.Button(self.window, text="Get Daily Pregnancy Tip", command=self.show_pregnancy_tip)
        tip_button.pack()

        self.week_dropdown = ttk.Combobox(self.window, values=[str(i) for i in range(1, 51)], state="readonly")
        self.week_dropdown.pack()

        nutrition_button = tk.Button(self.window, text="Get Nutrition Advice", command=self.show_nutrition_advice)
        nutrition_button.pack()

        support_button = tk.Button(self.window, text="Emotional Support", command=self.show_emotional_support)
        support_button.pack()

        self.response_text = tk.Text(self.window)
        self.response_text.pack()

    def show_pregnancy_tip(self):
        tip = self.backend.get_pregnancy_tip()
        self.display_response(tip)

    def show_nutrition_advice(self):
        week = self.week_dropdown.get()
        if not week:  # Nothing selected yet; int("") would raise ValueError
            self.display_response("Please select a pregnancy week first.")
            return
        advice = self.backend.get_nutrition_advice(int(week))
        self.display_response(advice)

    def show_emotional_support(self):
        support = self.backend.get_emotional_support()
        self.display_response(support)

    def display_response(self, response):
        self.response_text.delete("1.0", tk.END)
        self.response_text.insert(tk.END, response)

    def run(self):
        # Start the Tk event loop; the window stays open until the user closes it.
        self.window.mainloop()

5. Deployment

Steps:

  1. Environment Setup:
    • Install required packages: pip install requests google-generativeai python-dotenv. Tkinter ships with the standard Python installers and is not installed through pip.
    • Set up .env with the Gemini API key.
  2. Testing:
    • Ensure prompt-response functionality works as expected.
    • Test UI interactions and Gemini API responses.

6. Monitoring and Maintenance

  • Usage Analytics: Track interactions for feature improvements.
  • Error Handling: Implement better fallback mechanisms for API failures.
  • Feedback Loop: Regularly update prompts based on user feedback.
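
One possible shape for the fallback mechanism mentioned above is a small retry wrapper: try the API call a few times with growing pauses, then return a friendly message. This is a sketch under my own assumptions; the helper call_with_fallback, the retry count, and the delays are illustrative and are not part of the Gemini SDK.

```python
import time
from typing import Callable


def call_with_fallback(
    fetch: Callable[[], str],
    fallback: str,
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch`, retrying with exponential backoff; return `fallback` if every attempt fails."""
    for attempt in range(retries):
        try:
            text = fetch()
            if text:
                return text
        except Exception:
            # A real app would log the exception here instead of discarding it.
            pass
        if attempt < retries - 1:
            sleep(base_delay * (2 ** attempt))
    return fallback


# Simulated API that fails twice and then succeeds (hypothetical).
attempts = {"count": 0}


def flaky_tip() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated API failure")
    return "Stay hydrated."


tip = call_with_fallback(flaky_tip, "Sorry, I couldn't fetch a tip right now.", sleep=lambda s: None)
print(tip)  # Stay hydrated.
```

In BloomBot, fetch could be something like lambda: backend.get_pregnancy_tip(), so the frontend always has text to display.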



Monday, 20 May 2024

Amazon's Digital Transformation: A Journey of Innovation and Expansion

 

Amazon.com is a well-known e-commerce platform that has transformed the way we shop. Its Marketplace platform, backed by superior data analytics and logistics efficiency, has enabled it to create a near-perfectly competitive market, reduce the cost of products, and offer personalized recommendations to customers.

 

Amazon's Core Competency: The Marketplace Platform

Amazon's business model is driven by data and technology, allowing it to innovate and transform existing business models in multiple industries and domains. Its core competency lies in finding service improvement opportunities in business models and driving innovation using technology and data.

 

Amazon's Next Growth Engine: AWS

In the short term, Amazon's next growth engine is its cloud computing business, AWS. AWS is a critical part of Amazon's core business because it can process massive amounts of data and drive innovation using technology and data. AWS also contributes a large share of Amazon's profit, well out of proportion to its share of revenue.

 

Amazon's Expansion to Growing Economies

As Amazon expands into growing economies, it faces regional competitors who understand the local market better and may receive more support and protection from local governments. To succeed, Amazon needs to learn the local markets better, possibly by partnering with or acquiring local businesses, and to adapt to government and regulatory policies in each region.

 

In conclusion, Amazon's digital transformation journey shows a company that is constantly innovating and expanding, using data and technology to drive growth and success.

 

Saturday, 13 April 2024

Copilot & GPT for product managers & owners

 

GPT, or Generative Pretrained Transformer, is a type of AI language model that can generate human-like text. This technology has the potential to revolutionize the way we create content, making the process faster, easier, and more efficient.

I am a product owner and manager, and in my role I need to:

  • Create a lot of content fast.
  • Analyze a lot of data.
  • Find documents quickly in my system.
  • Discover documents to refer to across the organization.
  • Churn out POCs very quickly.

For all of these, GPT has been a godsend for me. Thanks to my organization, MS Copilot has been made available to me. Microsoft 365 Copilot internally uses GPT-4.

Do not treat this as the ultimate reference, but here are some of the ways I have used GPT and Copilot:

  • Creating meeting summaries with Copilot, sharing them with the team, and keeping them for personal reference.
  • Creating reference documents on a particular subject: downloading the transcripts of all previously recorded meetings, combining them with all my notes, generating a summarized document from them, and then structuring it manually.
  • Recently I have rarely created a Word document from scratch without first getting a draft from Copilot in Word.
  • Creating notes by summarizing multiple documents, turning large volumes of data into information a wider audience can consume.
  • Creating presentations and organizing slides from unstructured documents, again making them consumable for a wider audience.
  • Building your own chatbot: put your tabular data in an Excel table and then use Copilot in Excel for question answering over it.
  • Discovering documents across the organization for reference using Microsoft 365 Copilot.
  • Asking questions and finding documents with Microsoft 365 Copilot while searching for particular content within my organizational OneDrive.
  • As an AI product owner, I have used Copilot to demonstrate the concept of sentiment analysis over Word documents and Excel sheets, which I never thought would be possible.


I think Copilot has saved me 30-45 minutes a day on average, roughly a 5-7% improvement in my productivity.


Monday, 20 November 2023

Evaluating the success of an AI&ML use case

The data science team has finished developing the current version of the ML model and has reported an accuracy or error metric. But you are not sure how to put that number in context, or whether it is good enough.

In one of my previous blogs, I addressed AI investment and how long it takes before the business can know whether an engagement has potential or is going nowhere. This blog can be considered an extension of that one. If you haven’t checked it out already, please visit: https://anirbandutta-ideasforgood.blogspot.com/2023/07/investment-on-developing-ai-models.html

In my previous blog, I spoke about KPIs like accuracy and error as rules of thumb for quickly assessing the potential success of a use case. In this blog, I will try to make that assessment more specific and relative.

Fundamentally, there are three ways to evaluate the latest performance KPI of your AI&ML model, and you can use them independently or in combination.

Human-level performance metric

For AI use cases whose primary objective is to replace human effort, this can be considered the primary success metric. For example, if the current human error rate for a particular process stands at 5% and the AI achieves an error rate of 5% or less, it can be deemed a valuable model, because at the same error rate AI brings smart automation, a faster process, negligible downtime, and so on.

Example: Tasks that involve data entry can easily be replicated by AI. The success criterion for adoption does not need to be 100% accuracy; the AI only has to match the accuracy its human counterpart was delivering to be adopted for real-world deployment.

Base model metric

In use cases where the problem being addressed is more theoretical in nature, or where discovery of the business problem is still in progress, it is best to create a quick, simple base model and then try to improve on it with each iteration.

For example: I am currently working on a system to determine whether a piece of content was created by AI. For lack of any past reference against which accuracy could be compared, I have taken this approach to measure progress.
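
To make the base-model idea concrete, here is a small sketch that compares a model against the simplest possible baseline, one that always predicts the most common label. The labels and predictions are made up for illustration and do not come from my actual project.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train: Sequence[int], n: int) -> List[int]:
    """Predict the most common training label for every one of `n` examples."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


# Made-up labels: 1 = AI-generated, 0 = human-written.
y_train = [0, 0, 0, 1, 1, 0, 0, 1]
y_test = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
model_pred = [0, 1, 0, 0, 1, 1, 1, 0, 0, 0]  # hypothetical output of iteration 1

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)
print(f"baseline={baseline_acc:.2f} model={model_acc:.2f} lift={model_acc - baseline_acc:+.2f}")
```

Each later iteration is then judged by how much it improves on the previous number, not against an absolute target.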

Satisficing & optimizing metric

We define both a metric on which we want the model to do as well as possible (the optimizing metric) and a minimum standard it must meet to be functional and valuable in real-life scenarios (the satisficing metric).

Example: For a home voice assistant, the optimizing metric would be the accuracy with which the model hears exactly what someone said. The satisficing metric would be that the model takes no more than 100 ms to process what was said.
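
In code, this pairing works as a filter followed by a maximum: the 100 ms limit removes candidates that are too slow, and accuracy picks the winner among the rest. The candidate models and their numbers below are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    accuracy: float    # optimizing metric: higher is better
    latency_ms: float  # threshold metric: must stay within the limit


def pick_model(candidates: List[Candidate], max_latency_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate that meets the latency limit, or None."""
    eligible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    return max(eligible, key=lambda c: c.accuracy, default=None)


# Made-up candidates for illustration.
candidates = [
    Candidate("large", accuracy=0.97, latency_ms=180.0),
    Candidate("medium", accuracy=0.95, latency_ms=90.0),
    Candidate("small", accuracy=0.91, latency_ms=40.0),
]
best = pick_model(candidates, max_latency_ms=100.0)
print(best.name if best else "no candidate meets the limit")  # medium
```

The most accurate model loses here because it misses the latency limit, which is exactly the trade-off the two metrics are meant to expose.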

Wednesday, 18 October 2023

AI TRUST & ADOPTION – THE METRICS TO MONITOR

 

Trust is critical to AI adoption. As more next-generation AI models are deployed, building trust in these systems becomes both more vital and more difficult. For example, for all the amazing capabilities that generative AI and LLMs deliver, they are also larger, more complex, and more opaque than ever. This makes it imperative to identify the right metrics and to monitor and report them continuously.

Below are some of the most critical metrics that every organization and business should monitor continuously and be able to report as and when necessary.

DATA

       Date of instances

       Date processed.

       Owner & steward

       Who created it?

       Who funded it?

       Who’s the intended user?

       Who’s accountable?

       What do instances (i.e., rows) represent?

       How many instances are there?

       Is it all of them or was it sampled?

       How was it sampled?

       How was it collected?

       Are there any internal or external keys?

       Are there target variables?

       Descriptive statistics and distributions of important and sensitive variables

       How often is it updated?

       How long are old instances retained?

       Applicable regulations (e.g., HIPAA)

 

MODELS

       Date trained.

       Owner & steward

       Who created it?

       Who funded it?

       Who’s the intended user?

       Who’s accountable?

       What do instances (i.e., rows) represent?

       What does it predict?

       Features

       Description of its training & validation data sets

       Performance metrics

       When was it trained?

       How often is it retrained?

       How long are old versions retained?

       Ethical and regulatory considerations

 

Bias remains one of the most difficult KPIs to define and measure. Hence, I am excited to have found some measures that can help quantify the presence of bias in some form.

  • Demographic representation: Does a dataset have the same distribution of sensitive subgroups as the target population?
  • Demographic parity: Are model prediction averages about the same overall and for sensitive subgroups? For example, if we’re predicting the likelihood to pay a phone bill on time, does it predict about the same pay rate for men and women? A t-test, Wilcoxon test, or bootstrap test could be used.
  • Equalized odds: For Boolean classifiers that predict true or false, are the true positive and false positive rates about the same for sensitive subgroups? For example, is it more accurate for young adults than for the elderly?
  • Equality of opportunity: Like equalized odds, but only checks the true positive rate.
  • Average odds difference: The average of the between-group differences in false positive rate and true positive rate.
  • Odds ratio: Positive outcome rate divided by the negative outcome rate. For example, (likelihood that men pay their bill on time) / (likelihood that men don’t pay their bill on time) compared to that for women.
  • Disparate impact: Ratio of the favorable prediction rate for a sensitive subgroup to that of the overall population.
  • Predictive rate parity: Is model accuracy about the same for different sensitive subgroups? Accuracy can be measured by things such as precision, F-score, AUC, mean squared error, etc.
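
Several of these measures reduce to comparing a few per-group rates. The sketch below computes four of them for a binary classifier, using plain Python and made-up data. The function names and the choice of which group is compared against which are my own assumptions, not taken from the Dataiku report.

```python
import math
from typing import Dict, Sequence


def rate(values: Sequence[int]) -> float:
    """Mean of a 0/1 sequence, or NaN when the sequence is empty."""
    return sum(values) / len(values) if values else math.nan


def group_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Positive-prediction rate, true positive rate and false positive rate for one group."""
    return {
        "positive_rate": rate(y_pred),
        "tpr": rate([p for t, p in zip(y_true, y_pred) if t == 1]),
        "fpr": rate([p for t, p in zip(y_true, y_pred) if t == 0]),
    }


def fairness_report(y_true, y_pred, groups, group_a: str, group_b: str) -> Dict[str, float]:
    """Compare group_a against group_b (and against the overall population)."""
    def rates_for(group: str) -> Dict[str, float]:
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
        return group_rates([t for t, _ in rows], [p for _, p in rows])

    a, b = rates_for(group_a), rates_for(group_b)
    overall = rate(y_pred)
    return {
        "demographic_parity_difference": a["positive_rate"] - b["positive_rate"],
        "disparate_impact": a["positive_rate"] / overall if overall else math.nan,
        "equal_opportunity_difference": a["tpr"] - b["tpr"],
        "average_odds_difference": ((a["fpr"] - b["fpr"]) + (a["tpr"] - b["tpr"])) / 2,
    }


# Made-up example: 1 = predicted (or actually) pays the bill on time.
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
report = fairness_report(y_true, y_pred, groups, "F", "M")
for name, value in report.items():
    print(f"{name}: {value:+.2f}")
```

Values near zero for the differences, and near one for disparate impact, suggest the model treats the two groups similarly on that measure.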

With all of the above in mind, we must be sensitive to and cognizant of the business and social context when identifying the “sensitive groups” mentioned above.

This is by no means an exhaustive list, only a start towards a safer and fairer digital ecosystem. I will try my best to consolidate new information.

With thanks to Dataiku: some of this information is collected from the Dataiku report “How to build trustworthy AI systems.”

Tuesday, 4 July 2023

Investment on developing AI&ML models – timelines & diminishing return

 

One of the most common questions stakeholders ask me is how long it takes to finish developing an ML model. I will try to address the subtleties of this topic in this write-up.

AI development is a unique scenario in which you are expected to deliver an innovation. It is a special case where the resources required are uncertain, and hence it is sometimes very difficult to know when and where to “STOP”.

When I talk to businesses, the question I stress the most is what an “MVP” solution means to them: that is, at what minimum accuracy or maximum error rate the AI solution would still be useful for their business.

If you are investing in AI use cases, one concept I would recommend you understand is AI resourcing and diminishing returns. Please look at the graph below:

 



So what I suggest to AI investors is this: if you haven’t reached an MVP by the point of maximum return (PoMR), “STOP”. For example, if by the end of the PoMR the model still has an error rate of 30%, and that does not work for your business, maybe AI cannot solve this for you. Maybe it needs a completely different approach. Whatever the case, deploying more resources is not the solution.

Drawing on my experience with the AI&ML use cases I have worked on for almost a decade now, a general rule of thumb I recommend is this: the accuracy or error rate you get at the end of 3 months is your point of maximum return. You should reach an MVP by then. Beyond that, the work should be fine-tuning or customizing to specific business needs. If by then it is still miles away from your business objective, maybe it is time to pull the plug.
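
The rule of thumb can be written down as a simple check. In the sketch below, the monthly error rates are made up to show shrinking gains, and the function names, the 3-month review point as a default, and the 20% MVP threshold are illustrative assumptions.

```python
from typing import List


def should_continue(monthly_error: List[float], mvp_max_error: float, review_month: int = 3) -> bool:
    """Return True if the error rate at the review month meets the MVP threshold."""
    if len(monthly_error) < review_month:
        raise ValueError("not enough months of results to review yet")
    return monthly_error[review_month - 1] <= mvp_max_error


def marginal_gains(monthly_error: List[float]) -> List[float]:
    """Error reduction achieved in each month relative to the previous one."""
    return [round(prev - cur, 4) for prev, cur in zip(monthly_error, monthly_error[1:])]


# Made-up error rates at the end of months 1 to 6 (illustrative only).
errors = [0.45, 0.36, 0.30, 0.28, 0.27, 0.265]
print(marginal_gains(errors))         # [0.09, 0.06, 0.02, 0.01, 0.005]
print(should_continue(errors, 0.20))  # False
```

With these numbers, the gains after month 3 are a fraction of the early ones and the 30% error rate misses a 20% MVP threshold, so the check says to stop.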

This is again an opinion piece, and this has been my experience. I will be glad to hear how the journey has been for you.

Sunday, 25 June 2023

The concept of game theory

 



Do you remember this legendary line from Bruce Lee: ‘Boards don’t hit back!’?

He may have meant it differently, but the sentence has profound importance in the fields of mathematics and data science.

Most decisions, from everyday life to business and war strategy, depend largely on how other parties behave.

Most practical interactions are therefore not with boards but with other parties whose reactions to different scenarios we cannot predict.

And thus comes the concept of game theory.

The official definition is - the branch of mathematics concerned with the analysis of strategies for dealing with competitive situations where the outcome of a participant's choice of action depends critically on the actions of other participants. Game theory has been applied to contexts in war, business, and biology.
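
A tiny worked example may help. The sketch below checks every pair of actions in the classic prisoner's dilemma and keeps the pairs where neither player can do better by changing only their own action, which is the idea of a Nash equilibrium. The payoff numbers are textbook-style values chosen for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Payoffs = Dict[Tuple[str, str], Tuple[int, int]]


def pure_nash_equilibria(payoffs: Payoffs, actions: List[str]) -> List[Tuple[str, str]]:
    """Return action pairs where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        row_payoff, col_payoff = payoffs[(a, b)]
        row_best = all(payoffs[(alt, b)][0] <= row_payoff for alt in actions)
        col_best = all(payoffs[(a, alt)][1] <= col_payoff for alt in actions)
        if row_best and col_best:
            equilibria.append((a, b))
    return equilibria


# Prisoner's dilemma: payoffs are (player 1, player 2), years in prison as negatives.
payoffs = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}
equilibria = pure_nash_equilibria(payoffs, ["cooperate", "defect"])
print(equilibria)  # [('defect', 'defect')]
```

Both players would be better off cooperating, yet each one's best response to the other leads them to defect, which is why the other party's behavior matters so much.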

Something we currently lack as a data science community is this awareness: we make predictions assuming the people around us will behave rationally or predictably, which is not always the right assumption.

That is why it is so difficult, or even impossible, to predict an election or the stock market.
A few major contributors to this field are John von Neumann and John Nash (the one from A Beautiful Mind).

So the key takeaway for the team is that, with time, you will get more exposure to predictions, functions, and recommendations from machines. But understand that each is an indicator, not an exact science, because it is very difficult to predict how other parties will behave in either a competitive or a co-operative setting.

Below is a very good write-up on the Afghan conflict in the context of game theory.

Monday, 19 June 2023

AI, ML, Data Science - frequent consultation advice to leadership

In this post I have tried to compile the questions, discussions, and queries I come across most often while consulting on data science road maps with leaders and managers. I hope this compilation will also add value for other leaders and managers who may have wondered about these topics at some point but never had the opportunity to discuss them. Many of you may have better answers or more exposure to some of the questions; I have compiled this based on the best knowledge I have and on how I go about explaining them.

Please reach out if you have a question or point that comes up often during consultation and is worth discussing.

1. Should we build this product or capability in-house or get it from a vendor?
A vendor product will always be generalized to cut across as many businesses as possible, since vendors thrive on repeatability. When you build in-house, you can customize the product for a smaller set of use cases or scenarios and perhaps create better differentiation.
So please ask yourself:
· When partnering with a vendor, what role do I play? What stops the vendor from partnering with the business directly in the future? What is my value addition, and is there any risk that I become insignificant?
· What kind of team do I have? If you have a great engineering team, maybe you want to do more in-house and keep a bigger piece of the pie for yourself.
· What is my core capability? If the capability needed is in line with our core skills, or is something we want to learn, then maybe we should build it in-house. If it is something we just want to get done, then maybe the best way is to get a vendor involved.

2. We have created certain analytics use cases, but we see several other teams creating similar ones.
Differentiation of analytics products or use cases is driven by each of, and the combination of, the following:
a) Deep domain knowledge
b) Combination of data from different systems brought together on a dynamic big data system
c) Deep or mature algorithms applied
If your use cases are easy to replicate, they are most probably built on shallow data, with very general domain knowledge and basic data science techniques.

3. Are we using the kind of AI that is used for technologies like self-driving cars?
Yes and no. Internally, these technologies use combinations of neural networks and reinforcement learning, and for different use cases we have also used variations of the same or similar technologies. But technologies like self-driving cars work on image or vision data, which we generally don’t handle. Our use cases are mostly based on numerical, text, and language data.

4. The vendor says their product is being used by Amazon. So should we go ahead and buy it?
It may well be used by Amazon or another similarly big company, but ask the vendor whether the product is used for a mission-critical process, or only for a PoC or for storing and processing data such as click-stream data that is not business critical. It makes all the difference whether the companies behind the logos vendors show you use the technology for business-critical projects or for non-critical processes.

5. We are showing the use case to the business, but it’s not making much of an impact.
The storytelling needs to improve. Every analytics use case must be associated with a story that ends with the business making or saving money. If the story cannot relate how the engineering will improve the customer’s financial bottom line, the customer's business will not care about it, irrespective of how good the engineering is.

6. We now have a data scientist on our team. Can we expect more insights from our data?
Data scientists alone cannot ensure project success. Data engineers and big data and cloud infrastructure engineers are equally important parts of the technical team. Without the infrastructure in place and the data stored in it in the proper format, data scientists cannot work their magic.

7. We are finding it very difficult to hire data scientists and big data developers.
Though there is no dearth of CVs, finding genuinely talented people with real knowledge and production implementation experience is difficult. Of those few, most are already well paid and on good projects. So whenever a decision is taken to hire senior data science talent, allow a time frame of six months.

8. What is the difference between ML and AI?
Though you will find several answers to this on the internet, here is one way I have found to explain it to a business person without the jargon. By definition, ML falls within the broader scope of AI. But to understand and remember it better: AI is something built to replicate human behavior. A program is called a successful AI when it can pass a Turing test, and a system is said to pass a Turing test when we cannot tell the machine's intelligence apart from a human's. ML is a system you create to find patterns in a data set that is too big for a human brain to comprehend. On a lighter note: if you ask a machine for 1231*1156 and it answers in a fraction of a second, it is ML; if it pauses, makes some comment, and answers after 5 minutes, like a human, it is AI.

9. Why aren’t we using a big data Hadoop architecture instead of an RDBMS like MSSQL or Oracle?
RDBMS products like MSSQL and Oracle are still viable analytics products and cannot be replaced by big data tools in many scenarios. Deciding on a data store or a data processing engine involves many factors, such as ACID versus BASE properties, the type and size of the data, the current implementation, and the skill set of the technical team. So doing an analytics project does not make Hadoop or a NoSQL product the default.

10. Here is some data, give me some insight.
This is the first line of any failed initiative. A project that is not clear about the business problem it wants to solve is sure to fail. Starting an analytics project without a clear goal in mind, just for the sake of adding a data science project to the portfolio and with no road map for how it will eventually contribute to the company's goals, is a waste of resources and will only end in failure.