Saturday 22 April 2023

Demonstrating Flask to create a simple Web-API OCR program

Most of the time, the real value of machine learning code lies at the heart of an application: integrating the ML code is what turns it into a smart application.

“Consider the following situation:
You have built a super cool machine learning model that can predict if a particular transaction is fraudulent or not. Now, a friend of yours is developing an android application for general banking activities and wants to integrate your machine learning model in their application for its super objective.
But your friend found out that, you have coded your model in Python while your friend is building his application in Java. So? Won't it be possible to integrate your machine learning model into your friend's application?
Fortunately enough, you have the power of APIs. And the above situation is one of the many where the need of turning your machine learning models into APIs is extremely important.” - https://www.datacamp.com/community/tutorials/machine-learning-models-api-python

The code below demonstrates how you can use Flask, a web services framework in Python, to wrap machine learning Python code in an API.

A few things to note:
· The image is assumed to already be on the path of the Python code. That is, we are not uploading the image through the API, since that was not the primary objective of the exercise.
· You will find two OCR functions:

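from PIL import Image, ImageFilter
import pytesseract

image = Image.open("screenshot.jpg")
# filter() returns a new image, so keep the result
image = image.filter(ImageFilter.SHARPEN)
new_size = tuple(2*x for x in image.size)
# Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
image = image.resize(new_size, Image.LANCZOS)
txt1 = pytesseract.image_to_string(image)
#return jsonify({'Converted OCR A': str(txt1)})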

and
import cv2
import numpy as np
import pytesseract

image = cv2.imread("screenshot.jpg")
image = cv2.resize(image, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
# cv2.imread returns BGR channel order
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kernel = np.ones((1, 1), np.uint8)
image = cv2.dilate(image, kernel, iterations=1)
image = cv2.erode(image, kernel, iterations=1)
# threshold() returns (retval, image); keep the binarized image
image = cv2.threshold(cv2.bilateralFilter(image, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
txt2 = pytesseract.image_to_string(image)
return jsonify({'Converted OCR ': str(txt2)})

Use either one, and add the return statement at the end of the one you prefer. I suggest comparing the accuracy of both.

· I used Postman to check the result. You can do the same.

· After running Flask, if you need to check that it has been installed properly, refer again to the DataCamp article mentioned above for screenshots. The same link also explains how to run and check your code in Postman.

· Please note that, since the objective was to demonstrate Flask, I haven’t chosen a model that needed to be trained. If you have a model that needs to be trained, write and train the model in a separate .py file and call that file from the Flask .py file.
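
For orientation, here is a minimal sketch of how either OCR snippet might sit inside a Flask route. It is not the original script: the `create_app` factory, the `/ocr` route name, and the `tesseract_ocr` helper are assumptions made for illustration. The OCR function is passed in as a parameter so the route can be exercised without Tesseract installed.

```python
from flask import Flask, jsonify


def tesseract_ocr(image_path):
    """Run Tesseract on an image file (hypothetical helper).

    The OCR imports are local so the app can be created on machines
    that do not have Pillow or pytesseract installed.
    """
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(image_path))


def create_app(ocr_func=tesseract_ocr, image_path="screenshot.jpg"):
    """Build a Flask app with a single OCR endpoint.

    The "/ocr" route name is an assumption; the image path matches the
    file name used in the snippets above.
    """
    app = Flask(__name__)

    @app.route("/ocr", methods=["GET"])
    def ocr():
        # The image is read from disk, not uploaded through the API
        text = ocr_func(image_path)
        return jsonify({"Converted OCR": str(text)})

    return app

# To serve locally: create_app().run(port=5000)
```

With the server running, a GET request to `/ocr` from Postman should return the recognized text as JSON.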


Steps to run:
Run the Flask Python file -
Place the image in the same location as the Python file
For me the text image was -
Open Postman and run the API



Resource that was referred:

Real time Natural Language Audio Analytics

The code I have shared in this blog enables us to capture a real-time audio conversation and analyze it on the fly.


The analytics it performs are:
· Sentiment Score or Polarity Score
· Polarizing Words – The words that contribute to the dialogue’s sentiment score. We have divided the sentiments into three types: positive, negative and neutral.
· Keywords – The particular brand, company, person, etc. being talked about. The proper nouns identified by part-of-speech (POS) analysis are marked as keywords.
· Context – The context of the conversation or the general subject area. The nouns (other than proper nouns) identified by POS analysis are marked as context.
· Action Words – The action words associated with the conversation, i.e. the verbs.

For example, let’s consider the following customer feedback:
"Disappointed that the Dell Outlook ticket was closed reopen the ticket."


The things worth noting from the code’s analysis are:
The polarity or sentiment score is -0.48.
Negative word identified: disappointed.
Keywords identified: Dell, Outlook
Context identified: Ticket
Action words identified: Reopen.
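
The POS-based grouping described above can be sketched as follows. This is an illustration only, not the author's script: the `bucket_by_pos` function is hypothetical, and the Penn Treebank tags on the example sentence are assigned by hand here, not produced by a tagger. The sketch collects every verb form, so it lists more action words than the output above.

```python
def bucket_by_pos(tagged_tokens):
    """Group (word, tag) pairs into keywords, context and action words.

    Assumes Penn Treebank tags: NNP/NNPS are proper nouns, NN/NNS are
    common nouns, and every tag starting with VB is a verb form.
    """
    buckets = {"keywords": [], "context": [], "actions": []}
    for word, tag in tagged_tokens:
        if tag in ("NNP", "NNPS"):
            key = "keywords"
        elif tag in ("NN", "NNS"):
            key = "context"
        elif tag.startswith("VB"):
            key = "actions"
        else:
            continue  # other parts of speech are ignored
        if word not in buckets[key]:  # keep first occurrence only
            buckets[key].append(word)
    return buckets


# Hand-tagged version of the feedback sentence (tags are an assumption)
tagged = [
    ("Disappointed", "JJ"), ("that", "IN"), ("the", "DT"),
    ("Dell", "NNP"), ("Outlook", "NNP"), ("ticket", "NN"),
    ("was", "VBD"), ("closed", "VBN"), ("reopen", "VB"),
    ("the", "DT"), ("ticket", "NN"),
]
result = bucket_by_pos(tagged)
print(result)
```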

The script can be used in real time to connect to or alert the appropriate teams, who might be aware of the scenario and better suited to handle the issue. And if you are looking for text analytics, it produces the analytics immediately, without having to run a separate script later.
 

Saturday 18 February 2023

Demystifying BERT - Things you should know before trying out BERT

 Introducing BERT


What is BERT?
BERT is a method of pre-training language representations, meaning that we train a general-purpose "language understanding" model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP.

What makes BERT different?
BERT builds upon recent work in pre-training contextual representations — including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit. However, unlike these previous models, BERT is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus (in this case, Wikipedia).

Why does this matter? Pre-trained representations can either be context-free or contextual, and contextual representations can further be unidirectional or bidirectional. Context-free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary. For example, the word “bank” would have the same context-free representation in “bank account” and “bank of the river.” Contextual models instead generate a representation of each word that is based on the other words in the sentence. For example, in the sentence “I accessed the bank account,” a unidirectional contextual model would represent “bank” based on “I accessed the” but not “account.” However, BERT represents “bank” using both its previous and next context — “I accessed the ... account” — starting from the very bottom of a deep neural network, making it deeply bidirectional.
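
The distinction above can be made concrete with a toy sketch that lists which neighbouring words each kind of model is allowed to use when representing “bank”. It does not compute embeddings; the `visible_context` function and its mode names are hypothetical and exist only for illustration.

```python
def visible_context(tokens, index, mode):
    """Return the other tokens a model may use to represent tokens[index].

    mode: "context_free"   - no surrounding words are used
          "unidirectional" - only the words to the left are used
          "bidirectional"  - words on both sides are used
    """
    if mode == "context_free":
        return []
    left = tokens[:index]
    if mode == "unidirectional":
        return left
    if mode == "bidirectional":
        return left + tokens[index + 1:]
    raise ValueError(f"unknown mode: {mode}")


sentence = ["I", "accessed", "the", "bank", "account"]
bank = sentence.index("bank")
for mode in ("context_free", "unidirectional", "bidirectional"):
    print(mode, visible_context(sentence, bank, mode))
```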

The Strength of Bidirectionality
If bidirectionality is so powerful, why hasn’t it been done before? To understand why, consider that unidirectional models are efficiently trained by predicting each word conditioned on the previous words in the sentence. However, it is not possible to train bidirectional models by simply conditioning each word on its previous and next words, since this would allow the word that’s being predicted to indirectly “see itself” in a multi-layer model.

To solve this problem, we use the straightforward technique of masking out some of the words in the input and then condition each word bidirectionally to predict the masked words. For example:
While this idea has been around for a very long time, BERT is the first time it was successfully used to pre-train a deep neural network.
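
A simplified sketch of the masking step is shown below. It is not BERT's actual preprocessing code: the `mask_tokens` function is hypothetical, it always substitutes `[MASK]` (the real procedure sometimes keeps or randomizes the chosen token), and the default rate of 0.15 follows the figure reported for BERT, which the text above does not state.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens for masked-word prediction.

    Returns (masked_tokens, labels), where labels maps each masked
    position to the original token the model should predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token so short inputs still yield a target
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, labels


tokens = "the man went to the store".split()
masked, labels = mask_tokens(tokens, seed=0)
print(masked)
print(labels)
```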

BERT also learns to model relationships between sentences by pre-training on a very simple task that can be generated from any text corpus: Given two sentences A and B, is B the actual next sentence that comes after A in the corpus, or just a random sentence? For example:
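
A minimal sketch of how such sentence pairs could be generated from an ordered list of sentences follows. The `make_nsp_pairs` function is hypothetical, and the example corpus is made up for illustration. The even split between real and random pairs follows the BERT paper; drawing the random sentence from the same small list is a simplification.

```python
import random


def make_nsp_pairs(sentences, seed=None):
    """Build (sentence_a, sentence_b, is_next) training examples.

    For each sentence A, roughly half the time B is the sentence that
    really follows A; otherwise B is a different, randomly chosen one.
    """
    rng = random.Random(seed)
    examples = []
    for i in range(len(sentences) - 1):
        a = sentences[i]
        # With fewer than 3 sentences there is no valid random candidate
        if rng.random() < 0.5 or len(sentences) < 3:
            examples.append((a, sentences[i + 1], True))
        else:
            candidates = [j for j in range(len(sentences))
                          if j not in (i, i + 1)]
            b = sentences[rng.choice(candidates)]
            examples.append((a, b, False))
    return examples


corpus = [
    "the man went to the store .",
    "he bought a gallon of milk .",
    "then he walked home .",
    "penguins are flightless .",
]
for a, b, is_next in make_nsp_pairs(corpus, seed=1):
    print(is_next, "|", a, "|", b)
```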
How I extended BERT for a chatbot

An already pre-trained BERT model was fine-tuned on the SQuAD dataset.
The model was pre-trained for 40 epochs over a 3.3 billion word corpus, including BooksCorpus (800 million words) and English Wikipedia (2.5 billion words).

The model was fine-tuned on the Stanford Question Answering Dataset (SQuAD), a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles.


Deployment

I used Flask, a web services framework in Python, to wrap the machine learning Python code in an API.

Training and Maintenance

  • Google BERT is a pre-trained model and there is no training involved.
  • You can fine-tune it, though, as I did on the SQuAD dataset.
  • If you can spend some time understanding the underlying code, you can customize it to better suit your domain and requirements, as we did.
  • Once the code is deployed, it needs to be constantly monitored and evaluated to identify scope for improvement.
  • No day-to-day training is required.

Infra spec

Though the pre-trained BERT model should be able to run on any infrastructure generally advised for analytics use cases, the infrastructure that Google advises for fine-tuning is on the higher end by non-Google standards.
(Though BERT without fine-tuning is also efficient, fine-tuning results in substantial accuracy improvements.)

As per Google –

  • Fine-tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre-trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state-of-the-art.
  • All results on the paper were fine-tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re-produce most of the BERT-Large results on the paper using a GPU with 12GB - 16GB of RAM, because the maximum batch size that can fit in memory is too small.
  • The fine-tuning examples which use BERT-Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given.
  • Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080.

References I used for my learning and some content

Flavors of Text Analytics, NLP & Cognitive Analytics that every business should try