Saturday, 29 April 2023

Guru Mantras – What works for a successful AI, ML & Data Science implementation

  • Value of the problem - Before you start solving your analytics use case, ask yourself – how significant will the change be for the business if you get the perfect answer to your question? If the change is not significant enough, don’t even bother starting. With enough data most questions can be answered, but is it worth the effort?
  • You get paid for the business solution, not the technical engineering - The objective of your project should be a business problem or a strategic solution. If you see yourself solving a tactical or IT problem, remember that you are working on the means to an end, not the end itself.
  • We ourselves are the most advanced intelligence - Whenever you are thinking about a problem, try to think how our brain would solve it. Although our solutions are not as sophisticated as the brain, they were all inspired by it. For example, when you shoot a basketball, think how the brain takes into account your height, your distance from the basket, the strength of the wind, your angle to the basket and so on, then determines how hard to throw, and how it gets better with practice. So when you build the machine, think about how it should process the same factors and improve over time.
  • What is success? Understand from your customer what success means to them, and define success as part of the project scope. Try not to promise a particular number for your algorithm's outcome, such as 95% accuracy, as part of the scope. Explain the algorithm outcomes, and try to explain what the algorithm does in simple terms. The business will be much more open to including algorithmic outcomes in its decision-making processes if it has some intuition of what the algorithm does.
  • Some ML models are black boxes. Understand that some ML models simply work when given enough training data. We tell them how to ingest and process the data, but we don’t really know how they differentiate internally. For example, we are currently running a project to differentiate between a clean and a messy room. We have had tremendous success, but we really don’t know how the model internally tells the two apart.

  • Remember the AI winter. An AI winter is a period of reduced funding and interest in artificial intelligence research. The AI winter was the result of hype: over-inflated promises by developers, unnaturally high expectations from end users, and extensive promotion in the media. The term was coined by analogy to the idea of a nuclear winter. It happened in the 1970s and again in the 1980s. So do not be pressured into saying yes to something the business says it has read about or seen somewhere else. There is a lot of false hype around, and you should be sure that what you are promising can be built in a feasible manner.
  • You cannot build castles in the air. Algorithms are only as good as the data (in terms of quality and size) and the infrastructure they run on. So before you engage a data scientist, ask yourself: do I have the required data, in sufficient size, in the right format and of consistent quality, available in my repository to run the algorithm on? And do I have the processing power and big data infrastructure set up to handle that processing? If you don’t have these, building the algorithm should not be your first priority. Do remember that many of the concepts we use now have been around for a long time; it is our processing power that has brought them back into the spotlight.

    • Don’t fight the big techs. Use the domain knowledge you have gathered over years in business, i.e. the tricks of the trade. It can be years of building cars, manufacturing things or making software products.
    1. Avoid direct competition with the tech giants. If the product is too generic in nature, they will build it faster and better with their deep resources.
    2. Integrate data science closely with business products. Analytics should be out of the box and intuitive. If required, collaborate with the tech giants’ offerings, but never give away your domain expertise.
    3. Enable data science teams with domain knowledge. Celebrate people who are domain experts and make them part of the data science team.
    And we know the strengths of the big four tech companies (Facebook, Amazon, Microsoft, Google):
    1. Deep funds available for experimental innovation, and significantly less pressure to become profitable.
    2. An army of engineers and scientists.
    3. Vast computing infrastructure, so vast that a company can rent out its unused capacity and turn that into one of the world’s biggest businesses (read AWS).
    • Systems are difficult to bring together. I often see people talking about plans to bring systems together as if it were a walk in the park. It is more like swimming across the Arctic Ocean. Think about bringing Twitter and Facebook data together and identifying that two accounts belong to the same person. In most cases PII cannot be used because of data security, so integration at the data level is a great challenge.
    • Some things are difficult to predict and act on. The stock market is one such example.
    1. It depends heavily on the principles of game theory: you are very much dependent on what others are doing. In a way, you are trying to predict other people's behavior rather than the market.
    2. There is a lot of fake and misleading news on the internet.
    3. The influencing factors keep changing. We never knew Trump's tweets could change the course of the market. You can, however, appreciate the trend in the long run.


    Application of YOLO, and things I figured out along the way

     1. YOLO comes with pre-trained weights that you can reuse for the classes it was already trained on. If you want the list of objects, check the file coco.names, which lists the 80 classes. Initially I tried recreating the yolo.h5 model by running yad2k.py over yolo.cfg and yolo.weights, which I later figured out was not required, so I reused pre-trained weights instead.


    2. It's important to understand the concepts of bounding-box encoding, intersection over union (IoU), non-max suppression and anchor boxes.
    A good place to refresh one's understanding:

    3. As said earlier, although the pre-trained YOLOv3 model should let you detect objects from the 80 classes mentioned, to detect any additional class you will need to custom-train the YOLO model.

    4. Finally, although it may not be required just to run the model, to apply it to actual use cases one should be clear on the concept of CNNs. A few blogs I would suggest are:
    https://victorzhou.com/blog/keras-cnn-tutorial/
    https://victorzhou.com/blog/intro-to-cnns-part-1/
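    To make the concepts in point 2 concrete, here is a minimal pure-Python sketch of intersection over union and greedy non-max suppression. It assumes boxes as (x1, y1, x2, y2) corner tuples and includes a hypothetical `to_corners` helper for converting YOLO-style (center x, center y, width, height) boxes; the 0.5 IoU threshold is only a common default, not a value from this post. Libraries ship optimized equivalents (for example `tf.image.non_max_suppression` in TensorFlow), so treat this as an illustration rather than production code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners


def to_corners(x: float, y: float, w: float, h: float) -> Box:
    """Convert a center-format box (as YOLO predicts) to corner format."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much, repeat.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box above the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

    For example, two heavily overlapping detections of the same object (IoU 0.81) collapse to the higher-scoring one, while a distant box survives: `non_max_suppression([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps indices 0 and 2.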

    I tried the YOLO pre-trained weights, and below are the results:



    Saturday, 22 April 2023

    Demonstrating Flask to create a simple Web-API OCR program

     Most of the time, the real use of machine learning code lies at the heart of an application that integrates it to become a smart application.

    “Consider the following situation:
    You have built a super cool machine learning model that can predict if a particular transaction is fraudulent or not. Now, a friend of yours is developing an android application for general banking activities and wants to integrate your machine learning model in their application for its super objective.
    But your friend found out that, you have coded your model in Python while your friend is building his application in Java. So? Won't it be possible to integrate your machine learning model into your friend's application?
    Fortunately enough, you have the power of APIs. And the above situation is one of the many where the need of turning your machine learning models into APIs is extremely important.” (Source: https://www.datacamp.com/community/tutorials/machine-learning-models-api-python)

    The code below demonstrates how you can use Flask, a web framework for Python, to wrap machine learning Python code in an API.
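    As a minimal sketch of that idea (not the original program), the app below exposes one GET endpoint that runs OCR on a local image and returns the text as JSON. It assumes Flask, Pillow and pytesseract are installed; the names `create_app`, `default_ocr`, the `/ocr` route and the `ocr_api.py` file name are hypothetical. The OCR step mirrors the first (Pillow-based) snippet in this post, and the OCR function is passed in as a parameter so the route can be tested without Tesseract installed.

```python
from flask import Flask, jsonify


def default_ocr(path):
    """OCR a local image: sharpen, upscale 2x, then run Tesseract via pytesseract."""
    # Imported lazily so the app can be created (and tested) without these packages
    from PIL import Image, ImageFilter
    import pytesseract

    image = Image.open(path).filter(ImageFilter.SHARPEN)
    image = image.resize(tuple(2 * x for x in image.size), Image.LANCZOS)
    return pytesseract.image_to_string(image)


def create_app(ocr=default_ocr, image_path="screenshot.jpg"):
    """Build a Flask app with one GET endpoint that returns OCR text as JSON."""
    app = Flask(__name__)

    @app.route("/ocr", methods=["GET"])
    def ocr_endpoint():
        # The image is read from disk, not uploaded through the API
        text = ocr(image_path)
        return jsonify({"Converted OCR": str(text)})

    return app
```

    Saved as `ocr_api.py`, this could be started with `flask --app ocr_api run` (Flask 2.2+ finds the `create_app` factory automatically) and then called from Postman at `http://127.0.0.1:5000/ocr`.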

    Few things to note:
    · Here it is assumed the image is already in the same folder as the Python code, i.e. we are not sending the image through the API, since that was not the primary objective of the exercise.
    · You will find two OCR approaches:

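    from PIL import Image, ImageFilter
    import pytesseract
    from flask import jsonify

    image = Image.open("screenshot.jpg")
    image = image.filter(ImageFilter.SHARPEN)  # filter() returns a new image
    new_size = tuple(2 * x for x in image.size)  # upscale 2x before OCR
    image = image.resize(new_size, Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
    txt1 = pytesseract.image_to_string(image)
    # return jsonify({'Converted OCR A': str(txt1)})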

    and
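    import cv2
    import numpy as np
    import pytesseract
    from flask import jsonify

    image = cv2.imread("screenshot.jpg")  # OpenCV loads images in BGR order
    image = cv2.resize(image, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((1, 1), np.uint8)  # a 1x1 kernel leaves the image unchanged; try a larger one
    image = cv2.dilate(image, kernel, iterations=1)
    image = cv2.erode(image, kernel, iterations=1)
    # Edge-preserving smoothing, then Otsu binarization; keep the thresholded image
    image = cv2.threshold(cv2.bilateralFilter(image, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    txt2 = pytesseract.image_to_string(image)
    return jsonify({'Converted OCR B': str(txt2)})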

    Use either one: keep (or uncomment) the return statement at the end of the one you prefer. I suggest comparing the accuracy of both.

    · I used Postman to check the result. You can do the same.

    · After running Flask, if you need to check that it has been set up properly, refer again to the DataCamp article mentioned above for screenshots. The same article shows how to run your code and check it in Postman.

    · Please note that since the objective here was to demonstrate Flask, I haven’t chosen a model that needs to be trained. If you have a model that needs training, write and train it in a separate .py file and call that file from the Flask .py file.
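    To compare the accuracy of the two OCR snippets, one simple option is to type out the text you know is in the image and score each OCR output against it. The sketch below uses the standard library's `difflib.SequenceMatcher.ratio()` as a stand-in similarity score in [0, 1]; the usual OCR metric is character error rate (CER), so treat this as a quick check rather than a formal benchmark. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Collapse runs of whitespace and newlines so layout differences don't count as errors."""
    return " ".join(text.split())


def char_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Similarity in [0, 1] between OCR output and the known text (1.0 = identical)."""
    return SequenceMatcher(None, normalize(ocr_text), normalize(ground_truth)).ratio()


def rank_ocr_outputs(outputs: Dict[str, str], ground_truth: str) -> List[Tuple[str, float]]:
    """Score each OCR variant (name -> text) and return (name, score) pairs, best first."""
    scores = {name: char_accuracy(text, ground_truth) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

    For example, `rank_ocr_outputs({"A": txt1, "B": txt2}, "the text you expect")` ranks the Pillow and OpenCV variants for one image; running it over several images gives a fairer comparison than a single screenshot.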


    Steps to run:
    1. Run the Flask Python file.
    2. Place the image in the same folder as the Python file. For me the text image was:
    3. Open Postman and run the API.
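    If you prefer a script to Postman, the sketch below calls the running API with only the Python standard library and decodes the JSON response. It assumes the app is on Flask's default development address, `http://127.0.0.1:5000`; the route path depends on your Flask file, so the `/` default here is an assumption, and `call_ocr_api` is a hypothetical name.

```python
import json
import urllib.request


def call_ocr_api(url: str = "http://127.0.0.1:5000/", timeout: float = 30.0) -> dict:
    """Send a GET request to the OCR endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

    Usage: with the Flask app running, `print(call_ocr_api())` prints the same JSON that Postman would show.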



    Resource referred to:

    Real time Natural Language Audio Analytics

     The code I have shared in this blog lets us capture a real-time audio conversation and analyze it on the fly.


    The analytics it performs are:
    · Sentiment Score or Polarity Score
    · Polarizing Words – the words contributing to the dialogue’s sentiment score. We have divided the sentiments into three types: positive, negative and neutral.
    · Keywords – the particular brand, company, person etc. being talked about. The proper nouns found by POS (parts-of-speech) analysis are marked as keywords.
    · Context – the context of the conversation, or its general subject area. The nouns (other than proper nouns) found by POS analysis are marked as context.
    · Action Words – The action words associated with the conversation, i.e. the verbs.

    For example, consider the following customer feedback:
    "Disappointed that the Dell Outlook ticket was closed reopen the ticket."


    Things worth noting in the code's analysis:
    Here the polarity or sentiment score is -0.48.
    Negative word identified: disappointed.
    Keywords identified: Dell, Outlook
    Context identified: Ticket
    Action words identified: Reopen.
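    The keyword, context and action-word buckets follow directly from the POS rules described above. Here is a minimal sketch of that bucketing step, assuming the text has already been tagged with Penn Treebank tags (the tag set `nltk.pos_tag` produces by default). The post does not show its exact verb rule; keeping only base-form verbs (tag `VB`) is an assumption that drops auxiliaries such as "was". The function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Penn Treebank tag groups
PROPER_NOUN_TAGS = {"NNP", "NNPS"}
COMMON_NOUN_TAGS = {"NN", "NNS"}
ACTION_VERB_TAGS = {"VB"}  # assumption: base-form verbs only


def bucket_tokens(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split (word, tag) pairs into keywords, context and action words.

    Each bucket is deduplicated and keeps first-seen order.
    """
    buckets: Dict[str, List[str]] = {"keywords": [], "context": [], "action_words": []}
    for word, tag in tagged:
        if tag in PROPER_NOUN_TAGS:
            bucket, value = "keywords", word  # keep original casing for names
        elif tag in COMMON_NOUN_TAGS:
            bucket, value = "context", word.lower()
        elif tag in ACTION_VERB_TAGS:
            bucket, value = "action_words", word.lower()
        else:
            continue  # determiners, prepositions, other verb forms, punctuation...
        if value not in buckets[bucket]:
            buckets[bucket].append(value)
    return buckets
```

    With NLTK installed, the input could come from `nltk.pos_tag(nltk.word_tokenize(text))` after downloading its tokenizer and tagger data. A real tagger may tag some words differently from a hand-tagged example, which changes which bucket they land in.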

    The script can be used in real time to connect to or alert the appropriate teams, who may already be aware of the scenario and are better suited to handle the issue. And if you are looking for text analytics, it produces the analytics immediately, without having to run a separate script later.
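    The alerting idea can be sketched as a small routing rule: if a dialogue's polarity falls below a threshold, notify the teams that own the keywords it mentions. The threshold of -0.3 and the keyword-to-team mapping are hypothetical values for illustration, and `teams_to_alert` is a hypothetical name; in practice you would tune the threshold on your own conversations.

```python
from typing import Dict, Iterable, List

# Hypothetical cut-off: polarity scores below this are treated as needing attention
ALERT_THRESHOLD = -0.3


def teams_to_alert(polarity: float, keywords: Iterable[str],
                   team_for_keyword: Dict[str, str],
                   threshold: float = ALERT_THRESHOLD) -> List[str]:
    """Return the (sorted, unique) teams to notify for a sufficiently negative dialogue."""
    if polarity >= threshold:
        return []  # not negative enough to alert anyone
    return sorted({team_for_keyword[k] for k in keywords if k in team_for_keyword})
```

    For the feedback example above (polarity -0.48, keywords Dell and Outlook), a hypothetical mapping `{"Outlook": "email-support"}` would route the alert to the email-support team.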