Sunday 4 June 2023

Building the next Netflix

Recently I helped a friend who is starting a video streaming venture lay out a roadmap for developing his video content recommendation engine. Since I got good feedback, I thought of sharing it with you. You can apply the same principles to any recommendation engine, irrespective of the product.

So let’s assume you are also trying to build a video streaming company that will compete with the Netflixes of the world. We have to come up with a complete roadmap of how the recommendation engine will be built, from day 1 until you are a billion-dollar company.

Here are our competitors –

Since we are a new business and we are in the business of recommending movies, we want to start off quickly.

The problems are:
o All our users are new. That is, we don’t have any history for individual users.
o Since our company is also new, we haven’t yet learned what kind of movies people of different ages, genders etc. prefer to watch, so we cannot recommend on that basis.
So while we collect more data, let’s start with the Level 1 recommendation strategy.

· Level 1: Popularity. Products are recommended purely on the basis of how popular each one is. The simplest and most basic method is to recommend the most popular movies irrespective of user characteristics. This can be deployed when you don’t have any history to work with, you don’t know anything about your users, you are not much concerned about personalization, or you want to start quickly with something while you develop the other approaches.
Pros:
1. Simple and easy to implement.
2. Can work on shallow data
Cons:
1. Lacks personalization
So for our startup, we will collect the best-rated movies from IMDb’s public data and recommend them to everyone.
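A minimal sketch of the Level 1 idea, assuming we already have each movie's average rating and vote count in memory. The titles, the numbers and the `min_votes` cut-off are made up for illustration; the cut-off is my own assumption, added so that a movie with a handful of perfect ratings does not top the list.

```python
from typing import NamedTuple

class Movie(NamedTuple):
    title: str
    avg_rating: float   # mean rating on a 1-10 scale
    num_votes: int      # how many ratings the mean is based on

def top_popular(movies, k=3, min_votes=1000):
    """Return the k best-rated titles, skipping movies with too few votes."""
    eligible = [m for m in movies if m.num_votes >= min_votes]
    ranked = sorted(eligible, key=lambda m: (m.avg_rating, m.num_votes), reverse=True)
    return [m.title for m in ranked[:k]]

# Hypothetical catalogue rows, not real IMDb data.
catalogue = [
    Movie("Movie A", 9.1, 250_000),
    Movie("Movie B", 9.8, 12),        # great rating, but too few votes to trust
    Movie("Movie C", 8.7, 900_000),
    Movie("Movie D", 7.2, 40_000),
]

# Every user gets the same list at this level.
print(top_popular(catalogue, k=2))
```

Every user sees the same list here, which is exactly the "lacks personalization" con listed above.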


We keep doing this until we have collected enough data on what kinds of movies different types of people are watching. Then we want to add some level of personalization, so let’s move to Level 2.

Now a few days have passed for our startup and, thanks to the good work of our marketing team, a good number of people have signed up and watched some movies. With this data in place we see trends like –
Female, Age – 30-40 -> Likes -> {Romantic Comedies, Julia Roberts}
Male, Age – 30-40 -> Likes -> {Action, Arnold Schwarzenegger}
Male, Age – 35-45 -> Likes -> {Animated from Pixar Production}
And not just that. We can also find trends like:
On weekend afternoons, comedies and romances are watched more, while thrillers and horror movies are watched more at night.
So the next time a user logs in, we can recommend movies that match the trends for their demographic, like the ones above. But we are not dropping the Level 1 strategy completely. We will now run both the Level 1 and Level 2 strategies, with a weight assigned to each.
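One way to picture "both strategies with some weightage" is a weighted sum of two scores per movie. The sketch below assumes both scores are already on a 0 to 1 scale. The weights, titles and score values are illustrative assumptions, not tuned numbers.

```python
def blended_scores(popularity, segment_affinity, w_popularity=0.4, w_segment=0.6):
    """Weighted sum of the Level 1 score and the Level 2 (demographic) score."""
    titles = set(popularity) | set(segment_affinity)
    return {
        t: w_popularity * popularity.get(t, 0.0) + w_segment * segment_affinity.get(t, 0.0)
        for t in titles
    }

def recommend(popularity, segment_affinity, k=2, **weights):
    """Top-k titles by blended score; ties are broken alphabetically."""
    scores = blended_scores(popularity, segment_affinity, **weights)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical scores: overall popularity, and affinity for "Female, Age 30-40".
popularity = {"Action Hit": 0.9, "Romantic Comedy": 0.7, "Pixar Feature": 0.8}
affinity_f_30_40 = {"Romantic Comedy": 0.9, "Action Hit": 0.2, "Pixar Feature": 0.4}

print(recommend(popularity, affinity_f_30_40))
# With all the weight on popularity we are back to pure Level 1.
print(recommend(popularity, affinity_f_30_40, w_popularity=1.0, w_segment=0.0))
```

As more data comes in, the weight can be shifted gradually from popularity towards the personalized score.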

· Level 2: Classification model. Here we use features of both the products and the users to make our recommendations. The classification model takes in features of the user, features of that user’s past purchases, features of the product we are thinking of recommending, and potentially many other features, such as the time of day.

Pros:
1. It can be very personalized
2. Can capture context
3. Works well even with very limited historical data
Cons:
1. Works poorly when features are insufficient or poor in quality.
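To make the classification idea concrete, here is a small sketch using logistic regression trained with plain gradient descent. Logistic regression is just one possible classifier, chosen here for brevity. The two binary features, the interaction term and the four training rows are hypothetical stand-ins for the real user and movie features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def featurize(user_is_female, movie_is_romcom):
    """Hypothetical features: user gender, movie genre, and their interaction."""
    return [user_is_female, movie_is_romcom, user_is_female * movie_is_romcom]

def predict(weights, bias, x):
    """Estimated probability that this user watches this movie."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(rows, labels, epochs=3000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        # Errors are computed once per epoch, before any weight is updated.
        errors = [predict(weights, bias, x) - y for x, y in zip(rows, labels)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * x[j] for e, x in zip(errors, rows)) / n
        bias -= lr * sum(errors) / n
    return weights, bias

# Made-up history: ((user_is_female, movie_is_romcom), watched?)
history = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 1)]
rows = [featurize(u, m) for (u, m), _ in history]
labels = [watched for _, watched in history]
weights, bias = train(rows, labels)

p_romcom = predict(weights, bias, featurize(1, 1))
p_action = predict(weights, bias, featurize(1, 0))
print("female user: romcom scores higher than action?", p_romcom > p_action)
```

The interaction feature matters: without it, a linear model cannot express "this kind of user likes this kind of movie", which is one face of the "works poorly when features are insufficient" con above.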

Now our startup has reached a fairly advanced level of personalization and everyone is happy. But let’s look at a scenario like this –
Male, Age – 30-40 -> Likes -> Action, Arnold Schwarzenegger
Male, Age – 35-45 -> Likes -> Animated from Pixar Production
We can assume the Pixar movies are mostly watched by children using their parents’ accounts. Problems like this can be partly solved by asking users more questions during sign-up. But there are only so many questions you can ask; the rest you have to infer from individual viewing patterns. In cases like these, we can implement Level 3. Here we analyze trends such as: people who watch movies from Pixar also tend to like movies from Disney.

So we may find tightly coupled trends like the ones below, which we can use for our recommendations.
People who like {Pixar} -> also like -> {Disney, Animation, Ice Age Franchise}
People who like {Avengers} -> also like -> {Marvel Studio, Batman Franchise}
Again, we will run Level 1, Level 2 and Level 3 together, with combined weights, to get the best possible solution.

· Level 3: Collaborative filtering. This brings us to the idea of co-occurrence of purchases. Recommendations are built on information like: if a person bought this item, they are probably also interested in that other item, because we have seen many past examples of people buying those two items together. Not necessarily at the same time, but somewhere in the course of their purchase histories.

Pros:
1. Works well even when past data is unavailable for the individual user.
2. Handles missing or poor-quality features well.
Cons:
1. Complex to build
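The co-occurrence idea can be sketched with simple pair counts over watch histories. This is the most basic item-to-item form of collaborative filtering; a production system would typically normalize the counts or use matrix factorization. The users and titles below are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(histories):
    """For every pair of titles, count how many users watched both."""
    co = defaultdict(Counter)
    for watched in histories.values():
        for a, b in combinations(sorted(set(watched)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def also_liked(co, title, k=2):
    """Titles most often watched by the same users as `title`."""
    counts = co.get(title, Counter())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

# Hypothetical watch histories.
histories = {
    "user1": ["Pixar Film", "Disney Film", "Ice Age"],
    "user2": ["Pixar Film", "Disney Film"],
    "user3": ["Pixar Film", "Ice Age", "Disney Film"],
    "user4": ["Avengers", "Batman"],
    "user5": ["Avengers", "Batman", "Pixar Film"],
}

co = build_cooccurrence(histories)
print(also_liked(co, "Pixar Film"))
print(also_liked(co, "Avengers", k=1))
```

Notice that no user or movie features are used at all; only who watched what.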

Congratulations, you are now successfully running a unicorn video streaming company with a great recommendation engine.
Please reach out to me if you need any technical clarification.

Until next time.

Saturday 3 June 2023

AI/ML and human brain - Similarities and their implication in Corporate and Political campaigns

The premise of this article is that our brain is similar to a machine: the biases and errors a machine experiences are experienced by the brain too, and in some cases this can be leveraged to drive success in a marketing or electoral campaign.


1. The good orator - No election has ever been won without a good orator at the helm. However good your policies are, at the end of the day they need to be sold. A good orator is like quality data to a machine. To drive learning in a machine, even if we want to propagate biases in some way, the data needs to be delivered clearly and strategically. This delivery of data, to steer the mind and the machine in a certain way, has also been called propaganda in warfare.

2. The 360 degree delivery – To drive a target candidate to take a certain action, stimulus must be provided from every direction. It is widely accepted that the same objective can be achieved with much less effort through stimulus from multiple dimensions than through a larger one-dimensional effort. It is the same reason we come across our favorite brands on multiple platforms like TV, radio, newspapers, social media, online ads etc. According to a successful marketing platform, ‘personalized messaging across email, SMS, direct mail, and more, alongside personalized online response’ leads to a much more successful marketing campaign. AI systems, inspired by this characteristic of the brain, are always advised to be built with data from as many diverse source systems as possible.

3. What is in it for me - Human beings, by the very nature of their existence and survival instinct, mostly react to news and events that directly affect their well-being. The same idea is implemented in reinforcement learning, where the agent takes actions to fulfill an objective, which in our case is survival. So to make your target audience take notice of a policy or idea, it should be narrated as tightly coupled with the audience’s well-being. It should answer their basic question: how will it affect me?

4. Relative rather than absolute – Human brains intuitively understand something relative much better than anything absolute. If you ask most people whether a deal is good, they will generally judge it by the deals other people are getting. In the same way, you can manipulate a machine into labelling something as high or low by strategically infusing data at the other end of the scale. Likewise, during a political or marketing campaign it is not enough to advertise your positives; it is also important to emphasize your opponent’s weaknesses.
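A tiny illustration of the machine side of this point, assuming the model judges a value relative to the range of data it has seen (min-max scaling is my choice here; any relative scoring would behave similarly). The deal prices and the 0.5 threshold are made up.

```python
def relative_score(value, observed):
    """Min-max scale `value` against the range of everything observed so far."""
    lo, hi = min(observed), max(observed)
    if hi == lo:
        return 0.5  # no spread in the data, so nothing to compare against
    return (value - lo) / (hi - lo)

def label(value, observed, threshold=0.5):
    """Label a value "high" or "low" relative to the observed data."""
    return "high" if relative_score(value, observed) >= threshold else "low"

deals = [40, 50, 60]
print(label(60, deals))            # 60 is the top of the observed range
print(label(60, deals + [200]))    # one extreme data point shifts the scale
```

The value 60 never changed; only the data around it did.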

5. Confirmation bias – Definition - “Confirmation bias occurs from the direct influence of desire on beliefs. When people would like a certain idea or concept to be true, they end up believing it to be true. They are motivated by wishful thinking. This error leads the individual to stop gathering information when the evidence gathered so far confirms the views or prejudices one would like to be true.” This is a psychological error that can be used to one’s advantage during a political or marketing campaign. Political and corporate organizations have at various times taken advantage of this by instilling a biased belief system at an early age of life or early in the consumption cycle.

6. ML bias – Machine learning applications develop inherent biases when fed data that is tilted towards a stereotypical trend because of the flawed way in which society develops. Take the real-world example of a machine learning model designed to differentiate between men and women in pictures. When the training data contains more pictures of women in kitchens than men in kitchens, or more pictures of men writing computer code than women writing computer code, the algorithm is trained to make incorrect inferences about the gender of people engaged in those activities. The human brain can be manipulated in the same way. If you give a brain enough examples associating people of a certain characteristic with a certain kind of action, good or bad, it starts associating those people with those activities without much thought.

But however amazing our brain is, it still has certain flaws, which have been inherited by ML and AI processes because these are inspired by the brain itself. But I guess all these imperfections are what keep us human.

Tuesday 9 May 2023

AI framework for self learning Q&A agent

Pantomath AI bot framework

Definition: A pantomath is a person who wants to know and knows everything.

What is Pantomath

Pantomath is an AI framework inspired by human learning patterns. It was developed at ZERO cost, using ZERO proprietary software or frameworks, in open source R. It can learn any subject and respond to queries about it. It can learn any domain, topic or subject, and keeps getting better and more knowledgeable with time and experience.

Why Pantomath?

· Pantomath has been designed around the idea of general AI, which is capable of learning different domains.
· While various enterprise solutions exist, they concentrate on a particular domain.
· It has been developed from an open source framework, hence there is no proprietary licence cost attached.
· Businesses can easily enable Pantomath to automate FAQs, knowledge management, menu handling, computer troubleshooting etc. For anything that follows the pattern of resolving a query and does not need a detailed conversation or diagnosis, Pantomath can scale extremely well and can save significant cost while improving customer satisfaction.
· It has a research-oriented, scalable technical back end.

Pantomath: How does it work?

Steps

1. Enter a few sample Q&As on different topics so it can start learning and conversing.
2. Given the samples, it tries to learn how to answer the same questions asked differently, or similar questions on the same topic.
3. With each conversation it reinforces and reconfirms its knowledge.
4. If it does not know a topic, it says so and asks for more knowledge material or hints to be fed into it.
5. With more conversations it learns more about the subtleties of language and gathers knowledge about different topics (just like us).

Pantomath: How does it constantly learn?

Pantomath’s learning model is inspired by David Kolb’s learning model and the human learning pattern from birth to adulthood.

David Kolb’s learning model

1. Concrete Experience (a new experience or situation is encountered, or an existing experience is reinterpreted).
2. Reflective Observation (of the new experience; of particular importance are any inconsistencies between experience and understanding).
3. Abstract Conceptualization (reflection gives rise to a new idea, or a modification of an existing abstract concept).
4. Active Experimentation (the learner applies the new ideas to the world around them to see what results).
Reference: https://medium.com/@johnharrydsouza/david-kolb-s-cycle-of-learning-2777d150d09e#.xitj0ph53

Human learning development


1. After birth – A baby is born with basic human instincts and gradually learns its initial movements.
2. Toddler – It starts interacting with its environment while still learning basic movements; guidance from parents is critical at this stage.
3. Childhood – It has almost finished learning its basic movements; most of its learning now comes from interacting with the environment, and it asks for guidance much less.
4. Adulthood – It has learned most of its survival skills independently from the environment and rarely needs guidance now.

Learning trajectory for the algorithm with experience
Similarly, with more experience and maturity the bot needs less guidance and becomes more self-sufficient.

Pantomath: Stages of development

Text similarity – The first stage is implemented using text-similarity pattern matching, recommending the responses best suited to the current question.
A string metric is a metric that measures similarity or dissimilarity (distance) between two text strings for approximate string matching or comparison. Corpus-Based similarity is a semantic similarity measure that determines the similarity between words according to information gained from large corpora.
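As an illustration of matching a question against stored issues with a string metric, here is a sketch using Jaccard similarity over words. This is only one of many possible metrics and is not necessarily the one Pantomath uses; the stored issue titles are hypothetical.

```python
def tokens(text):
    """Lower-cased set of words in the text."""
    return set(text.lower().split())

def jaccard(a, b):
    """Shared words divided by all distinct words; 1.0 means identical word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def best_match(query, known_issues):
    """Return (issue, score) for the stored issue most similar to the query."""
    scored = ((issue, jaccard(query, issue)) for issue in known_issues)
    return max(scored, key=lambda pair: pair[1])

# Hypothetical issue titles standing in for the sample Q&A set.
known_issues = ["mouse not working", "keyboard not working", "computer is slow"]

issue, score = best_match("mouse not moving correctly", known_issues)
print(issue, round(score, 2))
```

A word-level metric like this is cheap, but it cannot tell that "slow" and "sluggish" mean the same thing; that is where a corpus-based measure would help.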

Neural Net – After that we would like to implement an ANN (Artificial Neural Net) to learn the weight of the different words used in a conversation and recommend the best response.
Neural networks are a set of algorithms, modeled loosely on the human brain, that are designed to recognize patterns. A neural network is a network or circuit of neurons or, in the modern sense, an artificial neural network composed of artificial neurons or nodes.

Reinforcement Learning – The next stage would be assigning the bot an agent that interacts with its environment and is rewarded for the right response and penalized for the wrong one. Over time the agent will thus learn and adjust itself towards better responses.
Reinforcement learning is an area of machine learning. It is about taking suitable actions to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation.
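The reward-and-penalty idea can be sketched in a deliberately simplified form: keep a running score per (question, response) pair and rank candidate responses by it. This is not a full reinforcement learning algorithm, and the class name, reward values and responses are assumptions for illustration.

```python
from collections import defaultdict

class FeedbackAgent:
    """Ranks candidate responses by the rewards they have earned so far."""

    def __init__(self, reward=1.0, penalty=-1.0):
        self.reward = reward
        self.penalty = penalty
        self.scores = defaultdict(float)   # (query, response) -> running score

    def rank(self, query, candidates):
        # Stable sort: responses with equal scores keep their original order.
        return sorted(candidates, key=lambda r: -self.scores[(query, r)])

    def feedback(self, query, response, helped):
        """Reward a response that helped, penalize one that did not."""
        self.scores[(query, response)] += self.reward if helped else self.penalty

agent = FeedbackAgent()
candidates = ["Check the keyboard connection", "Clean the keys thoroughly"]

print(agent.rank("keyboard problem", candidates)[0])   # no feedback yet
agent.feedback("keyboard problem", "Clean the keys thoroughly", helped=True)
print(agent.rank("keyboard problem", candidates)[0])   # rewarded answer moves up
```

A fuller agent would also explore lower-ranked responses now and then, so that one early reward does not lock in a mediocre answer.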

Working code example

The sample use case below feeds Pantomath with sample computer troubleshooting scenarios.
Here are some records from the initial set.
Internally it creates an auto-mapping rule in its brain. Let’s see how the mapping table looks internally at each stage of the conversation.
At the initial stage, it creates auto tags for all the sample issues provided, which help it recommend a solution using text-similarity pattern matching via a string metric. –
*Troubleshooting steps are the same as in the sample above; I haven’t rewritten them, to save space.
Scenario 1 - Show a result based on probability match
The first issue we asked the bot about –
mouse not moving correctly
Bot suggests –
"Check if the mouse is securely plugged into the computer. If not, plug it in completely.\r\n· Check to see if the cord has been damaged. If so, the mouse may need replacing.\r\n· If you are using a cordless mouse, try pushing the connection button on the underside of the\r\nmouse to reestablish a connection.\r\n· Clean the mouse, especially on the bottom."
If we look back at the sample provided, we see that this is the solution for the issue –
Then the bot asks me to confirm whether I am happy with the solution it provided. If I say YES, it responds –
"Glad to hear i could help and it has made me wiser"

Scenario 2 - Store queries which it could not resolve
For the next scenario, let’s ask a question which it may not know and needs to learn externally.
Let’s ask-
Not able to use the mouse
Bot says
Sorry i can’t help you regarding this. I will pass you to the next level engineer
And we see that in its internal memory map it has made another entry with the new query and auto tags, with no TroubleShootingSteps against it, as it does not know how to solve it.

So the bot goes back to its human handler, tells them this is a topic it does not know and hasn’t been able to learn from its conversations, and asks them to provide some knowledge.
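Scenario 2 can be sketched as a similarity threshold plus a list of unresolved queries for the human handler. The word-overlap measure, the 0.5 threshold, the `Query` field name and the stored issue are my assumptions; the fallback message and the empty `TroubleShootingSteps` field follow the behaviour described above.

```python
FALLBACK = "Sorry i can't help you regarding this. I will pass you to the next level engineer"

def word_overlap(a, b):
    """Shared words divided by all distinct words in the two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def answer(query, knowledge, unresolved, threshold=0.5):
    """Answer from the knowledge map, or record the query as unresolved."""
    best_issue = max(knowledge, key=lambda issue: word_overlap(query, issue), default=None)
    if best_issue is None or word_overlap(query, best_issue) < threshold:
        # New memory-map entry with no steps yet; the handler fills them in later.
        unresolved.append({"Query": query, "TroubleShootingSteps": None})
        return FALLBACK
    return knowledge[best_issue]

# Hypothetical knowledge map: issue title -> troubleshooting steps.
knowledge = {"mouse pointer not moving": "Check if the mouse is securely plugged into the computer."}
unresolved = []

print(answer("mouse not moving correctly", knowledge, unresolved))
print(answer("Not able to use the mouse", knowledge, unresolved))
print(unresolved)
```

The unresolved list is what the bot takes back to its handler; once steps are supplied, the entry can move into the knowledge map.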

Scenario 3 - Show multiple options ordered by similarity, learn from the option chosen, and show the better option next time
For the next scenario, let’s ask a question for which it may have multiple recommendations.
Let’s ask-
"keyboard problem"
Bot gives two recommendations
[1] "Make sure the keyboard is connected to the computer. If not, connect it to the computer.\r\nIf you are using a wireless keyboard, try changing the batteries.\r\nIf one of the keys on your keyboard gets stuck, turn the computer off and clean with a damp\r\ncloth.\r\nUse the mouse to restart the computer."
and
[2] "Clean the keys thoroughly"
The bot then asks me to confirm which one actually solved the ticket so that it can refine its learning. I said 2, as the second recommendation solved the ticket for me. Bot responds -
"Glad to hear i could help and it has made me wiser"
And we see that in its internal memory map it has made another entry with the new query and auto tags, mapped to solution number 2 for this question, so that it can respond better the next time it is asked the same thing.
So when asked the same question again
"keyboard problem"
It responds
[1] "Clean the keys thoroughly"
and
[2] "Make sure the keyboard is connected to the computer. If not, connect it to the computer.\r\nIf you are using a wireless keyboard, try changing the batteries.\r\nIf one of the keys on your keyboard gets stuck, turn the computer off and clean with a damp\r\ncloth.\r\nUse the mouse to restart the computer."
Interestingly, having learned from its last interaction, it now suggests “Clean the keys thoroughly” as the first option and the other one as the second.

Scenario 4 - The user does not like any option chosen, store queries which it could not resolve
For the next scenario, let’s again ask a question, but this time we don’t accept its recommendation.
Let’s ask-
"The mouse is slow"
Bot gives me a recommendation
[1] "Restart your computer.\r\n· Verify that there is at least 200-500 MB of free hard drive space. To do so, select Start and\r\nclick on My Computer or Computer. Then highlight the local C drive by clicking on it once.\r\nSelect the Properties button at the top left-hand corner of the window; this will display a\r\nwindow showing …
The bot then asks me to confirm whether I found the recommendation usable, to which I said NO.
BOT responds –
"Sorry i could not help you. We will add content to fulfill your request in future."
And we see that in its internal memory map it has made another entry, since it realized it needs to learn more about this issue. So the bot goes back to its human handler, tells them this is a topic it does not know enough about, and asks for more hints so that it can give a better answer next time.

What Pantomath is not

· Pantomath is not a conversational agent. It is a Q&A agent. Though it learns from each conversation and remembers how users responded to its previous answers, it does not remember personal conversational context or facts that are not business-critical.
· Pantomath is not a diagnosis tool. Though it may, with time, learn to suggest recommendations for general questions, it is not built to find the root cause through a series of questions.
· Pantomath cannot go and open tickets for you in another environment. It provides information to the user, but it cannot take action for them.

Conclusion

Businesses employ enormous human resources every day to respond to user questions on various topics. While some of these need complex diagnosis, most are rudimentary and repetitive in nature. Pantomath can be easily deployed and scaled to automate a major proportion of this work. It is an AI platform modelled on human learning patterns. It learns from conversations, asks for help wherever it needs it, and matures with time. It can adjust to any domain and learn any topic.
Businesses can easily enable Pantomath to automate
· FAQs
· knowledge management
· menu handling
· Computer troubleshooting etc.
For anything that follows the pattern of resolving a query and does not need a detailed conversation or diagnosis, Pantomath can scale extremely well. It is extremely cost effective, as it is built entirely without any third-party enterprise components.