The data science team has finished developing the current version of the ML model and has reported an accuracy or error metric. But you are not sure how to put that number in context: is it good, or not good enough?
In one of my previous blogs, I addressed the question of AI investment and how long it takes before the business can know whether an engagement has potential or is going nowhere. This blog can be considered an extension of that one. If you haven't checked it out already, please visit: https://anirbandutta-ideasforgood.blogspot.com/2023/07/investment-on-developing-ai-models.html
In my previous blog, I spoke about KPIs like accuracy and error as thumb rules to quickly assess the potential success of a use case. In this blog, I will try to add more specificity and relative context to them.
Fundamentally, there are three ways to evaluate the latest performance KPI of your AI/ML model, and you can use them independently or in combination.
Consider human-level performance metric
For AI use cases whose primary objective is replacing human effort, this can be considered the primary success metric. For example, if the current human error rate for a particular process stands at 5%, and the AI achieves an error rate of 5% or less, the model can be considered valuable. That is because, at the same error rate, the AI also brings smart automation, a faster process, negligible downtime, and so on.
Example: Tasks that involve data entry can easily be replicated by AI. But the success criterion for adoption does not need to be 100% accuracy; the AI just has to match the accuracy that its human counterpart was delivering to be adopted for real-world deployment.
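The comparison above can be sketched in a few lines of Python. The 5% human error rate and the helper name are illustrative assumptions, not figures from a real process:

```python
# Illustrative human benchmark: 5% error observed in the manual process.
HUMAN_ERROR_RATE = 0.05

def meets_human_baseline(model_error_rate: float,
                         human_error_rate: float = HUMAN_ERROR_RATE) -> bool:
    """A model is considered deployable when its error rate is
    less than or equal to the human benchmark."""
    return model_error_rate <= human_error_rate

print(meets_human_baseline(0.04))  # True  -> candidate for deployment
print(meets_human_baseline(0.07))  # False -> not yet good enough
```

The threshold is "less than or equal" rather than "strictly less" because, as argued above, matching human accuracy is already a win once automation benefits are factored in.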
Base model metric
In use cases where the problem being addressed is more theoretical in nature, or where discovery of the business problem is still in progress, it is best to create a quick, simple base model and then try to improve on it with each iteration.
For example: I am currently working on a system to determine whether a piece of content was created by AI. For lack of any past reference against which the accuracy can be compared, I have taken this approach to measure progress.
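A common way to set such a starting point is a majority-class baseline: always predict the most frequent label. A minimal sketch, using made-up labels for the AI-content-detection example (1 = AI-generated, 0 = human-written):

```python
from collections import Counter

# Toy labels, purely illustrative: 1 = AI-generated, 0 = human-written.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

# Baseline model: always predict the majority class in the data.
majority_class = Counter(y_true).most_common(1)[0][0]
baseline_preds = [majority_class] * len(y_true)

baseline_acc = sum(p == t for p, t in zip(baseline_preds, y_true)) / len(y_true)
print(f"Baseline accuracy: {baseline_acc:.0%}")  # 70% on this toy data
```

Each subsequent iteration of the real model is then judged by how far it beats this baseline accuracy, rather than against an absolute target.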
Satisfying & optimizing metric
We outline two metrics: one that we want the model to do as well as possible on (the optimizing metric), and a minimum standard that makes the model functional and valuable in real-life scenarios (the satisfying metric).
Example: For a home voice assistant, the optimizing metric would be the accuracy with which the model hears exactly what someone said. The satisfying metric would be that the model takes no more than 100 ms to process what was said.
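Model selection under this scheme reduces to a filter-then-maximize step: discard candidates that fail the satisfying metric, then pick the best on the optimizing metric. A sketch with invented candidate models and numbers:

```python
# Candidate models as (name, accuracy, latency in ms) -- illustrative values.
candidates = [
    ("model_a", 0.95, 180),
    ("model_b", 0.93, 90),
    ("model_c", 0.91, 60),
]

MAX_LATENCY_MS = 100  # satisfying metric: must respond within 100 ms

# Keep only models that satisfy the latency constraint...
feasible = [m for m in candidates if m[2] <= MAX_LATENCY_MS]

# ...then optimize accuracy among the survivors.
best = max(feasible, key=lambda m: m[1])
print(best[0])  # model_b
```

Note that model_a has the highest raw accuracy but is rejected outright: a model that fails the satisfying metric is not usable, no matter how well it scores on the optimizing one.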