Friday, 10 February 2023

Transformer and its impact on Language Understanding in Machine Learning

The Transformer, a deep learning model proposed in the paper ‘Attention Is All You Need’, relies on a mechanism called attention and dispenses with recurrence (which many of its predecessors depended on). It reached a new state of the art in translation quality while allowing significantly more parallelization and requiring much less training time.
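
To make the idea of attention a little more concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification: the full Transformer also applies learned projections and uses several attention heads, and the function names and toy vectors below are my own assumptions, not taken from the paper's code.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query: a weighted average of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy self-attention: three token vectors attend to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each query is handled independently of the others, with no step that waits on the previous position. That independence is what makes the computation easy to parallelize.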


Because it does not rely on recurrence, the Transformer does not require sequence data to be processed in order. This allows parallelization, so Transformers can be trained on much larger datasets in much less time. The Transformer has enabled the development of pre-trained models like BERT, which in turn have enabled much-needed transfer learning. Transfer learning is a machine learning method in which a model trained for one task is reused as the starting point for another task. With BERT, this approach delivers several advantages, considering the vast compute and resources required to train the base model. BERT is a large model (a 12-layer to 24-layer Transformer) that is trained on a large corpus like Wikipedia for a long time. It can then be fine-tuned for specific language tasks. Though pre-training is expensive, fine-tuning is inexpensive. Another aspect of BERT is that it can be adapted to many types of NLP tasks. This enables organizations to leverage large Transformer-based models trained on massive datasets and apply them to their own NLP tasks without spending significant time, computation and effort.
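
The split between expensive pre-training and cheap fine-tuning can be illustrated with a toy sketch. This is not BERT: `pretrained_features` is a hypothetical stand-in for a frozen pre-trained encoder, and the word lists and training data are invented for illustration. The point is only that the small task-specific head is the one thing being trained.

```python
import math

def pretrained_features(text):
    """Hypothetical stand-in for a frozen pre-trained encoder (not BERT)."""
    words = text.lower().split()
    positive = sum(w in {"good", "great", "profit"} for w in words)
    negative = sum(w in {"bad", "poor", "loss"} for w in words)
    return [float(positive), float(negative)]

def train_head(examples, epochs=200, lr=0.5):
    """Fit only a small logistic-regression head; the encoder stays frozen."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = pretrained_features(text)
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - label
            # Gradient step on the head parameters only.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(head, text):
    """Classify a text as 1 (positive) or 0 (negative) using the trained head."""
    weights, bias = head
    z = sum(w * xi for w, xi in zip(weights, pretrained_features(text))) + bias
    return 1 if z > 0 else 0

# A tiny, made-up task-specific dataset.
task_data = [("great profit this quarter", 1), ("poor results and a loss", 0),
             ("good growth", 1), ("bad quarter", 0)]
head = train_head(task_data)
```

In a real setting the frozen part would be a pre-trained model with millions of parameters, and fine-tuning may also update some or all of those parameters rather than just a head.
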
One challenge of using pre-trained Transformer-based models is that they learn a representation of general language (general English, if the language is English). Thus even a strong base model might not always perform accurately when applied to a domain-specific context. For example, ‘credit’ can mean different things in general English and in a financial context. Significant fine-tuning on specific tasks and customer use cases may go a long way towards solving this.
Research is under way on whether a similar Transformer-based architecture can be applied to video understanding. The idea is that a video can be interpreted as a sequence of image patches extracted from individual frames, similar to how sentences are sequences of individual words in NLP.
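
As a rough illustration of the "video as a sequence of patches" idea, the sketch below cuts each frame into square patches and lines them up as one token sequence. The function names, the tiny 4x4 frames and the patch size are assumptions made for the example. Real video models work on much larger frames and add position information.

```python
def frame_to_patches(frame, patch_size):
    """Split one frame (a 2D list of pixel values) into flattened square patches."""
    height, width = len(frame), len(frame[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [frame[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches

def video_to_tokens(frames, patch_size):
    """Concatenate the patches of every frame into one token sequence."""
    tokens = []
    for frame in frames:
        tokens.extend(frame_to_patches(frame, patch_size))
    return tokens

# Two 4x4 grayscale frames, cut into 2x2 patches.
frame_a = [[r * 4 + c for c in range(4)] for r in range(4)]
frame_b = [[100 + r * 4 + c for c in range(4)] for r in range(4)]
tokens = video_to_tokens([frame_a, frame_b], patch_size=2)
```

Each patch then plays the role that a word plays in a sentence: one element of the sequence the model attends over.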

Saturday, 28 January 2023

Azure AI portfolio and its offerings - cheat sheet

Courtesy: Almost all the information in this post has been compiled from the following two YouTube videos. Thanks to the original creators.

https://www.youtube.com/watch?v=qJGRd34Hnl0

An introduction to Microsoft Azure AI | Azure AI Essentials

https://www.youtube.com/watch?v=8aMzR8iaB9s

AZ-900 Episode 16 | Azure Artificial Intelligence (AI) Services | Machine Learning Studio & Service


The Azure AI portfolio has options for every developer, be it in the form of

  • Pre-built AI models,

  • Advanced machine learning capability or

  • Low code/ no code development experience

Azure Cognitive Services offers a broad portfolio of customizable AI models. It includes

  • Vision,

  • Language,

  • Speech &

  • Decision.

Integrating them into our applications takes just an API call.
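
To give a feel for what such an API call looks like, here is a hedged sketch that builds (but does not send) a sentiment-analysis request using only the Python standard library. The endpoint and key are placeholders. The route and request body follow the Text Analytics v3.1 REST API as I understand it, so please check the current Azure documentation before relying on them.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with your own resource endpoint and key.
ENDPOINT = "https://YOUR-RESOURCE-NAME.cognitiveservices.azure.com"
API_KEY = "YOUR-SUBSCRIPTION-KEY"

def build_sentiment_request(text):
    """Build (but do not send) a sentiment-analysis request for one document."""
    body = {"documents": [{"id": "1", "language": "en", "text": text}]}
    return urllib.request.Request(
        url=ENDPOINT + "/text/analytics/v3.1/sentiment",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_sentiment_request("The support team was very helpful.")

# To actually call the service with real credentials:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Azure also provides client SDKs that wrap these calls. The raw request is shown here only to underline that the integration is a single HTTP call.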


Users can customize the AI models with their own data, with no machine learning expertise required. These models can also be deployed to containers so that they can run anywhere.

For business users, Azure provides access to the same AI models through AI Builder, which provides a no-code experience to train models and integrate them into apps within Microsoft Power Platform.


For common solutions like chatbots and AI-powered search, Azure provides services that accelerate development. These scenario-specific services often bring together multiple cognitive services, along with business logic and a user interface, to solve a common use case.


If we are looking to develop advanced machine learning models, Azure Machine Learning enables us to quickly build, train and deploy them, with experiences for all skill levels ranging from code-first to a drag-and-drop, no-code experience.



It provides services that empower all developers and supports the entire process with a set of tools. The process includes:

  • Training the model

  • Packaging and validating the model

  • Deploying the model as a web service

  • Monitoring that web service

  • Retraining the model to get even better results.

The set of tools mentioned above includes:

  • Notebooks written in Python/R

  • A visual designer, which allows us to build machine learning models using a simple drag-and-drop experience directly in our browsers.

  • Compute management: Azure Machine Learning lets us manage all the compute resources on which we train, package, validate and deploy those models, so that we don’t have to worry about the Azure infrastructure and underlying resources ourselves.

  • Automated machine learning (AutoML). This automated process lets us try different algorithms on our data, see which one scores best and deploy that one as our designated web service.

  • Pipelines, which allow us to build the entire process end to end.
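
The AutoML idea from the list above (try several algorithms, score each, keep the best) can be sketched in a few lines of plain Python. This is not the Azure Machine Learning API. The three candidate "algorithms", the toy data and all function names are assumptions chosen to keep the example self-contained.

```python
def fit_mean(xs, ys):
    """Baseline: always predict the average of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """Predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset (lower is better)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_best(candidates, train, valid):
    """Train every candidate, score it on validation data and return the best name."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *valid)
    best = min(scores, key=scores.get)
    return best, scores

train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])  # generated from y = 2x + 1
valid = ([6, 7], [13, 15])
candidates = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
best_name, scores = select_best(candidates, train, valid)
```

On this toy data the straight-line model scores best, so it is the one we would go on to deploy. A real AutoML run also searches over preprocessing steps and hyperparameters, which this sketch leaves out.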



Together, these make up a complete end-to-end solution for building machine learning models.