How to load any Huggingface [Transformer] model and use them?

Consider this article part 2 of my previous post, “How to use [HuggingFace’s] Transformers Pre-Trained tokenizers?”, where I attempted to describe what a tokenizer is, why it is important, and how to use pre-trained ones from the Huggingface library. If you are not familiar with the concept, I highly encourage you to give that piece a read. You should also have a basic understanding of Artificial Neural Networks.

A word cloud made from the names of the 40+ transformer-based models available in the Huggingface library.

So, Huggingface 🤗

It is a library that focuses on Transformer-based pre-trained models. The main breakthrough of this architecture was the Attention mechanism, which gave models the ability to pay attention (get it?) to specific parts of a sequence (or tokens). At the time of writing, the Transformer is dominating the Natural Language Processing (NLP) field, and recent papers are trying to use it for vision as well. If you want to get into NLP, it is definitely worth putting in the time to truly understand each component of this architecture.

I will not go through the Transformer details, as there are already numerous guides and tutorials available. (Want my recommendation? Ok! Read Jay Alammar’s “The Illustrated Transformer.”) Just know that these pre-trained Transformer models (like BERT, GPT-2, BART, …) are very useful, whether you want to do a university project and get something done fast, or get your hands dirty and start in a new field. Huggingface provides a very flexible API for you to load the models and experiment with them.

Why is it exciting to use Pre-Trained models?

The whole idea comes from the vision of Transfer Learning! As the NLP field progresses, the size of these models is getting larger and larger. The latest GPT-3 model has 175 billion trainable weights. Not everyone can train one of these models, because it would cost close to $5M. However, we can use these pre-trained models with minor fine-tuning on our custom datasets and get great results!

Fine-tuning means training parts of an already-trained model with a small learning rate on our custom dataset, so the model adapts to our data instead of the data it has already seen and is used to. It is a cost-efficient and surprisingly handy tool, as sketched below.
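To make the idea concrete, here is a minimal sketch of a single fine-tuning step; the classification head, the toy sentences, and the learning rate are illustrative assumptions of mine, not something prescribed by this post:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# A pre-trained BERT body with a fresh classification head on top (2 labels).
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A tiny "custom dataset" batch, purely for illustration.
batch = tokenizer(["I loved this movie!", "What a waste of time."],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# A small learning rate keeps the pre-trained weights close to where they started.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

outputs = model(**batch, labels=labels)  # passing labels makes the model return a loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```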

Show me some code!!!

I assume you have already installed the Huggingface transformers and PyTorch libraries. (Fairly easy, follow the links and instructions.)

As we already learned, to use any model you first need to tokenize your input sequence so the model can understand what you are saying. The best way to load the tokenizers and models is to use Huggingface’s autoloader classes. This means we do not need to import a different class for each architecture (like we did in the previous post); we only need to pass the model’s name, and Huggingface takes care of everything for us.

Below is sample code on how to tokenize a sample text, together with its output.
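A minimal sketch of that snippet, using bert-base-uncased as in the rest of the post (the sample sentence is an illustrative stand-in):

```python
from transformers import AutoTokenizer

pretrained_model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)

sample_text = "Let's try to tokenize a sample sentence with Huggingface!"
tokens = tokenizer.tokenize(sample_text)   # word-piece strings, e.g. ['let', "'", 's', ...]
token_ids = tokenizer.encode(sample_text)  # integer ids, with [CLS]/[SEP] added

print(tokens)
print(token_ids)
```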

As you see in the code, instead of importing the BertTokenizer class, we use AutoTokenizer. There is no need to search the documentation for each model’s class name; instead, we can pass the model’s name, like bert-base-uncased, and the library imports the right class for us. This enables us to write truly modular code and easily try different models in our experiments.

Each of these models is trained with different hyper-parameters. For example, BERT base has 12 layers with an embedding size of 768 (110M parameters), while the large version has 24 layers with an embedding size of 1024 (336M parameters). You can choose the version that suits you based on your hardware or your application.

You can find the list of all pre-trained model names at this link.

The next step is to load the model, and guess what? There is an autoloader class for models as well. Let’s look at the code:

Below is sample code on how to load a model in Huggingface, together with its output.
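A minimal sketch of that snippet, reusing the same model name and an illustrative input sentence; the variable names are my own:

```python
import torch
from transformers import AutoModel, AutoTokenizer

pretrained_model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
model = AutoModel.from_pretrained(pretrained_model_name)

# The tokenizer returns a plain Python list of ids; the model expects a tensor.
token_ids = tokenizer.encode("Let's try to tokenize a sample sentence with Huggingface!")
input_tensor = torch.tensor([token_ids])  # shape: [batch_size, number_of_tokens]

with torch.no_grad():  # no gradients needed for a plain forward pass
    output = model(input_tensor)

print(output[0].shape)  # last layer's hidden state: [batch_size, number_of_tokens, 768]
print(output[1].shape)  # BERT's pooler_output: [batch_size, 768]
```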

Deep neural network models work with tensors. You can think of them as multi-dimensional arrays containing numbers (usually floats, especially if the tensors hold model weights). The only reason we imported PyTorch (for now) is to convert the list generated by the tokenizer object into a tensor so we can pass it to the model. The rest of the code should be straightforward. However, let’s talk about the model’s output.

The output is different for each model. BERT has two different outputs, while other models like GPT-2 have only one. (There are optional outputs, though; you need to look at each model’s documentation to find out about the different output options.) No matter which model you choose, you will surely get an output shaped like [batch_size, number_of_tokens, embedding_size], which is the model’s last layer’s hidden state. (One of the optional outputs is a variable that holds all the intermediate layers’ hidden states as well.) The batch_size is obviously 1, because we are encoding only one sequence, which has 13 tokens, and the embedding size is predefined by the model. (If you choose BERT large, the output size will be [1, 13, 1024].)
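As a sketch of the optional outputs mentioned above, you can ask for the intermediate hidden states explicitly; output_hidden_states is a standard argument in recent versions of the transformers library, and the input sentence is again illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

input_tensor = torch.tensor([tokenizer.encode("A short example sentence.")])

with torch.no_grad():
    output = model(input_tensor, output_hidden_states=True)

print(output[0].shape)            # [1, number_of_tokens, 768] for bert-base-uncased
print(len(output.hidden_states))  # 13 tensors: the embedding layer + 12 encoder layers
```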

BERT’s second output (or “output[1]” in the code) is called “pooler_output.” It is derived from the embedding of the first token of the sequence, the [CLS] token, passed through an extra pooling layer. (As you should remember, we wrap sequences with <BOS> and <EOS> tokens, which are pre-defined as the [CLS]/[SEP] tokens for the BERT model.) You can look at it as a representation of the whole sequence, and it can be used for text classification. (More on that later, hopefully!)

Keep in mind that even though I mentioned BERT a lot, the concept is pretty much the same for all the models available in the Huggingface library. You need to look up the model’s name in this list and call it. For example, you can put the “gpt2-large” string in the “pretrained_model_name” variable to load GPT-2’s large model, or “xlnet-base-cased” to load the XLNet model, … You get the point. And to finish up the post, I will put all the code together for easy copy/pasting 🙂
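A minimal combined sketch of the steps covered in this post; bert-base-uncased is a stand-in you can swap for any name from the model list (note that the pooler_output line is BERT-specific):

```python
import torch
from transformers import AutoModel, AutoTokenizer

pretrained_model_name = "bert-base-uncased"  # swap for any name from the model list

# Load the matching tokenizer and model with the autoloader classes.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
model = AutoModel.from_pretrained(pretrained_model_name)

# Tokenize an input sequence and convert the id list to a tensor.
text = "Let's try to tokenize a sample sentence with Huggingface!"
input_tensor = torch.tensor([tokenizer.encode(text)])

# Run the model without tracking gradients.
with torch.no_grad():
    output = model(input_tensor)

last_hidden_state = output[0]  # [batch_size, number_of_tokens, embedding_size]
print(last_hidden_state.shape)

pooler_output = output[1]      # BERT-specific pooled [CLS] representation
print(pooler_output.shape)
```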
