RAM is cheap and plentiful today. Even a demanding supervised machine learning task that needs hundreds of gigabytes of RAM can be covered with a modest purchase or by renting the capacity. Access to GPUs, on the other hand, does not come cheap: you would need hundreds of gigabytes of VRAM, which is neither easy to obtain nor affordable.
That might change in the future. But for now it means we need to be strategic about how we use our resources to solve deep learning problems. This is especially true when we tackle complex real-world problems in fields like image and speech recognition.
Thankfully, there is a technique known as “Transfer Learning” that lets us reuse models pre-trained by others, with only small adjustments.
Pre-trained model
Machine learning and deep learning models have traditionally been designed to work in isolation: each one is trained to solve a specific task, and as soon as the feature-space distribution changes, the model has to be rebuilt from scratch.
- Transfer learning breaks out of this isolated learning paradigm: it reuses what a model has already learned to tackle related problems.
So where do we use pre-trained models? Mostly in computer vision and natural language processing.
In computer vision you work with image or video data, and deep learning models are the usual tool for it. People rarely build networks from scratch for their own use case; instead, they start from a model that has already been pre-trained on a large and difficult image classification problem such as ImageNet.
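To make this concrete, here is a minimal sketch of that workflow using TensorFlow/Keras (one of several frameworks that ship ImageNet-pre-trained models); the file name "cat.jpg" is just a placeholder for your own image:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

# Download a ResNet50 that was pre-trained on the ImageNet classification task.
model = ResNet50(weights="imagenet")

# Load and preprocess an image the way ResNet50 expects (224x224 pixels).
img = image.load_img("cat.jpg", target_size=(224, 224))  # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Predict and map the 1000 ImageNet class scores back to human-readable labels.
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```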
In natural language processing, text is the input or the output, and embeddings are the workhorse for these problems. An embedding maps each word to a point in a continuous vector space, so that words with similar meanings get similar vector representations.
Such embedding models are usually trained on a large text corpus by the researchers who develop them and then released under a permissive license.
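As a quick illustration, here is a sketch that loads one such freely released set of word vectors (GloVe, fetched through the gensim downloader; this assumes the gensim library is installed) and checks that nearby vectors correspond to related words:

```python
import gensim.downloader as api

# Download 100-dimensional GloVe vectors trained on Wikipedia + Gigaword.
vectors = api.load("glove-wiki-gigaword-100")

print(vectors["king"].shape)                 # a dense 100-dimensional vector
print(vectors.most_similar("king", topn=3))  # words whose vectors (and meanings) are close
```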
Implementation of a pre-trained model
Here are four general guidelines to follow when applying transfer learning:
- The new dataset is large and very different from the original – Because the dataset is large, we could train the ConvNet from scratch, but training a sophisticated network takes time, so fine-tuning a pre-trained network is the usual choice. Fine-tuning may still fail to give good results here, precisely because the new dataset differs so much from the original one; in that case, train the ConvNet from scratch after all.
- The new dataset is small and very different from the original – You can apply a similar idea to the one above: fine-tune the ConvNet, but because the data differs so much from the original, be careful not to go too deep into the network and only adjust the weights of the last few layers.
- The new dataset is small and similar to the original – Fine-tuning may cause the model to overfit because the new dataset is small. It is better to use the ConvNet as a fixed feature extractor, stack your own classifier (the dense layers) on top, and train only that classifier (see the sketch after this list).
- The new dataset is large and similar to the original – Since the dataset is large again, we can save time with the fine-tuning strategy instead of training the network from scratch, and because the new data is so similar to the original, the model will produce good results.
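Below is a minimal sketch, again with TensorFlow/Keras, of the two strategies the list keeps coming back to: freezing a pre-trained ConvNet and training only a new classifier head (the small, similar-dataset case), and unfreezing the base for fine-tuning (the large-dataset cases). The number of classes and the training data are placeholders for your own problem:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

num_classes = 10  # placeholder: number of classes in your new dataset

# Pre-trained convolutional base without its ImageNet classification head.
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
base.trainable = False  # freeze the base: use it as a fixed feature extractor

# New classifier head, trained from scratch on the extracted features.
model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # placeholder training data

# For the large-dataset cases you would instead unfreeze (some of) the base and
# fine-tune it with a small learning rate, e.g.:
# base.trainable = True
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
#               loss="categorical_crossentropy", metrics=["accuracy"])
```

The design choice here is simply which layers are allowed to change: freezing everything but the head protects a small dataset from overfitting, while unfreezing more layers lets a large dataset reshape the pre-trained features.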