Introduction and terminology
You might have heard people say that deep learning requires huge amounts of data to perform well. But in most cases we won't have such huge data. And this is not how we humans learn: we learn from very little data.
Few-shot learning is a field of deep learning in which the network is able to learn from very little data, such as a few examples per class, and is still expected to predict the right class. It is quite different from how we train a normal neural network.
We do training via tasks. Each task consists of two things: a support set and a query set. We denote a task by T, its support set by S, and its query set by Q. We will see what these tasks, support sets, and query sets are. However, there is a subtle difference in how these tasks are formed during the training and inference phases.
In the figure above, the support set in each task has 3 classes and 2 examples per class, so the learning process is called 3-way 2-shot learning. In general, this extends to j-way k-shot learning. A popular special case is one-shot learning, in which k = 1, i.e., the support set contains a single example per class.
Few important things to note
Even though each task's support set contains only j classes, the dataset as a whole can have n classes (with n ≥ j). In j-way k-shot learning, tasks are constructed as follows, with a code sketch after the list:
Support Set
j classes are chosen at random from the n available classes, and k examples are selected for each of the j classes.
Query Set
The number of samples in the query set can be arbitrary, but all of them must be drawn from the classes in the support set.
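To make the construction concrete, here is a minimal sketch of a task sampler in plain Python. The function name sample_task, the representation of the dataset as a list of (example, label) pairs, and the fixed number q of query examples per class are all illustrative assumptions, not a prescribed interface.

```python
import random
from collections import defaultdict

def sample_task(dataset, j, k, q):
    """Sample one j-way k-shot task.

    Assumes `dataset` is a list of (example, label) pairs and that every
    class has at least k + q examples. Using `q` query examples per class
    is an arbitrary choice; the query set size can vary.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    # Choose j classes at random from the n available classes.
    classes = random.sample(list(by_class), j)

    support, query = [], []
    for new_label, c in enumerate(classes):
        examples = random.sample(by_class[c], k + q)
        # k examples per class form the support set...
        support += [(x, new_label) for x in examples[:k]]
        # ...and the remaining q examples, from the same classes, form the query set.
        query += [(x, new_label) for x in examples[k:]]
    return support, query
```

Note that labels are re-indexed 0..j-1 within each task, since the model never sees a fixed global label space.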
Training phase
Both the support set and the query set have ground-truth labels for every example. The model is trained so that, given a small labeled support set, it predicts the labels of the query set; the loss is computed on those predictions and back-propagation proceeds accordingly.
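As a sketch of what the loss for a single task might look like, here is one possible choice: embed everything, average each class's support embeddings into a prototype, and classify query examples by distance to the prototypes (the idea behind prototypical networks). The embedding network embed and this particular classifier are illustrative assumptions; any model that maps a labeled support set plus a query example to class scores would fit the same training loop.

```python
import torch
import torch.nn.functional as F

def episode_loss(embed, support_x, support_y, query_x, query_y, j):
    """Loss for one j-way k-shot task using class prototypes (illustrative).

    `embed` is any network mapping a batch of inputs to embedding vectors;
    labels are assumed re-indexed 0..j-1 within the task, as in the sampler above.
    """
    z_support = embed(support_x)                        # (j * k, d)
    # Average the k support embeddings of each class into one prototype.
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(j)]
    )                                                   # (j, d)

    # Score each query embedding by negative distance to every prototype.
    logits = -torch.cdist(embed(query_x), prototypes)   # (num_query, j)

    # Cross-entropy against the query labels; back-propagate as usual.
    return F.cross_entropy(logits, query_y)
```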
Inference phase
Tasks in this phase have a labeled support set and an unlabeled query set. Given the support set with its ground-truth labels, the model has to predict the labels of the query set, which are not given.
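Continuing the same prototype sketch, inference differs only in that the query labels are predicted instead of being used for a loss:

```python
import torch

@torch.no_grad()
def predict_query_labels(embed, support_x, support_y, query_x, j):
    """Predict labels for the unlabeled query set of one task (sketch)."""
    z_support = embed(support_x)
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(j)]
    )
    logits = -torch.cdist(embed(query_x), prototypes)
    return logits.argmax(dim=1)   # one predicted class index per query example
```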
For these kinds of tasks, we generally prefer datasets that have many classes with very few examples per class. If we have few classes and a very large number of samples per class, we end up using the same tasks again and again during training, which can lead to overfitting. To put it simply, we expect the model to predict class labels for the samples in the query set, given a support set with class labels.
Note that we are teaching the model to predict labels for new data given only a few labeled samples. This is not the same as classification with a typical network, where we make the model learn the data we have and generalize over that data (that's why we use transfer learning, to tune the parameters for a dataset the model wasn't trained on). In few-shot learning, we teach the network to learn how to predict class labels given a few samples per class. The interesting thing is that the classes may be entirely new, not seen even during training.
Simply put, the model should learn how to classify from a few samples. If the tasks are repetitive (which happens when the data has very few classes with many examples per class), the model might overfit to the classes available in the training set and not perform well on unseen classes during inference. I hope this is clear.
One such dataset is Omniglot. Before we look at the dataset itself, it is good to know what alphabets are. You can read about them here.
This dataset is perfect for few-shot learning: it has many classes (1,623 handwritten characters from 50 alphabets) but only 20 examples per class, exactly the shape of data we said we prefer. You can download it from its GitHub repository.
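If you use PyTorch, torchvision also ships a loader for it; a minimal sketch (the root directory here is an arbitrary choice):

```python
from torchvision import datasets, transforms

# background=True gives the "background" (training) alphabets;
# set background=False for the evaluation alphabets.
omniglot = datasets.Omniglot(
    root="data",
    background=True,
    transform=transforms.ToTensor(),
    download=True,
)

image, character_class = omniglot[0]
print(image.shape, character_class)   # torch.Size([1, 105, 105]) and a class index
```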
More about this dataset in the next page.