Omniglot Dataset
This dataset is commonly used for one-shot learning. It contains 1623 different handwritten characters from 50 different alphabets, each written by 20 different people. In other words, it has 1623 classes with 20 examples each. Each image is 105x105 pixels.
Because the characters come from different alphabets, each alphabet can have a different number of characters. In the above image, Bengali has 46 characters whereas Sanskrit has 42.
You can download the dataset from the repository, where it sits in the python folder as images_background.zip and images_evaluation.zip. These can be treated as the training and validation sets, respectively.
The other source is Kaggle, where the dataset has two folders, images_background and images_evaluation. If you wish, you can start writing programs right away by creating a new kernel under this dataset on Kaggle itself.
Each character has 20 examples, because it was written by 20 different people. For example, the folder for the first character of Kannada, i.e. Kannada/character01/, contains these 20 images.
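The folder layout described above (alphabet, then character, then images) can be checked with a short script. This is only a minimal sketch: the function name summarize_omniglot is hypothetical, and it assumes the archive has been extracted so that each alphabet folder holds characterNN subfolders containing .png files.

```python
from pathlib import Path


def summarize_omniglot(root):
    """Return {alphabet: {character: image_count}} for an extracted folder.

    Assumes the layout <root>/<alphabet>/<characterNN>/<image>.png,
    e.g. images_background/Kannada/character01/.
    """
    summary = {}
    for alphabet_dir in sorted(Path(root).iterdir()):
        if not alphabet_dir.is_dir():
            continue
        summary[alphabet_dir.name] = {
            char_dir.name: sum(
                1 for f in char_dir.iterdir() if f.suffix == ".png"
            )
            for char_dir in sorted(alphabet_dir.iterdir())
            if char_dir.is_dir()
        }
    return summary


# Example usage (the path is an assumption about where you extracted the zip):
# for alphabet, chars in summarize_omniglot("images_background").items():
#     print(alphabet, len(chars), "characters")
```

If the layout matches, every character should report 20 images, and the number of characters should vary from one alphabet to another.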
If the data is not enough, we can turn each image into 4 by keeping the original and adding rotated copies (typically by 90, 180, and 270 degrees). Treating each rotation as a separate character gives a total of 6492 classes (1623 × 4).
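The rotation-based augmentation can be sketched as follows. This is an illustrative example, not a reference implementation: the helper names rotate90 and augment_with_rotations are hypothetical, images are represented as plain nested lists so the sketch stays dependency-free (in practice you would likely use an array or image library), and the scheme class_id * 4 + k for numbering the new classes is one assumed convention among many.

```python
def rotate90(image):
    """Rotate a 2D image (a list of rows) by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*image)][::-1]


def augment_with_rotations(dataset):
    """Turn each (image, class_id) pair into 4 pairs, one per quarter turn.

    Each rotation is treated as a new class, so the number of classes
    grows by a factor of 4 (1623 classes become 6492).
    """
    augmented = []
    for image, class_id in dataset:
        rotated = image
        for k in range(4):  # k quarter turns: 0, 90, 180, 270 degrees
            augmented.append((rotated, class_id * 4 + k))
            rotated = rotate90(rotated)
    return augmented
```

With class ids running from 0 to 1622, the new ids run from 0 to 6491, and each new class still has 20 examples.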
A few other datasets are the CUB dataset and the mini-ImageNet dataset.
CUB has 200 classes with 11,788 images in total.
Mini-ImageNet has 100 classes with 60,000 images in total.
The file structure would be as follows