Omniglot Dataset

GitHub - brendenlake/omniglot: Omniglot data set for one-shot learning

This dataset is commonly used for one-shot learning. It contains 1623 different handwritten characters from 50 different alphabets, each written by 20 different people. In other words, it has 1623 classes with 20 examples each, or 32,460 images in total. Each image is 105x105 pixels.

. . .

Since the characters come from different alphabets, the number of characters varies from one alphabet to another. In the above image, Bengali has 46 characters whereas Sanskrit has 42.

You can download it from the repository. The data is in the python folder, as images_background.zip and images_evaluation.zip, which can be treated as the training and validation sets.
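Once the two archives are downloaded, unpacking them can be sketched as follows. This is a minimal example using only the standard library; the function name extract_archives and the directory arguments are hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, target_dir,
                     names=("images_background.zip", "images_evaluation.zip")):
    """Unpack the Omniglot archives found in download_dir into target_dir.

    Returns the names of the archives that were actually extracted.
    (Hypothetical helper, not part of the Omniglot repository.)
    """
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in names:
        archive = download_dir / name
        if not archive.is_file():
            continue  # skip archives that have not been downloaded yet
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        extracted.append(name)
    return extracted
```

For example, extract_archives("omniglot/python", "data") would unpack whichever of the two archives it finds in omniglot/python into data; both paths are placeholders.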

The other source is Kaggle, where the dataset has two folders, images_background and images_evaluation. If you wish, you can start writing programs right away by creating a new kernel under this dataset on Kaggle itself.

Omniglot

The file structure would be as follows

Each character has 20 examples, since it was written by 20 different people. For example, the folder of the first character of Kannada, i.e. Kannada/character01/, contains the 20 images of that character.
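To check this layout on your own copy, a small sketch like the one below can walk one of the two folders and group the image paths by alphabet and character. It assumes the alphabet/character/image nesting described above and that the images are PNG files; the function name index_characters is hypothetical.

```python
from pathlib import Path


def index_characters(split_dir):
    """Map (alphabet, character) to the sorted list of image paths in split_dir.

    Assumes the layout split_dir/<alphabet>/<character>/<image>.png.
    """
    index = {}
    for alphabet_dir in sorted(Path(split_dir).iterdir()):
        if not alphabet_dir.is_dir():
            continue
        for char_dir in sorted(alphabet_dir.iterdir()):
            if not char_dir.is_dir():
                continue
            # every character folder is one class; its files are the examples
            index[(alphabet_dir.name, char_dir.name)] = sorted(char_dir.glob("*.png"))
    return index
```

With the data in place, len(index_characters("data/images_background")[("Kannada", "character01")]) should give the number of examples for that character, and len(index) the number of classes in that folder. The path data/images_background is a placeholder.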

If the data is not enough, we can turn each image into 4 by rotating it by $90^\circ$, $180^\circ$ and $270^\circ$ and keeping the original. Since each rotation is treated as a new character, every character becomes 4 different characters, which gives $1623 \times 4 = 6492$ classes in total.
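The rotation trick can be sketched with NumPy as follows. This is only an illustration, assuming each image is already loaded as a 2-D array; the function names and the label numbering (rotation index times the number of original classes, plus the original label) are choices made for this example.

```python
import numpy as np


def rotations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees (counter-clockwise)."""
    return [np.rot90(image, k) for k in range(4)]


def augment(images, labels, num_classes):
    """Expand a labelled set so that every rotation of a class is a new class.

    The class built from original label y and rotation index k gets the label
    k * num_classes + y (a numbering chosen for this sketch).
    """
    out_images, out_labels = [], []
    for image, label in zip(images, labels):
        for k, rotated in enumerate(rotations(image)):
            out_images.append(rotated)
            out_labels.append(k * num_classes + label)
    return out_images, out_labels
```

With num_classes=1623 the new labels run from 0 to 6491, which matches the 6492 classes mentioned above.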

Any other Datasets?

A few other datasets are the CUB dataset and the mini-ImageNet dataset.

Caltech-UCSD Birds 200

CUB has 200 classes with 11,788 images in total.

mini-imagenet

Mini-ImageNet has 100 classes with 60,000 images in total.


