# Omniglot Dataset

{% embed url="<https://github.com/brendenlake/omniglot>" %}

This dataset is commonly used for one-shot learning. It contains **1623** different handwritten characters from **50** different alphabets, each drawn by 20 different people. In other words, it has **1623 classes** with **20 examples each.** Each image is **105x105** pixels.

<img src="https://3125871907-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LskDNqFNx04llzI1sLA%2F-M2ObKS9pe1yIfFWVUB6%2F-M2P3KjP3APSStWmJf-W%2Fimage.png?alt=media&#x26;token=f4fdd94e-c8f4-48b7-abbe-f50892616ca7" alt="" data-size="original">  **.  .  .** <img src="https://3125871907-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LskDNqFNx04llzI1sLA%2F-M2ObKS9pe1yIfFWVUB6%2F-M2P3_drIFUSLah6-L7W%2Fimage.png?alt=media&#x26;token=9b5a3490-f3c4-4496-9fcd-63bfcc0fa35d" alt="" data-size="original">&#x20;

Because these characters come from different alphabets, each alphabet can have a different number of characters. In the image above, Bengali has 46 characters, whereas Sanskrit has 42.
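Since the dataset is mostly used for one-shot learning, it may help to see how a single N-way one-shot task could be drawn from it. The following is only a minimal sketch: the function name `sample_one_shot_episode`, the `class_to_images` dictionary, and the toy file names are all hypothetical and stand in for a real index of the dataset.

```python
import random


def sample_one_shot_episode(class_to_images, n_way=5, seed=None):
    """Pick n_way classes; for each, draw one support and one distinct query example."""
    rng = random.Random(seed)
    # Sort the class names first so that a fixed seed gives a repeatable episode.
    classes = rng.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, name in enumerate(classes):
        # Two different examples of the same character: one to learn from, one to test on.
        support_item, query_item = rng.sample(class_to_images[name], 2)
        support.append((support_item, label))
        query.append((query_item, label))
    return support, query


# Toy stand-in for the real data: 10 hypothetical classes with 20 file names each.
toy = {
    f"Alphabet/character{c:02d}": [f"img_{c:02d}_{i:02d}.png" for i in range(1, 21)]
    for c in range(1, 11)
}
support, query = sample_one_shot_episode(toy, n_way=5, seed=0)
print(len(support), len(query))
```

Each class contributes exactly one support example, which is what makes the task "one-shot"; the query example is then classified against those five support examples.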

You can download it from the repository, where it sits in the `python` folder: [images\_background.zip](https://github.com/brendenlake/omniglot/blob/master/python/images_background.zip) and [images\_evaluation.zip](https://github.com/brendenlake/omniglot/blob/master/python/images_evaluation.zip), which can be treated as the training and validation sets.
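Once the two archives are downloaded, they can be unpacked with Python's standard `zipfile` module. This is a small sketch under stated assumptions: the helper name `extract_archive` and the destination folder `omniglot` are made up for illustration, and the archives are assumed to be in the current working directory.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract a zip archive into dest_dir and return its top-level entry names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Paths inside a zip always use forward slashes.
        return sorted({name.split("/")[0] for name in archive.namelist()})


# Only unpack the archives that are actually present locally.
for name in ("images_background.zip", "images_evaluation.zip"):
    if Path(name).is_file():
        print(name, "->", extract_archive(name, "omniglot"))
```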

The other source is Kaggle. It has two folders, **images\_background** and **images\_evaluation.** If you wish, you can start writing programs by creating a new kernel under this dataset on Kaggle itself.

{% embed url="<https://www.kaggle.com/watesoyan/omniglot>" %}

The file structure would be as follows\
&#x20;<img src="https://3125871907-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LskDNqFNx04llzI1sLA%2F-M2RZLdIGTSlG0sn6R6r%2F-M2Rl1ZWpzzHd6zIISOe%2Fimage.png?alt=media&#x26;token=ea034f0f-4b42-4254-be11-f981186eb9f4" alt="" data-size="original">                <img src="https://3125871907-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LskDNqFNx04llzI1sLA%2F-M2RZLdIGTSlG0sn6R6r%2F-M2RmzIfVWMI-_Cckip2%2Fimage.png?alt=media&#x26;token=cd5e6b99-7ab2-47b2-adb3-9cbaf9f51684" alt="" data-size="original">&#x20;

Each character has 20 examples, since it was written by 20 different people. For example, the first character of Kannada, i.e. `Kannada/character01/`, contains these images.
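Given this alphabet/character layout, the folders can be walked to build a mapping from each class to its images. The sketch below makes a few assumptions: the helper name `index_characters` is hypothetical, the images are assumed to be `.png` files, and the root folder is assumed to be an extracted `images_background` directory.

```python
from pathlib import Path


def index_characters(root):
    """Map 'Alphabet/characterNN' to the sorted list of image paths in that folder."""
    index = {}
    for alphabet_dir in sorted(Path(root).iterdir()):
        if not alphabet_dir.is_dir():
            continue  # skip stray files next to the alphabet folders
        for char_dir in sorted(alphabet_dir.iterdir()):
            if char_dir.is_dir():
                key = f"{alphabet_dir.name}/{char_dir.name}"
                index[key] = sorted(char_dir.glob("*.png"))
    return index


# Runs only if the (assumed) extracted folder is present.
if Path("images_background").is_dir():
    index = index_characters("images_background")
    print(len(index), "classes")
    print(len(index.get("Kannada/character01", [])), "images in Kannada/character01")
```

Each key of the resulting dictionary is one class, so its length gives the number of classes in that split.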

![First character of Kannada ( Kaggle )](https://3125871907-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LskDNqFNx04llzI1sLA%2F-M2RZLdIGTSlG0sn6R6r%2F-M2RoJ-RdCGA2aYiib8U%2Fimage.png?alt=media\&token=606ebee4-ecd8-4872-8c81-996cde2727cb)

If the data is not enough, we can turn each image into 4 images: the original plus rotations by $$90^\circ, 180^\circ, 270^\circ$$. Treating each rotated version of a character as a new character gives a total of 1623 × 4 = 6492 classes.
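The rotation trick can be sketched in plain Python, treating an image as a list of pixel rows. The function names below are hypothetical, and the tiny 2x2 "image" is only a stand-in for a real 105x105 one; on NumPy arrays, `np.rot90` would do the same job (it rotates counterclockwise, which yields the same set of four images).

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotations(image):
    """Return the image at 0, 90, 180 and 270 degrees."""
    out = [image]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out


def augment_classes(class_to_images):
    """Turn every class into four classes, one per rotation angle."""
    augmented = {}
    for name, images in class_to_images.items():
        per_image = [rotations(img) for img in images]  # 4 rotations per image
        for k, angle in enumerate((0, 90, 180, 270)):
            augmented[f"{name}_rot{angle}"] = [rots[k] for rots in per_image]
    return augmented


tiny = [[1, 2], [3, 4]]
print(rotate90(tiny))  # [[3, 1], [4, 2]]
print(1623 * 4)        # 6492
```

Note that every rotated class still has 20 examples; only the number of classes grows by a factor of 4.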

### Any other Datasets?

A few other datasets used for this purpose are the CUB dataset and the mini-ImageNet dataset.

{% embed url="<http://www.vision.caltech.edu/visipedia/CUB-200.html>" %}

CUB has 200 classes; its CUB-200-2011 release contains 11,788 images in total.

{% embed url="<https://www.kaggle.com/whitemoon/miniimagenet>" %}

Mini-ImageNet has 100 classes with 60,000 images in total.

