Image Caption Generator

Description

  • Recognized the context of an image and annotated it with a relevant caption using deep learning and computer vision.
  • A CNN encoded each image into a fixed-length feature vector.
  • An LSTM decoder then used the CNN features to generate the caption word by word.
  • Implemented greedy and beam search decoding strategies.
  • Evaluated the architecture using BLEU and METEOR metrics.
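The decoding step above can be sketched in plain Python. Here `next_word_probs` is a hypothetical stand-in for the trained LSTM's softmax output over the vocabulary; the toy table below is illustrative only, not part of the project:

```python
import math

def beam_search(next_word_probs, start="<start>", end="<end>", k=3, max_len=20):
    """Keep the k highest log-probability partial captions at each step."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:           # finished captions carry over unchanged
                candidates.append((seq, score))
                continue
            for word, p in next_word_probs(seq).items():
                candidates.append((seq + [word], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(seq[-1] == end for seq, _ in beams):
            break
    return beams[0][0]

# Toy next-word distribution standing in for the LSTM decoder's output.
def next_word_probs(seq):
    table = {
        "<start>": {"a": 0.6, "the": 0.4},
        "a":       {"dog": 0.7, "cat": 0.3},
        "the":     {"dog": 0.9, "cat": 0.1},
        "dog":     {"<end>": 1.0},
        "cat":     {"<end>": 1.0},
    }
    return table[seq[-1]]

print(beam_search(next_word_probs, k=3))  # ['<start>', 'a', 'dog', '<end>']
```

Greedy search is simply the `k=1` special case of the same procedure: at each step it keeps only the single most probable extension.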

Architecture

(Architecture diagram: CNN image encoder feeding an LSTM caption decoder.)

Dataset Used

Flickr8k dataset

  • Training images: 6000
  • Validation images: 1000
  • Testing images: 1000

Python Libraries Used

  • keras.applications
  • keras.utils
  • tqdm
  • keras.models
  • keras.layers
  • keras.preprocessing.text
  • numpy
  • matplotlib.pyplot
  • nltk.translate.bleu_score
  • nltk.translate.meteor_score
  • nltk.corpus
  • pickle

Scores

# Evaluation results of all test images with Greedy Search
BLEU-1 using greedy search: 42.2462
BLEU-2 using greedy search: 24.5350
Meteor score using greedy search: 23.2535

# Evaluation results of all test images with Beam Search (k=3)
BLEU-1 using beam search: 42.3408
BLEU-2 using beam search: 25.5814
Meteor score using beam search: 24.0130
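BLEU-n is the geometric mean of clipped 1-gram through n-gram precisions times a brevity penalty; the figures above appear to be these values scaled by 100. The project used `nltk.translate.bleu_score` for the real evaluation; the pure-Python sketch below only illustrates the metric on a single caption:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Fraction of candidate n-grams found in any reference, clipped per n-gram."""
    cand = Counter(ngrams(candidate, n))
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, max_n=2):
    """BLEU-max_n: geometric mean of clipped precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate's (ties go to the shorter one).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "a dog runs on grass".split()
refs = ["a dog is running on the grass".split(),
        "the dog runs across grass".split()]
print(round(bleu(cand, refs), 4))  # 0.7071
```

METEOR additionally credits stem and synonym matches via WordNet, which is why the project imports `nltk.corpus` alongside `nltk.translate.meteor_score`.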




Independent Test Results


Code

GitHub