Image Caption Generator

Description

  • Recognized the context of an image and annotated it with a relevant caption using deep learning and computer vision.
  • A CNN encoded each image into a fixed-length feature vector.
  • An LSTM decoder then used the CNN features to generate the caption word by word.
  • Implemented greedy and beam search decoding strategies.
  • Evaluated the architecture using BLEU and METEOR metrics.
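The decoding step above can be sketched in plain Python. Here `next_word_probs` is a hypothetical stand-in for the trained LSTM's softmax output over the vocabulary; the toy table below is illustrative only, not part of the project:

```python
import math

def beam_search(next_word_probs, start="<start>", end="<end>", k=3, max_len=20):
    """Keep the k highest log-probability partial captions at each step."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:           # finished captions carry over unchanged
                candidates.append((seq, score))
                continue
            for word, p in next_word_probs(seq).items():
                candidates.append((seq + [word], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(seq[-1] == end for seq, _ in beams):
            break
    return beams[0][0]

# Toy next-word distribution standing in for the LSTM decoder's output.
def next_word_probs(seq):
    table = {
        "<start>": {"a": 0.6, "the": 0.4},
        "a":       {"dog": 0.7, "cat": 0.3},
        "the":     {"dog": 0.9, "cat": 0.1},
        "dog":     {"<end>": 1.0},
        "cat":     {"<end>": 1.0},
    }
    return table[seq[-1]]

print(beam_search(next_word_probs, k=3))  # ['<start>', 'a', 'dog', '<end>']
```

Greedy search is simply the `k=1` special case of the same procedure: at each step it keeps only the single most probable extension.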

Architecture

(Architecture diagram: CNN image encoder feeding an LSTM caption decoder.)

Dataset Used

Flickr8k dataset

  • Training images: 6000
  • Validation images: 1000
  • Testing images: 1000

Python Libraries Used

  • keras.applications
  • keras.utils
  • tqdm
  • keras.models
  • keras.layers
  • keras.preprocessing.text
  • numpy
  • matplotlib.pyplot
  • nltk.translate.bleu_score
  • nltk.translate.meteor_score
  • nltk.corpus
  • pickle

Scores

# Evaluation results of all test images with Greedy Search
BLEU-1 using greedy search: 42.2462
BLEU-2 using greedy search: 24.5350
Meteor score using greedy search: 23.2535

# Evaluation results of all test images with Beam Search (k=3)
BLEU-1 using beam search: 42.3408
BLEU-2 using beam search: 25.5814
Meteor score using beam search: 24.0130
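BLEU-n is the geometric mean of clipped 1-gram through n-gram precisions times a brevity penalty; the figures above appear to be these values scaled by 100. The project used `nltk.translate.bleu_score` for the real evaluation; the pure-Python sketch below only illustrates the metric on a single caption:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Fraction of candidate n-grams found in any reference, clipped per n-gram."""
    cand = Counter(ngrams(candidate, n))
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, max_n=2):
    """BLEU-max_n: geometric mean of clipped precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate's (ties go to the shorter one).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "a dog runs on grass".split()
refs = ["a dog is running on the grass".split(),
        "the dog runs across grass".split()]
print(round(bleu(cand, refs), 4))  # 0.7071
```

METEOR additionally credits stem and synonym matches via WordNet, which is why the project imports `nltk.corpus` alongside `nltk.translate.meteor_score`.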




Independent Test Results


Code

GitHub