Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
An easy implementation of Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf) in PyTorch.
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
An easy implementation of FPN (https://arxiv.org/pdf/1612.03144.pdf) in PyTorch.
Real-time semantic image segmentation on mobile devices
Using an LSTM or Transformer to solve image captioning in PyTorch
A clone of the original SegCaps source code with enhancements for the MS COCO dataset.
PyTorch implementation of image captioning using a transformer-based model.
Adds the SPICE metric to the coco-caption evaluation server code
Convert segmentation binary mask images to COCO JSON format.
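The item above describes converting binary segmentation masks to COCO JSON. As a rough sketch of what such a conversion involves (the function name is my own, and it emits only uncompressed RLE; the linked repos typically rely on pycocotools instead), a binary mask can be flattened in column-major order and run-length encoded, with the counts always starting from the number of leading background pixels:

```python
import json

def mask_to_coco_annotation(mask, image_id, category_id, annotation_id):
    """Convert a binary mask (list of rows of 0/1) into a COCO-style
    annotation dict with uncompressed RLE segmentation.

    COCO RLE counts run column-major (Fortran order) and always begin
    with the number of leading zeros, which may be 0."""
    h, w = len(mask), len(mask[0])
    # Flatten column by column, as the COCO format expects.
    flat = [mask[r][c] for c in range(w) for r in range(h)]
    counts, prev, run = [], 0, 0
    for v in flat:
        if v == prev:
            run += 1
        else:
            counts.append(run)
            prev, run = v, 1
    counts.append(run)
    # Tight bounding box [x, y, width, height] around foreground pixels.
    cols = [c for c in range(w) if any(mask[r][c] for r in range(h))]
    rows = [r for r in range(h) if any(mask[r])]
    bbox = ([cols[0], rows[0], cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1]
            if cols else [0, 0, 0, 0])
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": {"counts": counts, "size": [h, w]},
        "area": sum(flat),
        "bbox": bbox,
        "iscrowd": 0,
    }

# Toy 3x4 mask with a 2x2 foreground square.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ann = mask_to_coco_annotation(mask, image_id=1, category_id=1, annotation_id=1)
print(json.dumps(ann))
```

In practice the compressed-RLE variant produced by `pycocotools.mask.encode` is more common for large masks; the uncompressed form above is just the simplest valid COCO segmentation.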
PyTorch implementation of paper: "Self-critical Sequence Training for Image Captioning"
PyTorch implementation of "Fine-Grained Image Captioning with Global-Local Discriminative Objective"
We aim to generate realistic images from text descriptions using a GAN architecture. The network we have designed generates images for two datasets: MSCOCO and CUBS.
Clone of the COCO API - Dataset @ http://cocodataset.org/ - with changes to support Windows builds and Python 3
A demo for mapping class labels from ImageNet to COCO.
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
MS COCO captions in Arabic
Image caption generation using GRU-based attention mechanism
Microsoft COCO: Common Objects in Context for huggingface datasets
Caption generation from images using topics as additional guiding inputs.