This post will track my progress on the project for course IFT6266.
Project link: https://ift6266h17.wordpress.com/project-description/
Project analysis: “The project for this course is to generate the middle region of images conditioned on the outside border of the image and a caption describing the image. To be as successful as possible, the model needs to be able to understand the specific meaning of the caption in the context of a specific image.”
The model needs to accomplish two tasks: 1) use a deep generative model to generate the middle region of images; 2) learn from the caption of the image.
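Before any model work, the data has to be split into the conditioning border and the target center. Below is a minimal sketch of that step, assuming square images with a centered square hole (the `64x64` image with a `32x32` hole matches the course setup, but the sizes and the function name here are illustrative):

```python
import numpy as np

def split_border_and_center(image, center_size=32):
    """Split an image into its border (model input) and center (target).

    Assumes a square center region; the border copy has the center
    zeroed out to serve as the conditioning input.
    """
    h, w = image.shape[:2]
    top = (h - center_size) // 2
    left = (w - center_size) // 2
    # The target: the region the model must generate.
    center = image[top:top + center_size, left:left + center_size].copy()
    # The input: the original image with the center masked out.
    border = image.copy()
    border[top:top + center_size, left:left + center_size] = 0
    return border, center
```

The caption would be fed to the model separately (e.g. as an embedded vector), alongside this masked border image.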
The objectives for this project are: 1) train a model that can generate the middle region and make use of the caption; 2) study the relationship between the caption and region generation; 3) learn how to quantitatively evaluate the performance of generative models in this setting (as reported before, squared loss does not work well).
As described in the project description, there are several potential models that may work for this project. I am interested in the following: variational autoencoders (VAE); generative adversarial networks (DCGAN, Conditional GAN, LSGAN, Stacked GAN); and, since we also need to learn from the caption, the problem can be viewed as multi-task learning, so a multi-task model may also be helpful.
I am currently working on the VAE model.
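As a reference for what the VAE involves, here is a minimal numpy sketch of its two distinctive pieces: the reparameterization trick, which keeps the sampling step differentiable, and the closed-form KL term of the ELBO. This is not the model I am training (the real encoder/decoder would be convolutional networks in a deep learning framework); the linear encoder and all weight names are placeholders for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Placeholder linear encoder: maps a flattened input to the mean
    # and log-variance of the approximate posterior q(z|x).
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I); moving the randomness
    # into eps lets gradients flow through mu and logvar.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)) in closed form, per example in the batch.
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)
```

The training loss would combine this KL term with a reconstruction term on the generated center region; how to weight the two, and how to condition on the caption, are exactly the open questions for the next steps.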