Explainable Photorealistic Image Synthesis Using Diffusion Model With Multi-Scale Feature Fusion

Sonal Fatangare; Premanand Ghadekar

Authors

Sonal Fatangare, Department of Computer Engineering, Vishwakarma Institute of Technology (Savitribai Phule Pune University), Pune, Maharashtra, India.
Premanand Ghadekar, Department of CSE-AIML, Vishwakarma Institute of Technology (Savitribai Phule Pune University), Pune, Maharashtra, India.

Keywords:

Generative Models, Diffusion Models, Photorealistic Image Generation, Grad-CAM, U-Net Architecture

Abstract

Generative models, which synthesize images from scratch, have recently attracted considerable attention. Diffusion models are especially popular for their training process and their excellent noise-modelling capabilities. However, text-conditioned image synthesis remains challenging. In the proposed system, the base U-Net architecture is redesigned with attention and residual blocks to capture complex image features. In addition, a multi-scale feature fusion technique helps the model handle images of different resolutions. A Diffusion Transformer and cross-modal attention mechanisms enhance the realism and coherence of the generated images. The model generates the photorealistic image from the latent representation with a VAE decoder. Furthermore, Grad-CAM makes the model more interpretable by showing which portions of the image the model attends to during generation. Compared with both the basic diffusion model and GANs, the proposed diffusion model improves significantly in photorealism, detail sharpness, and quantitative metrics such as FID and PSNR.
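
For readers unfamiliar with PSNR, one of the metrics named in the abstract, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). It is not the authors' evaluation code. The function name `psnr`, the flat pixel lists, and the 8-bit peak value of 255 are assumptions made for illustration only.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], generated: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two equally sized pixel sequences.

    Assumes pixel intensities lie in [0, max_value]; 255 is an assumed 8-bit peak.
    """
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel "images" used only to exercise the function.
reference = [0, 64, 128, 255]
generated = [0, 66, 126, 255]
score = psnr(reference, generated)
```

A higher PSNR indicates a closer pixel-level match to the reference image, whereas for FID a lower value indicates generated images that are statistically closer to real ones.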
