Prasun Roy

I received my PhD from the School of Computer Science, University of Technology Sydney, Australia, advised by Prof Michael Blumenstein and Prof Umapada Pal. During my studies, I was a member of the ISI-UTS Joint Research Cluster and the Australian Artificial Intelligence Institute (AAII).

Previously, I was a Research Associate at the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata.

Email / CV / Bio / Google Scholar / GitHub / LinkedIn / Twitter

Research

I am interested in computer vision, deep learning, generative models, image processing, graphics, and applied machine learning. Most of my recent research focuses on text style manipulation, human pose transformation, and image generation in ambiguous contexts. Some publications are highlighted.

Selected Publications
	(NEW!) DRG-Font: Dynamic Reference-Guided Few-shot Font Generation via Contrastive Style-Content Disentanglement
	Rejoy Chakraborty, Prasun Roy, Saumik Bhattacharya, Umapada Pal
	arXiv, 2026
	Project Page / Code / arXiv / BibTex
	We introduce a few-shot font generation strategy that learns complex glyph attributes by decomposing style and content embedding spaces.

	Exploring Mutual Cross-Modal Attention for Context-Aware Human Affordance Generation
	Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein
	IEEE Transactions on Artificial Intelligence, 2026
	Project Page / Code / arXiv / BibTex
	By mutually cross-attending two different spatial feature spaces, we encode the global scene context for semantically meaningful affordance generation.

	FASTER: A Font-Agnostic Scene Text Editing and Rendering Framework
	Alloy Das, Sanket Biswas, Prasun Roy, Subhankar Ghosh, Umapada Pal, Michael Blumenstein, Josep Lladós, Saumik Bhattacharya
	WACV, 2025 (Oral presentation)
	Project Page / Code / arXiv / BibTex
	By adopting a cascaded attention mechanism, we perform word-level style and content translation for realistic text manipulation in a scene.

	Semantically Consistent Person Image Generation
	Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein
	ICPR, 2024
	Project Page / Code / arXiv / BibTex
	Using a parsing map-based representation, we propose a method for introducing a new person into a scene such that the inserted person is semantically consistent with the existing individuals.

	d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining
	Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein
	ICPR, 2024
	Project Page / Code / arXiv / BibTex
	A small trainable latent mapping network enables photorealistic sketch-to-image translation with a pretrained text-to-image diffusion model, without retraining.

	Multi-scale Attention Guided Pose Transfer
	Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal
	Pattern Recognition, 2023
	Project Page / Code / arXiv / BibTex
	Cascaded attention at every feature resolution improves generated image quality by retaining both low-frequency and high-frequency visual attributes in a structurally guided end-to-end human pose transformation.

	TIPS: Text-Induced Pose Synthesis
	Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein
	ECCV, 2022
	Project Page / Code / arXiv / BibTex
	We address the structural bias in pose-guided person image generation techniques with a text-conditioned human pose transformation strategy.

	Scene Aware Person Image Generation through Global Contextual Conditioning
	Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein
	ICPR, 2022
	Project Page / Code / arXiv / BibTex
	Using a keypoint-based representation, we propose a method for introducing a new person into a scene such that the inserted person is semantically consistent with the existing individuals.

	STEFANN: Scene Text Editor using Font Adaptive Neural Network
	Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal
	CVPR, 2020
	Project Page / Code / arXiv / BibTex
	We introduce a technique for realistic text modification in a scene at the character level by disentangling the task into dedicated shape and color transformation objectives.

	Effects of Degradations on Deep Neural Network Architectures
	Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal
	arXiv, 2018
	Project Page / Code / arXiv / BibTex
	A study of how different image degradation models affect the performance of deep neural networks, yielding insights for substantially improving noise tolerance at the cost of a slight performance trade-off.