Textual information in a captured scene plays an important role in scene interpretation and decision making. Although existing methods can successfully detect and interpret complex text regions in a scene, to the best of our knowledge there is no significant prior work that aims to modify the textual information in an image. The ability to edit text directly on images has several advantages, including error correction, text restoration and image reusability. In this paper, we propose a method to modify text in an image at the character level. We approach the problem in two stages. First, the unobserved character (target) is generated from the observed character (source) being modified. We propose two neural network architectures: (a) FANnet, to achieve structural consistency with the source font, and (b) Colornet, to preserve the source color. Next, we replace the source character with the generated character, maintaining both geometric and visual consistency with neighboring characters. Our method works as a unified platform for modifying text in images. We demonstrate the effectiveness of our method on the COCO-Text and ICDAR datasets, both qualitatively and quantitatively.
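To make the two-stage flow described above more concrete, here is a minimal sketch in Python with NumPy. It is not the released STEFANN code: `edit_character`, `nearest_resize`, `toy_shape` and `toy_color` are hypothetical names introduced for illustration, the two callables merely stand in for the roles of FANnet and Colornet, and filling the box with the patch's median color is an assumed, crude substitute for properly removing the source character.

```python
import numpy as np


def nearest_resize(img, height, width):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows][:, cols]


def edit_character(image, box, target_char, generate_shape, transfer_color):
    """Replace the character inside box = (top, left, height, width).

    generate_shape and transfer_color are caller-supplied stand-ins
    (assumptions of this sketch) for the font and color networks.
    """
    top, left, h, w = box
    patch = image[top:top + h, left:left + w].astype(float)

    # Stage 1a: glyph mask of the target character in the source style.
    mask = generate_shape(patch, target_char)   # (H', W'), values in [0, 1]
    # Stage 1b: color the generated glyph like the source character.
    glyph = transfer_color(patch, mask)         # (H', W', C)

    # Stage 2: fit the glyph to the source box for geometric consistency.
    mask = nearest_resize(np.asarray(mask, dtype=float), h, w)[..., None]
    glyph = nearest_resize(np.asarray(glyph, dtype=float), h, w)

    # Assumption: erase the source character with the patch's median color.
    background = np.median(patch.reshape(-1, patch.shape[-1]), axis=0)

    edited = image.astype(float)  # astype returns a copy; input is untouched
    edited[top:top + h, left:left + w] = mask * glyph + (1.0 - mask) * background
    return edited.astype(image.dtype)


def toy_shape(patch, target_char):
    """Toy stand-in: always returns a 2x2 diagonal mask."""
    return np.eye(2)


def toy_color(patch, mask):
    """Toy stand-in: paints the glyph with the brightest color in the patch."""
    brightest = patch.reshape(-1, patch.shape[-1]).max(axis=0)
    return np.broadcast_to(brightest, mask.shape + (patch.shape[-1],))


# A white vertical bar on a black background, edited inside a 4x4 box.
image = np.zeros((8, 8, 3))
image[2:6, 3] = 255
edited = edit_character(image, (2, 2, 4, 4), "X", toy_shape, toy_color)
```

Passing the two generators in as callables keeps the placement step independent of any particular model, so trained networks could be swapped in for the toy functions without changing `edit_character`.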

Network Architecture


Editing Results

Each image pair consists of the original image (Left) and the edited image (Right).


  title     = {STEFANN: Scene Text Editor using Font Adaptive Neural Network},
  author    = {Roy, Prasun and Bhattacharya, Saumik and Ghosh, Subhankar and Pal, Umapada},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}

Video Presentations

News and Updates

  May 20, 2020

The CVPR 2020 main conference presentation schedule has been released. We will present our work in Session 3.3
on Thursday, June 18, 2020, 3:00-5:00 PM Pacific Daylight Time (Poster #105).

  Apr 21, 2020

Our work is featured in this week's edition of the Tracer Newsletter published by @Deeptracelabs.

  Apr 16, 2020

Our work is featured as the trending post of the day on Made With ML.
@GokuMohandas of @madewithml tweeted an insightful concern about the potential misuse of generative models and the need for robust detection techniques to distinguish between real and fake images.

  Apr 15, 2020

We have released our paper, supplementary materials, code, datasets and pretrained models.

  Feb 24, 2020

Our paper has been accepted to CVPR 2020.
More details about the code and datasets will be released soon.

  Sep 03, 2019

We have been granted a software copyright on STEFANN: Scene Text Editor using Font Adaptive Neural Network by the Copyright Office, Government of India with ROC No. SW-12778/2019 and Diary No. 9737/2019-CO/SW.


Copyright 2020 by the authors.