STEFANN: Scene Text Editor using Font Adaptive Neural Network

Prasun Roy¹*, Saumik Bhattacharya²*, Subhankar Ghosh¹*, Umapada Pal¹
¹ Indian Statistical Institute, Kolkata
² Indian Institute of Technology, Kharagpur

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020

Abstract

Textual information in a captured scene plays an important role in scene interpretation and decision making. Though there exist methods that can successfully detect and interpret complex text regions present in a scene, to the best of our knowledge, there is no significant prior work that aims to modify the textual information in an image. The ability to edit text directly on images has several advantages, including error correction, text restoration and image reusability. In this paper, we propose a method to modify text in an image at the character level. We approach the problem in two stages. First, the unobserved character (target) is generated from an observed character (source) being modified. We propose two different neural network architectures: (a) FANnet to achieve structural consistency with the source font and (b) Colornet to preserve the source color. Next, we replace the source character with the generated character, maintaining both geometric and visual consistency with neighboring characters. Our method works as a unified platform for modifying text in images. We demonstrate the effectiveness of our method on the COCO-Text and ICDAR datasets both qualitatively and quantitatively.
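The two-stage data flow described in the abstract can be sketched roughly as below. This is only an illustrative toy, not the released implementation: the function names, the 3x3 glyph table and the mean-color rule are all assumptions made for the example, and the two "model" functions are simple stand-ins for the roles that FANnet and Colornet play.

```python
from statistics import mean

# Hypothetical toy glyph masks (1 = character pixel); illustration only.
TOY_GLYPHS = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

def generate_glyph(source_mask, target_char):
    """Stage 1a (the role of FANnet): produce a mask for the target character.
    A learned model would condition on source_mask to match the font;
    this stand-in ignores it and looks up a fixed toy glyph."""
    return [row[:] for row in TOY_GLYPHS[target_char]]

def transfer_color(source_patch, source_mask, target_mask, background):
    """Stage 1b (the role of Colornet): color the target mask like the source.
    This stand-in paints the mean source foreground color (an assumption)."""
    foreground = [source_patch[r][c]
                  for r, row in enumerate(source_mask)
                  for c, on in enumerate(row) if on]
    color = tuple(round(mean(px[ch] for px in foreground)) for ch in range(3))
    return [[color if on else background for on in row] for row in target_mask]

def replace_character(image, top, left, patch):
    """Stage 2: paste the generated patch over the source character's box.
    The geometric adjustment to neighboring characters is omitted here."""
    out = [row[:] for row in image]
    for r, row in enumerate(patch):
        for c, px in enumerate(row):
            out[top + r][left + c] = px
    return out

def edit_character(image, top, left, size, target_char, background):
    """Run both stages on one square character box of side `size`."""
    source_patch = [row[left:left + size] for row in image[top:top + size]]
    source_mask = [[int(px != background) for px in row] for row in source_patch]
    target_mask = generate_glyph(source_mask, target_char)
    target_patch = transfer_color(source_patch, source_mask, target_mask, background)
    return replace_character(image, top, left, target_patch)

# Usage: turn a red "I" on a white background into a red "L".
WHITE, RED = (255, 255, 255), (200, 0, 0)
image = [[RED if on else WHITE for on in row] for row in TOY_GLYPHS["I"]]
edited = edit_character(image, 0, 0, 3, "L", WHITE)
```

The split into three small functions mirrors the stages named in the abstract, so that each stand-in could in principle be swapped for a trained model without changing the surrounding flow.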

Network Architecture

[Figure: STEFANN network architecture]

Editing Results

Each image pair consists of the original image (Left) and the edited image (Right).

Paper and Supplementary Materials

Download Paper ~8MB PDF

Download Supplementary Materials ~6MB PDF

Publication
@ CVF Open Access

Code
@ GitHub

Datasets + Models
@ Google Drive

Datasets + Kernels
@ Kaggle

Citation

@InProceedings{Roy_2020_CVPR,
  title     = {STEFANN: Scene Text Editor using Font Adaptive Neural Network},
  author    = {Roy, Prasun and Bhattacharya, Saumik and Ghosh, Subhankar and Pal, Umapada},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}

News and Updates

May 20, 2020

The CVPR 2020 main conference presentation schedule has been released. We will be presenting our work at Session 3.3 on Thursday, June 18, 2020, 3:00-5:00 PM Pacific Daylight Time (Poster #105).

Apr 21, 2020

Our work is featured in this week's edition of the Tracer Newsletter published by @Deeptracelabs.

Apr 16, 2020

Our work is featured as the trending post of the day on Made With ML.
@GokuMohandas of @madewithml tweeted an insightful concern regarding the potential misuse of generative models and the need for robust detection techniques to distinguish between real and fake images.

Apr 15, 2020

We have released our paper, supplementary materials, code, datasets and pretrained models.

Feb 24, 2020

Our paper has been accepted to CVPR 2020.
More details about the code and datasets will be released soon.

Sep 03, 2019

We have been granted a software copyright on STEFANN: Scene Text Editor using Font Adaptive Neural Network by the Copyright Office, Government of India with ROC No. SW-12778/2019 and Diary No. 9737/2019-CO/SW.

On Twitter

STEFANN: Scene Text Editor using Font Adaptive Neural Network
pdf: https://t.co/WIkFfsL0i2
abs: https://t.co/sHIYHm8PMb
project page: https://t.co/cCCOKSKDrc
github: https://t.co/dkOYTNOjgE pic.twitter.com/A65TXpfC0C
— roadrunner01 (@ak92501) April 15, 2020

🏆 Trending post of the day on Made With ML: STEFANN - Scene Text Editor using Font Adaptive Neural Network https://t.co/ki6DEBQE6n
— Made With ML (@madewithml) April 16, 2020

I think amazing work like should also come with methods of detection that the technique was used on a given image, etc. This year's CVPR results are of such high quality that they can easily be abused. Not saying the burden falls immediately on the group but a follow up, etc. https://t.co/QvaR7mTQlj
— Goku Mohandas (@GokuMohandas) April 16, 2020

Nice application to play with:
STEFANN: Scene Text Editor using Font Adaptive Neural Network @ The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020. #DeepLearning https://t.co/uzxUZTHjMZ
— Michał Chromiak (@drChromiak) April 17, 2020

STEFANN: Scene Text Editor using Font Adaptive Neural Network https://t.co/sMQMynJx1v #AI #Research via @Smerity
— Future of AI (@future_of_AI) April 17, 2020

The top feature from this week's Tracer Newsletter

Extinction Rebellion (XR) activists released a deepfake video of the Belgian Prime Minister Shophie Wilmès making a speech linking Covid-19 to the climate crisis. https://t.co/AFn18l3UmM
— Deeptrace (@Deeptracelabs) April 20, 2020