Scene Text Editing papers
SRNet
- style retention network (SRNet)
 - 
    
Editing Text in the wild. 이걸 기반해서 SwapText, STEFANN이 나옴
 - Three modules
    
- text conversion module : changes the text content of the source image into the target text while keeping the original text style
 - background inpainting module : erases the original text and fills the text region with appropriate texture
 - fusion module : combines the information from the two former modules, and generates the edited text images
 
 - paper : https://arxiv.org/pdf/1908.03047.pdf
 - code : https://github.com/endy-see/SRNet-1
 
STEFANN (CVPR 2020)
- character-level text editing in image
 - the unobserved character (target) is generated from an observed character (source) being modified
 - 
    
replace the source character with the generated character maintaining both geometric and visual consistency with neighboring characters
 - paper : STEFANN_CVPR_2020_paper.pdf
 - code: https://github.com/prasunroy/stefann
 
RewriteNet (CVPRW 2022)
- get content and style features from a text image by using scene text recognition
 - 
    
rewrite new text to the original image by using the features
 - paper : https://arxiv.org/pdf/2107.11041.pdf
 - code : https://github.com/GOGOOOMA/AIFFEL_Hackathon (unofficial code for RewriteNet) official github (currently not available)
 
STRIVE (ICCV 2021)
- Scene Text Replacement in Videos
 - the text in all frames is normalized to a frontal pose using a spatio-temporal transformer network
 - the text is replaced in a single reference frame using a state-of-art still-image text replacement method
 - 
    
the new text is transferred from the reference to remaining frames using a novel learned image transformation network
 - paper : G_STRIVE_Scene_Text_Replacement_in_Videos_ICCV_2021_paper.pdf
 - github : https://github.com/striveiccv2021/STRIVE-ICCV2021
 
MOSTEL (AAAI 2023)
- MOdifying Scene Text image at strokE Level (MOSTEL)
 - generate stroke guidance maps to explicitly indicate regions to be edited
 - 
    
propose a Semisupervised Hybrid Learning to train the network with both labeled synthetic images and unpaired real scene text images
 - paper : https://arxiv.org/pdf/2212.01982.pdf
 - code : https://github.com/qqqyd/MOSTEL
 
SwapText (CVPR 2020)
- 하고 싶은 task 에 가장 가까운 연구임. 하지만 코드 공개가 안되어있음..
 - a three-stage framework to transfer texts across scene images
    
- first stage: a novel text swapping network to replace text labels only in the foreground image
 - second stage: a background completion network to reconstruct background images
 - third stage: the fusion network generate the word image by using the foreground and background images
 
 - paper : swaptext.pdf
 - code : not released
 
Font style transfer (cross-language)
FTransGAN (WACV 2021)
- transfer font styles between different languages by observing only a few samples
 - 
    
network into a multilevel attention form to capture both local and global features of the style images
 - paper : Li_Few-Shot_Font_Style_Transfer_Between_Different_Languages_WACV_2021_paper.pdf
 - code : https://github.com/ligoudaner377/font_translator_gan
 
OCR
- 
    
EasyOCR : how to use EasyOCR, official github, customize EasyOCR
 - 
    
Donut : https://github.com/clovaai/donut
 
API
- free ocr api : https://ocr.space/OCRAPI
 
Translation
- 
    
mT5 : https://huggingface.co/docs/transformers/model_doc/mt5
 - 
    
mBart: https://huggingface.co/transformers/v3.5.1/model_doc/mbart.html
 - 
    
googletrans : Free and Unlimited Google translate API for Python. Documentation
 - 
    
word2word : easy-to-use word translations by Kakao Brain. github link
 - 
    
seq2seq : A general-purpose encoder-decoder framework for Tensorflow that can be used for Machine Translation. github link
 
API
Papago : how to use papago api
Image Inpainting
RePaint (CVPR 2022)
- 
    
A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach
 - paper : https://arxiv.org/pdf/2201.09865.pdf
 - code : https://github.com/andreas128/RePaint
 
GARnet (ECCV 2022)
- Scene text removal paper using Gated Attention and RoI Generation method
 - Gated Attention : focus on the text stroke as well as the textures and colors of the surrounding regions to remove text from the input image much more precisely
 - 
    
RoI Generation : focus on only the region with text instead of the entire image to train the model more efficiently
 - paper : 4705_ECCV_2022_paper
 - code : https://github.com/naver/garnet