PubCard - From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

(Preprint) From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

Yuxin Wang 王裕鑫 ¹, Hongtao Xie 谢洪涛 ¹, Shancheng Fang ¹, Jing Wang ², Shenggao Zhu ², Yongdong Zhang 张勇东 ¹

¹ University of Science and Technology of China
中国科技大学
² Huawei Cloud & AI
华为云人工智能

arXiv, 2021-08-22

Abstract

In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Unlike previous methods, which handle visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which treats visual and linguistic information as a union by directly endowing the vision model with language capability. Specifically, we introduce text recognition on character-wise occluded feature maps in the training stage. This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when the visual cues are confusing (e.g., occlusion or noise).

Because the linguistic information is acquired along with the visual features without the need for an extra language model, VisionLAN significantly improves speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate performance in cases where character-wise visual cues are missing. State-of-the-art results on several benchmarks demonstrate the effectiveness of the method.
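The training-time idea described in the abstract, recognizing the full word from a feature map in which one character's region has been occluded, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the occlusion is applied multiplicatively, `features * (1 - mask)`, and that the character's location is known as a column range. The abstract does not say how the occluded region is obtained, so the helpers `column_mask`, `occlude_character`, and `pick_training_example` are hypothetical names introduced only for illustration.

```python
import random


def column_mask(height, width, start, end):
    """Hypothetical stand-in for a character mask: 1.0 over columns [start, end)."""
    return [[1.0 if start <= x < end else 0.0 for x in range(width)]
            for _ in range(height)]


def occlude_character(features, char_mask):
    """Suppress the masked region in every channel.

    features:  list of C channels, each an H x W list of floats.
    char_mask: H x W list of floats in [0, 1], near 1 where the character lies.
    """
    return [
        [[v * (1.0 - m) for v, m in zip(frow, mrow)]
         for frow, mrow in zip(channel, char_mask)]
        for channel in features
    ]


def pick_training_example(word, rng):
    """Choose a character index to occlude; the target remains the full word."""
    return rng.randrange(len(word)), word


# Toy example: a 3-channel, 2 x 8 feature map for the word "text".
rng = random.Random(0)
features = [[[1.0] * 8 for _ in range(2)] for _ in range(3)]
idx, target = pick_training_example("text", rng)
cols_per_char = 8 // len(target)  # assumes evenly spaced characters
mask = column_mask(2, 8, idx * cols_per_char, (idx + 1) * cols_per_char)
occluded = occlude_character(features, mask)
```

In this sketch the recognizer would receive `occluded` while still being trained to predict `target`, so the missing character has to be inferred from the surrounding visual context rather than from its own texture.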


