(Preprint) From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang 王裕鑫 ¹, Hongtao Xie 谢洪涛 ¹, Shancheng Fang ¹, Jing Wang ², Shenggao Zhu ², Yongdong Zhang 张勇东 ¹
¹ University of Science and Technology of China
中国科技大学
² Huawei Cloud & AI
华为云人工智能
arXiv, 2021-08-22
Abstract
In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Unlike previous methods, which handle visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which treats visual and linguistic information as a union by directly endowing the vision model with language capability. Specifically, we introduce the recognition of text from character-wise occluded feature maps in the training stage. This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context when the visual cues are confused (e.g., by occlusion or noise).
As the linguistic information is acquired along with the visual features without the need for an extra language model, VisionLAN significantly improves the speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate performance in the case of missing character-wise visual cues. State-of-the-art results on several benchmarks demonstrate the effectiveness of the method.
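To make the idea of character-wise occlusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a mask marking one character's region is already available; how such a mask is obtained is not described in the abstract. The names `occlude_character`, `features`, and `mask` are hypothetical, and a single-channel feature map is used for brevity.

```python
from typing import List

# H x W grid of activations (single channel for brevity; an assumption).
FeatureMap = List[List[float]]


def occlude_character(features: FeatureMap, char_mask: FeatureMap) -> FeatureMap:
    """Suppress the region of one character in a feature map.

    char_mask holds values in [0, 1]; 1 marks positions that belong to the
    character to hide. How the mask is produced is out of scope here.
    """
    if len(features) != len(char_mask) or any(
        len(f_row) != len(m_row) for f_row, m_row in zip(features, char_mask)
    ):
        raise ValueError("features and char_mask must have the same shape")
    # Element-wise product with the inverted mask zeroes the hidden region.
    return [
        [f * (1.0 - m) for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, char_mask)
    ]


# Toy example: a 2 x 6 map covering three characters, two columns each.
features = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
            [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]]
# Hypothetical mask selecting the middle character (columns 2 and 3).
mask = [[0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]]

occluded = occlude_character(features, mask)
```

During training, a recognizer fed `occluded` would still be asked to predict the complete word, so the hidden character has to be inferred from the surrounding visual context.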