(Preprint) From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang 王裕鑫 ¹, Hongtao Xie 谢洪涛 ¹, Shancheng Fang ¹, Jing Wang ², Shenggao Zhu ², Yongdong Zhang 张勇东 ¹
¹ University of Science and Technology of China
中国科技大学
² Huawei Cloud & AI
华为云人工智能
arXiv, 2021-08-22
Abstract

In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Unlike previous methods, which handle visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which treats visual and linguistic information as a union by directly endowing the vision model with language capability. Specifically, we introduce text recognition on character-wise occluded feature maps in the training stage. This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when visual cues are confused (e.g., by occlusion or noise).

Because the linguistic information is acquired along with the visual features, without the need for an extra language model, VisionLAN significantly improves speed by 39% and adaptively considers linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate performance when character-wise visual cues are missing. State-of-the-art results on several benchmarks demonstrate the effectiveness of the method.
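The idea of character-wise occlusion of feature maps can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the occluded region is located or applied, so the function name `occlude_character`, the soft mask `char_mask`, and the multiplicative masking rule are all assumptions made here for illustration only.

```python
def occlude_character(features, char_mask):
    """Suppress the feature-map region of one character (illustrative only).

    features:  list of C channels, each an H x W nested list of floats.
    char_mask: H x W nested list in [0, 1]; 1 marks the character to hide.
               How such a mask is obtained is NOT described in the abstract;
               here it is simply given as input (assumption).
    Returns a new feature map with every value scaled by (1 - mask).
    """
    return [
        [
            [value * (1.0 - m) for value, m in zip(row, mask_row)]
            for row, mask_row in zip(channel, char_mask)
        ]
        for channel in features
    ]


# Toy input: 2 channels on a 1 x 4 grid, one column per character position.
features = [
    [[1.0, 2.0, 3.0, 4.0]],
    [[5.0, 6.0, 7.0, 8.0]],
]
# Hypothetical mask that hides the third character position.
char_mask = [[0.0, 0.0, 1.0, 0.0]]

occluded = occlude_character(features, char_mask)
print(occluded)  # [[[1.0, 2.0, 0.0, 4.0]], [[5.0, 6.0, 0.0, 8.0]]]
```

In a training setup along the lines the abstract describes, a recognizer would still be asked to predict the full word from `occluded`, so the hidden character has to be inferred from its neighbours rather than from its own visual texture.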


