(Preprint) From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang 王裕鑫 ¹, Hongtao Xie 谢洪涛 ¹, Shancheng Fang ¹, Jing Wang ², Shenggao Zhu ², Yongdong Zhang 张勇东 ¹
¹ University of Science and Technology of China
中国科技大学
² Huawei Cloud & AI
华为云人工智能
arXiv, 2021-08-22
Abstract

In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Unlike previous methods, which handle visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which treats visual and linguistic information as a unified whole by directly endowing the vision model with language capability. Specifically, we introduce text recognition on character-wise occluded feature maps in the training stage. This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when the visual cues are confusing (e.g., occlusion or noise).

Because the linguistic information is acquired along with the visual features, without the need for an extra language model, VisionLAN significantly improves speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate performance in cases where character-wise visual cues are missing. State-of-the-art results on several benchmarks demonstrate the effectiveness of our method.
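
The character-wise occlusion of feature maps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `occlude_character`, the array shapes, and the hard binary mask supplied by hand are all assumptions made for illustration, and the abstract does not say how the occluded region is chosen.

```python
import numpy as np

def occlude_character(features, char_mask):
    """Zero out the feature-map positions covered by one character.

    features:  (C, H, W) float array, a visual feature map (assumed layout).
    char_mask: (H, W) array in [0, 1], 1 where the chosen character lies
               (assumed to be given; hypothetical input).
    """
    if features.shape[1:] != char_mask.shape:
        raise ValueError("mask must match the spatial size of the feature map")
    # Broadcast the (H, W) mask over channels and keep only unmasked positions.
    return features * (1.0 - char_mask)[None, :, :]

# Toy example: a 2-channel, 1x6 feature map for a 3-character word,
# where each character is assumed to span two columns.
features = np.ones((2, 1, 6), dtype=np.float32)
char_mask = np.zeros((1, 6), dtype=np.float32)
char_mask[:, 2:4] = 1.0  # occlude the second character

occluded = occlude_character(features, char_mask)
```

Under these assumptions, a recognizer trained on `occluded` would have to predict the hidden character from the surrounding visual context, which is the intuition the abstract describes.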