(Preprint) From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang 王裕鑫 ¹, Hongtao Xie 谢洪涛 ¹, Shancheng Fang ¹, Jing Wang ², Shenggao Zhu ², Yongdong Zhang 张勇东 ¹
¹ University of Science and Technology of China
² Huawei Cloud & AI
arXiv, 2021-08-22
Abstract

In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Unlike previous methods, which handle visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which treats the visual and linguistic information as a union by directly endowing the vision model with language capability. Specifically, we introduce the recognition of text from character-wise occluded feature maps in the training stage. This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when the visual cues are confused (e.g., occlusion or noise).

Because the linguistic information is acquired along with the visual features, without the need for an extra language model, VisionLAN significantly improves the speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate performance in cases where character-wise visual cues are missing. State-of-the-art results on several benchmarks demonstrate the effectiveness of our method.
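
The idea of recognizing text from character-wise occluded feature maps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the occluded region is located, so the fixed per-character column spans (`char_spans`), the single-channel feature map, and the function names below are all assumptions made for illustration.

```python
import random


def occlude_character(feature_map, char_spans, index):
    """Return a copy of feature_map with one character's columns zeroed.

    feature_map: list of rows (H x W), one channel for brevity.
    char_spans:  hypothetical (start_col, end_col) per character, end exclusive.
    index:       position of the character to occlude.
    """
    start, end = char_spans[index]
    return [
        [0.0 if start <= col < end else value for col, value in enumerate(row)]
        for row in feature_map
    ]


def sample_occlusion(feature_map, char_spans, rng=random):
    """Pick one character at random and occlude it (assumed training-time step)."""
    index = rng.randrange(len(char_spans))
    return index, occlude_character(feature_map, char_spans, index)


if __name__ == "__main__":
    # A toy 2 x 6 feature map covering three characters, two columns each.
    features = [[1.0] * 6 for _ in range(2)]
    spans = [(0, 2), (2, 4), (4, 6)]
    for row in occlude_character(features, spans, 1):
        print(row)
```

In a training loop built on this sketch, the recognizer would still be asked to predict the full word from the occluded map, so the occluded character has to be inferred from the surrounding visual context.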