(Preprint) From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang 王裕鑫 ¹, Hongtao Xie 谢洪涛 ¹, Shancheng Fang ¹, Jing Wang ², Shenggao Zhu ², Yongdong Zhang 张勇东 ¹
¹ University of Science and Technology of China
² Huawei Cloud & AI
arXiv, 2021-08-22

In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Unlike previous methods, which handle visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which treats visual and linguistic information as a union by directly endowing the vision model with language capability. Specifically, we introduce the recognition of text from character-wise occluded feature maps in the training stage. This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context when the visual cues are confused (e.g., by occlusion or noise).

Because the linguistic information is acquired along with the visual features, without the need for an extra language model, VisionLAN significantly improves the speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate performance in cases where character-wise visual cues are missing. State-of-the-art results on several benchmarks demonstrate the effectiveness of the method.
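
To make the training-time idea of character-wise occlusion concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the occluded feature maps are produced, so the function `occlude_character`, the column-based feature layout, and the hard-coded `char_spans` are all hypothetical. It only shows what "hiding the visual cues of one character" can look like on a toy feature sequence.

```python
from typing import List


def occlude_character(
    features: List[List[float]],
    char_spans: List[range],
    char_index: int,
) -> List[List[float]]:
    """Return a copy of `features` with the columns of one character zeroed.

    features:   one feature vector per horizontal position (hypothetical layout).
    char_spans: column range covered by each character (assumed to be known here).
    char_index: which character to hide.
    """
    hidden = set(char_spans[char_index])
    return [
        [0.0] * len(col) if x in hidden else list(col)
        for x, col in enumerate(features)
    ]


# Toy input: 6 feature columns with 2 channels each, covering a 3-character word.
features = [[1.0, 2.0] for _ in range(6)]
char_spans = [range(0, 2), range(2, 4), range(4, 6)]  # assumed spans, illustration only

# Hide the middle character; a recognizer trained on such inputs would have to
# infer that character from the surrounding context rather than its own texture.
occluded = occlude_character(features, char_spans, char_index=1)
```

In this sketch the recognition target would still be the full word, which is what pushes a vision-only model to rely on context for the hidden character.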
  • Power grid fault diagnosis based on a deep pyramid convolutional neural network
  • Xu Zhang 张旭, Huiting Zhang, Dongying Zhang, Yixian Wang, Ruiting Ding, Yuchuan Zheng, Yongxu Zhang
  • CSEE Journal of Power and Energy Systems
  • 2022-05-06
  • China's factor reallocation effect considering energy
  • Guangqing Xu, Xiaoyu Chen
  • Chinese Journal of Population, Resources and Environment
  • 2022-05-02
  • Cannabidiol prevents depressive-like behaviors through the modulation of neural stem cell differentiation
  • Ming Hou, Suji Wang, Dandan Yu, Xinyi Lu, Xiansen Zhao, Zhangpeng Chen, Chao Yan
  • Frontiers of Medicine
  • 2022-04-26
  • Cultivation of gut microorganisms of the marine ascidian Halocynthia roretzi reveals their potential roles in the environmental adaptation of their host
  • Yang Yang, Yuting Zhu, Haiming Liu, Jiankai Wei, Haiyan Yu, Bo Dong
  • Marine Life Science & Technology
  • 2022-04-26
  • Data network traffic analysis and optimization strategy of real-time power grid dynamic monitoring system for wide-frequency measurements
  • Jinsong Li, Hao Liu, Wenzhuo Li, Tianshu Bi, Mingyang Zhao
  • Global Energy Interconnection
  • 2022-04-25
  • Field distribution of the Z₂ topological edge state revealed by cathodoluminescence nanoscopy
  • Xiao He, Donglin Liu, Hongfei Wang, Liheng Zheng, Bo Xu, Biye Xie, Meiling Jiang, Zhixin Liu, Jin Zhang, Minghui Lu, Zheyu Fang
  • Opto-Electronic Advances
  • 2022-04-25
  • Advances in femtosecond laser direct writing of fiber Bragg gratings in multicore fibers: technology, sensor and laser applications
  • Alexey Wolf, Alexander Dostovalov, Kirill Bronnikov, Mikhail Skvortsov, Stefan Wabnitz, Sergey Babin
  • Opto-Electronic Advances
  • 2022-04-25
  • Graphene-empowered dynamic metasurfaces and metadevices
  • Chao Zeng, Hua Lu, Dong Mao, Yueqing Du, He Hua, Wei Zhao, Jianlin Zhao
  • Opto-Electronic Advances
  • 2022-04-25
  • Charge carrier dynamics in different crystal phases of CH₃NH₃PbI₃ perovskite
  • Efthymis Serpetzoglou, Ioannis Konidakis, George Kourmoulakis, Ioanna Demeridou, Konstantinos Chatzimanolis, Christos Zervos, George Kioseoglou, Emmanuel Kymakis, Emmanuel Stratakis
  • Opto-Electronic Science
  • 2022-04-21
  • Applications of optically and electrically driven nanoscale bowtie antennas
  • Zhongjun Jiang, Yingjian Liu, Liang Wang
  • Opto-Electronic Science
  • 2022-04-20
  • Validation of the bodily expressive action stimulus test among Chinese adults and children
  • Yunmei Yang, Wenwen Hou, Jing Li
  • PsyCh Journal
  • 2022-04-17

  • NetGraph: An Intelligent Operated Digital Twin Platform for Data Center Networks
  • IGNNITION: fast prototyping of graph neural networks for communication networks
    Copyright © PubCard