(Preprint) CLSEBERT: Contrastive Learning for Syntax Enhanced Code Pre-Trained Model
Xin Wang ¹, Yasheng Wang ², Pingyi Zhou ², Meng Xiao ², Yadao Wang ², Li Li ³, Xiao Liu ⁴, Hao Wu 武浩 ⁵, Jin Liu 刘进 ¹, Xin Jiang ²
¹ School of Computer Science, Wuhan University 武汉大学 计算机学院
² Noah's Ark Lab, Huawei 华为 诺亚方舟实验室
³ Faculty of Information Technology, Monash University
⁴ School of Information Technology, Deakin University
⁵ School of Information Science and Engineering, Yunnan University 云南大学 信息学院
arXiv, 2021-08-10

Pre-trained models for programming languages have proven valuable in various code-related tasks, such as code search, code clone detection, and code translation. Currently, most pre-trained models treat a code snippet as a sequence of tokens or focus only on the data flow between code identifiers.

However, these models ignore the rich syntax and hierarchy of code, which can provide important structural information and semantic rules that help enhance code representations. In addition, although BERT-based code pre-trained models achieve high performance on many downstream tasks, the sequence representations natively derived from BERT have been shown to be of low quality, so these models perform poorly on code matching and similarity tasks.

To address these problems, we propose CLSEBERT, a Contrastive Learning Framework for Syntax Enhanced Code Pre-Trained Model, to handle various code intelligence tasks. In the pre-training stage, we consider the code syntax and hierarchy contained in the Abstract Syntax Tree (AST) and leverage contrastive learning to learn noise-invariant code representations. Besides masked language modeling (MLM), we introduce two novel pre-training objectives: one predicts the edges between nodes in the AST, and the other predicts the types of code tokens. Extensive experiments on four code intelligence tasks demonstrate the effectiveness of the proposed model.
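The abstract names three training signals beyond MLM but does not spell out their exact form. The sketch below illustrates one plausible reading under stated assumptions: it uses Python's standard `ast` and `tokenize` modules as a stand-in parser (the paper's actual parser and target languages are not given here), treats edge-prediction targets as parent-child AST links, derives token types from `tokenize` categories with reserved words split out as keywords, and uses an InfoNCE loss with cosine similarity and an assumed temperature of 0.1 as one common form of contrastive objective. The function names `ast_edges`, `token_types`, and `info_nce` are hypothetical and are not CLSEBERT's code.

```python
import ast
import io
import keyword
import math
import tokenize
from typing import List, Sequence, Tuple


def ast_edges(source: str) -> List[Tuple[str, str]]:
    """Collect (parent, child) node-type pairs for every parent-child AST edge.

    Assumption: edge targets are plain parent-child links in Python's AST;
    expression-context markers (Load/Store/Del) are skipped as non-syntactic.
    """
    edges = []
    for parent in ast.walk(ast.parse(source)):  # breadth-first over all nodes
        for child in ast.iter_child_nodes(parent):
            if isinstance(child, ast.expr_context):
                continue
            edges.append((type(parent).__name__, type(child).__name__))
    return edges


def token_types(source: str) -> List[Tuple[str, str]]:
    """Pair each code token with a coarse type label taken from tokenize."""
    layout = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
              tokenize.DEDENT, tokenize.ENDMARKER}
    labels = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in layout:
            continue  # layout-only tokens get no type label in this sketch
        kind = tokenize.tok_name[tok.type]
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            kind = "KEYWORD"  # separate reserved words from identifiers
        labels.append((tok.string, kind))
    return labels


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are nonzero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature: float = 0.1) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    Row i of `anchors` and row i of `positives` stand for two noised views of
    the same snippet; every other row of `positives` serves as a negative.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [_cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / n


if __name__ == "__main__":
    print(ast_edges("x = 1"))
    # [('Module', 'Assign'), ('Assign', 'Name'), ('Assign', 'Constant')]
    print(token_types("def add(a, b):\n    return a + b\n")[:3])
    # [('def', 'KEYWORD'), ('add', 'NAME'), ('(', 'OP')]
    views = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(views, views) < info_nce(views, views[::-1]))  # True
```

In a full pipeline, again as an assumption rather than the paper's design, the edge list would supply positive pairs for a node-pair classifier (with unconnected pairs sampled as negatives), the type labels would feed a per-token classification head trained alongside MLM, and the two `info_nce` inputs would be encoder outputs for two noised views of each snippet, so that matching views score higher than mismatched ones.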