Abstract: For comments that contain multiple semantic segments within a single sentence, some state-of-the-art models, such as the Embeddings from Language Models-Text Convolutional Neural Network (ELMo-Text CNN) and the Generative Pre-trained Transformer (GPT), cannot accurately extract the meaning and therefore yield unsatisfactory performance. To solve this problem, we adopt a Bidirectional Encoder Representations from Transformers-Text Convolutional Neural Network (BERT-Text CNN) model. Using BERT's bidirectional Transformer encoder structure with its self-attention mechanism, we obtain word vectors that encode the global features of a sentence; these word vectors are then fed into Text CNN, which captures local features and extracts high-level features such as semantics and contextual relations. This process resolves the problem of inaccurate contextual representation of the text and enables fine-grained sentiment classification of Weibo comments with high accuracy. To verify the advantages of the model, we compared it with existing models. Test results on the simplifyweibo_4_moods dataset show that the BERT-Text CNN model improves accuracy, recall, and the F1 score.
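To make the described pipeline concrete, below is a minimal PyTorch sketch of the BERT-Text CNN architecture: BERT supplies context-aware word vectors carrying global sentence features, and parallel 1-D convolutions then capture local n-gram features before classification. The checkpoint name bert-base-chinese, the kernel sizes, and the filter count are illustrative assumptions, not the paper's reported configuration; the four output classes mirror the simplifyweibo_4_moods labels.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class BertTextCNN(nn.Module):
    """BERT contextual embeddings followed by a Text CNN classifier (sketch)."""

    def __init__(self, num_classes=4, kernel_sizes=(2, 3, 4), num_filters=128):
        super().__init__()
        # BERT's bidirectional Transformer encoder produces word vectors
        # that carry the global features of the whole sentence.
        # "bert-base-chinese" is an assumed checkpoint, not the paper's.
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        hidden = self.bert.config.hidden_size  # 768 for the base model
        # Parallel 1-D convolutions over the token axis capture local
        # n-gram features at several window widths (the Text CNN step).
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes]
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden): contextual word vectors from BERT
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        h = h.transpose(1, 2)  # (batch, hidden, seq_len) as Conv1d expects
        # Convolve, apply ReLU, then max-pool over time for each kernel size
        pooled = [torch.relu(conv(h)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # class logits


if __name__ == "__main__":
    tok = BertTokenizer.from_pretrained("bert-base-chinese")
    batch = tok(["今天心情真好!"], return_tensors="pt", padding=True)
    with torch.no_grad():
        print(BertTextCNN()(batch["input_ids"], batch["attention_mask"]).shape)
    # -> torch.Size([1, 4]): one logit per mood class
```

The max-pooling over each convolution's output keeps the strongest local response per filter, so comments whose sentiment-bearing phrase sits anywhere in the sentence still produce the same feature, which is the usual motivation for pairing a Text CNN with contextual embeddings.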
XU Kaixuan, LI Xian, PAN Yalei. Weibo Comments Sentiment Classification Based on BERT and Text CNN. Complex Systems and Complexity Science, 2021, 18(2): 89-94.