Weibo Comments Sentiment Classification Based on BERT and Text CNN

XU Kaixuan^a, LI Xian^b, PAN Yalei^a
a. Institute of Complexity Science; b. Institute for Future, Qingdao University, Qingdao 266071, China
|
Abstract: For comments containing multiple clauses within a sentence, state-of-the-art models such as the Embedding from Language Models-Text Convolutional Neural Network (ELMo-Text CNN) and the Generative Pre-trained Transformer (GPT) cannot accurately extract the intended meaning and therefore perform unsatisfactorily. To solve this problem, we employ a Bidirectional Encoder Representations from Transformers-Text Convolutional Neural Network (BERT-Text CNN) model. Using the bidirectional Transformer encoder structure built on BERT's self-attention mechanism, we obtain word vectors that capture the global features of a sentence; these word vectors are then fed into Text CNN, which captures local features and extracts high-level features such as semantics and contextual connections. This process solves the problem of inaccurate contextual representation of the text and enables fine-grained sentiment classification of Weibo comments with high accuracy. To verify the advantages of the model, we compare it with existing models. Test results on the simplifyweibo_4_moods dataset show that the BERT-Text CNN model improves accuracy, recall, and F1 score.
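The Text CNN stage described above can be sketched in a few lines: convolutions of several widths slide over the token axis of the contextual word vectors, and max-over-time pooling condenses each filter's responses into one high-level feature. The following is a minimal NumPy sketch with toy dimensions; the random 10×16 matrix stands in for BERT's contextual embeddings (in practice 768-dimensional), and the filter widths and counts are illustrative assumptions, not the paper's hyperparameters.

```python
import numpy as np

np.random.seed(0)

# Toy stand-in for BERT output: one sentence of 10 tokens,
# each a 16-dimensional contextual word vector (hypothetical sizes).
seq_len, emb_dim = 10, 16
word_vectors = np.random.randn(seq_len, emb_dim)

def text_cnn_features(x, filter_sizes=(2, 3, 4), n_filters=4):
    """Text CNN feature extractor: 1-D convolutions of several widths
    over the token axis, ReLU, then max-over-time pooling, concatenated."""
    pooled = []
    for fs in filter_sizes:
        # Random filters stand in for learned convolution weights.
        W = np.random.randn(n_filters, fs, x.shape[1]) * 0.1
        # Slide each filter over all token windows of width fs.
        conv = np.array([
            [np.sum(W[k] * x[i:i + fs]) for i in range(x.shape[0] - fs + 1)]
            for k in range(n_filters)
        ])
        conv = np.maximum(conv, 0.0)      # ReLU
        pooled.append(conv.max(axis=1))   # max-over-time pooling
    return np.concatenate(pooled)         # high-level feature vector

features = text_cnn_features(word_vectors)
print(features.shape)  # (12,) = 3 filter sizes x 4 filters each
```

In the full model this pooled feature vector would feed a softmax layer that outputs one of the four sentiment classes; the multiple filter widths let the network match local n-gram patterns of different lengths within a clause.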
Received: 02 November 2020
Published: 10 May 2021