Abstract: For comments that contain multiple semantic segments within a single sentence, some state-of-the-art models, such as the Embeddings from Language Models-Text Convolutional Neural Network (ELMo-Text CNN) and the Generative Pre-trained Transformer (GPT), cannot accurately extract the meaning and therefore yield unsatisfactory performance. To solve this problem, we adopt a Bidirectional Encoder Representations from Transformers-Text Convolutional Neural Network (BERT-Text CNN) model. Using BERT's bidirectional Transformer encoder structure with its self-attention mechanism, we obtain word vectors that encode the global features of a sentence; these word vectors are then fed into Text CNN, which captures local features and extracts high-level features such as semantics and contextual relations. This process resolves the problem of inaccurate contextual representation of the text and enables fine-grained sentiment classification of Weibo comments with high accuracy. To verify the advantages of the model, we compared it with existing models. Test results on the simplifyweibo_4_moods dataset show that the BERT-Text CNN model improves accuracy, recall, and the F1 score.
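To make the described pipeline concrete, below is a minimal PyTorch sketch of the BERT-Text CNN architecture: BERT supplies context-aware word vectors carrying global sentence features, and parallel 1-D convolutions then capture local n-gram features before classification. The checkpoint name bert-base-chinese, the kernel sizes, and the filter count are illustrative assumptions, not the paper's reported configuration; the four output classes mirror the simplifyweibo_4_moods labels.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class BertTextCNN(nn.Module):
    """BERT contextual embeddings followed by a Text CNN classifier (sketch)."""

    def __init__(self, num_classes=4, kernel_sizes=(2, 3, 4), num_filters=128):
        super().__init__()
        # BERT's bidirectional Transformer encoder produces word vectors
        # that carry the global features of the whole sentence.
        # "bert-base-chinese" is an assumed checkpoint, not the paper's.
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        hidden = self.bert.config.hidden_size  # 768 for the base model
        # Parallel 1-D convolutions over the token axis capture local
        # n-gram features at several window widths (the Text CNN step).
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes]
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden): contextual word vectors from BERT
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        h = h.transpose(1, 2)  # (batch, hidden, seq_len) as Conv1d expects
        # Convolve, apply ReLU, then max-pool over time for each kernel size
        pooled = [torch.relu(conv(h)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # class logits


if __name__ == "__main__":
    tok = BertTokenizer.from_pretrained("bert-base-chinese")
    batch = tok(["今天心情真好!"], return_tensors="pt", padding=True)
    with torch.no_grad():
        print(BertTextCNN()(batch["input_ids"], batch["attention_mask"]).shape)
    # -> torch.Size([1, 4]): one logit per mood class
```

The max-pooling over each convolution's output keeps the strongest local response per filter, so comments whose sentiment-bearing phrase sits anywhere in the sentence still produce the same feature, which is the usual motivation for pairing a Text CNN with contextual embeddings.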
XU Kaixuan, LI Xian, PAN Yalei. Weibo Comments Sentiment Classification Based on BERT and Text CNN. Complex Systems and Complexity Science, 2021, 18(2): 89-94.