Complex Systems and Complexity Science  2023, Vol. 20 Issue (1): 74-80    DOI: 10.13306/j.1672-3813.2023.01.010
Subject Words Extraction Algorithm Based on Keyword Co-occurrence Network
ZHANG Shu’an1, WANG Xi2, DAI Jipeng1, SUI Yi1, SUN Rencheng1
1. School of Computer Science and Technology, Qingdao University, Qingdao 266071, China;
2. Communication Dispatching Department, Qingdao Emergency Center, Qingdao 266035, China
Abstract: To address inaccurate keyword extraction and the reliance on a single type of correlation in subject word extraction, this paper proposes a subject word extraction algorithm that combines ensemble methods with complex networks. First, an ensemble algorithm extracts keywords from the topic data to improve the accuracy of keyword extraction. Second, the traditional word co-occurrence formula is improved to compute the co-occurrence degree between keywords, and a keyword co-occurrence network is built from these values. The optimal connected subgraph is then obtained from this network, keyword importance is weighted by node degree centrality, and the subject words are mapped from the result. Finally, experiments on a micro-blog topic dataset show that the algorithm is effective and outperforms the traditional word co-occurrence algorithm. The algorithm is also applied to a Qingdao community topic dataset.
Key words: co-occurrence degree; co-occurrence network; subject words; micro-blog topic
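The network stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the improved co-occurrence formula, so a plain Jaccard-style ratio stands in for it, the largest connected component stands in for the "optimal connected subgraph", and the ensemble keyword extraction step is assumed to have already produced one keyword list per post. All function names, the threshold, and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_degree(docs):
    """Jaccard-style co-occurrence per keyword pair (a stand-in, not the paper's formula)."""
    doc_freq = defaultdict(int)   # number of posts containing each keyword
    pair_freq = defaultdict(int)  # number of posts containing both keywords
    for keywords in docs:
        unique = sorted(set(keywords))
        for word in unique:
            doc_freq[word] += 1
        for a, b in combinations(unique, 2):
            pair_freq[(a, b)] += 1
    return {
        (a, b): n / (doc_freq[a] + doc_freq[b] - n)
        for (a, b), n in pair_freq.items()
    }


def build_network(degrees, threshold):
    """Undirected graph keeping only pairs whose score reaches the threshold."""
    graph = defaultdict(set)
    for (a, b), score in degrees.items():
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def largest_component(graph):
    """Largest connected component (assumed proxy for the optimal connected subgraph)."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def subject_words(docs, threshold=0.5, top_k=3):
    """Rank keywords of the chosen subgraph by degree centrality."""
    graph = build_network(cooccurrence_degree(docs), threshold)
    component = largest_component(graph)
    n = len(component)
    if n < 2:
        return []
    centrality = {w: len(graph[w] & component) / (n - 1) for w in component}
    # Sort by centrality (descending), breaking ties alphabetically.
    return sorted(centrality, key=lambda w: (-centrality[w], w))[:top_k]


# Hypothetical keyword lists, one per post.
posts = [
    ["storm", "flood", "rescue"],
    ["storm", "flood"],
    ["storm", "rescue"],
    ["concert", "ticket"],
]
print(subject_words(posts, threshold=0.5, top_k=3))
```

In this toy input, "storm" co-occurs strongly with both "flood" and "rescue", so it has the highest degree centrality in the retained subgraph, while the isolated "concert"/"ticket" pair falls outside it.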
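Received: 2021-09-08      Published: 2023-04-19
CLC number (ZTFLH):  TP391.1
Funding: National Natural Science Foundation of China, Young Scientists Fund (41706198)
Corresponding author: SUN Rencheng (b. 1977), male, from Qingdao, Shandong; Ph.D., professor; main research interest: big data analysis based on complex networks.
About the author: ZHANG Shu’an (b. 1998), female, from Tai'an, Shandong; master's student; main research interests: natural language processing and complex-network big data analysis.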
Cite this article:
ZHANG Shu’an, WANG Xi, DAI Jipeng, SUI Yi, SUN Rencheng. Subject Words Extraction Algorithm Based on Keyword Co-occurrence Network[J]. Complex Systems and Complexity Science, 2023, 20(1): 74-80.
Link to this article:
https://fzkx.qdu.edu.cn/CN/10.13306/j.1672-3813.2023.01.010      or      https://fzkx.qdu.edu.cn/CN/Y2023/V20/I1/74