Complex Systems and Complexity Science  2018, Vol. 15 Issue (3): 27-38    DOI: 10.13306/j.1672-3813.2018.03.004
On the Theme of Network Science Conference via Text Mining
LI Xiaoke1, ZHAO Zijuan1, GUO Qiang1, LIU Jianguo2, LI Rende1
1. Complex Systems Science Research Center, University of Shanghai for Science and Technology, Shanghai 200093, China;
2. Fintech Institute, Shanghai University of Finance and Economics, Shanghai 200433, China
Abstract: Network science is developing rapidly and has had a profound influence on physics, computer science, management, and other disciplines. However, an intuitive analysis of the latest trends in network science research topics within China has been lacking. Taking the abstracts of the 13th National Conference on Complex Networks (2017) as the research object, this paper analyzes the research trends of this conference, the most representative complex network conference in network science, from the perspective of text-mining-based topic extraction and clustering; to some extent, these trends reflect the latest state of research in the domestic network science field. First, the abstract texts are preprocessed and segmented with jieba, using a self-built dictionary and a stop-word list. Then the LDA topic model is used to identify the topic distribution of each abstract, and agglomerative hierarchical clustering based on the JS distance between abstracts yields 10 categories of conference topics. The study extends the application of topic models to mining the research trends and hotspots of academic conferences, enriches the approaches to conference topic mining and hotspot analysis, and can serve as a reference for quickly mining the research trends of other academic conferences. It also proposes a method that combines topic models with social network analysis to mine relationships among institutions, using the similarity of institutional research topics as a reference indicator to help institutions find suitable research collaborators.
Key words: topic modeling    text mining    agglomerative hierarchical clustering    network analysis
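The clustering step named in the abstract (hierarchical clustering of abstracts by the JS distance between their LDA topic distributions) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "JS distance" means the square root of the base-2 Jensen-Shannon divergence and that clusters are merged by average linkage; the abstract specifies neither. The names `js_distance`, `agglomerative_clusters`, and `toy_theta` are hypothetical, and the toy topic distributions are made up for illustration rather than taken from the paper.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms of p add 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (assumed meaning of 'JS distance')."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding errors


def agglomerative_clusters(distributions, n_clusters):
    """Bottom-up clustering: repeatedly merge the two clusters with the smallest
    average pairwise JS distance (average linkage, an assumption) until
    n_clusters remain. Returns lists of document indices."""
    n = len(distributions)
    # Precompute all pairwise distances between documents, keyed by (i, j) with i < j.
    dist = {
        (i, j): js_distance(distributions[i], distributions[j])
        for i, j in combinations(range(n), 2)
    }

    def linkage(a, b):
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]  # start with one cluster per document
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Made-up document-topic distributions (each row sums to 1), standing in for LDA output.
toy_theta = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90],
]

print(agglomerative_clusters(toy_theta, 3))
```

In a real pipeline, the rows of `toy_theta` would be replaced by the per-abstract topic distributions produced by an LDA model, and `n_clusters` would be set to the number of topic groups sought (10 in the paper).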
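Received: 2018-06-28      Published: 2019-01-31
CLC number (ZTFLH):  TB3
Funding: National Natural Science Foundation of China (61773248, 71771152)
Corresponding author: LIU Jianguo (b. 1979), male, from Linfen, Shanxi; Ph.D., professor; main research interests: network science, business intelligence, and scientific knowledge graph analysis.
About the author: LI Xiaoke (b. 1995), female, master's student; main research interests: text analysis and network science.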
Cite this article:
LI Xiaoke, ZHAO Zijuan, GUO Qiang, LIU Jianguo, LI Rende. On the Theme of Network Science Conference via Text Mining[J]. Complex Systems and Complexity Science, 2018, 15(3): 27-38.
Link to this article:
http://fzkx.qdu.edu.cn/CN/10.13306/j.1672-3813.2018.03.004      or      http://fzkx.qdu.edu.cn/CN/Y2018/V15/I3/27