On the Theme of Network Science Conference via Text Mining
LI Xiaoke1, ZHAO Zijuan1, GUO Qiang1, LIU Jianguo2, LI Rende1
1. Complex Systems Science Research Center, University of Shanghai for Science and Technology, Shanghai 200093, China; 2. Fintech Institute, Shanghai University of Finance and Economics, Shanghai 200433, China
Abstract: The rapid development of network science has had a profound impact on physics, computer science, and management. However, the development trends of current network science topics in China have lacked intuitive analysis. This paper takes the abstracts of the 13th National Network Science Conference (2017) as its research object and analyzes the research trends of this conference, among the most representative complex network conferences in network science, through topic extraction and clustering based on text mining. These trends reflect, to a certain extent, the latest state of domestic network science research. First, the abstract texts are preprocessed and segmented with jieba, using a self-built dictionary and a stop-word list. Then the LDA (latent Dirichlet allocation) topic model is used to identify the topic distribution of each abstract, and the abstracts are clustered based on the Jensen-Shannon (JS) distance between them, yielding 10 types of conference topics. This work extends the application of topic models to the analysis of research trends and hotspots at academic conferences, enriches the approaches to conference topic mining and hotspot analysis, and can serve as a reference for quickly surveying the research landscape of other academic conferences. In addition, by combining the topic model with social network analysis, the paper explores a method for studying associations between institutions, using the similarity of institutional research topics as an indicator to help institutions find suitable research partners.
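The step of grouping abstracts by the JS distance between their topic distributions can be illustrated with a minimal sketch. This is not the authors' implementation. The document-topic vectors below are made-up values standing in for LDA output, the function names are hypothetical, base-2 logarithms are an arbitrary choice, and average-linkage agglomerative clustering is an assumption, since the abstract does not state which clustering procedure was used.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Square root of the JS divergence; lies in [0, 1] with base-2 logs."""
    return math.sqrt(max(js_divergence(p, q), 0.0))


def average_linkage_clusters(docs, n_clusters):
    """Merge the two closest clusters (average JS distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(docs))]

    def linkage(a, b):
        total = sum(js_distance(docs[i], docs[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Hypothetical topic distributions for five abstracts over three topics.
doc_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90],
]

clusters = average_linkage_clusters(doc_topics, n_clusters=3)
print(clusters)
```

The same `js_distance` function could in principle be applied to topic distributions aggregated per institution, giving a similarity indicator of the kind the abstract mentions for finding research partners. That aggregation step is likewise an assumption and is not shown here.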
李小珂, 赵紫娟, 郭强, 刘建国, 李仁德. 基于文本挖掘的网络科学会议主题研究[J]. 复杂系统与复杂性科学, 2018, 15(3): 27-38.
LI Xiaoke, ZHAO Zijuan, GUO Qiang, LIU Jianguo, LI Rende. On the Theme of Network Science Conference via Text Mining. Complex Systems and Complexity Science, 2018, 15(3): 27-38.