Research on User Traits Predicting Based on LDA Topic Model
WANG Yajing1, GUO Qiang1, DENG Chunyan1, LIN Qingxuan1, LIU Jianguo2,3
1. Research Center for Complex Systems Science, University of Shanghai for Science & Technology, Shanghai 200093, China; 2. Institute of Accounting and Finance, Shanghai University of Finance and Economics, Shanghai 200433, China; 3. Institute of Sina WRD Big Data, Shanghai 210204, China
Abstract: Singular value decomposition combined with logistic regression can effectively predict user traits from online users’ ‘Like’ information. However, this method cannot predict the traits of new users. To solve this problem, this paper proposes a method for predicting online user traits based on the LDA topic model. First, the method extracts topics from the text of Weibo users’ ‘Like’ records with the LDA model. It then predicts new users’ traits from these topics. Finally, the results are compared with those of the traditional method based on singular value decomposition. The results show that the F1 value of this method was up to 0.15, and the calculation time was shortened by 69.09% on average. This research remedies the defect that the inherent tags of ‘Like’ information cannot accurately reflect user preferences, avoids the traditional methods’ drawback of having to recalculate when new users and their ‘Like’ information are added during prediction, and provides another feasible approach to user trait analysis.
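The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the toy English token strings standing in for segmented Weibo ‘Like’ text, the binary trait labels, the choice of two topics, and the use of logistic regression on the topic proportions are all assumptions made for illustration. The point it shows is that a new user's topic mixture can be inferred with the already fitted LDA model, so nothing is refitted when a new user arrives.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one string per user, holding the concatenated text
# of the posts that user liked, plus a binary trait label (e.g. gender).
train_texts = [
    "football match goal league team",
    "basketball team score playoff game",
    "lipstick makeup skincare beauty brand",
    "fashion dress makeup beauty style",
    "football league transfer goal coach",
    "skincare beauty fashion style brand",
]
train_labels = [0, 0, 1, 1, 0, 1]

# Step 1: turn each user's 'Like' text into word counts, then fit LDA to
# obtain a topic-proportion vector per user (n_components=2 is an assumption).
vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(train_texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
train_topics = lda.fit_transform(train_counts)

# Step 2: fit a classifier on the topic proportions (assumed here to be
# logistic regression; the abstract does not name the classifier).
clf = LogisticRegression()
clf.fit(train_topics, train_labels)

# Step 3: for a new user, reuse the fitted vectorizer and LDA model to infer
# topic proportions, then predict the trait. No model is refitted.
new_texts = ["goal coach team football"]
new_topics = lda.transform(vectorizer.transform(new_texts))
new_pred = clf.predict(new_topics)

print("topic proportions of new user:", new_topics[0])
print("predicted trait label:", new_pred[0])
```

With real Weibo data the text would first need Chinese word segmentation and stop-word removal, and the number of topics would have to be selected on the data rather than fixed in advance.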
WANG Yajing, GUO Qiang, DENG Chunyan, LIN Qingxuan, LIU Jianguo. Research on User Traits Predicting Based on LDA Topic Model[J]. Complex Systems and Complexity Science, 2020, 17(4): 9-15.