Research on User Traits Predicting Based on LDA Topic Model

Complex Systems and Complexity Science

2020, Vol. 17, Issue (4): 9-15

doi:10.13306/j.1672-3813.2020.04.002

WANG Yajing¹, GUO Qiang¹, DENG Chunyan¹, LIN Qingxuan¹, LIU Jianguo²,³

1. Research Center for Complex Systems Science, University of Shanghai for Science & Technology, Shanghai 200093, China;
2. Institute of Accounting and Finance, Shanghai University of Finance and Economics, Shanghai 200433, China;
3. Institute of Sina WRD Big Data, Shanghai 210204, China


Abstract: User traits can be predicted effectively from online users' 'Like' information by combining singular value decomposition (SVD) with logistic regression. That approach, however, has difficulty predicting the traits of new users. To solve this problem, this paper proposes an online user trait prediction method based on the LDA topic model. The method first uses an LDA model to extract topics from the text that Weibo users liked, then predicts the traits of new users from those topics, and finally compares the predictions with those of the traditional SVD-based method. Experimental results show that the F1 score improves by up to 0.15 and that computation time is reduced by 69.09% on average. The method remedies the defect that the inherent tags of 'Like' information cannot accurately reflect user preferences, avoids the traditional method's need to recompute over new users and their 'Like' information during prediction, and provides another feasible route for user trait analysis.

Keywords: user trait prediction; 'Like' information; LDA topic model; singular value decomposition; logistic regression
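The pipeline described in the abstract (topics from liked text, then a classifier on the topic proportions, with new users handled by inference only) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, and the toy documents, the binary trait labels, the choice of two topics, and the helper `predict_new_user` are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: each string stands for the concatenated text of the
# posts one user liked; the labels are an invented binary trait.
train_docs = [
    "football match goal league team",
    "basketball team score match player",
    "league player goal football coach",
    "makeup lipstick skincare brand review",
    "skincare brand fashion dress makeup",
    "fashion dress lipstick review style",
]
train_labels = np.array([0, 0, 0, 1, 1, 1])

# Bag-of-words counts over the liked text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Fit LDA once; each user becomes a distribution over the topics.
# n_components=2 is an arbitrary choice for this toy corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta_train = lda.fit_transform(X_train)

# Logistic regression on the topic proportions predicts the trait.
clf = LogisticRegression()
clf.fit(theta_train, train_labels)


def predict_new_user(liked_text):
    """Infer topic proportions for an unseen user and predict the trait.

    Only inference is run: the fitted vectorizer, LDA model, and
    classifier are reused without refitting.
    """
    counts = vectorizer.transform([liked_text])
    theta = lda.transform(counts)
    return theta[0], int(clf.predict(theta)[0])


theta_new, trait_new = predict_new_user("goal team coach match")
```

The point of the sketch is the last step: a new user only needs `transform` on the fitted models, whereas a decomposition of the full user-by-like matrix would have to include the new user's row before that user could be scored.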

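Received: 2020-03-25; Published: 2020-12-21

CLC number: TP391

Funding: National Natural Science Foundation of China (61773248, 71771152); Major Program of the National Social Science Fund of China (18ZDA088, 20ZDA060)

Corresponding author: LIU Jianguo (b. 1979), male, from Linfen, Shanxi; Ph.D., professor. Main research interests: media big data modeling and analysis, knowledge management, and financial management.

About the author: WANG Yajing (b. 1996), female, from Huainan, Anhui; master's student. Main research interests: text mining and complex networks.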

Cite this article:

WANG Yajing, GUO Qiang, DENG Chunyan, LIN Qingxuan, LIU Jianguo. Research on User Traits Predicting Based on LDA Topic Model[J]. Complex Systems and Complexity Science, 2020, 17(4): 9-15.

Link to this article:

http://fzkx.qdu.edu.cn/CN/10.13306/j.1672-3813.2020.04.002 or http://fzkx.qdu.edu.cn/CN/Y2020/V17/I4/9


