[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-HKUDS--LLMRec":3,"tool-HKUDS--LLMRec":65},[4,18,32,41,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,2,"2026-04-10T11:13:16",[15,16,27,28,13,29,30,14,31],"视频","插件","其他","语言模型","音频",{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":38,"last_commit_at":39,"category_tags":40,"status":17},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 
提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[30,16,29],{"id":42,"name":43,"github_repo":44,"description_zh":45,"stars":46,"difficulty_score":38,"last_commit_at":47,"category_tags":48,"status":17},5773,"cs-video-courses","Developer-Y\u002Fcs-video-courses","cs-video-courses 是一个精心整理的计算机科学视频课程清单，旨在为自学者提供系统化的学习路径。它汇集了全球知名高校（如加州大学伯克利分校、新南威尔士大学等）的完整课程录像，涵盖从编程基础、数据结构与算法，到操作系统、分布式系统、数据库等核心领域，并深入延伸至人工智能、机器学习、量子计算及区块链等前沿方向。\n\n面对网络上零散且质量参差不齐的教学资源，cs-video-courses 解决了学习者难以找到成体系、高难度大学级别课程的痛点。该项目严格筛选内容，仅收录真正的大学层级课程，排除了碎片化的简短教程或商业广告，确保用户能接触到严谨的学术内容。\n\n这份清单特别适合希望夯实计算机基础的开发者、需要补充特定领域知识的研究人员，以及渴望像在校生一样系统学习计算机科学的自学者。其独特的技术亮点在于分类极其详尽，不仅包含传统的软件工程与网络安全，还细分了生成式 AI、大语言模型、计算生物学等新兴学科，并直接链接至官方视频播放列表，让用户能一站式获取高质量的教育资源，免费享受世界顶尖大学的课堂体验。",79792,"2026-04-08T22:03:59",[29,15,16,14],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":24,"last_commit_at":55,"category_tags":56,"status":17},7347,"lobehub","lobehub\u002Flobehub","LobeHub 是一个致力于工作与生活的智能体协作平台，旨在帮助用户发现、构建并与不断成长的 AI 智能体队友协同工作。它解决了当前 AI 应用中单点交互效率低、难以形成规模化协作网络的问题，将“智能体”确立为工作的基本单元，让人类与 AI 能够共同进化。\n\n无论是开发者、研究人员还是普通用户，都能通过 LobeHub 轻松设计多智能体协作流程。平台支持一键安装 MCP 插件、访问丰富的智能体市场，并提供本地与云端数据库管理、多用户协作等高级功能。其独特的技术亮点包括对多种大模型服务商的兼容、本地大模型部署支持、视觉识别、语音对话（TTS\u002FSTT）、文生图以及思维链（Chain of Thought）等能力。此外，LobeHub 还具备分支对话、工件生成、文件上传与知识库集成等实用特性，并适配桌面端、移动端及 PWA 场景，支持自定义主题。\n\n通过开源与自托管选项，LobeHub 为构建人机共演的未来协作网络提供了灵活、可扩展的基础设施。",75141,"2026-04-13T22:06:32",[30,16,13,14,15],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":38,"last_commit_at":63,"category_tags":64,"status":17},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 
构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65767,"2026-04-11T11:10:05",[14,29,16],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":87,"forks":88,"last_commit_at":89,"license":90,"difficulty_score":10,"env_os":91,"env_gpu":92,"env_ram":91,"env_deps":93,"category_tags":99,"github_topics":100,"view_count":24,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":109,"updated_at":110,"faqs":111,"releases":142},7512,"HKUDS\u002FLLMRec","LLMRec","[WSDM'2024 Oral] \"LLMRec: Large Language Models with Graph Augmentation for Recommendation\"","LLMRec 是一个专为推荐系统设计的开源框架，旨在利用大语言模型（LLM）的强大语义理解能力，解决传统推荐算法在数据稀疏和内容理解不足方面的难题。它通过三种创新的图增强策略，将自然语言视角引入用户与物品的交互图中：一是强化用户与物品之间的互动边连接；二是丰富物品节点的属性描述；三是构建更精准的用户画像。这种方法能有效挖掘如 Netflix、MovieLens 等平台中蕴含的多模态内容信息，显著提升推荐准确率。\n\n该工具特别适合从事推荐系统研究的研究人员、算法工程师以及希望探索大模型与传统图神经网络结合的开发者使用。其核心亮点在于巧妙地将大语言模型生成的文本增强数据（如 GPT-3.5 生成的描述和嵌入）融入图结构学习中，无需复杂架构调整即可实现性能跃升。项目基于 PyTorch 构建，提供了完整的数据预处理脚本、多模态数据集及训练代码，支持快速复现论文结果并进行二次开发。无论是学术探索还是工业界落地尝试，LLMRec 都为提升推荐系统的智能化水平提供了一条高效可行的技术路径。","# LLMRec: Large Language Models with Graph Augmentation for Recommendation\n\n\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_5f4430796c02.png' \u002F>\n\nPyTorch implementation for WSDM 
2024 paper [LLMRec: Large Language Models with Graph Augmentation for Recommendation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.00423.pdf).\n\n\n\n[Wei Wei](#), [Xubin Ren](https:\u002F\u002Frxubin.com\u002F), [Jiabin Tang](https:\u002F\u002Ftjb-tech.github.io\u002F), [Qingyong Wang](#), [Lixin Su](#), [Suqi Cheng](#), [Junfeng Wang](#), [Dawei Yin](https:\u002F\u002Fwww.yindawei.com\u002F) and [Chao Huang](https:\u002F\u002Fsites.google.com\u002Fview\u002Fchaoh\u002Fhome)*.\n(*Correspondence)\n\n**[Data Intelligence Lab](https:\u002F\u002Fsites.google.com\u002Fview\u002Fchaoh\u002Fhome)@[University of Hong Kong](https:\u002F\u002Fwww.hku.hk\u002F)**, Baidu Inc.\n\n\u003Ca href='https:\u002F\u002Fllmrec.github.io\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fllmrec.github.io\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-Page-purple'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.00423.pdf'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-PDF-orange'>\u003C\u002Fa> \n[![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUC1wKlPPlP9zKGYk62yR0K_g)\n\n\nThis repository hosts the code, original data and augmented data of **LLMRec**.\n\n-----------\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_0e37cd5ce8e5.png\" alt=\"LLMRec\" \u002F>\n\u003C\u002Fp>\n\nLLMRec is a novel framework that enhances recommenders by applying three simple yet effective LLM-based graph augmentation strategies to recommendation system. 
LLMRec is to make the most of the content within online platforms (e.g., Netflix, MovieLens) to augment interaction graph by i) reinforcing u-i interactive edges, ii) enhancing item node attributes, and iii) conducting user node profiling, intuitively from the natural language perspective.\n\n-----------\n\n## 🎉 News 📢📢  \n\n- [x] [2024.3.20] 🚀🚀 📢📢📢📢🌹🔥🔥🚀🚀 Because baselines `LATTICE` and `MMSSL` require some minor modifications, we provide code that can be easily run by simply modifying the dataset path.\n\n- [x] [2023.11.3] 🚀🚀 Release the script for constructing the prompt.\n\n- [x] [2023.11.1] 🔥🔥 Release the multi-modal datasets (Netflix, MovieLens), including textual data and visual data.\n\n- [x] [2023.11.1] 🚀🚀 Release LLM-augmented textual data(by gpt-3.5-turbo-0613), and LLM-augmented embedding(by text-embedding-ada-002).\n\n- [x] [2023.10.28] 🔥🔥 The full paper of our LLMRec is available at [LLMRec: Large Language Models with Graph Augmentation for Recommendation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.00423.pdf).\n\n- [x] [2023.10.28] 🚀🚀 Release the code of LLMRec.\n\n\n## 👉 TODO \n\n- [ ] Provide different larger version of the datasets.\n- [ ] ...\n\n\n-----------\n\n\u003Ch2> Dependencies \u003C\u002Fh2>\n\n```\npip install -r requirements.txt\n```\n\n\n\u003Ch2>Usage \u003C\u002Fh2>\n\n\u003Ch4>Stage 1: LLM-based Data Augmentation\u003C\u002Fh4>\n\n```\ncd LLMRec\u002FLLM_augmentation\u002F\npython .\u002Fgpt_ui_aug.py\npython .\u002Fgpt_user_profiling.py\npython .\u002Fgpt_i_attribute_generate_aug.py\n```\n\n\n\n\n\u003Ch4>Stage 2: Recommender training with LLM-augmented Data\u003C\u002Fh4>\n\n```\ncd LLMRec\u002F\npython .\u002Fmain.py --dataset {DATASET}\n```\nSupported datasets:  `netflix`, `movielens`\n\nSpecific code execution example on 'netflix':\n```\n# LLMRec\npython .\u002Fmain.py\n\n# w\u002Fo-u-i\npython .\u002Fmain.py --aug_sample_rate=0.0\n\n# w\u002Fo-u\npython .\u002Fmain.py --user_cat_rate=0\n\n# w\u002Fo-u&i\npython 
.\u002Fmain.py --user_cat_rate=0  --item_cat_rate=0\n\n# w\u002Fo-prune\npython .\u002Fmain.py --prune_loss_drop_rate=0\n```\n\n\n\n\n\n-----------\n\n\n\u003Ch2> Datasets \u003C\u002Fh2>\n\n  ```\n  ├─ LLMRec\u002F \n      ├── data\u002F\n        ├── netflix\u002F\n        ...\n  ```\n\n\u003Ch3> Multi-modal Datasets \u003C\u002Fh3>\n🌹🌹 Please cite our paper if you use the 'netflix' dataset~ ❤️  \n\nWe collected a multi-modal dataset using the original [Netflix Prize Data](https:\u002F\u002Fwww.kaggle.com\u002Fdatasets\u002Fnetflix-inc\u002Fnetflix-prize-data) released on the [Kaggle](https:\u002F\u002Fwww.kaggle.com\u002F) website. The data format is directly compatible with state-of-the-art multi-modal recommendation models like [LLMRec](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec), [MMSSL](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FMMSSL), [LATTICE](https:\u002F\u002Fgithub.com\u002FCRIPAC-DIG\u002FLATTICE), [MICRO](https:\u002F\u002Fgithub.com\u002FCRIPAC-DIG\u002FMICRO), and others, without requiring any additional data preprocessing.\n\n `Textual Modality:` We have released the item information curated from the original dataset in the \"item_attribute.csv\" file. Additionally, we have incorporated textual information enhanced by LLM into the \"augmented_item_attribute_agg.csv\" file. 
(The following three images represent (1) information about Netflix as described on the Kaggle website, (2) textual information from the original Netflix Prize Data, and (3) textual information augmented by LLMs.)\n\u003Cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\">\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n   \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_e2e2c463972f.png\" alt=\"Image 1\" style=\"width:270px;height:180px;\">\n\u003C!--     \u003Cfigcaption>Textual data in original 'Netflix Prize Data' on Kaggle.\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_bcdcd21b8993.png\" alt=\"Image 2\" style=\"width:270px;height:180px;\">\n\u003C!--     \u003Cfigcaption>Textual data in original 'Netflix Prize Data'.\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_dd3fedc2a856.png\" alt=\"Image 2\" style=\"width:270px;height:180px;\">\n\u003C!--     \u003Cfigcaption>LLM-augmented textual data.\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>  \n\u003C\u002Fdiv>\n \n `Visual Modality:` We have released the visual information obtained from web crawling in the \"Netflix_Posters\" folder. 
(The following image displays the poster acquired by web crawling using item information from the Netflix Prize Data.)\n \u003Cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\">\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n   \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_4a5887d76ad0.png\" alt=\"Image 1\" style=\"width:690px;height:590px;\">\n\u003C!--     \u003Cfigcaption>Textual data in original 'Netflix Prize Data' on Kaggle.\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\u003C\u002Fdiv>\n \n\n\u003Ch3> Original Multi-modal Datasets & Augmented Datasets \u003C\u002Fh3>\n \u003Cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\">\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n   \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_f41a85957a36.png\" alt=\"Image 1\" style=\"width:480px;height:270px;\">\n\u003C!--     \u003Cfigcaption>Textual data in original 'Netflix Prize Data' on Kaggle.\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\u003C\u002Fdiv>\n\n\n\u003Cbr>\n\u003Cp>\n\n\u003Ch3> Download the Netflix dataset. \u003C\u002Fh3>\n🚀🚀\nWe provide the processed data (i.e., CF training data & basic user-item interactions, original multi-modal data including images and text of items, encoded visual\u002Ftextual features and LLM-augmented text\u002Fembeddings).  🌹 We hope to contribute to our community and facilitate your research 🚀🚀 ~\n\n- `netflix`: [Google Drive Netflix](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1BGKm3nO4xzhyi_mpKJWcfxgi3sQ2j_Ec?usp=drive_link).  [🌟(Image&Text)](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1euAnMYD1JBPflx0M86O2M9OsbBSfrzPK\u002Fview?usp=drive_link)\n\n\n\n\u003Ch3> Encoding the Multi-modal Content. 
\u003C\u002Fh3>\n\nWe use [CLIP-ViT](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fclip-vit-base-patch32) and [Sentence-BERT](https:\u002F\u002Fwww.sbert.net\u002F) separately as encoders for visual side information and textual side information.\n\n\n\n\n-----------\n\n\n\n\u003Ch2> Prompt & Completion Example \u003C\u002Fh2>\n\u003Ch4> LLM-based Implicit Feedback Augmentation \u003C\u002Fh4>\n\n> Prompt \n>> Recommend user with movies based on user history  that each movie with title, year, genre. History: [332] Heart and Souls (1993), Comedy|Fantasy [364] Men with Brooms(2002), Comedy|Drama|Romance Candidate: [121]The Vampire Lovers (1970), Horror [155] Billabong Odyssey (2003),Documentary [248]The Invisible Guest 2016, Crime, Drama, Mystery   Output index of user's favorite and dislike movie from candidate.Please just give the index in [].\n\n> Completion\n>> 248   121\n\n\u003Ch4> LLM-based User Profile Augmentation \u003C\u002Fh4>\n\n> Prompt \n>> Generate user profile based on the history of user, that each movie with title, year, genre. History: [332] Heart and Souls (1993), Comedy|Fantasy [364] Men with Brooms (2002), Comedy|Drama|Romance  Please output the following infomation of user, output format: {age: , gender: , liked genre: , disliked genre: , liked directors: , country: , language: }\n\n> Completion\n>> {age: 50, gender: female, liked genre: Comedy|Fantasy, Comedy|Drama|Romance, disliked genre: Thriller, Horror, liked directors: Ron Underwood, country: Canada, United States, language: English}\n\n\n\u003Ch4> LLM-based Item Attributes Augmentation \u003C\u002Fh4>\n\n> Prompt \n>> Provide the inquired information of the given movie. [332] Heart and Souls (1993), Comedy|Fantasy The inquired information is: director, country, language. 
And please output them in form of: director, country, language \n\n> Completion\n>> Ron Underwood, USA, English\n\n\n\n\u003Ch2> Augmented Data \u003C\u002Fh2>\n\n\u003Ch4> Augmented Implicit Feedback (Edge) \u003C\u002Fh4>\nFor each user, 0 represents a positive sample, and 1 represents a negative sample.\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_6f8c188e38aa.png\" alt=\"Image 2\" style=\"width:150px;height:310px;\">\n\u003C!--     \u003Cfigcaption>Textual data in original 'Netflix Prize Data'.\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n\n\u003Ch4> Augmented User Profile (User Node) \u003C\u002Fh4>\nFor each user, the dictionary stores augmented information such as 'age,' 'gender,' 'liked genre,' 'disliked genre,' 'liked directors,' 'country,' and 'language.'\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_1ba7045e0eac.png\" alt=\"Image 2\" style=\"width:900px;height:700px;\">\n\u003C!--     \u003Cfigcaption>Textual data in original 'Netflix Prize Data'.\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n\n##### Augmented item attribute\nFor each item, the dictionary stores augmented information such as 'director,' 'country,' and 'language.'\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_2a520eb35f70.png\" alt=\"Image 2\" style=\"width:500px;height:660px;\">\n\u003C!--     \u003Cfigcaption>Textual data in original 'Netflix Prize Data'.\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n\n\n\n\u003Ch2> Candidate Preparing for LLM-based Implicit Feedback Augmentation\u003C\u002Fh2>\n\n step 1: select base model such as MMSSL or LATTICE\n \n step 2: obtain user embedding and item embedding\n \n step 3: generate candidate\n```\n  
    _, candidate_indices = torch.topk(torch.mm(G_ua_embeddings, G_ia_embeddings.T), k=10)  \n      pickle.dump(candidate_indices.cpu(), open('.\u002Fdata\u002F' + args.datasets +  '\u002Fcandidate_indices','wb'))\n```\nExample of specific candidate data.\n```\nIn [3]: candidate_indices\nOut[3]: \ntensor([[ 9765,  2930,  6646,  ..., 11513, 12747, 13503],\n        [ 3665,  8999,  2587,  ...,  1559,  2975,  3759],\n        [ 2266,  8999,  1559,  ...,  8639,   465,  8287],\n        ...,\n        [11905, 10195,  8063,  ..., 12945, 12568, 10428],\n        [ 9063,  6736,  6938,  ...,  5526, 12747, 11110],\n        [ 9584,  4163,  4154,  ...,  2266,   543,  7610]])\n\nIn [4]: candidate_indices.shape\nOut[4]: torch.Size([13187, 10])\n```\n\n\n\n\n\n-----------\n\n\u003Ch1> Citing \u003C\u002Fh1>\n\nIf you find this work helpful to your research, please kindly consider citing our paper.\n\n\n```\n@article{wei2023llmrec,\n  title={LLMRec: Large Language Models with Graph Augmentation for Recommendation},\n  author={Wei, Wei and Ren, Xubin and Tang, Jiabin and Wang, Qinyong and Su, Lixin and Cheng, Suqi and Wang, Junfeng and Yin, Dawei and Huang, Chao},\n  journal={arXiv preprint arXiv:2311.00423},\n  year={2023}\n}\n```\n\n\n## Acknowledgement\n\nThe structure of this code is largely based on [MMSSL](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FMMSSL), [LATTICE](https:\u002F\u002Fgithub.com\u002FCRIPAC-DIG\u002FLATTICE), [MICRO](https:\u002F\u002Fgithub.com\u002FCRIPAC-DIG\u002FMICRO). 
Thank them for their work.\n\n","# LLMRec：用于推荐的大语言模型与图增强\n\n\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_5f4430796c02.png' \u002F>\n\n这是针对 WSDM 2024 论文 [LLMRec：用于推荐的大语言模型与图增强](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.00423.pdf) 的 PyTorch 实现。\n\n\n\n[Wei Wei](#), [Xubin Ren](https:\u002F\u002Frxubin.com\u002F), [Jiabin Tang](https:\u002F\u002Ftjb-tech.github.io\u002F), [Qingyong Wang](#), [Lixin Su](#), [Suqi Cheng](#), [Junfeng Wang](#), [Dawei Yin](https:\u002F\u002Fwww.yindawei.com\u002F) 和 [Chao Huang](https:\u002F\u002Fsites.google.com\u002Fview\u002Fchaoh\u002Fhome)*。\n(*通讯作者)\n\n**[数据智能实验室](https:\u002F\u002Fsites.google.com\u002Fview\u002Fchaoh\u002Fhome)@[香港大学](https:\u002F\u002Fwww.hku.hk\u002F)**, 百度公司。\n\n\u003Ca href='https:\u002F\u002Fllmrec.github.io\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fllmrec.github.io\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-Page-purple'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.00423.pdf'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-PDF-orange'>\u003C\u002Fa> \n[![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUC1wKlPPlP9zKGYk62yR0K_g)\n\n\n本仓库托管了 **LLMRec** 的代码、原始数据以及增强后的数据。\n\n-----------\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_0e37cd5ce8e5.png\" alt=\"LLMRec\" \u002F>\n\u003C\u002Fp>\n\nLLMRec 是一种新颖的框架，通过将三种简单而有效的基于大语言模型的图增强策略应用于推荐系统，从而提升推荐效果。LLMRec 充分利用在线平台（如 Netflix、MovieLens）中的内容，从自然语言的角度出发，通过 i) 强化用户-物品交互边，ii) 增强物品节点属性，iii) 进行用户节点画像，来对交互图进行增强。\n\n-----------\n\n## 🎉 新闻 📢📢  \n\n- [x] [2024.3.20] 🚀🚀 📢📢📢📢🌹🔥🔥🚀🚀 由于基线 `LATTICE` 和 `MMSSL` 需要一些小的修改，我们提供了只需修改数据集路径即可轻松运行的代码。\n\n- [x] [2023.11.3] 🚀🚀 发布了用于构建提示词的脚本。\n\n- [x] [2023.11.1] 🔥🔥 
发布了多模态数据集（Netflix、MovieLens），包括文本数据和视觉数据。\n\n- [x] [2023.11.1] 🚀🚀 发布了由 gpt-3.5-turbo-0613 增强的文本数据，以及由 text-embedding-ada-002 增强的嵌入数据。\n\n- [x] [2023.10.28] 🔥🔥 我们的 LLMRec 完整论文已在 [LLMRec：用于推荐的大语言模型与图增强](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.00423.pdf) 中发布。\n\n- [x] [2023.10.28] 🚀🚀 发布了 LLMRec 的代码。\n\n\n## 👉 待办事项 \n\n- [ ] 提供不同规模的数据集版本。\n- [ ] ...\n\n\n-----------\n\n\u003Ch2> 依赖项 \u003C\u002Fh2>\n\n```\npip install -r requirements.txt\n```\n\n\n\u003Ch2> 使用方法 \u003C\u002Fh2>\n\n\u003Ch4> 第一阶段：基于大语言模型的数据增强 \u003C\u002Fh4>\n\n```\ncd LLMRec\u002FLLM_augmentation\u002F\npython .\u002Fgpt_ui_aug.py\npython .\u002Fgpt_user_profiling.py\npython .\u002Fgpt_i_attribute_generate_aug.py\n```\n\n\n\n\n\u003Ch4> 第二阶段：使用 LLM 增强数据进行推荐训练 \u003C\u002Fh4>\n\n```\ncd LLMRec\u002F\npython .\u002Fmain.py --dataset {DATASET}\n```\n支持的数据集：`netflix`, `movielens`\n\n以 'netflix' 数据集为例的具体代码执行：\n```\n# LLMRec\npython .\u002Fmain.py\n\n# 不使用 u-i\npython .\u002Fmain.py --aug_sample_rate=0.0\n\n# 不使用 u\npython .\u002Fmain.py --user_cat_rate=0\n\n# 不使用 u&i\npython .\u002Fmain.py --user_cat_rate=0  --item_cat_rate=0\n\n# 不进行剪枝\npython .\u002Fmain.py --prune_loss_drop_rate=0\n```\n\n\n\n\n\n-----------\n\n\n\u003Ch2> 数据集 \u003C\u002Fh2>\n\n  ```\n  ├─ LLMRec\u002F \n      ├── data\u002F\n        ├── netflix\u002F\n        ...\n  ```\n\n\u003Ch3> 多模态数据集 \u003C\u002Fh3>\n🌹🌹 如果您使用 'netflix' 数据集，请引用我们的论文~ ❤️  \n\n我们基于在 [Kaggle](https:\u002F\u002Fwww.kaggle.com\u002F) 网站上发布的原始 [Netflix Prize Data](https:\u002F\u002Fwww.kaggle.com\u002Fdatasets\u002Fnetflix-inc\u002Fnetflix-prize-data) 收集了一个多模态数据集。该数据格式可以直接兼容最先进的多模态推荐模型，如 [LLMRec](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec)、[MMSSL](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FMMSSL)、[LATTICE](https:\u002F\u002Fgithub.com\u002FCRIPAC-DIG\u002FLATTICE)、[MICRO](https:\u002F\u002Fgithub.com\u002FCRIPAC-DIG\u002FMICRO) 等，无需任何额外的数据预处理。\n\n `文本模态：` 我们在 \"item_attribute.csv\" 文件中发布了从原始数据集中整理出的物品信息。此外，我们还将经过大语言模型增强的文本信息整合到了 
\"augmented_item_attribute_agg.csv\" 文件中。（以下三张图片分别展示了 (1) Kaggle 网站上关于 Netflix 的描述，(2) 原始 Netflix Prize Data 中的文本信息，以及 (3) 经过大语言模型增强的文本信息。）\n\u003Cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\">\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n   \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_e2e2c463972f.png\" alt=\"Image 1\" style=\"width:270px;height:180px;\">\n\u003C!--     \u003Cfigcaption>Kaggle 上原始 'Netflix Prize Data' 中的文本数据。\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_bcdcd21b8993.png\" alt=\"Image 2\" style=\"width:270px;height:180px;\">\n\u003C!--     \u003Cfigcaption>原始 'Netflix Prize Data' 中的文本数据。\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_dd3fedc2a856.png\" alt=\"Image 2\" style=\"width:270px;height:180px;\">\n\u003C!--     \u003Cfigcaption>经大语言模型增强的文本数据。\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>  \n\u003C\u002Fdiv>\n \n `视觉模态：` 我们在 \"Netflix_Posters\" 文件夹中发布了通过网络爬取获得的视觉信息。（下图展示了根据 Netflix Prize Data 中的物品信息通过网络爬取得到的海报。）\n \u003Cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\">\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n   \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_4a5887d76ad0.png\" alt=\"Image 1\" style=\"width:690px;height:590px;\">\n\u003C!--     \u003Cfigcaption>Kaggle 上原始 'Netflix Prize Data' 中的文本数据。\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\u003C\u002Fdiv>\n \n\n\u003Ch3> 原始多模态数据集与增强数据集 \u003C\u002Fh3>\n \u003Cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\">\n  \u003Cfigure style=\"text-align: 
center; margin: 10px;\">\n   \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_f41a85957a36.png\" alt=\"Image 1\" style=\"width:480px;height:270px;\">\n\u003C!--     \u003Cfigcaption>Kaggle 上原始 'Netflix Prize Data' 中的文本数据。\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\u003C\u002Fdiv>\n\n\n\u003Cbr>\n\u003Cp>\n\n\u003Ch3> 下载 Netflix 数据集。 \u003C\u002Fh3>\n🚀🚀\n我们提供了处理后的数据（即协同过滤训练数据及基本的用户-物品交互信息、包含物品图像和文本的原始多模态数据、编码后的视觉\u002F文本特征，以及经大语言模型增强的文本\u002F嵌入）。  🌹 我们希望为社区做出贡献，并促进您的研究 🚀🚀 ~\n\n- `netflix`: [Google Drive Netflix](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1BGKm3nO4xzhyi_mpKJWcfxgi3sQ2j_Ec?usp=drive_link).  [🌟(图像&文本)](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1euAnMYD1JBPflx0M86O2M9OsbBSfrzPK\u002Fview?usp=drive_link)\n\n\n\n\u003Ch3> 对多模态内容进行编码。 \u003C\u002Fh3>\n\n我们分别使用[CLIP-ViT](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fclip-vit-base-patch32)和[Sentence-BERT](https:\u002F\u002Fwww.sbert.net\u002F)作为视觉侧信息和文本侧信息的编码器。\n\n\n\n\n-----------\n\n\n\n\u003Ch2> 提示与完成示例 \u003C\u002Fh2>\n\u003Ch4> 基于LLM的隐式反馈增强 \u003C\u002Fh4>\n\n> 提示 \n>> 根据用户的观影历史，为用户推荐电影，每部电影需包含片名、年份和类型。历史：[332]《心灵奇旅》（1993），喜剧|奇幻 [364]《扫帚男》（2002），喜剧|剧情|浪漫 候选：[121]《吸血鬼情人》（1970），恐怖 [155]《水塘历险记》（2003），纪录片 [248]《看不见的客人》（2016），犯罪、剧情、悬疑 请从候选中给出用户喜欢和不喜欢的电影索引，仅以[]形式输出。\n\n> 完成\n>> [248] [121]\n\n\u003Ch4> 基于LLM的用户画像增强 \u003C\u002Fh4>\n\n> 提示 \n>> 根据用户的观影历史，生成用户画像，每部电影需包含片名、年份和类型。历史：[332]《心灵奇旅》（1993），喜剧|奇幻 [364]《扫帚男》（2002），喜剧|剧情|浪漫 请输出以下用户信息，输出格式为：{age: , gender: , liked genre: , disliked genre: , liked directors: , country: , language: }\n\n> 完成\n>> {age: 50, gender: female, liked genre: 喜剧|奇幻, 喜剧|剧情|浪漫, disliked genre: 惊悚, 恐怖, liked directors: Ron Underwood, country: 加拿大, 美国, language: 英语}\n\n\n\u003Ch4> 基于LLM的物品属性增强 \u003C\u002Fh4>\n\n> 提示 \n>> 提供给定电影的相关信息。[332]《心灵奇旅》（1993），喜剧|奇幻 需查询的信息为：导演、国家、语言。请以“导演，国家，语言”的形式输出。\n\n> 完成\n>> Ron Underwood, 美国, 英语\n\n\n\n\u003Ch2> 增强数据 \u003C\u002Fh2>\n\n\u003Ch4> 增强的隐式反馈（边） 
\u003C\u002Fh4>\n对于每个用户，0代表正样本，1代表负样本。\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_6f8c188e38aa.png\" alt=\"Image 2\" style=\"width:150px;height:310px;\">\n\u003C!--     \u003Cfigcaption>原始'Netflix奖数据'中的文本数据。\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n\n\u003Ch4> 增强的用户画像（用户节点） \u003C\u002Fh4>\n对于每个用户，字典存储了增强后的信息，如‘年龄’、‘性别’、‘喜欢的类型’、‘不喜欢的类型’、‘喜欢的导演’、‘国家’和‘语言’。\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_1ba7045e0eac.png\" alt=\"Image 2\" style=\"width:900px;height:700px;\">\n\u003C!--     \u003Cfigcaption>原始'Netflix奖数据'中的文本数据。\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n\n##### 增强的物品属性\n对于每个物品，字典存储了增强后的信息，如‘导演’、‘国家’和‘语言’。\n  \u003Cfigure style=\"text-align: center; margin: 10px;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_readme_2a520eb35f70.png\" alt=\"Image 2\" style=\"width:500px;height:660px;\">\n\u003C!--     \u003Cfigcaption>原始'Netflix奖数据'中的文本数据。\u003C\u002Ffigcaption> -->\n  \u003C\u002Ffigure>\n\n\n\n\n\u003Ch2> 用于基于LLM的隐式反馈增强的候选准备 \u003C\u002Fh2>\n\n步骤1：选择基础模型，如MMSSL或LATTICE\n\n步骤2：获取用户嵌入和物品嵌入\n\n步骤3：生成候选\n```\n      _, candidate_indices = torch.topk(torch.mm(G_ua_embeddings, G_ia_embeddings.T), k=10)  \n      pickle.dump(candidate_indices.cpu(), open('.\u002Fdata\u002F' + args.datasets +  '\u002Fcandidate_indices','wb'))\n```\n具体候选数据示例。\n```\nIn [3]: candidate_indices\nOut[3]: \ntensor([[ 9765,  2930,  6646,  ..., 11513, 12747, 13503],\n        [ 3665,  8999,  2587,  ...,  1559,  2975,  3759],\n        [ 2266,  8999,  1559,  ...,  8639,   465,  8287],\n        ...,\n        [11905, 10195,  8063,  ..., 12945, 12568, 10428],\n        [ 9063,  6736,  6938,  ...,  5526, 12747, 11110],\n        [ 9584,  4163,  4154,  ...,  2266,   543,  7610]])\n\nIn [4]: 
candidate_indices.shape\nOut[4]: torch.Size([13187, 10])\n```\n\n\n\n\n\n-----------\n\n\u003Ch1> 引用 \u003C\u002Fh1>\n\n如果您认为本工作对您的研究有所帮助，请考虑引用我们的论文。\n\n\n```\n@article{wei2023llmrec,\n  title={LLMRec: Large Language Models with Graph Augmentation for Recommendation},\n  author={Wei, Wei and Ren, Xubin and Tang, Jiabin and Wang, Qinyong and Su, Lixin and Cheng, Suqi and Wang, Junfeng and Yin, Dawei and Huang, Chao},\n  journal={arXiv preprint arXiv:2311.00423},\n  year={2023}\n}\n```\n\n\n\n\n## 致谢\n\n本代码的结构主要基于[MMSSL](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FMMSSL)、[LATTICE](https:\u002F\u002Fgithub.com\u002FCRIPAC-DIG\u002FLATTICE)、[MICRO](https:\u002F\u002Fgithub.com\u002FCRIPAC-DIG\u002FMICRO)。感谢他们的工作。","# LLMRec 快速上手指南\n\nLLMRec 是一个利用大语言模型（LLM）进行图增强的推荐系统框架。它通过三种策略增强交互图：强化用户 - 物品交互边、增强物品节点属性以及构建用户画像。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Linux \u002F macOS \u002F Windows (推荐 Linux)\n- **Python**: 3.8 或更高版本\n- **GPU**: 推荐使用支持 CUDA 的 NVIDIA GPU 以加速训练和推理\n\n### 前置依赖\n本项目基于 PyTorch 构建，主要依赖如下：\n- PyTorch\n- Transformers (用于调用 LLM 或编码器)\n- Scikit-learn, Pandas, Numpy 等数据处理库\n- CLIP-ViT 和 Sentence-BERT (用于多模态编码，代码中会自动处理或需手动下载模型权重)\n\n> **注意**：若需复现完整的 LLM 数据增强阶段（Stage 1），您需要拥有 OpenAI API Key 或其他兼容的大模型接口权限。若仅使用已提供的增强数据进行训练（Stage 2），则无需 API Key。\n\n## 安装步骤\n\n1. **克隆仓库**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec.git\n   cd LLMRec\n   ```\n\n2. **安装依赖**\n   建议使用虚拟环境（如 conda 或 venv），然后安装 requirements.txt 中的依赖。\n   \n   *国内用户加速建议*：使用清华或阿里镜像源加速 pip 安装。\n   ```bash\n   pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n3. 
**数据准备**\n   项目支持 `netflix` 和 `movielens` 数据集。\n   - **自动下载**：部分处理后的数据可能包含在代码逻辑中。\n   - **手动下载**：如需完整的多模态数据（图片、文本）及 LLM 增强后的数据，请访问 [Google Drive 链接](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1BGKm3nO4xzhyi_mpKJWcfxgi3sQ2j_Ec?usp=drive_link) 下载 `netflix` 数据集，并将其解压至 `LLMRec\u002Fdata\u002Fnetflix\u002F` 目录下。\n   \n   目录结构应如下：\n   ```text\n   LLMRec\u002F\n   ├── data\u002F\n   │   ├── netflix\u002F\n   │   │   ├── ... (原始数据及增强数据文件)\n   ```\n\n## 基本使用\n\nLLMRec 的使用分为两个阶段：**LLM 数据增强**（可选，若已有增强数据可跳过）和 **推荐模型训练**。\n\n### 阶段 1：基于 LLM 的数据增强（可选）\n如果您需要从头生成增强数据（需要配置 API Key），请运行以下脚本。\n*注意：此步骤耗时较长且产生 API 费用，大多数用户可直接使用官方提供的增强数据跳过此步。*\n\n```bash\ncd LLMRec\u002FLLM_augmentation\u002F\n# 设置您的 API KEY (以 OpenAI 为例)\nexport OPENAI_API_KEY=\"your_api_key_here\"\n\n# 执行增强脚本\npython .\u002Fgpt_ui_aug.py\npython .\u002Fgpt_user_profiling.py\npython .\u002Fgpt_i_attribute_generate_aug.py\n```\n\n### 阶段 2：推荐模型训练\n进入主目录，使用增强后的数据训练推荐模型。\n\n**基础命令：**\n```bash\ncd LLMRec\u002F\npython .\u002Fmain.py --dataset netflix\n```\n*注：支持的 dataset 参数为 `netflix` 或 `movielens`。*\n\n**消融实验示例（控制增强策略）：**\n您可以通过参数关闭特定的增强模块以进行对比实验：\n\n```bash\n# 完整模型 (LLMRec)\npython .\u002Fmain.py --dataset netflix\n\n# 关闭用户 - 物品交互边增强 (w\u002Fo-u-i)\npython .\u002Fmain.py --dataset netflix --aug_sample_rate=0.0\n\n# 关闭用户画像增强 (w\u002Fo-u)\npython .\u002Fmain.py --dataset netflix --user_cat_rate=0\n\n# 同时关闭用户画像和物品属性增强 (w\u002Fo-u&i)\npython .\u002Fmain.py --dataset netflix --user_cat_rate=0 --item_cat_rate=0\n\n# 关闭剪枝损失 (w\u002Fo-prune)\npython .\u002Fmain.py --dataset netflix --prune_loss_drop_rate=0\n```\n\n运行结束后，模型将在默认路径下保存检查点，并在终端输出评估指标（如 Recall, NDCG 等）。","某流媒体平台的数据科学团队正致力于优化其电影推荐系统，试图解决传统模型在冷启动和用户兴趣理解上的瓶颈。\n\n### 没有 LLMRec 时\n- **交互数据稀疏**：新用户或冷门电影缺乏足够的点击历史，导致基于协同过滤的算法无法建立有效的用户 - 物品连接，推荐结果随机且不准。\n- **内容理解浅层**：系统仅依赖有限的标签（如“动作”、“喜剧”）描述电影，无法捕捉剧情深度、情感基调等细粒度语义特征。\n- **用户画像模糊**：用户行为被简化为 ID 序列，缺乏自然语言层面的兴趣总结，难以区分“喜欢科幻特效”与“喜欢科幻哲学”的本质差异。\n- **多模态融合困难**：虽然拥有海报和简介文本，但传统模型难以将这些非结构化数据有效融入图神经网络进行联合推理。\n\n### 使用 
LLMRec 后\n- **交互图增强**：LLMRec 利用大模型生成潜在的“用户 - 电影”互动边，即使在没有实际点击的情况下，也能基于语义相似性补全稀疏图谱，显著缓解冷启动问题。\n- **属性深度扩充**：通过 LLM 自动重写和扩充电影节点属性，将简短简介转化为包含主题、风格、情感色彩的丰富文本描述，提升了物品表征的区分度。\n- **精细化用户建模**：工具自动分析用户历史行为并生成自然语言画像（如“偏爱高智商犯罪剧”），使推荐系统能从语义层面精准匹配用户深层需求。\n- **无缝多模态整合**：LLMRec 将生成的文本增强数据直接嵌入图结构，让推荐模型能同时利用视觉特征和深层语义信息，大幅提升了排序准确率。\n\nLLMRec 通过将大语言模型的语义理解能力注入推荐图谱，成功将稀疏的行为数据转化为丰富的语义连接，实现了从“猜你可能喜欢”到“懂你为何喜欢”的跨越。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_LLMRec_5f443079.png","HKUDS","✨Data Intelligence Lab@HKU✨","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FHKUDS_fc32cc87.jpg",null,"https:\u002F\u002Fsites.google.com\u002Fview\u002Fchaoh","https:\u002F\u002Fgithub.com\u002FHKUDS",[83],{"name":84,"color":85,"percentage":86},"Python","#3572A5",100,520,66,"2026-04-13T09:08:46","Apache-2.0","未说明","需要 NVIDIA GPU（用于运行 CLIP-ViT 和 Sentence-BERT 编码器及推荐模型训练），具体型号和显存大小未说明，需支持 CUDA",{"notes":94,"python":91,"dependencies":95},"1. 项目依赖 requirements.txt 文件安装环境，但 README 未列出具体包列表。\n2. 第一阶段数据增强需调用 OpenAI API (gpt-3.5-turbo-0613, text-embedding-ada-002)，需准备 API Key。\n3. 视觉编码使用 CLIP-ViT (openai\u002Fclip-vit-base-patch32)，文本编码使用 Sentence-BERT。\n4. 支持的数据集为 Netflix 和 MovieLens，需手动下载预处理后的数据。",[96,97,98],"torch","transformers (隐含，用于 CLIP-ViT)","sentence-transformers (隐含，用于 Sentence-BERT)",[16],[101,102,103,104,105,106,107,108],"content-based-recommendation","data-augmentation-strategies","graph-augmentation","recommendation-system","recommendation-with-side-information","multi-modal-recommendation","colloborative-filtering","graph-learning","2026-03-27T02:49:30.150509","2026-04-15T06:06:36.885444",[112,117,122,127,132,137],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},33672,"运行代码时遇到连接错误（NewConnectionError）或找不到百度内部 URL，如何解决？","该项目原本使用的是百度内部的 API 地址和 Token，外部用户无法直接访问。若要使用 ChatGPT 或其他公开 LLM，请执行以下操作：\n1. 注释掉代码中的 `openai.api_base = \"http:\u002F\u002Fllms-se.baidu-int.com:8200\"` 这一行。\n2. 添加 `openai.api_key = \"你的 OpenAI Key\"`。\n3. 
在请求头中将 Authorization 设置为 `\"Bearer 你的 OpenAI Key\"`。\n注意：注册 OpenAI 账号需要合法的国际信用卡和手机号，如需虚拟手机号可参考相关网站（如 sms-activate.org）。","https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec\u002Fissues\u002F10",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},33673,"运行 main.py 时报错 FileNotFoundError，提示找不到 logs 目录或文件，怎么办？","这是因为程序试图写入日志文件，但本地的 `.\u002Flogs` 目录不存在。解决方法很简单：在项目根目录下手动创建一个名为 `logs` 的新文件夹即可。","https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec\u002Fissues\u002F22",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},33674,"安装 requirements.txt 时出现版本不匹配错误（如 anaconda-client），应该使用什么 Python 版本？","项目作者使用的 Python 版本是 3.9.13。requirements.txt 文件是基于该环境生成的，建议将您的虚拟环境切换到 Python 3.9.x 版本（例如 3.9.13）后重新尝试安装依赖，以避免版本兼容性问题。","https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec\u002Fissues\u002F13",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},33675,"运行 gpt_ui_aug.py 时报错找不到 'candidate_indices' 文件，该如何解决？","该文件包含在项目所需的数据集中。请访问作者提供的 Google Drive 链接（https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1BGKm3nO4xzhyi_mpKJWcfxgi3sQ2j_Ec），下载完整的 Netflix 数据集文件，并确保 `candidate_indices` 文件已放置在代码读取的正确路径下。","https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec\u002Fissues\u002F11",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},33676,"项目使用了哪些数据集？在哪里可以下载处理好的 MovieLens 或 Netflix 数据？","Netflix 数据集由作者基于 Kaggle 上的 Netflix Prize Data 自行处理（包含文本整理和图片爬取），已发布在项目的数据下载链接中。MovieLens 数据集使用的是 ml-10m 版本，您可以直接从 GroupLens 官网下载：https:\u002F\u002Ffiles.grouplens.org\u002Fdatasets\u002Fmovielens\u002Fml-10m-README.html。","https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec\u002Fissues\u002F8",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},33677,"执行 pip install -r requirements.txt 时出现 OSError 或路径不存在的错误，如何处理？","这通常是因为 requirements.txt 中包含了特定环境下的本地路径引用。建议忽略该文件中的路径报错，直接查看各个 .py 文件头部注释中列出的具体依赖包名称，然后使用 `pip install \u003C包名>` 单独安装这些必要的包。","https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLLMRec\u002Fissues\u002F6",[]]