[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-hidasib--GRU4Rec":3,"tool-hidasib--GRU4Rec":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,2,"2026-04-10T11:13:16",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[19,14,18],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},5773,"cs-video-courses","Developer-Y\u002Fcs-video-courses","cs-video-courses 是一个精心整理的计算机科学视频课程清单，旨在为自学者提供系统化的学习路径。它汇集了全球知名高校（如加州大学伯克利分校、新南威尔士大学等）的完整课程录像，涵盖从编程基础、数据结构与算法，到操作系统、分布式系统、数据库等核心领域，并深入延伸至人工智能、机器学习、量子计算及区块链等前沿方向。\n\n面对网络上零散且质量参差不齐的教学资源，cs-video-courses 
解决了学习者难以找到成体系、高难度大学级别课程的痛点。该项目严格筛选内容，仅收录真正的大学层级课程，排除了碎片化的简短教程或商业广告，确保用户能接触到严谨的学术内容。\n\n这份清单特别适合希望夯实计算机基础的开发者、需要补充特定领域知识的研究人员，以及渴望像在校生一样系统学习计算机科学的自学者。其独特的技术亮点在于分类极其详尽，不仅包含传统的软件工程与网络安全，还细分了生成式 AI、大语言模型、计算生物学等新兴学科，并直接链接至官方视频播放列表，让用户能一站式获取高质量的教育资源，免费享受世界顶尖大学的课堂体验。",79792,"2026-04-08T22:03:59",[18,13,14,20],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[17,13,20,19,18],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":46,"last_commit_at":55,"category_tags":56,"status":22},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",75832,"2026-04-17T21:58:25",[19,13,20,18],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":29,"last_commit_at":63,"category_tags":64,"status":22},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 
等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,"2026-04-03T21:50:24",[20,18],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":79,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":90,"forks":91,"last_commit_at":92,"license":93,"difficulty_score":94,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":107,"github_topics":79,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":22,"created_at":108,"updated_at":109,"faqs":110,"releases":139},8923,"hidasib\u002FGRU4Rec","GRU4Rec","GRU4Rec is the original Theano implementation of the algorithm in \"Session-based Recommendations with Recurrent Neural Networks\" paper, published at ICLR 2016 and its follow-up \"Recurrent Neural Networks with Top-k Gains for Session-based Recommendations\". 
The code is optimized for execution on the GPU.","GRU4Rec 是一款专为“基于会话的推荐系统”设计的开源算法实现，旨在解决用户在未登录或无历史数据时，如何仅凭当前浏览序列精准预测其下一步兴趣的难题。它源自两篇发表于 ICLR 的重要学术论文，利用循环神经网络（RNN）中的门控循环单元（GRU）来捕捉用户行为的时间动态特征。\n\n该工具特别适合从事推荐系统研究的研究人员以及需要构建高性能原型的专业开发者使用。其核心亮点在于极致的运行效率：代码基于 Theano 框架深度优化，专为 GPU 加速设计，在 GTX 1080Ti 上每秒可处理高达 1500 个迷你批次，且 97.5% 的计算时间均在 GPU 上完成。官方特别强调，虽然社区存在 PyTorch 或 TensorFlow 版本，但未经严格验证的非官方复现可能导致推荐准确率大幅下降或训练时间显著延长，因此建议优先使用经过验证的官方实现以确保结果的可复现性。无论是进行学术实验还是探索序列感知模型，GRU4Rec 都提供了一个高效、可靠的基准方案。","# GRU4Rec\r\n\r\nThis is the original Theano implementation of the algorithm of the paper [\"Session-based Recommendations With Recurrent Neural Networks\"](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06939 \"Session-based Recommendations With Recurrent Neural Networks\"), with the extensions introduced in the paper [\"Recurrent Neural Networks with Top-k Gains for Session-based Recommendations\"](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03847 \"Recurrent Neural Networks with Top-k Gains for Session-based Recommendations\").\r\n\r\nMake sure to always use the latest version as a baseline and cite both papers when you do so!\r\n\r\nThe code was optimized for fast execution on the GPU (up to 1500 mini-batches per second on a GTX 1080Ti). According to the Theano profiler, training spends 97.5% of the time on the GPU (0.5% on CPU and 2% moving data between the two). Running on the CPU is not supported, but it is possible with some modifications to the code.\r\n\r\nIf you are afraid of using Theano, the following official reimplementations are also available.  \r\n- [Official **PyTorch** version of GRU4Rec](https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec_PyTorch_Official)  \r\n- [Official **Tensorflow** version of GRU4Rec](https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec_Tensorflow_Official)  \r\n\r\n*NOTE:* These have been validated against the original, but due to how more modern deep learning frameworks operate, they are 1.5-4x slower than this version. 
Other reimplementations might be available in the future, depending on the research community's interest level.  \r\n**IMPORTANT!** Avoid using unofficial reimplementations. We thoroughly examined 6 third party reimplementations (PyTorch\u002FTensorflow, standalone\u002Fframework) in [\"The Effect of Third Party Implementations on Reproducibility\"](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14956) and all of them were flawed and\u002For missed important features, which resulted in up to **99% lower recommendation accuracy** and up to **335 times longer training times**. Other reimplementations we have found since then are no better.\r\n\r\nYou can train and evaluate the model on your own session data easily using `run.py`. Usage information below.\r\n\r\nScroll down for information on reproducing results on public datasets and hyperparameter tuning!\r\n\r\n**LICENSE:** See [license.txt](license.txt) for details. Main guidelines: for research and education purposes the code is and always will be free to use. Using the code or parts of it in commercial systems requires a licence. 
If you've been using the code or any of its derivatives in a commercial system, contact me!\r\n\r\n**CONTENTS:**  \r\n[Requirements](#requirements \"Requirements\")  \r\n  [Theano configuration](#theano-configuration \"Theano configuration\")  \r\n[Usage](#usage \"Usage\")  \r\n  [Execute experiments using `run.py`](#execute-experiments-using-runpy \"Execute experiments using run.py\")  \r\n    [Examples](#examples \"Examples\")  \r\n  [Using GRU4Rec in code or the interpreter](#using-gru4rec-in-code-or-the-interpreter \"Using GRU4Rec in code or the interpreter\")  \r\n  [Notes on sequence-aware and session-based models](#notes-on-sequence-aware-and-session-based-models \"Notes on sequence-aware and session-based models\")  \r\n  [Notes on parameter settings](#notes-on-parameter-settings \"Notes on parameter settings\")  \r\n[Speed of training](#speed-of-training \"Speed of training\")  \r\n[Reproducing results on public datasets](#reproducing-results-on-public-datasets \"Reproducing results on public datasets\")  \r\n[Hyperparameter tuning](#hyperparameter-tuning \"Hyperparameter tuning\")  \r\n[Executing on CPU](#executing-on-cpu \"Executing on CPU\")  \r\n[Major updates](#major-updates \"Major updates\")  \r\n\r\n\r\n## Requirements\r\n\r\n- **python** --> Use python `3.6.3` or newer. The code was mostly tested on `3.6.3`, `3.7.6` and `3.8.12`, but was briefly tested on other versions. Python 2 is NOT supported.\r\n- **numpy** --> `1.16.4` or newer.\r\n- **pandas** --> `0.24.2` or newer.\r\n- **CUDA** --> Needed for the GPU support of Theano. The latest CUDA version Theano was tested with (to the best of my knowledge) is `9.2`. It works fine with more recent versions, e.g. `11.8`.\r\n- **libgpuarray** --> Required for the GPU support of Theano, use the latest version.\r\n- **theano** --> `1.0.5` (last stable release) or newer (occasionally it is still updated with minor stuff). 
GPU support should be installed.\r\n- **optuna** --> (optional) for hyperparameter optimization, code was tested with `3.0.3`\r\n\r\n**IMPORTANT: cuDNN** --> More recent versions produce a warning, but `8.2.1` still works for me. GRU4Rec doesn't rely heavily on the part of Theano that utilizes cuDNN. Unfortunately, `cudnnReduceTensor` in cuDNN `v7` and newer is seriously bugged, which makes operators based on this function slow and even occasionally unstable (incorrect computations or segfault) when cuDNN is used (e.g. [see here](https:\u002F\u002Fgithub.com\u002FTheano\u002FTheano\u002Fissues\u002F6432)). Therefore it is best not to use cuDNN. If you already have it installed, you can easily configure Theano to exclude cuDNN based operators (see below).  \r\n**This bug is not related to Theano and can be reproduced from CUDA\u002FC++. Unfortunately it hasn't been fixed for more than 6 years.*\r\n\r\n### Theano configuration\r\n\r\nThis code was optimized for GPU execution. Executing the code will fail if you try to run it on CPU (if you really want to mess with it, check out the relevant section of this readme). Therefore the Theano configuration must be set to use the GPU. If you use `run.py` for running experiments, the code sets this configuration for you. You might want to change some of the preset configuration (e.g. execute on a specified GPU instead of the one with the lowest Id). You can do this in the `THEANO_FLAGS` environment variable or edit `.theanorc_gru4rec`.\r\n\r\nIf you don't use `run.py`, it is possible that the preset config won't have any effect (this happens if theano is imported before `gru4rec` either directly or by another module). In this case, you must set your own config by either editing your `.theanorc` or setting up the `THEANO_FLAGS` environment variable. 
Please refer to the [documentation of Theano](http:\u002F\u002Fdeeplearning.net\u002Fsoftware\u002Ftheano\u002Flibrary\u002Fconfig.html).\r\n\r\n**Important config parameters**\r\n- `device` --> must always be a CUDA capable GPU (e.g. `cuda0`).\r\n- `floatX` --> must always be `float32`\r\n- `mode` --> should be `FAST_RUN` for fast execution\r\n- `optimizer_excluding` --> should be `local_dnn_reduction:local_cudnn_maxandargmax:local_dnn_argmax` to tell Theano not to use cuDNN based operators, because its `cudnnReduceTensor` function has been bugged since `v7`\r\n\r\n## Usage\r\n\r\n### Execute experiments using `run.py`\r\n`run.py` is an easy way to train, evaluate and save\u002Fload GRU4Rec models.\r\n\r\nExecute with the `-h` argument to take a look at the parameters.\r\n```\r\n$ python run.py -h\r\n```\r\nOutput:\r\n```\r\nusage: run.py [-h] [-ps PARAM_STRING] [-pf PARAM_PATH] [-l] [-s MODEL_PATH] [-t TEST_PATH [TEST_PATH ...]] [-m AT [AT ...]] [-e EVAL_TYPE] [-ss SS] [--sample_store_on_cpu] [-g GRFILE] [-d D] [-ik IK] [-sk SK] [-tk TK]\r\n              [-pm METRIC] [-lpm]\r\n              PATH\r\n\r\nTrain or load a GRU4Rec model & measure recall and MRR on the specified test set(s).\r\n\r\npositional arguments:\r\n  PATH                  Path to the training data (TAB separated file (.tsv or .txt) or pickled pandas.DataFrame object (.pickle)) (if the --load_model parameter is NOT provided) or to the serialized model (if the\r\n                        --load_model parameter is provided).\r\n\r\noptional arguments:\r\n  -h, --help            show this help message and exit\r\n  -ps PARAM_STRING, --parameter_string PARAM_STRING\r\n                        Training parameters provided as a single parameter string. The format of the string is `param_name1=param_value1,param_name2=param_value2...`, e.g.: `loss=bpr-\r\n                        max,layers=100,constrained_embedding=True`. 
Boolean training parameters should be either True or False; parameters that can take a list should use \u002F as the separator (e.g. layers=200\u002F200).\r\n                        Mutually exclusive with the -pf (--parameter_file) and the -l (--load_model) arguments and one of the three must be provided.\r\n  -pf PARAM_PATH, --parameter_file PARAM_PATH\r\n                        Alternatively, training parameters can be set using a config file specified in this argument. The config file must contain a single OrderedDict named `gru4rec_params`. The parameters must have\r\n                        the appropriate type (e.g. layers = [100]). Mutually exclusive with the -ps (--parameter_string) and the -l (--load_model) arguments and one of the three must be provided.\r\n  -l, --load_model      Load an already trained model instead of training a model. Mutually exclusive with the -ps (--parameter_string) and the -pf (--parameter_file) arguments and one of the three must be provided.\r\n  -s MODEL_PATH, --save_model MODEL_PATH\r\n                        Save the trained model to the MODEL_PATH. (Default: don't save model)\r\n  -t TEST_PATH [TEST_PATH ...], --test TEST_PATH [TEST_PATH ...]\r\n                        Path to the test data set(s) located at TEST_PATH. Multiple test sets can be provided (separate with spaces). (Default: don't evaluate the model)\r\n  -m AT [AT ...], --measure AT [AT ...]\r\n                        Measure recall & MRR at the defined recommendation list length(s). Multiple values can be provided. (Default: 20)\r\n  -e EVAL_TYPE, --eval_type EVAL_TYPE\r\n                        Sets how to handle if multiple items in the ranked list have the same prediction score (which is usually due to saturation or an error). See the documentation of evaluate_gpu() in evaluation.py\r\n                        for further details. 
(Default: standard)\r\n  -ss SS, --sample_store_size SS\r\n                        GRU4Rec uses a buffer for negative samples during training to maximize GPU utilization. This parameter sets the buffer length. Lower values require more frequent recomputation, higher values\r\n                        use more (GPU) memory. Unless you know what you are doing, you shouldn't mess with this parameter. (Default: 10000000)\r\n  --sample_store_on_cpu\r\n                        If provided, the sample store will be stored in the RAM instead of the GPU memory. This is not advised in most cases, because it significantly lowers the GPU utilization. This option is\r\n                        provided if for some reason you want to train the model on the CPU (NOT advised). Note that you need to make modifications to the code so that it is able to run on CPU.\r\n  -g GRFILE, --gru4rec_model GRFILE\r\n                        Name of the file containing the GRU4Rec class. Can be used to select different variants. (Default: gru4rec)\r\n  -ik IK, --item_key IK\r\n                        Column name corresponding to the item IDs (default: ItemId).\r\n  -sk SK, --session_key SK\r\n                        Column name corresponding to the session IDs (default: SessionId).\r\n  -tk TK, --time_key TK\r\n                        Column name corresponding to the timestamp (default: Time).\r\n  -pm METRIC, --primary_metric METRIC\r\n                        Set primary metric, recall or mrr (e.g. for paropt). (Default: recall)\r\n  -lpm, --log_primary_metric\r\n                        If provided, evaluation will log the value of the primary metric at the end of the run. 
Only works with one test file and list length.\r\n```\r\n\r\n#### Examples\r\n\r\nTrain, save and evaluate a model measuring recall and MRR at 1, 5, 10 and 20 using model parameters from a parameter string.\r\n```\r\n$ THEANO_FLAGS=device=cuda0 python run.py \u002Fpath\u002Fto\u002Ftraining_data_file -t \u002Fpath\u002Fto\u002Ftest_data_file -m 1 5 10 20 -ps layers=224,batch_size=80,dropout_p_embed=0.5,dropout_p_hidden=0.05,learning_rate=0.05,momentum=0.4,n_sample=2048,sample_alpha=0.4,bpreg=1.95,logq=0.0,loss=bpr-max,constrained_embedding=True,final_act=elu-0.5,n_epochs=10 -s \u002Fpath\u002Fto\u002Fsave_model.pickle\r\n```\r\nOutput (on the RetailRocket dataset):\r\n```\r\nUsing cuDNN version 8201 on context None\r\nMapped name None to device cuda0: NVIDIA A30 (0000:3B:00.0)\r\nCreating GRU4Rec model\r\nSET   layers                  TO   [224]     (type: \u003Cclass 'list'>)\r\nSET   batch_size              TO   80        (type: \u003Cclass 'int'>)\r\nSET   dropout_p_embed         TO   0.5       (type: \u003Cclass 'float'>)\r\nSET   dropout_p_hidden        TO   0.05      (type: \u003Cclass 'float'>)\r\nSET   learning_rate           TO   0.05      (type: \u003Cclass 'float'>)\r\nSET   momentum                TO   0.4       (type: \u003Cclass 'float'>)\r\nSET   n_sample                TO   2048      (type: \u003Cclass 'int'>)\r\nSET   sample_alpha            TO   0.4       (type: \u003Cclass 'float'>)\r\nSET   bpreg                   TO   1.95      (type: \u003Cclass 'float'>)\r\nSET   logq                    TO   0.0       (type: \u003Cclass 'float'>)\r\nSET   loss                    TO   bpr-max   (type: \u003Cclass 'str'>)\r\nSET   constrained_embedding   TO   True      (type: \u003Cclass 'bool'>)\r\nSET   final_act               TO   elu-0.5   (type: \u003Cclass 'str'>)\r\nSET   n_epochs                TO   10        (type: \u003Cclass 'int'>)\r\nLoading training data...\r\nLoading data from TAB separated file: 
\u002Fpath\u002Fto\u002Ftraining_data_file\r\nStarted training\r\nThe dataframe is already sorted by SessionId, Time\r\nCreated sample store with 4882 batches of samples (type=GPU)\r\nEpoch1 --> loss: 0.484484       (6.81s)         [1026.65 mb\u002Fs | 81386 e\u002Fs]\r\nEpoch2 --> loss: 0.381974       (6.89s)         [1015.39 mb\u002Fs | 80493 e\u002Fs]\r\nEpoch3 --> loss: 0.353932       (6.81s)         [1027.68 mb\u002Fs | 81468 e\u002Fs]\r\nEpoch4 --> loss: 0.340034       (6.80s)         [1028.90 mb\u002Fs | 81564 e\u002Fs]\r\nEpoch5 --> loss: 0.330763       (6.80s)         [1028.19 mb\u002Fs | 81508 e\u002Fs]\r\nEpoch6 --> loss: 0.324075       (6.80s)         [1029.36 mb\u002Fs | 81601 e\u002Fs]\r\nEpoch7 --> loss: 0.319033       (6.85s)         [1022.03 mb\u002Fs | 81020 e\u002Fs]\r\nEpoch8 --> loss: 0.314915       (6.80s)         [1029.05 mb\u002Fs | 81577 e\u002Fs]\r\nEpoch9 --> loss: 0.311716       (6.82s)         [1025.44 mb\u002Fs | 81290 e\u002Fs]\r\nEpoch10 --> loss: 0.308915      (6.82s)         [1025.64 mb\u002Fs | 81306 e\u002Fs]\r\nTotal training time: 77.73s\r\nSaving trained model to: \u002Fpath\u002Fto\u002Fsave_model.pickle\r\nLoading test data...\r\nLoading data from TAB separated file: \u002Fpath\u002Fto\u002Ftest_data_file\r\nStarting evaluation (cut-off=[1, 5, 10, 20], using standard mode for tiebreaking)\r\nMeasuring Recall@1,5,10,20 and MRR@1,5,10,20\r\nEvaluation took 4.34s\r\nRecall@1: 0.128055 MRR@1: 0.128055\r\nRecall@5: 0.322165 MRR@5: 0.197492\r\nRecall@20: 0.518184 MRR@20: 0.217481\r\n```\r\n\r\nTrain on `cuda0` using parameters from a parameter file and save the model.\r\n```\r\n$ THEANO_FLAGS=device=cuda0 python run.py \u002Fpath\u002Fto\u002Ftraining_data_file -pf \u002Fpath\u002Fto\u002Fparameter_file.py -s \u002Fpath\u002Fto\u002Fsave_model.pickle\r\n```\r\nOutput (on the RetailRocket dataset):\r\n```\r\nUsing cuDNN version 8201 on context None\r\nMapped name None to device cuda0: NVIDIA A30 (0000:3B:00.0)\r\nCreating GRU4Rec 
model\r\nSET   layers                  TO   [224]     (type: \u003Cclass 'list'>)\r\nSET   batch_size              TO   80        (type: \u003Cclass 'int'>)\r\nSET   dropout_p_embed         TO   0.5       (type: \u003Cclass 'float'>)\r\nSET   dropout_p_hidden        TO   0.05      (type: \u003Cclass 'float'>)\r\nSET   learning_rate           TO   0.05      (type: \u003Cclass 'float'>)\r\nSET   momentum                TO   0.4       (type: \u003Cclass 'float'>)\r\nSET   n_sample                TO   2048      (type: \u003Cclass 'int'>)\r\nSET   sample_alpha            TO   0.4       (type: \u003Cclass 'float'>)\r\nSET   bpreg                   TO   1.95      (type: \u003Cclass 'float'>)\r\nSET   logq                    TO   0.0       (type: \u003Cclass 'float'>)\r\nSET   loss                    TO   bpr-max   (type: \u003Cclass 'str'>)\r\nSET   constrained_embedding   TO   True      (type: \u003Cclass 'bool'>)\r\nSET   final_act               TO   elu-0.5   (type: \u003Cclass 'str'>)\r\nSET   n_epochs                TO   10        (type: \u003Cclass 'int'>)\r\nLoading training data...\r\nLoading data from TAB separated file: \u002Fpath\u002Fto\u002Ftraining_data_file\r\nStarted training\r\nThe dataframe is already sorted by SessionId, Time\r\nCreated sample store with 4882 batches of samples (type=GPU)\r\nEpoch1 --> loss: 0.484484       (6.81s)         [1026.65 mb\u002Fs | 81386 e\u002Fs]\r\nEpoch2 --> loss: 0.381974       (6.89s)         [1015.39 mb\u002Fs | 80493 e\u002Fs]\r\nEpoch3 --> loss: 0.353932       (6.81s)         [1027.68 mb\u002Fs | 81468 e\u002Fs]\r\nEpoch4 --> loss: 0.340034       (6.80s)         [1028.90 mb\u002Fs | 81564 e\u002Fs]\r\nEpoch5 --> loss: 0.330763       (6.80s)         [1028.19 mb\u002Fs | 81508 e\u002Fs]\r\nEpoch6 --> loss: 0.324075       (6.80s)         [1029.36 mb\u002Fs | 81601 e\u002Fs]\r\nEpoch7 --> loss: 0.319033       (6.85s)         [1022.03 mb\u002Fs | 81020 e\u002Fs]\r\nEpoch8 --> loss: 0.314915       (6.80s)         [1029.05 
mb\u002Fs | 81577 e\u002Fs]\r\nEpoch9 --> loss: 0.311716       (6.82s)         [1025.44 mb\u002Fs | 81290 e\u002Fs]\r\nEpoch10 --> loss: 0.308915      (6.82s)         [1025.64 mb\u002Fs | 81306 e\u002Fs]\r\nTotal training time: 77.73s\r\nSaving trained model to: \u002Fpath\u002Fto\u002Fsave_model.pickle\r\n```\r\n\r\nLoad a previously trained model to `cuda1` and evaluate it measuring recall and MRR at 1, 5, 10 and 20 using the conservative method for tiebreaking.\r\n```\r\n$ THEANO_FLAGS=device=cuda1 python run.py \u002Fpath\u002Fto\u002Fpreviously_saved_model.pickle -l -t \u002Fpath\u002Fto\u002Ftest_data_file -m 1 5 10 20 -e conservative\r\n```\r\nOutput (on the RetailRocket dataset):\r\n```\r\nUsing cuDNN version 8201 on context None\r\nMapped name None to device cuda1: NVIDIA A30 (0000:AF:00.0)\r\nLoading trained model from file: \u002Fpath\u002Fto\u002Fpreviously_saved_model.pickle\r\nLoading test data...\r\nLoading data from TAB separated file: \u002Fpath\u002Fto\u002Ftest_data_file\r\nStarting evaluation (cut-off=[1, 5, 10, 20], using standard mode for tiebreaking)\r\nMeasuring Recall@1,5,10,20 and MRR@1,5,10,20\r\nEvaluation took 4.34s\r\nRecall@1: 0.128055 MRR@1: 0.128055\r\nRecall@5: 0.322165 MRR@5: 0.197492\r\nRecall@20: 0.518184 MRR@20: 0.217481\r\n```\r\n\r\n### Using GRU4Rec in code or the interpreter\r\nYou can simply import the `gru4rec` module in your code or in an interpreter and use the `GRU4Rec` class to create and train models. The trained models can be evaluated by importing the `evaluation` module and using either the `evaluate_gpu` or the `evaluate_session_batch` method. The latter is deprecated and doesn't fully utilize the GPU and is therefore significantly slower. 
The public version of this code is mainly for running experiments (training and evaluating the algorithm on different datasets), therefore retrieving the actual predictions can be cumbersome and inefficient.\r\n\r\n**IMPORTANT!** For the sake of convenience, the `gru4rec` module sets some important Theano parameters so that you don't have to worry about them if you are not familiar with Theano. But this only has an effect if `gru4rec` is imported *BEFORE* Theano (and any module that imports Theano) is imported. (Because once Theano is initialized, most of its configuration can't be changed. And even if Theano is reimported, the GPU is not reinitialized.) If you do it the other way around, you should set your default `.theanorc` or provide the `THEANO_FLAGS` environment variable with the appropriate configuration.\r\n\r\n### Notes on sequence-aware and session-based models\r\nGRU4Rec is originally for session-based recommendations, where the generally short sessions are considered independent. Every time a user comes to the site, they are considered to be unknown, i.e. nothing of their history is used, even if it is known. (This setup is great for many real-life applications.) This means that when the model is evaluated, the hidden state starts from zero for each test session.\r\n\r\nHowever, RNN (CNN, Transformer, etc.) based models are also a great fit for the practically less important sequence-aware personalized recommendation setup (i.e. the whole user history is used as a sequence to predict future items in the sequence). There are two main differences: \r\n- (1) The sequences are significantly longer in sequence-aware recommendations. This also means that BPTT (backpropagation through time) is useful in this scenario. For session-based recommendations, experiments suggest that BPTT doesn't improve the model.\r\n- (2) Evaluation in the sequence-aware setup should be started from the last value of the hidden state (i.e. 
the value computed on the training portion of the user history).\r\n\r\nCurrently, neither of these is supported in the public code. These functionalities might be added later if there is enough interest from the community (they exist in some of my internal research repos). At the moment, you have to extend the code yourself to do this.\r\n\r\n### Notes on parameter settings\r\nGRU4Rec has many parameters (and private versions had even more throughout the years). While you are welcome to play around with them, I found that it is usually best to leave the following parameters on their default value.\r\n| Parameter | Defaults to | Comment |\r\n|---|:---:|---|\r\n| `hidden_act` | `tanh` | The activation function of the hidden layer should be tanh. |\r\n| `lmbd` | `0.0` | L2 regularization is not needed, use dropout for regularization. |\r\n| `smoothing` | `0.0` | Label smoothing for cross-entropy loss only has a minor impact on performance. |\r\n| `adapt` | `adagrad` | Optimizers perform similarly, with adagrad being slightly better than the others. |\r\n| `adapt_params` | `[]` | Adagrad has no hyperparameters, therefore this is an empty list. |\r\n| `grad_cap` | `0.0` | Training works fine without gradient capping\u002Fclipping. 
|\r\n| `sigma` | `0.0` | Setting the min\u002Fmax value during weight initialization is not needed, `±sqrt(6.0\u002F(dim[0] + dim[1]))` is used when this is 0. |\r\n| `init_as_normal` | `False` | Weights should be initialized from a uniform distribution. |\r\n| `train_random_order` | `False` | Training sessions should not be shuffled so that the last updates are based on recent data. |\r\n| `time_sort` | `True` | Training sessions should be sorted in ascending order by the timestamp of their first event (oldest session first) so that the last updates are based on recent data. |\r\n\r\n**Losses and final activations:**\r\n- Among the five loss options (`cross-entropy`, `bpr-max`, `top1`, `bpr`, `top1-max`) **only `cross-entropy` or `bpr-max` should be used**. The others work to some extent, but they suffer from the vanishing gradient problem and thus produce models inferior to the ones trained with `bpr-max` or `cross-entropy`.\r\n- `cross-entropy` has an alternative formulation `xe_logit` that requires a different final activation to be set.\r\n- Always use the final activation (`final_act` parameter) appropriate for the loss. For `loss=cross-entropy` this is always `final_act=softmax`, for `loss=xe_logit` always use `final_act=softmax_logit`, for `loss=bpr-max` you can use either `final_act=linear`, `final_act=relu`, `final_act=tanh`, `final_act=leaky-\u003CX>`, `final_act=elu-\u003CX>` or `final_act=selu-\u003CX>-\u003CY>` (I usually prefer `elu-0.5` or `elu-1`).\r\n\r\n**Embedding modes:** As it is described in the papers, there are three embedding modes that you can set as follows.\r\n| Embedding mode | How to set | Description |\r\n|-|-|-|\r\n| No embedding | `embedding=0` AND `constrained_embedding=False` | The one-hot vector of the item ID is directly fed to the GRU layer. 
|\r\n| Separate embedding | `embedding=X` where `X>0` or `X=layersize` AND `constrained_embedding=False` | Separate embedding on the input of the GRU layers and for computing the score with the sequence embedding. `embedding=layersize` sets it to the same dimensionality as the number of units in the first GRU layer. |\r\n| Shared embedding | `constrained_embedding=True` | Usually the best performing setting. Uses the same embedding on the input of the GRU layers and for computing the scores with the sequence embedding. This enforces the dimensionality of the embedding to be equal to the size of the last GRU layer, thus the `embedding` parameter has no effect in this mode. |\r\n\r\n\r\n## Speed of training\r\nThis is the fastest version (by far). The speed of the official PyTorch and Tensorflow implementations is capped by the overhead introduced by the respective DL frameworks.\r\n\r\nTime to complete one epoch (in seconds) on publicly available datasets with the best parameterization (see below), measured on an nVidia A30. The Theano version is 1.7-3 times faster than the PyTorch or Tensorflow versions.\r\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhidasib_GRU4Rec_readme_d073e817c686.png)\r\n\r\n*Details:* The short version is that Theano requires you to build a computational graph, from which a Theano function then needs to be created that performs the computations described by the graph. During the creation of the function, the code is compiled into a single C++\u002FCUDA executable (or sometimes several) which is executed every time you call the function from Python. If you don't use any Python based operators, control doesn't need to be given back to Python, which significantly lowers the overhead. The published version of GRU4Rec works on ID based representations and thus a single minibatch usually can't max out the GPU. 
Therefore, the overhead of passing control between C++/CUDA and Python (as in PyTorch and Tensorflow) can significantly increase training times. This is why the difference is smaller when the layer and/or minibatch size is larger. But optimal performance sometimes requires smaller minibatches.

The training time mostly depends on the number of events and the model parameters. The following parameters affect the processing speed of events (event/s):
- `batch_size` --> The processing speed of batches (mb/s) decreases much more slowly than the batch size increases, therefore event processing speeds up as `batch_size` increases. Unfortunately, `batch_size` also affects model accuracy, and smaller batches are usually better for most datasets.
- `n_sample` --> The number of negative samples doesn't affect processing speed up to about 500-1000 samples (depending on the hardware). The default is `n_sample=2048`, but if the number of items is low, it might be lowered without loss of accuracy.
- `loss` --> `cross-entropy` is somewhat faster than `bpr-max`
- `dropout_p_embed`, `dropout_p_hidden`, `momentum` --> setting any of these to a value other than 0 makes training a little slower

The following figures show the difference in training speed (minibatch/second & event/second; higher is better) for various minibatch and layer sizes, with and without dropout and momentum enabled, using `n_sample=2048`.
Measured on an nVidia A30.

With `cross-entropy` loss:

![image](https://oss.gittoolsai.com/images/hidasib_GRU4Rec_readme_f62c6089bb6b.png)

![image](https://oss.gittoolsai.com/images/hidasib_GRU4Rec_readme_8e6f44dd25b3.png)

With `bpr-max` loss:

![image](https://oss.gittoolsai.com/images/hidasib_GRU4Rec_readme_f2aba60d2a3b.png)

![image](https://oss.gittoolsai.com/images/hidasib_GRU4Rec_readme_49d53aac9516.png)


## Reproducing results on public datasets
The performance of GRU4Rec has been measured on multiple public datasets in [1,2,3,4]: Yoochoose/RSC15, Rees46, Coveo, RetailRocket and Diginetica.

*IMPORTANT:* Measuring the performance of sequential recommenders makes sense only if the data (and the task) itself shows sequential patterns, e.g. session based data. Evaluation on rating data doesn't give informative results. See [4] for details, as well as for other common flaws in the evaluation of sequential recommenders.

**Notes:**  
- Always aim to include at least one realistically large dataset in your comparison (e.g. Rees46 is a good example).
- The evaluation setup is described in detail in [1,2,3]. It is a next-item prediction type evaluation that considers only the next item as relevant for a given inference. This is a good setup for behaviour prediction and correlates somewhat with online performance. It is a stricter setup than considering any of the subsequent items as relevant, which - while a perfectly reasonable setup - is more forgiving towards simplistic (e.g. counting based) methods. However, similarly to any other offline evaluation, it is not a direct approximation of online performance.

**Getting the data:** Please refer to the original source of the data to obtain a full and legal copy. Links here are provided as best effort.
It is not guaranteed that they won't break over time.
- [Yoochoose/RSC15](https://2015.recsyschallenge.com) (or the [reupload on Kaggle](https://www.kaggle.com/datasets/chadgostopp/recsys-challenge-2015))
- [Rees46](https://www.kaggle.com/datasets/mkechinov/ecommerce-behavior-data-from-multi-category-store)
- [Coveo](https://github.com/coveooss/shopper-intent-prediction-nature-2020)
- [RetailRocket](https://www.kaggle.com/datasets/retailrocket/ecommerce-dataset)
- [Diginetica](https://competitions.codalab.org/competitions/11161#learn_the_details-data2)

**Preprocessing:**  
The details and the reasoning behind the preprocessing steps can be found in [1,2] for RSC15 and in [3] for Yoochoose, Rees46, Coveo, RetailRocket and Diginetica. The preprocessing script for RSC15 can be found [here](https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py), and in [the repo corresponding to [3]](https://github.com/hidasib/gru4rec_third_party_comparison) for Yoochoose, Rees46, Coveo, RetailRocket and Diginetica.
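The cleaning logic shared by these scripts follows a common pattern from the papers: infrequent items and overly short sessions are dropped repeatedly until both constraints hold, since removing rare items can shorten sessions and removing short sessions can reduce item support. A minimal, illustrative sketch of this loop (the thresholds are placeholders, not the values used by the linked scripts):

```python
from collections import Counter

def clean_events(events, min_item_support=5, min_session_length=2):
    """Iteratively filter session data given as (session_id, item_id, timestamp) tuples.

    The two filters are repeated until a full pass removes nothing, because each
    filter can re-violate the other's constraint. Thresholds are illustrative only;
    the real scripts define the exact values per dataset.
    """
    while True:
        before = len(events)
        support = Counter(item for _, item, _ in events)
        events = [e for e in events if support[e[1]] >= min_item_support]
        length = Counter(sess for sess, _, _ in events)
        events = [e for e in events if length[e[0]] >= min_session_length]
        if len(events) == before:  # nothing removed -> both constraints hold
            return events
```

The real scripts also perform dataset specific steps (e.g. deduplication for Yoochoose but not for RSC15) and the time based splits, which is why the resulting statistics should always be verified against the papers.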
After running the scripts, double check that the statistics of the resulting sets match what is reported in the papers.

The preprocessing scripts yield 4 files per dataset:
- `train_full` --> full training set, used for training the model for the final evaluation 
- `test` --> test set for the final evaluation of the model (the pair of `train_full`)
- `train_tr` --> training set for hyperparameter optimization and experimentation
- `train_valid` --> validation set for hyperparameter optimization and experimentation (the pair of `train_tr`)

Basically, the full preprocessed dataset is split into `train_full` and `test`, then `train_full` is split into `train_tr` and `train_valid` using the same logic.

*IMPORTANT:* Note that while RSC15 and Yoochoose are derived from the same source (the Yoochoose dataset), the preprocessing is different. The main difference is that RSC15 doesn't use deduplication. Therefore, results on the two datasets are not comparable and the optimal hyperparameters might differ. It is recommended to use the Yoochoose version and rely on the RSC15 version only when comparing to previously reported results, if the experiment can't be reproduced for some reason (e.g.
the implementation of the method is not available).

[1] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, Domonkos Tikk: [Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939), ICLR 2016  
[2] Balázs Hidasi, Alexandros Karatzoglou: [Recurrent Neural Networks with Top-k Gains for Session-based Recommendations](https://arxiv.org/abs/1706.03847), CIKM 2018  
[3] Balázs Hidasi, Ádám Czapp: [The Effect of Third Party Implementations on Reproducibility](https://arxiv.org/abs/2307.14956), RecSys 2023  
[4] Balázs Hidasi, Ádám Czapp: [Widespread Flaws in Offline Evaluation of Recommender Systems](https://arxiv.org/abs/2307.14951), RecSys 2023

**Hyperparameters:**  
Hyperparameters for RSC15 were obtained using a local (star) search optimizer that restarts whenever a better parameterization is found. It used a smaller parameter space than what is included in this repo (e.g. the hidden layer size was fixed to 100), so there is probably room for some small improvement with the new Optuna based optimizer.

Hyperparameters for Yoochoose, Rees46, Coveo, RetailRocket and Diginetica were obtained using the parameter spaces uploaded to this repo. 200 runs were executed per dataset, per embedding mode (no embedding, separate embedding, shared embedding) and per loss function (cross-entropy, bpr-max). The primary metric was MRR@20, which usually also gave the best results wrt. recall@20. A separate training/validation set was used during parameter optimization that was created from the full training set the same way as the (full) training/test split was created from the full dataset.
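In this next-item setup, recall and MRR reduce to simple statistics over the rank of the single relevant item. A small illustrative reimplementation of the two metrics (not the GPU code in `evaluation.py`):

```python
def recall_and_mrr(ranks, cutoff=20):
    """Next-item evaluation: each test event has exactly one relevant item.

    ranks holds the 1-based rank of the true next item for each test event.
    Recall@N is the fraction of events where that item is ranked within the
    top N; MRR@N averages the reciprocal rank, counting ranks beyond N as 0.
    """
    hits = [r for r in ranks if r <= cutoff]
    recall = len(hits) / len(ranks)
    mrr = sum(1.0 / r for r in hits) / len(ranks)
    return recall, mrr
```

For example, ranks `[1, 3, 25, 2]` at cutoff 20 give Recall@20 = 0.75 and MRR@20 = (1 + 1/3 + 1/2) / 4 ≈ 0.458.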
Final results are measured on the test set with models trained on the full training set.

**Best hyperparameters:**  
*Note:* Parameter files (usable with the `-pf` argument of `run.py`) are [included](https://github.com/hidasib/GRU4Rec/tree/master/paramfiles) in this repo for convenience.

| Dataset | loss | constrained_embedding | embedding | elu_param | layers | batch_size | dropout_p_embed | dropout_p_hidden | learning_rate | momentum | n_sample | sample_alpha | bpreg | logq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RSC15 | cross-entropy | True | 0 | 0 | 100 | 32 | 0.1 | 0 | 0.1 | 0 | 2048 | 0.75 | 0 | 1 |
| Yoochoose | cross-entropy | True | 0 | 0 | 480 | 48 | 0 | 0.2 | 0.07 | 0 | 2048 | 0.2 | 0 | 1 |
| Rees46 | cross-entropy | True | 0 | 0 | 512 | 240 | 0.45 | 0 | 0.065 | 0 | 2048 | 0.5 | 0 | 1 |
| Coveo | bpr-max | True | 0 | 1 | 512 | 144 | 0.35 | 0 | 0.05 | 0.4 | 2048 | 0.2 | 1.85 | 0 |
| RetailRocket | bpr-max | True | 0 | 0.5 | 224 | 80 | 0.5 | 0.05 | 0.05 | 0.4 | 2048 | 0.4 | 1.95 | 0 |
| Diginetica | bpr-max | True | 0 | 1 | 512 | 128 | 0.5 | 0.3 | 0.05 | 0.15 | 2048 | 0.3 | 0.9 | 0 |

**Results:**  
*Note:* Due to changes in the order of the execution of operations on the GPU, some slight variation (even up to a few percent) in the metrics is expected and acceptable.

| Dataset | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|
| RSC15 | 0.1845 | 0.1845 | 0.4906 | 0.2954 | 0.6218 | 0.3130 | 0.7283 | 0.3205 |
| Yoochoose | 0.1829 | 0.1829 | 0.4478 | 0.2783 | 0.5715 | 0.2949 | 0.6789 | 0.3024 |
| Rees46 | 0.1114 | 0.1114 | 0.3010 | 0.1778 | 0.4135 | 0.1928 | 0.5293 | 0.2008 |
| Coveo | 0.0513 | 0.0513 | 0.1496 | 0.0852 | 0.2212 | 0.0946 | 0.3135 | 0.1010 |
| RetailRocket | 0.1274 | 0.1274 | 0.3237 | 0.1977 | 0.4207 | 0.2107 | 0.5186 | 0.2175 |
| Diginetica | 0.0725 | 0.0725 | 0.2369 | 0.1288 | 0.3542 | 0.1442 | 0.4995 | 0.1542 |

## Hyperparameter tuning
Hyperparameter optimization on new datasets is supported by `paropt.py`. Internally it uses [Optuna](https://optuna.org/) and requires a defined parameter space. A few predefined parameter spaces are [included](https://github.com/hidasib/GRU4Rec/tree/master/paramspaces) in this repo.

**Recommendations:**
- Run between 100 and 200 iterations with the included parameter spaces.
- Run separate optimizations when using different losses and embedding modes: no embedding (i.e. `embedding=0,constrained_embedding=False`), separate embedding (i.e. `embedding=layersize,constrained_embedding=False`) and shared embedding (i.e. `embedding=0,constrained_embedding=True`).

**Fixed parameters:** You can play around with these as well in the optimizer, however the following fixed settings have worked well in the past.  
- `logq` --> The cross-entropy loss usually works best with `logq=1`; the parameter has no effect when the BPR-max loss is used.
- `n_sample` --> Based on experience, `n_sample=2048` is large enough to get good performance up to a few million items, and not so large that it significantly degrades the speed of training. However, you might want to lower this if the total number of active items is below 5-10K.
- `n_epochs` --> This is usually set to `n_epochs=10`, but `5` gets you similar performance in most cases.
So far there hasn't been any reason to significantly increase the number of epochs.
- embedding mode --> A full paropt needs to check all three options separately, but in the past, shared embedding (`constrained_embedding=True` and `embedding=0`) worked best for most datasets.
- `loss` --> A full paropt needs to check both separately, but past experience indicates that BPR-max performs better on smaller and cross-entropy on larger datasets.
- `final_act` --> Always use the final activation appropriate for the loss, e.g. `final_act=softmax` when `loss=cross-entropy` and either `elu-<x>`, `linear` or `relu` when `loss=bpr-max`.

**Usage:**
```
$ python paropt.py -h
```

Output:
```
usage: paropt.py [-h] [-g GRFILE] [-tf [FLAGS]] [-fp PARAM_STRING] [-opf PATH] [-m [AT]] [-nt [NT]] [-fm [AT [AT ...]]] [-pm METRIC] [-e EVAL_TYPE] [-ik IK] [-sk SK] [-tk TK] PATH TEST_PATH

Train or load a GRU4Rec model & measure recall and MRR on the specified test set(s).

positional arguments:
  PATH                  Path to the training data (TAB separated file (.tsv or .txt) or pickled pandas.DataFrame object (.pickle)) (if the --load_model parameter is NOT provided) or to the serialized model (if the --load_model parameter
                        is provided).
  TEST_PATH             Path to the test data set(s) located at TEST_PATH.

optional arguments:
  -h, --help            show this help message and exit
  -g GRFILE, --gru4rec_model GRFILE
                        Name of the file containing the GRU4Rec class. Can be used to select different variants. (Default: gru4rec)
  -tf [FLAGS], --theano_flags [FLAGS]
                        Theano settings.
  -fp PARAM_STRING, --fixed_parameters PARAM_STRING
                        Fixed training parameters provided as a single parameter string.
                        The format of the string is `param_name1=param_value1,param_name2=param_value2...`, e.g.: `loss=bpr-max,layers=100,constrained_embedding=True`.
                        Boolean training parameters should be either True or False; parameters that can take a list should use / as the separator (e.g. layers=200/200). Mutually exclusive with the -pf (--parameter_file) and the -l
                        (--load_model) arguments and one of the three must be provided.
  -opf PATH, --optuna_parameter_file PATH
                        File describing the parameter space for optuna.
  -m [AT], --measure [AT]
                        Measure recall & MRR at the defined recommendation list length. A single value can be provided. (Default: 20)
  -nt [NT], --ntrials [NT]
                        Number of optimization trials to perform (Default: 50)
  -fm [AT [AT ...]], --final_measure [AT [AT ...]]
                        Measure recall & MRR at the defined recommendation list length(s) after the optimization is finished. Multiple values can be provided. (Default: 20)
  -pm METRIC, --primary_metric METRIC
                        Set primary metric, recall or mrr (e.g. for paropt). (Default: recall)
  -e EVAL_TYPE, --eval_type EVAL_TYPE
                        Sets how to handle the case when multiple items in the ranked list have the same prediction score (which is usually due to saturation or an error). See the documentation of evaluate_gpu() in evaluation.py for further
                        details.
                        (Default: standard)
  -ik IK, --item_key IK
                        Column name corresponding to the item IDs (default: ItemId).
  -sk SK, --session_key SK
                        Column name corresponding to the session IDs (default: SessionId).
  -tk TK, --time_key TK
                        Column name corresponding to the timestamp (default: Time).
```

**Example:** Run a hyperparameter optimization optimizing MRR@20 for 200 iterations, measuring recall and MRR at 1, 5, 10 and 20 for the best variant after the optimization is finished.
*NOTE:* The paropt script itself can run on the CPU (`THEANO_FLAGS=device=cpu`), as the models are trained in separate processes. You can control which device these training processes use with the `-tf` argument, whose value is passed to the `THEANO_FLAGS` environment variable of the training processes. In this example, training(s) are executed on `cuda0`.
```
THEANO_FLAGS=device=cpu python paropt.py /path/to/training_data_file_for_optimization /path/to/validation_data_file_for_optimization -pm mrr -m 20 -fm 1 5 10 20 -e conservative -fp n_sample=2048,logq=1.0,loss=cross-entropy,final_act=softmax,constrained_embedding=True,n_epochs=10 -tf device=cuda0 -opf /path/to/parameter_space.json -nt 200
```
Output (first few lines):
```
--------------------------------------------------------------------------------
PARAMETER SPACE
        PARAMETER layers         type=int        range=[64..512] (step=32)       UNIFORM scale
        PARAMETER batch_size     type=int        range=[32..256] (step=16)       UNIFORM scale
        PARAMETER learning_rate          type=float      range=[0.01..0.25] (step=0.005)         UNIFORM scale
        PARAMETER dropout_p_embed        type=float      range=[0.0..0.5] (step=0.05)    UNIFORM scale
        PARAMETER dropout_p_hidden       type=float      range=[0.0..0.7] (step=0.05)    UNIFORM scale
        PARAMETER momentum       type=float      range=[0.0..0.9] (step=0.05)    UNIFORM scale
        PARAMETER sample_alpha   type=float      range=[0.0..1.0] (step=0.1)     UNIFORM scale
--------------------------------------------------------------------------------
[I 2023-07-25 03:19:53,684] A new study created in memory with name: no-name-83fade3e-49f3-4f26-ac76-5f6cb2f3a02c
SET   n_sample                TO   2048                   (type: <class 'int'>)
SET   logq                    TO   1.0                    (type: <class 'float'>)
SET   loss                    TO   cross-entropy          (type: <class 'str'>)
SET   final_act               TO   softmax                (type: <class 'str'>)
SET   constrained_embedding   TO   True                   (type: <class 'bool'>)
SET   n_epochs                TO   2                      (type: <class 'int'>)
SET   layers                  TO   [96]                   (type: <class 'list'>)
SET   batch_size              TO   176                    (type: <class 'int'>)
SET   learning_rate           TO   0.045000000000000005   (type: <class 'float'>)
SET   dropout_p_embed         TO   0.25                   (type: <class 'float'>)
SET   dropout_p_hidden        TO   0.25                   (type: <class 'float'>)
SET   momentum                TO   0.0                    (type: <class 'float'>)
SET   sample_alpha            TO   0.9                    (type: <class 'float'>)
Loading training data...
```

**Notes:** 
- By default, Optuna logs to stderr and the model prints to stdout. You can use this to log the model training details and the summary of the optimization separately by adding `1> /path/to/model_training_details.log 2> /path/to/optimization.log` to your command. Alternatively, you can play around with Optuna's settings.
GRU4Rec doesn't use proper logging at the moment (it just prints).
- If you redirect stderr and/or stdout to file(s) and want to see progress in real time, run Python in unbuffered mode by adding the `-u` argument after `python` (i.e. `python -u paropt.py ...`).

## Executing on CPU
Some optimizations for speeding up GPU execution (e.g. custom Theano operators) prevent running the code on the CPU. Since CPU execution of neural networks is slow anyway, I decided to abandon CPU support to speed up execution on the GPU. If - for some reason - you still want to run GRU4Rec on the CPU, you need to modify the code to disable the custom GPU optimizations. You will be able to run the code on the CPU, just don't expect it to be quick.

**Steps for disabling the custom GPU optimizations:**
- In `gpu_ops.py`, change line `13` to `disable_custom_op = True`. This makes the functions in `gpu_ops` return standard operators, or operators assembled from standard operators, instead of the custom ones when the computational graph is built.
- In `gru4rec.py`, comment out line `12` containing `import custom_opt`. One of the custom operators is integrated deeper into Theano through `custom_opt`, which adds it to the optimizer that replaces operators in the computational graph. By removing this import, this operator won't be used.


## Major updates

### Update 24-08-2023
- Added paropt
- Extended info on reproducibility
- Added parameter files and parameter spaces
- Extended readme

### Update 08-05-2020
- Significant speed-up of training by increasing GPU utilization.
- logQ normalization added (improves results when the cross-entropy loss is used)
- Added `run.py` for easy experimentation.
- Extended this README.

### Update 08-06-2018
- Refactoring and cleaning.
- Speeding up execution.
- Ease of life improvements.
- Code for evaluating on GPU.

### Update 13-06-2017
- Upgraded to the v2.0 version
- Added BPR-max and TOP1-max losses for cutting edge performance (coupled with additional sampling, +30% in recall & MRR over the base results)
- Sacrificed some speed on the CPU for faster GPU execution

### Update 22-12-2016
- Fixed cross-entropy instability. Very small predicted scores were rounded to 0 and thus their logarithm became NaN. Added a small epsilon (1e-24) to all scores before computing the logarithm. I got better results with this stabilized cross-entropy than with the TOP1 loss on networks with 100 hidden units.
- Added the option of using additional negative samples (besides the default, which is the other examples of the minibatch). The number of additional samples is given by the `n_sample` parameter. The probability of an item being chosen as a sample is proportional to supp^sample_alpha, i.e. setting `sample_alpha` to 1 results in popularity based sampling, setting it to 0 results in uniform sampling. Using additional samples can slow down training, but depending on your config, the slowdown might not be noticeable on GPU up to 1000-2000 additional samples.
- Added an option to precompute a large batch of negative samples in advance. The number of int values (IDs) to be stored is determined by the `sample_store` parameter of the train function (default: 10M). This option only takes effect for the additional negative samples, i.e. when `n_sample > 0`. Computing negative samples in each step results in very inefficient GPU utilization, as computations are often interrupted by sample generation (which runs on the CPU). Precomputing samples for several steps in advance makes the process more efficient.
However, one should avoid setting the sample store too big, as generating too many samples takes a long time, resulting in the GPU waiting for its completion for a long time. It also increases the memory footprint.

### Update 21-09-2016
- Optimized code for GPU execution. Training is ~2x faster now.
- Added retrain functionality.

# GRU4Rec

This is the original Theano implementation of the algorithm from the paper "Session-based Recommendations with Recurrent Neural Networks" ([arXiv:1511.06939](https://arxiv.org/abs/1511.06939)), with the extensions introduced in the paper "Recurrent Neural Networks with Top-k Gains for Session-based Recommendations" ([arXiv:1706.03847](https://arxiv.org/abs/1706.03847)).

Make sure to always use the latest version as the baseline, and cite both papers when you do so!

The code was optimized for fast execution on the GPU (up to 1500 mini-batches per second on a GTX 1080Ti). According to the Theano profiler, 97.5% of the time during training is spent on the GPU and only 0.5% on the CPU, with the remaining 2% spent on transferring data between the CPU and the GPU. Running on the CPU is not supported, but it is possible with some modifications to the code.

If you are wary of using Theano, also check out the official reimplementations:  
- [Official **PyTorch** version of GRU4Rec](https://github.com/hidasib/GRU4Rec_PyTorch_Official)  
- [Official **Tensorflow** version of GRU4Rec](https://github.com/hidasib/GRU4Rec_Tensorflow_Official)  

*Note:* These versions have been validated against the original, but due to how modern deep learning frameworks work, they run 1.5 to 4 times slower than this version. Further reimplementations may follow, depending on the interest of the research community.  
**IMPORTANT!** Avoid unofficial reimplementations. In the paper "The Effect of Third Party Implementations on Reproducibility" ([arXiv:2307.14956](https://arxiv.org/abs/2307.14956)) we thoroughly evaluated 6 third-party implementations (PyTorch/Tensorflow, standalone or framework-based) and found that all of them are flawed and/or miss important features, degrading recommendation accuracy by up to **99%** and increasing training times up to **335-fold**. Other implementations we have found since are just as unreliable.

You can easily train and evaluate the model on your own session data using `run.py`. Usage instructions below.

Scroll down for information on reproducing results on public datasets and on hyperparameter tuning!

**LICENSE:** See [license.txt](license.txt) for details. The main guideline is this: the code is always free to use for research and education. Using the code or parts of it in commercial systems requires a license. If you have used the code or a derivative of it in a commercial system, contact me!

**Contents:**  
[Requirements](#requirements "Requirements")  
  [Theano configuration](#theano-configuration "Theano configuration")  
[Usage](#usage "Usage")  
  [Execute experiments using `run.py`](#execute-experiments-using-runpy "Execute experiments using `run.py`")  
    [Examples](#examples "Examples")  
  [Using GRU4Rec in code or the interpreter](#using-gru4rec-in-code-or-the-interpreter "Using GRU4Rec in code or the interpreter")  
  [Notes on sequence-aware and session-based models](#notes-on-sequence-aware-and-session-based-models "Notes on sequence-aware and session-based models")  
  [Notes on parameter settings](#notes-on-parameter-settings "Notes on parameter settings")  
[Speed of training](#speed-of-training "Speed of training")  
[Reproducing results on public datasets](#reproducing-results-on-public-datasets "Reproducing results on public datasets")  
[Hyperparameter tuning](#hyperparameter-tuning "Hyperparameter tuning")  
[Executing on CPU](#executing-on-cpu "Executing on CPU")  
[Major updates](#major-updates "Major updates")  


## Requirements

- **Python** --> Use Python `3.6.3` or newer. The code was mostly tested on `3.6.3`, `3.7.6` and `3.8.12`, but it was briefly tested on other versions as well. Python 2 is NOT supported.
- **NumPy** --> `1.16.4` or newer.
- **Pandas** --> `0.24.2` or newer.
- **CUDA** --> Required for the GPU support of Theano. The latest CUDA version Theano was tested with is `9.2`, to my knowledge. Newer versions, such as `11.8`, also work.
- **libgpuarray** --> Required for the GPU support of Theano, use the latest version.
- **Theano** --> `1.0.5` (the last stable release) or newer (minor features are still occasionally added). Should be installed with GPU support.
- **Optuna** --> (optional) For hyperparameter optimization; the code was tested with version `3.0.3`.

**IMPORTANT: cuDNN** --> Newer versions raise warnings, but `8.2.1` still works for me. GRU4Rec doesn't rely heavily on the parts of Theano that utilize cuDNN. Unfortunately, the `cudnnReduceTensor` function in cuDNN `v7` and above is seriously bugged, which makes operators based on this function slow and occasionally unstable (incorrect computations or segfaults) when cuDNN is used, as described [here](https://github.com/Theano/Theano/issues/6432). Therefore it is best to avoid using cuDNN. If you have cuDNN installed, Theano can easily be configured to exclude the cuDNN based operators (see below).  
**This bug is not related to Theano and can be reproduced in CUDA/C++. Sadly, it still hasn't been fixed after more than 6 years.**

### Theano configuration

The code was optimized for GPU execution, and it will fail if you try to run it on the CPU (see the appropriate section of this README if you really want to try anyway). Therefore, Theano must be configured correctly to use the GPU. If you use `run.py` to run your experiments, the code sets the configuration for you. You might want to change some of the preset configurations (e.g. to run on a specified GPU instead of the one with the lowest ID); you can do this by setting the `THEANO_FLAGS` environment variable or by editing the `.theanorc_gru4rec` file.

If you don't use `run.py`, the preset configuration might not take effect (this happens when Theano is imported - directly or through another module - before `gru4rec`). In this case, you have to set the configuration yourself by editing your `.theanorc` file or by setting the `THEANO_FLAGS` environment variable. Refer to the [official documentation of Theano](http://deeplearning.net/software/theano/library/config.html).

**Important configuration parameters**
- `device` --> Must always be a CUDA capable GPU (e.g. `cuda0`).
- `floatX` --> Must always be `float32`.
- `mode` --> Should be `FAST_RUN` for fast execution.
- `optimizer_excluding` --> Should be set to `local_dnn_reduction:local_cudnn_maxandargmax:local_dnn_argmax` to tell Theano
not to use cuDNN based operators, because its `cudnnReduceTensor` function has been bugged since `v7`.

## Usage

### Execute experiments using `run.py`
`run.py` is an easy way to train, evaluate and save/load GRU4Rec models.

Execute with the `-h` argument to see the available parameters.
```
$ python run.py -h
```
Output:
```
usage: run.py [-h] [-ps PARAM_STRING] [-pf PARAM_PATH] [-l] [-s MODEL_PATH] [-t TEST_PATH [TEST_PATH ...]] [-m AT [AT ...]] [-e EVAL_TYPE] [-ss SS] [--sample_store_on_cpu] [-g GRFILE] [-d D] [-ik IK] [-sk SK] [-tk TK]
              [-pm METRIC] [-lpm]
              PATH

Train or load a GRU4Rec model & measure recall and MRR on the specified test set(s).

positional arguments:
  PATH                  Path to the training data (TAB separated file (.tsv or .txt) or pickled pandas.DataFrame object (.pickle); if the --load_model parameter is NOT provided) or to the serialized model (if the --load_model parameter is provided).

optional arguments:
  -h, --help            show this help message and exit
  -ps PARAM_STRING, --parameter_string PARAM_STRING
                        Training parameters provided as a single string. The format of the string is `param_name1=param_value1,param_name2=param_value2...`, e.g.: `loss=bpr-max,layers=100,constrained_embedding=True`. Boolean training parameters should be either True or False; parameters that can take a list should use / as the separator (e.g. layers=200/200). Mutually exclusive with the -pf (--parameter_file) and the -l (--load_model) arguments and one of the three must be provided.
  -pf PARAM_PATH, --parameter_file PARAM_PATH
                        Alternatively, training parameters can be set using the config file specified in this argument. The config file must contain an OrderedDict named `gru4rec_params`. The parameters must have the appropriate type (e.g. layers = [100]). Mutually exclusive with the -ps (--parameter_string) and the -l (--load_model) arguments and one of the three must be provided.
  -l, --load_model      Load an already trained model instead of training one. Mutually exclusive with the -ps (--parameter_string) and the -pf (--parameter_file) arguments and one of the three must be provided.
  -s MODEL_PATH, --save_model MODEL_PATH
                        Save the trained model to MODEL_PATH. (Default: don't save model)
  -t TEST_PATH [TEST_PATH ...], --test TEST_PATH [TEST_PATH ...]
                        Path to the test data set(s) located at TEST_PATH. Multiple test sets can be provided (separated by spaces). (Default: don't evaluate the model)
  -m AT [AT ...], --measure AT [AT ...]
                        Measure recall & MRR at the defined recommendation list length(s). Multiple values can be provided. (Default: 20)
  -e EVAL_TYPE, --eval_type EVAL_TYPE
                        Sets how to handle the case when multiple items in the ranked list have the same prediction score (which is usually due to saturation or an error). See the documentation of evaluate_gpu() in evaluation.py for further details. (Default: standard)
  -ss SS, --sample_store_size SS
                        GRU4Rec uses a buffer of negative samples during training to maximize GPU utilization. This parameter sets the buffer length. Lower values mean more frequent recomputation, higher values use more (GPU) memory. Unless you know what you are doing, you shouldn't change this parameter. (Default: 10000000)
  --sample_store_on_cpu
                        If provided, the sample store is kept in RAM instead of GPU memory. This is NOT advised in most cases, because it significantly lowers GPU utilization. Use this option only if for some reason you want to train the model on the CPU (NOT advised). Note that you need to modify the code to be able to run it on the CPU.
  -g GRFILE, --gru4rec_model GRFILE
                        Name of the file containing the GRU4Rec class. Can be used to select different variants. (Default: gru4rec)
  -ik IK, --item_key IK
                        Column name corresponding to the item IDs (default: ItemId).
  -sk SK, --session_key SK
                        Column name corresponding to the session IDs (default: SessionId).
  -tk TK, --time_key TK
                        Column name corresponding to the timestamp (default: Time).
  -pm METRIC, --primary_metric METRIC
                        Set primary metric, recall or mrr (e.g. for paropt). (Default: recall)
  -lpm, --log_primary_metric
                        If provided, the value of the primary metric is logged at the end of the run. Only works with a single test file and a single list length.
```

#### Examples

Train, save and evaluate a model, measuring recall and MRR at 1, 5, 10 and 20, with model parameters given as a parameter string.
```
$ THEANO_FLAGS=device=cuda0 python run.py /path/to/training_data_file -t /path/to/test_data_file -m 1 5 10 20 -ps layers=224,batch_size=80,dropout_p_embed=0.5,dropout_p_hidden=0.05,learning_rate=0.05,momentum=0.4,n_sample=2048,sample_alpha=0.4,bpreg=1.95,logq=0.0,loss=bpr-max,constrained_embedding=True,final_act=elu-0.5,n_epochs=10 -s /path/to/save_model.pickle
```
Output (on the RetailRocket dataset):
```
Using cuDNN version 8201 on context None
Mapped name None to device cuda0: NVIDIA A30 (0000:3B:00.0)
Creating GRU4Rec model
SET   layers                  TO   [224]     (type: <class 'list'>)
SET   batch_size              TO   80        (type: <class 'int'>)
SET   dropout_p_embed         TO   0.5       (type: <class 'float'>)
SET   dropout_p_hidden        TO   0.05      (type: <class 'float'>)
SET   learning_rate           TO   0.05      (type: <class 'float'>)
SET   momentum                TO   0.4       (type: <class 'float'>)
SET   n_sample                TO   2048      (type: <class 'int'>)
SET   sample_alpha            TO   0.4       (type: <class 'float'>)
SET   bpreg                   TO   1.95      (type: <class 'float'>)
SET   logq                    TO   0.0       (type: <class 'float'>)
SET   loss                    TO   bpr-max   (type: <class 'str'>)
SET   constrained_embedding   TO   True      (type: <class 'bool'>)
SET   final_act               TO   elu-0.5   (type: <class 'str'>)
SET   n_epochs                TO   10        (type: <class 'int'>)
Loading training data...
Loading data from TAB separated file: /path/to/training_data_file
Started training
The dataframe is already sorted by SessionId, Time
Created sample store with 4882 batches of samples (type: GPU)
Epoch1 --> loss: 0.484484       (6.81s)         [1026.65 mb/s | 81386 e/s]
Epoch2 --> loss: 0.381974       (6.89s)         [1015.39 mb/s | 80493 e/s]
Epoch3 --> loss: 0.353932       (6.81s)         [1027.68 mb/s | 81468 e/s]
Epoch4 --> loss: 0.340034       (6.80s)         [1028.90 mb/s | 81564 e/s]
Epoch5 --> loss: 0.330763       (6.80s)         [1028.19 mb/s | 81508 e/s]
Epoch6 --> loss: 0.324075       (6.80s)         [1029.36 mb/s | 81601 e/s]
Epoch7 --> loss: 0.319033       (6.85s)         [1022.03 mb/s | 81020 e/s]
Epoch8 --> loss: 0.314915       (6.80s)         [1029.05 mb/s | 81577 e/s]
Epoch9 --> loss: 0.311716       (6.82s)         [1025.44 mb/s | 81290 e/s]
Epoch10 --> loss: 0.308915      (6.82s)         [1025.64 mb/s | 81306 e/s]
Total training time: 77.73s
Saving trained model to: /path/to/save_model.pickle
Loading test data...
Loading data from TAB separated file: /path/to/test_data_file
Starting evaluation (cut-off=[1, 5, 10, 20], using standard mode for tiebreaking)
Measuring Recall@1,5,10,20 and MRR@1,5,10,20
Evaluation took 4.34s
Recall@1: 0.128055 MRR@1: 0.128055
Recall@5: 0.322165 MRR@5: 0.197492
Recall@20: 0.518184 MRR@20: 0.217481
```

Train and save a model on `cuda0`, with parameters taken from a parameter file.
```
$ THEANO_FLAGS=device=cuda0 python run.py /path/to/training_data_file -pf /path/to/parameter_file.py -s /path/to/save_model.pickle
```
Output (on the RetailRocket dataset):
```
Using cuDNN version 8201 on context None
Mapped name None to device cuda0: NVIDIA A30 (0000:3B:00.0)
Creating GRU4Rec model
SET   layers                  TO   [224]     (type: <class 'list'>)
SET   batch_size              TO   80        (type: <class 'int'>)
SET   dropout_p_embed         TO   0.5       (type: <class 'float'>)
SET   dropout_p_hidden        TO   0.05      (type: <class 'float'>)
SET   learning_rate           TO   0.05      (type: <class 'float'>)
SET   momentum                TO   0.4       (type: <class 'float'>)
SET   n_sample                TO   2048      (type: <class 'int'>)
SET   sample_alpha            TO   0.4       (type: <class 'float'>)
SET   bpreg                   TO   1.95      (type: <class 'float'>)
SET   logq                    TO   0.0       (type: <class 'float'>)
SET   loss                    TO   bpr-max   (type: <class 'str'>)
SET   constrained_embedding   TO   True      (type: <class 'bool'>)
SET   final_act               TO   elu-0.5   (type: <class 'str'>)
SET   n_epochs                TO   10        (type: <class 'int'>)
Loading training data...
Loading data from TAB separated file: /path/to/training_data_file
Started training
The dataframe is already sorted by SessionId, Time
Created sample store with 4882 batches of samples (type: GPU)
Epoch1 --> loss: 0.484484       (6.81s)         [1026.65 mb/s | 81386 e/s]
Epoch2 --> loss: 0.381974       (6.89s)         [1015.39 mb/s | 80493 e/s]
Epoch3 --> loss: 0.353932       (6.81s)         [1027.68 mb/s | 81468 e/s]
Epoch4 --> loss: 0.340034       (6.80s)         [1028.90 mb/s | 81564 e/s]
Epoch5 --> loss: 0.330763       (6.80s)         [1028.19 mb/s | 81508 e/s]
Epoch6 --> loss: 0.324075       (6.80s)         [1029.36 mb/s | 81601 e/s]
Epoch7 --> loss: 0.319033       (6.85s)         [1022.03 mb/s | 81020 e/s]
Epoch8 --> loss: 0.314915       (6.80s)         [1029.05 mb/s | 81577 e/s]
Epoch9 --> loss: 0.311716       (6.82s)         [1025.44 mb/s | 81290 e/s]
Epoch10 --> loss: 0.308915      (6.82s)         [1025.64 mb/s | 81306 e/s]
Total training time: 77.73s
Saving trained model to: /path/to/save_model.pickle
```

Load a previously trained model on `cuda1` and evaluate it, measuring recall and MRR at 1, 5, 10 and 20, using the conservative method for tiebreaking.
```
$ THEANO_FLAGS=device=cuda1 python run.py /path/to/previously_saved_model.pickle -l -t /path/to/test_data_file -m 1 5 10 20 -e conservative
```
Output (on the RetailRocket dataset):
```
Using cuDNN version 8201 on context None
Mapped name None to device cuda1: NVIDIA A30 (0000:AF:00.0)
Loading trained model from file: /path/to/previously_saved_model.pickle
Loading test data...
Loading data from TAB separated file: /path/to/test_data_file
Starting evaluation (cut-off=[1, 5, 10, 20], using conservative mode for tiebreaking)
Measuring Recall@1,5,10,20 and MRR@1,5,10,20
Evaluation took 4.34s
Recall@1: 0.128055 MRR@1: 0.128055
Recall@5: 0.322165 MRR@5: 0.197492
Recall@20: 0.518184 MRR@20: 0.217481
```

### Using GRU4Rec in code or the interpreter

You can import the `gru4rec` module in code or the interpreter and use the `GRU4Rec` class to create and train models. Trained models can be evaluated by importing the `evaluation` module and using either the `evaluate_gpu` or the `evaluate_session_batch` method. The latter is deprecated: it can't fully utilize the GPU and is therefore significantly slower. The public version of this code is mainly intended for running experiments (training and evaluating the algorithm on different datasets), so obtaining actual predictions can be cumbersome and inefficient.

**IMPORTANT!** For convenience, the `gru4rec` module sets some important Theano parameters, so you don't have to worry about them if you are not familiar with Theano. This only works if `gru4rec` is imported *before* Theano (and before any module that imports Theano), because most of Theano's configuration can't be changed once it has been initialized. (Even if Theano is reimported, the GPU is not reinitialized.) If the import order is reversed, you should set up a proper `.theanorc` file or provide the appropriate configuration in the `THEANO_FLAGS` environment variable.

### Notes on sequence-aware and session-based models

GRU4Rec was designed for session-based recommendations, where the - usually short - sessions are considered to be independent of each other. Every time a user visits the site, they are treated as a new user, i.e. their history is not utilized, even if it is known. (This setup is relevant for many real-life applications.) This means that during evaluation, the hidden state starts from zero for each test session.

However, models based on RNNs, CNNs, Transformers, etc. are also well suited for the sequence-aware personalized recommendation setup, where the user's whole history is used as a sequence to predict its subsequent items. The main differences are:
- (1) Sequences are usually much longer in sequence-aware recommendations. This also means that BPTT (backpropagation through time) is very useful in that setup, while - according to our experiments - it doesn't improve the model in the session-based case.
- (2) In the sequence-aware setup, evaluation should start from the last value of the hidden state (i.e. the value computed on the training portion of the user's history).

These features are currently not supported in the public code. They may be added later if there is sufficient interest from the community (they exist in some of my internal research repos). For now, you have to extend the code yourself to achieve this.

### Notes on parameter settings

GRU4Rec has many parameters (and the private version has accumulated even more over the years). While you are free to experiment with them, I found it best to leave the following parameters at their default values.
| Parameter            | Default value | Comment                                                                                                               |
|--------------------|:-----------:|----------------------------------------------------------------------------------------------------------------------|
| `hidden_act`         |    `tanh`     | The activation function of the hidden layer should be tanh.                                                          |
| `lmbd`               |     `0.0`     | L2 regularization is not needed; use dropout for regularization instead.                                                       |
| `smoothing`
   |     `0.0`     | 标签平滑对交叉熵损失的影响很小。                      |\n| `adapt`              |   `adagrad`   | 不同优化器的表现大致相同，其中 adagrad 略优于其他优化器。       |\n| `adapt_params`       |     `[]`      | Adagrad 没有超参数，因此此处为空列表。                                                  |\n| `grad_cap`           |     `0.0`     | 训练过程中无需梯度裁剪。                                               |\n| `sigma`              |     `0.0`     | 权重初始化时无需指定最小\u002F最大值；当此值为 0 时，会自动使用 `±sqrt(6.0\u002F(dim[0] + dim[1]))`。 |\n| `init_as_normal`     |    `False`    | 权重应从均匀分布中初始化。                                                            |\n| `train_random_order` |    `False`    | 不应对训练会话进行随机打乱，以确保最后的更新基于最近的数据。                           |\n| `time_sort`          |    `True`     | 应按首次事件的时间戳升序对训练会话进行排序（最早的会话排在前面），以便最后的更新基于近期数据。        |\n\n**损失函数与最终激活函数：**\n- 在五种损失选项（`cross-entropy`、`bpr-max`、`top1`、`bpr`、`top1-max`）中，**仅应使用 `cross-entropy` 或 `bpr-max`**。其他选项在一定程度上也能工作，但容易出现梯度消失问题，因此生成的模型效果不如使用 `bpr-max` 或 `cross-entropy` 训练的模型。\n- `cross-entropy` 有一个替代形式 `xe_logit`，它需要设置不同的最终激活函数。\n- 始终根据所使用的损失函数选择合适的最终激活函数。对于 `loss=cross-entropy`，最终激活函数应始终为 `final_act=softmax`；对于 `loss=xe_logit`，应始终使用 `final_act=softmax_logit`；对于 `loss=bpr-max`，可以使用 `final_act=linear`、`final_act=relu`、`final_act=tanh`、`final_act=leaky-\u003CX>`、`final_act=elu-\u003CX>` 或 `final_act=selu-\u003CX>-\u003CY>`（我通常更倾向于使用 `elu-0.5` 或 `elu-1`）。\n\n**嵌入模式：** 如论文所述，共有三种嵌入模式，可通过以下方式设置。\n| 嵌入模式 | 设置方法 | 描述 |\n|-|-|-|\n| 无嵌入 | `embedding=0` 且 `constrained_embedding=False` | 直接将物品 ID 的 one-hot 向量输入到 GRU 层。 |\n| 分离嵌入 | `embedding=X`，其中 `X>0` 或 `X=layersize`，且 `constrained_embedding=False` | 在 GRU 层的输入端和用于计算得分的序列嵌入中分别使用独立的嵌入。`embedding=layersize` 表示嵌入维度与第一个 GRU 层的单元数相同。 |\n| 共享嵌入 | `constrained_embedding=True` | 通常表现最好的设置。在 GRU 层的输入端和用于计算得分的序列嵌入中使用相同的嵌入。这会强制嵌入维度等于最后一个 GRU 层的大小，因此在此模式下 `embedding` 参数无效。 |\n\n## 训练速度\n本版本是目前最快的版本（遥遥领先）。官方的 PyTorch 和 TensorFlow 实现的速度受限于各自深度学习框架带来的额外开销。\n\n在公开可用的数据集上，采用最佳参数配置（见下文），使用 nVidia A30 显卡测得完成一个 epoch 所需的时间（单位：秒）。Theano 
版本比 PyTorch 或 TensorFlow 版本快 1.7 到 3 倍。\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhidasib_GRU4Rec_readme_d073e817c686.png)\n\n*详细说明：* 简而言之，Theano 要求先构建计算图，然后基于该图创建一个 Theano 函数来执行计算。在函数创建过程中，代码会被编译成一个或多个 C++\u002FCUDA 可执行文件，每次从 Python 调用该函数时都会执行这些二进制文件。如果完全不使用任何基于 Python 的算子，控制权无需返回 Python，从而大幅降低开销。GRU4Rec 的发布版本基于 ID 表示法，因此单个 mini-batch 通常无法将 GPU 充分利用。这样一来，像 PyTorch 和 TensorFlow 那样需要在 C++\u002FCUDA 和 Python 之间频繁切换控制权的做法，会显著增加训练时间。这也是为什么当层数和\u002F或 batch size 较大时，速度差异会变小的原因。不过，为了达到最佳性能，有时反而需要使用较小的 batch size。\n\n训练时间主要取决于事件数量和模型参数。以下参数会影响事件处理速度（事件\u002F秒）：\n- `batch_size` --> 处理批次的速度（MB\u002Fs）下降幅度远小于 batch size 增加的速度，因此随着 `batch_size` 的增大，事件处理速度也会加快。然而，`batch_size` 同时也会影响模型精度，对于大多数数据集而言，较小的 batch size 通常效果更好。\n- `n_sample` --> 在硬件允许的情况下，负样本数量在 500–1000 以内不会影响处理速度。默认设置为 `n_sample=2048`，但如果物品数量较少，可以适当降低此值而不影响准确性。\n- `loss` --> `交叉熵` 损失略快于 `BPR-Max` 损失。\n- `dropout_p_embed`、`dropout_p_hidden`、`momentum` --> 如果将这些参数设为非零值，训练速度会略微减慢。\n\n以下图表展示了在启用或禁用 dropout 和 momentum 的情况下，不同 batch size 和隐藏层大小下，使用 `n_sample=2048` 时的训练速度差异（每秒处理的 mini-batch 数量及每秒处理的事件数量；数值越高越好）。测量设备为 nVidia A30 显卡。\n\n使用 `交叉熵` 损失：\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhidasib_GRU4Rec_readme_f62c6089bb6b.png)\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhidasib_GRU4Rec_readme_8e6f44dd25b3.png)\n\n使用 `BPR-Max` 损失：\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhidasib_GRU4Rec_readme_f2aba60d2a3b.png)\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhidasib_GRU4Rec_readme_49d53aac9516.png)\n\n## 在公开数据集上复现结果\nGRU4Rec 的性能已在 [1,2,3,4] 中的多个公开数据集上进行了评估：Yoochoose\u002FRSC15、Rees46、Coveo、RetailRocket 和 Diginetica。\n\n*重要提示：* 仅当数据（以及任务）本身表现出序列模式时，例如基于会话的数据，衡量序列推荐系统的性能才有意义。在评分数据上进行评估无法得出有意义的结果。有关详细信息以及其他人在评估序列推荐系统时常见的错误，请参阅 [4]。\n\n**注意事项：**  \n- 始终尽量在比较中包含至少一个规模较为真实的大型数据集（例如，Rees46 是一个很好的例子）。\n- 评估设置在 [1,2,3] 中有详细描述。这是一种下一 item 预测类型的评估，仅将下一个 item 视为与给定推理相关的唯一目标。这种设置非常适合行为预测，并且在一定程度上与线上表现相关。它比将后续任意 item 
都视为相关的目标更为严格；后者虽然也是一种合理的设置，但对简单的方法（例如基于计数的方法）更加宽容。然而，与任何其他离线评估一样，它并不能直接反映线上性能。\n\n**获取数据：** 请参考数据的原始来源以获得完整且合法的副本。此处提供的链接仅为尽力而为，不能保证长期有效。\n- [Yoochoose\u002FRSC15](https:\u002F\u002F2015.recsyschallenge.com) 或 ([Kaggle 上的重新上传](https:\u002F\u002Fwww.kaggle.com\u002Fdatasets\u002Fchadgostopp\u002Frecsys-challenge-2015))\n- [Rees46](https:\u002F\u002Fwww.kaggle.com\u002Fdatasets\u002Fmkechinov\u002Fecommerce-behavior-data-from-multi-category-store)\n- [Coveo](https:\u002F\u002Fgithub.com\u002Fcoveooss\u002Fshopper-intent-prediction-nature-2020)\n- [RetailRocket](https:\u002F\u002Fwww.kaggle.com\u002Fdatasets\u002Fretailrocket\u002Fecommerce-dataset)\n- [Diginetica](https:\u002F\u002Fcompetitions.codalab.org\u002Fcompetitions\u002F11161#learn_the_details-data2)\n\n**预处理：**  \nRSC15 的预处理细节及其背后的理由可在 [1,2] 中找到，而 Yoochoose、Rees46、Coveo、RetailRocket 和 Diginetica 的预处理则在 [3] 中说明。RSC15 的预处理脚本可在 [这里](https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec\u002Fblob\u002Fmaster\u002Fexamples\u002Frsc15\u002Fpreprocess.py) 找到，Yoochoose、Rees46、Coveo、RetailRocket 和 Diginetica 的预处理脚本则位于 [对应于 [3] 的仓库](https:\u002F\u002Fgithub.com\u002Fhidasib\u002Fgru4rec_third_party_comparison) 中。运行脚本后，请务必核对生成的数据集统计信息是否与论文中报告的一致。\n\n预处理脚本会为每个数据集生成 4 个文件：\n- `train_full` --> 完整的训练集，用于最终评估时的模型训练\n- `test` --> 模型最终评估所用的测试集（与 `train_full` 成对）\n- `train_tr` --> 用于超参数优化和实验的训练集\n- `train_valid` --> 用于超参数优化和实验的验证集（与 `train_tr` 成对）\n\n基本上，完整的预处理数据集会被分割成 `train_full` 和 `test`，然后 `train_full` 再按照相同的逻辑被进一步分割为 `train_tr` 和 `train_valid`。\n\n*重要提示：* 请注意，尽管 RSC15 和 Yoochoose 来自同一来源（Yoochoose 数据集），但它们的预处理方式并不相同。主要区别在于 RSC15 没有进行去重处理。因此，在这两个数据集上的结果不可直接比较，最优超参数也可能不同。建议使用 Yoochoose 版本，仅在因某些原因无法复现实验时（例如方法实现不可用）才参考 RSC15 版本的结果。\n\n[1] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, Domonkos Tikk: [基于会话的推荐系统与循环神经网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06939), ICLR 2016  \n[2] Balázs Hidasi, Alexandros Karatzoglou: [具有 Top-k 收益的循环神经网络用于基于会话的推荐](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03847), CIKM 
2018  \n[3] Balázs Hidasi, Ádám Czapp: [第三方实现对可重复性的影响](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14956), RecSys 2023  \n[4] Balázs Hidasi, Ádám Czapp: [推荐系统离线评估中的普遍缺陷](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14951), RecSys 2023\n\n**超参数：**  \nRSC15 的超参数是通过局部（星型）搜索优化器获得的，该优化器会在找到更好的参数组合时重启。其使用的参数空间比本仓库中包含的要小（例如，隐藏层大小固定为 100）。借助新的 Optuna 基础优化器，这里可能还存在一些小幅改进的空间。\n\nYoochoose、Rees46、Coveo、RetailRocket 和 Diginetica 的超参数则是使用本仓库中上传的参数空间获得的。针对每个数据集、每种嵌入模式（无嵌入、独立嵌入、共享嵌入）以及每种损失函数（交叉熵、BPR-Max），均执行了 200 次运行。主要指标是 MRR@20，它通常也能在召回率@20 方面取得最佳效果。超参数优化过程中使用了一组单独的训练\u002F验证集，该集是从完整的训练集中按照与从完整数据集中划分出完整训练集\u002F测试集相同的方式创建的。最终结果是在测试集上测量的，模型则是在完整训练集上训练得到的。\n\n**最佳超参数：**  \n*注：* 为方便起见，参数文件（可通过 `run.py` 的 `-pf` 参数使用）已 [包含](https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec\u002Ftree\u002Fmaster\u002Fparamfiles) 在本仓库中。\n\n| 数据集 | 损失函数 | 约束嵌入 | 嵌入类型 | ELU 参数 | 层数 | 批量大小 | 嵌入 dropout 率 | 隐藏层 dropout 率 | 学习率 | 动量 | 样本数量 | 样本 alpha | BPREG | logq |\n|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n| RSC15 | 交叉熵 | 是 | 0 | 0 | 100 | 32 | 0.1 | 0 | 0.1 | 0 | 2048 | 0.75 | 0 | 1 |\n| Yoochoose | 交叉熵 | 是 | 0 | 0 | 480 | 48 | 0 | 0.2 | 0.07 | 0 | 2048 | 0.2 | 0 | 1 |\n| Rees46 | 交叉熵 | 是 | 0 | 0 | 512 | 240 | 0.45 | 0 | 0.065 | 0 | 2048 | 0.5 | 0 | 1 |\n| Coveo | BPR-Max | 是 | 0 | 1 | 512 | 144 | 0.35 | 0 | 0.05 | 0.4 | 2048 | 0.2 | 1.85 | 0 |\n| RetailRocket | BPR-Max | 是 | 0 | 0.5 | 224 | 80 | 0.5 | 0.05 | 0.05 | 0.4 | 2048 | 0.4 | 1.95 | 0 |\n| Diginetica | BPR-Max | 是 | 0 | 1 | 512 | 128 | 0.5 | 0.3 | 0.05 | 0.15 | 2048 | 0.3 | 0.9 | 0 |\n\n**结果：**  \n*注：* 由于 GPU 上操作执行顺序的变化，指标可能会出现轻微波动（甚至高达几个百分点），这是可以接受的。\n\n| 数据集 | 召回率@1 | MRR@1 | 召回率@5 | MRR@5 | 召回率@10 | MRR@10 | 召回率@20 | MRR@20 |\n|---|---|---|---|---|---|---|---|---|\n| RSC15 | 0.1845 | 0.1845 | 0.4906 | 0.2954 | 0.6218 | 0.3130 | 0.7283 | 0.3205 |\n| Yoochoose | 0.1829 | 0.1829 | 0.4478 | 0.2783 | 0.5715 | 0.2949 | 0.6789 | 0.3024 |\n| Rees46 | 0.1114 | 0.1114 | 0.3010 | 0.1778 | 0.4135 | 0.1928 | 0.5293 | 
0.2008 |\n| Coveo | 0.0513 | 0.0513 | 0.1496 | 0.0852 | 0.2212 | 0.0946 | 0.3135 | 0.1010 |\n| RetailRocket | 0.1274 | 0.1274 | 0.3237 | 0.1977 | 0.4207 | 0.2107 | 0.5186 | 0.2175 |\n| Diginetica | 0.0725 | 0.0725 | 0.2369 | 0.1288 | 0.3542 | 0.1442 | 0.4995 | 0.1542 |\n\n## 超参数调优\n`paropt.py` 支持在新数据集上进行超参数优化。它内部使用 [Optuna](https:\u002F\u002Foptuna.org\u002F)，并需要定义一个参数空间。该仓库中已包含一些预定义的参数空间，详见 [此处](https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec\u002Ftree\u002Fmaster\u002Fparamspaces)。\n\n**建议：**\n- 使用附带的参数空间运行 100 到 200 次迭代。\n- 当使用不同的损失函数和嵌入模式时（无嵌入，即 `embedding=0,constrained_embedding=False`；独立嵌入，即 `embedding=layersize,constrained_embedding=False`；共享嵌入，即 `embedding=0,constrained_embedding=True`），应分别进行优化。\n\n**固定参数：** 您也可以在优化器中调整这些参数，但以下固定设置在过去表现良好：\n- `logq` --> 交叉熵损失通常在 `logq=1` 时效果最佳；当使用 BPR-max 损失时，此参数无效。\n- `n_sample` --> 根据经验，`n_sample=2048` 对于拥有数百万个物品的数据集来说已经足够获得良好的性能，同时也不会过大而显著降低训练速度。不过，如果活跃物品总数低于 5–10K，则可以适当降低此值。\n- `n_epochs` --> 通常设置为 `n_epochs=10`，但在大多数情况下，`5` 个 epoch 也能达到相似的效果。迄今为止，尚无必要大幅增加 epoch 数。\n- 嵌入模式 --> 全面的超参数优化需要分别检查所有三种选项，但过去对于大多数数据集而言，共享嵌入（`constrained_embedding=True` 且 `embedding=0`）的表现最佳。\n- `loss` --> 全面的超参数优化需要分别测试两种损失函数，但根据以往经验，BPR-max 在较小数据集上表现更好，而交叉熵损失则更适合较大数据集。\n- `final_act` --> 始终使用与损失函数相匹配的最终激活函数，例如，当 `loss=cross-entropy` 时使用 `final_act=softmax`，而当 `loss=bpr-max` 时则使用 `elu-\u003Cx>`、`linear` 或 `relu`。\n\n**用法：**\n```bash\n$ python paropt.py -h\n```\n\n输出：\n```plaintext\nusage: paropt.py [-h] [-g GRFILE] [-tf [FLAGS]] [-fp PARAM_STRING] [-opf PATH] [-m [AT]] [-nt [NT]] [-fm [AT [AT ...]]] [-pm METRIC] [-e EVAL_TYPE] [-ik IK] [-sk SK] [-tk TK] PATH TEST_PATH\n\n训练或加载 GRU4Rec 模型，并在指定的测试集上评估召回率和 MRR。\n\n位置参数:\n  PATH                  训练数据的路径（以 TAB 分隔的文件 (.tsv 或 .txt) 或 Pickle 格式的 pandas.DataFrame 对象 (.pickle)，前提是未提供 --load_model 参数）；或者序列化后的模型路径（若提供了 --load_model 参数）。\n  TEST_PATH             测试数据集的路径。\n\n可选参数:\n  -h, --help            显示帮助信息并退出\n  -g GRFILE, --gru4rec_model GRFILE\n                        包含 GRU4Rec 
类的文件名。可用于选择不同变体。（默认值：gru4rec）\n  -tf [FLAGS], --theano_flags [FLAGS]\n                        Theano 设置。\n  -fp PARAM_STRING, --fixed_parameters PARAM_STRING\n                        以单个字符串形式提供的固定训练参数。字符串格式为 `param_name1=param_value1,param_name2=param_value2...`，例如：`loss=bpr-max,layers=100,constrained_embedding=True`。布尔型参数应为 True 或 False；可接受列表的参数应使用 \u002F 作为分隔符（如 layers=200\u002F200）。此参数与 -pf（--parameter_file）和 -l（--load_model）参数互斥，三者必须提供其一。\n  -opf PATH, --optuna_parameter_file PATH\n                        描述 Optuna 参数空间的文件。\n  -m [AT], --measure [AT]\n                        在指定的推荐列表长度上测量召回率和 MRR。可提供单个值。（默认值：20）\n  -nt [NT], --ntrials [NT]\n                        执行的优化试验次数。（默认值：50）\n  -fm [AT [AT ...]], --final_measure [AT [AT ...]]\n                        优化完成后，在指定的推荐列表长度上测量召回率和 MRR。可提供多个值。（默认值：20）\n  -pm METRIC, --primary_metric METRIC\n                        设置主要指标，召回率或 MRR（例如用于超参数优化）。（默认值：召回率）\n  -e EVAL_TYPE, --eval_type EVAL_TYPE\n                        设置当排序列表中多个项目具有相同预测分数时的处理方式（这通常是由于饱和或错误造成的）。更多信息请参阅 evaluation.py 中 evaluate_gpu() 的文档。（默认值：标准）\n  -ik IK, --item_key IK\n                        对应物品 ID 的列名。（默认值：ItemId）\n  -sk SK, --session_key SK\n                        对应会话 ID 的列名。（默认值：SessionId）\n  -tk TK, --time_key TK\n                        对应时间戳的列名。（默认值：Time）\n```\n\n**示例：** 运行一次超参数优化，以 MRR@20 为目标，进行 200 次迭代，并在优化完成后对最佳模型在 1、5、10 和 20 的推荐列表长度上测量召回率和 MRR。\n*注意：* paropt 脚本可以在 CPU 上运行（`THEANO_FLAGS=device=cpu`），因为模型是在独立进程中训练的。您可以通过设置 `-tf` 参数来控制这些训练进程使用的设备，该参数会将值传递给训练进程的 `THEANO_FLAGS` 环境变量。在本例中，训练将在 `cuda0` 上执行。\n```bash\nTHEANO_FLAGS=device=cpu python paropt.py \u002Fpath\u002Fto\u002Ftraining_data_file_for_optimization \u002Fpath\u002Fto\u002Fvalidation_data_file_for_optimization -pm mrr -m 20 -fm 1 5 10 20 -e conservative -fp n_sample=2048,logq=1.0,loss=cross-entropy,final_act=softmax,constrained_embedding=True,n_epochs=10 -tf device=cuda0 -opf \u002Fpath\u002Fto\u002Fparameter_space.json -nt 
200\n```\n\n输出（前几行）：\n```plaintext\n--------------------------------------------------------------------------------\nPARAMETER SPACE\n        PARAMETER layers         type=int        range=[64..512] (step=32)       UNIFORM scale\n        PARAMETER batch_size     type=int        range=[32..256] (step=16)       UNIFORM scale\n        PARAMETER learning_rate          type=float      range=[0.01..0.25] (step=0.005)         UNIFORM scale\n        PARAMETER dropout_p_embed        type=float      range=[0.0..0.5] (step=0.05)    UNIFORM scale\n        PARAMETER dropout_p_hidden       type=float      range=[0.0..0.7] (step=0.05)    UNIFORM scale\n        PARAMETER momentum       type=float      range=[0.0..0.9] (step=0.05)    UNIFORM scale\n        PARAMETER sample_alpha   type=float      range=[0.0..1.0] (step=0.1)     UNIFORM scale\n--------------------------------------------------------------------------------\n[I 2023-07-25 03:19:53,684] A new study created in memory with name: no-name-83fade3e-49f3-4f26-ac76-5f6cb2f3a02c\nSET   n_sample                TO   2048                   (type: \u003Cclass 'int'>)\nSET   logq                    TO   1.0                    (type: \u003Cclass 'float'>)\nSET   loss                    TO   cross-entropy          (type: \u003Cclass 'str'>)\nSET   final_act               TO   softmax                (type: \u003Cclass 'str'>)\nSET   constrained_embedding   TO   True                   (type: \u003Cclass 'bool'>)\nSET   n_epochs                TO   2                      (type: \u003Cclass 'int'>)\nSET   layers                  TO   [96]                   (type: \u003Cclass 'list'>)\nSET   batch_size              TO   176                    (type: \u003Cclass 'int'>)\nSET   learning_rate           TO   0.045000000000000005   (type: \u003Cclass 'float'>)\nSET   dropout_p_embed         TO   0.25                   (type: \u003Cclass 'float'>)\nSET   dropout_p_hidden        TO   0.25                   (type: \u003Cclass 'float'>)\nSET   
momentum                TO   0.0                    (type: \u003Cclass 'float'>)\nSET   sample_alpha            TO   0.9                    (type: \u003Cclass 'float'>)\nLoading training data...\n```\n\n**注意事项：**\n- 默认情况下，Optuna 将日志记录到 stderr，而模型则将输出打印到 stdout。您可以通过在命令中添加 `1> \u002Fpath\u002Fto\u002Fmodel_training_details.log 2> \u002Fpath\u002Fto\u002Foptimization.log` 来分别记录模型训练细节和优化总结。此外，您也可以调整 Optuna 的相关设置。目前，GRU4Rec 尚未采用完善的日志系统，仅通过 print 输出信息。\n- 如果您将 stderr 和\u002F或 stdout 重定向到文件，并希望实时查看进度，请使用 Python 的非缓冲模式，在 `python` 后添加 `-u` 参数（即 `python -u paropt.py ...`）。\n\n## 在 CPU 上执行\r\n一些用于加速 GPU 执行的优化（例如自定义 Theano 操作符）会阻止代码在 CPU 上运行。由于神经网络在 CPU 上的执行本身就较慢，我决定放弃对 CPU 的支持，以进一步提升 GPU 上的执行速度。不过，如果您出于某种原因仍希望在 CPU 上运行 GRU4Rec，则需要修改代码以禁用这些自定义的 GPU 优化。这样您仍然可以在 CPU 上运行该代码，但请不要期待它会很快。\n\n**禁用自定义 GPU 优化的步骤：**\n- 在 `gpu_ops.py` 中，将第 13 行改为 `disable_custom_op = True`。这会使 `gpu_ops` 中的函数在计算图构建时返回标准操作符或由标准操作符组合而成的操作符，而不是自定义操作符。\n- 在 `gru4rec.py` 中，注释掉包含 `import custom_opt` 的第 12 行。其中一个自定义操作符通过 `custom_opt` 更深入地集成到了 Theano 中，从而被添加到负责替换计算图中操作符的优化器里。移除这一导入后，该操作符将不再被使用。\n\n\n## 主要更新\n\n### 2023 年 8 月 24 日 更新\n- 添加了 paropt\n- 扩展了关于可重复性的说明\n- 增加了参数文件和参数空间\n- 完善了 README 文件\n\n### 2020 年 5 月 8 日 更新\n- 通过提高 GPU 利用率显著加快了训练速度。\n- 添加了 logQ 归一化（在使用交叉熵损失时可提升效果）。\n- 新增了 `run.py` 脚本，便于实验。\n- 进一步丰富了本 README 文件。\n\n### 2018 年 6 月 8 日 更新\n- 重构并清理代码。\n- 提升执行效率。\n- 改进易用性。\n- 增加了在 GPU 上进行评估的代码。\n\n### 2017 年 6 月 13 日 更新\n- 升级至 v2.0 版本\n- 新增 BPR-max 和 TOP1-max 损失函数，以实现更先进的性能（结合额外采样，召回率和 MRR 相较于基础结果可提升约 30%）。\n- 为追求更快的 GPU 执行速度，牺牲了一部分 CPU 性能。\n\n### 2016 年 12 月 22 日 更新\n- 修复了交叉熵损失的不稳定性问题。此前，极小的预测分数会被四舍五入为 0，导致其对数变为 NaN。现已在计算对数之前为所有分数添加了一个微小的 ε 值（1e-24）。在隐藏单元数为 100 的网络上，使用这种稳定化的交叉熵损失所取得的效果优于 TOP1 损失。\n- 增加了使用额外负样本的选项（除了默认的 minibatch 中的其他样本之外）。额外样本的数量由 n_sample 参数指定。每个样本被选中的概率为 supp^sample_alpha，即当 sample_alpha 设置为 1 时采用基于流行度的采样，而设置为 0 时则为均匀采样。使用额外的负样本可能会降低训练速度，但在 GPU 上，根据配置的不同，这种影响可能并不明显，甚至可以容纳多达 1000–2000 个额外样本。\n- 新增了在训练前预先计算大量负样本的选项。待存储的整数值（ID）数量由 train 函数的 sample_store 参数决定（默认值为 1000 
万）。此选项仅适用于额外的负样本，因此只有在 n_sample > 0 时才会生效。如果每一步都重新生成负样本，GPU 的利用率会非常低，因为计算过程经常会被在 CPU 上进行的样本生成打断。而提前预计算好若干步所需的样本，则能使流程更加高效。不过，应避免将 sample_store 设置得过大，因为生成过多样本需要较长时间，会导致 GPU 长时间等待其完成，同时也会增加内存占用。\n\n### 2016 年 9 月 21 日 更新\n- 优化了 GPU 执行代码。现在训练速度提升了约两倍。\n- 增加了重新训练功能。","# GRU4Rec 快速上手指南\n\nGRU4Rec 是基于循环神经网络（RNN）的会话推荐算法原始实现。本指南基于官方 Theano 版本，旨在帮助开发者快速在 GPU 环境下部署和运行模型。\n\n> **注意**：官方强烈建议使用此 Theano 版本作为基准。虽然存在 PyTorch 和 TensorFlow 的重写版本，但未经优化的第三方实现可能导致推荐准确率下降高达 99% 或训练时间延长数百倍。\n\n## 1. 环境准备\n\n本工具专为 **GPU 加速**设计，不支持直接在 CPU 上运行（除非修改源码）。请确保满足以下要求：\n\n### 系统要求\n- **操作系统**: Linux (推荐) 或 Windows (需配置 CUDA 环境)\n- **GPU**: 支持 CUDA 的 NVIDIA 显卡 (如 GTX 1080Ti 及以上)\n- **Python**: 3.6.3 或更高版本 (推荐 3.7\u002F3.8，**不支持** Python 2)\n\n### 核心依赖\n- **CUDA Toolkit**: 推荐 9.2 或 11.8+\n- **cuDNN**: 推荐 8.2.1 (注意：cuDNN v7+ 存在已知 Bug，建议通过配置禁用相关算子)\n- **Python 库**:\n  - `theano` >= 1.0.5 (必须安装 GPU 支持)\n  - `libgpuarray` (最新版，用于 Theano GPU 支持)\n  - `numpy` >= 1.16.4\n  - `pandas` >= 0.24.2\n  - `optuna` >= 3.0.3 (可选，用于超参数优化)\n\n### 国内加速方案\n在安装 Python 依赖时，推荐使用清华或阿里镜像源以提升下载速度：\n```bash\npip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple theano libgpuarray numpy pandas optuna\n```\n\n## 2. 安装步骤\n\n### 第一步：配置 Theano\n由于 cuDNN 特定版本存在 Bug，必须配置 Theano 禁用有问题的算子。你可以创建或编辑当前目录下的 `.theanorc_gru4rec` 文件，或设置环境变量。\n\n**推荐配置内容 (`.theanorc_gru4rec`)：**\n```ini\n[global]\ndevice = cuda0\nfloatX = float32\nmode = FAST_RUN\n\n[cuda]\nroot = \u002Fusr\u002Flocal\u002Fcuda  # 根据你的实际安装路径调整\n\n[nn]\n# 禁用有 Bug 的 cuDNN 算子\noptimizer_excluding = local_dnn_reduction:local_cudnn_maxandargmax:local_dnn_argmax\n```\n\n### 第二步：获取代码\n克隆官方仓库：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec.git\ncd GRU4Rec\n```\n\n### 第三步：验证环境\n确保 `THEANO_FLAGS` 指向正确的 GPU 配置。如果未使用 `run.py` 自动加载配置，请在终端导出变量：\n```bash\nexport THEANO_FLAGS=\"device=cuda0,floatX=float32,mode=FAST_RUN,optimizer_excluding=local_dnn_reduction:local_cudnn_maxandargmax:local_dnn_argmax\"\n```\n\n## 3. 
基本使用\n\nGRU4Rec 主要通过 `run.py` 脚本进行训练、评估和模型保存。数据格式要求为 TAB 分隔的文件 (`.tsv` 或 `.txt`) 或 pickle 文件，包含 SessionId, ItemId, Time 等列。\n\n### 最简单的使用示例\n\n以下命令演示了如何加载训练数据、设置关键超参数、训练模型并保存结果。\n\n```bash\nTHEANO_FLAGS=device=cuda0 python run.py \u002Fpath\u002Fto\u002Ftraining_data.tsv \\\n  -ps \"layers=100,batch_size=50,loss=bpr-max,n_epochs=10\" \\\n  -s \u002Fpath\u002Fto\u002Fsave_model.pickle\n```\n\n**参数说明：**\n- `\u002Fpath\u002Fto\u002Ftraining_data.tsv`: 训练数据路径。\n- `-ps`: 参数字符串，格式为 `key=value,key2=value2`。\n  - `layers=100`: 隐藏层单元数。\n  - `batch_size=50`: 批次大小。\n  - `loss=bpr-max`: 损失函数类型。\n  - `n_epochs=10`: 训练轮数。\n- `-s`: 训练完成后保存模型的路径。\n\n### 训练并同时评估\n\n若需在训练后立即在测试集上评估 Recall 和 MRR 指标：\n\n```bash\nTHEANO_FLAGS=device=cuda0 python run.py \u002Fpath\u002Fto\u002Ftraining_data.tsv \\\n  -t \u002Fpath\u002Fto\u002Ftest_data.tsv \\\n  -m 5 10 20 \\\n  -ps \"layers=100,batch_size=50,loss=bpr-max,n_epochs=10\"\n```\n\n**额外参数说明：**\n- `-t`: 测试数据路径（可指定多个）。\n- `-m 5 10 20`: 计算推荐列表长度为 5、10、20 时的指标。\n\n> **提示**：首次运行时，Theano 会编译计算图，可能需要几分钟时间。后续运行将显著加快。请确保输入数据已按 Session 和时间戳排序。","某大型电商平台的推荐算法团队正致力于优化用户在单次浏览会话（Session）中的商品点击预测，以解决新用户无历史行为数据时的冷启动难题。\n\n### 没有 GRU4Rec 时\n- **忽略序列动态**：传统协同过滤仅依赖用户长期历史画像，无法捕捉用户在当前会话中“从看手机到看手机壳”的实时意图流转。\n- **冷启动失效**：对于未登录游客或新注册用户，因缺乏历史交互数据，系统只能推送泛热门商品，转化率极低。\n- **计算效率瓶颈**：尝试使用未经优化的第三方 RNN 复现版本处理海量会话日志时，训练速度比理论值慢数百倍，且推荐准确率因实现缺陷暴跌近 99%。\n- **资源浪费严重**：模型难以在 GPU 上高效并行，97% 以上的算力时间被浪费在数据搬运而非核心计算上，导致迭代周期长达数周。\n\n### 使用 GRU4Rec 后\n- **精准捕捉时序**：GRU4Rec 利用循环神经网络直接建模会话内的物品点击序列，精准预测用户下一秒最可能感兴趣的商品。\n- **无缝覆盖匿名访客**：无需任何用户身份信息，仅凭当前会话的短期行为即可生成高质量推荐，显著提升游客群体的购买转化。\n- **复现官方性能**：采用经过验证的 Theano 官方实现（或其 PyTorch\u002FTensorFlow 官方移植版），确保了算法逻辑的完整性，避免了非官方版本导致的精度崩塌。\n- **极致训练加速**：依托针对 GPU 深度优化的代码架构，在 GTX 1080Ti 上可实现每秒 1500 个迷你批次的处理速度，将模型训练时间从数周缩短至数小时。\n\nGRU4Rec 通过专为会话设计的深度学习架构，将原本不可用的匿名流量转化为高价值的实时推荐机会，同时以工业级的运行效率保障了算法的快速落地。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhidasib_GRU4Rec_f2aba60d.png","hidasib","Balázs 
Hidasi","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhidasib_e951a72a.jpg",null,"Gravity R&D","Hungary","balazshidasi","http:\u002F\u002Fhidasi.eu","https:\u002F\u002Fgithub.com\u002Fhidasib",[86],{"name":87,"color":88,"percentage":89},"Python","#3572A5",100,805,226,"2026-04-17T08:32:51","NOASSERTION",5,"未说明","必需 NVIDIA GPU (代码针对 GPU 优化，CPU 运行不支持且需修改代码)。示例提及 GTX 1080Ti。需安装 CUDA (测试过 9.2，兼容 11.8 等新版) 和 libgpuarray。强烈建议禁用 cuDNN (因 v7+ 存在严重 Bug)，或限制在 cuDNN 8.2.1。","未说明 (但提到负样本缓冲区会占用大量 GPU 显存，默认 10GB 缓冲区大小)",{"notes":99,"python":100,"dependencies":101},"1. 必须配置 Theano 使用 GPU (device=cudaX, floatX=float32)。2. 必须在 Theano 配置中排除有缺陷的 cuDNN 算子 (optimizer_excluding=local_dnn_reduction:local_cudnn_maxandargmax:local_dnn_argmax)，否则可能导致计算错误或崩溃。3. 官方警告不要使用非官方的 PyTorch\u002FTensorFlow 复现版本，因其准确率低且训练慢。4. 商业使用需要授权。","3.6.3+",[102,103,104,105,106],"numpy>=1.16.4","pandas>=0.24.2","theano>=1.0.5","libgpuarray","optuna>=3.0.3 (可选)",[18],"2026-03-27T02:49:30.150509","2026-04-18T14:25:54.600527",[111,116,121,126,131,135],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},40019,"为什么在 GPU 上的训练速度比 CPU 还慢？","这通常是因为 Theano 配置不当。请在家目录（$HOME）下创建 `.theanorc` 文件，并添加以下最小化配置以确保启用 GPU 和浮点运算优化：\n\n[global]\nfloatX = float32\ndevice = cuda0\n\n[lib]\ncnmem = 1\n\n[nvcc]\nfastmath = True\n\n配置完成后，可以运行测试脚本验证是否真正使用了 GPU（检查输出是否显示 'Used the gpu' 而非 'Used the cpu'）。","https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec\u002Fissues\u002F7",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},40020,"如何正确获取模型的预测结果（predict_next_batch）？","调用 `predict_next_batch` 时需要提供正确的会话 ID 和输入物品 ID。示例代码如下：\n\nbatch_size = 10\nsession_ids = valid.SessionId.values[0:batch_size]\ninput_item_ids = valid.ItemId.values[0:batch_size]\npredict_for_item_ids = None\n\npreds = gru.predict_next_batch(session_ids, input_item_ids, predict_for_item_ids, batch_size)\npreds.fillna(0, inplace=True)\n\n注意：返回的预测结果 DataFrame 的列名是批次内的索引，如果需要对应具体的 session_id，可能需要重命名索引。如果不需要采样特定物品，`predict_for_item_ids` 可设为 
None。","https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec\u002Fissues\u002F9",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},40021,"运行时提示找不到数据文件（如 rsc15_train_full.txt），数据从哪里获取？","代码本身不包含数据集，需要用户自行下载 RecSys Challenge 2015 (RSC15) 数据集。下载后，需使用项目提供的 `preprocess.py` 脚本进行预处理才能生成代码所需的训练文件。运行预处理脚本后，会生成包含 Events、Sessions 和 Items 统计信息的完整训练集文件。","https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec\u002Fissues\u002F6",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},40022,"评估指标计算时（get_metrics），隐藏状态（hidden states）是否需要重置？","在该实现中，不需要手动在 `get_metrics` 中重置隐藏状态。\n1. 如果使用推荐的 `evaluate_gpu` 函数（位于 `evaluation.py`），隐藏状态会在第 184-189 行自动重置为零。\n2. 如果使用 `evaluate_sessions_batch`，隐藏状态会在 `gru4rec.py` 的 `predict_next_batch` 方法中，当检测到会话索引变化时（第 657-652 行）自动重置为零。\n注意：不要尝试直接访问 `model.layers` 来重置状态，因为在此实现中该属性仅表示层大小的列表，而非实际的 Keras 层对象。","https:\u002F\u002Fgithub.com\u002Fhidasib\u002FGRU4Rec\u002Fissues\u002F38",{"id":132,"question_zh":133,"answer_zh":134,"source_url":120},40023,"如何在评估时处理物品采样（sampling）？","在评估过程中，如果启用了采样，需要将目标物品（out_idx）与采样物品合并传递给 `predict_next_batch`。具体逻辑如下：\n\nuniq_out = np.unique(np.array(out_idx, dtype=np.int32))\npredict_for_item_ids = np.hstack([items, uniq_out[~np.in1d(uniq_out, items)]])\npreds = pr.predict_next_batch(iters, in_idx, predict_for_item_ids, batch_size)\n\n其中 `items` 是采样的物品集合，`uniq_out` 是真实的下一个物品。这样可以确保真实物品包含在预测候选集中。如果不采样，直接将 `predict_for_item_ids` 设为 None 即可。",{"id":136,"question_zh":137,"answer_zh":138,"source_url":115},40024,"增加 GRU 单元数量对训练时间和准确率有什么影响？","根据论文和实验反馈：\n1. GRU 方法相比 item-KNN 在评估指标上有显著提升，即使单元数仅为 100。\n2. 增加单元数（例如从 100 增加到 1000）可以进一步改善成对损失（pairwise losses）的结果，但可能会降低交叉熵损失的准确率。\n3. 虽然增加单元数会增加训练时间，但在 GPU（如 GeForce GTX Titan X）上，从 100 单元扩展到 1000 单元的训练成本并不高昂，通常在几小时内即可完成。",[]]