papers-I-read
papers-I-read is an open-source "one paper a week" project started by a researcher to systematically curate, explain, and share high-quality academic literature in AI and computer systems. Faced with a vast and fast-moving stream of research papers, many practitioners spend enormous effort just filtering for what matters and grasping core ideas. By providing a curated paper list with accompanying summaries and notes, papers-I-read lowers the barrier to reading and helps users quickly get a handle on the technical frontier.
The project covers a wide range of material: frontier AI research such as Toolformer, HyperNetworks, and continual learning, as well as industrial lessons from the YouTube recommendation system, ad click prediction, and distributed database design (e.g. Cassandra and the CAP theorem). Its distinctive strength is that it goes beyond model theory to analyze papers from a systems-design and engineering perspective, offering a rare "view from the trenches" along with practical lessons.
papers-I-read is well suited to AI researchers, ML engineers, systems architects, and developers who want to understand the underlying logic of the technology. For professionals tracking academic trends, looking for inspiration, or filling gaps in their systems knowledge, it is a valuable knowledge base and study guide that delivers distilled technical essentials amid a busy schedule.
Use case
A senior engineer on a large company's recommendation team is searching for state-of-the-art approaches for a next-generation video recommendation system, ones that balance large-scale training efficiency with the "catastrophic forgetting" problem.
Without papers-I-read
- Inefficient literature search: repeatedly querying keywords like "Continual Learning" or "Pipeline Parallelism" across arXiv, Google Scholar, and other platforms, spending days filtering for high-value papers.
- Key ideas hard to extract: faced with long technical papers such as "GPipe" or "Anatomy of Catastrophic Forgetting", it is difficult to quickly pin down their specific optimizations for distributed training or model forgetting.
- Fragmented knowledge: notes on papers already read are scattered across personal blogs, local documents, and bookmarks, with no way to connect "Deep Neural Networks for YouTube Recommendations" to the latest transfer-learning theory.
- No grounding for deployment: without curated industrial lessons like "Practical Lessons from Predicting Clicks on Ads at Facebook", new designs tend to repeat known mistakes.
With papers-I-read
- Direct access by topic: jump straight to "GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism" from the table of contents and immediately get the core idea of micro-batch pipeline parallelism.
- Summaries aid decisions: use the curated summaries of papers such as "Remembering for the Right Reasons" to quickly understand how explanation mechanisms can reduce catastrophic forgetting, saving an estimated 80% of reading time.
- Systematic knowledge linking: the author's running series of write-ups, from "Toolformer" to "HyperNetworks", naturally builds a coherent view of the technical evolution from tool use to continual learning.
- Ready-made pitfall guides: draw directly on the hands-on lessons in "Searching for Build Debt" and "CAP twelve years later" to avoid common traps in distributed consistency and technical-debt management at the architecture-design stage.
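The micro-batch idea behind GPipe, mentioned in the bullets above, can be sketched in a few lines. This is a single-process illustration with hypothetical toy stage functions, not GPipe's actual multi-device implementation: a mini-batch is split into micro-batches so that, in a real pipeline, stages could overlap work instead of idling.

```python
# Minimal sketch of micro-batch pipelining (the idea behind GPipe).
# The stage functions here are toy placeholders, not a real model.

def split_into_microbatches(batch, num_micro):
    """Split a list-like mini-batch into roughly equal micro-batches."""
    size = (len(batch) + num_micro - 1) // num_micro
    return [batch[i:i + size] for i in range(0, len(batch), size)]

def run_pipeline(stages, batch, num_micro=4):
    """Feed each micro-batch through the stages in order, then re-concatenate."""
    outputs = []
    for micro in split_into_microbatches(batch, num_micro):
        x = micro
        for stage in stages:  # in real GPipe, each stage lives on its own device
            x = stage(x)
        outputs.extend(x)
    return outputs

# Example: two toy "stages" operating on lists of numbers.
stages = [lambda xs: [v * 2 for v in xs], lambda xs: [v + 1 for v in xs]]
print(run_pipeline(stages, list(range(8)), num_micro=4))  # [1, 3, 5, 7, 9, 11, 13, 15]
```

In actual pipeline parallelism, stage k processes micro-batch i while stage k+1 processes micro-batch i-1, which shrinks the idle "bubbles" that full mini-batches would create.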
papers-I-read turns scattered academic islands into a structured map of industrial practice, letting engineers trade searching for a needle in a haystack for innovating while standing on the shoulders of giants.
Runtime requirements
Not specified

Quick start
Papers I Read
I am trying a new initiative: a paper a week. This repository will hold all those papers along with the related summaries and notes.
List of papers
- Toolformer - Language Models Can Teach Themselves to Use Tools
- Hints for Computer System Design
- Synthesized Policies for Transfer and Adaptation across Tasks and Environments
- Deep Neural Networks for YouTube Recommendations
- The Tail at Scale
- Practical Lessons from Predicting Clicks on Ads at Facebook
- Ad Click Prediction - a View from the Trenches
- Anatomy of Catastrophic Forgetting - Hidden Representations and Task Semantics
- When Do Curricula Work?
- Continual learning with hypernetworks
- Zero-shot Learning by Generating Task-specific Adapters
- HyperNetworks
- Energy-based Models for Continual Learning
- GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism
- Compositional Explanations of Neurons
- Design patterns for container-based distributed systems
- Cassandra - a decentralized structured storage system
- CAP twelve years later - How the rules have changed
- Consistency Tradeoffs in Modern Distributed Database System Design
- Exploring Simple Siamese Representation Learning
- Data Management for Internet-Scale Single-Sign-On
- Searching for Build Debt - Experiences Managing Technical Debt at Google
- One Solution is Not All You Need - Few-Shot Extrapolation via Structured MaxEnt RL
- Learning Explanations That Are Hard To Vary
- Remembering for the Right Reasons - Explanations Reduce Catastrophic Forgetting
- A Foliated View of Transfer Learning
- Harvest, Yield, and Scalable Tolerant Systems
- MONet - Unsupervised Scene Decomposition and Representation
- Revisiting Fundamentals of Experience Replay
- Deep Reinforcement Learning and the Deadly Triad
- Alpha Net: Adaptation with Composition in Classifier Space
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- Gradient Surgery for Multi-Task Learning
- GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
- TaskNorm: Rethinking Batch Normalization for Meta-Learning
- Averaging Weights leads to Wider Optima and Better Generalization
- Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
- When to use parametric models in reinforcement learning?
- Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- On the Difficulty of Warm-Starting Neural Network Training
- Supervised Contrastive Learning
- CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- Competitive Training of Mixtures of Independent Deep Generative Models
- What Does Classifying More Than 10,000 Image Categories Tell Us?
- mixup - Beyond Empirical Risk Minimization
- ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- Gradient based sample selection for online continual learning
- Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
- Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
- Observational Overfitting in Reinforcement Learning
- Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
- Superposition of many models into one
- Towards a Unified Theory of State Abstraction for MDPs
- ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- Contrastive Learning of Structured World Models
- Gossip based Actor-Learner Architectures for Deep RL
- How to train your MAML
- PHYRE - A New Benchmark for Physical Reasoning
- Large Memory Layers with Product Keys
- Abductive Commonsense Reasoning
- Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- Assessing Generalization in Deep Reinforcement Learning
- Quantifying Generalization in Reinforcement Learning
- Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
- Measuring abstract reasoning in neural networks
- Hamiltonian Neural Networks
- Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
- Meta-Reinforcement Learning of Structured Exploration Strategies
- Relational Reinforcement Learning
- Good-Enough Compositional Data Augmentation
- Multiple Model-Based Reinforcement Learning
- Towards a natural benchmark for continual learning
- Meta-Learning Update Rules for Unsupervised Representation Learning
- GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- Model Primitive Hierarchical Lifelong Reinforcement Learning
- TuckER - Tensor Factorization for Knowledge Graph Completion
- Linguistic Knowledge as Memory for Recurrent Neural Networks
- Diversity is All You Need - Learning Skills without a Reward Function
- Modular meta-learning
- Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
- Efficient Lifelong Learning with A-GEM
- Pre-training Graph Neural Networks with Kernels
- Smooth Loss Functions for Deep Top-k Classification
- Hindsight Experience Replay
- Representation Tradeoffs for Hyperbolic Embeddings
- Learned Optimizers that Scale and Generalize
- One-shot Learning with Memory-Augmented Neural Networks
- BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- Poincaré Embeddings for Learning Hierarchical Representations
- When Recurrent Models Don’t Need To Be Recurrent
- HoME - a Household Multimodal Environment
- Emergence of Grounded Compositional Language in Multi-Agent Populations
- A Semantic Loss Function for Deep Learning with Symbolic Knowledge
- Hierarchical Graph Representation Learning with Differentiable Pooling
- Imagination-Augmented Agents for Deep Reinforcement Learning
- Kronecker Recurrent Units
- Learning Independent Causal Mechanisms
- Memory-based Parameter Adaptation
- Born Again Neural Networks
- Net2Net-Accelerating Learning via Knowledge Transfer
- Learning to Count Objects in Natural Images for Visual Question Answering
- Neural Message Passing for Quantum Chemistry
- Unsupervised Learning by Predicting Noise
- The Lottery Ticket Hypothesis - Training Pruned Neural Networks
- Cyclical Learning Rates for Training Neural Networks
- Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
- Learning an SAT Solver from Single-Bit Supervision
- Neural Relational Inference for Interacting Systems
- Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
- Get To The Point: Summarization with Pointer-Generator Networks
- StarSpace - Embed All The Things!
- Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory
- Exploring Models and Data for Image Question Answering
- How transferable are features in deep neural networks
- Distilling the Knowledge in a Neural Network
- Revisiting Semi-Supervised Learning with Graph Embeddings
- Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- Higher-order organization of complex networks
- Network Motifs - Simple Building Blocks of Complex Networks
- Word Representations via Gaussian Embedding
- HARP - Hierarchical Representation Learning for Networks
- Swish - a Self-Gated Activation Function
- Reading Wikipedia to Answer Open-Domain Questions
- Task-Oriented Query Reformulation with Reinforcement Learning
- Refining Source Representations with Relation Networks for Neural Machine Translation
- Pointer Networks
- Learning to Compute Word Embeddings On the Fly
- R-NET - Machine Reading Comprehension with Self-matching Networks
- ReasoNet - Learning to Stop Reading in Machine Comprehension
- Principled Detection of Out-of-Distribution Examples in Neural Networks
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
- One Model To Learn Them All
- Two/Too Simple Adaptations of Word2Vec for Syntax Problems
- A Decomposable Attention Model for Natural Language Inference
- A Fast and Accurate Dependency Parser using Neural Networks
- Neural Module Networks
- Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
- Conditional Similarity Networks
- Simple Baseline for Visual Question Answering
- VQA: Visual Question Answering
- Learning to Generate Reviews and Discovering Sentiment
- Seeing the Arrow of Time
- End-to-end optimization of goal-driven and visually grounded dialogue systems
- GuessWhat?! Visual object discovery through multi-modal dialogue
- Semantic Parsing via Paraphrasing
- Traversing Knowledge Graphs in Vector Space
- PPDB: The Paraphrase Database
- NewsQA: A Machine Comprehension Dataset
- A Persona-Based Neural Conversation Model
- “Why Should I Trust You?” Explaining the Predictions of Any Classifier
- Conditional Generative Adversarial Nets
- Addressing the Rare Word Problem in Neural Machine Translation
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
- Improving Word Representations via Global Context and Multiple Word Prototypes
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Skip-Thought Vectors
- Deep Convolutional Generative Adversarial Nets
- Generative Adversarial Nets
- A Roadmap towards Machine Intelligence
- Smart Reply: Automated Response Suggestion for Email
- Convolutional Neural Network For Sentence Classification
- Conditional Image Generation with PixelCNN Decoders
- Pixel Recurrent Neural Networks
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
- Bag of Tricks for Efficient Text Classification
- GloVe: Global Vectors for Word Representation
- SimRank: A Measure of Structural-Context Similarity
- How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
- Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge
- WikiReading : A Novel Large-scale Language Understanding Task over Wikipedia
- WikiQA: A challenge dataset for open-domain question answering
- Teaching Machines to Read and Comprehend
- Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems
- Recurrent Neural Network Regularization
- Deep Math: Deep Sequence Models for Premise Selection
- A Neural Conversational Model
- Key-Value Memory Networks for Directly Reading Documents
- Advances In Optimizing Recurrent Networks
- Query Regression Networks for Machine Comprehension
- Sequence to Sequence Learning with Neural Networks
- The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training
- Question Answering with Subgraph Embeddings
- Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
- Visualizing Large-scale and High-dimensional Data
- Visualizing Data using t-SNE
- Curriculum Learning
- End-To-End Memory Networks
- Memory Networks
- Learning To Execute
- Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
- Large Scale Distributed Deep Networks
- Efficient Estimation of Word Representations in Vector Space
- Regularization and variable selection via the elastic net
- Fractional Max-Pooling
- TAO: Facebook’s Distributed Data Store for the Social Graph
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- The Unified Logging Infrastructure for Data Analytics at Twitter
- A Few Useful Things to Know about Machine Learning
- Hive – A Petabyte Scale Data Warehouse Using Hadoop
- Kafka: a Distributed Messaging System for Log Processing
- Power-law distributions in Empirical data
- Pregel: A System for Large-Scale Graph Processing
- GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
- Pig Latin: A Not-So-Foreign Language for Data Processing
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- MapReduce: Simplified Data Processing on Large Clusters
- BigTable: A Distributed Storage System for Structured Data
- Spark SQL: Relational Data Processing in Spark
- Spark: Cluster Computing with Working Sets
- Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture
- Scaling Memcache at Facebook
- Dynamo: Amazon’s Highly Available Key-value Store
- f4: Facebook's Warm BLOB Storage System
- A Theoretician’s Guide to the Experimental Analysis of Algorithms
- Cuckoo Hashing
- Never Ending Learning
Similar tools
openclaw
OpenClaw is a local-first personal AI assistant designed to give you a fully controllable intelligent companion on your own devices. It breaks the constraint of traditional AI assistants being confined to a particular web page or app, and plugs directly into the messaging channels you already use, including WeChat, WhatsApp, Telegram, Discord, iMessage, and dozens of other platforms. Whichever chat app you message from, OpenClaw responds instantly; it also supports voice interaction on macOS, iOS, and Android, and provides a live canvas rendering surface for you to control. The tool addresses users' needs for data privacy, fast responses, and an always-on experience. By running the AI locally, users get fast, private assistance without relying on cloud services: your data truly stays under your control. Its standout technical feature is a powerful gateway architecture that separates the control plane from the core assistant, keeping cross-platform communication smooth and extensible. OpenClaw suits tech enthusiasts and developers who want to build personalized workflows, as well as privacy-conscious users who dislike being locked into a single ecosystem. Basic terminal skills (macOS, Linux, and Windows WSL2 are supported) are enough to complete deployment through a simple command-line wizard. If you long for a companion that understands you
n8n
n8n is a fair-code workflow automation platform for technical teams, built so users can enjoy the speed of low-code building while retaining the flexibility of writing custom code. It addresses the pain point that traditional automation tools are either too closed to extend or depend entirely on hand-written code: n8n connects to more than 400 apps and services to automate complex business processes. n8n is a particularly good fit for developers, engineers, and business users with some technical background. Its core strength is "code when you need it": you can build flows by dragging nodes in an intuitive visual interface, or insert JavaScript or Python and call npm packages at any point to handle complex logic. n8n also ships with native AI capabilities built on LangChain, letting users build agent workflows with their own data and models. For deployment it offers great freedom: full self-hosting for data privacy and control, or a managed cloud option. With an active community ecosystem and hundreds of ready-made templates, n8n makes building powerful, controllable automation simple and efficient.
AutoGPT
AutoGPT is a platform that aims to make AI easy for everyone to use and build on; its core function is helping users create, deploy, and manage continuous AI agents that automatically carry out complex tasks. It addresses the pain points of traditional AI applications that need frequent human intervention and struggle to automate long workflows: set a goal, and the AI plans the steps, invokes tools, and keeps running until the task is done. Developers, researchers, and everyday users looking to boost productivity can all benefit from AutoGPT. Developers can use its low-code interface to quickly customize their own agents; researchers can explore multi-agent collaboration on its open-source architecture; and non-technical users can pick pre-built agent templates and put them to work immediately. Its technical highlight is a modular, block-based workflow design: users build complex logic by connecting functional blocks, each responsible for a single action, which keeps workflows flexible and easy to debug. The platform supports both local self-hosting and cloud deployment, balancing data privacy with convenience. With thorough documentation and a one-click install script, even first-time users can launch their first AI agent within minutes. AutoGPT is committed to lowering the barrier to AI applications so that everyone can be a creator of, and beneficiary from, AI.
stable-diffusion-webui
stable-diffusion-webui is a web interface built on Gradio that makes it easy to run and use the powerful Stable Diffusion image generation model locally. It solves the pain points of the original model being command-line bound, hard to operate, and fragmented in functionality, integrating the complex AI image generation workflow into an intuitive graphical platform. Casual creators who want a quick start, designers who need fine-grained control over image details, and developers and researchers exploring the model's potential can all benefit. Its core strength is sheer feature richness: beyond the basic txt2img, img2img, inpainting, and outpainting modes, it pioneered advanced features such as attention adjustment, prompt matrices, negative prompts, and "highres fix". It also bundles face-restoration tools such as GFPGAN and CodeFormer, supports multiple neural-network upscaling algorithms, and lets users extend its capabilities without limit through an extension system. Even devices with limited VRAM get dedicated optimization options, putting high-quality AI art creation within reach.
everything-claude-code
everything-claude-code is a high-performance optimization system built for AI coding assistants such as Claude Code, Codex, and Cursor. It is more than a set of config files: it is a complete framework honed through long-term real-world use, built to address the core pain points AI agents face in actual development, including inefficiency, memory loss, security risks, and a lack of continuous learning. By introducing modular skills, intuition augmentation, persistent memory, and built-in security scanning, everything-claude-code markedly improves AI performance on complex tasks and helps developers build more stable, smarter production-grade AI agents. Its distinctive "research first" development philosophy and token-consumption optimizations make model responses faster and cheaper while defending against potential attack vectors. The toolkit is a particularly good fit for software developers, AI researchers, and technical teams who want to deeply customize their AI workflows. Whether you are building in a large codebase or need AI assistance with security audits and automated testing, everything-claude-code provides strong underlying support. An open-source project that won an Anthropic hackathon award, it combines multi-language support with a rich set of battle-tested hooks, letting AI truly grow into an assistant that understands
ComfyUI
ComfyUI is a powerful, highly modular visual AI engine built for designing and executing complex Stable Diffusion image generation pipelines. It drops the traditional code-writing model in favor of an intuitive node-graph interface, letting users build personalized generation pipelines by wiring functional modules together. This design neatly solves the pain points of complex configuration and limited flexibility in advanced AI image workflows. Users need no programming background to freely combine models, tune parameters, and preview results in real time, handling everything from basic text-to-image generation to multi-step high-resolution refinement. ComfyUI has excellent compatibility: it supports Windows, macOS, and Linux, broadly accommodates NVIDIA, AMD, Intel, and Apple Silicon hardware, and was among the first to support frontier models such as SDXL, Flux, and SD3. Researchers and developers probing algorithmic potential, as well as designers and experienced AI-art enthusiasts pursuing maximum creative freedom, will find strong support here. Its modular architecture lets the community keep extending it with new capabilities, making it one of the most flexible, ecosystem-rich open-source diffusion-model tools available and helping users turn ideas into results efficiently.