[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-WXY604--LLM-based-causal-discovery":3,"tool-WXY604--LLM-based-causal-discovery":65},[4,17,25,39,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":10,"last_commit_at":23,"category_tags":24,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":26,"name":27,"github_repo":28,"description_zh":29,"stars":30,"difficulty_score":10,"last_commit_at":31,"category_tags":32,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[33,34,35,36,14,37,15,13,38],"图像","数据工具","视频","插件","其他","音频",{"id":40,"name":41,"github_repo":42,"description_zh":43,"stars":44,"difficulty_score":45,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[14,33,13,15,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":45,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 
既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[15,33,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":62,"last_commit_at":63,"category_tags":64,"status":16},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[13,37],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":69,"owner_avatar_url":78,"owner_bio":69,"owner_company":69,"owner_location":69,"owner_email":69,"owner_twitter":69,"owner_website":69,"owner_url":79,"languages":80,"stars":85,"forks":86,"last_commit_at":87,"license":69,"difficulty_score":10,"env_os":88,"env_gpu":88,"env_ram":88,"env_deps":89,"category_tags":92,"github_topics":69,"view_count":45,"oss_zip_url":69,"oss_zip_packed_at":69,"status":16,"created_at":93,"updated_at":94,"faqs":95,"releases":96},557,"WXY604\u002FLLM-based-causal-discovery","LLM-based-causal-discovery",null,"LLM-based-causal-discovery 是一款专为数据科学领域设计的开源工具包，旨在利用大语言模型（LLM）的力量优化因果发现过程。传统的因果推断方法往往需要昂贵的领域专家知识作为指导，而直接使用 LLM 提供的先验知识又面临输出不稳定、前后矛盾的挑战。\n\nLLM-based-causal-discovery 的核心价值在于巧妙避开了直接询问抽象因果关系的不确定性，转而引导 LLM 判断变量之间的具体时间顺序。由于时间顺序的判断更为直观可靠，LLM-based-causal-discovery 能够高效提取高置信度的时序信息，并通过内置机制将分散甚至冲突的局部判断整合为全局一致的变量排序。最终生成的有序先验知识可以灵活适配多种主流因果发现算法，显著提升从观测数据中构建因果结构的准确性与稳健性。\n\nLLM-based-causal-discovery 
非常适合从事数据分析、机器学习研究或因果推断开发的科研人员与工程师使用。通过降低获取高质量先验知识的门槛，LLM-based-causal-discovery 帮助团队在节省成本的同时，获得更客观、更可靠的分析结果，是探索复杂数据背后因果逻辑的理想助手。","LLM-based-causal-discovery 是一款专为数据科学领域设计的开源工具包，旨在利用大语言模型（LLM）的力量优化因果发现过程。传统的因果推断方法往往需要昂贵的领域专家知识作为指导，而直接使用 LLM 提供的先验知识又面临输出不稳定、前后矛盾的挑战。\n\nLLM-based-causal-discovery 的核心价值在于巧妙避开了直接询问抽象因果关系的不确定性，转而引导 LLM 判断变量之间的具体时间顺序。由于时间顺序的判断更为直观可靠，LLM-based-causal-discovery 能够高效提取高置信度的时序信息，并通过内置机制将分散甚至冲突的局部判断整合为全局一致的变量排序。最终生成的有序先验知识可以灵活适配多种主流因果发现算法，显著提升从观测数据中构建因果结构的准确性与稳健性。\n\nLLM-based-causal-discovery 非常适合从事数据分析、机器学习研究或因果推断开发的科研人员与工程师使用。通过降低获取高质量先验知识的门槛，LLM-based-causal-discovery 帮助团队在节省成本的同时，获得更客观、更可靠的分析结果，是探索复杂数据背后因果逻辑的理想助手。","\n# LLM-Augmented Causal Discovery Toolkit: A Technical Introduction\n\n## **Background and Challenges**\n\nInferring causal relationships from observational data is a core challenge in data science and related research fields. Traditional causal discovery methods rely heavily on prior knowledge from domain experts for guidance. However, acquiring such knowledge often involves significant costs in terms of both time and money, which largely limits the application scope of advanced causal discovery techniques.\n\n## **Opportunities and Challenges of Large Language Models**\n\nThe emergence of Large Language Models (LLMs) has provided new possibilities for acquiring prior knowledge. By querying LLMs about the relationships between variables, researchers can obtain judgments that approach the level of an expert. The advantage of this approach lies in significantly reducing the cost of knowledge acquisition. Furthermore, in some scenarios, the knowledge provided by LLMs can be more objective than the judgments of non-professionals.\n\nHowever, this new paradigm is also accompanied by challenges. LLMs have inherent instability; queries on the same topic may return inconsistent or even self-contradictory results. 
Using this inaccurate or internally contradictory information directly as priors can not only fail to improve model performance but may also negatively impact the accuracy of the final analysis.\n\n## **Our Approach and the Toolkit's Core Functionality**\n\nTo address this challenge, we have developed this toolkit. It aims to fully leverage the powerful knowledge base of LLMs while systematically mitigating the risks associated with their instability.\n\nOur core approach is inspired by the latest research findings: guiding an LLM to determine the concrete temporal order of events yields more reliable and stable outputs compared to directly asking it to judge abstract causal relationships.\n\nBased on this, the core functionalities of this toolkit include:\n\n  * **Structured Knowledge Elicitation**: After the user defines the research scenario and variables, the toolkit automatically generates structured queries to guide the LLM, efficiently extracting high-confidence information regarding the temporal order of variables.\n  * **Integration and Refinement of Model Outputs**: The toolkit includes a built-in analytical mechanism to process the initial information returned by the LLM. 
It systematically integrates these potentially inconsistent, localized judgments with the goal of refining a more globally consistent and reliable variable ordering.\n  * **Compatibility with Downstream Algorithms**: The refined temporal priors produced by the toolkit can serve as high-quality constraints and be flexibly applied to various mainstream causal discovery algorithms, helping to construct more accurate and robust causal structures from real-world data.\n\n\n## **Framework Overview**\n\n\n![Figure1.](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_f5bf9f888ff5.png)\n\nThe above framework diagram provides a more intuitive understanding of the entire workflow.\n\nThe entire process can be summarized into several high-level stages:\n\n  * **Stage 1: Initial Knowledge Generation (Partial Order Generation)**\n\n      * In this stage, we initiate structured queries to the LLM through scenario simulation and metadata input.\n      * The objective is to obtain the model's preliminary, discrete judgments about the temporal sequence of variables.\n\n  * **Stage 2: Knowledge Integration and Refinement (Conflicting Decomposition & Optimal Total Order Discovery)**\n\n      * This is a critical step in the process. 
The toolkit systematically analyzes all preliminary judgments obtained from the LLM.\n      * It integrates these scattered and potentially inconsistent local pieces of information with the aim of refining a more globally consistent and reliable variable ordering.\n\n  * **Stage 3: Guiding Downstream Analysis (Order-based Causality)**\n\n      * Finally, this refined global ordering serves as high-quality prior knowledge.\n      * It can be input into any standard causal discovery algorithm chosen by the user, acting as a strong external guide that helps the algorithm converge more accurately on the real data and infer the final causal graph.\n\nIn short, the core of this framework is to transform the potentially vague, contradictory, and localized knowledge provided by an LLM into a clear and reliable global variable ordering through a series of systematic steps, thereby providing effective support for data-driven causal learning.\n\n\n\n## **Conclusion and Outlook**\n\nWe expect this toolkit to provide effective support for professionals engaged in causal science research and practice, helping users leverage the powerful knowledge source of Large Language Models more conveniently and reliably. We believe this approach offers a valuable technical direction for performing causal discovery in a stable and cost-effective manner and look forward to promoting the further development of this field in collaboration with both academia and industry.\n\n\n# Several LLM-based Methods Can Be Used\n\nThis section introduces some methods for discovering and generating priors using large language models, for reference.\n\n## LLM-Driven Causal Discovery via Harmonized Prior\n\nThe core idea of this framework is to use an LLM as a knowledge expert. Through specially designed Prompting Strategies, it guides the LLM to perform causal reasoning from two different yet complementary perspectives to generate a reliable \"Harmonized Prior\". 
This harmonized prior is then integrated into mainstream causal structure learning algorithms to enhance the accuracy and reliability of discovering causal relationships from data.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_b426bc1faa6d.png\" alt=\"Harmonized_1\" width=\"300\" align=\"left\" hspace=\"15\" vspace=\"5\">\n\nThe overall logical flow of the framework is shown in the left figure and primarily consists of three parts: Dual-Expert LLM Reasoning, Harmonized Prior Construction, and Plug-and-Play Structure Learning.\n\n\n\n**1. Dual-Expert LLM Reasoning Module**\nTo ensure the accuracy of the causal knowledge provided by the LLM, this framework does not have the LLM directly judge the complex relationships between all pairs of variables. Instead, it configures the LLM into two different expert roles focused on specific tasks: the Conservative Expert and the Exploratory Expert.\n\n* **Conservative Expert - Aims for Precision**\n    * As shown in the left figure, the goal of the Conservative Expert is to identify the most explicit and reliable causal relationships.\n    * It first uses \"single-step reasoning\" to quickly screen for causal pairs with the highest confidence.\n    * Subsequently, it employs a \"Decomposition and Verification\" strategy to meticulously verify and reconfirm these selected relationships one by one, in order to filter out potential spurious associations.\n    * The final output is a high-precision set of causal relationships, $\\lambda_p$, which is used as a \"Path Existence\" constraint. 
That is, if $(A,B)$ is in this set, it is believed that a path from A to B exists in the true causal graph.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_90563a1040b6.png\" alt=\"Harmonized_2\" width=\"300\" align=\"left\" hspace=\"15\" vspace=\"5\">\n\n* **Exploratory Expert - Aims for Recall**\n    * As shown in the left figure, the goal of the Exploratory Expert is to identify all potential causal links as comprehensively as possible.\n    * This module centers on each variable, analyzing one by one which other variables in the dataset could be its direct causes.\n    * Through this \"Decomposition and Exploration\" approach, it generates a list of \"possible causes\" $C(x_i)$ for each variable.\n    * All these possible causes are aggregated into a high-recall set of causal relationships, $\\lambda_r$. This set is used to define an \"Edge Absence\" constraint, meaning if a causal relationship $(A,B)$ does not appear in this set, generating a direct edge from A to B in the final causal graph is forbidden.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_a126b2478f8b.png\" alt=\"Harmonized_3\" width=\"300\" align=\"left\" hspace=\"15\" vspace=\"5\">\n\n**2. Harmonized Prior Construction**\nThe framework fuses the causal knowledge output by the two aforementioned experts to construct a unified \"Harmonized Prior\". 
This harmonized prior combines the advantages of both:\n\n* **Path Existence Constraint:** Utilizes the high-precision causal relationships $\\lambda_p$ output by the Conservative Expert.\n* **Edge Absence Constraint:** Utilizes the high-recall causal relationships $\\lambda_r$ output by the Exploratory Expert to define the scope of possible direct edges.\n\nIn this way, it not only ensures that strong causal signals are not lost but also effectively constrains the search space for structure learning by ruling out a large number of impossible causal connections, thereby improving overall accuracy.\n\n**3. Integration with Structure Learning Algorithms**\nFinally, this constructed \"Harmonized Prior\" is integrated in a \"plug-and-play\" manner into various mainstream causal structure learning algorithms, as shown in the top box of Figure 1. Whether they are Score-based, Constraint-based, or Gradient-based methods, all can leverage this harmonized prior to guide their search process, ultimately learning a more accurate and reliable causal graph from observational data.\n\n\n\n## From Query Tools to Causal Architects\n\nThis section will introduce another framework for leveraging Large Language Models (LLMs) for causal discovery. Unlike the aforementioned methods, this framework adopts a three-stage sequential prompting process, designed to extract and revise causal knowledge from an LLM, and then use this knowledge as prior information to guide traditional data-driven causal structure learning algorithms.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_2ffb0883d94f.png\" alt=\"Query_1\" width=\"500\" align=\"right\" hspace=\"15\" vspace=\"5\">\n\nThe core logic of the entire framework can be understood through the example prompts in the table on the right, which clearly demonstrates the three core stages: Variable Understanding, Causal Discovery, and Error Revision.\n\n\n**1. 
Stage One: Variable Understanding**\n\nThe objective of this stage is to first have the LLM accurately understand the real-world meaning of each variable in the dataset.\n\n* **Input:** The researcher provides the LLM with the symbol and possible values for each variable. This is typically the most basic information available in a standard dataset.\n* **Example Prompt (\"Prompt Understand\"):** As shown in Table 1, the prompt asks the LLM to act as an expert in a specific domain and explain the meaning of each variable based on its symbol and values.\n* **Output:** The LLM generates a detailed textual description for each variable, laying the foundation for subsequent causal inference.\n\n**2. Stage Two: Causal Discovery**\n\nBased on a full understanding of the variables' meanings, the LLM is asked to identify the causal relationships among them.\n\n* **Example Prompt (\"Prompt Causal Discovery\"):** The prompt for this stage requires the LLM to analyze the causal effects between variables and output the results in the form of a directed graph network.\n* **Key Requirement:** The prompt here explicitly emphasizes that every edge in the graph must represent a **direct causal relationship** between the two variables.\n\n**3. Stage Three: Error Revision**\n\nTo enhance the reliability of the LLM's output, the framework introduces a self-revision stage, prompting the LLM to check and correct its own previously generated conclusions.\n\n* **Example Prompt (\"Prompt Revision\"):** The causal statements generated in the second stage (e.g., $x_i \rightarrow x_j$) are fed back as input, asking the LLM whether these statements are correct and requesting it to provide reasons.\n* **Objective:** Through this self-checking mechanism, inaccurate causal statements are filtered out, ultimately yielding a higher-quality, more reliable set of causal relationships to guide the subsequent data analysis process.\n\n**4. 
Integration with Data-Driven Algorithms**\n\nA core insight of this framework is that although the LLM is prompted to output \"direct\" causal relationships, the knowledge inferred by the LLM is inherently closer to qualitative, indirect causal relationships. Therefore, the causal statements obtained through the three-stage process are not directly treated as the final causal graph.\n\n* **Ancestral Constraints:** The framework converts the LLM's revised causal statements (e.g., A causes B) into ancestral constraints. This means that in the final causal graph, there must exist a directed path from A to B, but not necessarily a direct edge.\n* **Hard and Soft Approaches:** These ancestral constraints are then integrated into score-based causal structure learning algorithms. Researchers can choose different integration strategies based on their level of confidence in the LLM's prior knowledge:\n    * **Hard Constraint Approach:** Strictly enforces that the final causal graph must satisfy all ancestral constraints provided by the LLM, narrowing the search space through methods like pruning.\n    * **Soft Constraint Approach:** Incorporates the LLM's prior knowledge as part of the scoring function. 
This method allows for discarding some prior constraints if there is a strong conflict between the data and the prior knowledge, thus possessing a degree of fault tolerance.\n\n\n# Directory\n\n\n\n## Dataset Part\n\n#### `data_structure`\n\n- `{Dataset_name}`\n\n    - `{Dataset_name}_graph.txt`: The ground truth causal graph of the dataset variables.\n\n    - `{Dataset_name}.mapping`: Mapping of dataset variable names.\n\n#### `dataset`\n\n- `{data}`\n\n    - `{Dataset_name}`\n\n        - `{Dataset_name}_continues_{n}dsize_random{r}`: Synthetic datasets; n represents the ratio of dataset size to the number of variables, and r represents the random generation parameters.\n\n\n\n## LLM Part\n\n\n\n#### `prompt_design`\n\n-   `description`\n\n    -   `{Dataset_name}.json`: Explanation of the variables in the dataset.\n\n-   `prompt_generation.py`: Generates the required prompts based on the content in `description`.\n\n-   `prompt`\n\n    -   `{Dataset_name}`\n\n        -   `{Dataset_name}_{Variable_name}.txt`: The actual prompt used.\n\n#### `LLM_query`\n\n-   `api.py`: Calls the API to query the causal relationships of dataset variables.\n\n-   `LLM_answer`\n\n    -   `{Dataset_name}`\n\n        -   `{LLM_name}`\n\n            -   `{Variable_name}.txt`: The causal relationships (causes and effects) of a specific variable in the dataset.\n\n\n\n## Causal Discovery Part\n\n\n\n#### `prior_knowledge`\n\n-   `knowledge_matrix_convert.py`: Cleans the knowledge provided by the large model and converts it into matrix form.\n\n-   `LLM_knowledge`\n\n    -   `{Dataset_name}`\n\n        -   `{Dataset_name}_{LLM_name}.txt`: Large-model knowledge stored in matrix form.\n\n-   `generation_edge_prior.py`: Generates edge priors based on large-model knowledge or the ground truth causal graph.\n\n-   `prior_based_on_LLM`\n\n    -   `{Dataset_name}`\n\n        -   `{Dataset_name}_{LLM_name}.txt`: Edge prior matrix generated from the large language model.\n\n-   `prior_based_on_ground_truth`\n\n    
-   `{Dataset_name}`\n\n        -   `{Dataset_name}.txt`: Edge prior matrix generated from the ground truth causal graph.\n\n#### `src`\n\n-   `{method_name}.py`: Main methods for causal discovery.\n\n#### `causal_discovery`\n\n-   `evalution.py`: Evaluation functions used during the training process.\n\n-   `preparation.py`: Preparatory work such as parameter setting.\n\n-   `main.py`: Main program.\n\n#### `out`\n\n-   `output.csv`: Summary of the parameters and metrics from the model training results.\n\n-----\n\n# 🚀 Getting Started\n\nThis project can be used in two primary ways:\n\n1.  **For End-Users:** By downloading and running the pre-built, standalone application (recommended).\n2.  **For Developers:** By cloning the source code and setting up the local development environment.\n\n-----\n\n## For End-Users (Recommended Method)\n\nThe easiest way to use this system is to download the latest pre-built version for Linux. This does not require you to install Python or any dependencies manually, except for two system tools.\n\nBelow is an overview of the application's user interface.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_26b8211bb6af.png\" alt=\"system_1\" width=\"500\" align=\"right\" hspace=\"15\" vspace=\"5\">\n\nThe main modules of the interface are as follows:\n\n1.  **Data Upload:** Allows for uploading and reading local `.csv` files.\n2.  **Prior Knowledge Input:** A dedicated area to input expert knowledge, which will be used as priors for the algorithms.\n3.  **Algorithm Selection & Configuration:** Choose from four implemented algorithms (with three more currently in development). After selecting an algorithm, you can configure its parameters. *Please note that not all parameters may be fully functional in the current development phase.*\n4.  
**LLM Integration:** Use a Large Language Model (LLM) to generate prior knowledge as an alternative to human experts (effective for specific methods only). This section also provides a direct chat interface for asking the LLM questions.\n\n-----\n\nNow, follow these steps to install and run the application:\n\n**1. Install Prerequisites**\n\nThis application requires `graphviz` (for graph rendering) and `unrar` (for extracting the downloaded file). Please install them first:\n\n```bash\n# For Debian\u002FUbuntu-based systems\nsudo apt-get update && sudo apt-get install graphviz unrar\n```\n\n**2. Download the Application**\n\nGo to the project's **[GitHub Releases Page](https:\u002F\u002Fgithub.com\u002FWXY604\u002FLLM-based-causal-discovery\u002Freleases)** to find the latest version.\n\nDownload the `.rar` archive (e.g., `Causal.Discovery.System.part1.rar`) from the \"Assets\" section.\n\n**3. Extract and Run**\n\nOpen a terminal and run the following commands:\n\n```bash\n# 1. Extract the RAR archive. Quotes are important due to spaces in the filename.\nunrar x \"Causal Discovery System.part1.rar\"\n\n# 2. Navigate into the new directory\ncd CD\n\n# 3. Grant execute permissions to the main program\nchmod +x CD\n\n# 4. Run the application\n.\u002FCD\n```\n\nThe program will start in the background and automatically open the user interface in your default web browser. 
To stop the application, press `Ctrl + C` in the terminal.\n\n-----\n\n## For Developers (From Source)\n\nIf you wish to run the scripts directly, modify the code, or contribute to the project, follow these instructions to set up a local development environment.\n\n#### Installation Guide\n\nThis guide provides step-by-step instructions for downloading the `LLM-based-causal-discovery` project and setting up its local environment.\n\n**Step 1: Clone the GitHub Repository**\n\nOpen your terminal and use the `git clone` command to download the project source code.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FWXY604\u002FLLM-based-causal-discovery.git\n```\n\nThis will create a folder named `LLM-based-causal-discovery`. Navigate into this new folder:\n\n```bash\ncd LLM-based-causal-discovery\n```\n\n**Step 2: Create and Activate a Python Virtual Environment**\n\nUsing a virtual environment is highly recommended to keep project dependencies isolated.\n\n```bash\n# Create the virtual environment (we'll name it venv):\npython -m venv venv\n\n# Activate the virtual environment:\n# On Windows:\n.\\venv\\Scripts\\activate\n# On macOS and Linux:\nsource venv\u002Fbin\u002Factivate\n```\n\nOnce activated, you will see `(venv)` at the beginning of your terminal prompt.\n\n**Step 3: Install Project Dependencies**\n\nAll required Python libraries are listed in the `requirements.txt` file. 
Make sure your virtual environment is activated, then run:\n\n```bash\npip install -r requirements.txt\n```\n\nThis process may take a few moments.\n\n#### Runbook\n\nThis guide provides detailed instructions for running the causal discovery process in different scenarios from the source code.\n\n**Scenario 1: Running Causal Discovery Algorithm Only**\n\nIf you already have all the pre-processed data and prior knowledge, you can run the main program directly.\n\n```bash\npython tools\u002Fcausal_discovery\u002Fmain.py\n```\n\n**Scenario 2: Causal Discovery from a Raw Dataset**\n\nThis workflow guides you through the entire process, from downloading a raw dataset to completing the discovery.\n\n1.  **Prepare Ground Truth and Mapping Files:** Place these files into `data_structure\u002F{Dataset_name}\u002F`.\n2.  **Prepare the Dataset Data File:** Place your `.csv` or `.txt` file into `dataset\u002Fdata\u002F{Dataset_name}\u002F`.\n3.  **Generate the Prior Matrix:** Run the script to generate the edge prior matrix from the ground truth.\n    ```bash\n    python tools\u002Fcausal_discovery\u002Fprior_knowledge\u002Fgeneration_edge_prior.py\n    ```\n4.  **Run the Main Program:**\n    ```bash\n    python tools\u002Fcausal_discovery\u002Fmain.py\n    ```\n\n**Scenario 3: LLM-Assisted Causal Discovery**\n\nThis workflow uses external knowledge from a Large Language Model (LLM) as a prior.\n\n1.  **Prepare Data:** Follow steps 1 & 2 from Scenario 2.\n2.  **Obtain the LLM Knowledge Matrix:** Use the tools in `LLM_query` (or your own methods) and format the results as specified in `LLM_knowledge\u002F{Dataset_name}\u002F{Dataset_name}_{LLM_name}.txt`.\n3.  **Generate Prior Matrix from LLM Knowledge:** First, **modify** the `tools\u002Fcausal_discovery\u002Fprior_knowledge\u002Fgeneration_edge_prior.py` script to switch its logic to generate the prior based on LLM knowledge. 
After modifying, run it:\n    ```bash\n    python tools\u002Fcausal_discovery\u002Fprior_knowledge\u002Fgeneration_edge_prior.py\n    ```\n4.  **Run the Main Program:**\n    ```bash\n    python tools\u002Fcausal_discovery\u002Fmain.py\n    ```\n\n","# LLM 增强的因果发现工具包：技术介绍\n\n## **背景与挑战**\n\n从观测数据中推断因果关系是数据科学及相关研究领域的核心挑战。传统的因果发现方法严重依赖领域专家的先验知识进行指导。然而，获取此类知识往往涉及大量的时间和金钱成本，这在很大程度上限制了高级因果发现技术的应用范围。\n\n## **大语言模型（LLM）的机会与挑战**\n\n大语言模型（LLM）的出现为获取先验知识提供了新的可能性。通过向 LLM 查询变量之间的关系，研究人员可以获得接近专家水平的判断。这种方法的优势在于显著降低了知识获取的成本。此外，在某些场景下，LLM 提供的知识可能比非专业人士的判断更为客观。\n\n然而，这种新范式也伴随着挑战。LLM 具有内在的不稳定性；针对同一主题的查询可能会返回不一致甚至自相矛盾的结果。直接将这种不准确或内部矛盾的信息作为先验使用，不仅无法提高模型性能，还可能对最终分析的准确性产生负面影响。\n\n## **我们的方法与工具包核心功能**\n\n为了解决这一挑战，我们开发了此工具包。它旨在充分利用 LLM 强大的知识库，同时系统性地减轻与其不稳定性相关的风险。\n\n我们的核心方法受最新研究成果启发：与直接要求 LLM 判断抽象的因果关系相比，引导 LLM 确定事件的具体时间顺序能产生更可靠和稳定的输出。\n\n基于此，该工具包的核心功能包括：\n\n  * **结构化知识提取**：在用户定义研究场景和变量后，工具包自动生成结构化查询以引导 LLM，高效提取关于变量时间顺序的高置信度信息。\n  * **模型输出的整合与优化**：工具包包含内置的分析机制来处理 LLM 返回的初始信息。它系统地整合这些潜在不一致的局部判断，旨在优化出更全局一致且可靠的变量排序。\n  * **与下游算法的兼容性**：工具包生成的优化后的时间先验可作为高质量约束，灵活应用于各种主流因果发现算法，帮助从现实世界数据中构建更准确、鲁棒的因果结构。\n\n\n## **框架概述**\n\n\n![Figure1.](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_f5bf9f888ff5.png)\n\n上述框架图提供了对整个工作流程更直观的理解。\n\n整个过程可总结为几个高层级阶段：\n\n  * **阶段 1：初始知识生成（偏序生成）**\n\n      * 在此阶段，我们通过场景模拟和元数据输入向 LLM 发起结构化查询。\n      * 目标是获得模型关于变量时间序列的初步、离散判断。\n\n  * **阶段 2：知识整合与优化（冲突分解与最优全序发现）**\n\n      * 这是过程中的关键步骤。工具包系统分析从 LLM 获得的所有初步判断。\n      * 它整合这些分散且潜在不一致的局部信息，旨在优化出更全局一致且可靠的变量排序。\n\n  * **阶段 3：指导下游分析（基于顺序的因果性）**\n\n      * 最后，这个优化后的全局排序作为高质量的先验知识。\n      * 它可以输入到用户选择的任何标准因果发现算法中，作为强有力的外部指南，帮助算法更准确地收敛于真实数据以推断最终的因果图。\n\n简而言之，该框架的核心是通过一系列系统步骤，将 LLM 提供的潜在模糊、矛盾且局部的知识转化为清晰可靠的全局变量排序，从而为数据驱动的因果学习提供有效支持。\n\n\n\n## **结论与展望**\n\n我们期望此工具包能为从事因果科学研究和实践的专业人士提供有效支持，帮助用户更方便、可靠地利用大语言模型的强大知识源。我们相信这种方法为实现稳定且具有成本效益的因果发现提供了有价值的技术方向，并期待与学术界和产业界合作推动该领域的进一步发展。\n\n\n# 可使用几种基于 LLM 的方法\n\n本节介绍了一些使用大语言模型发现和生成先验的方法，供参考。\n\n## 基于协调先验的 LLM 驱动因果发现\n\n该框架的核心思想是利用 
LLM（大型语言模型）作为知识专家。通过专门设计的 Prompting Strategies（提示策略），引导 LLM 从两个不同但互补的角度进行因果推理，生成可靠的“协调先验（Harmonized Prior）”。随后，将此协调先验集成到主流的因果结构学习算法中，以提高从数据中发现因果关系的准确性和可靠性。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_b426bc1faa6d.png\" alt=\"Harmonized_1\" width=\"300\" align=\"left\" hspace=\"15\" vspace=\"5\">\n\n框架的整体逻辑流程如左图所示，主要包含三个部分：双专家 LLM 推理、协调先验构建以及即插即用结构学习。\n\n**1. 双专家 LLM 推理模块**\n为了确保 LLM 提供的因果知识的准确性，该框架不让 LLM 直接判断所有变量对之间的复杂关系。相反，它将 LLM 配置为专注于特定任务的两种不同专家角色：保守型专家（Conservative Expert）和探索型专家（Exploratory Expert）。\n\n* **保守型专家（Conservative Expert）- 旨在追求精度**\n    * 如左图所示，保守型专家的目标是识别最明确且可靠的因果关系。\n    * 它首先使用“单步推理”快速筛选置信度最高的因果对。\n    * 随后，它采用“分解与验证”策略，仔细逐一验证和确认这些选定的关系，以过滤掉潜在的虚假关联。\n    * 最终输出是一组高精度的因果关系集合 $\\lambda_p$，用作“路径存在约束（Path Existence Constraint）”。也就是说，如果 $(A,B)$ 在此集合中，则认为在真实因果图中存在从 A 到 B 的路径。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_90563a1040b6.png\" alt=\"Harmonized_2\" width=\"300\" align=\"left\" hspace=\"15\" vspace=\"5\">\n\n* **探索型专家（Exploratory Expert）- 旨在追求召回率**\n    * 如左图所示，探索型专家的目标是尽可能全面地识别所有潜在的因果链接。\n    * 该模块以每个变量为中心，逐一分析数据集中哪些其他变量可能是其直接原因。\n    * 通过这种“分解与探索”方法，它为每个变量生成一个“可能原因”列表 $C(x_i)$。\n    * 所有这些可能原因被聚合为一个高召回率的因果关系集合 $\\lambda_r$。该集合用于定义“边缺失约束（Edge Absence Constraint）”，意味着如果因果关系 $(A,B)$ 未出现在此集合中，则在最终因果图中禁止生成从 A 到 B 的直接边。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_a126b2478f8b.png\" alt=\"Harmonized_3\" width=\"300\" align=\"left\" hspace=\"15\" vspace=\"5\">\n\n**2. 协调先验构建**\n该框架融合了上述两位专家输出的因果知识，以构建统一的“协调先验（Harmonized Prior）”。此协调先验结合了双方的优势：\n\n* **路径存在约束（Path Existence Constraint）：** 利用保守型专家输出的高精度因果关系 $\\lambda_p$。\n* **边缺失约束（Edge Absence Constraint）：** 利用探索型专家输出的高召回率因果关系 $\\lambda_r$ 来定义可能直接边的范围。\n\n这样，它不仅确保了强因果信号不被丢失，还通过排除大量不可能的因果连接有效限制了结构学习的搜索空间，从而提高整体准确性。\n\n**3. 
Integration with Structure Learning Algorithms**\nFinally, the constructed "Harmonized Prior" is integrated in a plug-and-play fashion into a variety of mainstream causal structure learning algorithms, as shown in the top box of Figure 1. Whether score-based, constraint-based, or gradient-based, these methods can all use the harmonized prior to guide their search and ultimately learn a more accurate and reliable causal graph from the observational data.\n\n## From Query Tool to Causal Architect\n\nThis section introduces another framework that uses large language models (LLMs) for causal discovery. Unlike the method above, this framework adopts a three-stage sequential prompting pipeline designed to extract and revise causal knowledge from the LLM, and then uses that knowledge as prior information to guide traditional data-driven causal structure learning algorithms.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_2ffb0883d94f.png\" alt=\"Query_1\" width=\"500\" align=\"right\" hspace=\"15\" vspace=\"5\">\n\nThe core logic of the framework can be understood through the example prompts in the table on the right, which clearly show its three core stages: variable understanding, causal discovery, and error correction.\n\n**1. Stage 1: Variable Understanding**\n\nThe goal of this stage is to have the LLM first build an accurate understanding of the real-world meaning of each variable in the dataset.\n\n*   **Input:** The researcher provides the LLM with each variable's symbol and its possible values. This is typically the most basic information available in a standard dataset.\n*   **Example prompt (\"Prompt Understand\"):** As shown in Table 1, this prompt asks the LLM to act as an expert in the relevant domain and explain the meaning of each variable from its symbol and values.\n*   **Output:** The LLM generates a detailed textual description of each variable, laying the groundwork for the subsequent causal inference.\n\n**2. Stage 2: Causal Discovery**\n\nWith the variable meanings established, the LLM is asked to identify the causal relationships among them.\n\n*   **Example prompt (\"Prompt Causal Discovery\"):** The prompt for this stage asks the LLM to analyze the causal effects between variables and output the result as a directed graph network.\n*   **Key requirement:** The prompt explicitly stresses that every edge in the graph must represent a **direct causal relationship** between two variables.\n\n**3. Stage 3: Error Correction**\n\nTo strengthen the reliability of the LLM's output, the framework introduces a self-correction stage that prompts the LLM to check and correct its previously generated conclusions.\n\n*   **Example prompt (\"Prompt Revision\"):** The causal statements generated in Stage 2 (e.g., $x_i \\rightarrow x_j$) are fed back to the LLM, which is asked whether each statement is correct and required to justify its answer.\n*   **Goal:** This self-check filters out inaccurate causal statements, yielding a higher-quality, more reliable set of causal relationships to guide the subsequent data analysis.\n\n**4. 
Integration with Data-Driven Algorithms**\n\nA core insight of this framework is that although the LLM is prompted to output "direct" causal relationships, the knowledge it infers is in essence closer to qualitative, indirect causality. The causal statements obtained through the three-stage process are therefore not treated directly as the final causal graph.\n\n*   **Ancestral constraints:** The framework converts the LLM's revised causal statements (e.g., A causes B) into ancestral constraints: a directed path from A to B must exist in the final causal graph, but not necessarily a direct edge.\n*   **Hard-constraint and soft-constraint approaches:** These ancestral constraints are then integrated into score-based causal structure learning algorithms. Researchers can choose an integration strategy according to their confidence in the LLM's prior knowledge:\n    *   **Hard-constraint approach:** strictly enforces that the final causal graph satisfy every ancestral constraint provided by the LLM, narrowing the search space through techniques such as pruning.\n    *   **Soft-constraint approach:** incorporates the LLM prior into the scoring function. Some prior constraints can be discarded when the data strongly conflict with the prior knowledge, so this approach has a degree of fault tolerance.\n\n\n# Project Directory\n\n\n\n## Dataset Section\n\n#### `data_structure`\n\n- `{Dataset_name}`\n\n    - `{Dataset_name}_graph.txt`: the ground-truth causal graph of the dataset's variables.\n\n    - `{Dataset_name}.mapping`: the mapping of the dataset's variable names.\n\n#### `dataset`\n\n- `{data}`\n\n    - `{Dataset_name}`\n\n        - `{Dataset_name}_continues_{n}dsize_random{r}`: a synthetic dataset, where n is the ratio of dataset size to the number of variables and r is the random-generation parameter.\n\n\n\n## LLM Section\n\n\n\n#### `prompt_design`\n\n-   `description`\n\n    -   `{Dataset_name}.json`: explanations of the variables in the dataset.\n\n-   `prompt_generation.py`: generates the required prompts from the contents of `description`.\n\n-   `prompt`\n\n    -   `{Dataset_name}`\n\n        -   `{Dataset_name}_{Variable_name}.txt`: the prompts actually used.\n\n#### `LLM_query`\n\n-   `api.py`: calls the API to query the causal relationships of the dataset's variables.\n\n-   `LLM_answer`\n\n    -   `{Dataset_name}`\n\n        -   `{LLM_name}`\n\n            -   `{Variable_name}.txt`: the causal relationships (causes and effects) of a specific variable in the dataset.\n\n\n\n## Causal Discovery Section\n\n\n\n#### `prior_knowledge`\n\n-   `knowledge_matrix_convert.py`: cleans the knowledge provided by the LLM and converts it into matrix form.\n\n-   `LLM_knowledge`\n\n    -   `{Dataset_name}`\n\n        -   `{Dataset_name}_{LLM_name}.txt`: LLM knowledge stored in matrix form.\n\n-   `generation_edge_prior.py`: generates edge priors from LLM knowledge or from the ground-truth causal graph.\n\n-   `prior_based_on_LLM`\n\n    -   `{Dataset_name}`\n\n        -   `{Dataset_name}_{LLM_name}.txt`: the edge-prior matrix generated from the large language model.\n\n-   `prior_based_on_ground_truth`\n\n    -   `{Dataset_name}`\n\n        -   `{Dataset_name}.txt`: the edge-prior matrix generated from the ground-truth causal graph.\n\n#### `src`\n\n-   `{method_name}.py`: the main causal discovery methods.\n\n#### `causal_discovery`\n\n-   `evalution.py`: evaluation functions used during training.\n\n-   `preparation.py`: preparatory work such as parameter settings.\n\n-   `main.py`: the main program.\n\n#### `out`\n\n-   `output.csv`: the parameters and metrics of the model's training results.\n\n-----\n\n# 🚀 
Quick Start\n\nThere are two main ways to use this project:\n\n1.  **For end users:** download and run the pre-built standalone application (recommended).\n2.  **For developers:** clone the source code and set up a local development environment.\n\n## For End Users (Recommended)\n\nThe easiest way to use the system is to download the latest pre-compiled release for Linux. Apart from two system tools, it requires no manual installation of Python or any dependencies.\n\nBelow is an overview of the application's user interface.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_readme_26b8211bb6af.png\" alt=\"system_1\" width=\"500\" align=\"right\" hspace=\"15\" vspace=\"5\">\n\nThe main modules of the interface are:\n\n1.  **Data upload:** upload and read local `.csv` files.\n2.  **Prior knowledge input:** a dedicated area for entering expert knowledge, used as the algorithm's prior.\n3.  **Algorithm selection and configuration:** choose from the 4 implemented algorithms (three more are under development). After selecting an algorithm, you can configure its parameters. *Note that at the current stage of development, not all parameters take full effect.*\n4.  **Large language model integration:** use a large language model (LLM) to generate prior knowledge as an alternative to a human expert (effective only for certain methods). This section also provides a chat interface for asking the LLM questions directly.\n\n-----\n\nNow follow these steps to install and run the application:\n\n**1. Install Prerequisites**\n\nThe application requires `graphviz` (for graph rendering) and `unrar` (for extracting the downloaded archive). Install them first:\n\n```bash\n# For Debian\u002FUbuntu-based systems\nsudo apt-get update && sudo apt-get install graphviz unrar\n```\n\n**2. Download the Application**\n\nGo to the project's [**GitHub Releases page**](https:\u002F\u002Fgithub.com\u002FWXY604\u002FLLM-based-causal-discovery\u002Freleases) and find the latest release.\n\nDownload the `.rar` archive (e.g., `Causal.Discovery.System.part1.rar`) from the \"Assets\" section.\n\n**3. Extract and Run**\n\nOpen a terminal and run the following commands:\n\n```bash\n# 1. Extract the RAR archive. Quotes are important due to spaces in the filename.\nunrar x \"Causal Discovery System.part1.rar\"\n\n# 2. Navigate into the new directory\ncd CD\n\n# 3. Grant execute permissions to the main program\nchmod +x CD\n\n# 4. 
Run the application\n.\u002FCD\n```\n\nThe program starts in the background and automatically opens the user interface in your default web browser. To stop the application, press `Ctrl + C` in the terminal.\n\n-----\n\n## For Developers (From Source)\n\nIf you want to run the scripts directly, modify the code, or contribute to the project, follow the instructions below to set up a local development environment.\n\n#### Installation Guide\n\nThis guide provides step-by-step instructions for downloading the `LLM-based-causal-discovery` project and setting up its local environment.\n\n**Step 1: Clone the GitHub repository**\n\nOpen a terminal and download the project source code with the `git clone` command.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FWXY604\u002FLLM-based-causal-discovery.git\n```\n\nThis creates a folder named `LLM-based-causal-discovery`. Enter the new folder:\n\n```bash\ncd LLM-based-causal-discovery\n```\n\n**Step 2: Create and activate a Python virtual environment**\n\nUsing a virtual environment is strongly recommended to keep the project's dependencies isolated.\n\n```bash\n# Create the virtual environment (we'll name it venv):\npython -m venv venv\n\n# Activate the virtual environment:\n# On Windows:\n.\\venv\\Scripts\\activate\n# On macOS and Linux:\nsource venv\u002Fbin\u002Factivate\n```\n\nOnce activated, your terminal prompt will start with `(venv)`.\n\n**Step 3: Install the project dependencies**\n\nAll required Python libraries are listed in the `requirements.txt` file. Make sure your virtual environment is active, then run:\n\n```bash\npip install -r requirements.txt\n```\n\nThis may take a few minutes.\n\n#### Run Manual\n\nThis guide provides detailed instructions for running the causal discovery pipeline from source in different scenarios.\n\n**Scenario 1: Run the causal discovery algorithm only**\n\nIf you already have all the preprocessed data and prior knowledge, you can run the main program directly.\n\n```bash\npython tools\u002Fcausal_discovery\u002Fmain.py\n```\n\n**Scenario 2: Causal discovery from a raw dataset**\n\nThis workflow walks you through the whole process, from downloading the raw dataset to completing discovery.\n\n1.  **Prepare the ground-truth and mapping files:** place them in `data_structure\u002F{Dataset_name}\u002F`.\n2.  **Prepare the dataset data file:** place your `.csv` or `.txt` file in `dataset\u002Fdata\u002F{Dataset_name}\u002F`.\n3.  **Generate the prior matrix:** run the script to generate the edge-prior matrix from the ground truth.\n    ```bash\n    python tools\u002Fcausal_discovery\u002Fprior_knowledge\u002Fgeneration_edge_prior.py\n    ```\n4.  **Run the main program:**\n    ```bash\n    python tools\u002Fcausal_discovery\u002Fmain.py\n    ```\n\n**Scenario 3: LLM-assisted causal discovery**\n\nThis workflow uses external knowledge from a large language model (LLM) as the prior.\n\n1.  **Prepare the data:** follow steps 1 and 2 of Scenario 2.\n2.  **Obtain the LLM knowledge matrix:** use the tools in `LLM_query` (or your own method) and organize the results in the format specified by `LLM_knowledge\u002F{Dataset_name}\u002F{Dataset_name}_{LLM_name}.txt`.\n3.  
**Generate the prior matrix from LLM knowledge:** first, **modify** the `tools\u002Fcausal_discovery\u002Fprior_knowledge\u002Fgeneration_edge_prior.py` script to switch its logic to generating the prior from LLM knowledge. After modifying, run it:\n    ```bash\n    python tools\u002Fcausal_discovery\u002Fprior_knowledge\u002Fgeneration_edge_prior.py\n    ```\n4.  **Run the main program:**\n    ```bash\n    python tools\u002Fcausal_discovery\u002Fmain.py\n    ```","# LLM-based-causal-discovery Quick Start Guide\n\n## Environment Setup\nBefore using this tool, make sure your development environment meets the following requirements:\n- **Operating system**: Linux \u002F macOS \u002F Windows\n- **Python version**: 3.8 or later\n- **Dependencies**: Git, network access (for connecting to the LLM API)\n- **Hardware suggestion**: a GPU-accelerated environment can speed up data processing (depending on the specific algorithm)\n\n## Installation Steps\n1. **Clone the project code**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002F[repo-owner]\u002FLLM-based-causal-discovery.git\n   cd LLM-based-causal-discovery\n   ```\n\n2. **Create a virtual environment and install dependencies**\n   Using a PyPI mirror close to you (here, the Tsinghua mirror) can speed up the download:\n   ```bash\n   python -m venv venv\n   source venv\u002Fbin\u002Factivate  # Windows users: venv\\Scripts\\activate\n   pip install -r requirements.txt --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n## Basic Usage\nThe core goal of this tool is to turn the potentially vague or contradictory knowledge provided by an LLM into a clear, reliable global ordering of variables that assists causal discovery algorithms. The main workflow is as follows:\n\n### 1. Initialization and knowledge generation\n- **Define the scenario**: set the research scenario and the variable list in the configuration file.\n- **Structured queries**: the system automatically generates query instructions for the LLM, guiding it to judge the temporal order of the variables.\n\n### 2. Knowledge integration and refinement\n- **Conflict handling**: automatically analyzes the LLM's multi-round, local judgments.\n- **Global ordering**: produces a consistent and reliable variable order through conflict decomposition and optimal total order discovery.\n\n### 3. Downstream algorithm integration\nFeed the refined temporal prior into mainstream causal discovery algorithms as a constraint. The tool supports several LLM-based prior-construction methods, for example:\n\n- **Harmonized Prior**\n  A dual-expert scheme:\n  - **Conservative expert**: screens high-precision causal relationships (path existence constraints).\n  - **Exploratory expert**: comprehensively identifies potential causal links (edge absence constraints).\n  The two are fused to guide structure learning.\n\n- **From Query Tools to Causal Architects**\n  A three-stage prompting pipeline:\n  1. **Variable understanding**: have the LLM explain the meaning of each variable.\n  2. **Causal discovery**: output a graph of direct causal relationships.\n  3. 
**Error correction**: the LLM checks and revises its own causal statements.\n\n### Example Run\nAfter configuring the parameters according to the framework above, run the main program to start the analysis:\n```bash\npython main.py --config config.yaml\n```\n*(Note: see the detailed documentation in the project root for the exact command-line parameters)*","A medical data analysis team is studying the causal chains between surgical complications and a range of physiological indicators, aiming to optimize post-operative care plans and predict high-risk patients.\n\n### Without LLM-based-causal-discovery\n- Relying on domain experts to define variable orderings one by one carries a very high communication cost, and it is hard to cover all potential medical factors in a short time.\n- Lacking prior-knowledge constraints, traditional causal discovery algorithms easily infer wrong causal directions from observational data, misleading clinical decisions.\n- Manually screening key pathways in massive historical medical records is inefficient, stretching the modeling project's iteration cycle to several months.\n- Subjective judgments may differ across experts, and feeding mutually contradictory prior information straight into the model severely reduces the reliability of the final results.\n\n### With LLM-based-causal-discovery\n- LLM-based-causal-discovery guides the large model to generate variable temporal orderings automatically through structured queries, greatly lowering the barrier to obtaining high-confidence prior knowledge.\n- The built-in analysis mechanism effectively reconciles the inconsistencies across the LLM's multiple rounds of feedback, producing a globally consistent and reliable variable ordering.\n- Importing the refined ordering into downstream algorithms as a high-quality constraint markedly improves the accuracy of causal inference on complex medical data.\n- The project delivery cycle shrinks from several months to one week, quickly validating the causal relationships of multiple potential risk factors and accelerating the translation of research results.\n\nBy obtaining stable prior knowledge at low cost, LLM-based-causal-discovery makes causal inference in real-world settings both fast and accurate.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWXY604_LLM-based-causal-discovery_7994c7db.png","WXY604","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FWXY604_45a2be17.png","https:\u002F\u002Fgithub.com\u002FWXY604",[81],{"name":82,"color":83,"percentage":84},"Python","#3572A5",100,839,61,"2026-01-18T11:04:35","Not specified",{"notes":90,"python":88,"dependencies":91},"The documentation provided focuses on technical principles, framework workflows (such as the three-stage pipeline and harmonized prior construction), and methodology; it does not include concrete installation steps, system environment configuration, or dependency version information. For actual deployment, consult the requirements.txt or setup.py file in the full project repository.",[],[15,37],"2026-03-27T02:49:30.150509","2026-04-06T08:42:15.092533",[],[97],{"id":98,"version":99,"summary_zh":100,"released_at":101},111492,"CausalDiscoverySystem","### **Version v1.0-linux - Initial Release**\r\n\r\nThis is the first stable release of the project for the Linux platform. It is packaged as a standalone executable for ease of use.\r\n\r\n#### **Usage Instructions**\r\n\r\n1.  **Download**\r\n    Download the `.rar` archive from the **Assets** section below.\r\n\r\n2.  
**Extract and Run**\r\n    Open a terminal and execute the following commands:\r\n\r\n    ```bash\r\n    # Extract the downloaded archive (quotes are needed because the filename contains spaces)\r\n    unrar x \"Causal Discovery System.part1.rar\"\r\n\r\n    # Navigate into the extracted directory\r\n    cd CD\r\n\r\n    # Grant execute permissions to the main program\r\n    chmod +x CD\r\n\r\n    # Run the program\r\n    .\u002FCD\r\n    ```\r\n\r\n3.  **Getting Started**\r\n    After execution, the program will start in the background and automatically open the application interface in your default web browser.\r\n\r\n#### **Prerequisites**\r\n\r\n  * This application depends on **Graphviz** for graph rendering and **unrar** for extracting the archive. Please ensure both are installed on your Linux system (e.g., `sudo apt-get install graphviz unrar`).\r\n  * Do not close the terminal window from which you ran the program, as this will terminate the server. To stop the application, press `Ctrl + C` in the terminal.","2025-06-28T10:09:29"]