[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-The-FinAI--PIXIU":3,"tool-The-FinAI--PIXIU":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150037,2,"2026-04-10T23:33:47",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":10,"env_os":96,"env_gpu":97,"env_ram":98,"env_deps":99,"category_tags":113,"github_topics":114,"view_count":130,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":131,"updated_at":132,"faqs":133,"releases":164},1171,"The-FinAI\u002FPIXIU","PIXIU","This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. 
Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).","PIXIU 是一个开源项目，专注于金融领域的人工智能技术，提供了首个金融大语言模型、指令调优数据集以及评估基准，旨在全面评估金融领域的语言模型。它解决了金融领域AI模型缺乏统一标准和高质量训练数据的问题，帮助研究人员和开发者提升模型在金融场景中的表现。适合金融研究者、AI开发者以及对金融科技感兴趣的专业人士使用。PIXIU 在金融文本理解、数据分析和自动化处理等方面具有独特优势，是推动金融AI开放发展的有力工具。","\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_400e2fc0f7cf.png\"  width=\"100%\" height=\"100%\">\n\u003C\u002Fp>\n\u003Cdiv>\n\u003Cdiv align=\"left\">\n    \u003Ca target='_blank'>Qianqian Xie\u003Csup>1\u003C\u002Fsup>\u003C\u002Fspan>&emsp;\n    \u003Ca target='_blank'>Weiguang Han\u003Csup>2\u003C\u002Fsup>\u003C\u002Fspan>&emsp;\n    \u003Ca target='_blank'>Zhengyu Chen\u003Csup>2\u003C\u002Fsup>\u003C\u002Fspan>&emsp;\n    \u003Ca target='_blank'>Ruoyu Xiang\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Xiao Zhang\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yueru He\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Mengxi Xiao\u003Csup>2\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Dong Li\u003Csup>2\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yongfu Dai\u003Csup>7\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Duanyu Feng\u003Csup>7\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yijing Xu\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Haoqiang Kang\u003Csup>5\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Ziyan Kuang\u003Csup>12\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Chenhan Yuan\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Kailai Yang\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Zheheng Luo\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca 
target='_blank'>Tianlin Zhang\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Zhiwei Liu\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Guojun Xiong\u003Csup>10\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Zhiyang Deng\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yuechen Jiang\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Zhiyuan Yao\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Haohang Li\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yangyang Yu\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Gang Hu\u003Csup>8\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Jiajia Huang\u003Csup>11\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Xiao-Yang Liu\u003Csup>5\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fwarrington.ufl.edu\u002Fdirectory\u002Fperson\u002F12693\u002F' target='_blank'>Alejandro Lopez-Lira\u003Csup>4\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Benyou Wang\u003Csup>6\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yanzhao Lai\u003Csup>13\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Hao Wang\u003Csup>7\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Min Peng\u003Csup>2*\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Sophia Ananiadou\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='' target='_blank'>Jimin Huang\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cbr \u002F>\n\n\u003Cdiv align=\"left\">\n    \u003Csup>1\u003C\u002Fsup>The Fin AI&emsp;\n    \u003Csup>2\u003C\u002Fsup>Wuhan University&emsp;\n    \u003Csup>3\u003C\u002Fsup>The University of Manchester&emsp;\n    \u003Csup>4\u003C\u002Fsup>University of Florida&emsp;\n    
\u003Csup>5\u003C\u002Fsup>Columbia University&emsp;\n    \u003Csup>6\u003C\u002Fsup>The Chinese University of Hong Kong, Shenzhen&emsp;\n    \u003Csup>7\u003C\u002Fsup>Sichuan University&emsp;\n    \u003Csup>8\u003C\u002Fsup>Yunnan University&emsp;\n    \u003Csup>9\u003C\u002Fsup>Stevens Institute of Technology&emsp;\n    \u003Csup>10\u003C\u002Fsup>Stony Brook University&emsp;\n    \u003Csup>11\u003C\u002Fsup>Nanjing Audit University&emsp;\n    \u003Csup>12\u003C\u002Fsup>Jiangxi Normal University&emsp;\n    \u003Csup>13\u003C\u002Fsup>Southwest Jiaotong University\n\u003C\u002Fdiv>\n\u003Cbr \u002F>\n\n\u003Cdiv align=\"left\">\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_52fc257eddd9.png' alt='Wuhan University Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_35f70ee618cd.png' alt='Manchester University Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_775c6a4653f0.jpg' alt='University of Florida Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Fadmissions.ucr.edu\u002Fsites\u002Fdefault\u002Ffiles\u002Fstyles\u002Fform_preview\u002Fpublic\u002F2020-07\u002Fucr-education-logo-columbia-university.png?itok=-0FD6Ma2' alt='Columbia University Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_9b9890c8c729.png' alt='HK University (shenzhen) Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_5b3a4cb80605.png' alt='Sichuan University' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_8568559eb060.png' alt='Yunnan University' height='50px'>&emsp;\n    \u003Cimg 
src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_54b8c3717741.png' alt='Stevens Institute of Technology' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_577e53e068f1.jpg' alt='Stony Brook University' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fen\u002F9\u002F9c\u002FNanjing_Audit_University_logo.png' alt='Nanjing Audit University' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fen\u002Fthumb\u002Fc\u002Fc5\u002FJiangxi_Normal_University.svg\u002F1200px-Jiangxi_Normal_University.svg.png' alt='Jiangxi Normal University' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_324a11a42ec7.png' alt='Southwest Jiaotong University Logo' height='50px'>&emsp;\n\u003C\u002Fdiv>\n\n-----------------\n\n![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpixiu-v0.1-gold)\n![](https:\u002F\u002Fblack.readthedocs.io\u002Fen\u002Fstable\u002F_static\u002Flicense.svg)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1146837080798933112)](https:\u002F\u002Fdiscord.gg\u002FHRWpUmKB)\n\n[Pixiu Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05443) | [FinBen Leaderboard](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffinosfoundation\u002FOpen-Financial-LLM-Leaderboard)\n\n**Disclaimer**\n\nThis repository and its contents are provided for **academic and educational purposes only**. None of the material constitutes financial, legal, or investment advice. No warranties, express or implied, are offered regarding the accuracy, completeness, or utility of the content. The authors and contributors are not responsible for any errors, omissions, or any consequences arising from the use of the information herein. 
Users should exercise their own judgment and consult professionals before making any financial, legal, or investment decisions. The use of the software and information contained in this repository is entirely at the user's own risk.\n\n**By using or accessing the information in this repository, you agree to indemnify, defend, and hold harmless the authors, contributors, and any affiliated organizations or persons from any and all claims or damages.**\n\n**📢 Update (Date: 09-22-2023)**\n\n🚀 We're thrilled to announce that our paper, \"PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance\", has been accepted by NeurIPS 2023 Track Datasets and Benchmarks!\n\n**📢 Update (Date: 10-08-2023)**\n\n🌏 We're proud to share that the enhanced versions of FinBen now support both Chinese and Spanish!\n\n**📢 Update (Date: 02-20-2024)**\n\n🌏 We're delighted to share that our paper, \"The FinBen: An Holistic Financial Benchmark for Large Language Models\", is now available at [FinBen](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.12659).\n\n**📢 Update (Date: 05-02-2024)**\n\n🌏 We're pleased to invite you to attend the IJCAI2024 challenge, \"Financial Challenges in Large Language Models - FinLLM\"; the starter-kit is available at [Starter-kit](README.ijcai_challenge.md).\n\n**Checkpoints:** \n\n- [FinMA v0.1 (NLP 7B version)](https:\u002F\u002Fhuggingface.co\u002FTheFinAI\u002Ffinma-7b-nlp)\n- [FinMA v0.1 (Full 7B version)](https:\u002F\u002Fhuggingface.co\u002FTheFinAI\u002Ffinma-7b-full)\n\n**Languages**\n\n- [English](README.md)\n- [Spanish](README.es.md)\n- [Chinese](README.zh.md)\n\n**Papers**\n\n- [PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05443)\n- [The FinBen: An Holistic Financial Benchmark for Large Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.12659)\n- [No Language is an Island: Unifying Chinese and English in 
Financial Large Language Models, Instruction Data, and Benchmarks](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.06249)\n- [Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.07405)\n\n**Evaluations**:\n\n- [English Evaluation Datasets](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTheFinAI\u002Fenglish-evaluation-dataset-658f515911f68f12ea193194) (More details on FinBen section)\n- [Spanish Evaluation Datasets](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTheFinAI\u002Fspanish-evaluation-datasets-65e5855900680b19bc83e03d)\n- [Chinese Evaluation Datasets](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTheFinAI\u002Fchinese-evaluation-datasets-65e5851af7daaa71c1c59902)\n\n> Sentiment Analysis\n\n- [FPB (en_fpb)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fen-fpb)\n- [FIQASA (flare_fiqasa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fen-fpb)\n- [FOMC (flare_fomc)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-fomc)\n- [SemEval-2017 Task5 (flare_tsa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-tsa)\n\n> Classification\n\n- [Headlines (flare_headlines)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-headlines)\n- [FinArg ECC Task1 (flare_finarg_ecc_auc)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finarg-ecc-auc)\n- [FinArg ECC Task2 (flare_finarg_ecc_arc)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finarg-ecc-arc)\n- [CFA (flare_cfa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-cfa)\n- [MultiFin EN (flare_multifin_en)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-multifin-en)\n- [M&A (flare_ma)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-ma)\n- [MLESG EN 
(flare_mlesg)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-mlesg)\n\n> Knowledge Extraction\n\n- [NER (flare_ner)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-ner)\n- [Finer Ord (flare_finer_ord)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finer-ord)\n- [FinRED (flare_finred)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finred)\n- [FinCausal20 Task1 (flare_causal20_sc)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-causal20-sc)\n- [FinCausal20 Task2 (flare_cd)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-cd)\n\n> Number Understanding\n\n- [FinQA (flare_finqa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finqa)\n- [TATQA (flare_tatqa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-tatqa)\n- [FNXL (flare_fnxl)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-fnxl)\n- [FSRL (flare_fsrl)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-fsrl)\n\n> Text Summarization\n\n- [ECTSUM (flare_ectsum)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-ectsum)\n- [EDTSUM (flare_edtsum)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-edtsum)\n\n> Credit Scoring\n\n- [German (flare_german)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-german)\n- [Australian (flare_australian)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-australian)\n- [Lendingclub (flare_cra_lendingclub)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-lendingclub)\n- [Credit Card Fraud (flare_cra_ccf)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-ccf)\n- [ccFraud (flare_cra_ccfraud)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-ccfraud)\n- [Polish 
(flare_cra_polish)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-polish)\n- [Taiwan Economic Journal (flare_cra_taiwan)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-taiwan)\n- [PortoSeguro (flare_cra_portoseguro)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-portoseguro)\n- [Travel Insurance (flare_cra_travelinsurance)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-travelinsurance) \n\n> Forecasting\n\n- [BigData22 for Stock Movement (flare_sm_bigdata)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-sm-bigdata)\n- [ACL18 for Stock Movement (flare_sm_acl)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-sm-acl)\n- [CIKM18 for Stock Movement (flare_sm_cikm)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-sm-cikm)\n\n## Overview\n\nWelcome to the **PIXIU** project! This project is designed to support the development, fine-tuning, and evaluation of Large Language Models (LLMs) in the financial domain. PIXIU is a significant step towards understanding and harnessing the power of LLMs in finance.\n\n### Structure of the Repository\n\nThe repository is organized into several key components, each serving a unique purpose in the financial NLP pipeline:\n\n- **FinBen**: Our Financial Language Understanding and Prediction Evaluation Benchmark. FinBen serves as the evaluation suite for financial LLMs, with a focus on understanding and prediction tasks across various financial contexts.\n- **FIT**: Our Financial Instruction Dataset. FIT is a multi-task and multi-modal instruction dataset specifically tailored for financial tasks. It serves as the training ground for fine-tuning LLMs for these tasks.\n\n- **FinMA**: Our Financial Large Language Model (LLM). 
FinMA is the core of our project, providing the learning and prediction power for our financial tasks.\n\n### Key Features\n\n- **Open resources**: PIXIU openly provides the financial LLM, instruction tuning data, and datasets included in the evaluation benchmark to encourage open research and transparency.\n  \n- **Multi-task**: The instruction tuning data and benchmark in PIXIU cover a diverse set of financial tasks, including four financial NLP tasks and one financial prediction task.\n- **Multi-modality**: PIXIU's instruction tuning data and benchmark consist of multi-modality financial data, including time series data from the stock movement prediction task. It covers various types of financial texts, including reports, news articles, tweets, and regulatory filings.\n- **Diversity**: Unlike previous benchmarks focusing mainly on financial NLP tasks, PIXIU's evaluation benchmark includes critical financial prediction tasks aligned with real-world scenarios, making it more challenging.\n\n---\n\n## FinBen 2.0: Financial Language Understanding and Prediction Evaluation Benchmark\n\nIn this section, we provide a detailed performance analysis of FinMA compared to other leading models, including ChatGPT, GPT-4, and BloombergGPT, among others. For this analysis, we've chosen a range of tasks and metrics that span various aspects of financial Natural Language Processing and financial prediction. 
All model results of FinBen can be found on our [leaderboard](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FTheFinAI\u002Fflare)!\n\n### Tasks\n\n| Data                  | Task                             | Raw    | Data Types                | Modalities        | License         | Paper |\n| --------------------- | -------------------------------- | ------ | ------------------------- | ----------------- | --------------- | ----- |\n| FPB                   | sentiment analysis               | 4,845  | news                      | text              | CC BY-SA 3.0    | [[1]](#1) |\n| FiQA-SA               | sentiment analysis               | 1,173  | news headlines, tweets    | text              | Public          | [[2]](#2) |\n| TSA | sentiment analysis | 561 | news headlines | text | CC BY-NC-SA 4.0 | [[3]](#3)       |\n| FOMC                  | hawkish-dovish classification    | 496    | FOMC transcripts          | text              | CC BY-NC 4.0 | [[4]](#4)       |\n| Headlines             | news headline classification     | 11,412 | news headlines            | text              | CC BY-SA 3.0    | [[5]](#5) |\n| FinArg-ECC-Task1      | argument unit classification     | 969    | earnings conference call  | text              | CC BY-NC-SA 4.0 | [[6]](#6) |\n| FinArg-ECC-Task2      | argument relation classification | 690    | earnings conference call  | text              | CC BY-NC-SA 4.0 | [[6]](#6) |\n| Multifin EN        | multi-class classification | 546 | article headlines | text          | Public | [[7]](#7) |\n| M&A                     | deal completeness classification  | 500    | news articles, tweets           | text              | Public          | [[8]](#8) |\n| MLESG EN                | ESG Issue Identification          | 300    | news articles                   | text              | CC BY-NC-ND     | [[9]](#9) |\n| NER                     | named entity recognition          | 1,366  | financial agreements            | text              | CC 
BY-SA 3.0    | [[10]](#10) |\n| Finer Ord             | named entity recognition         | 1,080  | news articles             | text              | CC BY-NC 4.0    | [[11]](#11) |\n| FinRED                | relation extraction              | 1,070  | earnings call transcripts | text              | Public          | [[12]](#12) |\n| FinCausal 2020 Task1  | causal classification            | 8,630  | news articles, SEC        | text              | CC BY 4.0       | [[13]](#13) |\n| FinCausal 2020 Task2  | causal detection                 | 226    | news articles, SEC        | text              | CC BY 4.0       | [[13]](#13) |\n| FinQA                 | question answering               | 8,281  | earnings reports          | text, table       | MIT License     | [[14]](#14) |\n| TatQA                 | question answering               | 1,670  | financial reports         | text, table       | MIT License     | [[15]](#15) |\n| FNXL                  | numeric labeling                 | 318    | SEC                       | text              | Public          | [[16]](#16) |\n| FSRL                  | token classification             | 97     | news articles             | text              | MIT License     | [[17]](#17) |\n| ECTSUM                | text summarization               | 495    | earnings call transcripts | text              | Public          | [[18]](#18) |\n| EDTSUM                | text summarization               | 2,000  | news articles             | text              | Public          | [[19]](#19) |\n| German                | credit scoring                   | 1,000  | credit records            | table             | CC BY 4.0       | [[20]](#20) |\n| Australian            | credit scoring                   | 690    | credit records            | table             | CC BY 4.0       | [[21]](#21) |\n| Lending Club | credit scoring | 13,453 | financial information | table | CC0 1.0 | [[22]](#26to32) |\n| BigData22             | stock movement prediction   
     | 7,164  | tweets, historical prices | text, time series | Public          | [[23]](#23) |\n| ACL18                 | stock movement prediction        | 27,053 | tweets, historical prices | text, time series | MIT License     | [[24]](#24) |\n| CIKM18                | stock movement prediction        | 4,967  | tweets, historical prices | text, time series | Public          | [[25]](#25) |\n| ConvFinQA             | multi-turn question answering    | 1,490  | earnings reports          | text, table       | MIT License     | [[26]](#26) |\n| Credit Card Fraud     | Fraud Detection                  | 11,392 | financial information     | table             | (DbCL) v1.0     | [[22]](#26to32) |\n| ccFraud               | Fraud Detection                  | 10,485 | financial information     | table             | Public          | [[22]](#26to32) |\n| Polish                | Financial Distress Identification| 8,681  | financial status features | table             | CC BY 4.0       | [[22]](#26to32) |\n|Taiwan Economic Journal| Financial Distress Identification| 6,819  | financial status features | table             | CC BY 4.0       | [[22]](#26to32) |\n| PortoSeguro           | Claim Analysis                   | 11,904 | claim and financial information | table             | Public          | [[22]](#26to32) |\n| Travel Insurance      | Claim Analysis                   | 12,665 | claim and financial information | table             | (ODbL) v1.0     | [[22]](#26to32) |\n\n\n\n\u003Cspan id=\"1\">1.\u003C\u002Fspan> Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. 2014. Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65, 4 (2014), 782–796.\n\n\u003Cspan id=\"2\">2.\u003C\u002Fspan> Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross McDermott, Manel Zarrouk, and Alexandra Balahur. 2018. 
Www’18 open challenge: financial opinion mining and question answering. In Companion proceedings of the the web conference 2018. 1941–1942.\n\n\u003Cspan id=\"3\">3.\u003C\u002Fspan> Keith Cortis, André Freitas, Tobias Daudert, Manuela Huerlimann, Manel Zarrouk, Siegfried Handschuh, and Brian Davis. 2017. [SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News](https:\u002F\u002Faclanthology.org\u002FS17-2089). In *Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)*, pages 519–535, Vancouver, Canada. Association for Computational Linguistics.\n\n\u003Cspan id=\"4\">4.\u003C\u002Fspan> Agam Shah, Suvan Paturi, and Sudheer Chava. 2023. [Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis](https:\u002F\u002Faclanthology.org\u002F2023.acl-long.368). In *Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)*, pages 6664–6679, Toronto, Canada. Association for Computational Linguistics.\n\n\u003Cspan id=\"5\">5.\u003C\u002Fspan> Ankur Sinha and Tanmay Khandait. 2021. Impact of news on the commodity market: Dataset and results. In Advances in Information and Communication: Proceedings of the 2021 Future of Information and Communication Conference (FICC), Volume 2. Springer, 589–601.\n\n\u003Cspan id=\"6\">6.\u003C\u002Fspan> Chen C C, Lin C Y, Chiu C J, et al. [Overview of the NTCIR-17 FinArg-1 Task: Fine-grained argument understanding in financial analysis](https:\u002F\u002Fresearch.nii.ac.jp\u002Fntcir\u002Fworkshop\u002FOnlineProceedings17\u002Fpdf\u002Fntcir\u002F01-NTCIR17-OV-FINARG-ChenC.pdf)[C]\u002F\u002FProceedings of the 17th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, Japan. 2023.\n\n\u003Cspan id=\"7\">7.\u003C\u002Fspan> Rasmus Jørgensen, Oliver Brandt, Mareike Hartmann, Xiang Dai, Christian Igel, and Desmond Elliott. 2023. 
[MultiFin: A Dataset for Multilingual Financial NLP](https:\u002F\u002Faclanthology.org\u002F2023.findings-eacl.66). In *Findings of the Association for Computational Linguistics: EACL 2023*, pages 894–909, Dubrovnik, Croatia. Association for Computational Linguistics.\n\n\u003Cspan id=\"8\">8.\u003C\u002Fspan> Yang, L., Kenny, E.M., Ng, T.L., Yang, Y., Smyth, B., & Dong, R. (2020). [Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.12512) *International Conference on Computational Linguistics*.\n\n\u003Cspan id=\"9\">9.\u003C\u002Fspan> Chung-Chi Chen, Yu-Min Tseng, Juyeon Kang, Anaïs Lhuissier, Min-Yuh Day, Teng-Tsai Tu, and Hsin-Hsi Chen. 2023. Multi-lingual ESG issue identification. In *Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing (FinNLP) and the Second Multimodal AI For Financial Forecasting (Muffin)*.\n\n\u003Cspan id=\"10\">10.\u003C\u002Fspan> Julio Cesar Salinas Alvarado, Karin Verspoor, and Timothy Baldwin. 2015. Domain adaption of named entity recognition to support credit risk assessment. In Proceedings of the Australasian Language Technology Association Workshop 2015. 84–90.\n\n\u003Cspan id=\"11\">11.\u003C\u002Fspan> Shah A, Vithani R, Gullapalli A, et al. Finer: Financial named entity recognition dataset and weak-supervision model[J]. arXiv preprint arXiv:2302.11157, 2023.\n\n\u003Cspan id=\"12\">12.\u003C\u002Fspan> Sharma, Soumya et al. “FinRED: A Dataset for Relation Extraction in Financial Domain.” *Companion Proceedings of the Web Conference 2022* (2022): n. pag.\n\n\u003Cspan id=\"13\">13.\u003C\u002Fspan> Dominique Mariko, Hanna Abi-Akl, Estelle Labidurie, Stephane Durfort, Hugues De Mazancourt, and Mahmoud El-Haj. 2020. [The Financial Document Causality Detection Shared Task (FinCausal 2020)](https:\u002F\u002Faclanthology.org\u002F2020.fnp-1.3). 
In *Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation*, pages 23–32, Barcelona, Spain (Online). COLING.\n\n\u003Cspan id=\"14\">14.\u003C\u002Fspan> Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan R Routledge, et al. 2021. FinQA: A Dataset of Numerical Reasoning over Financial Data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 3697–3711.\n\n\u003Cspan id=\"15\">15.\u003C\u002Fspan> Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, and Tat-Seng Chua. 2021. TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance. arXiv preprint arXiv:2105.07624.\n\n\u003Cspan id=\"16\">16.\u003C\u002Fspan> Soumya Sharma, Subhendu Khatuya, Manjunath Hegde, Afreen Shaikh, Koustuv Dasgupta, Pawan Goyal, and Niloy Ganguly. 2023. [Financial Numeric Extreme Labelling: A dataset and benchmarking](https:\u002F\u002Faclanthology.org\u002F2023.findings-acl.219). In *Findings of the Association for Computational Linguistics: ACL 2023*, pages 3550–3561, Toronto, Canada. Association for Computational Linguistics.\n\n\u003Cspan id=\"17\">17.\u003C\u002Fspan> Matthew Lamm, Arun Chaganty, Christopher D. Manning, Dan Jurafsky, and Percy Liang. 2018. [Textual Analogy Parsing: What’s Shared and What’s Compared among Analogous Facts](https:\u002F\u002Faclanthology.org\u002FD18-1008). In *Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing*, pages 82–92, Brussels, Belgium. Association for Computational Linguistics.\n\n\u003Cspan id=\"18\">18.\u003C\u002Fspan> Rajdeep Mukherjee, Abhinav Bohra, Akash Banerjee, Soumya Sharma, Manjunath Hegde, Afreen Shaikh, Shivani Shrivastava, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, and Pawan Goyal. 2022. 
[ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts](https:\u002F\u002Faclanthology.org\u002F2022.emnlp-main.748). In *Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing*, pages 10893–10906, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.\n\n\u003Cspan id=\"19\">19.\u003C\u002Fspan> Zhihan Zhou, Liqian Ma, and Han Liu. 2021. [Trade the Event: Corporate Events Detection for News-Based Event-Driven Trading](https:\u002F\u002Faclanthology.org\u002F2021.findings-acl.186). In *Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021*, pages 2114–2124, Online. Association for Computational Linguistics.\n\n\u003Cspan id=\"20\">20.\u003C\u002Fspan> Hofmann, Hans. (1994). Statlog (German Credit Data). UCI Machine Learning Repository. https:\u002F\u002Fdoi.org\u002F10.24432\u002FC5NC77.\n\n\u003Cspan id=\"21\">21.\u003C\u002Fspan> Quinlan, Ross. Statlog (Australian Credit Approval). UCI Machine Learning Repository. https:\u002F\u002Fdoi.org\u002F10.24432\u002FC59012.\n\n\u003Cspan id=\"26to32\">22.\u003C\u002Fspan> Duanyu Feng, Yongfu Dai, Jimin Huang, Yifang Zhang, Qianqian Xie, Weiguang Han, Alejandro Lopez-Lira, and Hao Wang. 2023. Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models. arXiv preprint arXiv:2310.00566.\n\n\u003Cspan id=\"23\">23.\u003C\u002Fspan> Yejun Soun, Jaemin Yoo, Minyong Cho, Jihyeong Jeon, and U Kang. 2022. Accurate Stock Movement Prediction with Self-supervised Learning from Sparse Noisy Tweets. In 2022 IEEE International Conference on Big Data (Big Data). IEEE, 1691–1700.\n\n\u003Cspan id=\"24\">24.\u003C\u002Fspan> Yumo Xu and Shay B Cohen. 2018. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 
1970–1979.\n\n\u003Cspan id=\"25\">25.\u003C\u002Fspan> Huizhe Wu, Wei Zhang, Weiwei Shen, and Jun Wang. 2018. Hybrid deep sequential modeling for social text-driven stock prediction. In Proceedings of the 27th ACM international conference on information and knowledge management. 1627–1630.\n\n\u003Cspan id=\"26\">26.\u003C\u002Fspan> Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, and William Yang Wang. 2022. ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6279–6292, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.\n\n### Evaluation\n\n#### Preparation\n\n##### Local installation\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FThe-FinAI\u002FPIXIU.git --recursive\ncd PIXIU\npip install -r requirements.txt\ncd src\u002Ffinancial-evaluation\npip install -e .[multilingual]\n```\n##### Docker image\n```bash\nsudo bash scripts\u002Fdocker_run.sh\n```\nThe command above starts a Docker container; you can modify `docker_run.sh` to fit your environment. We also provide a pre-built image, which you can fetch with `sudo docker pull tothemoon\u002Fpixiu:latest` and run as follows:\n\n```bash\ndocker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \\\n    --network host \\\n    --env https_proxy=$https_proxy \\\n    --env http_proxy=$http_proxy \\\n    --env all_proxy=$all_proxy \\\n    --env HF_HOME=$hf_home \\\n    -it [--rm] \\\n    --name pixiu \\\n    -v $pixiu_path:$pixiu_path \\\n    -v $hf_home:$hf_home \\\n    -v $ssh_pub_key:\u002Froot\u002F.ssh\u002Fauthorized_keys \\\n    -w $workdir \\\n    $docker_user\u002Fpixiu:$tag \\\n    [--sshd_port 2201 --cmd \"echo 'Hello, world!' 
&& \u002Fbin\u002Fbash\"]\n```\nArguments explain:\n- `[]` means ignoreable arguments\n- `HF_HOME`: huggingface cache dir\n- `sshd_port`: sshd port of the container, you can run `ssh -i private_key -p $sshd_port root@$ip` to connect to the container, default to 22001\n- `--rm`: remove the container when exit container (ie.`CTRL + D`)\n\n#### Automated Task Assessment\nBefore evaluation, please download [BART checkpoint](https:\u002F\u002Fdrive.google.com\u002Fu\u002F0\u002Fuc?id=1_7JfF7KOInb7ZrxKHIigTMR4ChVET01m&export=download) to `src\u002Fmetrics\u002FBARTScore\u002Fbart_score.pth`.\n\nFor automated evaluation, please follow these instructions:\n\n1. Huggingface Transformer\n\n   To evaluate a model hosted on the HuggingFace Hub (for instance, finma-7b-full), use this command:\n\n```bash\npython eval.py \\\n    --model \"hf-causal-llama\" \\\n    --model_args \"use_accelerate=True,pretrained=TheFinAI\u002Ffinma-7b-full,tokenizer=TheFinAI\u002Ffinma-7b-full,use_fast=False\" \\\n    --tasks \"flare_ner,flare_sm_acl,flare_fpb\"\n```\n\nMore details can be found in the [lm_eval](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) documentation.\n\n2. Commercial APIs\n\n\nPlease note, for tasks such as NER, the automated evaluation is based on a specific pattern. This might fail to extract relevant information in zero-shot settings, resulting in relatively lower performance compared to previous human-annotated results.\n\n```bash\nexport OPENAI_API_SECRET_KEY=YOUR_KEY_HERE\npython eval.py \\\n    --model gpt-4 \\\n    --tasks flare_ner,flare_sm_acl,flare_fpb\n```\n\n3. 
Self-Hosted Evaluation\n\nTo run the inference backend:\n\n```bash\nbash scripts\u002Frun_interface.sh\n```\n\nPlease adjust `run_interface.sh` according to your environment requirements.\n\nTo evaluate:\n\n```bash\npython data\u002F*\u002Fevaluate.py\n```\n\n### Create new tasks\n\nCreating a new task for FinBen involves creating a Huggingface dataset and implementing the task in a Python file. This guide walks you through each step of setting up a new task using the FinBen framework.\n\n#### Creating your dataset in Huggingface\n\nYour dataset should be created in the following format:\n\n```python\n{\n    \"query\": \"...\",\n    \"answer\": \"...\",\n    \"text\": \"...\"\n}\n```\n\nIn this format:\n\n- `query`: Combination of your prompt and text\n- `answer`: Your label\n\nFor **Classification** tasks (such as [FPB (FinBen_fpb)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-fpb)), additional keys should be defined:\n\n- `choices`: Set of labels\n- `gold`: Index of the correct label in choices (starting from 0)\n\nFor **Sequential Labeling** tasks (such as [Finer Ord (FinBen_finer_ord)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finer-ord)), additional keys should be defined:\n\n- `label`: List of token labels\n\n- `token`: List of tokens\n\nFor **Extractive Summarization** tasks (such as [ECTSUM (FinBen_ectsum)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-ectsum)), additional keys should be defined:\n\n- `label`: List of sentence labels\n\nFor **Abstractive Summarization** and **Question Answering** tasks (such as [EDTSUM (FinBen_edtsum)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-edtsum)), no additional keys need to be defined.\n\n#### Implementing the task\n\nOnce your dataset is ready, you can start implementing your task. 
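As a concrete illustration of the classification format above, the snippet below assembles one record with the `query`, `answer`, `text`, `choices`, and `gold` keys. The prompt wording, label set, and helper function are illustrative only, not taken from any FinBen dataset:

```python
# Hypothetical sketch: build one classification record in the format above.

def make_classification_record(prompt: str, text: str, answer: str, choices: list) -> dict:
    """Build a single record with the keys a classification task expects."""
    if answer not in choices:
        raise ValueError(f"answer {answer!r} must be one of {choices}")
    return {
        "query": f"{prompt}\nText: {text}\nAnswer:",  # prompt and text combined
        "answer": answer,                             # the gold label as a string
        "text": text,
        "choices": choices,                           # full label set
        "gold": choices.index(answer),                # 0-based index into choices
    }

record = make_classification_record(
    prompt="Analyze the sentiment of this statement extracted from a financial "
           "news article. Provide your answer as either negative, positive or neutral.",
    text="The company's stocks plummeted following the scandal.",
    answer="negative",
    choices=["negative", "positive", "neutral"],
)
print(record["gold"])  # 0
```

Keeping `gold` as an index into `choices` (rather than repeating the label string) is what lets the harness score log-likelihoods over a fixed label set.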
Your task should be defined within a new class in flare.py or any other Python file located within the tasks directory.\n\nTo cater to a range of tasks, we offer several specialized base classes, including `Classification`, `SequentialLabeling`, `RelationExtraction`, `ExtractiveSummarization`, `AbstractiveSummarization` and `QA`.\n\nFor instance, if you are embarking on a classification task, you can directly leverage our `Classification` base class. This class allows for efficient and intuitive task creation. To better demonstrate this, let's delve into an example of crafting a task named FinBen-FPB using the `Classification` base class:\n\n```python\nclass FPB(Classification):\n    DATASET_PATH = \"flare-fpb\"\n```\n\nAnd that's it! Once you've created your task class, the next step is to register it in the `src\u002Ftasks\u002F__init__.py` file. To do this, add a new line following the format `\"task_name\": module.ClassName`. Here is how it's done:\n\n```python\nTASK_REGISTRY = {\n    \"flare_fpb\": flare.FPB,\n    \"your_new_task\": your_module.YourTask,  # This is where you add your task\n}\n```\n\n#### Predefined task metrics\n\n| Task                                     | Metric                                 | Illustration                                                 |\n| ---------------------------------------- | -------------------------------------- | ------------------------------------------------------------ |\n| Classification                           | Accuracy                               | This metric represents the ratio of correctly predicted observations to total observations. It is calculated as (True Positives + True Negatives) \u002F Total Observations. |\n| Classification                           | F1 Score                               | The F1 Score represents the harmonic mean of precision and recall, thereby creating an equilibrium between these two factors. 
It is particularly useful when the class distribution is imbalanced, since it balances precision against recall. The score ranges from 0 to 1, with 1 signifying perfect precision and recall, and 0 indicating the worst case. Furthermore, we provide both 'weighted' and 'macro' versions of the F1 score. |\n| Classification                           | Missing Ratio                          | This metric calculates the proportion of responses where no options from the given choices in the task are returned. |\n| Classification                           | Matthews Correlation Coefficient (MCC) | The MCC is a metric that assesses the quality of binary classifications, producing a score ranging from -1 to +1. A score of +1 signifies perfect prediction, 0 denotes a prediction no better than random chance, and -1 indicates a completely inverse prediction. |\n| Sequential Labeling                      | F1 score                               | In the context of Sequential Labeling tasks, we utilize the F1 Score as computed by the `seqeval` library, a robust entity-level evaluation metric. This metric mandates an exact match of both the entity's span and type between the predicted and ground truth entities for a correct evaluation. True Positives (TP) represent correctly predicted entities, False Positives (FP) denote incorrectly predicted entities or entities with mismatched spans\u002Ftypes, and False Negatives (FN) signify missed entities from the ground truth. Precision, recall, and F1-score are then computed using these quantities, with the F1 Score representing the harmonic mean of precision and recall. |\n| Sequential Labeling                      | Label F1 score                         | This metric evaluates model performance based solely on the correctness of the labels predicted, without considering entity spans. 
|\n| Relation Extraction                      | Precision                              | Precision measures the proportion of correctly predicted relations out of all predicted relations. It is calculated as the number of True Positives (TP) divided by the sum of True Positives and False Positives (FP). |\n| Relation Extraction                      | Recall                                 | Recall measures the proportion of correctly predicted relations out of all actual relations. It is calculated as the number of True Positives (TP) divided by the sum of True Positives and False Negatives (FN). |\n| Relation Extraction                      | F1 score                               | The F1 Score is the harmonic mean of precision and recall, and it provides a balance between these two metrics. The F1 Score is at its best at 1 (perfect precision and recall) and worst at 0. |\n| Extractive and Abstractive Summarization | Rouge-N                                | This measures the overlap of N-grams (a contiguous sequence of N items from a given sample of text) between the system-generated summary and the reference summary. 'N' can be 1, 2, or more, with ROUGE-1 and ROUGE-2 being commonly used to assess unigram and bigram overlaps respectively. |\n| Extractive and Abstractive Summarization | Rouge-L                                | This metric evaluates the longest common subsequence (LCS) between the system and the reference summaries. LCS takes into account sentence level structure similarity naturally and identifies longest co-occurring in-sequence n-grams automatically. |\n| Question Answering                       | EmACC                                  | EMACC assesses the exact match between the model-generated response and the reference answer. In other words, the model-generated response is considered correct only if it matches the reference answer exactly, word-for-word. 
|\n\n>  Additionally, you can determine if the labels should be lowercased during the matching process by specifying `LOWER_CASE` in your class definition. This is pertinent since labels are matched based on their appearance in the generated output. For tasks like examinations where the labels are a specific set of capitalized letters such as 'A', 'B', 'C', this should typically be set to False.\n\n---\n\n## FIT: Financial Instruction Dataset\n\nOur instruction dataset is uniquely tailored for the domain-specific LLM, FinMA. This dataset has been meticulously assembled to fine-tune our model on a diverse range of financial tasks. It features publicly available multi-task and multi-modal data derived from multiple openly released financial datasets.\n\nThe dataset is multi-faceted, featuring tasks including sentiment analysis, news headline classification, named entity recognition, question answering, and stock movement prediction. It covers both textual and time-series data modalities, offering a rich variety of financial data. The task-specific instruction prompts for each task have been carefully designed by domain experts.\n\n### Modality and Prompts\n\nThe table below summarizes the different tasks, their corresponding modalities, text types, and examples of the instructions used for each task:\n\n| **Task**                     | **Modalities**    | **Text Types**        | **Instructions Examples**                                    |\n| ---------------------------- | ----------------- | --------------------- | ------------------------------------------------------------ |\n| Sentiment Analysis           | Text              | news headlines, tweets | \"Analyze the sentiment of this statement extracted from a financial news article. Provide your answer as either negative, positive or neutral. For instance, 'The company's stocks plummeted following the scandal.' 
would be classified as negative.\" |\n| News Headline Classification | Text              | News Headlines        | \"Consider whether the headline mentions the price of gold. Is there a Price or Not in the gold commodity market indicated in the news headline? Please answer Yes or No.\" |\n| Named Entity Recognition     | Text              | financial agreements  | \"In the sentences extracted from financial agreements in U.S. SEC filings, identify the named entities that represent a person ('PER'), an organization ('ORG'), or a location ('LOC'). The required answer format is: 'entity name, entity type'. For instance, in 'Elon Musk, CEO of SpaceX, announced the launch from Cape Canaveral.', the entities would be: 'Elon Musk, PER; SpaceX, ORG; Cape Canaveral, LOC'\" |\n| Question Answering           | Text              | earnings reports      | \"In the context of this series of interconnected finance-related queries and the additional information provided by the pretext, table data, and post text from a company's financial filings, please provide a response to the final question. This may require extracting information from the context and performing mathematical calculations. Please take into account the information provided in the preceding questions and their answers when formulating your response:\" |\n| Stock Movement Prediction    | Text, Time-Series | tweets, stock prices  | \"Analyze the information and social media posts to determine if the closing price of *\{tid\}* will ascend or descend at *\{point\}*. Please respond with either Rise or Fall.\" |\n\n### Dataset Statistics\n\nThe dataset contains 136K instruction data samples, allowing FinMA to capture the nuances of diverse financial tasks. 
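The gap between the Raw and Instruction counts in the statistics that follow (for example, FPB's 4,845 raw examples yielding 48,450 instruction samples) suggests that each raw example is paired with several instruction phrasings. A minimal sketch of such an expansion; the templates and the `expand_with_templates` helper are hypothetical, not the actual FIT construction code:

```python
# Hypothetical sketch: expand raw labeled examples into instruction samples
# by pairing each example with several instruction phrasings.

def expand_with_templates(raw_examples: list, templates: list) -> list:
    """Pair every raw (text, label) example with every instruction template."""
    samples = []
    for template in templates:
        for example in raw_examples:
            samples.append({
                "query": f"{template}\nText: {example['text']}\nAnswer:",
                "answer": example["label"],
                "text": example["text"],
            })
    return samples

raw = [
    {"text": "Profits rose 20% year over year.", "label": "positive"},
    {"text": "The firm issued a profit warning.", "label": "negative"},
]
templates = [
    "Classify the sentiment of this financial statement as positive, negative or neutral.",
    "What is the sentiment (positive, negative or neutral) of the following statement?",
]
samples = expand_with_templates(raw, templates)
print(len(samples))  # 2 templates x 2 raw examples = 4
```

With ten such phrasings per example, 4,845 raw examples would produce exactly 48,450 samples, matching the FPB row below.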
The table below provides the statistical details of the instruction dataset:\n\n| Data      | Task                         | Raw    | Instruction | Data Types                | Modalities        | License      | Original Paper |\n| --------- | ---------------------------- | ------ | ----------- | ------------------------- | ----------------- | ------------ | -------------- |\n| FPB       | sentiment analysis           | 4,845  | 48,450      | news                      | text              | CC BY-SA 3.0 | [1]            |\n| FiQA-SA   | sentiment analysis           | 1,173  | 11,730      | news headlines, tweets    | text              | Public       | [2]            |\n| Headline  | news headline classification | 11,412 | 11,412      | news headlines            | text              | CC BY-SA 3.0 | [3]            |\n| NER       | named entity recognition     | 1,366  | 13,660      | financial agreements      | text              | CC BY-SA 3.0 | [4]            |\n| FinQA     | question answering           | 8,281  | 8,281       | earnings reports          | text, table       | MIT License  | [5]            |\n| ConvFinQA | question answering           | 3,892  | 3,892       | earnings reports          | text, table       | MIT License  | [6]            |\n| BigData22 | stock movement prediction    | 7,164  | 7,164       | tweets, historical prices | text, time series | Public       | [7]            |\n| ACL18     | stock movement prediction    | 27,053 | 27,053      | tweets, historical prices | text, time series | MIT License  | [8]            |\n| CIKM18    | stock movement prediction    | 4,967  | 4,967       | tweets, historical prices | text, time series | Public       | [9]            |\n\n1. Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. 2014. Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65, 4 (2014), 782–796.\n2. 
Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross McDermott, Manel Zarrouk, and Alexandra Balahur. 2018. WWW’18 open challenge: financial opinion mining and question answering. In Companion proceedings of the web conference 2018. 1941–1942.\n3. Ankur Sinha and Tanmay Khandait. 2021. Impact of news on the commodity market: Dataset and results. In Advances in Information and Communication: Proceedings of the 2021 Future of Information and Communication Conference (FICC), Volume 2. Springer, 589–601.\n4. Julio Cesar Salinas Alvarado, Karin Verspoor, and Timothy Baldwin. 2015. Domain adaption of named entity recognition to support credit risk assessment. In Proceedings of the Australasian Language Technology Association Workshop 2015. 84–90.\n5. Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan R Routledge, et al. 2021. FinQA: A Dataset of Numerical Reasoning over Financial Data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 3697–3711.\n6. Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, and William Yang Wang. 2022. ConvFinQA: Exploring the chain of numerical reasoning in conversational finance question answering. arXiv preprint arXiv:2210.03849 (2022).\n7. Yejun Soun, Jaemin Yoo, Minyong Cho, Jihyeong Jeon, and U Kang. 2022. Accurate Stock Movement Prediction with Self-supervised Learning from Sparse Noisy Tweets. In 2022 IEEE International Conference on Big Data (Big Data). IEEE, 1691–1700.\n8. Yumo Xu and Shay B Cohen. 2018. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1970–1979.\n9. Huizhe Wu, Wei Zhang, Weiwei Shen, and Jun Wang. 2018. Hybrid deep sequential modeling for social text-driven stock prediction. 
In Proceedings of the 27th ACM international conference on information and knowledge management. 1627–1630.\n\n### Generating Datasets for FIT\n\nWhen you are working with the Financial Instruction Dataset (FIT), it's crucial to follow the prescribed format for training and testing models.\n\nThe format should look like this:\n\n```json\n{\n    \"id\": \"unique id\",\n    \"conversations\": [\n        {\n            \"from\": \"human\",\n            \"value\": \"Your prompt and text\"\n        },\n        {\n            \"from\": \"agent\",\n            \"value\": \"Your answer\"\n        }\n    ],\n    \"text\": \"Text to be classified\",\n    \"label\": \"Your label\"\n}\n```\n\nHere's what each field means:\n\n- \"id\": a unique identifier for each example in your dataset.\n- \"conversations\": a list of conversation turns. Each turn is represented as a dictionary, with \"from\" representing the speaker, and \"value\" representing the text spoken in the turn.\n- \"text\": the text to be classified.\n- \"label\": the ground truth label for the text.\n\n\nThe first turn in the \"conversations\" list should always be from \"human\", and contain your prompt and the text. The second turn should be from \"agent\", and contain your answer.\n\n---\n\n## FinMA v0.1: Financial Large Language Model\n\nWe are pleased to introduce the first version of FinMA, including three models FinMA-7B, FinMA-7B-full, FinMA-30B, fine-tuned on LLaMA 7B and LLaMA-30B. FinMA-7B and FinMA-30B are trained with the NLP instruction data, while FinMA-7B-full is trained with the full instruction data from FIT covering both NLP and prediction tasks. \n\nFinMA v0.1 is now available on [Huggingface](https:\u002F\u002Fhuggingface.co\u002FTheFinAI\u002Ffinma-7b-nlp) for public use. We look forward to the valuable contributions that this initial version will make to the financial NLP field and encourage users to apply it to various financial tasks and scenarios. 
We also invite feedback and shared experiences to help improve future versions.\n\n### How to fine-tune a new large language model using PIXIU based on FIT?\n\nComing soon.\n\n---\n\n## FinMem: A Performance-Enhanced LLM Trading Agent\n\nFinMem is a novel LLM-based agent framework devised for financial decision-making. It encompasses three core modules: Profiling, to outline the agent's characteristics; Memory, with layered processing, to aid the agent in assimilating realistic hierarchical financial data; and Decision-making, to convert insights gained from memories into investment decisions. Currently, FinMem can trade single stocks with high returns after a simple warm-up in train mode. Below is a quick start for the dockerized version of the framework, with TSLA as sample input.\n\nStep 1: Set environment variables\nIn `.env`, add your HuggingFace token and OpenAI API key as needed.\n```bash\nOPENAI_API_KEY = \"\u003CYour OpenAI Key>\"\nHF_TOKEN = \"\u003CYour HF token>\"\n```\n\nStep 2: Set the endpoint URL in `config.toml`\nSet the endpoint URL according to the model of choice (OpenAI, Gemini, open-source models on HuggingFace, etc.). For open-source models on HuggingFace, one option for generating TGI endpoints is RunPod.\n```toml\n[chat]\nmodel = \"tgi\"\nend_point = \"\u003Cyour endpoint address>\"\ntokenization_model_name = \"\u003Cmodel name>\"\n...\n```\n\nStep 3: Build the Docker image and container\n```bash\ndocker build -t test-finmem .devcontainer\u002F. 
\n```\nstart container:\n```bash\ndocker run -it --rm -v $(pwd):\u002Ffinmem test-finmem bash\n```\n\nStep 4: Start Simulation!\n```bash\n Usage: run.py sim [OPTIONS]                                                                                                                \n                                                                                                                                            \n Start Simulation                                                                                                                           \n                                                                                                                                            \n╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ --market-data-path    -mdp      TEXT  The environment data pickle path [default: data\u002F06_input\u002Fsubset_symbols.pkl]                       │\n│ --start-time          -st       TEXT  The training or test start time [default: 2022-06-30 For Ticker 'TSLA']                                                               │\n│ --end-time            -et       TEXT  The training or test end time [default: 2022-10-11]                                                                 │\n│ --run-model           -rm       TEXT  Run mode: train or test [default: train]                                                           │\n│ --config-path         -cp       TEXT  config file path [default: config\u002Fconfig.toml]                                                     │\n│ --checkpoint-path     -ckp      TEXT  The checkpoint save path [default: data\u002F10_checkpoint_test]                                             │\n│ --result-path         -rp       TEXT  The result save path [default: data\u002F11_train_result]                                               │\n│ --trained-agent-path  -tap      TEXT  Only used in test mode, the path of 
trained agent [default: None. Can be changed to data\u002F05_train_model_output OR data\u002F06_train_checkpoint]                                  │\n│ --help                                Show this message and exit.                                                                        │\n╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n                              \n```\nExample Usage:\n```bash\npython run.py sim --market-data-path data\u002F03_model_input\u002Ftsla.pkl --start-time 2022-06-30 --end-time 2022-10-11 --run-model train --config-path config\u002Ftsla_tgi_config.toml --checkpoint-path data\u002F06_train_checkpoint --result-path data\u002F05_train_model_output\n```\n\nThere are also checkpoint functionalities. For more details please visit [FinMem Repository](https:\u002F\u002Fgithub.com\u002Fpipiku915\u002FFinMem-LLM-StockTrading) directly. \n\n---\n\n## Citation\n\nIf you use PIXIU in your work, please cite our paper.\n\n```\n@misc{xie2023pixiu,\n      title={PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance}, \n      author={Qianqian Xie and Weiguang Han and Xiao Zhang and Yanzhao Lai and Min Peng and Alejandro Lopez-Lira and Jimin Huang},\n      year={2023},\n      eprint={2306.05443},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n\n@misc{xie2024FinBen,\n      title={The FinBen: An Holistic Financial Benchmark for Large Language Models}, \n      author={Qianqian Xie and Weiguang Han and Zhengyu Chen and Ruoyu Xiang and Xiao Zhang and Yueru He and Mengxi Xiao and Dong Li and Yongfu Dai and Duanyu Feng and Yijing Xu and Haoqiang Kang and Ziyan Kuang and Chenhan Yuan and Kailai Yang and Zheheng Luo and Tianlin Zhang and Zhiwei Liu and Guojun Xiong and Zhiyang Deng and Yuechen Jiang and Zhiyuan Yao and Haohang Li and Yangyang Yu and Gang Hu and Jiajia Huang and Xiao-Yang Liu and Alejandro Lopez-Lira and 
Benyou Wang and Yanzhao Lai and Hao Wang and Min Peng and Sophia Ananiadou and Jimin Huang},\n      year={2024},\n      eprint={2402.12659},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n\n## License\n\nPIXIU is licensed under the [MIT License](LICENSE). For more details, please see the LICENSE file.\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_093817943670.png)](https:\u002F\u002Fstar-history.com\u002F#The-FinAI\u002FPIXIU&Date)\n\n","\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_400e2fc0f7cf.png\"  width=\"100%\" height=\"100%\">\n\u003C\u002Fp>\n\u003Cdiv>\n\u003Cdiv align=\"left\">\n    \u003Ca target='_blank'>Qianqian Xie\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Weiguang Han\u003Csup>2\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Zhengyu Chen\u003Csup>2\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Ruoyu Xiang\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Xiao Zhang\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yueru He\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Mengxi Xiao\u003Csup>2\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Dong Li\u003Csup>2\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yongfu Dai\u003Csup>7\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Duanyu Feng\u003Csup>7\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yijing Xu\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Haoqiang Kang\u003Csup>5\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Ziyan Kuang\u003Csup>12\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Chenhan Yuan\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Kailai Yang\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca 
target='_blank'>Zheheng Luo\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Tianlin Zhang\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Zhiwei Liu\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Guojun Xiong\u003Csup>10\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Zhiyang Deng\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yuechen Jiang\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Zhiyuan Yao\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Haohang Li\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yangyang Yu\u003Csup>9\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Gang Hu\u003Csup>8\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Jiajia Huang\u003Csup>11\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Xiao-Yang Liu\u003Csup>5\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fwarrington.ufl.edu\u002Fdirectory\u002Fperson\u002F12693\u002F' target='_blank'>Alejandro Lopez-Lira\u003Csup>4\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Benyou Wang\u003Csup>6\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Yanzhao Lai\u003Csup>13\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Hao Wang\u003Csup>7\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Min Peng\u003Csup>2*\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca target='_blank'>Sophia Ananiadou\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='' target='_blank'>Jimin Huang\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cbr \u002F>\n\n\u003Cdiv align=\"left\">\n    \u003Csup>1\u003C\u002Fsup>Fin AI&emsp;\n    \u003Csup>2\u003C\u002Fsup>Wuhan University&emsp;\n    \u003Csup>3\u003C\u002Fsup>The University of Manchester&emsp;\n    \u003Csup>4\u003C\u002Fsup>University of Florida&emsp;\n    \u003Csup>5\u003C\u002Fsup>Columbia University&emsp;\n    \u003Csup>6\u003C\u002Fsup>The Chinese University of Hong Kong, Shenzhen&emsp;\n    \u003Csup>7\u003C\u002Fsup>Sichuan University&emsp;\n    
\u003Csup>8\u003C\u002Fsup>云南大学&emsp;\n    \u003Csup>9\u003C\u002Fsup>史蒂文斯理工学院&emsp;\n    \u003Csup>10\u003C\u002Fsup>石溪大学&emsp;\n    \u003Csup>11\u003C\u002Fsup>南京审计大学&emsp;\n    \u003Csup>12\u003C\u002Fsup>江西师范大学&emsp;\n    \u003Csup>13\u003C\u002Fsup>西南交通大学\n\u003C\u002Fdiv>\n\u003Cbr \u002F>\n\n\u003Cdiv align=\"left\">\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_52fc257eddd9.png' alt='武汉大学 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_35f70ee618cd.png' alt='曼彻斯特大学 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_775c6a4653f0.jpg' alt='佛罗里达大学 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Fadmissions.ucr.edu\u002Fsites\u002Fdefault\u002Ffiles\u002Fstyles\u002Fform_preview\u002Fpublic\u002F2020-07\u002Fucr-education-logo-columbia-university.png?itok=-0FD6Ma2' alt='哥伦比亚大学 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_9b9890c8c729.png' alt='香港中文大学（深圳） Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_5b3a4cb80605.png' alt='四川大学 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_8568559eb060.png' alt='云南大学 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_54b8c3717741.png' alt='史蒂文斯理工学院 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_577e53e068f1.jpg' alt='石溪大学 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fen\u002F9\u002F9c\u002FNanjing_Audit_University_logo.png' alt='南京审计大学 Logo' height='50px'>&emsp;\n    \u003Cimg 
src='https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fen\u002Fthumb\u002Fc\u002Fc5\u002FJiangxi_Normal_University.svg\u002F1200px-Jiangxi_Normal_University.svg.png' alt='江西师范大学 Logo' height='50px'>&emsp;\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_324a11a42ec7.png' alt='西南交通大学 Logo' height='50px'>&emsp;\n\u003C\u002Fdiv>\n\n-----------------\n\n![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpixiu-v0.1-gold)\n![](https:\u002F\u002Fblack.readthedocs.io\u002Fen\u002Fstable\u002F_static\u002Flicense.svg)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1146837080798933112)](https:\u002F\u002Fdiscord.gg\u002FHRWpUmKB)\n\n[Pixiu论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05443) | [FinBen排行榜](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffinosfoundation\u002FOpen-Financial-LLM-Leaderboard)\n\n**免责声明**\n\n本仓库及其内容仅用于**学术和教育目的**。其中任何材料均不构成财务、法律或投资建议。对于内容的准确性、完整性或实用性，不提供任何明示或暗示的保证。作者及贡献者对因使用本仓库信息而产生的任何错误、遗漏或后果概不负责。用户在做出任何财务、法律或投资决策前，应自行判断并咨询专业人士。使用本仓库中的软件和信息，完全由用户自行承担风险。\n\n**通过使用或访问本仓库中的信息，您即表示同意赔偿、保护并使作者、贡献者以及任何关联组织或人员免受任何索赔或损害的影响。**\n\n**📢 更新（日期：2023年9月22日）**\n\n🚀 我们非常高兴地宣布，我们的论文《PIXIU：面向金融领域的综合基准、指令数据集与大型语言模型》已被NeurIPS 2023数据集与基准赛道接收！\n\n**📢 更新（日期：2023年10月8日）**\n\n🌏 我们自豪地宣布，FinBen的增强版本现已支持中文和西班牙语！\n\n**📢 更新（日期：2024年2月20日）**\n\n🌏 我们很高兴地宣布，我们的论文《The FinBen：大型语言模型的综合性金融基准》现已发表在[FinBen](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.12659)上。\n\n**📢 更新（日期：2024年5月2日）**\n\n🌏 我们诚挚邀请您参加IJCAI2024挑战赛“大型语言模型中的金融挑战——FinLLM”，入门指南可在[Starter-kit](README.ijcai_challenge.md)中找到。\n\n**检查点：**\n\n- [FinMA v0.1（NLP 7B版本）](https:\u002F\u002Fhuggingface.co\u002FTheFinAI\u002Ffinma-7b-nlp)\n- [FinMA v0.1（完整7B版本）](https:\u002F\u002Fhuggingface.co\u002FTheFinAI\u002Ffinma-7b-full)\n\n**语言**\n\n- [英语](README.md)\n- [西班牙语](README.es.md)\n- [中文](README.zh.md)\n\n**论文**\n\n- [PIXIU：面向金融领域的综合基准、指令数据集与大型语言模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05443)\n- [The 
FinBen：大型语言模型的综合性金融基准](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.12659)\n- [没有一种语言是孤岛：在金融领域大型语言模型、指令数据和基准中统一中文与英文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.06249)\n- [Dólares还是Dollars？解析金融LLM在西班牙语与英语之间的双语能力](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.07405)\n\n**评估**：\n\n- [英语评估数据集](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTheFinAI\u002Fenglish-evaluation-dataset-658f515911f68f12ea193194)（更多细节请参见FinBen部分）\n- [西班牙语评估数据集](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTheFinAI\u002Fspanish-evaluation-datasets-65e5855900680b19bc83e03d)\n- [中文评估数据集](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTheFinAI\u002Fchinese-evaluation-datasets-65e5851af7daaa71c1c59902)\n\n> 情感分析\n\n- [FPB (en_fpb)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fen-fpb)\n- [FIQASA (flare_fiqasa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fen-fpb)\n- [FOMC (flare_fomc)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-fomc)\n- [SemEval-2017 Task5 (flare_tsa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-tsa)\n\n> 分类\n\n- [新闻标题 (flare_headlines)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-headlines)\n- [FinArg ECC Task1 (flare_finarg_ecc_auc)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finarg-ecc-auc)\n- [FinArg ECC Task2 (flare_finarg_ecc_arc)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finarg-ecc-arc)\n- [CFA (flare_cfa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-cfa)\n- [MultiFin EN (flare_multifin_en)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-multifin-en)\n- [并购 (flare_ma)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-ma)\n- [MLESG EN (flare_mlesg)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-mlesg)\n\n> 知识抽取\n\n- [NER 
(flare_ner)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-ner)\n- [Finer Ord (flare_finer_ord)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finer-ord)\n- [FinRED (flare_finred)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finred)\n- [FinCausal20 Task1 (flare_causal20_sc)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-causal20-sc)\n- [FinCausal20 Task2 (flare_cd)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-cd)\n\n> 数字理解\n\n- [FinQA (flare_finqa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finqa)\n- [TATQA (flare_tatqa)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-tatqa)\n- [FNXL (flare_fnxl)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-fnxl)\n- [FSRL (flare_fsrl)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-fsrl)\n\n> 文本摘要\n\n- [ECTSUM (flare_ectsum)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-ectsum)\n- [EDTSUM (flare_edtsum)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-edtsum)\n\n> 信用评分\n\n- [德国数据集 (flare_german)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-german)\n- [澳大利亚数据集 (flare_australian)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-australian)\n- [Lendingclub数据集 (flare_cra_lendingclub)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-lendingclub)\n- [信用卡欺诈数据集 (flare_cra_ccf)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-ccf)\n- [ccFraud数据集 (flare_cra_ccfraud)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-ccfraud)\n- [波兰数据集 (flare_cra_polish)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-polish)\n- [台湾经济期刊数据集 (flare_cra_taiwan)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-taiwan)\n- 
[PortoSeguro数据集 (flare_cra_portoseguro)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-portoseguro)\n- [旅行保险数据集 (flare_cra_travelinsurance)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdaishen\u002Fcra-travelinsurance)\n\n> 预测\n\n- [BigData22用于股票走势预测 (flare_sm_bigdata)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-sm-bigdata)\n- [ACL18用于股票走势预测 (flare_sm_acl)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-sm-acl)\n- [CIKM18用于股票走势预测 (flare_sm_cikm)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-sm-cikm)\n\n\n\n## 概述\n\n欢迎来到**PIXIU**项目！本项目旨在支持金融领域大型语言模型（LLMs）的开发、微调及评估。PIXIU是迈向理解和利用LLMs在金融领域潜力的重要一步。\n\n### 仓库结构\n\n仓库被组织成几个关键组件，每个组件都在金融NLP流程中发挥独特作用：\n\n- **FinBen**：我们的金融语言理解与预测评估基准。FinBen作为金融LLMs的评估套件，专注于各类金融情境下的理解和预测任务。\n- **FIT**：我们的金融指令数据集。FIT是一个多任务、多模态的指令数据集，专为金融任务设计，是微调LLMs以完成这些任务的训练基础。\n\n- **FinMA**：我们的金融大型语言模型（LLM）。FinMA是我们项目的核心，为各项金融任务提供学习与预测能力。\n\n### 主要特点\n\n- **开放资源**：PIXIU公开提供金融LLM、指令微调数据以及评估基准中包含的数据集，以促进开放研究与透明度。\n  \n- **多任务性**：PIXIU中的指令微调数据和基准涵盖了多样化的金融任务，包括四项金融NLP任务和一项金融预测任务。\n- **多模态性**：PIXIU的指令微调数据和基准由多模态金融数据组成，其中包括用于股票走势预测任务的时间序列数据。它覆盖了各种类型的金融文本，如报告、新闻文章、推文和监管文件。\n- **多样性**：与以往主要关注金融NLP任务的基准不同，PIXIU的评估基准包含了与现实场景紧密相关的关键金融预测任务，使其更具挑战性。\n\n---\n\n## FinBen 2.0：金融语言理解与预测评估基准\n\n在这一部分，我们提供了FinMA与其他领先模型（包括ChatGPT、GPT-4、BloombergGPT等）的详细性能对比分析。为了进行这项分析，我们选取了一系列涵盖金融自然语言处理和金融预测各个方面的任务和指标。FinBen的所有模型结果都可以在我们的[排行榜](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FTheFinAI\u002Fflare)上找到！\n\n### 任务\n\n| 数据集                  | 任务                             | 样本数量    | 数据类型                | 模态        | 许可证         | 论文 |\n| --------------------- | -------------------------------- | ------ | ------------------------- | ----------------- | --------------- | ----- |\n| FPB                   | 情感分析               | 4,845  | 新闻                      | 文本              | CC BY-SA 3.0    | [[1]](#1) |\n| FiQA-SA               | 情感分析               | 1,173  | 新闻标题、推文 
   | 文本              | 公开            | [[2]](#2) |\n| TSA | 情感分析 | 561 | 新闻标题 | 文本 | CC BY-NC-SA 4.0 | [[3]](#3)       |\n| FOMC                  | 鹰派-鸽派分类    | 496    | FOMC会议记录          | 文本              | CC BY-NC 4.0 | [[4]](#4)       |\n| Headlines             | 新闻标题分类     | 11,412 | 新闻标题            | 文本              | CC BY-SA 3.0    | [[5]](#5) |\n| FinArg-ECC-Task1      | 论证单元分类     | 969    | 盈利电话会议  | 文本              | CC BY-NC-SA 4.0 | [[6]](#6) |\n| FinArg-ECC-Task2      | 论证关系分类     | 690    | 盈利电话会议  | 文本              | CC BY-NC-SA 4.0 | [[6]](#6) |\n| MultiFin EN        | 多分类           | 546 | 文章标题 | 文本          | 公开 | [[7]](#7) |\n| M&A                     | 交易完整性分类  | 500    | 新闻文章、推文           | 文本              | 公开          | [[8]](#8) |\n| MLESG EN                | ESG议题识别          | 300    | 新闻文章                   | 文本              | CC BY-NC-ND     | [[9]](#9) |\n| NER                     | 命名实体识别          | 1,366  | 金融协议            | 文本              | CC BY-SA 3.0    | [[10]](#10) |\n| Finer Ord             | 命名实体识别         | 1,080  | 新闻文章             | 文本              | CC BY-NC 4.0    | [[11]](#11) |\n| FinRED                | 关系抽取              | 1,070  | 盈利电话会议记录   | 文本              | 公开          | [[12]](#12) |\n| FinCausal 2020 Task1 | 因果分类            | 8,630  | 新闻文章、SEC文件        | 文本              | CC BY 4.0       | [[13]](#13) |\n| FinCausal 2020 Task2 | 因果检测                 | 226    | 新闻文章、SEC文件        | 文本              | CC BY 4.0       | [[13]](#13) |\n| FinQA                 | 问答               | 8,281  | 盈利报告          | 文本、表格       | MIT许可证     | [[14]](#14) |\n| TatQA                 | 问答               | 1,670  | 财务报告          | 文本、表格       | MIT许可证     | [[15]](#15) |\n| FNXL                  | 数值标注           | 318    | SEC文件             | 文本              | 公开          | [[16]](#16) |\n| FSRL                  | 分词分类           | 97     | 新闻文章             | 文本              | MIT许可证     | [[17]](#17) |\n| ECTSUM                | 
文本摘要           | 495    | 盈利电话会议记录   | 文本              | 公开          | [[18]](#18) |\n| EDTSUM                | 文本摘要           | 2,000  | 新闻文章             | 文本              | 公开          | [[19]](#19) |\n| German                | 信用评分           | 1,000  | 信用记录            | 表格             | CC BY 4.0       | [[20]](#20) |\n| Australian            | 信用评分           | 690    | 信用记录            | 表格             | CC BY 4.0       | [[21]](#21) |\n| Lending Club | 信用评分 | 13,453 | 财务信息 | 表格 | CC0 1.0 | [[22]](#26to32) |\n| BigData22             | 股票走势预测        | 7,164  | 推文、历史价格 | 文本、时间序列 | 公开          | [[23]](#23) |\n| ACL18                 | 股票走势预测        | 27,053 | 推文、历史价格 | 文本、时间序列 | MIT许可证     | [[24]](#24) |\n| CIKM18                | 股票走势预测        | 4,967  | 推文、历史价格 | 文本、时间序列 | 公开          | [[25]](#25) |\n| ConvFinQA             | 多轮问答           | 1,490  | 盈利报告          | 文本、表格       | MIT许可证     | [[26]](#26) |\n| 信用卡欺诈     | 欺诈检测           | 11,392 | 财务信息     | 表格             | DbCL v1.0       | [[22]](#26to32) |\n| ccFraud               | 欺诈检测           | 10,485 | 财务信息     | 表格             | 公开          | [[22]](#26to32) |\n| Polish                | 财务困境识别       | 8,681  | 财务状况特征 | 表格             | CC BY 4.0       | [[22]](#26to32) |\n| 台湾经济期刊 | 财务困境识别 | 6,819  | 财务状况特征 | 表格             | CC BY 4.0       | [[22]](#26to32) |\n| PortoSeguro           | 理赔分析           | 11,904 | 理赔及财务信息 | 表格             | 公开          | [[22]](#26to32) |\n| 旅行保险      | 理赔分析           | 12,665 | 理赔及财务信息 | 表格             | ODbL v1.0       | [[22]](#26to32) |\n\n\n\n\u003Cspan id=\"1\">1.\u003C\u002Fspan> 佩卡·马洛、安库尔·辛哈、佩卡·科尔霍宁、于尔基·瓦莱纽斯和皮里·塔卡拉。2014年。好债务还是坏债务：检测经济文本中的语义倾向。信息科学与技术协会期刊，第65卷，第4期（2014年），782–796页。\n\n\u003Cspan id=\"2\">2.\u003C\u002Fspan> 马塞多·迈亚、西格弗里德·汉德舒、安德烈·弗雷塔斯、布莱恩·戴维斯、罗斯·麦克德莫特、马内尔·扎鲁克和亚历山德拉·巴拉胡尔。2018年。Www’18开放挑战：金融观点挖掘与问答。载于2018年万维网大会配套论文集。1941–1942页。\n\n\u003Cspan id=\"3\">3.\u003C\u002Fspan> 
凯斯·科蒂斯、安德烈·弗雷塔斯、托比亚斯·道尔特、曼努埃拉·许尔利曼、马内尔·扎鲁克、西格弗里德·汉德舒和布莱恩·戴维斯。2017年。[SemEval-2017任务5：金融微博和新闻的细粒度情感分析](https:\u002F\u002Faclanthology.org\u002FS17-2089)。载于《第11届国际语义评估研讨会（SemEval-2017）论文集》，第519–535页，加拿大温哥华。计算语言学协会出版。\n\n\u003Cspan id=\"4\">4.\u003C\u002Fspan> 阿甘·沙阿、苏万·帕图里和苏迪尔·查瓦。2023年。[万亿美元之词：一个新的金融数据集、任务与市场分析](https:\u002F\u002Faclanthology.org\u002F2023.acl-long.368)。载于《第61届计算语言学协会年会论文集（第一卷：长篇论文）》，第6664–6679页，加拿大多伦多。计算语言学协会出版。\n\n\u003Cspan id=\"5\">5.\u003C\u002Fspan> 安库尔·辛哈和坦迈·坎代特。2021年。新闻对商品市场的影响：数据集与结果。载于《信息与通信进展：2021未来信息与通信大会（FICC）论文集》，第二卷。施普林格出版社，589–601页。\n\n\u003Cspan id=\"6\">6.\u003C\u002Fspan> 陈昌成、林承毅、邱崇杰等。[NTCIR-17 FinArg-1 任务概述：金融分析中的细粒度论点理解](https:\u002F\u002Fresearch.nii.ac.jp\u002Fntcir\u002Fworkshop\u002FOnlineProceedings17\u002Fpdf\u002Fntcir\u002F01-NTCIR17-OV-FINARG-ChenC.pdf)[C]\u002F\u002F第17届NTCIR信息获取技术评估会议论文集，日本东京，2023年。\n\n\u003Cspan id=\"7\">7.\u003C\u002Fspan> 拉斯穆斯·约根森、奥利弗·布兰特、马赖克·哈特曼、戴翔、克里斯蒂安·伊格尔和德斯蒙德·埃利奥特。2023年。[MultiFin：多语言金融NLP数据集](https:\u002F\u002Faclanthology.org\u002F2023.findings-eacl.66)。载于《计算语言学协会成果：EACL 2023》，第894–909页，克罗地亚杜布罗夫尼克。计算语言学协会。\n\n\u003Cspan id=\"8\">8.\u003C\u002Fspan> 杨、肯尼、吴、杨、史密斯和董。（2020）。[为金融文本分类中的深度变换器生成合理反事实解释。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.12512) *国际计算语言学会议*。\n\n\u003Cspan id=\"9\">9.\u003C\u002Fspan> 陈中奇、曾宇民、姜柔妍、安娜伊斯·吕西耶、戴敏育、涂腾蔡和陈信希。2023年。多语言ESG议题识别。载于《第五届金融科技与自然语言处理研讨会（FinNLP）及第二届多模态人工智能金融预测研讨会（Muffin）》论文集。\n\n\u003Cspan id=\"10\">10.\u003C\u002Fspan> 胡里奥·塞萨尔·萨利纳斯·阿尔瓦拉多、卡琳·韦尔斯普尔和蒂莫西·鲍德温。2015年。命名实体识别的领域适应，以支持信用风险评估。载于2015年澳大拉西亚语言技术协会研讨会论文集，第84–90页。\n\n\u003Cspan id=\"11\">11.\u003C\u002Fspan> 沙阿、维塔尼、古拉帕利等。Finer：金融命名实体识别数据集及弱监督模型[J]。arXiv预印本arXiv:2302.11157，2023年。\n\n\u003Cspan id=\"12\">12.\u003C\u002Fspan> 沙尔马、索米娅等。“FinRED：金融领域关系抽取数据集。” *2022年网络大会配套论文集*（2022）：无页码。\n\n\u003Cspan id=\"13\">13.\u003C\u002Fspan> 多米尼克·马里科、汉娜·阿比-阿克尔、埃斯特尔·拉比迪里、斯蒂芬·杜福尔、于格·德·马赞库尔和马哈茂德·埃尔-哈吉。2020年。[金融文档因果关系检测共享任务（FinCausal 
2020）](https:\u002F\u002Faclanthology.org\u002F2020.fnp-1.3)。载于《第一届金融叙事处理与多语言金融摘要联合研讨会》论文集，第23–32页，西班牙巴塞罗那（线上）。COLING。\n\n\u003Cspan id=\"14\">14.\u003C\u002Fspan> 陈志宇、陈文虎、查瑞丝·斯迈利、萨米娜·沙赫、伊安娜·博罗娃、迪伦·兰登、丽玛·穆萨、马特·比恩、黄廷浩、布莱恩·R·鲁特莱奇等。2021年。FinQA：金融数据上的数值推理数据集。载于2021年自然语言处理经验方法会议论文集，第3697–3711页。\n\n\u003Cspan id=\"15\">15.\u003C\u002Fspan> 朱峰斌、雷文强、黄友诚、王超、张硕、吕建程、冯福利和蔡达生。“TAT-QA：金融领域表格与文本混合内容上的问答基准。” *ArXiv* abs\u002F2105.07624（2021）：无页码。\n\n\u003Cspan id=\"16\">16.\u003C\u002Fspan> 索米娅·沙尔马、苏班杜·卡图亚、曼朱纳特·黑格德、阿芙琳·谢赫、考斯图夫·达斯古普塔、帕万·戈亚尔和尼洛伊·甘古利。2023年。[金融数值极端标注：数据集与基准测试](https:\u002F\u002Faclanthology.org\u002F2023.findings-acl.219)。载于《计算语言学协会成果：ACL 2023》，第3550–3561页，加拿大多伦多。计算语言学协会。\n\n\u003Cspan id=\"17\">17.\u003C\u002Fspan> 马修·拉姆、阿伦·查甘蒂、克里斯托弗·D·曼宁、丹·朱拉夫斯基和珀西·梁。2018年。[文本类比解析：类比事实中哪些是共有的，哪些是被比较的](https:\u002F\u002Faclanthology.org\u002FD18-1008)。载于《2018年自然语言处理经验方法会议论文集》，第82–92页，比利时布鲁塞尔。计算语言学协会。\n\n\u003Cspan id=\"18\">18.\u003C\u002Fspan> 拉杰迪普·穆克吉、阿比纳夫·博拉、阿卡什·班纳吉、索米娅·沙尔马、曼朱纳特·黑格德、阿芙琳·谢赫、希瓦尼·施里瓦斯塔瓦、考斯图夫·达斯古普塔、尼洛伊·甘古利、萨普塔尔希·戈什和帕万·戈亚尔。2022年。[ECTSum：长篇财报电话会议记录要点摘要的新基准数据集](https:\u002F\u002Faclanthology.org\u002F2022.emnlp-main.748)。载于《2022年自然语言处理经验方法会议论文集》，第10893–10906页，阿拉伯联合酋长国阿布扎比。计算语言学协会。\n\n\u003Cspan id=\"19\">19.\u003C\u002Fspan> 周志涵、马立谦和刘瀚。2021年。[交易事件：基于新闻的事件驱动交易中的企业事件检测](https:\u002F\u002Faclanthology.org\u002F2021.findings-acl.186)。载于《计算语言学协会成果：ACL-IJCNLP 2021》，第2114–2124页，线上。计算语言学协会。\n\n\u003Cspan id=\"20\">20.\u003C\u002Fspan> 霍夫曼，汉斯。（1994）。Statlog（德国信用数据）。UCI机器学习资源库。https:\u002F\u002Fdoi.org\u002F10.24432\u002FC5NC77。\n\n\u003Cspan id=\"21\">21.\u003C\u002Fspan> 昆兰，罗斯。Statlog（澳大利亚信贷审批）。UCI机器学习资源库。https:\u002F\u002Fdoi.org\u002F10.24432\u002FC59012。\n\n\u003Cspan id=\"26to32\">22.\u003C\u002Fspan> 冯端宇、戴永富、黄继明、张怡芳、谢倩倩、韩伟光、亚历杭德罗·洛佩斯-利拉和王浩。2023年。赋能多数，偏袒少数：通过大型语言模型进行通用信用评分。*ArXiv* abs\u002F2310.00566（2023）：无页码。\n\n\u003Cspan id=\"23\">23.\u003C\u002Fspan> 孙艺俊、柳在民、赵珉勇、全智亨和姜宇。2022年。利用自监督学习从稀疏且嘈杂的推文中准确预测股票走势。载于2022年IEEE大数据国际会议（Big Data）。IEEE，第1691–1700页。\n\n\u003Cspan 
id=\"24\">24.\u003C\u002Fspan> 徐宇墨和谢伊·B·科恩。2018年。根据推文和历史价格预测股票走势。载于《第56届计算语言学协会年会论文集》（第1卷：长篇论文），第1970–1979页。\n\n\u003Cspan id=\"25\">25.\u003C\u002Fspan> 吴慧哲、张伟、沈伟伟和王军。2018年。社交文本驱动的混合深度序列建模用于股票预测。载于《第27届ACM国际信息与知识管理会议论文集》，第1627–1630页。\n\n\u003Cspan id=\"26\">26.\u003C\u002Fspan> 陈志宇、李世扬、查瑞丝·斯迈利、马志强、萨米娜·沙赫和威廉·杨旺。2022年。ConvFinQA：探索对话式金融问答中的数值推理链。载于《2022年自然语言处理经验方法会议论文集》，第6279–6292页，阿拉伯联合酋长国阿布扎比。计算语言学协会。\n\n### 评估\n\n#### 准备工作\n\n##### 本地安装\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FThe-FinAI\u002FPIXIU.git --recursive\ncd PIXIU\npip install -r requirements.txt\ncd src\u002Ffinancial-evaluation\npip install -e .[multilingual]\n```\n##### Docker 镜像\n```bash\nsudo bash scripts\u002Fdocker_run.sh\n```\n上述命令会启动一个 Docker 容器，您可以根据自己的环境修改 `docker_run.sh`。我们还提供了预构建的镜像，运行以下命令即可拉取最新版本：`sudo docker pull tothemoon\u002Fpixiu:latest`\n\n```bash\ndocker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \\\n    --network host \\\n    --env https_proxy=$https_proxy \\\n    --env http_proxy=$http_proxy \\\n    --env all_proxy=$all_proxy \\\n    --env HF_HOME=$hf_home \\\n    -it [--rm] \\\n    --name pixiu \\\n    -v $pixiu_path:$pixiu_path \\\n    -v $hf_home:$hf_home \\\n    -v $ssh_pub_key:\u002Froot\u002F.ssh\u002Fauthorized_keys \\\n    -w $workdir \\\n    $docker_user\u002Fpixiu:$tag \\\n    [--sshd_port 2201 --cmd \"echo 'Hello, world!' && \u002Fbin\u002Fbash\"]\n```\n参数说明：\n- `[]` 表示可选参数\n- `HF_HOME`：Hugging Face 缓存目录\n- `sshd_port`：容器内的 SSH 服务端口，默认为 22001，可通过 `ssh -i private_key -p $sshd_port root@$ip` 连接到容器\n- `--rm`：退出容器时（如按 `CTRL + D`）自动删除容器\n\n#### 自动化任务评估\n在进行评估之前，请先下载 [BART 检查点](https:\u002F\u002Fdrive.google.com\u002Fu\u002F0\u002Fuc?id=1_7JfF7KOInb7ZrxKHIigTMR4ChVET01m&export=download) 并放置到 `src\u002Fmetrics\u002FBARTScore\u002Fbart_score.pth`。\n\n自动化评估步骤如下：\n\n1. 
Hugging Face Transformer\n\n若要评估托管在 Hugging Face Hub 上的模型（例如 finma-7b-full），可使用以下命令：\n\n```bash\npython eval.py \\\n    --model \"hf-causal-llama\" \\\n    --model_args \"use_accelerate=True,pretrained=TheFinAI\u002Ffinma-7b-full,tokenizer=TheFinAI\u002Ffinma-7b-full,use_fast=False\" \\\n    --tasks \"flare_ner,flare_sm_acl,flare_fpb\"\n```\n\n更多详细信息请参阅 [lm_eval](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) 文档。\n\n2. 商业 API\n\n\n请注意，对于 NER 等任务，自动化评估基于特定模式，这可能导致在零样本场景下无法提取相关信息，从而使得评估结果相对低于人工标注的结果。\n\n```bash\nexport OPENAI_API_SECRET_KEY=YOUR_KEY_HERE\npython eval.py \\\n    --model gpt-4 \\\n    --tasks flare_ner,flare_sm_acl,flare_fpb\n```\n\n3. 自行部署评估\n\n运行推理后端：\n\n```bash\nbash scripts\u002Frun_interface.sh\n```\n\n请根据您的环境需求调整 `run_interface.sh`。\n\n进行评估：\n\n```bash\npython data\u002F*\u002Fevaluate.py\n```\n\n### 创建新任务\n\n为 FinBen 创建新任务需要先创建一个 Hugging Face 数据集，并在 Python 文件中实现该任务。本指南将引导您逐步完成使用 FinBen 框架设置新任务的过程。\n\n#### 在 Hugging Face 上创建数据集\n\n您的数据集应按照以下格式创建：\n\n```python\n{\n    \"query\": \"...\",\n    \"answer\": \"...\",\n    \"text\": \"...\"\n}\n```\n\n其中：\n- `query`：提示与文本的组合\n- `answer`：标签\n\n对于**多轮对话**任务（如 ）\n\n对于**分类**任务（如 [FPB (FinBen_fpb)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-fpb)），还需定义以下键值：\n- `choices`：标签集合\n- `gold`：正确标签在 `choices` 中的索引（从 0 开始）\n\n对于**序列标注**任务（如 [Finer Ord (FinBen_finer_ord)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-finer-ord)），还需定义以下键值：\n- `label`：标记标签列表\n- `token`：标记列表\n\n对于**抽取式摘要**任务（如 [ECTSUM (FinBen_ectsum)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-ectsum)），还需定义以下键值：\n- `label`：句子标签列表\n\n对于**生成式摘要**和**问答**任务（如 [EDTSUM (FinBen_edtsum)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTheFinAI\u002Fflare-edtsum)），则无需额外定义键值。\n\n#### 实现任务\n\n数据集准备好后，即可开始实现任务。您的任务应在 `flare.py` 或任务目录下的其他 Python 文件中定义在一个新类中。\n\n为了支持多种任务类型，我们提供了多个专用基类，包括 
`Classification`、`SequentialLabeling`、`RelationExtraction`、`ExtractiveSummarization`、`AbstractiveSummarization` 和 `QA`。\n\n例如，如果您正在开发一个分类任务，可以直接使用我们的 `Classification` 基类。该基类能够帮助您高效且直观地创建任务。下面以使用 `Classification` 基类创建名为 FinBen-FPB 的任务为例：\n\n```python\nclass FPB(Classification):\n    DATASET_PATH = \"flare-fpb\"\n```\n\n仅此而已！创建好任务类后，下一步是在 `src\u002Ftasks\u002F__init__.py` 文件中注册该任务。添加一行，格式为 `\"task_name\": module.ClassName`。例如：\n\n```python\nTASK_REGISTRY = {\n    \"flare_fpb\": flare.FPB,\n    \"your_new_task\": your_module.YourTask,  # 在此处添加您的任务\n}\n```\n\n#### 预定义的任务指标\n\n| 任务                                     | 指标                                 | 说明                                                 |\n| ---------------------------------------- | -------------------------------------- | ------------------------------------------------------------ |\n| 分类                                     | 准确率                               | 该指标表示正确预测的样本数占总样本数的比例。计算公式为：(真正例 + 真负例) \u002F 总样本数。 |\n| 分类                                     | F1 分数                               | F1 分数是精确率和召回率的调和平均值，能够在两者之间取得平衡。在某一指标比另一指标更重要的场景中尤为有用。分数范围为0到1，1表示精确率和召回率都达到完美，0则表示最差情况。此外，我们还提供加权和宏平均两种版本的F1分数。 |\n| 分类                                     | 缺失率                          | 该指标计算的是在任务中给出的选项均未被返回的响应所占比例。 |\n| 分类                                     | 马修斯相关系数 (MCC)                 | MCC是一种用于评估二分类质量的指标，取值范围为-1到+1。+1表示完美预测，0表示预测效果与随机猜测无异，-1则表示完全相反的预测。 |\n| 序列标注                      | F1 分数                               | 在序列标注任务中，我们使用由 `seqeval` 库计算的F1分数，这是一种强大的实体级评估指标。该指标要求预测实体与真实实体在实体范围和类型上完全匹配才能算作正确。真正例 (TP) 表示正确预测的实体，假正例 (FP) 表示错误预测的实体或实体范围\u002F类型不匹配的情况，假负例 (FN) 则表示遗漏了真实实体。基于这些量，可以计算出精确率、召回率和F1分数，其中F1分数是精确率和召回率的调和平均值。 |\n| 序列标注                      | 标签 F1 分数                         | 该指标仅根据预测标签的正确性来评估模型性能，而不考虑实体范围。 |\n| 关系抽取                      | 精确率                              | 精确率衡量的是所有预测关系中正确预测的关系所占比例。计算公式为：真正例 (TP) 数量除以真正例与假正例 (FP) 数量之和。 |\n| 关系抽取                      | 召回率     
                            | 召回率衡量的是所有实际关系中正确预测的关系所占比例。计算公式为：真正例 (TP) 数量除以真正例与假负例 (FN) 数量之和。 |\n| 关系抽取                      | F1 分数                               | F1 分数是精确率和召回率的调和平均值，能够在两者之间取得平衡。F1分数的最佳值为1（精确率和召回率都达到完美），最差值为0。 |\n| 抽取式与摘要式文本摘要          | Rouge-N                                | 该指标衡量系统生成摘要与参考摘要之间N-gram（文本中连续的N个词）的重叠程度。“N”可以是1、2或更多，其中ROUGE-1和ROUGE-2常用于分别评估一元组和二元组的重叠情况。 |\n| 抽取式与摘要式文本摘要          | Rouge-L                                | 该指标评估系统摘要与参考摘要之间的最长公共子序列 (LCS)。LCS能够自然地考虑句子层面的结构相似性，并自动识别最长的连续共现N-gram。 |\n| 问答                                       | EmACC                                  | EmACC评估模型生成的回答与参考答案是否完全一致。换言之，只有当模型生成的答案与参考答案逐字逐句完全匹配时，才被视为正确。\n\n> 此外，您可以通过在类定义中指定 `LOWER_CASE` 来决定匹配过程中是否应将标签转换为小写。这一点很重要，因为标签是根据其在生成输出中的形式进行匹配的。对于考试等任务，如果标签是一组特定的大写字母，如“A”、“B”、“C”，通常应将其设置为False。\n\n---\n\n\n\n## FIT：金融指令数据集\n\n我们的指令数据集专为领域特定的大语言模型 FinMA 定制。该数据集经过精心构建，旨在对我们的模型进行多样的金融任务微调。它包含了来自多个公开发布的金融数据集的多任务、多模态公开数据。\n\n该数据集内容丰富，涵盖情感分析、新闻标题分类、命名实体识别、问答以及股票走势预测等多种任务。数据模态既包括文本数据，也包括时间序列数据，提供了丰富的金融数据资源。每个任务的具体指令提示均由领域专家精心设计。\n\n### 模态与提示\n\n下表总结了不同任务、其对应的模态、文本类型以及每项任务所使用的指令示例：\n\n| **任务**                     | **模态**    | **文本类型**        | **指令示例**                                    |\n| ---------------------------- | ------------- | --------------------- | ------------------------------------------------------------ |\n| 情感分析           | 文本              | 新闻标题、推文 | “请分析这段摘自财经新闻的语句的情感倾向。请以‘负面’、‘正面’或‘中性’作答。例如，‘该公司股票在丑闻曝光后暴跌。’应被归类为负面。” |\n| 新闻标题分类       | 文本              | 新闻标题        | “请判断该标题是否提及黄金价格。新闻标题中是否暗示了黄金商品市场的价格变动？请回答‘是’或‘否’。” |\n| 命名实体识别       | 文本              | 金融协议        | “请从美国证券交易委员会文件中的金融协议段落中，识别出代表人物（‘PER’）、组织（‘ORG’）或地点（‘LOC’）的命名实体。答案格式为：‘实体名称, 实体类型’。例如，在‘SpaceX首席执行官埃隆·马斯克宣布从卡纳维拉尔角发射’这句话中，实体应为：‘埃隆·马斯克, PER；SpaceX, ORG；卡纳维拉尔角, LOC’。” |\n| 问答任务           | 文本              | 盈利报告      | “基于这一系列相互关联的金融相关问题，以及前置文本、表格数据和公司财务报表中的附加信息，请回答最后一个问题。这可能需要从上下文中提取信息并进行数学计算。请在作答时考虑前面问题及其答案所提供的信息：” |\n| 股票走势预测       | 文本、时间序列 | 推文、股票价格  | 
“请分析相关信息和社交媒体帖子，判断*\\{tid\\}*的收盘价将在*\\{point\\}*时上涨还是下跌。请以‘上涨’或‘下跌’作答。” |\n\n### 数据集统计信息\n\n该数据集包含大量指令数据样本（136K），使FinMA能够捕捉到多样化金融任务的细微差别。下表提供了指令数据集的统计详情：\n\n| 数据      | 任务                         | 原始    | 指令 | 数据类型                | 模态        | 许可证      | 原始论文 |\n| --------- | ---------------------------- | ------ | ----------- | ------------------------- | ------------ | ------------ | ---------- |\n| FPB       | 情感分析           | 4,845  | 48,450      | 新闻                      | 文本              | CC BY-SA 3.0 | [1]            |\n| FiQA-SA   | 情感分析           | 1,173  | 11,730      | 新闻标题、推文    | 文本              | 公有领域       | [2]            |\n| Headline  | 新闻标题分类       | 11,412 | 11,412      | 新闻标题            | 文本              | CC BY-SA 3.0 | [3]            |\n| NER       | 命名实体识别     | 1,366  | 13,660      | 金融协议          | 文本              | CC BY-SA 3.0 | [4]            |\n| FinQA     | 问答任务           | 8,281  | 8,281       | 盈利报告          | 文本、表格       | MIT许可证  | [5]            |\n| ConvFinQA | 问答任务           | 3,892  | 3,892       | 盈利报告          | 文本、表格       | MIT许可证  | [6]            |\n| BigData22 | 股票走势预测       | 7,164  | 7,164       | 推文、历史价格    | 文本、时间序列 | 公有领域       | [7]            |\n| ACL18     | 股票走势预测       | 27,053 | 27,053      | 推文、历史价格    | 文本、时间序列 | MIT许可证  | [8]            |\n| CIKM18    | 股票走势预测       | 4,967  | 4,967       | 推文、历史价格    | 文本、时间序列 | 公有领域       | [9]            |\n\n1. 佩卡·马洛、安库尔·辛哈、佩卡·科尔霍宁、于尔基·瓦莱纽斯和皮里·塔卡拉。2014年。好债务还是坏债务：检测经济文本中的语义倾向。信息科学与技术协会期刊，第65卷第4期（2014年），782–796页。\n2. 马塞多·迈亚、西格弗里德·汉施胡、安德烈·弗雷塔斯、布莱恩·戴维斯、罗斯·麦克德莫特、马内尔·扎鲁克和亚历山德拉·巴拉胡尔。2018年。Www’18开放挑战：金融观点挖掘与问答。2018年网络大会配套论文集，1941–1942页。\n3. 安库尔·辛哈和坦迈·坎代特。2021年。新闻对商品市场的影响：数据集与结果。信息与通信进展：2021年信息与通信未来大会（FICC）论文集，第2卷。施普林格出版社，589–601页。\n4. 胡里奥·塞萨尔·萨利纳斯·阿尔瓦拉多、卡琳·费尔斯波尔和蒂莫西·鲍德温。2015年。用于支持信用风险评估的命名实体识别领域适应。2015年澳大利亚语言技术协会研讨会论文集，84–90页。\n5. 陈志宇、陈文虎、查瑞丝·斯迈利、萨米娜·沙赫、伊安娜·博罗娃、迪伦·兰登、丽玛·穆萨、马特·比恩、黄廷浩、布莱恩·R·劳特利奇等。2021年。FinQA：金融数据上的数值推理数据集。2021年自然语言处理经验方法会议论文集，3697–3711页。\n6. 
陈志宇、李世阳、查瑞丝·斯迈利、智强·马、萨米娜·沙赫和威廉·杨·王。2022年。Convfinqa：探索会话式金融问答中的数值推理链条。arXiv预印本arXiv:2210.03849（2022年）。\n7. 叶俊·孙、赵在民、曹敏勇、全智亨和U·康。2022年。利用来自稀疏且嘈杂推文的自监督学习实现精准的股票走势预测。2022年IEEE国际大数据会议（Big Data）。IEEE出版，1691–1700页。\n8. 徐宇墨和谢伊·B·科恩。2018年。基于推文和历史价格的股票走势预测。第56届计算语言学协会年会论文集（第1卷：长篇论文），1970–1979页。\n9. 吴慧哲、张伟、沈伟伟和王军。2018年。结合深度序列模型的社会文本驱动股票预测。第27届ACM国际信息与知识管理会议论文集，1627–1630页。\n\n### 为 FIT 生成数据集\n\n在使用金融指令数据集（FIT）时，遵循规定的格式来训练和测试模型至关重要。\n\n格式应如下所示：\n\n```json\n{\n    \"id\": \"唯一标识符\",\n    \"conversations\": [\n        {\n            \"from\": \"human\",\n            \"value\": \"您的提示和文本\"\n        },\n        {\n            \"from\": \"agent\",\n            \"value\": \"您的回答\"\n        }\n    ],\n    \"text\": \"待分类的文本\",\n    \"label\": \"您的标签\"\n}\n```\n\n各字段含义如下：\n\n- “id”：数据集中每个样本的唯一标识符。\n- “conversations”：对话轮次列表。每一轮次用一个字典表示，其中“from”表示发言者，“value”表示该轮次中的文本。\n- “text”：待分类的文本。\n- “label”：该文本的真实标签。\n\n\n“conversations”列表中的第一轮必须来自“human”，包含您的提示和文本。第二轮则应来自“agent”，包含您的答案。\n\n---\n\n## FinMA v0.1：金融大型语言模型\n\n我们很高兴推出 FinMA 的第一个版本，包括三个模型：FinMA-7B、FinMA-7B-full 和 FinMA-30B，它们分别基于 LLaMA 7B 和 LLaMA-30B 进行微调。FinMA-7B 和 FinMA-30B 使用 NLP 指令数据进行训练，而 FinMA-7B-full 则使用涵盖 NLP 和预测任务的完整 FIT 指令数据进行训练。\n\nFinMA v0.1 现已在 [Huggingface](https:\u002F\u002Fhuggingface.co\u002FTheFinAI\u002Ffinma-7b-nlp) 上公开发布，供公众使用。我们期待这一初始版本能为金融 NLP 领域带来宝贵贡献，并鼓励用户将其应用于各种金融任务和场景。我们也欢迎反馈和经验分享，以帮助改进未来的版本。\n\n### 如何基于 FIT 使用 PIXIU 微调一个新的大型语言模型？\n\n敬请期待。\n\n---\n\n## FinMem：性能增强型 LLM 交易代理\n\nFinMem 是一种新颖的基于 LLM 的代理框架，专为金融决策设计，包含三个核心模块：Profile 模块，用于描绘代理的特征；Memory 模块，采用分层处理方式，帮助代理吸收真实的分层金融数据；以及 Decision-making 模块，将从记忆中获得的洞察转化为投资决策。目前，经过简单的模式预热后，FinMem 已能实现高收益的单只股票交易。以下是基于 Docker 的快速入门框架，以 TSLA 作为示例输入。\n\n步骤 1：设置环境变量  \n在 `.env` 文件中根据需要添加 HUGGINGFACE TOKEN 和 OPENAI API KEY。  \n```bash\nOPENAI_API_KEY = \"\u003C您的 OpenAI 密钥>\"\nHF_TOKEN = \"\u003C您的 HF 令牌>\"\n```\n\n步骤 2：在 `config.toml` 中设置端点 URL  \n根据所选模型（OPENAI、Gemini、HuggingFace 上的开源模型等）使用端点 URL 部署模型。对于 HuggingFace 上的开源模型，可以通过 RunPod 生成 TGI 端点。  \n```bash\n[chat]\nmodel = 
\"tgi\"\nend_point = \"\u003C设置您的端点地址>\"\ntokenization_model_name = \"\u003C模型名称>\"\n...\n```\n\n步骤 3：构建 Docker 镜像和容器  \n```bash\ndocker build -t test-finmem .devcontainer\u002F. \n```\n启动容器：  \n```bash\ndocker run -it --rm -v $(pwd):\u002Ffinmem test-finmem bash\n```\n\n步骤 4：开始模拟！  \n```bash\n用法：run.py sim [选项]                                                                                                                \n                                                                                                                                            \n 开始模拟                                                                                                                           \n                                                                                                                                            \n╭─ 选项 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ --market-data-path    -mdp      TEXT  环境数据 pickle 路径 [默认：data\u002F06_input\u002Fsubset_symbols.pkl]                       │\n│ --start-time          -st       TEXT  训练或测试开始时间 [默认：2022-06-30 对于 Ticker 'TSLA']                                                               │\n│ --end-time            -et       TEXT  训练或测试结束时间 [默认：2022-10-11]                                                                 │\n│ --run-model           -rm       TEXT  运行模式：train 或 test [默认：train]                                                           │\n│ --config-path         -cp       TEXT  配置文件路径 [默认：config\u002Fconfig.toml]                                                     │\n│ --checkpoint-path     -ckp      TEXT  检查点保存路径 [默认：data\u002F10_checkpoint_test]                                             │\n│ --result-path         -rp       TEXT  结果保存路径 [默认：data\u002F11_train_result]                                               │\n│ --trained-agent-path  -tap      TEXT  仅在测试模式下使用，已训练代理的路径 [默认：无。可更改为 
data\u002F05_train_model_output 或 data\u002F06_train_checkpoint]                                  │\n│ --help                                显示此消息并退出。                                                                        │\n╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n                              \n```\n\n使用示例：  \n```bash\npython run.py sim --market-data-path data\u002F03_model_input\u002Ftsla.pkl --start-time 2022-06-30 --end-time 2022-10-11 --run-model train --config-path config\u002Ftsla_tgi_config.toml --checkpoint-path data\u002F06_train_checkpoint --result-path data\u002F05_train_model_output\n```\n\n此外，还提供了检查点功能。更多详情请直接访问 [FinMem 仓库](https:\u002F\u002Fgithub.com\u002Fpipiku915\u002FFinMem-LLM-StockTrading)。 \n\n---\n\n## 引用\n\n如果您在工作中使用了 PIXIU，请引用我们的论文。\n\n```\n@misc{xie2023pixiu,\n      title={PIXIU：面向金融领域的大型语言模型、指令数据集及评估基准}, \n      author={谢倩倩、韩伟光、张晓、赖延昭、彭敏、亚历杭德罗·洛佩斯-利拉、黄继民},\n      year={2023},\n      eprint={2306.05443},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n\n@misc{xie2024FinBen,\n      title={FinBen：面向大型语言模型的综合性金融基准测试集}, \n      author={谢倩倩、韩伟光、陈正宇、向若愚、张晓、何悦如、肖梦溪、李东、戴永福、冯端宇、徐艺静、康浩强、匡子言、袁晨瀚、杨凯来、罗哲恒、张天林、刘志伟、熊国俊、邓志阳、蒋岳辰、姚志远、李浩航、于洋洋、胡刚、黄佳佳、刘晓阳、亚历杭德罗·洛佩斯-利拉、王本友、赖延昭、王浩、彭敏、索菲娅·阿纳尼亚杜、黄继民},\n      year={2024},\n      eprint={2402.12659},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n\n## 许可协议\n\nPIXIU 采用 [MIT] 许可协议授权。更多详情请参阅 [MIT](LICENSE) 文件。\n\n## 星标历史\n\n[![星标历史图表](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_readme_093817943670.png)](https:\u002F\u002Fstar-history.com\u002F#The-FinAI\u002FPIXIU&Date)","# PIXIU 快速上手指南\n\n## 环境准备\n\n### 系统要求\n- 操作系统：Linux 或 macOS（Windows 也可使用，但推荐使用 WSL）\n- Python 版本：Python 3.8 或以上\n- GPU 支持（可选）：NVIDIA GPU 以加速模型训练和推理\n\n### 前置依赖\n```bash\npip install -r requirements.txt\n```\n\n> 注意：若国内用户安装速度较慢，可使用国内镜像源：\n```bash\npip install -r requirements.txt -i 
https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## Installation\n\n1. Clone the repository:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FTheFinAI\u002Fpixiu.git\ncd pixiu\n```\n\n2. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n\n> Users in mainland China may use the Tsinghua mirror to speed up installation:\n```bash\npip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## Basic Usage\n\n### Load a Pretrained Model\n```python\n# Example: load the FinMA model\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_name = \"TheFinAI\u002Ffinma-7b-nlp\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(model_name)\n```\n\n### Inference Example\n```python\ninput_text = \"What is the impact of interest rates on stock prices?\"\ninputs = tokenizer(input_text, return_tensors=\"pt\")\noutputs = model.generate(**inputs)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n\n### Evaluate Models with FinBen\n```python\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"TheFinAI\u002Fen-fpb\")  # example: English sentiment analysis dataset\nprint(dataset[\"train\"][:2])\n```","A fintech company is building a robo-advisory system that needs to understand and analyze large volumes of financial text in order to provide personalized investment advice. The team faces many challenges when training and evaluating financial LLMs.\n\n### Without PIXIU  \n- No large language models specialized for finance, so model comprehension is limited  \n- No unified instruction-tuning dataset, making it hard to improve performance on specific tasks  \n- Unclear evaluation standards, so the model's real-world effectiveness in financial scenarios cannot be measured accurately  \n- Data and model resources are scattered, and the team must spend substantial time collecting and organizing them on its own  \n\n### With PIXIU  \n- Use the financial LLMs provided by PIXIU directly, significantly improving understanding and generation of financial text  \n- Use the built-in instruction-tuning datasets to quickly optimize model performance on tasks such as investment advice  \n- Measure model performance across different financial scenarios precisely through the standardized evaluation benchmark  \n- Save substantial data-preparation time, letting the team focus on algorithm optimization and product development  \n\nPIXIU gives financial AI developers a one-stop solution covering models, data, and evaluation, greatly improving development efficiency and model quality.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FThe-FinAI_PIXIU_400e2fc0.png","The-FinAI","The Fin AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FThe-FinAI_141cd16c.jpg","",null,"https:\u002F\u002Fthefin.ai","https:\u002F\u002Fgithub.com\u002FThe-FinAI",[80,84,88],{"name":81,"color":82,"percentage":83},"Jupyter 
Notebook","#DA5B0B",86.6,{"name":85,"color":86,"percentage":87},"Python","#3572A5",13.3,{"name":89,"color":90,"percentage":91},"Shell","#89e051",0.2,848,114,"2026-04-08T15:56:11","MIT","Linux, macOS, Windows","Requires an NVIDIA GPU with 8GB+ VRAM and CUDA 11.7+","16GB+",{"notes":100,"python":101,"dependencies":102},"conda is recommended for environment management; the first run downloads about 5GB of model files","3.8+",[103,104,105,106,107,108,109,110,111,112],"torch>=2.0","transformers>=4.30","accelerate","datasets","sentencepiece","peft","bitsandbytes","evaluate","deepspeed","wandb",[14,35],[115,116,117,118,119,120,121,122,123,124,125,126,127,128,129],"aifinance","fintech","llama","machine-learning","named-entity-recognition","natural-language-processing","nlp","question-answering","sentiment-analysis","stock-price-prediction","text-classification","chatgpt","gpt-4","large-language-models","pixiu",4,"2026-03-27T02:49:30.150509","2026-04-11T18:31:38.978306",[134,139,144,149,154,159],{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},5308,"Cannot find the FinMA checkpoint files","Please see our latest notebook: https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1ogcCmhMc5lPhUamCk6512H3PJwPEaBZN?usp=sharing. All reproduction issues should be resolved there.","https:\u002F\u002Fgithub.com\u002FThe-FinAI\u002FPIXIU\u002Fissues\u002F63",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},5309,"How do I reproduce the results in the paper?","You need to use the correct inference backend. We recommend the automated evaluation framework, which is currently more mature and convenient. To reproduce the paper results, make sure to pass the `--model_prompt finma_prompt` argument.","https:\u002F\u002Fgithub.com\u002FThe-FinAI\u002FPIXIU\u002Fissues\u002F16",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},5310,"VLLM dependency issues in the FLARE benchmark","Please install a specific version of vllm: `!pip install vllm==0.2.7`. Also make sure the BART checkpoint has been downloaded and that all arguments are specified correctly.","https:\u002F\u002Fgithub.com\u002FThe-FinAI\u002FPIXIU\u002Fissues\u002F46",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},5311,"How do I fix the KeyError 'hf-causal-llama'?","We need to modify lm-evaluation-harness, so please use `git clone --recursive` 
to clone the repository.","https:\u002F\u002Fgithub.com\u002FThe-FinAI\u002FPIXIU\u002Fissues\u002F33",{"id":155,"question_zh":156,"answer_zh":157,"source_url":158},5312,"Is the training set open source?","The training set is not open source yet. We are preparing a new instruction-tuning dataset, and the original dataset will be used for the update. Please watch for future updates.","https:\u002F\u002Fgithub.com\u002FThe-FinAI\u002FPIXIU\u002Fissues\u002F66",{"id":160,"question_zh":161,"answer_zh":162,"source_url":163},5313,"What if the leaderboard link is broken?","The link now redirects correctly to https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffinosfoundation\u002FOpen-Financial-LLM-Leaderboard. The FinBen leaderboard will be removed soon.","https:\u002F\u002Fgithub.com\u002FThe-FinAI\u002FPIXIU\u002Fissues\u002F78",[]]