
Version 0.7 (2025-04-17)

1. Overview

We are pleased to announce the official release of KAG 0.7. This update continues our commitment to enhancing the consistency, rigor, and precision of knowledge base-augmented reasoning in large language models, while introducing several significant new features.

Firstly, we have completely refactored the framework. The update adds support for both static and iterative task planning modes, along with a more rigorous hierarchical knowledge mechanism during the reasoning phase. Additionally, the new multi-executor extension mechanism and MCP protocol integration enable horizontal scaling of various symbolic solvers (such as math-executor and cypher-executor). These improvements not only help users quickly build knowledge-augmented applications to validate innovative ideas or domain-specific solutions, but also support continuous optimization of KAG Solver's capabilities, thereby further enhancing reasoning rigor in vertical applications.

Secondly, we have comprehensively optimized the product experience: during the reasoning phase, we introduced two modes, "Simple Mode" and "Deep Reasoning", and added support for streaming reasoning output, significantly reducing user wait times. Particularly noteworthy is the introduction of the "Lightweight Construction" mode, which better facilitates large-scale business application of KAG and addresses the community's most pressing concern about high knowledge-construction costs. As shown in the KAG-V0.7LC column of Figure 1, we tested a hybrid approach in which a 7B model handles knowledge construction and a 72B model handles knowledge-based question answering. The results on the two_wiki, hotpotqa, and musique benchmarks showed only minor declines of 1.20%, 1.90%, and 3.11%, respectively. However, the token cost (based on Aliyun Bailian pricing) of constructing a 100,000-character document was reduced from ¥4.63 to ¥0.479, an 89% reduction, which substantially saves users both time and money. Additionally, we will release a KAG-specific extraction model and a distributed offline batch-construction version, continuously compressing model size and improving construction throughput to reach daily construction capacity of millions, or even tens of millions, of documents in a single scenario.
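
For readers checking the arithmetic, the quoted saving follows directly from the two price points:

$$1 - \frac{0.479}{4.63} \approx 1 - 0.103 = 0.897,$$

that is, roughly an 89.7% cost reduction, reported above as 89%.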

Finally, to better promote business applications, technological advancement, and community exchange for knowledge-augmented LLMs, we have added an open_benchmark directory at the root level of the KAG repository. This directory includes reproduction methods for various datasets to help users replicate and improve KAG's performance across different tasks. Moving forward, we will continue to expand with more vertical scenario task datasets to provide users with richer resources.

Beyond these framework and product optimizations, we've fixed several bugs in both the reasoning and construction phases. This update uses Qwen2.5-72B as the base model, aligning results against various RAG frameworks and several KG datasets. For overall benchmark results, please refer to Figures 1 and 2, with detailed rankings available in the open_benchmark section.

Figure 1. Performance of KAG V0.7 and baselines on multi-hop QA benchmarks

Figure 2. Performance of KAG V0.7 and baselines (from OpenKG OneEval) on knowledge-based QA benchmarks

2. Framework Enhancements

2.1. Hybrid Static-Dynamic Task Planning

This release optimizes the KAG-Solver framework implementation, providing more flexible architectural support for "retrieval during reasoning" workflows, multi-scenario algorithm experimentation, and LLM-symbolic engine integration (via the MCP protocol).

The framework's Static/Iterative Planner transforms complex problems into directed acyclic graphs (DAGs) of interconnected Executors, enabling step-by-step resolution based on dependency relationships. We've implemented built-in Pipeline support for both Static and Iterative Planners, including a predefined NaiveRAG Pipeline, offering developers customizable solver-chaining capabilities while maintaining implementation flexibility.
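
To make the planning model concrete, here is a minimal, framework-agnostic sketch (class and function names below are invented for illustration and are not KAG's actual API) of a static plan expressed as a DAG of steps and resolved in dependency order:

```python
# Minimal sketch of static DAG-style task planning. Names are illustrative,
# not KAG's actual classes. Each step declares its dependencies; the
# resolver repeatedly runs steps whose dependencies are already satisfied.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]            # consumes/produces a shared context
    deps: list[str] = field(default_factory=list)


def execute_plan(steps: list[Step]) -> dict:
    done: dict[str, dict] = {}
    pending = {s.name: s for s in steps}
    while pending:
        ready = [s for s in pending.values() if all(d in done for d in s.deps)]
        if not ready:
            raise ValueError("cycle or unsatisfiable dependency in plan")
        for step in ready:
            ctx = {k: v for d in step.deps for k, v in done[d].items()}
            done[step.name] = step.run(ctx)
            del pending[step.name]
    return done


# A NaiveRAG-style chain is simply a two-node DAG: retrieve -> generate.
plan = [
    Step("retrieve", lambda ctx: {"chunks": ["..."]}),
    Step("generate",
         lambda ctx: {"answer": f"grounded in {len(ctx['chunks'])} chunk(s)"},
         deps=["retrieve"]),
]
print(execute_plan(plan)["generate"]["answer"])
```

An iterative planner differs mainly in that the next step is chosen at run time from intermediate results rather than being fixed up front.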


2.2. Extensible Symbolic Solvers

Leveraging LLM's FunctionCall capability, we have optimized the design of symbolic solvers (Executors) to enable more rational solver matching during complex problem planning. This release includes built-in solvers such as kag_hybrid_executor, math_executor, and cypher_executor, while providing a flexible extension mechanism that allows developers to define custom solvers for personalized requirements.
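
As a rough sketch of how this matching can work (the registry, schemas, and dispatcher below are illustrative assumptions, not KAG's interfaces), each executor publishes a tool description, and the LLM's function-call output is routed to the matching executor:

```python
# Illustrative function-call-style executor dispatch; names are hypothetical.
import json

EXECUTORS: dict[str, dict] = {}

def register(name: str, description: str):
    """Decorator that publishes an executor under a tool schema."""
    def wrap(fn):
        EXECUTORS[name] = {"fn": fn,
                           "schema": {"name": name, "description": description}}
        return fn
    return wrap

@register("math_executor", "Evaluate a deterministic arithmetic expression.")
def math_executor(expression: str) -> str:
    # eval() with stripped builtins is for illustration only; a production
    # solver should use a proper expression parser.
    return str(eval(expression, {"__builtins__": {}}))

@register("kag_hybrid_executor", "Retrieve and reason over the knowledge base.")
def kag_hybrid_executor(query: str) -> str:
    return f"(retrieved answer for: {query})"

def dispatch(function_call_json: str) -> str:
    """Route the JSON emitted by the LLM's FunctionCall step to an executor."""
    call = json.loads(function_call_json)
    return EXECUTORS[call["name"]]["fn"](**call["arguments"])

print(dispatch('{"name": "math_executor", "arguments": {"expression": "17 * 23"}}'))
```

The schemas collected in the registry are what the planner hands to the LLM, so adding a custom solver is just a matter of registering another function.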

2.3. Optimized Retrieval/Reasoning Strategies

Using the enhanced KAG-Solver framework, we have rewritten the logic of kag_hybrid_executor to implement a more rigorous knowledge-layering mechanism during reasoning. Based on business requirements for knowledge precision and following KAG's knowledge hierarchy definition, the system now retrieves the three knowledge layers in sequence: the schema-constrained layer, the schema-free layer, and the raw-context layer, then performs reasoning over the retrieved knowledge to generate answers.
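
The layering can be read as a precision-ordered fallback: the most constrained (and most trustworthy) layer is queried first, and noisier layers are consulted only when earlier ones come back empty. A minimal sketch, with hypothetical retriever callables standing in for the real layer indexes:

```python
# Sketch of precision-ordered layered retrieval; helper names are invented.
def retrieve_layered(query, layers, min_hits=1):
    """layers: (label, retrieve_fn) pairs, most to least constrained."""
    evidence = []
    for label, retrieve in layers:
        evidence.extend((label, hit) for hit in retrieve(query))
        if len(evidence) >= min_hits:
            break  # enough precise knowledge; skip the noisier layers
    return evidence

layers = [
    ("schema-constrained", lambda q: []),           # typed KG triples
    ("schema-free",        lambda q: ["(fact)"]),   # open-IE triples
    ("raw-context",        lambda q: ["(chunk)"]),  # original text chunks
]
# The empty first layer forces a fallback to the schema-free layer.
print(retrieve_layered("who issues the permit?", layers))
```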


2.4. MCP Protocol Integration

This KAG release achieves compatibility with the MCP protocol, enabling the incorporation of external data sources and symbolic solvers into the KAG framework via MCP. We have included a baidu_map_mcp example in the example directory for developers' reference.
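
Conceptually, the integration wraps a remote MCP server's tools as one more executor the planner can call. The sketch below uses a stand-in client class rather than the real MCP SDK, purely to show the shape of the adapter (the server URI, tool names, and method signatures are all assumptions):

```python
# Hypothetical adapter exposing an MCP server's tools as an executor.
# MCPClient is a stand-in, not the real MCP SDK; a real client would speak
# JSON-RPC over stdio or SSE to the server process.
class MCPClient:
    def __init__(self, server_uri: str):
        self.server_uri = server_uri          # e.g. a baidu_map_mcp endpoint

    def list_tools(self) -> list[str]:
        return ["geocode", "route_plan"]      # tools advertised by the server

    def call_tool(self, name: str, arguments: dict) -> dict:
        return {"tool": name, "arguments": arguments, "result": "..."}


class MCPExecutor:
    """Adapts every tool on one MCP server into planner-callable actions."""

    def __init__(self, client: MCPClient):
        self.client = client
        self.tools = set(client.list_tools())

    def invoke(self, tool: str, **kwargs) -> dict:
        if tool not in self.tools:
            raise ValueError(f"unknown MCP tool: {tool}")
        return self.client.call_tool(tool, kwargs)


executor = MCPExecutor(MCPClient("stdio://baidu_map_mcp"))
print(executor.invoke("geocode", address="Hangzhou West Lake"))
```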

3. OpenBenchmark

To better facilitate academic exchange and accelerate the adoption and technological advancement of large language models with external knowledge bases in enterprise settings, KAG has released more detailed benchmark reproduction steps in this version, along with open-sourcing all code and data. This will enable developers and researchers to easily reproduce and align results across various datasets.

For more accurate quantification of reasoning performance, we have adopted multiple evaluation metrics, including EM (Exact Match), F1, and LLM_Accuracy. In addition to existing datasets such as TwoWiki, Musique, and HotpotQA, this update introduces the OpenKG OneEval knowledge graph QA dataset (including AffairQA and PRQA) to evaluate the capabilities of both the cypher_executor and KAG's default framework.
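
For reference, EM and F1 here follow the standard SQuAD-style definitions. The sketch below is a simplified reimplementation (normalization details may differ from the exact open_benchmark scripts); LLM_Accuracy relies on a judge model and is not reproduced:

```python
# Simplified SQuAD-style EM and token-level F1 (sketch; the benchmark's
# own normalization may differ in minor details).
import re
from collections import Counter

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    text = re.sub(r"[^\w\s]", "", text)          # drop punctuation
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(f1("in Paris, France", "Paris"), 2))        # 0.5
```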

Building benchmarks is a time-consuming and complex endeavor. In future work, we will continue to expand benchmark datasets and provide domain-specific solutions to further enhance the accuracy, rigor, and consistency of large models in leveraging external knowledge. We warmly invite community members to collaborate with us in advancing the KAG framework's capabilities and real-world applications across diverse tasks.

3.1. Multi-hop QA Datasets

3.1.1. Benchmark

  • musique

| Method | EM | F1 | LLM_Accuracy |
| --- | --- | --- | --- |
| Naive Gen | 0.033 | 0.074 | 0.083 |
| Naive RAG | 0.248 | 0.357 | 0.384 |
| HippoRAGV2 | 0.289 | 0.404 | 0.452 |
| PIKE-RAG | 0.383 | 0.498 | 0.565 |
| KAG-V0.6.1 | 0.363 | 0.481 | 0.547 |
| KAG-V0.7LC | 0.379 | 0.513 | 0.560 |
| KAG-V0.7 | 0.385 | 0.520 | 0.579 |

  • hotpotqa

| Method | EM | F1 | LLM_Accuracy |
| --- | --- | --- | --- |
| Naive Gen | 0.223 | 0.313 | 0.342 |
| Naive RAG | 0.566 | 0.704 | 0.762 |
| HippoRAGV2 | 0.557 | 0.694 | 0.807 |
| PIKE-RAG | 0.558 | 0.686 | 0.787 |
| KAG-V0.6.1 | 0.599 | 0.745 | 0.841 |
| KAG-V0.7LC | 0.600 | 0.744 | 0.828 |
| KAG-V0.7 | 0.603 | 0.748 | 0.844 |

  • twowiki

| Method | EM | F1 | LLM_Accuracy |
| --- | --- | --- | --- |
| Naive Gen | 0.199 | 0.310 | 0.382 |
| Naive RAG | 0.448 | 0.512 | 0.573 |
| HippoRAGV2 | 0.542 | 0.618 | 0.684 |
| PIKE-RAG | 0.63 | 0.72 | 0.81 |
| KAG-V0.6.1 | 0.666 | 0.755 | 0.811 |
| KAG-V0.7LC | 0.683 | 0.769 | 0.826 |
| KAG-V0.7 | 0.684 | 0.770 | 0.836 |

3.1.2. Parameters for Each Method

| Method | Dataset | LLM (Build/Reason) | Embedding | Parameters |
| --- | --- | --- | --- | --- |
| Naive Gen | 10k docs and 1k questions provided by HippoRAG | qwen2.5-72B | bge-m3 | - |
| Naive RAG | same as above | qwen2.5-72B | bge-m3 | num_docs: 10 |
| HippoRAGV2 | same as above | qwen2.5-72B | bge-m3 | retrieval_top_k=200, linking_top_k=5, max_qa_steps=3, qa_top_k=5, graph_type=facts_and_sim_passage_node_unidirectional, embedding_batch_size=8 |
| PIKE-RAG | same as above | qwen2.5-72B | bge-m3 | tagging_llm_temperature: 0.7, qa_llm_temperature: 0.0, chunk_retrieve_k: 8, chunk_retrieve_score_threshold: 0.5, atom_retrieve_k: 16, atomic_retrieve_score_threshold: 0.2, max_num_question: 5, num_parallel: 5 |
| KAG-V0.6.1 | same as above | qwen2.5-72B | bge-m3 | refer to the kag_config.yaml files in each subdirectory under https://github.com/OpenSPG/KAG/tree/v0.6/kag/examples |
| KAG-V0.7 | same as above | qwen2.5-72B | bge-m3 | refer to the kag_config.yaml files in each subdirectory under https://github.com/OpenSPG/KAG/tree/master/kag/open_benchmark |

3.2. Structured Datasets

PeopleRelQA (Person Relationship QA) and AffairQA (Government Affairs QA) are datasets on the OpenKG OneEval benchmark, provided by the Alibaba Tianchi Competition and Zhejiang University, respectively. KAG delivers a streamlined implementation paradigm for vertical-domain applications through its "semantic modeling + structured graph construction + NL2Cypher retrieval" approach. Moving forward, we will continue optimizing structured-data QA performance by enhancing the integration between large language models and knowledge engines.
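
To make the NL2Cypher step concrete, the sketch below shows the intended flow on an invented schema (the question, node labels, and query are illustrative, not drawn from either dataset): an LLM prompted with the graph schema emits a Cypher query, and the rows returned by the graph store ground the final answer:

```python
# Illustrative NL2Cypher flow with an invented schema; in KAG this step is
# driven by the cypher_executor against the constructed graph.
QUESTION = "Which agency issues the residence permit?"

# Step 1: an LLM prompted with the modeled schema emits a Cypher query.
generated_cypher = """
MATCH (s:Service {name: 'residence permit'})<-[:ISSUES]-(a:Agency)
RETURN a.name
"""

# Step 2: the query runs on the graph store; its rows ground the answer.
def run_cypher(query: str) -> list[dict]:
    # Stand-in for a real graph-database call (e.g. through a Neo4j driver).
    return [{"a.name": "(agency name)"}]

rows = run_cypher(generated_cypher)
print(f"{QUESTION} -> {rows[0]['a.name']}")
```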

The OpenKG OneEval Benchmark primarily evaluates large language models' (LLMs) capabilities in comprehending and utilizing diverse knowledge domains. As documented in OpenKG's official description, the benchmark employs relatively simple retrieval strategies that may introduce noise in recalled results, while simultaneously assessing LLMs' robustness when processing imperfect or redundant knowledge. KAG's performance improvements in these scenarios stem from its effective retrieval strategies that ensure strong relevance between retrieved content and query intent.

In this update, KAG has validated its retrieval and reasoning capabilities on traditional knowledge graph tasks using the AffairQA and PRQA datasets. Future developments will focus on advancing schema standardization and reasoning framework alignment, along with releasing additional evaluation metrics to support broader application scenarios.

  • PeopleRelQA

| Method | EM | F1 | LLM_Accuracy | Methodology | Metric Source |
| --- | --- | --- | --- | --- | --- |
| deepseek-v3 (OpenKG OneEval) | - | 2.60% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| qwen2.5-72B (OpenKG OneEval) | - | 2.50% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| GPT-4o (OpenKG OneEval) | - | 3.20% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| QWQ-32B (OpenKG OneEval) | - | 3.00% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| Grok 3 (OpenKG OneEval) | - | 4.70% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| KAG-V0.7 | 45.5% | 86.6% | 84.8% | Custom PRQA Pipeline with Cypher Solver Based on KAG Framework | Ant Group KAG Team |

  • AffairQA

| Method | EM | F1 | LLM_Accuracy | Methodology | Metric Source |
| --- | --- | --- | --- | --- | --- |
| deepseek-v3 | - | 42.50% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| qwen2.5-72B | - | 45.00% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| GPT-4o | - | 41.00% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| QWQ-32B | - | 45.00% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| Grok 3 | - | 45.50% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| KAG-V0.7 | 77.5% | 83.1% | 88.2% | Custom PRQA Pipeline with Cypher Solver Based on KAG Framework | Ant Group KAG Team |

4. Product and Platform Optimization

This update enhances the knowledge Q&A product experience. Users can refer to the KAG User Manual and access our demo files under the Quick Start -> Product Mode section to reproduce the results shown in the following videos.

  • Demo of KAG Builder
  • Demo of KAG Solver

4.1. Enhanced Q&A Experience

By optimizing the planning, execution, and generation capabilities of the KAG-Solver framework—leveraging Qwen2.5-72B and DeepSeek-V3 models—the system now achieves deep reasoning performance comparable to DeepSeek-R1. Three key features have been introduced:

  • Streaming output for dynamic delivery of reasoning results (see the client-side sketch after this list)
  • Auto-rendering of Markdown-formatted graph indices
  • Intelligent citation linking between generated content and source references
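
From the client side, streaming delivery amounts to consuming incremental deltas and rendering them as they arrive. The sketch below fakes the transport layer entirely (the chunk format is an assumption, not the product's actual API):

```python
# Hypothetical client-side consumption of a streamed answer; fake_stream
# stands in for an SSE/chunked HTTP response from the solver.
import json

def fake_stream():
    for piece in ["The answer ", "is assembled ", "incrementally."]:
        yield json.dumps({"delta": piece})

answer = ""
for chunk in fake_stream():
    delta = json.loads(chunk)["delta"]
    answer += delta
    print(delta, end="", flush=True)  # render each delta as it arrives
print()
```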

4.2. Dual-Mode Retrieval

The new Deep Reasoning Toggle allows users to balance answer accuracy against computational costs by enabling/disabling deep reasoning as needed. (Note: Web-augmented search is currently in testing—stay tuned for updates in future KAG releases.)

4.3. Indexing Infrastructure Upgrades

  • Data Import
    • Expanded structured data support for CSV/ODPS/SLS sources
    • Optimized ingestion pipelines for improved usability
  • Hybrid Processing
    • Unified handling of structured and unstructured data
    • Enhanced task management via job scheduling, execution logging, and data sampling for diagnostics

5. Roadmap

In upcoming iterations, we will remain committed to enhancing the ability of large models to utilize external knowledge bases, achieving bidirectional enhancement and organic integration between large models and symbolic knowledge. This effort aims to consistently improve the factual accuracy, rigor, and coherence of reasoning and question answering in specialized scenarios. We will continue to release updates, steadily raising the upper limits of these capabilities and advancing their adoption in vertical domains.

6. Acknowledgments

This release addresses several issues in the hierarchical retrieval module, and we extend our sincere gratitude to the community developers who reported these problems.

The framework upgrade has received tremendous support from the following experts and colleagues, to whom we are deeply grateful:

  • Tongji University: Prof. Haofen Wang, Prof. Meng Wang
  • Institute of Computing Technology, CAS: Dr. Long Bai
  • Hunan KeChuang Information: R&D Expert Ling Liu
  • Open Source Community: Senior Developer Yunpeng Li
  • Bank of Communications: R&D Engineer Chenxing Gao