
Version 0.7 (2025-04-17)

1. Overview

We are pleased to announce the official release of KAG 0.7. This update continues our commitment to enhancing the consistency, rigor, and precision of knowledge base-augmented reasoning in large language models, while introducing several significant new features.

Firstly, we have completely refactored the framework. The update adds support for both static and iterative task planning modes, along with a more rigorous hierarchical knowledge mechanism during the reasoning phase. Additionally, the new multi-executor extension mechanism and MCP protocol integration enable horizontal scaling of various symbolic solvers (such as math-executor and cypher-executor). These improvements not only help users quickly build knowledge-augmented applications to validate innovative ideas or domain-specific solutions, but also support continuous optimization of KAG Solver's capabilities, thereby further enhancing reasoning rigor in vertical applications.

Secondly, we have comprehensively optimized the product experience: during the reasoning phase, we introduced two modes, "Simple Mode" and "Deep Reasoning", and added support for streaming reasoning output, significantly reducing user wait times. Particularly noteworthy is the introduction of the "Lightweight Construction" mode, which better facilitates large-scale business application of KAG and addresses the community's most pressing concern about high knowledge-construction costs. As shown in the KAG-V0.7LC column of Figure 1, we tested a hybrid approach in which a 7B model handles knowledge construction and a 72B model handles knowledge-based question answering. The results on the two_wiki, hotpotqa, and musique benchmarks showed only minor declines of 1.20%, 1.90%, and 3.11%, respectively. However, the token cost (based on Aliyun Bailian pricing) of constructing a 100,000-character document was reduced from ¥4.63 to ¥0.479, an 89% reduction, which substantially saves users both time and money. Additionally, we will release a KAG-specific extraction model and a distributed offline batch-construction version, continuously compressing model size and improving construction throughput to reach daily construction capacity of millions, or even tens of millions, of documents in a single scenario.
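
For readers checking the arithmetic, the quoted saving follows directly from the two price points:

$$1 - \frac{0.479}{4.63} \approx 1 - 0.103 = 0.897,$$

that is, roughly an 89.7% cost reduction, reported above as 89%.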

Finally, to better promote business applications, technological advancement, and community exchange for knowledge-augmented LLMs, we have added an open_benchmark directory at the root level of the KAG repository. This directory includes reproduction methods for various datasets to help users replicate and improve KAG's performance across different tasks. Moving forward, we will continue to expand with more vertical scenario task datasets to provide users with richer resources.

Beyond these framework and product optimizations, we've fixed several bugs in both the reasoning and construction phases. This update uses Qwen2.5-72B as the base model, aligning results against various RAG frameworks and several KG datasets. For overall benchmark results, please refer to Figures 1 and 2, with detailed rankings available in the open_benchmark section.

Figure 1. Performance of KAG V0.7 and baselines on multi-hop QA benchmarks

Figure 2. Performance of KAG V0.7 and baselines (from OpenKG OneEval) on knowledge-based QA benchmarks

2. Framework Enhancements

2.1. Hybrid Static-Dynamic Task Planning

This release optimizes the KAG-Solver framework implementation, providing more flexible architectural support for "retrieval during reasoning" workflows, multi-scenario algorithm experimentation, and LLM-symbolic engine integration (via the MCP protocol).

The framework's Static/Iterative Planner transforms complex problems into directed acyclic graphs (DAGs) of interconnected Executors, enabling step-by-step resolution based on dependency relationships. We've implemented built-in Pipeline support for both Static and Iterative Planners, including a predefined NaiveRAG Pipeline, offering developers customizable solver-chaining capabilities while maintaining implementation flexibility.
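
To make the planning model concrete, here is a minimal, framework-agnostic sketch (class and function names below are invented for illustration and are not KAG's actual API) of a static plan expressed as a DAG of steps and resolved in dependency order:

```python
# Minimal sketch of static DAG-style task planning. Names are illustrative,
# not KAG's actual classes. Each step declares its dependencies; the
# resolver repeatedly runs steps whose dependencies are already satisfied.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]            # consumes/produces a shared context
    deps: list[str] = field(default_factory=list)


def execute_plan(steps: list[Step]) -> dict:
    done: dict[str, dict] = {}
    pending = {s.name: s for s in steps}
    while pending:
        ready = [s for s in pending.values() if all(d in done for d in s.deps)]
        if not ready:
            raise ValueError("cycle or unsatisfiable dependency in plan")
        for step in ready:
            ctx = {k: v for d in step.deps for k, v in done[d].items()}
            done[step.name] = step.run(ctx)
            del pending[step.name]
    return done


# A NaiveRAG-style chain is simply a two-node DAG: retrieve -> generate.
plan = [
    Step("retrieve", lambda ctx: {"chunks": ["..."]}),
    Step("generate",
         lambda ctx: {"answer": f"grounded in {len(ctx['chunks'])} chunk(s)"},
         deps=["retrieve"]),
]
print(execute_plan(plan)["generate"]["answer"])
```

An iterative planner differs mainly in that the next step is chosen at run time from intermediate results rather than being fixed up front.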


2.2. Extensible Symbolic Solvers

Leveraging LLM's FunctionCall capability, we have optimized the design of symbolic solvers (Executors) to enable more rational solver matching during complex problem planning. This release includes built-in solvers such as kag_hybrid_executor, math_executor, and cypher_executor, while providing a flexible extension mechanism that allows developers to define custom solvers for personalized requirements.
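
As a rough sketch of how this matching can work (the registry, schemas, and dispatcher below are illustrative assumptions, not KAG's interfaces), each executor publishes a tool description, and the LLM's function-call output is routed to the matching executor:

```python
# Illustrative function-call-style executor dispatch; names are hypothetical.
import json

EXECUTORS: dict[str, dict] = {}

def register(name: str, description: str):
    """Decorator that publishes an executor under a tool schema."""
    def wrap(fn):
        EXECUTORS[name] = {"fn": fn,
                           "schema": {"name": name, "description": description}}
        return fn
    return wrap

@register("math_executor", "Evaluate a deterministic arithmetic expression.")
def math_executor(expression: str) -> str:
    # eval() with stripped builtins is for illustration only; a production
    # solver should use a proper expression parser.
    return str(eval(expression, {"__builtins__": {}}))

@register("kag_hybrid_executor", "Retrieve and reason over the knowledge base.")
def kag_hybrid_executor(query: str) -> str:
    return f"(retrieved answer for: {query})"

def dispatch(function_call_json: str) -> str:
    """Route the JSON emitted by the LLM's FunctionCall step to an executor."""
    call = json.loads(function_call_json)
    return EXECUTORS[call["name"]]["fn"](**call["arguments"])

print(dispatch('{"name": "math_executor", "arguments": {"expression": "17 * 23"}}'))
```

The schemas collected in the registry are what the planner hands to the LLM, so adding a custom solver is just a matter of registering another function.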

2.3. Optimized Retrieval/Reasoning Strategies

Using the enhanced KAG-Solver framework, we have rewritten the logic of kag_hybrid_executor to implement a more rigorous knowledge-layering mechanism during reasoning. Based on business requirements for knowledge precision and following KAG's knowledge hierarchy definition, the system now retrieves the three knowledge layers in sequence: the schema-constrained layer, the schema-free layer, and the raw-context layer, then performs reasoning over the retrieved knowledge to generate answers.
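
The layering can be read as a precision-ordered fallback: the most constrained (and most trustworthy) layer is queried first, and noisier layers are consulted only when earlier ones come back empty. A minimal sketch, with hypothetical retriever callables standing in for the real layer indexes:

```python
# Sketch of precision-ordered layered retrieval; helper names are invented.
def retrieve_layered(query, layers, min_hits=1):
    """layers: (label, retrieve_fn) pairs, most to least constrained."""
    evidence = []
    for label, retrieve in layers:
        evidence.extend((label, hit) for hit in retrieve(query))
        if len(evidence) >= min_hits:
            break  # enough precise knowledge; skip the noisier layers
    return evidence

layers = [
    ("schema-constrained", lambda q: []),           # typed KG triples
    ("schema-free",        lambda q: ["(fact)"]),   # open-IE triples
    ("raw-context",        lambda q: ["(chunk)"]),  # original text chunks
]
# The empty first layer forces a fallback to the schema-free layer.
print(retrieve_layered("who issues the permit?", layers))
```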


2.4. MCP Protocol Integration

This KAG release achieves compatibility with the MCP protocol, enabling the incorporation of external data sources and symbolic solvers into the KAG framework via MCP. We have included a baidu_map_mcp example in the example directory for developers' reference.
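
Conceptually, the integration wraps a remote MCP server's tools as one more executor the planner can call. The sketch below uses a stand-in client class rather than the real MCP SDK, purely to show the shape of the adapter (the server URI, tool names, and method signatures are all assumptions):

```python
# Hypothetical adapter exposing an MCP server's tools as an executor.
# MCPClient is a stand-in, not the real MCP SDK; a real client would speak
# JSON-RPC over stdio or SSE to the server process.
class MCPClient:
    def __init__(self, server_uri: str):
        self.server_uri = server_uri          # e.g. a baidu_map_mcp endpoint

    def list_tools(self) -> list[str]:
        return ["geocode", "route_plan"]      # tools advertised by the server

    def call_tool(self, name: str, arguments: dict) -> dict:
        return {"tool": name, "arguments": arguments, "result": "..."}


class MCPExecutor:
    """Adapts every tool on one MCP server into planner-callable actions."""

    def __init__(self, client: MCPClient):
        self.client = client
        self.tools = set(client.list_tools())

    def invoke(self, tool: str, **kwargs) -> dict:
        if tool not in self.tools:
            raise ValueError(f"unknown MCP tool: {tool}")
        return self.client.call_tool(tool, kwargs)


executor = MCPExecutor(MCPClient("stdio://baidu_map_mcp"))
print(executor.invoke("geocode", address="Hangzhou West Lake"))
```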

3. OpenBenchmark

To better facilitate academic exchange and accelerate the adoption and technological advancement of large language models with external knowledge bases in enterprise settings, KAG has released more detailed benchmark reproduction steps in this version, along with open-sourcing all code and data. This will enable developers and researchers to easily reproduce and align results across various datasets.

For more accurate quantification of reasoning performance, we have adopted multiple evaluation metrics, including EM (Exact Match), F1, and LLM_Accuracy. In addition to existing datasets such as TwoWiki, Musique, and HotpotQA, this update introduces the OpenKG OneEval knowledge graph QA dataset (including AffairQA and PRQA) to evaluate the capabilities of both the cypher_executor and KAG's default framework.
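
For reference, EM and F1 here follow the standard SQuAD-style definitions. The sketch below is a simplified reimplementation (normalization details may differ from the exact open_benchmark scripts); LLM_Accuracy relies on a judge model and is not reproduced:

```python
# Simplified SQuAD-style EM and token-level F1 (sketch; the benchmark's
# own normalization may differ in minor details).
import re
from collections import Counter

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    text = re.sub(r"[^\w\s]", "", text)          # drop punctuation
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(f1("in Paris, France", "Paris"), 2))        # 0.5
```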

Building benchmarks is a time-consuming and complex endeavor. In future work, we will continue to expand benchmark datasets and provide domain-specific solutions to further enhance the accuracy, rigor, and consistency of large models in leveraging external knowledge. We warmly invite community members to collaborate with us in advancing the KAG framework's capabilities and real-world applications across diverse tasks.

3.1. Multi-hop QA Datasets

3.1.1. Benchmark

  • musique

| Method | EM | F1 | LLM_Accuracy |
| --- | --- | --- | --- |
| Naive Gen | 0.033 | 0.074 | 0.083 |
| Naive RAG | 0.248 | 0.357 | 0.384 |
| HippoRAGV2 | 0.289 | 0.404 | 0.452 |
| PIKE-RAG | 0.383 | 0.498 | 0.565 |
| KAG-V0.6.1 | 0.363 | 0.481 | 0.547 |
| KAG-V0.7LC | 0.379 | 0.513 | 0.560 |
| KAG-V0.7 | 0.385 | 0.520 | 0.579 |

  • hotpotqa

| Method | EM | F1 | LLM_Accuracy |
| --- | --- | --- | --- |
| Naive Gen | 0.223 | 0.313 | 0.342 |
| Naive RAG | 0.566 | 0.704 | 0.762 |
| HippoRAGV2 | 0.557 | 0.694 | 0.807 |
| PIKE-RAG | 0.558 | 0.686 | 0.787 |
| KAG-V0.6.1 | 0.599 | 0.745 | 0.841 |
| KAG-V0.7LC | 0.600 | 0.744 | 0.828 |
| KAG-V0.7 | 0.603 | 0.748 | 0.844 |

  • twowiki

| Method | EM | F1 | LLM_Accuracy |
| --- | --- | --- | --- |
| Naive Gen | 0.199 | 0.310 | 0.382 |
| Naive RAG | 0.448 | 0.512 | 0.573 |
| HippoRAGV2 | 0.542 | 0.618 | 0.684 |
| PIKE-RAG | 0.63 | 0.72 | 0.81 |
| KAG-V0.6.1 | 0.666 | 0.755 | 0.811 |
| KAG-V0.7LC | 0.683 | 0.769 | 0.826 |
| KAG-V0.7 | 0.684 | 0.770 | 0.836 |

3.1.2. Parameters for Each Method

| Method | Dataset | LLM (Build/Reason) | Embedding | Parameters |
| --- | --- | --- | --- | --- |
| Naive Gen | 10k docs and 1k questions provided by HippoRAG | qwen2.5-72B | bge-m3 | - |
| Naive RAG | same as above | qwen2.5-72B | bge-m3 | num_docs: 10 |
| HippoRAGV2 | same as above | qwen2.5-72B | bge-m3 | retrieval_top_k=200, linking_top_k=5, max_qa_steps=3, qa_top_k=5, graph_type=facts_and_sim_passage_node_unidirectional, embedding_batch_size=8 |
| PIKE-RAG | same as above | qwen2.5-72B | bge-m3 | tagging_llm_temperature: 0.7, qa_llm_temperature: 0.0, chunk_retrieve_k: 8, chunk_retrieve_score_threshold: 0.5, atom_retrieve_k: 16, atomic_retrieve_score_threshold: 0.2, max_num_question: 5, num_parallel: 5 |
| KAG-V0.6.1 | same as above | qwen2.5-72B | bge-m3 | refer to the kag_config.yaml files in each subdirectory under https://github.com/OpenSPG/KAG/tree/v0.6/kag/examples |
| KAG-V0.7 | same as above | qwen2.5-72B | bge-m3 | refer to the kag_config.yaml files in each subdirectory under https://github.com/OpenSPG/KAG/tree/master/kag/open_benchmark |

3.2. Structured Datasets

PeopleRelQA (Person Relationship QA) and AffairQA (Government Affairs QA) are datasets on the OpenKG OneEval benchmark, provided by the Alibaba Tianchi Competition and Zhejiang University, respectively. KAG delivers a streamlined implementation paradigm for vertical-domain applications through its "semantic modeling + structured graph construction + NL2Cypher retrieval" approach. Moving forward, we will continue optimizing structured-data QA performance by enhancing the integration between large language models and knowledge engines.
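
To make the NL2Cypher step concrete, the sketch below shows the intended flow on an invented schema (the question, node labels, and query are illustrative, not drawn from either dataset): an LLM prompted with the graph schema emits a Cypher query, and the rows returned by the graph store ground the final answer:

```python
# Illustrative NL2Cypher flow with an invented schema; in KAG this step is
# driven by the cypher_executor against the constructed graph.
QUESTION = "Which agency issues the residence permit?"

# Step 1: an LLM prompted with the modeled schema emits a Cypher query.
generated_cypher = """
MATCH (s:Service {name: 'residence permit'})<-[:ISSUES]-(a:Agency)
RETURN a.name
"""

# Step 2: the query runs on the graph store; its rows ground the answer.
def run_cypher(query: str) -> list[dict]:
    # Stand-in for a real graph-database call (e.g. through a Neo4j driver).
    return [{"a.name": "(agency name)"}]

rows = run_cypher(generated_cypher)
print(f"{QUESTION} -> {rows[0]['a.name']}")
```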

The OpenKG OneEval Benchmark primarily evaluates large language models' (LLMs) capabilities in comprehending and utilizing diverse knowledge domains. As documented in OpenKG's official description, the benchmark employs relatively simple retrieval strategies that may introduce noise in recalled results, while simultaneously assessing LLMs' robustness when processing imperfect or redundant knowledge. KAG's performance improvements in these scenarios stem from its effective retrieval strategies that ensure strong relevance between retrieved content and query intent.

In this update, KAG has validated its retrieval and reasoning capabilities on traditional knowledge graph tasks using the AffairQA and PRQA datasets. Future developments will focus on advancing schema standardization and reasoning framework alignment, along with releasing additional evaluation metrics to support broader application scenarios.

  • PeopleRelQA

| Method | EM | F1 | LLM_Accuracy | Methodology | Metric Source |
| --- | --- | --- | --- | --- | --- |
| deepseek-v3 (OpenKG OneEval) | - | 2.60% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| qwen2.5-72B (OpenKG OneEval) | - | 2.50% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| GPT-4o (OpenKG OneEval) | - | 3.20% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| QWQ-32B (OpenKG OneEval) | - | 3.00% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| Grok 3 (OpenKG OneEval) | - | 4.70% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| KAG-V0.7 | 45.5% | 86.6% | 84.8% | Custom PRQA Pipeline with Cypher Solver Based on KAG Framework | Ant Group KAG Team |

  • AffairQA

| Method | EM | F1 | LLM_Accuracy | Methodology | Metric Source |
| --- | --- | --- | --- | --- | --- |
| deepseek-v3 | - | 42.50% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| qwen2.5-72B | - | 45.00% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| GPT-4o | - | 41.00% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| QWQ-32B | - | 45.00% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| Grok 3 | - | 45.50% | - | Dense Retrieval + LLM Generation | OpenKG WeChat |
| KAG-V0.7 | 77.5% | 83.1% | 88.2% | Custom PRQA Pipeline with Cypher Solver Based on KAG Framework | Ant Group KAG Team |

4. Product and Platform Optimization

This update enhances the knowledge Q&A product experience. Users can refer to the KAG User Manual and access our demo files under the Quick Start -> Product Mode section to reproduce the results shown in the following videos.

  • Demo of KAG Builder
  • Demo of KAG Solver

4.1. Enhanced Q&A Experience

By optimizing the planning, execution, and generation capabilities of the KAG-Solver framework—leveraging Qwen2.5-72B and DeepSeek-V3 models—the system now achieves deep reasoning performance comparable to DeepSeek-R1. Three key features have been introduced:

  • Streaming output for dynamic delivery of reasoning results (see the client-side sketch after this list)
  • Auto-rendering of Markdown-formatted graph indices
  • Intelligent citation linking between generated content and source references
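
From the client side, streaming delivery amounts to consuming incremental deltas and rendering them as they arrive. The sketch below fakes the transport layer entirely (the chunk format is an assumption, not the product's actual API):

```python
# Hypothetical client-side consumption of a streamed answer; fake_stream
# stands in for an SSE/chunked HTTP response from the solver.
import json

def fake_stream():
    for piece in ["The answer ", "is assembled ", "incrementally."]:
        yield json.dumps({"delta": piece})

answer = ""
for chunk in fake_stream():
    delta = json.loads(chunk)["delta"]
    answer += delta
    print(delta, end="", flush=True)  # render each delta as it arrives
print()
```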

4.2. Dual-Mode Retrieval

The new Deep Reasoning Toggle allows users to balance answer accuracy against computational costs by enabling/disabling deep reasoning as needed. (Note: Web-augmented search is currently in testing—stay tuned for updates in future KAG releases.)

4.3. Indexing Infrastructure Upgrades

  • Data Import
    • Expanded structured data support for CSV/ODPS/SLS sources
    • Optimized ingestion pipelines for improved usability
  • Hybrid Processing
    • Unified handling of structured and unstructured data
    • Enhanced task management via job scheduling, execution logging, and data sampling for diagnostics

5. Roadmap

In upcoming iterations, we will remain committed to enhancing the ability of large models to utilize external knowledge bases, achieving bidirectional enhancement and organic integration between large models and symbolic knowledge. This effort aims to consistently improve the factual accuracy, rigor, and coherence of reasoning and question answering in specialized scenarios. We will continue to release updates, steadily raising the upper limits of these capabilities and advancing their adoption in vertical domains.

6. Acknowledgments

This release addresses several issues in the hierarchical retrieval module, and we extend our sincere gratitude to the community developers who reported these problems.

The framework upgrade has received tremendous support from the following experts and colleagues, to whom we are deeply grateful:

  • Tongji University: Prof. Haofen Wang, Prof. Meng Wang
  • Institute of Computing Technology, CAS: Dr. Long Bai
  • Hunan KeChuang Information: R&D Expert Ling Liu
  • Open Source Community: Senior Developer Yunpeng Li
  • Bank of Communications: R&D Engineer Chenxing Gao