mirror of
https://github.com/agentuniverse-ai/agentUniverse.git
synced 2026-02-09 01:59:19 +08:00
feat: add the data agent to build datasets and evaluate the quality of results.
BIN  docs/guidebook/_picture/data_agent_eval_report.png     (new file, 107 KiB, binary file not shown)
BIN  docs/guidebook/_picture/data_agent_eval_report_en.png  (new file, 79 KiB, binary file not shown)
BIN  docs/guidebook/_picture/data_agent_eval_result.png     (new file, 138 KiB, binary file not shown)
BIN  docs/guidebook/_picture/data_agent_eval_result_en.png  (new file, 99 KiB, binary file not shown)
BIN  docs/guidebook/_picture/data_agent_flowchart.jpg       (new file, 793 KiB, binary file not shown)
47  docs/guidebook/en/8_1_1_data_governance_agent.md  (new file)

@@ -0,0 +1,47 @@
# Data Governance Agent

## Introduction

**One-sentence introduction**: given a user-specified queryset, agentUniverse dataAgent **automates the batch execution of multiple rounds of agent calls** and **generates a dataset**; it then performs **automated evaluation annotation** and **generates an evaluation report**.

Currently, agentUniverse has released **the MVP version of data governance** to convey the concept of data governance to the open-source community. It will subsequently open up the complete workflow, including offline adaptive data collection, fine-tuning dataset generation, model inference training, and multidimensional automatic evaluation of datasets, to strengthen the data governance process for large language models and agents.
## DataAgent Flowchart

![data_agent_flowchart](../_picture/data_agent_flowchart.jpg)

- data_agent consists of two parts: **dataset_build_agent** automates the batch execution of multiple rounds of agent calls and produces the dataset.
- **dataset_eval_agent** automates the multidimensional evaluation of the dataset and generates the evaluation report.

## DataAgent Effect

Before elaborating on how to use dataAgent, let's first take a look at how it runs in practice.
- [dataAgent entry](../../../sample_standard_app/app/examples/data_agent.py): users build the following agent queryset and start dataAgent with one click to produce the dataset.

```jsonl
{"input": "What is the reason for the surge in Nvidia's stock?"}
{"input": "Why does Tesla stock fluctuate so much?"}
```

- [dataAgent dataset](../../../sample_standard_app/app/examples/data/dataset_turn_1_2024-07-09-11:45:42.jsonl): the produced dataset is stored as a JSONL file containing the agent's Q&A pairs.
```jsonl
{"prompt": "Why does Tesla stock fluctuate so much?", "answer": "The question about Tesla's stock fluctuations isn't directly addressed by the provided background information on Warren Buffett's investment strategies or his dealings with BYD. However, I can provide insights based on general knowledge and market principles.\n\nTesla's stock price is known for its significant volatility, which can be attributed to several factors:\n\n1. **Market Sentiment and Speculation**: Tesla is often seen as a bellwether for the electric vehicle (EV) industry and the broader tech sector. Positive or negative news, whether related to production numbers, new model releases, regulatory changes, or CEO Elon Musk's tweets, can significantly impact investor sentiment and lead to sharp movements in the stock price.\n\n2. **Financial Performance**: Like any publicly traded company, Tesla's quarterly earnings reports can cause substantial stock price movements. If the company beats or misses analysts' expectations regarding revenue, profitability, or guidance, it can result in immediate and sometimes dramatic reactions from the market.\n\n3. **Valuation and Growth Expectations**: Tesla has been valued at very high multiples compared to traditional automakers, reflecting investors' expectations for future growth. Any indication that this growth might slow down can lead to a reevaluation of the stock's valuation, causing price drops.\n\n4. **Competition and Market Dynamics**: The EV market is rapidly evolving, with established automakers and new entrants alike increasing their focus on electric vehicles. Changes in competitive dynamics, such as new product launches or shifts in market share, can influence Tesla's stock price.\n\n5. **Regulatory Environment**: Government policies and regulations, particularly those related to EV incentives, emissions standards, and autonomous driving technology, can have a significant impact on Tesla's business outlook and stock performance.\n\n6. **Supply Chain and Production Challenges**: Tesla has faced production bottlenecks and supply chain disruptions, especially during times of global economic uncertainty. These challenges can affect the company's ability to meet production targets and deliver vehicles, impacting stock price.\n\nIn summary, Tesla's stock fluctuations are influenced by a combination of market sentiment, financial performance, high growth expectations, competitive dynamics, regulatory changes, and operational challenges. These factors contribute to the stock's characteristic volatility."}
{"prompt": "What is the reason for the surge in Nvidia's stock?", "answer": "The question about Nvidia's stock surge isn't directly related to the provided background information on Warren Buffett's investment strategies and his transactions involving BYD shares. However, I can provide a general explanation based on common factors that might cause a surge in a company's stock price.\n\nNvidia's stock price can surge due to several reasons, including but not limited to:\n\n1. **Strong Financial Performance**: If Nvidia reports better-than-expected earnings or revenue figures, this can lead to a positive market reaction, driving up its stock price. Investors often look for signs of growth and profitability when deciding where to invest their money.\n\n2. **Innovative Product Launches**: The introduction of new, cutting-edge products or technologies can also boost investor confidence and increase demand for Nvidia's stock. For instance, advancements in graphics processing units (GPUs) for gaming, artificial intelligence, and data center applications have historically been significant drivers for Nvidia.\n\n3. **Market Expansion**: If Nvidia announces plans to enter new markets or expand into existing ones, this can signal potential for increased revenue streams, which can positively impact its stock price.\n\n4. **Strategic Acquisitions**: Acquiring other companies can strengthen Nvidia's position in the market, providing access to new technologies, customer bases, or markets. Successful acquisitions can lead to a rise in stock prices as investors anticipate future growth.\n\n5. **Industry Trends**: The technology sector, particularly segments like AI, autonomous vehicles, and cloud computing, has seen significant growth in recent years. As a leader in these areas, Nvidia benefits from positive industry trends, which can contribute to higher stock prices.\n\n6. **Analyst Recommendations**: Positive recommendations from financial analysts can influence investor sentiment and lead to increased buying activity, resulting in a stock price increase.\n\n7. **Economic Conditions**: A strong economy generally leads to more consumer spending and business investments, which can benefit companies like Nvidia that sell high-end technology products.\n\nIt's important to note that while these factors can contribute to a stock price surge, individual events or market conditions can also cause volatility. Investors should consider multiple indicators and consult with financial advisors before making investment decisions."}
```
- After producing the dataset, dataAgent evaluates and annotates the data in multiple dimensions; see the [complete evaluation results produced by dataAgent](../../../sample_standard_app/app/examples/data/eval_result_turn_1_2024-07-09-11:45:42.xlsx) (containing the comprehensive score of the multidimensional evaluation for each data item, the specific score for each dimension, and the problems discovered during the evaluation process).

![data_agent_eval_result_en](../_picture/data_agent_eval_result_en.png)

- [comprehensive evaluation report](../../../sample_standard_app/app/examples/data/eval_report_2024-07-09-11:45:42.xlsx) (including the overall average score across multiple rounds of evaluation and the overall average score of each dimension)

![data_agent_eval_report_en](../_picture/data_agent_eval_report_en.png)
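The multi-round and per-dimension averages in such a report can be sketched as follows. This is a hypothetical illustration, not the repository's implementation; the `dimension`/`score` fields mirror the shape of the per-item evaluation results described above.

```python
def report_averages(eval_dims_json_list):
    """Average each dimension's score across all turns, plus an overall average.

    eval_dims_json_list: one list of {'dimension', 'score'} dicts per turn.
    """
    totals = {}
    for turn in eval_dims_json_list:
        for item in turn:
            totals.setdefault(item['dimension'], []).append(item['score'])
    per_dimension = {dim: sum(v) / len(v) for dim, v in totals.items()}
    overall = sum(per_dimension.values()) / len(per_dimension) if per_dimension else 0.0
    return {'per_dimension': per_dimension, 'overall': overall}


turns = [[{'dimension': 'relevance', 'score': 4}, {'dimension': 'factuality', 'score': 2}],
         [{'dimension': 'relevance', 'score': 2}, {'dimension': 'factuality', 'score': 4}]]
report = report_averages(turns)
print(report['per_dimension']['relevance'])  # 3.0
print(report['overall'])                     # 3.0
```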
## DataAgent Detailed Description

### data_agent

- [configuration file](../../../sample_standard_app/app/core/agent/data_agent_case/data_agent.yaml): the configuration file contains two important items: `dataset_builder` configures the name of the agent that produces the evaluation dataset, and `dataset_evaluator` configures the name of the agent that evaluates and annotates the data.
- [agent file](../../../sample_standard_app/app/core/agent/data_agent_case/data_agent.py)
- The input parameter of data_agent is the queryset path; each JSON line in the queryset must contain **the complete input** of one agent invocation.
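A queryset in this shape is plain JSONL, one JSON object per line. As a rough sketch of loading it (a hypothetical standalone reader, not the project's JsonFileReader API):

```python
import json
from pathlib import Path


def read_queryset(path):
    """Read a JSONL queryset: one JSON object per non-empty line."""
    queries = []
    for line in Path(path).read_text(encoding='utf-8').splitlines():
        if line.strip():
            queries.append(json.loads(line))
    return queries


# Build a sample queryset like the one shown earlier in this document.
Path('queryset_demo.jsonl').write_text(
    '{"input": "What is the reason for the surge in Nvidia\'s stock?"}\n'
    '{"input": "Why does Tesla stock fluctuate so much?"}\n',
    encoding='utf-8')
queries = read_queryset('queryset_demo.jsonl')
print(len(queries))  # 2
```

Each loaded dict is then passed to one agent invocation, which is why every line must carry the agent's complete input.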
### dataset_build_agent

- [configuration file](../../../sample_standard_app/app/core/agent/data_agent_case/dataset_build_agent.yaml): the configuration file contains two important items: `candidate` configures the name of the agent to be evaluated (for example, to evaluate the effect of demo_rag_agent, set candidate to demo_rag_agent), and `concurrency_level` configures the concurrency of batch agent calls (for example, 5 means the candidate agent is called with a concurrency of 5).
- [agent file](../../../sample_standard_app/app/core/agent/data_agent_case/dataset_build_agent.py)
- The evaluation data produced by dataset_build_agent is stored locally in JSONL format (the file name is dataset_turn_{i}_{date}, where `i` is the round and `date` is the generation time).
- In the **data governance MVP version**, dataset_build_agent does not include agent adaptive data collection; it currently takes the form of a "**pseudo-agent**".
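The effect of `concurrency_level` can be sketched with Python's standard ThreadPoolExecutor, which is also what the agent file uses; `run_candidate` here is a hypothetical stand-in for the candidate agent call, not the project's API.

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED


def run_candidate(query):
    # Stand-in for candidate_agent.run(**query): echo a (prompt, answer) pair.
    return (query['input'], f"answer to: {query['input']}")


concurrency_level = 5  # mirrors the YAML `concurrency_level` item
queries = [{'input': f'question {i}'} for i in range(8)]

# Up to `concurrency_level` candidate calls run at the same time.
executor = ThreadPoolExecutor(max_workers=concurrency_level, thread_name_prefix='data_agent')
futures = [executor.submit(run_candidate, q) for q in queries]
done, _ = wait(futures, return_when=ALL_COMPLETED)
pairs = sorted(f.result() for f in done)
print(len(pairs))  # 8
```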
### dataset_eval_agent

- [configuration file](../../../sample_standard_app/app/core/agent/data_agent_case/dataset_eval_agent.yaml): the configuration file contains two important items: `llm_model` configures the agent model, and `max_eval_lines` configures the number of data lines to evaluate (for example, 10 means only the first 10 data items are evaluated, avoiding a global evaluation that would consume a huge number of tokens).
- [prompt file](../../../sample_standard_app/app/core/prompt/dataset_eval_agent_en.yaml): agentUniverse currently opens six industry-validated agent evaluation standards (the MVP version does not open the **comprehensiveness dimension**; the current comprehensiveness evaluation standard is biased toward the financial field, so it is not released to the open-source community).
- [agent file](../../../sample_standard_app/app/core/agent/data_agent_case/dataset_eval_agent.py)
- The **complete evaluation results** produced by dataset_eval_agent are stored locally in Excel format (the file name is eval_result_turn_{i}_{date}, where `i` is the round and `date` is the generation time).
- The **comprehensive evaluation report** produced by dataset_eval_agent is stored locally in Excel format (the file name is eval_report_{date}, where `date` is the generation time).
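`max_eval_lines` amounts to truncating the dataset before evaluation; a minimal sketch (a hypothetical helper, not the project's API):

```python
def select_eval_lines(dataset, max_eval_lines):
    """Cap evaluation at the first `max_eval_lines` items to bound token cost."""
    return dataset[:max_eval_lines]


dataset = [{'prompt': f'q{i}', 'answer': f'a{i}'} for i in range(25)]
selected = select_eval_lines(dataset, 10)
print(len(selected))  # 10
```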
47  docs/guidebook/zh/8_1_1_数据治理智能体.md  (new file)

@@ -0,0 +1,47 @@
# Data Governance Agent

## Introduction

**One-sentence introduction**: given a user-specified **queryset**, agentUniverse dataAgent **automatically executes multiple rounds of agent calls in batch**, **produces an evaluation dataset**, performs **automated evaluation annotation**, and **produces an evaluation report**.

Currently, agentUniverse has released **the MVP version of data governance** to convey the concept of data governance to the open-source community. It will subsequently open up the complete workflow, including offline adaptive data collection, fine-tuning dataset generation, model inference training, and multidimensional automatic evaluation of datasets, to strengthen the data governance process for large language models and agents.
## dataAgent Flowchart

![data_agent_flowchart](../_picture/data_agent_flowchart.jpg)

- data_agent consists of two parts: **dataset_build_agent** automates the batch execution of multiple rounds of agent calls and produces the evaluation dataset.
- **dataset_eval_agent** automates the multidimensional evaluation of the dataset and produces the evaluation report.

## dataAgent Effect

Before elaborating on how to use dataAgent, let's first take a look at how it runs in practice.
- [dataAgent entry](../../../sample_standard_app/app/examples/data_agent.py): users build the following agent queryset and start dataAgent with one click to produce the evaluation dataset.

```jsonl
{"input": "What is the reason for the surge in Nvidia's stock?"}
{"input": "Why is Kweichow Moutai not doing well anymore?"}
```

- [evaluation dataset produced by dataAgent](../../../sample_standard_app/app/examples/data/dataset_turn_1_2024-07-08-20:17:28.jsonl): the produced evaluation dataset is stored as a JSONL file containing the agent's Q&A pairs.
```jsonl
{"prompt": "What is the reason for the surge in Nvidia's stock?", "answer": "Your question does not match the provided background information about Warren Buffett reducing his holdings of BYD shares, so I cannot directly answer 'the reason for the surge in Nvidia's stock'. However, when examining a sharp rise in a particular stock (such as NVIDIA), one usually considers industry trends, company performance, technological innovation, growth in market demand, the macroeconomic environment, and similar factors. For a tech giant like Nvidia, GPU performance improvements, growing demand from artificial intelligence and data centers, and expansion into emerging markets could all be drivers of a rising share price. For the specifics of Nvidia's case, the latest industry reports and company announcements would need to be consulted."}
{"prompt": "Why does Tesla stock fluctuate so much?", "answer": "Tesla's stock is highly volatile, mainly for the following reasons:\n\n1. Technological innovation and industry competition: as a leader in the electric vehicle industry, Tesla's capacity for innovation and its market share directly affect investor confidence. New product launches, technological breakthroughs, or competitors' moves can all cause large swings in the share price.\n\n2. Market expectations: Tesla's share price is often driven by expectations about its growth prospects, profitability, and battery technology. If the market is optimistic about Tesla's future, the price rises; conversely, if expectations are missed or negative news emerges, the price may fall.\n\n3. Macroeconomic factors: global economic conditions, policy changes (such as subsidy policies), and adjustments to energy policy all affect the overall environment of the EV industry and hence Tesla's share price.\n\n4. Financing activities: Tesla's financing activities, such as equity or debt financing, can cause short-term price adjustments, especially when financing terms are better than the market expected.\n\n5. Bubbles and valuation adjustments: as a high-growth company, Tesla's valuation can be pushed up in the short term; once market sentiment shifts or the valuation is deemed unsustainable, the price may pull back.\n\n6. Company management and public opinion: CEO Elon Musk's personal statements, the company's strategic adjustments, and media coverage can all act as catalysts for price swings.\n\nIn summary, Tesla's stock volatility reflects the market's rapid reaction to the company and industry dynamics, as well as investors' expectations of and concerns about Tesla's future profitability, technological progress, and market position."}
```
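The dataset file above is ordinary JSONL; a minimal sketch of writing Q&A pairs in that shape (a hypothetical helper, not the project's JsonFileWriter API):

```python
import json
from pathlib import Path


def write_prompt_answer_jsonl(path, pairs):
    """Persist (prompt, answer) pairs as one JSON object per line."""
    with open(path, 'w', encoding='utf-8') as f:
        for prompt, answer in pairs:
            # ensure_ascii=False keeps non-ASCII prompts/answers readable.
            f.write(json.dumps({'prompt': prompt, 'answer': answer}, ensure_ascii=False) + '\n')


write_prompt_answer_jsonl('dataset_demo.jsonl', [('q1', 'a1'), ('q2', 'a2')])
lines = Path('dataset_demo.jsonl').read_text(encoding='utf-8').splitlines()
print(lines[0])  # {"prompt": "q1", "answer": "a1"}
```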
- After producing the evaluation dataset, dataAgent begins multidimensional evaluation annotation of the data; see the [complete evaluation results produced by dataAgent](../../../sample_standard_app/app/examples/data/eval_result_turn_1_2024-07-08-20:17:28.xlsx) (containing the comprehensive score of the multidimensional evaluation, the specific score for each dimension, and the problems discovered during the evaluation process).

![data_agent_eval_result](../_picture/data_agent_eval_result.png)

- [comprehensive evaluation report](../../../sample_standard_app/app/examples/data/eval_report_2024-07-08-20:17:28.xlsx) (including the overall average score across multiple rounds of evaluation and the multi-round overall average score of each dimension)

![data_agent_eval_report](../_picture/data_agent_eval_report.png)
## dataAgent Detailed Description

### data_agent

- [configuration file](../../../sample_standard_app/app/core/agent/data_agent_case/data_agent.yaml): the configuration file contains two important items: `dataset_builder` configures the name of the agent that produces the evaluation dataset, and `dataset_evaluator` configures the name of the agent that evaluates and annotates the data.
- [agent file](../../../sample_standard_app/app/core/agent/data_agent_case/data_agent.py)
- The input parameter of data_agent is the queryset path; each JSON line in the queryset must contain **the complete input of one agent invocation**.
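The "complete input" requirement can be checked up front; a hypothetical sketch, assuming the candidate agent's required input key is `input` (any required-key list works the same way):

```python
def validate_queryset(queries, required_keys):
    """Return the indices of queryset lines missing any required input key."""
    return [i for i, q in enumerate(queries)
            if not all(k in q and q[k] for k in required_keys)]


queries = [{'input': 'q1'}, {'question': 'q2'}]
print(validate_queryset(queries, ['input']))  # [1]
```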
### dataset_build_agent

- [configuration file](../../../sample_standard_app/app/core/agent/data_agent_case/dataset_build_agent.yaml): the configuration file contains two important items: `candidate` configures the name of the agent to be evaluated (for example, to evaluate the effect of demo_rag_agent, set candidate to demo_rag_agent), and `concurrency_level` configures the concurrency of batch agent calls (for example, 5 means the candidate agent is called with a concurrency of 5).
- [agent file](../../../sample_standard_app/app/core/agent/data_agent_case/dataset_build_agent.py)
- The evaluation data produced by dataset_build_agent is stored locally in JSONL format (the file name is dataset_turn_{i}_{date}, where `i` is the round and `date` is the generation time).
- In the data governance MVP version, dataset_build_agent does not include agent adaptive data collection; it currently takes the form of a "**pseudo-agent**".
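The dataset_turn_{i}_{date} naming can be sketched directly; the date format assumed here (`%Y-%m-%d-%H:%M:%S`) matches the file names shown above:

```python
import datetime


def dataset_file_name(turn, date):
    """File names follow the dataset_turn_{i}_{date} convention."""
    return f"dataset_turn_{turn}_{date.strftime('%Y-%m-%d-%H:%M:%S')}"


name = dataset_file_name(1, datetime.datetime(2024, 7, 8, 20, 17, 28))
print(name)  # dataset_turn_1_2024-07-08-20:17:28
```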
### dataset_eval_agent

- [configuration file](../../../sample_standard_app/app/core/agent/data_agent_case/dataset_eval_agent.yaml): the configuration file contains two important items: `llm_model` configures the agent model, and `max_eval_lines` configures the number of data lines to evaluate (for example, 10 means only the first 10 data items are evaluated, avoiding a global evaluation that would consume a huge number of tokens).
- [prompt file](../../../sample_standard_app/app/core/prompt/dataset_eval_agent_cn.yaml): agentUniverse currently opens six industry-validated agent evaluation standards (the MVP version does not open the **comprehensiveness dimension**; the current comprehensiveness evaluation standard is biased toward the financial field, so it is not released to the open-source community).
- [agent file](../../../sample_standard_app/app/core/agent/data_agent_case/dataset_eval_agent.py)
- The **complete evaluation results** produced by dataset_eval_agent are stored locally in Excel format (the file name is eval_result_turn_{i}_{date}, where `i` is the round and `date` is the generation time).
- The **comprehensive evaluation report** produced by dataset_eval_agent is stored locally in Excel format (the file name is eval_report_{date}, where `date` is the generation time).
@@ -5,16 +5,13 @@
 # @Author : wangchongshi
 # @Email : wangchongshi.wcs@antgroup.com
 # @FileName: data_agent.py
-from typing import Tuple, List
+import datetime

 from agentuniverse.agent.agent import Agent
 from agentuniverse.agent.agent_manager import AgentManager
 from agentuniverse.agent.input_object import InputObject
 from agentuniverse.agent.output_object import OutputObject
 from agentuniverse.base.util.logging.logging_util import LOGGER
-from sample_standard_app.app.util.jsonl_file_utils import JsonFileWriter, JsonFileReader
-
-from sample_standard_app.app.util.txt_file_utils import TxtFileReader


 class DataAgent(Agent):
@@ -22,11 +19,11 @@ class DataAgent(Agent):

     def input_keys(self) -> list[str]:
         """Return the input keys of the Agent."""
-        return ['dataset_path']
+        return ['queryset_path']

     def output_keys(self) -> list[str]:
         """Return the output keys of the Agent."""
-        return ['prompt_answer_list', 'eval_report_list']
+        return []

     def parse_input(self, input_object: InputObject, agent_input: dict) -> dict:
         """Agent parameter parsing.
@@ -37,9 +34,7 @@ class DataAgent(Agent):
         Returns:
             dict: agent input parsed from `input_object` by the user.
         """
-        agent_input['dataset_path'] = input_object.get_data('dataset_path')
-        agent_input['turn'] = input_object.get_data('turn', 1)
-        return agent_input
+        return {}

     def parse_result(self, planner_result: dict) -> dict:
         """Planner result parser.
@@ -58,71 +53,34 @@ class DataAgent(Agent):
             input_object (InputObject): input parameters passed by the user.
             agent_input (dict): agent input parsed from `input_object` by the user.
         """
-        # step1: collect q&a dataset from the candidate agent which needs to be evaluated.
-        prompt_answer_list = self.collect_dataset(input_object, agent_input)
+        date = datetime.datetime.now().strftime("%Y-%m-%d-%H:%M:%S")
+        input_object.add_data('date', date)

+        # step1: build q&a dataset from the candidate agent which needs to be evaluated.
+        prompt_answer_list = self.dataset_build_agent(input_object)
+        input_object.add_data('prompt_answer_list', prompt_answer_list)

-        LOGGER.info("-------------------------------------------")
-        LOGGER.info("End: collect q&a dataset from the candidate agent done.")
-        LOGGER.info("-------------------------------------------")

-        # step2: write the q&a dataset to json file.
-        for i in range(len(prompt_answer_list)):
-            one_turn_prompt_answer_list = prompt_answer_list[i]
-            json_writer = JsonFileWriter(f'data_agent_turn_{i + 1}_dataset')
-            json_writer.write_json_prompt_answer_list(one_turn_prompt_answer_list)
-        LOGGER.info(f"Progress: write the q&a dataset to local jsonl files.")

-        # step3: evaluate q&a datasets generated by the candidate agent and generate evaluation report.
-        eval_report_list = self.eval_agent(input_object)
+        # step2: evaluate q&a dataset generated by the candidate agent and generate evaluation report.
+        eval_report_list = self.dataset_eval_agent(input_object)
         return {'prompt_answer_list': prompt_answer_list, 'eval_report_list': eval_report_list}

-    def collect_dataset(self, input_object: InputObject, agent_input: dict) -> List[List[Tuple[str, str]]]:
-        """Collect q&a dataset from the candidate agent which needs to be evaluated."""
-
-        candidate_agent_name = self.agent_model.plan.get('planner', {}).get('candidate', '')
-        # get the candidate agent which needs to be evaluated
-        candidate_agent: Agent = AgentManager().get_instance_obj(candidate_agent_name)
-        if not candidate_agent:
-            raise ValueError('The agent instance corresponding to `candidate` parameter is empty')
-
-        # init jsonl file reader
-        jsonl_file_reader = JsonFileReader(agent_input.get('dataset_path'))
-        # read query list
-        query_list = jsonl_file_reader.read_json_obj_list()
-        if not query_list:
-            raise ValueError('query list information read from dataset_path is empty')
-
-        prompt_answer_list = []
-        for i in range(agent_input.get('turn')):
-            LOGGER.info("-------------------------------------------")
-            LOGGER.info(f"Start: collect q&a dataset from the candidate agent `{candidate_agent_name}`, turn {i + 1}.")
-            one_turn_prompt_answer_list = []
-            # single turn query and answer processing.
-            for j in range(len(query_list)):
-                query_dict: dict = query_list[j]
-                if query_dict:
-                    # init the input and output key in agent
-                    first_input_key = candidate_agent.input_keys()[0]
-                    first_output_key = candidate_agent.output_keys()[0]
-                    # run the target agent
-                    output_object: OutputObject = candidate_agent.run(**query_dict)
-                    # note: the first index of input_keys and output_keys is identified as the prompt and answer.
-                    one_turn_prompt_answer_list.append(
-                        (query_dict.get(first_input_key, ''), output_object.get_data(first_output_key)))
-                    LOGGER.info(f"Progress: the turn {i + 1} query line {j + 1} has generated the answer "
-                                f"successfully.")
-            LOGGER.info(f"End: the turn {i + 1} has generated the answer successfully.")
-            # collect q&a dataset
-            prompt_answer_list.append(one_turn_prompt_answer_list)
-        return prompt_answer_list

-    def eval_agent(self, input_object: InputObject):
-        """Evaluate q&a datasets generated by the candidate agent and generate evaluation report."""
+    def dataset_build_agent(self, input_object: InputObject):
+        """Build q&a dataset from the candidate agent which needs to be evaluated."""

         planner = self.agent_model.plan.get('planner', {})
-        eval_agent: Agent = AgentManager().get_instance_obj(planner.get('evaluator'))
-        if eval_agent is None:
-            raise ValueError('The agent instance corresponding to `evaluator` parameter is empty')
-        output: OutputObject = eval_agent.run(**input_object.to_dict())
+        dataset_build_agent: Agent = AgentManager().get_instance_obj(planner.get('dataset_builder'))
+        if dataset_build_agent is None:
+            raise ValueError('The agent instance corresponding to `dataset_builder` parameter is empty')
+        output: OutputObject = dataset_build_agent.run(**input_object.to_dict())
         return output.get_data('prompt_answer_list', [])

+    def dataset_eval_agent(self, input_object: InputObject):
+        """Evaluate q&a dataset generated by the candidate agent and generate evaluation report."""
+
+        planner = self.agent_model.plan.get('planner', {})
+        dataset_eval_agent: Agent = AgentManager().get_instance_obj(planner.get('dataset_evaluator'))
+        if dataset_eval_agent is None:
+            LOGGER.warn("The agent instance corresponding to `dataset_evaluator` parameter is empty")
+            return []
+        output: OutputObject = dataset_eval_agent.run(**input_object.to_dict())
+        return output.get_data('eval_report_json_list', [])
@@ -1,14 +1,10 @@
 info:
   name: 'data_agent'
   description: 'data agent'
-profile:
-  llm_model:
-    name: 'demo_llm'
-    model_name: 'gpt-4o'
 plan:
   planner:
-    candidate: 'demo_rag_agent'
-    evaluator: 'eval_agent'
+    dataset_builder: 'dataset_build_agent'
+    dataset_evaluator: 'dataset_eval_agent'
 metadata:
   type: 'AGENT'
   module: 'sample_standard_app.app.core.agent.data_agent_case.data_agent'
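A sketch of how a `plan.planner` block like the one above is consumed once the YAML is parsed into a dict: the agent code looks up its sub-agents by the configured names (this mirrors the `self.agent_model.plan.get('planner', {})` pattern in data_agent.py; the dict literal here stands in for the loaded YAML).

```python
# YAML already parsed into a plain dict (stand-in for the loaded config).
plan = {
    'planner': {
        'dataset_builder': 'dataset_build_agent',
        'dataset_evaluator': 'dataset_eval_agent',
    }
}

planner = plan.get('planner', {})
builder_name = planner.get('dataset_builder')
evaluator_name = planner.get('dataset_evaluator')
print(builder_name)    # dataset_build_agent
print(evaluator_name)  # dataset_eval_agent
```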
@@ -0,0 +1,138 @@
# !/usr/bin/env python3
# -*- coding:utf-8 -*-

# @Time : 2024/7/5 15:07
# @Author : wangchongshi
# @Email : wangchongshi.wcs@antgroup.com
# @FileName: dataset_build_agent.py
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED
from typing import Tuple, List, Any, Optional

from agentuniverse.agent.agent import Agent
from agentuniverse.agent.agent_manager import AgentManager
from agentuniverse.agent.input_object import InputObject
from agentuniverse.agent.output_object import OutputObject
from agentuniverse.base.util.logging.logging_util import LOGGER
from sample_standard_app.app.util.jsonl_file_utils import JsonFileWriter, JsonFileReader


class DatasetBuildAgent(Agent):
    """Dataset Build Agent class."""

    executor: Optional[Any] = ThreadPoolExecutor(max_workers=1, thread_name_prefix="data_agent")

    def input_keys(self) -> list[str]:
        """Return the input keys of the Agent."""
        return ['queryset_path']

    def output_keys(self) -> list[str]:
        """Return the output keys of the Agent."""
        return ['prompt_answer_list']

    def parse_input(self, input_object: InputObject, agent_input: dict) -> dict:
        """Agent parameter parsing.

        Args:
            input_object (InputObject): input parameters passed by the user.
            agent_input (dict): agent input preparsed by the agent.
        Returns:
            dict: agent input parsed from `input_object` by the user.
        """
        agent_input['queryset_path'] = input_object.get_data('queryset_path')
        agent_input['turn'] = input_object.get_data('turn', 1)
        agent_input['candidate'] = input_object.get_data('candidate')
        return agent_input

    def parse_result(self, planner_result: dict) -> dict:
        """Planner result parser.

        Args:
            planner_result(dict): Planner result
        Returns:
            dict: Agent result object.
        """
        return planner_result

    def execute(self, input_object: InputObject, agent_input: dict):
        """Execute agent instance.

        Args:
            input_object (InputObject): input parameters passed by the user.
            agent_input (dict): agent input parsed from `input_object` by the user.
        """
        # init thread pool
        profile = self.agent_model.profile or {}
        self.executor = ThreadPoolExecutor(
            max_workers=profile.get('concurrency_level') if profile.get('concurrency_level') else 1,
            thread_name_prefix="data_agent")

        # step1: build q&a dataset from the candidate agent which needs to be evaluated.
        prompt_answer_list = self.build_dataset(agent_input)
        input_object.add_data('prompt_answer_list', prompt_answer_list)

        LOGGER.info("-------------------------------------------")
        LOGGER.info("End: build q&a dataset from the candidate agent done.")
        LOGGER.info("-------------------------------------------")

        # step2: write the q&a dataset to json file.
        date = input_object.get_data('date', '')
        for i in range(len(prompt_answer_list)):
            one_turn_prompt_answer_list = prompt_answer_list[i]
            json_writer = JsonFileWriter(f'dataset_turn_{i + 1}_{date}')
            json_writer.write_json_prompt_answer_list(one_turn_prompt_answer_list)
        LOGGER.info(f"Progress: write the q&a dataset to local jsonl files.")
        return {'prompt_answer_list': prompt_answer_list}

    def build_dataset(self, agent_input: dict) -> List[List[Tuple[str, str]]]:
        """Build q&a dataset from the candidate agent which needs to be evaluated."""

        candidate_agent_name = agent_input.get('candidate') \
            if agent_input.get('candidate') else self.agent_model.plan.get('planner', {}).get('candidate', '')
        # get the candidate agent which needs to be evaluated
        candidate_agent: Agent = AgentManager().get_instance_obj(candidate_agent_name)
        if not candidate_agent:
            raise ValueError('The agent instance corresponding to `candidate` parameter is empty')

        # init jsonl file reader
        jsonl_file_reader = JsonFileReader(agent_input.get('queryset_path'))
        # read query list
        query_list = jsonl_file_reader.read_json_obj_list()
        if not query_list:
            raise ValueError('query list information read from queryset_path is empty')

        # init the input and output key in agent
        first_input_key = candidate_agent.input_keys()[0]
        first_output_key = candidate_agent.output_keys()[0]

        prompt_answer_list = []

        for i in range(agent_input.get('turn')):
            LOGGER.info("-------------------------------------------")
            LOGGER.info(f"Start: build q&a dataset from the candidate agent `{candidate_agent_name}`, turn {i + 1}.")
            one_turn_prompt_answer_list = []
            futures_to_query = {}

            # single turn query and answer processing.
            for j in range(len(query_list)):
                query_dict: dict = query_list[j]
                if query_dict:
                    # run the target agent
                    future = self.executor.submit(candidate_agent.run, **query_dict)
                    futures_to_query[future] = query_dict

            done, not_done = wait(futures_to_query.keys(), return_when=ALL_COMPLETED)

            for future in done:
                output_object: OutputObject = future.result()
                # note: the first index of input_keys and output_keys is identified as the prompt and answer.
                prompt = futures_to_query.get(future, {}).get(first_input_key, '')
                answer = output_object.get_data(first_output_key, '')

                one_turn_prompt_answer_list.append((prompt, answer))
                LOGGER.info(f"Progress: the turn {i + 1} query: `{prompt}` has generated the answer successfully.")

            LOGGER.info(f"End: the turn {i + 1} has generated the answer successfully.")

            # build q&a dataset
            prompt_answer_list.append(one_turn_prompt_answer_list)
        return prompt_answer_list
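The `futures_to_query` mapping in build_dataset above is the key trick: each future is keyed back to the query that produced it, so prompts and answers can be re-paired after concurrent execution. A self-contained sketch, with a hypothetical stand-in for the candidate agent call:

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED


def fake_agent_run(**query):
    # Stand-in for candidate_agent.run(**query_dict) in build_dataset above.
    return {'output': query['input'].upper()}


executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix='data_agent')
query_list = [{'input': 'a'}, {'input': 'b'}, {'input': 'c'}]

# Map each future back to the query that produced it.
futures_to_query = {executor.submit(fake_agent_run, **q): q for q in query_list}
done, not_done = wait(futures_to_query.keys(), return_when=ALL_COMPLETED)

# Re-pair prompts and answers; `done` is an unordered set, hence the sort.
pairs = sorted((futures_to_query[f]['input'], f.result()['output']) for f in done)
print(pairs)  # [('a', 'A'), ('b', 'B'), ('c', 'C')]
```

Note that `done` preserves no ordering, which is why the real agent keys the results back to their queries rather than relying on submission order.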
@@ -0,0 +1,12 @@
info:
  name: 'dataset_build_agent'
  description: 'dataset build agent'
profile:
  concurrency_level: 1
plan:
  planner:
    candidate: 'demo_rag_agent'
metadata:
  type: 'AGENT'
  module: 'sample_standard_app.app.core.agent.data_agent_case.dataset_build_agent'
  class: 'DatasetBuildAgent'
@@ -4,13 +4,13 @@
 # @Time : 2024/7/1 17:00
 # @Author : wangchongshi
 # @Email : wangchongshi.wcs@antgroup.com
-# @FileName: eval_agent.py
+# @FileName: dataset_eval_agent.py
 from typing import Tuple, List

-import pandas as pd
 from langchain_core.output_parsers import StrOutputParser
 from langchain_core.utils.json import parse_json_markdown
 from openpyxl.reader.excel import load_workbook
+import pandas as pd

 from agentuniverse.agent.agent import Agent
 from agentuniverse.agent.input_object import InputObject
@@ -21,11 +21,10 @@ from agentuniverse.prompt.chat_prompt import ChatPrompt
 from agentuniverse.prompt.prompt import Prompt
 from agentuniverse.prompt.prompt_manager import PromptManager
 from agentuniverse.prompt.prompt_model import AgentPromptModel
-from sample_standard_app.app.util.jsonl_file_utils import JsonFileWriter


-class EvalAgent(Agent):
-    """Evaluation Agent class."""
+class DatasetEvalAgent(Agent):
+    """Dataset Evaluation Agent class."""

     def input_keys(self) -> list[str]:
         """Return the input keys of the Agent."""
@@ -78,11 +77,11 @@ class EvalAgent(Agent):
         LOGGER.info(f"End: evaluate the dataset in multiple dimensions done.")

         LOGGER.info("-------------------------------------------")
-        # step2: write the eval results to excel file.
-        self.generate_eval_results_excel(prompt_answer_list, eval_dims_json_list)
+        # step2: write the eval results to Excel file.
+        self.generate_eval_results_excel(prompt_answer_list, eval_dims_json_list, input_object.get_data('date', ''))

         # step3: generate eval report
-        eval_report_json_list = self.generate_eval_report(eval_dims_json_list)
+        eval_report_json_list = self.generate_eval_report(eval_dims_json_list, input_object.get_data('date', ''))
         LOGGER.info(f"End: generate evaluation report done.")
         LOGGER.info("-------------------------------------------")
         return {'eval_report_json_list': eval_report_json_list, 'eval_dims_json_list': eval_dims_json_list}
@@ -143,7 +142,7 @@ class EvalAgent(Agent):
                 # calculate avg score from multiple dimensions.
                 avg_score = sum(data['score'] for data in dimensions)
             except Exception as e:
-                LOGGER.warn(f'except[eval_prompt_answer_from_jsonl]>>>{e}:{res[0]}')
+                LOGGER.warn(f'except[eval_prompt_answer_from_jsonl]>>> res: {res}, exception: {e}')
                 continue
             if len(dimensions) > 0:
                 avg_score = avg_score / len(dimensions)
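The hunk above derives an overall score by summing the per-dimension `score` fields and dividing by the number of dimensions. A minimal, self-contained sketch of that averaging step (the dimension names in the sample dict are illustrative, not taken from the repository):

```python
def average_dimension_score(eval_result: dict) -> float:
    """Average the per-dimension scores in one evaluation result.

    Mirrors the agent's logic: sum the 'score' fields, then divide
    by the dimension count; an empty or missing list yields 0.0.
    """
    dimensions = eval_result.get('dimensions', [])
    if not dimensions:
        return 0.0
    return sum(d['score'] for d in dimensions) / len(dimensions)


# Example with illustrative dimension names:
result = {
    "dimensions": [
        {"name": "relevance", "score": 4.0},
        {"name": "factual", "score": 3.0},
        {"name": "timeliness", "score": 3.5},
    ]
}
print(average_dimension_score(result))  # prints 3.5
```

Guarding the empty case up front is what lets the agent `continue` past malformed rows without a division-by-zero.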
@@ -192,13 +191,14 @@ class EvalAgent(Agent):
         llm.set_by_agent_model(**self.agent_model.profile.get('llm_model'))
         return llm

-    def generate_eval_report(self, eval_dims_json_list: List[List[dict]]):
+    def generate_eval_report(self, eval_dims_json_list: List[List[dict]], date: str):
         """Integrate multidimensional evaluation scores and generate evaluation report

         Args:
             eval_dims_json_list (List[List[dict]]) : The list of evaluation results from multiple turns,
             the type of single turn is List[dict].

+            date (str): The date of the evaluation.
         Returns:
             eval_report_json_list (List[dict]): The list of evaluation report.
         """
@@ -245,7 +245,7 @@ class EvalAgent(Agent):
         LOGGER.info(f"Progress: total evaluation report has generated successfully.")

         # generate excel report
-        self.generate_eval_report_excel(eval_report_json_list)
+        self.generate_eval_report_excel(eval_report_json_list, date)
         return eval_report_json_list

     @staticmethod
@@ -279,11 +279,11 @@ class EvalAgent(Agent):

     @staticmethod
     def generate_eval_results_excel(prompt_answer_list: List[List[Tuple[str, str]]],
-                                    eval_dims_json_list: List[List[dict]]):
+                                    eval_dims_json_list: List[List[dict]], date: str):
         """Generate evaluation results in excel format."""

         rows = []
-        columns: List[str] = ['Line Number', 'Avg Score', 'Prompt', 'Answer']
+        columns: List[str] = ['Line Number', 'Overall Score', 'Prompt', 'Answer']
         if len(eval_dims_json_list) > 0 and len(eval_dims_json_list[0]) > 0:
             one_row_eval_result = eval_dims_json_list[0][0]
             dims = one_row_eval_result.get('dimensions', [])
@@ -313,12 +313,12 @@ class EvalAgent(Agent):

         df = pd.DataFrame(rows, columns=columns)

-        df.to_excel(f"./data/eval_results_turn_{i + 1}.xlsx", index=False, engine='openpyxl')
+        df.to_excel(f"./data/eval_result_turn_{i + 1}_{date}.xlsx", index=False, engine='openpyxl')
         LOGGER.info(f"Progress: generate evaluation detailed results in excel format successfully.")
         LOGGER.info("-------------------------------------------")

     @staticmethod
-    def generate_eval_report_excel(eval_report_json_list: List[dict]):
+    def generate_eval_report_excel(eval_report_json_list: List[dict], date: str):
         """Generate excel eval report."""

         rows = []
@@ -338,13 +338,13 @@ class EvalAgent(Agent):

         df = pd.DataFrame(rows, columns=columns)

-        df.to_excel("./data/eval_report.xlsx", index=False, engine='openpyxl')
+        df.to_excel(f"./data/eval_report_{date}.xlsx", index=False, engine='openpyxl')

         # tweak column width
-        workbook = load_workbook('./data/eval_report.xlsx')
+        workbook = load_workbook(f"./data/eval_report_{date}.xlsx")
         worksheet = workbook.active
         column_widths = [25, 20] + [20] * (len(columns) - 2)
         for i, column_width in enumerate(column_widths, 1):
             worksheet.column_dimensions[chr(64 + i)].width = column_width
-        workbook.save('./data/eval_report.xlsx')
+        workbook.save(f"./data/eval_report_{date}.xlsx")
         LOGGER.info(f"Progress: generate evaluation excel file successfully.")
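The `chr(64 + i)` trick in the column-width loop above maps 1→'A', 2→'B', …, so it only covers the first 26 columns. A small, dependency-free sketch of a column-letter helper that also handles 'AA'-style columns (openpyxl ships `openpyxl.utils.get_column_letter` for the same purpose):

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to an Excel column letter (A, B, ..., Z, AA, ...)."""
    letters = ""
    while index > 0:
        # Excel columns are a bijective base-26 numbering, hence the -1 shift.
        index, rem = divmod(index - 1, 26)
        letters = chr(65 + rem) + letters
    return letters


print(column_letter(1), column_letter(26), column_letter(27))  # prints: A Z AA
```

With only a handful of report columns the `chr(64 + i)` shortcut is safe; the helper matters once a report grows past column Z.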
@@ -0,0 +1,14 @@
+info:
+  name: 'dataset_eval_agent'
+  description: 'dataset eval agent'
+profile:
+  prompt_version: dataset_eval_agent.cn
+  max_eval_lines: 10
+  llm_model:
+    name: 'qwen_llm'
+    model_name: 'qwen2-72b-instruct'
+    temperature: 0.1
+metadata:
+  type: 'AGENT'
+  module: 'sample_standard_app.app.core.agent.data_agent_case.dataset_eval_agent'
+  class: 'DatasetEvalAgent'
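The `max_eval_lines` field in the profile above caps how many queryset rows are evaluated per run. A minimal sketch of how an agent might consume that setting (the dict literal stands in for the parsed YAML profile; the truncation helper is illustrative, not the repository's code):

```python
profile = {
    "prompt_version": "dataset_eval_agent.cn",
    "max_eval_lines": 10,
    "llm_model": {"name": "qwen_llm", "model_name": "qwen2-72b-instruct", "temperature": 0.1},
}


def cap_eval_lines(queryset: list, profile: dict) -> list:
    """Truncate the queryset to at most `max_eval_lines` entries."""
    limit = profile.get("max_eval_lines", 10)
    return queryset[:limit]


queries = [{"input": f"q{i}"} for i in range(25)]
print(len(cap_eval_lines(queries, profile)))  # prints 10
```

Keeping the cap in configuration rather than code lets the evaluation cost be tuned per deployment without touching the agent class.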
@@ -1,14 +0,0 @@
-info:
-  name: 'eval_agent'
-  description: 'eval agent'
-profile:
-  prompt_version: eval_agent.cn
-  max_eval_lines: 10
-  llm_model:
-    name: 'demo_llm'
-    model_name: 'gpt-4o'
-    temperature: 0.1
-metadata:
-  type: 'AGENT'
-  module: 'sample_standard_app.app.core.agent.data_agent_case.eval_agent'
-  class: 'EvalAgent'
@@ -2,29 +2,9 @@ info:
   name: 'demo_rag_agent'
   description: 'demo rag agent'
 profile:
-  introduction: 你是一位精通信息分析的ai助手。
-  target: 你的目标是判断问题对应的答案是否提供了有价值的信息,并对问题的答案做出建议和评价。
-  instruction: |
-    你需要遵守的规则是:
-    1. 必须使用中文结合查询的背景信息结合你所拥有的知识回答用户提出的问题。
-    2. 结构化答案生成,必要时通过空行提升阅读体验。
-    3. 不采用背景信息中的错误信息。
-    4. 要考虑答案和问题的相关性,不做对问题没有帮助的回答。
-    5. 详尽回答问题,重点突出,不过多花哨词藻。
-    6. 不说模糊的推测。
-    7. 尽量多的使用数值类信息。
-
-    背景信息是:
-    {background}
-
-    之前的对话:
-    {chat_history}
-
-    开始!
-
-    需要回答的问题是: {input}
+  prompt_version: demo_rag_agent.cn
   llm_model:
-    name: 'default_qwen_llm'
+    name: 'qwen_llm'
     # model_name: 'gpt-4o'
 plan:
   planner:
@@ -1,6 +1,6 @@
 name: 'qwen_llm'
 description: 'demo qwen llm with spi'
-model_name: 'qwen-max'
+model_name: 'qwen2-72b-instruct'
 max_tokens: 2000
 metadata:
   type: 'LLM'
@@ -18,7 +18,7 @@ instruction: |
   {answer}

   --------------------------------
-  质量打分共5个维度,每个维度包含了不符合准入条件的要求(0.0分)/门槛值的要求(0.0~3.0分)/加分项(0.0~2.0分)
+  质量打分共6个维度,每个维度包含了不符合准入条件的要求(0.0分)/门槛值的要求(0.0~3.0分)/加分项(0.0~2.0分)

   质量打分维度描述如下:

@@ -36,8 +36,8 @@ instruction: |
   遇到模糊意图时进行恰当的意图反问确认,并给予准确回应

   3.加分项
-  能够get到用户在问什么后:
-  a、对用户的所有问题(包含情绪,只有一个问题,那就是回复了就算)都给出回应
+  能够理解到用户在问什么问题后:
+  a、对用户的所有问题(包含情绪)都给出回应
   b、回应的内容整体相关性都很高
   c、回复的问题全面且易懂

@@ -48,7 +48,7 @@ instruction: |
   数据错误,不符合事实

   2.门槛值
-  个别数据不准确,但不影响整体事实性
+  仅几个数据不准确,但不影响整体事实性

   3.加分项
   解读内容中所有引用数据真实可信,数据来源官方可信
@@ -58,10 +58,9 @@ instruction: |
   dimension requirement:
   1.不符合准入条件
   出现专业错误
-  (如经济复苏利好股市利空债市、供不应求带来价格上涨、经济下行压力加剧通常会倒逼宏观政策加码等)

   2.门槛值
-  符合业内公认的金融逻辑,前后结论不要自相矛盾,不出现明显的专业错误
+  符合业内公认的事实逻辑,前后结论不要自相矛盾,不出现明显的专业错误

   3.加分项
   通篇解读要一以贯之、逻辑自洽,且逻辑推演是有逐层递进的(后文分析以前文结论为依据)
@@ -73,29 +72,60 @@ instruction: |
   数据陈旧过时,不符合现状

   2.门槛值
-  次新,符合当下现状
+  数据次新,符合当下现状

   3.加分项
   引用官方发布的最新数据,包含清晰的发布日期

   --------------------------------
+  dimension name: 结构性
+  dimension requirement:
+  1.不符合准入条件
+  答案如果是总分结构:
+  a、总论点和分论点只是抽取了原文中的部分内容,并没有真正的总结归纳
+  b、生成答案结构是总分结构,但是分论点是为了回答query而硬答,与query实际并没有关系
+  c、总结内容论点逻辑冲突
+
+  答案如果是总结摘要结构:
+  a、摘要内容是抽取了部分段落,并没有真正的归纳总结
+  b、摘要内容,仅仅总结成一段文本,无摘要详述
+
+  2.门槛值
+  答案如果是总分结构:
+  a、总分结构,有总结归纳,分论点有层次
+  b、答案总分结构,分论点之间有逻辑关系(互补、递进),分论点与query之间有相关性
+  c、总结的内容论点间无冲突
+
+  答案如果是总结摘要结构:
+  a、总结内容有对原文进行归纳总结,不是抽取的原文段落
+  b、总结内容论点有展开叙述
+
+  3.加分项
+  答案如果是总分结构:
+  a、总分结构,分论点有归类,有条理有逻辑
+  b、总分结构,分论点为1234分层结构,有条理
+
+  答案如果是总结摘要结构:
+  a、总结内容论点叙述详细充分,有逻辑有条理
+  b、总结内容话术表达逻辑清晰,表达精炼
+  --------------------------------
   dimension name: 整体性
   dimension requirement:
   1.不符合准入条件
-  满足以下任意一点即为0分:
-  a、逻辑不通顺,或存在语病,或表达冗余内容超长(整体文本长度>XX字)
-  b、论点&总结重复/论点&论点重复:任意两条论点存在80%以上的重复度
-  c、倒装、错别字、语句不通顺、逻辑问题等
-  d、存在特殊符号
+  满足以下任意一点,分数为0:
+  a、逻辑不通顺,或存在语法问题,或表达冗余内容超长
+  b、论点和论点之间存在重复:任意两条论点存在80%以上的重复度
+  c、答案存在倒装、错别字、语句不通顺、逻辑问题等
+  d、答案中存在特殊符号

   2.门槛值
-  a、语句流畅、表达可阅读性强
-  b、表达直白(不反复说车轱辘话)
-  c、论点论据表达无重复
+  a、语句流畅、表达可阅读性强
+  b、答案表达直白(不反复说车轱辘话)
+  c、论点和论据在表达上无重复

   3.加分项
-  a、答案对用户query有明确回应表达
-  b、语言简洁精炼,逻辑清晰
+  a、答案对用户query有明确回应表达
+  b、语言简洁精炼,逻辑清晰

   --------------------------------
@@ -106,17 +136,17 @@ instruction: |
   输出格式要求:
   {{
   "dimensions": [
   {{
-  "name":质量打分维度名称,
-  "score":分数,
-  "negative point":扣分的原因和给出的优化建议
+  "name":质量打分维度名称(string字符串格式),
+  "score":分数(float浮点数格式),
+  "negative point":扣分的原因和给出的优化建议(string字符串格式)
   }},
   {{
-  "name":质量打分维度,
-  "score":分数,
-  "negative point":扣分的原因和给出的优化建议
+  "name":质量打分维度名称(string字符串格式),
+  "score":分数(float浮点数格式),
+  "negative point":扣分的原因和给出的优化建议(string字符串格式)
   }}
   ]
   }}
 metadata:
   type: 'PROMPT'
-  version: 'eval_agent.cn'
+  version: 'dataset_eval_agent.cn'
sample_standard_app/app/core/prompt/dataset_eval_agent_en.yaml (new file, 156 lines)
@@ -0,0 +1,156 @@
+introduction: |
+  You are an extremely rigorous financial expert and need help me evaluate prompt (user input) and answer (model output) scores in various dimensions according to the following requirements.
+
+  The full score for each dimension is 5 points. If you do not meet the entry conditions, you will be awarded 0 points directly, and if you meet the threshold value, you will be awarded 0-3 points. If you meet the additional requirements, you will add 0-2 points to the threshold value.
+  For scoring requirements, please be accurate to 1 decimal place:
+  1. Failure to meet access conditions (0.0 point)
+  2. Threshold value (0.0 - 3.0 points)
+  3. Bonus points (0.0 - 2.0 points)
+
+  Note: Strict scoring and high scores cannot be easily given
+instruction: |
+  --------------------------------
+  prompt:
+  {prompt}
+
+  --------------------------------
+  answer:
+  {answer}
+
+  --------------------------------
+  There are 6 dimensions in quality scoring, each dimension contains requirements for non-compliance with access conditions (0.0 points)/threshold values (0.0 - 3.0 points)/bonus points (0.0 - 2.0 points)
+
+  The quality scoring dimensions are described as follows:
+
+  dimension name: relevance
+  dimension requirement:
+  1. non-compliance with access conditions
+  Answer is irrelevant.
+
+  2.threshold values
+  Being able to know what the user is asking, and the answer is exactly what the user is asking, including:
+
+  【When there is a clear intention】
+  Identify user questions and respond to the questions, and the content of the reply is relevant.
+  When the question has the main intention, the reply content should be related to the main intention.
+
+  【Without clear intention】
+  Conduct appropriate rhetorical confirmation of intention when encountering vague intentions and then give an accurate response.
+
+  3.bonus points
+  After being able to understand what questions users ask:
+  a、Respond to all user questions (including emotions)
+  b、The overall relevance of the response is very high
+  c、The answers are comprehensive and easy to understand
+
+  --------------------------------
+  dimension name: factual
+  dimension requirement:
+  1. non-compliance with access conditions
+  The data is wrong and does not conform to the facts
+
+  2.threshold values
+  Only a few data are inaccurate, but they do not affect the overall factual nature
+
+  3.bonus points
+  All quoted data in the interpretation content are true and credible, and the data source is officially credible
+  --------------------------------
+
+  dimension name: rationality
+  dimension requirement:
+  1. non-compliance with access conditions
+  Professional error occurred
+
+  2.threshold values
+  Adhere to the factual logic recognized by the industry, ensure that the conclusions are consistent and do not contradict each other, and avoid obvious professional errors.
+
+  3.bonus points
+  The interpretation throughout the text must be consistent and logically coherent, with the logical reasoning advancing step by step (subsequent analysis is based on previous conclusions).
+
+  --------------------------------
+  dimension name: timeliness
+  dimension requirement:
+  1. non-compliance with access conditions
+  The data is stale and no longer corresponds to the present reality.
+
+  2.threshold values
+  The data is relatively recent and aligns with the current situation.
+
+  3.bonus points
+  Cite the latest official data, including a clear publication date.
+
+  --------------------------------
+  dimension name: structural
+  dimension requirement:
+  1. non-compliance with access conditions
+  If the answer follows a general-specific structure:
+  a、The main thesis and sub-theses only extract parts of the content from the original text and do not truly summarize or generalize.
+  b、The structure of the answer is general-specific, but the sub-points are forced to respond to the query without actually being related to it.
+  c、The summary of the content has logical conflicts in the argument points.
+
+  If the answer is a summary abstract structure:
+  a、The abstract content is extracted from certain paragraphs and does not truly synthesize or summarize.
+  b、The abstract content is merely summarized into a single paragraph, without a detailed abstract.
+
+  2.threshold values
+  If the answer follows a general-specific structure:
+  a、The structure is general-specific, with summarization, and the sub-points are hierarchically organized.
+  b、The answer has a general-specific structure, with sub-points that have logical relationships (complementary, progressive), and the sub-points are relevant to the query.
+  c、Summary content without conflict between arguments
+
+  If the answer is a summary structure:
+  a、The summary content summarizes the original text, not extracted original paragraphs
+  b、The summary content and arguments are described in an expanded manner
+
+  3.bonus points
+  If the answer follows a general-specific structure:
+  a、The structure is general-specific, with categorized sub-points that are orderly and logical.
+  b、The structure is general-specific, with the main points organized in a hierarchical structure labeled 1, 2, 3, and 4, which is orderly.
+
+  If the answer is a summary structure:
+  a、The summary content and arguments are detailed and sufficient, logical and organized
+  b、The summary should be expressed with clear logic and concise language.
+  --------------------------------
+  dimension name: integrity
+  dimension requirement:
+  1. non-compliance with access conditions
+  If any of the following points are satisfied, the score is 0:
+  a、Logic is not smooth, or there are grammatical problems, or redundant content is expressed too long
+  b、Repetition between arguments: Any two arguments have more than 80% repetition
+  c、The answers include inversion, typos, inconsistent sentences, logical problems, etc.
+  d、There are special symbols in the answer
+
+  2.threshold values
+  a、Smooth sentences and highly readable expressions
+  b、Express the answer directly (do not repeat it over and over again)
+  c、The thesis and evidence should be expressed without repetition.
+
+  3.bonus points
+  a、The answer is clearly expressed in response to user query
+  b、The language is concise, and the logic is clear
+
+  --------------------------------
+  Output format requirements:
+  1. According to the specific scoring dimension requirements, evaluate prompt and answer content at the same time and score.
+  2. Be sure to give a negative point, combine prompt and answer, and give specific reasons for deducting points and specific optimization suggestions.
+
+  Please score strictly in accordance with the quality scoring dimensions. High scores cannot be easily given.
+  Please ensure that the output only contains the following json format fields, and each field cannot be empty:
+  {{
+  "dimensions": [
+  {{
+  "name": dimension name(string),
+  "score": score(float),
+  "negative point": Reasons for deductions and optimization suggestions given(string)
+  }},
+  {{
+  "name": dimension name(string),
+  "score": score(float),
+  "negative point": Reasons for deductions and optimization suggestions given(string)
+  }}
+  ]
+  }}
+metadata:
+  type: 'PROMPT'
+  version: 'dataset_eval_agent.en'
sample_standard_app/app/core/prompt/demo_rag_agent_cn.yaml (new file, 24 lines)
@@ -0,0 +1,24 @@
+introduction: 你是一位精通信息分析的ai助手。
+target: 你的目标是使用中文结合查询的背景信息及你所拥有的知识回答用户提出的问题。
+instruction: |
+  你需要遵守的规则是:
+  1. 必须使用中文结合查询的背景信息结合你所拥有的知识回答用户提出的问题。
+  2. 结构化答案生成,必要时通过空行提升阅读体验。
+  3. 不采用背景信息中的错误信息。
+  4. 要考虑答案和问题的相关性,不做对问题没有帮助的回答。
+  5. 详尽回答问题,重点突出,不过多花哨词藻。
+  6. 不说模糊的推测。
+  7. 尽量多的使用数值类信息。
+
+  背景信息是:
+  {background}
+
+  之前的对话:
+  {chat_history}
+
+  开始!
+
+  需要回答的问题是: {input}
+metadata:
+  type: 'PROMPT'
+  version: 'demo_rag_agent.cn'
sample_standard_app/app/core/prompt/demo_rag_agent_en.yaml (new file, 25 lines)
@@ -0,0 +1,25 @@
+introduction: You are an ai assistant who is proficient in information analysis.
+target: Your goal is to use English to combine the background information of the query and your knowledge to answer questions raised by users.
+instruction: |
+  The rules you need to follow are:
+  1. You must use English combined with the background information of the query and your knowledge to answer the questions raised by the user.
+  2. Structured answer generation and use blank lines to improve the reading experience if necessary.
+  3. Do not use background information that is incorrect.
+  4. Consider the relevance of the answer and question, do not answer questions that do not help.
+  5. Answer questions in detail, with emphasis on key points.
+  6. Do not make vague inferences.
+  7. Try to use numerical information as much as possible.
+  8. The answer should not exceed 500 words
+
+  Background is:
+  {background}
+
+  Previous conversations are:
+  {chat_history}
+
+  Begin!
+
+  The question needs to be answered is: {input}
+metadata:
+  type: 'PROMPT'
+  version: 'demo_rag_agent.en'
@@ -0,0 +1,2 @@
+{"prompt": "英伟达股票大涨的原因是什么?", "answer": "您的问题与提供的背景信息关于巴菲特减持比亚迪股票的情况不符,因此我无法直接回答“英伟达股票大涨的原因”。然而,如果要探讨股市中某个特定股票(如NVIDIA)的大涨,通常会考虑行业趋势、公司业绩、技术创新、市场需求增长、宏观经济环境等因素。对于英伟达这样的科技巨头,GPU性能提升、人工智能和数据中心需求的增长、以及新兴市场的扩张都可能是推动股价上涨的原因。但具体到英伟达,需要查看最新的行业报告和公司公告来了解详细情况。"}
+{"prompt": "特斯拉股票为什么波动很大?", "answer": "特斯拉股票的波动性较大,主要有以下几个原因:\n\n1. 技术创新与行业竞争:特斯拉作为电动汽车行业的领导者,其技术创新能力和市场份额对投资者信心有直接影响。新产品的发布、技术突破或竞争对手的动作都可能导致股价大幅波动。\n\n2. 市场预期:特斯拉的股价常常受到市场对其增长前景、盈利能力和电池技术等未来预期的影响。如果市场对特斯拉的未来表现乐观,股价会上涨;反之,如果预期不达预期或出现负面新闻,股价可能会下跌。\n\n3. 宏观经济因素:全球经济状况、政策变化(如补贴政策)、能源政策调整等都会影响电动车行业的整体环境,从而影响特斯拉的股价。\n\n4. 融资活动:特斯拉的融资活动,如股权融资、债务融资,可能会导致股价短期内调整,尤其是如果融资条件优于市场预期。\n\n5. 泡沫与估值调整:特斯拉作为一家高成长性的公司,其股票估值可能在短期内被推高,一旦市场情绪变化或估值过高被认为难以维持,股价可能回调。\n\n6. 公司管理与舆论:CEO埃隆·马斯克的个人言行、公司战略调整以及媒体报道都可能成为股价波动的催化剂。\n\n总之,特斯拉股票的波动反映了市场对公司及其行业动态的快速反应,同时也体现了投资者对于特斯拉未来盈利能力、技术进步和市场地位的期待与担忧。"}
@@ -0,0 +1,2 @@
+{"prompt": "Why does Tesla stock fluctuate so much?", "answer": "The question about Tesla's stock fluctuations isn't directly addressed by the provided background information on Warren Buffett's investment strategies or his dealings with BYD. However, I can provide insights based on general knowledge and market principles.\n\nTesla's stock price is known for its significant volatility, which can be attributed to several factors:\n\n1. **Market Sentiment and Speculation**: Tesla is often seen as a bellwether for the electric vehicle (EV) industry and the broader tech sector. Positive or negative news, whether related to production numbers, new model releases, regulatory changes, or CEO Elon Musk's tweets, can significantly impact investor sentiment and lead to sharp movements in the stock price.\n\n2. **Financial Performance**: Like any publicly traded company, Tesla's quarterly earnings reports can cause substantial stock price movements. If the company beats or misses analysts' expectations regarding revenue, profitability, or guidance, it can result in immediate and sometimes dramatic reactions from the market.\n\n3. **Valuation and Growth Expectations**: Tesla has been valued at very high multiples compared to traditional automakers, reflecting investors' expectations for future growth. Any indication that this growth might slow down can lead to a reevaluation of the stock's valuation, causing price drops.\n\n4. **Competition and Market Dynamics**: The EV market is rapidly evolving, with established automakers and new entrants alike increasing their focus on electric vehicles. Changes in competitive dynamics, such as new product launches or shifts in market share, can influence Tesla's stock price.\n\n5. **Regulatory Environment**: Government policies and regulations, particularly those related to EV incentives, emissions standards, and autonomous driving technology, can have a significant impact on Tesla's business outlook and stock performance.\n\n6. **Supply Chain and Production Challenges**: Tesla has faced production bottlenecks and supply chain disruptions, especially during times of global economic uncertainty. These challenges can affect the company's ability to meet production targets and deliver vehicles, impacting stock price.\n\nIn summary, Tesla's stock fluctuations are influenced by a combination of market sentiment, financial performance, high growth expectations, competitive dynamics, regulatory changes, and operational challenges. These factors contribute to the stock's characteristic volatility."}
+{"prompt": "What is the reason for the surge in Nvidia's stock?", "answer": "The question about Nvidia's stock surge isn't directly related to the provided background information on Warren Buffett's investment strategies and his transactions involving BYD shares. However, I can provide a general explanation based on common factors that might cause a surge in a company's stock price.\n\nNvidia's stock price can surge due to several reasons, including but not limited to:\n\n1. **Strong Financial Performance**: If Nvidia reports better-than-expected earnings or revenue figures, this can lead to a positive market reaction, driving up its stock price. Investors often look for signs of growth and profitability when deciding where to invest their money.\n\n2. **Innovative Product Launches**: The introduction of new, cutting-edge products or technologies can also boost investor confidence and increase demand for Nvidia's stock. For instance, advancements in graphics processing units (GPUs) for gaming, artificial intelligence, and data center applications have historically been significant drivers for Nvidia.\n\n3. **Market Expansion**: If Nvidia announces plans to enter new markets or expand into existing ones, this can signal potential for increased revenue streams, which can positively impact its stock price.\n\n4. **Strategic Acquisitions**: Acquiring other companies can strengthen Nvidia's position in the market, providing access to new technologies, customer bases, or markets. Successful acquisitions can lead to a rise in stock prices as investors anticipate future growth.\n\n5. **Industry Trends**: The technology sector, particularly segments like AI, autonomous vehicles, and cloud computing, has seen significant growth in recent years. As a leader in these areas, Nvidia benefits from positive industry trends, which can contribute to higher stock prices.\n\n6. **Analyst Recommendations**: Positive recommendations from financial analysts can influence investor sentiment and lead to increased buying activity, resulting in a stock price increase.\n\n7. **Economic Conditions**: A strong economy generally leads to more consumer spending and business investments, which can benefit companies like Nvidia that sell high-end technology products.\n\nIt's important to note that while these factors can contribute to a stock price surge, individual events or market conditions can also cause volatility. Investors should consider multiple indicators and consult with financial advisors before making investment decisions."}
@@ -1,3 +0,0 @@
-{"input": "我的股票为什么会跌?"}
-{"input": "英伟达股票大涨的原因是什么?"}
-{"input": "贵州茅台为什么不行了?"}
sample_standard_app/app/examples/data/queryset_cn.jsonl (new file, 2 lines)
@@ -0,0 +1,2 @@
+{"input": "英伟达股票大涨的原因是什么?"}
+{"input": "特斯拉股票为什么波动很大?"}
sample_standard_app/app/examples/data/queryset_en.jsonl (new file, 2 lines)
@@ -0,0 +1,2 @@
+{"input": "What is the reason for the surge in Nvidia's stock?"}
+{"input": "Why does Tesla stock fluctuate so much?"}
@@ -12,10 +12,10 @@ from agentuniverse.agent.agent_manager import AgentManager
 AgentUniverse().start(config_path='../../config/config.toml')


-def data_process(dataset_path: str, turn: int, **kwargs):
+def data_process(queryset_path: str, turn: int, **kwargs):
     instance: Agent = AgentManager().get_instance_obj('data_agent')
-    instance.run(dataset_path=dataset_path, turn=turn, **kwargs)
+    instance.run(queryset_path=queryset_path, turn=turn, **kwargs)


 if __name__ == '__main__':
-    data_process(dataset_path="./data/query.jsonl", turn=1)
+    data_process(queryset_path="data/queryset_cn.jsonl", turn=1)
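`data_process` above points the agent at a JSONL queryset such as `queryset_cn.jsonl`, where each line is a standalone JSON object with an `input` field. A minimal, stdlib-only sketch of reading such a file (the reader here is illustrative, not the repository's `JsonFileReader`):

```python
import json
import tempfile


def read_queryset(path: str) -> list:
    """Read one JSON object per line from a JSONL file, skipping blank lines."""
    queries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                queries.append(json.loads(line))
    return queries


# Demo with a throwaway file shaped like queryset_en.jsonl:
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as f:
    f.write('{"input": "What is the reason for the surge in Nvidia\'s stock?"}\n')
    f.write('{"input": "Why does Tesla stock fluctuate so much?"}\n')
    path = f.name

print([q["input"] for q in read_queryset(path)])
```

One object per line keeps the queryset appendable and lets the agent stream it row by row instead of parsing one large JSON array.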
@@ -60,8 +60,8 @@ class JsonFileReader(object):


 class JsonFileWriter(object):
-    def __init__(self, output_file_name: str, extension='jsonl'):
-        self.outfile_path = DATA_DIR + output_file_name + '.' + extension
+    def __init__(self, output_file_name: str, extension='jsonl', directory=DATA_DIR):
+        self.outfile_path = directory + output_file_name + '.' + extension
+        directory = os.path.dirname(self.outfile_path)
+        if not os.path.exists(directory):
+            os.makedirs(directory)
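The `JsonFileWriter` change above derives the parent directory from the final output path and creates it when missing, so writes no longer fail on a fresh checkout. A compact sketch of the same pattern; `os.makedirs(..., exist_ok=True)` collapses the existence check into a single, race-free call:

```python
import os
import tempfile


def ensure_parent_dir(outfile_path: str) -> str:
    """Create the parent directory of a target file path if it does not exist."""
    parent = os.path.dirname(outfile_path)
    if parent:
        # exist_ok=True avoids the check-then-create race of os.path.exists().
        os.makedirs(parent, exist_ok=True)
    return outfile_path


base = tempfile.mkdtemp()
path = ensure_parent_dir(os.path.join(base, "data", "eval", "results.jsonl"))
print(os.path.isdir(os.path.dirname(path)))  # prints True
```

Guarding on a non-empty `parent` also keeps bare filenames (written to the current directory) working unchanged.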