This project evaluates the agentic quality of text samples using a multi-agent reasoning pipeline. It leverages Azure OpenAI services and a set of predefined prompts to assess content quality on a scale of 1 to 10, providing a score and constructive feedback for each sample.
- Multi-Agent Evaluation: Includes discussion, criticism, and ranking agents to evaluate text samples (sketched after this list).
- Customizable Prompts: Prompts for agents can be tailored to specific evaluation needs.
- Scoring and Feedback: Provides a score (1–10) and detailed feedback for each sample.
- Azure Integration: Utilizes Azure OpenAI services for model inference.
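Conceptually, each agent applies its own system prompt to the sample, and later agents see the earlier agents' output. Below is a minimal sketch of such a reviewer → critic → ranker chain using the `openai` SDK's `AzureOpenAI` client; the prompt texts, the wiring, and the `api_version` are illustrative assumptions, not this repo's actual implementation:

```python
# Illustrative sketch of a reviewer -> critic -> ranker chain.
# The real pipeline in this repo (main.py, AgentEvalPrompts) may wire the
# agents differently; the prompts and api_version below are placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_ENDPOINT"],
    api_key=os.environ["API_TOKEN"],
    api_version="2024-02-01",  # assumption: use the version your deployment supports
)
DEPLOYMENT = os.environ["AZURE_DEPLOYMENT"]


def ask(system_prompt: str, user_content: str) -> str:
    """Run one agent turn: a system prompt plus the accumulated context."""
    response = client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    )
    return response.choices[0].message.content


sample = "Text sample to evaluate..."
review = ask("You review the sample and discuss its quality.", sample)
critique = ask("You criticize the review for gaps or bias.",
               f"{sample}\n\nReview:\n{review}")
ranking = ask("You assign a final 1-10 score with a short justification.",
              f"{sample}\n\nReview:\n{review}\n\nCritique:\n{critique}")
print(ranking)
```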
- Python: Ensure Python 3.8+ is installed.
- Dependencies: Install the required Python packages:

  ```
  pip install -r requirements.txt
  ```
- Azure Credentials: Set up Azure credentials and the following environment variables: `AZURE_DEPLOYMENT`, `MODEL_NAME`, `AZURE_ENDPOINT`, `API_TOKEN`.
- Input File: Prepare a `.jsonl` file containing JSON objects (one per line) that include all the information needed by the evaluation model (see the input sketch after this list).
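The exact fields the evaluation model needs depend on your prompts. The snippet below only sketches how such a `.jsonl` file could be built; the field names (`sample_id`, `text`) are hypothetical placeholders, not fields the repo requires.

```python
# Sketch of writing a .jsonl input file, one JSON object per line.
# The field names (sample_id, text) are hypothetical placeholders;
# use whatever fields your prompts and evaluation model actually expect.
import json

samples = [
    {"sample_id": 1, "text": "First text sample to be judged."},
    {"sample_id": 2, "text": "Second text sample to be judged."},
]

with open("samples.jsonl", "w", encoding="utf-8") as handle:
    for sample in samples:
        handle.write(json.dumps(sample, ensure_ascii=False) + "\n")
```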
- Clone the Repository:

  ```
  git clone https://github.com/microsoft-mousa/agentAsAJudge
  cd agentAsAJudge
  ```
- Set Up Environment Variables: Create a `.env` file in the project root and add the following (a loading sketch follows these setup steps):

  ```
  AZURE_DEPLOYMENT=<your-deployment-name>
  MODEL_NAME=<your-model-name>
  AZURE_ENDPOINT=<your-endpoint-url>
  API_TOKEN=<your-api-token>
  ```
- Change System Prompts: To customize the system prompts:
  - Create a new directory under `metrics` and add your prompt files (e.g., `.md` files).
  - Update the initialization of the `AgentEvalPrompts` object in `main.py` with the paths to your new prompt files:

    ```
    agent_eval_prompts = AgentEvalPrompts(
        reviewer_prompt="<path-to-your-reviewer-prompt>",
        critic_prompt="<path-to-your-critic-prompt>",
        ranker_prompt="<path-to-your-ranker-prompt>"
    )
    ```
- Run Evaluation: Execute the script:

  ```
  python main.py <path-to-jsonl-file>
  ```
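If `main.py` does not already load the `.env` file for you, the variables can be loaded before running the pipeline. A minimal sketch, assuming `python-dotenv` is available (whether the repo already depends on it is an assumption):

```python
# Minimal sketch: read the .env values defined in the setup step above.
# Assumption: python-dotenv is installed (pip install python-dotenv);
# the repo itself may already handle this inside main.py.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

azure_deployment = os.environ["AZURE_DEPLOYMENT"]
model_name = os.environ["MODEL_NAME"]
azure_endpoint = os.environ["AZURE_ENDPOINT"]
api_token = os.environ["API_TOKEN"]

print(f"Using deployment {azure_deployment} at {azure_endpoint}")
```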
- Validation: The `.jsonl` file is checked before evaluation (a standalone sketch of this check appears after this list):
  - If the `.jsonl` file is valid:

    ```
    ✅ All lines are valid JSON objects!
    ```
  - If there are issues:

    ```
    ❌ Found issues in the file:
    - Line X: <error-description>
    ```
- Evaluation: For each sample, the output includes:
  - Score: A numeric value (1–10).
  - Feedback: Detailed reasoning for the score.

  Example:

  ```
  🔍 Evaluating Sample 1...
  📊 Score: 4
  🗣 Review: The content is well-structured and informative.
  ```
- Errors: If evaluation fails for a sample:

  ```
  ❌ Error evaluating sample X: <error-description>
  ```
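The validation step can also be reproduced by hand. A minimal sketch, assuming only that every line must parse as a JSON object; it mirrors the messages shown above but is not the repo's own implementation:

```python
# Standalone sketch of the pre-run check described above:
# every non-empty line of the .jsonl file must parse as a JSON object.
import json
import sys


def validate_jsonl(path: str) -> bool:
    issues = []
    with open(path, "r", encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # allow blank lines; adjust if the repo is stricter
            try:
                record = json.loads(line)
                if not isinstance(record, dict):
                    issues.append(f"Line {line_no}: not a JSON object")
            except json.JSONDecodeError as exc:
                issues.append(f"Line {line_no}: {exc}")
    if issues:
        print("❌ Found issues in the file:")
        for issue in issues:
            print(f"- {issue}")
        return False
    print("✅ All lines are valid JSON objects!")
    return True


if __name__ == "__main__":
    validate_jsonl(sys.argv[1])
```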
Feel free to contribute by improving prompts, adding new metrics, or enhancing the evaluation pipeline.