llm-evaluation

Category: llm-agents · Featured
5.2k stars · Updated 2025-12-28
Compatible with: claude, codex

Description

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking.

How to Use

  1. Visit the GitHub repository to get the SKILL.md file
  2. Copy the file to your project root or to the .cursor/rules directory (a minimal sketch of this step follows the list)
  3. Restart your AI assistant or editor to apply the new skill
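A minimal sketch of step 2, assuming the SKILL.md file has already been downloaded into the working directory. The .cursor/rules destination comes from the step above; the use of pathlib and shutil is just one way to perform the copy, not something prescribed by the skill.

```python
# Sketch of the copy step: place a downloaded SKILL.md into .cursor/rules.
# Assumes SKILL.md is already in the current directory; adjust paths as needed,
# or copy to the project root instead, as the instructions allow.
from pathlib import Path
import shutil

source = Path("SKILL.md")                    # downloaded from the GitHub repository
dest_dir = Path(".cursor/rules")             # or your project root, per step 2
dest_dir.mkdir(parents=True, exist_ok=True)  # create the rules directory if missing

shutil.copy(source, dest_dir / source.name)  # keeps the original SKILL.md filename
print(f"Installed skill to {dest_dir / source.name}")
```

After copying, restart your AI assistant or editor (step 3) so the new skill is picked up.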

Full Skill Documentation

name: llm-evaluation

description: Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
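As a rough illustration of the automated-metrics part of that description (the sketch below is not taken from the skill itself), here is a minimal, self-contained evaluation loop that scores stubbed model outputs against a tiny hand-written benchmark using exact match and token-overlap F1. The predict() stub and the benchmark cases are placeholders you would replace with your own LLM application and test set.

```python
# Minimal automated-evaluation sketch: exact match and token-overlap F1
# over a tiny benchmark. predict() and the benchmark cases are placeholders;
# the skill's own framework may define different metrics and harness structure.
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def predict(prompt: str) -> str:
    # Placeholder: call your LLM application here.
    return "Paris is the capital of France."

benchmark = [  # hypothetical benchmark cases
    {"prompt": "What is the capital of France?",
     "reference": "Paris is the capital of France."},
    {"prompt": "What is 2 + 2?", "reference": "4"},
]

scores = [
    {"em": exact_match(predict(case["prompt"]), case["reference"]),
     "f1": token_f1(predict(case["prompt"]), case["reference"])}
    for case in benchmark
]
print("exact match:", sum(s["em"] for s in scores) / len(scores))
print("token F1:   ", sum(s["f1"] for s in scores) / len(scores))
```

Per the description, the human-feedback and benchmarking pieces layer on top of a loop like this; the skill's documentation covers those fuller evaluation frameworks.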

Tags

#llm #evaluation #testing

About llm-evaluation

llm-evaluation is an AI skill in the llm-agents category, designed to help developers and users work more effectively with AI tools. It provides comprehensive evaluation strategies for LLM applications based on automated metrics, human feedback, and benchmarking.

This skill has earned 5,200 stars on GitHub, reflecting strong community adoption and trust. It is compatible with claude and codex.

Key Capabilities

Automated evaluation metrics for LLM outputs
Human-feedback collection
Benchmarking and testing of LLM application quality

Why Use llm-evaluation

Adding llm-evaluation to your AI workflow gives your assistant pre-defined prompt templates and evaluation best practices, helping it understand your requirements and deliver more accurate responses on llm-agents tasks.

Whether you use claude or codex, you can easily integrate this skill into your existing development environment.

Explore More llm-agents Skills

Discover more AI skills in the llm-agents category to build a comprehensive AI skill stack.