Run Eval

POST /v2/eval-runs/-/score-responses

Run the eval with the provided responses.

Args:

  • eval_run_data (EvalRunRequest): Data for the eval run, including responses.
  • workspace_uuid (str, optional): UUID of the workspace. Defaults to None.
  • is_sandbox (bool, optional): Whether to run in sandbox mode. Defaults to False.

Returns: EvalRunResult: The result of the eval run after scoring the responses.

Raises: AymaraAPIError: If the organization is missing or the request is invalid.

Example: POST /v2/eval-runs/-/score-responses { "eval_uuid": "...", "responses": [...] }

Query parameters

  • workspace_uuid string
  • is_sandbox boolean

    Default value is false.
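
Both query parameters are optional, so a client only needs to append them when they differ from the defaults. The following is a minimal sketch, not part of the SDK: `build_score_url` is a hypothetical helper, the base URL is taken from the curl sample on this page, and serializing the boolean as the lowercase string `true` is an assumption about what the server accepts.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.aymara.ai"  # host used in this page's curl sample
PATH = "/v2/eval-runs/-/score-responses"


def build_score_url(workspace_uuid=None, is_sandbox=False):
    """Return the endpoint URL, appending only non-default query parameters."""
    params = {}
    if workspace_uuid is not None:
        params["workspace_uuid"] = workspace_uuid
    if is_sandbox:
        # Assumption: the API accepts lowercase "true" for boolean query values.
        params["is_sandbox"] = "true"
    query = urlencode(params)
    return f"{BASE_URL}{PATH}" + (f"?{query}" if query else "")
```

Calling `build_score_url()` with no arguments yields the bare endpoint URL, which leaves `is_sandbox` at its documented default of false.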

application/json

Body Required

  • eval_uuid string Required

    Unique identifier for the eval.

  • eval_run_uuid string | null
  • name string | null
  • ai_description string | null
  • continue_thread boolean | null

    Default value is false.

  • eval_run_examples array[object] | null

    Schema for examples to include with an eval run.

    • example_uuid string | null
    • type string Required

      Type of the example.

      Values are pass or fail.

    • prompt string Required

      Prompt text for the example.

    • response string Required

      Expected response for the example.

    • explanation string | null
  • responses array[object] Required

    List of AI responses to eval prompts.

    Schema for submitting AI responses to eval prompts.

    • prompt_uuid string Required

      Unique identifier for the prompt.

    • thread_uuid string | null
    • turn_number integer

      Turn number in the conversation.

      Default value is 1.

    • continue_thread boolean

      Whether to continue the thread after this response.

      Default value is false.

    • content string | null | object

      Content of the AI response or a file reference.

      Any of:
    • content_type string

      Content type for AI interactions.

      Values are text or image. Default value is text.

    • exclude_from_scoring boolean

      Whether to exclude this response from scoring.

      Default value is false.

    • ai_refused boolean

      Whether the AI refused to answer the prompt.

      Default value is false.
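
Only `eval_uuid`, `responses`, and each response's `prompt_uuid` are marked Required above, so a request body can stay small. The sketch below assembles such a body as plain dictionaries. `make_response` and `make_body` are hypothetical helpers, not SDK functions, and rejecting an empty `responses` list is a local choice rather than documented server behavior.

```python
def make_response(prompt_uuid, content=None, *, ai_refused=False, turn_number=1):
    """Build one entry for the `responses` array."""
    item = {
        "prompt_uuid": prompt_uuid,   # the only required field per the schema
        "turn_number": turn_number,   # schema default is 1
        "content_type": "text",       # schema default is text
    }
    if content is not None:
        item["content"] = content
    if ai_refused:
        item["ai_refused"] = True     # omitted otherwise; schema default is false
    return item


def make_body(eval_uuid, responses, name=None):
    """Build the JSON body for POST /v2/eval-runs/-/score-responses."""
    responses = list(responses)
    if not responses:
        # Local guard only; the page does not say how the API treats an empty list.
        raise ValueError("responses must contain at least one item")
    body = {"eval_uuid": eval_uuid, "responses": responses}
    if name is not None:
        body["name"] = name
    return body
```

For example, `make_body("eval-1", [make_response("p-1", "Hello")])` produces a dictionary that can be serialized with `json.dumps` and sent as the request body.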

Responses

  • 200 application/json

    OK

    • eval_run_uuid string Required

      Unique identifier for the eval run.

    • eval_uuid string Required

      Unique identifier for the eval.

    • name string | null
    • status string Required

      Resource status.

      Values are created, processing, finished, or failed.

    • created_at string(date-time) Required

      Timestamp when the eval run was created.

    • updated_at string(date-time) Required

      Timestamp when the eval run was last updated.

    • evaluation object | null

      Schema for configuring an Eval based on an eval_type.

      • eval_uuid string | null
      • name string | null
      • ai_description string Required

        Description of the AI under evaluation.

      • ai_instructions string | null
      • eval_type string Required

        Type of the eval (safety, accuracy, etc.)

      • eval_instructions string | null
      • language string | null

        Default value is en.

      • modality string

        Content type for AI interactions.

        Values are text or image. Default value is text.

      • ground_truth string | null | object

        Ground truth data or reference file, if any.

        Any of:
      • num_prompts integer | null

        Default value is 100.

      • prompt_examples array[object] | null
        • content string Required

          Content of the example prompt.

        • example_uuid string | null
        • type string

          Values are good or bad. Default value is good.

        • explanation string | null
      • is_jailbreak boolean

        Indicates if the eval is a jailbreak test.

        Default value is false.

      • is_sandbox boolean

        Indicates if the eval results are sandboxed.

        Default value is false.

      • workspace_uuid string | null
      • status string | null

        Resource status.

        Values are created, processing, finished, or failed.

      • created_at string(date-time) | null
      • updated_at string(date-time) | null
    • ai_description string | null
    • workspace_uuid string | null
    • pass_rate number | null
    • num_prompts integer | null
    • num_responses_scored integer | null
    • responses array[object] | null

      Schema for returning AI response data.

      • prompt_uuid string Required

        Unique identifier for the prompt.

      • thread_uuid string | null
      • turn_number integer

        Turn number in the conversation.

        Default value is 1.

      • continue_thread boolean

        Whether to continue the thread after this response.

        Default value is false.

      • content string | null | object

        Content of the AI response or a file reference.

        Any of:
      • content_type string

        Content type for AI interactions.

        Values are text or image. Default value is text.

      • exclude_from_scoring boolean

        Whether to exclude this response from scoring.

        Default value is false.

      • ai_refused boolean

        Whether the AI refused to answer the prompt.

        Default value is false.

      • response_uuid string | null
      • explanation string | null
      • confidence number | null
      • is_passed boolean | null
      • next_prompt object | null
        • prompt_uuid string Required

          Unique identifier for the prompt.

        • thread_uuid string | null
        • turn_number integer

          Turn number in the conversation.

          Default value is 1.

        • content string Required

          Content of the prompt.

        • category string | null
  • 400 application/json

    Bad Request

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

  • 401 application/json

    Unauthorized

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

  • 403 application/json

    Forbidden

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

  • 404 application/json

    Not Found

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

  • 409 application/json

    Conflict

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

  • 422 application/json

    Unprocessable Entity

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

  • 429 application/json

    Too Many Requests

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

  • 500 application/json

    Internal Server Error

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

  • 503 application/json

    Service Unavailable

    • error object Required

      Schema for the contents of an error response.

      This schema defines the structure of the error data inside the error field of an API error response.

      • code string Required

        Enumeration of all error codes used in the API.

        Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.

      • message string Required
      • details object

        Default value is {} (empty).

    • request_id string

      Default value is empty.

POST /v2/eval-runs/-/score-responses
import os
from aymara_ai import AymaraAI

client = AymaraAI(
    api_key=os.environ.get("AYMARA_AI_API_KEY"),  # This is the default and can be omitted
)
eval_run_result = client.evals.runs.score_responses(
    eval_uuid="eval_uuid",
    responses=[{
        "prompt_uuid": "prompt_uuid"
    }],
)
print(eval_run_result.eval_run_uuid)
curl \
 --request POST 'https://api.aymara.ai/v2/eval-runs/-/score-responses' \
 --header "x-api-key: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"eval_uuid":"string","eval_run_uuid":"string","name":"string","ai_description":"string","continue_thread":false,"eval_run_examples":[{"example_uuid":"string","type":"pass","prompt":"string","response":"string","explanation":"string"}],"responses":[{"prompt_uuid":"string","thread_uuid":"string","turn_number":1,"continue_thread":false,"content":"string","content_type":"text","exclude_from_scoring":false,"ai_refused":false}]}'
Request examples
{
  "eval_uuid": "string",
  "eval_run_uuid": "string",
  "name": "string",
  "ai_description": "string",
  "continue_thread": false,
  "eval_run_examples": [
    {
      "example_uuid": "string",
      "type": "pass",
      "prompt": "string",
      "response": "string",
      "explanation": "string"
    }
  ],
  "responses": [
    {
      "prompt_uuid": "string",
      "thread_uuid": "string",
      "turn_number": 1,
      "continue_thread": false,
      "content": "string",
      "content_type": "text",
      "exclude_from_scoring": false,
      "ai_refused": false
    }
  ]
}
Response examples (200)
{
  "eval_run_uuid": "string",
  "eval_uuid": "string",
  "name": "string",
  "status": "created",
  "created_at": "2025-05-04T09:42:00Z",
  "updated_at": "2025-05-04T09:42:00Z",
  "evaluation": {
    "eval_uuid": "string",
    "name": "string",
    "ai_description": "string",
    "ai_instructions": "string",
    "eval_type": "string",
    "eval_instructions": "string",
    "language": "en",
    "modality": "text",
    "ground_truth": "string",
    "num_prompts": 100,
    "prompt_examples": [
      {
        "content": "string",
        "example_uuid": "string",
        "type": "good",
        "explanation": "string"
      }
    ],
    "is_jailbreak": false,
    "is_sandbox": false,
    "workspace_uuid": "string",
    "status": "created",
    "created_at": "2025-05-04T09:42:00Z",
    "updated_at": "2025-05-04T09:42:00Z"
  },
  "ai_description": "string",
  "workspace_uuid": "string",
  "pass_rate": 42.0,
  "num_prompts": 42,
  "num_responses_scored": 42,
  "responses": [
    {
      "prompt_uuid": "string",
      "thread_uuid": "string",
      "turn_number": 1,
      "continue_thread": false,
      "content": "string",
      "content_type": "text",
      "exclude_from_scoring": false,
      "ai_refused": false,
      "response_uuid": "string",
      "explanation": "string",
      "confidence": 42.0,
      "is_passed": true,
      "next_prompt": {
        "prompt_uuid": "string",
        "thread_uuid": "string",
        "turn_number": 1,
        "content": "string",
        "category": "string"
      }
    }
  ]
}
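
A 200 payload like the one above carries per-response verdicts (`is_passed`), an overall `status`, and, for multi-turn evals, an optional `next_prompt`. The sketch below tallies those fields from the decoded JSON. `summarize_run` is a hypothetical helper, and treating `finished` and `failed` as the terminal statuses is an assumption based on the enum names. The page does not state how the server computes `pass_rate`, so the tally is a local count and not a reimplementation of that field.

```python
TERMINAL_STATUSES = {"finished", "failed"}  # assumption inferred from the status enum


def summarize_run(result):
    """Summarize a decoded 200 response body (a dict)."""
    responses = result.get("responses") or []  # the field is nullable
    # Count only responses that were scored: not excluded and with a verdict set.
    scored = [
        r for r in responses
        if not r.get("exclude_from_scoring", False) and r.get("is_passed") is not None
    ]
    passed = sum(1 for r in scored if r["is_passed"])
    return {
        "status": result["status"],
        "done": result["status"] in TERMINAL_STATUSES,
        "scored": len(scored),
        "passed": passed,
        # Follow-up prompts to answer in the next turn, if any were returned.
        "next_prompts": [r["next_prompt"] for r in responses if r.get("next_prompt")],
    }
```

When `next_prompts` is non-empty, each entry holds the `prompt_uuid` and `content` for a further turn, which can be answered and submitted in a later call.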
Response examples (400)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
Response examples (401)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
Response examples (403)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
Response examples (404)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
Response examples (409)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
Response examples (422)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
Response examples (429)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
Response examples (500)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
Response examples (503)
{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
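
Every error status on this page shares one envelope: an `error` object with `code`, `message`, and optional `details`, plus an optional `request_id`. The sketch below flattens that envelope into a single dictionary for logging or retry decisions. `parse_error` is a hypothetical helper, and the set of retryable statuses is a common convention assumed here, not something this API documents.

```python
RETRYABLE_STATUSES = {429, 500, 503}  # assumption: conventional transient failures


def parse_error(status, payload):
    """Flatten a decoded error response body into one dictionary."""
    err = payload.get("error") or {}
    return {
        "status": status,
        "code": err.get("code", "unknown"),
        "message": err.get("message", ""),
        "details": err.get("details") or {},          # schema default is {}
        "request_id": payload.get("request_id", ""),  # schema default is empty
        "retryable": status in RETRYABLE_STATUSES,
    }
```

Logging `request_id` alongside `code` makes it easier to correlate a failed call with the service's own records when reporting a problem.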