Run Eval

POST /v2/eval-runs/-/score-responses

Run the eval with the provided responses.

Args:
- eval_run_data (EvalRunRequest): Data for the eval run, including responses.
- workspace_uuid (str, optional): UUID of the workspace. Defaults to None.
- is_sandbox (bool, optional): Whether to run in sandbox mode. Defaults to False.

Returns: EvalRunResult: The result of the eval run after scoring the responses.

Raises: AymaraAPIError: If the organization is missing or the request is invalid.

Example: POST /v2/eval-runs/-/score-responses { "eval_uuid": "...", "responses": [...] }

Query parameters

workspace_uuid string
is_sandbox boolean

Default value is false.
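
As a minimal sketch of how the two query parameters attach to the endpoint, the helper below builds the request URL. The base URL is taken from the curl sample on this page; the function name `build_score_url` is hypothetical, and the lowercase `"true"` encoding of the boolean is an assumption, not something this page confirms.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.aymara.ai"  # from the curl sample on this page
PATH = "/v2/eval-runs/-/score-responses"


def build_score_url(workspace_uuid=None, is_sandbox=False):
    """Return the endpoint URL, adding only parameters that differ from their defaults."""
    params = {}
    if workspace_uuid is not None:
        params["workspace_uuid"] = workspace_uuid
    if is_sandbox:
        # Assumption: the API accepts lowercase "true" for boolean query values.
        params["is_sandbox"] = "true"
    query = urlencode(params)
    return f"{BASE_URL}{PATH}" + (f"?{query}" if query else "")
```

Calling `build_score_url()` with no arguments yields the bare endpoint, which matches the documented defaults (no workspace, sandbox off).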

application/json

Body Required

eval_uuid string Required

Unique identifier for the eval.
eval_run_uuid string | null
name string | null
ai_description string | null
continue_thread boolean | null

Default value is false.
eval_run_examples array[object] | null

Schema for examples to include with an eval run.
- example_uuid string | null
- type string Required
  
  Type of the example: "pass" or "fail".
  
  Values are pass or fail.
- prompt string Required
  
  Prompt text for the example.
- response string Required
  
  Expected response for the example.
- explanation string | null
responses array[object] Required

List of AI responses to eval prompts.

Schema for submitting AI responses to eval prompts.
- prompt_uuid string Required
  
  Unique identifier for the prompt.
- thread_uuid string | null
- turn_number integer
  
  Turn number in the conversation (default: 1).
  
  Default value is 1.
- continue_thread boolean
  
  Whether to continue the thread after this response.
  
  Default value is false.
- content string | null | object
  
  Content of the AI response or a file reference.
  
  Any of:
  string | null, or FileReference object | null
- content_type string
  
  Type of content in the response (e.g., text, image).
  
  Values are text or image. Default value is text.
- exclude_from_scoring boolean
  
  Whether to exclude this response from scoring.
  
  Default value is false.
- ai_refused boolean
  
  Whether the AI refused to answer the prompt.
  
  Default value is false.
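
The body schema above can be assembled with two small helpers. This is only a sketch: `make_response` and `make_request_body` are hypothetical names, not part of the SDK, and the only validation performed is the `content_type` enum listed above.

```python
def make_response(prompt_uuid, content=None, *, content_type="text",
                  ai_refused=False, exclude_from_scoring=False):
    """Build one item of the `responses` array using the documented defaults."""
    if content_type not in ("text", "image"):
        raise ValueError("content_type must be 'text' or 'image'")
    return {
        "prompt_uuid": prompt_uuid,
        "content": content,
        "content_type": content_type,
        "ai_refused": ai_refused,
        "exclude_from_scoring": exclude_from_scoring,
    }


def make_request_body(eval_uuid, responses, **optional):
    """Build the JSON body; nullable fields left as None are simply omitted."""
    body = {"eval_uuid": eval_uuid, "responses": list(responses)}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

For example, `make_request_body("eval-1", [make_response("p-1", "Hello")], name="smoke test")` produces a body with the two required fields plus `name`.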

Responses

200 application/json

OK
- eval_run_uuid string Required
  
  Unique identifier for the eval run.
- eval_uuid string Required
  
  Unique identifier for the eval.
- name string | null
- status string Required
  
  Resource status.
  
  Values are created, processing, finished, or failed.
- created_at string(date-time) Required
  
  Timestamp when the eval run was created.
- updated_at string(date-time) Required
  
  Timestamp when the eval run was last updated.
- evaluation object | null
  
  Schema for configuring an Eval based on an eval_type.
  
  eval_uuid string | null
  
  name string | null
  
  ai_description string Required
  
  Description of the AI under evaluation.
  
  ai_instructions string | null
  
  eval_type string Required
  
  Type of the eval (safety, accuracy, etc.)
  
  eval_instructions string | null
  
  language string | null
  
  Default value is en.
  
  modality string
  
  Modality of the eval (e.g., text, image).
  
  Values are text or image. Default value is text.
  
  ground_truth string | null | object
  
  Ground truth data or reference file, if any.
  
  Any of:
  string | null, or FileReference object | null
  
  num_prompts integer | null
  
  Default value is 100.
  
  prompt_examples array[object] | null
  
  content string Required
  
  Content of the example prompt.
  
  example_uuid string | null
  
  type string
  
  Type of the example (e.g., good, bad).
  
  Values are good or bad. Default value is good.
  
  explanation string | null
  
  is_jailbreak boolean
  
  Indicates if the eval is a jailbreak test.
  
  Default value is false.
  
  is_sandbox boolean
  
  Indicates if the eval results are sandboxed.
  
  Default value is false.
  
  workspace_uuid string | null
  
  status string | null
  
  Resource status.
  
  Values are created, processing, finished, or failed.
  
  created_at string(date-time) | null
  
  updated_at string(date-time) | null
- ai_description string | null
- workspace_uuid string | null
- pass_rate number | null
- num_prompts integer | null
- num_responses_scored integer | null
- responses array[object] | null
  
  Schema for returning AI response data.
  
  prompt_uuid string Required
  
  Unique identifier for the prompt.
  
  thread_uuid string | null
  
  turn_number integer
  
  Turn number in the conversation (default: 1).
  
  Default value is 1.
  
  continue_thread boolean
  
  Whether to continue the thread after this response.
  
  Default value is false.
  
  content string | null | object
  
  Content of the AI response or a file reference.
  
  Any of:
  string | null, or FileReference object | null
  
  content_type string
  
  Type of content in the response (e.g., text, image).
  
  Values are text or image. Default value is text.
  
  exclude_from_scoring boolean
  
  Whether to exclude this response from scoring.
  
  Default value is false.
  
  ai_refused boolean
  
  Whether the AI refused to answer the prompt.
  
  Default value is false.
  
  response_uuid string | null
  
  explanation string | null
  
  confidence number | null
  
  is_passed boolean | null
  
  next_prompt object | null
  
  prompt_uuid string Required
  
  Unique identifier for the prompt.
  
  thread_uuid string | null
  
  turn_number integer
  
  Turn number in the conversation (default: 1).
  
  Default value is 1.
  
  content string Required
  
  Content of the prompt.
  
  category string | null
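
A 200 response decoded into a plain dict can be summarized along these lines. This is a sketch under assumptions: `summarize_result` is a hypothetical helper, and treating a null `is_passed` as "not scored" is an interpretation of the nullable field, not documented behavior.

```python
def summarize_result(result):
    """Summarize a decoded 200 response body (a dict shaped like the schema above)."""
    responses = result.get("responses") or []  # the field is nullable
    # Assumption: is_passed is None until a response has been scored.
    scored = [
        r for r in responses
        if r.get("is_passed") is not None and not r.get("exclude_from_scoring", False)
    ]
    return {
        "status": result["status"],
        "scored": len(scored),
        "failed_prompt_uuids": [r["prompt_uuid"] for r in scored if r["is_passed"] is False],
        # Follow-up prompts returned alongside scored responses, if any.
        "next_prompts": [r["next_prompt"] for r in responses if r.get("next_prompt")],
    }
```

The `next_prompts` list collects any `next_prompt` objects so a caller can decide whether further turns are expected.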
400 application/json

Bad Request
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
401 application/json

Unauthorized
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
403 application/json

Forbidden
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
404 application/json

Not Found
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
409 application/json

Conflict
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
422 application/json

Unprocessable Entity
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
429 application/json

Too Many Requests
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
500 application/json

Internal Server Error
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
503 application/json

Service Unavailable
- error object Required
  
  Schema for the contents of an error response.
  
  This schema defines the structure of the error data inside the error field of an API error response.
  
  code string Required
  
  Enumeration of all error codes used in the API.
  
  Values are auth.invalid_key, auth.expired_key, auth.insufficient_permissions, validation.invalid_request, validation.invalid_format, resource.not_found, resource.conflict, quota.limit_exceeded, or server.internal_error.
  
  message string Required
  
  details object
  
  Default value is {} (empty).
- request_id string
  
  Default value is empty.
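
Every error status above shares the same envelope, so one parser can cover all of them. The sketch below is illustrative: `ApiError`, `parse_error`, and `is_retryable` are hypothetical names, and the set of retryable statuses follows common HTTP practice rather than anything stated on this page.

```python
from dataclasses import dataclass, field


@dataclass
class ApiError:
    status: int
    code: str
    message: str
    details: dict = field(default_factory=dict)
    request_id: str = ""


def parse_error(status, payload):
    """Convert a decoded error body into an ApiError, applying the documented defaults."""
    err = payload["error"]  # required by the schema
    return ApiError(
        status=status,
        code=err["code"],
        message=err["message"],
        details=err.get("details") or {},
        request_id=payload.get("request_id", ""),
    )


# Assumption: conventional HTTP semantics; the API does not document retry behavior here.
RETRYABLE_STATUSES = {429, 500, 503}


def is_retryable(error):
    return error.status in RETRYABLE_STATUSES
```

Keeping `request_id` on the parsed error makes it easy to include in logs or support requests.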

POST /v2/eval-runs/-/score-responses

import os
from aymara_ai import AymaraAI

client = AymaraAI(
    api_key=os.environ.get("AYMARA_AI_API_KEY"),  # This is the default and can be omitted
)
eval_run_result = client.evals.runs.score_responses(
    eval_uuid="eval_uuid",
    responses=[{
        "prompt_uuid": "prompt_uuid"
    }],
)
print(eval_run_result.eval_run_uuid)

curl \
 --request POST 'https://api.aymara.ai/v2/eval-runs/-/score-responses' \
 --header "x-api-key: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"eval_uuid":"string","eval_run_uuid":"string","name":"string","ai_description":"string","continue_thread":false,"eval_run_examples":[{"example_uuid":"string","type":"pass","prompt":"string","response":"string","explanation":"string"}],"responses":[{"prompt_uuid":"string","thread_uuid":"string","turn_number":1,"continue_thread":false,"content":"string","content_type":"text","exclude_from_scoring":false,"ai_refused":false}]}'
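
For callers who prefer the Python standard library over the SDK, the curl request above could be mirrored roughly as follows. This sketch only builds the request object; the URL and headers are copied from the curl sample, while `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

ENDPOINT = "https://api.aymara.ai/v2/eval-runs/-/score-responses"  # from the curl sample


def build_request(body, api_key):
    """Build (but do not send) a POST request equivalent to the curl sample."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a non-2xx status surfaces as `urllib.error.HTTPError`, whose body carries the error envelope described in the Responses section.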

Request examples

{
  "eval_uuid": "string",
  "eval_run_uuid": "string",
  "name": "string",
  "ai_description": "string",
  "continue_thread": false,
  "eval_run_examples": [
    {
      "example_uuid": "string",
      "type": "pass",
      "prompt": "string",
      "response": "string",
      "explanation": "string"
    }
  ],
  "responses": [
    {
      "prompt_uuid": "string",
      "thread_uuid": "string",
      "turn_number": 1,
      "continue_thread": false,
      "content": "string",
      "content_type": "text",
      "exclude_from_scoring": false,
      "ai_refused": false
    }
  ]
}

Response examples (200)

{
  "eval_run_uuid": "string",
  "eval_uuid": "string",
  "name": "string",
  "status": "created",
  "created_at": "2026-05-04T09:42:00Z",
  "updated_at": "2026-05-04T09:42:00Z",
  "evaluation": {
    "eval_uuid": "string",
    "name": "string",
    "ai_description": "string",
    "ai_instructions": "string",
    "eval_type": "string",
    "eval_instructions": "string",
    "language": "en",
    "modality": "text",
    "ground_truth": "string",
    "num_prompts": 100,
    "prompt_examples": [
      {
        "content": "string",
        "example_uuid": "string",
        "type": "good",
        "explanation": "string"
      }
    ],
    "is_jailbreak": false,
    "is_sandbox": false,
    "workspace_uuid": "string",
    "status": "created",
    "created_at": "2026-05-04T09:42:00Z",
    "updated_at": "2026-05-04T09:42:00Z"
  },
  "ai_description": "string",
  "workspace_uuid": "string",
  "pass_rate": 42.0,
  "num_prompts": 42,
  "num_responses_scored": 42,
  "responses": [
    {
      "prompt_uuid": "string",
      "thread_uuid": "string",
      "turn_number": 1,
      "continue_thread": false,
      "content": "string",
      "content_type": "text",
      "exclude_from_scoring": false,
      "ai_refused": false,
      "response_uuid": "string",
      "explanation": "string",
      "confidence": 42.0,
      "is_passed": true,
      "next_prompt": {
        "prompt_uuid": "string",
        "thread_uuid": "string",
        "turn_number": 1,
        "content": "string",
        "category": "string"
      }
    }
  ]
}

Response examples (400)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}

Response examples (401)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}

Response examples (403)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}

Response examples (404)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}

Response examples (409)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}

Response examples (422)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}

Response examples (429)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}

Response examples (500)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}

Response examples (503)

{
  "error": {
    "code": "auth.invalid_key",
    "message": "string",
    "details": {}
  },
  "request_id": ""
}
