Writing Good Evaluations

The quality of your Tempered results depends directly on the quality of your input. This guide covers how to write effective evaluation descriptions for different use cases.

The Golden Rule

Be specific about what is changing, why, and what's affected. Vague descriptions produce vague analysis; specific descriptions produce actionable insights.

Structure of a Good Description

1. What Is Changing

State the change clearly. Include technical specifics where relevant.

2. Why You're Making the Change

Context about motivation helps the analysis focus on the right risks.

3. What's Affected

List systems, users, data, and downstream dependencies.

4. Constraints and Context

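Include anything else that shapes the risk profile, such as the maintenance window, the rollback plan, applicable compliance frameworks, and whether the change has been tested.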
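As an illustration, the four parts above can be assembled into one labelled description before submission. This is a minimal sketch, not part of Tempered: the `ChangeDescription` class and its `render` method are hypothetical helpers, and the sample text is taken from the security change example later in this guide.

```python
from dataclasses import dataclass


@dataclass
class ChangeDescription:
    """Hypothetical helper holding the four parts in the order the guide lists them."""
    what: str         # 1. What is changing
    why: str          # 2. Why you're making the change
    affected: str     # 3. What's affected
    constraints: str  # 4. Constraints and context

    def render(self) -> str:
        """Join the non-empty parts into one labelled, plain-text description."""
        parts = [
            ("What is changing", self.what),
            ("Why", self.why),
            ("What's affected", self.affected),
            ("Constraints and context", self.constraints),
        ]
        return "\n".join(
            f"{label}: {text.strip()}" for label, text in parts if text.strip()
        )


description = ChangeDescription(
    what="Implement mandatory MFA for all administrator accounts across the platform.",
    why="Currently 12 admin accounts use password-only authentication.",
    affected="12 administrator accounts; TOTP or WebAuthn enforced on next login.",
    constraints="Grace period: 7 days. Fallback: temporary bypass via support ticket "
                "with identity verification.",
)
print(description.render())
```

Keeping each part as a separate field makes it obvious when one is empty, which is usually the point at which a description becomes vague.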

Using the Context Field

The optional JSON context field provides structured data that Tempered can analyse alongside your description:

{
  "environment": "production",
  "services_affected": ["order-service", "reporting-dashboard"],
  "users_affected": 5000,
  "maintenance_window": "Sunday 02:00-06:00 UTC",
  "rollback_plan": "Blue-green switchback, tested in staging",
  "compliance_frameworks": ["PCI DSS", "ISO 27001"],
  "data_sensitivity": "high",
  "tested_in_staging": true,
  "dependencies": ["api-v2.3"]
}

There's no fixed schema — use whatever keys are relevant to your change. Common keys include:

Key                    Description
environment            Where the change is applied (production, staging, etc.)
services_affected      List of services or systems impacted
users_affected         Number or description of affected users
maintenance_window     Scheduled change window
rollback_plan          How to revert if things go wrong
data_sensitivity       Classification of data involved (low, medium, high, critical)
compliance_frameworks  Applicable regulations or standards
tested                 Whether the change has been tested
dependencies           Other changes or systems this depends on
reversible             Whether the change can be undone
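If you generate the context programmatically, a small helper can serialise it and remind you which of the common keys are still absent. The sketch below is an assumption-laden illustration: `build_context` and `missing_common_keys` are hypothetical names, and because the context has no fixed schema, the missing-key list is a reminder rather than a validation error.

```python
import json

# Keys the guide lists as common; the context field itself has no fixed schema.
COMMON_KEYS = [
    "environment", "services_affected", "users_affected", "maintenance_window",
    "rollback_plan", "data_sensitivity", "compliance_frameworks", "tested",
    "dependencies", "reversible",
]


def build_context(**fields):
    """Return a JSON string for the context field, dropping keys whose value is None."""
    context = {key: value for key, value in fields.items() if value is not None}
    return json.dumps(context, indent=2)


def missing_common_keys(context_json):
    """List the common keys absent from a context (a reminder, not an error)."""
    present = json.loads(context_json)
    return [key for key in COMMON_KEYS if key not in present]


context_json = build_context(
    environment="production",
    services_affected=["order-service", "reporting-dashboard"],
    users_affected=5000,
    rollback_plan="Blue-green switchback, tested in staging",
    maintenance_window=None,  # not scheduled yet, so the key is omitted
)
print(context_json)
print(missing_common_keys(context_json))
```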

Examples by Scenario

Infrastructure Change

Deploy updated Traefik reverse proxy configuration to enable HTTP/3 (QUIC) on all public-facing endpoints. Change involves: new UDP listener on port 443, updated TLS configuration, and Traefik version bump from 2.11 to 3.0. Tested in staging for 1 week. Rollback: revert to previous Traefik image tag.

Security Change

Implement mandatory MFA for all administrator accounts across the platform. Currently 12 admin accounts use password-only authentication. Change enforces TOTP or WebAuthn on next login. Grace period: 7 days. Fallback: admin can request temporary bypass via support ticket with identity verification.

Data Migration

Migrate 500,000 customer records from legacy CRM (Salesforce) to new platform (HubSpot). Data includes: contact details, purchase history, support tickets. PII fields: name, email, phone, address. Migration runs via encrypted API-to-API transfer with field mapping validated against 1,000 sample records.

Policy Change

Update the data retention policy from 7 years to 5 years for non-financial customer records. Affects approximately 2 million records. Requires: legal review sign-off, customer notification, automated deletion job scheduling. Does not affect records subject to financial regulatory requirements (FCA 7-year retention).

Knowledge Documents

For recurring types of evaluations, upload knowledge documents that provide standing context, such as policies or standards that apply to every change of that type.

Upload via Settings → Knowledge Documents or via the API:

curl -X POST https://your-tempered-instance/api/v1/knowledge/ \
  -H "Authorization: Bearer prx_your_api_key" \
  -F "title=Infrastructure Security Policy" \
  -F "[email protected]"

Then reference them when submitting evaluations:

{
  "description": "...",
  "knowledge_document_ids": ["doc-uuid-1", "doc-uuid-2"]
}
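A request body like the one above can be assembled in code so that optional parts are only included when they are supplied. This is a sketch under stated assumptions: `build_evaluation_body` is a hypothetical helper, and the `context` field name is assumed from the "context field" described earlier rather than confirmed by this guide. Only `description` and `knowledge_document_ids` appear in the example above.

```python
import json


def build_evaluation_body(description, context=None, knowledge_document_ids=None):
    """Assemble a request body; optional parts are left out when not supplied."""
    if not description.strip():
        raise ValueError("description must not be empty")
    body = {"description": description}
    if context:
        body["context"] = context  # assumed field name, not confirmed by the guide
    if knowledge_document_ids:
        body["knowledge_document_ids"] = list(knowledge_document_ids)
    return body


body = build_evaluation_body(
    "...",  # your full evaluation description goes here
    context={"environment": "production"},
    knowledge_document_ids=["doc-uuid-1", "doc-uuid-2"],
)
print(json.dumps(body, indent=2))
```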

Common Mistakes

Mistake                          Impact                          Fix
Too vague ("update the system")  Generic, unhelpful analysis     Be specific about what's changing
Missing environment              Can't assess blast radius       Always state production/staging/dev
No rollback plan mentioned       Analysis assumes irreversible   Describe your rollback strategy
Ignoring compliance context      Misses regulatory requirements  List applicable frameworks
Omitting data sensitivity        Under-assesses data risks       Classify the data involved
Wall of text with no structure   Key details buried              Use clear sections or bullet points
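The mistakes in the table above lend themselves to a quick pre-submission check. The following is a rough heuristic sketch, not a Tempered feature: `review_evaluation` is a hypothetical function, the word and character thresholds are arbitrary assumptions, and the context values in the usage example are illustrative only.

```python
def review_evaluation(description, context):
    """Return a warning for each common mistake detected (heuristic only)."""
    text = description.lower()
    warnings = []

    # Too vague: very short descriptions rarely say what is changing.
    if len(description.split()) < 15:  # arbitrary threshold
        warnings.append("Too vague: be specific about what's changing")

    # Missing environment: accept it in either the context or the text.
    environments = ("production", "staging", "dev")
    if "environment" not in context and not any(e in text for e in environments):
        warnings.append("Missing environment: state production/staging/dev")

    # No rollback plan mentioned.
    if "rollback_plan" not in context and "rollback" not in text and "revert" not in text:
        warnings.append("No rollback plan: describe your rollback strategy")

    # Ignoring compliance context.
    if "compliance_frameworks" not in context:
        warnings.append("No compliance context: list applicable frameworks")

    # Omitting data sensitivity.
    if "data_sensitivity" not in context:
        warnings.append("No data sensitivity: classify the data involved")

    # Wall of text: long description with no line breaks.
    if len(description) > 600 and "\n" not in description:  # arbitrary threshold
        warnings.append("Wall of text: use clear sections or bullet points")

    return warnings


for warning in review_evaluation("update the system", {}):
    print(warning)
```

A check like this only catches omissions; it cannot judge whether the details you did include are accurate or sufficient.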