Writing Good Evaluations

The quality of your Tempered results depends directly on the quality of your input. This guide covers how to write effective evaluation descriptions for different use cases.

The Golden Rule

Be specific about what is changing, why, and what's affected. Vague descriptions produce vague analysis. Specific descriptions produce actionable insights.

Structure of a Good Description

1. What Is Changing

State the change clearly. Include technical specifics where relevant.

Bad: "Update the system"
Good: "Upgrade PostgreSQL from 15 to 16 on the production cluster"

2. Why You're Making the Change

Context about motivation helps the analysis focus on the right risks.

Bad: "Performance improvements"
Good: "Current query performance is degrading under peak load. PostgreSQL 16's improved query planner addresses our specific bottleneck in join operations on the orders table."

3. What's Affected

List systems, users, data, and downstream dependencies.

"Affects: order processing service, reporting dashboard, nightly batch jobs"
"Users impacted: all 5,000 active customers during the maintenance window"

4. Constraints and Context

Include anything that shapes the risk profile.

Environment: production, staging, development
Timeline: "During the Sunday 02:00-06:00 maintenance window"
Rollback plan: "Blue-green deployment with instant switchback"
Dependencies: "Requires API v2.3 deployed first"
Compliance: "PCI DSS scope — cardholder data environment"
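The four parts above can be assembled mechanically, which helps when descriptions are generated from a change ticket. The following is a minimal sketch only: the `ChangeDescription` class, its field names, and the "What:/Why:/Affects:" labels are illustrative assumptions, not a format Tempered requires.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeDescription:
    """The four parts of a description (hypothetical helper, not a Tempered API)."""
    what: str
    why: str
    affected: list[str]
    constraints: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Join the parts into one plain-text description with labelled lines."""
        lines = [
            f"What: {self.what}",
            f"Why: {self.why}",
            "Affects: " + ", ".join(self.affected),
        ]
        # Constraints keep their own labels (Environment, Timeline, Rollback plan, ...).
        lines += [f"{label}: {value}" for label, value in self.constraints.items()]
        return "\n".join(lines)


description = ChangeDescription(
    what="Upgrade PostgreSQL from 15 to 16 on the production cluster",
    why="Current query performance is degrading under peak load",
    affected=["order processing service", "reporting dashboard", "nightly batch jobs"],
    constraints={
        "Environment": "production",
        "Timeline": "During the Sunday 02:00-06:00 maintenance window",
        "Rollback plan": "Blue-green deployment with instant switchback",
    },
).render()
print(description)
```

Keeping each part on its own labelled line also avoids the wall-of-text problem, since no detail is buried mid-paragraph.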

Using the Context Field

The optional JSON context field provides structured data that Tempered can analyse alongside your description:

{
  "environment": "production",
  "services_affected": ["order-service", "reporting-dashboard"],
  "users_affected": 5000,
  "maintenance_window": "Sunday 02:00-06:00 UTC",
  "rollback_plan": "Blue-green switchback, tested in staging",
  "compliance_frameworks": ["PCI DSS", "ISO 27001"],
  "data_sensitivity": "high",
  "tested_in_staging": true,
  "dependencies": ["api-v2.3"]
}

There's no fixed schema — use whatever keys are relevant to your change. Common keys include:

Key	Description
`environment`	Where the change is applied (production, staging, etc.)
`services_affected`	List of services or systems impacted
`users_affected`	Number or description of affected users
`maintenance_window`	Scheduled change window
`rollback_plan`	How to revert if things go wrong
`data_sensitivity`	Classification of data involved (low, medium, high, critical)
`compliance_frameworks`	Applicable regulations or standards
`tested`	Whether the change has been tested (or a more specific key such as `tested_in_staging`)
`dependencies`	Other changes or systems this depends on
`reversible`	Whether the change can be undone
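Because the schema is open, it is easy to forget a key that would have been useful. One possible pre-submission check is sketched below; `COMMON_KEYS` simply mirrors the table above, and `missing_common_keys` is a hypothetical helper rather than part of Tempered. Its output is a list of suggestions to review, not validation errors.

```python
import json

# Keys from the table above. There is no fixed schema, so these are
# suggestions to review, not requirements.
COMMON_KEYS = [
    "environment", "services_affected", "users_affected", "maintenance_window",
    "rollback_plan", "data_sensitivity", "compliance_frameworks", "tested",
    "dependencies", "reversible",
]


def missing_common_keys(context: dict) -> list[str]:
    """Return the common keys that the context does not set, in table order."""
    return [key for key in COMMON_KEYS if key not in context]


context = {
    "environment": "production",
    "services_affected": ["order-service", "reporting-dashboard"],
    "users_affected": 5000,
    "rollback_plan": "Blue-green switchback, tested in staging",
    "data_sensitivity": "high",
}

# json.dumps raises TypeError if any value is not JSON-serialisable,
# so this doubles as a sanity check before submitting.
context_json = json.dumps(context, indent=2)
print("Consider adding:", ", ".join(missing_common_keys(context)))
```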

Examples by Scenario

Infrastructure Change

Deploy updated Traefik reverse proxy configuration to enable HTTP/3 (QUIC) on all public-facing endpoints. Change involves: new UDP listener on port 443, updated TLS configuration, and Traefik version bump from 2.11 to 3.0. Tested in staging for 1 week. Rollback: revert to previous Traefik image tag.

Security Change

Implement mandatory MFA for all administrator accounts across the platform. Currently 12 admin accounts use password-only authentication. Change enforces TOTP or WebAuthn on next login. Grace period: 7 days. Fallback: admin can request temporary bypass via support ticket with identity verification.

Data Migration

Migrate 500,000 customer records from legacy CRM (Salesforce) to new platform (HubSpot). Data includes: contact details, purchase history, support tickets. PII fields: name, email, phone, address. Migration runs via encrypted API-to-API transfer with field mapping validated against 1,000 sample records.

Policy Change

Update the data retention policy from 7 years to 5 years for non-financial customer records. Affects approximately 2 million records. Requires: legal review sign-off, customer notification, automated deletion job scheduling. Does not affect records subject to financial regulatory requirements (FCA 7-year retention).

Knowledge Documents

For recurring types of evaluations, upload knowledge documents that provide standing context:

Security policies — so every security-related evaluation considers your policies
Architecture diagrams — so infrastructure changes are assessed against your actual topology
Compliance requirements — so regulatory obligations are automatically factored in
Runbooks — so operational changes are assessed against your standard procedures

Upload via Settings → Knowledge Documents or via the API:

curl -X POST https://your-tempered-instance/api/v1/knowledge/ \
  -H "Authorization: Bearer prx_your_api_key" \
  -F "title=Infrastructure Security Policy" \
  -F "[email protected]"

Then reference the uploaded documents by ID when submitting evaluations:

{
  "description": "...",
  "knowledge_document_ids": ["doc-uuid-1", "doc-uuid-2"]
}
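A request body like the one above might be assembled as follows. This is a sketch under stated assumptions: `description` and `knowledge_document_ids` come from the example above, but the `context` field name and the `build_evaluation_request` helper are assumptions, and the document IDs are placeholders. Check the API reference for the actual request shape and endpoint before sending anything.

```python
import json


def build_evaluation_request(description, context=None, knowledge_document_ids=None):
    """Assemble a request body; optional parts are included only when provided."""
    body = {"description": description}
    if context:
        body["context"] = context  # assumed field name, not confirmed by this guide
    if knowledge_document_ids:
        body["knowledge_document_ids"] = list(knowledge_document_ids)
    return body


body = build_evaluation_request(
    description="Upgrade PostgreSQL from 15 to 16 on the production cluster",
    context={"environment": "production", "data_sensitivity": "high"},
    knowledge_document_ids=["doc-uuid-1", "doc-uuid-2"],  # placeholder IDs
)
print(json.dumps(body, indent=2))
```

Leaving out empty optional fields keeps the payload small and avoids sending an empty list of document IDs.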

Common Mistakes

Mistake	Impact	Fix
Too vague ("update the system")	Generic, unhelpful analysis	Be specific about what's changing
Missing environment	Can't assess blast radius	Always state production/staging/dev
No rollback plan mentioned	Analysis assumes the change is irreversible	Describe your rollback strategy
Ignoring compliance context	Misses regulatory requirements	List applicable frameworks
Omitting data sensitivity	Under-assesses data risks	Classify the data involved
Wall of text with no structure	Key details buried	Use clear sections or bullet points
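Most of the mistakes in the table can be caught with a rough check before submitting. The sketch below is a heuristic only: `review_submission`, the phrase list, the word-count threshold, and the regular expressions are all illustrative assumptions, and none of this reflects how Tempered itself analyses input. It does not attempt to detect the wall-of-text problem.

```python
import re

# Phrases treated as too vague; extend to suit your own team's habits.
VAGUE_PHRASES = ("update the system", "performance improvements")


def review_submission(description: str, context: dict) -> list[str]:
    """Return warnings for common mistakes; an empty list means none were spotted."""
    text = description.lower()
    warnings = []

    def mentioned(key, pattern):
        # A detail counts as present if it is a context key or appears in the text.
        return key in context or re.search(pattern, text) is not None

    if len(text.split()) < 8 or any(p in text for p in VAGUE_PHRASES):
        warnings.append("Too vague: state exactly what is changing")
    if not mentioned("environment", r"\b(production|staging|dev(elopment)?)\b"):
        warnings.append("Missing environment")
    if not mentioned("rollback_plan", r"\b(rollback|roll back|revert|switchback)\b"):
        warnings.append("No rollback plan mentioned")
    if not mentioned("compliance_frameworks", r"\b(pci|iso 27001|fca|compliance)\b"):
        warnings.append("No compliance context")
    if not mentioned("data_sensitivity", r"\b(pii|sensitiv\w*|cardholder)\b"):
        warnings.append("No data sensitivity stated")
    return warnings


for warning in review_submission("Update the system", {}):
    print("-", warning)
```

A clean result does not mean the description is good, only that none of these specific gaps were detected.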