Eight Dimensions. Compound Scoring. Proportional Safety.
Jeff Toffoli
Founder, Quallaa
Plumber's Text-Back Bot
"Sorry we missed your call! How can we help?"
Autonomous Super Bowl Ad
AI writes, produces, and broadcasts to 100M viewers. No human review.
Both powered by the same foundation model. Both get identical safety treatment.
7 trustworthy AI characteristics, 211 actions
No scoring system. No deployment-level assessment.
4 risk tiers. Same model = different risk by deployment.
Binary classification (high/not-high). No gradation.
10 agentic risk categories. Least Agency principle.
Security checklist, not a scoring framework.
No one scores deployment risk.
The question "Is this specific deployment configured safely?" has no standardized answer.
Independent axes. Each scored 1-5. Together they define a deployment's risk profile.
Autonomy: Agent drafts; human reviews and sends every response. Example: email draft assistant with send approval.
Action Capability: Can only read context and generate text responses. Example: simple Q&A chatbot with no integrations.
Consequence Severity: Wrong information, wasted time. No financial or legal exposure. Example: agent gives wrong business hours.
Reversibility: Text response. Customer can ignore it. Example: wrong info in a text; the next message corrects it.
Audience Exposure: Fewer than 50 known contacts. Owner knows everyone. Example: plumber's agent serving existing customers.
Domain Sensitivity: Retail, food service, general services. Example: plumber, restaurant, hair salon.
Identity Representation: Generic AI assistant. No business identity. Example: "AI Assistant" widget on a website.
Data Sensitivity: Works with publicly available information. Example: FAQ bot using public website content.
8 dimensions, 5 levels each: 5^8 = 390,625 possible combinations. That's why you need compound scoring.
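A minimal sketch of where the 390,625 figure comes from, assuming a deployment profile is simply eight integer scores from 1 to 5. The `RiskProfile` class, its field names, and `profile_count` are hypothetical names chosen for illustration; they are not part of any published Quallaa API.

```python
from dataclasses import dataclass, astuple

# Dimension names follow the deck; the snake_case identifiers are an assumption.
DIMENSIONS = (
    "autonomy",
    "action_capability",
    "consequence_severity",
    "reversibility",
    "audience_exposure",
    "domain_sensitivity",
    "identity_representation",
    "data_sensitivity",
)
LEVELS = range(1, 6)  # each dimension is scored 1-5


@dataclass(frozen=True)
class RiskProfile:
    """Hypothetical container for one deployment's eight scores."""
    autonomy: int
    action_capability: int
    consequence_severity: int
    reversibility: int
    audience_exposure: int
    domain_sensitivity: int
    identity_representation: int
    data_sensitivity: int

    def __post_init__(self):
        # Reject any score outside the 1-5 scale.
        for name, value in zip(DIMENSIONS, astuple(self)):
            if value not in LEVELS:
                raise ValueError(f"{name} must be 1-5, got {value!r}")


def profile_count() -> int:
    """Number of distinct profiles: one of 5 levels on each of 8 axes."""
    return len(LEVELS) ** len(DIMENSIONS)


print(profile_count())
```

With that many distinct profiles, listing them case by case is impractical, which is the motivation for collapsing a profile into a single compound score.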
Deployment A: One Spike
[3, 1, 5, 1, 1, 1, 1, 1]
One dimension at max, rest minimal
Deployment B: Uniform Moderate
[3, 3, 3, 3, 3, 3, 3, 3]
Every dimension at moderate
Average treats them as equal. But uniformly moderate risk across 8 dimensions is far more dangerous than one spike.
Regardless of compound score, these combinations trigger mandatory requirements:
Any dimension = 5: Human oversight plan + incident response
Consequence >= 4 AND Autonomy >= 4: Human-in-the-loop on all consequential actions
Data Sensitivity >= 4 AND Action >= 3: Data access audit logging + retention policy
Domain Sensitivity >= 4: Domain-specific compliance review
Identity >= 4 AND Consequence >= 3: Explicit AI disclosure at first contact
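The five hard-stop rules can be sketched as a plain rule check. This is an illustration, not Quallaa's implementation: the function name `hard_stops`, the dictionary keys, and both example profiles are assumptions made up for the sketch. Only the thresholds and the requirement wording come from the rules above.

```python
def hard_stops(profile: dict) -> list:
    """Return the mandatory requirements triggered by a profile of 1-5 scores.

    `profile` maps the eight (hypothetical) dimension keys to integer scores.
    The rules apply regardless of any compound score.
    """
    p = profile
    triggered = []
    if any(score == 5 for score in p.values()):
        triggered.append("Human oversight plan + incident response")
    if p["consequence_severity"] >= 4 and p["autonomy"] >= 4:
        triggered.append("Human-in-the-loop on all consequential actions")
    if p["data_sensitivity"] >= 4 and p["action_capability"] >= 3:
        triggered.append("Data access audit logging + retention policy")
    if p["domain_sensitivity"] >= 4:
        triggered.append("Domain-specific compliance review")
    if p["identity_representation"] >= 4 and p["consequence_severity"] >= 3:
        triggered.append("Explicit AI disclosure at first contact")
    return triggered


# Hypothetical profiles; the individual scores are invented for illustration.
text_back = {
    "autonomy": 1, "action_capability": 1, "consequence_severity": 1,
    "reversibility": 1, "audience_exposure": 1, "domain_sensitivity": 1,
    "identity_representation": 1, "data_sensitivity": 1,
}
clinic_like = {
    "autonomy": 2, "action_capability": 2, "consequence_severity": 3,
    "reversibility": 2, "audience_exposure": 3, "domain_sensitivity": 4,
    "identity_representation": 2, "data_sensitivity": 5,
}

for name, profile in (("text_back", text_back), ("clinic_like", clinic_like)):
    print(name, hard_stops(profile))
```

Keeping the hard-stops separate from the score means a low compound score can never mask a single dangerous combination.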
Missed-call text-back agent
Compound Risk Score: 1.73
Booking agent with calendar access
Compound Risk Score: 2.11
Clinic intake agent collecting symptoms
Compound Risk Score: 3.26
Hard-Stops Triggered
Domain Sensitivity >= 4: compliance review required
Data Sensitivity = 5: human oversight plan required
Transaction-capable financial agent
Compound Risk Score: 4.50
Hard-Stops Triggered
Consequence >= 4 AND Autonomy >= 4: human-in-the-loop required
Data Sensitivity >= 4 AND Action >= 3: audit logging required
Domain Sensitivity >= 4: compliance review required
Identity >= 4 AND Consequence >= 3: explicit AI disclosure required
Multiple dimensions = 5: human oversight plan required
The scoring system becomes infrastructure.
Risk Assessment: Score the deployment across 8 dimensions
Proportional Safety: Match safety measures to the actual risk tier
Citations: Every claim traced to its source document
Audit Trail: Complete record of what the agent did and why
Quallaa builds problem-solving machines.
The trust layer makes previously unsolvable problems solvable.