How Risk Scores Are Calculated
Scores are deterministic, auditable, and transparent. Every input to the model is documented here, so platform members can see exactly why an identity received its score.
Risk Ratings
Scores range from 0 to 100. Each score maps to a named rating your platform can act on directly.
| Rating | Score Range | Recommended Action |
|---|---|---|
| clear | 0–10 | No meaningful history. Treat as unknown; no action required. |
| flagged | 11–30 | Some reports on record. Consider light friction or passive monitoring. |
| cautioned | 31–60 | A pattern is emerging. Consider additional verification or restricted access. |
| restricted | 61–85 | Significant confirmed history. Deny access or require human review. |
| blacklisted | 86–100 | Severe or repeated violations across multiple platforms. Block. |
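Here is a minimal Python sketch of the rating lookup. The function and constant names are illustrative and not part of the API. It assumes integer scores, as the ranges suggest, so a fractional score such as 10.5 would fall into the next band up.

```python
# Rating bands from the table above: (inclusive upper bound, rating name).
RATING_BANDS = [
    (10, "clear"),
    (30, "flagged"),
    (60, "cautioned"),
    (85, "restricted"),
    (100, "blacklisted"),
]


def score_to_rating(score: float) -> str:
    """Map a 0-100 composite score to its named rating (illustrative helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    for upper, rating in RATING_BANDS:
        if score <= upper:
            return rating
    raise AssertionError("unreachable: the bands cover 0-100")


print(score_to_rating(42))  # cautioned
```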
Calculation Overview
A score is calculated from all confirmed reports for an identity. Each report contributes to the dimensional score for its violation category, and the dimensional scores are then combined into a composite 0–100 score.
1. Per-report weight. Each report is assigned a base weight from its severity multiplier. That weight is multiplied by the submitting platform's trust score (0–1) and then by a time decay factor based on the report's age.
2. Diminishing returns per platform. When a platform submits multiple reports, each additional report carries 0.8× the weight of the previous one. This prevents a single platform from overwhelming the score.
3. Dimensional scores. Weighted report values are accumulated per violation category and normalized, producing five independent dimensional scores (0–100 each).
4. Composite score. The dimensional scores are combined using the category weights below into a single 0–100 composite score. This is the score returned by `GET /v1/scores`.
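Steps 1–3 can be sketched in Python up to the point where normalization happens. This is a minimal sketch under stated assumptions, not the production implementation. The `Report` fields and function names are hypothetical. The base weight is taken to equal the severity multiplier, and the time decay factor is passed in as an already-computed value. The page does not say which of a platform's reports counts as "first," so the sketch assumes the heaviest one counts in full. Diminishing returns are applied across all of a platform's reports regardless of category, because step 2 is described per platform. Normalization to 0–100 is not specified here, so the function returns raw per-category totals.

```python
from collections import defaultdict
from dataclasses import dataclass

# Severity multipliers from the Severity Multipliers table.
SEVERITY_MULTIPLIER = {"low": 0.5, "medium": 1.0, "high": 1.75, "critical": 3.0}
# Each additional report from the same platform: 0.8x the previous one.
DIMINISHING_FACTOR = 0.8


@dataclass
class Report:
    # Hypothetical field names, for illustration only.
    platform_id: str
    category: str          # e.g. "harassment", "spam"
    severity: str          # "low" | "medium" | "high" | "critical"
    platform_trust: float  # 0-1
    decay: float           # precomputed time decay factor, 0.2-1.0


def report_weight(r: Report) -> float:
    """Step 1: severity multiplier x platform trust x time decay."""
    return SEVERITY_MULTIPLIER[r.severity] * r.platform_trust * r.decay


def raw_dimensional_totals(reports: list[Report]) -> dict[str, float]:
    """Steps 2-3: per-platform diminishing returns, then sum per category.

    Normalization to 0-100 is not documented, so raw totals are returned.
    """
    by_platform: dict[str, list[Report]] = defaultdict(list)
    for r in reports:
        by_platform[r.platform_id].append(r)

    totals: dict[str, float] = defaultdict(float)
    for platform_reports in by_platform.values():
        # Assumption: the heaviest report counts in full, the next at 0.8x,
        # the next at 0.64x, and so on.
        ranked = sorted(platform_reports, key=report_weight, reverse=True)
        for k, r in enumerate(ranked):
            totals[r.category] += report_weight(r) * DIMINISHING_FACTOR ** k
    return dict(totals)
```

For example, platform A's medium harassment report (trust 0.5, recent) contributes 0.5 in full. Its low spam report (0.25) is the second report from the same platform, so it is scaled to 0.2.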
Category Weights
Five violation categories contribute to the composite score. Weights reflect the severity of harm associated with each category. Weights sum to 1.0.
| Category | Weight | Description |
|---|---|---|
| harassment | | Direct targeting, threats, sustained unwanted contact |
| fake_profile | | Identity fraud, impersonation, sockpuppet accounts |
| explicit_content | | Unsolicited explicit material, non-consensual sharing |
| unsolicited_dm | | Unsolicited direct messages, repeated unwanted outreach |
| spam | | Mass unsolicited messages, automated bulk activity |
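The weight values themselves are not reproduced on this page, so the Python sketch below takes them as a parameter and only enforces the documented constraints: five categories with weights summing to 1.0. It assumes the composite is a plain weighted sum of the dimensional scores, which keeps it within 0–100 when every dimension is within 0–100. The production formula may differ, and the function name is illustrative.

```python
CATEGORIES = ("harassment", "fake_profile", "explicit_content", "unsolicited_dm", "spam")


def composite_score(dimensional: dict[str, float], weights: dict[str, float]) -> float:
    """Combine 0-100 dimensional scores into a 0-100 composite (assumed weighted sum).

    `weights` must cover exactly the five categories and sum to 1.0.
    """
    if set(weights) != set(CATEGORIES):
        raise ValueError("weights must cover exactly the five categories")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("category weights must sum to 1.0")
    # A category with no reports contributes a dimensional score of 0.
    return sum(weights[c] * dimensional.get(c, 0.0) for c in CATEGORIES)


# Placeholder weights for illustration only, not the documented values.
placeholder = {c: 0.2 for c in CATEGORIES}
print(composite_score({"harassment": 50.0, "spam": 100.0}, placeholder))
```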
Severity Multipliers
Each report is submitted with a severity level. The severity multiplier scales the report's contribution to the score.
| Severity | Multiplier | Use when |
|---|---|---|
| low | 0.5× | Minor policy violation, first-time, low impact |
| medium | 1.0× | Clear violation, confirmed intent, single incident |
| high | 1.75× | Serious violation, pattern of behavior, or victim impact |
| critical | 3.0× | Extreme violation, illegal content, credible threat, CSAM |
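Step 1 of the overview can be read as a product of three factors. In the formula below, $m$ is the severity multiplier, $t$ is the platform's trust score, and $d(a)$ is the decay factor at report age $a$. Two assumptions apply: the base weight equals the severity multiplier, and $d = 1.0$ for a report under 365 days old, as the Time Decay section implies. With these assumptions the multipliers compare as follows:

```latex
\begin{aligned}
w_r &= m_{s_r} \cdot t_{p_r} \cdot d(a_r) \\
w_{\text{critical},\ t=0.5,\ d=1.0} &= 3.0 \cdot 0.5 \cdot 1.0 = 1.5 \\
w_{\text{medium},\ t=1.0,\ d=1.0} &= 1.0 \cdot 1.0 \cdot 1.0 = 1.0 \\
w_{\text{low},\ t=1.0,\ d=0.2} &= 0.5 \cdot 1.0 \cdot 0.2 = 0.1
\end{aligned}
```

Under this reading, a single critical report from a platform still at the default 0.5 trust outweighs a recent medium report from a fully trusted platform.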
Score Modifiers
Time Decay
Reports older than 365 days carry reduced weight. The decay factor has a floor of 0.2, so old reports never drop to zero. Recent confirmed behavior is weighted most heavily.
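The page fixes two points of the decay curve: full weight up to 365 days and a floor of 0.2. It does not specify the curve's shape. The Python sketch below fills that gap with a hypothetical exponential decay. `HALF_LIFE_DAYS` is a placeholder, not a documented value, and the function name is illustrative.

```python
FULL_WEIGHT_DAYS = 365  # documented: decay applies only after 365 days
DECAY_FLOOR = 0.2       # documented: old reports never drop below 0.2
HALF_LIFE_DAYS = 365    # HYPOTHETICAL: the decay rate is not documented


def decay_factor(age_days: float) -> float:
    """Time decay multiplier for a report of the given age (illustrative)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= FULL_WEIGHT_DAYS:
        return 1.0
    excess = age_days - FULL_WEIGHT_DAYS
    # Hypothetical exponential decay past the threshold, clamped at the floor.
    return max(DECAY_FLOOR, 0.5 ** (excess / HALF_LIFE_DAYS))
```

With these placeholder values, a two-year-old report counts at half weight, and a sufficiently old report settles at the 0.2 floor.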
Platform Trust
Each report is weighted by the submitting platform's trust score (0–1). All platforms start at 0.5. Platforms that consistently submit accurate reports earn higher trust over time.
Diminishing Returns
Each additional report from the same platform carries 0.8× the weight of the previous one. This prevents a single platform from dominating an identity's score.
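This rule can be interpreted as multiplying a platform's $(k+1)$-th report by $0.8^k$. Under that interpretation, the combined contribution of $n$ reports of equal weight $w$ from one platform is a geometric series, and the series stays bounded however many reports the platform submits:

```latex
\sum_{k=0}^{n-1} 0.8^{k}\, w \;=\; \frac{1 - 0.8^{n}}{1 - 0.8}\, w \;=\; 5\,(1 - 0.8^{n})\, w \;<\; 5w
\qquad\text{e.g. } n = 5:\; 5\,(1 - 0.32768)\, w = 3.3616\, w
```

More generally, a single platform's reports never add up to more than five times the weight of its heaviest single report.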
Confidence Levels
Alongside the score, the API returns a confidence level reflecting how much corroborating evidence exists. A high score with low confidence rests on less evidence than the same score with high confidence, so interpret it more cautiously.
| Confidence | Condition | Interpretation |
|---|---|---|
| low | Fewer than 3 confirmed reports | Limited data. The score rests on few data points, so treat it with caution. |
| medium | 3+ reports, fewer than 3 contributing platforms | Pattern established but corroboration is limited to one or two sources. |
| high | 3+ reports from 3+ distinct platforms | Well-corroborated. Three or more independent platforms have each confirmed violations. |
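The confidence conditions translate directly into code. In this minimal Python sketch, the input shape is a hypothetical one: a list holding the submitting platform ID of each confirmed report. The function name is illustrative.

```python
def confidence_level(report_platforms: list[str]) -> str:
    """Confidence level from the submitting platform ID of each confirmed report."""
    report_count = len(report_platforms)
    platform_count = len(set(report_platforms))  # distinct contributing platforms
    if report_count < 3:
        return "low"
    if platform_count < 3:
        return "medium"
    return "high"


print(confidence_level(["a", "a", "b", "a"]))  # medium: 4 reports, 2 platforms
```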