AI-Assisted Result Analysis¶
Balansor's AI Analyst evaluates survey data quality across two complementary dimensions: sample representativeness (does the sample match the target population?) and response quality (are questions producing informative, diverse answers?). The analyst combines statistical metrics with methodological expertise to produce structured reports with actionable recommendations.
Overview¶
After collecting survey responses, researchers need to answer two fundamental questions before trusting their data:
- Is the sample representative? — Do respondent demographics match the target population defined in the sampling strategy?
- Are the responses reliable? — Do answer patterns indicate genuine engagement, or are there signs of satisficing, inattention, or bias?
Balansor addresses both questions through computed metrics and an AI analyst agent that interprets results in methodological context.
```mermaid
flowchart LR
    subgraph input["Input"]
        b[Bronze Dataset]
        s[Sampling Strategy]
    end
    subgraph metrics["Quality Metrics"]
        r[Sample Representativeness]
        q[Response Quality]
    end
    subgraph output["Output"]
        w[Silver Dataset]
        rpt[Analyst Report]
    end
    b --> r
    s --> r
    b --> q
    r --> rpt
    q --> rpt
    b -->|apply raking| w
    w --> r
```
Quality Metrics¶
The platform computes quality metrics across the two dimensions above, plus weighting diagnostics for Silver datasets, grounded in established survey methodology (Groves, Kish, Krosnick, Shannon, Cronbach):
| Dimension | Key Metrics |
|---|---|
| Sample Representativeness | RMSE, MAE, Chi-Square, Max Deviation, Composite Quality Score |
| Weighting Diagnostics | Design Effect (DEFF), Effective Sample Size, Weight CV, Completion Rate |
| Response Quality | Normalized Entropy, Straightlining Score, Cronbach's Alpha, Acquiescence Bias, Speeder Detection, Multi-Flag Aggregation |
See the Quality Metrics Reference for complete definitions, mathematical formulas, interpretation thresholds, and recommended actions.
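The representativeness metrics in the first row can be computed directly from the actual and target distributions. A minimal sketch (the function and variable names are ours for illustration, not Balansor's API):

```python
# Sketch: sample-representativeness metrics from categorical distributions.
# Formulas follow the standard definitions (RMSE, MAE, max deviation).
import math

def representativeness(actual: dict[str, float], target: dict[str, float]) -> dict[str, float]:
    """Compare an observed category distribution against strategy targets.

    Both inputs map category -> proportion (each summing to ~1.0).
    """
    deviations = [actual.get(k, 0.0) - target[k] for k in target]
    n = len(deviations)
    return {
        "rmse": math.sqrt(sum(d * d for d in deviations) / n),
        "mae": sum(abs(d) for d in deviations) / n,
        "max_deviation": max(abs(d) for d in deviations),
    }

# Age 18-24 over-represented by 8pp, echoing the report example below:
actual = {"18-24": 0.32, "25-44": 0.40, "45+": 0.28}
target = {"18-24": 0.24, "25-44": 0.42, "45+": 0.34}
m = representativeness(actual, target)
```

RMSE penalizes a single large deviation more heavily than MAE does, which is why the reference documents both.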
The AI Analyst Agent¶
The AI Analyst is a specialized agent that interprets quality metrics in methodological context and produces structured reports. It runs as a background task in Balansor, powered by the askalot_ai agent framework.
How It Works¶
```mermaid
sequenceDiagram
    participant U as Researcher
    participant B as Balansor
    participant R as Redis
    participant A as Analyst Agent
    participant P as Portor MCP
    U->>B: Start Quality Analysis
    B->>R: Create progress queue
    B->>A: Spawn background thread
    B-->>U: Return task_id
    A->>P: get_dataset_quality(dataset_id)
    P-->>A: RMSE, MAE, per-factor breakdown, weighting diagnostics
    A->>P: get_sampling_strategy(strategy_id)
    P-->>A: Target distributions
    A->>P: get_campaign(campaign_id)
    P-->>A: Completion rates, respondent counts
    A->>P: get_dataset_response_quality(dataset_id)
    P-->>A: Entropy, straightlining, speeders, multi-flag respondents
    A->>P: compare_dataset_quality(bronze_id, silver_id)
    P-->>A: Before/after weighting comparison
    A->>R: Emit progress events
    A->>R: Emit completed report
    U->>B: Poll for status
    B->>R: Read progress
    B-->>U: Display report
```
- Credential resolution: The agent resolves AI provider credentials from user settings, organization configuration, or environment variables (in that priority order)
- Background execution: The analysis runs in a background thread via `askalot_ai.runner.subprocess`, isolating it from the web request lifecycle
- MCP tool access: The agent connects to Portor's MCP interface to call quality assessment tools, gather campaign context, and retrieve sampling strategy targets
- Progress tracking: Redis Streams provide cross-worker progress reporting — safe across Gunicorn's multiple worker processes
- Report generation: The agent produces a structured markdown report following a defined template
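The progress-tracking step above can be illustrated by the shape of the events involved. This sketch uses an in-memory list in place of a real Redis stream so it stays self-contained; the field names and helper functions are assumptions, not Balansor's actual schema (with redis-py the two operations would be `xadd` and `xrange` on a per-task stream key):

```python
# Sketch: cross-worker progress reporting in the style of Redis Streams.
# A plain list stands in for the stream; field values are strings, as
# they would be in a real Redis stream entry.
import time

stream: list[dict[str, str]] = []  # stand-in for the Redis stream

def emit_progress(task_id: str, stage: str, percent: int) -> None:
    # Agent side: append one event per analysis stage.
    stream.append({
        "task_id": task_id,
        "stage": stage,
        "percent": str(percent),
        "ts": str(time.time()),
    })

def read_progress(task_id: str) -> list[dict[str, str]]:
    # Web side: any Gunicorn worker can read the full event history,
    # which is what makes the reporting safe across worker processes.
    return [e for e in stream if e["task_id"] == task_id]

emit_progress("t-1", "fetch_metrics", 20)
emit_progress("t-1", "compare_quality", 60)
emit_progress("t-1", "report", 100)
events = read_progress("t-1")
```

Because the stream persists the whole history rather than a single mutable status value, a polling worker that missed intermediate events can still replay them in order.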
Agent Profile¶
| Property | Value |
|---|---|
| Agent type | analyst |
| Model tier | High (Claude Opus) |
| Temperature | 0.4 (low — favors precision over creativity) |
| Max turns | 30 |
| Knowledge base | Data quality assessment, weighting methodology, response quality metrics |
The analyst's system prompt encodes survey methodology expertise from AAPOR standards, Kish (1965), Groves et al. (2009), Kalton & Flores-Cervantes (2003), Krosnick (1991), and ESOMAR guidelines — including specific formulas, thresholds, and decision frameworks for interpreting design effects, straightlining scores, speeder flags, and multi-flag aggregation.
Report Structure¶
The analyst produces a report with six sections:
1. Executive Summary: 2–3 sentences covering the overall quality assessment, fitness for purpose, and the single most important recommendation.
2. Sample Representativeness
- Overall quality score with interpretation
- RMSE and MAE values with context
- Per-factor analysis identifying which demographics match targets and which deviate
- Specific numbers: "Age 18–24 is over-represented by 8pp (32% actual vs 24% target)"
3. Weighting Assessment (if Silver dataset exists)
- Quality improvement percentages from raking
- Per-factor improvement breakdown
- Weighting diagnostics: design effect (DEFF), effective sample size (ESS), weight CV, and weight ratio interpretation
- Flags for any factors that worsened after weighting
- Assessment of whether weighting was effective or structural changes are needed
4. Response Quality
- Speeder detection: number flagged, percentage of sample, median completion time
- Straightlining: groups with high scores (> 0.5)
- Item non-response: questions exceeding 10% missing rates
- Acquiescence bias index (if Likert scales present)
- Multi-flag respondents (baseball rule) — exclusion recommendation if any
5. Key Findings: 3–5 specific, data-driven findings, each referencing actual numbers from the quality metrics.
6. Recommendations: 2–4 prioritized, actionable recommendations, each including what to change, why (linked to a specific finding), and the expected impact.
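The weighting diagnostics named in the Weighting Assessment section follow standard formulas, most notably Kish's design-effect approximation for unequal weighting. A self-contained sketch with illustrative names:

```python
# Sketch: weighting diagnostics from a vector of raked weights.
# DEFF uses Kish's approximation; note DEFF = 1 + CV^2 when the
# population (ddof=0) coefficient of variation is used.
import math

def weighting_diagnostics(weights: list[float]) -> dict[str, float]:
    n = len(weights)
    s1, s2 = sum(weights), sum(w * w for w in weights)
    deff = n * s2 / (s1 * s1)        # Kish: DEFF_w = n * sum(w^2) / (sum w)^2
    mean = s1 / n
    var = s2 / n - mean * mean       # population variance of the weights
    return {
        "deff": deff,
        "ess": n / deff,             # effective sample size
        "weight_cv": math.sqrt(var) / mean,
        "weight_ratio": max(weights) / min(weights),
    }

d = weighting_diagnostics([0.5, 0.8, 1.0, 1.2, 1.5, 2.0])
```

A DEFF near 1.0 means weighting cost little precision; a large weight ratio or CV signals extreme weights that trimming or a revised strategy should address.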
Using Quality Analysis in Balansor¶
Viewing Quality Metrics¶
- Navigate to Quality from the main menu
- Select a dataset from the dropdown
- Two tabs are available:
    - Sample Representativeness — demographics vs strategy targets
    - Response Quality — answer diversity, straightlining, consistency
Sample Representativeness Tab¶
Shows per-factor breakdowns with actual vs target distributions. If both Bronze and Silver datasets exist, a side-by-side comparison shows improvement from weighting.
If the dataset has no linked sampling strategy, a strategy selector appears so you can assign one.
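The Bronze vs Silver comparison shown on this tab boils down to recomputing each demographic distribution with weights applied. A minimal sketch (the data and names are invented for illustration):

```python
# Sketch: weighted vs unweighted proportions. A Silver dataset's weighted
# distribution should land closer to the strategy targets than the raw
# Bronze distribution.
def weighted_distribution(categories: list[str], weights: list[float]) -> dict[str, float]:
    totals: dict[str, float] = {}
    for cat, w in zip(categories, weights):
        totals[cat] = totals.get(cat, 0.0) + w
    s = sum(totals.values())
    return {k: v / s for k, v in totals.items()}

# 10 respondents with a young skew; raking down-weights the
# over-represented group and up-weights the under-represented one.
cats    = ["18-34"] * 6 + ["35+"] * 4
weights = [0.75] * 6 + [1.375] * 4
target  = {"18-34": 0.45, "35+": 0.55}

bronze = weighted_distribution(cats, [1.0] * 10)  # unweighted
silver = weighted_distribution(cats, weights)     # after raking
```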
Response Quality Tab¶
Displays four aggregate summary cards:
| Card | Metric | What It Shows |
|---|---|---|
| Diversity | Mean Normalized Entropy | Average answer diversity across categorical questions |
| Acquiescence | Bias Index | Agreement tendency in Likert scales |
| Non-Response | Overall Rate | Average skip rate across questions |
| Coverage | Question Count | Number of questions analyzed by type |
Below the summary: a per-question metrics table, straightlining detection panel (for groups), internal consistency panel (Cronbach's alpha for groups with 3+ sub-items), speeder detection results, and multi-flag respondent aggregation.
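Two of the metrics surfaced here, normalized entropy and the straightlining score, can be sketched in a few lines. Exact definitions may differ from Balansor's; this is one common formulation:

```python
# Sketch: answer diversity (normalized Shannon entropy) and a simple
# straightlining score for grid-question groups.
import math
from collections import Counter

def normalized_entropy(answers: list[str]) -> float:
    """Shannon entropy of a categorical question, scaled to [0, 1] by
    dividing by log(k) for the k observed answer options."""
    counts = Counter(answers)
    if len(counts) < 2:
        return 0.0                     # one answer option = zero diversity
    n = len(answers)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))

def straightlining_score(grid_rows: list[list[int]]) -> float:
    """Share of respondents answering every sub-item in a grid
    identically; scores above 0.5 are flagged in the report."""
    flat = sum(1 for row in grid_rows if len(set(row)) == 1)
    return flat / len(grid_rows)
```

For example, `normalized_entropy(["a", "b", "a", "b"])` is 1.0 (maximally diverse), while a grid where half the respondents pick the same value for every row scores 0.5 on straightlining.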
Running AI Analysis¶
- On the Quality page, click "Run AI Analysis"
- The analyst agent starts in the background — progress updates appear in real time
- When complete, the structured report appears with methodology-grounded interpretation
The AI analysis requires:
- An AI provider API key (configured in user profile, organization settings, or environment)
- A Portor MCP endpoint (for accessing quality tools and campaign context)
- Redis (for cross-worker progress tracking)
Dataset Detail Page¶
Each dataset's detail page includes a compact Response Quality card showing diversity score, non-response rate, straightlining summary, and question type breakdown. Click through to the full quality analysis page for detailed metrics.
Interpreting Results¶
For detailed interpretation guidance — common patterns, recommended actions, and the two-dimensional quality matrix — see the Quality Metrics Reference: Interpreting Results.
MCP Tools¶
Quality analysis tools are available through the MCP interface for programmatic or AI-assisted access:
| Tool | Purpose |
|---|---|
| `get_dataset_quality` | Sample representativeness metrics (RMSE, MAE, Chi-Square, per-factor). Silver datasets include weighting diagnostics (DEFF, ESS, weight CV) and completion rate |
| `get_dataset_response_quality` | Response quality metrics (entropy, straightlining, consistency, acquiescence, speeder detection, multi-flag aggregation) |
| `compare_dataset_quality` | Side-by-side Bronze vs Silver comparison |
See the Dataset Tools Reference for complete parameter documentation.
Related Documentation¶
- Quality Metrics Reference — Metric definitions, formulas, interpretation thresholds
- Data Analysis Guide — Bronze/Silver/Gold pipeline, weighting, export
- Campaign Management — Sampling strategies and respondent pools
- Agentic Response Generation — Synthetic data for pipeline testing
- MCP Dataset Tools — Programmatic access to quality metrics