Agentic Response Generation¶
This document describes the Campaign Simulation feature that uses AI agents to generate synthetic survey responses based on persona profiles, enabling testing of hypothetical campaigns before real deployment.
Overview¶
Campaign Simulation allows users to test hypothetical campaign scenarios with synthetic data. AI agents complete surveys as virtual respondents, generating realistic responses based on demographic profiles and behavioral traits.
Use Cases¶
- Validate questionnaire flow and logic before real deployment
- Test data analysis pipelines with realistic synthetic data
- Train interviewers with simulated survey sessions
- Estimate resource requirements for campaign execution
- Generate demo data for stakeholder presentations
Response Generation Modes¶
LLM-Powered Responses (Recommended)¶
The distribution="llm" mode uses Claude Haiku to generate contextually aware responses based on persona profiles. Each question is answered by an LLM that receives the full persona profile and all previous Q&A pairs, producing responses that are consistent across the entire survey.
Key advantages over rule-based generation:
- Contextual consistency: A respondent who answers "unemployed" won't later describe a workplace
- Natural language: Open-ended text responses reflect the persona's communication style
- Nuanced choices: Option selection considers the full question context, not just weighted randomness
Requirements: ANTHROPIC_API_KEY environment variable must be set. The mode fails immediately if the key is missing — no silent fallback.
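A minimal sketch of the fail-fast behavior described above (the function name `require_api_key` is illustrative, not the actual implementation):

```python
import os

def require_api_key() -> str:
    """Fail immediately if the Anthropic key is missing -- no silent fallback."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; distribution='llm' cannot run"
        )
    return key
```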
Example:
User: mass_fill_surveys(campaign_id="abc", distribution="llm", profiles=["young_single", "retired"])
Result:
Young Professional answering "How often do you shop online?"
→ "Several times a week" (tech-savvy, time-conscious)
Retired Senior answering the same question
→ "A few times a month" (traditional, deliberate shopper)
Rule-Based Responses¶
The default distribution="realistic" mode uses weighted random selection based on demographic profiles. Faster but less contextually aware — good for high-volume data pipeline testing where response quality is less important.
| Distribution | Method | Speed | Quality |
|---|---|---|---|
| `llm` | Claude Haiku per question | ~1s/question | High — contextual, consistent |
| `realistic` | Weighted random | Instant | Medium — demographically influenced |
| `random` | Pure random | Instant | Low — no demographic influence |
| `stratified` | Strata-matched random | Instant | Medium — matches demographic strata |
Campaign Simulation (Three-Phase)¶
Simulate a complete campaign lifecycle with multiple respondents.
flowchart LR
subgraph phase1["Phase 1: Preparation"]
p1a[Create Project] --> p1b[Create Campaign]
p1b --> p1c[Generate Respondents]
p1c --> p1d[Create Strategy & Pool]
p1d --> p1e[Bulk Create Surveys]
end
subgraph phase2["Phase 2: Execution"]
p2a[Assign Personas] --> p2b[Complete Surveys]
p2b --> p2c[Handle Branching]
end
subgraph phase3["Phase 3: Analysis"]
p3a[Create Bronze Dataset] --> p3b[Apply Raking]
p3b --> p3c[Export Data]
end
phase1 --> phase2
phase2 --> phase3
Phase 1: Preparation¶
- Creates project, questionnaire, campaign
- Generates synthetic respondents with realistic demographics
- Creates sampling strategy and respondent pool
- Bulk creates surveys for all respondents
Phase 2: Execution¶
- Completes surveys using persona-based response generation
- Simulates realistic response patterns based on demographics
- Handles questionnaire branching and skip logic
Phase 3: Analysis¶
- Extracts responses into Bronze dataset
- Applies post-stratification weighting (raking)
- Creates Silver/Gold datasets
- Exports in multiple formats (CSV, XLSX, SPSS, Parquet)
Persona Profiles¶
Personas define the characteristics that guide synthetic response generation. Each persona has demographic attributes and behavioral traits that influence how they answer questions.
Built-in Profiles¶
| Profile | Demographics | Response Characteristics |
|---|---|---|
| Young Professional | Age 25-35, single, employed, $50-80k income | Tech-savvy, time-conscious, brief responses |
| Family Oriented | Age 30-50, married, children, suburban | Family-focused decisions, detailed responses |
| Retired Senior | Age 65+, retired, fixed income | Traditional values, thorough responses |
| High Earner | Age 35-55, high income (>$100k) | Quality-focused, premium preferences |
| Random | Varied demographics | Unpredictable response patterns |
Profile Structure¶
Each persona profile is defined with:
name: Young Professional
description: Young adult professional, tech-savvy and time-conscious
demographics:
  age_range: 25-35
  gender: any
  marital_status: single
  employment: full-time professional
  income: $50,000 - $80,000
  education: bachelor's degree
  location: urban/suburban
behavioral_traits:
  - tech-savvy
  - time-conscious
  - values convenience
  - early adopter
response_style:
  verbosity: brief
  decision_speed: quick
  detail_level: moderate
Custom Profiles¶
Users can create custom personas for specific research needs:
User: "Create a persona for budget-conscious students"
AI: Creating custom persona: Budget Student
Demographics:
- Age: 18-24
- Employment: Part-time or student
- Income: < $25,000
- Education: Currently enrolled
- Location: University town
Behavioral Traits:
- Price-sensitive
- Social media active
- Peer-influenced decisions
- Values free trials/samples
Response Style:
- Casual language
- Quick decisions
- Strong opinions on value
Save this persona for future simulations?
Response Generation¶
How LLM Responses Work¶
For each survey question, the LLM receives a single prompt containing:
- System prompt: Persona profile text with demographic details and behavioral traits
- Respondent demographics: Actual demographic data from the respondent record (age, gender, education, etc.)
- Q&A history: All previous question-answer pairs from this survey session
- Current question: Question text, available options, and expected JSON response format
The LLM returns a structured JSON response that is parsed into the format expected by the survey engine (single value, array, or dictionary depending on question type).
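A sketch of how the per-question prompt might be assembled from these four parts; the function and field names here are illustrative, not the actual implementation:

```python
def build_prompt(persona: str, demographics: dict, history: list, question: dict) -> dict:
    """Assemble the single per-question prompt from persona, demographics,
    Q&A history, and the current question."""
    qa_lines = [f"Q: {q}\nA: {a}" for q, a in history]
    user_content = "\n\n".join([
        "Respondent demographics: " + ", ".join(f"{k}={v}" for k, v in demographics.items()),
        "Previous answers:\n" + ("\n".join(qa_lines) or "(none)"),
        f"Current question: {question['text']}",
        f"Options: {question.get('options', [])}",
        "Reply with JSON only, in the expected format for this question type.",
    ])
    # The persona profile text serves as the system prompt.
    return {"system": persona, "messages": [{"role": "user", "content": user_content}]}
```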
Conversation History¶
Each survey maintains a running history of Q&A pairs. This history is included in every subsequent prompt, enabling the LLM to:
- Maintain consistency — someone who said "unemployed" won't describe a workplace later
- Build on context — satisfaction ratings align with earlier positive/negative experiences
- Avoid contradictions — demographic answers stay internally coherent
Response Format by Question Type¶
| Question Type | Expected Format | Example |
|---|---|---|
| Single choice (radio/dropdown) | `"option_value"` | `"very_satisfied"` |
| Multiple choice (checkbox) | `["val1", "val2"]` | `["email", "phone"]` |
| Slider (numeric) | `42` | `7.5` |
| Switch (boolean) | `true` / `false` | `true` |
| Open text | `"response text"` | `"Great service overall"` |
| Question group | `["val1", "val2", ...]` | `["yes", "no", "maybe"]` |
| Matrix question | `{"_0_0": "val", ...}` | `{"_0_0": "agree", "_1_0": "neutral"}` |
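A sketch of parsing the LLM's JSON reply into the shape the survey engine expects for each row of the table; the question-type names are assumptions for illustration:

```python
import json

def parse_response(question_type: str, raw: str):
    """Parse the LLM's JSON reply into the per-question-type shape."""
    value = json.loads(raw)
    if question_type in ("single_choice", "open_text"):
        return str(value)      # "option_value" or free text
    if question_type in ("multiple_choice", "question_group"):
        return list(value)     # ["val1", "val2", ...]
    if question_type == "slider":
        return float(value)    # numeric
    if question_type == "switch":
        return bool(value)     # true / false
    if question_type == "matrix":
        return dict(value)     # {"_0_0": "val", ...}
    raise ValueError(f"unknown question type: {question_type}")
```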
Researcher Workflow¶
Persona Cocktail → Mass Fill → Analyze¶
The recommended workflow for generating synthetic data with LLM responses:
- Design your persona mix — choose which profiles to include and in what proportion
- Call mass_fill_surveys with distribution="llm" and your chosen profiles
- Create Bronze dataset — extract raw responses
- Apply raking — weight responses to match target demographics
- Export and analyze — CSV, XLSX, SPSS, or Parquet
Example: MCP Tool Call¶
mass_fill_surveys(
campaign_id="campaign-uuid",
distribution="llm",
profiles=["young_single", "married_family", "retired", "high_income"],
max_surveys=50
)
This cycles through the four profiles across 50 surveys, producing contextually aware responses for each persona.
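The cycling can be sketched as a simple round-robin assignment; the exact ordering inside `mass_fill_surveys` is an assumption here:

```python
from itertools import cycle

def assign_profiles(profiles, max_surveys):
    """Round-robin the persona profiles across the surveys to fill."""
    it = cycle(profiles)
    return [next(it) for _ in range(max_surveys)]

# With 4 profiles and 50 surveys, each profile covers 12 or 13 surveys.
assignments = assign_profiles(
    ["young_single", "married_family", "retired", "high_income"], 50
)
```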
Chat-Based Configuration¶
User: "Run a simulation of the NPS campaign with 50 respondents,
mostly young professionals and families, using LLM responses"
AI: Configuring simulation:
Campaign: NPS Customer Satisfaction
Distribution: LLM (Claude Haiku)
Respondents: 50 synthetic profiles
Persona Distribution:
- Young Professional: 40% (20 respondents)
- Family Oriented: 40% (20 respondents)
- Random: 20% (10 respondents)
Each survey will maintain conversation history for consistent responses.
Output:
- Bronze dataset (raw responses)
- Silver dataset (weighted)
- XLSX export
Start simulation?
Simulation Output¶
After simulation completes, users receive:
1. Campaign Summary¶
Simulation Complete: "Q1 Satisfaction Test Run"
Surveys:
- Total: 100
- Completed: 98
- Terminated (screening): 2
Response Statistics:
- Average completion time: 6m 42s
- Questions answered: 1,470
- Skip rate: 8.3%
NPS Results:
- Promoters (9-10): 34%
- Passives (7-8): 41%
- Detractors (0-6): 25%
- NPS Score: +9
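The NPS score is the percentage of promoters minus the percentage of detractors; with the figures above, 34% − 25% = +9. A minimal sketch of the calculation:

```python
def nps(ratings):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / n)

# 34% promoters, 41% passives, 25% detractors -> +9, matching the summary above.
ratings = [10] * 34 + [8] * 41 + [5] * 25
```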
2. Data Quality Report¶
Demographic Distribution vs. Targets:
┌─────────────┬──────────┬──────────┬─────────┐
│ Factor │ Target │ Actual │ RMSE │
├─────────────┼──────────┼──────────┼─────────┤
│ Male │ 48% │ 47% │ 0.01 │
│ Female │ 50% │ 51% │ 0.01 │
│ Age 18-24 │ 15% │ 14% │ 0.01 │
│ Age 25-34 │ 25% │ 26% │ 0.01 │
│ Age 35-44 │ 25% │ 24% │ 0.01 │
└─────────────┴──────────┴──────────┴─────────┘
Overall Quality: ✅ Excellent (RMSE: 0.012)
Weighting Efficiency: 94% (Design Effect: 1.06)
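Weighting efficiency and design effect follow from the spread of the raking weights; a sketch using the standard Kish approximation (assuming this is the formula the report uses):

```python
def design_effect(weights):
    """Kish design effect due to unequal weighting: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

def weighting_efficiency(weights):
    """Effective sample fraction: 1 / design effect."""
    return 1.0 / design_effect(weights)

# Equal weights give a design effect of 1.0 (100% efficiency);
# more spread in the weights lowers the efficiency.
```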
3. Exportable Datasets¶
| Dataset | Description | Format |
|---|---|---|
| Bronze | Raw responses as collected | Parquet |
| Silver | Weighted responses (raking applied) | Parquet |
| Gold | Refined, export-ready | CSV, XLSX, SPSS |
Architecture¶
Direct SDK Approach¶
The LLM response generation uses the Anthropic SDK directly in Portor rather than the askalot_ai multi-agent framework. This is intentional:
- Portor already has the survey state in-process via FlowProcessor — no need for the agent to call MCP tools back into itself
- Each question is a single stateless LLM call ("given this profile, answer this question")
- No circular MCP calls, simpler error handling, lower latency
| Aspect | Configuration |
|---|---|
| Model | Claude Haiku 4.5 (claude-haiku-4-5-20251001) |
| Max Tokens | 256 per response |
| Temperature | 0.8 (natural variation) |
| Context | Persona profile + respondent demographics + Q&A history |
Batch Processing¶
Surveys are completed sequentially within mass_fill_surveys. Each survey accumulates its own Q&A history for internal consistency while remaining independent of other surveys.
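The sequential loop with per-survey history can be sketched as follows; the survey dictionary shape and function names are illustrative:

```python
def mass_fill(surveys, answer_question):
    """Complete surveys one at a time; each keeps its own Q&A history."""
    results = []
    for survey in surveys:
        history = []  # fresh history per survey -- no cross-survey leakage
        for question in survey["questions"]:
            answer = answer_question(question, history)
            history.append((question["text"], answer))
        results.append(history)
    return results
```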
Persona Profiles¶
Profile files are markdown documents with YAML frontmatter, mounted into Portor containers from the shared askalot_ai prompts directory. The same profiles are used by both the LLM mode and the rule-based mode.
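A minimal sketch of splitting such a file into frontmatter and body, assuming the conventional `---` delimiters (the real loader presumably hands the frontmatter to a YAML parser):

```python
def split_frontmatter(text: str):
    """Split a profile file into its YAML frontmatter and markdown body."""
    if text.startswith("---\n"):
        front, _, body = text[4:].partition("\n---\n")
        return front.strip(), body.strip()
    return "", text.strip()
```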
Provider¶
The LLM mode uses Anthropic only — Claude Haiku provides the best balance of speed, cost, and quality for survey response generation. The platform's local Ollama instance is used only for document embedding, not for response generation.
Best Practices¶
When to Use Simulation¶
- Before launching expensive campaigns
- Testing new questionnaire designs
- Training data analysts on the data pipeline
- Validating sampling strategies
- Generating demo data for stakeholder presentations
Simulation Limitations¶
- Not real opinions: Synthetic data reflects persona templates, not actual attitudes
- Consistency vs. variety: Personas may be more consistent than real humans
- Edge cases: Unusual response combinations may be underrepresented
- Open-ended questions: AI-generated text may lack authentic diversity
- API dependency: LLM mode requires a valid Anthropic API key and internet access
Error Handling¶
The LLM mode follows the platform's "No Silent Fallbacks" principle:
- Missing API key: Fails immediately before processing any surveys
- API errors: Individual survey failures are logged and reported in the result
- Invalid JSON from LLM: Falls back to best-effort parsing (string value, first option)
- No silent degradation: The mode never silently switches to rule-based generation
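The invalid-JSON fallback can be sketched as follows — best-effort parsing of the reply itself, never a switch to a different generation mode (the function name and exact heuristics are illustrative):

```python
import json

def best_effort_parse(raw: str, options: list):
    """Fallback when the LLM reply is not valid JSON (no silent mode switch)."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        text = raw.strip().strip('"')
        if text in options:          # the raw string names a valid option
            return text
        return options[0] if options else text  # last resort: first option
```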
Recommendations¶
- Use LLM mode for quality — when response realism matters (presentations, analysis testing)
- Use rule-based for speed — when you need high volume quickly (pipeline stress testing)
- Mix personas — include "random" profile for variety
- Compare to pilots — validate simulation patterns against small real samples
- Document synthetic data — clearly label datasets as simulated
Related Documentation¶
- AI-Assisted Campaign Management - Campaign setup and management
- Data Analysis - Bronze/Silver/Gold pipeline
- Campaign Management - Core campaign concepts