Questionnaire Analysis: Foundations¶
This document introduces the formal mathematical framework for questionnaire analysis used in Askalot. We define the core concepts of questionnaires as mathematical objects and establish the notation used throughout the theory section.
Overview¶
Askalot treats questionnaires as formal systems that can be mathematically analyzed for consistency, reachability, and logical coherence. This approach enables automated detection of design flaws before surveys go live, ensuring respondents never encounter impossible situations or unreachable questions.
Why Formal Analysis Matters¶
Traditional survey platforms allow designers to create logically inconsistent questionnaires:
- Questions that can never be reached due to impossible preconditions
- Postcondition constraints that cannot be satisfied
- Circular dependencies between questions
- Global inconsistencies where no valid survey completion path exists
Askalot's formal framework detects these issues automatically through SMT-based analysis, ensuring survey quality before deployment.
Setup and Notation¶
Answer Variables¶
Let \(\mathbf{S} := (S_1, \dots, S_n)\) denote the vector of integer answer variables representing outcomes for questionnaire items. An assignment is a vector \(\mathbf{a} := (a_1, \dots, a_n)\) over the corresponding domains.
We write \(\mathbf{a} \models \phi\) to denote that the formula \(\phi\) evaluates to true under the assignment \(\mathbf{a}\).
Domain Constraints¶
Let the set of domain-respecting assignments be:
Let the conjunction of all base domain constraints be:
Satisfiability Notation¶
We use standard satisfiability notation:
- \(\mathrm{SAT}(\phi)\) indicates \(\phi\) has a model (is satisfiable)
- \(\mathrm{UNSAT}(\phi)\) indicates no model exists (is unsatisfiable)
- Semantic entailment is written \(\Gamma \models \phi\)
Logic Notation
The symbol "\(\models\)" is used in two related senses:
- \(\mathbf{a} \models \phi\) means assignment \(\mathbf{a}\) satisfies formula \(\phi\) (evaluates to true)
- \(\Gamma \models \phi\) means formula set \(\Gamma\) entails \(\phi\) (every model of \(\Gamma\) is also a model of \(\phi\))
Context determines which sense applies.
Questionnaire Structure¶
Formal Definition¶
Definition (Questionnaire): A questionnaire \(\mathcal{G}\) is a tuple \((\mathcal{I}, \mathbf{S}, \mathcal{D}, \mathcal{P}, \mathcal{Q}, I_{\mathrm{start}})\) where:
- \(\mathcal{I} = \{I_1, \dots, I_n\}\) is a finite set of items
- \(\mathbf{S} = (S_1, \dots, S_n)\) is a vector of integer answer variables
- \(\mathcal{D} = (D_1, \dots, D_n)\) where \(D_i(S_i)\) specifies the domain constraint for \(S_i\)
- \(\mathcal{P} = (P_1, \dots, P_n)\) where \(P_i(\mathbf{S})\) is the precondition formula for \(I_i\)
- \(\mathcal{Q} = (Q_1, \dots, Q_n)\) where \(Q_i(\mathbf{S})\) is the postcondition formula for \(I_i\)
- \(I_{\mathrm{start}} \in \mathcal{I}\) is the designated starting item
Nature of Items¶
An item \(I_i\) is a questionnaire element that may take several forms:
- Informational items
- Comments or instructions without associated outcomes (no answer variable)
- Scalar questions
- Single questions with a single integer outcome \(S_i\)
- Question groups
- Collections of related sub-questions sharing a common precondition, postcondition, and domain constraint
- Matrix questions
- Grid-based questions where all cells share a common domain (e.g., rating scales), with vector or matrix outcomes
For complex items (groups and matrices), all individual elements within the item share the same domain specification—either a numeric interval \([\ell, u]\) or a discrete set of allowed values.
Simplified Presentation
For clarity, this foundational section focuses on scalar items with single integer outcomes. Complex item types (vectors, matrices) are covered in Complex Types.
Components Explained¶
Items¶
We have a finite set of \(n\) items labeled \(I_1, I_2, \dots, I_n\). Each item represents a question, informational text, or group of related questions.
Answer Variables¶
Each item \(I_i\) that collects responses has an associated integer variable \(S_i\) (or vector of variables for complex items). Informational items without responses have no associated outcome variable.
Domain Constraints¶
Each variable \(S_i\) must lie within a specified integer range:
for some lower bound \(\ell_i\) and upper bound \(u_i\).
Together, all domain constraints form:
Preconditions¶
Each item \(I_i\) has a precondition \(P_i(\mathbf{S})\), a Boolean predicate over the set of answer variables \(\mathbf{S} = (S_1, \dots, S_n)\).
- \(P_i\) determines whether \(I_i\) is accessible or asked in a given scenario
- If \(P_i(\mathbf{S})\) is False under a particular assignment, then \(I_i\) does not appear
Example: If item \(I_3\) asks "How many children do you have?" we might have:
where \(S_2\) encodes the answer to "Do you have children?" (1 = yes, 0 = no).
Postconditions (\(Q_i\))¶
Each item \(I_i\) also has a postcondition \(Q_i(\mathbf{S})\), another Boolean formula over \(\mathbf{S}\).
- A postcondition \(Q_i\) only matters if \(I_i\) is asked (i.e., if \(P_i(\mathbf{S})\) was True)
- Postconditions constrain acceptable outcomes to maintain semantic consistency with prior answers
Formally, we expect:
Example: For "combined family income" item \(I_5\), we might require:
where \(S_3\) and \(S_4\) are individual incomes from items \(I_3\) and \(I_4\). This ensures logical consistency: combined income cannot be less than the sum of individual incomes.
Implementation via Python Expressions¶
In practice, preconditions \(P_i\) and postconditions \(Q_i\) are specified as Python boolean expressions in QML files. These expressions:
- Reference answer variables (e.g.,
item1.outcome,item2.outcome) - Are compiled to Z3 constraints before satisfiability checking
- May include Python statement blocks for complex computations (compiled via SSA transformation)
Throughout this documentation, we use the term Boolean formula in the mathematical sense to refer abstractly to these predicates, while the concrete implementation uses Python expressions compiled to first-order logic constraints for solver-based analysis.
Item Range¶
Each outcome variable \(S_i\) is constrained to a well-defined domain. In practice, domains are specified as static numeric intervals (min/max) and enforced by the base constraints \(D_i(S_i)\).
Conceptually, discrete sets (lists of allowed values) can be represented by disjunctions or membership constraints. The current implementation primarily realizes numeric intervals.
Key Validation Questions¶
Given a questionnaire \(\mathcal{G}\), we can ask:
- Reachability: Is item \(I_i\) ever reachable? (See Preconditions and Reachability)
- Constraint feasibility: Can postcondition \(Q_i\) be satisfied when \(P_i\) holds?
- Global consistency: Does there exist at least one valid completion path through the survey?
- Dependency cycles: Do question dependencies form cycles that prevent evaluation?
The following pages in this theory section address each of these questions with precise mathematical definitions, algorithms, and proofs.
Summary¶
We have established:
- Formal representation of questionnaires as tuples of items, variables, domains, and logical formulas
- Mathematical notation for assignments, satisfiability, and entailment
- Core concepts of preconditions (when items are asked) and postconditions (what answers are valid)
- Motivation for formal analysis: detecting design flaws automatically
This foundation enables rigorous analysis of questionnaire logic using SMT solvers. The next sections build on these definitions to analyze specific properties of questionnaires.
Further Reading¶
- Preconditions and Postconditions - Reachability analysis and constraint classification
- Global vs Local Validity - Understanding different notions of questionnaire consistency
- Complex Types - Extensions for vector and matrix question types