Skip to content

Comprehensive Questionnaire Validation

This document clarifies the relationship between a global satisfiability check of the questionnaire and the per-item implication checks used in the implementation. We formalize both notions and prove what is (and is not) equivalent, with practical consequences for validation.

Setup

Recall from Questionnaire Analysis that each outcome variable \(S_i\) has an associated domain constraint \(D_i(S_i)\). Let \(B := \bigwedge_{i=1}^n D_i(S_i)\) denote the conjunction of all domain constraints in the questionnaire. For each item \(I_i\), let \(P_i\) be its precondition and \(Q_i\) its postcondition, expressed as Boolean formulas over the answers \(\mathbf{S} = (S_1,\dots,S_n)\).

Global Satisfiability Formula

Definition (Global Satisfiability Formula): The global satisfiability formula for a questionnaire is:

\[ \mathcal{F} := B \wedge \bigwedge_{i=1}^n \bigl(P_i \Rightarrow Q_i\bigr) \]

We say the questionnaire passes the global satisfiability check if \(\mathrm{SAT}(\mathcal{F})\) holds.

Interpretation: There exists at least one assignment to all variables that:

  1. Satisfies all domain constraints (\(B\))
  2. For each item, if its precondition holds, its postcondition also holds

Witness Formula

Definition (Witness Formula): For an item \(I_i\), the witness formula is:

\[ W_i := B \wedge P_i \wedge \lnot Q_i \]

If \(\mathrm{SAT}(W_i)\), then there exists an assignment satisfying the domain constraints where item \(I_i\) is reached (precondition \(P_i\) holds) but the postcondition \(Q_i\) is violated. Such an assignment demonstrates that the item can fail validation.

The key diagnostic question is whether \(W_i\) is satisfiable:

  • \(\mathrm{UNSAT}(W_i)\) means item \(I_i\) is locally valid (whenever it is reached, its postcondition must hold)
  • \(\mathrm{SAT}(W_i)\) indicates a potential violation

Key Relations

The following theorems capture the exact relationship between the global check and the per-item checks.

Theorem: Soundness of Per-Item Validity

Theorem: If \(\mathrm{SAT}(B)\) and for every \(i \in \{1, \ldots, n\}\) we have \(\mathrm{UNSAT}(W_i)\), then \(\mathrm{SAT}(\mathcal{F})\).

Informal statement: If the base constraints are satisfiable and every item's postcondition is per-item valid (cannot be violated), then the global formula is satisfiable.

Complete Formal Proof

Assume \(\mathrm{SAT}(B)\) and \(\mathrm{UNSAT}(W_i)\) for all \(i \in \{1, \ldots, n\}\), where \(W_i = B \wedge P_i \wedge \lnot Q_i\).

Step 1: Transform unsatisfiability to entailment for each item.

For each \(i\), we have \(\mathrm{UNSAT}(B \wedge P_i \wedge \lnot Q_i)\).

By definition of unsatisfiability:

\[ \nexists \mathbf{a} : \mathbf{a} \models (B \wedge P_i \wedge \lnot Q_i) \]

Equivalently, by the quantifier negation rule (i.e., \(\neg \exists x \, P(x) \equiv \forall x \, \neg P(x)\)):

\[ \forall \mathbf{a} : \mathbf{a} \not\models (B \wedge P_i \wedge \lnot Q_i) \]

By the definition of satisfaction for conjunction (an assignment fails to satisfy a conjunction if it fails to satisfy at least one conjunct), this means for all assignments \(\mathbf{a}\):

\[ \mathbf{a} \not\models B \text{ or } \mathbf{a} \not\models P_i \text{ or } \mathbf{a} \not\models \lnot Q_i \]

Note that \(\mathbf{a} \not\models \lnot Q_i\) is equivalent to \(\mathbf{a} \models Q_i\) (by double negation).

Rewriting the disjunction using the logical equivalence \((\neg A \vee \neg B \vee C) \equiv ((A \wedge B) \Rightarrow C)\), we get: for all assignments \(\mathbf{a}\):

\[ \text{if } \mathbf{a} \models B \wedge P_i, \text{ then } \mathbf{a} \models Q_i \]

by definition of \(\models\), this is equivalent to:

\[ B \wedge P_i \models Q_i \]

By the semantic deduction theorem (moving \(P_i\) to the right), this is the definition of:

\[ B \models (P_i \Rightarrow Q_i) \]

Step 2: Combine entailments for all items.

We have established that \(B \models (P_i \Rightarrow Q_i)\) for each \(i \in \{1, \ldots, n\}\).

By the monotonicity of entailment: if \(B \models \phi_i\) for each \(i\), then \(B \models \bigwedge_i \phi_i\).

Therefore:

\[ B \models \bigwedge_{i=1}^n (P_i \Rightarrow Q_i) \]

Step 3: Construct a model for \(\mathcal{F}\).

By hypothesis, \(\mathrm{SAT}(B)\), so there exists an assignment \(\mathbf{a}\) such that \(\mathbf{a} \models B\).

From Step 2, we have \(B \models \bigwedge_{i=1}^n (P_i \Rightarrow Q_i)\).

By the definition of semantic entailment, every model of \(B\) is also a model of \(\bigwedge_{i=1}^n (P_i \Rightarrow Q_i)\).

Since \(\mathbf{a} \models B\), we have \(\mathbf{a} \models \bigwedge_{i=1}^n (P_i \Rightarrow Q_i)\).

Therefore:

\[ \mathbf{a} \models B \wedge \bigwedge_{i=1}^n (P_i \Rightarrow Q_i) = \mathcal{F} \]

Hence, \(\mathrm{SAT}(\mathcal{F})\). \(\square\)

Practical implication: Per-item validation provides a sufficient condition for global consistency. If each item's postcondition passes individual validity checks (\(\mathrm{UNSAT}(W_i)\)), we can conclude the questionnaire is globally consistent without performing the more expensive global SAT check.

Theorem: Local Invalidity Does Not Imply Global Unsatisfiability

Theorem: If \(\mathrm{SAT}(W_i)\) for some \(i\), it does not follow that \(\mathrm{UNSAT}(\mathcal{F})\).

The witness \(W_i\) only shows that there exists a violating model of \(B\); the global check asks whether there exists a (possibly different) model that satisfies all implications simultaneously.

Driving Experience Survey

Consider a two-item questionnaire:

  • \(I_1\): "What is your age?" with \(S_1 \in [0, 120]\), \(P_1 = \mathit{true}\), \(Q_1 = \mathit{true}\)
  • \(I_2\): "Years of driving experience?" with \(S_2 \in [0, 100]\), \(P_2 = (S_1 \geq 16)\), \(Q_2 = (S_2 \leq S_1 - 16)\)

The postcondition \(Q_2\) ensures driving experience cannot exceed years since turning 16. Consider the witness formula for \(I_2\):

\[ W_2 = B \wedge P_2 \wedge \lnot Q_2 = B \wedge (S_1 \geq 16) \wedge (S_2 > S_1 - 16) \]

This is satisfiable: \(S_1 = 20\), \(S_2 = 10\) satisfies \(W_2\) (claiming 10 years of experience at age 20, which violates \(Q_2\) since \(10 > 20 - 16 = 4\)). So \(\mathrm{SAT}(W_2)\)—the per-item check reports item \(I_2\) as locally invalid.

However, the global formula \(\mathcal{F}\) is still satisfiable. Consider \(S_1 = 30\), \(S_2 = 5\):

  • \(P_2 = (30 \geq 16) = \mathit{true}\), and \(Q_2 = (5 \leq 30 - 16) = (5 \leq 14) = \mathit{true}\)
  • Thus \((P_2 \Rightarrow Q_2) = \mathit{true}\)

This assignment satisfies \(\mathcal{F}\), so \(\mathrm{SAT}(\mathcal{F})\) despite \(\mathrm{SAT}(W_2)\). The local invalidity witness shows that some respondents could violate the postcondition, but the questionnaire still has valid completions.

Theorem: Accumulated Constraints Can Cause Global Unsatisfiability

Theorem: There exist questionnaires where no item is INFEASIBLE (each \(\mathrm{SAT}(B \wedge P_i \wedge Q_i)\)), yet the global formula \(\mathcal{F}\) is unsatisfiable.

Per-item checks examine each postcondition independently against the domain constraints \(B\). However, during questionnaire execution, items are evaluated in dependency order, and postconditions from earlier items must be satisfied before proceeding to later items. These accumulated constraints are not visible to per-item checks.

Conflicting Postconditions

Consider a two-item questionnaire where \(I_2\) depends on \(S_1\):

  • \(I_1\): "Rate your experience (1--100)" with \(S_1 \in [1, 100]\), \(P_1 = \mathit{true}\), \(Q_1 = (S_1 > 50)\)
  • \(I_2\): "Confirm your rating" with \(S_2 \in \{0, 1\}\), \(P_2 = \mathit{true}\), \(Q_2 = (S_1 < 30)\)

Since \(Q_2\) references \(S_1\), the dependency graph requires \(I_1 \to I_2\) ordering. The respondent must satisfy \(Q_1\) (i.e., \(S_1 > 50\)) before proceeding to \(I_2\).

Per-item analysis:

  • \(W_1 = B \wedge \mathit{true} \wedge (S_1 \leq 50)\) is SAT (e.g., \(S_1 = 25\)) \(\Rightarrow\) CONSTRAINING
  • \(W_2 = B \wedge \mathit{true} \wedge (S_1 \geq 30)\) is SAT (e.g., \(S_1 = 50\)) \(\Rightarrow\) CONSTRAINING
  • Neither item is INFEASIBLE: \(\mathrm{SAT}(B \wedge P_1 \wedge Q_1)\) and \(\mathrm{SAT}(B \wedge P_2 \wedge Q_2)\)

Global analysis:

\[ \mathcal{F} = B \wedge (\mathit{true} \Rightarrow S_1 > 50) \wedge (\mathit{true} \Rightarrow S_1 < 30) = B \wedge (S_1 > 50) \wedge (S_1 < 30) \]

This is unsatisfiable—no value of \(S_1\) can simultaneously satisfy \(S_1 > 50\) and \(S_1 < 30\). Therefore \(\mathrm{UNSAT}(\mathcal{F})\).

The per-item checks miss this conflict because they evaluate each postcondition against all possible values of \(S_1\), not accounting for the fact that \(Q_1\) must already be satisfied when \(I_2\) is reached.

Path-Based Validation

Path-based validation addresses a limitation of the global formula \(\mathcal{F}\): the use of implications \((P_i \Rightarrow Q_i)\) that are vacuously true when preconditions are false. This section formalizes the path-based approach and proves its relationship to global validation.

Path-based validation builds on the dependency graph constructed from item dependencies. Recall that the dependency graph \(G = (V, E)\) captures data flow between items, and Kahn's algorithm computes a topological ordering that determines valid evaluation sequences.

Execution Path

Definition (Execution Path): An execution path \(\pi\) through a questionnaire is a sequence of items \((I_{k_1}, I_{k_2}, \ldots, I_{k_m})\) respecting the dependency order (as determined by the topological sort of the dependency graph), where each item's precondition is satisfied given the accumulated constraints from preceding items.

Path Constraint Formula

Definition (Path Constraint Formula): For an execution path \(\pi = (I_{k_1}, \ldots, I_{k_m})\), the path constraint formula is:

\[ \Phi_\pi := B \wedge \bigwedge_{j=1}^{m} \bigl(P_{k_j} \wedge Q_{k_j}\bigr) \]

The path is feasible if \(\mathrm{SAT}(\Phi_\pi)\).

Note the key difference from the global formula: path constraints use conjunctions \((P_i \wedge Q_i)\) for items on the path, not implications \((P_i \Rightarrow Q_i)\). This ensures that preconditions are actually satisfiable, not just vacuously handled.

Theorem: Global Validity is Necessary for Path Validity

Theorem: If \(\mathrm{UNSAT}(\mathcal{F})\), then for every execution path \(\pi\), we have \(\mathrm{UNSAT}(\Phi_\pi)\).

Proof

For any path \(\pi = (I_{k_1}, \ldots, I_{k_m})\), if \(\mathrm{SAT}(\Phi_\pi)\), then there exists an assignment \(\mathbf{a}\) satisfying \(B \wedge \bigwedge_{j=1}^m (P_{k_j} \wedge Q_{k_j})\).

For items on the path: \(\mathbf{a} \models P_{k_j}\) and \(\mathbf{a} \models Q_{k_j}\), so \(\mathbf{a} \models (P_{k_j} \Rightarrow Q_{k_j})\).

For items not on the path: their preconditions are false under \(\mathbf{a}\) (otherwise they would be on the path), so \((P_i \Rightarrow Q_i)\) is vacuously true.

Therefore \(\mathbf{a} \models \mathcal{F}\), contradicting \(\mathrm{UNSAT}(\mathcal{F})\). \(\square\)

Theorem: Global Validity Does Not Guarantee All Items Reachable

Theorem: There exist questionnaires where \(\mathrm{SAT}(\mathcal{F})\) but some item \(I_i\) with \(P_i \neq \mathit{false}\) is unreachable on every feasible path.

Dead Code Detection

Consider:

  • \(I_1\): \(S_1 \in [1, 100]\), \(P_1 = \mathit{true}\), \(Q_1 = (S_1 > 80)\)
  • \(I_2\): \(S_2 \in \{0, 1\}\), \(P_2 = (S_1 < 50)\), \(Q_2 = \mathit{true}\)

The dependency graph requires \(I_1 \to I_2\) (since \(P_2\) references \(S_1\)).

Per-item analysis:

  • \(P_2 = (S_1 < 50)\) is CONDITIONAL: \(\mathrm{SAT}(B \wedge S_1 < 50)\) with \(S_1 = 25\)
  • \(Q_1 = (S_1 > 80)\) is CONSTRAINING: \(\mathrm{SAT}(W_1)\) with \(S_1 = 25\)

Global analysis:

\[ \mathcal{F} = B \wedge (\mathit{true} \Rightarrow S_1 > 80) \wedge ((S_1 < 50) \Rightarrow \mathit{true}) \equiv B \wedge (S_1 > 80) \]

This is satisfiable (e.g., \(S_1 = 90\)), so \(\mathrm{SAT}(\mathcal{F})\).

Path-based analysis: Any path including \(I_2\) requires:

\[ \Phi_\pi = B \wedge (P_1 \wedge Q_1) \wedge (P_2 \wedge Q_2) \equiv B \wedge (S_1 > 80) \wedge (S_1 < 50) \]

This is unsatisfiable—no value of \(S_1\) can satisfy both \(S_1 > 80\) and \(S_1 < 50\).

Therefore, \(I_2\) is classified as CONDITIONAL by per-item analysis (reachable when \(S_1 < 50\)), the global check passes, but \(I_2\) is actually unreachable on any valid execution path. This is dead code that path-based validation detects.

Accumulated Reachability

In practice, path enumeration leverages the dependency graph structure. The topological ordering computed by Kahn's algorithm (see Dependency Analysis) determines the evaluation order.

Definition (Accumulated Reachability): For item \(I_i\), let \(\mathrm{Pred}(i) = \{j : I_j \prec I_i\}\) be the set of predecessor items, where \(I_j \prec I_i\) denotes that \(I_j\) precedes \(I_i\) in the topological order. The accumulated constraint formula is:

\[ \mathcal{A}_i := B \wedge \bigwedge_{j \in \mathrm{Pred}(i)} (P_j \Rightarrow Q_j) \]

Item \(I_i\) is accumulated-reachable if \(\mathrm{SAT}(\mathcal{A}_i \wedge P_i)\).

Accumulated Reachability Analysis

Consider a three-item questionnaire:

  • \(I_1\): "Annual income" with \(S_1 \in [0, 1000000]\), \(P_1 = \mathit{true}\), \(Q_1 = (S_1 \geq 50000)\)
  • \(I_2\): "Tax bracket" with \(S_2 \in \{1, 2, 3\}\), \(P_2 = \mathit{true}\), \(Q_2 = \mathit{true}\)
  • \(I_3\): "Low-income assistance" with \(S_3 \in \{0, 1\}\), \(P_3 = (S_1 < 30000)\), \(Q_3 = \mathit{true}\)

The dependency graph has \(I_1 \to I_3\) (since \(P_3\) references \(S_1\)), while \(I_2\) is independent.

Per-item analysis:

  • \(P_3 = (S_1 < 30000)\) is CONDITIONAL: \(\mathrm{SAT}(B \wedge S_1 < 30000)\) holds

Accumulated reachability for \(I_3\):

The predecessor set is \(\mathrm{Pred}(3) = \{1\}\) (only \(I_1\) precedes \(I_3\) in the dependency order; \(I_2\) is independent and not a predecessor).

\[ \mathcal{A}_3 = B \wedge (P_1 \Rightarrow Q_1) = B \wedge (\mathit{true} \Rightarrow S_1 \geq 50000) = B \wedge (S_1 \geq 50000) \]

Now check accumulated reachability:

\[ \mathcal{A}_3 \wedge P_3 = B \wedge (S_1 \geq 50000) \wedge (S_1 < 30000) \]

This is unsatisfiable—no income value can be both \(\geq 50000\) and \(< 30000\). Therefore \(I_3\) is not accumulated-reachable.

The questionnaire designer intended \(I_3\) (low-income assistance) to appear for respondents with income below $30,000, but \(Q_1\) requires income \(\geq\) $50,000 to proceed. This is dead code that accumulated reachability analysis detects.

If \(\mathrm{UNSAT}(\mathcal{A}_i \wedge P_i)\) for a CONDITIONAL item, the item is dead code—its precondition can never be satisfied given the constraints imposed by predecessor items.

Validation Hierarchy

Questionnaire validation can be performed at three levels of increasing thoroughness and computational cost:

Level 1: Per-Item Validation

For each item \(I_i\), check independently:

  • Reachability: Classify \(P_i\) as ALWAYS, NEVER, or CONDITIONAL using \(\mathrm{SAT}(B \wedge P_i)\)
  • Validity: Classify \(Q_i\) as TAUTOLOGICAL, CONSTRAINING, or INFEASIBLE using the witness formula \(W_i\)

This is the fastest level—each item can be checked independently and in parallel.

See Preconditions and Postconditions for details.

Level 2: Global Validation

Check \(\mathrm{SAT}(\mathcal{F})\) where \(\mathcal{F} = B \wedge \bigwedge_i (P_i \Rightarrow Q_i)\).

This verifies that at least one valid questionnaire completion exists, but uses implications that are vacuously true when preconditions are false.

Level 3: Path-Based Validation

Enumerate all execution paths through the questionnaire and check each path's accumulated constraints.

This is the most thorough level, detecting items that become unreachable due to accumulated postconditions (dead code).

Relationships Between Levels

The three validation levels form a hierarchy with precise logical relationships:

Per-item passes \(\Rightarrow\) Global passes
By the Soundness theorem, if all \(W_i\) are UNSAT (all postconditions TAUTOLOGICAL), then \(\mathrm{SAT}(\mathcal{F})\) is guaranteed. No further checks needed.
Global fails \(\Rightarrow\) Path-based fails
If \(\mathrm{UNSAT}(\mathcal{F})\), then no execution path can be valid. The global formula is a necessary condition for questionnaire validity.
Global passes \(\not\Rightarrow\) All paths valid
The global check can pass while some items are unreachable, because implications \((P_i \Rightarrow Q_i)\) are vacuously true when \(P_i\) is false.

When Each Level Suffices

Per-item validation suffices
When all postconditions are TAUTOLOGICAL. The questionnaire has no constraining logic, so any answer combination is valid.
Global validation suffices
When you only need to verify that some valid completion exists. Dead code (unreachable items) is acceptable.
Path-based validation is needed
When you require that every conditional item is actually reachable on at least one valid path. This detects design errors where accumulated postconditions make items unreachable.

Summary of Validation Results

All \(W_i\) UNSAT
All postconditions are TAUTOLOGICAL. By the Soundness theorem, global and path-based validity are guaranteed—no further checks needed.
For some \(i\), \(\mathrm{SAT}(W_i)\), \(\mathrm{UNSAT}(\mathcal{F})\)
Accumulated CONSTRAINING postconditions conflict. By the Global Validity theorem, no execution path is valid. This is a design error.
For some \(i\), \(\mathrm{SAT}(W_i)\), \(\mathrm{SAT}(\mathcal{F})\), all items accumulated-reachable
The questionnaire is fully valid—completable with no dead code.
For some \(i\), \(\mathrm{SAT}(W_i)\), \(\mathrm{SAT}(\mathcal{F})\), some items not accumulated-reachable
The questionnaire is completable, but contains dead code (unreachable items). Report as warning.
Some item INFEASIBLE
The postcondition cannot be satisfied when the item is reached—design error requiring fix.
Some item NEVER reachable (per-item)
Precondition is unsatisfiable even without accumulated constraints—design error.

Visual Summary

graph TD
    A[Base Constraints B] --> B{SAT?}
    B -->|UNSAT| C[All W_i are UNSAT<br/>F is UNSAT<br/>Broken base system]
    B -->|SAT| D{All W_i UNSAT?}
    D -->|Yes| E[F is SAT<br/>Questionnaire consistent<br/>Per-item checks sufficient]
    D -->|No| F{Some W_i SAT}
    F --> G{Check SAT of F}
    G -->|SAT| H{All items<br/>accumulated-reachable?}
    G -->|UNSAT| I[F is UNSAT<br/>No valid completion path]
    H -->|Yes| J[Fully valid<br/>No dead code]
    H -->|No| K[Completable but<br/>contains dead code]

Practical Validation Strategy

Given these relationships, a sound validation strategy proceeds in layers:

Layer 1: Base Constraint Check

  1. Verify \(\mathrm{SAT}(B)\) first
  2. If UNSAT: Report fundamental design error in domain constraints
  3. If SAT: Proceed to per-item analysis

Layer 2: Per-Item Analysis

  1. For each item \(i\):
  2. Check precondition reachability: ALWAYS, NEVER, or CONDITIONAL
  3. Check postcondition status: VACUOUS, TAUTOLOGICAL, CONSTRAINING, or INFEASIBLE
  4. Report specific violating assignments as diagnostics
  5. Flag NEVER-reachable items and INFEASIBLE postconditions as design errors

Layer 3: Global Consistency (Optional)

  1. If high confidence is needed, perform global check:
  2. Solve \(\mathrm{SAT}(\mathcal{F})\) to confirm at least one valid completion path exists
  3. If UNSAT despite per-item validity, investigate inter-item conflicts
  4. If SAT, optionally report witness assignment as example valid survey path

Layer 4: Path-Based Analysis (Optional)

  1. For comprehensive validation:
  2. Check accumulated reachability for each CONDITIONAL item
  3. Report items that become unreachable as dead code warnings
  4. Ensure all intended survey paths are actually feasible

Practical Validation Heuristic

For most questionnaires:

  • If \(\mathrm{SAT}(B)\) and all items pass per-item checks (\(\mathrm{UNSAT}(W_i)\)), the questionnaire is valid
  • Global check is only needed when:
    • Extra confidence is required (critical surveys)
    • Some per-item violations exist but you suspect global consistency
    • Debugging complex inter-item dependencies
  • Path-based analysis is needed when:
    • You need to ensure no dead code exists
    • Complex conditional logic with many CONSTRAINING postconditions

Further Reading