
Questionnaire Analysis: Foundations

This document introduces the formal mathematical framework for questionnaire analysis used in Askalot. We define the core concepts of questionnaires as mathematical objects and establish the notation used throughout the theory section.

Overview

Askalot treats questionnaires as formal systems that can be mathematically analyzed for consistency, reachability, and logical coherence. This approach enables automated detection of design flaws before surveys go live, ensuring respondents never encounter impossible situations or unreachable questions.

Why Formal Analysis Matters

Traditional survey platforms allow designers to create logically inconsistent questionnaires, containing flaws such as:

  • Questions that can never be reached due to impossible preconditions
  • Postcondition constraints that cannot be satisfied
  • Circular dependencies between questions
  • Global inconsistencies where no valid survey completion path exists

Askalot's formal framework detects these issues automatically, ensuring survey quality before deployment.

Axiomatic Building Blocks

We build up the formal definitions from fundamental components to complete questionnaire structures.

Items

Definition (Item): An item \(I_i\) is a questionnaire element that presents content to respondents and optionally collects responses. Items may take several forms:

  • Informational items: Comments or instructions without associated outcomes (no answer variable)
  • Scalar questions: Single questions with a single integer outcome
  • Question groups: Collections of related sub-questions sharing common logical properties
  • Matrix questions: Grid-based questions where all cells share a common domain (e.g., rating scales)

Items form the basic structural units of a questionnaire, indexed as \(I_1, I_2, \dots, I_n\) for a questionnaire with \(n\) items.

Real-Life Items

  • Informational: "Please answer the following questions honestly."
  • Scalar: "What is your age?" (single integer response)
  • Question group: "Rate your satisfaction with our service in the following areas: delivery, product quality, customer support" (multiple related ratings)
  • Matrix: "Rate each product (rows) on the following attributes (columns): price, quality, design" (grid of ratings)

Answer Variables (Outcome Variables)

Definition (Outcome Variable): For each item \(I_i\) that collects responses, there is an associated outcome variable \(S_i\) representing the respondent's answer. The type of \(S_i\) depends on the item type:

  • For scalar questions: \(S_i \in \mathbb{Z}\) (single integer)
  • For question groups (vector outcomes): \(S_i \in \mathbb{Z}^k\) (vector of \(k\) integers)
  • For matrix questions: \(S_i \in \mathbb{Z}^{r \times c}\) (matrix of \(r \times c\) integers; \(r\) and \(c\) are used here because \(n\) already denotes the number of items)
  • For informational items: no associated outcome variable

We denote the complete vector of outcome variables as \(\mathbf{S} = (S_1, \dots, S_n)\).
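
As a rough illustration, the outcome types above can be mirrored with plain Python values. The names `Outcome` and `outcome_kind` are hypothetical and are not part of Askalot; this is only a sketch of the shapes involved, assuming tuples are used for vector and matrix outcomes.

```python
from typing import Optional, Tuple, Union

# Hypothetical type aliases mirroring the outcome types listed above.
Scalar = int                           # scalar question: a single integer
Vector = Tuple[int, ...]               # question group: k integers
Matrix = Tuple[Tuple[int, ...], ...]   # matrix question: a grid of integers
Outcome = Optional[Union[Scalar, Vector, Matrix]]  # None: informational item


def outcome_kind(value: Outcome) -> str:
    """Classify an outcome value by its shape."""
    if value is None:
        return "informational"
    if isinstance(value, int):
        return "scalar"
    if all(isinstance(cell, int) for cell in value):
        return "vector"
    return "matrix"


print(outcome_kind(34))                      # scalar
print(outcome_kind((4, 5, 3)))               # vector
print(outcome_kind(((4, 5, 3), (2, 2, 5))))  # matrix
```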

Simplified Presentation

For clarity, this foundational section focuses on scalar items with single integer outcomes. Complex item types (vectors, matrices) are covered in Complex Types.

Assignment: An assignment is a vector \(\mathbf{a} := (a_1, \dots, a_n)\) that gives each outcome variable \(S_i\) the value \(a_i\). We write \(\mathbf{a} \models \phi\) to denote that the formula \(\phi\) evaluates to true under the assignment \(\mathbf{a}\).
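
One way to make the satisfaction relation concrete is to model an assignment as a tuple of integers and a formula as a Boolean-valued function of that tuple. The sketch below assumes this encoding; `Assignment`, `Formula`, `satisfies`, and the example formula `phi` are illustrative names, not Askalot identifiers.

```python
from typing import Callable, Tuple

Assignment = Tuple[int, ...]             # a = (a_1, ..., a_n)
Formula = Callable[[Assignment], bool]   # phi, evaluated under an assignment


def satisfies(a: Assignment, phi: Formula) -> bool:
    """Return True when a |= phi, i.e. phi evaluates to true under a."""
    return phi(a)


# Example formula (S_1 >= 18) and (S_2 = 1); S_i is stored at index i - 1.
def phi(a: Assignment) -> bool:
    return a[0] >= 18 and a[1] == 1


print(satisfies((34, 1), phi))  # True
print(satisfies((16, 1), phi))  # False
```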

Ranges and Base Domain

Definition (Domain Constraint): Each outcome variable \(S_i\) has an associated domain constraint \(D_i(S_i)\) that restricts its valid values. For scalar outcomes, the domain can be specified in two ways:

  • Range constraint: \(D_i(S_i) : \ell_i \leq S_i \leq u_i\), where \(\ell_i\) and \(u_i\) are the lower and upper bounds, respectively
  • Enumeration constraint: \(D_i(S_i) : S_i \in \{v_1, v_2, \ldots, v_k\}\), where \(\{v_1, v_2, \ldots, v_k\}\) is a finite set of allowed integer values

The conjunction of all domain constraints forms the base constraint:

\[ B := \bigwedge_{i=1}^n D_i(S_i) \]

The set of domain-respecting assignments, that is, the assignments satisfying \(B\), is:

\[ \mathrm{Dom} := \bigl\{\, \mathbf{a} \mid \bigwedge_{i=1}^{n} D_i(a_i) \text{ holds } \bigr\} \]

Real-Life Domain Constraints

  • Age question: \(D_1(S_1) : 0 \leq S_1 \leq 120\)
  • Satisfaction rating: \(D_2(S_2) : 1 \leq S_2 \leq 5\)
  • Multiple choice: \(D_3(S_3) : S_3 \in \{1, 2, 3, 4\}\) where 1=Strongly Disagree, 2=Disagree, 3=Agree, 4=Strongly Agree
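
Because every domain in this example is finite, the set \(\mathrm{Dom}\) can be enumerated directly. The following sketch treats the three example domains as one questionnaire; the names `domains`, `in_dom`, and `dom` are hypothetical.

```python
from itertools import product

# Domains from the examples above; position i - 1 holds the domain of S_i.
domains = (
    range(0, 121),   # D_1: 0 <= S_1 <= 120
    range(1, 6),     # D_2: 1 <= S_2 <= 5
    (1, 2, 3, 4),    # D_3: S_3 in {1, 2, 3, 4}
)


def in_dom(a):
    """Check the base constraint B: every a_i lies in its domain D_i."""
    return len(a) == len(domains) and all(v in d for v, d in zip(a, domains))


# Dom is finite here, so it can be listed outright.
dom = list(product(*domains))

print(len(dom))              # 121 * 5 * 4 = 2420
print(in_dom((34, 5, 2)))    # True
print(in_dom((130, 5, 2)))   # False: the age is outside [0, 120]
```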

Preconditions and Postconditions

Definition (Precondition): Each item \(I_i\) has a precondition \(P_i\), a Boolean formula over the outcome variables \(\mathbf{S} = (S_1, \dots, S_n)\) that determines whether the item is presented to the respondent.

  • If an assignment \(\mathbf{a}\) satisfies \(P_i\) (written \(\mathbf{a} \models P_i\)), then item \(I_i\) is accessible (asked)
  • If \(\mathbf{a} \not\models P_i\), then item \(I_i\) is not presented

Preconditions encode the conditional logic that creates branching paths through the questionnaire based on previous answers.

Real-Life Preconditions

Health Survey:

  • \(I_1\): "Do you have children?" with \(S_1 \in \{0, 1\}\) (0=No, 1=Yes)
  • \(I_2\): "How many children do you have?" with precondition \(P_2 = (S_1 = 1)\)
  • \(I_3\): "What is the age of your oldest child?" with precondition \(P_3 = (S_2 > 0)\)

Here \(I_2\) is only asked if the respondent has children, and \(I_3\) is only asked if they reported having at least one child.
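
The branching in this example can be sketched by evaluating each precondition against an assignment. Two assumptions are made here that the example does not state: \(P_1\) is taken to be always true, and variables of items that are not asked are set to 0. The name `asked_items` is hypothetical.

```python
# Preconditions from the example; S_i is stored at index i - 1.
# Assumption: P_1 is always true (the example gives no precondition for I_1).
preconditions = (
    lambda a: True,        # P_1
    lambda a: a[0] == 1,   # P_2 = (S_1 = 1)
    lambda a: a[1] > 0,    # P_3 = (S_2 > 0)
)


def asked_items(a):
    """Return the 1-based indices of items whose precondition holds under a."""
    return [i + 1 for i, p in enumerate(preconditions) if p(a)]


# Assumption: variables of unasked items are filled with 0.
print(asked_items((0, 0, 0)))   # [1]
print(asked_items((1, 2, 9)))   # [1, 2, 3]
```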

Definition (Postcondition): Each item \(I_i\) has a postcondition \(Q_i\), a Boolean formula over the outcome variables \(\mathbf{S} = (S_1, \dots, S_n)\) that constrains valid responses for that item.

  • Postconditions are only evaluated when the item is asked (i.e., when \(\mathbf{a} \models P_i\) for the current assignment \(\mathbf{a}\))
  • Valid questionnaire responses must satisfy: \(P_i \implies Q_i\) (if the precondition holds, then the postcondition must hold)

Postconditions ensure logical consistency between answers and can dynamically restrict acceptable values based on prior responses.

Real-Life Postconditions

Income Survey:

  • \(I_1\): "Your personal income" with \(S_1 \in [0, 1000000]\)
  • \(I_2\): "Your spouse's income" with \(S_2 \in [0, 1000000]\)
  • \(I_3\): "Total household income" with postcondition \(Q_3 = (S_3 \geq S_1 + S_2)\)

The postcondition ensures logical consistency: household income must be at least the sum of individual incomes (other household members may contribute additional income).
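
The validity condition \(P_3 \implies Q_3\) for this example can be checked as follows. The sketch assumes \(P_3\) is always true, since the example states no precondition for \(I_3\); the function names are illustrative only.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: p => q."""
    return (not p) or q


# Assumption: I_3 has no stated precondition, so P_3 is taken to be true.
def p3(a) -> bool:
    return True


# Q_3 = (S_3 >= S_1 + S_2); S_i is stored at index i - 1.
def q3(a) -> bool:
    return a[2] >= a[0] + a[1]


def item3_valid(a) -> bool:
    """Check P_3 => Q_3 under the assignment a."""
    return implies(p3(a), q3(a))


print(item3_valid((50000, 30000, 90000)))  # True: 90000 >= 80000
print(item3_valid((50000, 30000, 70000)))  # False: 70000 < 80000
```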

Satisfiability Notation

We use standard satisfiability notation:

  • \(\mathrm{SAT}(\phi)\) indicates \(\phi\) has a model (is satisfiable)
  • \(\mathrm{UNSAT}(\phi)\) indicates no model exists (is unsatisfiable)
  • Semantic entailment is written \(\Gamma \models \phi\)
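
For small finite domains, \(\mathrm{SAT}\) and \(\mathrm{UNSAT}\) can be decided by exhaustive search. The sketch below is a brute-force illustration under that assumption, not a description of how Askalot decides satisfiability; `find_model`, `sat`, and `unsat` are hypothetical helpers.

```python
from itertools import product


def find_model(phi, domains):
    """Return an assignment from the finite domains satisfying phi, or None."""
    for a in product(*domains):
        if phi(a):
            return a
    return None


def sat(phi, domains):
    """SAT(phi): some assignment satisfies phi."""
    return find_model(phi, domains) is not None


def unsat(phi, domains):
    """UNSAT(phi): no assignment satisfies phi."""
    return not sat(phi, domains)


domains = (range(1, 121),)  # one variable with 1 <= S_1 <= 120

print(find_model(lambda a: a[0] >= 18, domains))          # (18,)
print(unsat(lambda a: a[0] >= 18 and a[0] < 10, domains))  # True
```

Exhaustive search grows with the product of the domain sizes, so it is only practical for toy examples like this one.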

Logic Notation

The symbol "\(\models\)" is used in two related senses following standard logical notation:

  • \(\mathbf{a} \models \phi\) means assignment \(\mathbf{a}\) (a valuation) satisfies formula \(\phi\) (the formula evaluates to true under that assignment)
  • \(\Gamma \models \phi\) means formula set \(\Gamma\) entails formula \(\phi\) (every model of \(\Gamma\) is also a model of \(\phi\))

Context determines which sense applies:

  • If the left side is an assignment (valuation), it denotes satisfaction
  • If the left side is a formula or set of formulas, it denotes entailment

Entailment in Questionnaire Analysis

Consider a health survey with age-restricted questions about alcohol consumption:

  • \(I_1\): "What is your age?" with \(S_1 \in [1, 120]\)
  • \(I_2\): "Do you consume alcohol?" with \(S_2 \in \{0, 1\}\) (0=No, 1=Yes)
  • \(I_3\): "How many drinks per week?" with \(S_3 \in [0, 100]\)

With constraints:

  • \(P_2 = (S_1 \geq 18)\): the alcohol question is asked only if age ≥ 18
  • \(P_3 = (S_1 \geq 18) \land (S_2 = 1)\): the drinks question is asked only if the respondent is of legal age and consumes alcohol
  • \(Q_3 = (S_3 > 0)\): a respondent who answers the drinks question must report at least 1 drink

Let \(\Gamma = \{S_1 \geq 18,\, S_2 = 1,\, P_3,\, Q_3\}\) represent "person is 18+, consumes alcohol, item 3 is reached, and its postcondition holds."

Let \(\phi = (S_3 > 0)\) be "person drinks at least 1 drink per week."

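Then \(\Gamma \models \phi\) because every assignment satisfying all formulas in \(\Gamma\) must also satisfy \(\phi\). In this example the entailment is immediate: \(Q_3 \in \Gamma\), and \(Q_3\) is the same formula as \(\phi\).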
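
The entailment can be checked mechanically by searching for a counterexample, that is, an assignment that satisfies every formula in \(\Gamma\) but not \(\phi\). The sketch below restricts the search to the domain-respecting assignments of this example, which is an assumption of the sketch; `counterexample` and `entails` are hypothetical names.

```python
from itertools import product

# Domains from the example: S_1 in [1, 120], S_2 in {0, 1}, S_3 in [0, 100].
domains = (range(1, 121), (0, 1), range(0, 101))


def counterexample(gamma, phi):
    """Return an assignment satisfying all of gamma but not phi, or None."""
    for a in product(*domains):
        if all(g(a) for g in gamma) and not phi(a):
            return a
    return None


def entails(gamma, phi):
    """Gamma |= phi holds when no counterexample exists."""
    return counterexample(gamma, phi) is None


# Formulas from the example; S_i is stored at index i - 1.
adult = lambda a: a[0] >= 18
drinks = lambda a: a[1] == 1
p3 = lambda a: a[0] >= 18 and a[1] == 1
q3 = lambda a: a[2] > 0
phi = lambda a: a[2] > 0

print(entails([adult, drinks, p3, q3], phi))     # True
print(counterexample([adult, drinks, p3], phi))  # (18, 1, 0)
```

Dropping \(Q_3\) from \(\Gamma\) breaks the entailment: an 18-year-old who consumes alcohol but reports 0 drinks satisfies the remaining formulas without satisfying \(\phi\).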

Questionnaire Structure

Having defined the building blocks, we now formally define a complete questionnaire structure.

Definition (Questionnaire): A questionnaire \(\mathcal{G}\) is a tuple \((\mathcal{I}, \mathbf{S}, \mathcal{D}, \mathcal{P}, \mathcal{Q}, I_{\mathrm{start}})\) where:

  • \(\mathcal{I} = \{I_1, \dots, I_n\}\) is a finite set of items
  • \(\mathbf{S} = (S_1, \dots, S_n)\) is a vector of outcome variables
  • \(\mathcal{D} = (D_1, \dots, D_n)\) where \(D_i(S_i)\) specifies the domain constraint for \(S_i\)
  • \(\mathcal{P} = (P_1, \dots, P_n)\) where \(P_i\) is the precondition formula for \(I_i\) over variables \(\mathbf{S}\)
  • \(\mathcal{Q} = (Q_1, \dots, Q_n)\) where \(Q_i\) is the postcondition formula for \(I_i\) over variables \(\mathbf{S}\)
  • \(I_{\mathrm{start}} \in \mathcal{I}\) is the designated starting item
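
As a minimal sketch, the tuple can be held in a small record type. The class below is hypothetical and not Askalot's data model: the outcome vector \(\mathbf{S}\) is implicit in the positions, formulas are Python callables over an assignment tuple, and the starting item is stored as a zero-based index.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Assignment = Tuple[int, ...]
Formula = Callable[[Assignment], bool]


@dataclass(frozen=True)
class Questionnaire:
    """Hypothetical container for (I, S, D, P, Q, I_start)."""
    items: Tuple[str, ...]                 # I: item texts, one per item
    domains: Tuple[Sequence[int], ...]     # D: allowed values of each S_i
    preconditions: Tuple[Formula, ...]     # P
    postconditions: Tuple[Formula, ...]    # Q
    start: int = 0                         # index of I_start

    def __post_init__(self) -> None:
        n = len(self.items)
        if not (len(self.domains) == len(self.preconditions)
                == len(self.postconditions) == n):
            raise ValueError("D, P and Q need exactly one entry per item")
        if not 0 <= self.start < n:
            raise ValueError("start must index an existing item")


always = lambda a: True  # assumption: stands in for conditions not stated

g = Questionnaire(
    items=("Do you have children?", "How many children do you have?"),
    domains=((0, 1), range(0, 21)),        # the second domain is assumed
    preconditions=(always, lambda a: a[0] == 1),
    postconditions=(always, always),
)
print(len(g.items), g.start)  # 2 0
```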

Key Validation Questions

Given a questionnaire \(\mathcal{G}\), we can ask:

  1. Reachability: Is item \(I_i\) ever reachable? (See Preconditions and Postconditions)
  2. Constraint feasibility: Can postcondition \(Q_i\) be satisfied when \(P_i\) holds? (See Preconditions and Postconditions)
  3. Dependency cycles: Do question dependencies form cycles that prevent evaluation? (See Dependency Analysis and Cycle Detection)
  4. Global consistency: Does there exist at least one valid completion path through the survey? (See Global vs Local Validity)

The following pages in this theory section address each of these questions with precise mathematical definitions and algorithms.
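
To give a feel for the first two questions, they can be phrased as satisfiability queries over the domain-respecting assignments. The reading used below, \(\mathrm{SAT}(B \land P_i)\) for reachability and \(\mathrm{SAT}(B \land P_i \land Q_i)\) for constraint feasibility, is an assumption made for this sketch; the precise definitions are the ones given on the later pages. The function names are hypothetical, and the formulas reuse the alcohol example.

```python
from itertools import product

# Domains from the alcohol example: S_1 in [1, 120], S_2 in {0, 1}, S_3 in [0, 100].
domains = (range(1, 121), (0, 1), range(0, 101))


def reachable(p):
    """Assumed reading of reachability: SAT(B and P_i)."""
    return any(p(a) for a in product(*domains))


def feasible(p, q):
    """Assumed reading of feasibility: SAT(B and P_i and Q_i)."""
    return any(p(a) and q(a) for a in product(*domains))


p3 = lambda a: a[0] >= 18 and a[1] == 1   # P_3 from the example
q3 = lambda a: a[2] > 0                   # Q_3 from the example

print(reachable(p3))                       # True
print(reachable(lambda a: a[0] > 120))     # False: no age in [1, 120] exceeds 120
print(feasible(p3, q3))                    # True
print(feasible(p3, lambda a: a[2] > 100))  # False: S_3 is capped at 100
```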

Further Reading