Dependency Analysis and Cycle Detection¶

This document explains how Askalot analyzes dependencies between questionnaire items to ensure they can be evaluated in a consistent order and to detect circular dependencies that would make evaluation impossible.

Overview¶

To ensure questionnaires can be evaluated in a consistent order, we analyze dependencies between items. Understanding which items depend on which is crucial for:

Evaluation order: Items must be evaluated after their dependencies
Cycle detection: Circular dependencies make on-the-fly navigation impossible
Coverage analysis: Identifying which item answers affect which subsequent items

Path in Dependency Graph¶

Definition (Path): A path from item \(I_j\) to item \(I_i\) in the dependency graph \(G = (V, E)\) is a sequence of distinct items:

\[ \pi = (I_j = I_{k_0}, I_{k_1}, \dots, I_{k_m} = I_i) \]

where \(m \geq 0\) and for each \(\ell \in \{0, \dots, m-1\}\), there exists an edge \((k_\ell \to k_{\ell+1}) \in E\).

When \(m = 0\), the path consists of a single item (\(I_j = I_i\)), representing a trivial path from an item to itself.

Dependency Graph Construction¶

The dependency graph \(G = (V, E)\) captures data flow dependencies between items through both direct variable usage and intermediate computations. Construction proceeds in three stages.

Stage 1: Build Data Flow Graph¶

Through static analysis of preconditions, postconditions, and code blocks, we construct an intermediate directed graph where vertices represent both items and variables:

Item vertices: Each questionnaire item \(I_i\) is a vertex
Variable vertices: Each variable (item outcome \(S_i\) or intermediate variable \(v\)) is a vertex

We extract dependency edges by analyzing all formulas and code blocks:

When item \(I_j\) is answered, its outcome \(S_j\) is assigned \(\Rightarrow\) edge \((I_j \to S_j)\)
When code assigns item outcome to variable \(v\) (e.g., \(v = S_j\)) \(\Rightarrow\) edge \((S_j \to v)\)
When code modifies variable \(v\) based on \(S_j\) (e.g., \(v \mathrel{+}= S_j\)) \(\Rightarrow\) edge \((S_j \to v)\)
When variable \(v\) is used in \(I_i\)'s precondition \(P_i\) or postcondition \(Q_i\) \(\Rightarrow\) edge \((v \to I_i)\)
When \(S_j\) is directly used in \(P_i\) or \(Q_i\) \(\Rightarrow\) edge \((S_j \to I_i)\)

Data Flow Graph Construction

Survey with Intermediate Calculations:

\(I_1\): "Your monthly rent" with outcome \(S_1\)
\(I_2\): "Your monthly utilities" with outcome \(S_2\)
Code block: \(\mathtt{total\_housing} = S_1 + S_2\)
\(I_3\): "Other monthly expenses" with outcome \(S_3\), precondition \(P_3 = (\mathtt{total\_housing} < 3000)\)

Data flow edges:

\((I_1 \to S_1)\) — answering \(I_1\) assigns \(S_1\)
\((I_2 \to S_2)\) — answering \(I_2\) assigns \(S_2\)
\((S_1 \to \mathtt{total\_housing})\) — \(S_1\) used to compute \(\mathtt{total\_housing}\)
\((S_2 \to \mathtt{total\_housing})\) — \(S_2\) used to compute \(\mathtt{total\_housing}\)
\((\mathtt{total\_housing} \to I_3)\) — \(\mathtt{total\_housing}\) used in \(P_3\)

Stage 2: Compute Item-to-Item Dependencies¶

From the data flow graph, we construct the item dependency graph \(G = (V, E)\) where \(V = \{I_1, \dots, I_n\}\) contains only item vertices.

An edge \((I_j \to I_i) \in E\) exists if and only if there is a path in the data flow graph from \(I_j\) to \(I_i\) through any sequence of variable vertices.

This captures transitive dependencies through intermediate variables: if \(I_j\)'s outcome flows through variables \(v_1, \dots, v_k\) to affect \(I_i\)'s evaluation, then \(I_i\) depends on \(I_j\).

Item Dependency Extraction

Continuing the previous example, the data flow graph contains paths:

\[ I_1 \to S_1 \to \mathtt{total\_housing} \to I_3 \]

\[ I_2 \to S_2 \to \mathtt{total\_housing} \to I_3 \]

Resulting item dependencies:

\((I_1 \to I_3)\) — \(I_3\) depends on \(I_1\)
\((I_2 \to I_3)\) — \(I_3\) depends on \(I_2\)

Item \(I_3\) cannot be evaluated until both \(I_1\) and \(I_2\) have been answered.

General example: Consider code that introduces variable \(v\), assigns \(v = S_j\), then updates \(v = v + S_k\), and finally uses \(v\) in both \(P_i\) and \(P_m\). The data flow graph contains:

\[ I_j \to S_j \to v, \quad I_k \to S_k \to v, \quad v \to I_i, \quad v \to I_m \]

resulting in item dependencies: \((I_j \to I_i)\), \((I_j \to I_m)\), \((I_k \to I_i)\), \((I_k \to I_m)\).

Stage 3: Topological Ordering (Kahn's Algorithm)¶

To determine a valid evaluation order, we compute a topological sort of \(G\) using Kahn's algorithm:

Algorithm (Kahn's Algorithm for Topological Sorting):

Input: Dependency graph \(G = (V, E)\)

Output: Topological order \(L\), or indication of cycle

Initialize \(S \gets \{\text{items with in-degree } 0\}\), \(L \gets \text{empty list}\)
While \(S\) is non-empty:
Remove an item \(n\) from \(S\)
Add \(n\) to tail of \(L\)
For each item \(m\) with edge \((n \to m)\):
- Remove edge \((n \to m)\) from graph
- If \(m\) has in-degree \(0\), add \(m\) to \(S\)
If graph still has edges:
Return CYCLE DETECTED
Otherwise:
Return \(L\) as valid topological order

The resulting topological order \(L\) ensures that items are evaluated in dependency-respecting sequence: when item \(I_i\) is reached, all variables referenced by \(P_i\) have been assigned.

Topological Ordering

Survey with Dependencies:

\(I_1\): "Annual income" (no dependencies, in-degree 0)
\(I_2\): "Monthly savings" (no dependencies, in-degree 0)
\(I_3\): "Savings rate" with \(P_3\) depending on \(S_1\) and \(S_2\) (in-degree 2)
\(I_4\): "Financial health" with \(P_4\) depending on \(S_3\) (in-degree 1)

Dependency graph: \((I_1 \to I_3)\), \((I_2 \to I_3)\), \((I_3 \to I_4)\)

Kahn's algorithm execution:

Initially \(S = \{I_1, I_2\}\) (in-degree 0)
Remove \(I_1\), add to \(L = [I_1]\), decrement in-degree of \(I_3\) to 1
Remove \(I_2\), add to \(L = [I_1, I_2]\), decrement in-degree of \(I_3\) to 0
Add \(I_3\) to \(S\), remove \(I_3\), add to \(L = [I_1, I_2, I_3]\), decrement in-degree of \(I_4\) to 0
Add \(I_4\) to \(S\), remove \(I_4\), add to \(L = [I_1, I_2, I_3, I_4]\)
No edges remain, return \(L\)

Valid evaluation order: Items can be evaluated in the order \(I_1, I_2, I_3, I_4\) (or \(I_2, I_1, I_3, I_4\)).

Cycle Detection¶

A questionnaire has no cycles if \(G\) is a Directed Acyclic Graph (DAG). Equivalently, there is no sequence of distinct items \(i_1, i_2, \dots, i_k\) (\(k \geq 2\)) such that:

\[ i_1 \to i_2 \to \dots \to i_k \to i_1 \]

Kahn's algorithm detects cycles: if the algorithm terminates with edges remaining in the graph, those edges form cycles.

Cycles Threaten Navigation

Cycles threaten on-the-fly navigation because no item on the cycle can be evaluated first—each depends on others in the cycle.

Cycle Detection

Survey with Circular Dependency (Design Error):

\(I_1\): "Estimated total cost" with precondition \(P_1\) depending on \(S_3\)
\(I_2\): "Budget available" with precondition \(P_2\) depending on \(S_1\)
\(I_3\): "Cost overrun" with precondition \(P_3\) depending on \(S_2\)

Dependency graph: \((I_1 \to I_2)\), \((I_2 \to I_3)\), \((I_3 \to I_1)\)

Kahn's algorithm execution:

Initially \(S = \{\}\) (all items have in-degree 1, none can start)
Algorithm terminates immediately with edges remaining
Return CYCLE DETECTED: \(I_1 \to I_2 \to I_3 \to I_1\)

This circular dependency makes evaluation impossible—no item can be evaluated first because each depends on another item in the cycle.

Dependency Analysis with Complex Types¶

The dependency graph construction naturally extends to complex types (vectors and matrices). See Complex Types for details.

For vector and matrix items:

An edge \((I_j \to I_i)\) exists if \(P_i\) references any component of \(S_j\)
For vector \(S_j\): references to \(S_j[k]\) for any \(k\) create a dependency
For matrix \(S_j\): references to \(S_j[k][\ell]\) for any \(k, \ell\) create a dependency

Dependency analysis ensures questionnaires can be evaluated in a consistent order and detects design errors where circular dependencies would make navigation impossible.