LARA – Lab for Automated Reasoning and Analysis

Complexity of the verification process

Preliminary remarks

Terminology

We say that two patterns have the same signature if they are identical once their (optional) guards are removed.
The set of a pattern is taken to be the Cartesian product representing the set of inputs that the pattern would match.

Number of different positions

The maximum number of different positions $max_p(E)$ that can occur in a given pattern matching expression $E$ is at most linear in the size of the source code (whether that size is measured in tokens, AST nodes or characters does not matter).

Computing the set corresponding to a pattern can be done in linear time

…in the size of the source code, that is. A single traversal of the pattern visits each node once, and padding the result to $max_p(E)$ positions is linear as well.
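As an illustration, the following Python sketch computes positions and pattern sets for a toy pattern representation. The `Pat` class, the functions `positions`, `all_positions` and `pattern_set`, and the name `Any` for the universal set are all hypothetical, not taken from an existing implementation; wildcards (`_`) are assumed to leave their position unconstrained. `pattern_set` visits each node of the pattern once and then emits one component per position.

```python
from dataclasses import dataclass, field

@dataclass
class Pat:
    """Hypothetical pattern node: a class/extractor name, or "_" for a wildcard."""
    name: str
    args: list = field(default_factory=list)

def positions(p, path=()):
    """Yield every position of p, i.e. the path of argument indices to each node."""
    yield path
    for i, a in enumerate(p.args):
        yield from positions(a, path + (i,))

def all_positions(patterns):
    """Positions occurring in any pattern of the expression; there are max_p(E) of them."""
    found = set()
    for p in patterns:
        found.update(positions(p))
    return sorted(found)

def pattern_set(p, dims):
    """One component per position in dims: the name constraining that position,
    or "Any" (the universal set) if the pattern leaves it unconstrained."""
    constraint = {}

    def walk(q, path):
        if q.name != "_":
            constraint[path] = q.name
        for i, a in enumerate(q.args):
            walk(a, path + (i,))

    walk(p, ())
    return tuple(constraint.get(d, "Any") for d in dims)

if __name__ == "__main__":
    # case Cons(_, Nil) and case Nil
    pats = [Pat("Cons", [Pat("_"), Pat("Nil")]), Pat("Nil")]
    dims = all_positions(pats)          # [(), (0,), (1,)]
    print(pattern_set(pats[0], dims))   # ('Cons', 'Any', 'Nil')
    print(pattern_set(pats[1], dims))   # ('Nil', 'Any', 'Any')
```

Every pattern set has exactly `len(dims)` components, which is the uniform dimension relied upon below.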

Consistency of the dimensions of the cartesian products

For every pattern, we build a Cartesian product representing the set of inputs it would match. By construction, all these products have the same dimension, namely $max_p(E)$. In particular, the intersection of two “pattern sets” is again a Cartesian product of that dimension, and their union is a set of tuples of that same dimension (though not, in general, a Cartesian product itself).

Note that we now also have:

$\begin{equation*} (A_1 \times A_2 \times \ldots \times A_n) \cap (B_1 \times B_2 \times \ldots \times B_n) = \emptyset \iff (A_1 \cap B_1 = \emptyset ) \vee (A_2 \cap B_2 = \emptyset ) \vee \ldots \vee (A_n \cap B_n = \emptyset ) \end{equation*}$

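The proof is straightforward: $(A_1 \times \ldots \times A_n) \cap (B_1 \times \ldots \times B_n) = (A_1 \cap B_1) \times \ldots \times (A_n \cap B_n)$, and a Cartesian product is empty if and only if one of its factors is. It does, however, require the dimensions to match; otherwise the intersection would be empty for a different reason, namely that tuples of different lengths are never equal.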
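The equivalence, and the role of matching dimensions, can be checked mechanically on small finite sets. The following Python sketch compares both sides on random instances; the helper names are made up for the illustration, and a passing random test is evidence, not a proof.

```python
import random
from itertools import product

def cart(sets):
    """The Cartesian product of a list of finite sets, as a set of tuples."""
    return set(product(*sets))

def lhs(a, b):
    # (A_1 x ... x A_n) ∩ (B_1 x ... x B_n) = ∅
    return not (cart(a) & cart(b))

def rhs(a, b):
    # (A_1 ∩ B_1 = ∅) ∨ ... ∨ (A_n ∩ B_n = ∅); zip silently truncates on a length mismatch
    return any(not (x & y) for x, y in zip(a, b))

def random_sets(rng, n, universe):
    return [{u for u in universe if rng.random() < 0.5} for _ in range(n)]

def equivalence_holds(n, universe, trials=500, seed=0):
    """Compare both sides on random same-dimension instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = random_sets(rng, n, universe), random_sets(rng, n, universe)
        if lhs(a, b) != rhs(a, b):
            return False
    return True

if __name__ == "__main__":
    print(equivalence_holds(3, range(4)))
    # Different dimensions: no component-wise intersection is empty,
    # yet the products never meet because their tuples differ in length.
    a, b = [{1}, {2}], [{1}, {2}, {3}]
    print(lhs(a, b), rhs(a, b))
```

The last two lines show why the proof needs equal dimensions: the component-wise test says "not disjoint" while the products are in fact disjoint.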

Translation of the assumptions/axioms

We will always use unary relations to represent sets (and therefore set membership): $x \in A$ will be translated to $A(x)$ , and $x \notin A$ to $\neg A(x)$ .

For each statement that we want to prove or disprove, we make use of user-provided information about properties of the class hierarchy and of the extractors.

These properties are always in one of the following forms:

Set of extractors covering a given type: $E_1 \cup \ldots \cup E_n \supseteq T$
Sealed and abstract classes: $T = S_1 \cup \ldots \cup S_n$
Disjointness of extractors: $E \cap F = \emptyset$

These translate, respectively, to:

$\forall x . T(x) \Rightarrow E_1(x) \vee \ldots \vee E_n(x)$
$\forall x . S_1(x) \vee \ldots \vee S_n(x) \Leftrightarrow T(x)$
$\forall x . \neg (E(x) \wedge F(x))$
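One way to hand these axioms to an off-the-shelf solver is to emit them in SMT-LIB 2 syntax, with an uninterpreted sort `U` for values and one unary predicate per type or extractor. The sketch below is an assumption about how this could be done, not the tool's actual encoding; all function names are hypothetical.

```python
def app(rel, var="x"):
    """Membership x ∈ rel, written as the unary predicate application (rel x)."""
    return f"({rel} {var})"

def disj(terms):
    # SMT-LIB's `or` takes at least two arguments, so unwrap a singleton
    return terms[0] if len(terms) == 1 else "(or " + " ".join(terms) + ")"

def forall_x(body):
    return f"(assert (forall ((x U)) {body}))"

def covering(extractors, t):
    """E_1 ∪ ... ∪ E_n ⊇ T   ~>   ∀x. T(x) ⇒ E_1(x) ∨ ... ∨ E_n(x)"""
    return forall_x(f"(=> {app(t)} {disj([app(e) for e in extractors])})")

def sealed(t, subclasses):
    """T = S_1 ∪ ... ∪ S_n   ~>   ∀x. S_1(x) ∨ ... ∨ S_n(x) ⇔ T(x)  (`=` on Bool is ⇔)"""
    return forall_x(f"(= {disj([app(s) for s in subclasses])} {app(t)})")

def disjoint(e, f):
    """E ∩ F = ∅   ~>   ∀x. ¬(E(x) ∧ F(x))"""
    return forall_x(f"(not (and {app(e)} {app(f)}))")

def declarations(names):
    """One uninterpreted sort U for values, one unary predicate per set name."""
    return ["(declare-sort U 0)"] + [f"(declare-fun {n} (U) Bool)" for n in sorted(names)]

if __name__ == "__main__":
    lines = declarations({"T", "E1", "E2"})
    lines += [covering(["E1", "E2"], "T"), disjoint("E1", "E2")]
    print("\n".join(lines))
```

Every axiom uses the single bound variable `x`, mirroring the one-variable formulas above.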

Verifying disjointness

Recall that disjointness is verified for each pair of patterns. There are two cases:

Either the patterns have different signatures, in which case we ignore their guards and verify that the intersection of their sets is empty…
…or they have the same signature, in which case we verify that the conjunction of their guards is unsatisfiable.
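The case split can be sketched as follows. `Case`, `pair_disjoint` and `all_pairs_disjoint` are hypothetical names; the two checks are passed in as functions because the text delegates them to the component-wise test on pattern sets and to a QFBAPA satisfiability check, neither of which is implemented here. A missing guard is treated as `true`, and, as a simplification, a signature is represented by the component set names of its pattern set.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Case:
    """Hypothetical case: its signature and its guard ("true" when there is none)."""
    signature: Tuple[str, ...]
    guard: str = "true"

def pair_disjoint(p: Case, q: Case,
                  products_disjoint: Callable[[Tuple[str, ...], Tuple[str, ...]], bool],
                  guards_unsat: Callable[[str, str], bool]) -> bool:
    if p.signature != q.signature:
        # Different signatures: guards are ignored, which can only make the
        # check more conservative; only the pattern sets are compared.
        return products_disjoint(p.signature, q.signature)
    # Same signature: identical sets, so only the guards can separate the cases.
    return guards_unsat(p.guard, q.guard)

def all_pairs_disjoint(cases: List[Case], products_disjoint, guards_unsat) -> bool:
    """Disjointness is verified for every unordered pair of cases."""
    return all(pair_disjoint(p, q, products_disjoint, guards_unsat)
               for i, p in enumerate(cases) for q in cases[i + 1:])

if __name__ == "__main__":
    known = {frozenset(("Nil", "Cons"))}

    # Toy stand-ins for the two oracles, for illustration only.
    def products(a, b):
        return any(frozenset((x, y)) in known for x, y in zip(a, b))

    def guards(g, h):
        return {g, h} == {"x > 0", "x <= 0"}

    cases = [Case(("Nil",)), Case(("Cons",), "x > 0"), Case(("Cons",), "x <= 0")]
    print(all_pairs_disjoint(cases, products, guards))
```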

Assuming we limit our boolean expressions to QFBAPA (do we?), two guards $g_1$ and $g_2$ are disjoint if and only if $g_1 \wedge g_2$ is unsatisfiable, i.e. if and only if $\neg(g_1 \wedge g_2)$ is valid. Since $g_1 \wedge g_2$ is itself in QFBAPA, this takes a single call to a QFBAPA satisfiability procedure, and QFBAPA satisfiability is in NP [1]. This solves the second case.

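The first case is not much harder. We can use the above equivalence for the intersection of Cartesian products of the same dimension; each component then needs only a single quantified variable, and the same name $x$ can be reused in every component:

$(A_1 \times A_2 \times \ldots \times A_n) \cap (B_1 \times B_2 \times \ldots \times B_n) = \emptyset$

…is translated to:

$(\forall x . \neg(A_1(x) \wedge B_1(x))) \vee \ldots \vee (\forall x . \neg(A_n(x) \wedge B_n(x)))$

Each disjunct keeps its own quantifier: $\forall$ does not distribute over $\vee$, so pulling a single $\forall x$ out of the disjunction would yield a weaker, non-equivalent formula.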

Note that checking this formula should be doable in polynomial time, as all that needs to be done is to find one index $i$ for which $A_i$ and $B_i$ are disjoint, and this information can only come from the assumptions/axioms, of which there are finitely many. (I know this does not sound convincing; let's do better.)
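The polynomial check described here can be sketched as follows, assuming every known disjointness fact is listed explicitly as an unordered pair of set names. The names `disjoint_products` and `known` are hypothetical. The check is sound under that assumption, but it only finds disjointness stated directly, not disjointness that follows from combining several axioms (for instance a subclass of a type that is disjoint from some extractor).

```python
def disjoint_products(a, b, known):
    """Sufficient test for (A_1 x ... x A_n) ∩ (B_1 x ... x B_n) = ∅.
    a, b: tuples of set names of equal length; known: set of frozensets {E, F}
    each stating E ∩ F = ∅ directly (e.g. from the disjointness axioms)."""
    if len(a) != len(b):
        raise ValueError("pattern sets must have the same dimension")
    # One component with known-disjoint sets is enough (product equivalence).
    return any(frozenset((x, y)) in known for x, y in zip(a, b))

if __name__ == "__main__":
    known = {frozenset(("Nil", "Cons"))}
    print(disjoint_products(("Cons", "Any", "Nil"), ("Cons", "Any", "Cons"), known))  # True
    print(disjoint_products(("Cons", "Any", "Nil"), ("Cons", "Any", "Any"), known))   # False
```

The loop performs at most $n$ set lookups, one per component.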

We can also rely on the fact that only one quantified variable is introduced and that the size of the generated formula is linear in the size of the original one. This formula is hence in $[\exists^{*} \forall^{1}]_{=}$, and checking its validity can be done in NP time.

Verifying completeness/reachability

As an introductory remark, note that checking reachability of a pattern $p$ is equivalent to checking that the preceding patterns do not form a complete set over the values matched by $p$. With this in mind, observe that all formulas for checking completeness and reachability have the form:

$\begin{equation*} (S_1 \times \ldots \times S_n) \subseteq (A_1 \times \ldots \times A_n) \cup (\ldots) \cup (Z_1 \times \ldots \times Z_n) \end{equation*}$

When checking completeness, the set $(S_1 \times \ldots \times S_n)$ is the domain of the scrutinee; when checking reachability, it is the set of the observed pattern. (Note that this is in fact a subsumption check. Proper reachability checking would also verify that the pattern set is a subset of the scrutinee set, but this does not add any complexity.) Also note that the sets over which the union is taken correspond to patterns satisfying one of the following properties:

the pattern has no guard
the pattern has a guard, but there exists a set of patterns with the same signature such that the disjunction of their guards is equivalent to the true statement

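Again, checking the second property, i.e. that $\bigvee_i g_i$ is valid, amounts to checking that $\bigwedge_i \neg g_i$ is unsatisfiable, which takes a single call to a QFBAPA satisfiability procedure (in NP [1]).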
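For finite sets, the subsumption check behind both completeness and reachability can be brute-forced directly, which makes the shape of the meta-formula concrete. In this Python sketch (hypothetical helper names), each case is a list of finite component sets, and the caller is assumed to have already dropped the guarded patterns that do not satisfy the properties above. Enumeration is exponential in $n$, so this only illustrates what the logical encoding decides.

```python
from itertools import product

def subsumed(s, covers):
    """Brute force: is S_1 x ... x S_n contained in the union of the products in covers?"""
    return all(
        any(all(v in c_i for v, c_i in zip(tup, c)) for c in covers)
        for tup in product(*s)
    )

def complete(domain, cases):
    """Completeness: the scrutinee domain is covered by the (usable) cases."""
    return subsumed(domain, cases)

def reachable(cases, k):
    """Reachability of case k, as phrased in the text: the cases before it do not cover it."""
    return not subsumed(cases[k], cases[:k])

if __name__ == "__main__":
    domain = [{"Nil", "Cons"}, {0, 1}]
    cases = [[{"Nil"}, {0, 1}], [{"Cons"}, {0}], [{"Cons"}, {1}], [{"Cons"}, {0, 1}]]
    print(complete(domain, cases[:3]))  # True
    print(reachable(cases, 3))          # False: covered jointly by cases[1] and cases[2]
```

The last case shows why the right-hand side is a union: no single earlier product covers $\{Cons\} \times \{0, 1\}$, but two of them do together.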

Using $n$ quantifiers, we translate the previous (meta-)formula to:

$\begin{equation*} \forall x_1, \ldots , x_n . (S_1(x_1) \wedge \ldots \wedge S_n(x_n)) \Rightarrow (A_1(x_1) \wedge \ldots \wedge A_n(x_n)) \vee (\ldots ) \vee (Z_1(x_1) \wedge \ldots \wedge Z_n(x_n)) \end{equation*}$
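To discharge this formula with an SMT solver, one would typically assert its negation: the universally quantified $x_1, \ldots, x_n$ become fresh constants, and an `unsat` answer (together with the axioms) means the original formula is valid. The sketch below emits such a query in SMT-LIB 2 syntax; the function names and the sort `U` are assumptions, and the axiom assertions, which would have to be inserted before the final `(check-sat)`, are not included. Without them the query is satisfiable, since nothing relates the predicates to each other.

```python
def conj(terms):
    # SMT-LIB's `and`/`or` need at least two arguments, so unwrap singletons
    if not terms:
        return "true"
    return terms[0] if len(terms) == 1 else "(and " + " ".join(terms) + ")"

def disj(terms):
    if not terms:
        return "false"  # an empty union covers nothing
    return terms[0] if len(terms) == 1 else "(or " + " ".join(terms) + ")"

def row(sets, xs):
    """A_1(x_1) ∧ ... ∧ A_n(x_n): membership of (x_1, ..., x_n) in one product."""
    return conj([f"({s} {x})" for s, x in zip(sets, xs)])

def subsumption_query(scrutinee, covers):
    """SMT-LIB lines asserting the negation of
    ∀x_1..x_n. S(x) ⇒ A(x) ∨ ... ∨ Z(x); `unsat` means the formula is valid."""
    n = len(scrutinee)
    if any(len(c) != n for c in covers):
        raise ValueError("all products must have dimension n")
    xs = [f"x{i}" for i in range(1, n + 1)]
    names = sorted({s for prod in [scrutinee, *covers] for s in prod})
    lines = ["(declare-sort U 0)"]
    lines += [f"(declare-fun {s} (U) Bool)" for s in names]
    lines += [f"(declare-const {x} U)" for x in xs]
    lines.append(f"(assert {row(scrutinee, xs)})")
    lines.append(f"(assert (not {disj([row(c, xs) for c in covers])}))")
    lines.append("(check-sat)")
    return lines

if __name__ == "__main__":
    print("\n".join(subsumption_query(["T", "T"], [["A", "T"], ["B", "T"]])))
```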

Together with the axioms and with proper renaming of variables, we get a formula in the decidable prefix class $[\exists^* \forall^*]_{=}$ (the Bernays–Schönfinkel–Ramsey class).

Now I'm not so sure. I'd like to say that it is in fact in $[\exists^* \forall^{max_p(E)}]_{=}$, and that for a given pattern matching expression $max_p(E)$ is fixed, so that satisfiability for the underlying class is in NP, but I'm not sure that's allowed. Is the fact that we can easily bound $max_p(E)$ for a given expression enough to treat it as a fixed constant? Since it depends on the input, I would not think so.

References

[1] Quantifier-Free Boolean Algebra with Presburger Arithmetic is NP-Complete. Technical report.