Strict Orders Are Unique Sorted Sequences
We want to show the equivalence between these two properties:
- A sequence that is sorted, and only contains unique elements
- A sequence that is strict-order sorted.
Note that sortedness is a relaxation of strict-order sortedness; the former permits duplicate elements (in other words, it defines a partial order on the elements of the sequence), while the latter requires strict order.
Taking that into account, these two statements ARE equivalent, at least textually. However, we're going to define the uniqueness predicate in a different way that makes the equivalence a little harder to prove.
Some definitions
Let's begin by defining sortedness and strict-order sortedness:
predicate SortedSeq(s: seq<int>) { forall i, j :: 0 <= i < j < |s| ==> s[i] <= s[j] } predicate StrictOrderSortedSeq(s: seq<int>) { forall i, j :: 0 <= i < j < |s| ==> s[i] < s[j] }
The only difference between the two are the inequality operators, as explained in the introduction.
For uniqueness, a standard way is to define it is by the pairwise inequality of elements in the sequence.
predicate NoDuplicateSeq(s: seq<int>) { forall i, j :: 0 <= i < j < |s| ==> s[i] != s[j] }
We're going to define this differently: By first converting the sequence into a multiset, then stating that the cardinality of each element in the resulting multiset is equal to one. Essentially, we're saying that we could've also converted the sequence into a set without losing any information on our elements.
predicate UniqueSeq(s: seq<int>) { var mset := multiset(s); forall e :: e in s ==> mset[e] == 1 }
Why use this definition? A common verification technique of sequences (and arrays)
in Dafny involves converting a sequence into a multiset, thus allowing us to
reason about the elements of the sequence without caring about the order of the
elements. For instance, we can define that a sorting function in Dafny does not
modify its elements, by stating multiset(s) == multiset(Sort(s))
.
A Unique Multiset Implies No Duplicate Elements
We can show the equivalence between NoDuplicateSeq
and UniqueSeq
.
Actually, we'll only show that UniqueSeq(s) ⇒ NoDuplicateSeq(s)
and
leave the other direction as an exercise.
First, we need an auxiliary lemma: That splitting an element out of a no-duplicates sequence preserves that property:
lemma NoDupSplitIsNoDup(s: seq<int>, n: int, i: int) requires NoDuplicateSeq(s) requires n !in s requires 0 <= i <= |s| ensures NoDuplicateSeq(s[..i] + [n] + s[i..]) { }
All the hard work is done in the preconditions, and Dafny is able to automatically prove this for us.
The main proof itself requires more work, and some knowledge on proof methods on multisets. First, the proof, with comments:
lemma UniqueMultisetImpliesUniqueElements(s: seq<int>) requires UniqueSeq(s) ensures NoDuplicateSeq(s) { if |s| <= 1 {} else { // First, let's make Dafny aware of sequence splitting; // This is used to pass the precondition check of the inductive call // It's a bit mysterious, but I think Dafny uses this fact to // deduce that splitting a unique sequence ensures the // uniqueness of its parts assert s == [s[0]] + s[1..]; // Using the let-such-that operator, we pick an element at index `i` var i :| 0 <= i < |s|; var n := s[i]; // We define the "leftover" multiset `mset'` and sequence `s'` respectively // We show how they are related to `mset` and `s` respectively var mset' := multiset(s) - multiset({n}); var s' := s[..i] + s[i+1..]; assert s == s[..i] + [n] + s[i+1..]; assert mset' + multiset({n}) == multiset(s); // The inductive call happens here; Dafny can prove the precondition that // s' is unique, thanks to our work above UniqueMultisetImpliesUniqueElements(s'); // Finally, using our defined auxillary lemma, we bring the fact that // inserting a unique element into a unique sequence preserves non-duplicates NoDupSplitIsNoDup(s', n, i); } }
Sequence Splitting and the let-such-that
operator
The first assertion is necessary to make Dafny aware that we are
splitting over sequences; I've tested other assertions such
as assert s == s[..|s|-1] + [s[|s|-1]]
, with the same outcome. It's
mysterious, but not having this would cause Dafny to say that the
precondition of the later recursive call of the lemma doesn't hold.
The next two lines uses a less-known Dafny operator:
var i :| 0 <= i < |s|; var n := s[i];
This is a let-such-that
operator, picking an arbitrary index of the sequence.
This statement, otherwise known as the choice operator, doesn't concern itself
about which index is picked, but only that such an index exists that satisfies
the predicate given. Using this index i
, we can assign the variable n
to be
s[i]
, using it as the element we will extract from our sequence.
Manipulating the implicit multiset
The idea here is that if we extract n
from the sequence, the remaining
sequence will still only contain unique elements, by way of induction.
However, we still need to show how this affects the multiset of the sequence.
We did this with the two lines
var mset' := multiset(s) - multiset({n}); assert mset' + multiset({n}) == multiset(s);
mset'
is the multiset without n
, and the assertion lets Dafny know
of the relationship between the new and old multisets. It's like rearranging
an equation; if we're subtracting multiset({n})
from multiset(s)
to get
mset'
, we can "bring over" the n
to the other side, converting the
subtraction to a plus. (In this context, subtraction is multiset difference,
and addition is multiset union).
Constructing s'
and finishing the proof
Now, we want to construct the sequence without n
. This can be done
by splitting our sequence into two parts: The part before n
, and the
part after n
:
var s' := s[..i] + s[i+1..]; assert s == s[..i] + [n] + s[i+1..];
s'
is our new sequence, and the assertion alerts Dafny that we are splitting
the sequence over n
, even though it doesn't explicitly mention s'
.
Finally, we call the induction hypothesis over s'
with
UniqueMultisetImpliesUniqueElements(s')
, and use the
auxiliary lemma from above, saying that we can reinsert
n
into s'
and still preserve non-duplicates.
That took some work! On to the main theorem.
Proving Strict Orders Are Unique Sorted Sequences
Again, I'll present the proof first, then give an explanation:
lemma NoDuplicateSortedImpliesStrictOrder(s: seq<int>) requires SortedSeq(s) requires UniqueSeq(s) ensures StrictOrderSortedSeq(s) { if |s| <= 1 {} else { // We're splitting the sequence on it's first element, thus // requiring an assertion on how to reconstruct `s` assert [s[0]] + s[1..] == s; // Using the auxiliary lemma, we can then remove the first // element of the sequence and still preserve the // property of non-duplicates // The precondition of the lemma is satisfied via UniqueSeq(s) UniqueMultisetImpliesUniqueElements(s); var (x, rest) := (s[0], s[1..]); assert x !in rest; // As with the auxiliary lemma, we're indirectly modifying // the multiset of the sequence. We show that multiset(rest) // is the same as removing the first element of the sequence // from multiset(s) var mset' := multiset(s) - multiset({x}); assert mset' == multiset(rest); NoDuplicateSortedImpliesStrictOrder(rest); } }
This is shorter than our auxiliary lemma, as it is in fact proved by using
a weaker variation of it - that is, we're deliberately removing
the first element of the sequence instead of removing at an arbitrary index.
We invoke it with UniqueMultisetImpliesUniqueElements(s)
, allowing
us to perform this removal with the verification in place.
Again, we have to show that the sequence s
can be reconstructed from
its first element x
and the rest
of the element. The assertion
x ∉ rest
isn't necessary, but serves as a sanity check.
Then, we do the same multiset manipulation as above, but this time,
we just need to show that multiset(rest)
is equal to removing
x
from the original multiset.
A recursive call to our lemma on rest
dispatches the induction hypothesis,
and we're finally done.
Well, what allows us to deliberately keep picking the first element
of our sequence? That's right, we needed our sequence to be sorted in the
first place! That's the purpose of SortedSeq(s)
, and Dafny can infer
from the induction proof that our smaller subsequence will still be sorted
in the recursive call.