Strict Orders Are Unique Sorted Sequences

We want to show the equivalence between these two properties:

A sequence that is sorted, and only contains unique elements
A sequence that is strict-order sorted.

Note that sortedness is a relaxation of strict-order sortedness; the former permits duplicate elements (in other words, it defines a partial order on the elements of the sequence), while the latter requires strict order.

Taking that into account, these two statements ARE equivalent, at least textually. However, we're going to define the uniqueness predicate in a different way that makes the equivalence a little harder to prove.

Some definitions

Let's begin by defining sortedness and strict-order sortedness:

predicate SortedSeq(s: seq<int>)
{
  forall i, j :: 0 <= i < j < |s| ==> s[i] <= s[j]
}

predicate StrictOrderSortedSeq(s: seq<int>)
{
  forall i, j :: 0 <= i < j < |s| ==> s[i] < s[j]
}

The only difference between the two are the inequality operators, as explained in the introduction.

For uniqueness, a standard way is to define it is by the pairwise inequality of elements in the sequence.

predicate NoDuplicateSeq(s: seq<int>)
{
  forall i, j :: 0 <= i < j < |s| ==> s[i] != s[j]
}

We're going to define this differently: By first converting the sequence into a multiset, then stating that the cardinality of each element in the resulting multiset is equal to one. Essentially, we're saying that we could've also converted the sequence into a set without losing any information on our elements.

predicate UniqueSeq(s: seq<int>) {
  var mset := multiset(s);
  forall e :: e in s ==> mset[e] == 1
}

Why use this definition? A common verification technique of sequences (and arrays) in Dafny involves converting a sequence into a multiset, thus allowing us to reason about the elements of the sequence without caring about the order of the elements. For instance, we can define that a sorting function in Dafny does not modify its elements, by stating multiset(s) == multiset(Sort(s)).

A Unique Multiset Implies No Duplicate Elements

We can show the equivalence between NoDuplicateSeq and UniqueSeq. Actually, we'll only show that UniqueSeq(s) ⇒ NoDuplicateSeq(s) and leave the other direction as an exercise.

First, we need an auxiliary lemma: That splitting an element out of a no-duplicates sequence preserves that property:

lemma NoDupSplitIsNoDup(s: seq<int>, n: int, i: int)
  requires NoDuplicateSeq(s)
  requires n !in s
  requires 0 <= i <= |s|
  ensures NoDuplicateSeq(s[..i] + [n] + s[i..])
{

}

All the hard work is done in the preconditions, and Dafny is able to automatically prove this for us.

The main proof itself requires more work, and some knowledge on proof methods on multisets. First, the proof, with comments:

lemma UniqueMultisetImpliesUniqueElements(s: seq<int>)
  requires UniqueSeq(s)
  ensures NoDuplicateSeq(s)
{
  if |s| <= 1 {}
  else {
    // First, let's make Dafny aware of sequence splitting;
    // This is used to pass the precondition check of the inductive call
    // It's a bit mysterious, but I think Dafny uses this fact to
    // deduce that splitting a unique sequence ensures the
    // uniqueness of its parts
    assert s == [s[0]] + s[1..];
    // Using the let-such-that operator, we pick an element at index `i`
    var i :| 0 <= i < |s|;
    var n := s[i];
    // We define the "leftover" multiset `mset'` and sequence `s'` respectively
    // We show how they are related to `mset` and `s` respectively
    var mset' := multiset(s) - multiset({n});
    var s' := s[..i] + s[i+1..];
    assert s == s[..i] + [n] + s[i+1..];
    assert mset' + multiset({n}) == multiset(s);
    // The inductive call happens here; Dafny can prove the precondition that
    // s' is unique, thanks to our work above
    UniqueMultisetImpliesUniqueElements(s');
    // Finally, using our defined auxillary lemma, we bring the fact that
    // inserting a unique element into a unique sequence preserves non-duplicates
    NoDupSplitIsNoDup(s', n, i);
  }
}

Sequence Splitting and the `let-such-that` operator

The first assertion is necessary to make Dafny aware that we are splitting over sequences; I've tested other assertions such as assert s == s[..|s|-1] + [s[|s|-1]], with the same outcome. It's mysterious, but not having this would cause Dafny to say that the precondition of the later recursive call of the lemma doesn't hold.

The next two lines uses a less-known Dafny operator:

    var i :| 0 <= i < |s|;
    var n := s[i];

This is a let-such-that operator, picking an arbitrary index of the sequence. This statement, otherwise known as the choice operator, doesn't concern itself about which index is picked, but only that such an index exists that satisfies the predicate given. Using this index i, we can assign the variable n to be s[i], using it as the element we will extract from our sequence.

Manipulating the implicit `multiset`

The idea here is that if we extract n from the sequence, the remaining sequence will still only contain unique elements, by way of induction. However, we still need to show how this affects the multiset of the sequence. We did this with the two lines

    var mset' := multiset(s) - multiset({n});
    assert mset' + multiset({n}) == multiset(s);

mset' is the multiset without n, and the assertion lets Dafny know of the relationship between the new and old multisets. It's like rearranging an equation; if we're subtracting multiset({n}) from multiset(s) to get mset', we can "bring over" the n to the other side, converting the subtraction to a plus. (In this context, subtraction is multiset difference, and addition is multiset union).

Constructing `s'` and finishing the proof

Now, we want to construct the sequence without n. This can be done by splitting our sequence into two parts: The part before n, and the part after n:

    var s' := s[..i] + s[i+1..];
    assert s == s[..i] + [n] + s[i+1..];

s' is our new sequence, and the assertion alerts Dafny that we are splitting the sequence over n, even though it doesn't explicitly mention s'.

Finally, we call the induction hypothesis over s' with UniqueMultisetImpliesUniqueElements(s'), and use the auxiliary lemma from above, saying that we can reinsert n into s' and still preserve non-duplicates.

That took some work! On to the main theorem.

Proving Strict Orders Are Unique Sorted Sequences

Again, I'll present the proof first, then give an explanation:

lemma NoDuplicateSortedImpliesStrictOrder(s: seq<int>)
  requires SortedSeq(s)
  requires UniqueSeq(s)
  ensures StrictOrderSortedSeq(s)
{
  if |s| <= 1 {}
  else {
    // We're splitting the sequence on it's first element, thus
    // requiring an assertion on how to reconstruct `s`
    assert [s[0]] + s[1..] == s;
    // Using the auxiliary lemma, we can then remove the first
    // element of the sequence and still preserve the
    // property of non-duplicates
    // The precondition of the lemma is satisfied via UniqueSeq(s)
    UniqueMultisetImpliesUniqueElements(s);
    var (x, rest) := (s[0], s[1..]);
    assert x !in rest;
    // As with the auxiliary lemma, we're indirectly modifying
    // the multiset of the sequence. We show that multiset(rest)
    // is the same as removing the first element of the sequence
    // from multiset(s)
    var mset' := multiset(s) - multiset({x});
    assert mset' == multiset(rest);
    NoDuplicateSortedImpliesStrictOrder(rest);
  }
}

This is shorter than our auxiliary lemma, as it is in fact proved by using a weaker variation of it - that is, we're deliberately removing the first element of the sequence instead of removing at an arbitrary index. We invoke it with UniqueMultisetImpliesUniqueElements(s), allowing us to perform this removal with the verification in place.

Again, we have to show that the sequence s can be reconstructed from its first element x and the rest of the element. The assertion x ∉ rest isn't necessary, but serves as a sanity check.

Then, we do the same multiset manipulation as above, but this time, we just need to show that multiset(rest) is equal to removing x from the original multiset.

A recursive call to our lemma on rest dispatches the induction hypothesis, and we're finally done.

Well, what allows us to deliberately keep picking the first element of our sequence? That's right, we needed our sequence to be sorted in the first place! That's the purpose of SortedSeq(s), and Dafny can infer from the induction proof that our smaller subsequence will still be sorted in the recursive call.