Looks like we were also computing our test cases in a slightly sketchy way, and only testing that we failed in exactly the same way. We still do, but now we generate better test data.