Testing · bfc

This page describes how bfc is tested. This is primarily useful for contributors, but is also interesting for readers working on their own PL implementations.

Unit Tests

You can run all bfc's unit tests with Cargo.

$ cargo test

This runs basic unit tests, property-based tests, and LLVM snapshots tests.

Quickcheck

bfc includes property-based tests using the Rust quickcheck library. These operate on the intermediate representation used in bfc, called BFIR.

These tests verify several properties for optimisations: runtime improvement, idempotence and behavioural equivalence.

Verifying Improvement

bfc optimisation should never increase the number of BFIR instructions.

f(program) should have fewer (or the same) number of instructions as program.

Verifying Idempotence

All bfc optimisation passes are idempotent. Running an optimisation twice should produce the same output as running it once.

This property applies to individual peephole optimisations, as well as the combined optimiser. Testing this catches phase ordering issues, as well as optimisations that need to compute a fixpoint.

f(f(program)) should be the same as f(program).

Verifying Behavioural Equivalence

bfc opimisation passes should never change the semantics of a program.

eval(program) should give the same result as eval(f(program)), provided that eval(program) terminates within a specific number of steps. bfc checks this by evaluating the code within a sandbox.

This same evaluator is also used during compilation, for compile-time evaluation of code!

If program relies on runtime inputs, the test can verify the program state prior to the first runtime input, or choose an arbitrary input.

Some tests also check the cell state after executing program, which should be the same as executing f(program). This is not true of all optimisation passes: dead code may modify cells that are unused.

Finding Interesting Programs

BFIR defines 7 different expressions. Randomly generated IR would only have a 1/7 chance of producing a loop, so only a 1/49 chance of producing a nested loop.

Nested loops, particularly including multiply loops, tend to expose interesting bugs. BFIR generation in tests overweights these kind of loops, so tests spend more time exercising more complicated loops.

LLVM Snapshot Tests

The file llvm_tests.rs tests that certain BF programs produce the expected LLVM IR output.

These tests tend to be brittle, especially when upgrading major LLVM versions. They are useful for verifying that unrelated refactoring hasn't changed the compiled output.

Integration Tests

$ ./integration_tests.sh

This script compiles real BF programs, runs them, and verifies their output matches the corresponding .out file.

This is the final step in bfc testing. It catches issues that only occur in larger, real-world BF programs.