Making MelVM Turing-complete

Currently, MelVM is not Turing-complete, and it implements a “bloop” (bounded loop) language. This is so that fees can be calculated up-front before a covenant is run.

This does slightly simplify the transaction validation logic, but it actually does almost nothing more than that:

  • It doesn’t mean we are resistant to DoS. Covenants can fail for reasons other than not having enough gas, and such a covenant can still DoS the network if anti-griefing measures aren’t taken.
  • It doesn’t mean much simpler VM execution. Keeping track of gas costs is probably the cheapest thing ever.

Thus, it wouldn’t really hurt to make MelVM Turing-complete. There are two ways we can do this:

  • The easiest way to do this would be to add an instruction UbLoop that has the same semantics as Loop, except it pops the number of iterations from the stack.
  • Another way is to allow backwards jumping. This has the advantage that it essentially makes the Loop instruction obsolete, allowing us to completely get rid of the Loop instruction in a TIP (are there mainnet instructions even using Loop? If not, we don’t even need to make things backwards-compatible)
    • This is good, because Loop is a gaping edge case in the whole VM interpretation loop, and removing it simplifies things a lot.
    • This would require changing codegen in mil

I am leaning towards the second option, but happy to hear more thoughts.

Implications to Melodeon

This lets Melodeon become totally Turing complete. We can:

  • Rip out the entire const-generics system, which is limiting and somewhat buggy
  • Simplify the type system and stdlib by getting rid of viral const-generics and constexprs (Nat and %[] would become very usable types!)
  • Generally be a smaller language with much more confidence in correctness
3 Likes

Looking forward to it!
:rocket::rocket::rocket:

Function calls, etc.

If all we add are backwards jumps, implementing function calls would be a little tricky. Consider the following dynamically-typed pseudocode:

def foo(x, y) = x * y

foo(1, 2) + foo(3,4) 

A straightforward compilation produces something like this in MelVM:

push 4
push 3
# ?? call foo
push 2
push 1
# ?? call foo
add

But how would we call foo? Simply jumping to the start of the foo function wouldn’t work, since we don’t know where to return.

There are a few options:

  • Just add a call opcode
    • simplest solution
    • implicitly forces the VM state to have a separate “call stack”
  • Add a dynjmp opcode that pops a program-counter and jumps to it, as well as a pushpc opcode the pushes the current program-counter.
    • does not change the VM model
    • needs two more opcodes
    • function calls become rather verbose
  • Don’t add opcodes, but use techniques like trampolining or CPS
    • does not change the VM at all
    • CPS makes tail-call optimization straightforward
    • makes the compiler more complex, and MelVM code less legible

Transitioning smoothly

An easy way to transition is to add an opcode TuringFuel(u32).

  • A covenant starting with this can jump backwards. Otherwise, no.
  • Covenants without this opcode have weights computed the old-fashioned way.
  • Covenants with this opcode have weight equal to the maximum of the legacy weight and the fuel given.
  • Most of the ecosystem code stays the same.

When computing the hash of a covenant, all turing-fuel hashes are computed with the fuel opcode’s field set to zero.

Melodeon changes

Feature changes

The most notable changes:

  • Const-generics are gone
  • Recursion is allowed
  • First-class functions (lambdas) are allowed

But this also means deep changes in the compiler stack.

Code generation pipeline

The current code generation pipeline roughly goes like:

Melodeon 
-> typechecked Melodeon
-> monomorphed Melodeon
-> Mil
-> MelVM

The new pipeline will be simpler to implement, since we no longer need monomorphization, and we can fully erase types (since we can always fall back to dynamic dispatch for things like ++)

Melodeon
-> typechecked Melodeon (w/o CG!)
-> erased AST (with any type specialization done before this phase)
-> closure-free AST
-> MelVM (by direct recursive codegen)

Type system

The new type system has the following basic hierarchy:

  • Nat: 256-bit unsigned numbers
    • Individual numbers like 1 can also be types
  • Bytes: a bytestring
  • [T]: A list where element is of type T
    • [T, U, V...]: fixed-length arrays
  • T | U: untagged union
    • Can be refined with is or as

Symbols

Symbols aren’t really their own type. Instead, a symbol literal like $sym represents a unique Nat that $sym maps to. The exact value of this Nat is unspecified.

Structs

Structs are implemented as tagged assoc-lists. For example, a Point with fields x and y:

[[$x, 12], [$y, 34]]

The order of the list is sorted by the tag. foo.bar desugars into binary searching for the element associated with $bar in the assoc-list.

This desugaring should be a sort of macro expansion, because otherwise refinement typing cannot infer what foo.bar’s type actually is.

Lambdas and function-types in Melodeon

Something I’ve realized is that these MelVM changes would make it very easy to implement lambdas and first-class functions in Melodeon.

A rough list of changes needed:

  • Add function-types to the type system
    • This also means adding syntax to the pest grammar
  • Stop special-casing functions. Function definitions just mean lambda bindings.
  • In code-gen, the intermediate language will be “lambdaful”, but compile down to whatever function representation technique we decide on.
    • This is most likely “function pointer plus struct that is passed in”

I decided on the last strategy — CPS turns out to be a useful transformation for optimizations etc in the new Melodeon pipeline, and it straightforwardly eliminates the call stack.