Currently, MelVM is not Turing-complete, and it implements a “bloop” (bounded loop) language. This is so that fees can be calculated up-front before a covenant is run.
This does slightly simplify the transaction validation logic, but it actually does almost nothing more than that:
- It doesn’t mean we are resistant to DoS. Covenants can fail for reasons other than not having enough gas, and such a covenant can still DoS the network if anti-griefing measures aren’t taken.
- It doesn’t mean much simpler VM execution. Keeping track of gas costs is probably the cheapest thing ever.
Thus, it wouldn’t really hurt to make MelVM Turing-complete. There are two ways we can do this:
- The easiest way to do this would be to add an instruction
UbLoop that has the same semantics as
Loop, except it pops the number of iterations from the stack.
- Another way is to allow backwards jumping. This has the advantage that it essentially makes the
Loop instruction obsolete, allowing us to completely get rid of the
Loop instruction in a TIP (are there mainnet instructions even using
Loop? If not, we don’t even need to make things backwards-compatible)
- This is good, because
Loop is a gaping edge case in the whole VM interpretation loop, and removing it simplifies things a lot.
- This would require changing codegen in
I am leaning towards the second option, but happy to hear more thoughts.
Implications to Melodeon
This lets Melodeon become totally Turing complete. We can:
- Rip out the entire const-generics system, which is limiting and somewhat buggy
- Simplify the type system and stdlib by getting rid of viral const-generics and constexprs (
% would become very usable types!)
- Generally be a smaller language with much more confidence in correctness
Function calls, etc.
If all we add are backwards jumps, implementing function calls would be a little tricky. Consider the following dynamically-typed pseudocode:
def foo(x, y) = x * y
foo(1, 2) + foo(3,4)
A straightforward compilation produces something like this in MelVM:
# ?? call foo
# ?? call foo
But how would we call
foo? Simply jumping to the start of the
foo function wouldn’t work, since we don’t know where to return.
There are a few options:
- Just add a
- simplest solution
- implicitly forces the VM state to have a separate “call stack”
- Add a
dynjmp opcode that pops a program-counter and jumps to it, as well as a
pushpc opcode the pushes the current program-counter.
- does not change the VM model
- needs two more opcodes
- function calls become rather verbose
- Don’t add opcodes, but use techniques like trampolining or CPS
- does not change the VM at all
- CPS makes tail-call optimization straightforward
- makes the compiler more complex, and MelVM code less legible
An easy way to transition is to add an opcode
- A covenant starting with this can jump backwards. Otherwise, no.
- Covenants without this opcode have weights computed the old-fashioned way.
- Covenants with this opcode have weight equal to the maximum of the legacy weight and the fuel given.
- Most of the ecosystem code stays the same.
When computing the hash of a covenant, all turing-fuel hashes are computed with the fuel opcode’s field set to zero.
The most notable changes:
- Const-generics are gone
- Recursion is allowed
- First-class functions (lambdas) are allowed
But this also means deep changes in the compiler stack.
Code generation pipeline
The current code generation pipeline roughly goes like:
-> typechecked Melodeon
-> monomorphed Melodeon
The new pipeline will be simpler to implement, since we no longer need monomorphization, and we can fully erase types (since we can always fall back to dynamic dispatch for things like
-> typechecked Melodeon (w/o CG!)
-> erased AST (with any type specialization done before this phase)
-> closure-free AST
-> MelVM (by direct recursive codegen)
The new type system has the following basic hierarchy:
Nat: 256-bit unsigned numbers
- Individual numbers like
1 can also be types
Bytes: a bytestring
[T]: A list where element is of type T
[T, U, V...]: fixed-length arrays
T | U: untagged union
- Can be refined with
Symbols aren’t really their own type. Instead, a symbol literal like
$sym represents a unique Nat that
$sym maps to. The exact value of this
Nat is unspecified.
Structs are implemented as tagged assoc-lists. For example, a
Point with fields
[[$x, 12], [$y, 34]]
The order of the list is sorted by the tag.
foo.bar desugars into binary searching for the element associated with
$bar in the assoc-list.
This desugaring should be a sort of macro expansion, because otherwise refinement typing cannot infer what
foo.bar’s type actually is.
Lambdas and function-types in Melodeon
Something I’ve realized is that these MelVM changes would make it very easy to implement lambdas and first-class functions in Melodeon.
A rough list of changes needed:
- Add function-types to the type system
- This also means adding syntax to the pest grammar
- Stop special-casing functions. Function definitions just mean lambda bindings.
- In code-gen, the intermediate language will be “lambdaful”, but compile down to whatever function representation technique we decide on.
- This is most likely “function pointer plus struct that is passed in”
I decided on the last strategy — CPS turns out to be a useful transformation for optimizations etc in the new Melodeon pipeline, and it straightforwardly eliminates the call stack.