Currently, MelVM is not Turing-complete, and it implements a “bloop” (bounded loop) language. This is so that fees can be calculated up-front before a covenant is run.
This does slightly simplify the transaction validation logic, but it actually does almost nothing more than that:
- It doesn’t mean we are resistant to DoS. Covenants can fail for reasons other than not having enough gas, and such a covenant can still DoS the network if anti-griefing measures aren’t taken.
- It doesn’t mean much simpler VM execution. Keeping track of gas costs is probably the cheapest thing ever.
Thus, it wouldn’t really hurt to make MelVM Turing-complete. There are two ways we can do this:
- The easiest way to do this would be to add an instruction
UbLoop
that has the same semantics as Loop
, except it pops the number of iterations from the stack.
- Another way is to allow backwards jumping. This has the advantage that it essentially makes the
Loop
instruction obsolete, allowing us to completely get rid of the Loop
instruction in a TIP (are there mainnet instructions even using Loop
? If not, we don’t even need to make things backwards-compatible)
- This is good, because
Loop
is a gaping edge case in the whole VM interpretation loop, and removing it simplifies things a lot.
- This would require changing codegen in
mil
I am leaning towards the second option, but happy to hear more thoughts.
Implications to Melodeon
This lets Melodeon become totally Turing complete. We can:
- Rip out the entire const-generics system, which is limiting and somewhat buggy
- Simplify the type system and stdlib by getting rid of viral const-generics and constexprs (
Nat
and %[]
would become very usable types!)
- Generally be a smaller language with much more confidence in correctness
3 Likes
Function calls, etc.
If all we add are backwards jumps, implementing function calls would be a little tricky. Consider the following dynamically-typed pseudocode:
def foo(x, y) = x * y
foo(1, 2) + foo(3,4)
A straightforward compilation produces something like this in MelVM:
push 4
push 3
# ?? call foo
push 2
push 1
# ?? call foo
add
But how would we call foo
? Simply jumping to the start of the foo
function wouldn’t work, since we don’t know where to return.
There are a few options:
- Just add a
call
opcode
- simplest solution
- implicitly forces the VM state to have a separate “call stack”
- Add a
dynjmp
opcode that pops a program-counter and jumps to it, as well as a pushpc
opcode the pushes the current program-counter.
- does not change the VM model
- needs two more opcodes
- function calls become rather verbose
- Don’t add opcodes, but use techniques like trampolining or CPS
- does not change the VM at all
- CPS makes tail-call optimization straightforward
- makes the compiler more complex, and MelVM code less legible
Transitioning smoothly
An easy way to transition is to add an opcode TuringFuel(u32)
.
- A covenant starting with this can jump backwards. Otherwise, no.
- Covenants without this opcode have weights computed the old-fashioned way.
- Covenants with this opcode have weight equal to the maximum of the legacy weight and the fuel given.
- Most of the ecosystem code stays the same.
When computing the hash of a covenant, all turing-fuel hashes are computed with the fuel opcode’s field set to zero.
Melodeon changes
Feature changes
The most notable changes:
- Const-generics are gone
- Recursion is allowed
- First-class functions (lambdas) are allowed
But this also means deep changes in the compiler stack.
Code generation pipeline
The current code generation pipeline roughly goes like:
Melodeon
-> typechecked Melodeon
-> monomorphed Melodeon
-> Mil
-> MelVM
The new pipeline will be simpler to implement, since we no longer need monomorphization, and we can fully erase types (since we can always fall back to dynamic dispatch for things like ++
)
Melodeon
-> typechecked Melodeon (w/o CG!)
-> erased AST (with any type specialization done before this phase)
-> closure-free AST
-> MelVM (by direct recursive codegen)
Type system
The new type system has the following basic hierarchy:
-
Nat
: 256-bit unsigned numbers
- Individual numbers like
1
can also be types
-
Bytes
: a bytestring
-
[T]
: A list where element is of type T
-
[T, U, V...]
: fixed-length arrays
-
T | U
: untagged union
- Can be refined with
is
or as
Symbols
Symbols aren’t really their own type. Instead, a symbol literal like $sym
represents a unique Nat that $sym
maps to. The exact value of this Nat
is unspecified.
Structs
Structs are implemented as tagged assoc-lists. For example, a Point
with fields x
and y
:
[[$x, 12], [$y, 34]]
The order of the list is sorted by the tag. foo.bar
desugars into binary searching for the element associated with $bar
in the assoc-list.
This desugaring should be a sort of macro expansion, because otherwise refinement typing cannot infer what foo.bar
’s type actually is.
Lambdas and function-types in Melodeon
Something I’ve realized is that these MelVM changes would make it very easy to implement lambdas and first-class functions in Melodeon.
A rough list of changes needed:
- Add function-types to the type system
- This also means adding syntax to the pest grammar
- Stop special-casing functions. Function definitions just mean lambda bindings.
- In code-gen, the intermediate language will be “lambdaful”, but compile down to whatever function representation technique we decide on.
- This is most likely “function pointer plus struct that is passed in”
I decided on the last strategy — CPS turns out to be a useful transformation for optimizations etc in the new Melodeon pipeline, and it straightforwardly eliminates the call stack.