Erlang for .NET

Tuesday, September 18, 2007

Erlang and .NET: Strange bedfellows?

There are a few challenges in implementing Erlang for .NET:
  1. Erlang is dynamic, functional, concurrent and distributed. The original languages on the CLR were static and object-oriented, not especially concurrent (there are only limited language primitives for locking) and not distributed (there is Remoting, a deprecated RMI-style remote-stub mechanism, and a few web services libraries). However, there have been some bold attempts at dynamic languages (IronPython, Gardens Point Ruby) and functional languages (F#).
  2. Erlang processes are cheap. In the CLR, the nearest equivalent, threads, are expensive (each CLR thread reserves 1 MB of stack space by default!). In fact, reliable, fast concurrent programming on Windows with the CLR is downright difficult. Objects are about the cheapest interesting thing in the CLR, so I guess Erlang processes have to map to objects.
  3. Although the core Erlang language is small, it isn't formally specified. The implementation is the definition, which is one of the problems that derailed a Perl implementation for .NET. Erlang's bytecode format might be a good starting point for a working definition; however, there have been many bytecode formats, and, like Erlang itself, the meaning of the bytecodes isn't written down either. The current incarnation ('BEAM') is embodied in about 100,000 lines of C code.
  4. Because there is no shared memory between processes, Erlang's garbage collector can collect each process separately. This means even naive stop-the-world garbage collection is highly concurrent in a global sense, because Erlang programs use processes so heavily: stopping one process's world leaves all the others running. For short-lived processes, the default heap is small enough for per-process memory management to work effectively as an arena-based allocator, since the whole heap is freed at once when the process exits. This limits the benefit of using the CLR's excellent generational garbage collector, because there is no way to hint to the CLR that memory will be scoped to a particular process (although short-lived processes should arguably fit into the gen-0 heap).
  5. Erlang has C and Java interoperation, but C and Java aren't .NET (C is supported reasonably well through P/Invoke, and Java has an object model like .NET's, yet not quite the same). Harder than calling .NET libraries from Erlang is the reverse: .NET developers typically need a language to produce static constructs they can call, which could be difficult given Erlang's baroque syntax and dynamic nature. .NET has little-known terms for various aspects of interoperation (CTS and CLS producer/consumer), and even these don't cover many common scenarios (such as .NET generics). Nonetheless, the CLR is leveraging Metcalfe's law for heterogeneous components, and Erlang should be another exponent in that equation.
  6. Erlang for the CLR should have syntax highlighting, IntelliSense, and debugger integration with Visual Studio.
  7. The Erlang community is relatively small, disjoint from the Microsoft developer community, and likely not interested in Erlang for the CLR.
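To make item 2 concrete, here is a minimal sketch of Erlang-style processes mapped to plain objects. It is written in Python rather than a CLR language purely for brevity, and the names `Process`, `Scheduler`, `spawn`, `send`, `relay` and `collector` are hypothetical. Each process is an object holding a mailbox and a generator, and one round-robin scheduler multiplexes all of them onto a single OS thread. Unlike real Erlang, scheduling here is cooperative (a process gives up control only at a receive, not after a reduction budget runs out) and receive is strictly FIFO, with no selective pattern matching.

```python
from collections import deque


class Process:
    """An Erlang-style process modelled as a plain object.

    `body` is a generator; each bare `yield` is a receive, and the value
    the scheduler sends back in is the next message from the mailbox.
    """

    def __init__(self, pid, body):
        self.pid = pid
        self.body = body
        self.mailbox = deque()
        self.started = False
        self.state = "queued"  # one of: queued, running, waiting


class Scheduler:
    """Cooperative round-robin scheduler: many processes, one OS thread."""

    def __init__(self):
        self.processes = {}  # pid -> Process; live processes stay reachable
        self.run_queue = deque()
        self.next_pid = 0

    def spawn(self, func, *args):
        """Create a process; `func` must be a generator function."""
        pid = self.next_pid
        self.next_pid += 1
        proc = Process(pid, func(self, pid, *args))
        self.processes[pid] = proc
        self.run_queue.append(proc)
        return pid

    def send(self, pid, message):
        proc = self.processes.get(pid)
        if proc is None:
            return  # as in Erlang, sending to a dead process is not an error
        proc.mailbox.append(message)
        if proc.state == "waiting":  # wake a process blocked in receive
            proc.state = "queued"
            self.run_queue.append(proc)

    def run(self):
        while self.run_queue:
            proc = self.run_queue.popleft()
            proc.state = "running"
            try:
                if not proc.started:
                    proc.started = True
                    next(proc.body)  # run up to the first receive
                else:
                    proc.body.send(proc.mailbox.popleft())
            except StopIteration:
                del self.processes[proc.pid]  # the process has exited
                continue
            # The process is now blocked in a receive.
            if proc.mailbox:
                proc.state = "queued"
                self.run_queue.append(proc)
            else:
                proc.state = "waiting"


def relay(sched, pid, next_pid):
    n = yield  # receive one message
    sched.send(next_pid, n + 1)


def collector(sched, pid, results):
    n = yield
    results.append(n)


if __name__ == "__main__":
    sched = Scheduler()
    results = []
    target = sched.spawn(collector, results)
    for _ in range(10_000):  # a chain of 10,000 relay processes
        target = sched.spawn(relay, target)
    sched.send(target, 0)
    sched.run()
    print(results)  # [10000]
```

Ten thousand processes here cost ten thousand small objects and generator frames rather than ten thousand thread stacks. Note also that the process table keeps every live process reachable, so a process is reclaimed only when it exits, not when nobody holds its pid any more.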
There are, however, some consoling things:
  1. The CLR has an excellent JIT, and Erlang has not been exposed to an excellent JIT. Because Erlang is essentially a dynamic language, I would have dismissed this: surely Erlang has more to gain from a better garbage collector than from a JIT? But anecdotally (according to a CLR architect), IronPython's solid performance is due to both the JIT and the GC.
  2. Erlang libraries are mostly implemented in Erlang, and not C. This means porting the relatively small Erlang runtime system should give immediate access to OTP, Mnesia, and other big Erlang libraries.
  3. The CLR supports asynchronous IO with thread pools and IAsyncResult. The lack of asynchronous IO was one of the factors that made Erlang implementations on the JVM impractical: Erlang's scheduler is built around asynchronous IO, and the JVM's IO wasn't. (Since Java 1.5, though, java.util.concurrent has been addressing this issue.)
  4. Erlang's dynamism is relatively constrained. For example, Erlang's eval takes an explicit list of bindings for the evaluator, as opposed to JavaScript's eval, which can access variables in the caller's scope.
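A small sketch of what "eval with an explicit list of bindings" means in practice, written in Python for illustration. `eval_expr` is a hypothetical toy evaluator for integer arithmetic that takes a list of `(name, value)` pairs, loosely modelled on the binding list Erlang's evaluator accepts, and can see nothing else. It walks the expression with Python's standard `ast` module instead of calling Python's built-in `eval`, whose default behaviour (using the globals and locals of the calling code) is closer to JavaScript's.

```python
import ast
import operator

# Binary operators this toy evaluator understands.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def eval_expr(source, bindings):
    """Evaluate an arithmetic expression using ONLY the given bindings.

    `bindings` is a list of (name, value) pairs. Nothing from the
    caller's scope is visible to the evaluated expression.
    """
    env = dict(bindings)
    tree = ast.parse(source, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in env:
                raise NameError(f"variable {node.id!r} is unbound")
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(tree)


if __name__ == "__main__":
    secret = 42  # a caller-scope variable, invisible to eval_expr
    print(eval_expr("X * 2 + Y", [("X", 10), ("Y", 1)]))  # 21
    try:
        eval_expr("secret + 1", [("X", 10)])
    except NameError as err:
        print(err)  # variable 'secret' is unbound
```

Because the evaluator's whole environment is the list it was handed, an implementation on the CLR would not need to reify the caller's stack frames to support it, which is part of what makes this kind of dynamism comparatively cheap to host.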

3 Comments:

  • I read your article twice, with some time in between to let it sink in.

    Both times I thought afterwards that this is a silly idea.

    However, I might not grasp what you are actually trying to do. To me it looks like you want to compile Erlang sources into CLR bytecode, somehow forcing an Erlang VM into it. I have no clue how this will yield cheap concurrency, distribution and code loading at run time.

    By Blogger Bär, At September 18, 2007 2:16 PM  

  • Given the enormous difference between Erlang processes and CLR threads, it would seem you will have to treat CLR threads similarly to how Erlang treats OS threads: you are going to have to have schedulers that run in the CLR threads and multiplex the Erlang processes. Therefore, you are going to have to write the schedulers (and hence, most of the rest of the VM) in a language already running on the CLR. Given how much slower even the best CLR language is compared to raw C, you'll have to take a speed penalty. I don't think there is any other solution that would leave you something resembling Erlang in the end.

    -- wingedsubmariner

    By Blogger Stephen, At September 28, 2007 9:23 AM  

    stephen: With the exception of some specialty domains (heavy-duty bit-fiddling and number crunching), a carefully written program in a CLR language will generally perform on par with an equivalent C program. Writing naively, without regard for performance, can really bite you, but the same applies in C as well!

    That said, I sympathise with the skeptical responses. There is a huge gulf between Erlang's and the CLR's execution models. It would be relatively straightforward to port the BEAM interpreter to the CLR, and probably even to compile Erlang to CLR bytecode, just as HiPE compiles it to native code. But this has been done before for other functional languages, with little success, because it fails to address a wide range of interoperability issues. Erlang would simply be its own little universe, running on top of the CLR and incurring a significant performance hit due to impedance mismatches (e.g., generational vs. per-process GC), but reaping few of the benefits.

    F# is a good example of making a functional language work well with the CLR. It has very strong language-level integration with the CLR type system, enabling calls and data to flow seamlessly to and from other CLR languages. However, F# is derived from OCaml, which already has a rich object-oriented type system and whose concurrency model dovetails quite nicely with .NET's. The gap with Erlang is much greater, perhaps requiring something more like the Java binding.

    I don't buy the processes-as-objects idea. Processes are active; that is, they have their own execution context and can act independently of the rest of the program; CLR objects are little more than data buckets with code attached to them (processes are actors; objects are puppets). On a minor note, CLR objects are subject to garbage collection, whereas Erlang processes aren't. This might cause substantial pain for C# programmers who create gazillions of Erlang 'objects', just expecting them to go away all by themselves. Perhaps this could be resolved by killing processes when the GC comes reaping, but that sounds like curing a headache with a bullet.

    The crucial distinction, IMO, is that Erlang fixed the most important problem (concurrency) first. Almost every other language/runtime on the planet left it till last. We all wear the cost of these bad decisions every day (e.g., every time you get an hourglass and a hung UI). Porting Erlang to the CLR threatens to push that pain back into the one language I know that got it right.

    By Blogger cartoon camels, At February 5, 2008 4:55 PM  
