Category Archives: Computers

Actors/monads need stderr

I’ve been watching a number of videos about monads. They’re trendy right now. Essentially, monads are a system for safely chaining commands together. If you’ve used JQuery, you’ve used Monads. People talk a lot about the “actor model”: chaining commands together, where each command is run on a service provider, often on a different computer. In other words, distributed monads.

What people don’t seem to realize is the grandfather of all monadic systems outside the ivory tower is Unix. The pipeline that Unix users love is a system for safely chaining commands together, i.e. monads. And as the saying goes: “those who fail to learn from Unix are doomed to re-implement it poorly.”

So what does Unix have that actors don’t? A system for debugging. Namely an out-of-band logging channel known as “standard error”, or “stderr”.

Why is this important? At the end of this video, the presenter is asked for advice on debugging actors, since when something goes wrong across multiple computers it can be hideous to follow the chain of events that went wrong. I’ve seen this myself. The presenter’s response is simply that actors/monads are easier to debug than threads. That’s like saying that keeping pools of gasoline in your basement next to the furnace is safer than storing a nuclear bomb. Not helpful.

So here’s my plea: anyone who does functional programming should include a “stderr” in their monads.

Imagine if JQuery objects had a “history” function, which would point out where you wrote “foo” instead of “.foo” and suddenly got zero results. Imagine if Akka had a log which showed you the chain of events for a particular query, across computers, rather than logging unrelated events on each computer.

Programmers have gotten used to using a stack trace for debugging. But with actors, the stack tells you nothing, because the action is horizontal–across chains of messages– rather than vertical. The fix is really simple, and it’s something Unix got right.

Introducing Ozone

Here’s a demo of an awesome project I’ve been working on: Ozone, a data visualization OLAP database. If you’re a web developer and you like the demo, check out the source code and see if you’ll find it handy. This is a first cut: future versions will load faster, consume less memory, and have more features—while not compromising ease of use.

Note that the boxes that zoom when clicked are from a demo of the popular D3 graphics library. Ozone adds the ability to quickly filter and partition the underlying data.

This is a research project at Vocalabs. I’ve spent a little over ten days total on it, and I plan to keep working on it. My ultimate goal is a high-performance tool that seamlessly bridges Big Data (databases that require a supercomputing cluster) and realtime browser graphics visualization.

Kotlin vs. Ceylon

My life improved considerably when I started writing servlets in Scala instead of straight Java. But Scala is a complex language. The definitive book, Programming in Scala is as dense as K&R’s classic 300-page book on C, but is 850 pages long. I’ve been programming in Scala for over a year now, and there are still “basic” features I have to look up. Plus, compiling Scala in my IDE takes 10 seconds, versus 1 second for Java. That’s the difference between being able to try out every little change, and getting distracted when it’s time to test.

Two new languages offer most of the best features of Scala, but are intended to be simpler: easier to read, write, and compile. Ceylon is developed by Red Hat as an “Enterprise” Java replacement, and version 1.0 was just released. The other is Kotlin, developed by JetBrains, makers of IDEs such as IntelliJ IDEA and WebStorm. Its primary purpose is to replace Java for developing new features of their IDEs. It’s at Milestone 6, which is to say still under initial development, but is complete enough that the latest version of WebStorm includes a major new feature written in Kotlin.

If you look at a checklist, both languages have most of the same features. If you look at code samples, they look remarkably similar. Both will compile to both Java Virtual Machine (JVM) code and JavaScript. Perhaps this is good; both are intended to be pragmatic languages, and it looks like there is some consensus on what new language features are valuable.

For myself and Vocalabs, I’ve been looking for a “glue” language that compiles to both Java and JavaScript. Last time I tried Kotlin (about 6 months ago) it looked good but was too buggy to get working. So yesterday I took another look at Ceylon. And by “look” I mean “read some documentation”, not “wrote some code.” If you want that sort of review, look here.

Instead, here are the main differences I see which will drive use of one over the other.

Ceylon is currently more mature. It’s at 1.0, with JavaScript and JVM compilers. There will be some minor changes, but it’s ready to be used. Kotlin is maturing, but JetBrains still reserves the right to make significant changes. JavaScript is still very much an experimental feature.
Ceylon is designed for new projects, using its tool chain. The compiler outputs packages (compressed directories) instead of JVM class files. Kotlin is designed for Java interoperability within existing Java projects, and the IDE lets you sprinkle Kotlin files willy-nilly alongside your Java files.
Ceylon requires Java 7. This isn’t a big deal for most users (Android recently began to support Java 7), but it’s a deal killer in some cases. At Vocalabs, we use Azul Zing, a low-latency, big-memory JVM that is still on Java 6.

So for my purposes, these are two deal killers against Ceylon. But I’m still not ready to jump to Kotlin until it matures a bit more. Right now I’m actively writing code in four languages: JavaScript, Typescript, Scala, and Java. Kotlin wouldn’t let me drop any of these languages, but it might replace both Typescript and Java over time. It might even replace Scala, although that’s less likely since Scala lets me drop HTML directly into my code.

A gentle introduction to functional programming

Functional programming is hot right now, and I love it. This XKCD Comic promotes its old (and not undeserved) reputation as complicated and esoteric. Lisp, the original functional language, was developed in the 1960s as perhaps the first computer language for non-gearheads. It was instead designed for mathematicians. And it does a good job of presenting things in a way that makes sense to mathematicians, hence tail recursion. (Which I’m not going to explain; suffice it to say that it is catnip to people who love proofs by induction.)

So for 40 years mathematicians (and their friends, Artificial Intelligence researchers) have been trying to explain why functional programming is the Next Big Thing because it’s so easy and makes so much sense. And people who don’t dig mathematical proofs have seen every argument for functional programming as an argument against it. That is, until Google got involved in a big way.

Functional programming is writing your programs the way you write algebra problems. Sounds scary, if you forget that elementary school kids learn algebra. The name comes from the fact that you’re working with mathematical-style functions. (These days, all programming languages support functions, but in the 1960s this was novel.) To show how this differs from other programming, consider a typical imperative (non-functional) way to build one list from another list. This should look at least somewhat familiar if you’ve ever taken a programming class in your life.


function personalizeList( yourList ) {
    myList = new List();
    for (index = 0; index < yourList.length; index = index+1) {
        yourItem = yourList[index]
        myList[index] = "My " + yourItem
    }
    return myList;
}

You have a variable (index) which starts at 0 and increases until yourList has been traversed. If yourList contains ["apple", "banana", "cranberry"], then at the end myList contains ["My apple", "My banana", "My cranberry"]. This all seems perfectly normal to anyone who has learned to program this way, but there's one very counterintuitive piece, especially when you come from an algebra background:

In mathematics, variables never change once they've been assigned a value.

Imagine how confusing D = Ax² + Bxy + Cy² would be if x and y could change within the equation. So in functional programming, you avoid changing the values of variables. This makes loops impossible, hence tail recursion. But there are other ways to avoid loops. Here's how you'd write personalizeList in a typical functional language:


function personalizeList( yourList ) {
    return yourList.map( function(item) { return "My " + item } )
}

The "map" function applies a function you provide to each item in the list. Functional languages provide a bevy of functions for working with lists without having to explicitly iterate through all the elements. And they can be stacked on top of each other, like this:


yourList.without("cranberry").append("cherry").count()

This takes yourList, creates a similar list with "cranberry" removed, creates another similar list with "cherry" at the end, and then counts the number of items in the list. This follows the no-variable-modification rule: at each step, a new list is created, and each list cannot be modified. Here's an example of jQuery, a popular JavaScript library, for taking every paragraph ("p") on a web page and turning the text green, then counting the number of paragraphs:


var numberOfParagraphs = $("p").style("color", "green").length

If you pull up a JavaScript console on most web pages (at least the ones that use jQuery), this will work. A nerdy party trick. jQuery is functional because not looping lets you do lots of stuff in a single line of code, and when you're making web pages more interactive you often don't want more than a line of code.

Back to Google. Since at least the 1980s, supercomputer designers have been trying to figure out ways for to automatically convert C code that was written for single processors to work on multi-processor (and multi-core) computers. The theory was that writing explicitly parallel programs is too hard. The for-loop in the first example tells the computer to first handle the first item in the list, then the second item in the list, and so on. Meanwhile you have a variable (index) that is changing all the time. What you want on a 4-core computer is to simultaneously have each core process a quarter of the list, thus making the program up to 4 times faster. But as written, each processor would have to wait to see how the previous processor modified index.

The functional version, meanwhile, makes no guarantees about which elements in the list are processed first, and if you follow the no-variable-modification rule, you don't have to worry about coordinating variable changes between processors. Google called its programming system MapReduce. ("Map" you've seen, "reduce" is a generic term for functions like "count()" or "sum()" which reduce a list to something smaller.) The fundamental idea behind MapReduce is this: the programmer writes in terms of map and reduce instead of loops, and Google's system will figure out how to make it run. It works with anywhere from one computer to hundreds of thousands, and Google even built in redundancy so that if a computer fails during the calculation, another computer will take on the work. (When you have hundreds of thousands of computers, one of them is likely to go down at any moment.)

One of the hardest things about parallel programming is keeping data synchronized. The network is often the slowest component. The no-variable-modification rule can be inefficient on a single-processor machine, but on a computer network it speeds things up enormously.

But that's not why I prefer functional programming. For me, the no-variable-modification rule means no-tracking-down-how-that-variable-changed-in-my-buggy-code. And that saves me a ton of time.

Another blog change

Hopefully you won’t notice this one, except that my old posts are finally back. Last year I switched my blogging software to Drupal because my site got hacked and I realized that WordPress is too insecure by default, and I didn’t have the time or interest in keeping it up-to-date. I’ve been frustrated with Drupal, which is a content management system (CMS) with a blogging plug-in, and it too was too much work. It’s more secure by default, but there are still security updates, and it’s still a pain to use. So my current solution is to have a private WordPress site, which I export as static HTML. It will be more of a pain to push updates, and there will be no comments. But I shouldn’t ever have to worry about security updates. And the old (Drupal) URLs should still work.

How to get an iPhone 5

Originally published on Sat, 10/06/2012

Here’s how I’m doing it, not that I recommend this to anyone else.

Pre-order online, on the first day. Get informed that you will need to wait 2-3 weeks.
3 weeks and 1 day later, get a call from Apple saying the order has been put on hold by Sprint.
Call Sprint, wait on hold for 5 minutes. Tell them the name of your first pet.
Get told warmly that they cannot help you– as much as she wants to– because the order was made through a third party (Apple) and Apple must call Sprint’s National Sales Support Desk. They will only talk to third party retailers, “I can’t even talk to them,” she says. Helpfully, she gives you a phone number.
Call Apple, have an automated voice tell you it can help you with anything, just ask in a complete sentence. Humor the computer for a few moments before telling it you need to talk to a human being. It transfers immediately, and a woman answers on the fourth ring. Explain the situation, give her the number. She puts you on hold while she calls.
A few minutes later, she says they gave her a different number to call. She also verifies that she has your wrong phone number (home number, presumably). She calls the next number, with you on hold.
A few minutes later, the Apple rep introduces you to a Sprint rep, who wishes to ask a security question. In anticipation, you tell him your first cat’s name. After ascertaining the situation, the Sprint rep puts the two of you on hold while he talks to his supervisor. After a while, he comes back and says you must contact the National Sales Support Desk. Once informed that you’ve already done that, he talks to his other supervisor (more hold music) and he agrees to wait on hold while he places the call.
All three of you talk to a man who inverstigates and finds no good reason for the delivery of your iPhone to be put on hold. He seems almost confident that the problem can be resolved. But first, he must ask for the Apple rep’s store ID number. She has no such number, since she is a customer service rep. The man replies, “we can only validate through a store” and gives some helpful hints as to where she might find an Apple department that he’s authorized to talk to.
The Apple rep hangs up on Sprint and puts you on hold while she finds a way to communicate through proper channels. Eventually she brings you into a conference call. The other person transfers the two of you to Sprint, this time without hold music. At this point, the Apple rep explains that Apple cannot complete the delivery, so they are cancelling it, but will send you email so that when you place a new order you can respond to that email with your order number and Apple will expedite the delivery. They’ll try to get it to you as close to the original delivery date as possible. You can even order a different color or storage capacity.
The latest Sprint rep assures you that there was no reason for the original hold on your order, claims it may have been a computer glitch– and offers to remove the hold. The Apple rep explains that the order has been cancelled. It will take 24 hours for the cancel to propagate. Do not order a new phone before then, or you will be renewing a contract that has just been renewed– and the order will be placed on hold as invalid.
Wait 24 hours, or better yet, until Monday.
Hope for the best.

That’s 1 hour, 20 minutes for the second call. Mostly on hold.

Scala emoticons

There are a lot of great things about Scala, but there are a few things where that programming language just goes overboard. Consider that there are several emoticons built into the standard library:

+:= (vampire pope) for adding items at the front of a sequence
<:< (sad witch) for comparing generic types
:/ (uneasy) fold right

And that’s just the ones I know about. Unfortunately, you can’t use a search engine to find any more, because none of the major search engines index emoticons. :/ (try it). Fortunately there is a well-hidden reference page.

Thoughts on recursion

I was poking through Code Complete, 2nd Edition by Steve McConnell recently, after I’d handed the book to a computer science student. The book is showing its age, as good object-oriented design is better understood than it used to be, but it remains the best book on how to program well.

One thing that struck me, though, was a particularly strong statement against recursion, on page 397 (2nd Edition):

If a programmer who worked for me used recursion to compute a factorial, I’d hire someone else.

To put this in context, McConnell had just presented a few appropriate uses for recursion (quicksort and maze solving [i.e. graph searching]), and is railing against the simple examples one typically learns in computer science classes. He sees recursion as an advanced technique, which should be avoided unless it provides a clear advantage.

McConnell says recursive algorithms are harder to read. I don’t buy that; it depends on your audience. But his biggest beef is with the stack. When a function calls itself over and over again, the stack grows (keeping track of all the variables in all those function calls) and you can run out of memory, the dreaded stack overflow.

In many computer science departments, the first language you learn is Scheme, which doesn’t support iteration, so you can’t compute a factorial (or do any other kind of looping) without recursion. Scheme handles this with tail recursion elimination, where if the last thing a function does is call itself (i.e. tail recursion), the Scheme interpreter overwrites the stack frame– essentially replacing recursion with a loop.

Since most languages don’t do tail recursion elimination, and since it can be harder to write a function as tail recursive, I mostly agree with McConnell– despite my initial reaction. However, I was thinking about what makes some algorithms amenable to recursion while others are a bad idea.

McConnell’s examples of good recursion– quicksort and graph searching– both involve branching. Most tree algorithms are recursive: one might say a tree is a recursive data structure. The recursive algorithm uses the stack to track the branching, which is cleaner than handling your own stack. (A good programmer can replace any recursive algorithm with a stack-based one.) What makes factorials different is that there’s no branching, so the stack is redundant. So we have two classes of algorithms:

Branching algorithms, which use recursion to track which branches have been visited.
Linear (non-branching) algorithms, which should be written in a tail-recursive manner to avoid stack overflows, or written with loops when the compiler/interpreter doesn’t support tail recursion elimination.

I should add that branching algorithms are also susceptible to stack overflows, but since the stack is providing a necessary service, it can’t be avoided with a non-recursive algorithm.

So here’s why I end up agreeing with McConnell: to properly write non-branching recursive algorithms, you must (1) write them in a tail-recursive manner, and (2) run them with tail call elimination. After you write a tail-recursive function, it’s easy to accidentally modify it to no longer be tail-recursive, and the difference can be subtle. That’s why Scala provides the tailrec annotation: so a function that’s supposed to be tail recursive won’t compile unless Scala can apply the tail call elimination. If you need the compiler’s help to safely apply a technique, it’s an advanced technique, and shouldn’t be done lightly.

UCU: better than MVC

I propose a replacement for the MVC (Model-View-Controller) model for software design. The problem with MVC is primarily that View and Controller both address the same concern: the user experience. After well over a decade, software developers now know better. My new model keeps the model, but all the words are changed to avoid confusion with MVC.

UCU: Utility — Connector — User Interface

A Utility provides a well-defined service. For example, accessing a file or storing a collection of items. It has a contract which defines what it does, and its correctness is measured against the contract. That contract is some combination of documentation, code (method signatures, interfaces, etc.), and a test suite.
A Connector is how components are discovered and constructed. Thus this category includes constructors, factory methods, and dependency injection. Connectors should be simple, since a program consists of a rat’s nest of interconnected components. Bad connections are hard to test, and even harder to debug, so the only recourse is to make them hard to be buggy. Constructors should do no extraneous work, except for checking for illegal arguments. When there’s an illegal connection, the program should fail immediately. For example, a constructor for a database utility which takes a database URL should check the syntax of the database URL, but it should not try to connect to the database. (If the database is inaccessible, that’s a problem with the database, not with the program trying to access it.)
The User Interface (UI) is the part of the program that relates to how the user perceives and interacts with the program. Its correctness is measured against the user’s experience. (Many practitioners prefer the term UX, User eXperience. In this case, because you can program an interface but not an experience, I am using UI to refer to the software.) Humans have different perspectives, and their expectations may change over time. As a result, UIs are developed using rapid prototyping techniques. Like a Broadway stage, the UI should be designed for frequent and rapid changes. UI components that get reused frequently are really utilities, thus classes tend to migrate from UI status to utility status.

For more reading, here are my earlier thoughts, before UCU gelled. And here’s a Google Tech Talk on writing clean, testable code which influenced UCU.

Dennis Richie, RIP

Dennis Richie was one of the most important computer pioneers. His language, C, is at the foundation of all operating systems in common use today. If you want some code to run on every smartphone, every desktop, and every server, you write it in C. If you got rid of all the C compilers, Apple couldn’t build Mac OS or iOS, Microsoft couldn’t build Windows, and Google couldn’t build Android or their servers. It is no exaggeration to say that all the most important software in the world is written in C.

No Dave, It's Just You

Is it just me, or…