Tuesday, November 6, 2007

Effective .emacs

Tip #0: Use Emacs 22
Emacs 22 is super stable. About half of my .emacs file (before I cleaned it up) was loading stuff that's now part of Emacs 22 and has autoloads.

Tip #1: Never quit Emacs
Okay, this has nothing to do with your .emacs file, but I have to put it in here. Just because your .emacs file should load quickly doesn't imply that you should quit and restart all the time. Figure it out!

Tip #2: (require 'cl)
I put this at the top of my .emacs. It's a no-brainer. It adds in a ton of compatibility with CL, so that you can just use the CL functions you know and love (well, most of them, anyway), without a second thought.

Tip #3: Never LOAD, never REQUIRE
Your .emacs file shouldn't contain any calls to LOAD or REQUIRE (which are slow and often cause errors on startup). The only possible exceptions are loading files that contain nothing but autoloads (or similar stuff). How do you avoid loads and requires? First try removing each call to LOAD or REQUIRE to see if it's needed at all. Often (e.g., if you follow Tip #0) Emacs already has autoloads in place for the library (e.g., "cc-mode"). For other libraries, where that's not true, put your own autoloads in your .emacs file. For example, rather than load SLIME in my .emacs (so I can bind the F1 key to SLIME-SELECTOR), instead I have:
(autoload 'slime-selector "slime" nil t)
The only call to LOAD in my .emacs file is for "erlang-start", but if you look inside the file you can see it contains only autoloads (and morally equivalent stuff). I also load the custom file, but that's different, see Tip #7. I don't have a single call to REQUIRE (beyond that mandated by Tip #2).

Tip #4: Understand and use EVAL-AFTER-LOAD
Another reason why you might have strewn needless REQUIRE and LOAD calls throughout your .emacs file is that you need to call a function from a specific library. For example, let's say you want to set your default SQL database type to MySQL:
(sql-set-product 'mysql)
If you put this in your .emacs, you'll get an error because the SQL library isn't loaded so SQL-SET-PRODUCT isn't yet defined. But before you add a LOAD or REQUIRE, stop! Instead do:
(eval-after-load "sql"
'(progn
(sql-set-product 'mysql)
;; any other config specific to sql
))
As the name suggests, this will defer calling that code until the SQL module is actually loaded. This saves startup time and prevents errors!

Tip #5: Time your .emacs
You really ought to know how much time it's taking to load your .emacs file. Use the following in your .emacs:
(require 'cl) ; a rare necessary use of REQUIRE
(defvar *emacs-load-start* (current-time))

;; rest of your .emacs goes here

(message "My .emacs loaded in %ds" (destructuring-bind (hi lo ms) (current-time)
(- (+ hi lo) (+ (first *emacs-load-start*) (second *emacs-load-start*)))))
After Emacs finishes initializing, you can switch to the *Messages* buffer and see how much of that time was taken by loading your .emacs. Mine now contributes less than one second!

Tip #6: Set background colors
Don't just stand for the default colors! Set them to what you really want. In my case I want a reverse video effect:
(set-background-color "black")
(set-face-background 'default "black")
(set-face-background 'region "black")
(set-face-foreground 'default "white")
(set-face-foreground 'region "gray60")
(set-foreground-color "white")
(set-cursor-color "red")

Tip #7: Separate custom file
It's annoying to have your .emacs file modified by Emacs' "custom" library, especially if you check in your .emacs to a source code control system such as Subversion (which you should do) and synchronize it on multiple machines. Keep those customizations in a separate file:
(setq custom-file "~/.emacs-custom.el")
(load custom-file 'noerror)

Sunday, September 2, 2007

The Icarus Ultimatum

It is unlikely that every programmer is familiar with Icarus, but I bet that almost all programmers have something in common with him. Programmers are saturated with advice not to do things, similar to the advice Icarus' dad gave him about aviation. Don't use threads unless you really know what you're doing (and then don't use them anyway.) Don't use new language features (they're too dangerous.) Use the "right tool for the right job" (i.e., not the one you like.) Don't "prematurely" optimize (note: it's always premature.) Don't use macros, nobody will be able to read your code. Don't use multiple inheritance. You ain't gonna need it (you dork!) Don't, don't, don't; no, No, NO! For the rest of this essay I will refer to such advice as "Dead" advice, in honor of Icarus' father, Daedalus (and also to honor my own lack of spelling ability.)

Let's remember one thing about Icarus. Yes, he died a fiery, horrible death. But at the same time, he flew close to the freakin' Sun! How cool is that? Likewise, the most exciting developments in software have been crazy stupid. I remember how quixotic the notion that you could index and search the entire Internet (or even a significant part of it) once seemed. Only by thinking big does anything cool get done; it doesn't get done by sitting around pondering all the things you shouldn't be doing.

Dead advice exists for many reasons, of course. Programming is so open-ended that it is easy to create unmaintainable, unreliable, slow software. Programs start out as a blank slate; one can do anything. Without discipline, anything turns into unmitigated crap. Thus experienced programmers observe mistakes (including their own) and, sometimes, craft memorable axioms as a result. Unfortunately, these well-intentioned rules-of-thumb often turn into blindly-followed commandments by the time they penetrate the consciousness of millions of programmers.
For example, the original advice about premature optimization has become twisted over the years, such that programmers are afraid ever to optimize lest their effort be labeled "premature". Similarly, Microsoft's infamous detour into Hungarian notation started with a misunderstood paper by future astronaut Charles Simonyi. But this sort of telephone game isn't the only source of Dead advice.
"It's not guns that kill people, It's these little hard things!"
--- the Flash from "The Trickster Returns"
Sometimes Dead advice emerges not from a misinterpretation of the original advice but from a failure to perceive a problem's underlying cause. The treatment of threads in computer science illustrates this phenomenon well. Typical computer science programs teach little, if anything about threads; threads are an "implementation detail" with minor theoretical import (compared, say, to data structures, Big O, recursion, etc). What one does learn about them, though, makes them sound cool. Heck, let's face it, they are pretty darn cool. They're the expression of multitasking; a piece of magic. Junior programmers are disappointed when, on their first job, their mentors seem ready to pour boiling sulphuric acid on their keyboard if they so much as mention the possibility of using a thread. "My advice is not to use threads unless you're extremely experienced, a thread wizard, and even then don't use them." That's what they'll hear.

I'm going to step out on a limb here, insert earplugs to drown out the chorus of "boos" that I can already hear even though nobody's read this post yet, and state: there's nothing wrong with using threads. This is a big reversal for me; I spread the threads are evil gospel for many years. Yet I have confidence in this assertion for a number of reasons, foremost among them being that the admonitions against threads haven't helped much. How many applications have you used where the user interface freezes up every time the application does something that takes longer than a second or two? How many times have you found Cancel buttons (you know, those ones you're trying to hit when you realize you're accidentally deleting all of your un-backed-up files?) that take minutes to have any effect?

If anti-thread prohibitions haven't worked, what would? I've been casually researching this issue for a long time. At first I decided, like many, that threads are indeed evil, and that event-driven programming was the only way to go (programmers love nothing more than a dichotomy.) When other programmers would point out to me that event-driven programming is, at times, a giant pain in the rear, I'd silently mark them as wimps. Finally, I read a series of papers, including this one, that convinced me there was, at least, more to the story. Even more recently, someone clued me in to the work of Joe Armstrong, one of the few people outside of academia to have stepped back from the problem of threads and tackled, instead, concurrency, which was the real issue all along. By far the coolest thing Joe did was realize that you can in fact fly close to the Sun and spawn as many darn "threads" as the problem you're solving truly warrants. You can use events, too; it turns out there never was a dichotomy between events and threads either.

I found this revelatory after having wasted a lot of time, over the years, writing things like HTTP parsers in frameworks that either force you to use only events, or frameworks that let you spawn threads easily (though not necessarily efficiently) but have no standard event-passing mechanism. It's not just that I didn't think of it myself, it's that almost everyone I talked to about this sort of issue was either in the "threads are evil" camp or the "events are confusing" camp. I wasted a lot of time following bad advice.

Experiences with Dead advice such as "threads are evil" have led me to question other Dead advice, such as:
  • Premature optimization is the root of all evil.

  • Never use multiple inheritance.

  • YAGNI.

  • Never use macros.
Each of these has a kernel of truth to it, but each is also too easily misunderstood and has not had the overall effect on programming that was originally intended. If I have some time I'll try to give more examples of each of the above. In the meantime, I do have to mention one piece of (arguably) Dead advice that I haven't found any fault with yet:

Saturday, August 25, 2007

Macrophobia

As a fledgling CL programmer, I've found macros to be one of its most enduring charms. Yet it is also one of its most criticized (or at least feared) features. Much like threads have become the bogeyman of the programming world, so macros would be too, if more than a handful of programmers actually knew about them. Among this handful, though, it would be nice to dispel the myth that macros are too powerful.

I mean, obviously, they are, in some sense, too powerful. The common complaint about them is that junior programmers (especially) can create what appear to be language-level constructs that confuse the heck out of everyone else, impairing the code's "readability". As with most broadly-accepted maxims (in this case, broadly accepted by the 15 people in the world who actually know about the issue in the first place), it is well-intended, intuitively reasonable, and incorrect, for the simple reason that if, at your company, junior programmers are running amok with no mentors to teach them, the code will be unreadable whether or not it uses macros.

Frankly, I'd like to dispel the whole myth that there's great import to be placed on the "readability" of a programming language. I see blog after blog, each about as scientifically sound as my 10th grade Chemistry report (in case you're wondering, I'm not a chemist), about whether Perl is more readable than C, or vice-versa, or whether Python is more readable than CL, etc. This tiresome, endless debate stirs the fanboy in all of us, but is missing not just the forest for the trees, but the solar system for the planet.

I've read a lot of code in my day, I'll have you know. What matters most to readability is, first, comments, followed by identifier names, followed by there being as little code as possible, followed by program structure. Ok, I can hear the chorus of "boos" in the background at the preceding statement. All I can say is, most programmers have no idea how to write good comments (even though, as they say, the "force" is with them; they just have no idea, and lack a Yoda-like figure to explain it to them). When comments are good, there is nothing quite like it. The code could be written in TECO for all I care. Maybe blog, will I, about it, some day.

So macros, in fact, cannot really get in the way of readability. In fact, they have the potential to greatly enhance readability, used judiciously (and as pointed out earlier, if your company is not run judiciously, you have larger issues to work out first). How many of you out there have worked somewhere that had some kind of coding guidelines? I'd warrant that a fair number of you have. I certainly have. I'm talking about stuff like Ellemtel's Programming in C++, Rules and Recommendations.

Give those rules a once-over, if you haven't already. It's pretty complicated stuff. Use this feature this way, use that feature that other way, don't do this, do do that, etc, etc. Yawn. I've worked at companies that successfully enforced such rules (at least 80% of the time), and some that failed utterly. In both cases, it was wickedly hard to enforce these rules (successfully or not). All programmers had to read the coding guidelines document, understand it, remember it, and periodically re-read it when it was updated. Senior programmers had to enforce the guidelines during code reviews (which was sometimes easier than other times, depending on the stubbornness of the reviewee).

At more sophisticated shops, one could imagine, at least, that some of the rules would be enforced by build scripts. A poor solution if ever there was one. For one thing, build scripts tend to be written in a different language (e.g., Perl), and usually resort to crude regexp-matching in order to catch offending code. Only a handful of programmers at your company are brave (or foolish) enough to grok the build scripts. These poor souls find themselves, before too long, at the bottom of a pit of despair from which they never return (save by leaving for another team or company). Eventually they label themselves with self-deprecating monikers such as "build bitch".

Macros afford the possibility, at least, that some of these conventions be enforced within the language itself, rather than by a bunch of hacky Perl scripts (or no automated process at all). Take, for instance, the use of CL's DEFCLASS. Now DEFCLASS has roughly one grillion options for how to do this, that, or the other thing. It's also remarkably lax about, well, almost everything. If you want a member variable ("slot") called FOO but wish for the accessor to it to be called BAR, you can do that.

If you wanted to prevent such wanton creativity on the part of your less-trustworthy programmers, you could do so by writing a macro wrapping DEFCLASS which might both limit its complexity and enforce constraints such as the regular naming of accessors and the like. You could prevent the use of multiple inheritance if you found it too frightening (I'm not suggesting you go out and do this, just pointing out that you could). These rules would be enforced using code written in the same language as everything else (see the sketch below), making them easier to write (in that they can harness the reflective power of the language itself, i.e., no hack regexps) and making it easier to find volunteers to maintain and develop them.
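To make that concrete, here is a minimal sketch of what such a wrapper could look like. The name DEFCLASS* and the NAME-SLOT accessor convention are my own inventions for illustration, not from any actual code base; the point is only that the convention is enforced by ordinary Lisp code:
(defmacro defclass* (name superclasses &rest slot-names)
  ;; Enforce single inheritance at macro-expansion time.
  (when (rest superclasses)
    (error "DEFCLASS* allows at most one superclass: ~S" superclasses))
  ;; Every slot gets a predictably named accessor and :INITARG, and nothing else.
  `(defclass ,name ,superclasses
     ,(loop for slot in slot-names
            collect `(,slot
                      :accessor ,(intern (format nil "~A-~A" name slot))
                      :initarg ,(intern (symbol-name slot) :keyword)))))

(defclass* point () x y)   ; accessors are POINT-X and POINT-Y
A code review then only has to check that people use DEFCLASS* rather than raw DEFCLASS, which is a much easier rule to enforce than a page of slot-naming guidelines.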

I could go on and on about this, but it's getting late so I'll just wrap it up there. One last thing, though: I urge the readers of this blog (all three of you) to re-evaluate the various maxims you may have assimilated over the years. Are threads really that bad if there is a problem facing you that is genuinely concurrent? Are you really optimizing prematurely? Is there really such a thing as a "scripting language"? Should operating systems and programming languages keep each other at arm's length? The list goes on.

Friday, August 24, 2007

Proactive Optimization

In the programming world, one of the most widely repeated maxims is, "Premature optimization is the root of all evil." Having found this to be one of the most well-intended yet misunderstood pieces of advice in the history of the Universe, I was all set to write a blog about why. Unfortunately for me, a quick trip to Wikipedia led me to this article: The Fallacy of Premature Optimization, by Randall Hyde, which said everything I intended to say, and more, and did so more eloquently than I could ever have hoped to (although I might have put in some better jokes).

So the only thing I really have to add to that article is the following crude attempt at marketing: let's put a little spin on that phrase and turn it around! Whenever you hear "premature optimization is the root of all evil", retort (politely) with "but proactive optimization is the root of all goodness."

Sunday, June 17, 2007

CL-ROGUE

Over the last six months I have been porting the 1981 version of Rogue to CL. My intention was to learn the basics of CL; it's a straight port of the game. I wish I had thought of porting something sooner (especially Rogue, which I once ported to PC/GEOS), but for some reason this didn't occur to me until recently. It is a great way to learn a language if you don't have much time. I heartily recommend it (over, say, trying to think of a "pet project" to code up in the new language, which is cool, but requires a substantial investment). With porting, the project is already written and working, and you can pick it up and drop it frequently (if you have only a few hours a week, for example) without losing too much context (the original authors did all the hard work!) In the case of C and CL, porting was a good way to learn all the side-effect-y features of CL that are often glossed over in CL books (which focus more on functional programming).

Having spent most of my career doing C/C++/Unix (and a bit of C#/Windows), it is interesting to see first-hand the alternative road that CL would have taken us down. One cannot compare C and CL directly, because in CL the operating system and language are indivisible. That is probably its biggest strength. By contrast, although C is the "first class" language of Unix, it was not treated especially royally. For example, you cannot use C directly in the command shell. When I was first learning Unix, it was confusing that I couldn't call C functions from the so-called "C Shell".

At first, CL's REPL seems alien. Once you realize it is the same thing as the command shell, all makes sense. For example, I had a little difficulty understanding what ASDF was until I realized it's equivalent to make or ant. Many CL functions and packages correspond to Unix utilities that you'd normally have to run from the command-line. Likewise, the lack of an Eclipse- or Visual Studio-like debugger at first seemed to be a major weakness, until it dawned on me that you're always in the debugger in CL.
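The ASDF/make analogy is pretty direct, for what it's worth. A system definition like the following (the component file names are invented here for illustration) plays the same role as a Makefile, declaring what to compile and in what order:
(asdf:defsystem :cl-rogue
  :components ((:file "rogue")
               (:file "monsters" :depends-on ("rogue"))))
and (asdf:oos 'asdf:load-op :cl-rogue) is the moral equivalent of typing make.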

Porting C to CL was mostly mechanical and surprisingly easy. What was most intriguing was that some relatively obscure-sounding features of CL proved extremely useful (unfortunately I didn't discover this until late in the game). In particular, SYMBOL-MACROLET and DEFINE-SYMBOL-MACRO helped immensely (or at least they did once I figured out how to use them!)
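To give a flavor of why (the names below are made up for illustration, not taken from the actual port): C code is full of expressions like player.hp--, and a symbol macro lets the CL version read almost the same way, because HERO-HP looks like a plain variable but expands into an accessor on the real object:
(defstruct creature (hp 12) (gold 0))
(defvar *player* (make-creature))
(define-symbol-macro hero-hp (creature-hp *player*))

(decf hero-hp)   ; the C statement "player.hp--;" ports to this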

The port itself is here.

Monday, March 12, 2007

Everything in Moderation

On the night of November 18th, 1849, lovable, bearded mountain man Grizzly Adams found himself stranded on the Alaskan tundra in his simple canvas tent, with nothing to do but wait out the blizzard raging all around him. At such times Adams was wont to compose computer programs in his head. Sometimes he would write the code on his notepad, and when next he found himself in a larger town, such as Fairbanks, he would have it telegraphed to the nearest computing facility. On this particular night, however, he did something extraordinary; he conceived of a brilliant improvement over assembly language, until then the only option available to programmers on the Difference Engine MXVII. His conception was the "C" programming language, which has dominated the programming world ever since (along with its minor variants, C++, C#, Java, Smalltalk and Lisp). C is a coating of syntactic sugar over Difference Engine Assembler (DEA). For example, C's "++" operator corresponds to the DEA ADD-YE-THE-NUMERAL-ONE-TO-THE-NUMERAL-IN-QUO operation code (which was a real b***h to write, by all accounts). C's other statement types translated readily into DEA, too. For example, the "for" loop reduced 30 lines of DEA (on average) down to about 10 lines (on average) of C: a big savings in those days, when one had to write code by hand, telegraph it to a computing facility (at a penny a word!), and wait patiently for the results.

Computer Science being the conservative field that it is, programming languages have changed little in the 158 years following C's inception. A few small enhancements have been added by C's successors, but these have been relatively minor, and have upheld the "spirit" of C. For example, Danish inventor Bjarne Stroustrup created "C++" by adding to C the concept of a "class", a convenient syntax for limiting the scope of global variables. Meanwhile, other language designers took different tacks. Whereas Stroustrup created a "higher-level" C with C++, Stanford professor John McCarthy created a "lower-level" C, which he named "Lisp", wherein one programs directly in the intermediate representation (aka "abstract syntax tree") that C compilers generate. Similarly, Alan Kay's "Smalltalk" is a minor variation of Lisp: it represents the C abstract syntax tree using fewer parentheses. Although Smalltalk and Lisp remain exceedingly close to C, they are the most daring of any new programming languages. Perl, Ruby, ML, Haskell, Python, SNOBOL, and others all fall somewhere on the spectrum between C and Lisp/Smalltalk.

Lest you worry that I'm suggesting progress in the field of programming languages has been glacial, fear not! I commend heartily the intentionally moderate pace of development. It has been for a good cause, after all: ensuring that programmers do not misuse overly-powerful language features which they do not understand. By limiting programming languages to the small set of primitive operations that Charles Babbage developed, over 160 years ago, we ensure that programmers only attempt those problems for which they are qualified. Think of these intentional limitations on programming languages as being akin to those signs you see at amusement parks, consisting of a leering Technicolor elf with a big cartoon bubble saying, "You must be this tall to ride The Bombaster". The intense feat of abstract thinking required to see how useful applications could be built from such low-level functionality acts as a sort of programmer's amusement park sign. Weak programmers recognize when they're "too short" and stay off the ride (and stay out of trouble!) This is, in fact, the cornerstone of the widespread high quality, stability and performance of modern software applications that we experience every day. So do not worry, I'm not suggesting it ever change!

That said, it is interesting, purely as an exercise, to consider what scientific and technological breakthroughs from the last 158 years could be taken advantage of in a new programming language. Let's see.

First, our basic notions about time have changed since 1849. Back then, Einstein's theory of Special Relativity, and all of the subsequent work in modern Physics, was, of course, unknown. Newtonian mechanics was the only thing Grizzly Adams would have known about, and that meant action-at-a-distance. Although Newton would not have told Grizzly that everything happened simultaneously, he would have become downright confused trying to explain to Grizzly how multiple concurrent activities would actually work. Thus C, and its variants, have no easy way to represent simultaneous activities. Everything happens in a single stream of time. For example, if you want multiple things to happen simultaneously, you have to "fake it" by interleaving them in your code. For example, take this simple email client written in C:
TextBox toTxt = new TextBox("To: ", ...);
TextBox ccTxt = new TextBox("CC: ", ...);
TextBox cc2Txt = new TextBox("CCRider: ", ...);
TextBox subjectTxt = new TextBox("Subject: ", ...);
TextBox bodyTxt = new TextBox("", 24, ...);
TextBox statusTxt = new TextBox("Status", ReadOnly, ...);
Button sendButton = new Button("Send", &doSend);
Button cancelButton = new Button("Cancel");
Button resetButton = new Button("Reset");
Frame frame = new MainFrame("Email");
frame.Add(toTxt, ccTxt, cc2Txt, ...);

// ... imagine lots of gnarly code here ...

private const ipated void doSend(Button b)
{
    String[] smtpHosts = Config["smtp-hosts"];
    if (smtpHosts != null) {
        for (smtpHostIP, port in LookupViaDNS(smtpHosts)) {
            Socket sock = Socket.Open(smtpHostIP, port);
            int err = sock.Send("EHLO", ...);
            if (err != 0) {
                statusTxt.Update(String.Format("send error {0}", strerror(err)));
            }
            // etc...
        }
    }
}
This code functions, but suffers from problems such as the entire user interface freezing while hosts are being looked up via DNS, connections opened, handshakes shook, the contents of the mail being sent, etc. That could take a long time, and users get peevish when they cannot interact with an application for extended periods.

Since Grizzly Adam's day, Albert Einstein (and some other physicists, too, I think), frustrated by the dearth of good email clients, spent time extending Newton's concept of time. The details are far beyond the scope of this essay (or, for that matter, my own comprehension) but suffice it to say that they can all be distilled thusly: everything is relative. For example, if I am riding my skateboard down the freeway at 10mph, then an 18-wheel tractor-trailer rig bearing down on me from behind, at 60mph, will appear to me to be going 50mph (if I should be smart enough to turn around and look, that is). Of course, the driver of the truck perceives her truck to be moving at 60mph, and me to be moving at negative 50mph. Furthermore, the only way I'm even aware of the tractor-trailer's presence is due to signals of one sort or the other (light and sound) bouncing around between the truck and myself. For example, I can see the photons that have bounced my way from the gleaming chrome grille on the front of the truck, and I can hear the desperate shrieking of the truck's horn as the driver attempts to avoid turning me (and my skateboard) into a pancake.

To keep programmers from turning themselves (and others) into metaphorical pancakes, these two concepts (concurrent activities happening in their own space/time continuum, and the signals that can pass between them) have been kept strictly separate, and have been kept rigorously out of any programming languages. This has been accomplished through the tireless efforts of a secret, underground organization (the cryptically-named Association for Moderation in Computing (AMC), a group about which little is known beyond their name and the few scant facts I am about to relay to you) that has maintained a long-standing disinformation campaign:
  1. AMC agents have infiltrated all language development committees and have ensured that these groups allow concurrency and signaling primitives to exist only in libraries, if indeed they exist at all.
  2. These same agents are also responsible for promulgating a false dichotomy between the two features, via strategically placed blogs, Usenet posts, blackmail and hallway conversations. Calling concurrency "threads" and signals "events", the agents have, Jason-like, thrown a proverbial rock into the crowd of computer scientists. Half of them are convinced that threads are evil and must be obliterated in favor of events, and the other half that the exact opposite is true. These camps war constantly with each other. And while the AMC's real aim is to prevent use of either feature, at least this way the damage is mitigated.
With all of that as a caveat, one could imagine a language where this sort of everything-is-relative concept, as well as signaling, were both built in directly to the language. To save me some typing in the remainder of this essay, I refer to this hypothetical language as Relang (for RElative LANGuage; get it?) In Relang, anything logically concurrent could simply be coded that way, explicitly, without significant performance penalties. Just as in real life, the only way these concurrent activities could interact with each other would be through "signals", sort of like the photons and sound waves that physicists have discovered since Grizzly Adam's time. Each concurrent activity would probably have a data structure, such as a queue, to hold incoming signals, so that they could be processed in a comprehensible way.

If we were to imagine the above example code re-written in this new language, each of the UI objects would exist in its own space/time continuum and communicate with other UI objects by sending signals (such as photons, sound waves, folded-up pieces of paper, and lewd gestures) to them. One required change would be that lines like this:
statusTxt.Update(String.Format("send error {0}", strerror(err)));
would now read:
send(statusTxt, update, String.Format("send error {0}", strerror(err)));

Meanwhile, the statusTxt object would be sitting in a loop looking something like so:
// This function is operating in its own
// space/time continuum.
static private isolated formless void superTextTNT()
{
    E = receive(infinity);
    case E:
        {update, txt} -> self.txt = txt; send(mainFrame, update, self);
        {invert} -> self.invert(); send(mainFrame, update, self);
        ...etc...
    superTextTNT(); // this is a tail-call optimizing version of C
}
Now it is clear that each UI element is "doing its own thing" in its own personal space, and not trying to grope any of the attractive UI elements sitting on neighboring barstools. As a side-benefit, the UI would no longer freeze up all the time.

Now, of course, you are probably thinking, "Ok, wise guy, why don't languages work this way?" Well, first off, this feature falls smack dab into the too powerful category; it would be misused left, right, up, down, and sideways (hence the AMC's disinformation campaign). Even were that not the case, however, there's this other problem. To implement the above efficiently, the language would have to be aware of when blocking (e.g., I/O) could occur. Otherwise each of those "receive" statements would tie up a real OS thread, which is too expensive in the general case. And so here is where the story takes a slightly sad turn. Why not make the programming aware of when blocking occurs? Simple, because it is impossible to do that. While I am unsure of all the particulars, I have been assured by many top-notch computer scientist that that would be flat-out, 100%, totally, and absolutely, impossible. No way in heck it could ever be implemented. Oh well.

On the bright side, while the language feature is unavailable, there is an easy work-around. Use just two threads of execution. One creates a "user interface" (UI) thread and an "everything else" thread. Then one ensures that only UI-related activities occur on the UI thread and that, well, everything else occur on the other. In practice, this has turned out to be a model of simplicity (as anyone who has done UI programming could tell you, it is trivial to separate UI from "everything else") and is a big part of why most applications with graphical user interfaces work so flawlessly. It is unfortunate that one uses threads at all (they are deprecated for a reason, you know), but at least there are only two, and they can be implemented in a library, without polluting the language itself. Further, because there are only two threads, signaling is kept entirely out of the picture; that extra bit of complexity is unnecessary with so few space/time continua to keep track of. In fact, the only downside is that two or more UI elements cannot update themselves simultaneously, since they are both operating on the UI thread. Fortunately, I have never heard of a graphical application that actually required this. Let us hope that such a beast never comes crawling out of the proverbial swamp.

Speaking of UI, let us now consider another as-yet untapped technological advance: modern display technology. Back when C was first invented, all programs had to be written down in long hand by the programmer, and then a telegraph operator had to manually translate it into Morse code (unless the programmer happened to be at the computing facility). C's extreme terseness stems from this. All languages inspired by C have kept true to its roots; they are easy to write longhand, too, just in case one has to write software without a modern display handy. If we were to give up this requirement (a bad idea in practice, mind you), how might a language make use of this Faustian feature?

Since the features that have been added to C (e.g., object-oriented programming) have focused on new ways to scope data, why not start using, say, color for scope, too? For example:
int i = 0;
private explicit adults only int foo()
{
    return i++;
}
This trivial example shows how one can avoid typing "static", and make it a little easier to see where the variable is used. You could imagine more complicated uses of this, though:
public mutable implicit void CrawlSites(String hostsFile)
{
    try {
        Stream hostStream = File.Open(hostsFile, "r");
        String host;
        int port;
        while (host, port = Parse(hostStream)) {
            sock = Socket.Open(host, port);
            CrawlSite(sock);
        }
    } catch (IOError fileError) {
        Blub.System.Console.ErrorStream.Writeln("unable to open hosts file: %s", fileError.toString());
    } catch (IOError socketError) {
        MaybeThrottleFutureConnections(host, port);
        Blub.System.Console.ErrorStream.Writeln("unable to connect to %s:%s: %s", host.toString(), port.toString(), socketError.toString());
    } catch (IOError parseError) {
        Blub.System.Console.ErrorStream.Writeln("error reading from '%s': %s", hostsFile, parseError.toString());
    } catch (IOError socketReadError) {
        Blub.System.Console.ErrorStream.Writeln("error reading from host '%s':%d: %s", host, port, socketReadError.toString());
    }
}

One could do more than just set the color of the variables themselves. For example, one could use the mouse to right-click on an entire line and set its background. Say you want to make sure that nobody messes with the log object in the following function:
private static cling void foo()
{
    Log log = new Log("foo starting");
    // ...imagine lots of stuff here...
    log.End();
}
The rule would be that lines with the same background color would all be in a special sub-scope that was unavailable to anyone else. If someone accidentally added in some code that referred to "log" in the middle of the big, hairy function, it would be flagged as an error.

As stated earlier, this feature (and related features that would require a graphical user interface) is kept out of programming languages for a reason: we might have to start writing code without computers around, someday. If languages themselves depended on a GUI, society would probably collapse while programmers scrambled to re-write everything in C or one of its variants.

A friend of mine thought of a solution for this problem, but it is so far "out there", I'm only going to mention it in passing: one could use XML to "annotate" the hand-writable parts of code. He noticed noticed this as he was typing in the examples such as the ones above, in fact: he was doing stuff like... [Editor's note: Ok, dammit, here's the problem. I can't figure out how to "quote" XML in this dang blog. It keeps getting interpreted as HTML, and so, while I'd like to show you what the code would look like the way I envision it, I'm going to have to hack around the problem. Instead of actual XML, I'm going to use a mock XML that uses parentheses instead of less-than and greater-than. It's also kind of late at night, and I'm getting tired, so I'm going to not type the end tags in. Let's call this XML variant PXML, for Parenthetical XML.] The example above, in PXML, would look like:
private static cling void foo()
{
    (color :yellow Log log = new Log("foo starting"));
    // ...imagine lots of stuff here...
    (color :yellow log.End());
}
The way this would work, with PXML (again, sorry I can't write out the real XML as I would prefer), is that, if you were writing out the code using a pen and paper (or a typewriter, or the dusty teletype you came across in your company's storeroom while looking for a place to smoke) then you'd write out the annotations such as the "color" PXML form directly. If you were using a modern IDE, on the other hand, the IDE wouldn't actually show you the PXML (well, not unless you actually asked it to). It would just show the whizzy colors (again, unless for some reason you preferred the PXML). And of course, you could use this mechanism for something more than just coloring and scope.

For example, suppose you were a big Python fan. You could edit your code in a sort of Python-looking mode. Under the covers, the IDE would be generating PXML, but it wouldn't show that to you, after all, you're a Python fan, right? I.e., if you wrote this in Python:
def foo(a, b):
    return a + b
Then the IDE could just generate the following under the covers:
(def foo (a b)
  (return (+ a b)))
Of course, in order to let you write code using any Python feature (list comprehensions, iterators, etc.), there would have to be some way for you to express those things in XML. Fortunately, as we all know, XML can represent anything, so this would not be a problem.
# Python
(x * y for x in l1 for y in l2 if x % 2 == 0)
would be:
(generator (* x y)
  ((x l1)
   (y l2))
  (= (mod x 2) 0))

[Again, sorry for not being able to write out the actual XML, which would look a lot cooler; I know that PXML is painful to look at. Darn blog! Actually, I guess since there is no fundamental difference between XML and PXML, the IDE could probably use either one, interchangeably, under the covers. But I doubt anyone in their right mind would actually choose PXML over XML, so it would not be worth the extra time to implement that.]

In any case, the AMC would look askance at any of these proposed features, so let us resume thinking about another technology!

Perhaps the biggest invention to come along has been the "Internet". No longer are computers linked together via a tenuous chain of telegraph wires and human operators, as Grizzly Adams had to endure. Modern computers are linked up more or less directly by a vast web of routers, operated not by humans but tiny elves who can move with great precision at near-light speed. It would be incredibly dangerous to make this "Internet" into a first-class concept within a programming language. Programmers would just share bad code with each other that much more easily! Nevertheless, just for fun, let's look at how things might work if one were to add such a dangerous feature. For inspiration, let us consider a feature that Sun Microsystems playfully suggested via an in-joke, of sorts, in Java's package system.

To use another company's Java package, Sun requires the following procedure:
  1. The programmer mails in a form, ordering a magnetic tape containing the desired Java package from the appropriate vendor.
  2. After a few weeks it arrives on a donkey cart, driven by a crusty old farmer.
  3. The programmer tips the farmer generously (if they ever want another delivery).
  4. The programmer must then use the cpio utility to remove the package from the tape.
  5. At this point, the package (which is in a "jar" file, named for irrepressible Star Wars character Jar Jar Binks) is intentionally not usable until you go through the further step of adding its location to an environment variable called the CLASSPATH.
  6. The programmer is now almost ready to use the package, provided that s/he restarts any Java software running on their machine, and did not mistype the path to the jar file.
  7. It may also be helpful, at this point, for them to do a Navajo rain dance and sacrifice a chicken, especially if trying to get more than one "jar" file installed properly.
Now, the in-joke that I referred to above is the following: Sun's convention for naming these packages is to use "Internet" addresses (well, actually, reversed addresses, to make it more of an in-joke). I think they were making a subtle reference to the unimplemented feature whereby this:
import com.sun.java.jaaxbrpc2ee;
would cause the Java Virtual Machine to download the jaaxbrpc2ee package directly from "java.sun.com", without the programmer having to do anything. For numerous reasons this would never work. What if your computer weren't connected to this "Internet"? Or what if you were to download an evil version written by evil hackers? It is well-known that both of these problems are completely intractable, whereas the manual steps listed above ensure nothing like this could ever happen. Nonetheless, the direct downloading way does seem like it might speed up the pace of software development a bit! Too bad there are all those problems with it.

What if writing:
import com.sun.java.jaaxbrpc2ee;
actually did download, install, and, well, import the jaaxbrpc2ee package? Ignoring the impossibility of actually making a socket connection, finding space on local disk, etc (more intractable problems, I'm afraid), I can think of two major issues with this:
  1. Security
  2. Versioning
Security would be a toughie. You might have to have an MD5 or SHA1 hash of the Jar Jar file available somewhere for the JVM to check on in order to determine if it has downloaded a valid copy. Not sure this would work, but if it did you could actually download the Jar Jar file from your cousin Harold's warez site that he runs out of his basement. That way if com.sun.java was down you'd have at least one other option.

Then there's the versioning issue. What if you were developing against version 2.3.5 of jaaxbrpc2ee and Sun released version 3.0.0? How would the JVM know not to use it? Chances are, you'd have to add some syntax to Java in order to handle this. You could use a regular expression to indicate which versions were acceptable:
import com.sun.java.jaaxbrpc2ee 2.3.*;
import org.asm.omygawd *.*.*.*;
import org.standards.jdbc.odbc.dbase2 19.007a*;

Of course, as you can see, anyone can publish a package that others could download and use directly. There could even be a free service that maintained these packages, for people who lacked the wherewithal to host their own packages. They'd just have to prove they owned their domain name. Also, people wouldn't be allowed to update an existing version, they'd have to release a new version.

Ideally, one would be able to use the "Internet" anywhere in your code, not just when importing packages. Pretty much anything could use the "Internet":
com.att.bell-labs.int32 i = org.ieee.math.pi * (radius ^ 2);
For example, if you wanted to run code on another machine, you might say (in PXML with Relang extensions):
(on com.megacorp.server11:90210
  (if ...mumble... (send com.megacorp.server23:2001 '(d00d u suck))))
The above would run the "if" expression on server11.megacorp.com (port 90210), and that code would sometimes send a message to yet another machine. Of course, you could use variables instead of hard-coded machine names and ports. Web servers could communicate with web browsers (if they supported PXML/Relang) like so:
(import com.niftystuff.clock 1.*)

(def clock ()
  (register :clock)
  (receive infinity
    (update-thine-self (com.niftystuff.clock:draw)))
  (clock))

(def display-time ()
  (while true
    (sleep 1)
    (send :clock 'update-thine-self)))
;; imagine lots more code here...
(html
  (body
    (on browser
      (spawn clock)
      (spawn display-time)
      (do some other stuff))))

Ah, but it's getting late, and I can see a strange van parked outside, and my dog is acting nervous (and I have heard something about an AMC "wet" team.) Maybe it is time to post this blog and go to bed! The crazy language sketched out above, is, well, just that: crazy!

Friday, February 16, 2007

Losing Big Is Easy

What's the connection between soda and the failure of programming languages? I would have thought "zero" had I not recently stumbled across a wikipedia article about a little-known predecessor of Coca-Cola. As it turns out, in 1879, a scrappy Norwegian immigrant named Jan Kjarti (who, incidentally, also coined the term "artificial sweetener") started a soda company. His company, Slug Cola (the motto was "you'll love to Slug it down!"), produced a soda that cost a little more than Coke but which actually tasted much better (note that it didn't actually use artificial sweetener, either). It's hard to even find out what it tasted like anymore, as none of the small but loyal customer base of Slug Cola are still alive. Fortunately, a noted Boston historian and Slug researcher has found a few scattered journal entries and newspaper clippings from the era. By all accounts drinking Slug, relative to drinking Coke, was something akin to a transcendent experience. Of course, since you've heard of Coke and haven't heard of Slug, you can probably guess what happened: Coke kicked their ass. One of Jan's protégés (one Richard Angel) later wrote a newspaper editorial entitled "Coke is Better", about the failure of their company. It was both heartfelt and poignant, and included a frank assessment of why Slug Cola failed. He pointed to the fact that, just as Slug was making a few gains in the marketplace, the US entered what became known as the "Soda Winter", a period of about 10 years where the public became irrationally convinced that carbonated water was bad for your stomach. Coke survived this period relatively unscathed, in part aided by the fact that people felt that some of Coke's ingredients actually offset the purported stomach-damaging properties of carbonation. Aside from that, and perhaps more importantly, Coke was cheaper to produce, allowing Coke to expand more rapidly into new markets. Try as they might, the proprietors of Slug Cola just couldn't convince members of the public (who had invariably tasted Coke already) of Slug's merits.

Well, that's most of the story, at least. At the very end of the wikipedia article, the author points out a fact that, once I read it, I could have kicked myself for not having figured out myself. I mean, it was a long article and went into all this detail about how the company failed. And not once did it dawn on me, and this is the part the author pointed out, that the name of the product might be the most significant factor in its downfall. Slug! Duh! Of course nobody wants to drink something called Slug! I mean, you can try to tell them that Slug refers to the concept of eagerly drinking (i.e., "SLUGging down") a transcendent carbonated beverage. But I just don't think you can overcome first impressions. And, let's face it, the first impression of Slug is of, well, a slimy, oozing insect. Of course, I'm guessing all of you reading this were way ahead of me this whole time, so I bet you think I'm pretty dumb. All I can say is, go read the article and you might get a sense for how I could have failed to notice the obvious. The story just got so poignant, convoluted, and interesting, that I experienced a momentary lapse of common sense.

Naturally I couldn't help, at this point, thinking about other poignant and convoluted stories that I've read about, and, like a flash, something that has probably been in my mind for years now hit me like a ton of proverbial bricks! The reason why my 3 favorite programming languages failed is that their names sucked ass! Just like Slug! I don't know why this didn't hit me before, but it just seems glaringly obvious now. What's more, I did a bit of research, and determined that there seems to be an almost inverse correlation between the coolness of a language's name and the coolness of the language itself. This might explain why we're still writing code using the equivalent of blunt, hollowed-out turtle bones, bits of twigs and leaves and stuff, just like our primitive Neanderthal ancestors.

If you don't believe me already, let's look at the all-time coolest programming language names:

C. Yup. C. C is cool. It's mysterious. It's cryptic. It's one syllable. It could even stand for Cool. In my mind, there is no better name for a programming language, and never will be (any attempts to replicate its coolness by picking other, single letters of the alphabet, such as D, will come off as mere me-too-ism). I remember in college, everyone wanted to know C. People would ask questions in CS211 that they already knew the answer to, just so they could mention some feature of C that they'd learned. I think the name has a lot to do with it. The language itself was my favorite programming language for many years, until I finally started learning more about real programming languages.

C++. You can't improve upon the name C, but C++ was about as good as you can get. I think C++ kicked the ass of its nearest competitor due to its name. What would you rather use (if you didn't know anything about either language's technical merits): "C++" or "Objective-C"? Ob-jeck-tive-See. Talk about clunky. I write code in Ob-jeck-tive-See. By the time you get to the third syllable you've just lost people's attention completely. They'll be staring off in another direction asking questions like, "is there a breeze in here?"

FORTRAN. Yup. You may hate it, but it sure is popular, even to this day. FORTRAN always sounded cool to me, before I learned anything about it. It sounds like "fortress", but with a cool sci-fi twist to it.

Java. Gotta hand it to Sun for picking a kick-ass name. It's short, it's cryptic, it's friendly; it implies a connection with a caffeinated beverage, something near and dear to many programmers' hearts. Its success tracked the explosive growth of upscale coffee shops, such as Starbucks. Yep, Java's stratospheric success has much to do with its name.

Now let's look at the all-time loser names.

Coming in with the 3rd worst name for a computer language, of all time, is Erlang. I write code in Ur-lang. I am caveman. I grunt and say Ur a lot. After laughing at you for a while, the next question from anyone you're trying to convince of Erlang's merits is, "Why is it called ERlang? Does it stand for Ericsson Language?" You're pretty much sunk at this point. It would be like if you were trying to get people to use MicrosoftLanguage instead of C#. Wouldn't happen. They'd just feel too silly.

Coming in with the 2nd worst name of all time is... Smalltalk. This list just gets sadder as we get towards number one, doesn't it, folks? Because the languages just keep getting better. Let me ask you this, folks: Smalltalk was designed by some of the best and brightest computer scientists of all time. Several of them have won Turing Awards and other accolades. Why wasn't someone amongst them smart enough to point out this dead obvious fact: something called SMALLtalk is never going to be successful? I remember when I took Intro to Programming Languages and we were supposed to use SMALLtalk at one point. I just couldn't believe it. This is America, folks. Bigger is better. I wasn't about to use a puny, wimpy language that went so far as to point out its diminutive nature in its own name! I remember I just suffered through the SMALLtalk portion of the course and didn't pay the slightest attention to any of the language's merits. I mean, even if you ignore the SMALL aspect, what does "small talk" actually mean? It refers to trivial, banal conversation. Who wants to engage in banal conversation? Does that mean the messages you send between objects are trivial and banal? Sigh. It only gets worse. Smalltalk's latest incarnation is called Squeak. That's right, it's named after the sound a small rodent/pest makes. Might as well call it Slug.

Last but certainly not least, here is the worst programming language name of all time: Lisp. OMFG this is a bad name. Bad bad bad. What does "lisp" mean? Lisp means "speech impediment"!! Do you want a speech impediment!? I don't think so! Back in college I had even less patience for Lisp than I did for Smalltalk. I mean, when it came down to it, I'd rather at least be able to make small talk, at a party, without having a lisp. Hint to John McCarthy: next time you come up with something brilliant, name it after something POSITIVE. Geez. And, as with Erlang, it just doesn't get any better when you try to explain why Lisp is called Lisp. It's short for LISt Processing... Get it? Isn't that funny? I don't hear you laughing. Yeah, 'cuz it's one of the worst puns ever. And not only is it not funny, guess what, most programmers don't actually think they're going to do much "list processing", so they're like, "maybe it IS good at List Processing, but I could give a flying monkey's posterior because that's just not what I want to be working on" (these programmers would much rather be working on their ORM layer, ironically).

Phew. So there it is folks. One of the great tragedies of modern computer science turns out to have such a simple, prosaic explanation. I would be more surprised that nobody else has ever mentioned this before, except that it took me 16 years and an obscure wikipedia article to see the light, so I guess I shouldn't expect anyone else to have done so, either.

Wednesday, January 17, 2007

Living Software

Steve Yegge's latest essay, The Pinocchio Problem discusses how liveness, or the QWAN, is what makes software great. Like many great ideas, his is a crystallization of something that one already knows (e.g., Emacs is uniquely amazing) but somehow cannot quite formalize. In retrospect, of course, it seems almost obvious. Now that I've read his essay, I can think of a few other pieces of software that are alive.

One is Erlang, one of the two programming languages in existence that I wish I could use professionally but can't (the other is Common Lisp). Erlang's live code migration feature allows upgrading code in a running system (such as an ATM switch, with tight availability requirements) in an extremely robust way. As far as I know, no other programming language (even ones that support something like this, such as Common Lisp) has sufficient support built in to really be able to do this safely in production systems. It also has a REPL, and is still the only reasonably well-supported language (i.e., one could consider using it without being laughed off the face of the planet) that actually tackles concurrency well.

Another is, well, any serious relational database. Although much maligned, SQL is actually a cool programming language. And the development environment provided by the database is actually a REPL and is much better than your typical dead IDE. Every time I have to do serious database programming, I actually find myself enjoying the interactivity of the experience. Databases tackle some really thorny problems (such as persistence) that most programming languages treat as a secondary concern, unworthy of treatment directly in the language itself. This is one of the major failings of most programming languages. The C# team seems to be at least trying to remedy this with LINQ in C# 3.0, although I don't know too much about it.

So I guess in addition to the qualities that Steve mentions that make software alive, I would add support for concurrency (or, more generally, explicit support for dealing with the passage of time), and persistence. There are probably a few other qualities as well (such as dealing with uncertainty, but it's too late at night to write any more about that).
