Saturday, May 8, 2010

Tcl operations, and many different zeroes

One reason for the title of this blog that I've explored fairly little up to now is that I'm one of the gnomes who make the Tcl programming language go. For this reason, many of my computer applications, when they start up, depart from a script - a Tcl script.

Last night, a colleague pointed me to a thread that begins with a misunderstanding of how data are interpreted in Tcl, and veers into a screed claiming that Tcl is "objectively bad."

Fundamentally, the original misunderstanding appears to be one of notation. The original poster complains that Tcl is 'clumsy in dealing with binary data.' He has a string comprising a single character, for example 'B', and he expects to be able to write [expr {$char & 0x80}]. Needless to say, that expression fails:

can't use non-numeric string as operand of "&"

Why is this? Well, fundamentally, in Tcl, Everything Is A String - it's one of the language's guiding principles. That principle means that there is no difference between the string "12" and the number 12: they comprise the same characters, hence they are the same entity. A side effect of this rule is that there is no difference between the character 0 and the number 0: again, they are the same string of characters. This unity is a tremendous convenience when you are using Tcl as a shell (or for writing scripts that serve the same purpose as shell scripts): you don't have to add a lot of excess quotes just to distinguish the types of things.
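A minimal sketch of what that unity looks like in practice, assuming a reasonably recent tclsh (8.5 or later); the variable names are only for illustration:

```tcl
set asString "12"
set asNumber 12

# Both variables hold the same two characters, so they are the same value.
puts [string equal $asString $asNumber]   ;# prints 1

# Either one can be used in arithmetic; expr parses the string as a number.
puts [expr {$asString + 1}]               ;# prints 13

# The character 0 and the number 0 are likewise indistinguishable.
set zero "0"
puts [expr {$zero == 0}]                  ;# prints 1
```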

It gets in the way only when you are trying to do low-level stuff like parsing, such as is done in C:

for (i = 0; (c = str[i]) != '\0'; ++i) {
    /* do something with the character c */
}

Tcl allows similar processing that (in my opinion) is just as convenient:

foreach c [split $str ""] {
    # do something with the character c
}

But inside the loop, the character c is a one-character string in Tcl; in C, it's an integer whose value is the representation of the character in ASCII (or some other encoding).
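If all you need is the integer behind a single character, which is what the original poster was after, the built-in scan command with the %c conversion does the job. A small sketch (the variable names char and code are mine):

```tcl
set char B

# %c converts one character to its integer (Unicode) code point.
scan $char %c code

puts $code                  ;# prints 66
puts [expr {$code & 0x80}]  ;# prints 0, since the high bit of 66 is clear
```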

If you do want, for some reason, to iterate over the bytes of a string as integers, it's easily done:

binary scan $str c* chars
foreach c $chars {
    # do something with the integer c (signed, in the range -128..127)
}

If there were demand for it - and as far as I'm aware, nobody's ever asked - Tcl's expression engine could be modified to accept single-quoted characters as integer constants. It appears that to Tcl's users, the existing facilities to manipulate bytes are Good Enough - partly because of Tcl's culture of extensibility. If you need high-performance, low-level programming, you do it in C and provide a Tcl API.
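As a hedged illustration of that extensibility: since Tcl 8.5, any command defined in the tcl::mathfunc namespace becomes callable as a function inside expr. The function name ord below is my own invention, not part of Tcl, and this is only a sketch of how a user could get character constants into expressions without touching the expression engine:

```tcl
# Hypothetical helper: ord is not a built-in; we define it ourselves.
proc ::tcl::mathfunc::ord {char} {
    # scan's %c conversion yields the integer code point of one character
    scan $char %c code
    return $code
}

set char B
puts [expr {ord($char)}]          ;# prints 66
puts [expr {ord("A") + 1}]        ;# prints 66
puts [expr {ord($char) & 0x80}]   ;# prints 0
```

The sketch assumes a one-character argument; an empty string would leave code unset and raise an ordinary Tcl error.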

And so the original poster's question could be easily dismissed as "not understanding the language" - were it not for the somewhat vitriolic posts that followed. The one that rankled in particular said,

Well, TCL is an objectively bad language because it is (used to be ?) a true source code interpreter. So, syntax errors are not found until containing them code is executed.

For example, in VLSI synthesis may take several days, and it's quite a pity if the whole process ultimately fails because of a silly syntax error at the script end.

That's why static TCL checkers have been invented ...

"Objectively bad?" Oh, my.

Given that static Tcl checkers like Nagelfar do exist, most of that poster's screed can be reduced to a complaint that Tcl doesn't force you to use them. Surely, if you're going to invest days in a VLSI synthesis, you'll want to ensure that you've bulletproofed your program in every way possible. Perhaps those who are dealing with lighter-weight tasks, where you can fix a script and rerun it in literally seconds, have a different set of tradeoffs? Does "objectively bad" reduce to "unfit for this purpose without some extra help?"
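A full static checker does far more than this, but even stock Tcl offers a cheap pre-flight test: info complete reports whether a script has unbalanced braces, brackets, or quotes. The sketch below is my own illustration, not how Nagelfar works, and the script contents are hypothetical. It would catch a missing close brace at the end of a long-running script before anything is executed:

```tcl
# Two hypothetical script bodies, held as strings for the demonstration.
set goodScript {
    foreach step {place route verify} {
        puts "running $step"
    }
}
set badScript "foreach step {place route verify} \{\n    puts \$step\n"

puts [info complete $goodScript]   ;# prints 1
puts [info complete $badScript]    ;# prints 0: the foreach body is never closed
```

This check says nothing about misspelled commands or wrong argument counts; those are the kinds of errors a dedicated checker is for.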

Admittedly, I am aware of studies claiming that programmers presented with strongly typed languages debug their programs faster. But most of them are old, and compare the strong typing of a language like Pascal against the untyped pointers of PL/I: anyone else remember the disaster that those were? I'm not aware of any studies that both are conducted with experienced programmers (beyond the level, say, of first- or second-year undergraduates) and are evaluating modern languages. The folklore says, "the more checking up front, the better," but the actual experience with up-front program checking always seems to reach a point of diminishing, nay, negative returns. After that point, adding additional strictness simply makes the programmer jump through hoops arguing to the compiler that the program is correct before the system will deign to run it. When that point is reached seems to vary, so there's certainly still room for subjectivity in evaluating it; in my opinion "objectively bad" is far too strong a statement.

And the "strongly typed" languages fall prey to the other problem: runtime exceptions (SIGSEGV in C, NullPointerException in Java, etc.). Tcl essentially never gets those: if a Tcl script manages to cause one without help from extension code in another language, it's "all hands on deck" among the maintainers to fix the bug. Even if the script is obviously erroneous, Tcl's maintainers pride themselves on the ability to catch and report all runtime errors.
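To make the point concrete, here is a sketch of trapping the very error from the start of this post. The variable names are illustrative, and the exact wording of the message can differ between Tcl versions, so the example relies only on the return code of catch:

```tcl
set char B

# catch returns 0 on success and a non-zero code (1 for an error) on failure;
# the error message is stored in msg rather than crashing the interpreter.
if {[catch {expr {$char & 0x80}} msg]} {
    puts "caught: $msg"
} else {
    puts "result: $msg"
}
```

The script keeps running after the failed expression, and the message is an ordinary string that the program can log or act on.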

But Tcl is unquestionably not fit for all purposes. No language is. If you find that you have an application where it does poorly, I probably have similar applications somewhere, and I'd be happy to recommend other languages to you. For a good many "high-level" tasks, it seems to fit the way I think. Argue if you will that I'm not "right-thinking," but don't assert that your right thinking is objectively better without data to back it up!

(Hmm, is this attitude Tcl's marketing problem? The Tcl community seems to be remarkably free of zealots that assert that it is objectively better than its competitors. If that is a problem, it's a problem that I enjoy having; I tend to struggle to deal with zealots of any stripe.)