Professor Ronald Loui has an interesting article on the rise of scripting languages (In Praise of Scripting: Real Programming Pragmatism) in the July 2008 issue of IEEE Computer. It claims scripting languages such as Perl, Python, and Javascript have dramatically fulfilled their early promise, provide many benefits, and are poised to take over the lead from Java. However, the academic programming language community is stuck in theory and hasn't recognized the ascendance of scripting languages.
I agree that scripting languages are on the rise. Most people would agree that they provide rapid development, higher levels of abstraction, and brevity that helps the programmer. The article also describes how scripting languages can be a performance win, since they allow experimentation and implementation of efficient algorithms that would be too painful in Java or C++. So even if C++ is faster on the micro-benchmark level, a programmer using a scripting language may end up with faster algorithms overall. I've argued somewhat controversially that Arc is too slow for my programming problems, so I remain unconvinced that basic performance can be ignored entirely.
As for the claim that Java is in full retreat, it strikes me as wishful thinking. (I'd believe "slow decline" though.) It will be interesting to check back on this claim in 5 years.
I personally believe that CS1 [freshman computer science] Java is the greatest single mistake in the history of computing curricula.
-- R. Loui
The article suggests good languages for teaching introductory computer science are gawk, JavaScript, PHP, and ASP, but Python is emerging as a consensus for the best freshman programming language. This is the hardest part of the article for me to swallow. The idea of writing real programs in Awk never occurred to me, and I remain skeptical even though the author claims it works well. For those who would suggest Scheme as an introductory programming language, it was displaced as a dominant freshman language by Java a decade ago, and is apparently no longer considered an option.
I can't argue with the author's claim that student learning is enhanced by experimenting, writing code, and getting hands-on experience, and that scripting languages make this faster and easier.
Python and Ruby have the enviable properties that almost no one dislikes them, and almost everyone respects them.
-- R. Loui
In Why your favorite language is unpopular I discussed how the Change Function model can explain the success of programming languages based on maximizing the crisis solved and minimizing the perceived pain of adoption. This model applies to scripting languages as well:
Magnitude of crisis solved by Tcl/Tk: High - How to add a scripting language to a C program. How to add a GUI to a C program without painful X11 and Motif code.
Total Perceived Pain of Adoption: Low - Link Tcl in with your C program and add a few hooks. Create the GUI with trivial scripts.
Magnitude of crisis solved by Perl: High - How to quickly write CGI scripts. How to solve problems too complex for shell scripts. How to process files. How to develop quickly and iteratively.
Total Perceived Pain of Adoption: Low - Apart from looking like line noise, Perl is easy to get started with, is well integrated with Unix, has the definitive regex implementation, and has libraries for almost everything.
My point is that these languages solved specific painful problems and had low pain of adoption. As a result, they were much more successful than beautiful, powerful languages that were less able to directly solve painful problems or were more painful to adopt.
The real reason why academics were blindsided by scripting is their lack of practicality.
-- R. Loui
A major thrust of the article is that academics are too concerned with theoretical issues of syntax and semantics, rather than pragmatic issues of what a language can achieve quickly, inexpensively, and practically. Academics are said to be too tied to theoretical concepts such as object-oriented programming and strong typing, and are missing the real-world benefits of scripting languages.
(Interestingly, Rob Pike made a similar argument against academics in the context of operating systems software (Systems Software Research is Irrelevant), stating that academic research is irrelevant and the real innovation is in industry. Since I have friends doing academic OS research, I should add a disclaimer here that I don't necessarily agree.)
One measure of pragmatics raised by the paper is how well a language works with other Unix tools. I think the importance of this is underappreciated. In particular, I view this as a significant barrier to adoption of Arc. Running Arc as a shell script instead of a REPL is nontrivial (as is the case with many Lisp and Scheme implementations). Running an external program from Arc is clunky, even though it is often necessary to actually get things done (Kens' law), and real pipes are missing from Arc entirely.
Java's integration with Unix also has painful gaps - where's getpid() for instance? Why is JNI so difficult compared to calling native code from C#? I blame Sun's pure-Java platform independence ideology, and I'm surprised it hasn't hurt Java more.
On the other hand, Python and Perl provide a remarkable degree of integration, which I view as a key factor in their success. Likewise, Visual Basic is highly integrated with the Windows environment and highly successful there.
In conclusion, Loui's paper raises numerous interesting points about the success of scripting languages. I expect that the reasons for the rise of scripting languages will only get stronger, and languages that don't support the scripting model will have an increasingly harder time gaining adoption.
Note: quotes above are from the preprint and may not match the published article.
The total world's population of Haskell programmers fits in a 747.
And if that goes down, nobody would even notice.
-- Erik Meijer
I recently saw an interesting talk on functional programming by Erik Meijer (of Bananas, Lenses, Envelopes, and Barbed Wire fame). Among other things, he discussed why many superior technologies such as Haskell don't catch on.
Geek formula for success
He claims the "Geek formula" for success of a technology is that if a technology is 10 times better, it should catch on and become popular. Even if it is slower, Moore's law will soon make it 10 times faster.
So if Haskell is 10 times better than C and Haskell programs are 10 times shorter, everybody should be using Haskell.
Real-life formula for success
However, as Erik points out, "That's not how it is in real life." In real life, success is based on the perceived crisis divided by the perceived pain of adoption. Users want something that will get the job done and solve their crisis, without a lot of pain to switch.
This argument applies to many languages that remain unpopular despite their technical merits, such as Lisp, Arc, and Erlang, as well as technologies such as the Semantic Web and LaTeX.
The Change Function
The above argument is based on the book The Change Function: Why Some Technologies Take Off and Others Crash and Burn by Pip Coburn. To summarize the book, new technologies aren't adopted because they are great, new, and disruptive; they are adopted only if the user's crisis solved by the technology is greater than the perceived pain of adoption. As a result, most new technologies fail.
The first half of the book is a bit fluffy, but gets more interesting when it discusses specific technologies that failed or succeeded. The book also goes out on a limb and predicts future winners (mobile enterprise Email, satellite radio, business intelligence software) and losers (RFID, entertainment PC, WiMax).
Languages and The Change Function
The Change Function argument has a lot of merit for explaining what languages become popular and what languages don't.
If Lisp is so great, why are there 8 million Visual Basic programmers worldwide and few Lisp programmers? The answer isn't pointy-haired bosses (since Lisp isn't popular on SourceForge either). The crisis vs. pain of adoption model provides a powerful explanation:
Magnitude of crisis solved by Visual Basic: High (e.g. how to easily write Windows applications)
Total Perceived Pain of Adoption for Visual Basic: Very Low (hit Alt-F11 in Excel and you're done)
Magnitude of crisis solved by Lisp: Low (metaprogramming, powerful macros, and higher-order functions are solutions in search of problems)
Total Perceived Pain of Adoption for Lisp: High (this shouldn't require explanation)
The same model explains the success of, for instance, Java:
Magnitude of crisis solved by Java: High (originally how to run code in a browser and write portable code, now how to avoid crashes due to memory allocation errors and bad pointers)
Total Perceived Pain of Adoption: Low (syntax similar to C++, easy to deploy)
Applying this model to other languages is left as an exercise for the reader.
Erik points out that Erlang and Haskell are now being marketed according to the second formula: there is a multicore crisis and functional languages are the solution. It will be interesting to see how much additional traction these languages get without addressing the "pain of adoption" part.
The Change Function and startups
The Change Function ends with ten sets of questions and a set of techniques for designing technologies that will be adopted; this part of the book has many ideas that would be beneficial for startups. Many of these are fairly obvious, such as "Fail fast and iterate" and having a customer-centered culture instead of a sales-centered culture, while others are more thought-provoking: "What is the user crisis you intend to solve? What are the top five reasons a user with this crisis would not use your product?" The ultimate conclusion of the book is "Figure out what people really want!", which brings to mind the advice to make something people want.
Yes, that was the big revelation to me when I was in graduate school—when I finally understood that the half page of code on the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself. These were “Maxwell’s Equations of Software!”
This quote appears many places on the web, but the code itself is harder to find. What is this amazing half page of code?
The Lisp 1.5 Manual, which was written by John McCarthy et al. in 1961, is available at softwarepreservation.org. In it, the "Maxwell's equations" define a universal Lisp function evalquote that can evaluate any given function:
The above code is defined in a meta-language (M-expressions), which can be straightforwardly translated into S-expressions. Functions in M-expressions use square brackets and have arguments separated by semicolons. M-expression conditionals are of the form [predicate -> value; predicate -> value; ...]. The M-expression label is analogous to defun or define.
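For example, a classic function in this notation is ff[x], which finds the first atomic symbol in an expression (my transcription, so treat the details as approximate):
ff[x] = [atom[x] -> x; T -> ff[car[x]]]
Translated into S-expressions, with names uppercased and constants quoted, it becomes:
(LABEL FF (LAMBDA (X) (COND ((ATOM X) X) ((QUOTE T) (FF (CAR X))))))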
The point of all this is that M-expressions are the code that operates on the S-expression data, but the M-expression meta-language and S-expression data actually coincide. Thus, code and data are the same thing in Lisp, and a half-page of code is sufficient to define a basic Lisp interpreter in Lisp, given a few primitives (car, cdr, cons, eq, atom). The code presents a meta-circular evaluator for Lisp; see SICP chapter 4.1 for more details on metacircular evaluators. (Unfortunately, this won't give you a working Lisp interpreter for free; things such as the garbage collector, the list primitives, and parsing need to be implemented somewhere. Also note that this metacircular evaluator doesn't give you niceties such as arithmetic.)
To understand the above code: apply takes a function and its arguments, while eval acts on a form. The last argument to these is an association list for the environment, which stores the values of bound variables and function names. In brief, apply implements CAR, CDR, CONS, ATOM, and EQ in terms of the primitives. It implements LAMBDA by pairing up the variables and arguments and passing them to eval. It implements LABEL (which defines a function) by adding the function name and definition to the association list.
The code for eval processes a form in a straightforward manner. It handles the QUOTE form by returning the quoted value. It handles COND by evaluating the predicates with the help of evcon. Otherwise, it interprets an atom as a variable and returns its value. If given a list, it interprets this as a function application; the arguments are evaluated with evlis and the function is evaluated by apply.
The above code is not quite complete; it relies on some other simple functions defined earlier in the manual, such as equal and cadr. Less obvious are pairlis[x;y;a], which pairs up lists x and y and adds them to association list a; assoc[x;a], which looks up x in association list a; and sublis[a;y], which treats association list a as a mapping of variables to values and replaces the variables in S-expression y with the associated values. These functions can be straightforwardly built from the primitive functions.
(By the way, I'm pretty sure the comma in eq[car[e],QUOTE] is a typo, but that's how it is in the original.)
Can Arc be used to create a Tetris-like game? In my previous posting on using OpenGL with Arc, I described how the effort to set up OpenGL had stalled my game-writing adventure. I finally got the game written, only to discover that Conan Dalton had beaten me to it by writing Arc Tetris running on Rainbow, his version of Arc running on Java. Nonetheless, here's my implementation in Arc using OpenGL.
Playing the game
See using OpenGL with Arc for details on setting up OpenGL. To summarize, the files are in the opengl directory of the Anarki git. Run gl-arc.scm in DrScheme. Then load "tet.arc" into the Arc REPL, and the game will start.
To control the game, use the arrow keys. Left and right arrows move the tile. Up-arrow rotates the tile, and down-arrow drops it faster. When the game ends, "s" will start a new game.
Implementing games in Arc
Note that Arc itself doesn't have any graphics support, or any way to get at Scheme libraries. I hacked on Arc's implementation in Scheme to get access to the OpenGL libraries in PLT Scheme.
My first concern with using Arc was speed. It turns out that Arc on a newish laptop is plenty fast to run the game, apart from occasional garbage collection glitches. The game itself doesn't require a fast refresh rate, and the amount of processing per frame is pretty small, so Arc manages just fine.
The biggest problem with implementing the game in Arc was the lack of useful error reporting in Arc; I should have emphasized this in Why Arc is bad for exploratory programming. Every time I wrote more than a few lines at a time, I'd encounter a mystery error such as "ac.scm::18267: car: expects argument of type ; given 1" or "ac.scm::17324: Function call on inappropriate object 2 ()". I often had to stick print statements throughout the code, just to figure out where the error was happening. Reporting what the error is and where it occurs is basic functionality that a programming language implementation should provide. An actual debugger would be a bonus.
I found that game programming doesn't really let you take advantage of the REPL for development. OpenGL is very stateful, so I can't just call a drawing routine from the REPL to see how it's working. I usually had to restart my code, and often reload the Arc interpreter itself to clear up the various Scheme objects. When I worked incrementally, I often ended up with startup issues when I ran from the start (i.e. variables getting initialized in the wrong order).
One annoyance is Arc's lack of arrays. I used tables in some places, and lists of lists in other places, but vectors and arrays seem like really basic datatypes.
Another annoyance is OpenGL's lack of any text support. If you're wondering about the lack of text in the screenshot, that's why. I implemented a simple 7-segment number-drawing function to display the score.
Some (pretty obvious in retrospect) advice to other game writers: Figure out in advance if the Y axis points up or down. I wrote some of the code assuming Y increases going down before realizing that other parts of the code expected the Y axis to point up. Also, figuring out the origin and scale in advance is a good thing.
Another tricky part of game implementation is state transitions. Both startup and end-of-game caused me more difficulty than I expected.
Implementing the game became much more rewarding once I got a minimal interactive display working. I should have done that sooner, rather than implementing the algorithms up front. On the other hand, I found it useful to disable animation much of the time, using keypresses to single-step movement.
The game has no audio, since PLT Scheme doesn't have audio support.
It took me a long time to realize that the pieces in Tetris don't actually rotate around a fixed point. Once I realized this and hard-coded the rotations instead of implementing a rotation function, things became easier.
I display the piece in motion separately from the playing field array that holds the stationary pieces. When a line is completed, instead of updating the playing field array in place, I decided it would be interesting to write remove-rows in a functional way so it returns an entirely new playing field array, rather than updating the existing array (as I did in saveshape when a piece hits bottom). Writing the whole game in a functional style would be very difficult, as others have described.
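To illustrate the functional style, here is a simplified sketch of what such a remove-rows can look like; this is not the actual tet.arc code, and it assumes the playing field is a list of rows, each row a list of cells where nil means empty, with the board width in width:
(def remove-rows (field)
  (let keep (rem [all idfn _] field)      ; keep only the rows that aren't full
    (+ (n-of (- (len field) (len keep))   ; pad with fresh empty rows on top
             (n-of width nil))
       keep)))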
I could make the code more "Arcish" with various macros (e.g. w/matrix to wrap code in gl-push-matrix and gl-pop-matrix). But I decided to "ship early". Likewise I didn't bother implementing game levels, although the game does speed up as it progresses. And I've only made minimal efforts to clean up the code, so I'm not presenting it as a model of good style.
Conclusion
It's possible to write a playable game in Arc (with enough hacking on the language to get OpenGL support). I think it would have been considerably easier to write it in PLT Scheme directly, but the Arc implementation builds character. I was pleasantly surprised that Arc had enough speed to run the game. The Arc code works out to 389 lines for those keeping score; OpenGL is responsible for much of the length (50 lines just to display the score). I expect writing in Arc will be considerably easier if the language gets decent error reporting.
I saw a posting on news.ycombinator entitled "Take the Tetris test (an Arc version, anyone?)", which suggested writing a simple Tetris-like game. One obvious problem with doing this in Arc is the lack of graphics support. But how hard could it be to use OpenGL graphics from Arc?
It turns out that using OpenGL from Arc is harder than I expected, but possible, given enough hacking on Arc's underlying Scheme implementation. In this article, I discuss how I got OpenGL to work.
Challenge 1: How to access libraries from Arc
The first challenge is that the official Arc release does not let you access Scheme libraries or functions. Not at all. Even though Arc is implemented on top of MzScheme, there is no mechanism to access the underlying MzScheme implementation. If you want to access Scheme, you must actually modify the Arc language implementation by hacking on the Scheme code that implements Arc.
The unofficial Anarki implementation is a modified version of Arc that provides access to Scheme (as well as many other useful improvements). However, I decided to base my OpenGL project on the official Arc implementation rather than Anarki.
I replaced the Arc implementation file ac.scm with a modified version called arc-gl.scm that gives me access to the necessary Scheme functions. The relevant Scheme code is:
(require (lib "mred.ss" "mred")
(lib "class.ss")
(lib "math.ss")
(prefix gl- (lib "sgl.ss" "sgl"))
(lib "gl-vectors.ss" "sgl"))
; List of functions to export from Scheme to Arc
(map (lambda (s) (xdef s (eval s)))
'(gl-shade-model gl-normal gl-begin gl-end gl-vertex gl-clear-color gl-clear
gl-push-matrix gl-pop-matrix gl-rotate gl-translate gl-call-list gl-flush
gl-light-v gl-enable gl-new-list gl-gen-lists gl-material-v gl-viewport
gl-matrix-mode gl-load-identity gl-frustum gl-light-v gl-enable gl-end-list
gl-scale sin cos))
; Arc doesn't provide access to vector, so make gl-float-vector
; take individual arguments instead of a vector
(xdef 'gl-float-vector (lambda (a b c d) (vector->gl-float-vector (vector a b c d))))
First, the code imports the necessary libraries. Next, it uses xdef to allow access to the list of specified Scheme functions. I included the OpenGL functions I used; you will need to extend this list if you want additional OpenGL functionality. I also include sin and cos; they are missing from Arc, which is almost perversely inconvenient.
The download contains arc-gl.scm, the updated Scheme code; arch.arc, the arch demo in Arc; and gears.arc, the gears demo in Arc. It is also available from the Anarki git.
Challenge 2: Where is OpenGL?
OpenGL isn't part of the plain vanilla MzScheme that is recommended for Arc, but it is part of DrScheme. Both DrScheme and MzScheme are versions of Scheme under the PLT Scheme umbrella; MzScheme is the lightweight version, while DrScheme is the graphical version that includes the MrEd graphics toolbox and the OpenGL bindings for Scheme. Thus, to use OpenGL, Arc needs to run under DrScheme rather than plain MzScheme.
An alternative to OpenGL would be using MrEd's 2-d graphics; I've described before how to add simple graphics to Arc. However, I wanted to use the opportunity to learn more about OpenGL.
Challenge 3: Running an Arc REPL in PLT Scheme
It's straightforward to run as.scm inside PLT Scheme and get an Arc REPL. However, a problem immediately turns up with OpenGL:
An OpenGL animation will lock up as soon as the Arc REPL is waiting for input. The problem is that MrEd is built around an event loop, which needs to keep running (similar to the Windows message loop). When the REPL blocks on a read call, the entire system blocks.
The solution is to implement a new GUI-based Arc REPL instead of the read-based REPL. MrEd provides text fields that can be used to provide non-blocking input. When the input is submitted, the event loop executes a callback, which can then run the Arc code. Of course, the Arc code needs to return reasonably promptly, or else things will be locked up again. However, the Arc code can start a new thread if it needs to do a long-running computation.
The following MzScheme code creates a frame, text-field for output, text-field for input, and a submit button, and executes the submitted code through arc-eval. It is analogous to the REPL code in ac.scm. I put this code in arc-gl.scm.
Put the new REPL code in the same directory as the original Arc files.
Run drscheme.
Go to Language -> Choose Language -> PLT -> Graphical. Click Ok.
Go to File -> Open -> arc-gl.scm
Click Run
As you can see from the screenshot, the REPL doesn't get any points for style, but it gets the job done.
Challenge 4: Using Scheme's object model
The mechanism above can't be used to access PLT Scheme's windowing operations, because they are heavily based on the Scheme object implementation, which is implemented through complex Scheme macros. Thus, I can't simply map the windowing operations into Arc, as I did with sin. If the operations are called directly, they will try to apply Scheme macros to Arc code, which won't work. If they're called after Arc evaluation, the Arc implementation will have already tried to evaluate the Scheme macros as Arc code, which won't work either.
I tried several methods of welding the Scheme objects into Arc, none of which are entirely satisfactory. The first approach was to encapsulate everything in Scheme and provide simple non-object-based methods that can be called from Arc. For example, an Arc function make-window could be implemented by executing the necessary Scheme code. This works for simple operations, and is the approach I used for simple 2-d graphics, but the encapsulation breaks when the code gets more complex, for example with callbacks and method invocations. It is also unsatisfying because most of the interesting code is written in Scheme, not Arc.
Another approach would be to fully implement Scheme's object model in Arc, so everything could be written in Arc. That would have been far more difficulty and work than I wanted to take on, especially since the object system is implemented in very complex Scheme macros.
My next approach was to implement an exec-scheme function that allows chunks of Scheme code to be called directly from inside Arc. This worked, but was pretty hacky.
Minor semantic differences between Arc and Scheme add more ugliness. Arc converts #f to nil, which doesn't work for Scheme code that is expecting #f. I hacked around this by adding a symbol false that gets converted to #f. Another problem is Arc lists get nil added at the end; so the lists must be converted going to and from Scheme.
Finally, I ended up with a somewhat hybrid approach. On top of exec-scheme, I implemented Arc macros to wrap the object operations of send and invoke. These macros try to do Arc compilation on the Arc things, and leave the Scheme things untouched. Even so, it's still kind of an ugly mix of Scheme and Arc with lots of quoting. I found writing these macros surprisingly difficult, mixing evaluated and unevaluated stuff.
As an aside, Scheme's object framework uses send foo bar baz to invoke bar on object foo with argument baz. I.e. foo.bar(baz) in Java. I found it interesting that this semantic difference made me think about object-oriented programming differently: Scheme objects are getting sent a message, doing something with the message, and providing a reply. Of course, this is the same as invoking a method, but it feels more "distant" somehow.
At the end of this, I ended up with a moderately ugly way of creating Scheme objects from Arc, providing callbacks to Arc functions, and implementing instantiate and send in Arc. This isn't a full object implementation, but it was enough to get the job done. For example, to instantiate the MrEd frame% class and assign it to an Arc variable:
Here's a simple example of an animated arch displayed in OpenGL. To run this example, download the code and start up DrScheme as described above. Then:
From the Arc REPL (not the Scheme REPL) execute:
(load "arch.arc")
(motion)
The code, in arch.arc, has several parts. The arch function does the OpenGL work to create the Arch out of triangles and quadrilaterals. It uses archfan to generate the triangle fan for half of the front of the arch. The full code is too long to include here, but the following code, which generates the inside of the arch, should give a taste:
(for i 0 (+ n n -1)
(withs (angle (* (/ 3.1415926 n 2) i)
angle2 (* (/ 3.1415926 n 2) (+ i 1)))
(gl-normal (* r (cos angle)) (* r -1 (sin angle)) 0)
(gl-vertex (* r -1 (cos angle)) (* r (sin angle)) z)
(gl-vertex (* r -1 (cos angle)) (* r (sin angle)) negz)
(gl-normal (* r (cos angle2)) (* r -1 (sin angle2)) 0)
(gl-vertex (* r -1 (cos angle2)) (* r (sin angle2)) negz)
(gl-vertex (* r -1 (cos angle2)) (* r (sin angle2)) z)))
The next section of the code contains the graphics callback functions:
ex-run: the animation entry point called by the timer. Updates the rotation and refreshes the image.
ex-on-paint: uses OpenGL commands to draw the arch
ex-on-size: handles window resize and initial size. The OpenGL model (arch, lighting, projection) is set up here.
The last part of the code calls into Scheme to create the canvas and tie it to the appropriate callbacks, create the frame holding the canvas, and display the frame.
I find the arch-generation code somewhat unsatisfying stylistically, as there is a lot of duplicated code to generate the vertices and normal vectors for the front, back, and sides. I couldn't come up with a nice way to fold everything together. I suppose the Arc-y solution would be to write a DSL to express graphical objects, but that's beyond the scope of this project.
Let me mention that low-level OpenGL is not particularly friendly for exploratory programming. It's tricky to generate complex shapes correctly: it's really easy to end up with the wrong normals, vertices that aren't clockwise, non-convex polygons, and many other random problems. I find it works much better to sketch out what I'm doing on paper first; if I just start coding, I end up with a mess of bad polygons. In addition, I've found that doing the wrong thing in OpenGL will lock up DrScheme and/or crash my machine if I'm unlucky.
The gears example
I also ported the classic OpenGL "gears" demo from Scheme to Arc. This demo includes GUI buttons to rotate the gears. (The animated GIF at the top of the page shows the program in operation.) This is a fairly straightforward port of gears.scm that comes with DrScheme. To run it:
Enter into the Arc REPL: (load "gears.arc")
The interesting thing to note in gears.arc is the horizontal-panel%, vertical-panel%, and button% objects that provide the UI controls. They are linked to Arc functions to update the viewing parameters. For example, in the following, note that instantiate is passing Arc code (using fn) to the Scheme constructor for button%. The tricky part is making sure the right things get evaluated in the right language:
(instantiate 'button% (list "Right" h (fn x (ex-move-right)))
'(stretchable-width #t))
How did I generate the animated gifs? Just brute force: I took some screenshots and joined them into an animated gif using gimp. The real animation is smoother. I found the animated gifs are a bit annoying, so I added JavaScript to start and stop them. The animation is stopped by substituting a static gif.
Conclusion
So what about the Tetris challenge that inspired my work on OpenGL? After all the effort to get OpenGL to work in Arc, I lost momentum on the original project of implementing the game. (This is an example of why Arc is bad for exploratory programming. If I wanted to get something done, I would have been much better off using DrScheme directly.) Maybe I'll complete the game for a future posting.
Recently, I've been reading Programming Collective Intelligence, which is a practical guide to machine learning algorithms, showing how to build a recommendation system, implement a search engine, classify documents, mine websites, use genetic algorithms and simulated annealing, and implement other machine learning tasks. The book shows how to implement each of these in surprisingly few lines of Python.
The book is an excellent example of exploratory programming, showing how to incrementally build up these applications and experiment with different algorithms from the Python interactive prompt. For instance, topic clustering is illustrated by first implementing code to fetch blog pages from RSS feeds, breaking the pages into words, applying first a hierarchical clustering algorithm and then a K-means clustering algorithm to the contents, and then graphically displaying a dendrogram showing related blogs. At each step, the book shows how to try out the code and perform different experiments from the interactive prompt.
By using Python libraries, each step of implementation is pretty easy; the book can focus on the core algorithms, and leave the routine stuff to libraries: urllib2 to fetch web pages, Universal Feed Parser to access RSS feeds, Beautiful Soup to parse HTML, Python Imaging Library to generate images, pydelicious to access links on del.icio.us, and so forth.
If you want more details than the book provides (it is surprisingly lacking in references), I recommend Andrew Moore's online Statistical Data Mining Tutorials, which covers many of the same topics.
What does this have to do with Arc?
While reading this book, I was struck by the contradiction that this book is a perfect example of exploratory programming, Arc is "tuned for exploratory programming", and yet using Arc to work through the Collective Intelligence algorithms is an exercise in frustration.
The problem, of course, is that Arc lacks libraries. Arc lacks basic functionality such as fetching a web page, parsing an XML document, or accessing a database. Arc lacks utility libraries to parse HTML pages or perform numerical analysis. Arc lacks specialized API libraries to access sites such as del.icio.us or Akismet. Arc lacks specialized numerical libraries such as a support-vector machine implementation. (In fact, Arc doesn't even have all the functionality of TRS-80 BASIC, which is a pretty low bar. Arc is inexplicably lacking trig, exp, and log, not to mention arrays and decent error reporting.)
To be sure, one could implement these libraries in Arc. The point is that implementing libraries detours you from the exploratory programming you're trying to do.
I think there are two different kinds of exploratory programming. The first I'll call the "Lisp model", where you are building a system from scratch, without external dependencies. The second, which I believe is much more common, is the "Perl/Python model", where you are interacting with existing systems and building on previous work. In the first case, libraries don't really matter, but in the second case, libraries are critical. The recently-popular article Programming in a Vacuum makes this point well, that picking the "best" language is fine in a vacuum, but in the real world what libraries are available is usually the key.
Besides the lack of libraries, Arc's slow performance rules it out for many of the algorithms from Programming Collective Intelligence. Many of the algorithms run uncomfortably slow in Python, and running them in Arc is that much worse. It's just not true that speed is unimportant in exploratory programming.
On the positive side for Arc, chapter 11 of Programming Collective Intelligence implements genetic programming algorithms by representing programs as trees, which are then evolved and executed. To support this, the book provides Python classes to represent code as a parse tree, execute the code tree, and prettyprint the tree. As the book points out, Lisp and its variants let you represent programs as trees directly. Thus, using Arc gives you the ability to represent code as a tree and dynamically modify the code tree for free. (However, it only takes 50 lines of Python to implement the tree interpreter, so the cost of Greenspunning is not particularly severe.)
To summarize, a language for exploratory programming should be concise, interactive, reasonably fast, and have sufficient libraries. Arc currently fails on the last two factors. Time will tell if these issues get resolved or not.
At work we have a nifty electronic postage kiosk that will weigh letters, compute the postage, print a postage strip, and bill a credit card, all through a display and touch screen. When I tried using it to mail a letter, everything went fine until I got an Internet Explorer error dialog:
It's always a surprise when an embedded system reveals its inner workings. Somehow I assume such systems are built on some super-special technology, not HTML and Internet Explorer.
I should mention that I was just sending a letter to Canada. While this isn't the most common thing to do, it's not a particularly bizarre action either. And I definitely wasn't doing a sinister action such as sending a letter to the country of "DROP TABLE". (I should also mention that this blog posting has nothing to do with Arc, in case my regular readers are wondering.)
I started over with the kiosk and the same error dialog appeared again. At this point, I dismissed the dialog box via the touch screen and continued with Buy Postage. It displayed "Total Postage $NaN" (i.e. Not a Number). When it asked for my credit card, I hesitated briefly; I figured the worst case was I'd have to send them a check for 0/0; it's not like they were going to bill me +Infinity. Curiosity got the better of me and I decided to plunge forward in the interests of science. After I swiped my credit card, the kiosk printed a $0.69 postage strip, and gave me a receipt:
Sure enough, the receipt said my credit card had been charged $NaN. I can only imagine what the billing backend would make of that. I called my credit card company to check what really happened, but the day's transactions weren't available yet.
I see a couple of morals to this story. The most obvious is that normal people don't swipe their credit cards through a machine offering to bill them $NaN. But the more relevant moral is that error checking and software testing are good things, especially when monetary transactions are involved.
P.S. I emailed the kiosk support, and they assured me that the billing problems following a "system change" were resolved, and I'd be billed the proper $0.69.
The code that implements Arc illustrates many useful programming techniques. Some of the techniques occur multiple times, and can be considered idioms of Arc programming; techniques that can be applied to many problems. This article describes some of the most common idioms, and how they can be used in Arc programming. The techniques described focus largely on macros and recursion.
Macros
Not surprisingly, given Paul Graham's enthusiasm for macros, Arc makes heavy use of macros. A few idioms appear commonly in the code.
"Decorated body"
The most common use of macros in Arc is something I call the "decorated body", which wraps a chunk of body code with something to extend it, analogous to the decorator design pattern. Typically the decorator will set something up before executing the body code, as in
w/outstring, w/table, and fromstring.
(Note: the link for each operation goes to the documentation. Clicking on the code icon in the documentation will show the code for the operation.)
The pattern is also used to set something up before execution and then clean up afterwards. This is especially useful for file operations, as in
w/infile, w/outfile, and w/instring.
These macros often have names starting with "w/", short for "with/", indicating what they are providing to the body.
The HTML library makes heavy use of this idiom, for example in tag, form, and cdata, which wrap the body in various HTML opening and closing tags.
To examine the implementation of this idiom, consider let, which is defined as:
(mac let (var val . body)
`(with (,var ,val) ,@body))
The typical features of this idiom are a macro definition with argument . body to pick up the remainder of the arguments, a macro definition starting with quasiquote `, and unquote-splicing ,@body to substitute the body code into the macro. The macro performs simple substitution:
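For example, here is my own trace of how a let form expands (compare the macex example later):
(let x 42 (prn x))
=> (with (x 42) (prn x))
=> ((fn (x) (prn x)) 42)
The body ends up spliced into a function that is immediately applied to the value.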
Unlike functions, macros receive their arguments unevaluated. This allows macros to be used for conditional execution of code, or to execute code multiple times iteratively. Some examples are
when, unless, while, and for.
(You might expect if to be a macro. It is actually a built-in special form, since conditional macros need to be based on some underlying conditional form.)
The code for repeat illustrates that this idiom uses the same ,@body structure as the "decorated body" idiom:
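Quoting from memory (so the exact text may differ from arc.arc), repeat is a one-line substitution around ,@body:
(mac repeat (n . body)
  `(for ,(uniq) 1 ,n ,@body))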
A variant of the body decorator is to use a macro to convert body code into a lambda function, which is similar to
an adapter design pattern. This idiom is used when a function expects to receive a lambda function, but passing in body code is more convenient, especially when the base function is a Scheme primitive. Examples are thread, atomic, w/stdin, and w/stdout:
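For instance, w/stdout and thread (again quoted from memory, so approximate) just wrap the body in (fn () ...) and hand the resulting function to an underlying primitive:
(mac w/stdout (str . body)
  `(call-w/stdout ,str (fn () ,@body)))

(mac thread body
  `(new-thread (fn () ,@body)))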
Another common macro idiom is to take a list of parameters and body code, and build a function out of these. This idiom is typically used to define a function that has special properties, and is often given a name that starts with "def". Some examples are
rfn, afn, defmemo, defop, defopr, linkf, and rlinkf.
The code for defmemo shows the key component of this idiom: (fn ,parms ,@body) to build the function out of the parameters and body.
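For reference, defmemo (approximately as it appears in arc.arc) builds the function from the parameters and body, wraps it in memo, and binds the name:
(mac defmemo (name parms . body)
  `(safeset ,name (memo (fn ,parms ,@body))))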
One irony is that even though macros are extremely powerful, most of the macros in Arc perform simple substitutions.
There are, however, some macros that perform complex manipulation of their arguments.
Assignment is built from the setform macro, which implements generalized variables through intricate macro processing of the arguments. Slightly less complex is
and, which implements short-circuit evaluation by turning its arguments into nested if statements. Another interesting example is with; the macro splits up the arguments into variables and values:
arc> (macex '(with (a 1 b 2) (prn a b)))
((fn (a b) (prn a b)) 1 2)
The litmatch macro performs string matching by generating code to perform a sequence of character comparisons.
Tail recursion
Tail recursion is a standard Scheme technique for efficient recursion. If the last operation of a function is a recursive call, the compiler will use tail-call optimization to turn the recursion into iteration, avoiding creation of a stack frame on each call.
Iteration in Arc is built on top of tail recursion. Basic loop operations
such as while, loop, and whilet use tail recursion on a function built from rfn:
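The actual arc.arc definitions use w/uniq to avoid variable capture; a simplified sketch of the pattern (my illustration, not the real code) looks like this:
(mac simple-while (test . body)
  `((rfn recur ()
      (when ,test
        ,@body
        (recur)))))
The rfn creates a named recursive function and the outer parentheses call it immediately; since the recursive call is in tail position, the loop runs without growing the stack.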
One idiom in Arc for using tail recursion with strings is to add an index parameter to the function. This index parameter indexes into the string, and by incrementing it in each call, the function can be made tail-recursive and doesn't require modifying the string. The index parameter is often made an optional parameter so "normal" uses of the function don't need to specify the index. Some examples of this idiom are
recstring, findsubseq, indented-code, parabreak, and urlend. The code for headmatch below shows the shape of the idiom, with an optional start parameter and an index i incremented on each recursive call:
(def headmatch (pat seq (o start 0))
(let p (len pat)
((afn (i)
(or (is i p)
(and (is (pat i) (seq (+ i start)))
(self (+ i 1)))))
0)))
Recursion with cons
Many operations in Arc need to perform operations on a sequence and create a result list. The common Arc idiom is to recurse through a sequence, building up the result with cons.
Specifically, a function (op seq) will return the cons of something done to (car seq) and (op (cdr seq)). Thus, op recursively steps through seq, processing an element at a time and building up the result.
This idiom provides an interesting contrast to the previous recursion examples because this idiom is not tail-recursive. Because the cons is applied to the result of the recursive function call, tail-call optimization cannot be applied.
Several examples of this idiom are
join, rem, reinsert-sorted, pair, firstn, firstn-that, tuples, and range. For example, map1:
(def map1 (f xs)
(if (no xs)
nil
(cons (f (car xs)) (map1 f (cdr xs)))))
Note that f is applied to (car xs), map1 is called recursively on (cdr xs), and the two are cons'd together.
Miscellaneous
The majority of the idioms I discuss are based on macros or recursion. This section discusses some miscellaneous idioms that don't fit those categories.
Push and reverse
The "push and reverse" idiom is an efficient, but somewhat surprising way to generate a list. In this idiom, successive elements are pushed onto the front of the list, and when the entire list is generated, it is reversed to obtain the desired order. This is a well-known Lisp idiom and is more efficient than the obvious method of appending elements to the end of the list.
Several examples of "push and reverse" are
drain, dedup, parse-format, litmatch, endmatch, and handle-post.
The code for n-of illustrates the idiom. Items are added with push to an accumulator (ga in this case), and the accumulator is reversed with rev at the end:
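Quoting from memory (so approximate), the definition is:
(mac n-of (n expr)
  (w/uniq ga
    `(let ,ga nil
       (repeat ,n (push ,expr ,ga))
       (rev ,ga))))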
String stdout
Arc makes extensive use of an idiom I'll call "string stdout" because a string is used as stdout. The code writes multiple chunks to stdout, and the results are gathered into a string. This idiom avoids having to manually concatenate the components into the result string.
The tostring macro is used to implement this idiom. Arc has a surprisingly large number of operations that use it, including readline, code-block, unmarkdown, varfield, markdown, eschtml, striptags, parafy, subst, multisubst, and num.
Typically, code using this idiom will produce output with pr or writec, and use tostring to collect the output.
The code for multisubst illustrates this:
(def multisubst (pairs seq)
(tostring
(forlen i seq
(iflet (old new) (find [begins seq (car _) i] pairs)
(do (++ i (- (len old) 1))
(pr new))
(pr (seq i))))))
The flip side of this idiom is the "string stdin" idiom, where a string is used as stdin. The "string stdin" idiom is rarely used; one example is parse-format.
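For example, fromstring (mentioned earlier) redirects stdin to a string so that reading code can consume it; a hypothetical REPL interaction:
arc> (fromstring "(1 2 3)" (read))
(1 2 3)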
Summary
The Arc code illustrates several common Arc idioms, some of which are well-known Scheme or Lisp idioms. Since Paul Graham is a Lisp expert, carefully studying his Arc code is a rewarding way to learn more about programming Lisp-like languages.
Undoubtedly I've missed some important idioms. Please let me know if your favorite idiom is missing, or if I've mangled the descriptions of any idioms.
The introduction of the Arc language has spawned a mini-boom of Arc
compilers and interpreters implemented in everything from JavaScript to .NET.
Because Arc is a fairly simple Lisp dialect, it is relatively
straightforward to implement, and many people are using it to learn about
compiler and interpreter design, and to showcase different implementation
techniques and languages.
Paul Graham's official distribution started
it all, of course. This version is implemented in MzScheme and translates Arc into
Scheme code. The writeup on Arc Internals provides more information on how it is implemented.
The Arc Forum is the site of most of
the unofficial Arc development effort. The "Anarki" branches of Arc on
github consist of a "stable" bug-fixed version of official Arc, and a more exploratory version with many modifications. (For information on using github, see Brief Guide to Anarki and Git.)
Anarki includes "ac.sbcl.lisp", an implementation of Arc in SBCL.
It implements closures using CPS (Continuation-passing style) transformations, but doesn't include threads
or networking according to the brief description.
Perhaps the first reimplementation of Arc was ArcLite, which is Arc ported to JavaScript. This version of Arc is unique because it runs in the browser. The running version can be accessed online. It provides a subset of functionality, omitting tail-call optimization, continuations, and various operations.
Rainbow (l'arc en ciel in French) is an implementation of Arc in Java.
It is fairly complete, providing continuations and tail-call optimization
according to the description.
It is available at github.
An Arc compiler for .NET has been developed, based on the MBase compiler
prototyping framework.
The description states that it
implements a subset of Arc, but is much faster than standard Arc.
It is available from Meta Alternative.
Currently under active development is "arc2c", which compiles Arc to C. Current status is at arclanguage.org. This compiler is intended to be a full, high-performance implementation of Arc. It uses CPS transformations to implement continuations. It uses the Boehm garbage collector. arc2c is available at github.
CL-Arc is a proposal to compile Arc to Common Lisp; work hasn't started on it yet.
Many of these Arc implementations actively welcome participation. The future
is likely to hold continuing improvement in the quality, functionality, and
performance of Arc implementations, as well as surprising and unusual variants.
Ajax is surprisingly easy to use with the Arc language. This article shows how to implement Ajax-based autocompletion and dynamic page updates using
the script.aculo.us and Prototype JavaScript frameworks. These frameworks allow Ajax applications to be implemented with just a few lines of additional code.
The hard work is done by the script.aculo.us and Prototype libraries. All the Arc code needs to do is provide the list of autocompletion candidates, and the dynamic page content.
The example
This example implements a simple Ajax input autocompleter and dynamic page update. For the autocompleter, as soon as you enter a few letters into the input, it provides a list of matching terms. As soon as you select a term, the page is dynamically updated with associated data, as in the following screenshot:
For the purposes of the demonstration, the content is information on the
countries of the world, obtained from the CIA World
Factbook.
To summarize how it all works:
When you enter characters in the input field, the script.aculo.us
autocompleter JavaScript sends the characters to the Arc server, which returns the
autocomplete suggestions. The autocompleter JavaScript displays the
suggestions on the page.
When you select a country, the updater JavaScript does three separate requests
to the Arc server, to request the population, area, and capital. When the
responses arrive, they are inserted into the page.
In more detail, the autocompletion is performed by Ajax.Autocompleter, and the dynamic updating is performed by Ajax.Updater. The control flow is:
The Ajax.Autocompleter associated with the autocomplete input and autocomplete_choices div sends inputs to /auto_complete?prefix=chars.
The Arc handler sends back a formatted list of autocomplete candidates, e.g.
<ul><li>Iran</li><li>Iraq</li><li>Ireland</li></ul>.
The Ajax.Autocompleter displays the candidates in the autocomplete div.
When an autocompletion is selected, the update JavaScript function starts three Ajax.Updater instances, which send parallel Ajax requests to /getcontents?field=population&name=value and similarly for area and capital.
The Arc handler for getcontents returns the desired contents.
Ajax.Updater puts the contents into the contents div.
Running the example
First set up the necessary files.
Download ajax.zip and uncompress it into the
same directory that Arc runs in. This zip file provides
ajax.html, ajax.arc, data.arc, and autocomplete.css.
Download the script.aculo.us library files from script.aculo.us. Copy the .js files from lib and src into the same directory that Arc runs in.
Next, start up the Arc interpreter,
load the ajax.arc file, and start the web server.
arc> (load "ajax.arc")
arc> (thread (serve 8080))
arc> ready to serve port 8080
Then go to http://localhost:8080/ajax.html. (Unfortunately I don't have a live demo on a public machine. If anyone sets one up, let me know and I'll add a link here.)
Start typing in the box, and you should see a dropdown with autocompleted choices. Click one, and the country's data will be displayed below asynchronously.
The Arc code
The Arc server code is in ajax.arc. Two Arc handlers are implemented to provide the client-side Ajax support. The first, auto_complete, receives the current input contents and returns a list of up to 10 autocompletion candidates formatted in HTML.
The second handler, getcontents, returns the dynamic content for the page, from the database table. Note that the handlers do nothing particularly special to make them "Ajax"; they are standard Arc web handlers based on defop and can be accessed directly through the browser. See Arc Web Server for more information on web serving with Arc.
The handlers use a couple of helpers; startselts returns the elements that match the autocomplete prefix, and to-html-list wraps the elements in HTML list tags.
; Returns a list of elements that start with prefix
(def startselts (prefix seq) (rem [no (begins (downcase _) prefix)] seq))
; Wraps elts in HTML list tags
(def to-html-list (elts) (tostring
(prall elts "<ul><li>" "</li><li>")
(pr "</li></ul>")))
The actual content is obtained from data.arc, which contains the information on each country as a list of lists.
Some simple code converts the list into a table called database indexed by country for easy lookup, and generates a sorted list of countries for use in autocompletion:
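Here is a hypothetical sketch of that conversion (the exact ajax.arc code may differ); it assumes data.arc defines a variable data holding lists of the form (name population area capital):
; hypothetical sketch: build the lookup table and the sorted country list
(= database (table))
(each row data
  (= (database (car row)) (cdr row)))
(= countries (sort < (keys database)))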
Several things can go wrong when trying to run the example. If the initial page doesn't load at all, something is probably wrong with the Arc server. If no autocompletion happens, the JavaScript may not be loading, or the Arc server may have problems. If the autocompletion shows up as a bulleted list, the CSS file is probably not loading.
The Arc code can also be debugged by strategically inserting print statements such as:
(write req (stderr))
This will display the request on the Arc console.
To debug Ajax, you can use Firefox plugins
Live HTTP Headers and
Firebug. Live HTTP Headers lets you see the headers between the browser and the server, and is very helpful to determine if something is failing. Firebug allows much more in-depth JavaScript debugging.
The browser's JavaScript console is also useful to see JavaScript errors.
This article has just scratched the surface of script.aculo.us and Ajax. More information is available at the script.aculo.us website or in books.
I've created documentation of Arc's HTML generation operations (from arc.arc). There are a lot of different functions there, but I've tried to cover them all.
Why would I do that? Because I can :-) But seriously, using continuations in a language entirely unsuited for it is a good way to experience the tradeoffs of different styles of languages, as well as a way to learn more about how continuations work.
This code is entirely different from my Arc version, mainly because in the Arc version I decided to see if the throw/catch idiom could replace call/cc; the Java code is much closer to the original Scheme. Because Java doesn't have first-class continuations, I used the Continuation-passing style of explicitly passing continuations around. Then call/cc is simply replaced with use of the current continuation.
Because Java doesn't support first-class functions, every continuation function needs to be wrapped in a class that implements an interface, which I unimaginatively call Continuation. The "let" also turns into an object creation, resulting in another class. This results in a fair bit of boilerplate to handle all these classes compared to the Scheme code, but the Java code maps recognizably onto the Scheme code.
On the whole, mondo-bizarro worked better in Java than I expected; no Greenspunning required. It produces the expected 11213 output, proving it works. I think the Java code is actually easier to understand, since everything is explicit.
I have also found it entertaining to implement some of the complex SICP exercises in Java; maybe I'll post details later.
I've got almost all of the Arc core documented with an alphabetical index. I've covered ac.scm, arc.arc, strings.arc, but am currently omitting the web server and applications.
For each function, I include a description, examples, a link to the code, and a link to the wiki. I'll fill in a few gaps as time permits. I'll appreciate any feedback on the organization.
Did you know that Arc has been a popular computer language since 2004? Pretty good for a language released in January 2008. I discovered Arc's longstanding popularity in Computerworld's review of Paul Graham's Hackers and Painters. The review from July 2004 says, "Paul Graham, well known for developing the popular Arc computer language and other Internet technologies...". I guess they didn't realize Arc hadn't been released at the time. Hopefully we won't see them publishing job listings looking for 4 years of Arc experience.
If you were around during the early days of the web, you surely remember the message:
Error 404 Not found - file doesn't exist or is read protected [even tried multi]
But what is "multi"? And why does the web server act like it's doing you a big favor by trying it? Is it trying multiple times, maybe?
These days, 404 messages are still common, but the "multi" message is fairly rare.
The message was once so common, though, that it spawned a large number of parodies.
I decided to get to the bottom of this, and find out what this "multi" is.
It didn't take long to discover that this message comes from the CERN httpd server, written by Tim Berners-Lee and others back in the 1990's.
A bit of poking around found the source code that generates the infamous multi message. It turns out that "multi" is short for
multi-format documents, and was a cool new feature back in 1993. It
is a technique for the server to keep multiple formats of a file and provide the best one your browser can handle. For instance, suppose you try to fetch an image /cat. Back in the olden days, browsers weren't so good with images, so your browser might only display gifs and not jpegs. Your browser asks for /cat and says that it can handle gifs. Now for the multi magic. The server can have multiple files, say /cat.gif, /cat.jpg, and /cat.png. It looks at what your browser can accept, looks at what it has, and returns the best choice, in this case /cat.gif. Likewise, a file could have versions in text and PDF, and the server would return the best type for your browser. Web servers still support content negotiation, but they generally don't mention it in 404 messages any more.
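Concretely, the negotiation works through the HTTP Accept header. A hypothetical exchange (illustrative, not captured from a real server):
GET /cat HTTP/1.0
Accept: image/gif

HTTP/1.0 200 OK
Content-Type: image/gif
The server sees that only image/gif is acceptable, picks /cat.gif from the candidates on disk, and returns it.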
So there you have it, the definitive answer to what "even tried multi" means, and a bit of web history.
In case anyone thinks continuations are straightforward, I've ported Eugene Kohlbecker's classic Scheme puzzle to Arc. What does (mondo-bizarro) print?
I've added lots more documentation on the Arc language: filesystem manipulation, anaphoric operations, iteration, threading, time operations, queues, and trees.
The Arc language provides many I/O operations. I have written an overview of them: I/O in the Arc language. This covers file I/O, string I/O, and the many operations to perform reads and writes. The operations strike me as very non-orthogonal; it's unpredictable which ones take ports, have defaults, or specify an eof value. I try to bring some organization to them.
The documentation continues with Internals of places and setforms. It picks up where the assignment and places documentation left off. If you want to know how to create your own places and generalized variables in Arc, you're in luck.
The official releases of Arc are at arclanguage.org. However, a group
of enthusiasts maintains its own repository of Arc, called Anarki. Anarki
contains versions of the Arc files with bug fixes and extended
functionality, along with additional applications, documentation, and Emacs tools.
The Anarki repository is stored in a source control system called git. The
repository is open for access and updates by Arc users. Accessing the
repository is straightforward, and will be described below. Submitting
changes to the repository is not much harder, but is beyond the scope of this
document; one source of information is on arclanguage.org.
The repository is promptly updated when a new version of Arc is released.
Most of the changes in the repository haven't been migrated back to the
official Arc releases, so there is considerable divergence between the
official Arc version and the Anarki version.
Installing git on Linux
On Linux, git is in the "git-core" package. You can install it with "sudo apt-get install git-core" on Ubuntu, or "sudo yum install git-core" on Red Hat.
Installing git on Windows
On Windows, one option is Git on MSys, which can be downloaded from the
msysgit site, under "Featured Downloads".
Using git to fetch Anarki
Once git is installed, fetching the repository is surprisingly easy. Run "git clone git://github.com/nex3/arc.git", which will create a directory "arc-wiki" containing a copy of the repository. To get updates from the repository, run "git pull".
Once you've downloaded Anarki, you can see the cutting edge of Arc
modifications. One of the most interesting features of the Anarki version is
docstrings: "(help map)" will display documentation on the map function, for
instance.
The stable Anarki branch
A "stable" Anarki branch has also been created. This repository is close to the official Arc release, but with bug fixes and minimal new features. The stable repository lacks the experimental features and rapid changes of the regular repository.
To use the stable repository, check out Anarki as described above. Then run "git branch stable origin/stable" and "git checkout stable". All the experimental files and directories will disappear, and you will be left with the stable branch.
For more information, see the arclanguage forum.
The Arc-to-C compiler
An Arc-to-C compiler is being developed in a separate repository: git://github.com/sacado/arc2c.git, which can be accessed on the web at
http://github.com/sacado/arc2c/tree/master.
I took a look at the new version of Arc that has been released. The following are my unofficial notes on the differences between arc1 and arc2.
News.YC release
The key difference is the release of the News.YC source in news.arc. This 1769-line file apparently provides the full implementation of news.ycombinator.com. One interesting thing is that stories are ranked according to (score-1) / (age-in-hours ^ gravity), where gravity is 1.4 by default.
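In Scheme, the ranking formula is simply the following (this is just the expression above written out with made-up names, not the actual news.arc code):

(define gravity 1.4)

(define (rank score age-in-hours)
  (/ (- score 1)
     (expt age-in-hours gravity)))

(rank 10 2.0)    ; => about 3.4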
There are also some gifs to support the site. The following is a list of the new files in arc2:
news.arc: the source for news.ycombinator.com.
grayarrow.gif: gray uparrow
graydown.gif: gray downarrow
s.gif: 1x1 transparent spacer
y18.gif: 1x1 spacer (same as s.gif)
Other files have minor changes:
ac.scm
arc-list? fix for '()
write, disp cleanup
app.arc
Renamed when-usermatch to when-umatch, when-usermatchr to when-umatch/r, matchform to uform.
Removed pw from good-login record.
arc.arc
(firstn nil xs) now returns xs
(nthcdr nil xs) now returns xs
trav renamed to treewise
New function union on sequences.
New function addtem to create a template.
New functions hours-since and days-since.
ensure-dir fix.
Function only is now entirely different. Wraps a function to only call it if given args.
Function plural moved to strings.arc. Now if w/bars were moved too...
New function trav: takes an object and functions. Applies the functions to the object.
New functions defhook and hook to register and execute functions?
html.arc
Added orange as a color.
para now takes arguments.
Removed image-url and local-images*.
spacetable renamed to sptab.
Added single-input and cdata.
libs.arc
Added news.arc.
srv.arc
Remove "frug" default account.
Excludes .arc as a static filetype, to hide source.
The Arc language is built on a "foundation" of functionality implemented in Scheme. I have created detailed documentation of this foundation functionality.
This article is the first part of a description of the implementation of the Arc language, based on my examination of the code.
The core of Arc is implemented in Scheme (in ac.scm), and runs in the mzscheme interpreter. The remainder of the Arc language is implemented in Arc itself (arc.arc), and is loaded into the running Arc implementation. In addition, the Arc distribution includes libraries for a web server, string handling, and a few others. Arc runs a standard Read-Evaluate-Print Loop (REPL), allowing commands to be entered and executed interactively.
In a bit more detail, when an Arc expression is input to the REPL, the Scheme reader reads and parses it. The Arc expression is converted to an analogous Scheme expression, and eval is applied to execute the Scheme code. The result is converted from the internal Scheme representation to a displayable form, and displayed as the output from the REPL.
Interestingly, Arc's [ _ ] syntax is implemented entirely independently from the rest of Arc; the code in brackets.scm extends the reader so the bracket syntax is converted to standard syntax as the input is parsed. The bracket support could easily be run on Scheme without Arc, or Arc could be run without bracket support.
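For example, the bracket form is just reader shorthand for an ordinary fn expression:

[+ _ 1]          ; is read as (fn (_) (+ _ 1))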
Entering :a into the Arc REPL drops execution into Scheme, allowing easy examination of the internals. A session looks something like this (illustrative, not verbatim output; exact prompts depend on your mzscheme version), defining a list in Arc and then examining it from the Scheme prompt:
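arc> (= x '(1 2 3))
(1 2 3)
arc> :a
> _x
(1 2 3 . nil)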
Arc's data and procedures are stored in the Scheme environment, with a few modifications. Arc symbols have an underscore prepended internally to avoid collisions with Scheme names. In the Arc language, the empty list is nil, while the Scheme empty list is (). Thus, Arc lists are stored in Scheme terminated by the (arbitrary) symbol 'nil, as can be seen by the dotted notation above. Arc macros are stored as a vector of the symbol 'tagged, the symbol 'mac, and the procedure.
The REPL
Arc runs from a Read-Evaluate-Print Loop (REPL), which is started by executing the simple procedure tl:
(define (tl)
(display "Use (quit) to quit, (tl) to return here after an interrupt.\n")
(tl2))
Note that the REPL runs in Scheme, not in Arc. tl is a wrapper around tl2, which is the real REPL implementation. Paraphrasing slightly (the actual code in ac.scm has a few more details), tl2 looks something like this:
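; paraphrased from ac.scm, not the exact source
(define (tl2)
  (display "arc> ")
  (on-err (lambda (c)                        ; error procedure: report and restart
            (display "Error: ")
            (write (exn-message c))
            (newline)
            (tl2))
          (lambda ()                         ; body procedure: one read-eval-print step
            (let ((expr (read)))
              (if (eqv? expr ':a)
                  'done                      ; :a drops back into Scheme
                  (let ((val (arc-eval expr)))
                    (write (arc-denil val))
                    (newline)
                    (namespace-set-variable-value! '_that val)
                    (namespace-set-variable-value! '_thatexpr expr)
                    (tl2)))))))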
The arguments to on-err are an error procedure and the body procedure. The error procedure is executed if the body procedure encounters an exception, similar to a try/catch block, but with the catch procedure first. The on-err procedure is implemented with continuations. If you're looking for the "mind-expanding" parts of Lisp, continuations will definitely interest you, but I will ignore on-err for now.
The meat is the second lambda function. The Scheme read procedure reads the input and creates an object using the Scheme parser. The input is passed to arc-eval, which evaluates the input as an Arc form. The result is converted by arc-denil from the internal 'nil-terminated form to displayable form and written out. The tl2 procedure then calls itself; to the C programmer, this may look like a stack overflow waiting to happen. However, Scheme is properly tail-recursive, so the stack doesn't grow and the call acts like a simple loop. The Scheme variables _that and _thatexpr record the last result and expression to help with debugging.
arc-eval and ac
The arc-eval procedure is the main entry point for executing an Arc form. It calls ac to convert the Arc form to a quoted Scheme form, and then does an eval on the Scheme expression. In outline (the actual definition may also pass an explicit namespace to eval), it is simply:
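(define (arc-eval expr)
  (eval (ac expr '())))       ; '() is the empty lexical environment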
The arc-eval procedure can be executed directly from inside Scheme:
> (arc-eval '((fn (x) (+ x 1)) 42))
43
The ac procedure is the real meat of Arc, as it translates Arc to Scheme. Its second argument is the "environment", a list of symbols that are currently bound. At the REPL, this list is empty.
Arc strings are copied to Scheme strings. Other Arc literals are unchanged. The Arc symbol 'nil is unchanged. Input with ssyntax (such as : or ~) is expanded and re-evaluated. Symbols are handled by ac-var-ref. The special operators quote, quasiquote, if, fn, and set are handled by separate procedures. Procedure calls are handled by ac-call.
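Putting that together, here is a rough sketch of ac's structure. This is a paraphrase, not the exact source; helper names other than ac-var-ref and ac-call are from my reading of ac.scm and may not match exactly:

; simplified sketch of ac's dispatch
(define (ac s env)
  (cond ((string? s) (string-copy s))                 ; strings are copied
        ((literal? s) s)                              ; other literals pass through
        ((eqv? s 'nil) s)                             ; the symbol nil is unchanged
        ((ssyntax? s) (ac (expand-ssyntax s) env))    ; expand : and ~, then retry
        ((symbol? s) (ac-var-ref s env))              ; variable reference
        ((eq? (car s) 'if) (ac-if (cdr s) env))       ; special form (one of several)
        ;; quote, quasiquote, fn, and set are handled similarly ...
        ((pair? s) (ac-call (car s) (cdr s) env))     ; ordinary procedure call
        (#t (error "Bad object in expression" s))))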
xdef
Arc primitives are created with xdef, which enters a Scheme procedure into the namespace:
(define (xdef a b)
(namespace-set-variable-value! (ac-global-name a) b)
b)
For example, the Arc newstring procedure is just the Scheme make-string procedure:
(xdef 'newstring make-string)
Note that namespace-set-variable-value! is somewhat similar to define, except it takes a symbol such as 'a, rather than a variable such as a.
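For example, these two forms create much the same top-level binding:

(define a 42)                            ; the name is written literally in the source
(namespace-set-variable-value! 'a 42)    ; the name is a symbol value, so xdef can
                                         ; compute it at run time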
Conclusion
Many more interesting aspects of the Arc implementation remain, such as procedures, scoping, and macros. I hope to do more analysis later. Much of the above is based on discussions in the Arc forum. The code snippets from the Arc distribution are copyright Paul Graham and Robert Morris; the distribution is available at http://arclanguage.org/install.