The book is an excellent example of exploratory programming, showing how to incrementally build up these applications and experiment with different algorithms from the Python interactive prompt. For instance, topic clustering is illustrated by first implementing code to fetch blog pages from RSS feeds, breaking the pages into words, applying first a hierarchical clustering algorithm and then a K-means clustering algorithm to the contents, and then graphically displaying a dendrogram showing related blogs. At each step, the book shows how to try out the code and perform different experiments from the interactive prompt.
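As a rough illustration of that pipeline, here is a minimal sketch of the word-counting and hierarchical clustering steps. It is not the book's code: the function names, the three tiny sample "blogs", and the choice of Pearson distance with averaged cluster vectors are assumptions for illustration, and the RSS fetching and dendrogram drawing steps are left out.

```python
import re
from collections import Counter
from math import sqrt

def word_counts(text):
    """Break a page into lowercase words and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def pearson_distance(v1, v2):
    """Return 1 - Pearson correlation, so smaller means more similar."""
    n = len(v1)
    sum1, sum2 = sum(v1), sum(v2)
    sum1_sq = sum(x * x for x in v1)
    sum2_sq = sum(x * x for x in v2)
    p_sum = sum(x * y for x, y in zip(v1, v2))
    num = p_sum - sum1 * sum2 / n
    den = sqrt(max((sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n), 0.0))
    return 1.0 if den == 0 else 1.0 - num / den

def hcluster(vectors, labels):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Returns the merge tree as nested tuples of labels.
    """
    clusters = list(zip(labels, vectors))
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = pearson_distance(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (label_a, vec_a), (label_b, vec_b) = clusters[i], clusters[j]
        # The merged cluster is represented by the average of its two members.
        merged = ((label_a, label_b), [(a + b) / 2 for a, b in zip(vec_a, vec_b)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical sample data standing in for fetched blog pages.
docs = {
    "a": "python code python library",
    "b": "python library code code",
    "c": "lisp macro lisp macro arc",
}
counts = {name: word_counts(text) for name, text in docs.items()}
vocab = sorted(set(word for c in counts.values() for word in c))
labels = sorted(docs)
vectors = [[counts[name][word] for word in vocab] for name in labels]
tree = hcluster(vectors, labels)
print(tree)
```

Each piece can be pasted into the interactive prompt and poked at on its own, which is the style of work the book encourages.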
By using Python libraries, each step of implementation is pretty easy; the book can focus on the core algorithms, and leave the routine stuff to libraries: urllib2 to fetch web pages, Universal Feed Parser to access RSS feeds, Beautiful Soup to parse HTML, Python Imaging Library to generate images, pydelicious to access links on del.icio.us, and so forth.
If you want more details than the book provides (it is surprisingly lacking in references), I recommend Andrew Moore's online Statistical Data Mining Tutorials, which cover many of the same topics.
What does this have to do with Arc?
While reading this book, I was struck by a contradiction: this book is a perfect example of exploratory programming, Arc is "tuned for exploratory programming", and yet working through the Collective Intelligence algorithms in Arc is an exercise in frustration. The problem, of course, is that Arc lacks libraries. Arc lacks basic functionality such as fetching a web page, parsing an XML document, or accessing a database. Arc lacks utility libraries to parse HTML pages or perform numerical analysis. Arc lacks specialized API libraries to access sites such as del.icio.us or Akismet. Arc lacks specialized numerical libraries such as a support-vector machine implementation. (In fact, Arc doesn't even have all the functionality of TRS-80 BASIC, which is a pretty low bar. Arc is inexplicably lacking trig, exp, and log, not to mention arrays and decent error reporting.)
To be sure, one could implement these libraries in Arc. The point is that implementing libraries detours you from the exploratory programming you're trying to do.
Paul Graham has commented that libraries are becoming an increasingly important component of programming languages, and that huge libraries are now an expected part of a new programming language. Given this understanding of the importance of libraries, it's surprising that Arc is so lacking in libraries. (It's also surprising that it lacks a module system or some other way to package libraries.) It's a commonplace complaint about Lisp that it lacks libraries compared to other languages, and Arc makes this even worse.
I think there are two different kinds of exploratory programming. The first I'll call the "Lisp model", where you are building a system from scratch, without external dependencies. The second, which I believe is much more common, is the "Perl/Python model", where you are interacting with existing systems and building on previous work. In the first case, libraries don't really matter, but in the second case, libraries are critical. The recently-popular article Programming in a Vacuum makes this point well, that picking the "best" language is fine in a vacuum, but in the real world what libraries are available is usually the key.
Besides the lack of libraries, Arc's slow performance rules it out for many of the algorithms from Programming Collective Intelligence. Many of the algorithms run uncomfortably slowly in Python, and running them in Arc is that much worse. It's just not true that speed is unimportant in exploratory programming.
On the positive side for Arc, chapter 11 of Programming Collective Intelligence implements genetic programming algorithms by representing programs as trees, which are then evolved and executed. To support this, the book provides Python classes to represent code as a parse tree, execute the code tree, and prettyprint the tree. As the book points out, Lisp and its variants let you represent programs as trees directly. Thus, using Arc gives you the ability to represent code as a tree and dynamically modify the code tree for free. (However, it only takes 50 lines of Python to implement the tree interpreter, so the cost of Greenspunning is not particularly severe.)
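To give a feel for how little Python that takes, here is a minimal sketch of a program tree with an evaluator and a prettyprinter. It is not the book's implementation (the book uses classes), and the nested-tuple representation, the `FUNCTIONS` table, and the names `evaluate` and `display` are all assumptions made for illustration.

```python
import operator

# Hypothetical function table; names and arities are illustrative only.
# Note that "if" is an ordinary function here, so both branches are evaluated.
FUNCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "if": lambda cond, a, b: a if cond > 0 else b,
}

def evaluate(tree, params):
    """Evaluate a program tree against a list of parameter values.

    A tree is a number, ("param", index), or (function_name, child, ...).
    """
    if isinstance(tree, (int, float)):
        return tree
    if tree[0] == "param":
        return params[tree[1]]
    args = [evaluate(child, params) for child in tree[1:]]
    return FUNCTIONS[tree[0]](*args)

def display(tree, indent=0):
    """Return an indented rendering of the tree, one node per line."""
    pad = " " * indent
    if isinstance(tree, (int, float)):
        return pad + str(tree)
    if tree[0] == "param":
        return pad + "p%d" % tree[1]
    lines = [pad + tree[0]]
    lines += [display(child, indent + 1) for child in tree[1:]]
    return "\n".join(lines)

# If p0 - 3 > 0 then p1 + 5 else p1 - 2.
program = ("if",
           ("sub", ("param", 0), 3),
           ("add", ("param", 1), 5),
           ("sub", ("param", 1), 2))

print(display(program))
print(evaluate(program, [5, 10]))
```

Because the program is plain data, mutation and crossover reduce to ordinary tuple manipulation. In a Lisp, the s-expression already plays this role, so no separate interpreter is needed.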
To summarize, a language for exploratory programming should be concise, interactive, reasonably fast, and have sufficient libraries. Arc currently fails on the last two factors. Time will tell if these issues get resolved or not.
13 comments:
As far as I understand it, Arc is a set of macros and syntactic sugar on top of MzScheme, no?
So look for "Scheme libraries" or "Lisp libraries", not "Arc" libraries. Needless to say, you'll find lots.
Shouldn't this be obvious, or what am I missing?
Are you regarding Arc as a fully fledged production environment? It seems that this aspect is overlooked far too often in criticism. As far as I understand, Arc is released as sort of a "preview".
A preview to what? The future in 10 years?
Don't get me wrong, it is good to see a language like Arc emerging in the field, but it is a difficult field these days. People expect a language to do automagical things in an easy way.
Like in Ruby.
(Or Python.)
;-D
Are you aware of Clojure? It's built on top of the JVM and provides excellent interoperability with Java. All those Java libraries instantly accessible and in a lispy style too! See here for examples.
I've been writing a Common Lisp system to screen-scrape a website, build a graph of the parsed data, render that to SVG and then modify the SVG.
I just asdf-installed all the libraries I needed.
I'm sure CL lacks some libraries, but for Web 2.0 / collective intelligence tasks it seems OK. :-)
Why do you call starting from scratch the "Lisp model"? Is this just a subtle attempt at making Arc sound bad, or do you actually know many Lisp programmers who don't use libraries?
I think there's a new meme that Lisp has no libraries, but it's demonstrably false. I don't know any Lisp programmers (except maybe one trying to build a new dialect) who would write a program without using existing libraries.
ARC is not meant to be usable for real projects yet. Paul Graham put it out there to get feedback from hackers about the core language. ARC is alpha. Maybe pre-Alpha. It's not ready for the big-time. Not even close. But Graham doesn't want to work in a vacuum while he's defining the core language, and that's entirely valid.
You might want to check out Chicken Scheme for a Lisp with hundreds of libraries and extensions. See: http://www.call-with-current-continuation.org/eggs/
So stop thinking "arc" - which is an exercise in "proving" how great it will be when it finally is - and use ONE OF SEVERAL existing micro-lisps THAT ARE AVAILABLE TODAY, and actually, have been for years.
Use newlisp - http://www.newlisp.org
It's tiny, it's monolithic - one 250k executable, no need for "system installation", it has all modern networking and APIs built-in, easy and direct access to C libraries, high-level operators, and the speed of perl/python.
OR: - use picoLisp, which is more cryptic, but very, very usable and practical - as is proved by its author's consulting experience.
Do not play with Graham's talkers - use one of the implementations that already do and surpass what "arc" only purports to become
I'd take what the previous post said even further and say go to the source and use Common Lisp which is industrial-strength lisp with plenty of libraries (more being written every day, http://www.cl-user.net and http://www.cliki.net) and forget about poor pretenders to the throne such as newlisp or arc or picolisp or whatever crappy-obsolete lisp implementation will come out next.
C++ has a huge number of libraries; you can access all C and C++ libraries. I think there is no language with more libraries. So is C++ perfect for exploratory programming? Because the article somehow says exploratory programming is only about available libraries.
I used to be quite interested in arc and lisp, but recently I've been quite turned off by the attitudes of some in the lisp community. The last few responses to this post seem quite defensive ("poor pretenders to the throne", "crappy obsolete implementations"??) and even nasty. Seriously, I think that people have to accept that Lisp does have a problem. Its library system is antiquated, limited, and quite difficult to work with (contrast asdf to import in python). Moreover, it isn't as though Lisp is the end-all-do-all of languages: languages like Haskell have numerous features that Lisp just doesn't. In the end, Lisp is just another language.
I think the kind of exploratory programming these people are talking about with Arc is not exploring the same types of subjects as the book you mention. The book is looking at applications. Arc seems to be about exploring programming languages themselves, as it is a very re-programmable programming language.
Python, along with most non-lisps, doesn't have the powerful lisp macros to do this re-programming. Just as Arc lacks libraries so it can't do the exploratory programming that Python can, Python is poorly suited for the programming language exploratory programming that Arc provides.