
9 Hacker News comments I'm tired of seeing

As a long-time reader of Hacker News, I keep seeing comments that don't really contribute to the conversation. Since the discussions are one of the most interesting parts of the site, I offer my suggestions for improving comment quality.
  • Correlation is not causation: the few readers who don't know this already won't benefit from mentioning it. If there's some specific reason you think a study is wrong, describe it.
  • "If you're not paying for it, you're the product" - That was insightful the first time, but doesn't need to be posted about every free website.
  • Explaining a company's actions by "the legal duty to maximize shareholder value" - Since this can be used to explain any action by a company, it explains nothing. Not to mention the validity of the statement is controversial.
  • [citation needed] - This isn't Wikipedia, so skip the passive-aggressive comments. If you think something's wrong, explain why.
  • Premature optimization - labeling every optimization with this vaguely Freudian phrase doesn't make you the next Knuth. Calling every abstraction a leaky abstraction isn't useful either.
  • Dunning-Kruger effect - an overused explanation and criticism.
  • Betteridge's law of headlines - this comment doesn't need to appear every time a title ends in a question mark.
  • A link to a logical fallacy, such as ad hominem or more pretentiously tu quoque - this isn't a debate team and you don't score points for this.
  • "Cue the ...", "FTFY", "This.", "+1", "Sigh", "Meh", and other generic internet comments are just annoying.
My readers had a bunch of good suggestions. Here are a few:
  • The plural of anecdote is not data
  • Cargo cult
  • Comments starting with "No." "Wrong." or "False."
  • Just use bootstrap / heroku / nodejs / Haskell / Arduino.
  • "How [or Why] did this make the front page of HN?" followed by http://ycombinator.com/newsguidelines.html
In general, if a comment could fit on a bumper sticker, is simply a link to a Wikipedia page, or is practically a Hacker News meme, it's probably not useful.

What comments bother you the most?

Check out the long discussion at Hacker News. Thanks for visiting, HN readers!

Amusing note: when I saw the comments below, I almost started deleting them thinking "These are the stupidest comments I've seen in a long time". Then I realized I'd asked for them :-)

Edit: since this is getting a lot of attention, I'll add my "big theory" of Internet discussions.

There are three basic types of online participants: "watercooler", "scientific conference", and "debate team". In "watercooler", the participants are having an entertaining conversation and sharing anecdotes. In "scientific conference", the participants are trying to increase knowledge and solve problems. In "debate team", the participants are trying to prove their point is right.

HN was originally largely in the "scientific conference" mode, with very smart people discussing areas in which they were experts. Now HN has much more "watercooler" flavor, with smart people chatting about random things they often know little about. And certain subjects (e.g. economics, Apple, sexism, piracy) bring out the "debate team" commenters. Any of the three types can carry on happily by itself. However, much of the problem comes when the types of conversation mix. The "watercooler" conversations will annoy the "scientific conference" readers, since half of what they say is wrong. Conversely, the "scientific conference" commenters come across as pedantic when they interrupt a fun conversation with facts and corrections. A conversation between "debate team" and one of the other groups obviously goes nowhere.

Lorem Ipsue: when internationalization goes bad

I recently saw a Master cable lock for sale with the interesting text "Lore Ipsum!" and "Lorem Ipsue!" underneath the word "Pull". If you've done any graphic design or web mockups, you're probably familiar with the Lorem ipsum text that's traditionally used as a placeholder. This Latin-based text is used, for instance, when you want to show the style and layout of a website, but don't want people to get distracted by the words.
Lock with text 'Lore ipsum' and 'Lorem ipsue'
Apparently when they designed the lock packaging, they put in placeholders but forgot to replace them with the French and Spanish translations ("Tirez!" and "¡Tira!"). I find "Lorem Ipsue!" in place of "Lorem Ipsum" interesting; maybe it is supposed to be a more Spanish-sounding placeholder, but I would have been more impressed by ¡Lorem Ipsum!

The moral is: make sure you check your internationalization; just because it looks foreign doesn't mean it's right.

Why Unicode is important

A couple months ago I went to a French cafe at the Denver airport. Outside the restaurant was a large rotating menu, clearly costly to manufacture. Some of the beverages on the menu puzzled me: "Caf  Latte", "Caf  Americano", and "Caf  Mocha".
faulty menu
When I saw "Proven al Salad", I realized that they had suffered an unfortunate accented character mishap; the menu was supposed to offer "Café Latte", "Café Americano", "Café Mocha", and "Provençal Salad". Unfortunately, the accented characters disappeared somewhere in the sign printing process. The font, likely Marker Felt Thin, has accented characters, so they should have been able to get it right.

I ♥ unicode
So... can any deep insight be extracted from this, or is it just a somewhat entertaining picture? My first conclusion is that notwithstanding the Arc Unicode debacle, support for non-ASCII characters is important even if you're a sign shop in Denver. My second conclusion (based on personal experience) is that everything conspires to destroy non-ASCII characters, and it's non-trivial to get them right. My third conclusion - which will make sense if you've seen UTF-8 get misinterpreted as latin-1 - is that the restaurant should be grateful they didn't end up with "CafÃ© Latte".
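For the curious, both failure modes are easy to reproduce. A minimal Python sketch (my own illustration, obviously not the sign shop's actual workflow):

```python
# "Café" encoded as UTF-8, then wrongly decoded as Latin-1: the two-byte
# UTF-8 sequence for é (0xC3 0xA9) comes back as "Ã©".
text = "Café Latte"
mangled = text.encode("utf-8").decode("latin-1")
print(mangled)    # CafÃ© Latte

# The Denver menu shows a different failure: a decoding step that simply
# drops bytes it doesn't understand, losing the accented character.
dropped = text.encode("utf-8").decode("ascii", errors="ignore")
print(dropped)    # Caf Latte
```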

The rise of scripting languages and the fall of Java

Java is very much in full retreat.
-- R. Loui
Professor Ronald Loui has an interesting article on the rise of scripting languages (In Praise of Scripting: Real Programming Pragmatism) in the July 2008 issue of IEEE Computer. It claims scripting languages such as Perl, Python, and Javascript have dramatically fulfilled their early promise, provide many benefits, and are poised to take over the lead from Java. However, the academic programming language community is stuck in theory and hasn't recognized the ascendance of scripting languages.

I agree that scripting languages are on the rise. Most people would agree that they provide rapid development, higher levels of abstraction, and brevity that helps the programmer. The article also describes how scripting languages can be a performance win, since they allow experimentation and implementation of efficient algorithms that would be too painful in Java or C++. So even if C++ is faster on the micro-benchmark level, a programmer using a scripting language may end up with faster algorithms overall. I've argued somewhat controversially that Arc is too slow for my programming problems, so I remain unconvinced that basic performance can be ignored entirely.

As for the claim that Java is in full retreat, it strikes me as wishful thinking. (I'd believe "slow decline" though.) It will be interesting to check back on this claim in 5 years.

I personally believe that CS1 [freshman computer science] Java is the greatest single mistake in the history of computing curricula.
-- R. Loui
The article suggests gawk, Javascript, PHP, and ASP as good languages for teaching introductory computer science, with Python emerging as the consensus choice for the best freshman programming language. This is the hardest part of the article for me to swallow. The idea of writing real programs in Awk never occurred to me, and I remain skeptical even though the author claims it works well. For those who would suggest Scheme as an introductory programming language: it was displaced as a dominant freshman language by Java a decade ago, and is apparently no longer considered an option.

I can't argue with the author's claim that student learning is enhanced by experimenting, writing code, and getting hands-on experience, and that scripting languages make this faster and easier.

Python and Ruby have the enviable properties that almost no one dislikes them, and almost everyone respects them.
-- R. Loui
In Why your favorite language is unpopular I discussed how the Change Function model can explain the success of programming languages based on maximizing the crisis solved and minimizing the perceived pain of adoption. This model applies to scripting languages as well:

Magnitude of crisis solved by Tcl/Tk: High - How to add a scripting language to a C program. How to add a GUI to a C program without painful X11 and Motif code.
Total Perceived Pain of Adoption: Low - Link Tcl in with your C program and add a few hooks. Create the GUI with trivial scripts.

Magnitude of crisis solved by Perl: High - How to quickly write CGI scripts. How to solve problems too complex for shell scripts. How to process files. How to develop quickly and iteratively.
Total Perceived Pain of Adoption: Low - Apart from looking like line noise, Perl is easy to get started with, is well integrated with Unix, has the definitive regex implementation, and has libraries for almost everything.

My point is that these languages solved specific painful problems and had low pain of adoption. As a result, they were much more successful than beautiful, powerful languages that were less able to directly solve painful problems or were more painful to adopt.

The real reason why academics were blindsided by scripting is their lack of practicality.
-- R. Loui
A major thrust of the article is that academics are too concerned with theoretical issues of syntax and semantics, rather than pragmatic issues of what a language can achieve quickly, inexpensively, and practically. Academics are said to be too tied to theoretical concepts such as object-oriented programming and strong typing, and are missing the real-world benefits of scripting languages.

(Interestingly, Rob Pike made a similar argument against academics in the context of operating systems software (Systems Software Research is Irrelevant), stating that academic research is irrelevant and the real innovation is in industry. Since I have friends doing academic OS research, I should add a disclaimer here that I don't necessarily agree.)

One measure of pragmatics raised by the paper is how well a language works with other Unix tools. I think the importance of this is underappreciated. In particular, I view this as a significant barrier to adoption of Arc. Running Arc as a shell script instead of from a REPL is nontrivial (as is the case with many Lisp and Scheme implementations). Running an external program from Arc is clunky, even though it is often necessary to actually get things done (Kens' law), and real pipes are missing from Arc entirely.

Java's integration with Unix also has painful gaps - where's getpid() for instance? Why is JNI so difficult compared to calling native code from C#? I blame Sun's pure-Java platform independence ideology, and I'm surprised it hasn't hurt Java more.

On the other hand, Python and Perl provide a remarkable degree of integration, which I view as a key factor in their success. Likewise, Visual Basic is highly integrated with the Windows environment and highly successful there.
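To make the contrast concrete, here is a small sketch of the kind of Unix integration being discussed; this is my own example in Python (not code from the article), and Perl is similarly direct:

```python
# Scripting-language/Unix integration in a few lines: process id,
# running an external program, and a real pipeline. No JNI-style glue.
import os
import subprocess

print(os.getpid())   # the getpid() the article notes Java has no direct call for

# Run an external program and capture its output.
listing = subprocess.run(["ls", "-l"], capture_output=True, text=True).stdout

# A genuine pipeline: ls -l | wc -l
ls = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE)
count = subprocess.check_output(["wc", "-l"], stdin=ls.stdout, text=True)
ls.stdout.close()
print(count.strip())
```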

In conclusion, Loui's paper raises numerous interesting points about the success of scripting languages. I expect that the reasons for the rise of scripting languages will only get stronger, and languages that don't support the scripting model will have an increasingly harder time gaining adoption.

Note: quotes above are from the preprint and may not match the published article.

Why your favorite language is unpopular

The total world's population of Haskell programmers fits in a 747. And if that goes down, nobody would even notice.
-- Erik Meijer

I recently saw an interesting talk on functional programming by Erik Meijer (of Bananas, Lenses, Envelopes, and Barbed Wire fame). Among other things, he discussed why many superior technologies such as Haskell don't catch on.

Geek formula for success

He claims the "Geek formula" for success of a technology is that if a technology is 10 times better, it should catch on and become popular. Even if it is slower, Moore's law will soon make it 10 times faster.

So if Haskell is 10 times better than C and Haskell programs are 10 times shorter, everybody should be using Haskell.

Real-life formula for success

However, as Erik points out, "That's not how it is in real life." In real life, success is based on the perceived crisis divided by the perceived pain of adoption. Users want something that will get the job done and solve their crisis, without a lot of pain to switch.


This argument applies to many languages that remain unpopular despite their technical merits, such as Lisp, Arc, and Erlang, as well as technologies such as the Semantic Web and LaTeX.

The Change Function

The above argument is based on the book The Change Function: Why Some Technologies Take Off and Others Crash and Burn by Pip Coburn. To summarize the book, new technologies aren't adopted because they are great, new, and disruptive; they are adopted only if the user's crisis solved by the technology is greater than the perceived pain of adoption. As a result, most new technologies fail.

The first half of the book is a bit fluffy, but gets more interesting when it discusses specific technologies that failed or succeeded. The book also goes out on a limb and predicts future winners (mobile enterprise Email, satellite radio, business intelligence software) and losers (RFID, entertainment PC, WiMax).

Languages and The Change Function

The Change Function argument has a lot of merit for explaining what languages become popular and what languages don't. If Lisp is so great, why are there 8 million Visual Basic programmers worldwide and few Lisp programmers? The answer isn't pointy-haired bosses (since Lisp isn't popular on SourceForge either). The crisis vs. pain of adoption model provides a powerful explanation:

Magnitude of crisis solved by Visual Basic: High (e.g. how to easily write Windows applications)
Total Perceived Pain of Adoption for Visual Basic: Very Low (hit Alt-F11 in Excel and you're done)

Magnitude of crisis solved by Lisp: Low (metaprogramming, powerful macros, and higher-order functions are solutions in search of problems)
Total Perceived Pain of Adoption for Lisp: High (this shouldn't require explanation)

The same model explains the success of, for instance, Java:
Magnitude of crisis solved by Java: High (originally how to run code in a browser and write portable code, now how to avoid crashes due to memory allocation errors and bad pointers)
Total Perceived Pain of Adoption: Low (syntax similar to C++, easy to deploy)

Applying this model to other languages is left as an exercise for the reader.
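As a toy starting point for that exercise, the model reduces to a simple comparison. A throwaway sketch (my framing, not Coburn's, with made-up illustrative scores):

```python
# A toy rendering of the Change Function: adoption is plausible only when
# the perceived crisis outweighs the total perceived pain of adoption.
def likely_to_be_adopted(perceived_crisis, perceived_pain):
    return perceived_crisis > perceived_pain

print(likely_to_be_adopted(perceived_crisis=9, perceived_pain=1))  # a Visual Basic-like case: True
print(likely_to_be_adopted(perceived_crisis=2, perceived_pain=8))  # a Lisp-like case: False
```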

Erik points out that Erlang and Haskell are now being marketed according to the second formula: there is a multicore crisis and functional languages are the solution. It will be interesting to see how much additional traction these languages get without addressing the "pain of adoption" part.

The Change Function and startups

The Change Function ends with ten sets of questions and a set of techniques for designing technologies that will be adopted; this part of the book has many ideas that would be beneficial for startups. Many of these are fairly obvious, such as "fail fast and iterate" and "have a customer-centered culture instead of a sales-centered culture", while others are more thought-provoking: "What is the user crisis you intend to solve? What are the top five reasons a user with this crisis would not use your product?" The ultimate conclusion of the book is "Figure out what people really want!", which brings to mind the advice to make something people want.

Why Arc is bad for exploratory programming

Recently, I've been reading Programming Collective Intelligence, which is a practical guide to machine learning algorithms, showing how to build a recommendation system, implement a search engine, classify documents, mine websites, use genetic algorithms and simulated annealing, and implement other machine learning tasks. The book shows how to implement each of these in surprisingly few lines of Python.

The book is an excellent example of exploratory programming, showing how to incrementally build up these applications and experiment with different algorithms from the Python interactive prompt. For instance, topic clustering is illustrated by first implementing code to fetch blog pages from RSS feeds, breaking the pages into words, applying first a hierarchical clustering algorithm and then a K-means clustering algorithm to the contents, and then graphically displaying a dendrogram showing related blogs. At each step, the book shows how to try out the code and perform different experiments from the interactive prompt.

By using Python libraries, each step of implementation is pretty easy; the book can focus on the core algorithms, and leave the routine stuff to libraries: urllib2 to fetch web pages, Universal Feed Parser to access RSS feeds, Beautiful Soup to parse HTML, Python Imaging Library to generate images, pydelicious to access links on del.icio.us, and so forth.
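To give a flavor of that style, here's a sketch of the first step of such an exploration; this is my own toy code, not the book's, and it assumes the third-party feedparser package plus some hypothetical feed URLs:

```python
import feedparser                 # third-party package, assumed installed
from collections import Counter

# Hypothetical feed URLs; substitute real blogs to experiment.
feeds = ["http://example.com/a.xml", "http://example.com/b.xml"]

def word_counts(url):
    """Fetch one feed and count the words in its entry titles and summaries."""
    parsed = feedparser.parse(url)
    words = []
    for entry in parsed.entries:
        text = entry.get("title", "") + " " + entry.get("summary", "")
        words.extend(w.strip(".,!?").lower() for w in text.split())
    return Counter(words)

counts = {url: word_counts(url) for url in feeds}
# From the interactive prompt you can now inspect the counts, filter out
# overly common words, feed the vectors to a clustering routine, and so on.
```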

If you want more details than the book provides (it is surprisingly lacking in references), I recommend Andrew Moore's online Statistical Data Mining Tutorials, which covers many of the same topics.

What does this have to do with Arc?

While reading this book, I was struck by a contradiction: the book is a perfect example of exploratory programming, Arc is "tuned for exploratory programming", and yet working through the Collective Intelligence algorithms in Arc is an exercise in frustration.

The problem, of course, is that Arc lacks libraries. Arc lacks basic functionality such as fetching a web page, parsing an XML document, or accessing a database. Arc lacks utility libraries to parse HTML pages or perform numerical analysis. Arc lacks specialized API libraries to access sites such as del.icio.us or Akismet. Arc lacks specialized numerical libraries such as a support-vector machine implementation. (In fact, Arc doesn't even have all the functionality of TRS-80 BASIC, which is a pretty low bar. Arc is inexplicably lacking trig, exp, and log, not to mention arrays and decent error reporting.)

To be sure, one could implement these libraries in Arc. The point is that implementing libraries detours you from the exploratory programming you're trying to do.

Paul Graham has commented that libraries are becoming an increasingly important component of programming languages, and that huge libraries are now an expected part of a new language. Given this understanding of the importance of libraries, it's surprising that Arc is so lacking in them. (It's also surprising that it lacks a module system or some other way to package libraries.) It's a commonplace complaint that Lisp lacks libraries compared to other languages, and Arc makes this even worse.

I think there are two different kinds of exploratory programming. The first I'll call the "Lisp model", where you are building a system from scratch, without external dependencies. The second, which I believe is much more common, is the "Perl/Python model", where you are interacting with existing systems and building on previous work. In the first case, libraries don't really matter, but in the second case, libraries are critical. The recently-popular article Programming in a Vacuum makes this point well, that picking the "best" language is fine in a vacuum, but in the real world what libraries are available is usually the key.

Besides the lack of libraries, Arc's slow performance rules it out for many of the algorithms from Programming Collective Intelligence. Many of the algorithms run uncomfortably slowly in Python, and Arc is that much worse. It's just not true that speed is unimportant in exploratory programming.

On the positive side for Arc, chapter 11 of Programming Collective Intelligence implements genetic programming algorithms by representing programs as trees, which are then evolved and executed. To support this, the book provides Python classes to represent code as a parse tree, execute the code tree, and prettyprint the tree. As the book points out, Lisp and its variants let you represent programs as trees directly. Thus, using Arc gives you the ability to represent code as a tree and dynamically modify the code tree for free. (However, it only takes 50 lines of Python to implement the tree interpreter, so the cost of Greenspunning is not particularly severe.)
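For comparison, here is roughly what that Greenspunning looks like; a stripped-down sketch of representing and evaluating a code tree in Python (my own toy version, not the book's classes):

```python
import operator

class Node:
    """A program node: a function applied to children (Nodes or constants)."""
    def __init__(self, fn, children):
        self.fn = fn
        self.children = children

    def evaluate(self):
        args = [c.evaluate() if isinstance(c, Node) else c for c in self.children]
        return self.fn(*args)

# The expression (+ 2 (* 3 4)) as a tree, then evaluated.
tree = Node(operator.add, [2, Node(operator.mul, [3, 4])])
print(tree.evaluate())   # 14
```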

To summarize, a language for exploratory programming should be concise, interactive, reasonably fast, and have sufficient libraries. Arc currently fails on the last two factors. Time will tell if these issues get resolved or not.

The importance of software testing

At work we have a nifty electronic postage kiosk that will weigh letters, compute the postage, print a postage strip, and bill a credit card all through a display and touch screen. When I tried using it to mail a letter, everything went fine until I got an Internet Explorer error dialog:
Error message
It's always a surprise when an embedded system reveals its inner workings. Somehow I assume such systems are built on some super-special technology, not HTML and Internet Explorer.

I should mention that I was just sending a letter to Canada. While this isn't the most common thing to do, it's not a particularly bizarre action either. And I definitely wasn't doing a sinister action such as sending a letter to the country of "DROP TABLE". (I should also mention that this blog posting has nothing to do with Arc, in case my regular readers are wondering.)

I started over with the kiosk and the same error dialog appeared again. At this point, I dismissed the dialog box via the touch screen and continued with Buy Postage. It displayed "Total Postage $NaN" (i.e. Not a Number). When it asked for my credit card, I hesitated briefly; I figured the worst case was that I'd have to send them a check for 0/0; it's not like they were going to bill me +Infinity. Curiosity got the better of me and I decided to plunge forward in the interests of science. After I swiped my credit card, the kiosk printed a $0.69 postage strip, and gave me a receipt:
receipt
Sure enough, the receipt said my credit card had been charged $NaN. I can only imagine what the billing backend would make of that. I called my credit card company to check what really happened, but the day's transactions weren't available yet.

I see a couple of morals to this story. The most obvious is that normal people don't swipe their credit cards through a machine offering to bill them $NaN. But the more relevant moral is that error checking and software testing are good things, especially where monetary transactions are concerned.
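For what it's worth, the missing check is tiny. A hedged sketch (in Python, rather than whatever the kiosk actually runs) of validating a charge before it ever reaches the billing system:

```python
# A NaN slips silently through arithmetic and string formatting, which is
# presumably how "$NaN" reached my receipt. Validate before billing.
import math

def validate_charge(amount):
    """Reject NaN, infinite, zero, or negative charges before billing."""
    if not math.isfinite(amount) or amount <= 0:
        raise ValueError("refusing to bill invalid amount: %r" % amount)
    return round(amount, 2)

postage = 0.69 * float("nan")            # an upstream failure propagates quietly
print("Total Postage $%s" % postage)     # prints "Total Postage $nan"
validate_charge(postage)                 # raises instead of charging the card
```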

P.S. I emailed the kiosk support, and they assured me that the billing problems following a "system change" were resolved, and I'd be billed the proper $0.69.