Efectos EspecialesRamblings, rants, musings, ideas and observations. Topics include (but are not limited to): programming (especially Python), books, games (especially CCGs and board games), astrology, design, writing, painting, etc.
(Adventures in OCaml optimization... please read the entire post before commenting, because I'm discovering a whole bunch of things as I go.)
I have always been a bit amazed (and occasionally amused) at Python newbies who try to do a simple thing, then complain that the language is s-l-o-w. Upon looking at the code, it then turns out that they've translated an idiom from a language they are familiar with, and that usually doesn't work so well in Python. (Concatenation of large numbers of strings using
+= is a notorious example.)
Well, turnabout is fair play. I now have the same problem with OCaml, and Caml gurus can laugh at me wholeheartedly.
All I want to do is read a (not very large) text file, and collect the contents. "Collect" means I want to have e.g. a long string containing the file's contents, or maybe a list of strings for the lines, etc. [Alarm bells go off immediately: these are Python idioms, that won't necessarily work well in Caml...]
I tried a number of approaches. I won't post all the code here, but the first attempt read a file character by character, appending them to create one big string. Another one does the same with a list of characters. These are not very efficient, because a new string/list is created whenever a character is read. So I tried another approach, that modifies a list in place. Still not efficient. In fact, they were very, very slow.
I haven't found an OCaml function (yet) that actually reads a *character*... rather, there's a clumsy function called
input that reads a string. You call it like
input ic s 0 1 where
ic is an "input channel" (aka a file opened for reading), and
s is an existing string that will be modified in-place. 0 is the index where we start, and 1 is the number of characters to be read. So this reads a one-character string.
Hmm. What about a different approach, then? Next I tried to use
input_line, which is more straightforward, kind of like Python's
f.readline(). My resulting, amateuristic code is:
(* cat5.ml *) let readlines ic = let rec readlines_helper accum = try let line = input_line ic in readlines_helper (accum @ [line]) with End_of_file -> accum in readlines_helper  ;; let filename = (Sys.argv).(1);; let start = Sys.time ();; let ic = open_in filename;; let lines = readlines ic;; close_in ic;; let stop = Sys.time ();; let print_line s = begin print_string s; print_newline (); end;; List.iter print_line lines;; print_string "\n\nReading that file took ";; print_float (stop -. start);; print_string " seconds";;
(Note that its prints the file's contents, but that happens outside of the timer.) The equivalent Python code is:
# cat5.py import sys import time start = time.time() f = open(sys.argv) lines = f.readlines() f.close() stop = time.time() for line in lines: sys.stdout.write(line) print print "Reading that file took", stop - start, "seconds"
Now, there's probably a lot wrong with my OCaml code. But still, the difference is huge. For a file of about 100K, the Caml version takes 4.6 seconds. The Python version takes 0.01 seconds.
Compiling doesn't help, cat5.ml even runs a tiny bit slower when it's compiled to an exe. I expected the Caml version to blow Python out of the water, even uncompiled.
Obviously, I must be doing something wrong. Maybe more than one thing. But what exactly? Is it the use of recursion? The
readlines_helper function is tail-recursive, right? My guess is that creating a new list for every line makes it slow... although there aren't that many lines, and changing the list in place doesn't seem to make any difference.
[A little background: I am interested in reading a file character by character, or maybe line by line (and then process the characters in that line). It's for writing a simple parser. I am currently not interested in OCaml's parser tools.]
I haven't discovered the "real" way to do this yet. More about this later.
[Update #1] Well, 5 minutes after posting this, I found a much faster way. The new
readlines function now looks like this:
let readlines ic = let rec readlines_helper accum = try let line = input_line ic in readlines_helper (line :: accum) with End_of_file -> List.rev accum in readlines_helper  ;;
@) apparently isn't very efficient, probably because it needs to traverse the existing list.
:: (like Lisp's
cons) is much faster. It appends at the front, but when it's time to return the list, I can simply reverse it.
It seems that Python is still faster though... And there also seems to be a problem with recursion. Maybe it's not tail recursive after all. Well, I'm still learning...
[Update #2] This version does not have the recursion problem:
let readlines ic = let accum = ref  in try begin while true do let line = input_line ic in accum := line :: !accum; done; !accum; end; with End_of_file -> List.rev !accum ;;
Yah, it's not pretty, and not really what you think of when doing functional programming... but it'll have to do for now. On a test file with 100,000 lines of 80 characters each (8.2 Mb), this function clocks in at around 0.28 seconds. cat5.py takes 0.12. For a file with 500,000 lines of 80 characters each (41 Mb), Caml takes 1.56 seconds, and Python 0.625. Again, it doesn't seem to matter if the Caml code is compiled or not.
I still think Caml's performance could be better than that, but this is acceptable for now. Although maybe the code is not...
OK, here are the uncensored new year's resolutions for 2005.
1. Move. More details about the move (where, who, etc) will appear on this blog in due time.
2. Get my US driver's license. Yes, I still don't have one, because I procrastinate and hardly ever drive at all, anyway. This will have to be done before #1.
3. More control over my finances. After #1 is accomplished. The US, and especially Florida, makes it very easy to be "free floating" and live from check to check, so you don't/can't save anything and get in trouble the moment you get an unexpected bill.
4. Live healthier. A little more exercise, a little less junk food, to keep my heart and arteries happy. And my stomach... I should buy stocks in Pepcid.
5. Finally publish my Nova card game. Designed in 2001, still not published. I will need to debug it, fix the rules, create PDFs, then have it printed on cardboard, to create a small number of decks for playtesting. I will also have to rethink what "publishing" means here exactly.
6. Make a decision about Wax. That will have to be done rather early. Either I will go for it completely, or I'll give up and spend my time and energy elsewhere. (Although I can still use it for personal projects.)
7. Draw and paint more. I wonder if one drawing or painting a month is too much to ask.
That's it for now. No book plans, since I have nothing to write about. In the past year I haven't even written any real articles.
Before pondering my new year's resolutions for 2005, let's see what happened to those for 2004.
It looks like the score isn't great. I didn't write a book, for example. At some point, this hypothetical book was tied up to the equally hypothetical PSF grant, which I didn't get. I didn't write a game either, at least not a computer game, although I came up with a handful of ideas for card and board games. I didn't publish my Nova card game (maybe in 2005?), didn't fix my finances (although I did make enough this year), and didn't do all that much with Firedrop.
Not very good. I didn't really accomplish any of these, except for the low-hanging fruit, the blogroll (which is taken care of by Bloglines).
I need some better resolutions for 2005. More about that
Anyway, merry Christmas y'all.
By now, everybody has seen Guido's article about adding optional static typing to Python. I don't really have much to add to it that hasn't already been said in comments and other people's blogs. And like many people, I wonder: why would one want to add this?
The article itself doesn't give many reasons why, except for this part:
This seems kind of a weak reason, IMHO. Static typing could be used for more useful things, like compiler optimization, but not a word about that. Rather, it seems that the idea is, to catch "bugs of a rather trivial kind" earlier, and to emit compiler warnings in case of type errors.
I don't know... that seems kind of flimsy. At least not important enough to add this capability to the Python compiler. If there really is a demand for these features, why not investigate first if it can be done by other means (decorators, etc)?
I'm -1 on adding this to the language, and I haven't seen (m)any positive reactions from others either. But without a doubt, GvR will have to say more about it... let's wait for that before jumping to conclusions.
Lambda seemed like a bad idea at first too... oh, wait...
pycron is a cron replacement written in Python. No source seems to be available, unfortunately. I installed this on a server the other day, and it seems to work quite well.
Update: Funny, there's another pycron at Sourceforge. It doesn't seem to be the same or even related (other than that it's a cron implemenation, of course).
I received the following mail today.
So I need another language. It's can't be Delphi or Python. It has to have reasonable support for making decent Windows GUIs, and for talking to databases. It has to be dynamic, with a compact syntax, and tremendous flexibility.
[I am] aiming for a way-higher-than-average level of productivity.
So... you are something of a dynamic language fan... what languages meet the requirements above?"
A good question, and not so easy to answer. Let's recap:
1. It cannot be Delphi or Python. Obviously it cannot be Java either. A dynamic language is preferred, so C, C++ and C# are out as well. The definition for "dynamic" may actually be rather loose; I think the emphasis is more on production gain. If production could be greatly improved by using, say, OCaml or Eiffel, then those languages would be candidates too.
2. Decent Windows GUIs. "Decent" means that it must be capable of creating native-looking GUI apps. Languages with *good* wxWidgets support are candidates; so are languages sitting on .NET (and maybe the JVM).
3. Support for talking to databases. Most non-obscure languages have that these days, and so do the .NET and JVM languages.
4. Dynamic / compact syntax / flexibility.
5. A high level of productivity.
6. I would like to add, that the language shouldn't be too immature, and should have a reasonable user base. We don't want the language to go go through incompatible changes every week. We do want availability of some libraries, and a newsgroup or mailing list to ask questions if necessary.
Problem is, that established dynamic languages like Perl and Ruby are out, because of #2. (This may be controversial, but we're not interested in arguing about that right now. Although if you know a GUI toolkit for Ruby or Perl that is *really* native-looking, and stable, and mature, then I'd like to hear about it. Note that Tk and FX are not considered to be up to snuff for the sender's purposes.)
On the other hand, .NET and JVM based languages are often immature and therefore don't meet #6. That rules out Boo, Groovy, Beanshell and many more little languages.
Maybe there are commercial Lisp/Smalltalk implementations available that match these criteria? We are willing to consider them. (Yes, there are many cool open source implementations, like CLISP and Squeak, but they don't really match #2, for example. Or #3, maybe. But feel free to correct me.)
So, what would be a good language? No offense, but in this case, I am not interested in discussing why Delphi and Python aren't acceptable, how Java productivity can be greatly improved, or how web-based apps are the wave of the future. I just want to hear about cool languages that match the criteria. Thanks in advance.
[Update #1] Why not Python or Delphi? It's not mentioned in this email, but we've had a similar discussion a while ago, and these languages are excluded for non-technical reasons. That's all I can say about it right now.
"An unmistakable sign of paranoia is continual mistrust. People with paranoid personality disorder are constantly on their guard because they see the world as a threatening place. They tend to confirm their expectations by latching on to any speck of evidence that supports their suspicions and ignore or misinterpret any evidence to the contrary. They are ever watchful and may look around for signs of a threat."
But maybe that's just the residue of my java/c++ schooling. I can't imagine myself actually ever choosing to design a statically typed system again."
Incidentally, at work we've developed a large-ish system, that has to run on a server virtually unattended. It's very important, so it's not supposed to crash or produce the wrong results. :-) And it's written in Python. So far it's been doing rather well... *knocks on wood*
I suspect static typing has an important psychological effect... somehow code *feels* more secure and correct if it can be compiled without errors. A successful compilation seems to tell you that you're at least part of the way in ensuring correctness. In contrast, with a dynamic language like Python, you're never sure until you actually run it. At least, that's what it seems like. But when you think of it, this isn't true at all -- if code compiles, all that's guaranteed is that the syntax is correct and that the types match (at least, if you don't do any casting :-). But it's not guaranteed to run any more correctly than a similar program in a dynamic language. Unit testing is one way to check that correctness, in both statically and dynamically typed languages. With a dynamic language, you just get there earlier.
And reader verbat writes:
But if you see something like:
AttributeError: 'list' object has no attribute 'split'
in you unit tests you suddenly think: "oh, wait I missed something" and go fix it."
Yes, that is an example of an error that would have been caught at compile time by a statically typed language. In a unit test, such an error message is just a side effect of testing the code's correctness.
And btw, are'nt pychecker/pylint small, little powerful static checks?
Don't fall in the trap of thinking dynamic typing is perfect, the worls's just waiting for a system es axpressive as dynamic ones with a lot more static checks :)"
I agree that static typing has certain benefits. It allows for better code analysis, for example, which in return allows for more powerful IDEs. As I have written before (in an Eclipse-related post, I think), code like
def foo(x): return x.bar(42)
simply doesn't provide enough information, to both human and code analysis tool, to determine the type of
x, much less what attributes it has besides
bar. This is both a strength and a weakness of dynamically typed languages. You can't determine the type by looking at the code; but you don't really have to care, because anything with a
bar method that accepts 42 as an argument, will pass.
I certainly don't think that dynamic typing is perfect. Especially when it comes to refactoring, some more code introspection would be a nice help (at least for Python, can't speak for other DTLs). Aside from that, though, I will take dynamic typing over static typing any day. It's so much more productive, it's not even funny.
Personally, I have never felt the need to use interfaces in Python code, but maybe my projects just aren't large enough to make them very useful. Ditto for tools like PyChecker and PyLint, which seem like they could be useful in theory, but which I very rarely use.
(Although a possible explanation for the presence of interfaces and other static checks in dynamic systems *could* be: If a system is flexible enough, and popular enough, then inevitably somebody will add their pet feature to it... whether that be static typing, or AOP, or multimethods, or...)
Design and content © 2004 Electric Shock / Hans Nowak