Tao of the Machine

Prototyping

Another person experimenting with Self-like prototypes for Python. Just remember that I was there first <wink>.

Of course, my code was just an experiment as well... I wonder if a "serious" prototypes package would be useful. I mean, would people actually use it? The prototype approach is very different from the regular Python class/instance model, and allows for very different constructs.
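To give an idea of how different the constructs are, here is a minimal sketch of prototype-style delegation, written in present-day Python 3 rather than the Python of this post. The `Proto` class and its `clone` method are hypothetical names made up for illustration; this is neither the experiment mentioned above nor Self's actual object model.

```python
class Proto:
    """Hypothetical prototype object: missing attributes are looked up in a parent."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.__dict__.update(slots)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so delegate to the parent chain.
        parent = self.__dict__.get("parent")
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)

    def clone(self, **slots):
        """Create a new object that uses this object as its prototype."""
        return Proto(parent=self, **slots)


point = Proto(x=0, y=0)
p2 = point.clone(x=5)   # overrides x, inherits y from point
print(p2.x, p2.y)       # 5 0

point.y = 7             # changing the prototype is visible in its clones
print(p2.y)             # 7
```

There is no class/instance split here: any object can serve as the "class" of another, and behavior is shared by delegation instead of by instantiation.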

Also interesting is Prothon, and the c.l.py discussion about it. I agree that the choice of tabs for indentation is unfortunate. I also wish there were more code samples available. From what I see, I'm not sure it's all that close to Self's object model, but then again, I haven't actually programmed in Self...

Posted by Hans Nowak on 2004-03-26 22:50:06
Categories: Python, programming

A document database

I've always had a need for a program to store all kinds of information... notes, snippets, important mails, links, ideas, etc, but also larger texts like ebooks and manuals. It should be searchable by text, but also by keyword; each "document" would be associated with any number of keywords. (As opposed to storing files in a directory tree, which is a strictly hierarchical structure.)

A few years ago, I actually wrote the program that I had in mind, using a combination of Python and Delphi. Delphi for the GUI and database access, Python for flexible searching. In spite of that, the program was still quite rigid. And often slow, because in order to do the Pythonic search, it had to walk the database record by record.

Today, writing such a program, in pure Python, is much easier. Use the ZODB, stick objects in it, and you're all set, having all the flexibility you'll ever need. I can use Document objects that store text, plus metadata, like a title, author, creation date, etc. All this is easy to retrieve and search.

However, I'm not sure the current version scales very well. Granted, it's only a trial version, weighing in at a whopping 150 lines of code. :-) What I have so far is a rudimentary database, an import script, and an interactive console that can be used to query the database. Something like this:

>>> db
<database.Database instance at 0x007DBC10>
>>> db.db
<BTrees._OOBTree.OOBTree object at 0x00A07390>
>>> len(db.db)
4540
>>> d = db.db[2000]  # just a random "record"
>>> d.title
'BIGNUM.H'
>>> len(d.data)
1414
>>> d.source  # where it came from
'c:/cd-r/wizdom2\\prog\\c\\c-snippets\\BIGNUM.H'
>>>

There are currently 4540 documents in the database, ranging from small to large (over 1 MB), totaling almost 400 MB. Searching this database "naively" is easy but very slow:

for doc_id, document in db.db.items():
    # every document gets unpickled just to scan its text
    if "Python" in document.data:
        print(doc_id)

This takes forever, which isn't so surprising when you think of what's going on behind the scenes (unpickling, searching a possibly large body of data, repeat 4540 times).

Optimizing this could be an interesting task. Maybe I'll need some kind of search engine functionality, like Lupy. Maybe searching can be done "smarter"; for example, there should be a simple way to search only documents of a certain size (< 10 KB, < 1 MB, etc). Or maybe the actual text/data could be stored separately from the Document object, so it only gets unpickled when we really need it. (If we search for certain keywords only, then we shouldn't have to unpickle the document's text at all.) Etc.
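A rough sketch of the "store the text separately" idea, in present-day Python 3. Everything here is an assumption for illustration: `DocumentInfo` and `Store` are hypothetical names, and plain dicts stand in for the BTree; in a real database the two mappings would be separate persistent containers, so that touching the metadata never loads a body.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentInfo:
    """Hypothetical lightweight record: metadata only, no body text."""
    title: str
    size: int                                   # length of the body in characters
    keywords: set = field(default_factory=set)


class Store:
    """Stand-in for the database; metadata and bodies live in separate mappings."""

    def __init__(self):
        self.info = {}      # doc_id -> DocumentInfo (cheap to load)
        self.bodies = {}    # doc_id -> full text (expensive to load)

    def add(self, doc_id, title, text, keywords=()):
        self.info[doc_id] = DocumentInfo(title, len(text), set(keywords))
        self.bodies[doc_id] = text

    def by_keyword(self, keyword):
        # Keyword search looks at metadata only and never touches a body.
        return [i for i, meta in self.info.items() if keyword in meta.keywords]

    def search_text(self, needle, max_size=None):
        hits = []
        for doc_id, meta in self.info.items():
            if max_size is not None and meta.size >= max_size:
                continue    # skip large documents without loading their text
            if needle in self.bodies[doc_id]:
                hits.append(doc_id)
        return hits


store = Store()
store.add(1, "notes.txt", "Notes on Python and ZODB", keywords=["python", "notes"])
store.add(2, "ebook.txt", "Python " * 3000, keywords=["python", "ebook"])

print(store.by_keyword("notes"))                        # [1]
print(store.search_text("Python"))                      # [1, 2]
print(store.search_text("Python", max_size=10 * 1024))  # [1]
```

The size filter and the keyword lookup both work from the small records alone, which is the point: the expensive part (the text) is only fetched for documents that survive the cheap checks.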

A simpler alternative would be to avoid storing large documents altogether. Do I really need ebooks and manuals in there? Most of the time these can be found on the internet anyway. It's much more useful to store and search personal notes, mails, etc, things that would otherwise get lost in a crowded mailbox or directory.

Code will be available when I'm content with it. :-) There will be a GUI as well (written in Wax, of course), for user-friendly document management and editing.

Posted by Hans Nowak on 2004-03-25 01:44:45
Categories: Python

oldversion.com

After this, this is a great find: oldversion.com -- "because newer is not always better". Get the good, non-bloated versions of Acrobat Reader, Winamp, ICQ and much more.

Posted by Hans Nowak on 2004-03-24 00:18:00
Categories: internet, linkstuffs

YPMV

YourDictionary.com: The 100 most mispronounced words.

Some comments. (Disclaimer: I am not a native English speaker, so I can only go by what I've picked up around here. "Around here" is Florida. YPMV.)

clothes: I haven't heard anybody here actually pronounce the [th]. People seem to say [close]. This is Florida, not Britain. :-)

duck tape: Note that there's actually a brand called Duck Tape.

February: Again, I haven't heard anybody pronounce the first [r]. It's "feb-yoo-ary".

herb: Apparently, if it's a name you pronounce the [h], otherwise you don't.

mischievous: This word isn't used often, but I've heard it pronounced "mischievious". (Yes, with the extra i.)

pernickety: According to dictionary.com, persnickety is valid as well.

snuck: Yes, according to dictionary.com again, this *is* a word. [snuck]

spitting image: This is valid as well.

Some more reactions to this list.

Then there's the issue of how to pronounce Nevada...

Posted by Hans Nowak on 2004-03-22 23:33:38
Categories: general

Lindows

The Lindows site does not allow users from the Netherlands, Belgium and Luxembourg:

Important Notice! Pending Lindows' appeal visitors from the Netherlands, Belgium, and Luxembourg are not permitted to access the Lindows.com website or purchase Lindows products.

Refusing to sell your products to someone is one thing, but how can you forbid someone to visit a website? I suppose you can do crafty things with filters, block certain IP ranges etc, but forbidding it seems silly, because it's a bit hard to enforce.

Posted by Hans Nowak on 2004-03-22 18:10:42
Categories: internet

JKR Chat Transcript

A JK Rowling chat transcript. Contains a few hints about books six and seven. It appears that some of the popular fan theories can be put to rest. :-}

Posted by Hans Nowak on 2004-03-21 23:22:39
Categories: books

--
Generated by Firedrop2.