Tao of the Machine

Programming, Python, my projects, card games, books, music, Zoids, bettas, manga, cool stuff, and whatever comes to mind.

Spambayes slip-ups

I've been using spambayes for several months now. Most of the time, it does a good job. With only a little training, it sends pesky virus mails to the Junk folder. Ditto for most spam. And after the first few days, it *never* mistook ham for spam.

However, sometimes it slips up. It's no big deal, but it makes me wonder what's going on. Is the algorithm flawed, did I do something wrong (like accidentally mis-classifying certain mails), are spammers using sneaky tricks to get around the filter? The thing is that these mistakes *appear* really brain-damaged. After getting the umpteenth mail with A SUBJECT IN ALL CAPS, or the word "Nigeria" in the body, or "The Career News" as the sender, and plenty of similar mails marked as spam in the repository, you'd think that spambayes would have no problems figuring out where these belong. But for some reason, it does. Sometimes they show up as "unsure", sometimes as "ham".

The strange thing is that it classifies most mails like these as spam, but some of them somehow slip through. Except for the Career News, which is *always* marked as ham, no matter how many times I tell it it's spam. I suppose it's possible if a mail contains enough "ham" words, but it's still strange.
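The "enough ham words" theory can be illustrated with a toy calculation. This is only a rough sketch: it combines per-token spam probabilities the naive way, which is not how spambayes really scores mail (it uses a chi-squared combining scheme), and the token probabilities and names below are made up for illustration.

```python
import math

def combined_spam_probability(token_probs):
    """Naively combine per-token spam probabilities into one score."""
    # Work in log space so long messages do not underflow to zero.
    log_spam = sum(math.log(p) for p in token_probs)
    log_ham = sum(math.log(1.0 - p) for p in token_probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical clues: two strong spam tokens, ten mildly hammy tokens.
spam_clues = [0.99, 0.95]
ham_clues = [0.2] * 10

spam_only = combined_spam_probability(spam_clues)
mixed = combined_spam_probability(spam_clues + ham_clues)
```

In this toy model, the two spam clues alone give a score close to 1, but adding ten mildly hammy tokens drags the combined score far below 0.5. A newsletter with lots of ordinary text could plausibly slip through in a similar way.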

Do others experience the same kind of behavior? Maybe my repository is somehow messed up, and I should start with a blank slate. Or maybe it could benefit from a few simple filtering rules.

Posted by Hans Nowak on 2004-02-13 23:14:33
Categories: Python

The Butterfly effect

A strange movie. (Short description here) Impressive, but I found it very disturbing in several ways.

First of all, I'm not exactly a fan of gratuitous violence, and this movie has lots of it. It has lots of everything that makes a movie R-rated... multiple beatings, sex, drugs, animal abuse, pedophilia, and more. I don't really care to see all that stuff.

Second, it paints a very ugly picture of humanity. The world is populated by dumb kids, nihilistic stoners, bullying frat boys, ultra-violent misfits, pedophiles, hardcore criminals, etc. I saw many more unsympathetic characters than sympathetic ones.

On top of violence and ugly stuff comes the actual plot. At first it's a bit like a horror movie... you move on to the next part and wonder what horrible thing is going to happen now. Later on, the idea of being trapped in an alternate world (or maybe in your own mind) seemed oddly familiar, and discomforting.

A nice touch is that events that seem inexplicable in the early movie, like Evan's blackouts and what he does during those, become clear later on. In an early scene, you see Evan (as a kid) standing with a knife, and at that point you might think he will be some kind of homicidal maniac, or maybe that he's possessed. But it's nothing like that: a future version of him was simply trying to destroy a piece of dynamite that was used, or going to be used, in a prank gone bad.

I won't be watching a movie like this again anytime soon, though. Along Came Polly is starting to look better and better... ;-)

Posted by Hans Nowak on 2004-02-13 00:16:31
Categories: general

ItemList

A forthcoming Firedrop content type will be ItemList. This can be used to make pages that contain, well, a list of items, like a FAQ. I intend to use it for a section with notes on various topics.

A new Firedrop release, containing this new type, will hopefully be available soon.

Posted by Hans Nowak on 2004-02-12 23:13:56
Categories: Firedrop

5

Hmm, PyZine issue 5...

[Update #1] This is more of a sneak preview. The issue isn't out yet, but you can take a look at what will be in it.

[Update #2] Mark Pratt writes: "3 articles are available now and the remaining articles are released roughly 2 weeks apart. We publish the TOC so that people know what to expect. This is what we mean by "throughout the quarter". In addition that subscribers get access to all back Issues that beehive published including the articles from Issue 4. [...] Since we are no longer in print we are using the flexibility of publishing online to deliver the same product but with more time to spend with each author on Editing."

Posted by Hans Nowak on 2004-02-11 15:29:54
Categories: Python

The disorder of a new generation

(via Mark Pilgrim's b-links feed)

Well, it seems I finally got my own disorder now... NADD

Let's see. At this moment, I have 14 tasks in my taskbar, not counting the 11 icons in my "tray". 3M Post-It Notes, a folder, WinCVS, Pegasus Mail, Mozilla Firebird, a 4NT console, Excel, SQL Server Enterprise Manager, SQL Server Query Analyzer, a vim session, two Yahoo Messenger windows, and two Firedrop windows.

Granted, there is some overlap here: Yahoo Messenger also takes up a tray icon, and some icons just display my Internet connection. Still...

Also, in my browser, there are currently 9 tabs. (Not that much, sometimes there are 20 or so. :-)

But I think what really qualifies me is that I stopped halfway through the article to write this blog post. :-)

I think that having TV, radio, etc, on at the same time, is more of an AD(H)D thing, though. Myself, I cannot stand annoying external stimuli like noise, movements, etc, when I'm trying to concentrate. And in real life, I don't feel like I have to do something different every 2 seconds.

But when working with computers, everything changes. I happily jump from task to task, opening windows and browser tabs in the process, making notes, checking mail and web site updates, etc. 1) I can easily handle vast amounts of information, and am (usually) not fazed by new mail every 5 minutes, juggling 20 open windows, etc. As long as the arrival of new information is not intrusive. (I absolutely loathe Yahoo Messenger for this reason.)

I consider this normal, but "normal" people apparently don't. What can I say... I have many trains of thought going on, and I hate to lose them.

I don't really understand why a quality like this is called a disorder, but that's a discussion for another day.

1) If you have a sudden thought, idea or inspiration, then it's really easy nowadays to look things up on the Net, in a few seconds. In past ages, this was not so easy... the information simply wasn't there, or was harder to obtain (books). This may have contributed to the emergence of this "disorder".

Posted by Hans "maybe Information Deficit would be a better name?" Nowak on 2004-02-10 10:47:44
Categories: general

Naming worms and virii

The following quotes are excerpts from a discussion on a well-known forum.

"Virus-writers don't get to name their viruses, the anti-virus companies do that."

"I'm sure if the file you sent out was called "thisvirusisnamedJim.vbs", it would be called Jim."

"Tell that to the author of Nimda, the first major worm to spread multiple ways. He clearly named his worm "Concept Virus(CV) V.5, Copyright(C)2001 R.P.China" in a string in the binary, but the antivirus people called it "Nimda" anyway [wired.com]. Nimda 0.6 contained the string "Concept Virus(CV) V.6, Copyright(C)2001, (This's CV, No Nimda)" but it was still called Nimda."

"The anti-virus companies would call it anything but Jim. Virus writers used to be in it for the "fame" (old school ones, before spammers took over and started writing viruses for their own purposes). The last thing anti-virus companies want to do is to give them that on a plate, so they deliberately pick other names for the virus when the author has indicated a name themselves."

That's kind of dumb.

The author (or publisher) of a piece of software gets to name it. That in this case the software happens to be harmful, and/or the author despicable, is irrelevant. It's still software.

Everybody hates SCO these days, but nobody calls SCO Unix by a different name out of spite. (Well, except on Slashdot maybe :-)

This behavior of anti-virus companies is extra dubious since they are the ones making money off the viruses. The last thing they want to do is to give the virus writers fame... fair enough, except that the anti-virus companies would not exist if nobody wrote viruses. (Which would be a better world, I agree. Just so there are no misunderstandings: I do not condone the writing of harmful worms in any way.)

Posted by Hans Nowak on 2004-02-10 10:09:01
Categories: internet

Changing the world

(via Ted Leung) Clay Shirky: Exiting Deanspace. I'm not really interested in the article itself, since it's about American politics. My ideas don't really match the American political spectrum, and besides, I'm not allowed to vote. :-) No, what stands out is a quote, or rather a nested quote:

Margaret Mead once said "Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has." Generations of zealots have tacked these words up on various walls, never noticing that the two systems that run the modern world – markets and democracies — are working right precisely when they defeat these attempted hijackings by small groups.

It's funny -- I think of the Margaret Mead quote as something *positive*. I consider it good and desirable that small groups of committed people, or even individuals, can still change the world. Indeed, who else would?

Maybe in this post-9/11 era, people immediately associate quotes like these with terrorists, fringe groups, sects, and whatnot. Indeed, the author talks about "zealots" and "hijacking". Thoughtful, committed people are not automatically zealots. In fact, I don't consider zealots very thoughtful at all. It's true that with this quote, you can go both ways... people can change the world to make it better or worse.

Fortunately, there are plenty of examples where "thoughtful, committed people" changed the world for the better, and we don't even have to go far back in history. To stay in the programming world: Linus Torvalds would be one of those people. Richard Stallman would be another. It's not so difficult to name people from other areas, like politics, but I'll leave that as an exercise for the reader.

"Generations of zealots have tacked these words up on various walls, never noticing that the two systems that run the modern world – markets and democracies — are working right precisely when they defeat these attempted hijackings by small groups." Indeed, democracy is the dictatorship of the majority. Markets, democracies and similar systems drown out the voice of individuals and smaller groups. This can be a good thing or a bad thing, depending on the situation and on how you look at it. But let's not pretend it's solely something good.

Posted by Hans Nowak on 2004-02-09 10:23:31
Categories: general

Justified and ancient

As seen on Slashdot... this quote in a thread about Latin:


I also feel there may, in some sense, be an added benefit, which manifests in a variety of ways, some obvious and some far more subtle, to be gained from the study of a language, even a language which is no longer current, vernacular or in any sense idiomatic, from which not only are a great many of the present day languages of Europe clear derivatives, but which was also the nearest thing to a universal language for many centuries, in which it would be, were that language to be more widely used today, considered entirely reasonable to construct sentences of great structural complexity, far beyond that displayed in current English, containing a range of subsidiary clauses, embedded phrases, hypothetical diversions and clearly structured formations such as the dreaded Ablative Absolute, with the consequent benefit of a remarkable precision in the expression of far more complex constructs in a single structural unit than might be possible in a language tending towards a shorter, more atomic, style of construction.

This sentence is probably funny to anyone who had Latin in school, because this is exactly what translations looked like... really long run-on sentences. Sentences in Latin texts often get really long (although they don't look that long, since Latin tends to be shorter than modern European languages), and produce translations like the text above. What was worse is that we pretty much had to learn those translations by heart for the exams. emoticon:bonk

Translated Greek tended to be somewhat shorter, if I recall correctly. It's been a while...

Back then, it didn't seem very useful to learn Latin and ancient Greek, since they are dead languages. Looking back, I think the usefulness is not in using the languages themselves, but in the "side effects" of learning them.

First of all, after having studied those grammars, modern languages like French, English, Spanish and German are a piece of cake. (OK, maybe not German... :-) I took a year of Spanish in college; it was really simple after 5 years of Latin. Not only because many words look alike, but also because many things are just simpler in Spanish. For example, adjectives in Spanish can take 4 possible forms (masculine singular, feminine singular, masculine plural, feminine plural). In Latin, an adjective can potentially take 36 different forms (3 genders * 6 cases * 2 numbers), with a different set of endings for each of the 3 declensions. 1) Even taking into account that some forms look alike, there are still quite a few variants.

Second, translating Greek or Latin is not easy. It forces you to carefully read the sentence (which is probably long and likely has a number of sub-sentences), look up unfamiliar words, grok the specific meaning of those words (often based on word ending), analyze what the overall sentence means, and come up with an accurate translation in Dutch (English, etc). All this while taking into account other factors like dialects, people's peculiar usage of words, expressions, styles... People unfamiliar with this might laugh at the fact that during a Latin class, we often only completed a handful of sentences... but that illustrates how difficult it is.

This seemingly pointless exercise (taking great pains to translate 2000+ year old texts in a dead language) teaches you to *think* differently. It cultivates an analytical and carefully reasoning mind, which is useful in lots of other areas. I don't think it's bad for a programmer either. :-)

A good introductory text is Wheelock's Latin.

1) Note that I am not familiar with the English words for some of these terms, so I may have them wrong... in Dutch, it's declinatie, geslacht, naamval. And yes, there are technically more than 3 declensions, but let's keep it simple, eh? :-)

Posted by Hans Nowak on 2004-02-08 22:55:13
Categories: general

...and integrating everything

This is a different issue altogether (see previous post), but: would it be a good idea to integrate Sextile and the macro system?

Sextile by itself is kind of incomplete... for example, it has no way to express URLs. Nor does it have special syntax for things like acronyms, paragraphs in certain styles, etc. For this, we use macros.

The macro system does not really depend on Sextile... the macros (should) work just as well in other formats, including Textile, pure HTML, and any other format (as long as the macro syntax doesn't collide with the format's).

Maybe the macro system should be a separate package (call it FooMacros for now, for lack of a better title). Then:

  • Sextile uses FooMacros (and probably has the FooMacros files included in its distro), and possibly comes with its own collection of useful macros (for URLs, acronyms, tables, etc)
  • Firedrop uses Sextile (which includes FooMacros)
  • Firedrop also uses FooMacros "separately"

Hm. Doesn't seem like a great solution, but it's doable.

Posted by Hans Nowak on 2004-02-08 22:38:17
Categories: Firedrop

Improving Sextile and Firedrop macros

Warning: incoherent rambling ahead...

Firedrop is becoming more and more user-friendly. It grew a few simple but necessary features, like dialog boxes for confirmation, creating a "site", editing a post's categories, etc. It still lacks solid documentation, but that is getting better as well.

Now that that is out of the way, I have to take a critical look at Firedrop's macro system, and Sextile. Sextile started out as a lightweight, minimalistic alternative (some would say a "parody" :-) to Textile. It is useful and relatively clean, but it has some serious limitations. For example, this MT-Textile page is full of things Sextile cannot do.

That is where macros come in. Most of the items on the MT-Textile page mentioned can be done by adding the appropriate macro. #1 and #2, for example. #3 already exists. Most of the others can be implemented with macros as well.

Alas, macros have their limitations too. One of the most glaring problems is that you cannot use a semicolon other than to separate arguments. You see, a macro like {foo;42;hello} really translates to a Python function call foo(42, "hello"). The semicolon is used as a separator, and currently there's no way to get around that, which is especially annoying if you want to use entities like &amp; inside a macro.
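To make the translation concrete, here is a minimal sketch of such an expander. It is not Firedrop's actual code: the names expand_macros and convert, the dictionary of macros, and the rule that numeric-looking arguments become integers are all assumptions made for illustration.

```python
import re

def convert(arg):
    """Turn a macro argument into an int when possible, else keep the string."""
    try:
        return int(arg)
    except ValueError:
        return arg

def expand_macros(text, macros):
    """Replace each {name;arg;...} with the result of macros[name](*args)."""
    def call(match):
        name, *args = match.group(1).split(";")
        if name not in macros:
            return match.group(0)  # leave unknown macros untouched
        return str(macros[name](*[convert(a) for a in args]))
    # Non-greedy match: everything from "{" up to the next "}".
    return re.sub(r"\{(.*?)\}", call, text)

def foo(number, word):
    """Hypothetical macro: receives 42 as an int and "hello" as a string."""
    return f"{word} x {number}"

result = expand_macros("Say {foo;42;hello}!", {"foo": foo})
```

The semicolon problem shows up immediately with a splitter this simple: the body of a macro ending in an HTML entity gets one argument too many, because the entity's own semicolon is treated as a separator.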

Quoting it might be a solution; writing \; is not pretty, but it might do the trick. However, the current order of evaluation is:

  • Textile/Sextile -> HTML
  • expand embedded code
  • expand macros

so in order to make this work, Sextile must ignore the \; for the macros to see it (it doesn't, currently).
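On the macro side, honoring \; could look roughly like the sketch below. The function name split_args is hypothetical, and the sketch assumes the backslash has survived the Sextile pass, which (as noted) it currently doesn't.

```python
import re

def split_args(body):
    """Split a macro body on semicolons that are not preceded by a backslash."""
    parts = re.split(r"(?<!\\);", body)
    # Once the split is done, drop the escaping backslash again.
    return [part.replace("\\;", ";") for part in parts]

args = split_args(r"foo;42;AT&amp\;T")
```

Here the escaped semicolon stays inside the last argument, so the entity arrives intact. A literal backslash right before a real separator is not handled; that would need a second escape rule.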

Another problem is that macros cannot be nested. This is mostly due to the simplistic "parser", which just looks for the next }. Something like {foo;{bar;42}} might make sense in some situations, but it's not allowed.
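One possible way to support nesting, sketched under the assumption that macros are plain Python functions taking string arguments: expand only the innermost macros (those without braces inside), and repeat until nothing changes. The names expand_nested, bar and foo are made up for the example.

```python
import re

# Matches a macro whose body contains no braces, i.e. an innermost one.
INNERMOST = re.compile(r"\{([^{}]*)\}")

def expand_nested(text, macros):
    """Expand innermost macros first, repeating until the text is stable."""
    def call(match):
        name, *args = match.group(1).split(";")
        if name not in macros:
            return match.group(0)  # unknown macro: leave as-is
        return str(macros[name](*args))
    while True:
        expanded = INNERMOST.sub(call, text)
        if expanded == text:
            return expanded
        text = expanded

macros = {
    "bar": lambda n: str(int(n) * 2),   # hypothetical: doubles a number
    "foo": lambda s: f"<b>{s}</b>",     # hypothetical: wraps text in bold
}
nested = expand_nested("{foo;{bar;21}}", macros)
```

The first pass turns the inner {bar;21} into 42, the second pass expands {foo;42}. A macro whose output contains braces could confuse this loop, so a real implementation would need some guard against that.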

Sometimes it's useful to span a whole block of text with a macro. For example, for syntax highlighting, or to create a table structure. I don't really think that nested macros are adequate for this. On the other hand, I don't want to make Sextile too convoluted, since that defeats its purpose. If you have to learn lots of special syntax, then you might as well write straight HTML or XML.

Eventually, I want the Firedrop & Sextile combo to be good enough to write documentation in it, or even a book. To do so, it must be capable of producing more sophisticated output than it does now. HTML tables, paragraphs using certain styles, etc. I wonder if I should try to "fix" macros and the Sextile format for this purpose, or look at something else entirely. For example, the Documentation content type could accept DocBook XML; it could generate a valid XML document simply by glueing the entries together, and generate HTML pages for online viewing. It's not really easier to type, though.

Originally, when Firedrop was still on the drawing board, it was going to have its own flavor of HTML tags, called FHTML. With it, you could write things like

Some <f-upper>Python</f-upper> code here:
<f-colorize>
x = f(42) + 1
</f-colorize>

Eventually, I rejected the idea, because a markup language isn't really supposed to replace or transform text, and because there would be too many levels of text manipulation (several types of macros, FHTML, text formats...). However, custom tags would certainly be useful.

Maybe macros can be expanded to do something similar? E.g.

Some {+upper}Python{-upper} code here:
{+colorize;python}
x = f(42) + 1
{-colorize}

or:

{+table;2;4}
{+row;1;1}Text in cell (1,1){-row}
{+row;1;2}Text in cell (1,2){-row}
{-table}

(table will be a Python object, but how it will be implemented is yet to be seen... function, class?)

Heavy use of this doesn't make text any prettier, but it would be very powerful, and has lots of interesting possibilities.
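As a rough sketch of how the {+name}...{-name} idea might work, assuming each block macro is a Python function that receives the enclosed text followed by its arguments (the names expand_blocks and handlers, and the two example handlers, are hypothetical):

```python
import re

# {+name;arg;...} body {-name}; the \1 backreference pairs open and close tags.
BLOCK = re.compile(r"\{\+(\w+)((?:;[^{};]*)*)\}(.*?)\{-\1\}", re.DOTALL)

def expand_blocks(text, handlers):
    """Call handlers[name](body, *args) for every block macro in the text."""
    def call(match):
        name, raw_args, body = match.groups()
        if name not in handlers:
            return match.group(0)  # unknown block: leave untouched
        args = raw_args.split(";")[1:] if raw_args else []
        return handlers[name](body, *args)
    return BLOCK.sub(call, text)

handlers = {
    "upper": lambda body: body.upper(),
    # Stand-in for real syntax highlighting: just wrap the code in a <pre>.
    "colorize": lambda body, lang: f'<pre class="{lang}">{body}</pre>',
}

line = expand_blocks("Some {+upper}Python{-upper} code here:", handlers)
code = expand_blocks("{+colorize;python}\nx = f(42) + 1\n{-colorize}", handlers)
```

Blocks inside blocks, like the rows in the table example, are not expanded by this sketch; the outer handler would have to call expand_blocks on its body itself.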

Posted by Hans Nowak on 2004-02-08 21:46:12
Categories: Firedrop

Nederhop

Hip-hop music has been around for several decades now. Although it originated in the US, there are hip-hop bands in just about every country in the world today, often rapping in their native language rather than English. The Netherlands is no exception; nederhop records have topped the charts, and new crews are emerging everywhere, even rapping in fairly obscure dialects.

It wasn't always like this, though. In countries like Germany and France, people started rapping in their own language fairly early. While Die Fantastischen Vier, MC Solaar and IAM had success with hip-hop in German and French, there was a more snobbish attitude in the Netherlands. Rap music had to be in English and nothing else.

So when a band from Amsterdam, the Osdorp Posse, started creating hip-hop in the Dutch language (and came up with the word "nederhop"), they were often met with ridicule. Their first albums were well received in the "underground", but didn't have mainstream pop appeal. In spite of this, they became more and more popular at festivals, and caused a number of other underground rappers to start using the Dutch language rather than English.

Because of this, Dutch hip-hop became better known, and eventually other rappers and producers saw the commercial possibilities, even though they had been rejecting the concept for years. A rapper called Extince can be credited with the first nederhop chart success, much to the dismay of the Osdorp Posse, who called him "the Dutch Vanilla Ice" (and a lot of other things ;-) because he switched to Dutch rap purely for commercial reasons.

The first chart success spawned a cascade of other artists trying their hand at rapping in the Dutch language. Slowly, this kind of music gained acceptance. What was once a curiosity eventually became big business. (Well, "big"... we're talking the Netherlands... but you get the idea. :-)

These days, there are lots of hip-hop bands rapping in Dutch. Some names: Brainpower, Opgezwolle, Haagse Mark, White Wolf, Ouderkerk Kaffers, De Uitverkorenen, Z-Bomb Unit, Spookrijders, Yukkie B, ABN (Belgian), Onderhonden... Heck, there are even crews that rap in the dialects of Limburg (which sounds a bit silly, IMHO :-), like the OZL-Crew and the Pikkatrillaz. Even if you don't understand Dutch, they might be worth listening to.

What must be grating to the Osdorp Posse (OP) is that the current "hip-hop scene" gives them little respect. Nederhop is mainstream now, there are lots of bands trying to make a buck, and reviewers are making money writing about it. The OP almost single-handedly built this scene; without them, all these groups most likely would not exist. Yet some artists, reviewers and listeners claim that they don't make "real hip-hop" (whatever that means), and don't consider them part of the "scene". Some rappers even claim that *they* were the first to rap in Dutch, rather than the OP. The question remains why we never heard of those rappers before, so this last one is pretty much a non-issue.

Some of the sites above have legal downloads. Start here, for example. Or here...

On a side note: the Osdorp Posse's label, Ramp Records, has some interesting ways to battle diminishing CD sales. For example, they lowered their CD prices by ~50%. (Their CDs are now €9-10; usual prices are around €20.) Also, they have special offers, like a CD that can only be obtained with a ticket for one of their concerts.

Posted by Hans Nowak on 2004-02-07 20:16:30
Categories: music, Nederland

--
Generated by Firedrop2.