aquiline ascension

published: 2010-10-16
author: Hans Nowak

My Magic projects (part 2)

(See part 1 for the background story of these projects.)

There are a few Magic-related projects that I am working on at the moment.

Magicripper2 is a kind of web crawler that fetches card info from Gatherer. It's not 100% done yet, but it does a good job in general; there are just some corner cases left to deal with (split cards, flip cards, alternate art), mostly due to inconsistencies in the way Gatherer handles its IDs.

It consists of three main parts:

1. Get the "multiverse" IDs of the cards in a given set (and store them in a simple text file).
2. Get the HTML of the pages showing these cards. There are two versions, one with the card data as it originally came out, and one with updated rules. (Compare, for example, the printed version and the updated version of the 4th Edition card Zephyr Falcon.)
3. Extract card data from the stored HTML and store it in XML format.

(There is also a script that downloads card images, but this is optional.)
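
To give an idea of what step 3 produces, here is a minimal sketch of writing card data to XML with Python's standard xml.etree.ElementTree module. The element and attribute names (set, card, id, name, type, rules), the function names and the sample values are all made up for illustration; the real files produced by Magicripper2 may well be laid out differently.

```python
import xml.etree.ElementTree as ET


def card_to_element(card):
    """Turn a dict of card fields into a <card> element.

    Element and attribute names are assumptions for this sketch,
    not the actual MTGXML layout.
    """
    elem = ET.Element("card", id=str(card["multiverse_id"]))
    for field in ("name", "type", "rules"):
        child = ET.SubElement(elem, field)
        child.text = card.get(field, "")
    return elem


def cards_to_xml(set_name, cards):
    """Serialize all cards of one set to an XML string."""
    root = ET.Element("set", name=set_name)
    for card in cards:
        root.append(card_to_element(card))
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not real Gatherer output.
sample = [
    {
        "multiverse_id": 0,
        "name": "Zephyr Falcon",
        "type": "(type line goes here)",
        "rules": "(rules text goes here)",
    }
]
xml_text = cards_to_xml("4th Edition", sample)
print(xml_text)
```

Since there are two versions of every card page (printed and updated), a real file would presumably need a way to tell them apart, for example a separate file per version.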

Fetching all these cards can take quite some time. The good news is that if you just want the XML, you don't have to use Magicripper2 at all. Instead, you can download it from the MTGXML project. This is just a tarball containing the XML files produced by the latest version of Magicripper2.

Among other things, the XML can be used to create a searchable card database, as described in the previous post. I am doing that right now, with the magicquery project. Like the two previously mentioned projects, it's written in Python, and uses Python as its query language. It's more for personal use right now, and very much a work in progress. (Note that making such a database is trickier than it seems: Magic often introduces new rules and mechanisms, or breaks existing ones, so there are many special cases to deal with.)
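
To illustrate what "Python as the query language" could look like, here is a rough sketch that loads cards from XML and filters them with a Python expression. This is not magicquery's actual code; the XML layout, the field names (name, type, cmc), the sample cards and the functions load_cards and query are all assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; the real MTGXML files may use a different layout.
SAMPLE_XML = """
<set name="Example Set">
  <card><name>Card A</name><type>Creature</type><cmc>2</cmc></card>
  <card><name>Card B</name><type>Instant</type><cmc>1</cmc></card>
  <card><name>Card C</name><type>Creature</type><cmc>5</cmc></card>
</set>
"""


def load_cards(xml_text):
    """Read every <card> element into a plain dict."""
    cards = []
    for elem in ET.fromstring(xml_text.strip()).iter("card"):
        card = {child.tag: child.text for child in elem}
        card["cmc"] = int(card["cmc"])
        cards.append(card)
    return cards


def query(cards, expression):
    """Return the cards for which the Python expression is true.

    Each card's fields are visible as variable names inside the
    expression. Emptying __builtins__ is NOT a real sandbox, so this
    is only reasonable for queries you type yourself.
    """
    code = compile(expression, "<query>", "eval")
    return [c for c in cards if eval(code, {"__builtins__": {}}, dict(c))]


cards = load_cards(SAMPLE_XML)
cheap_creatures = query(cards, "type == 'Creature' and cmc <= 3")
print([c["name"] for c in cheap_creatures])  # -> ['Card A']
```

The appeal of this approach is that there is no separate query syntax to design or parse; anything Python can express about a card's fields is a valid query.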

Although I still have to add a lot of stuff, magicquery is already capable of doing things that Gatherer can't, like reliably finding cards with hybrid mana.

One thing that no online card database seems to have is the ability to search for keywords: for example, finding all cards with flying. Sure enough, looking for "flying" in the rules text will find them, but it will also find all cards that grant flying, target fliers, or mention the word "flying" for other reasons. To rule out the false positives, the database would need information about the keywords themselves. In theory, this isn't so difficult; just add attributes to the card itself ("flying", "kicker:R", etc). In practice, nobody has done this, possibly because it requires going over all the cards and painstakingly recording the keywords.
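
The difference between the two kinds of search is easy to show in a few lines of Python. The three cards below are invented for the example (they are not real Magic cards), and the dict layout with a "keywords" set is just one possible way to store the extra information.

```python
# Invented cards, for illustration only.
cards = [
    {"name": "Plain Flier",
     "rules": "Flying",
     "keywords": {"flying"}},
    {"name": "Wing Granter",
     "rules": "Target creature gains flying until end of turn.",
     "keywords": set()},
    {"name": "Flier Hunter",
     "rules": "Destroy target creature with flying.",
     "keywords": set()},
]


def text_search(cards, word):
    """Naive search: match the word anywhere in the rules text."""
    return [c["name"] for c in cards if word.lower() in c["rules"].lower()]


def keyword_search(cards, keyword):
    """Search only the keywords recorded for the card itself."""
    return [c["name"] for c in cards if keyword in c["keywords"]]


print(text_search(cards, "flying"))
# -> ['Plain Flier', 'Wing Granter', 'Flier Hunter']
print(keyword_search(cards, "flying"))
# -> ['Plain Flier']
```

The text search returns two false positives (a card that grants flying and one that targets fliers), while the keyword search only returns the card that actually has it.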

So, I decided to start working on this very thing. It's a lot of work, but it's also rewarding to see the queries become more powerful because of it. My goal is to add one set a day, and since I just started, and MTGXML currently has 74 sets, it will take at least two months to add them all.

If you would like to help out, drop me a note. It's not difficult; it just requires editing a text file with data like this:

Battle Rampart | defender >haste
Beastbreaker of Bala Ged | level-up:2G ?trample
Boar Umbra | totem-armor
Bramblesnap | trample
Brimstone Mage | level-up:3R

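Reading such a file back in takes only a few lines of Python. The sketch below assumes one card per line, the card name before the "|", and whitespace-separated entries after it, with an optional ":" introducing an argument (as in "level-up:2G"). A leading ">" or "?" is kept as an uninterpreted marker, since what those prefixes mean is not spelled out here. The names Keyword, parse_line and parse_lines are made up for this example.

```python
from collections import namedtuple

# marker: "", ">" or "?" (kept as-is, meaning not interpreted here)
# name:   the keyword itself, e.g. "trample" or "level-up"
# arg:    whatever follows the ":", or None if there is no argument
Keyword = namedtuple("Keyword", "marker name arg")

MARKERS = (">", "?")


def parse_line(line):
    """Split 'Card Name | kw1 kw2:arg' into (name, [Keyword, ...])."""
    card_name, _, rest = line.partition("|")
    entries = []
    for token in rest.split():
        marker = token[0] if token[0] in MARKERS else ""
        body = token[len(marker):]
        name, sep, arg = body.partition(":")
        entries.append(Keyword(marker, name, arg if sep else None))
    return card_name.strip(), entries


def parse_lines(lines):
    """Parse many lines into a dict, skipping blank ones."""
    return dict(parse_line(line) for line in lines if line.strip())


data = parse_lines([
    "Battle Rampart | defender >haste",
    "Beastbreaker of Bala Ged | level-up:2G ?trample",
    "Boar Umbra | totem-armor",
])
print(data["Beastbreaker of Bala Ged"])
```

Keeping the marker, name and argument as separate fields means a query can ask for "all cards with level-up" without caring about the cost that follows the colon.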

I haven't made these files available yet (there are only a few of them right now anyway), but eventually I will. Note that all kinds of metadata can be added to the cards, not just Magic-legal keywords. One could add pseudo-keywords like "#firebreathing", or descriptions like "#cantrip", "#lifegain", "#direct-damage", etc. Ravnica block cards could have their guild listed, and Scars of Mirrodin cards could list their affiliation as well.

So, anyway, hopefully before the end of the year, magicquery will have keyword data for all major sets.

One more thing: the XML files generated by Magicripper2 and distributed by MTGXML are of course language-agnostic. You could use any programming language with a decent XML library to process them. So anybody who is so inclined could write a searchable card database in their favorite language, or use the files to make a more conventional card database with a GUI or something... or use them for something else entirely.
