1. Why don't you use ZCatalog for indexing? AFAIK, it can be used outside Zope.

    http://www.zope.org/Members/kelcmab3/catalog_out_of_zope
      posted by Roberto Lupi at 01:01:23 AM on March 29, 2004  
  2. FWIW, the way Zope deals with large files is to break them into smaller, independently pickleable pieces. See Zope's lib/python/OFS/Image.py "Pdata" class and its usage within the same module for an idea of how to do this. Storing large documents as single huge pickles is not advisable.
      posted by Chris McDonough at 01:20:58 AM on March 29, 2004  
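
A minimal sketch of the chunking idea described in the comment above, using only the standard library. This is not Zope's Pdata implementation: the function names and the 64 KiB piece size are assumptions made for illustration, and a plain list stands in for wherever the pieces would actually be stored.

```python
import pickle

# Hypothetical piece size; pick whatever suits the storage in use.
CHUNK_SIZE = 64 * 1024


def split_into_pickles(data, size=CHUNK_SIZE):
    """Return a list of pickles, each holding at most `size` bytes of data."""
    return [pickle.dumps(data[i:i + size]) for i in range(0, len(data), size)]


def join_pickles(pieces):
    """Rebuild the original byte string from the separately pickled pieces."""
    return b"".join(pickle.loads(piece) for piece in pieces)
```

Because each piece is pickled on its own, a reader can load one piece at a time instead of unpickling the whole document at once.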
  3. Re ZCatalog: This has been suggested before, and I will definitely look into this, but for now I'm pretty content with my simplistic indexing system.

    Re smaller independently pickleable pieces: I thought of this too, then I went a step further and decided to use regular files instead. The current version of this program (working title is Xi, by the way) does not store the files anymore, it just uses them, so they're essentially out of the Zope database. This has some drawbacks (see the post), but it has some important benefits too.
      posted by Hans at 09:23:42 AM on March 29, 2004  
  4. You could also store the files inside a zip or tar archive. The indexing information would remain inside ZODB, but the files would be archived. This would reduce disk usage and help keep users from seeing the obvious file locations. Granted, they can still open the archive and edit or delete files, but it does hide them a little...
      posted by Bryan Muir at 04:16:40 AM on March 31, 2004  
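
The archive suggestion above could look roughly like this with Python's standard `zipfile` module. It is only a sketch: the helper names are hypothetical, and the archive path and member names are whatever the application chooses to record in its index.

```python
import zipfile


def add_document(archive_path, name, data):
    """Append one document (bytes) to the archive, creating it if needed."""
    with zipfile.ZipFile(archive_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)


def read_document(archive_path, name):
    """Return the bytes stored under `name` in the archive."""
    with zipfile.ZipFile(archive_path, "r") as zf:
        return zf.read(name)


def list_documents(archive_path):
    """Return the member names in the order they were added."""
    with zipfile.ZipFile(archive_path, "r") as zf:
        return zf.namelist()
```

The index would then store the archive path plus the member name instead of a plain file path.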
  5. I have a similar system running and use
    1. a database for metadata
    2. swish-e for indexing
    3. the filesystem for the actual files
    I'm quite happy with that combination.
    If you haven't done it yet, have a look at swish-e (www.swish-e.org).
      posted by Stephan Diehl at 05:12:34 AM on March 31, 2004  
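
A rough sketch of the "database for metadata, filesystem for files" split described above. The comment does not say which database is used, so SQLite (via Python's standard `sqlite3` module) is an assumption here, as are the table layout and function names. Full-text indexing with swish-e is left out entirely.

```python
import sqlite3


def open_metadata_db(path=":memory:"):
    """Open (or create) the metadata database; the schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "path TEXT PRIMARY KEY, title TEXT, size INTEGER)"
    )
    return conn


def register(conn, path, title, size):
    """Record metadata for a file that itself stays on the filesystem."""
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)", (path, title, size)
    )
    conn.commit()


def find_by_title(conn, fragment):
    """Return the paths of documents whose title contains `fragment`."""
    rows = conn.execute(
        "SELECT path FROM documents WHERE title LIKE ? ORDER BY path",
        ("%" + fragment + "%",),
    )
    return [row[0] for row in rows]
```

Only the path is kept in the database, so the files can be opened, backed up, or indexed by an external tool without going through the application.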
  6. Have you seen DocIndexer? http://www.methods.co.nz/docindexer/
    He's using Lupy (a Lucene-compatible full-text indexer) to index files in a directory.
      posted by Brian Dorsey at 10:04:51 AM on March 31, 2004  
  7. You need something like reiserfs v3 or v4, wrapped so it can be used as a ZODB store. Reiserfs3 has been used as the storage mechanism for some database product, so it can be done.
      posted by an anonymous coward at 08:53:54 AM on April 01, 2004