If you use a version control system (VCS) such as Subversion, you may know about delta compression, which lets the repository store a new version efficiently by storing only the differences (deltas) from the previous version. But if you keep ODF (e.g. OpenOffice) documents in version control, you may have heard that you miss out on the benefits of delta compression because ODF documents are compressed. A minor change in a document can cause large-scale changes in the compressed file, so the VCS can't work its delta compression magic.
An ODF file is actually just a zip file of a few files that represent the document, including its main contents in an XML format.
Because it is a zip file, it should in theory be possible to store its contents with zero compression, which the zip file format allows. Then you could leave it to the VCS to do its own compression and delta compression as best it can. Each individual version of your document would be larger, but hopefully, over the course of many revisions, you would save space in your repository. That's the theory anyway.
But OpenOffice.org doesn't have an option to save with zero compression. You get whatever it gives you. So, I've made a small utility in Python to re-save an ODF file with zero compression. OpenOffice.org can still open the file just fine. You can run this script on the file each time just before you commit your changes to your VCS.
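The core of the idea can be sketched in a few lines with Python's standard `zipfile` module. This is only a minimal illustration, not the hosted utility itself: the function name `resave_uncompressed` is made up for this example, and it assumes an unencrypted file small enough to read into memory. It keeps the entries in their original order, since ODF expects the `mimetype` entry to come first.

```python
import shutil
import zipfile


def resave_uncompressed(path):
    """Rewrite the ODF file at `path` with every zip entry stored uncompressed.

    Hypothetical helper for illustration; the original is kept as path + ".bak".
    """
    backup = path + ".bak"
    shutil.copy2(path, backup)  # keep the original file intact

    with zipfile.ZipFile(backup, "r") as src, \
            zipfile.ZipFile(path, "w", zipfile.ZIP_STORED) as dst:
        # infolist() preserves archive order, so "mimetype" stays first.
        for info in src.infolist():
            data = src.read(info.filename)
            # Reuse the name and timestamp, but force "stored" (no compression).
            stored = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            stored.compress_type = zipfile.ZIP_STORED
            stored.external_attr = info.external_attr
            dst.writestr(stored, data)
```

Calling `resave_uncompressed("report.odt")` (the file name is just an example) would leave `report.odt.bak` next to a rewritten `report.odt` whose XML is stored verbatim inside the archive.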
The utility has had minimal testing, so it should be regarded as experimental and treated with care. But it's here if you want it. It saves the original file intact with a ".bak" extension.
Note that I have not tested it with encrypted files. Encryption is also likely to cause large changes to a file even for small edits, so encrypted files should probably just be left alone.
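One way to leave encrypted files alone is to check for them before re-saving. The sketch below is a rough heuristic, not part of the utility: the name `looks_encrypted` is made up, and it assumes that an encrypted ODF package marks its entries with a `manifest:encryption-data` element in `META-INF/manifest.xml`, which it detects with a plain substring test rather than proper XML parsing.

```python
import zipfile


def looks_encrypted(path):
    """Guess whether the ODF file at `path` contains encrypted entries.

    Hypothetical helper: a substring check on the package manifest.
    """
    with zipfile.ZipFile(path) as odf:
        try:
            manifest = odf.read("META-INF/manifest.xml")
        except KeyError:
            # No manifest at all: nothing declares encryption.
            return False
    return b"encryption-data" in manifest
```

A wrapper script could call this first and skip any file for which it returns `True`.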
The script is hosted as a Mercurial repository on bitbucket.org.
Download from bitbucket.org: