OpenOffice, OpenDocument and Metadata
Bruce D'Arcus
Openness as Democratic Imperative
- Massachusetts announced that by 2007 all public documents will be in open formats
- codified a larger set of pressures towards open document formats
- institutions and individuals also have a great need for much better metadata integration
- at the OpenOffice Bibliographic Project, we are at the frontier of this metadata revolution
- OpenOffice originated as StarOffice, acquired by Sun in August 1999
- open sourced in July 2000
- intended to break the Microsoft monopoly by exploiting open source and XML
Background on OOoBib
- started by David Wilson in late 2002/early 2003 as an “incubator project”
- attracted interest from similarly frustrated users
- ... and some library IT people
- now an official project
Background on Me
- I’m a scholar working at the border of the social sciences and humanities
- cultural and political theory
- often historical archival work
- demanding metadata needs
- a user first, developer second
- frustrations with the tools that manage my most essential data
- author of darcusblog
Bibliographic Software Today
- poor data models
- buggy software
- little interoperability
What We Want
- applications that are feature-rich
- internet-enabled
- rich and flexible data models
- seamless interoperability
- choice (open solutions)
Getting There From Here
- Reuse where possible, invent where necessary
- Technology
OpenDocument and Citations/Bibliographies Now
- citation coding is very limited
- bibliographic metadata is embedded inline in the citations in the main content file
- leaves room for formatting inconsistency and redundancy, and allows very little flexibility
Fixing the Problem
- upgrade citation coding support so that citations are pointers to external metadata (done)
- move bibliographic metadata into a dedicated file in the wrapper, and enhance the metadata model (to do)
- enhance styling support, preferably in the file format (to do)
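The difference between inline metadata and citations-as-pointers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OOoBib implementation: the `Reference` fields, the record IDs, and the author-year rendering are all hypothetical. Each source is stored once; citations in the body carry only IDs, so every citation of a source is rendered from the same record and the bibliography is derived from what is actually cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """One bibliographic record, stored once (hypothetical fields)."""
    id: str
    author: str
    year: int
    title: str


@dataclass(frozen=True)
class Citation:
    """A citation in the body text: only pointers (IDs) into the store."""
    refs: tuple


# Hypothetical bibliographic store; in the proposal this would live in a
# dedicated file inside the document package rather than inline.
BIBLIOGRAPHY = {
    "smith2001": Reference("smith2001", "Smith", 2001, "A Book"),
    "jones1999": Reference("jones1999", "Jones", 1999, "An Article"),
}


def format_citation(cit, bib):
    """Render an author-year citation from the shared records."""
    return "(" + "; ".join(f"{bib[r].author} {bib[r].year}" for r in cit.refs) + ")"


def format_bibliography(citations, bib):
    """List each cited record once, sorted by author and year."""
    cited = {r for c in citations for r in c.refs}
    records = sorted((bib[r] for r in cited), key=lambda x: (x.author, x.year))
    return [f"{x.author}. {x.year}. {x.title}." for x in records]


body = [Citation(("smith2001",)), Citation(("jones1999", "smith2001"))]
for c in body:
    print(format_citation(c, BIBLIOGRAPHY))
for line in format_bibliography(body, BIBLIOGRAPHY):
    print(line)
```

Because formatting happens in one function over one record per source, correcting a record (say, its year) changes every citation of it at once, which is exactly what inline-embedded metadata cannot guarantee.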
OpenOffice Limitations
- bibliographic module is not well-modularized
- OOo provides no easy way to store custom data
- difficult to innovate
Proposed Solution: CiteProc, CSL, and OpenDocument
Metadata Picture
- it is impossible to achieve our goals without getting the low-level metadata right
- new possibilities with new standards (XML, RDF, etc.)
OpenDocument Metadata
- need for standardized and integrated, but flexible, metadata
Background on OpenDocument
- Openness as Path to Excellence; OpenDocument Charter
  - must retain high-level information suitable for editing the document
  - must be friendly to transformations using XSLT or similar XML-based languages or tools
  - should “borrow” from similar, existing standards wherever possible and permitted
- Technical outcomes
  - files are ZIP archives
  - XSL-FO properties for styling
  - SVG for vector graphics
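Because an OpenDocument file is a ZIP archive of XML parts, its document-level metadata can be read with ordinary tools. The sketch below uses only the Python standard library: it builds a toy text-document package in memory, then reads `dc:title` and `dc:creator` back out of `meta.xml`. The part contents are placeholders; a real package also carries parts such as `styles.xml` and `settings.xml`, and a full document body.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MIMETYPE = "application/vnd.oasis.opendocument.text"
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Placeholder parts; a real document has a full body, styles, settings, etc.
CONTENT_XML = ('<office:document-content xmlns:office="%s" office:version="1.0"/>'
               % NS["office"])
META_XML = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/" office:version="1.0">
  <office:meta>
    <dc:title>Draft Chapter</dc:title>
    <dc:creator>A. Scholar</dc:creator>
  </office:meta>
</office:document-meta>"""
MANIFEST_XML = """<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
  <manifest:file-entry manifest:media-type="%s" manifest:full-path="/"/>
  <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="content.xml"/>
  <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="meta.xml"/>
</manifest:manifest>""" % MIMETYPE


def build_package() -> bytes:
    """Write a toy ODF-style package into memory and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        # The "mimetype" entry comes first and is stored uncompressed.
        z.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        z.writestr("content.xml", CONTENT_XML)
        z.writestr("meta.xml", META_XML)
        z.writestr("META-INF/manifest.xml", MANIFEST_XML)
    return buf.getvalue()


def read_metadata(data: bytes) -> dict:
    """Return the package MIME type plus two Dublin Core fields from meta.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        mimetype = z.read("mimetype").decode("ascii")
        root = ET.fromstring(z.read("meta.xml"))
    meta = root.find("office:meta", NS)
    return {
        "mimetype": mimetype,
        "title": meta.findtext("dc:title", namespaces=NS),
        "creator": meta.findtext("dc:creator", namespaces=NS),
    }


print(read_metadata(build_package()))
```

The same openness cuts both ways: any tool that can unzip and parse XML can read or rewrite the metadata, without going through the office application.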
Adobe and XMP
- a solution to the problem of integrated, but flexible, metadata
- Adobe is now on the OpenDocument TC and has been promoting XMP as a metadata solution
Same framework used to represent metadata ...
- across
  - applications (InDesign, Illustrator, Photoshop)
  - resources (images, texts, etc.)
  - formats (PDF, PNG, JPEG, DNG, HTML, SVG)
- for documents as a whole, and embedded parts
XMP and RDF
- XMP is an RDF/XML subset
- rather than reinventing the wheel, Adobe adopted an existing metadata standard
- RDF is better than XML Schema-based solutions because, in Adobe's words, it is “scalable across domains”:
  - Mixed vocabularies
  - No order constraints
  - Less “Brittle”
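Two of those properties, mixed vocabularies and no order constraints, are easy to see once an XMP-style packet is reduced to triples. The sketch below is a deliberately small parser, not a full RDF/XML implementation: it handles only literal-valued properties given as child elements or attributes of `rdf:Description` (no `rdf:Alt`/`rdf:Seq`, no nested resources), and the `<?xpacket?>` wrapper is omitted. The `bib:` namespace and its `citationKey` property are hypothetical; `xmp:` and `dc:` are the usual XMP and Dublin Core namespaces.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def triples(packet: str) -> set:
    """Collect (subject, predicate, literal) triples from rdf:Description
    elements. Handles only literal values given as child elements or as
    attributes: a small subset of RDF/XML, not a full parser."""
    root = ET.fromstring(packet)
    found = set()
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about", "")
        # Property attributes, e.g. xmp:CreatorTool="..." (skip rdf:about itself).
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                found.add((subject, name, value))
        # Property elements; ElementTree names them "{namespace-uri}localname".
        for prop in desc:
            found.add((subject, prop.tag, (prop.text or "").strip()))
    return found


# Three vocabularies in one description ("bib:" is a hypothetical namespace).
PACKET_A = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:bib="http://example.org/bib#">
   <xmp:CreatorTool>ExampleWriter</xmp:CreatorTool>
   <dc:format>application/vnd.oasis.opendocument.text</dc:format>
   <bib:citationKey>smith2001</bib:citationKey>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

# Same statements: reordered, split across two descriptions, one as an attribute.
PACKET_B = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:bib="http://example.org/bib#">
   <bib:citationKey>smith2001</bib:citationKey>
  </rdf:Description>
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmp:CreatorTool="ExampleWriter">
   <dc:format>application/vnd.oasis.opendocument.text</dc:format>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(triples(PACKET_A) == triples(PACKET_B))
```

A schema-validated XML format would typically treat the two packets as different (or reject one of them); at the level of RDF statements they say the same thing, which is the sense in which RDF is less brittle.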
Possibilities: Metadata in OpenDocument and OpenOffice
- Enhance XMP to fit with current RDF and XML best practices, and to be more open
- Adopt XMP/RDF across the file format
- Integrate RDF storage and GUI support in OpenOffice (perhaps via XForms?)
- fit citation metadata into a larger effort; citations become just one kind of content-metadata link
What Vocabulary?
- Design Priorities
  - must have a sounder model than RIS/EndNote/BibTeX
  - must be better XML than MODS
  - more tightly controlled
  - easier to work with for developers
One Possible Example
- designed with RELAX NG, the better schema language
  - unordered content models
  - richer validation capabilities
  - consistent and elegant design (easy to write, maintain, and read)
- add
  - top-level typing of resources
  - pervasive RDF linking support
  - finer-grained relations than MODS or DC
  - lessons from FRBR (distinction between manifestation and item)
- RDF ontology layer can provide bridges to FRBR, DC, etc.
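A rough sketch of how these pieces might fit together: typed resources, links between them, a manifestation/item distinction, and an ontology-style bridge to Dublin Core. Everything here is hypothetical and for illustration only: the `Resource` class, the `example.org` namespace, the relation names `chapterOf`, `articleIn`, and `exemplarOf`, and the `BRIDGE` mapping are not part of any published schema. The point is that the vocabulary can keep finer distinctions internally while still emitting coarser Dublin Core statements for exchange.

```python
from dataclasses import dataclass, field

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
LOCAL = "http://example.org/bib#"  # hypothetical namespace for the vocabulary


@dataclass
class Resource:
    id: str
    type: str  # top-level type, e.g. "Book" (a manifestation) or "Item" (one copy)
    title: str = ""
    relations: list = field(default_factory=list)  # (relation, target id) pairs


# Hypothetical ontology bridge: finer-grained local relations map onto a
# broader Dublin Core term; unmapped relations keep their local URI.
BRIDGE = {
    "chapterOf": DCTERMS + "isPartOf",
    "articleIn": DCTERMS + "isPartOf",
}


def to_triples(resources):
    """Flatten resources into (subject, predicate, object) triples."""
    out = []
    for r in resources:
        out.append((r.id, RDF + "type", LOCAL + r.type))
        if r.title:
            out.append((r.id, DC + "title", r.title))
        for rel, target in r.relations:
            out.append((r.id, BRIDGE.get(rel, LOCAL + rel), target))
    return out


book = Resource("book1", "Book", "Collected Essays")
chapter = Resource("ch3", "Chapter", "On Archives", [("chapterOf", "book1")])
# An archive's particular copy of book1: an FRBR-style item, not the edition.
held_copy = Resource("copy7", "Item", relations=[("exemplarOf", "book1")])

for t in to_triples([book, chapter, held_copy]):
    print(t)
```

Here `chapterOf` and `articleIn` both collapse to `dcterms:isPartOf` for a Dublin Core consumer, while `exemplarOf` has no mapping in this sketch and stays in the local vocabulary.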
Modeling Academic Metadata
A Complete Metadata Cycle
Possible Worlds
- this is a realistic possibility
- ... but will take cooperation between communities, agreement on standards, smart investment of resources
- potential problems