So many XML posts, so little time
Over the past few days there have been so many posts regarding Microsoft's announcement of XML formatted files for Office 12. One of the key reasons that I am working so aggressively towards Eclipse immersion (although not necessarily my next big chunk of work - priorities ...) is the view that XML will continue to become a pervasive standard for complex, "unstructured" data. Many thoughts ...
- All of this "noise" had me dredge up a few old bookmarks I'd been meaning to act upon, concerning "writing with XML" - that is, composing all of your content directly in XML. Martin Fowler has what I think is the best overall treatment, a few years ahead of his time, it seems. Jon Udell has done some interesting articles over the years, but how about this one, where he gets you thinking about SQL/XML queries against libraries of documents (interesting applications to follow ...).
- Note that Office has been able to save as XML for a while; what's new is that they're making it the default format. Also, note that these new formats are not exactly the same as the existing ones - they have been improved for security and incorporate ZIP to keep file sizes down.
- I love hacking at Office, especially in Excel - my source code page has a smidgen of the code I've written for that family of products. I welcome the day when I'll be able to write text-processors that make fast, command-line batch file driven work of my documents - things like standard page formatting, watermarking and embedded information, cleaning out edits, etc.
- Evan Erwin has a nice long post full of ideas, including the concept of bulk conversion and cleanup utilities ... yes, we think alike ... I want to start hacking away ... You know, the basic command-line filter type of programming has always been my favorite; purist programming of a sort. My first big idea that I never wrote was a universal translator, a precursor to the big ETL tools of today.
- Ah, but don't stop at the command line - extend the metaphor to web services, and let your imagination wander ...
- Evan also noticed that the announcement did not seem to include Outlook (or Access, or OneNote, or Project, or Visio ...). Pity, there would definitely be some interesting stuff in those files.
- Note that Microsoft has released the XML file formats, so all of us hackers can get an early start. Here's a new blog that will focus on them - also, links to white papers from MS for more information (thanks to Brian Jones, PM in the Win Office team!)
- Note this isn't cross-application portability - this is Microsoft's XML for Office documents, but there are two other important XML specifications, from OpenOffice.org and OASIS, for office (lower case) documents. Well, at least it will be easier to adapt all of my nifty utilities for multiple document specs.
- Why wait for Office 12? You can start learning and understanding XML right now - there are already so many applications for the general technology - even apart from SOA.
- RSS feeds, the life blood of the blogging / aggregators crowd, scream for automation, for searching, manipulating content, etc.
- XBRL hit my radar with a couple of articles; compliance for financial reporting is high on many priority lists, but some of the preferred solutions can get quite expensive. Driving to a common format for releasing financial reports may bring a lot of new, flexible, and cheaper tools to market.
- Is this finally a compelling reason for upgrading your Office version? So often in the past, the various upgrades have seemed to add little meaningful value and lots of extra hassle. If all of my group's documents can be accessible via XML, would it be enough to convince me to take on the hassle of upgrading?
- What about backward compatibility? Per Joe Wilcox at Jupiter, MS will offer compatibility patches back to Office 2000.
- Hey, this hits the Mac version of Office as well. There is a great post by Rick Schaut that talks about the software engineering process - how the Mac and Win Office teams are sharing specs and even code across the platforms.
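To make the "command-line filter" idea from the bullets above a little more concrete, here is a minimal sketch in Python, aimed at the RSS case since that format is available today. It assumes a plain RSS 2.0 feed (un-namespaced `item`, `title`, and `link` elements); RSS 1.0/RDF or Atom feeds use namespaces and would need different lookups. The function name `filter_items` and the file names in the comments are made up for illustration.

```python
import sys
import xml.etree.ElementTree as ET


def filter_items(rss_text, keyword):
    """Return (title, link) pairs for items whose title contains keyword.

    Hypothetical helper: assumes RSS 2.0 with un-namespaced elements.
    """
    if not rss_text.strip():
        return []  # nothing on the input, nothing to report
    root = ET.fromstring(rss_text)
    needle = keyword.lower()
    matches = []
    # iter() walks the whole tree, so items are found wherever they sit.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if needle in title.lower():  # case-insensitive substring match
            matches.append((title, link))
    return matches


if __name__ == "__main__":
    # Classic filter shape: read stdin, write matching lines to stdout.
    # Example (file names are placeholders): python rssgrep.py xml < feed.xml
    keyword = sys.argv[1] if len(sys.argv) > 1 else ""
    for title, link in filter_items(sys.stdin.read(), keyword):
        print(f"{title}\t{link}")
```

Because it reads standard input and writes tab-separated lines, it chains with other tools in a batch file or shell pipeline; the same shape should carry over to office documents once their XML is in hand, with only the element lookups changing.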