WebDoc - A document format for all?
We've had PDF and PostScript for many years. They work well enough, but they aren't as accessible as web pages. What we really need is a format that combines XHTML and images into a single file. I call this proposal WebDoc. It could be a compressed ZIP archive with a .webdoc extension (MIME type application/webdoc). It would be simple to use: click on the file to open it in the web browser, which then displays the index.html inside. The reason this is better than a PDF or an OpenDocument file is that existing web browsers will be able to display, navigate, bookmark and copy/paste from the WebDoc, so no extra viewer software such as Adobe Reader is required.
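As a rough illustration of the packaging idea, here is a minimal Python sketch using only the standard zipfile module. It assumes the archive simply holds an index.html entry point plus its images under relative paths. The leading uncompressed "mimetype" entry is an assumption borrowed from the OpenDocument convention, not part of the proposal, and the function names are hypothetical.

```python
import io
import zipfile

# MIME type suggested in the proposal (it is not a registered type).
WEBDOC_MIME = "application/webdoc"


def build_webdoc(files: dict[str, bytes]) -> bytes:
    """Pack archive paths -> file contents into an in-memory .webdoc (ZIP).

    Hypothetical helper: the proposal only requires an index.html entry point.
    """
    if "index.html" not in files:
        raise ValueError("a WebDoc needs an index.html entry point")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Assumption borrowed from OpenDocument: store an uncompressed
        # 'mimetype' entry first so tools can sniff the type cheaply.
        zf.writestr(zipfile.ZipInfo("mimetype"), WEBDOC_MIME,
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data)  # images etc. keep their relative paths
    return buf.getvalue()


def read_entry_point(webdoc: bytes) -> str:
    """Return the XHTML of index.html, the page a browser would show first."""
    with zipfile.ZipFile(io.BytesIO(webdoc)) as zf:
        return zf.read("index.html").decode("utf-8")


if __name__ == "__main__":
    doc = build_webdoc({
        "index.html": b"<html xmlns='http://www.w3.org/1999/xhtml'>"
                      b"<body><img src='images/logo.png'/></body></html>",
        "images/logo.png": b"\x89PNG placeholder bytes",
    })
    print(read_entry_point(doc))
```

Because the images are referenced by relative paths inside the archive, a browser that understood the format could resolve them the same way it resolves files on a web server.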
WebDoc is a collection of open formats in a ZIP archive, so it really opens the door to accessibility products such as screen readers and braille displays for the blind. Automated translation also becomes possible, keeping the flow of the document and producing a result as complete as the original; that is not nearly as difficult as dealing with PDF files at present! Let's see where we are with this development in a few years' time; a vendor might have popularised their own equivalent proposal by then! ;)
Labels: Future
2 Comments:
Have you looked into the ISO standard for archiving yet? It includes information on why current HTML renderers may be transitory, the difficulties of many package formats, and more:
http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=38920
You may also want to look into the actual screen reader story for PDF files... here's an entry point:
http://www.adobe.com/accessibility/
jd/adobe
John, thanks for the links; I looked at the ISO draft. I couldn't spot an HTML rendering section, but I agree that HTML layout is often transitory because it is not fully prescribed. That same flexibility means the document can be reflowed for a small mobile screen, and the opportunity to display a document in a different layout is a great asset, IMHO. Page size guidance could be included in the WebDoc metadata, so users could view the document in its original layout if their screen were suitable.
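To make the page-size idea concrete, here is a small Python sketch of how a viewer might use such guidance. The metadata format (a JSON file with hypothetical page_width_mm and page_height_mm keys) and all names are assumptions for illustration only; the proposal does not define a metadata schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageGuidance:
    """Hypothetical page-size hint stored in a WebDoc's metadata."""
    width_mm: float
    height_mm: float


def parse_guidance(meta_json: str) -> Optional[PageGuidance]:
    """Read the (assumed) page_width_mm / page_height_mm keys, if present."""
    meta = json.loads(meta_json)
    try:
        return PageGuidance(float(meta["page_width_mm"]),
                            float(meta["page_height_mm"]))
    except KeyError:
        return None  # no guidance: the viewer is free to reflow


def choose_layout(guidance: Optional[PageGuidance],
                  screen_w_mm: float, screen_h_mm: float) -> str:
    """Keep the original layout only when the whole page fits the screen."""
    if guidance is None:
        return "reflow"
    if screen_w_mm >= guidance.width_mm and screen_h_mm >= guidance.height_mm:
        return "original"
    return "reflow"


if __name__ == "__main__":
    a4 = parse_guidance('{"page_width_mm": 210, "page_height_mm": 297}')
    print(choose_layout(a4, 300, 400))  # large monitor
    print(choose_layout(a4, 70, 140))   # small mobile screen
```

Treating the guidance as optional keeps reflow as the default, so a document without metadata still works everywhere.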
Moving on to the related topic of PDF:
Reprocessing documents in open, text-based formats is very easy, which is another reason XHTML is a good choice for my WebDoc proposal. In my experience, importing a PDF and decomposing it into a different format is more difficult, and even then the results are far from equivalent.
Take, for example, the Gowers Review of Intellectual Property: it would be great to convert it into a commentable wiki on QuickTopic. However, the available tools produce a document with spaces inside words and hard line-wrapped text. Perhaps some of the original markup is lost when the document is converted to PDF. Unfortunately, even Adobe's Online PDF to HTML Converter was unable to convert the Gowers PDF into HTML. Copying and pasting the contents from Adobe Reader into OpenOffice loses all the images and font colours.
Perhaps this is just a problem of tools being unable to decompose the PDF document fully? There do not appear to be as many PDF editing tools as there are for XHTML.
The other problem I have seen is that the PDF minor version number is incremented almost yearly. When it changes, viewers such as KPDF and Xpdf display corrupted text until they are updated (e.g. the Power Inquiry Executive Summary). XHTML, by contrast, is a locked standard.
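The version number in question is declared in the file header, for example `%PDF-1.6`. The Python sketch below reads it so that a viewer could, in principle, warn about a newer minor version instead of silently rendering garbage. It searches the first 1024 bytes, a leniency many readers apply, and ignores the optional /Version entry in the document catalog that later PDF versions allow. The `max_supported` value and the function names are hypothetical.

```python
import re

# Matches the PDF header, e.g. b"%PDF-1.6"; captures major and minor version.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) from the %PDF-x.y header.

    Simplification: only the header is read; a /Version key in the
    document catalog (which can override it) is not consulted.
    """
    match = HEADER_RE.search(data[:1024])
    if match is None:
        raise ValueError("no %PDF-x.y header found")
    return int(match.group(1)), int(match.group(2))


def needs_newer_viewer(data: bytes,
                       max_supported: tuple[int, int] = (1, 7)) -> bool:
    """True if the file declares a newer version than the viewer knows."""
    return pdf_header_version(data) > max_supported


if __name__ == "__main__":
    sample = b"%PDF-1.6\n%\xe2\xe3\xcf\xd3\n"
    print(pdf_header_version(sample))
    print(needs_newer_viewer(sample, max_supported=(1, 5)))
```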
Jon