"Old News" may be newspapers' best reason to adopt XML:

Reprinted from Newspapers and Technology Magazine

by Barry Schaeffer

With the recent and dramatic growth of the remote internet protocols and wireless devices, we are, for better or worse, well on our way toward a world of "instant awareness." While today's wireless remote devices lack something in display and capacity, technology is soon likely to close that gap, making it possible for the "man on the street" to surf with nearly the same elegance as the office dweller. What, in such a world, is the place for a medium capable at best of two or three issues per day? Some would say not much. While that judgment is probably too pessimistic, it is becoming obvious that newspapers' role as a first source of information is continuing to shrink. Moreover, as e-commerce becomes wireless, shoppers will browse the sales while they are on their shopping trip, no longer needing to grab the daily paper to check out the sales and chart their course before they leave the house. Even the venerable discount coupons now printed in the newspaper will likely become electronic as well.

There are a number of short-term tactical issues at stake in these developments, but in this column I'd like to focus on a more strategic consideration: what to do with "old news." By that I mean that newspapers have, since their early growth, viewed themselves as the primary contemporaneous source of information about current events. A poorly understood but increasingly important artifact of that mentality is the way newspapers treat yesterday's news. Essentially, this "old news" has been thought of as expendable, having little additional value to anyone and virtually none to the newspaper that produced it. While there might have been a time when this view could be justified, today's world makes it a serious liability. In our electronic world, breaking news as newspapers have always delivered it has lost value and drawing power compared with its more nimble rivals. It has not, however, become any cheaper to research and create, making the economic model through which newspapers have traditionally viewed content a tenuous one.

News is Information with Value over Time:

News, or information as it might be more accurately called, has a dimension beyond the way newspapers have viewed it, and that dimension holds significant though untapped value and revenue potential for its creators.

The news written about any important event or trend, when viewed over time, often constitutes the best and most complete story available anywhere in society. When historians look at past events, even those in the recent past, they go directly to the newspaper archives as one, if not the main, source of content. As the newspaper industry embraced electronics in its production processes, the raw material for that story has even become available in machine-readable form. But newspapers have always viewed the telling of that story as the province of someone else: libraries, universities or clipping services. Often, newspapers have allowed these secondary sources access to their raw content at virtually no cost, gaining little or nothing from any secondary use. Moreover, the process of stitching together the various components of information about an important event into a coherent story was left to the secondary sources, as was the revenue associated with the added value. Anyone who has ever looked at their monthly bill for the NEXIS service knows just how much added revenue we are talking about.

Finally, even during the editorial process, a newspaper pays to place experienced reporters at the center of critical events, then throws away any part of their gained knowledge beyond what will fit the available news hole for the next edition or two. Studies have suggested that reporters often dig up all manner of important if ancillary information that never gets into print and is lost as the story fades. This ancillary material often has a long-term value equal to that of the content that makes the morning edition: value that is paid for but discarded by the very people who should profit from it.

The Internet has made "recent history" potentially profitable:

Beginning as far back as the early 1980s, online access to information, including newspaper content, has been a growing business. With the growth of the Internet, this type of access has become available to anyone with a PC and a modem. Even a cursory look at the Web these days suggests that there are lots of organizations finding fertile ground keeping screens full of everything from political science to natural science to gossip. Newspapers are perfectly positioned to provide content for large parts of the public's growing interest in information with more depth than either broadcast or print journalism. People who can get the latest news on almost anything are beginning to want the entire story from start to finish, even if they have to wait a short while to get it. Newspapers are well placed to provide that story and to profit from doing so. In the tighter economic times, here and coming, newspapers cannot pay to create valuable content and then allow others to profit from it. Just as entertainment media create products and then extract residual income from them as they are used over time and in different markets, newspapers must squeeze every possible bit of income from the value they create in the news gathering process. In this equation, it probably doesn't matter who creates the value-added products, only that the newspapers control the process and the resulting revenue streams.

Enter XML.

Newspapers have by now figured out that using XML doesn't do much for them in getting tomorrow's edition out. With the HTML tools available and their layout vendors grudgingly providing some back-end conversion to XML, many papers are wondering if all the ballyhoo about XML really applies to them. It does, and creating information with real depth is a large part of why.

The kind of value-added content for which the market is growing requires content that can be processed logically, identifying and organizing events, people, time, opinions and a host of other characteristics into a coherent presentation that allows the reader to explore the subject from a variety of perspectives. Only content richly marked up to identify and describe these components can be easily and economically used in the creation of products that become the definitive record of events and trends. This type of content must be created by the author who has researched the topic, talked to the participants and unearthed the hidden truths that will become the keys to understanding what went on and why. Moreover, reporters who have spent weeks or months digging out the facts must be encouraged to record and link everything they know, whether it will publish tomorrow or not. With structured content of the type possible in XML, all of this can happen as a normal part of the news gathering process. If it does, a newspaper will be left with a complete, flexible and very valuable information resource. This takes us back to an assertion made in this space a couple of years ago: reporters must begin capturing what they know in structured information, preferably XML. Only by recording their intellectual product in a manner that lends itself directly to both immediate publication and participation in a more searching analysis of important events can the full value of their work be harvested. The economics of news and information no longer allow the inefficiency of creating single-purpose editorial content and throwing out whatever doesn't fit the classical model of newspapers' role as informer. The mentality behind the current process is like basing the entire newspaper's world on pasting up the next edition, then putting the flats and galleys in the trash for someone to retrieve and sell.
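To make the idea of richly marked-up content concrete, here is a minimal sketch in Python. The element and attribute names (story, para, person, place, event, status) are invented for illustration and are not drawn from any newspaper's or vendor's actual tag set. The sketch assumes a reporter has tagged people, places and events at writing time and has kept an unpublished paragraph of notes alongside the published one. It then pulls out an index of who and what was mentioned, and a timeline of events, from the same source.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup: every tag and attribute name here is an assumption
# made for illustration, not an industry standard.
STORY_XML = """
<story id="s1" date="2000-03-14">
  <headline>Council approves harbor plan</headline>
  <para status="published">
    <person role="mayor">Jane Doe</person> said the
    <event date="2000-03-13">council vote</event> clears the way for
    construction at <place>Pier 9</place>.
  </para>
  <para status="unpublished">
    <person role="engineer">John Roe</person> told the reporter that the
    <event date="1998-06-02">original survey</event> of
    <place>Pier 9</place> was never completed.
  </para>
</story>
"""

def index_story(xml_text):
    """Group tagged people, places and events, noting whether each
    mention came from a published or an unpublished paragraph."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for para in root.iter("para"):
        status = para.get("status")
        for child in para:
            if child.tag in ("person", "place", "event"):
                index[child.tag].append((child.text.strip(), status))
    return dict(index)

def timeline(xml_text):
    """Return (date, event) pairs in chronological order.
    ISO-style dates sort correctly as plain strings."""
    root = ET.fromstring(xml_text)
    events = [(e.get("date"), e.text.strip()) for e in root.iter("event")]
    return sorted(events)

if __name__ == "__main__":
    print(index_story(STORY_XML))
    print(timeline(STORY_XML))
```

The same tagged source yields a view by person, a view by place and a view by date without anyone re-reading the story, and the unpublished material stays available for later products instead of being discarded.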

It's interesting that special-purpose journalism has already understood this truth and has moved, in some cases several years ago, to bring rich content tagging into the reporting function. Those who have persevered now own an information source on key events and movements in society that resembles a holographic image: you can see it differently from any angle you wish just by taking a different perspective. Along with that depth has come a revenue stream that behaves in almost the reverse fashion from news: the older the content becomes, the more valuable it is.

XML on the back-end is a dead-end approach:

Today, the layout industry touts the concept of inferring XML from the back-end of the layout process. Whatever the justifications for this approach, it is a dead-end road. If the reporter and editor do not capture their thoughts and findings in a manner that can support full logical processing, no back-end process will be able to adequately infer the needed tagging. Other industries have dealt with this challenge for nearly 20 years and the answer always ends up the same. What you don't capture on the front end, you cannot infer on the back. Newspapers must, sooner or later, come to this same conclusion. The only question is how much time, revenue and franchise newspapers will lose before they do.
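A small sketch, again in Python and again with invented tag names, illustrates why back-end inference struggles. It assumes a sentence in which the reporter tagged one "Washington" as a person and the other as a place. The flatten function is a stand-in for layout output that keeps the words and drops the markup. It is not a model of any vendor's actual conversion process.

```python
import xml.etree.ElementTree as ET

# Hypothetical front-end markup: the reporter knows which Washington is which.
TAGGED = """
<para>
  <person>Washington</person> criticized the budget that
  <place>Washington</place> sent to the states.
</para>
"""

def flatten(xml_text):
    """Simulate back-end output: keep the words, discard the tags."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())

def tagged_mentions(xml_text, word):
    """List the tags under which a word was captured on the front end."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()
            if el.text and el.text.strip() == word]

if __name__ == "__main__":
    print(flatten(TAGGED))
    print(tagged_mentions(TAGGED, "Washington"))
```

In the flattened text the two mentions are identical strings, so a back-end process can only guess which is the person and which is the place. The tagged version carries the answer because the person who knew it recorded it.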