XML DEVELOPER

Working with XML and MS Word


Ed Tittel
01.14.2004

In recent tips, I've made the case that MS Office 2003 is worth a second look because of its improved support for XML. In a recent article at XML.com, XML guru John Simpson puts some substance behind those contentions and describes a tool that does a good job of converting Word documents (.doc) and Rich Text Format (.rtf) files into reasonably clean XML.

Along the way he makes an excellent point: just as the HTML that Word creates when you choose "Web page (.html)" in the "Save As…" dialog includes what some markup mavens call garbage, so does the output of the "XML (.xml)" option. In fact, Simpson calls the resulting output "XML of a spectacularly hideous form," an assessment that is stinging, all too accurate, and hilarious besides. He also points to sample output in a recent DevX article by A. Russell Jones, "Export Customized XML from Microsoft Word with VB.NET," that shows why Simpson's verdict on the XML that MS Word produces is, sadly, correct.
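
To make the "garbage" point concrete, here is a minimal sketch of the kind of cleanup a custom export performs. It is not Simpson's method, Jones's VB.NET code, or how upCast works. It assumes input in Word 2003's "XML Document" format (WordprocessingML, namespace http://schemas.microsoft.com/office/word/2003/wordml) and keeps only paragraph text and paragraph style names. The function name `simplify_wordml` and the output vocabulary (`doc`, `para`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of Word 2003's "XML Document" (WordprocessingML) format.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

def simplify_wordml(wordml_text: str) -> ET.Element:
    """Reduce WordprocessingML to <doc><para style="...">text</para></doc>.

    Hypothetical output vocabulary. Only paragraph text and paragraph style
    names survive; fonts, revision data, and document properties are dropped.
    """
    root = ET.fromstring(wordml_text)
    doc = ET.Element("doc")
    # Visit every w:p element in document order.
    for p in root.iter(f"{W}p"):
        para = ET.SubElement(doc, "para")
        style = p.find(f"{W}pPr/{W}pStyle")
        if style is not None:
            # The style name lives in the namespaced w:val attribute.
            para.set("style", style.get(f"{W}val", ""))
        # Concatenate the text of every run (w:t) in this paragraph.
        para.text = "".join(t.text or "" for t in p.iter(f"{W}t"))
    return doc

# A tiny hand-written WordprocessingML fragment for illustration.
sample = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Title</w:t></w:r></w:p>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

print(ET.tostring(simplify_wordml(sample), encoding="unicode"))
# prints: <doc><para style="Heading1">Title</para><para>Hello, world</para></doc>
```

Real Word output carries far more markup (fonts, list definitions, document properties), which is exactly what this kind of filter throws away.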

In the same story, Simpson turns to a conversion tool called upCast, from the German software company infinity-loop GmbH, to explore better ways of moving from Word .doc files to more reasonable forms of XML. Because upCast is a Java program, it runs on various versions of Windows, Unix/Linux, and Macintosh operating systems. Its real limitation comes from its input requirements: .doc files to be converted must have been created with MS Word 97 or a later version on Windows machines (running Windows 95, 98, NT, or 2000). Other .doc files must first be saved as .rtf on the source machine before being handed to upCast for conversion. Also, Mac and Unix/Linux users can work only with .rtf files on their machines, not native .doc files.
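
Because of these input rules, it helps to know what a file really is before feeding it to a converter, since a file named .doc may actually be RTF. The sketch below sniffs leading bytes: binary Word files are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, and RTF files start with `{\rtf`. The function names `sniff_word_input` and `needs_rtf_first` are hypothetical, and the routing rule only encodes the platform restriction described above; it cannot tell which Word version or OS produced a .doc file.

```python
# Signature of OLE2 compound files, the container for binary Word .doc files.
# Other OLE2 formats (e.g. Excel .xls) match too, so this is a first-pass check.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# RTF files begin with the control word {\rtf.
RTF_MAGIC = b"{\\rtf"

def sniff_word_input(path: str) -> str:
    """Return 'doc', 'rtf', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head == OLE2_MAGIC:
        return "doc"
    if head.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"

def needs_rtf_first(kind: str, host_is_windows: bool) -> bool:
    """Hypothetical routing rule: on Mac or Unix/Linux hosts, binary .doc
    input must be re-saved as RTF on a Windows machine before conversion."""
    return kind == "doc" and not host_is_windows
```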

All this said, driving the software is remarkably easy. Using visual menus, users set up the import side (the source file) and the export side (how output is handled). upCast also does a good job of converting Word formatting styles into CSS, which it associates with the output through xml-stylesheet processing instructions (PIs). Namespace handling is equally adept, and the tool does a good job of recognizing and formatting hyperlinks and other active content.
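
The general technique of turning named styles into CSS and linking that CSS to an XML document can be sketched in a few lines. This is an illustration, not upCast's actual output. The style mapping, the element names, the `word.css` file name, and the functions `styles_to_css` and `build_document` are all hypothetical. The `xml-stylesheet` PI itself follows the W3C convention for associating style sheets with XML documents.

```python
from xml.dom import minidom

def styles_to_css(styles: dict[str, dict[str, str]]) -> str:
    """Turn {element_name: {css_property: value}} into CSS rule text."""
    rules = []
    for name, props in styles.items():
        body = "; ".join(f"{k}: {v}" for k, v in props.items())
        rules.append(f"{name} {{ {body} }}")
    return "\n".join(rules)

def build_document(paragraphs: list[tuple[str, str]], css_href: str) -> str:
    """Build XML whose prolog links a CSS file through an xml-stylesheet PI.

    Each (style_name, text) pair becomes an element named after the style,
    so style names must already be valid XML names (no spaces). The href is
    assumed to contain no double quotes.
    """
    impl = minidom.getDOMImplementation()
    doc = impl.createDocument(None, "doc", None)
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/css" href="{css_href}"')
    # The PI goes in the prolog, before the root element.
    doc.insertBefore(pi, doc.documentElement)
    for style_name, text in paragraphs:
        el = doc.createElement(style_name)
        el.appendChild(doc.createTextNode(text))
        doc.documentElement.appendChild(el)
    return doc.toxml()

print(styles_to_css({"Heading1": {"font-size": "16pt", "font-weight": "bold"}}))
print(build_document([("Heading1", "Title"), ("Normal", "Body text")], "word.css"))
```

Mapping each style to its own element name keeps the CSS selectors trivial; a converter could instead emit generic elements with a `class`-like attribute, at the cost of more complex selectors.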

infinity-loop also offers an XML-to-Word conversion tool to complement upCast; naturally, it's named downCast. Be sure to visit the vendor's Web site and check out these interesting tools. Simpson's complete story is well worth a read, too!


About the Author

Ed Tittel is a VP of Content Development & Delivery at CapStar LLC, an e-learning company based in Princeton, NJ. Ed runs a small team of content developers and project managers in Austin, TX, and writes regularly on XML and related vocabularies and applications. E-mail Ed at etittel@lanw.com.

All Rights Reserved, Copyright 2001 - 2008, TechTarget | Read our Privacy Policy