MS Office Forum / Word / Conversions / February 2005

Parsing and/or Displaying Simple Word Files from VB 6

Nate Roberts - 09 Feb 2005 16:29 GMT
Hello,

I need to be able to pull in some relatively simple Word files
(preferably of any vintage, but I'd probably settle for more recent
versions) for display in my Visual Basic 6 application.  Ideally, I'd
be able to display them more or less as they were displayed in Word --
with tab stops in the same places, basic text markup (font face and
size, bold, italics) preserved -- within my application.  Internally, I
need to parse a plaintext version of the file.  Ideally, I'd also be
able to determine the page # that a particular bit of text was on, in
the original Word document.  Oh, and I *cannot* guarantee that the
client on which this program will be run will have a copy of Word.

So what I'm thinking I need:
1. A way to convert from Word to RTF.
2. A way to display RTF.
3. A way to convert from Word (or RTF) to plaintext.
4. A way to determine in an RTF file or Word file where the page breaks
lie (this could be a feature of the magic converter of #1 or #3 -- I
could just do the conversion one page at a time, if the converter knew
about pagination).  By page breaks, I don't mean only the forced ones
-- I want to be able to grab a particular page as Word would print it.

... all from within VB6 without Microsoft Office present.
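
For item 3, and the forced-break part of item 4, the RTF side can be sketched roughly as follows. This is a minimal illustration in Python, not VB6, and would need porting; the names `rtf_to_text`, `SKIPPED_DESTINATIONS`, and `TEXT_FOR_CONTROL` are made up for the example. It assumes simple RTF, handles only a small subset of the format (no `\u` Unicode escapes, no tables), and marks each forced page break (`\page`) with a form feed.

```python
import re

# Destinations whose contents are metadata rather than body text.
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}
# Control words that correspond to a plain-text character.
TEXT_FOR_CONTROL = {"par": "\n", "line": "\n", "tab": "\t", "page": "\f"}

TOKEN = re.compile(
    r"\\([A-Za-z]+)(-?\d+)? ?"  # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped byte such as \'e9
    r"|\\([^A-Za-z])"           # control symbol such as \\ \{ \} \* \~
    r"|([{}])"                  # group open / close
    r"|([^\\{}]+)",             # run of literal text
    re.S,
)


def rtf_to_text(rtf: str) -> str:
    """Reduce simple RTF to plain text; forced page breaks become form feeds."""
    out = []
    stack = []        # saved 'skipping' flag for each open group
    skipping = False  # True while inside a destination we ignore
    for match in TOKEN.finditer(rtf):
        word, _number, hexbyte, symbol, brace, text = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif skipping:
            continue
        elif word:
            if word in SKIPPED_DESTINATIONS:
                skipping = True
            else:
                out.append(TEXT_FOR_CONTROL.get(word, ""))
        elif hexbyte:
            # Assumes the Windows ANSI code page, which is common but not guaranteed.
            out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif symbol:
            if symbol == "*":
                skipping = True          # \* marks an ignorable destination
            elif symbol in "\\{}":
                out.append(symbol)       # escaped literal character
            elif symbol == "~":
                out.append("\u00a0")     # non-breaking space
        elif text:
            # Raw line breaks in an RTF file are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)
```

Automatic page breaks are not represented in the file at all, so a converter like this can only ever report the forced ones.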

Am I dreaming to be able to do all that without scads of effort?  I'd
appreciate any pointers to sample code, open source libraries,
commercially licensable libraries.

Thanks!
Nate Roberts
Howard Kaikow - 10 Feb 2005 12:31 GMT
In order to do the deed, you would have to find out from MSFT the EXACT
details of Word's internal format. I would not count on MSFT supplying such
an EXACT description unless under license.

If you are running a VB 6 program, then I expect your target is a Windows
platform. As a practical matter, I would expect that most folks having
Windoze also have Office or Word. So the best bet is to do your deed via
creating a Word object.

http://www.standards.com/; See Howard Kaikow's web site.

> Hello,
>
[quoted text clipped - 27 lines]
> Thanks!
> Nate Roberts
Nate Roberts - 10 Feb 2005 22:23 GMT
Hi Howard,

Thanks for your reply.  You're probably right that most folks will have
it, but I do want to avoid depending on that.  But maybe I can provide
a solution that detects Word's presence/absence on the system, and if
it is present, use Word to do the conversion (otherwise requiring RTF).
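
The detect-and-fall-back idea might look something like this. It is only a sketch in Python (the real thing would be VB6), and `word_is_installed` and `choose_import_path` are hypothetical names. It assumes that Word registers the `Word.Application` ProgID under HKEY_CLASSES_ROOT, which is what `CreateObject("Word.Application")` relies on, and that only .doc and .rtf files need handling.

```python
import os


def word_is_installed() -> bool:
    """Best-effort check for the Word.Application COM ProgID (Windows only)."""
    try:
        import winreg  # standard library, but only present on Windows
    except ImportError:
        return False
    try:
        key = winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "Word.Application")
        winreg.CloseKey(key)
        return True
    except OSError:
        return False


def choose_import_path(filename: str, has_word: bool) -> str:
    """Decide how to bring a file in, given whether Word can be automated."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".rtf":
        return "read-rtf"               # RTF never needs Word
    if ext == ".doc":
        # Use Word for the conversion when it is there; otherwise ask for RTF.
        return "convert-with-word" if has_word else "ask-for-rtf"
    return "unsupported"
```

Passing the detection result in as a parameter keeps the decision logic testable on machines with and without Word.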

Thanks,
Nate
Bob Buckland ?:-) - 10 Feb 2005 14:27 GMT
Hi Nate,

Part of what you're asking depends on the need
for detail of content.

You can use MS Windows WordPad to open a Word .doc
file and save it as RTF, but WordPad does not support
all of Word's features.  You may want to look at
3rd party converters (for example, http://acii.com) that
do this type of conversion; some can be manipulated
via code.  Note, though, that the page information isn't
fully stored in the Word file.  Word reads the printer
driver in effect when the file is opened and paginates
the document for that driver.
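
Because of that, the only page boundaries that can be recovered without Word are the forced ones. As a rough sketch (Python for illustration, with a hypothetical helper name `page_of`), assuming the text has already been converted to plain text with a form feed at each forced break, the page holding a piece of text can be found like this:

```python
from typing import Optional


def page_of(text: str, needle: str) -> Optional[int]:
    """Return the 1-based page of the first occurrence of needle.

    Only form feeds (forced page breaks) are counted; automatic breaks
    depend on Word's layout and cannot be seen here.
    """
    offset = text.find(needle)
    if offset < 0:
        return None
    # Each form feed before the match starts a new page.
    return text.count("\f", 0, offset) + 1
```

For a document with no forced breaks this always reports page 1, which is exactly the limitation described above.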

Saving to plain text in any form will likely cost
a number of Word features including table layouts,
graphics, non 'text' fonts, styles, etc.

There may also be information on http://msdn.microsoft.com

Let us know if this helped you,

Bob  Buckland  ?:-)
MS Office System Products MVP

 *Courtesy is not expensive and can pay big dividends*

Office 2003 Editions explained
http://www.microsoft.com/uk/office/editions.mspx

Nate Roberts - 10 Feb 2005 17:27 GMT
Hi Bob,

I'm not too worried about losing tables, etc. -- I expect the opened
files to be fairly simple in their formatting; no tables, no embedded
graphics.  It sounds like my desire for consistent pagination across
machines may be denied.  In general, assuming that WordPad does support
all the features of an opened document, will its pagination of that
document be identical to that of Word (assuming identical printer
configuration, of course)?  And if the user uses WordPad to save the
document as RTF, will the RTF's pagination be consistent with
WordPad's pagination of the .doc file?  I don't imagine WordPad can be
automated to do the conversion, right?  It may not be that big a deal
to ask the user to do such a conversion, but I'd like to avoid it if I
can.

acii.com's stuff looks nice, but the licensing costs are well above my
price range.

I've also found a couple other companies offering solutions that may
help me:
- convertspot.com sells a COM component to convert Word to plain text
($260 for a royalty-free license).  Not ideal, but I could encourage
users to save to RTF, while still giving them a possibly "good-enough"
.doc reading facility.

- http://www.subsystems.com/tewf.htm sells a complete text-editing
control that you can embed in your application.  It does read, write,
and display RTF, but has no support for .doc.  They say that Microsoft
will not license them to do anything with .doc.  The Win32 version is
$539 for a DLL and fairly complete API, $819 for a version that
includes the DLL source.  These are also royalty-free for distribution,
provided your application is substantially more than a word processor.
I'm thinking that I will probably use this one at minimum, and perhaps
supplement with some sort of Word conversion utility.  Perhaps I could
take Howard's suggestion for users who *do* have Word installed, and
script Word to do that.

Thanks for your help!

Nate Roberts