MS Office Forum / Word / Programming / October 2007

Splitting doc by header style

werD - 09 Oct 2007 22:43 GMT
Hello,

I've got a Word doc that is basically in this format, with no page breaks:

Page Title (Heading 2)
Page Data (Table)
Page Title (Heading 2)
Page Data (Table)
Page Title (Heading 2)
Page Data (Table)
etc...

I'm using .NET to loop through the doc by Paragraph and Table. I can get to
the plain text of the objects via a paragraph loop, but I can't seem to
narrow it down to the paragraphs formatted as Heading 2 and then get the
table that follows each one as HTML or similar, so that I can create some
pages based on them.

I'd like to do something similar to this, although it's obviously not right
(I have 40 items marked with Heading 2, but the headers loop only runs 3 times):

For Each h As Word.HeaderFooter In s.Headers
    '?PageTitle = h.Range.Text
    '?BodyText = '?Table and Text
Next

I've seen this type of functionality in tools like RoboHelp (split a document
into web pages based on defined formatting) before, but I'm not sure what the
proper logic would be to get to these pieces of data.

I appreciate any insight you have and would be glad to clarify further.

Thanks in Advance
DrewG
Shauna Kelly - 10 Oct 2007 04:36 GMT
Hi

I think you'll need to clarify what you're doing and what you're aiming for
before we can help you.

Headings are the short paragraphs that introduce a new part of content and
are generally styled with Heading 1, Heading 2, ... Heading 9.

Headers are the text at the top (well, generally the top) of a page that is
the same on each page and might include a field to generate a page number.
They are closely related to footers, which provide the same text at the
bottom of each page. Headers and footers are properties of a Section, of
which your document has at least one. And each Section has exactly 3 headers
and 3 footers (first page, odd and even) whether or not you've chosen to
display all 3.
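
To make the distinction concrete, here is a minimal VB.NET sketch. It assumes a reference to the Word interop assembly, an already-open document, and the English built-in style name "Heading 2". The module and function names (HeadingsVersusHeaders, GetHeading2Titles, CountHeaders) are made up for illustration and are not part of the Word object model.

```vbnet
Imports System.Collections.Generic
Imports Word = Microsoft.Office.Interop.Word

Module HeadingsVersusHeaders

    ' Hypothetical helper: collects the text of every paragraph styled "Heading 2".
    ' Assumes the style keeps its English name; localized builds may differ.
    Function GetHeading2Titles(ByVal doc As Word.Document) As List(Of String)
        Dim titles As New List(Of String)
        For Each p As Word.Paragraph In doc.Paragraphs
            Dim styleName As String = CType(p.Style, Word.Style).NameLocal
            If styleName = "Heading 2" Then
                ' Range.Text includes the trailing paragraph mark, so trim it.
                titles.Add(p.Range.Text.TrimEnd(ControlChars.Cr))
            End If
        Next
        Return titles
    End Function

    ' Hypothetical helper: headers belong to sections, so this total is
    ' 3 per section no matter how many headings the document contains.
    Function CountHeaders(ByVal doc As Word.Document) As Integer
        Dim total As Integer = 0
        For Each s As Word.Section In doc.Sections
            total += s.Headers.Count
        Next
        Return total
    End Function

End Module
```

In a single-section document with 40 Heading 2 paragraphs, the first function would be expected to return 40 titles while the second returns 3, which matches a headers loop that runs only 3 times.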

So, are we talking about headings or headers?

And, what version of Word are you using?

Hope this helps.

Shauna Kelly.  Microsoft MVP.
http://www.shaunakelly.com/word

werD - 10 Oct 2007 05:40 GMT
Ah, I see. Thank you for clarifying the headers distinction. I am indeed
trying to split by headings and not headers, so I take it I should be looking
for a paragraph styled with the appropriate heading style (in my case,
Heading 2). How would I then grab the following table for an HTML conversion?
My goal is to split this document up for viewing on different web pages, but
I'm trying to get just the basic table and text formatting so I can load up an
XML document with it for storage. The app will be running on a machine with
Office 2007 (Office 12 libraries), but the document will be in Office 2003 format.

werD - 12 Oct 2007 00:01 GMT
If need be, I can use Office 2003 or XP as well.

werD - 12 Oct 2007 20:16 GMT
So I've written this loop to go through and find all paragraphs that are
Heading 2; it then pulls out all text from the following paragraphs until a new
Heading 2 is found. I can't seem to figure out how to pull out the following
text as a table instead of just paragraph text.

So the outline of a page area looks similar to this:

Heading 2 Text
+-------+------+-------+
| text1 |      | text4 |
| text2 | TEXT | text5 |
| text3 |      | text6 |
+-------+------+-------+

But my output is this:

Heading 2 Text
text1 text2 text3 TEXT text4 text5 text6

Here's the loop I've written to get this far:

       Dim H2ParaFound As Boolean = False
       Dim ExtraTitleChunks As Integer = 0
       Dim NrmlParaChunks As Integer = 0
       Dim tempPgHdrText As String = String.Empty
       Dim tempParaText As String = String.Empty
       Dim txtfound As Boolean = False

       For Each p As Word.Paragraph In doc.Paragraphs
           Dim stype As String = CType(p.Style, Word.Style).NameLocal
           If stype = "Heading 2" Then
               'If this is the second part of a title
               If H2ParaFound = True And ExtraTitleChunks > 0 Then
                   tempPgHdrText &= "," & p.Range.Text
                   H2ParaFound = True
                   ExtraTitleChunks += 1
                   NrmlParaChunks = 0
               Else
                   'First part of a title found
                   If txtfound = True Then
                       Me.txtResults.Text &= tempParaText & vbCrLf
                       tempParaText = String.Empty
                       txtfound = False
                   End If
                   tempPgHdrText = "Page Title: " & p.Range.Text
                   H2ParaFound = True
                   ExtraTitleChunks += 1
                   NrmlParaChunks = 0
               End If
           Else
               'If this is not Heading 2
               If H2ParaFound = True Then
                   Me.txtResults.Text &= tempPgHdrText & vbCrLf
                   tempPgHdrText = String.Empty
                   tempParaText = p.Range.Text
                   txtfound = True
                   H2ParaFound = False
                   ExtraTitleChunks = 0
                   NrmlParaChunks += 1
               Else
                   tempParaText &= p.Range.Text
                   txtfound = True
                   tempPgHdrText = String.Empty
                   H2ParaFound = False
                   ExtraTitleChunks = 0
                   NrmlParaChunks += 1
               End If
           End If
       Next
       If tempParaText.Length > 0 Then
           Me.txtResults.Text &= tempParaText & vbCrLf
           tempParaText = String.Empty
       End If

Any thoughts or insight?

DrewG

I'm starting to see why people complain about the Word document object model.

Shauna Kelly - 15 Oct 2007 14:22 GMT
Hi

Let's go back to the beginning here.

You want to chop a document up into many smaller documents. If you were
doing this manually, how would you do it? Bear in mind you can't select some
text and tell Word to save the selection as a separate document (and if you
think you remember ever doing that, it was with maybe WordPerfect in the
mid- to late-1980s).

You can't count on finding a range of interest and copying it into a new
document, unless (a) you base the new document on the same template as the
main document, (b) the styles in the main document haven't changed since it
was born and (c) you manually fix any section break settings (eg margins).

So, the only real way to achieve what you want is to find a bit of text you
want, delete everything above it, delete everything below it, and do Save >
As to save as the new document.

You'll have to do the same in code.

A Range has a .Start and .End that are just Longs. And, you can create a
Range and explicitly set its .Start and .End.

So as a shell:

Dim doc As Word.Document

Dim rngDeleteAbove As Word.Range
Dim rngDeleteBelow As Word.Range

Dim rngStartOfFirstHeading As Word.Range
Dim rngStartOfNextHeading As Word.Range

   'set doc  to be the Document of interest

   'Use .Find to set the rngStartOfFirstHeading

   'Use .Find to set the rngStartOfNextHeading

   set rngDeleteAbove  = doc.Range
   rngDeleteAbove.End = rngStartOfFirstHeading.Start -1

   set rngDeleteBelow = doc.Range
   rngDeleteBelow.Start = rngStartOfNextHeading.Start - 1

   rngDeleteAbove.Delete
   rngDeleteBelow.Delete

   doc.SaveAs "C:\MyPath\MyFileName.doc"

Hope this helps.

Shauna Kelly.  Microsoft MVP.
http://www.shaunakelly.com/word

werD - 15 Oct 2007 18:14 GMT
Thanks. I figured out a good looping logic. I'm looking for a lightweight,
all-in-one solution to get this converted, though. I will be saving the data to
an XML file as it's loaded by the user, not to individual Word docs. With .NET I
have no issues with this; I can store the range as XML, a Word doc, hashtables,
etc. The real issue I'm having is getting the HTML equivalent of the
tables within the range. I would just do a "for each row", but in most cases
the middle column is one large merged cell, so that won't work.

Is there any way besides looping through each column/cell to get the
entire table as an HTML chunk, or a quickly convertible equivalent of that?

Thanks for your posts,
DrewG
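
On the merged-cell problem described above, one possible approach (a sketch only, not something confirmed in this thread) is to skip the Rows collection and walk the table's cells through its Range instead. The code below assumes a reference to the Word interop assembly and .NET 4 or later for System.Net.WebUtility. The module and function names (TableToHtmlSketch, TableToHtml) are hypothetical. It also assumes the cells come back in document order, and it does not try to reconstruct rowspan or colspan, so a merged cell simply appears once in the first row it occupies.

```vbnet
Imports System.Text
Imports Word = Microsoft.Office.Interop.Word

Module TableToHtmlSketch

    ' Hypothetical helper: emits bare <table>/<tr>/<td> markup for one Word table.
    Function TableToHtml(ByVal tbl As Word.Table) As String
        Dim html As New StringBuilder("<table>")
        Dim currentRow As Integer = 0

        ' Going through tbl.Range.Cells avoids indexing individual rows,
        ' which is the step that merged cells get in the way of.
        For Each c As Word.Cell In tbl.Range.Cells
            If c.RowIndex <> currentRow Then
                ' Close the previous row (if any) and open a new one.
                If currentRow <> 0 Then html.Append("</tr>")
                html.Append("<tr>")
                currentRow = c.RowIndex
            End If

            ' Cell text ends with an end-of-cell marker (Chr(13) & Chr(7)); strip it.
            Dim cellText As String = c.Range.Text.TrimEnd(ControlChars.Cr, Chr(7))
            html.Append("<td>")
            html.Append(System.Net.WebUtility.HtmlEncode(cellText))
            html.Append("</td>")
        Next

        If currentRow <> 0 Then html.Append("</tr>")
        html.Append("</table>")
        Return html.ToString()
    End Function

End Module
```

Inside a paragraph loop, the table containing a paragraph can be reached with p.Range.Tables(1) once p.Range.Tables.Count is greater than zero. The result here is plain structure and text only, with none of the table's formatting carried over.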
