MS Office Forum / Word / Programming / November 2007

NEWBIE:splitting multi-page word doc into single word doc - thank

patti - 02 Sep 2007 03:06 GMT
Hi,

Environment: Windows XP/ home edition; sp2
Office 2000 [no help installed and no cd to install help :-( ]
experience: newbie to vba, c coding experience

This request is free of charge for a friend who would do anything for
anyone.  His hard drive of client files was hosed, and he was able to
retrieve the files from another source.  Unfortunately, the retrieved files
are not named properly and are incompatible with his business operations.

Problem description/vba coding segment requested:

Over 60 documents named xxxxxnnnn.doc, each containing client records
separated by more than 65 page breaks.  Each record between two page breaks
needs to be extracted into its own Word document and saved under the name
client_name_date_of_service.doc so that it is easy to identify.

The relevant client information (including the parts of the final file name)
is contained in the record itself, between the page breaks.  The last record
in an original file may not end with a page break, but I'd still like to
capture that one as well.

Objective:
 - Cycle through all the Word documents (approx. 60 files) in a given folder
    - for each large document file, start the splitting process
      - open the file
        - for each page-break-delimited section (between 65 and 80 per file)
           - capture the first line of the page [this is the client name]
           - search the paragraphs for the string "date"
           - generate the file name (eg. joe_smith_1_24_07.doc) and save it
             in a string variable
           - select the entire page, including the trailing page break [only
             if absolutely necessary], and copy it into
             client_name_date_of_service.doc (eg. joe_smith_1_24_07.doc)
        - next section, until there are no more page breaks in this file
          -- note: the final client record may not have a trailing page
             break, but I'd still like to capture it and store it in its own
             client_name_date_of_service.doc
      - next file
    - close/properly dispose of any allocated resources
    - error handler that closes/disposes of resources, reports the cause of
      the failure, and properly shuts down the application.

?? any additional steps that I've neglected to mention.

I enjoy helping people and learning new things.    Many thanks to all who
take the time to share their time and talents by responding with code
capable of accomplishing this task.

With much gratitude and appreciation,
Patti
Doug Robbins - Word MVP - 02 Sep 2007 05:29 GMT
Sub splitter()
'
' splitter Macro
' Macro created 16-08-98 by Doug Robbins to save each page of a document
' as a separate file with the name Page#.DOC
'
Dim Counter As Long, Pages As Long
Dim DocName As String
Dim Source As Document, Target As Document

Set Source = ActiveDocument
Selection.HomeKey Unit:=wdStory
Pages = Source.BuiltInDocumentProperties(wdPropertyPages)
Counter = 0
While Counter < Pages
   Counter = Counter + 1
   DocName = "Page" & Format(Counter)
   ' "\Page" is the predefined bookmark for the page holding the selection
   Source.Bookmarks("\Page").Range.Cut
   Set Target = Documents.Add
   Target.Range.Paste
   Target.SaveAs FileName:=DocName
   Target.Close
Wend
End Sub

Signature

Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

patti - 02 Sep 2007 15:52 GMT
Hi Doug,

Many thanks for your generous offer of code.  This does indeed parse the
larger file, breaking it down and writing it to a file formatted as
page[n].doc.

The only downside is that, on each pass through the collection of larger
files, it overwrites the page[n].doc files written for the previous file.

I've still quite a bit of work to do with this one though.  

The bigger piece for me would be to locate two important pieces of
information, namely:
-  the first paragraph or sentence as this contains the client name.  
- Then establish a search throughout the page for a  paragraph/sentence  
starting with the string "date"  mentioned in my original post.

These two critical pieces of information form the file name
client_name_date_of_service.doc, which is the naming scheme this person's
business relies on.  Once these pieces are located, I can copy the contents
of the client transaction (the code you posted), and then perform a 'file
save as: client_name_date_of_service.doc'.  As an example:
joe_smith_1_24_07.doc [this is the first client located in the larger file]
frank_hood_1_31_06.doc [this is the second client located in the larger
file], etc.
         
If you, or anyone else, has any ideas on how to gather these two pieces of
information, located between each page break, I'd really appreciate it.  This
way, the files will be named properly and in keeping with his requirements.

I so appreciate you sharing this code segment.  If you have any additional
suggestions as to how to extract these two important pieces of info, while in
the parsing of each page break, I'd really appreciate it.

Thanks ever so much for your help.  With much gratitude and appreciation,
Patti
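
Once the two pieces are in hand, turning them into the file name is plain string handling. The following is only a minimal sketch: BuildDocName is a hypothetical helper name, and it assumes the date arrives as text in the m/d/yy form shown in the examples.

```vba
' Hypothetical helper: builds "First_Last_m_d_yy.doc" from a client name
' and a date string such as "1/24/07".
Function BuildDocName(ByVal ClientName As String, _
                      ByVal ServiceDate As String) As String
    Dim NamePart As String
    Dim DatePart As String

    ' Drop a trailing paragraph mark, then any surrounding spaces.
    NamePart = Trim$(Replace(ClientName, vbCr, ""))
    DatePart = Trim$(Replace(ServiceDate, vbCr, ""))

    ' Spaces and slashes are not wanted in the file name.
    NamePart = Replace(NamePart, " ", "_")
    DatePart = Replace(DatePart, "/", "_")

    BuildDocName = NamePart & "_" & DatePart & ".doc"
End Function
```

For example, BuildDocName("Joe Smith" & vbCr, "1/24/07") returns Joe_Smith_1_24_07.doc. Wrap the result in LCase$ if lower-case names are preferred. Because the name comes from the client and the date rather than from the page number, files written from different source documents should only collide when the same client and date appear twice.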

Russ - 04 Sep 2007 10:55 GMT
Patti,
What we need to search for are consistent patterns that you say are on each
page. Can you figure out what the patterns are? If not, then show us a few
pages of data so that we can see how it is laid out. You can, of course,
disguise the names, etc., but we need to know where the names and date
formats are in relationship to paragraph marks or other consistent text,
font, color, heading styles, etc.


Signature

Russ

drsmN0SPAMikleAThotmailD0Tcom.INVALID

patti - 05 Sep 2007 16:06 GMT
Hi Russ,

Thanks so much for your interest.  I really appreciate it.  

Description: Split large files into separate files
- open each large word.doc file containing client info.
capture two fields:
Client[n]_name (eg. Patty Smith)
Date:     (eg. Date:    9/4/07)
Filename generated: Patty_Smith_9_4_07
Client information/requirements are captured in various paragraphs which may
extend beyond one page.
Cut specific client information (pages [1-n]) and save as
client_name_date_as_recorded_in_page1
(eg. Patty_Foobar1_7_18_07)

Sample included below for reference:

Sample:
Patty Foobar1
Address
Telephone Number
Date:    7/18/07 [ this may or may not be located in this area of the file]

Client information/requirement captured here and may extend into multiple
pages.  

PAGE 2
Patty Foobar1
Additional requirements may be captured here

--------------------------------- page break ---------------------------------

Rob Foobar
Date:    9/4/07 [ the date field appears somewhere in the client information
header, but the person who input the data was not consistent in their entry
methods, which means it needs to be searched and retrieved]

Client information/requirement captured here and may extend into multiple
pages.  
--------------------------------- page break ---------------------------------
Kanga Roo
Date: 9/1/07

Client information/requirement captured here
--------------------------------- page break ---------------------------------

--- Thanks again for any recommendations, and for sharing your time and
talents with me.

With much gratitude,
Patti
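
Given the layout in the sample, one way to pick up the date is to walk the paragraphs of a split-off document and take the first one that begins with the Date: label. This is a sketch under assumptions, not a tested solution: FindServiceDate is a hypothetical helper name, and it assumes the label sits at the start of its own paragraph with only tabs or spaces before the date.

```vba
' Hypothetical helper: returns the text after the first "Date:" label found
' in the document, or "" if no paragraph starts with that label.
Function FindServiceDate(ByVal Doc As Document) As String
    Dim Para As Paragraph
    Dim ParaText As String

    For Each Para In Doc.Paragraphs
        ParaText = Trim$(Para.Range.Text)
        If LCase$(Left$(ParaText, 5)) = "date:" Then
            ' Keep what follows the label, then strip tabs,
            ' the paragraph mark, and surrounding spaces.
            ParaText = Mid$(ParaText, 6)
            ParaText = Replace(ParaText, vbTab, "")
            ParaText = Replace(ParaText, vbCr, "")
            FindServiceDate = Trim$(ParaText)
            Exit Function
        End If
    Next Para
End Function
```

An empty result means no paragraph began with the label, which is worth checking for before saving, since the data entry is described as inconsistent.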

Russ - 05 Sep 2007 19:19 GMT
Hi Patti,
More info please.
So a client's name appears *by itself* (no label) in the first paragraph of
each page (and may repeat on *consecutive pages* if more information is
available for that particular client)? And is consistent in that respect
from page 1 to end of document?
The date you want is always the first date found on the first page of each
client and always formatted month/day/year(two digit year)?
The name and date are always in the main text area and not in header or
footer of page?
The name and date are not formatted differently than the rest of the text?
The date is always preceded by the label Date:?

patti - 05 Sep 2007 21:48 GMT
Hi Russ,

Thanks so much for your response and inquiries.  I'll do my best to
address them.  

First page :
-Client Name is in the very first paragraph, followed by paragraph symbol
For example: Patty Foobar1, paragraph symbol
- date string is contained somewhere prior to the page break
- date string format:
Date: -> mm/dd/yy (where -> is some Microsoft inserted symbol), followed by
paragraph symbol.  For example: Date: -> 6/26/07

Client data may, or may not span multiple pages.  
If there are multiple pages:
Client Name is in the first paragraph, but may be underlined or contain
extraneous information (eg. Patty Foobar1-6/26/07), followed by a paragraph
symbol

New Filename (Patty_Foobar1_6_26_07.doc) [named based upon first page]
- should contain everything in page 1 and subsequent pages, where applicable.

Header/Footer questions:
The pieces of information that will end up in the new client_name_date.doc
are
not located in the header or footer sections.

Many thanks once again for your help and interest.

Gratefully,
Patti
Doug Robbins - Word MVP - 08 Sep 2007 10:29 GMT
This is untested, but I think it will do what you want:

Sub splitter()
Dim Counter As Long, Pages As Long
Dim Source As Document, Target As Document
Dim ClientName As Range
Dim FileDate As Range
Dim DocName As String
Set Source = ActiveDocument
Selection.HomeKey Unit:=wdStory
Pages = Source.BuiltInDocumentProperties(wdPropertyPages)
Counter = 0
While Counter < Pages
   Counter = Counter + 1
   Source.Bookmarks("\Page").Range.Cut
   Set Target = Documents.Add
   Target.Range.Paste
   ' The first paragraph holds the client name; drop its paragraph mark
   Set ClientName = Target.Paragraphs(1).Range
   ClientName.End = ClientName.End - 1
   DocName = Replace(ClientName.Text, " ", "_")
   Target.Activate
   Selection.HomeKey wdStory
   Selection.Find.ClearFormatting
   With Selection.Find
       ' A date such as 7/18/07 immediately followed by a paragraph mark
       .Text = "[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2}^13"
       .Forward = True
       .Wrap = wdFindStop
       .MatchWildcards = True
   End With
   If Selection.Find.Execute Then
       Set FileDate = Selection.Range
       FileDate.End = FileDate.End - 1   ' drop the paragraph mark
       ' Keep the m_d_yy order asked for, e.g. Patty_Smith_9_4_07
       DocName = DocName & "_" & Replace(FileDate.Text, "/", "_")
   End If
   ' If no date was found, the file is saved under the client name alone
   Target.SaveAs FileName:=DocName
   Target.Close
Wend
End Sub

patti - 08 Sep 2007 15:00 GMT
Hi Doug,

This looks like it fits the bill exactly.  I particularly like the approach
you took to locate the date (With Selection.Find
                      .Text = "[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2}^13")

Now that's neat!  Many thanks once again to all for all your help.

Gratefully,
Patti

Russ - 09 Sep 2007 08:16 GMT
Patti,
Sorry, I didn't have time to work more on this until this weekend.
However, starting with Doug's basic premise, I tried to add the check for
the same client on multiple pages.

Sub Splitter()

Dim counter As Long, Source As Document, Target As Document
Dim blnFirstPage As Boolean, strDate As String, strClient As String
Dim lngPages As Long, strDocName As String, strClient2 As String
Dim myRange As Word.Range

Set Source = ActiveDocument
blnFirstPage = True
Selection.HomeKey Unit:=wdStory
lngPages = Source.BuiltInDocumentProperties(wdPropertyPages)
counter = 0
Application.ScreenUpdating = False
While counter < lngPages
   counter = counter + 1
   Source.Activate
   Source.Bookmarks("\Page").Range.Cut
   If blnFirstPage Then
       blnFirstPage = False
       Set Target = Documents.Add
       Target.Range.PasteAndFormat (wdFormatOriginalFormatting)
       strClient = Target.Paragraphs(1).Range.Text
       strClient = Left(strClient, Len(strClient) - 1)
       strClient2 = Replace(strClient, " ", "_")
       Set myRange = Target.Range.Duplicate
       With myRange.Find
           .Text = "[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
           .MatchWildcards = True
           .Execute
           If .Found Then
               strDocName = strClient2 & "_" & _
                   Replace(myRange.Text, "/", "_")
           Else
               MsgBox "Date not found for client: " & strClient
               Source.Undo
               Target.Undo
               Target.Close
               Exit Sub
           End If
       End With
   Else
       myRange.Start = Target.Range.End
       myRange.PasteAndFormat (wdFormatOriginalFormatting)
   End If
   If counter = lngPages Or InStr(Source.Paragraphs(1).Range.Text, _
           strClient) = 0 Then
       Do While Target.Paragraphs.Last.Range.Characters.Count = 1
           Target.Paragraphs.Last.Range.Delete
       Loop
       Target.SaveAs FileName:=strDocName
       blnFirstPage = True
       Target.Close
   End If
Wend
Application.ScreenUpdating = True
End Sub

Russ - 09 Sep 2007 09:02 GMT
Patti,
Also, if you want, you could add the three/four lines in message below to
unwind (undo) the original source document back to the beginning after it is
successfully split.

> Patti,
> Sorry, I didn't have time to work more on this until this weekend.
[quoted text clipped - 55 lines]
>     End If
> Wend
''''''''''''''''''''''''
Do While Source.Undo
Loop
Source.UndoClear
'Source.Saved = True 'uncomment to allow file to close without save prompt
''''''''''''''''''''''''
> Application.ScreenUpdating = True
> End Sub
patti - 09 Sep 2007 18:32 GMT
Hi,

You and Doug have been invaluable resources, I am so thrilled to have had
the opportunity to learn something new.  I very much appreciate all the
assistance.

Thanks once again for participating in this conference and for sharing your
time and talents with me.

With much appreciation and gratitude for the help,
Patti

aq4word - 16 Nov 2007 16:00 GMT
Hi Doug
Have just used your macro for splitting large files. Excellent. Thank you
for that. I have split a 240k file (a Scrabble dictionary that has about 50%
erroneous spellings per MS Spellcheck) into 43 smaller files (page 1, page 2,
etc.). I have then applied Greg Maxey's macro for deleting wrongly spelled
words (Thank you Greg) on just one file, (i.e. Page 1). Works fine, took
about 3 hours.
Question - Is it possible to batch process the other 42 files with Greg's
macro instead of doing them one by one?

Doug Robbins - Word MVP - 17 Nov 2007 20:29 GMT
You should be able to modify the code in the following article so that it
incorporates Greg's routine.

See the article "Find & ReplaceAll on a batch of documents in the same
folder" at:

http://www.word.mvps.org/FAQs/MacrosVBA/BatchFR.htm
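
As a rough idea of the shape such a batch routine tends to take (this is not the code from the article above, only a sketch under assumptions), a Dir loop can open each file in a folder and hand it to whatever routine does the real work. FolderPath, ProcessFolder and ProcessOneDocument are all placeholder names.

```vba
' Sketch only: FolderPath and ProcessOneDocument are placeholders.
Sub ProcessFolder()
    Const FolderPath As String = "C:\ClientFiles\"   ' assumed; ends with "\"
    Dim FileNames() As String
    Dim FileCount As Long
    Dim NextFile As String
    Dim Doc As Document
    Dim i As Long

    On Error GoTo CleanUp

    ' Collect the names first so that files saved while processing
    ' do not interfere with the Dir listing.
    NextFile = Dir$(FolderPath & "*.doc")
    Do While Len(NextFile) > 0
        FileCount = FileCount + 1
        ReDim Preserve FileNames(1 To FileCount)
        FileNames(FileCount) = NextFile
        NextFile = Dir$()
    Loop

    For i = 1 To FileCount
        Set Doc = Documents.Open(FileName:=FolderPath & FileNames(i))
        ProcessOneDocument Doc           ' hypothetical per-file routine
        Doc.Close SaveChanges:=wdSaveChanges
        Set Doc = Nothing
    Next i
    Exit Sub

CleanUp:
    MsgBox "Stopped on error " & Err.Number & ": " & Err.Description
    If Not Doc Is Nothing Then Doc.Close SaveChanges:=wdDoNotSaveChanges
End Sub

' Placeholder so the sketch compiles; replace the body with the real work.
Sub ProcessOneDocument(ByVal Doc As Document)
    Debug.Print Doc.Name
End Sub
```

Saving on close suits a routine that edits each file in place. For a splitter that cuts pages out of the source, closing with wdDoNotSaveChanges instead would leave the original files untouched.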

aq4word - 20 Nov 2007 13:13 GMT
Thanks for your timely response Doug. Much appreciated. I'm working on it.
(I'm a Newbie!)

Regards Brian

