MS Office Forum / Word / Programming / November 2007
NEWBIE:splitting multi-page word doc into single word doc - thank
patti - 02 Sep 2007 03:06 GMT
Hi,
Environment: Windows XP Home Edition, SP2; Office 2000 [no help installed and no CD to install help :-( ]
Experience: newbie to VBA, some C coding experience
This request is free of charge, for a friend who would do anything for anyone. His hard drive of client files was hosed, and he was able to retrieve the data from another source. Unfortunately, the retrieved files are not named properly and are incompatible with his business operations.
Problem description/vba coding segment requested:
Over 60 documents named xxxxxnnnn.doc contain client info, each delimited by more than 65 page breaks. Each section between page breaks holds one client's info, which needs to be extracted into its own Word document and named client_name_date_of_service.doc so it is easy to identify.
Relevant client information (including ultimate file name) is contained within each page break in the larger document. Last record in original file may not contain the page break, but I'd still like to be able to capture this one as well.
Filename containing relevant client information should be of the form client_name_date_of_service.doc (this information is contained within each page break)
Objective:
- Cycle through all the Word documents (approx. 60 files) in a given folder
- For each large document file:
  - open the file and start the splitting process
  - for each page break found (between 65 and 80 page breaks):
    - capture the first line of the page [this is the client name]
    - search each paragraph for the string "date"
    - generate the file name (e.g. joe_smith_1_24_07.doc) and save it in a string variable
    - select the contents of the page and copy the entire page, including the trailing page break [if absolutely necessary], into client_name_date_of_service.doc (e.g. joe_smith_1_24_07.doc)
  - next [for each page break until no more page breaks in this file]
    - note: the final client info may not contain the trailing page break, but I'd still like to be able to capture it and store it in its proper clientname_date_of_appt.doc
- next [for each file containing all the client data within the page breaks]
- close/properly dispose of any allocated resources
- error handler to determine the cause of a failure, close/dispose of resources, and properly shut down the application
- ?? any additional steps that I've neglected to mention
I enjoy helping people and learning new things. Many thanks to all who take the time to share their time and talents by responding with code capable of accomplishing this task.
With much gratitude and appreciation, Patti
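The outer loop in the objective above (visit every document in a folder, split it, then close it) might be sketched as follows. This is only a rough outline under stated assumptions: `SOURCE_FOLDER`, `SplitAllDocuments` and `SplitOneDocument` are hypothetical names, the folder path is made up, and the per-document work is left as a placeholder.

```vba
Option Explicit

' Assumed location of the large files; note the trailing backslash.
Const SOURCE_FOLDER As String = "C:\ClientFiles\"

Sub SplitAllDocuments()
    Dim fileNames() As String
    Dim fileCount As Long, i As Long
    Dim nextName As String
    Dim src As Document

    ' Collect the names first, so that files saved during the split
    ' cannot disturb the Dir enumeration.
    nextName = Dir(SOURCE_FOLDER & "*.doc")
    Do While Len(nextName) > 0
        ReDim Preserve fileNames(fileCount)
        fileNames(fileCount) = nextName
        fileCount = fileCount + 1
        nextName = Dir()
    Loop

    On Error GoTo CleanUp
    For i = 0 To fileCount - 1
        Set src = Documents.Open(FileName:=SOURCE_FOLDER & fileNames(i))
        SplitOneDocument src
        ' Closing without saving leaves the original file untouched,
        ' even if the split cut pages out of it.
        src.Close SaveChanges:=wdDoNotSaveChanges
        Set src = Nothing
    Next i
    Exit Sub

CleanUp:
    MsgBox "Stopped on " & fileNames(i) & ": " & Err.Description
    If Not src Is Nothing Then src.Close SaveChanges:=wdDoNotSaveChanges
End Sub

Sub SplitOneDocument(ByVal src As Document)
    ' Placeholder: make the document active, then call the
    ' page-splitting macro of your choice here.
    src.Activate
End Sub
```

Because each source document is closed without saving, a splitting routine that cuts pages out of it does not damage the file on disk.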
Doug Robbins - Word MVP - 02 Sep 2007 05:29 GMT
Sub splitter()
'
' splitter Macro
' Macro created 16-08-98 by Doug Robbins to save each page of a document
' as a separate file with the name Page#.DOC
'
    Dim Counter As Long, Pages As Long
    Dim DocName As String
    Dim Source As Document, Target As Document
    Set Source = ActiveDocument
    Selection.HomeKey Unit:=wdStory
    Pages = Source.BuiltInDocumentProperties(wdPropertyPages)
    Counter = 0
    While Counter < Pages
        Counter = Counter + 1
        DocName = "Page" & Format(Counter)
        ' "\Page" is Word's predefined bookmark for the page that holds the selection
        Source.Bookmarks("\Page").Range.Cut
        Set Target = Documents.Add
        Target.Range.Paste
        Target.SaveAs FileName:=DocName
        Target.Close
    Wend
End Sub
Hope this helps.
Please reply to the newsgroup unless you wish to avail yourself of my services on a paid consulting basis.
Doug Robbins - Word MVP
patti - 02 Sep 2007 15:52 GMT
Hi Doug,
Many thanks for your generous offer of code. This does indeed parse the larger file, breaking it down and writing it to a file formatted as page[n].doc.
The only downside is that, for every larger file processed, it overwrites the Page[n].doc files written for the previous one.
I've still quite a bit of work to do with this one though.
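One possible way around the overwriting, until the real client-based names are available, is to put the source document's own name in front of the page number. The helper below is a hypothetical illustration (`PageFileName` is not part of Doug's macro); it uses only standard VBA string functions.

```vba
' Build an output name such as "abcde1234_Page7.doc" from the name of
' the source document and a page number.
Function PageFileName(ByVal sourceName As String, ByVal pageNumber As Long) As String
    Dim baseName As String
    Dim dotPos As Long

    ' Strip the extension (the text after the last dot), if there is one.
    dotPos = InStrRev(sourceName, ".")
    If dotPos > 1 Then
        baseName = Left$(sourceName, dotPos - 1)
    Else
        baseName = sourceName
    End If

    PageFileName = baseName & "_Page" & CStr(pageNumber) & ".doc"
End Function
```

Inside the loop, the line that sets DocName could then read `DocName = PageFileName(Source.Name, Counter)`, so pages from different source files no longer collide.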
The bigger piece for me would be to locate two important pieces of information, namely:
- the first paragraph or sentence, as this contains the client name
- a paragraph/sentence starting with the string "date" somewhere on the page, as mentioned in my original post
These two critical pieces of information form the name client_name_date_of_service.doc, which is the naming scheme this person's business relies on. Once these pieces are located, I can copy the contents of the client transaction (the code you posted) and then perform a 'file save as: client_name_date_of_service.doc'. As an example:
joe_smith_1_24_07.doc [this is the first client located in the larger file]
frank_hood_1_31_06.doc [this is the second client located in the larger file], etc.
If you, or anyone else, has any ideas on how to gather these two pieces of information, located between each page break, I'd really appreciate it. This way, the files will be named properly and in keeping with his requirements.
I so appreciate you sharing this code segment. If you have any additional suggestions as to how to extract these two important pieces of info, while in the parsing of each page break, I'd really appreciate it.
Thanks ever so much for your help. With much gratitude and appreciation, Patti
Russ - 04 Sep 2007 10:55 GMT
Patti,
What we need to search for are consistent patterns that you say are on each page. Can you figure out what the patterns are? If not, then show us a few pages of data so that we can see how it is laid out. You can, of course, disguise the names, etc., but we need to know where the names and date formats are in relation to paragraph marks or other consistent text, font, color, heading styles, etc.
Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
patti - 05 Sep 2007 16:06 GMT
Hi Russ,
Thanks so much for your interest. I really appreciate it.
Description: Split large files into separate files.
- Open each large Word .doc file containing client info.
- Capture two fields:
  - Client[n]_name (e.g. Patty Smith)
  - Date: (e.g. Date: 9/4/07)
- Filename generated: Patty_Smith_9_4_07
- Client information/requirements are captured in various paragraphs, which may extend beyond one page.
- Cut the specific client information (pages [1-n]) and save as client_name_date_as_recorded_in_page1 (e.g. Patty_Foober1_7_18_07)
Sample included below for reference:
Sample:
Patty Foober1
Address
Telephone Number
Date: 7/18/07 [this may or may not be located in this area of the file]
Client information/requirement captured here and may extend into multiple pages.
PAGE 2
Patty Foobar1
Additional requirements may be captured here
--------------------------------- page break -----------------------------------
Rob Foobar
Date: 9/4/07 [the date field appears somewhere in the client information header, but the person who input the data was not consistent in their entry methods, which means it needs to be searched for and retrieved]
Client information/requirement captured here and may extend into multiple pages.
--------------------------------- page break -----------------------------------
Kanga Roo
Date: 9/1/07
Client information/requirement captured here
--------------------------------- page break -----------------------------------
Thanks again for any recommendations, and for sharing your time and talents with me.
With much gratitude, Patti
====================================================
Russ - 05 Sep 2007 19:19 GMT
Hi Patti,
More info please.
- So a client's name appears *by itself* (no label) in the first paragraph of each page (and may repeat on *consecutive pages* if more information is available for that particular client)? And is that consistent from page 1 to the end of the document?
- The date you want is always the first date found on the first page of each client, and always formatted month/day/year (two-digit year)?
- The name and date are always in the main text area and not in the header or footer of the page?
- The name and date are not formatted differently from the rest of the text?
- The date is always preceded by the label Date:?
Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
patti - 05 Sep 2007 21:48 GMT
Hi Russ,
Thanks so much for your response and inquiries. I'll do my best to address them.
First page:
- Client name is in the very first paragraph, followed by a paragraph symbol. For example: Patty Foobar1, paragraph symbol
- Date string is contained somewhere prior to the page break
- Date string format: Date: -> mm/dd/yy (where -> is some Microsoft-inserted symbol), followed by a paragraph symbol. For example: Date: -> 6/26/07
Client data may or may not span multiple pages. If there are multiple pages:
- Client name is in the first paragraph, but may be underlined or contain extraneous information (e.g. Patty Foobar1-6/26/07, followed by a paragraph symbol)
New filename (Patty_Foobar1_6_26_07.doc) [named based upon the first page]:
- should contain everything in page 1 and subsequent pages, where applicable.
Header/Footer questions: The pieces of information that will end up in the new client_name_date.doc are not located in the header or footer sections.
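Given the layout described above, turning the two captured strings into the target file name is plain string handling. A minimal sketch, assuming the name and date have already been extracted as text; `ClientFileName` is a hypothetical helper name, not something from the thread.

```vba
' Combine a client name and an m/d/yy date string into a file name
' of the form Patty_Foobar1_6_26_07.doc.
Function ClientFileName(ByVal clientName As String, ByVal serviceDate As String) As String
    Dim namePart As String
    Dim datePart As String

    ' Drop any paragraph mark picked up with the first paragraph,
    ' then turn spaces into underscores.
    namePart = Replace(clientName, vbCr, "")
    namePart = Replace(Trim$(namePart), " ", "_")

    ' Slashes are not allowed in file names, so swap them as well.
    datePart = Replace(Trim$(serviceDate), "/", "_")

    ClientFileName = namePart & "_" & datePart & ".doc"
End Function
```

For example, `ClientFileName("Patty Foobar1", "6/26/07")` gives Patty_Foobar1_6_26_07.doc. Names containing other characters that Windows rejects in file names (such as a colon or question mark) would need further cleaning, which this sketch does not attempt.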
Many thanks once again for your help and interest.
Gratefully, Patti =====================================================
Doug Robbins - Word MVP - 08 Sep 2007 10:29 GMT
This is untested, but I think it will do what you want:
Sub splitter()
    Dim Counter As Long, Pages As Long
    Dim Source As Document, Target As Document
    Dim ClientName As Range
    Dim FileDate As Range
    Dim DocName As String
    Set Source = ActiveDocument
    Selection.HomeKey Unit:=wdStory
    Pages = Source.BuiltInDocumentProperties(wdPropertyPages)
    Counter = 0
    While Counter < Pages
        Counter = Counter + 1
        Source.Bookmarks("\Page").Range.Cut
        Set Target = Documents.Add
        Target.Range.Paste
        ' The first paragraph holds the client name; drop its paragraph mark
        Set ClientName = Target.Paragraphs(1).Range
        ClientName.End = ClientName.End - 1
        ' Replace the first space in the name with an underscore
        DocName = Left(ClientName.Text, InStr(ClientName.Text, " ") - 1) & "_" & _
            Mid(ClientName.Text, InStr(ClientName.Text, " ") + 1)
        Target.Activate
        Selection.HomeKey wdStory
        Selection.Find.ClearFormatting
        ' Wildcard search for m/d/yy immediately before a paragraph mark (^13)
        With Selection.Find
            .Text = "[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2}^13"
            .Forward = True
            .Wrap = wdFindStop
            .MatchWildcards = True
        End With
        Selection.Find.Execute
        Set FileDate = Selection.Range
        FileDate.End = FileDate.End - 1
        ' Keep the date as typed (month first) and separate it from the name,
        ' giving e.g. Patty_Foobar1_6_26_07 as requested
        DocName = DocName & "_" & Replace(FileDate.Text, "/", "_")
        Target.SaveAs FileName:=DocName
        Target.Close
    Wend
End Sub
Hope this helps.
Doug Robbins - Word MVP
patti - 08 Sep 2007 15:00 GMT
Hi Doug,
This looks like it fits the bill exactly. I particularly like the approach you took to locate the date (With Selection.Find .Text = "[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2}^13")
Now that's neat! Many thanks once again to all for all your help.
Gratefully, Patti
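For readers new to Word's wildcard searches: in the pattern above, `[0-9]{1,2}` matches one or two digits, the slash is a literal separator, `[0-9]{2}` matches a two-digit year, and `^13` is the paragraph mark. The sketch below shows the same idea with a Range instead of the Selection, so the cursor does not move. `FirstDateText` is a hypothetical helper name, and the pattern here deliberately leaves out the trailing `^13`, which is an assumption about what is wanted rather than Doug's exact search.

```vba
' Return the first m/d/yy style date found in the main text of a
' document, or an empty string if there is none.
Function FirstDateText(ByVal doc As Document) As String
    Dim rng As Range

    Set rng = doc.Content
    With rng.Find
        .ClearFormatting
        .Text = "[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        If .Execute Then
            ' After a successful Find, the range covers the matched text.
            FirstDateText = rng.Text
        Else
            FirstDateText = ""
        End If
    End With
End Function
```

Without the paragraph-mark anchor, a four-digit year such as 6/26/2007 would be matched only as far as 6/26/20, so Doug's anchored version is the stricter of the two when every date ends its paragraph.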
Russ - 09 Sep 2007 08:16 GMT
Patti,
Sorry, I didn't have time to work more on this until this weekend. However, starting with Doug's basic premise, I tried to add the check for the same client on multiple pages.
Sub Splitter()
    Dim counter As Long, Source As Document, Target As Document
    Dim blnFirstPage As Boolean, strDate As String, strClient As String
    Dim lngPages As Long, strDocName As String, strClient2 As String
    Dim myRange As Word.Range

    Set Source = ActiveDocument
    blnFirstPage = True
    Selection.HomeKey Unit:=wdStory
    lngPages = Source.BuiltInDocumentProperties(wdPropertyPages)
    counter = 0
    Application.ScreenUpdating = False
    While counter < lngPages
        counter = counter + 1
        Source.Activate
        Source.Bookmarks("\Page").Range.Cut
        If blnFirstPage Then
            ' First page of a client: start a new document and work out its name
            blnFirstPage = False
            Set Target = Documents.Add
            Target.Range.PasteAndFormat (wdFormatOriginalFormatting)
            strClient = Target.Paragraphs(1).Range.Text
            strClient = Left(strClient, Len(strClient) - 1)
            strClient2 = Replace(strClient, " ", "_")
            Set myRange = Target.Range.Duplicate
            With myRange.Find
                .Text = "[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
                .MatchWildcards = True
                .Execute
                If .Found Then
                    strDocName = strClient2 & "_" & _
                        Replace(myRange.Text, "/", "_")
                Else
                    MsgBox "Date not found for client: " & strClient
                    Source.Undo
                    Target.Undo
                    Target.Close
                    Application.ScreenUpdating = True
                    Exit Sub
                End If
            End With
        Else
            ' Continuation page: append it to the end of the current client's document
            myRange.Start = Target.Range.End
            myRange.PasteAndFormat (wdFormatOriginalFormatting)
        End If
        ' Save when the source is exhausted or the next page starts a different client
        If counter = lngPages Or InStr(Source.Paragraphs(1).Range.Text, _
                strClient) = 0 Then
            ' Trim empty trailing paragraphs
            Do While Target.Paragraphs.Last.Range.Characters.Count = 1
                Target.Paragraphs.Last.Range.Delete
            Loop
            Target.SaveAs FileName:=strDocName
            blnFirstPage = True
            Target.Close
        End If
    Wend
    Application.ScreenUpdating = True
End Sub
Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
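The multi-page check in the macro above depends on whether the first paragraph of the next page still contains the current client's name. That test can be pulled out into a small function so it is easy to try on sample strings. This is a sketch only: `IsContinuationPage` is a hypothetical name, and the case-insensitive comparison is an added assumption, since the macro itself uses a plain InStr.

```vba
' True when the first paragraph of a page still mentions the current
' client, e.g. "Patty Foobar1-6/26/07" for client "Patty Foobar1".
Function IsContinuationPage(ByVal firstParagraph As String, _
                            ByVal clientName As String) As Boolean
    Dim wanted As String

    wanted = Trim$(clientName)
    ' An empty name would match everything, so treat it as "no match".
    If Len(wanted) = 0 Then Exit Function

    IsContinuationPage = (InStr(1, firstParagraph, wanted, vbTextCompare) > 0)
End Function
```

One caveat with any contains-style test: if the next client's name happens to include the previous client's name in full, the two would be merged into one file, so the output is worth spot-checking.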
Russ - 09 Sep 2007 09:02 GMT
Patti,
Also, if you want, you could add the three or four lines shown below to unwind (undo) the original source document back to the beginning after it is successfully split.
> End If
> Wend
''''''''''''''''''''''''
Do While Source.Undo
Loop
Source.UndoClear
'Source.Saved = True 'uncomment to allow file to close without save prompt
''''''''''''''''''''''''
> Application.ScreenUpdating = True
> End Sub
Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
patti - 09 Sep 2007 18:32 GMT
Hi,
You and Doug have been invaluable resources, and I am so thrilled to have had the opportunity to learn something new. I very much appreciate all the assistance.
Thanks once again for participating in this conference and for sharing your time and talents with me.
With much appreciation and gratitude for the help, Patti
aq4word - 16 Nov 2007 16:00 GMT
Hi Doug,
I have just used your macro for splitting large files. Excellent. Thank you for that. I have split a 240k file (a Scrabble dictionary, about 50% of which MS Spellcheck flags as misspelled) into 43 smaller files (page 1, page 2, etc.). I then applied Greg Maxey's macro for deleting wrongly spelled words (thank you, Greg) to just one file (i.e. Page 1). It works fine and took about 3 hours. Question: is it possible to batch process the other 42 files with Greg's macro instead of doing them one by one?
Doug Robbins - Word MVP - 17 Nov 2007 20:29 GMT
You should be able to modify the code in the following article so that it incorporates Greg's routine.
See the article "Find & ReplaceAll on a batch of documents in the same folder" at:
http://www.word.mvps.org/FAQs/MacrosVBA/BatchFR.htm
Hope this helps.
Doug Robbins - Word MVP
aq4word - 20 Nov 2007 13:13 GMT
Thanks for your timely response, Doug. Much appreciated. I'm working on it. (I'm a newbie!)
Regards Brian