MS Office Forum / Word / Programming / November 2007
How to replace a string of characters with the count of the charac
|
|
Thread rating:  |
Ancient Brit - 16 Oct 2007 20:01 GMT I have what seemed at first to be a trivial task for a Word 2003 SP2 macro (VB 6.3).
Given a body of text containing a range of characters (letters (upper and lower case), digits, punctuation, spaces), all but the letters A-Z need to be removed, then the resulting text needs to be sorted, and finally, the count of each letter should replace each block of sorted letters.
So: “I wandered lonely as a cloud, that floats on high o’er vales and hills, etc., etc., with a few 12345 thrown in for good measure!” becomes penultimately: “AAAAAAAAACCCDDDDDEEEEEEEEEEFFFGGHHHHHHIIIIILLLLLLLMNNNNNNOOOOOOOOORRRRRSSSSSTTTTTTTUUVWWWWY” and then finally: 9 0 3 5 10 3 2 6 5 0 0 7 1 6 9 0 0 5 5 7 2 1 4 0 1 0. (The zeros appear where a letter in the sequence is missing and so the count for that letter is zero).
I did some searching around and found very useful information on the use of Search/Replace with wildcards from Graham Mayor and Klaus Linke at word.mvps.org (excellent job – thank you!. I’ve been using MS Word for probably 15 years and I still find something to learn :))
My initial code worked OK – my approach was to first select the entire body of text and render it upper case, then use Search/Replace with the FIND wildcard sequence [!A-Z] and the REPLACE sequence null to reduce the text to solely A-Z.
A subsequent Search/Replace on the text added a carriage return after every character, the result was sorted, followed by another Search/Replace to remove all the carriage returns. (There may be a quicker/simpler way but I’m not aware of it.)
When it came to replacing each block of the same letter with its count, I hit a snag. Try as I might, I cannot find a simple programmatic way to do what I want.
I thought I had a solution when I tested a manual approach, using FIND with Highlight checked (so the count is returned, but more importantly the block of matching text is selected on exiting FIND, so that – I thought – I could just replace the selection with the contents of Selection.Characters.Count (and add a space as a separator).
Not so. What works manually doesn’t appear to work in a macro.
If I create a macro (even if I record one) that uses FIND to locate and select all matching characters, upon completion only the first character in the group is selected, whether I use Selection or Range.
I haven’t found a bug report that describes the FIND problem – yet – and there are clearly more complex workarounds that I could devise, but I’d prefer to keep the solution minimal and simple if I can. I’d be very grateful for some guidance, even if it’s to say: “Use a workaround; FIND is bugged.”
Best, Peter
Example code: Sub M8() ' ' Macro M8 created 10/15/2007 by Peter GQ Brooks ' ' Sort the text. Simplest way is to begin by making everything upper case (A-Z), then use ' Search/Replace to remove everything that is NOT in the range A-Z (use wildcards and the ' expression [!A-Z] for the FIND and null for the REPLACE). ' Then replace every character with itself plus a carriage return, making each character a ' line on its own, then sort, then delete all carriage returns (replace every carriage return ' with a null).
' Select the entire document. ' Change case to upper.
Selection.WholeStory Selection.Range.Case = wdUpperCase ' Ensure Find/Replace boxes have no prior formatting to impede process Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting ' Dump everything that isn't in the range A to Z.
With Selection.Find .Text = "[!A-Z]" .Replacement.Text = "" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = True .MatchWholeWord = False .MatchWildcards = True .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll
' Now look for "any single character" and replace it with the same character and a carriage return
With Selection.Find .Text = "^?" .Replacement.Text = "^&^p" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll ' Sort entire document by paragraphs Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdEnglishUS, SubFieldNumber:="Paragraphs", SubFieldNumber2:= _ "Paragraphs", SubFieldNumber3:="Paragraphs" ' Remove all carriage returns after sorting. Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "^p" .Replacement.Text = "" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll ' Loop from A to Z and replace with count of character
For asciipointer = 65 To 90 Selection.Find.ClearFormatting With Selection.Find .Text = Chr(asciipointer) .Replacement.Text = "" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With ' Experimental section, trying various solutions:
' Set myRange = ActiveDocument.Content ' myRange.Find.Execute FindText:=Chr(asciipointer), Forward:=True ' Stop ' If myRange.Find.Found = True Then myRange.Text = Selection.Characters.Count ' Stop
' Selection.Find.Execute ' 'Selection.Delete ' Stop ' MsgBox (Selection.Characters.Count) ' Selection.InsertBefore Selection.Characters.Count ' Stop ' 'Selection.Replace Next asciipointer
End Sub
Jay Freedman - 16 Oct 2007 22:00 GMT Hi Peter,
I prefer to work with Range objects instead of the Selection whenever possible, but in general this working example follows your outline except for the final loop. I did have some trouble at first while trying to use the same range for the loop as I had used for the preceding manipulations -- it seemed unable to find anything -- but it all cleared immediately when I started with a fresh range (oRg2) at that point.
Note that there's an entirely different scheme at http://www.word.mvps.org/FAQs/MacrosVBA/NoTimesTextInDoc.htm, and another at http://www.word.mvps.org/FAQs/MacrosVBA/GetNoOfReplacements.htm. With either of those, it would be better to record the resulting numbers in a separate document, instead of in-place replacement.
Sub CountChars() Dim oRg As Range, oRg2 As Range Dim CharNum As Long
Set oRg = ActiveDocument.Range
oRg.Case = wdUpperCase
With oRg.Find .Format = False .Forward = True .Wrap = wdFindStop .MatchWildcards = True
.Text = "[!A-Z]" .Replacement.Text = "" .Execute Replace:=wdReplaceAll
.Text = "([A-Z])" .Replacement.Text = "\1^p" .Execute Replace:=wdReplaceAll End With
oRg.Sort
Set oRg = ActiveDocument.Range With oRg.Find .Format = False .Forward = True .Wrap = wdFindStop .MatchWildcards = False
.Text = "^p" .Replacement.Text = "" .Execute Replace:=wdReplaceAll End With
Set oRg2 = ActiveDocument.Range With oRg2.Find .Format = False .Forward = True .Wrap = wdFindStop .MatchWildcards = True For CharNum = Asc("A") To Asc("Z") .Text = "([" & Chr$(CharNum) & "]{1,})" If .Execute Then oRg2.Text = oRg2.Characters.Count & " " Else oRg2.Text = "0 " End If oRg2.Collapse wdCollapseEnd Next CharNum End With End Sub
 Signature Regards, Jay Freedman Microsoft Word MVP FAQ: http://word.mvps.org Email cannot be acknowledged; please post all follow-ups to the newsgroup so all may benefit.
> I have what seemed at first to be a trivial task for a Word 2003 SP2 > macro (VB 6.3). [quoted text clipped - 185 lines] > > End Sub Ancient Brit - 16 Oct 2007 23:14 GMT Hi Jay
I wrote a moderately long reply but it seems to have gone AWOL - not sure of the lag time between sending in something and having it appear in the threads...
Anyway, at the risk of duplicating posts, I wanted to make sure I registered my thanks and appreciation for your solution - my VB skills are very rusty but I can appreciate a bit of crisp coding when I see it :)
In the interim I'd played around with MoveEndWhile and MoveStart but found that Selection.Characters.Count somehow ended up with a value of 1 if nothing was found, leading to an incorrect output. But since your solution is not only correct but much faster than mine I'm happy to cease pursuing that line of investigation :)
My next task is to make the output a little more sophisticated (read: complicated), by (i) converting runs of zeros into letters (A for 0, B for 0 0, C for 0 0 0, etc), (ii) closing the gaps between the resulting letters and numbers (but retaining spaces between numbers), so:
9 0 3 5 10 3 2 6 5 0 0 7 1 6 9 0 0 5 5 7 2 1 4 0 1 0
becomes:
9A3 5 10 3 2 6 5B7 1 6 9B5 5 7 2 1 4A1A
and finally (iii) dividing the [A-Z] source text into blocks of 125 characters if it exceeds 125, and processing each block independently.
I'm going to give that a bash but I suspect I may be back again before long...:)
Once again, many thanks for your help.
Best,
Peter
> Hi Peter, > [quoted text clipped - 255 lines] > > > > End Sub Russ - 17 Oct 2007 07:46 GMT Peter, Another way to get a speedy letter count without changing the document is to use my new favorite function combination that I first saw used by Helmut Weber. Using Jay's method to loop through the alphabet:
Public Sub LetterCount() Dim CharNum As Long For CharNum = Asc("A") To Asc("Z") MsgBox Chr$(CharNum) & " = " & UBound(Split(ActiveDocument.Content, _ Chr$(CharNum), vbTextCompare)) Next CharNum End Sub
The vbTextCompare tells it to ignore case, otherwise use vbBinaryCompare. Another neat thing about the Ubound(Split()) is that it can also count longer strings such as words or phrases.
> Hi Jay > [quoted text clipped - 295 lines] >>> >>> End Sub
 Signature Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
Russ - 17 Oct 2007 08:23 GMT Peter, This gets you closer to what you wanted:
Public Sub AlphabetCountString() Dim CharNum As Long Dim AlpahbetCountString As String For CharNum = Asc("A") To Asc("Z") AlpahbetCountString = AlpahbetCountString & UBound(Split(ActiveDocument.Content, _ Chr$(CharNum), vbTextCompare)) & " " Next CharNum AlpahbetCountString = Replace(AlpahbetCountString, " 0 0 0 ", "C") AlpahbetCountString = Replace(AlpahbetCountString, " 0 0 ", "B") AlpahbetCountString = Replace(AlpahbetCountString, " 0 ", "A") MsgBox AlpahbetCountString End Sub
> Peter, > Another way to get a speedy letter count without changing the document is to [quoted text clipped - 134 lines] >>>> hills, etc., etc., with a few 12345 thrown in for good measure!" >>>> becomes penultimately: "AAAAAAAAACCCDDDDDEEEEEEEEEEFFFGGHHHHHHIIIIILLLLLLLMNNNNNNOOOOOOOOORRRRRSSS>>>> S
>>>> STTTTTTTUUVWWWWY" >>>> and then finally: 9 0 3 5 10 3 2 6 5 0 0 7 1 6 9 0 0 5 5 7 2 1 4 0 1 [quoted text clipped - 173 lines] >>>> >>>> End Sub
 Signature Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
Russ - 17 Oct 2007 08:39 GMT Peter, Fixed the typo is the spelling of Alphabet below and added choice of selected text or whole document content, if nothing is selected:
Public Sub AlphabetCountString() Dim CharNum As Long Dim AlphabetCountString As String Dim aRange As Word.Range
If Selection.Type = wdSelectionIP Then 'No text selected? Set aRange = ActiveDocument.Content 'then work on whole main body Else Set aRange = Selection.Range End If
For CharNum = Asc("A") To Asc("Z") AlphabetCountString = AlphabetCountString & UBound(Split(aRange, _ Chr$(CharNum), vbTextCompare)) & " " Next CharNum AlphabetCountString = Replace(AlphabetCountString, " 0 0 0 ", "C") AlphabetCountString = Replace(AlphabetCountString, " 0 0 ", "B") AlphabetCountString = Replace(AlphabetCountString, " 0 ", "A") MsgBox AlphabetCountString End Sub
> Peter, > This gets you closer to what you wanted: [quoted text clipped - 156 lines] >>>>> hills, etc., etc., with a few 12345 thrown in for good measure!" >>>>> becomes penultimately: "AAAAAAAAACCCDDDDDEEEEEEEEEEFFFGGHHHHHHIIIIILLLLLLLMNNNNNNOOOOOOOOORRRRRSSS>>>>
> S >>>>> STTTTTTTUUVWWWWY" [quoted text clipped - 174 lines] >>>>> >>>>> End Sub
 Signature Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
Ancient Brit - 17 Oct 2007 15:50 GMT Hi Russ
Thanks for your input - it's always useful to see different ways of achieving the same goal.
How would you generalise the Replace(AlphabetCountString, " 0 ", "A") ... Replace(AlphabetCountString, " 0 0 0 ", "C") ..., bearing in mind that the theoretical space allows for coding a run of 26 zeros (i.e Z)?
Best,
Peter
> Peter, > Fixed the typo is the spelling of Alphabet below and added choice of [quoted text clipped - 277 lines] > >>>>> ' Now look for "any single character" and replace it with the same > >>>>> character and a carriage return Russ - 17 Oct 2007 18:25 GMT Peter, This should work:
Public Sub AlphabetCountString() Dim CharNum As Long Dim AlphabetCountString As String Dim aRange As Word.Range Dim ZeroString As String If Selection.Type = wdSelectionIP Then 'No text selected? Set aRange = ActiveDocument.Content 'then work on whole main body Else Set aRange = Selection.Range End If
For CharNum = Asc("A") To Asc("Z") AlphabetCountString = AlphabetCountString & UBound(Split(aRange, _ Chr$(CharNum), vbTextCompare)) & " " Next CharNum ZeroString = " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" For CharNum = 26 To 1 Step -1 AlphabetCountString = Replace(AlphabetCountString, _ Left(ZeroString, CharNum * 2) & " ", Chr$(64 + CharNum)) Next CharNum MsgBox AlphabetCountString End Sub
> Hi Russ > [quoted text clipped - 297 lines] >>>>>>> ' Now look for "any single character" and replace it with the same >>>>>>> character and a carriage return
 Signature Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
Ancient Brit - 17 Oct 2007 21:26 GMT Hi Russ
Hmmm. The routine terminates with a msgbox that says "0Y" which seems like it's proclaiming success ("OY!") but I have my doubts.
I'm assuming that it's designed to work on a string such as:
9 0 3 5 10 3 2 6 5 0 0 7 1 6 9 0 0 5 5 7 2 1 4 0 1 0
and to output:
9A3 5 10 3 2 6 5B7 1 6 9B5 5 7 2 1 4A1A.
I thought that I'd misunderstood the purpose and that maybe it was intended to be applied to the original plain text, so I ran it against:
I wandered lonely as a cloud, that floats on high o’er vales and hills, etc., etc., with a few 12345 thrown in for good measure!
But still it says "0Y"
Or is it complaining: "Oh, why?" :)
Best,
Peter
> Peter, > This should work: [quoted text clipped - 273 lines] > >>>>>>> be very grateful for some guidance, even if it's to say: "Use a > >>>>>>> workaround; FIND is bugged." Russ - 18 Oct 2007 06:05 GMT Peter, I added a space character before the AlphabetCountString, and it seems to test OK for me now.
To get closer to your last stated wish to work on blocks of text, I also show the subroutine changed into a function ( where you supply it with a range argument and it returns your filtered string output ).
Then I show you three different ways to use the function with some test subroutines. +++++++++++++++++++++++++++++++++ Public Sub AlphabetCountString() Dim CharNum As Long Dim AlphabetCountString As String Dim aRange As Word.Range Dim ZeroString As String
If Selection.Type = wdSelectionIP Then 'No text selected? Set aRange = ActiveDocument.Content 'then work on whole main body Else Set aRange = Selection.Range End If
For CharNum = Asc("A") To Asc("Z") AlphabetCountString = AlphabetCountString & UBound(Split(aRange, _ Chr$(CharNum), vbTextCompare)) & " " Next CharNum ZeroString = " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" AlphabetCountString = " " & AlphabetCountString For CharNum = 26 To 1 Step -1 AlphabetCountString = Replace(AlphabetCountString, _ Left(ZeroString, CharNum * 2) & " ", Chr$(64 + CharNum)) Next CharNum MsgBox Trim(AlphabetCountString) End Sub +++++++++++++++++++++++++++++++++ Public Function AlphabetCountStringF(aRange As Word.Range) As String Dim CharNum As Long Dim AlphabetCountString As String Dim ZeroString As String
For CharNum = Asc("A") To Asc("Z") AlphabetCountString = AlphabetCountString & UBound(Split(aRange, _ Chr$(CharNum), vbTextCompare)) & " " Next CharNum ZeroString = " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" AlphabetCountString = " " & AlphabetCountString For CharNum = 26 To 1 Step -1 AlphabetCountString = Replace(AlphabetCountString, _ Left(ZeroString, CharNum * 2) & " ", Chr$(64 + CharNum)) Next CharNum AlphabetCountStringF = Trim(AlphabetCountString) End Function +++++++++++++++++++++++++++++++++
Public Sub TestAlphabetCountStringF1() MsgBox AlphabetCountStringF(ActiveDocument.Content) End Sub +++++++++++++++++++++++++++++++++ Public Sub TestAlphabetCountStringF2() MsgBox AlphabetCountStringF(Selection.Range) End Sub +++++++++++++++++++++++++++++++++ Public Sub TestAlphabetCountStringF3() Dim aRange As Word.Range If Selection.Type = wdSelectionIP Then 'No text selected? Set aRange = ActiveDocument.Content 'then work on whole main body Else Set aRange = Selection.Range End If MsgBox AlphabetCountStringF(aRange) End Sub
> Hi Russ > [quoted text clipped - 310 lines] >>>>>>>>> be very grateful for some guidance, even if it's to say: "Use a >>>>>>>>> workaround; FIND is bugged."
 Signature Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
Russ - 18 Oct 2007 10:28 GMT Peter, This subroutine can call the aforementioned AlphabetCountStringF() function with a block of text length set by StepNum. It loops through the whole document if nothing is selected or works on the selected text. You can put your code in, where the MsgBox line is, to work with the results received from the function.
Public Sub TestAlphabetCountStringF4() Dim aRange As Word.Range Dim aRange2 As Word.Range Dim CharNum As Long Const StepNum = 125
If Selection.Type = wdSelectionIP Then 'No text selected? Set aRange = ActiveDocument.Content 'then work on whole main body Else Set aRange = Selection.Range End If Set aRange2 = aRange.Duplicate For CharNum = aRange.Start To aRange.End Step StepNum aRange2.SetRange Start:=ActiveDocument.Range(CharNum, CharNum).End _ , End:=ActiveDocument.Range(CharNum, CharNum).End If aRange.End <= aRange2.Start + StepNum Then aRange2.End = aRange.End Else aRange2.End = aRange2.Start + StepNum End If If aRange2.Start <> aRange2.End Then MsgBox AlphabetCountStringF(aRange2) 'work with function results End If Next CharNum End Sub
> Peter, > I added a space character before the AlphabetCountString, and it seems to [quoted text clipped - 341 lines] >>>>>>>>>> hills, etc., etc., with a few 12345 thrown in for good measure!" >>>>>>>>>> becomes penultimately: "AAAAAAAAACCCDDDDDEEEEEEEEEEFFFGGHHHHHHIIIIILLLLLLLMNNNNNNOOOOOOOOORRRRRSS>>>>> S
>>>>>> S >>>>>>>>>> STTTTTTTUUVWWWWY" [quoted text clipped - 39 lines] >>>>>>>>>> be very grateful for some guidance, even if it's to say: "Use a >>>>>>>>>> workaround; FIND is bugged."
 Signature Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
Ancient Brit - 05 Nov 2007 22:16 GMT Hi Russ
Sorry for the lack of feedback - high urgency task overtook everything else, and will continue for some time yet. I'll get back with a report ASAP.
Best,
Peter
> Peter, > This subroutine can call the aforementioned AlphabetCountStringF() function [quoted text clipped - 281 lines] > >>>>>>>> > >>>>>>>> Peter
|
|
|