MS Office Forum / Word / Programming / December 2004
Using Microsoft VBScript Regular Expressions 5.5
Joel Finkel - 01 Dec 2004 16:57 GMT
Folks,
I have a VBA script that I use to convert articles that I receive in Word format to simple (and I mean simple) HTML. The script simply performs a long series of search-and-replaces by using the Selection.Find object.
However, there are some strings that I need to convert that cannot be parsed without using a more robust regular expression parser. For example, I need to find a single line that comprises 8 or fewer words and that is followed by only a single ^p and replace it with "^p<h3>the words</h3>^p".
I added a reference to the Microsoft VBScript Regular Expressions 5.5 and created the following Sub:
Sub DoSectionHeaders(ByRef s As String)
    Set re = New RegExp
    re.Pattern = "\r(\w{1,8})\r{1}"
    re.IgnoreCase = True
    re.Global = True
    s = re.Replace(s, "\r<h3>\1</h3>\r")
End Sub
My question: How do I pass the entire formatted text into this Sub so it can be processed?
I tried this:
Call DoSectionHeaders(Selection.FormattedText)
But Selection.FormattedText is an empty string.
Thanks in advance for all suggestions. Oh yes, I am using Word 2003.
Joel Finkel finkel@sd-il.com
Peter - 01 Dec 2004 18:07 GMT
How are you setting up the Selection?
-Peter
Joel Finkel - 01 Dec 2004 18:15 GMT
I have done this:
Dim rng As Range
Set rng = ActiveDocument.Range(Start:=0, End:=ActiveDocument.Content.End)
Call DoSectionHeaders(rng.FormattedText)
The Sub now gets the formatted text, and the RegExp works. However, when the Call returns, rng.FormattedText is unchanged even though I explicitly pass it by reference.
/Joel Finkel finkel@sd-il.com
Peter - 01 Dec 2004 18:38 GMT
Pass it as a Range, not a String:
Dim rng As Range
Set rng = ActiveDocument.Range(Start:=0, End:=ActiveDocument.Content.End)
Call DoSectionHeaders(rng.FormattedText)
Sub DoSectionHeaders(ByRef r As Range)
    Set re = New RegExp
    re.Pattern = "\r(\w{1,8})\r{1}"
    re.IgnoreCase = True
    re.Global = True
    r.Text = re.Replace(r.Text, "\r<h3>\1</h3>\r")
End Sub
It also doesn't matter whether you pass ByRef or ByVal if you use the Range object. I think that applies to all objects, but my theory is a little rusty.
hth,
-Peter
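A minimal sketch of the ByRef/ByVal point, using a plain Collection so it runs without a document. The procedure names (AddItemByVal, AddItemByRef, DemoObjectPassing) are made up for illustration. As far as I understand VBA, an object variable holds a reference, so a change made through a ByVal parameter still reaches the caller's object; the two modes differ only when the procedure re-Sets the parameter. The earlier String version probably left the document unchanged because rng.FormattedText is an expression that VBA coerces to a temporary String, so the ByRef write went to that temporary.

```vba
Option Explicit

' ByVal: the procedure gets a copy of the reference, not a copy of the object.
Sub AddItemByVal(ByVal c As Collection)
    c.Add "added"            ' changes the caller's object
    Set c = New Collection   ' rebinds only the local copy of the reference
End Sub

' ByRef: the procedure works on the caller's variable itself.
Sub AddItemByRef(ByRef c As Collection)
    c.Add "added"            ' changes the caller's object
    Set c = New Collection   ' rebinds the caller's variable to an empty Collection
End Sub

Sub DemoObjectPassing()
    Dim a As Collection
    Dim b As Collection
    Set a = New Collection
    Set b = New Collection

    AddItemByVal a
    AddItemByRef b

    Debug.Print a.Count      ' the item added inside AddItemByVal is still there
    Debug.Print b.Count      ' b now points at the fresh Collection made inside AddItemByRef
End Sub
```

For a Range the same reasoning applies: assigning to r.Text inside the Sub edits the document whichever way r was passed.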
Joel Finkel - 01 Dec 2004 19:20 GMT
Brilliant! Where do I send the Guinness?
Of course, now I have problems with the RegExp, but that will have to wait until tonight.
Thanks.
/Joel
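On the RegExp problems: one likely cause is that \w{1,8} matches one to eight word characters, not one to eight words. A second, as far as I know, is that the VBScript RegExp replacement string uses $1-style references and does not expand \r or \1. Below is a hedged sketch of a pattern for "a line of at most eight words followed by exactly one paragraph mark". The function name MarkSectionHeaders is made up for illustration, and a "word" is assumed to be any run of non-whitespace characters.

```vba
Option Explicit

' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
' Hypothetical helper: wraps short lines in <h3> tags and returns the new text.
Function MarkSectionHeaders(ByVal s As String) As String
    Dim re As RegExp
    Set re = New RegExp

    ' (^|\r)                  start of the text or the previous paragraph mark
    ' ((?:\S+[ \t]+){0,7}\S+) one to eight words separated by spaces or tabs
    ' (?=\r(?!\r))            exactly one paragraph mark follows; it is not consumed,
    '                         so two short lines in a row can both match
    re.Pattern = "(^|\r)((?:\S+[ \t]+){0,7}\S+)(?=\r(?!\r))"
    re.Global = True

    ' $1 puts back the leading paragraph mark (or nothing at the start of the text).
    MarkSectionHeaders = re.Replace(s, "$1<h3>$2</h3>")
End Function

Sub DemoMarkSectionHeaders()
    Dim s As String
    s = "A short heading" & vbCr & _
        "This body paragraph clearly has more than eight words in it." & vbCr
    ' Show paragraph marks as | so the result fits on one line.
    Debug.Print Replace(MarkSectionHeaders(s), vbCr, "|")
End Sub
```

This tags every short line, including short body paragraphs, so it may need a stricter test for real articles. Writing the result back through Range.Text would also discard character formatting in that range.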
Peter - 01 Dec 2004 21:02 GMT
Mmm, Guinness. Good stuff, served correctly.
Looking at what you're doing, you _might_ be able to do away with the regexp and parse your document by paragraph. Perhaps the structure of your other processing won't fit with that, but it might make it a bit easier in this case (testing for the next paragraph made it messy):
Dim para As Paragraph
For Each para In ActiveDocument.Paragraphs
    With para.Range.FormattedText
        ' one thing to remember about a paragraph is that
        ' it includes the ending paragraph mark as a word
        If .Words.Count > 1 And .Words.Count <= 9 Then
            If Not para.Next Is Nothing Then
                If para.Next.Range.Words.Count > 1 Then
                    With .Words
                        .Item(1).InsertBefore "<h3>"
                        .Item(.Count - 1).InsertAfter "</h3>"
                    End With
                End If
            Else
                With .Words
                    .Item(1).InsertBefore "<h3>"
                    .Item(.Count - 1).InsertAfter "</h3>"
                End With
            End If
        End If
    End With
Next para
hth,
-Peter
Joel Finkel - 02 Dec 2004 01:04 GMT
Peter,
I can't thank you enough. Your example was close enough to teach me well, and I was able to modify it to do just what I need to do.
Shouldn't serve that Guinness too cold. Room temperature is cold enough.
/Joel Finkel finkel@sd-il.com
Peter - 02 Dec 2004 16:44 GMT
> I can't thank you enough. Your example was close enough to teach me well,
> and I was able to modify it to do just what I need to do.
Glad it worked for you. :-)
> Shouldn't serve that Guinness too cold. Room temperature is cold enough.
Definitely. And tap is the only way to go.
-Peter