MS Office Forum / Word / Programming / December 2004


Using Microsoft VBScript Regular Expressions 5.5

Joel Finkel - 01 Dec 2004 16:57 GMT
Folks,

I have a VBA script that I use to convert articles that I receive in Word format to simple (and I mean simple) HTML.  The script simply performs a long series of search-and-replaces by using the Selection.Find object.

However, there are some strings that I need to convert that cannot be parsed without using a more robust regular expression parser.  For example, I need to find a single line that comprises 8 or fewer words and that is followed by only a single ^p and replace it with "^p<h3>the words</h3>^p".

I added a reference to the Microsoft VBScript Regular Expressions 5.5 and created the following Sub:

Sub DoSectionHeaders(ByRef s As String)

   Set re = New RegExp
   
   re.Pattern = "\r(\w{1,8})\r{1}"
   re.IgnoreCase = True
   re.Global = True
   
   s = re.Replace(s, "\r<h3>\1</h3>\r")
   
End Sub

My question: How do I pass the entire formatted text into this Sub so it can be processed?  

I tried this:

   Call DoSectionHeaders(Selection.FormattedText)

But Selection.FormattedText is an empty string.

Thanks in advance for all suggestions.  Oh yes, I am using Word 2003.

Joel Finkel
finkel@sd-il.com
Peter - 01 Dec 2004 18:07 GMT
How are you setting up the Selection?

-Peter

Joel Finkel - 01 Dec 2004 18:15 GMT
I have done this:

   Dim rng As Range
   Set rng = ActiveDocument.Range(Start:=0, End:=ActiveDocument.Content.End)
   Call DoSectionHeaders(rng.FormattedText)

The Sub now gets the formatted text, and the RegExp works.  However, when the call returns, rng.FormattedText is unchanged even though I explicitly pass it by reference.

/Joel Finkel
finkel@sd-il.com

Peter - 01 Dec 2004 18:38 GMT
Pass it as a Range, not a String:

   Dim rng As Range
   Set rng = ActiveDocument.Range(Start:=0, End:=ActiveDocument.Content.End)
   Call DoSectionHeaders(rng.FormattedText)

Sub DoSectionHeaders(ByRef r As Range)
   
   Dim re As RegExp
   Set re = New RegExp
   
   re.Pattern = "\r(\w{1,8})\r{1}"
   re.IgnoreCase = True
   re.Global = True
   
   ' In the replacement text, VBScript RegExp writes the first group as $1;
   ' \1 and \r are not expanded there, so vbCr supplies the paragraph marks.
   r.Text = re.Replace(r.Text, vbCr & "<h3>$1</h3>" & vbCr)
   
End Sub

It doesn't matter whether you pass ByRef or ByVal if you use the Range object, either.
I think that applies to all objects, since an object variable only holds a reference, but my theory is a little rusty.

hth,

-Peter
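
The ByRef/ByVal point can be checked without any Word-specific objects. What follows is a minimal sketch using a plain Collection; the procedure names (AppendByVal, AppendByRef, RebindByVal, DemoObjectPassing) are made up for illustration and do not come from the thread. It relies only on standard VBA behaviour: an object variable holds a reference, so ByVal copies the reference rather than the object.

```vba
Option Explicit

' ByVal receives a copy of the reference, but both copies point at the
' same Collection, so the caller sees the added item.
Sub AppendByVal(ByVal c As Collection)
    c.Add "added via ByVal"
End Sub

' ByRef receives the caller's variable itself; the item is added to the
' same Collection as above.
Sub AppendByRef(ByRef c As Collection)
    c.Add "added via ByRef"
End Sub

' Rebinding a ByVal parameter only changes the local copy of the
' reference; the caller's variable still points at the original object.
Sub RebindByVal(ByVal c As Collection)
    Set c = New Collection
End Sub

Sub DemoObjectPassing()
    Dim items As Collection
    Set items = New Collection

    AppendByVal items
    AppendByRef items
    RebindByVal items

    ' Both Add calls reached the original object; the rebind did not.
    Debug.Print items.Count
End Sub
```

Running DemoObjectPassing prints the item count to the Immediate window. The earlier String version probably failed for a related reason: rng.FormattedText is a Range, so VBA has to convert it to a temporary String before the call, and the Sub's assignment only updates that temporary.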

Joel Finkel - 01 Dec 2004 19:20 GMT
Brilliant!  Where do I send the Guinness?

Of course, now I have problems with the RegExp, but that will have to wait
until tonight.

Thanks.

/Joel
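
One likely source of the RegExp trouble is that \w{1,8} matches one to eight word characters, not one to eight words. The following is a hedged sketch of a pattern that counts whitespace-separated words instead. The names MarkShortLines and DemoMarkShortLines are hypothetical, and the sketch assumes that "followed by only a single ^p" means the next paragraph is not empty. It uses late binding through CreateObject, so it assumes the VBScript RegExp library is installed but needs no project reference.

```vba
Option Explicit

' Wraps every line of one to eight words in <h3>...</h3>, provided the
' line is preceded by a paragraph mark and followed by exactly one.
Function MarkShortLines(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Global = True
    ' \r                      the paragraph mark before the line
    ' (\S+(?:[ \t]+\S+){0,7}) one word plus up to seven more
    ' (?=\r(?!\r))            a paragraph mark follows, but not two in a row;
    '                         the lookahead leaves that mark unconsumed so the
    '                         next line can still match its leading \r
    re.Pattern = "\r(\S+(?:[ \t]+\S+){0,7})(?=\r(?!\r))"

    ' $1 is the captured line; vbCr restores the leading paragraph mark.
    MarkShortLines = re.Replace(s, vbCr & "<h3>$1</h3>")
End Function

Sub DemoMarkShortLines()
    Dim sample As String
    sample = "Intro text that is long enough to exceed eight words in total." & vbCr & _
             "A Short Heading" & vbCr & _
             "Body paragraph with more than eight words in it, clearly." & vbCr

    ' Show paragraph marks as ^p so the result is readable in the Immediate window.
    Debug.Print Replace(MarkShortLines(sample), vbCr, "^p")
End Sub
```

Because the pattern requires a leading paragraph mark, a short line at the very start of the text is not matched; prepending vbCr to the input before the call would be one way around that.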

Peter - 01 Dec 2004 21:02 GMT
Mmm, Guinness.  Good stuff, served correctly.

Looking at what you're doing, you _might_ be able to do away with the regexp and parse your document by paragraph.
Perhaps the structure of your other processing won't fit with that, but it might make it a bit easier in this case (testing for the next paragraph made it messy):

Dim para As Paragraph
For Each para In ActiveDocument.Paragraphs
 With para.Range.FormattedText
   ' one thing to remember about a paragraph is that
   ' it includes the ending paragraph mark as a word
   If .Words.Count > 1 And .Words.Count <= 9 Then
     If Not para.Next Is Nothing Then
       If para.Next.Range.Words.Count > 1 Then
         With .Words
           .Item(1).InsertBefore "<h3>"
           .Item(.Count - 1).InsertAfter "</h3>"
         End With
       End If
     Else
       With .Words
         .Item(1).InsertBefore "<h3>"
         .Item(.Count - 1).InsertAfter "</h3>"
       End With
     End If
   End If
 End With
Next para

hth,

-Peter

Joel Finkel - 02 Dec 2004 01:04 GMT
Peter,

I can't thank you enough.  Your example was close enough to teach me well,
and I was able to modify it to do just what I need to do.

Shouldn't serve that Guinness too cold.  Room temperature is cold enough.

/Joel Finkel
finkel@sd-il.com

Peter - 02 Dec 2004 16:44 GMT
> I can't thank you enough.  Your example was close enough to teach me well,
> and I was able to modify it to do just what I need to do.

Glad it worked for you. :-)

> Shouldn't serve that Guinness too cold.  Room temperature is cold enough.

Definitely.  And tap is the only way to go.

-Peter

©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.