MS Office Forum / Word / Programming / January 2005
Regex & Wildcards
Vince - 06 Jan 2005 04:30 GMT
Hey,
I need to find the following by matching wildcards:
"1.1 mol/L", "1 mol/L", "1mol/L"
"1.1 mol /L", "1 mol /L", "1mol /L"
"1.1 mol / L", "1 mol / L", "1mol / L"
"1.1 mol/ L", "1 mol/ L", "1mol/ L"
A sentence could contain any one of these. For instance, "James drank a solution of Nitrogen Peroxide with a concentration of 5.15 mol/L". This is what I could come up with:
([0-9.]@)( @)(mol/L)
This takes care of any numerals and decimals, but it does not account for:
a) The space between the number and mol/L. It looks for one or more spaces, but there might be no space at all, as in 1.1mol/L.
b) Spacing around the slash. It looks strictly for mol/L and cannot match mol / L, mol/ L or mol /L. To use this, I would have to repeat the pattern for each spacing variant!
Questions:
1) How do I write a single wildcard match for all the possibilities listed above?
2) How can I say "optional" in regex? E.g. Di[peg] could be any one of "Dig", "Dip" or "Die". But I need to say that "Di" may or may not be followed by "p", "e" or "g". In Perl, I would say "(Di)([epg])*". How do I say that in VBA?
Thanks a lot for your time and any response.
Vince
Helmut Weber - 06 Jan 2005 11:19 GMT
Hi Vince, before putting much effort into something that is hardly possible, since wildcard search cannot match zero or more occurrences, why not adjust the text beforehand, like this:
Sub Test777()
    ResetSearch
    Dim rDcm As Range
    Set rDcm = ActiveDocument.Range
    With rDcm.Find
        ' Put a space before every "mol" (this also hits "mol" inside longer words)
        .Text = "mol"
        .Replacement.Text = " mol"
        .Execute Replace:=wdReplaceAll
        ' Remove any spaces between "mol" and "/"
        .Text = "mol[ ]{1,}/"
        .Replacement.Text = "mol/"
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
        ' Remove any spaces between "/" and "L"
        .Text = "mol/[ ]{1,}L"
        .Replacement.Text = "mol/L"
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
        ' Collapse the spaces before "mol/L" to exactly one
        .Text = "[ ]{1,}mol/L"
        .Replacement.Text = " mol/L"
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
'---
Public Sub ResetSearch()
    ' Reset the Find settings to neutral defaults
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute
    End With
End Sub
HTH
Greetings from Bavaria, Germany Helmut Weber, MVP "red.sys" & chr(64) & "t-online.de" Word XP, Win 98 http://word.mvps.org/
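On Vince's second question: Word's wildcard search has no "optional" operator, but a string-level regular expression can express it. The sketch below is not from the thread. It assumes the VBScript regular-expression engine is available (Windows, created with CreateObject("VBScript.RegExp")), and NormalizeMolPerL is a hypothetical helper name. In that engine `?` marks the preceding item as optional and `*` allows zero or more, so `Di[epg]?` would match "Di", "Die", "Dip" and "Dig".

```vba
' Sketch only: uses the VBScript regular-expression engine (assumed to be
' available, Windows only), not Word's wildcard search.
' NormalizeMolPerL is a hypothetical helper name.
Function NormalizeMolPerL(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' replace every match, not just the first
    re.IgnoreCase = False   ' keep "mol" and "L" case sensitive
    ' number, optional spaces, "mol", optional spaces, "/", optional spaces, "L"
    re.Pattern = "([0-9]+(\.[0-9]+)?) *mol */ *L"
    ' $1 is the number; a plain space is used here as the separator
    NormalizeMolPerL = re.Replace(s, "$1 mol/L")
End Function
```

For example, NormalizeMolPerL("5.15mol / L") returns "5.15 mol/L". The function works on plain strings only; writing the result back over a document range would discard character formatting inside that range, which is one reason to stay with Find and Replace where possible.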
Vince - 07 Jan 2005 02:13 GMT
Hey Helmut,
Thanks for your response.
I wanted to save effort by coming up with a text file that contains all the find and replace conditions. At the risk of boring you, please allow me to explain.
Problem: I am trying to copy edit Word files, and part of the long list of copy editing rules involves separating numerals and units in the format "numeral thin space unit". So I copied a huge list of units from the internet and wrote a function that reads from a text file and does the find and replace automatically. For instance, the text file could be:
([0-9.]@)( @)(mol/L)SPLIT\1^s\3SPLITTRUESPLITTRUE
([0-9.]@)( @)(m/s)SPLIT\1^s\3SPLITTRUESPLITTRUE
' This tells the program to find the first part before the first split, replace it with the part before the second split, match wild characters and be case sensitive
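A minimal sketch of how one such rule line might be applied, assuming the four fields are separated by the literal text SPLIT as described above. Vince's actual function is not shown in the thread, so ApplyRuleLine and its handling of the trailing flag are hypothetical.

```vba
' Sketch only: ApplyRuleLine is a hypothetical name, and the field layout
' (find, replace, wildcards flag, case flag) is assumed from the description.
Sub ApplyRuleLine(ByVal ruleLine As String)
    Dim parts() As String
    Dim rng As Range
    parts = Split(ruleLine, "SPLIT")
    If UBound(parts) < 3 Then Exit Sub      ' skip malformed lines
    Set rng = ActiveDocument.Range
    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindStop
        .Text = parts(0)                    ' e.g. ([0-9.]@)( @)(mol/L)
        .Replacement.Text = parts(1)        ' e.g. \1^s\3
        ' Only the first four characters of the last flag are read, so any
        ' trailing comment on the line is ignored (an assumption)
        .MatchCase = (UCase$(Left$(Trim$(parts(3)), 4)) = "TRUE")
        .MatchWildcards = (UCase$(Trim$(parts(2))) = "TRUE")
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

A caller would read the text file line by line and pass each non-empty line to this routine.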
Basically, I wanted this text file to be edited by the user so that they can add their own units that I missed. But, the problem or rather, the inconvenience is that they need to type all possibilities into the file. For instance, the above would be:
([0-9.]@)( @)(mol/L)SPLIT\1^s\3SPLITTRUESPLITTRUE
([0-9.]@)( @)(m/s)SPLIT\1^s\3SPLITTRUESPLITTRUE
([0-9.]@)( @)(mol / L)SPLIT\1^smol/LSPLITTRUESPLITTRUE
([0-9.]@)( @)(m / s)SPLIT\1^sm/sSPLITTRUESPLITTRUE
([0-9.]@)( @)(mol /L)SPLIT\1^smol/LSPLITTRUESPLITTRUE
([0-9.]@)( @)(m /s)SPLIT\1^sm/sSPLITTRUESPLITTRUE
([0-9.]@)( @)(mol/ L)SPLIT\1^smol/LSPLITTRUESPLITTRUE
([0-9.]@)( @)(m/ s)SPLIT\1^sm/sSPLITTRUESPLITTRUE
' This tells the program to find the first part before the first split, replace it with the part before the second split, match wild characters and be case sensitive
These two units multiply to over 8 lines! This could slow down the program (I don't really mind that), but the main problem is that the text file could become too big in the long run. This is why I was wondering if I could somehow accommodate the possibilities in the text file to begin with, using some wildcard search.
What I could do, however, is use your method so that the program, when reading from the file, also makes room for the possibilities listed above. If you have a better idea, please let me know.
Thank you for your time.
Vince
Helmut Weber - 07 Jan 2005 09:52 GMT
Hi Vince, not that I understand all, but for things like:
"mol /L", "mol/ L", "mol / L", "mol / L" "m /s", "m / s", "m/ s", "m /s", "m / s"
a possible workaround would be to first replace "/" with " / ", in order to overcome the limitation that there is no search for zero or more occurrences of a character. So we add additional characters first! After that, each "/" is surrounded by spaces. Then a wildcard search for [ ]{1,}/[ ]{1,} finds all occurrences, and each can be replaced by "/", resulting in "mol/L", "m/s".
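The two steps above could be sketched as a macro like the following. It follows the same Range.Find pattern as Test777; the name NormalizeSlashSpacing is hypothetical, and the macro touches every slash in the document, not only those inside units.

```vba
' Sketch only: NormalizeSlashSpacing is a hypothetical name.
' Affects every "/" in the document, including dates and "and/or".
Sub NormalizeSlashSpacing()
    Dim rng As Range
    Set rng = ActiveDocument.Range
    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        ' Step 1: plain search, pad every slash with one space on each side
        .MatchWildcards = False
        .Text = "/"
        .Replacement.Text = " / "
        .Execute Replace:=wdReplaceAll
        ' Step 2: wildcard search, each slash now has at least one space
        ' on both sides, so "one or more" is enough to match all of them
        .MatchWildcards = True
        .Text = "[ ]{1,}/[ ]{1,}"
        .Replacement.Text = "/"
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

After it runs, a single rule per unit, such as the mol/L pattern, would cover all four slash-spacing variants.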
And there may be more such simple tricks.
HTH Greetings from Bavaria, Germany Helmut Weber, MVP "red.sys" & chr(64) & "t-online.de" Word 2002, Windows 2000
Vince - 07 Jan 2005 10:07 GMT
Hey Helmut,
Thanks, that's a great idea! I just have to find out if adding a space before and after every slash in the document is acceptable (what if there's some text that has a '/' and is not a unit). But, I don't think it should be a problem.....
Thanks, again!
Vince
Helmut Weber - 07 Jan 2005 10:50 GMT
Hi Vince,
just one more word: depending on how big and how complex your docs are, and on how much effort is justified, one could even create a macro that, after removing all spaces around slashes, highlights all units as they are defined in a list and locates each "/" that is not highlighted. And many more variations.
Cheers
Helmut Weber
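One way such a macro might look, as a rough sketch under stated assumptions: the unit list is hard-coded here instead of being read from Vince's text file, FlagSlashesOutsideUnits is a hypothetical name, and the remaining slashes are only reported in the Immediate window.

```vba
' Sketch only: FlagSlashesOutsideUnits and the unit list are hypothetical.
Sub FlagSlashesOutsideUnits()
    Dim units As Variant, u As Variant
    Dim rng As Range
    units = Array("mol/L", "m/s")           ' stand-in for the real unit list
    Options.DefaultHighlightColorIndex = wdBrightGreen
    ' Pass 1: highlight every occurrence of each listed unit
    For Each u In units
        Set rng = ActiveDocument.Range
        With rng.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .MatchWildcards = False
            .MatchCase = True
            .Format = True
            .Text = CStr(u)
            .Replacement.Text = "^&"        ' ^& = the matched text itself
            .Replacement.Highlight = True
            .Execute Replace:=wdReplaceAll
        End With
    Next u
    ' Pass 2: report each slash that was not highlighted in pass 1
    Set rng = ActiveDocument.Range
    With rng.Find
        .ClearFormatting
        .MatchWildcards = False
        .Format = True
        .Highlight = False                  ' search non-highlighted text only
        .Text = "/"
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            Debug.Print "Slash outside a listed unit at position " & rng.Start
            rng.Collapse wdCollapseEnd      ' continue after this match
        Loop
    End With
End Sub
```

The positions printed are character offsets into the document; selecting or coloring each hit instead would be a small variation.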
Vince - 07 Jan 2005 11:02 GMT
Thanks, Helmut!
Excellent idea. I am changing everything coming from the text file to Green color. Easy to detect odd ones out like you mentioned.
Thanks, again!
Vince