MS Office Forum / Word / Programming / January 2006
Range.Text and Range.Characters
dan8 - 05 Jan 2006 14:36 GMT
Dear colleagues,
I am developing an MS Word add-in. The goal is to perform a specific search in a Word document and then set the cursor to the positions found.
Range.Characters returns a collection of single- or double-byte characters. Double-byte characters occur, for example, in a document with a table: the table contains some "D7" characters.
Characters is too slow a way to access the whole document text, so I use Range.Text. However, Text returns a string of single-byte characters. D and 7 are two separate characters there, so Range.Text turns out to be longer than Range.Characters.Count.
This becomes a problem when I use Range.Move to set the cursor to positions found in Range.Text. Move counts the same way as Characters, i.e. D7 is a single character, so the cursor position becomes incorrect.
I need at least one of the following:
- to be able to get the text as a multibyte string, where a two-byte character is one character. Collecting the text from the Characters property is not fast enough.
- to be able to position (Move) the cursor by bytes, not characters.
Thank you in advance for help!
zkid - 07 Jan 2006 01:45 GMT
A little confused on what it is you're trying to accomplish. Are you just searching a table, cell by cell? What do you mean that D7 is a single character? Can you provide a sample of what might be contained in a cell and for what you're actually testing?
> Dear colleagues,
> [quoted text clipped - 23 lines]
>
> Thank you in advance for help !
dan8 - 07 Jan 2006 16:37 GMT
> A little confused on what it is you're trying to accomplish. Are you just
> searching a table, cell by cell? What do you mean that D7 is a single
> character? Can you provide a sample of what might be contained in a cell and
> for what you're actually testing?
Never mind what the add-in is searching for and how.
I need to get the text of a Word document, find some positions in this text, and be able to set the cursor to these positions in the original document.
An example of the problem follows:
Consider a Word document with the plain text: abcdef
If we get ActiveDocument.Range(EmptyParam,EmptyParam) and then read the Text property of this Range, we have "abcdef". The Characters property contains the same 6 items (one by one): a b c d e f, so we are happy.
Now insert an empty 2x2 table between 'c' and 'd', and get the same Range, Text and Characters:
Text="abc#$D#$D#7#$D#7#$D#7#$D#7#$D#7#$D#7def" (so the table added 13 characters to the Text string)
and the Characters items, one by one, are: a b c #$D #$D#7 #$D#7 #$D#7 #$D#7 #$D#7 #$D#7 d e f (the table added 7 items to the Characters property; the last 6 of them are double-byte).
If I search for "d" in Text, the position will be 17. But setting the cursor to position 17 using Range.Move is incorrect, because Move counts the same way as the Characters property (it sees a double-byte character as a single character).
The obvious solution is to collect the text from the Characters property, but this is too slow for large documents. I have to use the Text property.
Do you understand the problem now?
Thank you.
Jay Freedman - 07 Jan 2006 18:54 GMT
Hi Dan,
I believe you're going about the job the wrong way. Using absolute character positions in a Word document is too unreliable -- there are other things that can mess up the count, including hidden text, fields, characters from double-character fonts...
Instead, declare a Word.Range object, set it to the document's range, and use its .Find method to search for the desired text. When .Find.Execute returns True, call the .Select method of the Range object.
I gather from your sample that you're automating this from .Net code, and I'm not familiar enough with that to give you a working code sample, but in VBA it would be
Dim oRg As Range
Set oRg = ActiveDocument.Range
With oRg.Find
    .ClearFormatting
    .Text = "d" ' fill in your search string
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = False
    If .Execute Then
        oRg.Select
    End If
End With
Even in a very large document this will be very fast. You can include additional criteria in the .Find parameters (e.g., set .Format = True and .Font.Bold = True to find only occurrences that are bold), or you can add an If statement around the .Select (e.g., If oRg.Information(wdWithinTable) Then to select only occurrences within tables).
-- Regards, Jay Freedman Microsoft Word MVP FAQ: http://word.mvps.org Email cannot be acknowledged; please post all follow-ups to the newsgroup so all may benefit.
>> A little confused on what it is you're trying to accomplish. Are you just
>> searching a table, cell by cell? What do you mean that D7 is a single
[quoted text clipped - 51 lines]
>
>Thank you.
dan8 - 08 Jan 2006 08:31 GMT
> I believe you're going about the job the wrong way. Using absolute
> character positions in a Word document is too unreliable -- there are
> other things that can mess up the count, including hidden text,
> fields, characters from double-character fonts...
Yes. What I need is just the same positions for searching and for setting the cursor afterwards. The text may contain any hidden characters, formatting characters, etc. Actually, Range.Characters and Range.Move do the work without errors. But getting the whole text by collecting Characters.Items is too slow.
> Instead, declare a Word.Range object, set it to the document's range,
> and use its .Find method to search for the desired text. When
> .Find.Execute returns True, call the .Select method of the Range
> object.
Thank you for the idea and the code sample. But my search is too specific; it cannot be implemented with Word's Range.Find. Actually, this is the reason the user needs an add-in and can't use the standard Word search.
Tony Jollans - 07 Jan 2006 19:07 GMT
Terminology is a little bit of a problem here!
Each Item in Range.Characters is actually, itself, a Range which may contain more than one character (with a small c, in other words the normal English meaning of the word) - each of these characters normally occupies two bytes although that is not really relevant to the issue at hand.
Now to your problem! I don't think you can do what you are asking and the best way to proceed rather depends on what you are really trying to do.
Are you processing (or do you want to process) every character in the text or are you simply searching for something?
If you are processing every character it's probably as easy as anything to keep your own count. Assuming you are not working with Unicode code points above U+FFFF, the "D7" is one of very few such sequences (off the top of my head I can't think of another) and you should be able to hard code them. You may find it helpful to compare Len(Range) with Range.Characters.Count or, if tables are your only concern, checking Range.Tables.Count might be useful. I suspect, however, that you may run into other problems with things such as Fields and, perhaps, inserted symbols from Symbol Fonts. In specific instances you may be alright but in an AddIn that may be more general purpose it could be harder to handle all the possibilities.
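The "keep your own count" idea can be sketched on plain strings. The following is a minimal Python sketch, not add-in code and not anything posted in the thread: it makes no Word calls and assumes that the only multi-character units in Range.Text are the Chr(13) & Chr(7) cell and row markers, each of which counts as one item in Range.Characters, as in the 2x2 table example above. The name text_offset_to_char_index is made up for illustration. Fields, hidden text and symbol fonts are not handled.

```python
# End-of-cell / end-of-row marker as it appears in Range.Text: CR (13) + BEL (7).
CELL_MARK = "\r\x07"


def text_offset_to_char_index(text: str, offset: int) -> int:
    """Map a 0-based offset in the Range.Text string to a 0-based item index,
    counting every CR+BEL marker before the offset as one item instead of two.

    Assumes the offset does not point at the BEL half of a marker.
    """
    return offset - text.count(CELL_MARK, 0, offset)


# The example from the thread: "abc", a paragraph mark, an empty 2x2 table
# (4 cell markers + 2 row markers), then "def".
sample = "abc\r" + CELL_MARK * 6 + "def"

offset = sample.find("d")          # 0-based offset 16, i.e. position 17 counting from 1
index = text_offset_to_char_index(sample, offset)
print(len(sample), offset, index)  # 19 16 10
```

Under these assumptions the result is the number of items that precede the match, which is the unit the original poster reports Range.Move and Range.Characters agree on. Whether that holds in documents with fields or hidden text would need checking against Word itself.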
If you are just trying to search for a character, why not use Word's Find? It will deal with all the issues and probably be more efficient than anything you can write.
-- Enjoy, Tony
> Dear colleagues,
> [quoted text clipped - 23 lines]
>
> Thank you in advance for help !
dan8 - 08 Jan 2006 08:41 GMT
> Each Item in Range.Characters is actually, itself, a Range which may contain
> more than one character (with a small c, in other words the normal English
> meaning of the word) - each of these characters normally occupies two bytes
> although that is not really relevant to the issue at hand.
As I wrote above, using Range.Characters and then Range.Move does the work. I convert the Text of each character (1 or 2 bytes) to a 2-byte character in a Unicode string, perform my search in this string, and then use Range.Move in the original document. The position is always correct. The only problem is the performance of the first step, collecting the document text from Characters.
> Are you processing (or do you want to process) every character in the text
> or are you simply searching for something?
Searching, but the search itself is complicated and cannot be implemented with the built-in Range.Find. I need this string in my add-in code.
> In specific
> instances you may be alright but in an AddIn that may be more general
> purpose it could be harder to handle all the possibilities.
Quite so :( The add-in should work well in any Word document. It is not a problem for the search if some hidden characters appear in the text being searched. But the subsequent cursor positioning should take these characters into account as well, and set the cursor to the correct visible position.
Tony Jollans - 08 Jan 2006 11:38 GMT
There isn't really any way to reliably identify the position you want from the text string you have.
What is it about the search that means you can't use Word's own Find? From what you've said, it only involves normal characters, and if you can code the logic for your own search through text it ought to be possible to code the same logic using one or more Finds.
-- Enjoy, Tony
> > Each Item in Range.Characters is actually, itself, a Range which may contain
> > more than one character (with a small c, in other words the normal English
[quoted text clipped - 22 lines]
> But further cursor positioning should take these characters into account as
> well, and set cursor to correct visible position.
dan8 - 08 Jan 2006 14:46 GMT
> There isn't really any way to reliably identify the position you want from
> the text string you have.
However, Word's own Find does this somehow. I am wondering whether I can use the same ways of accessing text and positioning, but with my own search.
> What is it about the search that means you can't use Word's own Find? From
> what you've said, it only involves normal characters,
The character was only an illustration. Actually the user will search for words in the text that match specific criteria. In particular, there will be a so-called fuzzy search, where target words have no more than k differences from the pattern. Or a search without any pattern at all: for words that occur several times in the text (the number of occurrences is a user-defined parameter). Just believe me that the standard search is not the right tool here, even with wildcards.
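To make the "no more than k differences" criterion concrete, here is a small Python sketch. It is not the poster's code, and it rests on assumptions the post does not state: that "differences" means edit (Levenshtein) distance, and that words are runs of \w characters. The names edit_distance and fuzzy_word_offsets are illustrative. The offsets it returns are offsets into the plain text string, so they would still have to be translated into document positions.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def fuzzy_word_offsets(text: str, pattern: str, k: int):
    """Return (offset, word) for every word within k edits of pattern."""
    return [(m.start(), m.group())
            for m in re.finditer(r"\w+", text)
            if edit_distance(m.group(), pattern) <= k]


print(fuzzy_word_offsets("color colour cooler", "colour", 1))
# [(0, 'color'), (6, 'colour')]
```

A search of this kind needs the whole text as a string, which is why the thread keeps returning to the question of how to map string offsets back to positions in the document.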
Tony Jollans - 08 Jan 2006 16:44 GMT
> However, Word's own Find does this somehow. I am wondering about the same
> ways to access text and positioning, but with my own search.
You have the access - you just find the performance unacceptable.
I do appreciate that doing some things using Word's object model can be slow and painful and all I can really offer, if you have to work with the text, is the comparison of Len(Range.Text) with Range.Characters.Count I mentioned earlier - it doesn't give you a direct answer but it does alert you to the situations where some further processing will be required. As I write this I wonder whether, having done your own search within the text and (presumably) identified an exact string, you could then use Word's Find on the Range for that exact string which would provide you with the correct location (maybe even search backwards from the text offset you have identified to minimise the distance Word has to look - as you know the text position is greater than or equal to the 'range position').
-- Enjoy, Tony
> > There isn't really any way to reliably identify the position you want from
> > the text string you have.
[quoted text clipped - 11 lines]
> times in text (number of occurrences is user-defined parameter).
> Just believe that standard search is not right tool here, even with wildcards.
dan8 - 09 Jan 2006 08:05 GMT
> You have the access - you just find the performance unacceptable.
Word performs its own search and positioning much more quickly than I can using Range.Characters and Range.Move. The bottleneck is exactly Range.Characters; I have traced the code. That's why I suspect that quicker methods exist.
> all I can really offer, if you have to work with the text,
> is the comparison of Len(Range.Text) with Range.Characters.Count I mentioned
[quoted text clipped - 6 lines]
> the distance Word has to look - as you know the text position is greater
> than or equal to the 'range position').
Thank you for the idea, I agree this is the correct approach. But only once we are sure there is no other way (apart from Range.Characters) to get a string with multibyte characters.
Tony Jollans - 09 Jan 2006 12:06 GMT
All I can really do is wish you luck. I know of no way through the Word Object Model to get what you ask for.
-- Enjoy, Tony
> > You have the access - you just find the performance unacceptable.
> [quoted text clipped - 17 lines]
> we are sure there is no other way (apart from Range.Characters) to get
> string with multibyte characters.
dan8 - 09 Jan 2006 15:20 GMT
> All I can really do is wish you luck. I know of no way through the Word
> Object Model to get what you ask for.
OK, thank you for your attention to my problem!
Klaus Linke - 12 Jan 2006 15:42 GMT
> Thank you for the idea, I agree this is the correct approach. But only
> when we are sure there is no other way (apart from Range.Characters)
> to get string with multibyte characters.
Range.Text should work fine for that.
In tables, you don't have D7 characters (×); you do have D and 7 as two characters (which together are called "end-of-cell markers" or "end-of-row markers").
Regards, Klaus