MS Office Forum / Word / Programming / May 2006

Finding Double-Byte Characters

Mike Faulkner - 08 May 2006 12:41 GMT
Hello
I would like to view some code that finds Double-Byte Characters. I want to
remove Double-Byte characters before performing a DeltaView comparison.

The code relating to 'Double-Byte' takes a long time to search a large
document.

Any help would be much appreciated.

Regards
Mike
Jonathan West - 08 May 2006 12:48 GMT
> Hello
> I would like to view some code that finds Double-Byte Characters. I want
[quoted text clipped - 8 lines]
> Regards
> Mike

We can't help much unless you show us the code you are using at the moment

Regards
Jonathan West - Word MVP
www.intelligentdocuments.co.uk
Please reply to the newsgroup
Keep your VBA code safe, sign the ClassicVB petition www.classicvb.org 

Mike Faulkner - 08 May 2006 13:16 GMT
Jonathan

Thanks for your interest. Do you have any code to search for Double-Byte
characters? If not please do not reply.

Regards
Mike

> > Hello
> > I would like to view some code that finds Double-Byte Characters. I want
[quoted text clipped - 10 lines]
>
> We can't help much unless you show us the code you are using at the moment
Tony Jollans - 08 May 2006 13:30 GMT
What do you mean when you say "double byte characters"?

   Do you mean old-style DBCS strings bounded by SO/SI characters?
   Do you mean any Unicode characters stored as UCS-2?
   Do you mean Unicode surrogate pairs for code points above plane 0?
   Do you mean any character with an ANSI code higher than 127? Or 255?
   Or what?

What code exactly are you referring to when you say it takes a long time?
Perhaps if you posted it we could see (a) what you were trying to do and (b)
how it might be possible to improve its performance.
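
The different readings listed above give different answers for the same text. As a rough illustration only (Python rather than Word VBA, with Unicode code points standing in for "ANSI codes", and a hypothetical function name), a minimal sketch that counts characters under each reading might look like this:

```python
def classify(text):
    """Count characters under several possible readings of 'double byte'.

    Assumption: Unicode code points are used as a stand-in for ANSI codes.
    """
    counts = {
        "shift_out_in": 0,     # SO (0x0E) / SI (0x0F) control characters
        "above_127": 0,        # outside 7-bit ASCII
        "above_255": 0,        # outside the single-byte range
        "surrogate_pairs": 0,  # outside plane 0, so two UTF-16 code units
    }
    for ch in text:
        cp = ord(ch)
        if cp in (0x0E, 0x0F):
            counts["shift_out_in"] += 1
        if cp > 127:
            counts["above_127"] += 1
        if cp > 255:
            counts["above_255"] += 1
        if cp > 0xFFFF:
            counts["surrogate_pairs"] += 1
    return counts
```

Running something like this over a sample of the problem documents would at least show which of the definitions is actually in play.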

--
Enjoy,
Tony

> Hello
> I would like to view some code that finds Double-Byte Characters. I want to
[quoted text clipped - 7 lines]
> Regards
> Mike
Mike Faulkner - 08 May 2006 13:52 GMT
Tony

Many thanks for replying. The only code I have found is actually in this
Forum. To find it please search on 'Double-Byte'. I have used it. It loops
through every character in a document.

I know that there is a faster way of finding DBCS. I am evaluating a product
called DocXtools by Microsystems. It displays a toolbar and jumps from one
DBCS to another very quickly. Their code is embedded in DLLs.

What type of DBCS, well I'm not really sure. DeltaView (Document comparison
app.) sometimes hangs when it encounters one.

Many thanks again

Regards
Mike

> What do you mean when you say "double byte characters"?
>
[quoted text clipped - 24 lines]
> > Regards
> > Mike
Doug Robbins - Word MVP - 08 May 2006 18:47 GMT
Help people help you by including the code in your post.

It is quite a reasonable request, and your telling people like Jonathan West
not to reply if he doesn't happen to have such code at his fingertips is no way
to get help.

Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

> Tony
>
[quoted text clipped - 46 lines]
>> > Regards
>> > Mike
Tony Jollans - 08 May 2006 19:54 GMT
It seems I have to register in some way to get hold of docXtools so I'm not
going to be finding out much about that.

I did have a quick play with DeltaView but it seems to cope just fine with
all unicode characters - normal double byte ones and surrogate pairs.

The only example code I could find by searching this newsgroup looked
explicitly for Hiragana (or Katakana, I forget which now) characters which
happen to be 'double byte' (their presence in a document, however, makes the
whole document double byte) but are hardly the sum total of double byte
characters so I'm afraid I'm none the wiser about what it is you really
want.

It would be better if you could post the code - without it and/or a fuller
description (perhaps docxtools documentation says what it does?) of what you
want I can't really help you any more. If it's a simple find and replace as
per the code I found I doubt you will find anything faster, and I am
surprised you find it particularly slow but I don't know what code in dlls
might be doing that possibly goes way beyond what can be done in VBA.

--
Enjoy,
Tony

> Tony
>
[quoted text clipped - 42 lines]
> > > Regards
> > > Mike
Mike Faulkner - 09 May 2006 02:10 GMT
Tony

Many thanks for your detailed reply. However, I am assured by the
Microsystems (DocX) people that DBCS will occasionally cause DeltaView (DV)
version 2.x to hang.

I run a VBA tool on approx. 5,000 documents. It extracts various items of
information on each document, Revisions, Char styles, Broken styles, DV
Bookmarks & Styles etc.  Adding DBCS to the list would have helped to narrow
down the problem areas users encounter when performing DeltaView comparisons.

I'll speak to the DocX developers and try to find out a bit more about their
DBCS search tool.

Once again many thanks for your time.

Regards
Mike

> It seems I have to register in some way to get hold of docXtools so I'm not
> going to be finding out much about that.
[quoted text clipped - 75 lines]
> > > > Regards
> > > > Mike
Jean-Guy Marcil - 09 May 2006 21:39 GMT
Mike Faulkner was telling us:

> Tony
>
> Many thanks for replying. The only code I have found is actually in
> this Forum. To find it please search on 'Double-Byte'. I have used
> it. It loops through every character in a document.

Let me see if I can get this straight..

You need help, two very knowledgeable people offer to help.
You were rude with one, and told the other to search the group to find
examples of what you mean?

Jeezz... the nerves...

You were very lucky that Tony decided to ignore all this and ploughed on
with offering help.

I understand that you might be busy... but... next time, I might suggest
that you be a bit more considerate to those who give up part of their free
time to help others...

Salut!
_______________________________________
Jean-Guy Marcil - Word MVP
jmarcilREMOVE@CAPSsympatico.caTHISTOO
Word MVP site: http://www.word.mvps.org 

Klaus Linke - 12 May 2006 15:39 GMT
My sentiments exactly  :-)

A wildcard search would be fast, too. Say "Match wildcards",
Find what: [!^001-^0255]
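
For anyone who wants to try the same idea outside Word, here is a rough regular-expression analogue of that wildcard class. It is only a sketch: it is Python rather than VBA, it works on Unicode code points rather than Word's own character codes, and the names are made up for illustration.

```python
import re

# Any character outside U+0001..U+00FF, mirroring the wildcard class above.
# Note that U+0000 also falls outside the range and would match.
OUTSIDE_SINGLE_BYTE = re.compile(r"[^\x01-\xff]")

def find_outside_single_byte(text):
    """Return (offset, character) pairs for every character outside the range."""
    return [(m.start(), m.group()) for m in OUTSIDE_SINGLE_BYTE.finditer(text)]
```

The point is the same as with the wildcard search: the matching is done by the search engine in one pass rather than by testing characters one at a time in a macro loop.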

I can't imagine that any Word add-in in a halfway recent version (post-97)
has general problems with Unicode, though.
You might try to find out which specific characters, if any, DeltaView has
problems with.

Regards
Klaus

> Mike Faulkner was telling us:
[quoted text clipped - 19 lines]
> that you were a but more considerate to those who give up part of their
> free time to help others...
Mike Faulkner - 12 May 2006 18:09 GMT
Klaus

Many thanks for your advice. Workshare (DeltaView) are reluctant to reveal
what stops its product and 'hangs' Word.

MicroSystems (DocXtools) insist that DBCS cause comparison problems. Their
toolbar is very fast on a 150-page document. However, it's possible that
their Discovery step (DocXtools) bookmarks each DBCS and the toolbar simply jumps
from one bookmark to the next. This would explain why it takes 5 minutes to
'Discover' a 150-page document: it's looping through every character and
testing whether it's a DBCS.
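
To make that guess concrete, here is a minimal sketch of the "scan once, then jump" idea. It is purely illustrative: it is Python, not whatever DocXtools really does, the function names are hypothetical, and "code point above 255" is just one assumed definition of a DBCS.

```python
from bisect import bisect_right

def build_index(text, threshold=255):
    """Slow step: one pass over every character, recording matching offsets."""
    return [i for i, ch in enumerate(text) if ord(ch) > threshold]

def next_hit(index, position):
    """Fast step: return the first recorded offset after position, or None."""
    k = bisect_right(index, position)
    return index[k] if k < len(index) else None
```

The expensive part is paid once when the index is built; each later jump is just a lookup in a sorted list, which would fit a slow 'Discover' followed by a fast toolbar.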

Regards
Mike

> My sentiments exactly  :-)
>
[quoted text clipped - 32 lines]
> > that you were a but more considerate to those who give up part of their
> > free time to help others...
Tony Jollans - 15 May 2006 15:34 GMT
What I note about this is the consistent use of the terms "Double Byte" and
"DBCS", rather than Unicode, and I wonder whether this is the real issue. I
can well imagine that modern products could get hung up with 'old' DBCS
data.

--
Enjoy,
Tony

> Klaus
>
[quoted text clipped - 54 lines]
> > > jmarcilREMOVE@CAPSsympatico.caTHISTOO
> > > Word MVP site: http://www.word.mvps.org
Mike Faulkner - 16 May 2006 09:33 GMT
Tony

Many thanks for replying.

Now I understand why Microsystems (DocXtools) insist that DBCS can interfere
with comparison tools. Approx. 80% of our documents are copies (Dupe &
Revise) of old documents. This practice ensures that old problems/issues,
like DBCS, are continually carried forward.

DocXtools has identified 5 DBCS in a document this morning. They are 'blobs'
(large full stops) and are visible. The Word Symbol chart identifies them as
SPACE, Character Code: 32, from: ASCII (decimal).

Regards
Mike

> What I note about this is the consistent use of the terms "Double Byte" and
> "DBCS", rather than Unicode, and I wonder whether this is the real issue. I
[quoted text clipped - 70 lines]
> > > > jmarcilREMOVE@CAPSsympatico.caTHISTOO
> > > > Word MVP site: http://www.word.mvps.org
 