> I need to make a list of all the special (European) foreign-language
> characters in a 70-page document. The characters in the document are many and
> varied. Does anyone know an easy way to find them all and generate such a
> list?
I'm mainly curious: why do you need it? I write in Spanish every
day, and it has these characters that are not in English: áéíóúñü.
I have never needed to count them.
Here's my first thought on how to approach it. There's probably
something better.
1. Use a copy of the document. Do a wildcard search for all the
unaccented English letters (A-Z, a-z), and delete them all.
2. Also delete all punctuation marks and other ordinary characters
(digits, spaces).
What's left should be what you want. There are several ways to
eliminate the duplicates, but none is as simple as the above.
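The delete-and-deduplicate idea above can also be sketched as a small VBA function that works on a string instead of editing a copy of the document. This is only a sketch: the name DistinctNonAscii is made up for illustration, and it assumes that "English characters" plus ordinary punctuation means everything with a character code up to 127.

```vba
' Hypothetical helper (not from the thread): returns each character
' with a code above 127 exactly once, in order of first appearance.
Function DistinctNonAscii(ByVal strText As String) As String
    Dim i As Long
    Dim strChar As String
    Dim lngCode As Long
    Dim strFound As String

    For i = 1 To Len(strText)
        strChar = Mid$(strText, i, 1)
        ' AscW returns a signed Integer; mask it to the range 0..65535
        lngCode = AscW(strChar) And &HFFFF&
        If lngCode > 127 Then
            ' binary compare, so upper- and lowercase accented
            ' letters are kept as separate entries
            If InStr(1, strFound, strChar, vbBinaryCompare) = 0 Then
                strFound = strFound & strChar
            End If
        End If
    Next i
    DistinctNonAscii = strFound
End Function
```

Called as DistinctNonAscii(ActiveDocument.Content.Text), it should return one string holding every special character once. It is probably best to put the result into a document rather than the Immediate window, which may not display characters outside the system code page.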
Steven
How about doing a search and replace (with nothing) for
the 26 unaccented English letters (plus punctuation marks),
and then sorting what is left? You could make a macro to do
this if you have to do it often. Or do you need to know the
exact locations of the letters in the document?
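If the locations do matter, a rough VBA sketch along the following lines could report them. The function name NonAsciiPositions is hypothetical, "special" is assumed to mean any character code above 127, and the positions are 1-based character offsets within the string passed in, not page or line numbers.

```vba
' Hypothetical helper (not from the thread): lists every character
' above code 127 as "position <tab> character", one pair per line.
Function NonAsciiPositions(ByVal strText As String) As String
    Dim i As Long
    Dim strChar As String
    Dim strOut As String

    For i = 1 To Len(strText)
        strChar = Mid$(strText, i, 1)
        ' mask the signed Integer from AscW to the range 0..65535
        If (AscW(strChar) And &HFFFF&) > 127 Then
            strOut = strOut & CStr(i) & vbTab & strChar & vbCrLf
        End If
    Next i
    NonAsciiPositions = strOut
End Function
```

Passing ActiveDocument.Content.Text and pasting the result into a new document should give a list that Word can then sort.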
Can you tell me how you make the international "ch" (with
what looks like a little V above it in Czech)?
WordPerfect has the symbol, but it won't embed in Acrobat.
Beth
Hi Abbey,
The macro below creates such a list, giving each character's code, the symbol itself, and how often it occurs.
The main aim of the macro is to produce the document's text for a text file, with &#xNNNN; numeric character references in place of "upper Unicode" characters.
If you don't need or want that, you can delete (or comment out) the line
ActiveDocument.Content.Text = strOld
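
For reference, the &#xNNNN; form mentioned above is "&#x", the character's code as four hexadecimal digits, and a semicolon. A minimal sketch of that conversion for a single character follows; the name CharToNcr is hypothetical and not part of the macro.

```vba
' Hypothetical helper: builds the &#xHHHH; reference for the first
' character of strChar (strChar must not be empty).
Function CharToNcr(ByVal strChar As String) As String
    Dim lngCode As Long
    ' AscW returns a signed Integer; mask it to the range 0..65535
    lngCode = AscW(strChar) And &HFFFF&
    ' left-pad the hex code with zeros to four digits
    CharToNcr = "&#x" & Right$("0000" & Hex$(lngCode), 4) & ";"
End Function
```

For example, CharToNcr(ChrW(233)) gives &#x00E9; for é.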
Regards,
Klaus
Sub Unicode2TagsUnformatted()
    '
    ' tags special characters
    ' as &#xHHHH; in text files
    '
    ' run SymbolsUnprotect first!
    '
    ' might run into problems with large files
    ' (memory for string allocation)
    Dim Code As Long
    Dim flagReplace As Boolean
    Dim flagDecorative As Boolean
    Dim numCharsRemaining As Long
    Dim numCharsAll As Long
    Dim percentCharsProcessed As Single
    Dim rngDoc As Range
    Dim strTag As String
    Dim strOutList As String
    Dim strOutChar As String
    Dim strTest As String
    Dim strOld As String

    ' decide on the kinds of text you want to include
    ' (one Range object is used throughout, so that these settings
    ' apply to the text that is read below):
    Set rngDoc = ActiveDocument.Content
    With rngDoc.TextRetrievalMode
        .IncludeFieldCodes = True
        .IncludeHiddenText = True
    End With

    StatusBar = "reading document into string..."
    strTest = rngDoc.Text
    strOld = strTest
    numCharsAll = Len(strTest)

    Do
        ' update character count:
        numCharsRemaining = Len(strTest)
        ' AscW returns a signed Integer; masking with &HFFFF& gives
        ' the code as 0..65535 (this stands in for the Int2Long
        ' helper, which was not included in the post)
        Code = AscW(Left(strTest, 1)) And &HFFFF&

        ' Which characters should be tagged?
        flagReplace = False
        flagDecorative = False
        Select Case Code
            Case 30 To 31
                ' (protected hyphen, optional hyphen)
                flagReplace = True
            Case 128 To 159
                ' shouldn't really appear
                flagReplace = True
                MsgBox "Unexpected!", , Code
            Case 160
                ' (if you want to tag protected spaces)
                flagReplace = True
            Case 161 To 254
                ' Don't really need to be tagged in most cases
                ' (if you transport from PC to Mac in RTF format)
                flagReplace = True
                ' Remove these characters from the test string:
                strTest = Replace(strTest, ChrW(Code), "", , , _
                    vbBinaryCompare)
            Case 255 To &HFFFF&
                ' regular "upper Unicode" characters
                flagReplace = True
            Case Else
                ' ordinary character: just drop it from the test string
                strTest = Replace(strTest, ChrW(Code), "", _
                    , , vbBinaryCompare)
                ' percentCharsProcessed = _
                '     100 * (numCharsAll - Len(strTest)) / numCharsAll
                ' StatusBar = "tagging special chars " _
                '     & Str(percentCharsProcessed) _
                '     & "% completed"
        End Select

        If (flagReplace = True) Then
            ' build the tag:
            strTag = Trim(Hex(Code))
            While Len(strTag) < 4
                strTag = "0" + strTag
            Wend
            strTag = "&#x" + strTag + ";"
            strOutChar = strTag & vbTab & ChrW(Code)
            ' Replace all characters with this code:
            strOld = Replace(strOld, ChrW(Code), strTag, , , _
                vbBinaryCompare)
            ' Remove these characters from the test string:
            strTest = Replace(strTest, ChrW(Code), "", , , _
                vbBinaryCompare)
            ' For the list that's going to be added
            ' (tag, symbol, number of occurrences):
            strOutChar = strOutChar & vbTab _
                & Str(numCharsRemaining - Len(strTest))
            strOutList = strOutList & strOutChar & vbCrLf
            percentCharsProcessed = _
                100 * (numCharsAll - numCharsRemaining) / numCharsAll
            StatusBar = "tagging special chars " _
                & Str(percentCharsProcessed) & "% completed"
        End If
    Loop Until Len(strTest) = 0

    ' write the tagged text and the sorted list into a new document:
    Documents.Add
    ActiveDocument.Content.Text = strOld
    ActiveDocument.Content.Style = wdStyleNormal
    Selection.EndKey Unit:=wdStory
    Selection.Collapse (wdCollapseEnd)
    Selection.TypeParagraph
    Selection.TypeText ("Special characters:")
    Selection.TypeParagraph
    Selection.InsertAfter strOutList
    If Selection.Type = wdSelectionNormal Then
        Selection.Sort
    End If
End Sub