MS Office Forum / Word / Programming / March 2008

Word frequency count

Raj - 05 Mar 2008 11:13 GMT
Hi,

I have a word document with the following  words(sample) :

Word_this_and that KK857 9269875  King king  King

I want a macro that shows word frequency count using space as the word
separator. The count of above should be displayed as:

Word_this_and                1
that                                1
KK857                            1
9269875                         1
King                               2
king                               1

Thanks in advance for the help

Raj
Greg Maxey - 05 Mar 2008 13:08 GMT
You will get close with this:

http://gregmaxey.mvps.org/Word_Frequency.htm

I haven't tried it, but you might be able to revise the code that deals with
"singleword" to extend the scope of singleword past the "_" character.

~~~~~~~~~~~~~~~~~~~~~~~~~~~
Greg Maxey -  Word MVP

My web site http://gregmaxey.mvps.org
Word MVP web site http://word.mvps.org
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Klaus Linke - 06 Mar 2008 20:06 GMT
Playing with the scripting dictionary recently, I've written such a macro...

You need to check "Microsoft Scripting Runtime" in "Tools > References".

Regards,
Klaus

 Sub WordFrequency()
 ' Requires a reference to "Microsoft Scripting Runtime".
 Dim sText As String
 sText = ActiveDocument.Content.Text
 Dim dicWords As New Scripting.Dictionary
 Dim vPlaces(0)
 Dim vItem As Variant
 Dim i As Long
 Dim sWord As String, sChar As String
 sWord = ""
 For i = 1 To Len(sText)
   sChar = Mid(sText, i, 1)
   Select Case sChar
     ' delimiters:
     Case ChrW(9), ChrW(10), ChrW(13), ChrW(32), ChrW(34), ".", ",", _
       "<", ">", "/", ";", _
       "(", ")", "[", "]", _
       ":", "#", "+", "*", _
       "?", "!"
       If Len(sWord) <> 0 Then
          If dicWords.Exists(key:=sWord) Then
           ' known word: append the position of this delimiter
           vItem = dicWords(key:=sWord)
           ReDim Preserve vItem(UBound(vItem) + 1)
           vItem(UBound(vItem)) = i
           dicWords(key:=sWord) = vItem
          Else
           ' new word: store a one-element array with the position
           vPlaces(0) = i
           dicWords.Add sWord, vPlaces
          End If
          sWord = ""
       End If
    Case Else
 ' Or, if you want to define what characters can appear in words explicitly
 ' (and ignore the rest):
 '      Case "a" To "z", "@", "~", _
 '        "$", "%", "§", "&", "A" To "Z", _
 '        "0" To "9", "'", "-", "_"
       sWord = sWord & sChar
   End Select
 Next i
 ' one line per word: the word and its number of occurrences
 Dim vWord As Variant
 For Each vWord In dicWords.Keys
   Debug.Print vWord, UBound(dicWords.Item(key:=vWord)) + 1
 Next vWord
 End Sub
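
The macro above also treats punctuation as delimiters. If only the counts are needed and blanks are the sole separator, as in the original question, a shorter version might look like the sketch below. It is not from the thread: `CountWords` and `ShowWordCounts` are made-up names, the dictionary is created late-bound with `CreateObject` so no reference has to be set, and treating tabs and paragraph marks like spaces is an assumption.

```vba
Option Explicit

' Count words separated by blanks only (late-bound, no reference needed).
' Assumption: tabs and paragraph marks should also end a word, so they are
' turned into spaces before splitting.
Function CountWords(ByVal sText As String) As Object
    Dim dic As Object
    Dim vParts As Variant
    Dim i As Long

    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = 0   ' binary compare: "King" and "king" are distinct keys

    sText = Replace(sText, vbCr, " ")
    sText = Replace(sText, vbLf, " ")
    sText = Replace(sText, vbTab, " ")

    vParts = Split(sText, " ")
    For i = LBound(vParts) To UBound(vParts)
        If Len(vParts(i)) > 0 Then      ' skip empty parts from double spaces
            ' reading a missing key creates it as Empty, and Empty + 1 = 1
            dic(vParts(i)) = dic(vParts(i)) + 1
        End If
    Next i
    Set CountWords = dic
End Function

Sub ShowWordCounts()
    Dim dic As Object
    Dim vKey As Variant
    Set dic = CountWords(ActiveDocument.Content.Text)
    For Each vKey In dic.Keys
        Debug.Print vKey, dic(vKey)
    Next vKey
End Sub
```

Run `ShowWordCounts` with the document open and look at the Immediate window (Ctrl+G). Because the dictionary compares keys binary, "King" and "king" are counted separately, which is what the question asked for.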
Klaus Linke - 06 Mar 2008 20:21 GMT
BTW, the output does not show the real power of using the scripting
dictionary.
I wrote it that way to store the position (of the last character) with the
word.
The plan was to speed up look-up of words in a large file.

This is demonstrated if you replace the output section at the end with

 Dim vWord As Variant
 For Each vWord In dicWords.Keys
   Debug.Print vWord, UBound(dicWords.Item(key:=vWord)) + 1,
   For i = LBound(dicWords.Item(key:=vWord)) To _
       UBound(dicWords.Item(key:=vWord))
     Debug.Print dicWords.Item(vWord)(i) - Len(vWord);
     Debug.Print "/";
   Next i
   Debug.Print
 Next vWord

The output:

Word_this_and  1             1 /
that           1             15 /
KK857          1             20 /
9269875        1             26 /
King           2             35 / 46 /
king           1             40 /
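
The look-up idea could be wrapped in a small helper along these lines. This is only a sketch: `WordStarts` is a made-up name, and it assumes the dictionary was filled as in the macro, with each item an array of the delimiter positions that ended the word. Since `dicWords` is local to that macro, the helper would have to be called from inside it (or the dictionary passed out).

```vba
' Return the start positions stored for sWord as a "/"-separated string,
' or an empty string if the word is not in the dictionary.
' Assumption: each item is a Variant array holding, per occurrence, the
' position of the delimiter that ended the word.
Function WordStarts(ByVal dicWords As Object, ByVal sWord As String) As String
    Dim vEnds As Variant
    Dim i As Long
    Dim sOut As String

    If Not dicWords.Exists(sWord) Then Exit Function
    vEnds = dicWords(sWord)
    For i = LBound(vEnds) To UBound(vEnds)
        If Len(sOut) > 0 Then sOut = sOut & "/"
        ' delimiter position minus word length = position of first character
        sOut = sOut & CStr(vEnds(i) - Len(sWord))
    Next i
    WordStarts = sOut
End Function
```

For the sample sentence, `WordStarts(dicWords, "King")` should give "35/46", matching the listing. The point is that finding a word is a single dictionary look-up rather than a scan of the whole text.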

Klaus
Greg Maxey - 08 Mar 2008 14:21 GMT
Klaus,

Nice job!

I don't understand this part:
> ' Or, if you want to define what characters can appear in words explicitly
> ' (and ignore the rest):
> '      Case "a" To "z", "@", "~", _
> '        "$", "%", "§", "&", "A" To "Z", _
> '        "0" To "9", "'", "-", "_"

Can you post (or send me e-mail) showing exactly how the code would look
using this option and an example.  Thanks.

Raj - 10 Mar 2008 05:20 GMT
Hi Klaus,

It worked. Thanks Klaus.

Regards,
Raj
Klaus Linke - 10 Mar 2008 16:55 GMT
> Nice job!

Oy, thanks!!

> I don't understand this part:
>> ' Or, if you want to define what characters can appear in words explicitly
>> ' (and ignore the rest):
>> '      Case "a" To "z", "@", "~", _
>> '        "$", "%", "§", "&", "A" To "Z", _
>> '        "0" To "9", "'", "-", "_"

You would replace the "Case Else" condition with this condition.
Only the characters listed would make it into "words".
All other characters would simply be removed (... nothing is done if they
are encountered).

Say I take the original example sentence:

Word_this_and that KK857 9269875  King king  King

and use

[...]
     Case "a" To "z", "A" To "Z"
         sWord = sWord & sChar
 End Select

(notice that the numbers 0 to 9, and the underscore "_", neither appear
under the word delimiters now, nor under the word characters), I'd get

Wordthisand    1             3 /
that           1             15 /
KK             1             23 /
King           2             35 / 46 /
king           1             41 /

Regards,
Klaus
Klaus Linke - 10 Mar 2008 17:00 GMT
> Wordthisand    1             **3** /

... shows I haven't debugged this: If I ignore characters, I can't tell the
start position reliably any more.
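
One way around this might be to remember where a word starts (the position of its first kept character) instead of computing it from the end position and the length. The following is a rough sketch, not debugged code from this thread: `IndexWords` is a made-up name, the delimiter list is shortened, the dictionary is late-bound, and the positions are kept as a "/"-separated string rather than an array to keep it short.

```vba
' Scan sText, keep only letters in words, and record where each word
' really starts (position of its first kept character).
' Assumes the default Option Compare Binary; under Option Compare Text the
' ranges "a" To "z" would also match upper case and accented letters.
Function IndexWords(ByVal sText As String) As Object
    Dim dic As Object
    Dim i As Long, lStart As Long
    Dim sWord As String, sChar As String

    Set dic = CreateObject("Scripting.Dictionary")
    sText = sText & " "                     ' sentinel so the last word is flushed
    For i = 1 To Len(sText)
        sChar = Mid$(sText, i, 1)
        Select Case sChar
            Case " ", vbTab, vbCr, vbLf     ' delimiters (shortened list)
                If Len(sWord) > 0 Then
                    If dic.Exists(sWord) Then
                        dic(sWord) = dic(sWord) & "/" & lStart
                    Else
                        dic.Add sWord, CStr(lStart)
                    End If
                    sWord = ""
                End If
            Case "a" To "z", "A" To "Z"     ' word characters
                If Len(sWord) = 0 Then lStart = i   ' first kept character
                sWord = sWord & sChar
            ' any other character is ignored
        End Select
    Next i
    Set IndexWords = dic
End Function
```

For the example sentence this should give Wordthisand 1, that 15, KK 20, King 35/46 and king 40, so "KK" is reported where "KK857" begins even though the digits are dropped.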

Klaus

©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.