MS Office Forum / Word / Programming / November 2006

Differences in ReadabilityStatistics

paxdominus - 08 Nov 2006 20:20 GMT
Greetings,

I'm trying to capture the Flesch-Kincaid Grade Level for a bunch of
documents, which I know how to do, but when comparing it (and all of
the ReadabilityStatistics) programmatically to the pop-up box, the
numbers are noticeably different.

RS            PopUp    Program

Words           740        421
Characters     3221       2078
Paragraphs       24         21
Sentences        48         32
Sen/Para        2.6        1.6
Words/Sen      14.4       13.7
Char/Word       4.1        4.1
Passive           4          0
FRE            79.1       82.5
FKGL            5.1        4.3

It's the exact same document in both places. Obviously, the numbers are
different enough to cause concern, since decisions about how to use the
document are based upon these numbers.

Why are they different? Is there any way to get the actual numbers from
the pop-up box programmatically?
Jay Freedman - 09 Nov 2006 02:51 GMT
I can't tell where your "Popup" and "Program" numbers are coming from,
but I can tell you that Word has several different ways of counting
characters, words, sentences, and paragraphs, and none of them agree.

- You can get the values from the Readability Statistics dialog that
appears at the end of a spelling/grammar check.

- You can look at the results of the Tools > Word Count dialog.

- You can take the .Count values of the various collections in the
object model.

This demo shows how to get those values, and running it against any
moderate-sized document will show how far out of whack the methods
are. Further, it shows that the ratios in the Readability Statistics
often don't match the values they're supposed to be calculated from.

Sub Discrepancies()
   Dim msg As String
   ActiveDocument.Repaginate
   
   msg = "From ToolsWordCount -----------"
   msg = msg & vbCr & "Characters (with spaces):" & vbTab & _
       Dialogs(wdDialogToolsWordCount) _
           .CharactersIncludingSpaces
   msg = msg & vbCr & "Characters (no spaces):" & vbTab & _
       Dialogs(wdDialogToolsWordCount) _
           .Characters
   msg = msg & vbCr & "Words:" & vbTab & vbTab & vbTab & _
       Dialogs(wdDialogToolsWordCount).Words
   msg = msg & vbCr & "Paragraphs:" & vbTab & vbTab & _
       Dialogs(wdDialogToolsWordCount).Paragraphs
   msg = msg & vbCr & "Sentences:" & vbTab & vbTab & "n/a"
   
   msg = msg & vbCr & vbCr & _
       "From Readability Stats------------"
   msg = msg & vbCr & "Characters:" & vbTab & vbTab & _
       ActiveDocument.ReadabilityStatistics(2).Value
   msg = msg & vbCr & "Words:" & vbTab & vbTab & vbTab & _
       ActiveDocument.ReadabilityStatistics(1).Value
   msg = msg & vbCr & "Paragraphs:" & vbTab & vbTab & _
       ActiveDocument.ReadabilityStatistics(3).Value
   msg = msg & vbCr & "Sentences:" & vbTab & vbTab & _
       ActiveDocument.ReadabilityStatistics(4).Value

   ' The first value, in parentheses, is the ratio calculated
   ' from the counts above; the second value is the ratio that
   ' ReadabilityStatistics itself returns.
   msg = msg & vbCr & "Sen/Para:  " & Format( _
       CSng(ActiveDocument.ReadabilityStatistics(4).Value) / _
       ActiveDocument.ReadabilityStatistics(3).Value, "(= 0.0)") _
       & vbTab & vbTab & _
       ActiveDocument.ReadabilityStatistics(5).Value
   msg = msg & vbCr & "Words/Sen:  " & Format( _
       CSng(ActiveDocument.ReadabilityStatistics(1).Value) / _
       ActiveDocument.ReadabilityStatistics(4).Value, "(= 0.0)") _
       & vbTab & _
       ActiveDocument.ReadabilityStatistics(6).Value
   msg = msg & vbCr & "Char/Word:  " & Format( _
       CSng(ActiveDocument.ReadabilityStatistics(2).Value) / _
       ActiveDocument.ReadabilityStatistics(1).Value, "(= 0.0)") _
       & vbTab & _
       ActiveDocument.ReadabilityStatistics(7).Value
   
   msg = msg & vbCr & vbCr & _
       "From Object Model---------------"
   msg = msg & vbCr & "Characters:" & vbTab & vbTab & _
       ActiveDocument.Characters.Count
   msg = msg & vbCr & "Words:" & vbTab & vbTab & vbTab & _
       ActiveDocument.Words.Count
   msg = msg & vbCr & "Paragraphs:" & vbTab & vbTab & _
       ActiveDocument.Paragraphs.Count
   msg = msg & vbCr & "Sentences:" & vbTab & vbTab & _
       ActiveDocument.Sentences.Count
   
   MsgBox msg
End Sub

The case of the Flesch Reading Ease and Flesch-Kincaid Grade Level is
even worse. The published formulas for those scores involve the
average number of syllables per word. Nobody outside Microsoft knows
how Word determines the syllable count and the average -- presumably
it's based on the hyphenation dictionary, but it could be just a
rule-of-thumb estimate or a hardcoded average. You can't trust it.

So my advice is not to rely on any of these methods, especially if the
results will be used to classify documents for any important purpose.
Either get a piece of open-source software (not that I know of any,
but some academic must have written some) that can be calibrated
properly, or do a manual count of a representative sample.
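
If you go the manual route, the published formulas are simple enough to apply yourself. What follows is only a minimal sketch, not Word's internal method: the function names and the sample counts are made up for illustration, and the syllable total is assumed to come from your own count of the sample rather than from Word.

```vba
' Published Flesch formulas applied to counts obtained by hand.
' All three arguments are assumed to come from your own manual
' count of a sample, not from anything Word reports.
Function FleschReadingEase(ByVal wordCount As Long, _
                           ByVal sentenceCount As Long, _
                           ByVal syllableCount As Long) As Double
    FleschReadingEase = 206.835 _
        - 1.015 * (wordCount / sentenceCount) _
        - 84.6 * (syllableCount / wordCount)
End Function

Function FleschKincaidGrade(ByVal wordCount As Long, _
                            ByVal sentenceCount As Long, _
                            ByVal syllableCount As Long) As Double
    FleschKincaidGrade = 0.39 * (wordCount / sentenceCount) _
        + 11.8 * (syllableCount / wordCount) _
        - 15.59
End Function

Sub ShowManualScores()
    ' Hypothetical sample: 100 words, 5 sentences, 150 syllables.
    Const SAMPLE_WORDS As Long = 100
    Const SAMPLE_SENTENCES As Long = 5
    Const SAMPLE_SYLLABLES As Long = 150

    MsgBox "FRE:  " & Format(FleschReadingEase(SAMPLE_WORDS, _
               SAMPLE_SENTENCES, SAMPLE_SYLLABLES), "0.0") & vbCr & _
           "FKGL: " & Format(FleschKincaidGrade(SAMPLE_WORDS, _
               SAMPLE_SENTENCES, SAMPLE_SYLLABLES), "0.0")
End Sub
```

For that sample (20 words per sentence, 1.5 syllables per word) the message box shows an FRE of 59.6 and an FKGL of 9.9. Because every input is a count you made yourself, any disagreement with Word's figures can be traced to a specific count instead of to an unknown syllable estimate.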

--
Regards,
Jay Freedman
Microsoft Word MVP        FAQ: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the
newsgroup so all may benefit.

paxdominus - 09 Nov 2006 16:25 GMT
Greetings,

Since I figured that the Pop-Up stats were the actual
ReadabilityStatistics(1-10), and since programmatically I'm pulling
those numbers from ReadabilityStatistics(1-10), they should be the
same.

Does the Pop-Up pull its stats from a different place? If so, then
where do the ReadabilityStatistics come into play?

As far as I know, since I'm using ReadabilityStatistics(1-10), the two
sets of numbers should be the same.
Jay Freedman - 09 Nov 2006 21:50 GMT
Yes, they should be the same, and they always have been whenever I checked.

My point, though, was that you can't trust any of the numbers Word presents
to you in either the popup or the ReadabilityStatistics object in VBA, and
especially not the Flesch scores.
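
To check the two sets of numbers side by side, something like the following sketch lists every item in the collection by name, so each line can be compared with the matching line in the popup. The routine name is made up for illustration. It assumes the popup was produced by a spelling/grammar check of the same, whole document. ReadabilityStatistics is also available on a Range, so it may be worth confirming that your program isn't reading it from a smaller range than the popup covered.

```vba
Sub ListReadabilityStats()
    Dim stat As ReadabilityStatistic
    Dim msg As String

    ' Each item carries a Name and a Value; listing them by name
    ' avoids having to remember which index means what.
    For Each stat In ActiveDocument.ReadabilityStatistics
        msg = msg & stat.Name & ":" & vbTab & stat.Value & vbCr
    Next stat

    MsgBox msg
End Sub
```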

 