Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
DiscussionsAccessExcelInfoPathOutlookPowerPointPublisherWord
DirectoryUser Groups
Related Topics
Outlook ExpressInternet ExplorerWindowsMS Server ProductsMore Topics ...

MS Office Forum / Excel / Excel Errors / May 2008

Tip: Looking for answers? Try searching our database.

Excel - Interquartile Range Miscalulation

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
EngelhardtMD - 08 May 2008 19:26 GMT
I calculate the Interquartile Range for the following values both by hand and
using the Minitab Statistical Package I get an IQR = 20 that is the 3rd
quartile of 105 - the first quartile of 85 but when I use Excel 2007 I get
IQR = 15 = (102.5-87.5).

Can anyone explain why this is?

Sincerely,

Mark D. Engelhardt, Ph.D.
engelhardt@mobap.edu
636.293.0901

55
75
80
85
90
95
100
100
100
100
100
105
115
120
135
Jerry W. Lewis - 12 May 2008 09:30 GMT
There is no universally accepted definition of a percentile from sample
data.  Hyndman and Fan, 1996, "Sample Quantiles in Statistical Packages", The
American Statistician 50(4):361-365,1996 discuss 9 different definitions and
reference some others.

Excel (also the quantile() function in S-PLUS and R) uses Hyndman and Fan's
7th definition, which considers the min and max to be the 0th and 100th
percentiles, and equally spaces the quantiles (interpolating as needed)
corresponding to intervening observations.  This gives a reasonable
description of the sample, but is almost certainly biased as an estimate of
the underlying population quantiles.

At the time of their article (I am not aware of any subsequent changes),
Minitab's DESCRIBE used the 6th definition, and %DESCRIBE used the 2nd
definition.

SAS gives several options ...

Jerry

> I calculate the Interquartile Range for the following values both by hand and
> using the Minitab Statistical Package I get an IQR = 20 that is the 3rd
[quoted text clipped - 24 lines]
> 120
> 135
EngelhardtMD - 12 May 2008 17:22 GMT
Jerry,

It would seem to me that Excel's use of Hyndman and Fan's 7th definition
using the minimum value and maximum value as the 0th percentile and 100th
percentile respectively would be extremely sensitive to extreme outliers.

It was my understanding that one of the advantages of using the IQR was that
it was less sensitve to extreme outliers than was the variance.

Mark

> There is no universally accepted definition of a percentile from sample
> data.  Hyndman and Fan, 1996, "Sample Quantiles in Statistical Packages", The
[quoted text clipped - 44 lines]
> > 120
> > 135
Jerry W. Lewis - 17 May 2008 15:45 GMT
My last reply does not appear to have communicated with you, sorry.

The interquartile range is calculated using the 1st & 3rd sample quartiles,
but there are various ways to calculate those quartiles.

Five of Hyndman and Fan's sample quantile definitions have a particularly
simple common form given by the following VBA code.  You select among the
methods according to which definition of m you uncomment.

Function quantile(data, p)
' p=fraction of population, e.g. p=0.25 for 1st quartile
   n = WorksheetFunction.Count(data)
'   m = 0       ' H&F 4: SAS (PCTLDEF=1), R (type=4), Maple (method=3)
'   m = 0.5     ' H&F 5: R (type=5), Maple (method=4)
'   m = p       ' H&F 6: Minitab, SPSS, BMDP, JMP, SAS (PCTLDEF=4), R
(type=6), Maple (method=5)
'   m = 1 - p   ' H&F 7: Excel, S-Plus, R (type=7[default]), Maxima, Maple
(method=6)
'   m = (p+1)/3 ' H&F 8: R (type=8), Maple (method=7[default])
'   m=(p+1.5)/4 ' H&F 9: R (type=9), Maple (method=8)
   npm = n * p + m
   j = Fix(npm):   If j = 0 Then j = 1
                   If j > n Then j = n
   g = npm - j
   quantile = WorksheetFunction.Small(data, j)
   If g >= 0 And j < n Then
       quantile = (1 - g) * quantile + g * WorksheetFunction.Small(data, j
+ 1)
   End If
End Function

Excel, S-Plus, etc use H&F definition 7, which returns SMALL(data,i) as
quantile(data,(i-1)/(n-1)) and interpolates in between.  For a continuous
distribution, this will tend to give too narrow an interquartile range, since
there will tend to be a small fraction of the population beyond the extreme
sample observations.  In particular, for odd n (=2*k+1), Excel calculates the
1st (3rd) quartile as the median of the lower (upper) "half" of the sample
including the sample median (k+1 observations).

Minitab, etc use H&F definition 6, which calculates the 1st (3rd) quartile
as the median of the lower (upper) "half" of the sample.  This "half" sample
excludes the sample median (k observations) for odd n (=2*k+1).  This will
tend to be a better estimate for the population quartiles, but will tend to
give quartile estimates that are a bit too far from the center of the whole
sample (too wide an interquartile range).

Hyndman and Fan recommend their definition 8 (Maple's default definition),
which gives quartiles between those reported by Minitab and Excel.  This
approach is approximately median unbiased for continuous distributions.

Jerry

> Jerry,
>
[quoted text clipped - 55 lines]
> > > 120
> > > 135
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.