Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
DiscussionsAccessExcelInfoPathOutlookPowerPointPublisherWord
DirectoryUser Groups
Related Topics
Outlook ExpressInternet ExplorerWindowsMS Server ProductsMore Topics ...

MS Office Forum / Excel / New Users / July 2008

Tip: Looking for answers? Try searching our database.

how to speed up VLOOKUP?

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
kane - 22 Jul 2008 15:10 GMT
hello!

I am currently working with large worksheets with lots of data and I
need to use VLOOKUP (or similar) quite often.

I know about complexity theory and thus I already make use of MATCH
and INDEX to reduce the number of calculated lookups. My lookups work
with exact data, so I can't use the approximate version of VLOOKUP.
But I can provide sorted lists.

an approximated lookup in a sorted list should take O(log n) time, an
exact lookup does not expect the lsit to be sorted and thus should
take O(n) time.

So i thought of sorting the list, making an approximated search via
MATCH and then compare the search value with INDEX(...;MATCH...;).
that's 2 O(log n) operations. if they match, i am happy, otherwise
nothing can be found.
Fortunately, this solution is extremely fast.

I then wanted to shorten this big expression from above using user
defined functions. i did not make the mistake to access the values of
the ranges, i only used "Application.Match" and "Application.Index"
and such inside this UDF so that there should not be any problems
here. anyway ... this personal function is still quite slow...

i come to think that there is some kind of conversion happening when
excel switches to vba and then back to excel, which slows down speed.

so... the best idea i have right now is to program some replacement-
macro, which replaces the imaginary function "=SPEEDVLOOKUP(...)" with
the IF-MATCH-INDEX-monster I mentioned before. but that's not a nice
solution.....

any ideas?
Don Guillett - 22 Jul 2008 15:37 GMT
Look in smaller blocks. I usually name them.

Signature

Don Guillett
Microsoft MVP Excel
SalesAid Software
dguillett1@austin.rr.com

> hello!
>
[quoted text clipped - 31 lines]
>
> any ideas?
Pete_UK - 22 Jul 2008 16:09 GMT
If your data is sorted then you could program your own binary search
to look for a match, and this would speed things up. However, UDFs
execute more slowly than built-in functions (which are compiled), so
you may not notice much of an improvement.

Pete

> hello!
>
[quoted text clipped - 31 lines]
>
> any ideas?
Niek Otten - 22 Jul 2008 16:15 GMT
If your data is sorted:

Table in A1:B5, lookup value in C1.

In D1:
=MATCH(C1,A1:A5,1)
In E1:
=IF(INDEX(A1:A5,D1)=C1,INDEX(B1:B5,D1),NA())

Lightning fast and returns #NA if no exact match.

Signature

Kind regards,

Niek Otten
Microsoft MVP - Excel

| hello!
|
[quoted text clipped - 31 lines]
|
| any ideas?
kane - 23 Jul 2008 14:52 GMT
> If your data is sorted:
>
[quoted text clipped - 12 lines]
> Niek Otten
> Microsoft MVP - Excel

That's exactly what I meant with "match-index-monster". I would like
to have a simple function that also my colleagues could use.

@Pete: I assumed that "match" does a binary search if you use it on a
sorted list (with parameter FALSE), so no need to program this search
by myself. I also read that UDFs are much more slowly, pitily.

Right now I wonder whether match does really do a binary search or
whether vlookup does sort large lists hidden in memory to speed up
things.
i created 3 excel test sheets. in all three there are 40.000 values to
look up and the vector to be searched is of the same size. hence, a
normal vlookup should normally use about 40.000*(a*40.000+d)
operations. dobule-match-index-lookups should have about
40.000*(b*2*(lb 40.000)+c+d) operations. a is the time vlookup needs
to check one cell, d is the time to transfer cell value, set flags and
so on, b is the time match needs for one calculation step, c the time
for index to extract a certain cell value. considering different
code... well there should still be at least 99% less time needed if
double-match-index-lookup is used.
i tested vlookup vs. my code vs. double-match-index-lookup and those
are the times:
52 sec / 13 sec / 12 sec

the good message in here: my own code works almost as fast as the
match-index-solution which is 4 times as fast as vlookup.
the bad message: either vlookup or match does not work as expected (or
both, of course).

so... problem solved, but things are still bad :-(
Pete_UK - 23 Jul 2008 15:08 GMT
I think if you use a FALSE parameter in MATCH or VLOOKUP (i.e. looking
for an exact match), then Excel does a sequential search as the data
does not need to be sorted and there is no way to tell Excel that the
data is sorted (I've often thought that it would be useful to have a
5th parameter in VLOOKUP to inform Excel that the data is sorted).
When you use a TRUE value then the data MUST be sorted beforehand, and
Excel uses a binary search algorithm, which is much quicker.

You can speed things up if you have a lot of VLOOKUPs getting data
from the same matching row by having a MATCH formula in a helper
column to find the matching row, and then have an INDEX formula which
uses the value from the match, so that the searching for the matching
row is only done once.

Hope this helps.

Pete

> That's exactly what I meant with "match-index-monster". I would like
> to have a simple function that also my colleagues could use.
[quoted text clipped - 26 lines]
>
> so... problem solved, but things are still bad :-(
Charles Williams - 23 Jul 2008 17:54 GMT
You can only make Excel MATCH and VLOOKUP use binary search by telling them
that the data is sorted: there is no automatic detection of sorted data.
The drawback to using binary search as implemented by Excel is that you get
the nearest match for missing data rather than #N/A, so you have to use the
trick outlined by Niek (note the 1 as final Match argument) to trap data not
found.
See also http://www.decisionmodels.com/optspeede.htm

There is a long-standing bug associated with VBA UDFs that refreshes the VBE
title bar once for each calculated instance of a UDF, unless calculation is
initiated from VBA (using something like Application.Calculate).
This bug drastically slows down Excel calculations if you have large numbers
of UDFs in formulae, unless you bypass by using Manual Calc mode and
trapping F9 etc.
See also http://www.decisionmodels.com/calcsecretsj.htm

Assuming that a UDF LOOKUP/MATCH is correctly programmed to use Range
objects etc, thus avoiding the Excel to VBA data transfer overhead, and
avoids the VBE refresh bug, it will be very nearly as fast as a native
LOOKUP/MATCH, and faster if programmed to avoid the double LOOKUP or using
better algorithms.

Also worth noting that using Application.WorksheetFunction.Match is faster
that using Application.Match.

Charles
__________________________________________________
The Excel Calculation Site
http://www.decisionmodels.com

>> If your data is sorted:
>>
[quoted text clipped - 43 lines]
>
> so... problem solved, but things are still bad :-(
Don Guillett - 23 Jul 2008 18:21 GMT
Charles, Please share why is this so?

Also worth noting that using Application.WorksheetFunction.Match is faster
that using Application.Match.

Signature

Don Guillett
Microsoft MVP Excel
SalesAid Software
dguillett1@austin.rr.com

> You can only make Excel MATCH and VLOOKUP use binary search by telling
> them that the data is sorted: there is no automatic detection of sorted
[quoted text clipped - 75 lines]
>>
>> so... problem solved, but things are still bad :-(
Charles Williams - 23 Jul 2008 18:39 GMT
Don,

The only apparent difference is the way errors are handled (returning an
error value as opposed to raising an error), so I don't know why
WorksheetFunction is faster, it just is (20-30%).

Charles
__________________________________________________
The Excel Calculation Site
http://www.decisionmodels.com

> Charles, Please share why is this so?
>
[quoted text clipped - 80 lines]
>>>
>>> so... problem solved, but things are still bad :-(
Niek Otten - 26 Jul 2008 21:50 GMT
Hi Kane,

UDFs are slow, but that's only relative.
I did a few tests on my almost 5 years old Pentium.

I copy my VBA UDF below. It's about a 100 times faster than a workssheet VLOOKUP with FALSE as a 4th argument.
I also include an XLM version (not to be confused with XML); it is again twice as fast.
Both solutions have a simple, understandable User Interface.

A combination of MATCH and INDEX is again 10 times faster, but is a bit more tedious to implement ("a monster", as you chose to
call it).

But what are we talking about?
My example (which I can mail you if you wish), has a 64k table and 1000 VLOOKUP formulas, with random search arguments.
VLOOKUP/FALSE solution: 5 seconds.
VBA solution: 0.11 seconds
XLM solution: .055 seconds
MATCH/INDEX solution: .0043 seconds

Even on my oldtimer it's impossible to notice the difference between the last 3 solutions. Of course, if you would have 64K
formulas, it could be noticed.

There are also differences if the number of matches varies. One would think that more matches mean more double VLOOKUPs so require
more time, but that does not seem to be the case. Apparently returning #NA requires more time.

The XLM solution could be almost twice as fast again if I could find a way to imitate the MATCH/INDEX approach. But unless I
require the lookup table to be passed as an argument twice (once only the fisrt column, once the whole table) I don't know how to
isolate the first column for the MATCH function. My XLM knowledge has faded away quite a bit!

Any suggestions are welcome!

Signature

Kind regards,

Niek Otten
Microsoft MVP - Excel

=================================================================
Function VLookupSort(SearchArgument As Range, SearchTable As Range, _
   ColumnNo As Long, Optional SortDirection, Optional NotFound)
' Works as Vlookup, exact match (4th argument = FALSE)
' But takes advantage of the fact that a table is sorted
' and thus is much faster
' Also permits table to be sorted descending (Sortdirection -1)
' Optional argument for return value if item not found, defaults to #NA
Dim ItemFound

If IsMissing(SortDirection) Then SortDirection = 1

ItemFound = Application.Match(SearchArgument, Intersect(SearchTable, SearchTable.Cells(1).EntireColumn), _
   SortDirection)
If SearchTable(ItemFound, 1) <> SearchArgument Then
   If IsMissing(NotFound) Then
   VLookupSort = CVErr(xlErrNA)
   Else
   VLookupSort = NotFound
   End If
Else
   VLookupSort = _
   SearchTable(ItemFound, ColumnNo)
End If
End Function
=================================================================
The XLM solution. Create an XLM sheet; CTRL+F11. Paste the code below into A1 and down.
Insert a defined name (VlookupSortXLM) for A1.
=================================================================
     VlookupSortXLM
     =ARGUMENT("SearchArgument",1)
     =ARGUMENT("SearchTable",8)
     =ARGUMENT("ColumnNo",1)
     =VLOOKUP(SearchArgument,SearchTable,1)
     =IF(A5=SearchArgument,VLOOKUP(SearchArgument,SearchTable,ColumnNo),NA())
     =RETURN(A6)
=================================================================

| > If your data is sorted:
| >
[quoted text clipped - 43 lines]
|
| so... problem solved, but things are still bad :-(
kane - 30 Jul 2008 08:33 GMT
Hi all again!

First I want to say sorry that I didn't respond until now, but I've
been away for a business trip the last days and had no internet. I
also want to apologize right now for my bad english - its not my
native language. But it seems until now you all understood my
problems :-)

I coded a bit at home and used another testing environment (an old pc)
and could improve time differences significantly (65000 lookups with
vlookup / vba-code / double-match-index-thingy: 200s / 24s / <1s - how
did you measure your calculation times, niek?).
but you see, my own solution is still much worse than the ugly code.

@charles: this information about whether VBE is open or F9 is used is
really interesting. I will try a Application.Calculate button.
I did transfer the SearchRange as Range object, but not the argument -
maybe there is also a slowdown there. Need to find time to test it.
The same for Application.WorksheetFunction.***

@niek:
your solution looks quite nice. I hopefully will test it soon.
Since you say that returning #NA needs a lot of time, maybe an error
handler can help here. I think I read some time ago that transfering
errors in VBA is time consuming. Maybe worth a try.
defining the searchargument as range might be a key element to speed
up my own code which extracts the value of the cell right now.
Also your XLM solution I need to try first, but I have to admit that I
never came in contact with XLM before (at least I don't remember such
an occasion). Might be a bit harder for me to do more than copy&paste
here :-)

Another idea of mine is to create a macro which replaces all commands
with name "fastvlookup" (or any other custom name) and some parameters
with the already well known double-match-index-monster. fastvlookup
needs not to be known to excel for this to work, of course. But first
I will try nieks solution :-)

PS: It would be a great thing if excel has some kind of "issorted"
flag for columns/rows, which has positions "true/false/don't know".
every time you ask for this flag it is recalculated and if the column/
row is changed it is resetted to "don't know". But that's only wishful
thinking....

> Hi Kane,
>
[quoted text clipped - 69 lines]
>       =RETURN(A6)
> =================================================================
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.