MS Office Forum / Excel / New Users / July 2008
how to speed up VLOOKUP?
|
|
Thread rating:  |
kane - 22 Jul 2008 15:10 GMT hello!
I am currently working with large worksheets with lots of data and I need to use VLOOKUP (or similar) quite often.
I know about complexity theory and thus I already make use of MATCH and INDEX to reduce the number of calculated lookups. My lookups work with exact data, so I can't use the approximate version of VLOOKUP. But I can provide sorted lists.
an approximated lookup in a sorted list should take O(log n) time, an exact lookup does not expect the lsit to be sorted and thus should take O(n) time.
So i thought of sorting the list, making an approximated search via MATCH and then compare the search value with INDEX(...;MATCH...;). that's 2 O(log n) operations. if they match, i am happy, otherwise nothing can be found. Fortunately, this solution is extremely fast.
I then wanted to shorten this big expression from above using user defined functions. i did not make the mistake to access the values of the ranges, i only used "Application.Match" and "Application.Index" and such inside this UDF so that there should not be any problems here. anyway ... this personal function is still quite slow...
i come to think that there is some kind of conversion happening when excel switches to vba and then back to excel, which slows down speed.
so... the best idea i have right now is to program some replacement- macro, which replaces the imaginary function "=SPEEDVLOOKUP(...)" with the IF-MATCH-INDEX-monster I mentioned before. but that's not a nice solution.....
any ideas?
Don Guillett - 22 Jul 2008 15:37 GMT Look in smaller blocks. I usually name them.
 Signature Don Guillett Microsoft MVP Excel SalesAid Software dguillett1@austin.rr.com
> hello! > [quoted text clipped - 31 lines] > > any ideas? Pete_UK - 22 Jul 2008 16:09 GMT If your data is sorted then you could program your own binary search to look for a match, and this would speed things up. However, UDFs execute more slowly than built-in functions (which are compiled), so you may not notice much of an improvement.
Pete
> hello! > [quoted text clipped - 31 lines] > > any ideas? Niek Otten - 22 Jul 2008 16:15 GMT If your data is sorted:
Table in A1:B5, lookup value in C1.
In D1: =MATCH(C1,A1:A5,1) In E1: =IF(INDEX(A1:A5,D1)=C1,INDEX(B1:B5,D1),NA())
Lightning fast and returns #NA if no exact match.
 Signature Kind regards,
Niek Otten Microsoft MVP - Excel
| hello! | [quoted text clipped - 31 lines] | | any ideas? kane - 23 Jul 2008 14:52 GMT > If your data is sorted: > [quoted text clipped - 12 lines] > Niek Otten > Microsoft MVP - Excel That's exactly what I meant with "match-index-monster". I would like to have a simple function that also my colleagues could use.
@Pete: I assumed that "match" does a binary search if you use it on a sorted list (with parameter FALSE), so no need to program this search by myself. I also read that UDFs are much more slowly, pitily.
Right now I wonder whether match does really do a binary search or whether vlookup does sort large lists hidden in memory to speed up things. i created 3 excel test sheets. in all three there are 40.000 values to look up and the vector to be searched is of the same size. hence, a normal vlookup should normally use about 40.000*(a*40.000+d) operations. dobule-match-index-lookups should have about 40.000*(b*2*(lb 40.000)+c+d) operations. a is the time vlookup needs to check one cell, d is the time to transfer cell value, set flags and so on, b is the time match needs for one calculation step, c the time for index to extract a certain cell value. considering different code... well there should still be at least 99% less time needed if double-match-index-lookup is used. i tested vlookup vs. my code vs. double-match-index-lookup and those are the times: 52 sec / 13 sec / 12 sec
the good message in here: my own code works almost as fast as the match-index-solution which is 4 times as fast as vlookup. the bad message: either vlookup or match does not work as expected (or both, of course).
so... problem solved, but things are still bad :-(
Pete_UK - 23 Jul 2008 15:08 GMT I think if you use a FALSE parameter in MATCH or VLOOKUP (i.e. looking for an exact match), then Excel does a sequential search as the data does not need to be sorted and there is no way to tell Excel that the data is sorted (I've often thought that it would be useful to have a 5th parameter in VLOOKUP to inform Excel that the data is sorted). When you use a TRUE value then the data MUST be sorted beforehand, and Excel uses a binary search algorithm, which is much quicker.
You can speed things up if you have a lot of VLOOKUPs getting data from the same matching row by having a MATCH formula in a helper column to find the matching row, and then have an INDEX formula which uses the value from the match, so that the searching for the matching row is only done once.
Hope this helps.
Pete
> That's exactly what I meant with "match-index-monster". I would like > to have a simple function that also my colleagues could use. [quoted text clipped - 26 lines] > > so... problem solved, but things are still bad :-( Charles Williams - 23 Jul 2008 17:54 GMT You can only make Excel MATCH and VLOOKUP use binary search by telling them that the data is sorted: there is no automatic detection of sorted data. The drawback to using binary search as implemented by Excel is that you get the nearest match for missing data rather than #N/A, so you have to use the trick outlined by Niek (note the 1 as final Match argument) to trap data not found. See also http://www.decisionmodels.com/optspeede.htm
There is a long-standing bug associated with VBA UDFs that refreshes the VBE title bar once for each calculated instance of a UDF, unless calculation is initiated from VBA (using something like Application.Calculate). This bug drastically slows down Excel calculations if you have large numbers of UDFs in formulae, unless you bypass by using Manual Calc mode and trapping F9 etc. See also http://www.decisionmodels.com/calcsecretsj.htm
Assuming that a UDF LOOKUP/MATCH is correctly programmed to use Range objects etc, thus avoiding the Excel to VBA data transfer overhead, and avoids the VBE refresh bug, it will be very nearly as fast as a native LOOKUP/MATCH, and faster if programmed to avoid the double LOOKUP or using better algorithms.
Also worth noting that using Application.WorksheetFunction.Match is faster that using Application.Match.
Charles __________________________________________________ The Excel Calculation Site http://www.decisionmodels.com
>> If your data is sorted: >> [quoted text clipped - 43 lines] > > so... problem solved, but things are still bad :-( Don Guillett - 23 Jul 2008 18:21 GMT Charles, Please share why is this so?
Also worth noting that using Application.WorksheetFunction.Match is faster that using Application.Match.
 Signature Don Guillett Microsoft MVP Excel SalesAid Software dguillett1@austin.rr.com
> You can only make Excel MATCH and VLOOKUP use binary search by telling > them that the data is sorted: there is no automatic detection of sorted [quoted text clipped - 75 lines] >> >> so... problem solved, but things are still bad :-( Charles Williams - 23 Jul 2008 18:39 GMT Don,
The only apparent difference is the way errors are handled (returning an error value as opposed to raising an error), so I don't know why WorksheetFunction is faster, it just is (20-30%).
Charles __________________________________________________ The Excel Calculation Site http://www.decisionmodels.com
> Charles, Please share why is this so? > [quoted text clipped - 80 lines] >>> >>> so... problem solved, but things are still bad :-( Niek Otten - 26 Jul 2008 21:50 GMT Hi Kane,
UDFs are slow, but that's only relative. I did a few tests on my almost 5 years old Pentium.
I copy my VBA UDF below. It's about a 100 times faster than a workssheet VLOOKUP with FALSE as a 4th argument. I also include an XLM version (not to be confused with XML); it is again twice as fast. Both solutions have a simple, understandable User Interface.
A combination of MATCH and INDEX is again 10 times faster, but is a bit more tedious to implement ("a monster", as you chose to call it).
But what are we talking about? My example (which I can mail you if you wish), has a 64k table and 1000 VLOOKUP formulas, with random search arguments. VLOOKUP/FALSE solution: 5 seconds. VBA solution: 0.11 seconds XLM solution: .055 seconds MATCH/INDEX solution: .0043 seconds
Even on my oldtimer it's impossible to notice the difference between the last 3 solutions. Of course, if you would have 64K formulas, it could be noticed.
There are also differences if the number of matches varies. One would think that more matches mean more double VLOOKUPs so require more time, but that does not seem to be the case. Apparently returning #NA requires more time.
The XLM solution could be almost twice as fast again if I could find a way to imitate the MATCH/INDEX approach. But unless I require the lookup table to be passed as an argument twice (once only the fisrt column, once the whole table) I don't know how to isolate the first column for the MATCH function. My XLM knowledge has faded away quite a bit!
Any suggestions are welcome!
 Signature Kind regards,
Niek Otten Microsoft MVP - Excel
================================================================= Function VLookupSort(SearchArgument As Range, SearchTable As Range, _ ColumnNo As Long, Optional SortDirection, Optional NotFound) ' Works as Vlookup, exact match (4th argument = FALSE) ' But takes advantage of the fact that a table is sorted ' and thus is much faster ' Also permits table to be sorted descending (Sortdirection -1) ' Optional argument for return value if item not found, defaults to #NA Dim ItemFound
If IsMissing(SortDirection) Then SortDirection = 1
ItemFound = Application.Match(SearchArgument, Intersect(SearchTable, SearchTable.Cells(1).EntireColumn), _ SortDirection) If SearchTable(ItemFound, 1) <> SearchArgument Then If IsMissing(NotFound) Then VLookupSort = CVErr(xlErrNA) Else VLookupSort = NotFound End If Else VLookupSort = _ SearchTable(ItemFound, ColumnNo) End If End Function ================================================================= The XLM solution. Create an XLM sheet; CTRL+F11. Paste the code below into A1 and down. Insert a defined name (VlookupSortXLM) for A1. ================================================================= VlookupSortXLM =ARGUMENT("SearchArgument",1) =ARGUMENT("SearchTable",8) =ARGUMENT("ColumnNo",1) =VLOOKUP(SearchArgument,SearchTable,1) =IF(A5=SearchArgument,VLOOKUP(SearchArgument,SearchTable,ColumnNo),NA()) =RETURN(A6) =================================================================
| > If your data is sorted: | > [quoted text clipped - 43 lines] | | so... problem solved, but things are still bad :-( kane - 30 Jul 2008 08:33 GMT Hi all again!
First I want to say sorry that I didn't respond until now, but I've been away for a business trip the last days and had no internet. I also want to apologize right now for my bad english - its not my native language. But it seems until now you all understood my problems :-)
I coded a bit at home and used another testing environment (an old pc) and could improve time differences significantly (65000 lookups with vlookup / vba-code / double-match-index-thingy: 200s / 24s / <1s - how did you measure your calculation times, niek?). but you see, my own solution is still much worse than the ugly code.
@charles: this information about whether VBE is open or F9 is used is really interesting. I will try a Application.Calculate button. I did transfer the SearchRange as Range object, but not the argument - maybe there is also a slowdown there. Need to find time to test it. The same for Application.WorksheetFunction.***
@niek: your solution looks quite nice. I hopefully will test it soon. Since you say that returning #NA needs a lot of time, maybe an error handler can help here. I think I read some time ago that transfering errors in VBA is time consuming. Maybe worth a try. defining the searchargument as range might be a key element to speed up my own code which extracts the value of the cell right now. Also your XLM solution I need to try first, but I have to admit that I never came in contact with XLM before (at least I don't remember such an occasion). Might be a bit harder for me to do more than copy&paste here :-)
Another idea of mine is to create a macro which replaces all commands with name "fastvlookup" (or any other custom name) and some parameters with the already well known double-match-index-monster. fastvlookup needs not to be known to excel for this to work, of course. But first I will try nieks solution :-)
PS: It would be a great thing if excel has some kind of "issorted" flag for columns/rows, which has positions "true/false/don't know". every time you ask for this flag it is recalculated and if the column/ row is changed it is resetted to "don't know". But that's only wishful thinking....
> Hi Kane, > [quoted text clipped - 69 lines] > =RETURN(A6) > =================================================================
|
|
|