MS Office Forum / Excel / New Users / June 2007
Finding Duplicate Entries
|
|
Thread rating:  |
MarkC - 18 Jun 2007 01:47 GMT Using XP Pro SP2 and Excel XP (2002)
Like to know if it is possible find all duplicate entries?. Can not use a starting string because it's unknown having all kinds of possibilities.
I text pasted a file directory with paths and filenames, I can have duplicate filenames in different directories, and like to search for duplicate names if possible. The list was entered in delimited based on "\" which will be in it's own column, dependent on how many sub-folders.
At this stage, I am interested in only finding duplicate file naming. Not concerned if these files are identical files, or different files having the same name.
Thanks,
Mark
Bernie Deitrick - 18 Jun 2007 18:43 GMT Mark,
I'm assuming that your list is in column A, starting in cell A2.
In Cell B2, array enter (Enter using Ctrl-Shift-Enter) the formula:
=MID(A2,MAX(IF(ISERROR(SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))),0,SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))))+1,256)
in Cell C2, enter the formula:
=IF(COUNTIF(B:B,B2)>1,"Duplicate File Name","")
Then copy B2:C2 down to match your list of file names in column A, and duplicate filenames will be highlighted by column C's value.
HTH, Bernie MS Excel MVP
> Using XP Pro SP2 and Excel XP (2002) > [quoted text clipped - 11 lines] > > Mark *alan* - 19 Jun 2007 01:16 GMT (TOP-POSTING CORRECTED)
>> Using XP Pro SP2 and Excel XP (2002) >> [quoted text clipped - 13 lines] >> >> Mark
> Mark, > [quoted text clipped - 14 lines] > Bernie > MS Excel MVP That doesn't appear to work (at least not in Excel 2003).
Here, however, is a low-tech solution that does work in Excel 2003:
Assuming that A1 is a column heading and your name entries begin in A2,
1. select column A 2. data> filter > advanced filter> check Unique records only > click OK 3. Alt; (selects visible cells) 4. Data > filter > show all
Step 2 will hide all the duplicates. Step 3 will highlight the remaining unique records. Step 4 will make the entire column A reappear, with Unique records highlighted (grey) and the duplicates un-highlighted (white).
Hope that helps. -- Alan
A: Because it runs counter to the normal flow of communication. Q: Why is top-posting frowned upon?
Dave Peterson - 19 Jun 2007 01:29 GMT First, it's the custom of the microsoft.public.excel newsgroups to top post. It's different than most of Usenet.
And try Bernie's array formula once more. That long formula returns just the filename and drops the drive and path (all folders). The formula returns everything after the last backslash.
It should work for you in xl2003--it worked for me.
Second, your formula looks for exact duplicates for the whole string--not just the filename. Your technique doesn't address the OP's problem.
> (TOP-POSTING CORRECTED) > [quoted text clipped - 57 lines] > A: Because it runs counter to the normal flow of communication. > Q: Why is top-posting frowned upon?
 Signature Dave Peterson
MarkC - 20 Jun 2007 03:39 GMT I tried both methods and got both of them to work. Bernie's formula is the one that I needed for this particular project. However, it is good know about the advance filter explained by Alan. It may come in handy for some other projects down the road.
I do have one more issue, I have 10,000 rows, is there a quicker way to "copy down" the formula cells other than holding down the bottom right corner and dragging? I have other duplicate filenames sheet tabs that have 20,000+ rows and there's got to be a faster way. Using Excel 2002.
Thanks,
M
> First, it's the custom of the microsoft.public.excel newsgroups to top > post. [quoted text clipped - 76 lines] >> A: Because it runs counter to the normal flow of communication. >> Q: Why is top-posting frowned upon? Dave Peterson - 20 Jun 2007 03:47 GMT Debra Dalgleish has a few options here: http://contextures.com/xlDataEntry01.html
> I tried both methods and got both of them to work. Bernie's formula is the > one that I needed for this particular project. However, it is good know [quoted text clipped - 94 lines] > > > > Dave Peterson
 Signature Dave Peterson
MarkC - 20 Jun 2007 05:30 GMT thanks, Interesting site and in my favorites! M
> Debra Dalgleish has a few options here: > http://contextures.com/xlDataEntry01.html [quoted text clipped - 111 lines] >> > >> > Dave Peterson Gord Dibben - 20 Jun 2007 04:45 GMT Mark
Enter the formula then double-click on the fill handle to copy down as far as you have data in an adjacent column.
Or type the formulas in B2 and C2 then in namebox type B2:C10000 then ENTER key.
Then CTRL + d to fill.
Gord Dibben MS Excel MVP
>I tried both methods and got both of them to work. Bernie's formula is the >one that I needed for this particular project. However, it is good know [quoted text clipped - 90 lines] >>> A: Because it runs counter to the normal flow of communication. >>> Q: Why is top-posting frowned upon? MarkC - 20 Jun 2007 05:13 GMT Thanks Gord, I got the namebox method to work. But for some reason couldn't get the double-click on the fill handle to work. Can you explain that one a little more? And is it available in Excel 2002?
Thanks,
M
> Mark > [quoted text clipped - 111 lines] >>>> A: Because it runs counter to the normal flow of communication. >>>> Q: Why is top-posting frowned upon? Gord Dibben - 20 Jun 2007 15:09 GMT Yes, it is available in 2002
Enter the formula in top cell of column.
If adjacent column has continuous data, double-clicking on the little black square at lower right corner of the cell should copy down as far as you have data in the adjacent column.
Note: you must have "allow drag and drop" checked in Tools>Options>Edit in order to get the fill handle.
You did say you could drag/copy so I assume it is checked.
Gord
>Thanks Gord, I got the namebox method to work. But for some reason couldn't >get the double-click on the fill handle to work. Can you explain that one a [quoted text clipped - 119 lines] >>>>> A: Because it runs counter to the normal flow of communication. >>>>> Q: Why is top-posting frowned upon? MarkC - 21 Jun 2007 03:47 GMT Got IT!!!
Thanks! M
> Yes, it is available in 2002 > [quoted text clipped - 148 lines] >>>>>> A: Because it runs counter to the normal flow of communication. >>>>>> Q: Why is top-posting frowned upon? MarkC - 21 Jun 2007 03:46 GMT I am retyping Alan's method for quick reference:
*****Assuming that A1 is a column heading and your name entries begin in A2,
1. select column A 2. data> filter > advanced filter> check Unique records only > click OK 3. Alt; (selects visible cells) 4. Data > filter > show all
Step 2 will hide all the duplicates. Step 3 will highlight the remaining unique records. Step 4 will make the entire column A reappear, with Unique records highlighted (grey) and the duplicates un-highlighted (white).*****
My Questions: Is there an opposite way in doing this? That is, is there a way to select non-visible cells? Or does Excel have a reverse highlight/un-highlight feature? Highlight becomes Un-highlighted and vice-versa.
Thanks, M
> (TOP-POSTING CORRECTED) > [quoted text clipped - 58 lines] > A: Because it runs counter to the normal flow of communication. > Q: Why is top-posting frowned upon? MarkC - 19 Jun 2007 01:16 GMT Thanks, I will give it a try! M
> Mark, > [quoted text clipped - 32 lines] >> >> Mark MarkC - 20 Jun 2007 05:29 GMT Too ALL:
I have played around with the duplicates and found out with Bernie's method, it does find all filename duplicates, but the problem is, when deleting the rows, it removes "ALL" duplicates. I need to keep one on record to show that it's on file.
With Alan's method, it looks for exact duplicates for the whole string. The good part is when deleting the marked duplicates, it leaves one on record.
M
> Thanks, I will give it a try! > M [quoted text clipped - 36 lines] >>> >>> Mark Dave Peterson - 20 Jun 2007 12:55 GMT I think that the difference is how people understand your original question.
If you want all duplicates marked, then use Bernie's formula: =IF(COUNTIF(B:B,B2)>1,"Duplicate File Name","")
If you want all duplicates marked but not the first entry, you can use: =IF(COUNTIF(B$2:B2,B2)>1,"Duplicate File Name","")
> Too ALL: > [quoted text clipped - 48 lines] > >>> > >>> Mark
 Signature Dave Peterson
|
|
|