Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
DiscussionsAccessExcelInfoPathOutlookPowerPointPublisherWord
DirectoryUser Groups
Related Topics
Outlook ExpressInternet ExplorerWindowsMS Server ProductsMore Topics ...

MS Office Forum / Excel / New Users / December 2007

Tip: Looking for answers? Try searching our database.

Garbage date data now my headache

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Gary F Shelton - 28 Dec 2007 17:43 GMT
Okay folks I have a painful .csv file that I recieve daily that is now my
problem. Granted I am only as good as my data so let me show you an example
of the data:

Column A                        Column B
Date Code                      Date Code Answer
X71229                          12/29/2007
X71203330382               12/03/2007
X71023 G3                     10/23/2007
X71003DWOODS            10/03/2007
X70913 D5                     09/13/2007
X70913                          09/13/2007
20071215                       12/15/2007
20071005                       10/05/2007
81007                            08/10/2007
80211                            08/02/2011
071220                           12/20/2007

So as you can see are some of the problems with the data:
the string length of the data changes, the format of some cells are text and
some are numbers, some of the data has letter in front and others don't, some
have the full year typed out then month then day and others don't. the data
is what I call garbage but like I stated it is my headache to fix.

So my goal is to try and come up with some methodology to fix it. Any takers
ready for this headache.

Signature

GS

Otto Moehrbach - 28 Dec 2007 19:29 GMT
Excel, as all computer programs, is a beast of logic.  I can think of
various techniques to clean up the first column and put those entries into a
more manageable format, but the logic of some of your examples is beyond
logic.  For instance, why is 80211 08/02/11 and not 02/11/2008 while 71229
is 12/29/2007 and not 07/12/2029?  That particular twist of logic (or lack
of) is not something that a computer can handle.  Perhaps you have some
insight that you might be able to share that could be used by a computer to
differentiate those values.  Perhaps some data in some other column could
point to a particular logic pattern to use for that entry.
A cleanup of the first column would be like removing leading and trailing
text, as well as removing the G3 and D5.  Then, with a logic pattern
furnished by you, Excel could convert all the numbers to dates.  Post back
if any of the above could help you.  Otto
> Okay folks I have a painful .csv file that I recieve daily that is now my
> problem. Granted I am only as good as my data so let me show you an
[quoted text clipped - 27 lines]
> takers
> ready for this headache.
Gary F Shelton - 28 Dec 2007 22:11 GMT
Otto,

Let me give a brief of what I know and then I will lead into some
assumptions of mine.

My data is for consumable products that have a shelf life. My company pays a
vendor to store our product. The vendor is geographically located in Atlanta,
Chicago, New York, Los Angeles, etc. When I get data from this vendor I go to
their website where the data is consolidated. So what I assume is happening
and some of the patters I am seeing is that the date data formats change when
the their is a change in Warehouses which leads me to believe that each
geographically located Warehouse inputs and uploads their data in different
formats. As you pointed out some of the data is just down right garbage data
like the 080211 stuff as that had to be a human error in someone on their end
mistypig it... But you are right that I am going to have to use some other
records in my data set to try and make logical patterns out of the date code
data. I think the big thing I will need to do is get the data parsed out to
the most logical pattern I can and then manually alter the outliers... I
guess that I can do is work the.csv file before I try and import it into
excel ... The other thing that is frustrating is that this vendor must make
lots of changes to their website titles as last month they had certain title
names and they had them in certain columns and now in December they made
changes... So as you can imagine it makes it even harder for me try and
standardize a process on my end... I will be working on this file some more
and if I do get stuck I am sure I will ask some more questions...
Signature

GS

> Excel, as all computer programs, is a beast of logic.  I can think of
> various techniques to clean up the first column and put those entries into a
[quoted text clipped - 40 lines]
> > takers
> > ready for this headache.
Ron Rosenfeld - 29 Dec 2007 00:40 GMT
>Okay folks I have a painful .csv file that I recieve daily that is now my
>problem. Granted I am only as good as my data so let me show you an example
[quoted text clipped - 22 lines]
>So my goal is to try and come up with some methodology to fix it. Any takers
>ready for this headache.

The formats that you have are translatable and consistent except for the next
to last two that start with an eight.

All of the others are basically in a YMD format that is either
    yymmdd  or
    ymmdd   or
    yyyymmdd
(with extraneous information possible before and after)

The two dates that start with an eight are both in mddyy format.

How do you know that the Date Code Answer for those two entries is what it is?

If we could apply the same logic, we might have a chance.  Maybe there is
something in another field?
--ron
Ron Rosenfeld - 29 Dec 2007 03:05 GMT
>Okay folks I have a painful .csv file that I recieve daily that is now my
>problem. Granted I am only as good as my data so let me show you an example
[quoted text clipped - 22 lines]
>So my goal is to try and come up with some methodology to fix it. Any takers
>ready for this headache.

As I think about this more, it occurs to me that you will also need to know the
purpose of the "X".  If it is a placeholder for a "0", that would be good.

However, if it is some other kind of indicator, that might be followed by
either a five or six digit date code, then other entries become ambiguous.

For example, an entry similar to your #2:

X710101330382 could be
2007-10-10  or
1971-01-01

--ron
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.