MS Office Forum / Word / Document Management / July 2006

A Better Solution?

Rebecca - 21 Jul 2006 03:53 GMT
I apologize that this is not exactly a question (except for one in the last
paragraph), but it would be nice to hear some comments or suggestions.

Using a Fujitsu ScanSnap scanner, I scanned (at a "very good" resolution) an
entire book of 350 pages, which has many color pictures, using Acrobat 7.0.
The resulting PDF was 154 megabytes.  I then saved this PDF as an HTM file,
opened it in MS Word 2003 or 2007 (same results in both), and saved it as a
Word document.  The resulting doc size is a teensy-weensy 21 KB, and the file
(which is broken up into 350 separate pages, just as in the original PDF) is
as readable as a PDF (and easier to navigate after I add some page numbers,
links, and the like -- this process can be automated). OCR-ing takes too much
time (and I have to proofread the files anyway), so just having images of the
book in one MS Word file is a workable solution.  And with a Tablet PC the
images are inkable for annotations and the like.

If I scan directly into Word or other programs, I get huge files, no matter
how much I fiddle with the resolutions, file types, or compressions.  Using an
ADF (automatic document feeder), I'm currently scanning all the thousands of books and articles scattered
here and there in my library (for my personal use, so no copyright issues),
and I will be able to carry my portable (and searchable) library around on my
(under 2 pound) Tablet PC.  

I've tried to import jpeg images into other programs such as OneNote,
AskSam, UltraRecall, you name it, but the resulting size of the files bloats
to intolerable levels.  PDF files take up too much space (and are slow when
navigating).  Does anyone have a better solution other than the one I
mentioned above?
Tim in Ottawa - 21 Jul 2006 15:28 GMT
Just keep the big PDFs; this will retain the most quality. Although these
files seem big now, as computers and software get faster and faster in the
coming years, these file sizes will seem irrelevant.

Robert M. Franz (RMF) - 21 Jul 2006 15:33 GMT
Hi Rebecca

> Using a Fujitsu ScanSnap scanner, I scanned (at a "very good" resolution) an
> entire book of 350 pages, which has many color pictures, using Acrobat 7.0.
> The resulting PDF was 154 megabytes.  I then saved this PDF as an HTM file,
> opened it in MS Word 2003 or 2007 (same results in both), and saved it as a
> word document.  The resulting doc size is a teensy-weensy 21 KB, and the file

Wait a minute: how many high-color pictures are there in your 350-page
document? At 21 KByte, I doubt there can be much text in a 350-page Word
document, and no pictures to speak of. When you save as HTML, the
pictures and other assets are most probably stored as external files (that's
what Word does, anyway, when you save a document as HTML).

There seem to be either a couple of other big files around, or your
resulting document cannot be much more than a mere text file ...
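Robert's suspicion can be checked before the Word step, assuming the .htm that Acrobat writes refers to its page images with ordinary <img src="..."> tags (an assumption; the thread never shows the HTML itself). A minimal sketch using only Python's standard-library html.parser; the names ImgSrcCollector and external_images are made up for illustration:

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        # Self-closing <img ... /> tags are also routed here by the default handler.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def external_images(html_text):
    """Return the image paths a page loads from separate files.

    Inline data: URIs are skipped, because those images live inside the page.
    """
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return [src for src in parser.sources if not src.startswith("data:")]
```

Calling external_images() on the text of the exported page (book.htm, say, a hypothetical file name) lists every picture that sits outside the HTML file and therefore has to travel with it.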

BTW, have you tried saving as RTF from Acrobat?

Greetinx
Robert
Signature

 /"\  ASCII Ribbon Campaign |   MS
 \ /                        |  MVP
  X        Against HTML     |  for
 / \     in e-mail & news   | Word

Rebecca - 21 Jul 2006 16:33 GMT
Yes, Robert, the scanned book contains dozens and dozens of color pictures,
and yes, that's the actual size of the file.  I know, at first I thought it
was a bug (say, my computer was not reading the file size correctly) or I was
losing my eyesight or my mind.  

It does seem impossible (and I've been experimenting with various scanned
images for years to get the file sizes down).  Try it out and you'll see.
It's almost a miracle (if you've got a ton of scanned material in PDF files,
that is).  And frankly, navigating PDF files in Acrobat is a pain (slow as
molasses, despite some nice functions).  But with a Tablet PC, you
can ink and do other things with the images in MS Word with no problem, and
it still does not increase the file size too much (though I haven't been
highlighting that much yet).

I don't think there are other big (connected) files lurking somewhere on my
hard disk, and if there are, well, this would be a first, too.  I saved the
HTM files as MS Word files, so go figure.  But who knows, maybe you're right
-- maybe there's a catch somewhere.  But as I recommended, try it with a big
PDF in Acrobat: save it as an HTM file, open it in MS Word, and save it as an
MS Word doc. Voilà!

Robert M. Franz (RMF) - 21 Jul 2006 19:46 GMT
Hi Rebecca

> Yes, Robert, the scanned book contains dozens and dozens of color pictures,
> and yes, that's the actual size of the file.  I know, at first I thought it
> was a bug (say, my computer was not reading the file size correctly) or I was
> losing my eyesight or my mind.  

Wasn't thinking about a bug, but I've seen my share of 300-page (and
much longer) files in Word, and I find it unbelievable that a 300-page
file, especially a converted one, could be less than 30 KByte in size --
and that's without pictures! :-)

> It does seem impossible (and I've been experimenting with various scanned
> images for years to get the file sizes down).  Try it out and you'll see.  
> [..]
> it still does not increase the file size too much (though I haven't been
> highlighting that much yet).

You are talking about the "full" Acrobat (not the Reader), right?
I don't have that one on any of the systems I'm working on these days,
unfortunately. But if the file is as small as you say it is, can you
send it to me for inspection? I'm _very_ dubious, I must admit. Sheer
information theory would prohibit compression of the magnitude we're
discussing here (well, that's not quite right: you can compress the
whole Bible into 1 bit, but then the whole Bible text must be part of
the decompressing algorithm -- and I very much doubt Acrobat hacked the
Word executables ... ;-)).
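To put numbers on Robert's doubt: taking Rebecca's figures at face value, and counting 1 MB as 1024 KB and 1 KB as 1024 bytes, the claimed conversion would shrink the book by a factor of roughly 7,500, leaving only about 61 bytes per page of color scan:

$$\frac{154\ \text{MB}}{21\ \text{KB}} = \frac{154 \times 1024\ \text{KB}}{21\ \text{KB}} = \frac{157\,696}{21} \approx 7\,509$$

$$\frac{21 \times 1024\ \text{bytes}}{350\ \text{pages}} = \frac{21\,504}{350} \approx 61\ \text{bytes per page}$$

Sixty-odd bytes is not enough to hold even a line of text per page, let alone a color picture, which is why the images have to be stored somewhere else.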

Greetinx from good old Europe
Robert

Rebecca - 22 Jul 2006 02:08 GMT
Robert,

Yes, I'm using Acrobat 7.0 -- it came with my Fujitsu ScanSnap scanner (an
incredibly useful scanner, by the way).  And yes, the file sizes are correct,
because I sent them to myself by e-mail.  I couldn't find your e-mail address
-- if you give it to me, I'll send you one with a lot of color pictures
[the whole book was scanned in color], but please remember it was scanned
for my personal use -- I want to avoid copyright entanglements.  Did I
stumble upon a method to save an enormous amount of disk space?  Like you, I
am still very dubious -- it's too good to be true.  Such compression is
absolutely impossible, as you said, and maybe you will be able to find out
what's really going on.  The original PDF was 154 megabytes, and the
resulting Word .doc file is about 170 KB.  As you implied, such radical
compression would be insane.  Please see if you can get to the bottom of this.

Robert M. Franz (RMF) - 24 Jul 2006 10:11 GMT
Hi Rebecca

> Yes, I'm using Acrobat 7.0 -- it came with my Fujitsu ScanSnap scanner (an
> incredibly useful scanner, by the way).  And yes, the file sizes are correct
> [..]
> resulting MS doc file is about 170 kbs.  As you implied, such radical
> compression would be insane.  Please see if you can get to the bottom of this.

My email address should show up even in this MSFT web thingy (CDO)
you're using to access this group: robert.franz (at) mvps.org

I'm really curious now to see what you'll send me! :-) And have no fear,
I won't send this document elsewhere without your approval.

Greetinx
Robert

Robert M. Franz (RMF) - 25 Jul 2006 12:45 GMT
Hi Rebecca

[..]
> My email address should show up even in this MSFT web thingy (CDO)
> you're using to access this group: robert.franz (at) mvps.org
>
> I'm really curious now to see what you'll send me! :-) And have no fear,
> I won't send this document elsewhere without your approval.

OK, I got your file, and my scepticism seems to be justified: I see a
bunch of picture placeholders with small "red X".

When I switch to field-code view (ALT-F9), I see a whole document
consisting of INCLUDEPICTURE fields like this:

INCLUDEPICTURE "images/PYRAMIDS_img_0.jpg" \* MERGEFORMAT \d

If you look at the syntax of INCLUDEPICTURE, the \d switch explicitly
prevents the picture data from being stored in the document. So, look at
the "images" subfolder next to your file, and there you go.
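Robert's check can be sketched in code, assuming you already have the field codes as plain text (for example by toggling ALT-F9 and copying them out; that workflow is an assumption, not something from the thread). The Python below finds each INCLUDEPICTURE field and reports whether it carries the \d switch; the names FIELD_RE, NOT_STORED_RE and linked_pictures are hypothetical:

```python
import re

# One INCLUDEPICTURE field as shown in field-code view: the quoted file name,
# then whatever switches follow it, up to the closing brace, the end of the
# line, or the start of the next INCLUDEPICTURE field.
FIELD_RE = re.compile(r'INCLUDEPICTURE\s+"([^"]*)"((?:(?!INCLUDEPICTURE)[^}\n])*)')

# The \d switch: the picture data is not stored in the document itself.
NOT_STORED_RE = re.compile(r"\\d\b")


def linked_pictures(field_code_text):
    """Return (path, stored_in_document) for every INCLUDEPICTURE field."""
    results = []
    for match in FIELD_RE.finditer(field_code_text):
        path, switches = match.group(1), match.group(2)
        stored = NOT_STORED_RE.search(switches) is None
        results.append((path, stored))
    return results
```

For the field Robert quotes, this reports ("images/PYRAMIDS_img_0.jpg", False): the picture lives only in the images folder, which fits the red X placeholders he saw in the copy that arrived without that folder.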

Even Acrobat can't do magic, after all ... :-)

Greetinx
Robert

Rebecca - 27 Jul 2006 04:51 GMT
Thanks, Robert.

Alas!  When something is too good to be true, it usually is just that.
The images folder you mentioned was right there staring me in the face (which
is quite red right now).  And the JPEGs there are all about 350-400 KB each.
Thank God Seagate is going to come out with a terabyte hard drive because
I'll need one or two thanks to all these huge PDFs.
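Rebecca's own numbers account for the missing megabytes. Assuming one JPEG per page (the thread does not say so explicitly), 350 images at 350-400 KB each come to roughly 120-140 MB, the same order as the original 154 MB PDF. To measure the real footprint of such a document, count the .doc and its images folder together. A minimal Python sketch; the names folder_size and true_footprint are hypothetical:

```python
import os


def folder_size(folder):
    """Total size in bytes of every file under folder, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def true_footprint(doc_path, images_folder):
    """Bytes used by a Word document plus the image folder its fields link to."""
    return os.path.getsize(doc_path) + folder_size(images_folder)
```

Something like true_footprint("PYRAMIDS.doc", "images"), with illustrative paths adjusted to wherever the files actually sit, gives the figure to compare against the original PDF.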
