I'm sending this in HTML because of formatting issues, apologies if you can't read it.
Does the document begin as follows?
สารบัญ
บทที่ 1. การติดตั้งระบบ
บทที่ 2. เริ่มต้นใช้งาน
บทที่ 3. เลือกข้อมูล
(I don't know Thai so I'm just guessing).
If so, the problem is probably that the file was saved on a non-Unicode setup, perhaps on an earlier Thai-only version of Windows/Office, perhaps using a third-party Thai-language enabling program. I don't know anything about Thai, but this is a big problem for East Asian languages such as Chinese. Before Windows and Office had good Unicode support, a lot of Chinese-language users outside of Chinese locales used such programs to input Chinese characters into Word.
The problem is that the files created this way are not easily readable in later, Unicode-aware versions of Office.
I wrote a simple VBA program that converts such documents in Chinese, and it could probably be adapted to your situation. Michael would be able to tell us if there's an easier way around this but VBA is the only way I've found so far.
The key function would be this:
Function ThaiToUnicode(ThaiText As String) As String
    ThaiToUnicode = StrConv(StrConv(ThaiText, vbFromUnicode), vbUnicode, wdThai)
End Function
Given a string from your document, it will turn it into a correctly-formatted Unicode version of that string. Running it on the first few lines gave the above output, which looks like Thai to me ... and Google suggests that the first word is "manual."
If this is the right track, I can send you some code that can be adapted to process a whole document.
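For reference, a minimal sketch of what such whole-document code could look like. It is not the code offered above, only an illustration under stated assumptions: the macro name ConvertActiveDocumentFromThai is hypothetical, the garbled text is assumed to be in the main body of the active document, and the machine's default ANSI code page is assumed to be a Western one (otherwise the inner StrConv call cannot map the garbled characters back to their original bytes). The function is repeated so the module compiles on its own.

```vba
Option Explicit

' Function from the post, repeated so this module is self-contained.
Function ThaiToUnicode(ThaiText As String) As String
    ThaiToUnicode = StrConv(StrConv(ThaiText, vbFromUnicode), vbUnicode, wdThai)
End Function

' Hypothetical helper: converts the body text of the active document
' one paragraph at a time. Run it on a copy of the file.
Sub ConvertActiveDocumentFromThai()
    Dim para As Paragraph
    Dim rng As Range

    For Each para In ActiveDocument.Paragraphs
        Set rng = para.Range
        ' Exclude the paragraph mark so paragraph formatting is kept.
        rng.MoveEnd Unit:=wdCharacter, Count:=-1
        If Len(rng.Text) > 0 Then
            rng.Text = ThaiToUnicode(rng.Text)
        End If
    Next para
End Sub
```

Replacing the text of a range this way is likely to drop character-level formatting, fields and inline pictures inside each paragraph, and it does not touch headers, footers or text boxes, so treat it as a starting point rather than a finished converter.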
Bruce
> I'm sending this in HTML because of formatting issues, apologies
> if you can't read it.
Don't worry, I can read it.
BTW, if by "formatting issues" you mean writing Thai, the message can be
sent in non-HTML format simply by using the Thai encoding.
> Does the document begin as follows?
Yes Sir, very good work!!!!
A rough translation of what you wrote:
>สารบัญ
Table of Contents
>บทที่ 1. การติดตั้งระบบ
Chapter 1. Installing the software
>บทที่ 2. เริ่มต้นใช้งาน
Chapter 2. Start using the software
>บทที่ 3. เลือกข้อมูล
Chapter 3. Selecting data
>(I don't know Thai so I'm just guessing).
>If so, the problem is probably that the file was saved on a non-Unicode
>setup, perhaps on an earlier Thai-only version of Windows/Office, perhaps
[quoted text clipped - 5 lines]
>The problem is that the files created this way are not easily readable in
>later, Unicode-aware versions of Office.
The file is dated "23/01/1997".
Lots of files from that time were written on a non-Unicode setup, but
used fonts especially made for that particular language (8-bit fonts).
Loading these files without the proper fonts installed results in a screen
full of garbage.
The strange thing is that I have the fonts this file requires (CordiaUPC,
AngsanaUPC, EucrosiaUPC) installed on my machine, but it is still garbled.
>I wrote a simple VBA program that converts such documents in Chinese, and it
>could probably be adapted to your situation. Michael would be able to tell
>us if there's an easier way around this but VBA is the only way I've found
>so far.
The easiest way I can imagine (IMHO) is that Word would offer a conversion,
or at least ask to install the proper fonts if it cannot find them.
The original document specifies that the fonts CordiaUPC, AngsanaUPC and
EucrosiaUPC have been used (I looked this up with a binary editor).
If Word can find these fonts, the document should be displayed correctly.
If not, Word should ask for the original fonts.
I have the multilanguage versions of Word and Windows installed, so
reading documents written in Thai should be possible.
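One way to double-check from inside Word that it can actually see those three fonts is a short macro like the sketch below. The macro name CheckThaiFontsInstalled is hypothetical; Application.FontNames is Word's list of available fonts. It only confirms that the fonts are visible to Word; it says nothing about how the text itself was encoded.

```vba
Option Explicit

' Hypothetical check: prints to the Immediate window (Ctrl+G in the VBA
' editor) whether each font named in the document is available to Word.
Sub CheckThaiFontsInstalled()
    Dim wanted As Variant
    Dim w As Variant
    Dim f As Variant
    Dim found As Boolean

    wanted = Array("CordiaUPC", "AngsanaUPC", "EucrosiaUPC")

    For Each w In wanted
        found = False
        For Each f In Application.FontNames
            ' Case-insensitive comparison of font names.
            If StrComp(f, w, vbTextCompare) = 0 Then
                found = True
                Exit For
            End If
        Next f
        Debug.Print w & ": " & IIf(found, "available", "not found")
    Next w
End Sub
```

If all three are reported as available and the text is still garbled, the fonts are probably not the cause, which fits the encoding explanation quoted above.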
>The key function would be this:
>Function ThaiToUnicode(ThaiText As String) As String
>ThaiToUnicode = StrConv(StrConv(ThaiText, vbFromUnicode), vbUnicode,
[quoted text clipped - 5 lines]
>few lines gave the above output, which looks like Thai to me ... and Google
>suggests that the first word is "manual."
Although I know a little bit of programming in Visual Basic myself, I did not
experiment with that idea.
But if the end result is acceptable, and apparently it is, then this
solution is fine for me.
>If this is the right track, I can send you some code that can be adapted to
>process a whole document.
For me, the end result is most important.
As far as I can see, you did a wonderful job and I wish to thank you for
that.
If it would not be too much trouble, I would gladly receive the code you
describe (Visual Basic) that would allow me to process the whole document.
I would surely appreciate it if Michael or some people from Microsoft could
come up with a macro that would convert documents written in non-Unicode
fonts to Unicode fonts and publish it for the many other users of Office who
face the same problem.
Regards,
Tamara
>Bruce
>> What format are the files in?
>
[quoted text clipped - 62 lines]
>>>>>
>>>>> Tamara
Bruce Rusk - 22 Nov 2005 04:53 GMT
Tamara,
I played around with it a bit more, and here's the solution.
1. Open the .DOC file in WordPad, not Word (in XP you can right-click, Open
With -> WordPad).
2. Save it under a new file name (e.g. Manual2.doc) in Rich Text format.
3. Open the new file in Word. The Thai text might display in the wrong font;
if so, select all and apply the Thai font.
Should be fine, though some of the formatting might not be preserved.
Bruce
Bruce Rusk - 22 Nov 2005 05:19 GMT
Or, alternatively, open it in WordPad, paste the text into a blank Word
document, and change the font.
> Tamara,
>
[quoted text clipped - 11 lines]
>
> Bruce
Tamara - 23 Nov 2005 01:41 GMT
Dear Bruce,
I have tried both options you gave, but with no success.
The text was displayed every time as question marks.
Building on your idea, I tried opening the "*.doc" file with
Internet Explorer (right-clicking and choosing "Open with").
The file was then loaded into Word as a "Web" file and, strangely enough, it
was displayed in the right Thai font.
I guess that some of the formatting might be lost, but that is less
important.
The strange thing is that the file, which was originally more than 12 MB, was
reduced to 1,768 KB after the conversion.
The manual is complete as far as I can see.
I have uploaded the result to see for yourself at:
http://s35.yousendit.com/d.aspx?id=0BIO3EY2T6YOH0QIEXR8AARKDN
If you don't mind, I would gladly receive the program code to process a file
as you said.
Maybe it will come in handy for loading other files if there is a problem.
Thanks to all the people who helped me with this problem.
Regards,
Tamara
> Or, alternately, open in WordPad, paste into a blank Word document, change
> font.
[quoted text clipped - 14 lines]
>>
>> Bruce
Bob Eaton - 22 Nov 2005 18:02 GMT
> I would surely appreciate it if Michael or some people from Microsoft could
> come up with a macro that would convert documents written in non-Unicode
> fonts to Unicode fonts and publish it for the many other users of Office
> who face the same problem.
Check out http://scripts.sil.org/EncCnvtrs for the "Data Conversion Macro"
which can be used to convert documents from legacy fonts/encodings to
Unicode (or other Unicode text processing such as transliteration, etc).
The current version supports code page converters (e.g. Thai code page =
874), but they can't be configured directly (it'll be there in the next
version--until then, see below for how to add a code page converter to the
repository).
Once the converter is added to the repository, it can be used in any
COM/.NET-enabled application; in particular, the Data Conversion Macro is
a VBA client app for converting Word documents.
Bob
To add a code page converter, use something like the following:
Sub AddThaiCodePageConverter()
    Dim aECs As EncConverters
    Set aECs = CreateObject("SilEncConverters.EncConverters")
    aECs.Add "Thai Code Page", "874", ConvType_Legacy_to_from_Unicode, _
        "Thai", "UNICODE", _
        ProcessTypeFlags_CodePageConversion
End Sub
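Once the converter has been added, calling it directly from VBA might look roughly like the sketch below. This is an assumption-laden illustration, not part of the post: the macro name is hypothetical, and the Item lookup and Convert method reflect my understanding of the EncConverters COM interface, so check them against the SIL documentation before relying on them. For whole documents, the Data Conversion Macro mentioned above is the intended route.

```vba
Option Explicit

' Hypothetical usage sketch. Assumes "Thai Code Page" was registered by
' AddThaiCodePageConverter, and that the repository exposes Item(name)
' and the converter exposes Convert(text) (assumed, not verified here).
Sub ConvertSelectionWithThaiCodePage()
    Dim aECs As Object
    Dim aEC As Object

    Set aECs = CreateObject("SilEncConverters.EncConverters")
    Set aEC = aECs.Item("Thai Code Page")   ' name used in the Add call

    If Len(Selection.Text) > 0 Then
        ' Replace the selected legacy-encoded text with its Unicode form.
        Selection.Text = aEC.Convert(Selection.Text)
    End If
End Sub
```

Late binding (As Object) is used here so the sketch does not depend on a project reference to the EncConverters type library.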
Tamara - 23 Nov 2005 01:57 GMT
>> I would surely appreciate if Michael or some people from Microsoft could
>> come up with a Macro that would convert documents written in non-Unicode
[quoted text clipped - 26 lines]
> ProcessTypeFlags_CodePageConversion
> End Sub
Dear Bob,
Thank you for the link to the "Data Conversion Macro" and the code to add a
code page converter to MS Office.
I've finally been able to open the document without losing any vital
information.
Regards,
Tamara