|
|
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
If you're a Word 2002/2003/2007 user and you're in a hurry...If you are having problems using an Excel Workbook as a data source for a MailMerge in Word 2002 (Word XP), Word 2003, or Word 2007 many problems can be solved in one of two ways (and I'm not so sure when it comes to Word 2007) 1. Use Word "field formatting switches" to modify the way Word displays your dataYou can use formatting switches in your Word MERGEFIELD fields and
other types of field. You will find
plenty of information about many of the problems and solutions on Graham Mayor's pages
at 2. Change the way Word gets your data to "DDE"Word has several ways to connect to Excel data. In Word 2002 and
later, Word uses something called OLE DB unless you say otherwise. You
can change Word so that it uses the way that Word gets its data from
Excel to a thing called "DDE." DDE is an old method that does not
understand everything in newer Excel workbooks, so it is not a good
long-term choice. In particular, DDE only has a chance of working if: To change to DDE: To see detailed screenshots of the above procedure, click here. It's a big page with lots of graphics - about 2Mbytes. Notes: ...If you're not a Word 2002/2003/2007 user, or you're not in a hurry, or the above approaches didn't helpUnfortunately, I cannot provide "The Answer", partly because there is no single solution that works in all cases, and partly because some of the problems are still unexplored. So the following notes describe what my current understanding of how Word gets data from Excel. It is not at all straightforward, so bear with me... A short note for Mac Word users: I only have experience of Mac Word 2004, which only has one connection method for connecting to Excel. It behaves in a rather similar way to the DDE method in Windows except that you can connect to any worksheet in the workbook. The main problem with it is that it does not work properly with non-ANSI Unicode data in the workbook. At the moment I can't offer a workaround for that problem except that if your worksheet has 63 columns or fewer, you should be able to copy/paste it into a Word document and use that as the data source. If you can paste non-tabular data, you may be able to go beyond that 63-column limit. The rest of this document is concerned with the Windows version of Word. How (Windows) Word gets data from ExcelWord can get data from Excel using "OLE DB", "DDE", "ODBC", or a "converter". These things are often referred to as Connection methods. For further details on these methods, look here. Otherwise, the key points are as follows: a. In Word 2000 and earlier Word will try to get data from Excel using DDE by default - that is, unless you specify otherwise. b. in Word 2002, 2003, Word will try to get data from Excel using the Jet OLE DB Provider by default. In Word 2007, Word will use the Ace OLE DB Provider by default. Jet is the database engine that is used by Microsoft Access to maintain data in its .mdb format files. It is also capable of reading and writing a number of other formats including some older dBase formats, delimited text files, Excel workbooks and others. In Office 2007, Microsoft produced a modified version of Jet called Ace. A key change for Word users is that the Ace OLE DB Provider can read data in the new Excel 2007 XML formats - .xlsx and so on. c. DDE uses Excel to get the data more or less in the format you see it in Excel. However, DDE can only get data from the first worksheet in your workbook. It cannot read Unicode data properly. It isn't maintained or supported. Because it uses Excel, Excel must be on your system, and if you are trying to automate Word to do a MailMerge, the whole process becomes much more complicated because Excel must run. d. In contrast, the OLE DB Provider does not use Excel, except,
perhaps when the workbook you want to access is already open in Excel.
The OLE DB Provider can
get data from any worksheet that it can see in your workbook. It reads Unicode text data
properly. It can retrieve very long memo fields. However, the OLEDB
provider does not always get the data as you see it in Excel, and in
some cases it can lose data such as memo fields, text and so on. Also,
unless the workbook is already open in Excel, the OLE DB Provider e. As far as I am aware, in Word 2003 no method can open Excel data sources (or any other kind of data source) located on an Internet URL such as an http or ftp URL. The file has to be accessible via the Windows file system, either locally or via a Windows share (I do not think you can open files via a WebDAV share either). Even in that case, you will need the necessary Windows file system permissions to open the file - they may be permissions on the file itself, the folder it is in, or the share. f. If the workbook is already open in Excel, all the methods seem to be able to connect to it. However, the Converter asks whether you want to make a read-only copy. The other methods never seem to report a "read-only" problem. g. If the workbook is encrypted using a "Password to Open", only the DDE method wil be able to get data from it if the workbook is closed. DD will get Excel to open the book, and the user will be prompted for the password. if the workbook is already open in Excel, DDE, ODBC and OLE DB all appear to work, but the Converter asks whether you want to make a read-only copy, then tries to open it and cannot. h. if the Workbook is not already open in Excel but you have opened it as a Word data source, then Excel pops up its "File in use" dialog box and ask if the user wants to open the file Read-Only. The user can avoid this dialog by opening the workbook using Excel File|Open and specifying "Open Read-Only" in the Open button's dropdown option list. (If Word used DDE to open the Workbook, it will already be open in Excel). Trying to change the way that Word opens the workbook by setting the OLE DB Provider's connection string "Mode" parameter to a value such as "Share Deny None" does not work around this problem. The remainder of this article is about what the OLE DB Provider actually does. The information provided here comes from experiments that have been done mainly on a Windows XP Pro SP2 system running Office 2003 Pro SP2 both fully patched up to around 25 January 2007. The experiments have all been to do with setting up Excel as a mail Merge Data Source, rather than as a datasource for a Word DATABASE field. Using an Excel data source in Word while it is open in ExcelBroadly speaking, Word seems to be able to use an Excel workbook as a data source even when you have it open in Excel, whichever method you use to connect. Why doesn't the OLE DB provider get the data as I see it in Excel?None of the Connection methods transfer anything other than the values in the cell, with or without text formatting such as currency signs, numeric separators, percentage signs and so on,. None of them transfers typeface or font size, text or cell colour information and so on. However, when you format a value in Excel, so that you see, say $1,234.50 in a cell, OLE DB may only transfer the value 1234.5. There are two main reasons for this: 1. Excel does not actually store the data as you see itThe OLE DB Provider does not retrieve the data you see in Excel from the .xls workbook file because Excel does not store the data as you actually see it, and the Provider does not apply all the formatting information that Excel stores about each cell.For example, suppose you enter the number 1234.5 in an Excel cell,
then format it as a US dollar currency value, Excel will display it as
$1,234.50 . However, when you save the workbook, Excel does not store
that displayed value anywhere. What it (probably) stores is: I say "probably" because I don't have the details of the proprietary binary .xls format and cannot see what Excel actually stores. However, in Excel 2003 it is possible to see what Excel probably stores in .xls format by saving in Spreadsheet XML format and examining it using (say) Notepad or the XML Notepad application that you can download from the Microsoft website. The XML format definitely does not store "$1,234.50" Not surprisingly, Excel understands its own format specifications and
is capable of reconstructing the display text "$1,234.50" when you
re-open the Workbook. This is why the DDE connection method can return
the data you see in Excel - it lets Excel do all the
formatting work for it. Excel format specifications are richer and more
complicated than the formatting switches used in Word MailMerge. For
example, Excel can use: However, the OLE DB provider does not use the same code as Excel and
does not appear to understand or apply some of the format specifications that Excel uses, and
in some cases probably applies its own rules. One thing I have noticed
is that the formats that Excel uses if you select a format from the
Currency column are sometimes different from the formats you get in the
Custom column
. In particular 2. The OLE DB Provider has to determine a specific data type for each column. In order to conform to OLE DB standards, the OLE DB provider has to determine a "data type" for each column in the data source. This is normal practice in the database world. For example, in an Access database you specify whether a column is numeric, text, YesNo (Boolean), date/time and so on. You simply can't store a text value such as mytext in a numeric column in Access - the software will not let you. In Excel, by contrast, every cell in a column could contain a
different type of data. Excel encourages this, because it is designed to
recognise certain types of data and act accordingly. It should be able
to - it's a spreadsheet, not a database. For example, if you enter
a number such as 1234.5 into a cell, Excel right-justifies it. Enter the
text mytext and Excel left-justifies it. Enter the text true
and Excel recognises it as a "Boolean" (or "logical") value, and
displays the text TRUE in the middle of the cell. So if the OLE DB
Provider finds a column with all those different types of data in it, it
has to do two things: The problem with this approach is that if the OLE DB Provider decides that a column is numeric, then it has to discard the contents of a cell containing the text mytext. The only thing the Provider can do is to set that value to a numeric value (and traditionally, it would be 0) or to a special "null" value. So you see "mytext" in Excel, but "0" or "" in Word. There's one possible way around this, and that would be for the OLE DB Provider to decide that every column was a text column and to convert all the values in every column to text. That way, it could return mytext as "mytext", the number 1234.5 as the text "1234.5", the Boolean TRUE as the text "TRUE", and so on. In other words, every value gets through, and anything that needs to convert the text "1234.5" back to the number 1234.5 can do so. Unfortunately, the OLE DB Provider does not do that. How the OLE DB determines the data type for a column1. Data types returned by the providerThe Provider recognises the following types as far as I can
tell: 2. The IMEX, TypeGuessRows, and ImportMixedTypes settingsWhen a program such as Word invokes the Provider, it can tell it to use one of three different "IMEX" values. "IMEX" is just a contraction of "Import/Export mode". The Provider recognises IMEX=0, IMEX=1 and IMEX=2. The provider also looks at two values stored in the Windows Registry. These are called TypeGuessRows and ImportMixedTypes. By default, TypeGuessRows is set to 8. In theory it can be set to any number between 0 and...a larger number than the maximum number of rows in an Excel worksheet (65536 for Excel 2003, 1048576 for Excel 2007). By default, ImportMixedTypes is set to "Text". The Provider recognises two values "Text" and "Majority type". If this value is set to anything else, e.g. "Other", the Provider will raise an error and fail. 3. Determining column typeFor each column in the worksheet, the Provider reads a maximum of TypeGuessRows data cells (i.e. ignoring the column name cell), and counts the number of cells with each data type. Most documentation suggests that if TypeGuessRows is set to 0, the provider reads the entire column, but experiments suggest it actually reads 16384 rows of data. This was perhaps a maximum row count in an earlier version of Excel. If IMEX=0 or IMEX=2, or IMEX=1 and ImportMixedTypes="Majority
type", or the provider only found one type in the column) the Provider
uses the "Majority type" to determine the column type. So for example,
if TypeGuessRows is 8 and in the first 8 rows of a column there are 5
numbers and 3 string values, the Provider sets the column type to
Number. If there are equal counts of numbers and strings, "numbers" will
"win". If there are 5 Boolean values and 3 strings, the Provider will
set the column type to Boolean. And so on. However, there are two
important exceptions: h. Otherwise, i.e. if IMEX=1, ImportMixedTypes="Text", and the provider has found different types in the first TypeGuessRows rows, the provider sets the column type to "string", unless it found a memo in which case it sets the column type to memo. 4. Converting/formatting the data to fit the column typeOnce the provider has decided the column type, it decides what to do with the values in the column that do not have that type. However, it does this differently depending on whether you specified IMEX=0 or 2, or IMEX=1. The following table shows the possibilities. The column on the left shows examples of data that you might have in your Excel worksheet. Columns 2-6 show what happens to that data in each type of column that the OLE DB Provider can return. For example, if IMEX=0 and the Provider decides that a column should be "Boolean", a cell containing the number 1 will appear as "True" in Word. The texts in this table are the results of Word MERGEFIELD fields.
Perhaps the main things to notice in the IMEX=1 table are the
problems frequently reported by users, including
Because of regional differences, you may find that OLE DB does not behave in exactly the same way on your system. For example, it may return £5,432.10 or $5,432.10 instead of 5432.1 in some cases. Also, if for example you work with the German-language version of Office, "true" and "false" are treated as strings in Excel, whereas the German words WAHR and FALSCH are recognised as the true and false Boolean values. On my German language system (which actually runs German Windows XP and the "MLP" multi-language version of Word 2003, so it may be different from a typical German-language Windows/Office system), when I use the OLE DB provider to get Boolean Excel data, I still see the English-language words. There is one other critical piece of information. When Word connects via OLE DB to a .xls data source, Word always specifies IMEX=1. If you have managed to follow the story so far, you will probably realise that OLE DB is less likely to lose data that way. Some conclusionsI think the main conclusions to be drawn are as follows: a. it's all very messy and could probably work a lot better b. if you have an Excel column with a single data type, all the data will get through but will probably need reformatting using switches in Word MERGEFIELD fields as suggested near the beginning of this page. if anything, the most irritating situation is when you have plain numbers and the Provider tacks all those extra digits on the end. c. if you have a column with mixed data types, in most cases you will see
most of your data, but d. However, if the memos are not truncated, Word and OLE DB do seem to be able to read very large memo fields - in simple tests here I was successfully reading memo fields around 140Kb before I gave up. Is there any other way to work around these problems?There does not appear to be a straightforward workaround. Word already sets IMEX=1 which probably provides more useful results in most cases than IMEX=0. But you can try the following... 1. Format a column as text via Data|Text to Columns... If you have a mixture of numbers and texts in a column but numbers in
the first 8 rows, you may need to get the OLE DB provider to "see" the
column as a text column. However, selecting the column and using
Format|Cells to set the format as Text does not appear to be enough. One
way that does appear to work is 2. Add a row of cells containing memo texts and tell the Provider to inspect only 1 rowThis is a kludge that requires you to add a row to your spreadsheet before using it as a data source, and other stuff. I haven't used the approach for real - it's just something I thought of as I investigated the problems. I suspect that it might cause problems if you try to sort or filter your data.The idea is as follows: However, there's a problem with this approach. If you change the standard settings for the Jet Excel Engine by changing TypeGuessRows to 1, it may affect everything on your system that gets data from OLE DB. Changing these settings in not a good programming practice in my opinion. Incidentally, if you decide to go in the other direction and get OLE DB to look at more rows, you could in theory set the count to the maximum number of rows in an Excel 2003 worksheet. But that doesn't solve the "mixed data" problem when you're using Memo fields. And the OLE DB will take a long time to read the entire spreadsheet - every time you open it as a Word data source. Fortunately, it's possible to create your own area in the registry containing the values you want for TypeGuessRows and ImportMixedTypes, and tell the Provider to look in your area rather than the standard area. The Provider will pick up any values you don't put in your own area from the standard area. So for example, the standard registry keys and values are here: Suppose I create my own keys and put them here: Then I have to tell the OLE DB provider where they are by adding this to my
OLE DB connection string: If you are familiar with VBA and the OpenDataSource method, you might be thinking that all you have to do is record the VBA while you connect to your Excel workbook, then alter the Connection parameter and run the modified VBA. Unfortunately, there is yet another twist in the tale. When you connect to an Excel data source using OLE DB, Word does set up and return a Connection string. Unfortunately, Word takes no notice of any connection string you provide in an OpenDataSource method call when it opens an Excel .xls using OLE DB. Fortunately, there is a workaround. 3. In addition, use a .udl or .odc file to open the workbook indirectly...Instead of connecting to the .xls file directly, you can connect
using a .udl (Universal Data Link) file or .odc (Office Data Connection)
file that contains the necessary connection information. .udl files
contain the information that an OLE DB provider needs to connect to a
particular "database" (which in the case of Excel, means a "workbook".
However, they don't let you specify a "table" in that database (for
Excel,"table" means worksheet or range). I'm not going to consider .udl
files any further at the moment because programmers usually already know
how to create them, and Office is more oriented to using .odc files.
.odc files are rather like .udl files but There does not seem to be a specification of the .odc format anywhere, so all you can do is create one, see what's in it, and try modifying the contents. I outline how to create one below. There are several ways you can use a .odc file with Word MailMerge (or to insert a DATABASE field) 4. ...or create one or more .odc files and let the users select them instead of the .xls workbookIf the idea is still that the user selects the data source manually,
then you can provide a .odc with all the right information in it and let
them select that instead of the .xls file. As usual, there's a catch.
The user isn't selecting the actual workbook any more, so there has to
be another way to specify it. Two possible approaches are: There seem to be at least two types of .odc file. One stores information on how to connect to the "database" as a whole, and the other stores information on how to connect to a specific table within a database. For Excel workbooks a "database" is a workbook, and the tables are the worksheets and named ranges within it. If you use the "database" type of .odc file, the user should see a prompt for a list of worksheets and named ranges, just as if they had tried to open the workbook by selecting the .xls. If you use the "table" type of .odc file, you have to name the table. If you were trying to use approach (a), that might mean that you would have to create one .odc fie for every workbook/table combination you wanted - probably not such a good idea. However, if you want to make it particularly easy for a user to select a few specific workbook/table combinations without even facing the dialog with the list of tables, it's probably quite a good approach. Just don't forget to train the users :-) There's one more snag with using .odc files that some people need to know. Supposing you are designing a system which will run with one data source today (let's say today.xls) but which you know will eventually use another (oneday.xls) Can you roll out a Word document with a .odc that points to today.xls, then simply roll out a new version of the .odc that points to oneday.xls when you want to change over? Unfortunately not. Word only really gets connection information from the .odc when it executes an OpenDataSource call. It stores that connection information internally. So if you close your .doc, change the .odc, and re-open the .doc, the real data source will not change. If you want Word to take notice of the .odc every time the .doc opens, you will have to ensure that Word actually issues an OpenDataSource call, e.g. in an AutoOpen macro. My guess is that this results from a deliberate design decision. Incidentally, Excel works in a similar way when using .odc files. 5. If you're programming, you can do some of this using a VBA OpenDataSource method call... If you want to make the connection programmatically, the VBA method call can be very simple, e.g.
ActiveDocument.MailMerge.OpenDataSource _ If the .odc is a "table" type .odc, the method call should open the workbook and table specified in the .odc, as in (a). If no workbook is specified, the user will face the OpenDataSource dialog box as in (b). If the .odc is a "database" type .odc and the workbook was specified in the .odc, the user should see the "tables" dialog box, as usual. If no workbook was specified, the user will probably see the Data Link Dialog, then possibly the tables dialog box - probably a recipe for disaster so if you really want to do that, I leave it up to you to research it. However, an obvious programming ploy once you have created your .odc would be to see if you can set up a general-purpose .odc then specify the workbook name, table name and perhaps other stuff in the OpenDataSource method call. What actually happens is as follows: a. if you put anything at all in the Connection parameter in
the OpenDataSource call, the entire connection string in the .odc is
ignored. However, in this case, Word does honour what you put in the
connection string (unlike when you specify a .xls file in the Name
parameter). So you have to put everything you need and want in the
Connection parameter. As far as I can tell, the absolute minimum for
connecting to an Excel workbook consists of: In addition, you may want to specify the things we have been
discussing, e.g. b. if you put anything at all in the SQLStatement/SQLStatement1 parameters, it overrides any table specification in the .odc file, but it must be a valid piece of Jet SQL. This seems to be the case whether you have a "database" type .odc file or a "table" type .odc file. 6. ...and you can use an empty .odc fileIf you need to use a .odc in OpenDataSource and you want to override the connection string in the .odc and the table name in the .odc, here's one final kludge that may really help: don't put anything at all in the .odc file. It's just a text file. Leave it completely empty, and put everything you need in the OpenDataSource call, e.g.
ActiveDocument.MailMerge.OpenDataSource _ Programming detailsThe following notes are intended to fill in some of the practical
details missing from the section titled "Is there any other way to work
around these problems?", in particular: 1. How to set up a private area in the Windows Registry containing Jet-related settingsa. Open Notepad b. Copy/paste the following text into it. Remove any blank lines at the beginning and end. Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\mysample\v1.0\Engines\Excel] c. Save the file using a .reg extension (e.g. in c:\a\mysample.reg) d. start Windows Registry Editor, e.g. by locating regedit.exe, typically in c:\windows\regedit.exe, and double-clicking it e. Click FIle|Import, select the .reg file you just saved, and import it. 2. How to create and edit a .odc file for an Excel data sourceYou can create a .odc in Word, Excel, or by editing an existing one. In Word, start with a blank document and the MailMerge toolbar
enabled, then When the OLE DB Provider tries to connect to the data source, if there are any problems, Word may either pop up the Data Link dialog box again, or it may just try to treat the .odc as if it were a .csv type text file, in which case Word will pop up various error messages to do with invalid field counts and so on. There is one other potentially useful aspect of .odc files. If you created your file in your "My Data Sources" folder and you open the .odc using Internet Explorer, you should see the results - i.e. for a "database" .odc, you should see a list of tables, and for a "table" .odc, you should see the table's data. This doesn't seem to work for all types of .odc, though. Also, if your .odc is in a different folder, you will need to copy a file in "My Data Sources" called DATACONN.HTC to the same folder as your .odc for this to work, and you will probably face more security-related dialog boxes when you try to open the .odc. The .HTC file contains JavaScript code that may shed some light on the contents of your .odc The following listing is for a "Table" .odc file for an Excel data source. Some notes: k. You can edit the .odc in Notepad and change whatever you need in the ConnectionString. Notice that you need to use HTML-style names for some characters, e.g. " instead of " l. For Excel, there doesn't seem to be a great deal of
difference between a "database" .odc and a "table" .odc. You should
check for yourself, but if you start with a "Table" .odc, to make a
"Database" .odc fior the same .xls file, m. In a "table" .odc, to change the table that Word will open,
I assumed that you would have to change the name in the <odc:CommandText>
element and that the <meta name=Table content=> tag would just be
documentation. This turns out to be incorrect, at least for Excel files.
You only need to change the meta tag, although I suppose it is good
practice to change the other one as well, or delete it if is not
necessary. You can use the following: Here's the .odc sample <html> 3. How to set up and use a Word VBA macro that uses OpenDataSourceLet's take the example from an earlier section in this page: ActiveDocument.MailMerge.OpenDataSource _ Add Sub and End Sub statements before and after the statement. In the following example I also add some statements to disconnect the existing data source and set the merge type and destination: Sub mysub() ActiveDocument.MailMerge.MainDocumentType =
wdNotAMergeDocument ActiveDocument.MailMerge.OpenDataSource _ End Sub To use the macro, see the following articles by Word MVPs: http://word.mvps.org/FAQs/MacrosVBA/CreateAMacro.htm http://word.mvps.org/FAQs/MacrosVBA/DocumentEvents.htm You may also have to set up the registry entry described in the following Microsoft Knowledgebase article: http://support.microsoft.com/kb/825765 [End of page]
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Please post any follow-up questions to this article in the Microsoft public newsgroup on mailmerge and fields. If you are using a newsreader such as Outlook Express, the server is at news://news.microsoft.com and the Mailmerge newsgroup is at news://news.microsoft.com/microsoft.public.word.mailmerge.fields. Otherwise, you can go to the Microsoft Communities home at http://www.microsoft.com/communities and look for the group. For some reason it is currently named "Mailmerge and Fax" | ||
| Disclaimer | Peter Jamieson's Tip Pages: Copyright © 2005-2007, Peter Jamieson |
|