Wednesday, April 21, 2010

How to create an Excel file programmatically (Part 2)

Offce 2003 XML method

This method is as simple as saving an .xls file into a template .xml format. When we open the saved XML file, we'll see a rather simple format to follow and places where we can insert new data rows.

<!--xml version="1.0"?--> <!--mso-application progid="Excel.Sheet"?--> <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40"> <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office" /> <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel" /> <Styles> <Style />... </Styles> <Worksheet ss:Name="Sheet1"> ... {Data table location} <WorksheetOptions /> </Worksheet> </Workbook>

All we have to do then is insert the data table in the XML dialect below, which while different from (X)HTML is still relatively easy to understand and follow.

<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="2" x:FullColumns="1" x:FullRows="1" ss:DefaultRowHeight="15"> <Column ss:Width="107.25"/> <Column ss:Width="106.5"/> <Column ss:Width="21"/> <Row> <Cell><Data ss:Type="String">This is Cell A1 in Excel</Data></Cell> <Cell><Data ss:Type="String">This is Cell B1 in Excel</Data></Cell> </Row> <Row> <Cell><Data ss:Type="Number">123</Data></Cell> <Cell><Data ss:Type="Number">456</Data></Cell> <Cell ss:Formula="=RC[-2]+RC[-1]"><Data ss:Type="Number">579</Data></Cell> </Row> </Table>

But even if we don't use Excel's own dialect of table elements, we can still easily substitute in regular HTML (4.0) tags as indicated in the namespace declaration near the beginning of the XML document above:

<Workbook ... xmlns:html=".../REC-html40">

All in all, it's still relatively simple to do.

(Part 3)The OpenXML approach, however, isn't as simple (for me, anyway)

Sunday, April 18, 2010

How to create an Excel file programmatically (Part 1)

This was a fairly easy and straight-forward job without requiring any additional APIs. Microsoft Office 2003 was pretty forgiving about Excel file extension that any auto-generated Excel file would be easily accepted without much hiccup. Nowadays, with Office 2007 and the upcoming Office 2010, we again are forced to resort to a "few" different Excel-creation techniques which yielded various pros and cons.

  1. The should've-been-the-easy-way: (X)HTML+CSS+Excel Formula minus-JavaScript format saved as .xls file.
  2. The should've-been-just-as-easy-way: Office 2003 XML format saved as .xls file.
  3. The what-we're-now-forced-to-do: Microsoft OpenXML SDK (2.0) way creating a warning-free Excel 2007 file (SDK 2.0 is the most recent library for Office 2010).

I think I know one of the many reasons Microsoft decided to restrict the flexibility later versions of its Office suite—something to do with all those macro/VBA-infected office files or something like that. So they now go by the "locked-down-by-default" starting point. But I also think it's a crying shame that we get all sort of warning dialogs for file extension renaming and things that are already automatically recognized by Microsoft Office suite, anyway. If only there's a way to suppress the dialog from an administrative policy instead of going to each user's PC... If anything, they should make still automatically protect the end-user while still enabling programmers maintain high-productivity by adopting web document standard but without that whole pop-up dialogs everytime so that worker-bee programmers like myself aren't forced to find, re-devise yet another technique to create, and yet again re-wrap a new set of APIs on a new SDK just to do what's supposed to be a simple thing. Do I digress? Perhaps not. We'll see with the aforementioned techniques in the following illustrations.

(X)HTML && CSS || Excel Formula !JavaScript method

This technique is as simple as it could be; there really is no other additional digging required other than what any web programmer already knows. Anyone could manually write the (X)HTML-based Excel file or use his/her favorite programming language to generate the intended file. You be the judge after seeing the code in Listing 1 below.

<!--xml version="1.0" ...?--> <!--DOCTYPE html PUBLIC "... XHTML 1.0 Strict//EN"... --> <html> <head> <title>XHTML-based Excel file</title> <style>/*CSS styling directives here*/</style> </head> <body> <table> <tr> <td>This is Cell A1 in Excel</td> <td>This is Cell B1 in Excel</td> </tr> <tr> <td>123</td> <td>456</td> <td>=A2+B2</td> </tr> </table> </body> </html>

The advantages of this technique are as apparent as the advantages of following a widely-accepted and standardized document object model, especially one that would be readily usable on a web browser. In addition, this would translate well to the natural order of a spreadsheet being a two-dimensional table understood as rows and columns made up of cells. Those as what

<table>, <tr>, <td> and other related (X)HTML elements are for.

The one great disadvantage stemmed from an annoyance of the application (understandably) warning the user that the content format is different from what is understood by Excel when looking at an .xls file extension. And it is more often than not sufficient to put-off a client-centric programmer who cares about NOT scaring his customer as much as he does about programming ease and efficiency.

Will this image: XHTML-based Excel file warning dialog be enough to make us try to think of a new, more "Excel-worthy" technique? I certainly thought so.

Up next (Part 2): Office 2003 XML format approach.