Thursday, August 25, 2011

Free Options for reading and writing Excel Files

In general there are three main categories of tools that allow you to interact with Excel files.
  • MS Excel Primary Interop Assembly (PIA) Based: These require MS Excel be installed and licensed. Not a good option of server environment since it basically spins up Excel.
  • OleDB Based: These are very fast a light weight solutions generally that are server friendly. No license or installation of Excel required. I highly recommend the ACE driver instead of JET driver. You can do read, write, update with this option. See OLE DB Article section below for details.
  • Other: This means they have coded their own solution typically using some kind of XML format, but sometimes using binary format.
I have not tested many of these solutions other than using PIA or OLE DB. The exception to that is CarlosAg Excel Xml Writer Library which I do like, but it is only for Writing Excel files, not reading them. It has good performance and works well for tabular and non-tabular spreadsheets.
I would love to add other solutions to the list so please let me know your favorite solutions.

Solution Requires Excel / Uses PIA Uses OleDB Uses
OpenXmlWriter
Server Friendly Supports Read Supports Write Supports
.XLS
Supports
.XLSX
MS Excel Interop (PIA) Yes No No Yes Yes Yes Yes
.NET Framework Data Provider for OLE DB (using ACE OLEDB 12.0) No ACE or JET No Yes Yes Yes Yes
Open XML SDK 2.0 for MS Office No No Yes Yes Yes Yes No Yes
Excel Data Reader No No ? Yes Yes No Yes Yes
ExcelMapper
for tabular data
No ACE & JET ? Yes Yes Yes Yes Yes
CarlosAg Excel Xml Writer Library – uses old xml spreadsheet format No No No Yes No Yes No No (only .xml)
GemBox (for small files it is free) No No? ? Yes Yes Yes Yes Yes
Koogra No No ? Yes? Yes No? Yes? Yes
MyXLS No Yes ? Yes Yes Yes Yes No
ExtremeML No No No Yes Yes Yes No? Yes
ClosedXML No No No Yes Yes Yes No Yes
SpreadSheetLight No No Yes Yes Yes Yes No Yes
EPPlus No No ? Yes Yes Yes No Yes

My thoughts (Feb 2015)

If you are doing anything server side such as ASP.NET, I highly suggest staying away from anything that uses the MS Excel Interop (PIA). It is nice it supports .xls format though.

If you need to read data VERY fast OleDB will be WAY faster than any api.

If you need to write many MB of data I suggest you use a product that uses the OpenXmlWriter.

I have used and like ClosedXML alot. It is reasonably fast and easy to use.

I have read about SpreadSheetLight and it looks very easy to program because it is designed to work like you are using Excel. It appears that it uses the OpenXmlWriter which is as fast as you can get. It appears to be very efficient at writing large amounts of data, but less efficient for editing existing sheets. It has excellent documentation. I think this would be my choice if I were to start a new project again and didn't know the OpenXml API.

EPPlus also look good and has like 4 times the downloads of ClosedXML now. It does do Data Validations and pivot tables which is nice. The documentation seems lacking. If you want to be close to OpenXml and be able to interact with it, you may like this product.

Using Excel PIA Articles:

Loading and reading the Microsoft Excel file content using C#
How to export database to Excel file
C# Excel Tutorial

OLE DB Articles

OLE DB can use either JET or ACE. ACE is MUCH better than JET because it doesn’t have the same pitfalls with guessing data that JET does, etc. So when using OLE DB change the connection string to use ACE instead of JET driver for OleDB.
Connection strings for Excel 2007 - Connection Strings for Excel / ACE OLE DB 12.0
How to read from an Excel file using OLEDB
How to insert data to Excel file using OLEDB 
How to update data in Excel file using OLEDB

My favorite Option for very large amounts of data is to use the Open XML format
Writing Large Excel File with the Open XML SDK
How to create a stylesheet in Excel Open XML


Other Useful Links

FileHelpers – import/export data from fixed length or delimited records using strongly type values
Web Spreadsheet
OpenXMLDeveloper.org

Tuesday, August 23, 2011

Configuring Subversion for MS Office files

Subversion works well, and does a fair job of handling MS Office documents (or any other unmergeable file) out of the box, but it is not what I would call obvious. There are some tweaks I would recommend in order to make it work much better.

To accomplish this we need to do a few things. I am assuming you are using TortoiseSVN, but most subversion clients should have similar features if they implement the full set of subversion features.

I have three goals. I have listed them below and explained what is needed to achieve each of the goals.

Goal 1: Make subversion not automatically merge MS Office files

I think is met by default, but if nothing else will be handled by the time we are done.

 

Goal 2: Lock the file so others cannot make changes and commit them.

This is actually built into subversion. Before you edit a file you should always get an update (Right-click file | SVN Update) to it to make sure you have the most recent copy of it. You should make sure no one else has a lock on the file by right-clicking the file | TortoiseSVN | Check for Modification. Assuming the file is not locked by someone else, we need to lock the file (Right-click file | TortoiseSVN | Get Lock… OR Right-click file | SVN Get Lock…).  We have now prevented people from ACCIDENTALLY overwriting our un-committed changes by committing first. I say accidentally because they can still break locks if they really feel they are more important. The problem is that unless they use the Check for Modifications… like we did, they won’t know the file is locked until they try to commit the file since the lock prevents commits, not editing. Which leads us to our next goal.

Goal 3: Automatically communicate to other users that the file is locked.

This is not the same thing as actually locking the file, but really is more of a communication tool to let the other users know that someone else is editing it. We want to be more proactive really so that we or someone else never starts editing a file that is already locked. Up to this point, everything has been under your control, but to meet this goal, it is client configuration, not a server configuration. This means that you can recommend this to your team members to add to their subversion configuration file, but you can’t force them. Luckily it is more about convenience that enforcement on this requirement.

To meet this goal, we need to open out config file and add the svn:needs-lock property to any MS Office file extension. You will need to add them if they are not already there (they probably aren’t unless you have a customized file already). You will also need to uncomment the line with enable-auto-props = yes. Also, be sure the [auto-props] line is uncommented. See the example config file below for examples. Once you do that once you get an update (Right-click file | SVN Update) from subversion the file will now be read-only if you don’t have a lock on it and it will also have a new icon (image).

WARNING: The automatic properties are NOT applied except on Add or Import into subversion. This means that any if you are trying implement this on an existing repository that already has Office files in it that you need to manually add the svn:needs-lock property to the properties.

You can do that in TortoiseSVN and getting a lock for the file(s) then Right-Clicking the file | TortoiseSVN | Properties | New… button and then choose svn:needs-lock for the Property Name field and * for the Property value field.

You can also do the same thing at the directory level or for multiple files and even recursively by selecting a directory or multiple files then TortoiseSVN by Right-Clicking the file | TortoiseSVN | Properties | New… button and then choose svn:needs-lock for the Property Name field and * for the Property value field and then click Apply property recursively checkbox if applicable. Keep in mind this will apply the property to ALL files you select and those files may not be Office docs, so be aware of what you are selecting.  If you have a lot of file mixed in with mergeable file to do this to you may want to automate the task. Here is a good explanation of how this can be done. After you set the property, you will need to Commit the change like you would any other changes.

In all cases, after you are done making you changes and have commited them, be sure to release the lock on the file(s).

Now when you get a lock (Right-click file | TortoiseSVN | Get Lock… OR Right-click file | SVN Get Lock…) on the file the icon changes to a yellow lock and the file is no longer read-only. You are now safe to make your changes.

 

Unlocking

There are two ways to unlock a file. The first is simply to commit your changes (assuming you made some). NOTE: The Tortoisesvn docs seem to indicate the lock is released even if you don’t make a change, but that isn’t how it seems to work on my version. The lock is only released when I make a change before committing. It definitely should release the lock if you make a change first though. So, I guess your mileage may vary. The other option is to manually release it by Right-clicking file | TortoiseSVN | Release Lock. Once the lock is removed other people can get a lock on the file.

 

Additional Resources

Download TortoiseSVN

To read up more on how to use locking in TortoiseSVN click here.

Here is a seemingly good online book on subversion – FREE

Automatic Lock - Modify – Unlock – scripts to help do this for existing files in subversion

 

Example Configuration File

The configuration file is located at (in Windows):

C:\Users\yourUserNameHere\AppData\Roaming\Subversion or maybe C:\Users\yourUserNameHere\AppData\Subversion (not sure for non-roaming profiles).

If you are using TortoiseSVN you can edit the config file by going to Settings | General tab | Edit button.

### This file configures various client-side behaviors.
###
### The commented-out examples below are intended to demonstrate
### how to use this file.

### Section for authentication and authorization customizations.
[auth]
### Set password stores used by Subversion. They should be
### delimited by spaces or commas. The order of values determines
### the order in which password stores are used.
### Valid password stores:
###   gnome-keyring        (Unix-like systems)
###   kwallet              (Unix-like systems)
###   keychain             (Mac OS X)
###   windows-cryptoapi    (Windows)
#password-stores = windows-cryptoapi
###
### The rest of this section in this file has been deprecated.
### Both 'store-passwords' and 'store-auth-creds' can now be
### specified in the 'servers' file in your config directory.
### Anything specified in this section is overridden by settings
### specified in the 'servers' file.
###
### Set store-passwords to 'no' to avoid storing passwords in the
### auth/ area of your config directory.  It defaults to 'yes',
### but Subversion will never save your password to disk in
### plaintext unless you tell it to (see the 'servers' file).
### Note that this option only prevents saving of *new* passwords;
### it doesn't invalidate existing passwords.  (To do that, remove
### the cache files by hand as described in the Subversion book.)
# store-passwords = no
### Set store-auth-creds to 'no' to avoid storing any subversion
### credentials in the auth/ area of your config directory.
### It defaults to 'yes'.  Note that this option only prevents
### saving of *new* credentials;  it doesn't invalidate existing
### caches.  (To do that, remove the cache files by hand.)
# store-auth-creds = no

### Section for configuring external helper applications.
[helpers]
### Set editor-cmd to the command used to invoke your text editor.
###   This will override the environment variables that Subversion
###   examines by default to find this information ($EDITOR,
###   et al).
# editor-cmd = editor (vi, emacs, notepad, etc.)
### Set diff-cmd to the absolute path of your 'diff' program.
###   This will override the compile-time default, which is to use
###   Subversion's internal diff implementation.
# diff-cmd = diff_program (diff, gdiff, etc.)
### Set diff3-cmd to the absolute path of your 'diff3' program.
###   This will override the compile-time default, which is to use
###   Subversion's internal diff3 implementation.
# diff3-cmd = diff3_program (diff3, gdiff3, etc.)
### Set diff3-has-program-arg to 'yes' if your 'diff3' program
###   accepts the '--diff-program' option.
# diff3-has-program-arg = [yes | no]
### Set merge-tool-cmd to the command used to invoke your external
### merging tool of choice. Subversion will pass 4 arguments to
### the specified command: base theirs mine merged
# merge-tool-cmd = merge_command

### Section for configuring tunnel agents.
[tunnels]
### Configure svn protocol tunnel schemes here.  By default, only
### the 'ssh' scheme is defined.  You can define other schemes to
### be used with 'svn+scheme://hostname/path' URLs.  A scheme
### definition is simply a command, optionally prefixed by an
### environment variable name which can override the command if it
### is defined.  The command (or environment variable) may contain
### arguments, using standard shell quoting for arguments with
### spaces.  The command will be invoked as:
###   <command> <hostname> svnserve -t
### (If the URL includes a username, then the hostname will be
### passed to the tunnel agent as <user>@<hostname>.)  If the
### built-in ssh scheme were not predefined, it could be defined
### as:
# ssh = $SVN_SSH ssh -q
### If you wanted to define a new 'rsh' scheme, to be used with
### 'svn+rsh:' URLs, you could do so as follows:
# rsh = rsh
### Or, if you wanted to specify a full path and arguments:
# rsh = /path/to/rsh -l myusername
### On Windows, if you are specifying a full path to a command,
### use a forward slash (/) or a paired backslash (\\) as the
### path separator.  A single backslash will be treated as an
### escape for the following character.

### Section for configuring miscelleneous Subversion options.
[miscellany]
### Set global-ignores to a set of whitespace-delimited globs
### which Subversion will ignore in its 'status' output, and
### while importing or adding files and directories.
### '*' matches leading dots, e.g. '*.rej' matches '.foo.rej'.
# global-ignores = *.o *.lo *.la *.al .libs *.so *.so.[0-9]* *.a *.pyc *.pyo
#   *.rej *~ #*# .#* .*.swp .DS_Store

global-ignores = *.o *.lo *.la *.al .libs *.so *.so.[0-9]* *.a *.pyc *.pyo ~* *~

### Set log-encoding to the default encoding for log messages
# log-encoding = latin1
### Set use-commit-times to make checkout/update/switch/revert
### put last-committed timestamps on every file touched.
# use-commit-times = yes
### Set no-unlock to prevent 'svn commit' from automatically
### releasing locks on files.
# no-unlock = yes
### Set mime-types-file to a MIME type registry file, used to
### provide hints to Subversion's MIME type auto-detection
### algorithm.
# mime-types-file = /path/to/mime.types
### Set preserved-conflict-file-exts to a whitespace-delimited
### list of patterns matching file extensions which should be
### preserved in generated conflict file names.  By default,
### conflict files use custom extensions.
# preserved-conflict-file-exts = doc ppt xls od?
### Set enable-auto-props to 'yes' to enable automatic properties
### for 'svn add' and 'svn import', it defaults to 'no'.
### Automatic properties are defined in the section 'auto-props'.
enable-auto-props = yes
### Set interactive-conflicts to 'no' to disable interactive
### conflict resolution prompting.  It defaults to 'yes'.
# interactive-conflicts = no
### Section for configuring automatic properties.
[auto-props]
### The format of the entries is:
###   file-name-pattern = propname[=value][;propname[=value]...]
### The file-name-pattern can contain wildcards (such as '*' and
### '?').  All entries which match (case-insensitively) will be
### applied to the file.  Note that auto-props functionality
### must be enabled, which is typically done by setting the
### 'enable-auto-props' option.
*.awk      = svn:eol-style=native
*.bat      = svn:eol-style=native;svn:executable
*.c        = svn:eol-style=native
*.ccf      = svn:eol-style=native
*.cd       = svn:eol-style=native
*.cdl      = svn:eol-style=native
*.cpp      = svn:eol-style=native
*.cs      = svn:eol-style=native
*.csv      = svn:eol-style=native
*.ddl      = svn:eol-style=native
*.doc      = svn:needs-lock
*.docm     = svn:needs-lock
*.docx     = svn:needs-lock
*.dot      = svn:needs-lock
*.dotm     = svn:needs-lock
*.dotx     = svn:needs-lock
*.dsp      = svn:eol-style=CRLF
*.dsw      = svn:eol-style=CRLF
*.g        = svn:eol-style=native
*.gif      = svn:mime-type=image/gif
*.h        = svn:eol-style=native
*.hd       = svn:eol-style=native
*.hpp      = svn:eol-style=native
*.htm      = svn:eol-style=native;svn:mime-type=text/html
*.html     = svn:eol-style=native;svn:mime-type=text/html
*.ico      = svn:mime-type=image/ico
*.id       = svn:eol-style=native
*.idl      = svn:eol-style=native
*.include  = svn:eol-style=native
*.inf      = svn:eol-style=native;svn:executable
*.ini      = svn:eol-style=native
*.java     = svn:eol-style=native
*.jpe?g    = svn:mime-type=image/jpeg
*.jpg      = svn:mime-type=image/jpeg
*.ld       = svn:eol-style=native
*.lk       = svn:eol-style=native
*.m3u      = svn:mime-type=audio/x-mpegurl
*.mdb      = svn:needs-lock
*.mk       = svn:eol-style=native
*.mmf      = svn:eol-style=native
*.pdf      = svn:needs-lock
*.pl       = svn:eol-style=native
*.pm       = svn:eol-style=native
*.png      = svn:mime-type=image/png
*.ppt*     = svn:needs-lock
*.py       = svn:eol-style=native
*.pyw      = svn:eol-style=native
*.qconf    = svn:eol-style=native
*.rgd      = svn:eol-style=native
*.rtf      = svn:needs-lock
*.s        = svn:eol-style=native
*.sh       = svn:eol-style=native;svn:executable
*.sldm     = svn:needs-lock
*.sldx     = svn:needs-lock
*.tex      = svn:eol-style=native
*.thmx     = svn:needs-lock
*.txt      = svn:eol-style=native
*.xlam     = svn:needs-lock
*.xls      = svn:needs-lock
*.xlsb     = svn:needs-lock
*.xlsm     = svn:needs-lock
*.xlsx     = svn:needs-lock
*.xltm     = svn:needs-lock
*.xltx     = svn:needs-lock
Makefile   = svn:eol-style=native
makefile   = svn:eol-style=native
SConscript = svn:eol-style=native
SConstruct = svn:eol-style=native

Thursday, August 18, 2011

Change Default Paste to Unformatted Text in MS Word 2007

Are you tired of MS Word 2007 and higher pasting formatted text your document when all you really want is unformatted text to be pasted? The temporary solution is to go to the Home ribbon | Paste (down arrow) | Paste Special… | Select Unformatted Text | OK. That is a lot of clicking for just a simple paste that I think should be the default.

Finally in MS Word 2007 they added some nice features to address this very issue. Go to the Word menu image  | Word Options | Advanced and then make the changes as noted by the four arrows below.

 

image

 

Now when you copy and paste you will always get text only unless you do like would have done before and to to Home ribbon | Paste (down arrow) | Paste Special… but now choose Formatted Text (RTF) or HTML Format depending on your source.

Thank you Microsoft!

Wednesday, August 3, 2011

How to add another user’s inbox to Outlook

 

In Outlook you can open another users inbox if they make you a delegate / share it with you. You can also add it to your list of Mail Folders so that you can easily view it. Below are the instructions to do both of these things.

In Outlook 2007 (maybe similar for other versions) do the follow:

  1. Open Outlook 2007 if you have not already
  2. Go to the Tools menu and choose Account Settings…
  3. Assuming you are now on the E-mail tab, click your default account.
  4. Click the Change…
  5. Click More Settings…
  6. Click the Advanced tab
  7. Click the Add… button
  8. Type in the name of the user who’s mailbox or calendar, etc that you want to access
  9. Click OK and OK until you are back on the screen titled Change E-mail Account.
  10. Wait until the Next > button becomes active and then click it.
  11. Click the Finish button.
  12. Click the Close button.

You should now see your mailbox near your current inbox on the Navigation Pane (aka Mail Folder pane). This will bring over the calendar, tasks, inbox, etc.

If you don’t want to always have this person’s stuff in your Outlook, you can also access it by opening each time. To do this do the following:

  1. Open Outlook 2007 if you have not already.
  2. Go to the File menu and choose Open and then Other User’s Folder…
  3. Type the name of the person in the Name… textbox.
  4. Select what you want to access from the Folder type drop down list. For example Inbox to view the email or Calendar for Calendar.

NOTE: This opens it but does not keep it readily available to you. To access it again you can follow the same basic steps (Go to the File menu and choose Open and then choose the item you added – it should show at the end of the menu).

Monday, August 1, 2011

Screen-scraping / Automation Tools

In general I don’t recommend screenscraping at all, and it should only be a last resort. I do recommend test automation of UI if you are developing software though. Screenscraping will consume a ton of time to build it, and then to maintain it when the screens change in very minor ways. The main reason is inevitably the thing you want access to on the screen is not easy to get to and interact with. In many cases a tool will get you 80% to 90% of the way there and then you will hit a block wall or at least an endless pit that sucks all your time and resources to find a solution.

Assuming you have decided that screen-scraping is worth it or you are doing UI test automation I do have a few recommendations if you use C# and are on the Windows platform. Some things to consider: What are you trying to scrap? What type of content are you trying to scrape? For example, is it a web page (does it have AJAX), Silverlight, or is it a desktop app (Is it Java or Native Windows). Will this be on a server and if so can you install FireFox (FF) or Internet Explorer (IE)? How will you can these tools. It also depends on how you will execute tool. For example, some tools can be called directly from C# while other are Java or even XML based. Some can be called from command line other can’t.

Before I recommend the tool I HIGHLY recommend commenting the heck out of your code with Page and Page element you are working with. It is easy enough to write it and figure out what page you are and what element on that page you are working with when you are writing it for the first time because you are looking at it. However, when something changes and you have to modify it, it will be very time consuming to try to debug and/or figure out what you were doing. The reason is that often code becomes very cryptic because you are navigating through arrays of arrays or arrays or in general trying to get to things that can difficult to do.

Ok, enough lecturing, here is what I recommend:

 

Tool

HTML (No AJAX)

HTML (w/ AJAX)

Silverlight

Windows Desktop

Java Application

Language

Comment

White

N

N

Y

Y

Y

C#

This is my tool of choice for Silverlight or Windows Desktop or even Java applications. It reminds me of Watin (see below) as far as programming style and being easy and intuitive. Requires IE/FF. Don’t use the mouse when running, but can “help script” if gets stuck by moving mouse.

Watin

Y

Y

N

N

N

C#

In general this is the tool of choice for HTML with or without AJAX. Easy and intuitive to work with.

WebHarvest

Y

N

N

N

N

Java or Command Line

This is nice because it simulates browser so no browser needed, but this also means no support for AJAX or javascript. It is XML based and you don’t really write java code, you write XML, so it is all declarative programming. Kind of different, but quick and effective for plain HTML sites.

Selenium Y Y * See Silverlight-Selenium N N C#/Java, Command Line, and most popular languages. I would try Watin first in most cases, but don’t discount Selenium either. It is a scalable solution that is supported by many languages and platforms and browsers. Has a recorder that can be useful. Has complete IDE. XPath based which I find not as intuitive or easy to debug as Watin, but can be effective at navigating a page.
Silverlight-Selenium N N Y N N C# This still uses Selenium is actually just an extension to Selenium. I think White is easier to use and a bit more robust, but if White doesn’t work for your Silverlight app try this.