Monday, August 1, 2011

Screen-scraping / Automation Tools

In general I don’t recommend screen-scraping at all; it should only be a last resort. I do recommend UI test automation if you are developing software, though. Screen-scraping consumes a ton of time to build, and then more time to maintain whenever the screens change, even in very minor ways. The main reason is that, inevitably, the thing you want to access on the screen is not easy to get to and interact with. In many cases a tool will get you 80% to 90% of the way there, and then you will hit a brick wall, or at least an endless pit that sucks up all your time and resources while you look for a solution.

Assuming you have decided that screen-scraping is worth it, or you are doing UI test automation, I do have a few recommendations if you use C# and are on the Windows platform. Some things to consider first: What type of content are you trying to scrape? For example, is it a web page (and does it use AJAX?), a Silverlight app, or a desktop app (Java or native Windows)? Will this run on a server, and if so, can you install Firefox (FF) or Internet Explorer (IE) there? How will you call these tools? Some tools can be called directly from C#, while others are Java or even XML based. Some can be called from the command line; others can’t.

Before I recommend any tools, I HIGHLY recommend commenting the heck out of your code with the page and page element you are working with. When you write the code for the first time, it is easy to tell which page you are on and which element on that page you are working with, because you are looking at it. However, when something changes and you have to modify the code, it will be very time consuming to debug it and/or figure out what you were doing. The reason is that the code often becomes very cryptic, because you are navigating through arrays of arrays of arrays, or in general trying to get to things that are difficult to reach.
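To make this advice concrete, here is a minimal sketch in Java (one of the languages several of these tools support). It assumes the scraping tool has already handed you the page as nested lists of tables, rows, and cells; the “Order History” page, its layout, and every index constant are hypothetical. Both methods fetch the same cell, but only the second tells the next reader which page and element it is after.

```java
import java.util.List;

/**
 * Hypothetical example: an "Order History" page already scraped into
 * nested lists (page -> tables -> rows -> cells). The page, its layout,
 * and the indices are made up for illustration.
 */
public class CommentedScrapeSketch {

    // Cryptic version: it works today, but months from now nobody
    // (including you) will know which page or element this refers to.
    static String crypticStatus(List<List<List<String>>> page) {
        return page.get(1).get(2).get(3);
    }

    // Page: Order History (hypothetical)
    // Element: the "Status" cell of the third order row in the orders table.
    static final int ORDERS_TABLE = 1;     // table 0 is the site nav table
    static final int THIRD_ORDER_ROW = 2;  // zero-based; no header row in this table
    static final int STATUS_COLUMN = 3;    // columns: Order #, Date, Total, Status

    static String orderStatus(List<List<List<String>>> page) {
        return page.get(ORDERS_TABLE).get(THIRD_ORDER_ROW).get(STATUS_COLUMN);
    }

    public static void main(String[] args) {
        List<List<List<String>>> page = List.of(
            // table 0: navigation links
            List.of(List.of("Home", "Orders", "Help")),
            // table 1: orders
            List.of(
                List.of("1001", "2011-07-01", "$10.00", "Shipped"),
                List.of("1002", "2011-07-15", "$25.00", "Pending"),
                List.of("1003", "2011-07-30", "$5.00", "Cancelled")));
        System.out.println(orderStatus(page)); // prints Cancelled
    }
}
```

When the page layout changes, the commented constants tell you which element you need to find again; the bare indices in `crypticStatus` do not.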

OK, enough lecturing. Here is what I recommend:

Tool                 | HTML (No AJAX) | HTML (w/ AJAX) | Silverlight | Windows Desktop | Java Application | Language
---------------------+----------------+----------------+-------------+-----------------+------------------+---------
White                | N              | N              | Y           | Y               | Y                | C#
WatiN                | Y              | Y              | N           | N               | N                | C#
WebHarvest           | Y              | N              | N           | N               | N                | Java or command line
Selenium             | Y              | Y              | *           | N               | N                | C#, Java, command line, and most popular languages
Silverlight-Selenium | N              | N              | Y           | N               | N                | C#

* See Silverlight-Selenium.

Comments:

White: This is my tool of choice for Silverlight, Windows desktop, or even Java applications. It reminds me of WatiN (see below) in programming style, and it is similarly easy and intuitive. Requires IE/FF. Don’t use the mouse while it is running, although you can “help” the script along by moving the mouse if it gets stuck.

WatiN: In general, this is the tool of choice for HTML, with or without AJAX. It is easy and intuitive to work with.

WebHarvest: This is nice because it simulates a browser, so no browser is needed; however, that also means there is no support for AJAX or JavaScript. It is XML based: you don’t really write Java code, you write XML, so it is all declarative programming. It is a different way of working, but quick and effective for plain HTML sites.

Selenium: I would try WatiN first in most cases, but don’t discount Selenium either. It is a scalable solution supported by many languages, platforms, and browsers. It has a recorder that can be useful, and a complete IDE. It is XPath based, which I find less intuitive and harder to debug than WatiN, but XPath can be effective at navigating a page.

Silverlight-Selenium: This is actually just an extension to Selenium, so it still uses Selenium. I think White is easier to use and a bit more robust, but if White doesn’t work for your Silverlight app, try this.
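Since the Selenium comment mentions XPath, here is a rough, browser-free way to experiment with XPath expressions. This is not Selenium: it uses the JDK’s standard javax.xml.xpath API against a small, hypothetical login page written as well-formed XHTML, with the page and element named in comments as recommended earlier. In Selenium’s Java bindings the same expression string would be passed to By.xpath(...). Real-world HTML is often not well-formed XML, so treat this only as a sandbox for trying out expressions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathLocatorSketch {

    // Page: Login (hypothetical; well-formed XHTML so the JDK XML parser accepts it)
    static final String LOGIN_PAGE =
          "<html><body>"
        + "<form id='login'>"
        + "<input type='text' name='user'/>"
        + "<input type='password' name='pass'/>"
        + "<button type='submit' class='primary'>Sign in</button>"
        + "</form>"
        + "<div id='footer'><a href='/help'>Help</a></div>"
        + "</body></html>";

    // Element: the submit button inside the login form.
    static final String LOGIN_BUTTON_XPATH =
        "//form[@id='login']//button[@type='submit']";

    // Parses the markup and returns the text of the first node the expression matches.
    static String textAt(String xhtml, String xpathExpression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) returns the string value of the result;
        // for a node-set that is the first matching node, or "" if none match.
        return xpath.evaluate(xpathExpression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textAt(LOGIN_PAGE, LOGIN_BUTTON_XPATH)); // prints Sign in
    }
}
```

Note that a typo in the expression does not throw here; it simply matches nothing and returns an empty string, which is one reason XPath-based scripts can be harder to debug.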

5 comments:

parvuselephantus said...

Could you please include www.lordui.com in your list? It's a kind of macro tool with a lot of artificial intelligence elements.

Brent V said...

Hi parvuselephantus,

I am trying to keep my list to free tools. I wasn’t clear about that, but that was my intent. However, I will keep your comment here for others to see if they are looking for a paid solution. Your tool sounds compelling. Best of luck.

Thanks,

Brent

Daniel said...

Hi. Great article. Have you also checked out this one: http://www.uipath.com/automate/web-scraping-software

Software Development Company said...

Hello,
The article on Screen-scraping / Automation Tools is very informative and detailed. Thanks for sharing it. It's amazing to learn about automation tools.

Unknown said...

Hello Brent,

Great post. It can’t get any more straightforward than this article. Thanks!
Since I have worked with Blue Prism and UI Path but not OpenSpan, I can only tell you the following.
Personally, I feel UI Path is extremely user friendly.
Speed of implementation is very high, but anyone who has worked with it will notice the delay problem, which needs to be improved.
It can be used for various integration services with different workflow modules, so it scores high on reusability.
Citrix environment and desktop automation are major pros of this tool. Its architecture makes it future-proof, meaning it can evolve a great deal.
It has a free community edition that anyone can download to learn with, and there are umpteen learning PDFs and videos available online.
The only con is a lack of control: no coding. Neither C# nor VBScript can be run.
Excellent tutorials - very easy to understand with all the details. I hope you will continue to provide more such tutorials.

Muchas Gracias,
Jeffry