The poor man’s OCR

Optical Character Recognition (OCR) has been around for a long time. One of its main uses, for those not familiar, is extracting text from images.

Off-the-shelf products such as ABBYY FineReader exist, with prices ranging from $150 for a single-user copy up to $10,000 for large enterprise ‘site licenses’. But where’s the fun in buying it?

I learnt recently that Microsoft Office 2007 has built-in OCR capabilities which can be accessed from C# via a COM interface. In this post I will explain how to leverage these capabilities.

First off, you need to have MS Office 2007 installed. This is obviously a dependency if you develop an application that uses the OCR capabilities in the field: it won’t work without Office installed. Furthermore, the OCR capability isn’t installed by default when you install Office; you need to add a component called ‘Microsoft Office Document Imaging’ (MODI).

For instructions on how to add the required component, look here.

Now that you have MODI installed, you can create an OCR application! Boot up Visual Studio and create a new C# console application.

You’ll first need to add a reference to MODI so you can use it from your application. In the Visual Studio Solution Explorer window, right-click the ‘References’ folder and choose ‘Add Reference’. When the dialog box appears, select the ‘COM’ tab. Finally, select the entry named ‘Microsoft Office Document Imaging 12.0 Type Library’.

The code below creates a new MODI document, runs OCR on it, and outputs the recognised text word by word to the console (you could also output it to a text file or a custom XML file; I leave that as an exercise for the reader). I’ve assumed the image file you wish to retrieve the text from is located in ‘C:\Images’.


// Grab the text from an image
MODI.Document md = new MODI.Document();
md.Create(@"C:\Images\Image.tif");
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

// Retrieve the text gathered from the image
MODI.Image image = (MODI.Image)md.Images[0];
MODI.Layout layout = image.Layout;

// Loop through the list of words
for (int j = 0; j < layout.Words.Count; j++)
{
    MODI.Word word = (MODI.Word)layout.Words[j];
    Console.WriteLine(word.Text);
}

// Close the document without saving changes
md.Close(false);
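
As a starting point for the text-file exercise mentioned above, here is a minimal sketch. It assumes the same MODI COM reference has been added to the project, and the input and output paths are placeholders chosen for illustration. It uses the Layout object's Text property to grab the whole page at once instead of looping over the words.

```csharp
using System;
using System.IO;

class OcrToFile
{
    static void Main()
    {
        // Placeholder paths, chosen for illustration only.
        string imagePath = @"C:\Images\Image.tif";
        string outputPath = @"C:\Images\Image.txt";

        MODI.Document md = new MODI.Document();
        md.Create(imagePath);
        try
        {
            md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            // Only the first page is handled here.
            MODI.Image image = (MODI.Image)md.Images[0];

            // Layout.Text holds the recognised text of the whole page.
            File.WriteAllText(outputPath, image.Layout.Text);
            Console.WriteLine("Wrote OCR text to " + outputPath);
        }
        finally
        {
            // Always release the document, without saving changes.
            md.Close(false);
        }
    }
}
```

For a multi-page TIFF you would loop over md.Images rather than reading only the first entry.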

Notice the 'MODI.MiLANGUAGES.miLANG_ENGLISH' parameter above; it should be set to the language you are dealing with. It looks like 22 languages are supported, including Japanese and the Chinese variants.

When I ran this (using the home page of my McAfee 'Total Protection' 2010 suite as the guinea pig), the results were surprisingly accurate (for English anyway), with only two recognition errors.

Take a look here.

I wonder if it is as accurate for double-byte languages such as Japanese. I'd also like to check it against an RTL language like Arabic.

Anyway, it's definitely worth a look if you want to develop a custom OCR application on a shoestring budget.

Programmatically verify resources in a DLL

I recently needed to check programmatically that certain resources were contained in a set of (native) DLL resource files. The idea was to add some automated post-build engineering checks to our existing automated test suite, e.g. ensuring that resources for all the required languages have been injected correctly.

I wanted to write a simple C# application to perform these checks. I came across this handy library, which contains functions for almost all the functionality I required:

ResourcesLib

Using this, we can perform functions such as importing resources, loading strings, and even injecting new resources.

For example, here’s how you could retrieve all languages contained in a resource DLL:

string file = "resource.dll";
RawResourceFile resFile = new RawResourceFile();
resFile.Load(file);

for (int i = 0; i < resFile.Languages.Count; i++)
{
    Console.WriteLine("Language: " + resFile.Languages[i]);
}
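
To turn a listing like this into an actual post-build check, the languages found can be compared against the ones you expect. The sketch below is plain C# and does not call the library at all: FindMissingLanguages is a hypothetical helper, and the language names are made-up sample data. It also assumes the languages read from the DLL have already been converted to strings in whatever format the library reports them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LanguageCheck
{
    // Hypothetical helper: returns the required languages that were not found.
    static List<string> FindMissingLanguages(IEnumerable<string> found,
                                             IEnumerable<string> required)
    {
        return required.Except(found, StringComparer.OrdinalIgnoreCase).ToList();
    }

    static void Main()
    {
        // Sample data; in a real check 'found' would come from the resource DLL.
        string[] required = { "en-US", "de-DE", "ja-JP" };
        string[] found = { "en-US", "ja-JP" };

        List<string> missing = FindMissingLanguages(found, required);
        if (missing.Count > 0)
        {
            // Prints "Missing: de-DE" for the sample data above.
            Console.WriteLine("Missing: " + string.Join(", ", missing.ToArray()));
            Environment.Exit(1); // non-zero exit code lets a build script detect failure
        }

        Console.WriteLine("All required languages are present.");
    }
}
```

Returning a non-zero exit code is one simple way to let an existing automated test suite pick up the failure.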

Download the library and take a look; it can save you lots of time if you need to perform any checks or actions on a DLL file.

Automating Virtual Machine operations on ESXi Server from C#

VMware provides two really useful APIs for automating virtual machine (VM) tasks on both VMware Workstation and VMware ESXi server.

  • VMware Infrastructure (VI) API
  • VIX API

These are extremely easy to use from C#. In a QA environment, the automation of VMs can be hugely beneficial, whether you are automating an environment for build sanity checks or for functional tests.

This post will outline the basics of using the VIX API from C# in order to perform operations on VMware ESXi server. If you don’t have access to an ESXi server, you can install it on a VM; it’s free to download from the VMware website!

For starters, you will need to install the APIs on your development machine. To download them you will need a VMware account, which you may already have if you have downloaded Workstation or ESXi server in the past. If you don’t, you can create an account for free. Once logged into your account, you can download both APIs from the ‘Support & Downloads’ section.

Let me explain the difference between these two APIs. From VMware’s own documentation:

“The VI API provides access to the VMware Infrastructure management components—the managed objects that can be used to manage, monitor, and control life-cycle operations of virtual machines and other VMware infrastructure components (datacenters, datastores, networks, and so on).”

VIX, on the other hand, is used to automate the actual operations on VMs, such as booting them, copying in files, getting/setting VM environment variables and other tasks you may wish to perform. The coolest part of VIX is that a wrapper for C# exists, created by Daniel Doubrovkine over at dblock.org. This wrapper, ‘VMwareTasks’, provides a simple object-orientated approach to VIX, which will be familiar to C# developers. Download the wrapper here.

Now for the basics of using the VIX API and VMwareTasks wrapper. Create a new console application project in Visual Studio. You will need to add a reference to the VMwareTasks DLL, which is located in the ‘bin’ directory when you extract the VMwareTasks download.

Look how simple it is to power on a VM!


// Declare a new virtual host
VMWareVirtualHost host = new VMWareVirtualHost();

// Connect to the ESXi server
host.ConnectToVMWareVIServer("192.168.1.39", "root", "password123");

// Open an existing VM by the datastore path of its .vmx file and power it on
VMWareVirtualMachine machine = host.Open("[datastore1] XPP_SP2.vmx");
machine.PowerOn();

The simple code above just connects to an ESXi server and powers on an existing VM, but you can see how easy it is to perform operations on VMs.

Here’s how to create and revert to a snapshot:


VMWareVirtualHost host = new VMWareVirtualHost();
host.ConnectToVMWareVIServer("192.168.1.39", "root", "password123");
VMWareVirtualMachine machine = host.Open("[datastore1] Vista_EN.vmx");
machine.PowerOn();
machine.Login("Tester", "testing");

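// Take a snapshot named "base" with the description "Clean"
string snapShotName = "base";
machine.Snapshots.CreateSnapshot(snapShotName, "Clean");
machine.PowerOff();

// Look the snapshot up by the same name and revert to it
VMWareSnapshot snapshot = machine.Snapshots.GetNamedSnapshot(snapShotName);
snapshot.RevertToSnapshot();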

Or to create a directory inside the guest (you need to be logged in to the guest first, as in the snapshot example):


machine.CreateDirectoryInGuest(@"C:\TestDir");
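
Putting these pieces together, a test run might start by reverting to a clean snapshot and preparing the guest. The sketch below uses only the wrapper calls already shown in this post, with the same placeholder server address, credentials and VM path. The Vestris.VMWareLib namespace is my assumption about where the wrapper's classes live, and the sketch assumes a snapshot named "base" already exists and was taken with the VM powered off.

```csharp
using System;
using Vestris.VMWareLib; // assumed namespace of the VMwareTasks wrapper

class SanityCheckSetup
{
    static void Main()
    {
        // Placeholder server address, credentials and VM path.
        VMWareVirtualHost host = new VMWareVirtualHost();
        host.ConnectToVMWareVIServer("192.168.1.39", "root", "password123");
        VMWareVirtualMachine machine = host.Open("[datastore1] Vista_EN.vmx");

        // Start every run from the same known-clean state.
        VMWareSnapshot snapshot = machine.Snapshots.GetNamedSnapshot("base");
        snapshot.RevertToSnapshot();

        machine.PowerOn();
        try
        {
            // Guest operations need a guest login first.
            machine.Login("Tester", "testing");
            machine.CreateDirectoryInGuest(@"C:\TestDir");
            Console.WriteLine("Guest prepared for testing.");
        }
        finally
        {
            // Power the VM off again even if a guest operation failed.
            machine.PowerOff();
        }
    }
}
```

The try/finally block is a design choice rather than a requirement of the API: it simply avoids leaving VMs running on the server when a step fails.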

You can see from the above examples how easy it is to perform operations on VMs using these APIs. Install the APIs and play around with the functionality; I guarantee you’ll be impressed!