Best practices for automating with image and text recognition

LEAPWORK has two categories of building blocks based on Image and Text recognition: "Mouse and Keyboard" and "Find and Get". In this post we cover some of the basics of Image and Text recognition, and present best practices and solutions for some of the challenges that are inherent in Image & Text recognition.


Image recognition is the "art" of finding one image within another image. Typically, one image is defined at design time (captured into LEAPWORK), and the other is a screenshot of the actual application taken while the test case is running. When the test case runs, LEAPWORK looks for the captured image in the screenshots and acts according to the defined flow.

Technically, image recognition compares one matrix of numbers with another and determines whether the first matrix is part of the second. One of the challenges is that the two matrices can change if the screen resolution changes. For example, if the test case is executed on another machine, or the resolution has changed, the accuracy of finding the captured image in the screenshot can decrease, which can make test cases less robust.
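
To make the matrix comparison concrete, here is a minimal Python sketch of exact matching. It is not LEAPWORK's engine: it assumes the screenshot and the captured image are plain grayscale matrices (lists of lists of integers), `find_exact` and `upscale_2x` are hypothetical helpers, and the 2x nearest-neighbor scaling is only a crude stand-in for a resolution change.

```python
from typing import List, Optional, Tuple

# A grayscale image as a matrix: one list of pixel values per row (an assumption for this sketch).
Matrix = List[List[int]]


def find_exact(screen: Matrix, template: Matrix) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: (row, col) of the first exact match of template in screen, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            # Every pixel of the captured image must equal the screen pixel under it.
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None


def upscale_2x(image: Matrix) -> Matrix:
    """Hypothetical helper: nearest-neighbor 2x scaling, a crude stand-in for a resolution change."""
    scaled = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]  # repeat each pixel horizontally
        scaled.append(wide)
        scaled.append(list(wide))                         # and each row vertically
    return scaled


icon = [[9, 1, 9],
        [1, 9, 1]]            # captured at design time
screen = [[0, 0, 0, 0, 0],
          [0, 9, 1, 9, 0],
          [0, 1, 9, 1, 0],
          [0, 0, 0, 0, 0]]    # screenshot taken at run time

print(find_exact(screen, icon))                          # (1, 1)
print(find_exact(upscale_2x(screen), icon))              # None
print(find_exact(upscale_2x(screen), upscale_2x(icon)))  # (2, 2)
```

The icon is found at its original size but not once the screen has been scaled, while an image captured at the new scale matches again.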

Text recognition is based on pattern recognition, which means that LEAPWORK searches an area of the screen for patterns that match letters. Letters can come in different fonts, colors and sizes, and the background can be an image or a gradient pattern, all of which can make it harder to recognize the actual letters and numbers on the screen.

There are some best practices for handling these challenges so that test cases become, and remain, robust. They are described below.

Capturing icons

The background color behind an icon can change, so don’t include parts of the background when capturing an icon.
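
The effect of capturing background pixels can be shown with the same kind of exact matching. In this hypothetical Python sketch, a 2x2 icon is drawn on a 4x4 screen whose background color changes between design time and run time; the pixel values and the helpers `render` and `find_exact` are made up for illustration.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]

ICON = [[9, 1],
        [1, 9]]  # the icon's own pixels (made-up values)


def find_exact(screen: Matrix, template: Matrix) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: (row, col) of the first exact match of template in screen, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None


def render(background: int) -> Matrix:
    """Hypothetical 4x4 screen: ICON at (1, 1) on a solid background color."""
    screen = [[background] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            screen[1 + i][1 + j] = ICON[i][j]
    return screen


design_time = render(background=5)
loose_capture = [row[:] for row in design_time]         # icon plus a border of background
tight_capture = [row[1:3] for row in design_time[1:3]]  # icon pixels only

run_time = render(background=7)  # e.g. the theme or window color changed

print(find_exact(run_time, loose_capture))  # None: the captured background no longer matches
print(find_exact(run_time, tight_capture))  # (1, 1)
```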

A “hover” effect can change how an icon looks when the mouse pointer is over it, for instance by showing a brighter or darker version. This can usually be handled by closing all open windows as part of the test run, by setting the 'Action' property on the Start building block to "Close all windows".


A “selected” or “opened” effect can change how an icon looks when it is selected. For instance, the Chrome icon in the Windows taskbar looks different when no browser instances are open compared to when one or more are open. This can typically be solved by using the Image collection feature (see Image collections below).

No browsers open:

At least one browser open: 


Moving images

One situation that can occur in all types of applications is that an image is first shown in one place and then moved to another. One example is some modern websites, where all resources are first loaded into the page and then "bootstrapped" into position. Another example is a dialog box in a desktop application that is shown and then centered on the screen.

In both cases, LEAPWORK can find the image in its first position and then continue the test flow. If the application then moves the image, the position LEAPWORK found is no longer correct, which makes the test case fail.

Checking the “Await no movement” property on the building block usually solves this problem. It tells the image recognition engine to wait until the screen has not changed for a period of time before it starts searching for the image.
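
The idea behind “Await no movement” can be sketched as follows. This is an assumption about the general approach, not LEAPWORK's implementation: instead of waiting for a period of time, the hypothetical `await_no_movement` function waits until the same screenshot has been seen a given number of times in a row, `find_pixel` stands in for the real image search, and the frames are simulated rather than captured from a screen.

```python
from typing import Iterable, Optional, Tuple

# Screenshots as tuples of tuples so that two frames can be compared with ==.
Frame = Tuple[Tuple[int, ...], ...]

DIALOG = 9  # made-up pixel value standing in for the dialog we want to find


def await_no_movement(frames: Iterable[Frame], stable_frames: int = 3) -> Optional[Frame]:
    """Hypothetical: return the first frame seen `stable_frames` times in a row."""
    previous: Optional[Frame] = None
    unchanged = 0
    for frame in frames:
        if frame == previous:
            unchanged += 1
        else:
            previous, unchanged = frame, 1  # the screen changed: start counting again
        if unchanged >= stable_frames:
            return frame
    return None  # never settled; a real tool would give up after a timeout


def find_pixel(frame: Frame, value: int) -> Optional[Tuple[int, int]]:
    """Hypothetical stand-in for image search: position of the first pixel equal to value."""
    for r, row in enumerate(frame):
        for c, pixel in enumerate(row):
            if pixel == value:
                return (r, c)
    return None


centered = ((0, 0, 0), (0, 9, 0), (0, 0, 0))
frames = [
    ((9, 0, 0), (0, 0, 0), (0, 0, 0)),  # dialog first drawn in the top-left corner
    centered, centered, centered,       # ...then centered and left there
]

print(find_pixel(frames[0], DIALOG))  # (0, 0): searching immediately finds the old position
settled = await_no_movement(frames, stable_frames=3)
print(find_pixel(settled, DIALOG))    # (1, 1): searching after the screen has settled
```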


Areas

For Image recognition, and especially for Text recognition, it is highly recommended to use "Areas". An "Area" is a subsection of the screen that tells the Image/Text recognition engine to search for the captured image, or a specific text/text pattern, only within that subsection. Typically you define an area around the part of the screen where you expect the image or text to appear, and then add some margin around it.

Specifying an area has two main purposes:

  • It ensures that you find the right instance of the captured image/text. If a word appears multiple times on the screen, you could otherwise get a list of all the occurrences instead of the "right" one.
  • Execution is considerably faster when the LEAPWORK Image/Text recognition engine only has to search a fraction of the screen instead of the whole screen.
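
Both purposes can be illustrated with a small Python sketch. The helpers `find_in_area`, `area_with_margin` and `candidate_positions` are hypothetical, areas are assumed to be given as (top, left, height, width) in pixels, and the number of placements an exact matcher has to test is used only as a rough proxy for search time.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]
Area = Tuple[int, int, int, int]  # (top, left, height, width) in pixels: an assumption


def find_in_area(screen: Matrix, template: Matrix, area: Area) -> Optional[Tuple[int, int]]:
    """Hypothetical: exact match searched only inside `area`; returns screen coordinates."""
    top, left, height, width = area
    th, tw = len(template), len(template[0])
    for r in range(top, top + height - th + 1):
        for c in range(left, left + width - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None


def area_with_margin(expected: Area, margin: int, screen_h: int, screen_w: int) -> Area:
    """Hypothetical: grow the expected region by `margin` on every side, clamped to the screen."""
    top, left, height, width = expected
    new_top, new_left = max(0, top - margin), max(0, left - margin)
    bottom = min(screen_h, top + height + margin)
    right = min(screen_w, left + width + margin)
    return (new_top, new_left, bottom - new_top, right - new_left)


def candidate_positions(area_h: int, area_w: int, th: int, tw: int) -> int:
    """Number of placements an exact matcher must test: a rough proxy for search time."""
    return max(0, area_h - th + 1) * max(0, area_w - tw + 1)


ok_button = [[5, 5]]
screen = [[0] * 8 for _ in range(6)]
screen[0][0] = screen[0][1] = 5  # an "OK" button at the top of the screen
screen[4][6] = screen[4][7] = 5  # the "OK" button we actually expect, bottom right

area = area_with_margin((4, 6, 1, 2), margin=1, screen_h=6, screen_w=8)
print(find_in_area(screen, ok_button, (0, 0, 6, 8)))  # (0, 0): whole screen, wrong button
print(find_in_area(screen, ok_button, area))          # (4, 6): area, expected button
```

For a 50x20-pixel image, a 1920x1080 screen has 1,985,131 possible placements, while a 300x100 area has 20,331, roughly 98 times fewer.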

More information about using and defining areas.

Image collections

The Image Collection feature lets you capture two or more images into a collection and then use the collection when searching for an image. For example, you can capture the same button in different states (no focus, in focus, hovered, pressed, etc.), add all the captured images to one collection, and then have the test case click or find the button regardless of its state.

In the example below, we have captured the "Search" button from a Windows desktop application.

The button can have four different looks depending on focus and hover effects, and each state has been captured:

To create an image collection, click "Collection", enter a name for the collection and press "Save". Then use your mouse to drag all the images you want to include into the collection:

You can now use the collection in a building block by dragging the collection onto the image field in the building block:

When the Click Image building block is executed, it searches the screen for the images in the collection one by one. When it finds one of the images, it clicks it, stops the search and hands execution over to the next building block in the flow.
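
The search order described above can be sketched like this. It is a minimal Python illustration under stated assumptions: each captured state is a small grayscale matrix with made-up values, the collection is an ordered dictionary from a state name to an image, and the hypothetical `find_first_in_collection` returns the match instead of clicking it.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[int]]


def find_exact(screen: Matrix, template: Matrix) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: (row, col) of the first exact match of template in screen, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None


def find_first_in_collection(screen: Matrix,
                             collection: Dict[str, Matrix]) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Hypothetical: try each image in the collection in order; stop at the first match."""
    for name, image in collection.items():  # dicts preserve insertion order (Python 3.7+)
        position = find_exact(screen, image)
        if position is not None:
            return name, position  # a Click Image block would click here and move on
    return None  # no state of the button was found


# Four captured states of a "Search" button (made-up 1x2 images).
search_button = {
    "no focus": [[1, 1]],
    "in focus": [[2, 2]],
    "hovered": [[3, 3]],
    "pressed": [[4, 4]],
}
screen = [[0, 0, 0],
          [0, 3, 3]]  # the button is currently shown in its hovered state

print(find_first_in_collection(screen, search_button))  # ('hovered', (1, 1))
```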

An Image collection can also handle different resolutions of the same icon/image, if you know you will run the same test case at different resolutions, as well as different states of an icon.

Image resources are shared within a project, so collections can be used in multiple test cases. For example, you can create a "Chrome icon" collection that contains all relevant states of the Chrome icon in the Windows taskbar, and then use this collection in every test case that works with Chrome. As a bonus, you only have to maintain the image collection in one place instead of in every test case.

Remote design and execution

A typical LEAPWORK setup consists of a number of workstations with LEAPWORK Studio installed, a Controller installed on a common/shared server to make sharing easy, and one or more machines dedicated to executing the test cases. Test cases that use Image and Text recognition interact with the actual screen, so they cannot run on a normal work PC while it is being used. This is the reason for the "remote machines".

To make your test cases independent of differences in screen resolution between the machines where they can be executed, you can define an Environment pointing to a "remote machine" and capture images on that machine instead of on your local workstation. This way you capture images directly on the machine where the test case will be executed, ensuring that the screen resolution is always the same.

To create a "remote machine", install the LEAPWORK Agent on a dedicated workstation that is accessible from both LEAPWORK Studio and the LEAPWORK Controller. Once the remote machine is up and running, you can define an Environment in Studio pointing to it. You can find more information here.

When the environment has been created, you can select it as the 'Preview Environment' on the design canvas. In the example below, "Amazon Cloud Remote" is an environment pointing to a cloud-hosted (Amazon) server where the LEAPWORK Agent is installed.

When the 'Preview Environment' points to a remote machine, a "terminal" window pops up when you capture new images, letting you capture directly on the remote machine instead of on your local machine.


Precision

When expanded, the building blocks that use Image recognition show a property named Precision. It has two sub-properties:

Pixels: The level of tolerated accuracy in the image recognition.

Color: The color density of the same set of pixels can vary depending on the hardware used. This property specifies how sensitive the recognition is to changes in color density.

These properties set the accepted level of accuracy for the Image recognition. The default is 'Picture perfect', which means that there has to be a perfect match, pixel by pixel, before the captured image is considered found on the screen. In some cases a higher level of tolerance is needed. The advice is to start with 'Picture perfect' for both properties and then relax them one level at a time until the image recognition works as intended.
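
How the Precision levels map to numbers is not described here, so the following Python sketch is only a hypothetical model of the two sub-properties: `max_color_diff` stands in for Color (how far a pixel's value may drift and still count as equal) and `max_bad_pixels` stands in for Pixels (how many pixels may still differ). The level names and numbers in `LEVELS` are invented to illustrate the advice of relaxing the tolerance one level at a time.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[int]]

# Hypothetical tolerance levels: (name, max_bad_pixels, max_color_diff).
LEVELS: Sequence[Tuple[str, int, int]] = [
    ("picture perfect", 0, 0),
    ("color relaxed", 0, 5),
    ("color and pixels relaxed", 1, 5),
    ("very tolerant", 4, 20),
]


def matches_with_tolerance(region: Matrix, template: Matrix,
                           max_bad_pixels: int, max_color_diff: int) -> bool:
    """Pixels whose values differ by more than max_color_diff count as 'bad';
    the region matches if there are at most max_bad_pixels of them."""
    bad = 0
    for region_row, template_row in zip(region, template):
        for p, q in zip(region_row, template_row):
            if abs(p - q) > max_color_diff:
                bad += 1
                if bad > max_bad_pixels:
                    return False
    return True


def first_matching_level(region: Matrix, template: Matrix) -> Optional[str]:
    """Relax the tolerance one level at a time, starting from picture perfect."""
    for name, max_bad_pixels, max_color_diff in LEVELS:
        if matches_with_tolerance(region, template, max_bad_pixels, max_color_diff):
            return name
    return None


template = [[100, 100], [100, 100]]  # captured on the design machine
region = [[102, 100], [100, 140]]    # same spot on another machine: slight color shift, one odd pixel

print(first_matching_level(region, template))  # color and pixels relaxed
```

With these made-up numbers, the region fails 'picture perfect' and the color-only level because of the one very different pixel, and is first accepted once a single bad pixel is tolerated.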

Dirty Edge Mode

'Dirty Edge Mode' is a setting on the Text recognition building blocks. It tells the Text Recognition engine that the text to search for or interpret is not located on a solid-color background, but on some kind of gradient or image-based background that makes the edges of the individual letters "dirty". In other words, if the colors of the text and the background are not the same for all letters, it can be necessary to enable 'Dirty Edge Mode'.
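
How Dirty Edge Mode works internally is not described here. As an illustration of why gradient backgrounds are a problem, this hypothetical Python sketch compares a single global brightness threshold with a simple local (adaptive) threshold on one row of pixels: dark letter pixels on a background that fades from dark to light. The local threshold is a general technique swapped in for illustration, not LEAPWORK's algorithm, and all pixel values are made up.

```python
from typing import List

# One row of a screenshot: the background fades from dark (40) to light (240).
background = [40 + 20 * i for i in range(11)]
scanline = list(background)
for i in (3, 8):  # two "letter" pixels, each 50 darker than the background behind it
    scanline[i] = background[i] - 50


def global_threshold(pixels: List[int], threshold: int = 128) -> List[int]:
    """Treat every pixel darker than one fixed threshold as text."""
    return [i for i, v in enumerate(pixels) if v < threshold]


def local_threshold(pixels: List[int], radius: int = 1, offset: int = 25) -> List[int]:
    """Treat a pixel as text if it is clearly darker than the average of its neighbors."""
    text = []
    for i, v in enumerate(pixels):
        neighbors = pixels[max(0, i - radius):i] + pixels[i + 1:i + 1 + radius]
        if v < sum(neighbors) / len(neighbors) - offset:
            text.append(i)
    return text


print(global_threshold(scanline))  # [0, 1, 2, 3, 4]
print(local_threshold(scanline))   # [3, 8]
```

The fixed threshold marks the dark end of the background as text and misses the letter on the light end, while comparing each pixel with its surroundings finds both letters.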