Best practices for automating with image and text recognition

LEAPWORK has two categories of building blocks based on Image and Text recognition: "Mouse and Keyboard" and "Find and Get". In this post we cover some of the basics of Image and Text recognition and present best practices and solutions for some of the built-in challenges with Image & Text recognition.


Image recognition is the "art" of finding one image within another image. Typically, you will have one image that is defined at design time (captured into LEAPWORK) and one image that is a screenshot of the actual application while the test case is running. When the test case runs, LEAPWORK looks for the captured image in the screenshots and acts according to the defined flow.

Technically, image recognition compares one matrix of numbers with another matrix of numbers and returns whether the first matrix is part of the second. One of the challenges is that the two matrices can change if the screen resolution changes. For example, if the test case is executed on another machine or the resolution has changed, the accuracy in finding the captured image in the screenshot can decrease, which can lead to less robust test cases.
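
To make the matrix comparison concrete, here is a minimal Python sketch of an exact sub-matrix search. It only illustrates the general idea and is not LEAPWORK's actual implementation; the function name `find_exact` and the tiny integer "images" are assumptions made for this example.

```python
def find_exact(screen, template):
    """Return (row, col) of the first exact occurrence of template in screen, else None.

    Both arguments are lists of rows, where each row is a list of pixel values.
    """
    screen_h, screen_w = len(screen), len(screen[0])
    templ_h, templ_w = len(template), len(template[0])
    for r in range(screen_h - templ_h + 1):
        for c in range(screen_w - templ_w + 1):
            # Compare the template row by row against the screen at offset (r, c).
            if all(screen[r + i][c:c + templ_w] == template[i] for i in range(templ_h)):
                return (r, c)
    return None


# A 4x5 "screenshot" that contains a 2x2 "icon" at row 1, column 1.
screen = [
    [0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0],
    [0, 9, 6, 0, 0],
    [0, 0, 0, 0, 0],
]
icon = [[7, 8], [9, 6]]
print(find_exact(screen, icon))

# If a single pixel is rendered differently, the exact match is lost.
changed = [row[:] for row in screen]
changed[2][2] = 5
print(find_exact(changed, icon))
```

The second call shows why a change in rendering, such as a different resolution, can make a pixel-perfect search fail.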

Text recognition is based on pattern recognition, which means that LEAPWORK searches an area on the screen for patterns that match letters. Letters can come in different fonts, colors, and sizes, and the background can be an image or a gradient pattern, all of which can make it harder to recognize the actual letters and numbers on the screen.

The best practices described below help handle these challenges so that test cases remain, or become, robust.

Capturing icons

The background color behind an icon can change, so don’t include parts of the background when capturing an icon.

A “hover” effect can change how the icon looks when hovered by a mouse pointer, for instance showing a brighter or darker version. This can usually be handled by closing all open windows as part of the test run, by setting the 'Action' property on the Start building block to "Close all windows".


A “selected” or “opened” effect can change how the icon looks when selected. For instance, the Chrome icon in the Windows task bar looks different when Chrome is open than when no browser instances are open. This can typically be solved by using the Image Collection feature (described later).

No browsers open:

At least one browser open: 


One situation that can occur for all types of applications is that an image is shown first in one place and is then moved to another. One example is some modern web sites where all resources are first loaded into the page and then are "boot-strapped" into position. Another example could be a dialog box in a desktop application that is shown and then centered on the screen.

In both cases, LEAPWORK can find the image in its first position and then continue the test flow. If the application then moves the image, the position LEAPWORK found is no longer correct, which will make the test case fail.

Usually checking the “Await no movement” property on the building block solves this problem. This will tell the image recognition engine to wait until the screen has not changed for a period of time before starting to search for the image.
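
The idea behind waiting for the screen to settle can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how LEAPWORK implements the property: the function `await_no_movement`, the `stable_count` parameter, and the use of strings as stand-ins for screenshots are all hypothetical.

```python
def await_no_movement(frames, stable_count=3):
    """Consume frames until `stable_count` consecutive identical frames are seen.

    `frames` is any iterable of comparable screenshots. Returns the stable
    frame, or None if the source ends before the screen settles.
    """
    previous = None
    run_length = 0
    for frame in frames:
        if frame == previous:
            run_length += 1
        else:
            # The screen changed, so restart the count with this frame.
            previous = frame
            run_length = 1
        if run_length >= stable_count:
            return frame
    return None


# Strings stand in for screenshots: the dialog appears, moves, then stays put.
captured = [
    "loading",
    "dialog at top-left",
    "dialog centered",
    "dialog centered",
    "dialog centered",
]
print(await_no_movement(captured))
```

Only after the frame has stayed the same for the required number of captures would the image search start, so the position it finds is the final one.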


For Image recognition, and especially for Text recognition, it is best practice and highly recommended to use "Areas". An "Area" is a subsection of the entire screen and is used to tell the Image/Text recognition engine to search for the captured image or a specific text/text pattern only in the specified area. Typically you will define an area at the part of the screen where you expect the image or text to appear - and then add some margin on top of this.

Specifying an area has two main purposes:

  • You ensure that you are looking for the right instance of the captured image/text. If the word appears multiple times on a screen you could get a list of the occurrences instead of the "right" one.
  • The speed of execution is considerably higher if the LEAPWORK Image/Text recognition engine only has to search a fraction of the screen instead of the whole screen.

More information about using and defining areas.
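
Both purposes can be illustrated with a small Python sketch of a search that is optionally restricted to an area. The function `find_all` and the `(top, left, height, width)` area format are assumptions made for this example and do not reflect LEAPWORK's internal data model.

```python
def find_all(screen, template, area=None):
    """Return every (row, col) where template occurs exactly in screen.

    `area` is (top, left, height, width) in screen coordinates; None means
    the whole screen is searched.
    """
    if area is None:
        area = (0, 0, len(screen), len(screen[0]))
    top, left, height, width = area
    templ_h, templ_w = len(template), len(template[0])
    hits = []
    # Only offsets where the template fits entirely inside the area are tried.
    for r in range(top, top + height - templ_h + 1):
        for c in range(left, left + width - templ_w + 1):
            if all(screen[r + i][c:c + templ_w] == template[i] for i in range(templ_h)):
                hits.append((r, c))
    return hits


# The same 1x2 pattern appears twice on this "screen".
screen = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
pattern = [[1, 1]]

print(find_all(screen, pattern))                      # whole screen: two hits
print(find_all(screen, pattern, area=(2, 3, 1, 3)))   # bottom-right area: one hit
```

Searching the whole screen returns both occurrences, while the area search returns only the intended one and examines far fewer candidate positions.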

Image collections

The Image Collection feature allows you to capture two or more images into a collection and then use the collection when searching for an image. This means you can e.g. capture the same button in different states (no focus, in focus, hovered, pressed etc.), add all the captured images into one collection and then just have the test case click or find the button regardless of the state of the button. This increases the robustness and the tolerance for changes in a flow.

In the example below we have captured the search button - "Go" - from a Windows desktop application.


The button can have 4 different looks depending on focus and hover effect:


All 4 states have been captured, and the images are now located as resources under the flow in the asset menu:


In the example above, the images have been renamed to make them easier to identify. Hovering over an image in the asset menu will pop up a thumbnail view of the image.

To create an Image Collection, click "New" + "Capture" + "Image collection". This will create a new, empty Image Collection in the asset menu. It is also possible to simply right-click the folder where the Image Collection should be located and select "Capture" + "Image Collection".

Image Collections can be identified by this logo in the asset menu:


Once the image collection is added, it is best practice to rename it to something meaningful, which makes it easier to maintain and reuse across multiple flows.

Adding images to an Image Collection is really simple: Just drag-n-drop images from anywhere in the asset menu on top of the image collection. To view the images in the collection double-click the collection to open the "Edit image collection" dialog.


In the dialog it is possible to edit and change the images individually if needed.


You can now use the collection in a building block by dragging the collection onto the image field in the building block:


When the Click Image building block is executed, it will search the screen for the images in the collection one by one. If it finds one of the images, it will click it, stop the search, and hand over the execution to the next building block in the flow.
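
The "one by one, stop at the first hit" behavior can be sketched in Python as follows. This is an illustration only: the helper names `find_exact` and `find_any`, the dictionary used as a collection, and the small integer images are assumptions for the example, not LEAPWORK's implementation.

```python
def find_exact(screen, template):
    """Return (row, col) of the first exact occurrence of template in screen, else None."""
    templ_h, templ_w = len(template), len(template[0])
    for r in range(len(screen) - templ_h + 1):
        for c in range(len(screen[0]) - templ_w + 1):
            if all(screen[r + i][c:c + templ_w] == template[i] for i in range(templ_h)):
                return (r, c)
    return None


def find_any(screen, collection):
    """Try each image in the collection in order and stop at the first match.

    Returns (image_name, (row, col)), or None if no image is found.
    """
    for name, image in collection.items():
        position = find_exact(screen, image)
        if position is not None:
            return name, position
    return None


# Hypothetical collection with three states of the same "Go" button.
go_button = {
    "go_normal": [[1, 2]],
    "go_hovered": [[3, 4]],
    "go_pressed": [[5, 6]],
}

# On this screen the button is currently shown in its hovered state.
screen = [
    [0, 0, 0, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
print(find_any(screen, go_button))
```

Because the search accepts any state in the collection, the flow does not need to know in advance which version of the button is on screen.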

The Image collection can also be used to handle different resolutions of the same icon/image if you know you will run the same test case in different resolutions. It can also be used to handle different states of icons.

The image resources are shared within a project, so the collections can be used in multiple test cases. This means you can create, for example, a "Chrome icon" collection that contains all relevant states of the Chrome icon in the Windows task bar, and then use this collection across all test cases that work with Chrome. This has the added benefit that you only have to maintain the image collection in one place instead of in every test case.

Remote design and execution

A typical setup of LEAPWORK consists of a number of workstations with LEAPWORK Studio installed, a Controller installed on a common/shared server to make sharing easy, and one or more machines used entirely to execute the test cases. When test cases using Image and Text recognition run, they interact with the actual screen, so they cannot run on a PC that someone is working on at the same time. This is the reason for the "remote machines".

To make your test cases independent of differences in screen resolution between the machines where they can be executed, you can define an Environment pointing to a "remote machine". You can then capture images on the "remote machine" instead of on your local workstation. This way you capture images directly on the machine where you will execute the test case, ensuring that the screen resolution is always the same.

To create a "remote machine" you need to install the LEAPWORK Agent on a dedicated workstation that is accessible from both LEAPWORK Studio and the LEAPWORK Controller. Once the remote machine is up and running you can define an Environment in Studio pointing to this machine. You can find more info here.

When the environment is created you can select it in the 'Preview environment' on the design canvas. In the example below, "Amazon Cloud Remote" is an environment pointing to a cloud hosted (Amazon) server where the LEAPWORK Agent is installed.

When the 'Preview Environment' points to a remote machine, a "terminal" window will pop up when you capture new images, allowing you to capture directly on the remote machine instead of on your local machine.


The building blocks using Image recognition have a property named Precision, which is visible when the building block is expanded. The property has two sub-properties:

Pixels: How accurately the pixels on the screen must match the captured image.

Color: The color density of the same set of pixels can change due to the hardware used. This property specifies the sensitivity to changes in the color density.

In this section you can set the accepted level of accuracy for the Image recognition. The default is "Picture Perfect", which means that there has to be a perfect match, pixel by pixel, before the captured image is considered found on the screen. In some cases a higher level of tolerance is needed. The advice is to start with "Picture Perfect" for both properties and then change them one level at a time until the image recognition works as intended.
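
The source does not describe how the two tolerances are computed, so the following Python sketch is only one plausible interpretation: `color_tolerance` is taken as the allowed difference per pixel value, and `pixel_tolerance` as the fraction of pixels that may fall outside it. The function name, both parameter names, and the grayscale values are assumptions for illustration.

```python
def matches_at(screen, template, row, col, pixel_tolerance=0.0, color_tolerance=0):
    """Check whether template matches screen at (row, col) within the tolerances.

    color_tolerance: largest allowed difference between two pixel values.
    pixel_tolerance: largest allowed fraction of pixels exceeding that difference.
    With both at zero the check is "picture perfect".
    """
    total = 0
    mismatched = 0
    for i, template_row in enumerate(template):
        for j, expected in enumerate(template_row):
            total += 1
            if abs(screen[row + i][col + j] - expected) > color_tolerance:
                mismatched += 1
    return mismatched / total <= pixel_tolerance


# Captured image (uniform gray) and what the execution machine actually renders.
captured = [[100, 100], [100, 100]]
rendered = [[100, 102], [98, 180]]

print(matches_at(rendered, captured, 0, 0))                       # picture perfect
print(matches_at(rendered, captured, 0, 0, color_tolerance=5))    # small shade drift allowed
print(matches_at(rendered, captured, 0, 0,
                 pixel_tolerance=0.25, color_tolerance=5))        # one bad pixel allowed too
```

Loosening one setting at a time, as recommended above, shows which kind of deviation is actually preventing the match.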


On the building blocks using OCR (text recognition) you can change the settings for the OCR engine to optimize how the characters are recognized.


You can choose between two different built-in OCR engines in the building block:

- "Default": This engine is based on Tesseract version 3.5, an open-source OCR engine that is very widely used.

- "Default (new)": This engine is based on Tesseract version 4.0, which uses a neural network architecture (LSTM) to improve recognition. Neural network architectures are widely regarded as the direction in which recognition software of all kinds (images, speech, video, text, etc.) is heading.

Both "Default" and "Default (new)" are working engines, but because they use different technologies, one engine might be a better fit for some applications, while the other engine is the better fit for others. If the OCR building blocks are not behaving as expected, one option is to switch to the other engine.

"ABBYY" is a leading commercial OCR engine, but selecting this option requires a separate license with ABBYY. Also be aware that ABBYY itself requires some infrastructure work to set up, so in most cases the built-in engines are the best option.

OCR Mode:
You can choose between the "Fast Speed" and "High Quality" OCR modes.

Fast Speed: The OCR engine performs two recognition passes in parallel: one using the normal color scheme (black text on white background) and one using inverted colors. This mode is faster than the "High Quality" setting, so if the characters are found correctly, simply keep this setting.

High Quality: The OCR engine performs four recognition passes in parallel: two using the normal color scheme and two using inverted colors. This setting is slower than "Fast Speed", but it can be needed if the OCR engine does not return the characters correctly with the "Fast Speed" setting.
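
To show why a pass with inverted colors helps, here is a toy Python sketch that runs a recognizer on both the normal and the inverted image in parallel and keeps the more confident result. The recognizer `toy_ocr` is a hypothetical stand-in that simply pretends it can only read dark text on a light background; it is not a real OCR engine, and nothing here reflects how LEAPWORK schedules its passes.

```python
from concurrent.futures import ThreadPoolExecutor


def invert(image):
    """Invert an 8-bit grayscale image given as a list of rows."""
    return [[255 - value for value in row] for row in image]


def toy_ocr(image):
    """Hypothetical stand-in for an OCR engine; returns (text, confidence).

    It pretends that only dark text on a light background is readable.
    """
    pixels = [value for row in image for value in row]
    if sum(pixels) / len(pixels) <= 127:  # mostly dark image: nothing readable
        return ("", 0)
    return ("GO", 90)


def recognize_both_polarities(image, ocr=toy_ocr):
    """Run the recognizer on the normal and inverted image in parallel and
    return the result with the higher confidence."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(ocr, [image, invert(image)]))
    return max(results, key=lambda result: result[1])


# Light text on a dark button: unreadable as-is for the toy recognizer.
light_on_dark = [
    [10, 10, 240, 10],
    [10, 240, 10, 10],
    [10, 10, 10, 10],
]
print(toy_ocr(light_on_dark))
print(recognize_both_polarities(light_on_dark))
```

The direct pass finds nothing, while the combined run succeeds through the inverted image, which is the situation the extra passes are meant to cover.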

OCR Precision:

OCR precision sets the accuracy of the OCR results on a character level. This means that a higher OCR precision level requires the OCR engine to have a higher confidence before a given character is matched.

With a high precision you can be very confident that the characters found are the correct characters.

On the other hand, a high precision can mean that some characters are not found. Setting a lower precision means that, in general, more characters are found, but the assurance that they are the right characters is lower than with a high precision. The right setting is therefore a balance between finding all the right characters and not including so much noise that it ruins the result, and it will depend on the font, colors, background, and size of the text.

Precision Levels are:

  1. High: The highest predefined confidence factor. Use it when you are sure that the characters are large and clear enough (not hazy or compacted) to be recognized by the OCR engine. The predefined value is 70.
  2. Medium: The medium confidence factor. Use it when you think the characters may or may not be recognized by the OCR engine. It tells the engine to look for possible characters in the defined area. The predefined value is 50.
  3. Low: The low confidence factor. Use it when you are less sure that the characters can be recognized by the OCR engine. It tells the engine to search the defined area for possible characters both in and outside the dictionary, at a low precision. The predefined value is 30.
  4. Very Low: The lowest predefined confidence factor. Use it when you are least sure that the characters can be recognized by the OCR engine. It tells the engine to search the defined area for nearly all possible characters both in and outside the dictionary, at the lowest precision. The predefined value is 20.
  5. Custom: Sets a custom precision value (confidence factor) in the range 0-100.
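
The precision levels above can be read as confidence thresholds. The Python sketch below uses the predefined values from the list; everything else is an assumption for illustration, including the function names, the made-up per-character confidences, and the choice to keep characters whose confidence is greater than or equal to the threshold.

```python
# Predefined values as listed above.
PRECISION_LEVELS = {"High": 70, "Medium": 50, "Low": 30, "Very Low": 20}


def threshold_for(precision):
    """Translate a named level or a custom 0-100 value into a threshold."""
    if isinstance(precision, str):
        return PRECISION_LEVELS[precision]
    if not 0 <= precision <= 100:
        raise ValueError("custom precision must be between 0 and 100")
    return precision


def read_text(candidates, precision):
    """Keep only the characters whose confidence reaches the threshold."""
    threshold = threshold_for(precision)
    return "".join(char for char, confidence in candidates if confidence >= threshold)


# Made-up OCR output for the word "Total" followed by a speck of noise.
candidates = [("T", 95), ("o", 45), ("t", 72), ("a", 64), ("l", 35), ("|", 12)]

for level in ["High", "Medium", "Low", "Very Low", 0]:
    print(level, "->", read_text(candidates, level))
```

In this made-up example the higher levels drop real characters, "Low" returns the full word, and a custom value of 0 also lets the noise character through, which is the balance described above.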

With a Custom value, 0 returns everything that the OCR engine recognized, while 100 returns only the results recognized with the highest possible confidence.

If the built-in OCR engines in LEAPWORK do not meet your requirements, it is possible to change the engine to ABBYY. ABBYY is a world-leading vendor of OCR engines and is second to none when it comes to OCR. Contact our Customer Success team to get started with ABBYY.