Demystifying AI: What does Artificial Intelligence mean for Test Automation?

AI is currently a hot topic in the UI test automation community, but what does it really mean for testing and what are its goals or outcomes? In this blog Richard Clark, CTO of Provar, shares some reflections on what we mean by AI, how tools are (or aren’t!) making use of this technology and how we’re incorporating its principles here at Provar.

Why is AI becoming so popular in test automation?

Within the UI test automation community there has been a big drive in the last 18 months towards self-healing and AI-based testing. There may be various reasons that tools have taken this path, including:

  1. Their default locators are not robust, so test cases break when UI changes are made by developers, admins or vendor application releases/patches.
  2. Their locators bind to ID attributes that are auto-generated by the application under test and vary across environments (e.g. Salesforce.com object IDs and custom field IDs) and even across changes in the same environment (e.g. Salesforce Visualforce j_id values, Aura IDs, Lightning page refreshes); see the sketch after this list.
  3. Their locator strategy actually depends on multiple locators, which slows down test execution as the tool trips through a hierarchy of ways to locate a screen element.
  4. Their locators don’t understand different page layouts, user experience themes, field level security, picklist values or record variants.
  5. There is more interest from investors in AI tools than Test Automation tools.
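To make points 1 and 2 concrete, here’s a minimal Selenium WebDriver sketch in Python contrasting a brittle auto-generated-ID locator with a label-based one. The URL, IDs and labels are hypothetical, and this isn’t any particular vendor’s code:

```python
# Minimal sketch of why auto-generated IDs make locators brittle.
# The org URL, element IDs and field label below are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.my.salesforce.com")  # hypothetical org URL

# Brittle: binds to an auto-generated ID that changes between
# environments and even between page refreshes (think Visualforce
# j_id values or Aura component IDs).
amount = driver.find_element(By.ID, "j_id0:j_id42:amountField")

# More robust: anchor on something meaningful to a human, such as
# the field label, which survives ID regeneration.
amount = driver.find_element(
    By.XPATH, "//label[text()='Amount']/following::input[1]"
)
```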

So what do these other test tools actually do, and what specific “AI” do they have?

The tools generally apply one or more of the following:

  • Recording and playback to capture the clicks and data entry from a human user, allowing that data to be re-used for multiple test scenarios. This initial recording is often brittle and creates a false impression that test cases are written more quickly as a result.
  • Creating several locators for each screen element, with an algorithm/score for what constitutes a match (sketched after this list). This means that when one locator breaks, your test doesn’t necessarily fail. The issue with this approach is that your locators become less robust over time as changes accumulate, and you won’t know about the problem until the score drops below the algorithm’s threshold. In most cases, when it does break you have to fix every single locator one by one.
  • As above, but with added ‘smarts’ to update all locators in one click so that the new preferred default locator is tried first.
  • Caching web pages to identify a delta where there has been a change, then finding the references and attempting to apply rules to relocate the field.
  • Utilizing machine learning to update their locator scoring algorithm using data captured from all their clients (yes, your test tool vendor may be capturing data about your application and test cases). This applies a single algorithm regardless of the application or user experience being tested: a lowest common denominator.
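Here’s a rough sketch of that multi-locator scoring approach, purely illustrative and with made-up locators and weights, to show how locators can rot silently until the combined score finally dips below the threshold:

```python
# Illustrative sketch of weighted multi-locator matching: the element
# "matches" while the locators that still resolve clear a threshold,
# so individual locators can break silently for a long time.
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# Hypothetical weighted locators for one screen element.
CANDIDATES = [
    (By.ID, "accountName", 0.5),
    (By.NAME, "acc_name", 0.3),
    (By.XPATH, "//label[text()='Account Name']/following::input[1]", 0.2),
]
THRESHOLD = 0.5

def find_with_score(driver):
    element, score = None, 0.0
    for by, value, weight in CANDIDATES:
        try:
            found = driver.find_element(by, value)
            score += weight
            element = element or found  # keep the first match
        except NoSuchElementException:
            pass  # this locator has silently 'broken'
    if element is None or score < THRESHOLD:
        raise NoSuchElementException(f"score {score} below {THRESHOLD}")
    return element
```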

Image credit: CognitiveScale.com

What all the above have in common is that yes, some of them are self-healing, but there is no AI component to this, and where there is machine learning it’s not specific to the application you wish to test. These are just programmatic rules, not true self-learning features that adjust their algorithms or optimise test cases automatically in a proactive way. To use a popular AI example: if you teach an AI image classification system what an apple is, can it then recognise that a picture of an orange is definitely a fruit, rather than merely “probably not an apple”?

A further issue: what if the rules engine or probability algorithm results in tests passing when they should be failing? What if I do want my test to fail when the label changes or the wrong CSS class has been used?

So what is Artificial Intelligence (AI)?

We need to be very careful when using the term AI. Some companies seem to apply it liberally, possibly in an attempt to dumb down their features for their audience or, more cynically, to improve their company valuation/share price.

Let’s take Google’s definition first:

“noun:

  1. the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”

Hmm. Well, on that basis every single UI test automation tool already has AI, undertaking visual perception normally requiring a human! Equally, that makes Google Translate an AI too. I’m sceptical; let’s try another well-used definition:

To paraphrase Wikipedia:
“…Computer science defines AI research as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. More in detail, Kaplan and Haenlein define AI as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation…”

OK, so this is more like it. When I studied AI at university in the 1980s it was mainly about natural language processing, understanding both syntax and semantics. Understanding the rules of English will help a computer interpret a well-written text, but will leave it floundering on a typical Facebook comment or Tweet.

If a test automation tool is going to claim to have AI, then for me personally it would have to be a tool that learns automatically: one that doesn’t have to be taught the rules but works them out for itself, and that is continually improving, rewriting its own rules and classifications.

Image credit: Capability Cafe, “Impact of Artificial Intelligence/Machine Learning on Workforce Capability”

In the case of Salesforce Einstein’s (and Google’s) image recognition, the system does this based on thousands of examples: the more examples it sees, the better it gets at recognising the specific objects that have already been categorised by a human, so that when it encounters a new image it can give a percentage prediction for its recognition. The human brain does the same thing, with five possible results: It Is, It Might Be, I Don’t Know, I Don’t Think It Is and It Isn’t. We don’t always get those right of course, especially when it comes to UFOs, and likewise AI image and speech recognition don’t get it right every time either. Amazon Echo is pretty good, but boy is it frustrating when you have to change what you say repeatedly to get it to work. To carry on that theme, when I bought my first Echo I was unable to get it to play “Songs by Sade”, but I could get it to play “Smooth Operator by Sade”. Now I can do both; it’s either learned or been patched.
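As a toy illustration of that bucketing (my own arbitrary thresholds, not Einstein’s or Google’s logic), mapping a classifier’s percentage prediction onto those five human-style verdicts might look like this:

```python
# Toy mapping from a classifier's confidence to the five verdicts
# described above. The thresholds are arbitrary, for illustration only.
def verdict(confidence: float) -> str:
    if confidence >= 0.95:
        return "It Is"
    if confidence >= 0.70:
        return "It Might Be"
    if confidence >= 0.30:
        return "I Don't Know"
    if confidence >= 0.05:
        return "I Don't Think It Is"
    return "It Isn't"

print(verdict(0.98))  # It Is
print(verdict(0.50))  # I Don't Know
```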

Where does all this sit on Provar’s roadmap?

Provar uses metadata to create robust test cases in the first place, so that many of the strategies used by other tools are not required. It also means that tests are black and white: there is no ‘probability’ of having the right locator; it is the right locator. For the majority of our UI locators we make any required changes when Salesforce rolls out its three seasonal releases each year, so that you don’t have to. Even better, we do this within the pre-release window whenever possible, so that you have the maximum time to run your own regression tests during the Sandbox Preview. If you’ve used one of our recommended locators based on Salesforce metadata, this means your tests keep running and stay robust. We also make it easy to create test data on the fly so you don’t rely on pre-existing values in an org. You don’t need a self-healing tool if your tests are robust already.
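To illustrate the idea (with hypothetical helper names; this is not Provar’s implementation), a metadata-driven locator can resolve a field’s label from the Salesforce describe metadata, for example via the open-source simple_salesforce library, and anchor on that label rather than on any auto-generated DOM ID:

```python
# Sketch of a metadata-driven locator: look up the field's label from
# the Salesforce describe metadata, then locate by label rather than
# by any auto-generated DOM ID. Helper names are hypothetical.
from selenium.webdriver.common.by import By
from simple_salesforce import Salesforce  # open-source REST client

def label_for_field(sf: Salesforce, sobject: str, api_name: str) -> str:
    """Return the human-readable label for a field from metadata."""
    describe = getattr(sf, sobject).describe()
    for field in describe["fields"]:
        if field["name"] == api_name:
            return field["label"]
    raise KeyError(api_name)

def find_field(driver, sf, sobject, api_name):
    label = label_for_field(sf, sobject, api_name)
    # The label is stable even when DOM IDs are regenerated.
    # (Assumes the label contains no apostrophes, for simplicity.)
    return driver.find_element(
        By.XPATH, f"//label[text()={label!r}]/following::input[1]"
    )
```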


In some cases it’s not possible to use a metadata locator, for example if your developers have used a different UI framework or custom HTML, or when testing non-Salesforce CRM web pages.

So for those edge cases we’ve already been working on enhancing our existing XPath, Label, CSS, Block and Table locators for generic pages. This is designed both to improve our recommendations of robust locators and to let Provar learn about your test strategy, so that we can default our recommended locators based on your previous selections. We even remember the strategies you teach Provar, so you can re-apply them across your applications. Unlike other tools, we let you keep these strategies separate, as the strategies for testing Salesforce aren’t necessarily the best strategies for SAP, Workday or Dynamics 365.
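Conceptually, remembering a preferred strategy per application can be as simple as the following sketch (hypothetical, and much simplified compared to anything in Provar):

```python
# Sketch of remembering which locator strategy a user prefers,
# kept separate per application so Salesforce habits don't leak
# into SAP or Workday testing. Purely illustrative.
from collections import Counter, defaultdict

class StrategyMemory:
    """Counts the strategies a user picks, partitioned by application."""
    def __init__(self):
        self._choices = defaultdict(Counter)

    def record(self, app: str, strategy: str) -> None:
        self._choices[app][strategy] += 1

    def preferred(self, app: str, default: str = "Label") -> str:
        counts = self._choices[app]
        return counts.most_common(1)[0][0] if counts else default

memory = StrategyMemory()
memory.record("Salesforce", "Label")
memory.record("Salesforce", "Label")
memory.record("SAP", "CSS")
print(memory.preferred("Salesforce"))  # Label
print(memory.preferred("SAP"))         # CSS
```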


We’ve also been investigating collaborations with other Salesforce vendor tools to better help you understand test coverage against your specific Salesforce implementation/customisations and to generate test steps automatically (generating test steps is quite easy for us to do; generating good ones is hard). Some of these features are still at least six months away, but watch this space.

Finally, we’re keeping a close eye on concepts such as Mutation Testing, which could lead to a whole paradigm shift in the way tests are authored and how they can evolve automatically to provide a level of exploratory testing. This may seem like science fiction, but so were self-driving cars and having a conversation with a computer just 10 years ago.
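For anyone unfamiliar with the concept, here’s a minimal mutation testing sketch (illustrative only, and nothing to do with Provar’s roadmap implementation): deliberately inject a fault into the code under test and check whether the existing tests ‘kill’ the mutant:

```python
# Minimal mutation-testing sketch: mutate an arithmetic operator in
# the code under test and check whether the test suite notices.
import ast

SOURCE = """
def apply_discount(price, discount):
    return price - discount
"""

def run_tests(namespace):
    """A stand-in test suite: returns True if all assertions pass."""
    try:
        assert namespace["apply_discount"](100, 10) == 90
        return True
    except AssertionError:
        return False

class SubToAdd(ast.NodeTransformer):
    """Mutate subtractions into additions."""
    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node

# Baseline: the unmutated code must pass its tests.
original = {}
exec(SOURCE, original)
assert run_tests(original), "tests must pass before mutating"

# Apply the mutation and re-run the tests.
mutated_tree = SubToAdd().visit(ast.parse(SOURCE))
ast.fix_missing_locations(mutated_tree)
mutant = {}
exec(compile(mutated_tree, "<mutant>", "exec"), mutant)

if run_tests(mutant):
    print("Mutant survived: the tests did not detect the change")
else:
    print("Mutant killed: the tests caught the injected fault")
```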

We don’t consider this to be Artificial Intelligence; we just call it common sense to enhance our users’ experience. But if you’re comparing different tools, then fine, call it AI if you insist.

So are there any real AI Test Tools out there?

Arguably yes: some frameworks are beginning to show elements of real self-learning and predictive testing, frameworks that can generate tests for you or identify which changes are breaking versus merely cosmetic. I use the term ‘frameworks’ deliberately: if you have coded test automation and you’re happy to carry the overhead of maintaining that code base, then there is some very cool tech available that does include a level of machine learning.

Image credit: ca.com, “Artificial intelligence and continuous testing: It’s the next big thing (really)”

When it comes to no-code solutions such as Provar, I’ve yet to see anything more than a rules engine you have to ‘program’ or a generic locator algorithm. Certainly I’ve seen no other tool that can handle the changeover between user experiences such as Salesforce Classic and Lightning with test cases that run seamlessly on both UIs, and that keep running across multiple releases without changes to your test cases.

At Provar we have always made it our mission to help Salesforce customers create and maintain robust test cases for their UI and API tests. We accomplish this in the following ways:

  • Leveraging the Salesforce metadata to automatically identify screen elements and field values.
  • Automatically recognising the page you want to test and building all the required navigation steps for you.
  • Leveraging the Salesforce API to interact with the underlying data model directly (see the sketch after this list).
  • Suggesting alternate locators and allowing you to override our recommendations.
  • Providing the ability to build test cases that are reusable across different Salesforce UIs such as Classic, Lightning, Mobile and Communities.
  • Continuously improving to keep in sync with the latest Salesforce releases, Selenium WebDriver versions and multiple browser vendors.
  • Providing built-in Provar APIs and a custom API to extend capabilities beyond Salesforce and perform meaningful end-to-end test cases.
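As an example of the API-driven test data point above, here’s a minimal sketch using the open-source simple_salesforce library with placeholder credentials; it shows the general idea, not Provar’s own implementation:

```python
# Sketch of driving the Salesforce API directly to create test data
# on the fly, so a test never depends on pre-existing org data.
# Credentials below are placeholders.
from simple_salesforce import Salesforce

sf = Salesforce(
    username="user@example.com",
    password="password",
    security_token="token",
)

# Create a record for the test to consume, then clean it up afterwards.
result = sf.Account.create({"Name": "Provar Test Account"})
account_id = result["id"]
try:
    record = sf.Account.get(account_id)
    assert record["Name"] == "Provar Test Account"
finally:
    sf.Account.delete(account_id)
```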

What tools like Provar have delivered, and continue to deliver, is Intelligent Test Automation: robust test scripts integrated into an enterprise’s Continuous Integration and Continuous Delivery processes. It’s real, it has rapid ROI, and it’s relatively easy to achieve. It’s intelligent, but there’s nothing artificial about it!