10.8.19

Selenium WebDriver Type Hierarchy

Ever wondered how WebDriver is actually implemented, why we create a ChromeDriver but declare it as WebDriver, or what RemoteWebDriver is? Let's find out.

  • WebDriver is an Interface.
  • JavascriptExecutor is an Interface.
  • RemoteWebDriver is the parent class that implements the WebDriver and JavascriptExecutor interfaces.
  • ChromeDriver, FirefoxDriver and the other browser drivers are the child classes that extend the parent RemoteWebDriver class.

An interface declares which methods exist but, by definition, carries no implementation for them, just the method signatures. It's the responsibility of the implementing class to 'implement' those methods by supplying what they actually do.
Since WebDriver and JavascriptExecutor are interfaces, they only have abstract (empty) methods; the 'fully-implemented' class RemoteWebDriver provides the definitions. All the abstract methods in the WebDriver and JavascriptExecutor interfaces are implemented in the RemoteWebDriver class.
Browser-specific drivers like ChromeDriver and FirefoxDriver then extend RemoteWebDriver to add methods of their own, or to override the inherited methods with their own implementations.
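
The relationship described above can be sketched in plain Java. This is a simplified model and not Selenium's source code: SimpleDriver, SimpleScriptExecutor, SimpleRemoteDriver, SimpleChromeDriver and SimpleFirefoxDriver are hypothetical stand-ins for WebDriver, JavascriptExecutor, RemoteWebDriver, ChromeDriver and FirefoxDriver, and the method bodies only return strings.

```java
// Hypothetical stand-in for the WebDriver interface: declarations only.
interface SimpleDriver {
    String get(String url);
    String click(String elementId);
}

// Hypothetical stand-in for the JavascriptExecutor interface.
interface SimpleScriptExecutor {
    Object executeScript(String script);
}

// Stand-in for RemoteWebDriver: the class that supplies the method bodies.
class SimpleRemoteDriver implements SimpleDriver, SimpleScriptExecutor {
    @Override
    public String get(String url) {
        return "remote: open " + url;
    }

    @Override
    public String click(String elementId) {
        return "remote: click " + elementId;
    }

    @Override
    public Object executeScript(String script) {
        return "remote: run " + script;
    }
}

// Stand-in for ChromeDriver: inherits everything and overrides one method.
class SimpleChromeDriver extends SimpleRemoteDriver {
    @Override
    public String click(String elementId) {
        return "chrome: click " + elementId;
    }
}

// Stand-in for FirefoxDriver: inherits everything unchanged.
class SimpleFirefoxDriver extends SimpleRemoteDriver {
}

class HierarchyDemo {
    public static void main(String[] args) {
        SimpleDriver chrome = new SimpleChromeDriver();
        SimpleDriver firefox = new SimpleFirefoxDriver();
        System.out.println(chrome.click("login"));   // overridden in the subclass
        System.out.println(firefox.click("login"));  // inherited from the parent
        System.out.println(chrome.get("https://example.com"));
    }
}
```

Both variables are declared with the interface type, yet each call runs the body supplied by the object's actual class.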

But why this hierarchy?  The developers of Selenium don't know how all the different browsers work internally. So they declared the methods they considered essential for browser automation and left the implementation of those methods to the developers of the browsers.
The underlying problem is that browsers are complicated software and not all of their internals are open-source or visible to external developers, so the Selenium team cannot write and maintain all the browser-specific code themselves.
For instance, the actual implementation of the 'click' command could be different for Chrome and Firefox; hence, each has its own driver (which is why we can't use the Firefox driver with Chrome).
Also, in a way this puts the onus on the browser vendors to provide and maintain their drivers in order to stay relevant and be widely adopted.

So can we do this?  WebDriver driver = new WebDriver();
We get a compile-time error: Cannot instantiate the type WebDriver. Why? Because we cannot instantiate an interface, i.e., we cannot create an object of an interface (WebDriver) and invoke its methods.
Since WebDriver is an interface and not a class, and all its methods are just empty shells (abstract), an object of it would have nothing to execute; this is why Java does not allow creating one.
Thus, if we want to perform any action, we have to instantiate a class that implements the interface.


So should we do this?  WebDriver driver = new RemoteWebDriver();
We get a compile-time error: The constructor RemoteWebDriver() is not visible. This means the no-argument constructor is not accessible from our code (it is not public), so we cannot call it directly.
Technically, though, we can use one of the public constructors -
WebDriver driver = new RemoteWebDriver(capabilities);
Or
WebDriver driver = new RemoteWebDriver(url, DesiredCapabilities.chrome());
Or
WebDriver driver = new RemoteWebDriver(commandExecutor, capabilities);
We don't normally use these because RemoteWebDriver is intended for running against a remote Selenium server (for example, with Selenium Grid), whereas ChromeDriver() drives the local installation of the Chrome browser on our machine.

What about this?  ChromeDriver driver = new ChromeDriver();
Since ChromeDriver is a class that provides implementations for all the methods of the WebDriver interface, this works. But the 'driver' variable is now tied to the ChromeDriver type, so our scripts are restricted to running on the Chrome browser.
To work with other browsers we would have to create separate objects, e.g. FirefoxDriver driver = new FirefoxDriver();
And the code that uses the driver would have to change every time we switch browsers.

This is the reason we use this:  WebDriver driver = new ChromeDriver();
So that we can work with different browsers without having to update our code for every browser-specific driver. This makes our code more extensible by giving us the flexibility to work with any number of browsers (drivers).
Also, this is better design, as a change in driver initialization for one browser will not affect the others, and we can have different configurations for different browsers.
Here, WebDriver is the interface, ChromeDriver() is the constructor, new is the keyword, and [new ChromeDriver()] creates the object referenced by the 'driver' variable.
'Java'-specific reason - WebDriver is the super-interface of all browser driver classes like FirefoxDriver, ChromeDriver etc., so a WebDriver variable can hold an object of any driver class. This is called upcasting: a reference of the super-type [parent] points to an object of its sub-type [child].
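
A minimal sketch of why the interface-typed variable pays off: the browser can become a runtime parameter. The names Browser, ChromeBrowser, FirefoxBrowser and BrowserFactory are hypothetical stand-ins (they are not Selenium classes); in a real framework the factory would return WebDriver and create ChromeDriver or FirefoxDriver objects.

```java
import java.util.Locale;

// Hypothetical stand-in for the WebDriver interface.
interface Browser {
    String name();
}

class ChromeBrowser implements Browser {
    @Override
    public String name() {
        return "chrome";
    }
}

class FirefoxBrowser implements Browser {
    @Override
    public String name() {
        return "firefox";
    }
}

class BrowserFactory {
    // The return type is the interface, so callers never depend on a concrete class.
    static Browser create(String browserName) {
        switch (browserName.toLowerCase(Locale.ROOT)) {
            case "chrome":
                return new ChromeBrowser();   // upcast to Browser on return
            case "firefox":
                return new FirefoxBrowser();  // upcast to Browser on return
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browserName);
        }
    }
}

class UpcastDemo {
    public static void main(String[] args) {
        // The same calling code works for whichever browser is requested.
        String requested = args.length > 0 ? args[0] : "chrome";
        Browser driver = BrowserFactory.create(requested);
        System.out.println("Running against " + driver.name());
    }
}
```

Adding another browser then means adding one class and one case in the factory; the code that uses the Browser reference stays untouched.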

But can we do vice versa? - ChromeDriver driver = new WebDriver();
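We get a compile-time error: Cannot convert from WebDriver to ChromeDriver (and, as before, WebDriver cannot be instantiated at all). A parent-type reference is not automatically a child type; going in that direction needs an explicit downcast.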

But then why do we have to do this? - JavascriptExecutor js = (JavascriptExecutor) driver;
WebDriver and JavascriptExecutor are two different interfaces, and they have no methods in common. The 2 methods of JSE (executeScript and executeAsyncScript) are not present in the WebDriver interface.
But all the methods of both the WebDriver and JSE interfaces are implemented by the browser drivers.
Because we up-cast the 'driver' object to WebDriver, and WebDriver does not declare the JSE methods, we have to down-cast to reach them.
We wouldn't have had to down-cast had we just used [ChromeDriver driver = new ChromeDriver();]. In that case 'driver' has visibility of all the JSE methods, because ChromeDriver extends RemoteWebDriver and so inherits its implementation of JSE.

In fact, we can even cast it to ChromeDriver instead of JavascriptExecutor, like below -
JavascriptExecutor js = (ChromeDriver) driver; // This works too!!
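
The down-cast only succeeds when the object behind the reference really implements the second interface. Below is a small sketch of that rule, using hypothetical stand-in types (PageDriver for WebDriver, ScriptRunner for JavascriptExecutor); PlainDriver is an invented class that deliberately implements only one of the two.

```java
// Hypothetical stand-in for the WebDriver interface.
interface PageDriver {
    String title();
}

// Hypothetical stand-in for the JavascriptExecutor interface.
interface ScriptRunner {
    Object run(String script);
}

// Implements both interfaces, the way the browser drivers do.
class ScriptCapableDriver implements PageDriver, ScriptRunner {
    @Override
    public String title() {
        return "demo page";
    }

    @Override
    public Object run(String script) {
        return "ran: " + script;
    }
}

// Implements only the driver interface, so it cannot be cast to ScriptRunner.
class PlainDriver implements PageDriver {
    @Override
    public String title() {
        return "plain page";
    }
}

class DowncastDemo {
    // Guarding the cast with instanceof avoids a ClassCastException at runtime.
    static Object runIfSupported(PageDriver driver, String script) {
        if (driver instanceof ScriptRunner) {
            return ((ScriptRunner) driver).run(script);
        }
        return null;
    }

    public static void main(String[] args) {
        PageDriver driver = new ScriptCapableDriver(); // upcast: only title() is visible
        // driver.run("return 1");                     // would not compile
        ScriptRunner js = (ScriptRunner) driver;       // explicit downcast
        System.out.println(js.run("return 1"));
        System.out.println(runIfSupported(new PlainDriver(), "return 1"));
    }
}
```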


Additional notes -

  • SearchContext is the topmost interface and has only two methods, named findElement() and findElements(). These methods are abstract as SearchContext is an interface. This is the reason we do not up-cast to SearchContext: there is no point in having just 2 methods to work with and having to downcast every time we want to use a third one.
  • WebDriver is also an interface, which extends SearchContext, but since WebDriver has the maximum number of methods, it is the key interface against which tests should be written. Over Selenium's history there have been many implementing classes for the WebDriver interface, including:
    • AndroidDriver
    • AndroidWebDriver
    • ChromeDriver
    • FirefoxDriver
    • HtmlUnitDriver
    • InternetExplorerDriver
    • IPhoneDriver
    • IPhoneSimulatorDriver
    • SafariDriver
  • WebDriver has many abstract methods like get(String url), close(), quit(), getWindowHandle() etc. WebDriver also has nested interfaces named Window, Navigation, Timeouts etc. that are used to perform specific actions like getPosition(), back(), forward() etc.
  • RemoteWebDriver is the fully implemented class for the WebDriver, JavascriptExecutor and TakesScreenshot interfaces. (Fully implemented class means it defines the body for all inherited abstract methods.)
  • Then we have browser-specific driver classes like ChromeDriver, EdgeDriver, FirefoxDriver etc. which extend RemoteWebDriver.
  • RemoteWebDriver implements JavascriptExecutor and provides the definition for both methods of the JSE. Since all browser-specific driver classes like ChromeDriver extend RemoteWebDriver, we can execute JavaScript commands via the JSE methods on these different browsers.


3.8.19

Difference between WebDriver and JavaScript Clicks

We can click on a webelement in 2 ways:
Using WebDriver click – element.click()
Using JavaScript click – ((JavascriptExecutor)driver).executeScript("arguments[0].click()", element); 

When we click on a webelement using WebDriver, it checks for 2 conditions before clicking - The element must be visible; and it must have a height and width greater than 0.
If preconditions are not satisfied, the click is not performed and we get an exception.

But the JavaScript click method does not check these preconditions before clicking. The HTMLElement.click() method simulates a mouse click on an element. When click() is used with supported elements (such as an <input>), it fires the element’s click event.
So, JavaScript can click on a webelement which may not be actually visible to the user or be clickable by WebDriver API. 
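
The difference between the two click paths can be modelled in a few lines of plain Java. This is only an illustration of the precondition check described above: FakeElement and ClickModes are hypothetical names, and IllegalStateException stands in for whatever exception WebDriver would actually throw.

```java
// Hypothetical stand-in for a page element; not Selenium's WebElement.
class FakeElement {
    final boolean visible;
    final int width;
    final int height;
    int clickCount = 0;

    FakeElement(boolean visible, int width, int height) {
        this.visible = visible;
        this.width = width;
        this.height = height;
    }
}

class ClickModes {
    // Models a WebDriver-style click: refuse unless a user could really click it.
    static void userStyleClick(FakeElement element) {
        if (!element.visible || element.width <= 0 || element.height <= 0) {
            throw new IllegalStateException("element is not interactable");
        }
        element.clickCount++;
    }

    // Models a JavaScript click: no precondition checks, the click just fires.
    static void scriptStyleClick(FakeElement element) {
        element.clickCount++;
    }

    public static void main(String[] args) {
        FakeElement hidden = new FakeElement(false, 120, 40);
        try {
            userStyleClick(hidden);
        } catch (IllegalStateException e) {
            System.out.println("user-style click rejected: " + e.getMessage());
        }
        scriptStyleClick(hidden);
        System.out.println("clicks registered on hidden element: " + hidden.clickCount);
    }
}
```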

But we know - "Selenium-WebDriver makes direct calls to the browser using each browser's native support for automation." Meaning...WebDriver tries to mimic actual user behavior when working on browsers.

From Selenium 3 onwards, WebDriver APIs are designed to simulate actions the way a real user would perform them on the GUI via the browser, rather than using wrapped JS calls to execute commands in the browser as Selenium RC did.

All browsers now have their own drivers which implement the WebDriver API and Selenium communicates with these drivers via HTTP and these drivers communicate natively with the browser. So we can say that the ChromeDriver performs actions similar to a user using the chrome browser.

JavaScript bypasses this and interacts with the DOM of the page directly. This is not how a real user would use a browser.
This is also similar to the problem we had with v1 of Selenium, which used JavaScript to communicate directly with the browser [because Selenium RC was just a form of wrapped JavaScript calls].

Also, JS methods may not trigger events that would have been triggered had we used WebDriver. For example, HTMLElement.click() dispatches only the click event, so handlers attached to mousedown, mouseup or hover events will not run when a button is clicked via JS.

Hence, in order to simulate actual user behavior we should go for WebDriver, and use JS sparingly and only when the direct methods of WebDriver don't work.


This was precisely the problem with SeleniumRC.

Selenium RC had 2 components - core and server. The core is basically a bunch of JavaScript code that is injected into the browser to control/automate its behavior. Using JavaScript to control browsers caused issues, especially with IE, as it has a different implementation/behavior with JavaScript. In effect, Selenium sent Selenese commands over to Selenium Core via JS injection, which in turn controlled the browser.

Also, there was a problem with the same-origin policy of browsers. To overcome it, the server component was used so that all the JavaScript code injection was routed via the server, so as to appear that it was originating from the same host as the application. This caused problems when there were popups, file uploads, etc., and it was relatively slow to run.
[To get around the same-origin policy, the proxy injection method is used: in proxy injection mode the Selenium Server acts as a client-configured HTTP proxy, which sits between the browser and the application under test and masks the AUT under a fictional URL]
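
For context, two URLs share an origin only when scheme, host and port all match. A minimal sketch of that comparison using java.net.URI follows; it assumes plain http/https URLs, and the class name, the example hosts and the port 4444 are illustrative values rather than anything taken from a real setup.

```java
import java.net.URI;

class SameOriginCheck {
    // Falls back to the scheme's default port when the URL does not state one.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // Same origin = same scheme + same host + same port (http/https URLs only).
    static boolean sameOrigin(String first, String second) {
        URI a = URI.create(first);
        URI b = URI.create(second);
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && effectivePort(a) == effectivePort(b);
    }

    public static void main(String[] args) {
        // Script served by a local server vs. the application under test.
        System.out.println(sameOrigin("http://localhost:4444/core.js", "http://example.com/app"));
        // Same host, with the default port written out on one side only.
        System.out.println(sameOrigin("http://example.com/a", "http://example.com:80/b"));
    }
}
```

A script loaded from one origin is not allowed to drive a page from another, which is why the server had to make both appear to come from the same host.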

Also, there were many overlapping and confusing methods implemented which made it difficult to use.

WebDriver is a cleaner and object oriented implementation, and it controls the browsers using their native methods for browser automation, and does not rely on JavaScript injection. It works at the OS/Browser level, and does not have to use JS to control the browser.

Also SeleniumRC did not support headless testing; there was no HtmlUnitDriver.

2.8.19

NgWebDriver - Alternative to Protractor for AngularJS

AngularJS is an open-source framework for building web applications using JavaScript. Over the years it has become quite popular, and many web apps built these days have elements of AngularJS.

What this means under the hood is that AngularJS can result in some pretty complex DOM for the web pages, with its own custom directives (attributes) like ng-app, ng-model, ng-bind, ng-repeat etc.

To help with testing, a new set of purpose-built tools targeting AngularJS have cropped up, and Protractor is one of the leaders in this field, and has gained good traction in the QA community.
Protractor is an end-to-end framework built using the JavaScript bindings of Selenium and its own locator methods to work with the AngularJS tags.

But the number of teams using Java + Selenium far outnumbers those using JavaScript + Selenium, and as such it's difficult for them to incorporate Protractor into their own Java frameworks.
This is especially true for teams that already have a mature framework which supports multiple technologies, not just Web. And, as is usually the case, UI-based testing is not the only goal of these teams. Hence, it makes little sense for them to use Protractor for testing.
And AngularJS is just one of the many ways we build apps - it's just a small feature in the bigger scope of things.
To test those apps, what we really need is a library that can be incorporated easily into the existing framework - not a new tool altogether.

NgWebDriver is one such Java library.

It has many useful methods to work with AngularJS locators, just as in Protractor (in fact, it internally reuses JavaScript from Protractor to work with AngularJS).

With NgWebDriver we don't need to depend on the Protractor tool at all, because we can just add this library to our existing framework and use it as and when we need to.
This removes the need to change our framework just for AngularJS.

Maven dependency to add NgWebDriver library -

<dependency>
<groupId>com.paulhammant</groupId>
<artifactId>ngwebdriver</artifactId>
<version>1.1.4</version>
</dependency>

Sample code shows how to use it.

package main.java.com.automation.keyword.app;

import org.apache.log4j.Logger;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.ScriptTimeoutException;
import org.openqa.selenium.WebDriver;
import com.paulhammant.ngwebdriver.ByAngular;
import com.paulhammant.ngwebdriver.NgWebDriver;
import main.java.com.automation.keyword.driver.Driver;
import main.java.com.automation.keyword.driver.Utils;

public class NgWebDriverPoC {

    public Utils utils = new Utils();

    public Logger log = Logger.getLogger(NgWebDriverPoC.class.getName());

    public void angularWait(WebDriver driver) {

        /*
         * NgWebDriver wraps a JavascriptExecutor, so the WebDriver reference
         * has to be downcast to JavascriptExecutor before it is passed in
         */
        try {
            NgWebDriver ngWebDriver = new NgWebDriver((JavascriptExecutor) driver);
            log.info("waiting for Angular requests to finish");
            ngWebDriver.waitForAngularRequestsToFinish();

        // waitForAngularRequestsToFinish often throws ScriptTimeoutException;
        // it is better to catch it than to let the script fail
        } catch (ScriptTimeoutException e) {
            log.info("ScriptTimeoutException while waiting for Angular requests to finish");
        }
    }

    public void angularJSDemo() {

        String angularURL = "https://hello-angularjs.appspot.com/sorttablecolumn";

        utils.openBrowser(angularURL);

        WebDriver driver = Driver.getDriver();

        angularWait(driver);

        /*
         * ByAngular locators are ordinary By objects, so they can be passed
         * straight to driver.findElement(); no downcast of the driver is needed
         */
        driver.findElement(ByAngular.model("name")).sendKeys("ABC");
        driver.findElement(ByAngular.model("employees")).sendKeys("100");
        driver.findElement(ByAngular.model("headoffice")).sendKeys("Charlotte NC");
        driver.findElement(ByAngular.buttonText("Submit")).click();

        // Read individual cells from the ng-repeat table
        String companyName = driver.findElement(ByAngular.repeater("company in companies").row(3).column("name")).getText();
        log.info("Name - " + companyName);
        String hqCity = driver.findElement(ByAngular.repeater("company in companies").row(2).column("headoffice")).getText();
        log.info("City - " + hqCity);
    }
}



Testing APIs with RestAssured

These days there are many tools available to test REST based APIs - some of them are quite mature and feature-rich like Citrus, soapUI and Postman; but these are built only for API testing and trying to use them for other general purposes is often difficult.

If you already have a mature framework and want to 'incrementally' test APIs also as part of your existing codebase for functional automation, and want to save time in building a new framework around new tools, then you can use RestAssured in your existing framework.

RestAssured is a very capable Java library for testing APIs, and it even supports a BDD-style (given/when/then) syntax.

Maven dependencies for Rest Assured -

<dependency>
<groupId>io.rest-assured</groupId>
<artifactId>rest-assured</artifactId>
<version>3.3.0</version>
</dependency>

<dependency>
<groupId>io.rest-assured</groupId>
<artifactId>json-path</artifactId>
<version>3.3.0</version>
</dependency>

<dependency>
<groupId>io.rest-assured</groupId>
<artifactId>xml-path</artifactId>
<version>3.3.0</version>
</dependency>

<dependency>
<groupId>io.rest-assured</groupId>
<artifactId>json-schema-validator</artifactId>
<version>3.3.0</version>
</dependency>


Sample code for working with RestAssured -

package main.java.com.automation.keyword.app;

import static io.restassured.RestAssured.*;
import static io.restassured.matcher.RestAssuredMatchers.*;
import static io.restassured.module.jsv.JsonSchemaValidator.*;
import static org.hamcrest.Matchers.*;
import static org.testng.Assert.assertEquals;
import static org.testng.Assert.assertTrue;
import org.apache.log4j.Logger;
import io.restassured.RestAssured;
import io.restassured.builder.RequestSpecBuilder;
import io.restassured.http.Header;
import io.restassured.http.Headers;
import io.restassured.path.json.JsonPath;
import io.restassured.response.Response;
import io.restassured.response.ValidatableResponse;
import io.restassured.specification.RequestSpecification;
import io.restassured.specification.ResponseSpecification;

public class RESTPoC {

private static Logger log = Logger.getLogger(RESTPoC.class.getName());

/*
* Variables for specification of different APIs though we may have only a single response specification
*/

private RequestSpecification raceReqSpec;

private RequestSpecification postmanReqSpec;

private ResponseSpecification respSpec;

private String url = "";

/*
* Function to set the config for all Req/Resp specifications
* But somehow its use is not getting implemented correctly as of now
*/

public void restConfig() {

log.info("setting proxy...");

//  RestAssured.proxy("localhost", 8888);

log.info("configuring common requestSpecification...");

RequestSpecification requestSpecification = new RequestSpecBuilder().

//    addHeader("Content-type", "json").
//    addHeader("Accept", "application/json").

build();

log.info("setting this as the specification for all REST requests...");

RestAssured.requestSpecification = requestSpecification;

}

public String getURL(String apiName) {

switch (apiName.toLowerCase()) {

case "google-books":
url = "https://www.googleapis.com/books/v1/volumes?q=isbn:0747532699";
break;
case "google-books-java":
url = "https://www.googleapis.com/books/v1/volumes?q=intitle:java";
break;
case "f1-api":
url = "http://ergast.com/api/f1/2018/circuits.json";
break;
case "postman-get":
url = "https://postman-echo.com/get";
break;
case "400":
url = "http://ergast.com/api/f1/2018/circuits1.json";
break;
case "404":
url = "http://ergast.com/api/f1/2018/circuits.json 1";
break;
default:
break;
}

log.info("Request URL: " + url);
return url;
}

public String setURL() {

   url = "https://www.googleapis.com/books/v1/volumes?q=isbn:0747532699";
//   url = "http://ergast.com/api/f1/2018/circuits.json";
//   url = "https://postman-echo.com/GET";

log.info("Request URL: " + url);
return url;

}

/*
* This function does not take a Req Specification to get the Response from the resourceURL
* It has BDD style coding
* It also logs all requests and response steps
*/

public void getResponseBDD() {

given().log().all().

when().get(url).

then().log().all().statusCode(200);

}

/*
* This function does not take a Req Specification to get the Response from the resourceURL
* Nor does it use BDD style coding
* @param resourceURL
*/

public Response getResponseDirectlyNoReqSpec(String resourceURL) {

Response rsp = null;

RequestSpecification rq = RestAssured.given();

rsp = rq.get(resourceURL);

log.info("------------------------------------------------------------------");

log.info("URL: " + resourceURL + " has Response: " + "\n" + rsp.asString());

log.info("------------------------------------------------------------------");

return rsp;
}

public Boolean chkInvalidResponse() {

Response rsp;

Boolean result;

//  rsp = getResponseDirectlyNoReqSpec(getURL("f1-api") + " 12312");

rsp = getResponseDirectlyNoReqSpec(getURL("404"));

if (getStatusCode(rsp) == 200) {

log.info("Valid response received");

result = true;

} else {

getStatusLine(rsp);
getAllHeaders(rsp);
result = false;
}
return result;
}

public void sampleJsonPathExp() {

Response rsp;

rsp = getResponseDirectlyNoReqSpec(getURL("google-books-java"));
JsonPath jp = rsp.jsonPath();
}

public void chkResponseGoogleBooksAPI() {

Response rsp = getResponseDirectlyNoReqSpec(getURL("google-books"));

if (getStatusCode(rsp) == 200) {

getAllHeaders(rsp);

} else {
getAllHeaders(rsp);
}
}

/*
* Function to check response of F1 API via JsonPath
*/
public void chkResponseF1API() {

Response rsp = getResponseDirectlyNoReqSpec(getURL("f1-api"));

String contentType = "";

//   Proceed only if response is 200

if (getStatusCode(rsp) == 200) {

getStatusLine(rsp);

getAllHeaders(rsp);

contentType = getHeaderValue(rsp, "Content-type");

getHeaderValue(rsp, "Server");

// Proceed only if response type is JSON

if (contentType.toLowerCase().contains("json")) {

JsonPath jp = rsp.jsonPath();

log.info("Series Name: " + jp.get("MRData.series").toString().toUpperCase());

log.info("Year: " + jp.get("MRData.CircuitTable.season"));

log.info("Circuit Name: " + jp.get("MRData.CircuitTable.Circuits[0].circuitName"));

log.info("Circuit Country: " + jp.get("MRData.CircuitTable.Circuits[0].Location.country"));

log.info("Total Circuits: " + jp.get("MRData.total"));

log.info("Getting name and country of each circuit -------------------------------------");

for (int i = 0; i < Integer.parseInt(jp.getString("MRData.total")); i++) {

log.info("Circuit Name: " + jp.get("MRData.CircuitTable.Circuits[" + i + "].circuitName"));

log.info("Circuit Country: " + jp.get("MRData.CircuitTable.Circuits[" + i + "].Location.country"));

}
}
}

// TestNG Assert library

assertTrue(contentType.toLowerCase().contains("json"));

//  assertEquals(contentType, "application/json");
}

// Status Code is of type int
public int getStatusCode(Response response) {

int statusCode;
statusCode = response.getStatusCode();
log.info("Status Code: " + statusCode);
return statusCode;
}

// Status msg is of type string
public String getStatusLine(Response response) {

String statusLine;

statusLine = response.getStatusLine() + "";

log.info("Status Msg: " + statusLine);

return statusLine;
}

public String getHeaderValue(Response response, String headerName) {

String headerValue = "";

headerValue = response.getHeader(headerName) + "";

log.info("Header name: " + headerName + " - value: " + headerValue);

return headerValue;
}

public void getAllHeaders(Response response) {

log.info("Getting value of all Headers via Headers object ---------------------------------");

Headers allHeaders = response.getHeaders();

for (Header header : allHeaders) {

log.info("Header name: " + header.getName() + " - value: " + header.getValue());

}
}

/*
* Keyword-style method. Invoking requests without first calling restConfig() works as long as no common
* response spec is used
*/
public void invokeRestNoConfig() {
getURL("f1-api");
getResponseBDD();
}

public static void main(String[] args) {

RESTPoC rd = new RESTPoC();
rd.chkResponseF1API();
rd.chkResponseGoogleBooksAPI();
rd.chkInvalidResponse();
rd.sampleJsonPathExp();

}

}
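
The path strings passed to jp.get() above, such as "MRData.CircuitTable.Circuits[0].circuitName", walk down nested JSON objects and arrays. The sketch below imitates that walk over plain Maps and Lists to show the idea; it is a simplified model and not how RestAssured's JsonPath is implemented. DottedPathLookup is a hypothetical class, and the sample data is made up to resemble the shape of the response.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DottedPathLookup {
    // Resolves paths such as "a.b[0].c" against nested Maps and Lists.
    // Returns null when any step of the path is missing.
    static Object lookup(Object root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            String key = segment;
            Integer index = null;
            int bracket = segment.indexOf('[');
            if (bracket >= 0) {
                key = segment.substring(0, bracket);
                index = Integer.valueOf(segment.substring(bracket + 1, segment.length() - 1));
            }
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
            if (index != null) {
                if (!(current instanceof List)) {
                    return null;
                }
                List<?> list = (List<?>) current;
                if (index < 0 || index >= list.size()) {
                    return null;
                }
                current = list.get(index);
            }
        }
        return current;
    }

    // Made-up sample data shaped like the parsed response body.
    static Map<String, Object> sampleResponse() {
        Map<String, Object> location = new HashMap<>();
        location.put("country", "Australia");
        Map<String, Object> circuit = new HashMap<>();
        circuit.put("circuitName", "Albert Park Grand Prix Circuit");
        circuit.put("Location", location);
        Map<String, Object> table = new HashMap<>();
        table.put("season", "2018");
        table.put("Circuits", Collections.singletonList(circuit));
        Map<String, Object> mrData = new HashMap<>();
        mrData.put("total", "1");
        mrData.put("CircuitTable", table);
        Map<String, Object> root = new HashMap<>();
        root.put("MRData", mrData);
        return root;
    }

    public static void main(String[] args) {
        Object root = sampleResponse();
        System.out.println(lookup(root, "MRData.CircuitTable.season"));
        System.out.println(lookup(root, "MRData.CircuitTable.Circuits[0].circuitName"));
        System.out.println(lookup(root, "MRData.CircuitTable.Circuits[0].Location.country"));
    }
}
```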

Use JavaScript with Selenium WebDriver

WebDriver is very powerful and supports lots of methods and features. But there are some cases which are best handled via JavascriptExecutor.

JS extends the capabilities of WebDriver and can be helpful in the cases below.

  • Submit instead of click - Sometimes a button on a webpage has no click handler of its own; instead it belongs to a form that has to be submitted. In such cases, even though there is a button you can theoretically click, the action is registered as a submit rather than a click. The click() method of WebDriver may therefore not always work, and we may have to submit the page/form via JS.
  • Handling nested web elements - Usual WebDriver commands like click() may not always work on nested controls such as toggles, because WebDriver may find that the element is not clickable.
  • Partially clickable elements - For some web elements like buttons and checkboxes, the complete area is not clickable, and you need to click on a specific part of the element to perform the action. Selenium might fail here sometimes, and JS is very useful in this case too.
  • Handling different types of calendars.
  • Scrolling - Scrolling can be a big problem in Selenium. Using JS, we can scroll by pixels or to a web element.
  • Handling hidden elements - JS can get text or attribute values from hidden web elements, which could be difficult via direct methods.
  • Drag and drop issues can be handled via JS.
  • Object location - JS can also be used to locate web elements using the methods below
    • getElementById
    • getElementsByClassName
    • getElementsByName
    • getElementsByTagName

 /*
  * Function to execute Synchronous script via JS
  * The return is of the type superclass Object
  */
 public Object executeSyncJS(String jsCode) {

   /*
   * Downcasting driver to JavascriptExecutor object because we had upcasted our driver object to WebDriver
   * We don't need to downcast had we used [ChromeDriver driver= new ChromeDriver();] instead of [WebDriver driver= new ChromeDriver();]
   */

  JavascriptExecutor jsExe = (JavascriptExecutor) getDriver();
 

  /*
   * The return type of the JS Response depends on the result of the command executed
   * It could be anything - String, boolean, Map etc
   * Since Object class is the parent class of all objects, we are setting the type of response to 'Object'
   */

  Object jsResponse = jsExe.executeScript(jsCode) ;

  return jsResponse ;
 }

 

 /*
  * Function to click a WebElement via JavaScript
  */
 public Object jsClick(WebElement elementToClick) {

  JavascriptExecutor jsExe = (JavascriptExecutor) getDriver();

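  // click() needs the parentheses; "arguments[0].click" alone only reads the function and never calls it
  Object jsResponse = jsExe.executeScript("arguments[0].click();", elementToClick);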

   return jsResponse;
 }


  public void jsDemo() {
  

  //Scroll vertically via JS 
  executeSyncJS("window.scroll(0,1000)");

  Utils.sleep(1000);

  executeSyncJS("window.scrollTo(0,2000)");

  Utils.sleep(1000);

  executeSyncJS("window.scrollBy(0,1000)");

   //Return the result of a script execution

  Object result = executeSyncJS("return 1===2");

  log.info("Result of JS: " + result);
 
 }

 

 /*
  * Function to find element by ID via JavaScript
  * To return a WebElement we need to downcast the Object
  */
 public WebElement jsGetElementById() {

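//  getDriver().get("https://www.cleartrip.com/");

  JavascriptExecutor jsExe = (JavascriptExecutor) getDriver();

  // executeScript returns Object, so the result is downcast to WebElement
  // (it is null when no element with this id exists)
  WebElement webElement = (WebElement) jsExe.executeScript("return document.getElementById('FromTag')");

  return webElement;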
 }
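
Since executeScript hands back a plain Object, the calling code usually has to inspect the runtime type before using the value. Going by the JavascriptExecutor documentation, whole numbers come back as Long, decimals as Double, booleans as Boolean, arrays as List and objects as Map. Below is a minimal sketch of handling such a result with instanceof checks; ScriptResultDescriber is a hypothetical helper, and the WebElement case is left out so that the example needs nothing beyond the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class ScriptResultDescriber {
    // Inspects the runtime type of a script result and describes it.
    static String describe(Object result) {
        if (result == null) {
            return "null (the script returned nothing)";
        }
        if (result instanceof Boolean) {
            return "Boolean: " + result;
        }
        if (result instanceof Long) {
            return "Long: " + result;
        }
        if (result instanceof Double) {
            return "Double: " + result;
        }
        if (result instanceof String) {
            return "String: " + result;
        }
        if (result instanceof List) {
            return "List of " + ((List<?>) result).size() + " item(s)";
        }
        if (result instanceof Map) {
            return "Map with keys " + ((Map<?, ?>) result).keySet();
        }
        return "Other: " + result.getClass().getName();
    }

    public static void main(String[] args) {
        // The values below imitate what scripts like "return 1===2" could hand back.
        System.out.println(describe(Boolean.FALSE));
        System.out.println(describe(1000L));
        System.out.println(describe("FromTag"));
        System.out.println(describe(Arrays.asList("a", "b")));
        System.out.println(describe(null));
    }
}
```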