Selenium WebDriver Type Hierarchy

Ever wondered how WebDriver is actually implemented, why we create a ChromeDriver but declare it as a WebDriver, or what RemoteWebDriver is? Let's find out.

WebDriver is an Interface.
JavascriptExecutor is an Interface.
RemoteWebDriver is the parent Class that implements the WebDriver and JavascriptExecutor interfaces.
ChromeDriver, FirefoxDriver and the other browser drivers are the child Classes that extend the parent RemoteWebDriver class.

An Interface, in the classic sense, does not have any implementation details of its methods, just the empty method declarations (Java 8 and later also allow default methods, but that is beside the point here). It's the responsibility of the implementing class to 'implement' those methods by adding the details of what the methods actually do.
Since WebDriver and JavascriptExecutor are Interfaces, they only declare abstract (empty) methods, and the 'fully-implemented' class RemoteWebDriver provides the definitions of the methods in these 2 interfaces. In other words, all the abstract methods in the WebDriver and JavascriptExecutor interfaces are implemented in the RemoteWebDriver class.
Browser-specific drivers like ChromeDriver and FirefoxDriver then extend this RemoteWebDriver class to add more methods of their own, or to override the inherited methods with their own implementations.
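To make the shape of this hierarchy concrete, here is a minimal sketch. The types below are simplified stand-ins that only borrow the Selenium names; the method bodies and the browserName() helper are made up for illustration and are not the real Selenium code.

```java
// Simplified stand-ins that mirror the shape of the Selenium types; NOT the real API.
interface WebDriver {
    void get(String url);
    String getTitle();
}

interface JavascriptExecutor {
    Object executeScript(String script, Object... args);
}

// The parent class supplies a body for every method declared by both interfaces.
class RemoteWebDriver implements WebDriver, JavascriptExecutor {
    private String currentUrl = "";

    @Override
    public void get(String url) {
        currentUrl = url;
    }

    @Override
    public String getTitle() {
        return "Page at " + currentUrl;
    }

    @Override
    public Object executeScript(String script, Object... args) {
        return "ran: " + script;
    }
}

// A browser driver inherits everything and may override behaviour...
class ChromeDriver extends RemoteWebDriver {
    @Override
    public String getTitle() {
        return "[chrome] " + super.getTitle();
    }
}

// ...or add methods of its own (hypothetical extra, for illustration only).
class FirefoxDriver extends RemoteWebDriver {
    public String browserName() {
        return "firefox";
    }
}

class HierarchySketch {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        driver.get("https://example.com");
        System.out.println(driver.getTitle()); // uses ChromeDriver's override
    }
}
```

Note that FirefoxDriver does not write a single line for get() or executeScript(); it gets both from the parent class.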

But why this hierarchy? Every browser works differently internally, and the developers of Selenium cannot know the internals of all of them. So they declared the methods they considered essential for automation and left the browser-specific part of the implementation to the people who know each browser best.
Browsers are complicated software, and not all of their internals are open-source or visible to external developers, so outsiders cannot easily write that part themselves.
For instance, the actual implementation of the 'click' action could be different for Chrome and Firefox, hence each browser has its own driver (which is why we don't use the Firefox driver with Chrome).
Also, in a way this puts the onus on the browser vendors to provide and maintain their drivers in order to stay relevant and be widely adopted.

So can we do this? WebDriver driver = new WebDriver();
We get a compile-time error: Cannot instantiate the type WebDriver. Why? Because we cannot instantiate an interface, i.e. we cannot create an object of an interface (WebDriver) and invoke its methods.
Since WebDriver is an Interface and not a 'Class', and all its methods are just empty shells (abstract), an object of it would have nothing to execute even if we could create one, which is why Java does not allow it.
Thus, if we want to perform any action, we have to instantiate a class that implements the interface.

So should we do this? WebDriver driver = new RemoteWebDriver();
We get a compile-time error: The constructor RemoteWebDriver() is not visible. The no-argument constructor does exist, but it is declared protected, so only subclasses (and classes in the same package) can call it; our test code cannot.
Technically, though, we can use one of the public constructors -
WebDriver driver = new RemoteWebDriver(capabilities);
Or
WebDriver driver = new RemoteWebDriver(remoteAddress, capabilities);
Or
WebDriver driver = new RemoteWebDriver(commandExecutor, capabilities);
We normally don't use these because RemoteWebDriver is intended for working with Selenium Grid and needs a running Selenium server, whereas new ChromeDriver() launches the Chrome browser installed locally on our machine.
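The 'not visible' error is easier to picture with a small sketch. The RemoteWebDriver below is a simplified stand-in, not the real class: it assumes a protected no-argument constructor (which is what the error message suggests) next to a public one that takes a server address, modelled here as a plain String. The ConstructorCheck helper and the localhost URL are placeholders for illustration.

```java
import java.lang.reflect.Modifier;

// Simplified stand-in, NOT the real Selenium class.
class RemoteWebDriver {
    private final String serverUrl;

    // Assumption: like the real class, the no-arg constructor is protected,
    // so it is meant for subclasses rather than for test code.
    protected RemoteWebDriver() {
        this.serverUrl = "(none)";
    }

    // Public constructor: the caller must say which server to talk to.
    public RemoteWebDriver(String serverUrl) {
        this.serverUrl = serverUrl;
    }

    public String getServerUrl() {
        return serverUrl;
    }
}

class ConstructorCheck {
    // Reports whether the no-arg constructor of the given class is protected.
    static boolean noArgConstructorIsProtected(Class<?> type) {
        try {
            return Modifier.isProtected(type.getDeclaredConstructor().getModifiers());
        } catch (NoSuchMethodException e) {
            return false; // the class has no no-arg constructor at all
        }
    }

    public static void main(String[] args) {
        System.out.println(noArgConstructorIsProtected(RemoteWebDriver.class));
        // Placeholder address of a Selenium server.
        RemoteWebDriver driver = new RemoteWebDriver("http://localhost:4444/wd/hub");
        System.out.println(driver.getServerUrl());
    }
}
```

Because the stand-in and the check live in the same package here, calling the protected constructor directly would compile in this sketch; from a test class in a different package, as with the real Selenium class, it does not.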

What about this? ChromeDriver driver = new ChromeDriver();
Since ChromeDriver is a concrete class, it has an implementation for every method of the WebDriver interface, so this compiles and runs. But the 'driver' variable is now typed as ChromeDriver, so any code written against it is tied to Chrome, and as such we would be restricted to running our scripts only on the Chrome browser.
To work with other browsers we would have to create individual objects of each type, e.g. FirefoxDriver driver = new FirefoxDriver();
And we would have to change our code every time we switch browsers.

This is the reason we use this: WebDriver driver = new ChromeDriver();
So that we can work with different browsers without having to update our code for every browser-specific driver. This makes our code more extensible by giving us the flexibility to work with any number of browsers (drivers).
Also, this is better design, as a change in driver initialization for one browser will not affect the others, and we can have different configurations for different browsers.
Here, WebDriver is the interface (the declared type), new is the keyword, ChromeDriver() is the constructor, and [new ChromeDriver()] creates the object referenced by the 'driver' variable.
'Java' specific reason - WebDriver is the super-interface of all browser classes like FirefoxDriver, ChromeDriver etc., so a WebDriver variable can hold an object of any driver class. This is also called Upcasting: a reference of the super-type [parent] points to an object of its sub-type [child].
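Here is a rough sketch of what that flexibility buys us. The types are simplified stand-ins with the Selenium names, and DriverFactory and openHomePage are hypothetical helpers invented for this example. The point is that only the factory mentions the concrete classes, while the test logic is written once against WebDriver.

```java
// Simplified stand-ins, NOT the real Selenium types.
interface WebDriver {
    void get(String url);
    String getCurrentUrl();
}

class RemoteWebDriver implements WebDriver {
    private String currentUrl = "about:blank";

    @Override
    public void get(String url) {
        currentUrl = url;
    }

    @Override
    public String getCurrentUrl() {
        return currentUrl;
    }
}

class ChromeDriver extends RemoteWebDriver {}

class FirefoxDriver extends RemoteWebDriver {}

// Hypothetical helper: the only place that knows the concrete driver classes.
class DriverFactory {
    static WebDriver create(String browser) {
        switch (browser.toLowerCase()) {
            case "chrome":
                return new ChromeDriver();   // up-cast to WebDriver on return
            case "firefox":
                return new FirefoxDriver();  // up-cast to WebDriver on return
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
    }
}

class CrossBrowserSketch {
    // The test logic depends only on the WebDriver type.
    static String openHomePage(WebDriver driver) {
        driver.get("https://example.com");
        return driver.getClass().getSimpleName() + " opened " + driver.getCurrentUrl();
    }

    public static void main(String[] args) {
        for (String browser : new String[] {"chrome", "firefox"}) {
            System.out.println(openHomePage(DriverFactory.create(browser)));
        }
    }
}
```

Adding another browser would mean one more case in the factory; openHomePage would not change.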

But can we do vice versa? - ChromeDriver driver = new WebDriver();
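We get a compile-time error again, because WebDriver is an interface and cannot be instantiated. And even if we already have a WebDriver reference, assigning it directly to a ChromeDriver variable fails with: Type mismatch: cannot convert from WebDriver to ChromeDriver. Not every WebDriver is a ChromeDriver, so going from the parent type to the child type needs an explicit cast.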
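Going in this direction (parent to child) is called down-casting, and even with an explicit cast it is only checked at runtime. A minimal sketch with stand-in types (not the real Selenium classes; tryDowncast is a made-up helper) shows both outcomes:

```java
// Simplified stand-ins, NOT the real Selenium types.
interface WebDriver {}

class ChromeDriver implements WebDriver {}

class FirefoxDriver implements WebDriver {}

class DowncastSketch {
    // Tries to treat any WebDriver as a ChromeDriver and reports what happened.
    static String tryDowncast(WebDriver driver) {
        // ChromeDriver chrome = driver;  // would not compile: needs an explicit cast
        try {
            ChromeDriver chrome = (ChromeDriver) driver; // compiles, checked at runtime
            return "cast ok";
        } catch (ClassCastException e) {
            return "cast failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDowncast(new ChromeDriver()));
        System.out.println(tryDowncast(new FirefoxDriver()));
    }
}
```

The cast succeeds only when the object behind the reference really is a ChromeDriver.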

But then why do we have to do this? - JavascriptExecutor js = (JavascriptExecutor) driver;
WebDriver and JavascriptExecutor are two different interfaces, and they do not have any methods in common. The two script-running methods of the JavascriptExecutor (JSE), executeScript and executeAsyncScript, are not present in the WebDriver interface.
But all the methods of both the WebDriver and JSE interfaces are implemented by the browser drivers (through RemoteWebDriver).
Because we up-cast the 'driver' object to WebDriver, and WebDriver does not declare the methods of the JSE interface, we have to cast the reference to JavascriptExecutor before we can call them.
We wouldn't have needed the cast had we just used [ChromeDriver driver = new ChromeDriver();]. In that case 'driver' can see all the methods of JSE, because the browser driver class 'ChromeDriver' extends the 'RemoteWebDriver' class and so inherits the JSE methods from it.

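In fact we can even cast it to ChromeDriver instead of JavascriptExecutor, like below (this only works when 'driver' really holds a ChromeDriver object; otherwise the cast throws a ClassCastException at runtime) -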
JavascriptExecutor js = (ChromeDriver) driver; // This works too!!
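The cast to JavascriptExecutor can be sketched with stand-in types as follows. These are not the real Selenium classes, and the executeScript body simply echoes the script so that the example runs without a browser; runScript is a hypothetical helper.

```java
// Simplified stand-ins, NOT the real Selenium types.
interface WebDriver {
    void get(String url);
}

interface JavascriptExecutor {
    Object executeScript(String script, Object... args);
}

class RemoteWebDriver implements WebDriver, JavascriptExecutor {
    @Override
    public void get(String url) {
        // no-op in this sketch
    }

    @Override
    public Object executeScript(String script, Object... args) {
        return "executed: " + script; // a real driver would run this in the browser
    }
}

class ChromeDriver extends RemoteWebDriver {}

class JsCastSketch {
    static Object runScript(WebDriver driver, String script) {
        // driver.executeScript(script); // would not compile: WebDriver declares no such method
        if (driver instanceof JavascriptExecutor) {
            JavascriptExecutor js = (JavascriptExecutor) driver; // same object, different view
            return js.executeScript(script);
        }
        throw new UnsupportedOperationException("This driver cannot run JavaScript");
    }

    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        System.out.println(runScript(driver, "return document.title;"));

        // Declared as ChromeDriver: no cast needed, the method is inherited.
        ChromeDriver chrome = new ChromeDriver();
        System.out.println(chrome.executeScript("return document.title;"));
    }
}
```

The instanceof check is optional when we know every driver we use extends RemoteWebDriver, but it turns a possible ClassCastException into a clearer error.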

Additional notes -

SearchContext is the top-most interface, and it has only two methods, named findElement() and findElements(). These methods are abstract, as SearchContext is an interface. This is the reason we do not up-cast to SearchContext: there is no point in having just 2 methods to work with and having to down-cast every time we want to use any other method.
WebDriver is also an interface; it extends SearchContext and declares most of the methods a test needs, so it is the key interface against which tests should be written. There are many implementing classes for the WebDriver interface, as listed below:
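A small sketch of why a SearchContext reference is so limiting. The types are simplified stand-ins: the real methods take a By locator and return WebElement objects, while plain strings are used here so the example stays self-contained. navigateVia is a made-up helper.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins, NOT the real Selenium types:
// locators and elements are modelled as plain strings.
interface SearchContext {
    String findElement(String locator);
    List<String> findElements(String locator);
}

interface WebDriver extends SearchContext {
    void get(String url);
    String getCurrentUrl();
}

class ChromeDriver implements WebDriver {
    private String currentUrl = "about:blank";

    @Override
    public void get(String url) {
        currentUrl = url;
    }

    @Override
    public String getCurrentUrl() {
        return currentUrl;
    }

    @Override
    public String findElement(String locator) {
        return "element(" + locator + ")";
    }

    @Override
    public List<String> findElements(String locator) {
        return Arrays.asList(findElement(locator), findElement(locator));
    }
}

class SearchContextSketch {
    static String navigateVia(SearchContext context, String url) {
        // context.get(url); // would not compile: SearchContext declares no get(...)
        WebDriver driver = (WebDriver) context; // down-cast needed for anything beyond finding elements
        driver.get(url);
        return driver.getCurrentUrl();
    }

    public static void main(String[] args) {
        SearchContext context = new ChromeDriver();
        System.out.println(context.findElement("#login")); // fine: declared by SearchContext
        System.out.println(navigateVia(context, "https://example.com"));
    }
}
```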

AndroidDriver
AndroidWebDriver
ChromeDriver
FirefoxDriver
HtmlUnitDriver
InternetExplorerDriver
IPhoneDriver
IPhoneSimulatorDriver
SafariDriver

WebDriver has many abstract methods like get(String url), close(), quit(), getWindowHandle() etc. WebDriver also has nested interfaces named Window, Navigation, Timeouts etc. that are used to perform specific actions like getPosition(), back(), forward() etc.
RemoteWebDriver is the fully implemented class for the WebDriver, JavascriptExecutor and TakesScreenshot interfaces. (Fully implemented class means it defines the body for all inherited abstract methods.)
Then we have browser-specific driver classes like ChromeDriver, EdgeDriver, FirefoxDriver etc. which extend RemoteWebDriver.
RemoteWebDriver implements JavascriptExecutor and provides the definition for both methods of the JSE. Since all browser-specific driver classes like ChromeDriver etc. extend RemoteWebDriver, we can execute JavaScript commands via the JSE methods on these different browsers.

Automation - BDD - Tools

10.8.19
