Thursday, June 27, 2019

Connecting DBeaver with Spark Databricks Cluster


Got the Simba JDBC drivers from databricks.

Extracted the zip and then SimbaSparkJDBC41-2.6.3.1003.zip

Adding Simba driver to DBeaver:

In the DBeaver:
  1. Driver Manager
    1. Select New
                                                               i.      Give some name to Driver Name – Label only
                                                             ii.      Click on Add file and select the SimbaSparkJDBC41-2.6.3.1003.jar file
                                                           iii.      Add com.simba.spark.jdbc41.Driver to the Class Name: (Class name is as of 06/27/2019)
Getting JDBC URL from Databricks:
  1. Goto your Cluster from Databricks
  2. Click on Advanced Options in the Configuration tab
  3. Click on JDBC/ODBC tab
  4. Grab the JDBC URL provided which will look like below:
jdbc:spark://<server-name-info>:<port>/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/cluster-name;AuthMech=3;UID=token;PWD=<personal-access-token>

Getting the Token from Databricks:
  1. Click on the user Icon on the top right most corner and click on User Settings.
  2. Select Access Tokens and create a token. NOTE: Pay attention to the dialogue box as this token is only showed once so you save it first.

Connecting Spark thru DBeaver:
  1. Click New Database connection in DBeaver
  2. Select the driver you just added – check for the label you provided for Driver Name when you added Simba driver
  3. Copy the URL you obtained from databricks into JDBC URL:
  4. Provide the token you obtained in PWD= in the URL
jdbc:spark://<server-name-info>:<port>/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/cluster-name;AuthMech=3;UID=token;PWD= ##############
  1. Test Connection and you should be connected to your cluster and should be able to see all databases.

Added UseNativeQuery=1 at the end of url as I was getting errors in DBeaver:

jdbc:spark://<server-name-info>:<port>/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/cluster-name;AuthMech=3;UID=token;PWD=##############;UseNativeQuery=1