Thursday, June 27, 2019

Connecting DBeaver with Spark Databricks Cluster


Downloaded the Simba JDBC driver from Databricks.

Extracted the downloaded zip, then extracted the inner SimbaSparkJDBC41-2.6.3.1003.zip to get the driver jar.

Adding Simba driver to DBeaver:

In DBeaver:
  1. Open Driver Manager
  2. Select New
    i. Give some name in Driver Name – label only
    ii. Click on Add File and select the SimbaSparkJDBC41-2.6.3.1003.jar file
    iii. Add com.simba.spark.jdbc41.Driver to the Class Name (class name is as of 06/27/2019)
Getting JDBC URL from Databricks:
  1. Go to your cluster in Databricks
  2. Click on Advanced Options in the Configuration tab
  3. Click on the JDBC/ODBC tab
  4. Grab the JDBC URL provided, which will look like this:
jdbc:spark://<server-name-info>:<port>/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/cluster-name;AuthMech=3;UID=token;PWD=<personal-access-token>
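The structure of that URL can be sketched as a small helper that assembles it from its parts. This is only an illustration – the server name, port, HTTP path, and token below are placeholders; the real values come from your own cluster's JDBC/ODBC tab and your personal access token:

```python
# Sketch: assemble a Simba Spark JDBC URL from its components.
# All values used at the bottom are placeholders, not real endpoints.

def build_jdbc_url(server, port, http_path, token, database="default"):
    """Build a Databricks JDBC URL using token authentication (UID=token)."""
    params = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": http_path,
        "AuthMech": "3",   # 3 = username/password auth; with tokens, UID is literally "token"
        "UID": "token",
        "PWD": token,
    }
    param_str = ";".join(f"{k}={v}" for k, v in params.items())
    return f"jdbc:spark://{server}:{port}/{database};{param_str}"

# Placeholder values for illustration only:
url = build_jdbc_url("example.cloud.databricks.com", 443,
                     "sql/protocolv1/o/0/cluster-name", "dapiXXXX")
print(url)
```

Note that the personal access token goes in as PWD while UID is the literal string "token", which is why the URL always contains UID=token.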

Getting the Token from Databricks:
  1. Click on the user icon in the top-right corner and click on User Settings.
  2. Select Access Tokens and create a token. NOTE: Pay attention to the dialog box – the token is shown only once, so save it before closing.

Connecting to Spark through DBeaver:
  1. Click New Database Connection in DBeaver
  2. Select the driver you just added – check for the label you provided for Driver Name when you added the Simba driver
  3. Copy the URL you obtained from Databricks into JDBC URL
  4. Provide the token you obtained as the PWD= value in the URL:
jdbc:spark://<server-name-info>:<port>/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/cluster-name;AuthMech=3;UID=token;PWD=##############
  5. Test the connection – you should be connected to your cluster and able to see all databases.

Added UseNativeQuery=1 at the end of the URL, as I was getting errors in DBeaver:

jdbc:spark://<server-name-info>:<port>/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/cluster-name;AuthMech=3;UID=token;PWD=##############;UseNativeQuery=1
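Appending options like this by hand is easy to get wrong if the option is already present. A minimal sketch of doing it idempotently (the URL below is a placeholder, not a real endpoint):

```python
# Sketch: append a driver option (e.g. UseNativeQuery=1) to a
# semicolon-delimited JDBC URL, but only if it is not already set.

def add_url_option(url, key, value):
    """Append ;key=value to the URL unless key is already present."""
    existing = url.split(";")[1:]  # options are everything after the first ';'
    if any(opt.split("=", 1)[0] == key for opt in existing):
        return url  # option already set; leave the URL unchanged
    return f"{url};{key}={value}"

# Placeholder URL for illustration:
url = ("jdbc:spark://example.cloud.databricks.com:443/default;"
       "transportMode=http;ssl=1;AuthMech=3;UID=token;PWD=secret")
url = add_url_option(url, "UseNativeQuery", "1")
print(url)
```

Calling the helper a second time with the same key returns the URL unchanged, so it is safe to run on a URL you may have already edited.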

3 comments:

  1. I get the following error:
    Can't create driver instance
    Error creating driver 'databricks' instance.
    Most likely required jar files are missing.
    You should configure jars in driver settings.

    Reason: can't load driver class 'com.spark.jdbc42.Driver'
    Error creating driver 'databricks' instance.

  2. Worked for me with com.simba.spark.jdbc.Driver in Class Name

  3. Use class name com.spark.jdbc.Driver.
    Also, the SparkJDBC42.jar, which you get in the SimbaSparkJDBC42-2.6.17.1021.zip, should be installed on your cluster.
