Databricks

Coefficient's Databricks integration enables you to effortlessly connect your Databricks database server and import data into Google Sheets or Excel. You can import data by choosing specific tables and columns or crafting custom queries.

Connecting to Databricks

Import from Databricks

Import from Tables & Columns

Import from Custom SQL Query

Import from GPT SQL Builder

Schedule your Import, Snapshots, and Add Automations

FAQs for Databricks Integration

Connecting to Databricks

When you begin a Databricks import for the first time, you will need to go through a few steps to connect Databricks as a data source for Coefficient.

ℹ️ NOTE: Coefficient will need the following information: your JDBC URL, including your Personal Access Token (PAT), and your Catalog Name.

1. Open the Coefficient Sidebar and click Menu.

2. Select “Connected Sources”.

3. Select “Add Connection” at the bottom and then “Connect” to Databricks.

4. Enter the "JDBC URL" and "Catalog name" for your Databricks server. Click on "Connect" when done.
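
As a rough sketch, the two values generally look something like the following (all bracketed values are placeholders). Copy the exact JDBC URL from your Databricks compute resource, as described in the FAQ at the end of this article, and replace only the <personal-access-token> placeholder with your own token:

JDBC URL: jdbc:databricks://<server-hostname>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=<http-path>;UID=token;PWD=<personal-access-token>
Catalog name: e.g., hive_metastore, or the name of the catalog you want to query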

ℹ️ VIDEO: For help obtaining your JDBC URL and Personal Access Token, click here.

ℹ️ NOTE: If you need help finding your "JDBC URL," click here or see the FAQ at the end of this article.

ℹ️ NOTE: If you need help generating your Personal Access Token, click here or see the FAQ at the end of this article.

5. If your database is behind a firewall, you will need to whitelist all 3 of Coefficient's server IP addresses (34.217.184.131, 44.234.233.60, 52.32.132.51). Click "Connect" when done.

6. You will then be presented with the option to share this connection with other members of your team who also use Coefficient. Your credentials will NOT be shared with your team.  🎉

Import from Databricks

There are a few ways to import data from Databricks using Coefficient: importing from Tables & Columns, importing from a Custom SQL Query, and importing from the GPT SQL Builder.

Importing from Tables & Columns allows you to create imports without having to write SQL. Using a Custom SQL Query gives you additional flexibility in the data that you are importing into Coefficient. And lastly, you can prompt Coefficient's AI to automatically build the SQL query for you. 🤯

Import from Tables & Columns

1. From the Sidebar select “Import from…”.

2. Select “Databricks” from the list.

3. Select "From Tables & Columns".

4. The Import Preview window opens, showing all the table schemas from your Databricks database. Select the table for your import (e.g., "default.customer_202406181234").

5. Once the table is selected, the fields within that table will appear in a list on the left side of the Import Preview window. Select the fields you want to include in your import by checking/unchecking the corresponding boxes.
ℹ️ NOTE: The Import Preview shows only a sample of your data (50 rows). This sample data will be updated if there are any changes to the import's criteria. 

6. Customize your import by adding filters, sorts, limits, or even grouping the data into a cloud pivot table. Then click "Import" when done.
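
For reference, the filters, sorts, and limits you apply in the preview correspond roughly to WHERE, ORDER BY, and LIMIT clauses in SQL. A minimal sketch using the example table above (the column names here are hypothetical):

SELECT customer_id, customer_name, created_at
FROM default.customer_202406181234
WHERE created_at >= '2024-01-01'   -- filter
ORDER BY created_at DESC           -- sort
LIMIT 1000;                        -- limit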

7. Congratulations on your first Databricks import using Tables & Columns! 🎉

Import from Custom SQL Query

1. From the Sidebar select “Import from…”.

2. Select “Databricks” from the list.

3. Select "Custom SQL Query".

4. The Import Preview window opens, allowing you to enter your custom SQL query in the blue text box. For further flexibility, you can use Coefficient’s SQL Parameters feature to point a value in your query to a specific cell or range of cells.

ℹ️  NOTE: Whenever you make changes to your query, you need to click "Refresh Preview" to update the sample data shown in the preview window.
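
For example, a custom query might look something like the following (the table and column names are hypothetical; substitute your own):

SELECT order_id, customer_id, order_total, created_at
FROM default.orders
WHERE order_total > 100
ORDER BY created_at DESC;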

5. When you click “Import”, you will be prompted to give your import a name. The name MUST be UNIQUE, as it will also be the name of the tab in your Google Sheets/Excel when imported. (You can always change the name later if needed).

6. Congratulations on your successful Databricks Custom SQL import with Coefficient! 🎉

Import from GPT SQL Builder

1. From the Sidebar select “Import from…”.

2. Select “Databricks” from the list.

3. Select "GPT SQL Builder".

4. Select your database schema (e.g., order_v_3) and enter your prompt in the "Describe what you want to query" box (for example: "Give me all the orders created from the year 2021"). When done, click "Generate SQL".

ℹ️  PRO TIP: Be specific when entering your prompts so the AI can easily understand your requirements and provide more accurate results.
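
For the example prompt above, the generated query might look something like this (the actual output depends on your schema; the table and column names here are hypothetical):

SELECT *
FROM order_v_3.orders
WHERE YEAR(created_at) = 2021;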

5. The GPT SQL Builder will automatically generate and write the SQL query for you in the blue text box. The sample data for your query's result is shown at the bottom (the first 50 rows are displayed). Click "Import" when done.

6. You will be prompted to give your import a name. Remember it MUST be UNIQUE as it will also be the name of the tab in your Google Sheets/Excel when imported. (You can always change the name later if needed).

7. Congratulations on your Databricks import using Coefficient's GPT SQL Builder!  🎉 

Schedule your Import, Snapshots, and Add Automations

Once you have pulled your data into your spreadsheet using Coefficient, you can set up the following:

1. Schedule your Import

2. Turn on Snapshots

3. Set Up Automations

FAQs for Databricks Integration

How can I locate the JDBC URL for my Databricks instance?

  1. Log in to your Databricks account.
  2. In Databricks, navigate to the left panel, click "Compute," and select the resource you wish to connect to. Make sure it’s running.
  3. Go to "Advanced Options" on the compute details page, then click "JDBC/ODBC."
  4. Copy the JDBC URL. You will need to replace the <personal-access-token> placeholder in the URL with your own token (see the example below).
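
In the copied URL, your token goes at the end, in the PWD field. The tail of the URL generally looks something like this (parameter names can vary slightly by driver version, so keep everything else exactly as copied):

...;AuthMech=3;UID=token;PWD=<personal-access-token>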

How can I generate my Personal Access Token (PAT) in Databricks?

  1. Log in to your Databricks account.
  2. In Databricks, click on your user icon in the top right corner and choose "Settings."
  3. Navigate to the "Developer" tab. Under "Access tokens," click "Manage," then select "Generate new token."
  4. Give the token a name. If you want the token to never expire, leave the Lifetime field blank. Click "Generate" and copy the token. Use this token to replace <personal-access-token> in the JDBC URL.

I'm stuck on "Connecting" while trying to link Coefficient with my Databricks server. How can I fix this?

Here are a few things you can try:

  1. Ensure you entered the correct JDBC URL and Catalog Name for your Databricks instance.
  2. Ensure that the cluster you are trying to connect with is properly running.
  3. If your database is behind a firewall, you must whitelist all 3 of Coefficient's IP addresses on your database server. Check this article for more information.
    • 34.217.184.131
    • 44.234.233.60
    • 52.32.132.51
  4. If you see a specific connection error (e.g., Invalid PAT, Host, or Catalog Name), please contact support (support@coefficient.io) for further assistance.

Can I create multiple Catalog connections with Coefficient? 

Yes, Coefficient supports multiple connections with our Databricks integration. To create an additional connection, launch the sidebar, click on "Import from," then select "Add Connection," and choose "Databricks SQL." Enter the JDBC URL and Catalog you want to connect to. We highly recommend naming each connection to distinguish between them easily.
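
If you work across catalogs, note that Databricks SQL generally also lets a custom query reference a table by its fully qualified three-level name, which can be handy alongside multiple connections. A quick sketch (the catalog, schema, and table names here are hypothetical):

SELECT *
FROM sales_catalog.default.orders;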
