Coefficient's Databricks integration enables you to effortlessly connect your Databricks database server and import data into Google Sheets or Excel. You can import data by choosing specific tables and columns or crafting custom queries.
Configuring Databricks OAuth (For Databricks Admin Only)
Connecting to Databricks via OAuth
Connecting to Databricks using JDBC URL
Schedule your Import, Snapshots, and Add Automations
FAQs for Databricks Integration
Configuring Databricks OAuth (For Databricks Admin Only)
To use this feature, your Databricks Admin must first configure Databricks OAuth for your domain.
ℹ️ NOTE: The Databricks Admin must also have Coefficient Admin access in order to complete the setup.
|
1. Sign in to your Databricks account console. |
2. Once logged in, click the Settings icon in the sidebar. |
3. Go to App Connections, and then click Add connection.
4. In the field provided, enter a name for your Coefficient OAuth connection (for example: "Coefficient OAuth Integration").
5. In the field provided, enter this redirect URL for your connection: https://app.coefficient.io/api/oauth2_proxy/databricks
|
6. Under the Access Scopes, check the box for "All APIs". This enables Coefficient to access the Databricks APIs to import your data. ℹ️ NOTE: The remaining options are filled out by default; there is no need to change them.
Click the Add button to save your settings. |
|
7. The Connection created dialog will display your Client ID and Client Secret, which you will need to configure the connection with Coefficient.
ℹ️ NOTE: Be sure to copy and save both values. For security reasons, the Client Secret will only be shown once. If it is lost, you will need to delete and recreate the OAuth application to generate a new one.
8. Copy and save the Account ID from the URL in your browser’s address bar. You’ll need this value for the Issuer URI when configuring the connection in Coefficient.
Connecting to Databricks via OAuth
When you begin a Databricks import for the first time, you will need to go through a few steps to connect Databricks as a data source for Coefficient.
ℹ️ NOTE: Coefficient will need the following information: Issuer URl, Client ID, Client Secret, JDBC URL without token, and Catalog Name.
| 1. Open the Coefficient Sidebar and click Menu.
|
2. Select “Connected Sources”. |
3. Select “Add Connection” at the bottom and then “Connect” to Databricks SQL.
|
|
4. Click the "connect using Databricks OAuth" link. |
5. Click the "Configure" button. |
6. You will be redirected to the OAuth Settings page, where you’ll enter the Issuer URI, Client ID, and Client Secret that you copied when setting up the Coefficient OAuth connection in Databricks. Once entered, click Save to complete the setup.”
ℹ️ NOTE: For AWS or GCP-hosted Databricks, append the Account ID you copied from your Databricks console to this URL:
https://accounts.cloud.databricks.com/oidc/accounts/<account-id>
For Azure-hosted Databricks, use your workspace instance name in the following format:
https://<workspace-instance-name>/oidc
(Use the resulting URI as your Issuer URI when configuring the connection.)
| 7. Go back to your spreadsheet and click the Refresh button in the sidebar. |
8. Enter the "JDBC URL without token" and "Catalog name" for your Databricks server. Click on "Authorize" when done. ℹ️ NOTE: If you need help finding your "JDBC URL," click here. |
9. Give your OAuth connection a unique name, and click Save.
|
Connecting to Databricks using JDBC URL
ℹ️ NOTE: Coefficient will need the following information: JDBC URL/Personal Access Token (PAT) and Catalog Name.
| 1. Open the Coefficient Sidebar and click Menu.
|
2. Select “Connected Sources”. |
3. Select “Add Connection” at the bottom and then “Connect” to Databricks SQL.
|
|
4. Enter the "JDBC URL" and "Catalog name" for your Databricks server. Click "Connect" when done. |
ℹ️ VIDEO: For help obtaining your JDBC URL and Personal Access Token, click here. ℹ️ NOTE: If you need help finding your "JDBC URL," click here. ℹ️ NOTE: If you need help generating your Personal Access Token, click here. |
|
5. If your database is behind a firewall, you will need to whitelist (ALL 3) Coefficient's server IP addresses: 34.217.184.131 44.234.233.60 52.32.132.51 |
6. You will then be presented with the option to share this connection with other members of your team who also use Coefficient. Your credentials will NOT be shared with your team. |
Import from Databricks
There are a few ways to import data using Coefficient from Databricks: Importing from Tables & Columns, Importing from a Custom SQL Query, and Importing from GPT SQL Builder.
Importing from Tables and columns allows you to create imports without having to write SQL. Using a Custom SQL Query gives you additional flexibility in the data that you are importing into Coefficient. And lastly, you can now prompt the Coefficient's AI to automatically build the SQL query for you. 🤯
Import from Tables & Columns
|
1. From the Sidebar, select “Import from…”. |
2. Select “Databricks SQL” from the list. |
3. Select "From Tables & Columns". |
4. The Import Preview window opens, showing all the table schemas from your Databricks database. Select the table for your import. (e.g., trips).
5. Once the table is selected, the fields within that table will appear in a list on the left side of the Import Preview window. Select the fields you want to include in your import by checking/unchecking the corresponding boxes.
ℹ️ NOTE: The Import Preview shows only a sample of your data (50 rows). This sample data will be updated if there are any changes to the import's criteria.
6. Customize your import by adding filters, sorts, limits, or even grouping the data into a cloud pivot table. Then, click Import when done.
7. Congratulations on your first Databricks import using Tables & Columns! 🎉
Import from Custom SQL Query
|
1. From the Sidebar, select “Import from…”. |
2. Select “Databricks” from the list. |
3. Select "Custom SQL Query".
|
4. The Import Preview window opens, allowing you to enter your custom SQL query in the blue text box shown below. For further flexibility, you can use Coefficient’s SQL Parameters feature to point a value to a specific cell/range of cells for your query.
ℹ️ NOTE: Whenever you make changes to your query, you need to click "Refresh Preview" to update the sample data shown in the preview window.
5. You will be prompted to give your import a name. The name MUST be UNIQUE, as it will also be the name of the tab in your Google Sheets/Excel when imported. (You can always change the name later if needed.)
6. Congratulations on your successful Databricks Custom SQL import with Coefficient! 🎉
Import from GPT SQL Builder
|
1. From the Sidebar, select “Import from…”. |
2. Select “Databricks” from the list. |
3. Select "Custom SQL Query".
|
4. Select your database schema and enter your prompt/query in the "Describe what you want to query" box (example: "Show me all the trips with a pickup zip of 10001"). Then, click "Generate SQL".
ℹ️ PRO TIP: Be specific when entering your prompts so the AI can easily understand your requirements and provide more accurate results.
5. The GPT SQL Builder will automatically generate and write the SQL query for you in the blue text box. The sample data of your query's result is shown at the bottom (the first 50 rows are displayed). Click Import when done.
6. You will be prompted to give your import a name. Remember it MUST be UNIQUE, as it will also be the name of the tab in your Google Sheets/Excel when imported. (You can always change the name later if needed.)
7. Congratulations on your Databricks import using Coefficient's GPT SQL Builder! 🎉
Schedule your Import, Snapshots, and Add Automations
Once you have pulled your data into your spreadsheet using Coefficient, you can set up the following:
|
|
FAQs for Databricks Integration
How can I generate my Personal Access Token (PAT) in Databricks:
- Login to your Databricks account.
-
In Databricks, click on your user icon in the top right corner and choose "Settings."
- Navigate to the "Developer" tab. Under "Access tokens," click "Manage," then select "Generate new token."
- Give the token a name. If you want the token to never expire, leave the Lifetime field blank. Click "Generate" and copy the token. Use this token to replace <personal-access-token> in the JDBC URL.
I'm stuck on "Connecting" while trying to link Coefficient with my Databricks server. How can I fix this?
Here are a few things you can try:
- Ensure you entered the correct JDBC URL and Catalog Name for your Databricks instance.
- Ensure that the cluster you are trying to connect with is properly running.
- If your database is behind a firewall, you must whitelist all (3) of Coefficient's IP Addresses from your database server. Check this article for more information.
- 34.217.184.131
- 44.234.233.60
- 52.32.132.51
- If you see a specific connection error (e.g., Invalid PAT, Host, or Catalog Name), please contact support (support@coefficient.io)for further assistance.
Can I create multiple Catalog connections with Coefficient?
Yes, Coefficient supports multiple connections with our Databricks integration. To create an additional connection, launch the sidebar, click on "Import from," then select "Add Connection," and choose "Databricks SQL." Enter the JDBC URL and Catalog you want to connect to. We highly recommend naming each connection to distinguish between them easily.
Can I connect to Databricks OAuth using a personal Gmail?
Databricks OAuth only supports domain addresses. Personal email providers like Gmail are not supported. If you need help setting up access for your domain account, please contact us at support@coefficient.io.