Coefficient's integration with Amazon Redshift includes an intuitive visual query builder that simplifies selecting tables and columns and applying filters. Coefficient translates these selections into efficient SQL queries for optimal performance on your Redshift database. You can also write your own custom SQL queries for full control over the data you import.
Connecting to Redshift
When you begin a Redshift import for the first time, you will need to go through a few steps to connect Redshift as a data source for Coefficient.
ℹ️ NOTE: Coefficient will need the following information: Redshift Host, Database name, Username, Password, and Port (The default Port for Redshift is 5439).
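If you want to sanity-check these credentials outside of Coefficient first, the same five fields can be passed to any PostgreSQL-compatible client. The sketch below builds a libpq-style DSN from hypothetical placeholder values (the host, database, and user shown are not real endpoints); `psycopg2.connect(**params)` accepts the same keyword arguments.

```python
# Hypothetical values -- replace with your own cluster details.
# These are the same five fields Coefficient asks for.
params = {
    "host": "my-cluster.abc123xyz.us-west-2.redshift.amazonaws.com",  # Redshift Host
    "dbname": "dev",         # Database name
    "user": "awsuser",       # Username
    "password": "********",  # Password
    "port": 5439,            # Default Redshift port
}

# Build a libpq-style DSN string; psycopg2.connect(**params) takes the same keys.
dsn = " ".join(f"{key}={value}" for key, value in params.items())
print(dsn)
```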
1. Open the Coefficient Sidebar and click the Menu.
2. Select "Connected Sources".
3. Select "Add Connection" at the bottom, and then "Connect" to Redshift.
4. Enter the required fields (Host, Database name, Username, Password, and Port).
5. Ensure that your Redshift cluster has the Publicly accessible setting enabled. You might also need to whitelist all three of Coefficient's IP addresses (34.217.184.131, 44.234.233.60, 52.32.132.51) on your Redshift cluster's Security Group and/or VPC network ACL. For detailed instructions, see this AWS support article.
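If you manage the cluster's Security Group with the AWS CLI, the whitelist step can be sketched roughly as follows. The `sg-…` group ID is a hypothetical placeholder, and this assumes the default port 5439 and that your AWS credentials are already configured:

```shell
# Hypothetical security group ID -- replace with your cluster's Security Group.
# Adds an inbound rule for each of Coefficient's three IP addresses.
for ip in 34.217.184.131 44.234.233.60 52.32.132.51; do
  aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 5439 \
    --cidr "$ip/32"
done
```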
Advanced Settings
When setting up your Amazon Redshift connection, you can enable advanced settings to select a specific schema or configure SSH Tunnel encryption.
Visible Schemas
By default, all schemas are displayed when creating an import. If your database contains many schemas, you can improve load times by showing only a subset of them.
1. In the connection screen, click the "Advanced Settings" link and choose the schema you want to use for your data source connection.

ℹ️ NOTE: To learn more about SQL Schema Limits, click here.
|
SSH Tunnel
If your database server requires an SSH Tunnel for encryption, you can enable this option and enter your credentials in the connection screen.
1. In the connection screen, toggle "Use SSH Tunnel" on and enter your SSH credentials.
Import from Redshift
There are two ways to import data from Redshift using Coefficient: importing from tables and writing a custom SQL query. Importing from tables lets you create imports without writing any SQL, while a custom SQL query gives you additional flexibility over the data you bring into your spreadsheet.
Import from Tables
1. Open the Coefficient Sidebar and click on the "Import from…" button.
2. Select the "Redshift" connection you created.
3. Select "From Tables & Columns".
4. A Data Preview window will appear, allowing you to select the table that you want to import from.
5. After the table is selected, the fields will populate a list on the left side. You can uncheck/check the fields to exclude/include in your import. From this preview, you can also add filters, sort, pivot, or drag and drop the fields in the view to be in the order you need. Then click “Import”.
6. Congratulations on your successful Redshift import using Coefficient! 🎉
Custom SQL Query
1. Open the Coefficient Sidebar and click on the "Import from…" button.
2. Select Redshift as the data source.
3. Select "Custom SQL query".
4. Enter your query in the text box provided or use AI to write a query. Then click the Import button.
5. When you click “Import”, you will be prompted to give your import a name. The name MUST be UNIQUE as it will also be the name of the tab in your Google Sheets/Excel when imported. (You can always change the name later if needed.)
6. Congratulations on your successful Redshift import using Coefficient! 🎉
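As an illustration, a custom query might aggregate order data before it lands in your spreadsheet. The table and column names below are hypothetical placeholders; substitute your own schema:

```sql
-- Hypothetical table and columns: replace with your own schema.
SELECT
    order_date,
    region,
    SUM(order_total) AS total_sales,
    COUNT(*)         AS order_count
FROM public.orders
WHERE order_date >= DATEADD(day, -30, CURRENT_DATE)  -- last 30 days
GROUP BY order_date, region
ORDER BY order_date;
```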
SQL Editor Features
The SQL Editor window (used for Custom SQL and the GPT SQL Builder) includes features that mirror native SQL tools for ease of use and familiarity.
- **Autocomplete Query**: Suggests keywords, table names, and fields as you type your query.
- **Undo / Redo Query**: Revert or restore query changes during the import preview session.
- **Inline Data Preview**: Preview changes in the import preview window as you modify the query.
- **Query Formatting**: Arrange queries in a more readable format with a simple right-click.
- **Run Keys Shortcut**: Run queries with Command + Enter.
- **In-place Parameter Referencing**: Reference parameters directly from the query.
Export to SQL DB
Accurate analysis depends on up-to-date data, and Coefficient helps by letting you export data from your spreadsheet back into your Amazon Redshift database. 🎉
You can check out the details on how to push data into Redshift from our Export to SQL DB article, here.
ℹ️ NOTE: Scheduled exports to Redshift are supported for the UPSERT and UPDATE actions. Click here to learn more about Scheduled Exports.
Schedule your Import, Snapshots, and Add Automations
Once you have pulled your data into your spreadsheet using Coefficient, you can set up scheduled imports, snapshots, and automations.
FAQs for Redshift Integration
I keep getting an error when I try to connect Redshift to Coefficient. What is wrong?
Please review the following, then try connecting again:
- Make sure that you have entered the correct Hostname, Database Name, Username, Password, and Port for your Redshift cluster. To find the Hostname and Port:
- Sign in to the AWS Console
- Type “Amazon Redshift” into the search bar and click on “Amazon Redshift” (listed under Services)
- Select the cluster you would like to connect to
- The **Endpoint URL** contains the Hostname and Port of your Redshift cluster.
- Ensure that your Redshift cluster has the Publicly accessible setting enabled. You might also need to whitelist Coefficient’s IP addresses on your Redshift cluster’s Security Group and/or VPC network ACL. For detailed instructions, see this AWS support article.
- Coefficient IP Addresses (All 3 need to be whitelisted)
- 34.217.184.131
- 44.234.233.60
- 52.32.132.51
When you connect Coefficient to Redshift, do you maintain access?
When Coefficient needs to run a query, we establish a connection to your database, run the query on your behalf, and terminate the connection once the query completes.
I added a table (or column) in my Redshift database; why is it not showing up in Coefficient?
To deliver a snappy experience when you set up imports from Redshift, we cache your database schema for up to 24 hours. If you recently changed your database schema (e.g., added a table/column, renamed a table/column, etc.), and you don't see the change reflected in Coefficient, you can force a schema reload:
- Open the Coefficient sidebar in Google Sheets/Excel.
- Click on the ≣ menu in the top right of the sidebar, then click on “Connected Sources”.
- Click on your Redshift connection to see its Connection Settings page
- Click on the︙button near the top right, choose “Reload Schema”, and click “Reload” on the confirmation dialog
My custom SQL script runs longer than expected, and sometimes I see a "SQL Error - canceling statement due to statement timeout" error when I refresh my import. What should I do?
The error message you're seeing indicates that the SQL query you're trying to execute is being canceled due to a statement timeout. This means that the query is taking too long to execute, and your database server is configured to cancel any query that exceeds a certain execution time threshold.
Here are some steps you can take to understand and fix this issue:
- Examine the Execution Plan: Use the EXPLAIN command to get the execution plan for your query. This will show you where the query might be inefficient, such as performing full table scans or using nested loops that could be optimized. (Click here to learn more about the EXPLAIN command with Redshift).
- Optimize the Query: Look for ways to make the query more efficient. Because Redshift does not support traditional indexes, this could involve choosing sort keys and distribution keys suited to the columns used in your WHERE clause and ILIKE conditions, rewriting the query to reduce complexity, or breaking it into smaller parts.
- Reduce the Dataset: If possible, limit the scope of the query. For example, if you're querying a large date range or a large number of rows, narrow the date range in your WHERE clause or cap the returned rows with LIMIT.
- Increase the Statement Timeout (not recommended): If you have control over the database server settings, you can increase the statement timeout value. This is a temporary solution and may not be ideal if the query is inherently inefficient.
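The first three steps can be sketched as follows, using hypothetical table and column names and placeholder dates; run these in a SQL client before pasting the final query into Coefficient:

```sql
-- 1. Inspect the execution plan for an expensive query (hypothetical table).
EXPLAIN
SELECT customer_id, SUM(order_total)
FROM public.orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id;

-- 2./3. Reduce the dataset while iterating: narrow the date range
--       in the WHERE clause and cap the returned rows with LIMIT.
SELECT customer_id, SUM(order_total)
FROM public.orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY customer_id
LIMIT 1000;
```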