Coefficient's integration with Amazon Redshift boasts an intuitive visual query builder, simplifying the process of selecting tables and columns and applying filters. The system seamlessly translates these selections into efficient SQL queries for optimal performance on your Redshift database. Additionally, users have the flexibility to incorporate their own custom SQL queries, allowing for a personalized and tailored approach to data imports.
Connecting to Redshift
When you begin a Redshift import for the first time, you will need to go through a few steps to connect Redshift as a data source for Coefficient.
ℹ️ NOTE: Coefficient will need the following information: Redshift Host, Database name, Username, Password, and Port (The default Port for Redshift is 5439).
1. Open the Coefficient Sidebar and click the Menu.
2. Select “Connected Sources”.
3. Select “Add Connection” at the bottom, then click “Connect” next to Redshift.
4. Enter the required fields (Host, Database name, Username, Password, and Port).
5. Ensure that your Redshift cluster has the "Publicly accessible" setting enabled. You may also need to whitelist all three of Coefficient’s IP addresses (34.217.184.131, 44.234.233.60, 52.32.132.51) in your Redshift cluster’s Security Group and/or VPC network ACL. For detailed instructions, see this AWS support article.
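For reference, these are the same fields any Redshift client needs. The sketch below shows them in code, using the open-source `redshift_connector` driver; the host, database, and credential values are placeholders, not real endpoints:

```python
# The same connection fields Coefficient asks for, as a Python dict.
# All values below are placeholders -- substitute your own cluster details.
connection_params = {
    "host": "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",  # Redshift Host
    "database": "dev",           # Database name
    "user": "awsuser",           # Username
    "password": "my_password",   # Password
    "port": 5439,                # Default Redshift port
}

# With the redshift_connector package installed, connecting would look like:
# import redshift_connector
# conn = redshift_connector.connect(**connection_params)

print(connection_params["port"])  # 5439
```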
Import from Redshift
There are two ways to import data from Redshift using Coefficient: importing from tables and writing a custom SQL query. Importing from tables lets you create imports without writing any SQL, while a custom SQL query gives you more flexibility over exactly what data you import.
Import from Tables
1. Open the Coefficient Sidebar and click the “Import from…” button.
2. Select the "Redshift" connection you created.
3. Select “From Tables & Columns”.
4. A Data Preview window will appear, allowing you to select the table that you want to import from.
5. After you select a table, its fields populate a list on the left side. Check or uncheck fields to include or exclude them from your import. From this preview you can also add filters, sort, pivot, or drag and drop the fields into the order you need. Then click “Import”.
6. Congratulations on your successful Redshift import using Coefficient! 🎉
Custom SQL Query
1. Open the Coefficient Sidebar and click the “Import from…” button.
2. Select Redshift as the data source.
3. Select “Custom SQL query”.
4. Enter your query in the text box provided or use AI to write a query. Then click the Import button.
5. Add a name for your import.
6. Congratulations on your successful Redshift import using Coefficient! 🎉
Import from GPT SQL Builder
1. From the Sidebar, select “Import from…”.
2. Select “Redshift” from the list.
3. Select "GPT SQL Builder".
4. Enter your prompt/query in the "Describe what you want to query" box.
ℹ️ PRO TIP: Be specific when entering your prompts so that the AI can easily understand your requirements and provide more accurate results.
5. The SQL Builder will automatically generate and write the SQL query for you in the blue text box.
ℹ️ NOTE: Click "Refresh Preview" to display a sample of your data results (only 50 rows are shown) or to update the preview after you change the query.
6. You will be prompted to give your import a name. The name must be unique, as it will also be the name of the tab in your Google Sheets/Excel workbook (you can change it later if needed).
7. Congratulations on your Redshift import using Coefficient's GPT SQL Builder! 🎉
ℹ️ See GPT SQL Builder to learn more!
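The 50-row preview mentioned above can be approximated in code. The helper below is a hypothetical sketch for illustration only (it is not Coefficient's actual preview logic), showing one way to sample an arbitrary query safely:

```python
def preview_query(sql: str, limit: int = 50) -> str:
    """Wrap an arbitrary SELECT so only a sample of rows is returned.

    Hypothetical helper; Coefficient's real preview implementation is not
    public. Wrapping the query in a subquery avoids conflicts with any
    LIMIT or ORDER BY already present in the user's SQL.
    """
    inner = sql.rstrip().rstrip(";")  # drop a trailing semicolon, if any
    return f"SELECT * FROM ({inner}) AS preview LIMIT {limit}"

print(preview_query("SELECT id, email FROM users WHERE active = true;"))
# SELECT * FROM (SELECT id, email FROM users WHERE active = true) AS preview LIMIT 50
```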
Schedule your Import, Snapshots, and Add Automations
Once you have pulled your data into your spreadsheet using Coefficient, you can set up scheduled refreshes, snapshots, and automations.
FAQs for Redshift Integration
I keep getting an error when I try to connect Redshift to Coefficient. What is wrong?
Please review the following, then try connecting again:
- Make sure that you have entered the correct Hostname, Database Name, Username, Password, and Port for your Redshift cluster. To find the Hostname and Port:
- Sign in to the AWS Console
- Type “Amazon Redshift” into the search bar and click on “Amazon Redshift” (listed under Services)
- Select the cluster you would like to connect to
- The **Endpoint URL** contains the Hostname and Port of your Redshift cluster.
- Ensure that your Redshift cluster has the Publicly accessible setting enabled. You might also need to whitelist Coefficient’s IP addresses on your Redshift cluster’s Security Group and/or VPC network ACL. For detailed instructions, see this AWS support article.
- Coefficient IP Addresses (All 3 need to be whitelisted)
- 34.217.184.131
- 44.234.233.60
- 52.32.132.51
When you connect Coefficient to Redshift, do you maintain access?
When Coefficient needs to run a query, we establish a connection to your database, run the query on your behalf, and terminate the connection once the query completes.
I added a table (or column) in my Redshift database; why is it not showing up in Coefficient?
To deliver a snappy experience when you set up imports from Redshift, we cache your database schema for up to 24 hours. If you recently changed your database schema (e.g., added a table/column, renamed a table/column, etc.), and you don't see the change reflected in Coefficient, you can force a schema reload:
- Open the Coefficient sidebar in Google Sheets/Excel.
- Click on the ≣ menu in the top right of the sidebar, then click on “Connected Sources”.
- Click on your Redshift connection to see its Connection Settings page.
- Click on the ︙ button near the top right, choose “Reload Schema”, and click “Reload” on the confirmation dialog.
My custom SQL script runs longer than expected, and sometimes I see a "SQL Error - canceling statement due to statement timeout" error when I refresh my import. What should I do?
The error message you're seeing indicates that the SQL query you're trying to execute is being canceled due to a statement timeout. This means that the query is taking too long to execute, and your database server is configured to cancel any query that exceeds a certain execution time threshold.
Here are some steps you can take to understand and fix this issue:
- Examine the Execution Plan: Use the EXPLAIN command to get the execution plan for your query. This will show you where the query might be inefficient, such as performing full table scans or using nested loops that could be optimized. (Click here to learn more about the EXPLAIN command with Redshift).
- Optimize the Query: Look for ways to make the query more efficient. In Redshift this could involve choosing sort keys and distribution styles suited to the columns used in your WHERE and ILIKE conditions (Redshift does not use traditional indexes), rewriting the query to reduce complexity, or breaking it into smaller parts.
- Reduce the Dataset: If possible, limit the scope of the query. For example, if you're querying a large date range or a large number of rows, narrow the range in your WHERE clause or cap the result set with LIMIT.
- Increase the Statement Timeout (not recommended): If you have control over the database server settings, you can increase the statement timeout value. This is a temporary solution and may not be ideal if the query is inherently inefficient.
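The first and last steps above amount to running short helper statements against your cluster. The sketch below builds those statements as strings (the sample query is a placeholder; `statement_timeout` is a standard Redshift session parameter, in milliseconds):

```python
def explain(sql: str) -> str:
    """Prefix a query with EXPLAIN so Redshift returns its execution
    plan instead of running the query."""
    return "EXPLAIN " + sql.strip()

def set_timeout(ms: int) -> str:
    """Build a session-level statement_timeout override (milliseconds).
    Raising the timeout is a stopgap; optimizing the query is preferable."""
    return f"SET statement_timeout TO {ms};"

print(explain("SELECT * FROM sales WHERE region ILIKE '%west%'"))
# EXPLAIN SELECT * FROM sales WHERE region ILIKE '%west%'
print(set_timeout(120000))
# SET statement_timeout TO 120000;
```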