Organizations deploy GoldenGate to replicate data between heterogeneous databases in both on-premise and public cloud environments, using a change-data-capture (CDC) style of data integration. Amazon Redshift is a cloud-based, petabyte-scale data warehouse that requires large, frequent data loads for business intelligence and operational reports. Combining GoldenGate and Redshift into a real-time data warehousing platform is therefore a natural fit. This article is the first of a two-part series on how to deploy GoldenGate for populating Amazon Redshift, and it focuses on a high-level solution overview. Part 2 of this series will provide details of a sample implementation.

Data Integration Challenges of Amazon Redshift

Amazon Redshift is derived from PostgreSQL v8.0.2, but it has been rewritten and optimized for analytics. Redshift poses a few challenges when it comes to data integration:

- Primary key constraints are not enforced. A primary key can be defined in DDL, but Redshift only keeps the definition in the data dictionary, so it is possible to have duplicate rows even when a table's DDL defines a primary key.
- Updating a row is very slow compared to an INSERT.
- DML updates to a table contend heavily with the queries typically found in an analytics EDW.
- DELETE is logical, and query performance degrades after logical deletes.
- Table-level VACUUM and ANALYZE commands are needed to reorganize a table's physical structures for optimized queries.

Even though Oracle GoldenGate has delivery support for PostgreSQL, from which Redshift is derived, using the PostgreSQL delivery against Redshift is not going to work: continuous DML writes to Redshift target tables perform poorly, and primary key constraints are not enforced. The most efficient way to load millions or billions of rows into Redshift tables is to load files from Amazon S3, and Redshift provides the COPY command to support this approach.

Solution Overview

Amazon published a document, "Best Practices for Micro-Batch Loading on Amazon Redshift", with the following conceptual design elements:

- Each stream of record changes is exported as files on S3.
- Each target table has a corresponding staging table.
- The Redshift COPY command loads data into the staging tables, which does not contend with the database queries issued by reports and BI dashboards.
- Records in the staging table are merged into the target table when database query activity is low.
- VACUUM and ANALYZE commands are then executed on the target table.

The realization of this best practice through Oracle GoldenGate can be depicted in the following diagram. Data changes flow from a source Oracle (or non-Oracle) database on premise to Redshift in these steps:

1. GoldenGate is configured on the source database to capture data changes and store them as GoldenGate trail files.
2. A GoldenGate data pump running on premise sends trail records to a GoldenGate Application Adapter instance running on an EC2 Linux instance.
3. The GoldenGate Application Adapter converts trail records into a per-source-table set of control and data files on Amazon S3. Each control file contains a list of data file names.
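The micro-batch load pattern from the Solution Overview can be sketched in Redshift SQL. All object names (`stage_orders`, `orders`, the bucket path, the IAM role ARN) are placeholders, and the merge here is the common delete-then-insert idiom rather than anything prescribed by the article:

```sql
-- Load a micro-batch from S3 into the staging table.
-- Assumes a JSON manifest on S3 listing the data files for this batch.
COPY stage_orders
FROM 's3://my-bucket/batches/orders.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
MANIFEST
CSV;

-- Merge staged changes into the target when query activity is low:
-- delete the rows being replaced, then insert the new versions.
BEGIN;
DELETE FROM orders
USING stage_orders
WHERE orders.order_id = stage_orders.order_id;
INSERT INTO orders SELECT * FROM stage_orders;
TRUNCATE stage_orders;
COMMIT;

-- Reclaim space left by logical deletes and refresh planner statistics.
VACUUM orders;
ANALYZE orders;
```

Because COPY writes only to the staging table and the merge runs in a short transaction during quiet periods, reports and BI dashboards querying the target table see minimal contention.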
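The per-source-table control and data file layout described in the last step can be sketched in Python. This is a local-filesystem illustration, not the adapter itself; the file-naming scheme (`table.batch.csv` and a `table.control` file that accumulates data file names) is an assumption for the example:

```python
import csv
import tempfile
from pathlib import Path

def write_batch(out_dir, table, rows, batch_id):
    """Write one data file for a source table and append its name to the
    table's control file, mimicking the per-table control/data file layout
    the adapter produces on S3 (hypothetical naming, local files only)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # One data file per micro-batch of change records.
    data_file = out / f"{table}.{batch_id}.csv"
    with data_file.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    # The control file lists every data file produced for this table.
    control_file = out / f"{table}.control"
    with control_file.open("a") as f:
        f.write(data_file.name + "\n")
    return data_file, control_file

# Usage: two micro-batches of change records for one source table.
demo = tempfile.mkdtemp()
write_batch(demo, "orders", [("1", "I", "100")], 1)
d, c = write_batch(demo, "orders", [("2", "U", "200")], 2)
print(c.read_text().splitlines())  # ['orders.1.csv', 'orders.2.csv']
```

The control file gives the loader a complete, ordered list of data files to pick up for each table, so a batch can be loaded exactly once even if files arrive over time.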