Real-Time Data Warehouse ETL (Part 1)

The cheapest and easiest way to solve the real-time ETL problem is not to attempt it at all, but we live in the real world, and businesses are saying that:

·      Viewing yesterday’s data is no longer sufficient.

·      They need a faster reaction time to respond to threats and opportunities.

To meet client demand for real-time data, I needed to look into modifying my extract, transform, and load (ETL) process. One of the most difficult parts of building any data warehouse is the ETL of data from the source systems, and performing that ETL in real time introduces additional challenges. Almost all ETL tools and systems, whether based on off-the-shelf products or custom-coded, operate in batch mode. They assume that the data becomes available as some sort of extract file on a fixed schedule, usually hourly, nightly, weekly, or monthly. The system then transforms and cleanses the data and loads it into the data warehouse.

When loading data continuously in real time, there can't be any system downtime, and the heaviest periods of data warehouse usage may very well coincide with the peak periods of incoming data. These requirements for continuous updates with no warehouse downtime are generally inconsistent with traditional ETL tools and systems.
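To make the traditional batch pattern concrete, here is a minimal Python sketch. It is not how SSIS implements a package. It assumes a hypothetical CSV extract with `account` and `amount` columns and a hypothetical warehouse table `dw_gl`, and it uses SQLite as a stand-in for the warehouse. The whole file is parsed and cleansed first, and then the target table is fully reloaded in a single transaction.

```python
import csv
import io
import sqlite3


def batch_load(conn: sqlite3.Connection, extract_text: str) -> int:
    """Parse a whole extract file, cleanse it, and fully reload dw_gl (hypothetical table)."""
    clean_rows = []
    for rec in csv.DictReader(io.StringIO(extract_text)):
        account = rec["account"].strip().upper()  # cleanse: normalize account codes
        try:
            amount = round(float(rec["amount"]), 2)  # cleanse: numeric amounts only
        except ValueError:
            continue  # skip rows that fail validation
        clean_rows.append((account, amount))

    # Transform/cleanse is done; now load. The delete and the inserts commit together.
    with conn:
        conn.execute("DELETE FROM dw_gl")
        conn.executemany("INSERT INTO dw_gl (account, amount) VALUES (?, ?)", clean_rows)
    return len(clean_rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dw_gl (account TEXT, amount REAL)")
    extract = "account,amount\n 1000-00 ,100.50\n2000-00,abc\n3000-00,25\n"
    print(batch_load(conn, extract))  # 2
    print(conn.execute("SELECT account, amount FROM dw_gl ORDER BY account").fetchall())
    # [('1000-00', 100.5), ('3000-00', 25.0)]
```

In this SQLite stand-in, other writers are blocked while the reload transaction is open. That is a small-scale version of the downtime problem described above.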

One solution is to settle for near-real-time updates by increasing the frequency of the data loads. Weekly loads can be replaced by daily loads, for example, or daily loads by hourly or even more frequent ones. This is generally a much easier and cheaper proposition than trying to update in real time. For one thing, it doesn't require acquiring a whole new ETL tool set and changing your perspective from a query/pull ETL process to an event-driven architecture.

I work in the Microsoft BI stack, so I develop my ETL processes using SQL Server Integration Services (SSIS). To meet increasing client demand for real-time data integration into their data warehouses, however, I needed to learn a technology that was new to me. Enter SQL Server Service Broker. So what is Service Broker?

·      A core component of SQL Server since SQL Server 2005.

·      Used internally within SQL Server (Database Mail, etc.).

·      A mechanism for the guaranteed, asynchronous delivery and processing of messages between databases, which enables event-driven processing (for example, reacting to changes in a database table).

I didn’t expect to replace the SSIS batch ETL process; more likely, I would augment it with Service Broker.

Pros & Cons of using Service Broker as an ETL tool

Pros
  • Flexible
  • Included in all editions (Express or greater)
  • Fast & efficient delivery
  • Reliable delivery
  • Secure (Windows authentication or certificate-based encryption)

Cons
  • Relatively high implementation cost (custom development)
  • Can require some tuning and manipulation to get it to perform well
  • Lacks good GUI tools & wizards
  • Lacks good automated monitoring tools

Service Broker Learning Resources

Unlike SSIS, where a Google search quickly turns up help with a package error, a fix, or something new to learn, Service Broker has limited learning resources. I have compiled some of the best ones that I have found.

http://www.databasejournal.com/article.php/1503191/ – Marcin Policht’s articles on Database Journal include several that cover Service Broker, as part of a larger series on SQL Server 2005 Express Edition:

1) Part 26 – Introduction to Service Broker: http://www.databasejournal.com/features/mssql/article.php/3757581/

2) Part 27 – Implementing Basic Service Broker Objects: http://www.databasejournal.com/features/mssql/article.php/3759311/

3) Part 28 – Implementing Service Broker Conversation: http://www.databasejournal.com/features/mssql/article.php/3762261/

In Part 2 of this series on implementing Service Broker as a real-time ETL tool, we will look at the code that I used to integrate Dynamics GP General Ledger data changes into the BI360 Data Warehouse. If you need help kick-starting your transition to a data-driven organization by exploring your Business Intelligence, BI360, Power BI, Microsoft Dynamics GP, and CRM needs, please contact KTL Solutions at 301.360.0001 or by email at info@ktlsolutions.com.


BARRY CROWELL, MBA | BI/EDW Solution Architect

Barry uses his expertise to lead and teach a team of GP consultants to assure our solution scoping, architecting, and delivery meet and exceed the customer’s expectations. He helps clients analyze their current business processes and provides them with process improvements. He has performed implementations as the lead consultant and end-user project manager. Barry has over 20 years of experience working in accounting and the Microsoft Dynamics GP industry. He also possesses a bachelor’s degree in accounting and business administration from Black Hills State University and a master’s in business administration from La Salle University.
