Real-World Example: Retail Data Product
- Hub_Customer: Stores unique ‘Customer_ID’
- Hub_Product: Stores ‘Product_ID’
- Link_Purchase: Joins Customer and Product hubs via ‘Transaction_ID’
- Satellite_Customer_Info: Contains name, address, and email
- Satellite_Purchase_Detail: Stores time, quantity, and payment method
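As a rough illustration, the five tables listed above could be laid out as follows using Python's built-in sqlite3 module. This is only a sketch: the lower-case table names, the `load_date` and `record_source` audit columns, and the detail column names (`purchase_time`, `quantity`, `payment_method`) are assumptions made for the example, and surrogate hash keys are left out for brevity.

```python
import sqlite3

# Illustrative schema only. load_date and record_source are assumed audit
# columns; the business keys double as primary keys to keep the sketch short.
SCHEMA = """
CREATE TABLE hub_customer (
    customer_id   TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_id    TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_purchase (
    transaction_id TEXT PRIMARY KEY,
    customer_id    TEXT NOT NULL REFERENCES hub_customer(customer_id),
    product_id     TEXT NOT NULL REFERENCES hub_product(product_id),
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL
);
CREATE TABLE sat_customer_info (
    customer_id TEXT NOT NULL REFERENCES hub_customer(customer_id),
    load_date   TEXT NOT NULL,
    name        TEXT,
    address     TEXT,
    email       TEXT,
    PRIMARY KEY (customer_id, load_date)  -- one row per version
);
CREATE TABLE sat_purchase_detail (
    transaction_id TEXT NOT NULL REFERENCES link_purchase(transaction_id),
    load_date      TEXT NOT NULL,
    purchase_time  TEXT,
    quantity       INTEGER,
    payment_method TEXT,
    PRIMARY KEY (transaction_id, load_date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created, in alphabetical order
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Note how the descriptive attributes live only in the satellites, while the hubs and the link carry nothing but keys and load metadata.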
Now let’s do a more nuanced example: Integrating marketing and sales data
To illustrate how Data Vault works in practice, let’s walk through a simplified real-world scenario. Imagine a company wants to integrate data from multiple source systems, for example a sales database and a marketing platform, to build a unified view of customer activity. The sales system contains transactions (who bought what, when, and for how much), and the marketing system contains information on campaigns and customer outreach (who was targeted, how they responded, etc.). In a traditional model, combining these would involve either creating a complex single schema or building pipelines to a common data lake and then cleaning and joining the data for each use case. With Data Vault, we can bring the two sources together under shared business keys while preserving all the history.
First, we identify the key business entities in this domain, along with the business keys that identify them. Obvious ones are Customer, Product, Campaign, and Sale. In Data Vault, each of these becomes a Hub. We create a Customer Hub storing unique customer IDs, a Product Hub for product IDs, a Campaign Hub for campaign IDs, and a Sales Hub (if we treat each sale as an identifiable business object; alternatively, Sale could be modeled as a link between Customer and Product). Each hub will contain the list of all unique IDs from any source that references that entity. For instance, the Customer Hub will collect customer identifiers coming from both the sales system and the marketing system (even if one system calls it “client” and the other “user”, they both map to the unified Customer Hub).
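The hub-loading idea can be sketched in a few lines of Python. The example is deliberately simplified: it assumes both systems already use the same customer ID scheme (in practice a mapping step may be needed), and the `load_hub` helper, the trim-and-uppercase normalization rule, and the sample IDs are all hypothetical.

```python
def load_hub(hub, business_keys, record_source, load_date):
    """Insert business keys not yet in the hub; existing keys are untouched.

    Returns the number of newly inserted keys.
    """
    inserted = 0
    for key in business_keys:
        normalized = key.strip().upper()  # assumed normalization rule
        if normalized not in hub:
            hub[normalized] = {
                "load_date": load_date,
                "record_source": record_source,  # first source to report the key
            }
            inserted += 1
    return inserted


customer_hub = {}

# The sales system calls the field "client", the marketing system "user";
# both feed the same Customer Hub.
sales_clients = ["C001", "C002"]
marketing_users = ["c002", "C003"]

new_from_sales = load_hub(customer_hub, sales_clients, "sales", "2024-01-05")
new_from_marketing = load_hub(customer_hub, marketing_users, "marketing", "2024-01-06")

print(sorted(customer_hub))  # ['C001', 'C002', 'C003']
print(new_from_sales, new_from_marketing)  # 2 1
```

Customer C002 appears in both systems but is stored once, with the record source of whichever load saw it first.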
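Next, we model the relationships as Links. We might have a Customer-Campaign Link representing that a customer was targeted by or responded to a marketing campaign. If Sale is modeled as its own hub, links would similarly connect each sale to the customer who made the purchase and to the products bought.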
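A link row is essentially the combination of the hub keys it connects. One possible sketch follows; deriving the link key as a SHA-256 hash of the normalized business keys is an assumption of this example (hashing is a common choice, but nothing above prescribes it), and `link_key`, `load_link`, and the sample IDs are hypothetical.

```python
import hashlib


def link_key(*business_keys):
    """Derive a deterministic key from the hub keys a link connects."""
    joined = "||".join(key.strip().upper() for key in business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


link_customer_campaign = {}


def load_link(customer_id, campaign_id, record_source, load_date):
    """Record the relationship once; re-loading the same pair is a no-op."""
    key = link_key(customer_id, campaign_id)
    # setdefault keeps the first-seen row, so the load is idempotent
    link_customer_campaign.setdefault(key, {
        "customer_id": customer_id.strip().upper(),
        "campaign_id": campaign_id.strip().upper(),
        "record_source": record_source,
        "load_date": load_date,
    })
    return key


first = load_link("C001", "SPRING-24", "marketing", "2024-03-01")
again = load_link("c001 ", "spring-24", "marketing", "2024-03-02")

print(first == again)               # True
print(len(link_customer_campaign))  # 1
```

The link itself holds no descriptive data; details of each marketing touch would go into a satellite attached to the link.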
The satellites provide the detail from each source. For example, the sales source system might feed a Sales Satellite, connected to the Sales Hub, that contains facts like quantity, price, sale date, and payment method for each transaction. There might also be a Customer Satellite (Sales) that provides customer info from the sales source system (e.g. customer address), and a Product Satellite for product attributes (name and category from the sales system’s catalog). The marketing source system would contribute its own satellites: e.g. a Customer Satellite (Marketing) that stores customer profile info or engagement scores from the marketing system, and a Campaign Satellite that stores campaign details (campaign name, start/end date, channel, etc.) attached to the Campaign Hub. It could also have a link satellite if needed (for example, a Customer-Campaign Link Satellite capturing details of each marketing touch, like the response or click-through information). Each satellite is loaded from its respective source system, with a load timestamp on every row so we retain the history of changes. For instance, if a customer’s address changes in the sales system, a new row is added to the Customer Satellite (Sales) with the new address and the date of change, while the Customer Hub remains the same.
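The address-change behaviour described above is insert-only versioning: a new satellite row is written only when the attributes differ from the latest stored version. A minimal sketch, assuming loads arrive in chronological order and using ISO date strings so that string comparison matches date order (`load_satellite`, `as_of`, and the sample addresses are hypothetical):

```python
def load_satellite(satellite, hub_key, attributes, load_date):
    """Append a new version only if attributes differ from the latest one.

    Returns True when a row was added, False when nothing changed.
    """
    history = satellite.setdefault(hub_key, [])
    if history and history[-1]["attributes"] == attributes:
        return False  # unchanged: existing rows are never updated or deleted
    history.append({"load_date": load_date, "attributes": dict(attributes)})
    return True


def as_of(satellite, hub_key, date):
    """Return the attributes in effect on the given ISO date, or None."""
    rows = [r for r in satellite.get(hub_key, []) if r["load_date"] <= date]
    return rows[-1]["attributes"] if rows else None


sat_customer_sales = {}
load_satellite(sat_customer_sales, "C001", {"address": "1 Main St"}, "2024-01-05")
load_satellite(sat_customer_sales, "C001", {"address": "1 Main St"}, "2024-02-01")  # no change
load_satellite(sat_customer_sales, "C001", {"address": "9 Oak Ave"}, "2024-03-10")

print(len(sat_customer_sales["C001"]))                 # 2
print(as_of(sat_customer_sales, "C001", "2024-02-15"))  # {'address': '1 Main St'}
print(as_of(sat_customer_sales, "C001", "2024-03-10"))  # {'address': '9 Oak Ave'}
```

Because old rows are kept, the satellite can answer what the address was on any past date, and the hub entry for C001 never has to change.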
With this Data Vault model, the company can now answer questions like “Which marketing campaigns did each customer see before they purchased product X?” or “What is the full history of interactions (marketing and sales) for customer xyz over time?” The beauty is that if tomorrow a new source comes along, such as a customer support system with tickets, we can integrate it easily. We may add a Support Ticket Hub and a Customer-Ticket Link, as well as satellites for ticket details and any customer info from support. This won’t break the existing model; it just extends it. The marketing and sales data vault we built remains intact and continues to operate. New queries can now leverage the support data too, by connecting through the shared Customer Hub. The Data Vault ensures all data is connected via common business keys (customers, products, etc.), but each source feeds its own satellites, so we maintain the lineage and original fidelity of each source. The result is an enterprise data warehouse that reflects the latest loaded state of every source system, is historically accurate, and is ready to feed any number of downstream data products.
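To make the first question concrete, here is a small sketch using Python's built-in sqlite3 module. The table layout is heavily simplified and assumed for the example (Sale is modeled as a link with its own satellite, marketing touches sit in a link satellite, and audit columns are omitted); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_id TEXT PRIMARY KEY);
CREATE TABLE link_sale (sale_id TEXT, customer_id TEXT, product_id TEXT);
CREATE TABLE sat_sale (sale_id TEXT, sale_date TEXT);
CREATE TABLE sat_customer_campaign (
    customer_id TEXT, campaign_id TEXT, touch_date TEXT
);
INSERT INTO hub_customer VALUES ('C001'), ('C002');
INSERT INTO link_sale VALUES ('S1', 'C001', 'X'), ('S2', 'C002', 'Y');
INSERT INTO sat_sale VALUES ('S1', '2024-04-15'), ('S2', '2024-04-20');
INSERT INTO sat_customer_campaign VALUES
    ('C001', 'SPRING', '2024-03-01'),
    ('C001', 'SUMMER', '2024-06-01'),
    ('C002', 'SPRING', '2024-03-02');
""")


def campaigns_before_purchase(conn, product_id):
    """Campaign touches that preceded each customer's purchase of a product."""
    query = """
        SELECT h.customer_id, t.campaign_id
        FROM hub_customer AS h
        JOIN link_sale AS ls ON ls.customer_id = h.customer_id
        JOIN sat_sale AS ss ON ss.sale_id = ls.sale_id
        JOIN sat_customer_campaign AS t ON t.customer_id = h.customer_id
        WHERE ls.product_id = ?
          AND t.touch_date < ss.sale_date  -- ISO dates sort as text
        ORDER BY h.customer_id, t.touch_date
    """
    return conn.execute(query, (product_id,)).fetchall()


print(campaigns_before_purchase(conn, "X"))  # [('C001', 'SPRING')]
```

Sales and marketing data never reference each other directly; the join works only because both sides hang off the shared Customer Hub key. A support-ticket source would plug in the same way.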
The business now has a 360-degree view of customers across sales and marketing, built on a future-proof data warehouse design.