Every organization past a certain scale gets pitched the same dream: a unified data warehouse where all truth lives. One system. One schema. One place to govern, one place to query, one place to get the answer. The pitch makes sense in theory. The reality is different. The more unified the warehouse, the more cargo-cultist the governance becomes. Rules pile up. Query performance degrades. Nobody trusts the data because nobody understands where it came from.
We have been inside thirty-odd organizations at this scale in the past two years. The pattern holds. The teams winning at data are not the ones with the most unified warehouse. They are the ones with clear contracts between domains, with lineage they can actually trace, and with semantic layers that sit on top of messy reality instead of pretending the mess does not exist.
The unified warehouse premise rests on a false constraint: the assumption that you can make everyone agree on a single schema. That works until your organization grows past seventy people. After that, you have domain experts in different systems who have different incentives. What Sales calls a "customer" is not what Finance calls a "customer". Engineering and Operations measure the same thing in different ways. The schema that is supposed to make everyone happy makes nobody happy.
The hidden cost is process. As the warehouse grows, governance gets stricter to keep it from becoming a disaster. Data owners have to review every change. Lineage becomes a checklist instead of an automated byproduct. Pipeline changes have to go through review boards. What started as a liberating single source of truth becomes a gating function.
"A single warehouse is not a technical problem you solve. It is an organizational problem you manage away."
Data Practice
The result is people working around it. Finance keeps its own data mart. Sales has analytics in a different platform. Engineering runs its own dashboards on raw data. The unified warehouse becomes an artifact of governance, not a source of truth.
The pattern that works at scale is different. Keep the data close to the teams that own it. Give each domain clear responsibility for its own tables and schemas. Put a semantic layer on top that translates between the different representations. Use contracts and lineage tooling to make the relationships explicit.
This sounds more complex. It is more complex in one dimension, the dimension of distributed coordination. It is simpler in every other dimension. Query performance is better because you are not running against one massive warehouse. Governance is faster because you do not need a review board to change someone else's schema. Trust is higher because lineage is actually visible.
Domain ownership. Each team owns the truth for its domain. Finance owns the general ledger and the chart of accounts. Sales owns the customer master and the opportunity stage definitions. Engineering owns the deployment manifest and the service registry. Nobody else modifies these tables without explicit approval from the domain owner. This is not a request. It is a contract.
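A minimal sketch of what such a contract could look like in code. Everything here is hypothetical and for illustration only: the `TableContract` record, the `can_modify` and `breaking_changes` helpers, and the table and column names are assumptions, not the API of any particular contract tool.

```python
from dataclasses import dataclass


@dataclass
class TableContract:
    """Hypothetical contract record: who owns a table and what it promises."""
    table: str
    owner: str      # owning domain team, e.g. "finance"
    columns: dict   # column name -> declared type


def can_modify(contract, requested_by, approvals=frozenset()):
    """The owner may always change its table; anyone else needs the owner's approval."""
    return requested_by == contract.owner or contract.owner in approvals


def breaking_changes(contract, proposed_columns):
    """List changes that would break consumers: removed or retyped columns."""
    problems = []
    for name, declared in contract.columns.items():
        if name not in proposed_columns:
            problems.append(f"removed column: {name}")
        elif proposed_columns[name] != declared:
            problems.append(
                f"type change: {name} {declared} -> {proposed_columns[name]}"
            )
    return problems


# Example contract for a Finance-owned table (illustrative names only).
ledger = TableContract(
    table="general_ledger",
    owner="finance",
    columns={"account_id": "string", "amount": "decimal"},
)
```

With this in place, `can_modify(ledger, "sales")` is false until Finance appears in `approvals`, and `breaking_changes` flags a proposal that drops or retypes a column while letting purely additive changes through. A check like this would typically run in CI on the pipeline repository, which is what keeps the contract from becoming another manual review step.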
Semantic translation. When Sales needs to join customer data with Finance data, a semantic layer translates between the two representations. The sales customer has different attributes than the finance customer. The semantic layer makes the mapping explicit. dbt models, or a dedicated semantic layer tool such as Looker, can do this translation. The cost is maintenance. The benefit is decoupling. Finance can change its schema without breaking Sales.
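To make the decoupling concrete, here is a small sketch in plain Python rather than dbt or Looker. The column names, the two mapping dictionaries, and the assumption that both domains share a common customer key are all invented for illustration. In practice the mapping would live in your semantic layer tool.

```python
# Hypothetical mappings: shared semantic name -> physical column in each domain.
SALES_CUSTOMER = {"customer_key": "account_id", "customer_name": "account_name"}
FINANCE_CUSTOMER = {"customer_key": "billing_entity_id", "customer_name": "legal_name"}


def to_semantic(row, mapping):
    """Rename a domain row's physical columns to the shared semantic names."""
    return {semantic: row[physical] for semantic, physical in mapping.items()}


def join_on_customer(sales_rows, finance_rows):
    """Inner-join the two domains on the semantic customer_key."""
    finance_by_key = {}
    for row in finance_rows:
        translated = to_semantic(row, FINANCE_CUSTOMER)
        finance_by_key[translated["customer_key"]] = translated

    joined = []
    for row in sales_rows:
        sales = to_semantic(row, SALES_CUSTOMER)
        finance = finance_by_key.get(sales["customer_key"])
        if finance is not None:
            joined.append({
                "customer_key": sales["customer_key"],
                "sales_name": sales["customer_name"],
                "finance_name": finance["customer_name"],
            })
    return joined
```

If Finance renames `legal_name`, only the `FINANCE_CUSTOMER` mapping changes. The join and every consumer of the semantic names stay untouched, which is the decoupling the semantic layer buys you.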
Lineage as infrastructure. Every transformation logs where its inputs came from and where its outputs went. This is not audit logging. This is operational data lineage that your team actually uses to debug issues. When a number is wrong, you follow the lineage back to the source. If lineage is not automated, nobody maintains it. Make it automatic.
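One way to make lineage an automatic byproduct is to record it at the point where a transformation is declared. The sketch below is an assumption-heavy illustration: the `transformation` decorator, the in-memory `LINEAGE` graph, and the dataset names are hypothetical, and a real setup would more likely rely on the lineage your orchestration or transformation tool already emits.

```python
from collections import defaultdict
from functools import wraps

# In-memory lineage graph: output dataset -> set of direct input datasets.
LINEAGE = defaultdict(set)


def transformation(inputs, output):
    """Decorator that records lineage edges every time the step runs."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            LINEAGE[output].update(inputs)
            return result
        return wrapper
    return decorate


def upstream(dataset):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in LINEAGE.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


@transformation(inputs=["sales.opportunities"], output="marts.pipeline")
def build_pipeline(rows):
    # Keep only opportunities that are still live.
    return [r for r in rows if r["stage"] != "closed_lost"]


@transformation(inputs=["marts.pipeline", "finance.general_ledger"],
                output="reports.forecast")
def build_forecast(pipeline_rows):
    # Placeholder calculation: count the open opportunities.
    return len(pipeline_rows)


pipeline = build_pipeline([{"stage": "open"}, {"stage": "closed_lost"}])
build_forecast(pipeline)
print(sorted(upstream("reports.forecast")))
# → ['finance.general_ledger', 'marts.pipeline', 'sales.opportunities']
```

Nobody had to maintain a lineage document here. Running the pipeline produced the graph, and `upstream` is the "follow it back to the source" step when a number in the forecast looks wrong.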
Most organizations we work with end up with a three-tier structure. Not because it is elegant, but because it is what works operationally.
Governance becomes faster. A change to the Sales mart affects only the Sales team and their consumers. It does not trigger a review of the entire warehouse schema.
Scaling becomes easier. When you hire a new team, you onboard their data. You do not redesign the warehouse. You add a new domain mart with its own ownership and schemas.
Performance stays consistent. You are not running against one growing monolith. Each domain owns its own infrastructure. Teams can scale their parts independently.
Quality control is distributed. Each domain is accountable for the quality of its data. The central team handles the contracts and the semantic layer, not the data itself.
Federated approaches feel less tidy than unified ones. In the conversation with leadership, you need to be clear about what you are trading. A unified warehouse looks simpler on a whiteboard. A federated approach with clear contracts and semantic layers is simpler in practice, where people actually work. The whiteboard will always lose that fight unless you are explicit about operational reality.
Frame it this way: unified architecture optimizes for simplicity of governance. Federated architecture optimizes for speed of delivery. In most organizations, speed of delivery wins in the market. Governance catches up later.
The single source of truth is a fine goal. The single warehouse is not the way to get there. Teams that are winning are using contracts, semantic layers, and distributed data ownership. It requires discipline. It requires good tooling. But it scales.