How to Enable Snowflake Data Shares from Databricks (AWS RDS Source)
Practical architectures to expose Databricks-managed data to Snowflake consumers when your system of record is AWS RDS—trade-offs, onboarding complexity, and a simpler alternative.
Many providers store operational data in Amazon RDS and use Databricks for transformation, governance, and ML. Yet many of their clients consume data in Snowflake and prefer native Snowflake Data Shares. This guide explains how to publish Databricks-governed data to Snowflake consumers and compares multiple approaches in terms of engineering effort, cost, and reliability.
Context and Goals
- Source system: AWS RDS (e.g., PostgreSQL/MySQL)
- Primary platform: Databricks (Delta Lake + Unity Catalog)
- Consumer preference: Snowflake data sharing (zero-copy experience inside Snowflake)
- Goal: Share governed, curated datasets with Snowflake consumers while minimizing data drift, latency, and operational load.
Baseline Architecture
1. Ingest from AWS RDS into a bronze layer (incremental/CDC).
2. Transform into curated Delta tables (silver/gold) in Databricks.
3. Publish governed views/objects to consumers using Unity Catalog.
The question is: how do we expose step 3 to Snowflake consumers in a way that is reliable and economical?
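Before weighing the options, here is a minimal sketch of step 1 (an incremental pull from RDS into a bronze Delta table) to ground the rest of the discussion. The `orders` table, `updated_at` watermark column, secret scope, and endpoint are illustrative assumptions, not a prescribed design.

```python
# Minimal sketch of step 1: incremental pull from AWS RDS (PostgreSQL) into a bronze Delta table.
# Table, column, and secret-scope names below are hypothetical.
from pyspark.sql import functions as F

TARGET = "bronze.orders"

# High-water mark from the previous run; fall back to the epoch on the first load.
last_ts = (
    spark.table(TARGET).agg(F.max("updated_at")).first()[0]
    if spark.catalog.tableExists(TARGET)
    else "1970-01-01 00:00:00"
)

incremental = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<rds-endpoint>:5432/appdb")
    .option("dbtable", f"(SELECT * FROM orders WHERE updated_at > '{last_ts}') AS src")
    .option("user", dbutils.secrets.get("rds", "user"))
    .option("password", dbutils.secrets.get("rds", "password"))
    .load()
)

# Append-only bronze layer; dedup/merge into silver happens in a separate job.
incremental.write.format("delta").mode("append").saveAsTable(TARGET)
```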
Option 1: Replicate Curated Data into Snowflake, Then Share Natively
Create canonical datasets in Snowflake (copies of your curated Delta tables), then use Snowflake Data Sharing to expose them to consumers.
Implementation Paths
- AWS DMS → S3 → Snowpipe: Use DMS for CDC from RDS or curated exports, land files in S3, ingest with Snowpipe/Tasks into Snowflake.
- Databricks → Snowflake connector (JDBC): Databricks jobs write curated tables directly to Snowflake using the Spark–Snowflake connector; see the sketch after this list.
- ETL Vendor (e.g., Fivetran/Matillion): Managed replication from Databricks outputs or RDS into Snowflake.
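As a concrete illustration of the second path, the sketch below pushes a curated gold table into a native Snowflake table using the Spark–Snowflake connector bundled with Databricks runtimes (short format name `snowflake`; outside Databricks, the connector class is `net.snowflake.spark.snowflake`). Database, warehouse, role, table, and secret names are hypothetical.

```python
# Sketch: Databricks job writes a curated Delta table into a native Snowflake table.
# After the load, the Snowflake table can be exposed through a standard Snowflake share.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("snowflake", "user"),
    "sfPassword": dbutils.secrets.get("snowflake", "password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "CURATED",
    "sfWarehouse": "LOAD_WH",
    "sfRole": "PROVIDER_LOADER",
}

curated = spark.table("gold.orders_curated")

(
    curated.write.format("snowflake")   # Databricks short name for the Spark-Snowflake connector
    .options(**sf_options)
    .option("dbtable", "ORDERS_CURATED")
    .mode("overwrite")                  # use "append" (or a staged MERGE) for incremental loads
    .save()
)
```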
Pros
- Native Snowflake sharing: Best consumer UX once data is inside Snowflake.
- Predictable governance inside Snowflake: RBAC, masking, secure views, reader accounts.
- Mature tooling: Many teams/partners are familiar with these pipelines.
Cons and Challenges
- Data duplication: Additional storage and egress costs; dual-governance overhead.
- Latency: Batch ingestion and copy steps add minutes to hours.
- Complexity: Jobs, retries, schema drift handling, cost monitoring across two platforms.
- Operational toil: Two sets of lineage/monitoring and incident runbooks.
When to choose
- Strict requirement for fully native Snowflake experience and features.
- Consumers run heavy workloads that must stay in Snowflake.
Option 2: Delta Lake UniForm → Iceberg Metadata → Snowflake External Tables
Use Delta Lake UniForm to expose Delta tables through Iceberg-compatible metadata. Snowflake can then query the same S3 data through its Iceberg/external table support, and you can create Snowflake Data Shares over secure views built on those tables.
Implementation Steps
- Enable UniForm on curated Delta tables in Unity Catalog.
- Store UniForm/Iceberg metadata alongside Delta data on S3.
- In Snowflake, create an external volume (plus a catalog integration if required) and define Iceberg/external tables over the UniForm metadata.
- Build secure views and share them via Snowflake Data Sharing (see the sketch after this list).
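A sketch of the wiring, assuming the externally managed Iceberg path (external volume plus an object-store catalog integration). All names, the S3 location, the IAM role ARN, and the metadata file path are placeholders, and the exact DDL and share grants should be verified against current Snowflake and Databricks documentation for your versions.

```python
# Option 2 sketch: emit Iceberg metadata from Databricks, then surface the same S3 data in Snowflake.
import snowflake.connector

# --- Databricks side: enable UniForm on the curated Delta table ---
spark.sql("""
  ALTER TABLE gold.orders_curated SET TBLPROPERTIES (
    'delta.enableIcebergCompatV2' = 'true',
    'delta.universalFormat.enabledFormats' = 'iceberg'
  )
""")

# --- Snowflake side: external volume -> catalog integration -> Iceberg table -> secure view -> share ---
conn = snowflake.connector.connect(account="myorg-provider", user="PROVISIONER", password="...")
ddl = [
    """CREATE EXTERNAL VOLUME IF NOT EXISTS lake_vol
         STORAGE_LOCATIONS = ((
           NAME = 'lake-s3'
           STORAGE_PROVIDER = 'S3'
           STORAGE_BASE_URL = 's3://my-lakehouse/gold/'
           STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-read'
         ))""",
    """CREATE CATALOG INTEGRATION IF NOT EXISTS lake_catalog
         CATALOG_SOURCE = OBJECT_STORE TABLE_FORMAT = ICEBERG ENABLED = TRUE""",
    # METADATA_FILE_PATH must point at the latest UniForm metadata file and be refreshed
    # (ALTER ICEBERG TABLE ... REFRESH) as Databricks commits new versions.
    """CREATE ICEBERG TABLE IF NOT EXISTS ANALYTICS.SHARING.ORDERS_CURATED
         EXTERNAL_VOLUME = 'lake_vol'
         CATALOG = 'lake_catalog'
         METADATA_FILE_PATH = 'orders_curated/metadata/<latest>.metadata.json'""",
    """CREATE OR REPLACE SECURE VIEW ANALYTICS.SHARING.ORDERS_V AS
         SELECT * FROM ANALYTICS.SHARING.ORDERS_CURATED""",
    "CREATE SHARE IF NOT EXISTS ORDERS_SHARE",
    "GRANT USAGE ON DATABASE ANALYTICS TO SHARE ORDERS_SHARE",
    "GRANT USAGE ON SCHEMA ANALYTICS.SHARING TO SHARE ORDERS_SHARE",
    "GRANT SELECT ON VIEW ANALYTICS.SHARING.ORDERS_V TO SHARE ORDERS_SHARE",
    "ALTER SHARE ORDERS_SHARE ADD ACCOUNTS = myorg_consumer",
]
with conn.cursor() as cur:
    for stmt in ddl:
        cur.execute(stmt)
conn.close()
```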
Pros
- Near zero-copy: Avoids full duplication into Snowflake tables.
- Single copy of the data: One S3 data lake backs both Databricks and Snowflake.
- Good consumer UX: Data is queryable in Snowflake; can still use shares.
Cons and Challenges
- Advanced setup: UniForm/Iceberg details, permissions, and metadata lifecycle.
- Consistency windows: Cross-engine metadata refresh and partition discovery delays.
- Performance tuning: Requires file sizing/compaction best practices (see the compaction sketch after this list).
- Feature mismatch: Some Snowflake features behave differently with external tables vs native tables.
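For the file-sizing point above, a small Databricks maintenance job along these lines keeps file counts in check; the table name, Z-ORDER column, and retention window are illustrative.

```python
# Periodic compaction so cross-engine readers (UniForm/Iceberg) scan fewer, larger files.
spark.sql("OPTIMIZE gold.orders_curated ZORDER BY (order_date)")

# Remove stale files outside the retention window (keep it long enough for any time-travel readers).
spark.sql("VACUUM gold.orders_curated RETAIN 168 HOURS")
```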
When to choose
- You want to minimize data duplication while still serving Snowflake consumers.
- Your team can own Iceberg/Delta UniForm operational complexity.
Option 3: S3 Parquet Snapshots → Snowflake External Tables → Shares
Publish periodic Parquet snapshots (or CDC appends) from Databricks to S3 locations that Snowflake maps as External Tables. Then create secure views and share them.
Implementation Steps
- Databricks jobs write partitioned Parquet to S3 (append or snapshot).
- In Snowflake, create External Stage and External Tables on those paths.
- Use Tasks/Streams to maintain merged views and consumer-friendly schemas.
- Share secure views via Snowflake Data Sharing; an end-to-end sketch follows this list.
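The sketch below assumes a dated snapshot prefix on S3, an existing storage integration, and hypothetical database/schema names; check the external table options and share grants against your own Snowflake setup.

```python
# Option 3 sketch: Parquet snapshot from Databricks -> Snowflake external table -> secure view -> share.
import snowflake.connector

# --- Databricks: write a dated, partitioned snapshot of the curated table ---
(
    spark.table("gold.orders_curated")
    .write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://my-exports/orders_curated/snapshot_dt=2024-06-01/")
)

# --- Snowflake: map the export prefix, wrap it in a secure view, and share the view ---
conn = snowflake.connector.connect(account="myorg-provider", user="PROVISIONER", password="...")
stmts = [
    # Assumes a storage integration named my_s3_integration already exists.
    """CREATE STAGE IF NOT EXISTS EXPORTS.SHARING.ORDERS_STAGE
         URL = 's3://my-exports/orders_curated/'
         STORAGE_INTEGRATION = my_s3_integration""",
    # AUTO_REFRESH needs S3 event notifications; otherwise schedule ALTER EXTERNAL TABLE ... REFRESH.
    """CREATE OR REPLACE EXTERNAL TABLE EXPORTS.SHARING.ORDERS_EXT
         LOCATION = @EXPORTS.SHARING.ORDERS_STAGE
         FILE_FORMAT = (TYPE = PARQUET)
         AUTO_REFRESH = TRUE""",
    """CREATE OR REPLACE SECURE VIEW EXPORTS.SHARING.ORDERS_V AS
         SELECT value:order_id::NUMBER AS order_id,
                value:amount::FLOAT    AS amount,
                value:order_date::DATE AS order_date
         FROM EXPORTS.SHARING.ORDERS_EXT""",
    "CREATE SHARE IF NOT EXISTS ORDERS_SNAPSHOT_SHARE",
    "GRANT USAGE ON DATABASE EXPORTS TO SHARE ORDERS_SNAPSHOT_SHARE",
    "GRANT USAGE ON SCHEMA EXPORTS.SHARING TO SHARE ORDERS_SNAPSHOT_SHARE",
    "GRANT SELECT ON VIEW EXPORTS.SHARING.ORDERS_V TO SHARE ORDERS_SNAPSHOT_SHARE",
    "ALTER SHARE ORDERS_SNAPSHOT_SHARE ADD ACCOUNTS = myorg_consumer",
]
with conn.cursor() as cur:
    for stmt in stmts:
        cur.execute(stmt)
conn.close()
```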
Pros
- Simple, low-cost: No continuous replication tool required.
- Clear control: You choose snapshot cadence and retention.
- Good enough for many analytics consumers.
Cons and Challenges
- Eventual consistency: Metadata refresh timing and late-arriving files can surface stale or partial data.
- Schema drift: Requires contracts and automated evolution routines.
- Governance duplication: Masking and row filters must be maintained in both the Databricks and Snowflake layers.
When to choose
- Consumers accept batch cadences (e.g., hourly/daily).
- You prefer explicit, file-based interfaces with low vendor lock-in.
Snowflake Onboarding Complexity to Expect
Regardless of the option, plan for the following Snowflake-specific onboarding steps and risks:
- Organization/Region alignment: Cross-region sharing requires replication and may add costs and lead time.
- Reader accounts: Useful for consumers without Snowflake, but introduce extra admin, limits, and cost considerations.
- Network topology: PrivateLink, VPC routing, and allowlists must be designed for both provider and consumer.
- RBAC mapping: Translate Unity Catalog permissions to Snowflake roles, secure views, and masking policies (a masking-policy sketch follows this list).
- Cost governance: Decide who pays for compute, and how warehouses and quotas are managed for reader accounts vs consumer-owned accounts.
- Schema evolution: Contract changes must propagate across engines without breaking consumers.
- Observability: End-to-end lineage, delivery SLOs, failed refreshes, and anomaly detection across two platforms.
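For the RBAC mapping item above, the sketch below shows one way a Unity Catalog masking rule might be re-expressed on the Snowflake side before a view is added to a share. Names are hypothetical, masking policies require Snowflake Enterprise edition or above, and role-based conditions are evaluated in the consumer's account for shared data, so consumers generally see the masked value.

```python
# Sketch: re-create a column-masking rule in Snowflake and expose only a secure view via the share.
import snowflake.connector

conn = snowflake.connector.connect(account="myorg-provider", user="PROVISIONER", password="...")
stmts = [
    """CREATE MASKING POLICY IF NOT EXISTS ANALYTICS.GOVERNANCE.MASK_EMAIL
         AS (val STRING) RETURNS STRING ->
         CASE WHEN CURRENT_ROLE() IN ('PROVIDER_ADMIN') THEN val ELSE '***MASKED***' END""",
    """ALTER TABLE ANALYTICS.CURATED.CUSTOMERS
         MODIFY COLUMN EMAIL SET MASKING POLICY ANALYTICS.GOVERNANCE.MASK_EMAIL""",
    """CREATE OR REPLACE SECURE VIEW ANALYTICS.SHARING.CUSTOMERS_V AS
         SELECT CUSTOMER_ID, EMAIL, REGION FROM ANALYTICS.CURATED.CUSTOMERS""",
    "GRANT SELECT ON VIEW ANALYTICS.SHARING.CUSTOMERS_V TO SHARE ORDERS_SHARE",
]
with conn.cursor() as cur:
    for stmt in stmts:
        cur.execute(stmt)
conn.close()
```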
Decision Guide (Quick Heuristics)
- Need native Snowflake features/performance → Choose Option 1.
- Minimize duplication; accept cross-engine ops → Choose Option 2.
- Batch sharing with low cost/complexity → Choose Option 3.
If uncertain, pilot Option 3 to validate consumer needs, then graduate to Option 1 or 2 for scale/performance.
A Simpler Path with DataPorto
Engineering teams often spend weeks stitching together replication jobs, external tables, governance policies, and onboarding runbooks. DataPorto removes this overhead:
- One-click Snowflake share provisioning: Automatically creates databases, schemas, secure views, masking, and share objects (or reader accounts) based on catalog policies.
- No-copy or replicated delivery: Choose UniForm/Iceberg external tables or managed replication without writing glue code.
- Contract-aware schema evolution: Safe alter/add with consumer impact analysis and rollout plans.
- End-to-end monitoring: Health, freshness, costs, and consumer usage across Databricks and Snowflake.
- Network automation: PrivateLink, stages, and cross-account permissions handled for you.
Result: faster onboarding, lower engineering cost, and higher reliability—while your clients enjoy a native Snowflake experience.
Want to see it in action? Connect with us and we'll configure a Snowflake data share from your Databricks-curated datasets in minutes.