Hi there, let's talk about Data Lake Objects in this Data Cloud Terminologies post. Before that, let's understand what a data lake is. A data lake is a centralized repository that stores, processes, and secures large amounts of data in its original format. It can store data from any source, in any volume, and in any format, including structured, semi-structured, and unstructured, and it is designed to handle that raw data at scale.
Data Lake Object (DLO)
A Data Lake Object is a virtual representation of data stored in an external data lake. A DLO allows Salesforce Data Cloud to access, process, and use raw data without duplicating it or moving it into the Data Cloud database. This approach keeps things scalable, flexible, and efficient when handling massive datasets.
Data Lake Objects serve as a reference point to external data sources. They store metadata for the external data and also hold mapping information.
The DLO organizes data into a structured format that Salesforce Data Cloud can understand and work with, even if the data comes from diverse or unstructured sources. The DLO adds context to the raw data by aligning it with the Data Cloud schema, which makes it usable for workflows, analytics, and customer insights.
There are various benefits of Data Lake Objects:
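To make the "metadata plus mapping" idea concrete, here is a minimal Python sketch. It is not a Salesforce API: the class name `DataLakeObjectSketch`, the `source_uri` value, and the field names are all hypothetical and only illustrate an object that holds a reference and a mapping instead of the rows themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLakeObjectSketch:
    """Hypothetical stand-in for a DLO: it holds metadata and mappings, no rows."""
    name: str
    source_uri: str                 # where the external data lives (made-up value below)
    field_mapping: dict[str, str]   # external column name -> target field name

    def to_target_record(self, raw_row: dict) -> dict:
        # Keep only the mapped columns and rename them to the target field names.
        return {
            target: raw_row.get(source)
            for source, target in self.field_mapping.items()
        }


# Illustrative values only; none of these names come from a real org.
dlo = DataLakeObjectSketch(
    name="web_orders",
    source_uri="s3://example-bucket/orders/",
    field_mapping={"cust_id": "CustomerId", "email_addr": "Email"},
)

mapped = dlo.to_target_record(
    {"cust_id": "C-1", "email_addr": "a@example.com", "extra": 1}
)
```

In this sketch the unmapped `extra` column is simply ignored, which mirrors the idea that the mapping decides what the rest of the system gets to see.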
- No Data Movement: Data remains in the external data lake, reducing latency and costs.
- Compatibility: Supports multiple external data sources (e.g., Amazon S3, Snowflake).
- Integration with Einstein: Data from DLOs can be used with Einstein AI for advanced predictive analytics.
How Is Data Unified Without Movement?
Data Cloud uses the stored metadata and virtual references to unify and operate on the data:
a. Virtual Schema Mapping:
- When you create a Data Lake Object (DLO), Salesforce maps the schema of the external data source to a format compatible with Salesforce's Customer 360 Data Model.
- The mapping creates a virtual "view" of the external data without importing it.
b. On-Demand Data Retrieval:
- During operations (e.g., queries, transformations, analytics), Salesforce uses the metadata to generate precise queries against the external data source.
- Only the required data (fields, rows) is fetched and processed, often in chunks.
c. Data Unification at Runtime:
- Salesforce applies unification rules (like deduplication or matching) in real time or near real time.
- Unified profiles are created dynamically using both internal and external data, based on the metadata definitions.
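The steps above can be loosely sketched in Python. This is only an illustration under stated assumptions, not how Data Cloud is implemented: the function names `build_source_query` and `iter_chunks`, the `orders` table, and the field names are hypothetical, and the SQL is generic.

```python
from itertools import islice


def build_source_query(table, field_mapping, wanted_fields, limit=None):
    """Build a query that asks the external source only for the needed columns.

    field_mapping maps external column names to target field names
    (hypothetical names). Identifiers are interpolated directly, so a real
    implementation would have to validate them first.
    """
    # Invert the mapping so we can look up the external column by target field.
    target_to_source = {target: source for source, target in field_mapping.items()}
    columns = [f"{target_to_source[name]} AS {name}" for name in wanted_fields]
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


def iter_chunks(rows, size):
    """Yield rows in lists of at most `size`, so data is processed in chunks."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


mapping = {"cust_id": "CustomerId", "email_addr": "Email"}
query = build_source_query("orders", mapping, ["Email"], limit=100)
chunks = list(iter_chunks([1, 2, 3, 4, 5], 2))
```

Here the metadata (the mapping) drives the query, and only the `Email` field is requested even though the mapping knows about more columns.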
How Are Relationships Managed Without Data Movement?
Relationships between data from external sources and internal Salesforce objects are established using identity resolution techniques:
Keys and Identifiers:
Metadata contains the mapping of unique identifiers (e.g., customer IDs, email addresses) between external and internal data.
Match and Merge Rules:
Rules define how to match records (e.g., match external "Customer_ID" with Salesforce "Contact_ID") and merge them into a unified profile.
Graph Data Models:
Data Cloud uses graph-based data models to store relationships virtually. This allows it to dynamically query related data without physically storing it together.
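As a rough illustration of a match and merge rule, the Python sketch below matches external "Customer_ID" values against internal "Contact_ID" values and merges the matched rows into one profile. The function `unify_profiles`, the sample records, and the "internal values win on conflict" policy are assumptions made for this example; actual Data Cloud identity resolution is configured through rulesets, not written like this.

```python
def unify_profiles(internal_contacts, external_customers,
                   internal_key="Contact_ID", external_key="Customer_ID"):
    """Match records on a shared identifier and merge them into unified profiles.

    Assumption for this sketch: when both sides define the same field,
    the internal value is kept.
    """
    # Index the external rows by their identifier for constant-time lookups.
    external_by_key = {row[external_key]: row for row in external_customers}

    unified = []
    for contact in internal_contacts:
        profile = dict(contact)  # copy so the input records stay untouched
        match = external_by_key.get(contact[internal_key])
        if match is not None:
            for field_name, value in match.items():
                # setdefault keeps the internal value if the field already exists.
                profile.setdefault(field_name, value)
        unified.append(profile)
    return unified


# Made-up sample data for illustration.
internal = [
    {"Contact_ID": "001", "Email": "a@example.com"},
    {"Contact_ID": "002", "Email": "b@example.com"},
]
external = [{"Customer_ID": "001", "LifetimeValue": 1200}]

profiles = unify_profiles(internal, external)
```

The first contact picks up `LifetimeValue` from the external record, while the second has no match and is returned unchanged.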
Can Salesforce Objects Be Used as Data Lakes?
While Salesforce objects store data within Salesforce, they are not designed to function as external data lakes. However, Salesforce Data Cloud provides mechanisms to unify data from Salesforce objects alongside external data lake objects:
Salesforce Objects as Internal Data:
Data Cloud natively supports Salesforce objects. You don’t need DLOs to access data from Salesforce objects because this data already resides within Salesforce’s ecosystem.
Connecting Salesforce Data to DLOs:
If you have Salesforce object data exported into external storage (e.g., an Amazon S3 bucket), you could treat that exported data as a data lake. In this case:
- Salesforce object data becomes external data after export.
- You can then create DLOs to connect to the external data.
What If You Want to Use Salesforce Object Data with Data Cloud?
Salesforce objects are integrated directly with Data Cloud as native sources via the Customer 360 Data Model:
- Salesforce Data Cloud can ingest and process data from Salesforce objects without needing a DLO.
- These objects are treated as internal data sources and are unified with external data (via DLOs) during identity resolution, deduplication, or profile unification.
Summary
- Data Lake Objects act as a link between Salesforce Data Cloud and external data sources like Amazon S3 or Snowflake.
- They don’t store the actual data but hold the structure and mapping details of the external data.
- This allows Salesforce to access and use the data without moving it into Salesforce.
- They help unify external data with internal Salesforce data for insights and analytics.
- Think of them as a "blueprint" that tells Salesforce where the data is and how to use it.