
Datorama Data Lakes:
Flexible Granularity at Scale

By Ashish Nelson on January 17, 2022, at 12:25 PM EST

This featured article covers Datorama Data Lake, which manages detailed, granular data. Similar to the Media Cost Center platform, Data Lake is considered a premium SKU. See your Salesforce Account Executive for pricing details.

With the release of its new Data Lakes feature, Datorama users can now onboard billions of rows and potentially store terabytes of information. This can include Keyword, Event Log, and Geo Level data in its rawest and most granular form. Long-term licensed users of the platform will attest that charging customers by data row usage elicits an immediate eye-roll and collective groan. This is particularly true for media holding companies, search agencies, and other enterprises that onboard hundreds, if not thousands, of accounts. Analytical marketers can find clever ways of reducing granularity by flattening onboarded data, or they can consider adding this premium feature, Salesforce Datorama Data Lakes.

Here’s a common challenge. Let’s say you’d like to onboard Keyword-level data from Google Ads in its raw form. Your client would like to see the Top 20 Performing Keywords based on Impressions and Clicks by day. You could onboard this data into Datorama today, but you would probably see a significant impact on your data row consumption. Fortunately, Data Lakes simplifies this entire workflow while controlling data usage and the overall cost of platform ownership.

Unlike other Data Stream types, usage and pricing for Data Lakes are based not on row count but on terabytes of storage space. Data Streams created under a Workspace create corresponding tables of data, which together make up a single Lake that belongs to that specific Workspace. See our Datorama Media Agency case study.

A user can create Data Lake Data Streams using the following options:

  • Drag & Drop: Upload a file stored on your local machine, which serves as the initial file used to preview and map the data.
  • Technical Vendor: Allows users to pull data from different data storage services, such as Dropbox and Amazon S3.
  • Marketing Vendor API Connection: Offers a list of available platforms from which to retrieve data via an API connection. Marketing Vendors retrieve data from designated platforms, such as Google Ads or Facebook Ads. These API connectors differ from standard non-lake API connectors in that they are big-data compatible, meaning they can retrieve high-volume granular data when possible.

Data Lakes Marketing Vendor API Connection

So, to pull the Top 20 Performing Keywords by Impressions and Clicks daily, we can use Google Ads as the Marketing Vendor and pull Keyword-level data using a Data Lake Data Stream.

The flow diagram below shows the approach: rank the Top 20 Performing Keywords on a daily basis in Data Lake, then use the output as a data feed for a TotalConnect stream (Search Keyword Data Model).

Data Lakes Flow Chart
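
The ranking step in this flow can be sketched in a few lines of Python. This is only an illustration of "top N keywords per day", not Datorama's implementation: the field names (day, keyword, impressions, clicks), the top_keywords_per_day helper, the sample figures, and the choice to sort by impressions first with clicks as a tie-breaker are all assumptions.

```python
from collections import defaultdict


def top_keywords_per_day(rows, n=20):
    """Return the n best-ranked keyword rows for each day.

    rows: iterable of dicts with the hypothetical keys
          'day', 'keyword', 'impressions', 'clicks'.
    Ranking rule (an assumption): impressions descending,
    then clicks descending as a tie-breaker.
    """
    by_day = defaultdict(list)
    for row in rows:
        by_day[row["day"]].append(row)

    ranked = {}
    for day, day_rows in by_day.items():
        ordered = sorted(
            day_rows,
            key=lambda r: (r["impressions"], r["clicks"]),
            reverse=True,
        )
        ranked[day] = ordered[:n]  # keep only the top n rows for this day
    return ranked


# Made-up sample data, for illustration only.
sample = [
    {"day": "2022-01-01", "keyword": "running shoes", "impressions": 900, "clicks": 40},
    {"day": "2022-01-01", "keyword": "trail shoes", "impressions": 900, "clicks": 55},
    {"day": "2022-01-01", "keyword": "sandals", "impressions": 300, "clicks": 10},
    {"day": "2022-01-02", "keyword": "running shoes", "impressions": 500, "clicks": 20},
]

top = top_keywords_per_day(sample, n=2)
```

In the platform itself this logic would more likely live in a Data Lake query; the Python version simply makes the ranking condition explicit.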

Retrieving data from Data Lakes keeps row consumption to a bare minimum, because the user can restrict the platform to onboard only the Top 20 performing keywords based on a ranking condition. This can be done without losing any of the Keyword-level data, because the raw keyword data always remains intact in the Data Lake and is retrieved on a scheduled basis. Please note: All data stored in the data lake is fully compliant with GDPR requirements and regulations. See your Salesforce Account Executive for further details.
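
To get a feel for the difference in row consumption, here is a rough back-of-the-envelope sketch. It assumes one row per keyword per day, and the figures (50,000 keywords, 30 days) are invented purely for illustration; rows_onboarded is a hypothetical helper, not a platform function.

```python
def rows_onboarded(keywords_per_day, days, top_n=None):
    """Estimate rows reaching the row-counted data model over a period.

    Assumes one row per keyword per day (a simplification).
    With top_n=None every keyword row is onboarded each day;
    otherwise at most top_n rows per day are onboarded.
    """
    per_day = keywords_per_day if top_n is None else min(keywords_per_day, top_n)
    return per_day * days


# Hypothetical account: 50,000 keywords per day over 30 days.
raw_rows = rows_onboarded(keywords_per_day=50_000, days=30)
ranked_rows = rows_onboarded(keywords_per_day=50_000, days=30, top_n=20)

print(f"raw: {raw_rows:,} rows, top 20 only: {ranked_rows:,} rows")
```

Under these assumptions, the raw feed would consume 1,500,000 rows while the Top 20 feed would consume 600.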

Data Lake FAQ

Can Data Lakes be shared with other Marketing Cloud Intelligence (MCI) Workspaces within the same MCI account?

Yes, the ‘Sharing management’ option in Data Lake allows you to share a data lake, or parts of it such as tables or views, with other workspaces in the same account.

What are the possible ways of creating Data Lake queries in MCI?

Queries can be created using SQL or MCI’s built-in Query builder.

Is there a row limit when previewing the output of a query in the MCI UI?

Yes, the UI shows the first 5,000 rows of the output as a preview. A quick export download is limited to 25,000 rows.

Can the output of a Data Lake query be stored as a virtual table or a view?

Yes, MCI gives you the option to save a query as a ‘Logical View’. This saves the entire query so you don’t have to write it again from scratch.
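
The idea of saving a query for reuse is conceptually similar to a view in standard SQL. The sketch below uses Python's built-in sqlite3 module purely as a stand-in to show that general idea; it does not use MCI's query engine, and the table, column, and view names (keyword_stats, daily_keyword_totals) and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE keyword_stats (
        day TEXT, keyword TEXT, impressions INTEGER, clicks INTEGER
    );
    INSERT INTO keyword_stats VALUES
        ('2022-01-01', 'running shoes', 900, 40),
        ('2022-01-01', 'running shoes', 100, 5),
        ('2022-01-01', 'sandals', 300, 10);

    -- Save the aggregation once as a view instead of retyping it.
    CREATE VIEW daily_keyword_totals AS
    SELECT day, keyword,
           SUM(impressions) AS impressions,
           SUM(clicks) AS clicks
    FROM keyword_stats
    GROUP BY day, keyword;
    """
)

# The saved view can now be queried like a table.
totals = conn.execute(
    "SELECT keyword, impressions, clicks "
    "FROM daily_keyword_totals ORDER BY impressions DESC"
).fetchall()
conn.close()
```

The view stores the query rather than a copy of the data, so each read reflects the current contents of the underlying table.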

About Decision Foundry

Decision Foundry is a Salesforce independent software vendor, managed services provider, and certified, award-winning Salesforce Marketing Cloud integration partner. Decision Foundry closes the gap between data accessibility, platform adoption, and business impact. Our consulting services include the integration of Data Cloud, Account, Engagement, Personalization, Tableau, and Intelligence.

Contact us today to see what we can do for your Marketing Cloud Intelligence investment using Datorama Data Lakes.



