Step 1: Start with a business question
Why step 1 is the hardest: Every SAP team wants to jump into Datasphere and start connecting tables. The discipline of starting with a business question — not a technical one — is what separates a useful data product from a technically correct dataset nobody uses. The business question is your anchor. Everything else follows from it.
Common mistake: Defining the entity as a table name ("Sales Invoice Data") rather than a business outcome ("Revenue performance by product and account"). The entity name should mean something to someone in finance, sales or clinical ops — not just a data engineer.
Write it as a question. Who needs to make what decision, using what data?
- Commercial: Which accounts are buying across multiple product categories vs. single-category, and what's the revenue gap if we cross-sold?
- Clinical education: Which HCPs have received training in the last 12 months but show zero or declining product usage — and why?
- Supply chain: What is our instrument loan set utilisation rate by hospital and specialty, and which sets haven't moved in 90+ days?
- Finance: How does our actual vs. plan revenue compare by product family, sales rep and quarter — with prior year as context?
- Regulatory: Which products are approaching end of shelf life across all consignment locations, and what is the write-off exposure?
Consumers and stakeholders
This person validates what "correct" looks like when you test the model. Without them, you're guessing.
Scope and granularity
Being clear about exclusions prevents scope creep and helps you ship something useful quickly.
Define your success criteria before you build — otherwise "done" never arrives.
Step 2: Set up your Space
What spaces actually are: A Space is a logical container in Datasphere — it controls who can access data, how much compute is available for queries, and where the governance boundary sits. Think of it as a department's data workspace. Finance runs their space. Commercial runs theirs. They can share data products across spaces via the Data Marketplace, but each team governs their own house.
Common mistake: Creating one giant space for everything, or creating a new space per project. One space per business domain is the right unit — broad enough to share, narrow enough to govern. Start with one space for your first data product. You can always create more once you understand the patterns.
Domain assignment
Access and permissions
- Data builders: can create views, models and data flows. Usually the data team or a dedicated analyst.
- Data integrators: can manage connections and replication flows. Usually IT or a Basis/integration team member.
- Viewers / consumers: can read data and consume published data products. Usually business users accessing via SAC or other tools.
- Space administrator: manages users, roles and space configuration. Should be a named individual, not a shared account.
Data sensitivity and compliance
- Contains HCP (healthcare professional) personal information: names, DEA/AHPRA numbers, practice addresses; privacy obligations apply
- Contains patient-level data: even de-identified, this may trigger TGA or ethical review requirements
- Contains commercially sensitive pricing or contract terms: restrict to finance and senior commercial only
- Subject to hospital or GPO data-sharing agreements: check whether hospital usage data can be stored in a cloud system
- Needs to be accessible to external parties (distributors, partners): requires additional Data Marketplace governance controls
- Subject to SOX, audit or regulatory reporting requirements: data lineage and change history will be important
Resource allocation
Step 3: Connect your data sources
Live vs. replication — the key decision: A live connection queries the source system in real time — great for current data, but it puts load on your SAP system and can't combine data from multiple sources easily. Replication copies data into Datasphere on a schedule — better for complex transformations, multi-source joins, and production workloads. Most medtech organisations start with replication for reliability, then layer in live connections where real-time matters.
Common mistake: Pulling raw tables directly without using available semantic extractions. For S/4HANA, CDS (Core Data Services) views are pre-built, semantically labelled extractions — they save weeks of data modelling work. Always check what CDS views exist for your use case before building joins from scratch.
Primary SAP source
Non-SAP sources
- Salesforce / CRM: opportunity pipeline, account contacts, activity history
- Clinical LMS or training platform: course completions, certification status, HCP training records
- Hospital usage / implant data: EDI feeds or manual uploads from hospital procurement systems
- Excel / manual spreadsheets: territory plans, budget files, commission calculators; these need a migration path
- Warehouse / 3PL / inventory system: consignment stock, loan sets, expiry tracking
- Tender / contract management system: GPO contracts, hospital pricing agreements
- Market data / MTAA / competitive data: external reference data for share-of-market analysis
SAP table names (VBRK, KNA1, MARA...) or plain descriptions — whatever you know at this stage
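Before finalising the source list, it can help to inventory each candidate object with its business meaning and the standard CDS view that covers it. The sketch below uses common SD and master-data objects; CDS view availability varies by S/4HANA release, so treat the view names as candidates to verify against your system, not a definitive mapping:

```python
# Hypothetical starting inventory of SAP sources for a revenue data product.
# Table names are standard SD / master-data objects; confirm which released
# CDS views actually exist in your S/4HANA release before relying on them.
SAP_SOURCES = {
    "VBRK": {"meaning": "Billing document header", "cds_view": "I_BillingDocument"},
    "VBRP": {"meaning": "Billing document item", "cds_view": "I_BillingDocumentItem"},
    "KNA1": {"meaning": "Customer master (general data)", "cds_view": "I_Customer"},
    "MARA": {"meaning": "Material master (general data)", "cds_view": "I_Product"},
    "VBAK": {"meaning": "Sales order header", "cds_view": "I_SalesDocument"},
}

for table, info in SAP_SOURCES.items():
    print(f"{table}: {info['meaning']} -> prefer CDS view {info['cds_view']}")
```

Even a throwaway inventory like this forces the "raw table vs semantic extraction" decision per source before any modelling starts.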
Data freshness and quality
- Customer/account master duplicates: multiple records for the same hospital or account
- Product hierarchy not maintained or inconsistently used: affects any product-level reporting
- Hospital usage data arrives with a lag: EDI feeds are typically 1–4 weeks behind the actual procedure date
- Sales rep / territory assignments not current: rep changes not reflected in historical transactions
- Currency / pricing inconsistencies across company codes
- Manual data entry errors in order or billing documents
- Missing or incomplete cost centre assignments
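A quick, self-contained way to gauge the customer-master duplicate problem before modelling is to build a crude match key from the account name and postcode. The records and field names below are invented for illustration, not real master data:

```python
from collections import defaultdict

# Hypothetical customer-master extract (e.g. from KNA1); values are invented.
customers = [
    {"customer_id": "100001", "name": "St Vincent's Hospital", "postcode": "2010"},
    {"customer_id": "100002", "name": "ST VINCENTS HOSPITAL", "postcode": "2010"},
    {"customer_id": "100003", "name": "Royal Melbourne Hospital", "postcode": "3050"},
]

def normalise(name: str) -> str:
    # Crude match key: uppercase, keep letters and digits only.
    return "".join(ch for ch in name.upper() if ch.isalnum())

groups = defaultdict(list)
for c in customers:
    groups[(normalise(c["name"]), c["postcode"])].append(c["customer_id"])

# Any key with more than one customer ID is a likely duplicate.
duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(duplicates)  # → {('STVINCENTSHOSPITAL', '2010'): ['100001', '100002']}
```

A real cleanse needs fuzzier matching, but even this level of check tells you whether duplicates are an edge case or a systemic issue before you commit to join logic.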
Step 4: Build and label the data model
Semantic labelling is what makes a data model a data product: You can have perfect joins and clean data, but if the columns are still named KUNNR and NETWR, business users can't self-serve and SAP Analytics Cloud can't auto-interpret the model. Renaming technical fields, marking measures and dimensions, and setting semantic usage — these are the steps that separate an IT asset from a business asset.
Common mistake: Setting semantic usage as "Relational Dataset" (the default) when you should be using "Analytical Dataset" or "Fact." The semantic usage tells SAC how to interpret the model — get it wrong and measures won't aggregate correctly and dimensions won't filter properly.
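One lightweight way to capture the renaming and measure/dimension decisions before touching the modelling UI is a simple field map. The technical names below are standard SD billing fields; the labels and role assignments are illustrative:

```python
# Hypothetical semantic mapping for a billing-based revenue model.
# "role" mirrors the measure/dimension choice you make in the modeller.
SEMANTICS = {
    "KUNNR": {"label": "Customer Number", "role": "dimension"},
    "MATNR": {"label": "Material", "role": "dimension"},
    "VKORG": {"label": "Sales Org", "role": "dimension"},
    "FKDAT": {"label": "Billing Date", "role": "dimension"},
    "NETWR": {"label": "Net Revenue", "role": "measure"},
    "FKIMG": {"label": "Billed Quantity", "role": "measure"},
}

measures = [f["label"] for f in SEMANTICS.values() if f["role"] == "measure"]
print(measures)  # → ['Net Revenue', 'Billed Quantity']
```

Agreeing this map with the business stakeholder up front means the labelling step in the modeller becomes transcription rather than debate.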
Model type
List main table + join tables. Note the join key for each — this is where data quality issues surface first.
Measures — what you'll count, sum or calculate
- Net revenue / net sales value (NETWR): apply currency conversion if multi-currency
- Gross revenue / gross billing value
- Gross margin / margin %: requires cost data; confirm COGS source before committing to this
- Units sold / quantity billed (FKIMG): check unit-of-measure consistency across products
- Procedure / case count: requires procedure-level data; not available from billing alone
- Loan set utilisation rate: requires instrument movement data, typically not in standard SAP billing
- Clinical education hours / event count
- Days on hand / stock coverage
- Prior year / budget variance: requires a second data source for plan/budget data
Document business rules that aren't standard SAP fields — these need explicit logic in the view.
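As a sketch of why these rules belong in the view rather than the BI tool, here is the currency-conversion and margin logic in plain Python. The exchange rates and billing lines are invented for illustration:

```python
# Assumed exchange rates to AUD; in the real view these come from a rates table.
FX_TO_AUD = {"AUD": 1.0, "NZD": 0.92, "USD": 1.50}

# Invented billing lines with net value, document currency and cost of goods.
lines = [
    {"net_value": 1000.0, "currency": "AUD", "cogs": 600.0},
    {"net_value": 500.0,  "currency": "NZD", "cogs": 250.0},
]

# Convert everything to a single reporting currency before aggregating.
for line in lines:
    rate = FX_TO_AUD[line["currency"]]
    line["net_revenue_aud"] = line["net_value"] * rate
    line["cogs_aud"] = line["cogs"] * rate

revenue = sum(l["net_revenue_aud"] for l in lines)
margin = sum(l["net_revenue_aud"] - l["cogs_aud"] for l in lines)
margin_pct = 100.0 * margin / revenue
print(f"Net revenue: {revenue:.2f} AUD, margin: {margin_pct:.1f}%")
```

If this logic lived in each dashboard instead of the model, every consumer could pick different rates and produce a different "net revenue" — the exact failure mode a data product exists to prevent.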
Dimensions — how you'll slice the data
- Customer / hospital / account: include customer group and account tier if maintained
- Product / material: include product hierarchy (brand, category, subcategory)
- Surgical specialty: may need to be derived from product hierarchy or a custom field
- Sales representative: align to current territory assignment, not historical billing rep
- Sales organisation / distribution channel
- Geographic region / state / territory
- Time — fiscal period, calendar month, quarter, year
- Company code / legal entity
- Customer segment / hospital type (public/private)
- Consignment location / warehouse
Naming and documentation
- Renamed all technical column names to business-readable labels: KUNNR → Customer Number, NETWR → Net Revenue, VKORG → Sales Org
- Added column descriptions to every measure and calculated field
- Marked all measures and dimensions with correct semantic type
- Validated row counts against a known report or source extract
- Checked for and resolved duplicate rows (usually a join issue)
- Business stakeholder has reviewed a sample and confirmed it looks correct
- Applied relevant filters (e.g. exclude cancelled orders, test customers, intercompany)
Future maintainers (including you, in 6 months) need to understand why transformations were applied.
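Row-count and total validation is worth scripting so it can be rerun after every model change. The figures and tolerance below are placeholders, not real data:

```python
# Placeholder figures from a trusted "source of truth" report vs the new model.
known_report = {"row_count": 125430, "net_revenue": 48212067.15}
model_output = {"row_count": 125430, "net_revenue": 48212067.15}

def reconcile(expected: dict, actual: dict, tolerance_pct: float = 0.01) -> dict:
    """Compare the model against a known report; returns pass/fail per check."""
    checks = {}
    checks["row_count"] = expected["row_count"] == actual["row_count"]
    # Allow a small tolerance on totals (e.g. rounding in currency conversion).
    diff = abs(expected["net_revenue"] - actual["net_revenue"])
    checks["net_revenue"] = diff <= expected["net_revenue"] * tolerance_pct / 100
    return checks

print(reconcile(known_report, model_output))  # → {'row_count': True, 'net_revenue': True}
```

A row-count mismatch here almost always points back to a join fanning out or a missing filter, which is exactly when you want to catch it.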
Step 5: Publish to the Data Marketplace
This step is what makes a data model a data product: Without publishing to the Data Marketplace, you have a well-built view that only your team knows about. The Marketplace is Datasphere's internal catalogue — it's how other teams discover, understand, and request access to your data. Think of it as an internal app store for governed data. The quality of your description, tags and documentation directly determines how much the data product gets used.
Common mistake: Treating the Data Marketplace as an afterthought. The description you write here is what a future consumer reads before deciding whether to trust your data. A vague description ("Sales data from SAP") gets ignored. A specific description ("Net revenue by account, product hierarchy and sales rep — updated daily, excludes intercompany, covers FY2022 to present") gets used.
Data product identity
Business-readable. Specific. Avoid version numbers in the name.
Write for someone who knows the business but hasn't seen the model. What question does this answer? What's included and excluded?
Metadata and discoverability
- Business domain tag: e.g. Commercial, Finance, Supply Chain, Clinical Education
- Geographic scope tag: e.g. ANZ, APAC, Global, Australia only
- Regulatory / compliance tag: e.g. SOX-relevant, TGA-sensitive, Contains PII, GDPR
- Source system tag: e.g. S/4HANA, Salesforce, BW/4HANA
- Data freshness tag: e.g. Real-time, Daily, Monthly
Access controls
SLA and data contract
- Refresh schedule: when does data update? Who monitors failures? What's the SLA?
- Data quality owner: a named person who investigates data issues raised by consumers
- Breaking change notification: how will you notify consumers before schema or logic changes?
- Historical data coverage: how far back does the data go? Is there a retention policy?
- Known limitations documented: be explicit about what this data product cannot answer; prevents misuse
Explicitly documenting limitations is a sign of maturity — it builds trust, not doubt.
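The refresh SLA in the data contract can be monitored with a trivial freshness check. The 26-hour threshold below is an assumption (a daily refresh plus a two-hour buffer), not a Datasphere feature:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed SLA: daily refresh, with a 2-hour buffer before we call it stale.
SLA_MAX_AGE = timedelta(hours=26)

def is_fresh(last_refresh: datetime, now: Optional[datetime] = None) -> bool:
    """True if the last successful refresh is within the contracted SLA window."""
    now = now or datetime.now(timezone.utc)
    return (now - last_refresh) <= SLA_MAX_AGE

last = datetime(2025, 1, 10, 2, 0, tzinfo=timezone.utc)
print(is_fresh(last, now=datetime(2025, 1, 10, 20, 0, tzinfo=timezone.utc)))  # → True
print(is_fresh(last, now=datetime(2025, 1, 12, 20, 0, tzinfo=timezone.utc)))  # → False
```

Whoever owns the refresh schedule should be alerted by a check like this before a consumer notices stale numbers in a dashboard.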
Step 6: Connect consumption and drive adoption
A data product only has value when it's consumed: The whole point of building a governed, semantic data product is that any tool — SAC, Databricks, Power BI, Snowflake — can connect to it and inherit the business labels, governance and trust you've built in. You build once. You consume everywhere.
Common mistake: Building a data product and then recreating all the business logic again in SAC or Power BI. If your transformation logic lives in the BI tool, you don't have a data product — you have a report. Logic belongs in the data model. The consumption layer should be thin: visualisation only.
Primary consumption tool
SAP Analytics Cloud — if applicable
- Semantic labels from Datasphere auto-populate in the SAC model: if not, check that measures and dimensions were correctly set in Step 4
- Hierarchies (product, org, time) display correctly in SAC
- Currency conversion and unit of measure handled consistently
- Row-level security configured if needed: e.g. sales reps should only see their own territory data
- Story / dashboard validated against a known "source of truth" report: get sign-off from the business stakeholder before releasing broadly
Third-party and AI consumption
- Databricks (for data science, forecasting or ML on SAP data): BDC Connect for Databricks, zero-copy Delta Sharing
- Snowflake (for enterprise analytics or external data sharing): BDC Connect for Snowflake, zero-copy access
- Microsoft Fabric (Power BI, Excel, AI Foundry): planned GA Q3 2026; worth planning for now
- SAP Joule / AI agents: governed, semantic data products are the foundation Joule needs to give reliable answers
- External partners / hospitals: via the Data Marketplace with external access controls
Adoption and change management
Name the people currently maintaining the existing reports — they become your adoption allies, not blockers.
How will you move users from the old way to the new data product?
- Business stakeholder has signed off on data accuracy
- Access requests from first consumers have been processed
- Data product is published in the Marketplace with complete documentation
- Refresh schedule is running and monitored
- Feedback channel exists (who do consumers contact if something looks wrong?)
- Version 2 backlog exists — what's in the next iteration?
Your readiness summary
Click the button below to generate a plain-text summary of your inputs across all six steps. You can copy this into an email, a project brief, or share it with your implementation partner as a starting point for scoping a Datasphere engagement.
Want to take this further? Once you have a clear picture of your first data product, the next step is a scoping conversation — typically a 2-hour workshop with your business and IT stakeholders to validate the design and build an implementation roadmap. Bruce Dando and the Emphasys team specialise in Datasphere data product design and delivery for SAP customers in Australia and New Zealand.