Introduction
The data element can be subdivided into four components to help us categorise the issues we must address and information we must find to fully understand the data we are migrating.
- Quality
- Location
- Volume
- Security
Quality
We must be sure that the migration process does not cause the quality of the data to deteriorate. To assess this we must understand the quality of the source data. This can be done by working with the client to determine the key metrics for the data quality. it is always necessary to involve the data owner in this exercise because we should ensure the metrics align with their goals and priorities for the data.
Quality Metrics
- Total count of records.
- Total count of contact records.
- Count of Contact records missing a last name value.
- Count of Contact records missing an email address.
- Percentage of Contacts missing a last name value.
- Percentage of Contacts missing an email address.
- Sum of all Opportunity amounts.
- Sum of opportunity amount by Owner.
- Sum of Opportunity amount by Account.
Do not try to clean or improve the data quality during the migration
One of our goals should be to make it as easy as possible to prove that the migration has completed correctly.
Location
The answer to this question will depend on,
- The tools we have chosen to use to move the data
- The target system
- The security requirements of the migration
There are some guidelines we can employ to inform our decision.
- Treat production PII data as if it were toxic waste.
- Limit the places you store it.
- Limit who can access it.
- Securely dispose of it when it is no longer needed.
If the client already uses databases on premise ask the client to spin up a database for you in their infrastructure. This has the benefit of keeping the data within the client's boundary of control.
Using a client owned on premise filestore for staging the migration data in flat files keeps the data within the clients boundary of control.
If the client does not have any existing databases or filestores the data could be staged in the cloud. The client will need to be comfortable with this approach and it may be a tougher sell than the on premise option. The considerations for a cloud staging database are,
- Does the client already use a cloud platform that can be reused?
- Does the client need to purchase access to a new platform?
Volume
The list below shows the appropriate choices for increasing volumes of data.
Up to Hundreds of Records
Manual migration by direct entry of data through the UI or use of the UI import wizardsThousands of Records
UI import wizards if the data has only a few links to other objects and requires no transformation. Dataloader.io or Salesforce Dataloader if the data has many links to other objects mapped via external IDs. A manually managed process could be acceptable but an automated one would be preferred.Tens of Thousands of Records
Salesforce Dataloader (Dataloader.io may require a fee at this volume level), Talend, Pentaho, an automatic and repeatable process is required to make this manageable.Hundreds of Thousands or Millions of Records upwards
Talend, Pentaho, Mulesoft, using parallel threads where possible, an automatic repeatable process is a necessity.Target and Source system capacity and limitations
It will also be necessary to validate that the target system can accommodate the expected data volume. With knowledge of the volume and chosen process technology we must then test the speed of the migration to determine the elapsed time that it will take so we can plan the migration. We need to determine whether the volume of records can be loaded within our migration window. If we predict that it will not fit we may then change our process from a single migration to a staged migration across a number of days or weeks.Security
To address this we need to understand the security classification of the data and be clear about who will have access to the data as it is migrated. Does the data contain personally identifiable information? Does the data contain sensitive information? What would be the impact of a loss of the data during the migration?
For example if the migration process is developed by a third party or consultancy, it may be prudent that they should not have access to the production data for the migration and so the process would be handed over to an internal staff member to execute.
We must also consider the legal obligations placed on us by the relevant legislation e.g. GDPR and the determination of who is the data processor for the migration.

