|
Overview
Over 10 years Redwood Systems has built up
a well-earned reputation as a market leader in high profile data migration for major
international companies such as ntl:,
CABLE & WIRELESS,
Virgin mobile
and euroBeLL
and worked in partnership alongside market leaders such as IBM,
Capgemini and CSG Systems Inc.
This document is designed as a guide on the main processes of a typical Data Migration but
is not an exhaustive list of Redwood's experience.
Data Migration
This document aims to outline the typical processes involved in a data migration.
Data migration is the moving and transforming of data from legacy to target database systems. This includes one to one and one to many mapping and movement of static and transactional data. Migration also relates to the physical extraction and transmission of data between legacy and target hardware platforms.
ISO 9001 / TickIT accredited
The fundamental aims of certification are quality achievement and improvement and the delivery of customer satisfaction.
The ISO and TickIT Standards are adhered to throughout all stages of the migration process.
Customer Requirements
The first stage is the contact from the customer asking us to tender for a data migration project. The invitation to tender will typically include the Scope / Requirements and Business Rules:
- Legacy and Target - Databases / Hardware / Software
- Timeframes - Start and Finish
- Milestones
- Location
- Data Volumes
Proposal
This is a response to the invitation to tender, which comprises the following:
Migration Strategy
- Migration development models are based on an iterative approach.
- Multiple Legacy / Targets - any migration may transform data from one or more legacy databases to one or more targets
- Scope - Redwood definition / understanding of customer requirements, inclusions and exclusions
- The data may be migrated in several ways, depending on data volumes and timescales:
- All at once (big bang)
- In logical blocks (chunking, e.g. by franchise)
- Pilot - A pre-test or trial run for the purpose of proving the migration process, live applications and business processes before implementing on a larger scale.
- Catch Up - To minimise downtime only business critical data is migrated, leaving historical data to be migrated at a later stage.
- Post Migration / Parallel Runs - Both pre and post migration systems remain active and are compared after a period of time to ensure the new systems are working as expected.
Milestones can include:
- Completion of specifications / mappings
- Successful 1st iteration
- Completion of an agreed number of iterations
- Delivery to User Acceptance Testing team
- Successful Dress Rehearsal
- Go Live
Roles and Responsibilities
Data Migration Project Manager/Team Lead is responsible for:
- Redwood Systems Limited project management
- Change Control
- Solution Design
- Quality
- Reporting
- Issues Management
Data Migration Analyst is responsible for:
- Gap Analysis
- Data Analysis & Mapping
- Data migration program specifications
- Extraction software design
- Exception reporting software design
Data Migration Developers are responsible for:
- Migration
- Integrity
- Reconciliation (note these are independently developed)
- Migration Execution and Control
Testers/Quality Assurance team is responsible for:
- Test approach
- Test scripts
- Test cases
- Integrity software design
- Reconciliation software design
Other Roles:
Operational and Database Administration support for source/target systems.
Parameter Definition and Parameter Translation team
Legacy system Business Analysts
Target system Business Analysts
Data Cleansing Team
Testing Team
Dependencies
Environmental Dependencies
- Connectivity - remote or on-site
- Development and Testing Infrastructure - hardware, software, databases, applications and desktop configuration
Support Dependencies
- Training (legacy & target applications) - particularly for an in-house test team
- Business Analysts -provide expert knowledge on both legacy and target systems
- Operations - Hardware / Software / Database Analysts - facilitate system housekeeping when necessary
- Business Contacts
- User Acceptance Testers - chosen by the business
- Business Support for data cleanse
Data Dependencies
- Translation Tables - translates legacy parameters to target parameters
- Static Data / Parameters / Seed Data (target parameters)
- Business Rules - migration selection criteria (e.g. number of months history)
- Entity Relationship Diagrams / Transfer Dataset / Schemas (legacy & target)
- Sign Off / User Acceptance criteria - within agreed tolerance limits
- Data Dictionary
Project Management
Project Plan
- Milestones and Timescales
- Resources
- Individual Roles and Responsibilities
- Contingency
Communication
It is important to have good communication channels with the project manager and business analysts. Important considerations include the need to agree the location, method and format for regular meetings/contact to discuss progress, resources and communicate any problems or incidents, which may impact the ability of others to perform their duty. These could take the form of weekly conference calls, progress reports or attending on site project meetings.
Change Control
- Scope Change Requests - a stringent change control mechanism needs to be in place to handle any deviations and creeping scope from the original project requirements.
- Version Control - all documents and code shall be version controlled.
Issue Management
- Internal issue management- as a result of Gap analysis, Data Mapping, Iterations Output (i.e. reconciliation and file integrity or as a result of eyeballing)
- External issue management - Load to Target problems and as a result of User Acceptance Testing
- Mechanism - examples:
- Test Director
- Bugzilla
- Excel
- Access
- TracNotes
Analysis
Gap Analysis
Identifying where differences in the functionalities of the target system and legacy system mean that data may be left behind or alternatively generating default data for the new system where nothing comparable exists on legacy.
Liaison with the business is vital in this phase as mission critical data cannot be allowed to be left behind, it is usual to consult with the relevant business process leader or Subject Matter Expert (SME). Often it is the case that this process ends up as a compromise between:
- Pulling the necessary data out of the legacy system to meet the new systems functionality
- Pushing certain data into the new system from legacy to enable the continuity of certain ad hoc or custom in-house processes to continue.
Data mapping
This is the process of mapping data from the legacy to target database schemas taking into account any reformatting needed. This would normally include the derivation of translation tables used to transform parametric data. It may be the case at this point that the seed data, or static data, for the new system needs generating and here again tight integration and consultation with the business is a must.
Translation Tables
Mapping Legacy Parameters to Target Parameters
Specifications
These designs are produced to enable the developer to create the Extract, Transform and Load (ETL) modules. The output from the gap analysis and data mapping are used to drive the design process. Any constraints imposed by platforms, operating systems, programming languages, timescales etc should be referenced at this stage, as should any dependencies that this module will have on other such modules in the migration as a whole; failure to do this may result in the specifications being flawed.
There are generally two forms of migration specification:
Functional (e.g. Premise migration strategy)
Detailed Design (e.g. Premise data mapping document)
Built into the migration process at the specification level are steps to reconcile the migrated data at predetermined points during the migration. These checks verify that no data has been lost or gained during each step of an iteration and enable any anomalies to be spotted early and their cause ascertained with minimal loss of time.
Usually written independently from the migration, the specifications for the reconciliation programs used to validate the end-to-end migration process are designed once the target data has been mapped and is more or less static. These routines count like-for-like entities on the legacy system and target system and ensure that the correct volumes of data from legacy have migrated successfully to the target and thus build business confidence.
Development
Extracts / Loads
- Depending on the migration strategy, extract routines shall be written to derive the legacy data required
- Transfer data from Legacy and/or Target to interim migration environment via FTP, Tape, CSV, D/B object copy, ODBC, API
- Transfer data from interim migration environment to target
Migration (transform)
There are a number of potential approaches to a Data Migration:
- Use a middleware tool (e.g. ETI, Powermart). This extracts data from the legacy system, manipulates it and pushes it to the target system. These "4th Generation" approaches are less flexible and often less efficient than bespoke coding, resulting in longer migrations and less control over the data migrated.
- The Data Migration processes are individually coded to be run on a source, an interim or target platform. The data is extracted from the legacy platform to the interim / target platform, where the code is used to manipulate the legacy data into the target system format. The great advantage of this approach is that it can encompass any migration manipulation that may be required in the most efficient, effective way and retain the utmost control. Where there is critical / sensitive data migrated this approach is desirable.
- Use a target system 'File Load Utility', if one exists. This usually requires the use of one of the above processes to populate a pre-defined Target Database. A load and validate facility will then push valid data to the target system.
- Use an application's data conversion/upgrade facility, where available.
Reconciliation
Independent end-to-end comparisons of data content to create the necessary level of business confidence
- Bespoke code is written to extract required total figures for each of the areas from the legacy, interim and target databases. These figures will be totalled and broken down into business areas and segments that are of relevant interest, so that they can be compared to each other. Where differences do occur, investigation will then instruct us to alter the migration code or if there are reasonable mitigating factors.
- Spreadsheets are created to report figures to all levels of management to verify that the process is working and build confidence in the process.
Referential File Integrities
Depending on the constraints of the interim/target database, data may be checked to ascertain and validate its quality. There may be certain categories of dirty data that should be disallowed e.g. duplicate data, null values, data that does not match to a parameter table or an incompatible combination of data in separate fields as proscribed by the analyst. Scripts are written that run automatically after each iteration of the migration. A report is then generated to itemise the non-compatible data.
Iterations
These are the execution of the migration process, which may or may not include new cuts of legacy data.
These facilitate:
- Collation of migration process timings (extraction, transmission, transformation and load).
- The refinement of the migration code i.e. increase data volume and decrease exceptions through:
- Continual identification of data cleanse issues
- Confirmation of parameter settings and parameter translations
- Identification of any migration merge issues
- Reconciliation
From our experience the majority of the data will conform to the migration rules and as such take a minimal effort to migrate ("80/20 rule"). The remaining data, however, is often highly complex with many anomalies and deviations and so will take up the majority of the development time.
Data Cuts
- Extracts of data taken from the legacy and target systems. This can be a complex task where the migration is from multiple legacy systems and it is important that the data is synchronised across all systems at the time the cuts are taken (e.g. end of day processes complete).
- Subsets / selective cuts - Depending upon business rules and migration strategy the extracted data may need to be split before transfer.
Freeze
Prior to any iteration, Parameters, Translation Tables and Code should be frozen to provide a stable platform for the iteration.
Quality Assurance
Reconciliation
- Horizontal reconciliation (number on legacy = number on interim = number on target) and Vertical reconciliation (categorisation counts (i.e. Address counts by region = total addresses) and across systems).
- Figures at all stages (legacy, interim, target) to provide checkpoints.
File Integrities
Scripts that identify and report the following for each table:
- Referential Integrity - check values against target master and parameter files.
- Data Constraints
- Duplicate Data
Translation Table Validation
Run after new cut of data or new version of translation tables, two stages:
- Verifies that all legacy data accounted for in "From" translation
- Verifies that all "To" translations exist in target parameter data
Eyeballing
Comparison of legacy and target applications
- Scenario Testing -Legacy to target system verification that data has been migrated correctly for certain customers chosen by the business who's circumstances fall into categories (e.g. inclusion and exclusion Business Rule categories, data volumes etc.)
- Regression Testing - testing known problem areas
- Spot Testing - a random spot check on migrated data
- Independent Team - the eyeballing is generally carried out by a dedicated testing team rather than the migration team
UAT
This is the customer based User Acceptance Test of the migrated data which will form part of the Customer Signoff
Data Cleanse
This activity is required to ensure that legacy system data conforms to the rules of data migration. The activities include manual or automatic updates to legacy data. This is an ongoing activity, as while the legacy systems are active there is the potential to reintroduce data cleanse issues.
Identified by
- Data Mapping
- Eyeballing
- Reconciliation
- File Integrities
Common Areas
- Address Formats
- Titles (e.g. mrs, Mrs, MRS, first name)
- Invalid characters
- Duplicate Data
- Free Format to parameter field
Cleansing Strategy
- Legacy - Pre Migration
- During migration (not advised as this makes reconciliation very difficult)
- Target - Post Migration (either manual or via data fix)
- Ad Hoc Reporting - Ongoing
Implementation
Freeze
A code and parameter freeze occurs in the run up to the dress rehearsal. Any problems post freeze are run as post freeze fixes.
Dress Rehearsal
Dress rehearsals are intended to mobilise the resources that will be required to support a cutover in the production environment. The primary aim of a dress rehearsal is to identify the risks and issues associated with the implementation plan. It will execute all the steps necessary to execute a successful 'go live' migration.
Through the execution of a dress rehearsal all the go live checkpoints will be properly managed and executed and if required, the appropriate escalation routes taken.
Go Live window (typical migration)
- Legacy system 'end of business day' closedown
- Legacy system data extractions
- Legacy system data transmissions
- Readiness checks
- Migration Execution
- Reconciliation
- Integrity checking
- Transfer load to Target
- User Acceptance testing
- Reconciliation
- Acceptance and Go Live
Post Implementation
Support
For an agreed period after implementation certain key members of the migration team will be available to the business to support them in the first stages of using the new system. Typically this will involve analysis of any irregularities that may have arisen through dirty data or otherwise and where necessary writing data fixes for them.
Post Implementation fixes
Post Implementation Data Fixes are programs that are executed post migration to fix data that was either migrated in an 'unclean' state or migrated with known errors. These will typically take the form of SQL scripts.
|