OPTIMIZING DATA MIGRATION PROJECTS

CHALLENGES AND STRATEGIES FOR MAXIMIZING RESULTS, MINIMIZING RISKS


TABLE OF CONTENTS

INTRODUCTION: Maximizing Results, Minimizing Risks in Data Migration

CHAPTER 1: The Three Biggest Challenges in the Data Migration Process

CHAPTER 2: Data Migration Best Practices for Reducing Infrastructure Costs

CHAPTER 3: How Metadata Management Can Accelerate Data Migration

CHAPTER 4: Getting Past the Big Bang Myth to a Workable Data Migration Strategy

CHAPTER 5: Supercharge Your Data Migration Plan with Process Automation

CHAPTER 6: Ensuring Data Integrity — Before and After the Migration

CHAPTER 7: Information Assurance Best Practices for In-Flight and At-Rest Data

CONCLUSION: A Smooth Data Migration Process Is Worth All the Effort


INTRODUCTION

Maximizing Results, Minimizing Risks in Data Migration

Data migration can be a complex and often underestimated task, where the stakes are high and the margin for error extremely slim. With the program focused on business process changes and building new IT systems, and with reduced attention afforded to major migration projects, a variety of blind spots and unforeseen problems can emerge. Despite these challenges, organizations must from time to time prioritize data migration in order to effectively meet critical mission needs.

Because of the data-related constraints and requirements facing government agencies, it pays to consider a number of strategic and technical issues regarding the migration well in advance. By doing so, you can deliver a smooth migration process with minimal disruptions to business operations. Just as importantly, in many instances you can also resolve long-standing issues with your legacy data that can improve data quality and consistency moving forward.



CHAPTER 1

The Three Biggest Challenges in the Data Migration Process

Every organization encounters the problem of moving large amounts of information from older systems and legacy platforms to more cost-effective, modernized solutions.

Managing a data migration process can be an extremely complex endeavor that presents a broad (and relatively consistent) set of challenges, including the data itself, the platforms, and planning/execution. These challenges underscore an unfortunate fact about IT solutions vendors: they sometimes underestimate the difficulty of seeding modernized systems with existing legacy data, leaving government customers with a complex migration problem after they’ve already contracted for the new solution.

1. The size, complexity, and condition of the data

One of the central challenges is the sheer volume of the data that needs to be moved. Storage used to be expensive, but this is no longer the case. Given the relatively low cost of data storage compared to the risks of not keeping information, program managers will usually err on the side of storing more information and keeping it longer. As a result, government agencies typically maintain vast quantities of data in most of the systems they operate. Depending on the agency, that may mean multiple terabytes of data that need to be moved from legacy systems into a new system. The sheer amount of data that can be involved during modernization creates demands on bandwidth, storage space, personnel, and many other factors.

The complexity of the existing and target information models can also be a factor. Simply obtaining a detailed understanding of the existing legacy data takes time and effort. Even when a government organization has implemented mature data governance procedures and diligently maintained data reference models describing the information in legacy data platforms, the actual understanding of the data itself can remain at a high level or be incomplete. Reference models can be misleading: although they tell us what data was supposed to be there (at one point in time), the data that is really there is often uncertain. This is why it’s smart to use data reference models as a starting point, and then review the actual data to complete your understanding.

Another challenge in the data migration process is that there is often a lack of visibility into the condition and quality of the data being moved. Some of the data an agency needs to move may have been created many years or even decades ago — using systems and applications that are no longer supported and in repositories that have not been consistently checked for accuracy or compatibility. Consequently, even if the data can be mapped to the new platform, it may not be possible to actually move it where it needs to go.

Date information is a notorious example of this problem. Modernized databases have consistent definitions for valid dates, times and timestamps, and new technology systems frequently leverage these definitions in IT applications. But what happens when you encounter the date “000033”? No matter how you transpose or rearrange the digits, it’s not going to translate to a native date value. So then what? Should you throw away the entire record, or perhaps just that one field? Should you correct the anomaly in the legacy system before converting? Performing detailed analysis of the existing data at the very beginning of migration planning is a must. Only then will stakeholders have the ability to determine how best to deal with anomalous data.
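By way of illustration, the following minimal sketch (Python, with a hypothetical "last_action_date" field and formats invented for the example) shows the kind of automated check that separates convertible legacy dates from anomalies such as “000033,” so stakeholders can decide record by record whether to correct, flag, or reject:

```python
from datetime import datetime

# Hypothetical legacy formats this system is known to use; anything else is an anomaly.
LEGACY_DATE_FORMATS = ("%Y%m%d", "%m-%d-%Y")

def parse_legacy_date(raw: str):
    """Return a date if the value matches a known legacy format, else None."""
    value = raw.strip()
    for fmt in LEGACY_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None  # e.g. "000033" falls through to here

def triage_record(record: dict):
    """Split a record into (converted_date, anomaly_note) for stakeholder review."""
    converted = parse_legacy_date(record["last_action_date"])
    if converted is None:
        return None, f"unparseable date {record['last_action_date']!r} in record {record['id']}"
    return converted, None

# Example: one clean record, one anomalous record.
for rec in [{"id": 1, "last_action_date": "19990101"},
            {"id": 2, "last_action_date": "000033"}]:
    date_value, note = triage_record(rec)
    print(rec["id"], date_value, note)
```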

The volume of data can also make it nearly impossible to evaluate existing data using only manual processes. Fortunately, there are a variety of tools and strategies that can effectively automate aspects of reviewing, cleaning, and verifying the data. In fact, one of the first places to look is the embedded tools included with the new platform, combined with ubiquitous scripting techniques available on pretty much every platform.

2. Understanding the technology platform (both new and old)

A second area of challenge can be the technology platform — of both the legacy and target systems. An organization may not have staff on hand with the requisite knowledge of or familiarity with either or both of these systems. A related concern is that the proprietary nature of many older legacy systems can make it very difficult to use standard tooling to connect to data repositories, understand the information that is housed there, and efficiently pull the information into the new platform. Commercial industry tools can help here, but those tools can become very expensive for highly proprietary legacy platforms.



These challenges mean that at various stages throughout the transformation program, there may be a need to engage with subject matter experts with thorough knowledge of both existing and target platforms.

3. Blind spots in planning and scheduling

A third potential area of challenge in data migration arises when insufficient time is allocated to critical steps. This isn’t entirely surprising, since an agency may undertake a major data transition only rarely. The people who worked on the last data migration effort may have moved on to other agencies or roles, or may have even retired. In any case, the lessons learned in the previous migration may be only partially relevant to the current project.

Another problem is that data migration scope is rarely considered at the outset of modernization programs, and trying to change data migration requirements at the 11th hour is a recipe for disaster. Most technology vendors do a great job of selling you the next new thing — but they’re far less concerned about making sure you can actually use it. They make promises about how easily you’ll be able to transition existing information into the new platform without fully understanding the data you already have. Effective planning, however, builds in the time and resources, early in the lifecycle, to test whether the process and technology will actually work when you “flip the switch.”

Overcoming the obstacles

In short, there are many moving parts involved in an effective data migration process, and a plan that does not adequately address them may impact the program’s ability to modernize on schedule, or even worse, interrupt mission-critical work during deployment. It pays to treat the data migration effort as its own subsystem or program track, allowing you to budget appropriate resources for the effort, and sometimes even to manage it as a separate effort in the modernization program.



CHAPTER 2

Data Migration Best Practices for Reducing Infrastructure Costs

In addition to the strategic challenges mentioned above, there are budgetary issues to consider. You’ve already incurred significant costs to acquire and deploy the new operating platform you’re migrating to; to realize the return on that investment, you also need to adequately resource the migration process.

The potential costs of data migration generally fall into several categories, including commercial off-the-shelf (COTS) tools, server infrastructure, software development, and consultation with technology partners. This can be a frustrating and expensive realization. You’ve probably already spent a significant amount of your budget on the new platform itself and its dependent infrastructure. The last thing you want to do is spend additional resources on Extract, Transform and Load (ETL) infrastructure for the data migration process — unless you absolutely have to.

What’s more, some of these infrastructure products could cost you hundreds of thousands of dollars (or more), depending on how many servers you need to deploy them on — and yet you may only need to use them in production for a few days (or hours even) during the migration process. So it makes sense to consider using data migration best practices to control your related infrastructure costs.

Consider your options — and their costs

Fortunately, there are a variety of low-cost (or even no-cost) tools that you can use to conduct the ETL process, or its alternative, Extract, Load and Transform (ELT).

When considering tools, start by reviewing the ones that you have already purchased as part of your new data platform. Many database operating environments — whether Microsoft, Oracle, IBM, or others — include embedded tools that help facilitate the process of querying, bulk loading, importing, or exporting data. More often than you might think, these embedded tools will be sufficient for your needs.


In other cases, you might need to acquire expanded capabilities. One option for doing so would be to pay for developers to write code to handle any specialized ETL/ELT needs during the migration process, and combine them with embedded database tools. Common scripting tools (Unix Shell, Perl, Python, and Microsoft PowerShell) can be enormously useful for this and are very easy to develop with. Another option is to consider emerging open source products which can be leveraged for the most common ETL/ELT patterns required for data migration. And of course, there are multiple COTS options that are industry-proven for helping organizations meet their data migration needs (although they come at a cost).
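As a rough illustration of that scripting option, the sketch below uses only standard Python, with SQLite standing in for the legacy source and with hypothetical table and file names: it extracts rows, applies a light transform, and stages a delimited file for loading into the target.

```python
import csv
import sqlite3

# Stand-in for a legacy source; a real system would be reached via its own driver (ODBC, JDBC bridge, etc.).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE legacy_cases (case_id INTEGER, case_status TEXT, opened_date TEXT)")
source.executemany("INSERT INTO legacy_cases VALUES (?, ?, ?)",
                   [(100, " open ", "19990101"), (101, "closed", "20011231")])

OUTPUT_FILE = "case_records.csv"   # delimited file staged for the target platform's bulk loader

def extract_to_csv():
    """Pull rows from the legacy table, apply a light transform, and stage a bulk-load file."""
    with open(OUTPUT_FILE, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["case_id", "case_status", "opened_date"])
        for case_id, status, opened in source.execute(
                "SELECT case_id, case_status, opened_date FROM legacy_cases"):
            # Example transform: normalize status codes before loading.
            writer.writerow([case_id, (status or "").strip().upper(), opened])

extract_to_csv()
```

The staged file can then be handed to whichever bulk-load utility ships with the target database (SQL*Loader, bcp, COPY, and so on), keeping custom code confined to the transformation logic.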

The underlying strategy

Data migration best practices suggest that you should first determine whether your embedded infrastructure tools are sufficient for the job. If so, that’s great — and you can move ahead.

If not, you’ll want to do a cost/benefit analysis of your other options — paying for custom development, buying COTS products, or developing with open source solutions. Expect that some code development is going to be required, no matter which option you use. It’s rarely a “drag-and-drop” process as some tools promise, and if it is, you are almost always able to do the same thing with embedded tools.

Be prepared to spend time optimizing your ETL approaches. ETL tools (both commercial and open source) can distract data migration developers from the critical need to understand the details of how information is being pulled from data sources and pushed into targets. Remember that every efficiency realized saves your organization critical migration time or server costs. Tool vendors have effective solutions for splitting your migration up into parallel parts to save time, but each of those parts can incur additional server and/or software licensing costs.

Look to the cloud?

Another possible solution is the use of cloud infrastructure — but there are pros and cons to doing so. If you’re only looking for server infrastructure to use while you migrate for a specific modernization effort, acquiring and leveraging cloud infrastructure could provide you with a measure of speed and flexibility that could be very useful. Cloud environments are sometimes very useful for development and test environments, since they can be implemented very quickly and decommissioned when development and testing are complete.

The downside of moving large volumes of data into the cloud, however, is that in some cases, you will still have to deal with some fairly significant limitations in available network bandwidth. If you have to move significant amounts of information over your network in a reasonable amount of time, using cloud infrastructure may not be optimal.



Infrastructure as a resource

At the end of the day, infrastructure is just another resource to be managed in the modernization program. Data migration best practices emphasize leveraging whatever you have, and optimizing wherever you can. Then, and only then, should you buy more of what you need to get the job done.



CHAPTER 3

How Metadata Management Can Accelerate Data Migration

Two major technical challenges in data migration are the large volumes of data residing in the legacy system, and the data’s inconsistent structure and quality. Typically, the data must be assessed, cleaned, or otherwise transformed before it can be moved. Even more challenging, the older data repositories may have been created using obsolete or proprietary applications. Documentation is often incomplete, insufficient, or non-existent. For example, the staff member who designed and built the database may have retired years ago and is essentially unavailable.

What does my data really look like?

Data migration efforts are inherently risky, frequently due to assumptions that are easy to make. For example, you might assume that your stakeholders and users know and understand all the data in the current platform, and that you can wait until the project’s end to completely familiarize yourself with the data. Neither of these are prudent assumptions.

It’s going to take time to understand what your data looks like, so you’d better start early. Don’t ask your stakeholders to tell you about the data; instead, use the metadata you find to acquire the understanding you need, and then ask your stakeholders to tell you how to deal with the exceptions. It’s a far better way to leverage your stakeholders’ valuable time and effort. Maximize the information you acquire from metadata to automate data analysis, identify valid information patterns, and isolate outliers.
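To illustrate one form that automation can take, the following sketch (Python, with sample values invented for the example) reduces each field value to a character pattern and tallies pattern frequencies, so valid formats stand out and outliers surface for stakeholder review:

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Reduce a raw value to a pattern: digits become 9, letters become A, other characters kept."""
    pattern = re.sub(r"\d", "9", value)
    pattern = re.sub(r"[A-Za-z]", "A", pattern)
    return pattern

def profile_field(values):
    """Count pattern frequencies; rare patterns are candidates for outlier review."""
    counts = Counter(shape(v.strip()) for v in values if v is not None)
    return counts.most_common()

# Example: a date-like field with one outlier.
sample = ["19990101", "20011231", "000033", "19851120"]
for pattern, count in profile_field(sample):
    print(f"{pattern!r}: {count}")
# '99999999': 3, '999999': 1  -> the six-digit value is an outlier worth investigating
```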

Metadata force multipliers

A good development team is always looking for ways to deliver products more effectively. Development tools frequently provide wizards, templates, and scaffolding capability to speed the production of common solutions. These approaches help each individual developer become more productive by automating repetitive tasks. Data migration problems lend themselves well to such standardized procedures, since data sets are frequently processed in very similar ways during the migration.

The rules may change regarding how some data types are handled, as well as the specific fields and data items represented in each data set. But they still follow a pattern. Once you’ve identified the pattern, conducting metadata management enables you to start leveraging tools and building new ones to greatly expand developer productivity.

Innovative tools can be created to generate migration code and associated functional tests using available sources such as COBOL copybooks and relational database metadata. Such tools can empower developers to work much more quickly and shorten the overall development time. In addition, by leveraging generated code for repetitive tasks, one can greatly reduce the risk of developer coding errors.
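As a simple, hypothetical illustration of this idea, the sketch below reads column metadata (SQLite’s PRAGMA table_info stands in for information_schema queries or a parsed copybook) and generates a per-table extract statement from that metadata alone:

```python
import sqlite3

# Stand-in metadata source: SQLite's PRAGMA table_info plays the role that
# information_schema.columns or a parsed COBOL copybook would play elsewhere.
def column_names(conn, table):
    """Read column metadata for a table and return the column names in order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def generate_extract_sql(conn, table):
    """Generate a SELECT statement for the table from its metadata alone."""
    cols = ", ".join(column_names(conn, table))
    return f"SELECT {cols} FROM {table}"

# Example with a hypothetical legacy table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_cases (case_id INTEGER, case_status TEXT, opened_date TEXT)")
print(generate_extract_sql(conn, "legacy_cases"))
# SELECT case_id, case_status, opened_date FROM legacy_cases
```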

A better solution

Everyone in this situation is presented with a fundamental choice: work smart or work hard. These may sound like equivalent options, but working hard involves significant internal resources, systems integrators, contractors, consultants, and time-and-materials task orders (not to mention change orders). This means the “work hard” option is in most cases very expensive. Consequently, the type of work involved lends itself more to the “work smart” option. By collecting, analyzing, and leveraging metadata — that is, data about the data — project teams can create tools to automate the process of analyzing, extracting, transforming, and loading legacy data into the new platform.

The solution begins with conducting metadata management to determine how the data is organized and structured in the legacy system. This metadata can exist in many formats, and the trick is to find the metadata that will be most effective in streamlining the process. Legacy systems almost always include certain types of metadata that describe the way data is organized and stored. In most cases, this provides sufficient insight (and repeatable patterns) for developers to use in automating repetitive migration activities based on those patterns. The potential sources for metadata are myriad — database schemas, COBOL copybooks, XML schemas — and the list goes on.

For optimal results, it’s essential to have experience and insight into the environment in which the data was created and stored, whether it’s a mainframe environment, mid-range systems, UNIX/Linux, or Windows. In addition, it’s vital to have a solid understanding of the requirements and constraints of the target platform.



Early due diligence for long-term impact

Gathering and analyzing available metadata at the outset of a data migration project can require some additional time and effort. However, once you’ve collected and leveraged metadata through effective metadata management, each developer or analyst can do the work of five or ten. As a result, you can significantly accelerate the migration process, particularly when numerous data sets/tables are being migrated. Using the metadata at hand, automation can be applied to bring far greater consistency to the data during migration to the modernized platform.

By applying metadata management strategies, an organization can accelerate the process of data migration.


CHAPTER 4

Getting Past the Big Bang Myth to a Workable Data Migration Strategy

One of the fundamental decisions an organization must make about its data migration strategy is when to migrate information from existing systems into a modernized platform.

After all, data migration can be a fairly disruptive process, so there’s a strong appeal to getting it over with quickly, even if an outage must be imposed upon users and stakeholders. “Big Bang” is the popular term for a strategy whose goal is to shorten and simplify the movement of data from the old to the new system. It’s essentially a “rip-the-bandaid-off” strategy that has gained a certain amount of buzz in the data modernization world — although based on our experience, it also presents significant risks to the organization. Sometimes it’s the right approach, but only if the conditions are correct.

How Big Bang migrations work (in theory)

In an ideal world, an organization could shut down its systems at close of business on Friday, and move the data on its legacy system to the new platform in a matter of hours. Theoretically, they could flip the switch on Monday morning, and all the data would be in place on the new platform and ready for use.

If a data migration strategy could actually work this easily and quickly, it would greatly reduce outages, disruptions, and confusion for data users and other stakeholders. But in our experience, such results are rarely, if ever, achieved. The reality is that such an approach can only work when an agency is moving relatively small amounts of data, or for a limited number of offices.

How Big Bang migrations work (in the real world)

For most organizations, the complexity and volume of their data — and the fact that stakeholders need uninterrupted access to it — mean that the Big Bang approach is not a workable strategy. Stakeholders are not incentivized to agree to long-term outages, and even when they do, Big Bang migrations can encounter unexpected data or other issues which quickly turn an acceptable outage window into an unacceptable one. Fortunately, there are alternatives that program managers can use to mitigate some of the risks associated with the Big Bang approach.


Alternate strategies

One option is to take a phased approach in your data migration strategy, rolling out or piloting the new system for a few offices at a time. In certain cases, such an approach could allow an agency to have specific groups of users literally log off the old system on a Friday and access the data on the new system on Monday with little or no interruption. In other words, it can deliver results similar to those promised by the Big Bang approach, but on a far smaller and more manageable scale. Outages are shorter, and a smaller portion of stakeholders are affected by any given migration window.

Another strategy is to use a data synchronization solution to give stakeholders greater flexibility in using both the new and old systems simultaneously — at least until the data migration process has been completed. This approach offers the potential for capturing data changes that are occurring in either the old or new system and applying them in the other. By deploying such technology effectively throughout the migration, users can continue to access and make changes to the information in the old system, with their changes also showing up in the new system (and vice versa).

It’s important to point out that synchronization approaches are not always the perfect solution, either. They can be complex, tricky to plan and deploy, and if not managed and implemented correctly, they too can result in data discrepancy and confusion among stakeholders. The cost and risk of synchronization approaches must be weighed against the operational and mission flexibility they bring.

Keep your eye on the finish line

The main takeaway is to be aware of the three approaches described here — Big Bang, phased, and synchronization-enabled — and to know that each has its strengths and limitations.

Ideally, an organization with complex data and multiple sites will consider each of these approaches and use the elements that best meet their needs. By creating and implementing a customized strategy, you can give your program managers greater flexibility in the way their data is migrated, and at the same time, give data users more options for seamlessly executing their missions during modernization efforts.



CHAPTER 5

Supercharge Your Data Migration Plan with Process Automation

The term data migration can refer to data migration per se — a one-time movement of data from a legacy system to a new platform — as well as more general, ongoing processes for moving information back and forth between the new and old systems.

In either scenario, automating processes is absolutely essential for successful project execution. Whether it’s a one-time data migration project or any large-scale repetitive transfer of data in the enterprise, there are four key reasons why process automation is critical:

1. Speed

Most data migration processes will be complex. Data is going to come from lots of different sources and map to many potential targets. Target data structures need to be created, indexes built, and dependencies managed, in addition to the extraction and load activities.

In addition, the number of distinct steps can be extensive. For government agencies that are migrating significant amounts of legacy data into a modernized system, the migration process may comprise thousands of individual activities and steps, each of which needs to be sequenced and checked for errors. But by automating some or all of the process, with activities running in parallel, it can be implemented without delays as one job, with one command — and then it’s done.
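The toy sketch below (Python, with invented step names) illustrates the kind of dependency-aware, parallel sequencing described here; a real program would typically use an established orchestration or batch-scheduling tool rather than hand-rolled code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical migration steps and their dependencies; real plans may have thousands of steps.
STEPS = {
    "create_tables":  [],
    "extract_cases":  [],
    "load_cases":     ["create_tables", "extract_cases"],
    "build_indexes":  ["load_cases"],
    "verify_counts":  ["load_cases"],
}

def run_step(name):
    print(f"running {name}")  # a real step would invoke scripts, loaders, or SQL
    return name

def run_plan(steps, workers=4):
    """Run steps in dependency order, executing independent steps in parallel."""
    done = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(done) < len(steps):
            ready = [s for s, deps in steps.items()
                     if s not in done and all(d in done for d in deps)]
            if not ready:
                raise RuntimeError("cycle or unsatisfiable dependency in plan")
            # Steps with satisfied dependencies run concurrently; a failure surfaces
            # as an exception while iterating the results.
            for finished in pool.map(run_step, ready):
                done.add(finished)

run_plan(STEPS)
```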

Data migration is always a time-sensitive operation, and migration windows usually involve some type of outage. No one on the mission side likes to hear this. So it’s impossible to overstate the importance of optimizing the speed of the entire migration process.


2. Accuracy

When you have terabytes of information to migrate from a legacy system, it’s essential to be able to assess the accuracy of existing information and fix it wherever possible before you migrate it. You simply can’t take that amount of data and manually review it quickly to identify and resolve anomalies, possible conversion issues, and information that does not comply with target requirements. A data migration plan that applies automated scanning tools can look for anomalies and correct them, while minimizing or even eliminating any impact on the migration window.

Process automation is also a key quality factor. For very complex migrations involving lots of activities and steps, process automation takes human/operator error out of the equation. No one wants to find out in the 20th hour of a 24-hour migration window that someone missed a critical step in the 2nd hour that negatively affects the rest of the migration process.

3. Repeatability

Generally speaking, trying to migrate your data for the first time during the production cutover window represents an avoidable risk. No matter how completely you evaluate and scan legacy data, you’re going to come across data you did not expect. Keep in mind that just because you looked at the data yesterday doesn’t mean you’ve seen it in its final state prior to the migration. After all, your users are still using the system, and new data could be added at any time. Unexpected obstacles and data anomalies are going to occur, especially in complex, older, or proprietary systems, so it helps to test your process thoroughly before you go live.

Also, before you go live, you should have repeatedly migrated test databases as well as production data. Having the process automated allows you to perform this necessary repetition with minimal resource overhead.

4. Stakeholder buy-in

Outage windows are not the only things that will keep key stakeholders up at night during modernization. They also want assurances that when the migration is complete and users begin to use the new system, the information that was there when they logged off of the old system is correctly reflected in the new system. Again, when data volumes are large, neither the integration team, nor the entire collected user base for that matter, could reasonably be expected to check all the data in the new system. Fortunately, a data migration plan that leverages process automation can overcome this challenge by utilizing tools and techniques to verify post-migration data against its pre-migration state.



Use the results of the verification to provide stakeholders with detailed reports confirming that the post-migration data meets expectations.

The choice is clear

It may seem like a lot of additional work to automate processes in your data migration plan, but failing to do so would be “penny wise but pound foolish.” By automating the process of data migration, you can get to your modernization goals far more quickly and efficiently — and at the same time, end up with better, more reliable data.


CHAPTER 6

Ensuring Data Integrity — Before and After the Migration

One of the most crucial aspects of successful data migration is ensuring data integrity at both ends of the migration process.

For data residing in the legacy system, far too many agencies tend to underestimate the number of inaccuracies, discrepancies, and conflicts that exist. Some older database technologies are remarkably lenient about enforcing rules and constraints when adding or modifying data. And for systems and applications that have been in production for years, bugs that were fixed long ago may still have allowed unexpected data to be recorded in the database before the fix.

Problematic data buried in a legacy system may be annoying, but it is not generally the cause of operational problems in and of itself. A lot of the data is old and is rarely, if ever, accessed by the existing system. In fact, many stakeholders may not even know of its existence. However, once you try to migrate malformed data into a new database that has strict policies for what data is allowed, it can create conflicts that slow or shut down your migration process.

Bad data comes in many forms. It can occur when otherwise valid data appears in an unexpected format (for example, a date expected to be “01-01-1999” shows up as “19990101”) or when internally represented values (such as numeric or floating point data) contain corrupted or unexpected byte sequences. For the migration to proceed smoothly, it’s essential to find these anomalies and determine how to address them either before or during migration.

Handling problem data

One thing to note: inevitably, certain legacy data (such as the month/date errors cited above) will not make it into the new target system without some intervention. In some cases, platform vendors have been known to advise clients that “bad” data that fails to load should simply be ignored, or deleted from the source system prior to migration. This may be the easy path, but few organizations are going to take it. For some government organizations, it’s not an option at all. For example, agencies involved in law enforcement are usually subject to laws and regulations that require the retention of investigative data for decades, including the bad data. From a legal standpoint, you can’t just pretend the bad data was never there.

We suggest a better alternative: assign valid but conspicuous values as “flags” for problem data in the new system. By doing so, when users encounter the flag in a certain record, they immediately know the migrated record contained information that could not be loaded. For each such instance, persist the original data from the legacy system in a generic format and link it to the new record. This way, the user can quickly check the original source to meaningfully interpret the data.
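One way to realize this pattern, sketched here with hypothetical tables and a hypothetical sentinel value, is to load a valid but conspicuous flag into the typed column and preserve the raw legacy value in a linked exception table:

```python
import sqlite3

# Hypothetical sentinel: a valid but conspicuous date that users will recognize as a flag.
FLAG_DATE = "1900-01-01"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (case_id INTEGER PRIMARY KEY, opened_date TEXT NOT NULL);
    -- Side table preserving the untranslatable legacy value, linked to the new record.
    CREATE TABLE migration_exceptions (
        case_id INTEGER REFERENCES cases(case_id),
        source_field TEXT,
        raw_value TEXT
    );
""")

def load_case(case_id, legacy_date, parsed_date):
    """Insert the record; if the date could not be parsed, flag it and keep the raw value."""
    if parsed_date is None:
        conn.execute("INSERT INTO cases VALUES (?, ?)", (case_id, FLAG_DATE))
        conn.execute("INSERT INTO migration_exceptions VALUES (?, ?, ?)",
                     (case_id, "opened_date", legacy_date))
    else:
        conn.execute("INSERT INTO cases VALUES (?, ?)", (case_id, parsed_date))

load_case(100, "19990101", "1999-01-01")
load_case(101, "000033", None)   # unparseable: flagged, raw value preserved
print(list(conn.execute("SELECT * FROM migration_exceptions")))
# [(101, 'opened_date', '000033')]
```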

Show me the proof

Stakeholders will need to be shown that their data is complete and correct once it has been migrated to the new platform — a level of confidence they will not be able to achieve by themselves. By ensuring that the data appears accurately in its new databases and repositories, an agency can prove to stakeholders that post-migration information has absolute data integrity.

When we work with clients, we often use frameworks that apply combinations of validation procedures and automated statistical and analytical tools. The frameworks allow an agency to examine large amounts of the migrated data, comparing and checking against expected results. Statistical techniques provide a general level of assurance that migrated data is complete. If accuracy is a concern, apply checksum methods both before and after migration to efficiently confirm that critical data is correctly represented in the new system.
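A minimal sketch of the checksum idea (Python, with hypothetical tables and SQLite standing in for both platforms): compute an order-independent digest over the critical columns on each side and compare the results.

```python
import hashlib
import sqlite3

def table_checksum(conn, query):
    """Order-independent checksum: hash each row, then hash the sorted row digests."""
    row_digests = sorted(
        hashlib.sha256("|".join(str(v) for v in row).encode()).hexdigest()
        for row in conn.execute(query)
    )
    return hashlib.sha256("".join(row_digests).encode()).hexdigest()

# Stand-in source and target; in practice each query runs on its own platform.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE cases (case_id INTEGER, case_status TEXT)")
    db.executemany("INSERT INTO cases VALUES (?, ?)", [(100, "CLOSED"), (101, "OPEN")])

before = table_checksum(source, "SELECT case_id, case_status FROM cases")
after = table_checksum(target, "SELECT case_id, case_status FROM cases")
print("match" if before == after else "MISMATCH: investigate before sign-off")
```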

Automating the data quality review

On both the front and back end of the data migration process, there are mountains of data to test. Because of the data’s volume and complexity, finding and resolving every data anomaly is often easier said than done — and almost impossible if you’re using only manual intervention. This is why it’s so critical to use automated tools, either ones that were included with data profiling software, or ones that you develop on your own (which can be done at relatively low cost). In more challenging or complicated situations, you may need to purchase special case tools.

By effectively testing data integrity and resolving anomalies before data migration, and also testing the validity of data following migration, an organization can ensure that it retains all the information from its legacy system. Just as importantly, it can actually resolve certain longstanding inaccuracies, conflicts, and other lingering data quality issues. As a result, stakeholders and data users can be assured that the data they’re seeing today is not only accurate, but even improved from what it was in the legacy system.



CHAPTER 7

Information Assurance Best Practices for In-Flight and At-Rest Data

It should not surprise anyone that the days of lax IT security practices are over. Information assurance has evolved out of the governance realm and into the world of implementation.

Almost every development work stream in a modern transformation project includes planning, building, and validating solutions to ensure the protection of critical and sensitive data, and data migration efforts are no exception. Whether it’s enterprise data in flight — that is, the data as it’s being transferred from your legacy system to your new system — or data at rest after it’s been migrated (in storage), you should adhere to two fundamental best practices to keep your information secure.

1. Understand where your sensitive data is — both in flight and at rest — where it’s at risk, and how to protect it.

The first fundamental best practice in information assurance is to be fully aware of exactly what personally identifiable information (PII) and sensitive data exists in your legacy system, and specifically where it resides. By their nature, legacy systems often contain data that was created before more rigorous data protection protocols were developed. It’s sometimes surprising to learn how frequently older systems were “grandfathered in” when new and more stringent information assurance policies were put into place.

It’s also essential to understand that it’s not always easy or intuitive to know all the instances where PII exists. For example, many older systems still use Social Security Numbers as internal database keys to uniquely identify user information. In fact, this was actually standard operating procedure before SSNs became the veritable Holy Grail for identity thieves. Depending on the database and the approach used, it’s possible that SSNs may not even appear on any specific screen or report — yet they still exist, often unencrypted or otherwise unprotected, within the database.

For this reason, it’s critical to use data profiling software to identify and locate any instances of PII or other sensitive information in the enterprise data residing in your legacy system. Now for the not-so-fun part. This includes looking for sensitive data in places where you do not expect to find it. This means that finding every field in every data set or table that includes the terms “ssn,” “social_security,” etc. is only the first step. You should also use regular expression matching to find every occurrence of the familiar XXX-XX-XXXX sequence of numbers and hyphens in any character-based field in excess of 9 characters — even in places where the field or column name is not intuitive. (One easy way to do so is to use the regular expression pattern "\s*\d{3}-?\d{2}-?\d{4}\s*". You might be surprised at what you find.)
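Applied in code, the pattern above can be swept across every character-based column; the sketch below (Python, with a hypothetical table and column names) shows the idea:

```python
import re
import sqlite3

# The pattern suggested above: SSN-like digit groups, hyphens optional, surrounding whitespace allowed.
SSN_PATTERN = re.compile(r"\s*\d{3}-?\d{2}-?\d{4}\s*")

def scan_for_ssns(conn, table, text_columns):
    """Report any character field whose value matches the SSN-like pattern."""
    hits = []
    for column in text_columns:
        for (value,) in conn.execute(f"SELECT {column} FROM {table}"):
            if value is not None and SSN_PATTERN.search(str(value)):
                hits.append((table, column, value))
    return hits

# Example with a non-intuitive column name hiding an SSN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_profile (user_ref TEXT, note TEXT)")
conn.execute("INSERT INTO legacy_profile VALUES ('123-45-6789', 'reviewed 2001')")
print(scan_for_ssns(conn, "legacy_profile", ["user_ref", "note"]))
# [('legacy_profile', 'user_ref', '123-45-6789')]
```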

Taking the time to search thoroughly for such hidden PII is a must. Armed with the knowledge of exactly where PII resides in your legacy system, you can take steps to adequately encrypt the data — both during its migration as well as when it is at rest in the new target system.

2. Use standard enterprise data protection standards throughout the process.

Naturally, throughout the data migration process you should continue implementing all the standard data protection processes and protocols your organization uses on a day-to-day basis. These are ubiquitous but effective techniques: AES encryption, SSL/TLS on the wire, public key cryptography, digital certificates, and SHA hashing. Hashing data is not as secure as encrypting it, but it is still better than leaving it completely exposed. Check your target platforms to see which out-of-the-box choices are there before you start writing custom code. For example, if you are migrating to an Oracle database running on Unix/Linux, there are many choices for encrypting the data at rest without additional code, including:

• Oracle Field Level Encryption

• Oracle Tablespace Encryption

• Filesystem or Logical Volume Encryption (choices vary based on Unix/Linux flavor)

• SAN or Array Controller Based Encryption (depending on your storage infrastructure)

When it comes to security, don’t assume anything

The bottom line is that the process of moving legacy data could potentially expose sensitive data acquired or created long ago that your current stakeholders may not be fully aware of. Never make assumptions about security and information assurance. The network you are on is only secure until the first hacker inserts a packet sniffer onto the network, and physical storage locked in the server room might seem secure until the first tape backup gets misplaced. Plan to implement necessary security measures both while the data is in flight and once it’s at rest in the new system.


CONCLUSION

A Smooth Data Migration Process Is Worth All the Effort

For government agencies, major data migration projects call for a variety of specific technologies, strategies, and skills that may be needed only rarely, depending on how frequently they upgrade their major mission IT platforms. It’s because of this infrequency that it’s so important to understand the many strategic and technical challenges involved.

Managed correctly, a data migration project can allow for uninterrupted access to critical data for mission users and a better return on technology investment for the agency as a whole. In many cases, it can also help resolve certain long-standing issues with data accuracy and consistency in the legacy platform. It’s worth taking it on as a project in its own right — planning and resourcing it appropriately, and managing it to optimize results for all your stakeholders.
