Slide 1 Copyright © 2011 Stephen D. Poe
Where Will All the Data Go?
Stephen D. Poe, EDP, CSM, CSPO
Nautilus Solutions
9 June 2011
Slide 2
Where Will All the Data Go? Our Agenda
• The Problem
• Solutions
• Technical Questions
• Planning Issues
Slide 3
The Problem
Slide 4
How Big is the Problem?
• Overall
  – In 2003, UC Berkeley estimated 5 exabytes of new data stored on digital drives
    • 1 petabyte = 1,000 terabytes
    • 1 exabyte = 1,000 petabytes
  – In 2008, IDC estimated 281 exabytes of digital information was created and replicated globally
    • That’s 45GB for each person on earth
• Specific examples
  – Internet traffic in March 2010 was estimated at 21 exabytes
  – Email storage now commonly 25GB per user
  – Individual statement (AFP) used to average perhaps 10-15KB per statement
    • Now several MB per statement
      – Color, more graphics
    • What happens when your online statement includes personalized audio and video?
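A rough check of the per-person figure on this slide, as a minimal sketch. The decimal units follow the slide's own definitions (1 exabyte = 1,000 petabytes); the world population of about 6.7 billion for 2008 is an assumption that does not appear in the deck.

```python
# Decimal storage units, matching the slide's definitions.
GB = 10**9
TB = 10**12
PB = 1_000 * TB
EB = 1_000 * PB

# IDC estimate quoted on the slide.
global_data_bytes = 281 * EB

# Assumption: roughly 6.7 billion people in 2008 (not from the deck).
world_population = 6_700_000_000

per_person_gb = global_data_bytes / world_population / GB
print(f"About {per_person_gb:.0f} GB per person")
```

With this population figure the result lands in the low 40s of GB, the same order as the slide's 45GB; the exact value depends on the population estimate used.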
Slide 5
How Big is the Problem?
• The number of files is growing even faster
  – Average file size is shrinking
    • No longer just large print files
    • Emails, IM log, single tweet, QR request
  – Example: storing 1 TB
    • 1,000 1GB production files
    • 1,000,000,000 1KB email files
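The 1 TB example can be reproduced with a few lines of arithmetic. This is a sketch using decimal units; the function name `files_per_terabyte` is made up for illustration.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9
KB = 10**3

def files_per_terabyte(file_size_bytes: int) -> int:
    """Number of whole files of the given size that fit in 1 TB."""
    return TB // file_size_bytes

print(files_per_terabyte(GB))  # large production files
print(files_per_terabyte(KB))  # small email-sized files
```

The same terabyte holds a million times more objects when the files are 1KB instead of 1GB, and each of those objects has to be indexed and managed.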
Slide 6
Multi-Channel World
• You archive your customer correspondence
  – Bills, statements, notices
    • Only 24% of US bank account holders have gone paperless
    • 37% say they will never go paperless (Forrester)
• How about new multi-channel messages?
  – Email, instant messages, mobile, video, voice, Tweets, and blog posts
    • Instant messages, Twitter posts and blog posts are not archived in 80% of the organizations using them.
• All may be discoverable
  – They may need to be stored
Slide 7
Solutions
Slide 8
Framing the Archive Issue
• Our archives must meet:
  – All legal and regulatory requirements
    • to hold all required electronic documents
    • for the mandated length(s) of time
  – in a cost-effective manner
  – with a defensible plan to manage them
    • Ensuring that, when required, we can reproduce the ‘original’
      – enough to satisfy a judge
Slide 9
Archival System Components
• Storage Format(s)
  – Multiple, and growing
• Archival system
  – Hardware
  – Software
  – Network
• Retrieval/display software
  – Network
• Process and procedures
Slide 10
Archive Drivers
Source: AIIM ECM state of the Industry study 2010
Slide 11
Archive Projects
• No plans: 5%
• In next 12 months: 16%
• Departmental: 24%
• Across departments: 15%
• Implementing enterprise: 28%
• Completed enterprise: 12%
Source: AIIM ECM state of the Industry study 2010
Slide 12
Technical Questions
Slide 13
What Do You Have Now?
• ECM, WCM, MCM, repositories of record, archives
  – How many silos in-house already?
  – Who owns which data?
• Where should we keep it all?
  – Single repository for all data, all formats?
  – Separate repositories specialized for each?
Slide 14
Example - Storing Emails
Slide 15
Storage & Admin & Overhead, Oh My!
• Storage may be cheap
  – Management and ‘-ilities aren’t
• Metrics to think about
  – $/terabyte continues to fall
    • Perhaps $2000-$3000/TB for near-petabyte systems
  – Petabytes/IT Storage Administrator
    • Burdened labor overhead of perhaps $100K per admin
  – And overhead
    • Rent, electricity, cooling, security
  – What ‘Green Footprint’?
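The metrics on this slide can be combined into a rough first-year estimate. This is only a sketch: the $/TB and per-admin figures come from the slide, while `pb_per_admin` and `other_overhead` are hypothetical inputs, since the deck names those metrics without giving values.

```python
import math

def first_year_cost(capacity_tb, price_per_tb, pb_per_admin,
                    cost_per_admin=100_000, other_overhead=0):
    """Rough cost: hardware plus storage administrators plus other overhead.

    pb_per_admin and other_overhead are hypothetical inputs; the deck
    lists them as metrics but gives no values.
    """
    hardware = capacity_tb * price_per_tb
    # 1 petabyte = 1,000 terabytes; round up to whole administrators.
    admins = math.ceil(capacity_tb / 1_000 / pb_per_admin)
    labor = admins * cost_per_admin
    return {"hardware": hardware, "admins": admins, "labor": labor,
            "total": hardware + labor + other_overhead}

# 1 PB at $2,500/TB, assuming one administrator per half petabyte.
estimate = first_year_cost(capacity_tb=1_000, price_per_tb=2_500,
                           pb_per_admin=0.5)
print(estimate)
```

Hardware is a one-time purchase while labor and overhead recur every year, so the longer the retention period, the more the non-hardware terms dominate.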
Slide 16
The Cloud
• Remember ASPs?
  – Review pros and cons
• In-house vs. outsourced
  – Where outsourced?
• Regulatory environment
  – Will this data ever cross a national boundary?
• Recent Amazon.com outage
  – 4 days down
  – 98.9% annual uptime
  – What are the SLAs?
  – What are the penalties?
  – But could you do better in-house?
• Corporate level of risk
  – To allow corporate data to be held off-site
  – But is it any safer in-house?
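The outage figures on this slide follow from simple arithmetic, sketched here. The 99.9% SLA in the second call is a hypothetical value chosen for comparison, not one quoted in the deck.

```python
HOURS_PER_YEAR = 365 * 24

def annual_uptime_percent(downtime_hours: float) -> float:
    """Uptime over a 365-day year, as a percentage."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)

def allowed_downtime_hours(sla_percent: float) -> float:
    """Downtime per year permitted by an SLA of the given percentage."""
    return HOURS_PER_YEAR * (1 - sla_percent / 100)

# Four days down, as on the slide.
outage = annual_uptime_percent(4 * 24)
print(f"{outage:.1f}% uptime after a 4-day outage")

# Hypothetical SLA for comparison.
print(f"{allowed_downtime_hours(99.9):.2f} hours/year allowed at 99.9%")
```

Running the same two functions against in-house outage records gives a like-for-like answer to "could you do better in-house?"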
Slide 17
Legal
• Compliance with rules and regulations
  – Especially with evolving regulations
  – Joint legal/IT taskforce to keep up with changes?
• International considerations
  – EU privacy rules considerably tighter
  – Conflicts
    • Limited or no sharing of data across borders
    • US discovery laws vs. EU privacy directives
Slide 18
Preserving Your Data
• How long do you need to archive?
  – Legal and regulatory requirements
    • 7 years to 100 years
• Average lifespan of a format & reader software
  – Perhaps 2-3 major OS upgrades
• Look at PDF/A for possible format
  – ISO standard for very long term archive & retrieval
  – Good for some (but not all) documents
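One way to make retention periods operational is a lookup from record class to years, sketched below. The record classes and their pairing with 7 and 100 years are hypothetical, chosen only to span the range on the slide; real values come from legal and regulatory review.

```python
from datetime import date

# Hypothetical retention schedule in years; real values come from counsel.
RETENTION_YEARS = {"statement": 7, "policy_record": 100}

def earliest_disposal_date(created: date, record_class: str) -> date:
    """Date on which the retention period for a record ends."""
    years = RETENTION_YEARS[record_class]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # created was 29 February and the target year is not a leap year
        return created.replace(year=created.year + years, day=28)

print(earliest_disposal_date(date(2011, 6, 9), "statement"))
```

A record kept for 100 years will outlive many generations of reader software, which is the case for evaluating a long-term format such as PDF/A.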
Slide 19
Finding Your Data
• Key indices
  – Good enough in the past
    • For legacy applications on older data
• Structured taxonomies
  – If you develop the taxonomies before designing the archive
• The New Search
  – Full text search is a goal
  – What does that mean against several petabytes of data?
• Metadata
  – Exceptionally valuable
  – Usually exceptionally expensive, especially to retrofit
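The difference between a key index and full-text search can be shown on a toy archive. This is a minimal sketch with made-up documents and account numbers; a production archive would use a dedicated search engine rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical documents: id -> (account number, text).
documents = {
    "doc1": ("ACCT-001", "June statement with late fee notice"),
    "doc2": ("ACCT-002", "June statement"),
    "doc3": ("ACCT-001", "Email about late fee dispute"),
}

# Key index: one lookup field chosen in advance.
key_index = defaultdict(list)
for doc_id, (account, _text) in documents.items():
    key_index[account].append(doc_id)

# Full-text (inverted) index: every word points back to its documents.
text_index = defaultdict(set)
for doc_id, (_account, text) in documents.items():
    for word in text.lower().split():
        text_index[word].add(doc_id)

print(key_index["ACCT-001"])      # lookup by the predefined key
print(sorted(text_index["fee"]))  # lookup by any word in the text
```

The key index has one entry per account, while the inverted index gains an entry for every distinct word, which is why full-text search across petabytes is a storage and cost decision as much as a feature.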
Slide 20
Planning Issues
Slide 21
The ‘-ilities
• The ‘-ilities
  – Usability, reliability, maintainability, scalability, availability, extensibility, security, portability
• Difference between a system and a success
  – Requires long term commitment to people, process, and standards
  – Set standards, define metrics, monitor and fix issues as they arise
Slide 22
Archive Planning
• Detailed knowledge of what is to be archived
  – Current & future production processes
  – Legacy data and documents
  – Current multi-channel and social media
  – Future data and documents?
• Detailed knowledge of how it will be used
  – By whom
  – On what platform(s)?
  – For what purposes
Slide 23
Archive Planning
• Archive system design
  – Implementation
  – Maintenance & upgrades
  – Be flexible: things will change
• Corporate processes and procedures
  – Satisfy the ‘-ilities
  – Continue to meet the business goals
  – Plan for regular review and transitions
Slide 24
Archive Planning – A Checklist
• Develop the business plan
  – Business goals, business case, costs, funding, project management
• Technology review
  – Time estimates, requirement gathering, analyze, plan, get consensus
• Develop policies and process
  – Define processes, people, standards, tools, technologies, metrics
• Develop Project Plan
  – A PM is a good idea
• Gap Analysis and build underlying foundation
  – Environment, platforms, skill sets, enterprise architecture
• Develop plan details
  – Implement, test, modify
• Maintenance
Slide 25
For More Information
Stephen D. Poe, EDP
Nautilus Solutions
+1.214.532.0443