- 1. W H I T E P A P E RHitachi Content Platform
CustomAciduisismodo Dolore EoloreObject Metadata Enhancement
Tool"Dionseq Uatummy Odolorem VelAdvanced Metadata Management
Capabilities forHitachi Content PlatformBy Christian Heiter,
Michael Malaret and David Haberland of Hitachi DataSystems Federal
Region and Clifford Grimm of Hitachi Content PlatformEngineering at
Hitachi Data SystemsOctober 2011
2. 2Table of ContentsExecutive Summary 3Introduction4Customer
Challenges 4Hitachi Content Platform Custom Object Metadata
Enhancement Tool: Standards, Performance and Custom Settings 5Based
on Open Standards 6System Operation, Environment and Performance
8User Settings and Customization 9Hitachi Content Platform Custom
Object Metadata Enhancement Tool: Architecture and Operation
9Ingest Function Process Flow 10Augment Function Process Flow11HCP
Namespace Usage by HCP Custom Object Metadata Enhancement Tool
12Source or Destination Locations12Reference Architecture and Host
Implementation Guidelines13Example Proof of Concept
Implementation14Parameters and Configuration Settings14Hitachi
Content Platform Primer15About Hitachi Content Platform
15Object-based Storage 16Namespaces and Tenants 17Namespace Access
17REST Interface 17Transmitting Data in Compressed Format 20Data
Access Permissions20Replication21Namespace Operations 22REST
Interface Primer23Service Offerings24Appendix A: References
25Appendix B: Feedback 26 3. 3Executive SummaryMany organizations
must typically manage multiple data stores, some of which contain
raw dataobjects with a small amount of metadata while others
contain related extended metadata. Themetadata is usually custom
metadata, which evolves over the life of the data object but cannot
bestored with the object itself. Managing multiple disparate data
stores adds considerable complexityand increases the total cost of
ownership.System implementation complexity can be reduced by
integrating the raw objects with their cor-responding metadata,
while providing the ability to add custom metadata at any point in
the future.If properly implemented, the new system will provide the
capability for advanced searches, includinga search across the
metadata, itself.The Hitachi Content Platform (HCP) "custom object
metadata enhancement tool" was developed toadd custom metadata
information to objects in an HCP repository. HCP provides an
intelligent datastore capability with retention and security
policies, data protection and content search. The com-bination of
HCP and this tool will reduce complexity and greatly expand the
richness of a repositorysearch, thus increasing the value of the
data and providing more advanced decision making andinference
capabilities. More powerful actionable intelligence will result
from this broader search.While this custom tool was originally
developed to enhance HCP objects with geospatial metadata,it was
intentionally implemented to be metadata-type agnostic. Using this
tool, any custom meta-data can be easily added to objects destined
for an HCP repository during the ingest phase or afterthey are
already in the repository using an augmentation operation. Any open
source or proprietarytool that can extract the metadata from an
input file can be used.The HCP custom object metadata enhancement
tool is one of the initial components in a broaderprogram to create
a Hitachi Data Systems file and content services "ecosystem." This
ecosystemwill enhance the file and content solution product
offerings from HDS with a set of tools that addcapabilities and
simplify usage in order to increase the value of the stored
content.This document is intended for the technical reader. It
provides a technical summary of the customobject metadata
enhancement tool as well as a high-level introduction to HCP. No
prior knowledgeof HCP is expected from the reader. The anticipated
result is a better understanding of HCP plus thecustom tool
solution and how it can add value to the data while reducing the
total cost of ownershipfor the customer. 4. 4IntroductionHitachi
Data Systems has created a new tool called the Hitachi Content
Platform (HCP) customobject metadata enhancement tool, which
expands the capability of Hitachi Content Platform [1].This tool
allows file objects stored in HCP to be augmented with additional
custom metadata infor-mation to significantly increase data
correlation using HCPs index and search capability.
Metadataenhancements will reduce the need for multiple data
repositories containing duplicated objects,potentially simplifying
the data architecture by integrating multiple disparate data
stores.The resulting expanded content store will greatly increase
the datas value and provide advancedsearch and correlation
capabilities. This increased effectiveness of the content searches
and timelyactionable information will be increased as a result of
the tool.Specific missions and applications can be supported, with
HCP storing file objects such as imagesor other rich media plus
their related custom metadata. The custom metadata could be
proprietary,classified or based on open standards or formats. The
metadata augmentation can be performedeither during the initial
object ingestion or by post-processing existing large data stores.
The lattercase allows a large repository to be updated with new
information without having to re-ingest orcreate a new copy on
another system. As needs change and new object information is
available,additional metadata can be added.The custom object
metadata enhancement tool will allow HCP product features to be
utilized acrossnew application spaces. HCP provides scalability to
40PB of storage, with high data integrity anddata replication.
Multiple virtual content platforms can be created from a single
physical imple-mentation with all resulting tenants securely
managed with individualized options for data retentionpolicies,
encryption, versioning and detailed audit logging. Other existing
features in HCP will allowfor distributed implementations to
increase system resiliency. HCP also supports advanced
Hitachistorage virtualization capabilities for even greater
efficiency, scalability and flexibility.Customer ChallengesHitachi
Content Platform with the custom object metadata enhancement tool
may present a viablesolution for organizations with one or more of
the following challenges:Very large data sets that have already
been ingested into HCP, but which require enhancements of the
stored information with custom metadataInability to add custom
metadata while ingesting content into HCPA need to cost-effectively
enhance the search capabilities for large data stores across a
larger information space for the same file objectsData located in
distributed locations but which would benefit from a distributed
search capabilityPolicy management of the data setsEnforced access
rights and namespaces to security-protected data
partitionsDisparate data stores with multiple data and accompanying
metadata sets 5. 5Hitachi Content Platform Custom ObjectMetadata
Enhancement Tool: Standards,Performance and Custom SettingsHCP
custom object metadata enhancement tool is a standalone application
that runs in conjunctionwith Hitachi Data Migrator software,
powered by CommVault (see Figure 1). It discovers objectsin a local
user directory, extracts metadata information from each object and
creates an XML filecontaining the metadata; then the tool either
ingests the file with the object into HCP or adds theinformation to
the corresponding object previously ingested. Figure 1. Hitachi
Content Platform Custom Metadata Enhancement Tool: Solution
ArchitectureKey features of the HCP custom object metadata
enhancement tool include:Allows HCP file objects to be augmented
with custom metadataCreates custom metadata to be stored in XML
format in HCPProvides capability to either add custom metadata
during the ingestion phase or to post-process and augment existing
HCP file objects with custom metadataPerforms custom metadata
operations either on local files or mounted remote directories con-
taining the files 6. 6Runs periodically as a user-space application
on any serverProvides tool parameter settings and customization
capabilities, including: New file check and process interval Update
or replace existing custom metadata if the object already exists in
the HCP data store HCP source and destination location
namespaceEnhances the value of the information in the HCP data
store by allowing for more advanced searchesProvides an end-user
pluggable custom metadata generation architectureProvides whole
object ingestion with HCP v4.1, allowing for a more efficient
single write opera- tionInterfaces through the HCP Representational
State Transfer (REST) interfaceSupported as a virtual machineHCP
custom object metadata enhancement tool will periodically start the
metadata extraction pro-cess. At this time it will either ingest
the new files with the new metadata or add the new metadatato
existing objects already in the HCP data store. The tool has
provided an extensible custom meta-data generation architecture;
this allows the user to configure the tool to call the appropriate
externalapplication. The callable metadata extraction application
can be any open source or proprietarysoftware that extracts key
information from the file object.Based on Open StandardsHCP custom
object metadata enhancement tool will invoke user-pluggable
applications to extractthe metadata from the objects, and then
reformat the data in an XML file to be ingested into HCP(see Figure
2). The XML open standard was selected because it will extend the
useful life of the dataand reduce the long-term operational costs,
since it does not require proprietary tools to supportproprietary
formats. As new data becomes available, the existing XML-based
information in the datastore can be further enhanced by any new
application that creates new metadata. 7. 7Figure 2. XML-formatted
Custom Metadata Sample Resulting from the FWtools Application:
IncludesGeospatial Information to Augment an Existing Hitachi
Content Platform Object 8. 8HCP custom object metadata enhancement
tool is constructed to use HCP open standard REST[2] interface.
This industry-standard interface is used for distributed hypermedia
systems such asthe World Wide Web and typically involves an HTTP
context. REST removes the need for proprietaryinterfaces, which
means it can quickly accelerate integration and long-term
maintenance costs. Italso provides the capability for simpler
customization as mission needs require, even throughout thelife of
a long-term mission.System Operation, Environment and
PerformanceCustom metadata can be added to the HCP data store in 2
different manners. The 1st allows theenhancement to be performed
during the ingest operation. In this case, new objects are found
inthe user directory by HCP custom object metadata enhancement
tool. The tool will first call thepluggable application to extract
the metadata and create an XML representation of the
resultingmetadata. Then it will ingest the object and the
corresponding metadata into the HCP data store.The 2nd functional
method allows existing HCP file objects to be enhanced (augmented)
with newmetadata. HCP custom object metadata enhancement tool will
see the new file in the input direc-tory. If the exact object
already exists in the data store, then the tool will call the
external program toextract the metadata, convert it to an XML
representation and ingest the newly formed metadata forthe
corresponding object.HCP custom object metadata enhancement tool
can be configured to search local directories onthe same machine
where it is running, or it can search for objects on files located
in a mountedremote directory. The tool also has been tested to run
in a virtual machine, pulling data from thelocal directory inside
the virtual machine. 9. 9Performance can be enhanced with HCP v4.1
since it provides the capability for a whole object in-gestion
operation. This allows a single write operation to be performed
with both the file object andthe corresponding metadata, thus
saving network bandwidth and system resources.User Settings and
CustomizationHCP custom object metadata enhancement tool provides a
number of user-configurable settings,including:Metadata extraction
application. This is the application that will be run on each file
to extract the relevant metadata. This is implemented as a
pluggable interface.Process run interval period. The user can
select the interval when the input user directory is checked for
new files and processed. This setting allows for adaptation to
situations ranging from new data provided at an extremely high rate
to when the new data is infrequently provided.Update or replace
selection. If HCP custom object metadata enhancement tool discovers
that the object already exists in the HCP data store, then the user
has the option to either update or replace the existing custom
metadata.Input directory. The user can specify either a local
directory on the same machine where HCP custom object metadata
enhancement tool is running, or a remote directory that has been
previ- ously mounted and is accessible.HCP destination namespace.
The user can select the destination HCP namespace.HCP namespace
authorization. The user can specify the HCP access authorization
information for the destination HCP namespace.File process count.
The user can specify the number of files that will be processed in
each interval. Adjusting this will require some tuning since there
will be variability in the implementation. Examples include:
Plug-inapplications will process files at varying speeds. File
sizes will vary. File addition rates will vary.HCP custom object
metadata enhancement tool provides a custom metadata
generationarchitecture that is end user pluggable. Therefore, any
open-source, customer-proprietary orvendor-proprietary metadata
extraction application can be used and changed as
needed.Customization of the application, itself, can be easily
performed by individuals with Java experience.C-language-based
interfaces to the lower level system operations are provided with
HCP customobject metadata enhancement tool to allow for further
customization, as required.Hitachi Content Platform Custom
ObjectMetadata Enhancement Tool: Architectureand OperationHCP
custom object metadata enhancement tool encapsulates a number of
functions into an ex-tensible and customizable tool suite (see
Figure 3). It watches for new files in a local user directory 10.
10 and runs each file through an external metadata extraction
program to see if there is any metadata. Then, it ingests the
resulting metadata to augment the corresponding HCP file object. If
the file object does not already exist in the data store, then the
tool will ingest the object, itself, as well. The tools running
application program will wake up on a periodic basis to perform
these functions.Figure 3. Components and Interfaces between Hitachi
Content Platform and HitachiContent Platform Custom Object Metadata
Enhancement Tool The REST interface is used for all communication
between HCP custom object metadata enhance- ment tool and HCP. This
is done for several reasons, including portability, supportability
and perfor- mance. REST is used by many database and distributed
web applications and is implemented using a behavioral model. The
beauty is: REST is an open standard and offers simplicity in its
stateless model. This makes it much easier to integrate distributed
components. There are 2 operational modes for HCP custom object
metadata enhancement tool: an in-band file object and metadata
ingest mode, and an out-of-band metadata augmentation mode. Both
are described below. Ingest Function Process Flow The ingest
function is an in-band mode whereby new file objects are
pre-processed to extract the custom metadata before ingestion into
the HCP data store. This function is useful when new data is being
ingested so that the accompanying metadata is added at the same
time as the file object. Detailed operation of the ingest function
is shown in Figure 4. At a user-defined periodic interval, the HCP
custom object metadata enhancement tool process will wake up and
begin searching for new files in the user directory. The list of
files is processed in order by the tool; each file in the resulting
list is provided as inputto the external metadata extraction
program. The extraction 11. 11 program will read the specified file
and send the resulting XML-formatted metadata information to the
tool . The tool will read the specified file and send the
information pair (object + metadata) to HCP. HCP will then write
each component to the respective location in the data store,
completing the ingest operation.Figure 4. Detailed HCP Custom
Object Metadata Enhancement Tool Ingest Process Flow:Post-processor
Extracts Custom Metadata, Augments Existing HCP Objects. Augment
Function Process Flow The augment function is an out-of-band mode
whereby existing file objects are post-processed in order to
augment the stored information with new object metadata. This is
useful when a large amount of data already exists in HCP.
Otherwise, all of the objects would have to be re-ingested into
another data repository, which could take considerable time and
network resources. Detailed operation of the augment function is
shown in Figure 5. As indicated by marker , the customer has
previously ingested a large number of files into the HCP data
store. HCP custom object metadata enhancement tool will
periodically wake up and query the existing HCP data store,
searching for files without metadata, which had not been modified
since the previous query. Files matching the criteria are supplied
to the metadata extraction application , which will read the file
object from the local directoryand provide any custom metadata from
the files formatted in an XML format. HCP custom object metadata
enhancement tool will then ingest the custom metadatato augment the
corresponding HCP objects . 12. 12Figure 5. Detailed HCP Custom
Metadata Enhancement Tool Process Flow During anAugment Function:
Post-processor Extracts Custom Metadata, Augments Existing
HCPObjects HCP Namespace Usage by HCP Custom Object Metadata
Enhancement Tool HCP provides access to the repository as
partitioned namespaces. A namespace is a logical grouping of
objects such that the objects in one namespace are not visible in
any other namespace. To the user of a namespace, the namespace is
the repository and it may appear as a network- accessible mount
point. This brief introduction allows for the discussion about
source and destination locations, but more detail on HCP and
namespaces are provided below. Source or Destination Locations HCP
custom object metadata enhancement tool provides flexibility in the
input source location as well as the output destination. In the
case of an ingest operation, the input source could be from a file
system on the machine where HCP custom object metadata enhancement
tool is running, or from a file system on a network-mounted remote
directory. For an augmentation operation, the ob- jects would be
sourced from the root folder in either an HCP default namespace or
an authenticated namespace. The destination of any HCP custom
object metadata enhancement tool file operation is always to an HCP
repository, but either namespace is allowed. The destination
namespace can be the same as the source namespace, or it can be a
different namespace. The path should contain the root folder within
the appropriate namespace. A summary of the allowable locations is
shown in Table 1. 13. 13 TABLE 1. HCP CUSTOM METADATA ENHANCEMENT
TOOL: ALLOWABLE SOURCE AND DESTINATION LOCATIONSObject LocationFile
System HCP Default HCP Authenticated Namespace NamespaceSource
Input Yes YesYesDestination Output NoYesYes Reference Architecture
and Host Implementation Guidelines In a typical implementation, HCP
custom object metadata enhancement tool runs on a host ma- chine,
which is not part of HCP. The tool requires minimal resources, and
the host machine could either be a physical machine or a virtual
machine. The processor, memory and storage requirements are driven
more by the plug-in metadata extraction application as well as the
size of the objects and the required object process rate. If
possible, administrators should provide adequate memory to al- low
the operating system to keep the object, as well as the metadata
extraction application resident in memory since the application
will be called repeatedly (i.e. for every new object to be
processed). Since HCP custom object metadata enhancement tool
requires only a single machine (physical or virtual), its reference
architecture is more dependent on the HCP implementation than the
tool node. Figure 6 depicts an example implementation with the
tools physical node connected to a 4-node HCP 500 system. This HCP
was configured with failover and uses modular storage with LUNs
pro- visioned from individual RAID groups. The tool node in this
diagram shows the new content being sourced from either a local
directory on the node, or from a remote directory (but not both).
14. 14Figure 6. HCP Custom Object Metadata Enhancement Tool
Reference Architecture: HCPImplementation as a 4-node HCP 500
Supporting Failover Using Modular Storage withLUNs Provisioned from
Individual RAID Groups Example Proof of Concept Implementation As a
proof of concept demonstration, both HCP custom object metadata
enhancement tool func- tions were utilized. The tool was first used
to enhance existing objects previously ingested, but which required
augmentation with newly provided geospatial metadata information.
The demonstra- tion also ingested new objects augmented with the
corresponding geospatial-based metadata. The pluggable metadata
application used was an open-source geographic information system
(GIS) program called FWtools [3]. FWtools provides the ability to
view geospatial information from a variety of format types, while
also providing the ability to extract the metadata for the
supported file types including the National Imagery Transmission
Format (NITF) [4]. NITF files are used by federal agencies and
system integrators focused on correlating information in the
objects with geospatial information, all from multiple events and
data sources. Parameters and Configuration Settings HCP custom
object metadata enhancement tool has a number of tunable parameters
and configuration settings that must be properly set before
starting normal operation. All of these settings can be found in
the "ingestor.properties" file. All of the settings are listed in
Table 2 along with the corresponding description. 15. 15 TABLE 2.
TUNABLE HCP CUSTOM OBjECT METADATA ENHANCEMENT TOOL PARAMETERS AND
CONFIGURATION SETTINGS ParameterDescription source.pathLocal path
to the directory that contains the data to ingest
source.maxBatchSizeMaximum number of file handles to "batch" per
loop iteration destination.user HCP data access: user to use for
ingest destination.password HCP data access password for
destination.user account destination.passwordEncodedIndication if
the destination.password value is encoded in md5 format
destination.rootpath Root path REST URL to HCP to place content
metadata.classes Comma separated, ordered list of class(es) to load
to extract metadata fromfiles execution.loopcountNumber of times to
load up the batch with files to process
execution.stopRequestFileName of file in process, local directory
to watch for to indicate to stopprocessing
execution.pauseRequestFile Name of file on local machine to watch
for to indicate to pause processing: Foras long as the file exists,
the program will be paused. Delete the file to resume.Changing this
value while the program is in the paused state will not cause
thenew value to be used until resumed.
execution.deleteSourceFilesIndicates whether the source files
should be deleted after written to HCP: If thefile does not have
correct permissions, attempt to change and try again.
execution.forceDeleteSourceFiles Indicates whether the source file
permissions should be forced to be deletedby changing the source
file permissions execution.deleteSourceEmptyDirsIndicates whether
the empty directories in the source files should be periodi-cally
cleaned up execution.updateMetadata Indicates whether metadata
should be updated for existing metadata onobjects in HCP: If set to
false, source files will be ignored (but deleted, ifindicated).
execution.pauseSleepInSecondsNumber of seconds to sleep during
pause for between checks for resume execution.batchSleepInSecond
Number of seconds to sleep at end of batch run before attempting
anotherbatch execution.debugging.httpheadersIndicates whether HTTP
headers should be written to the console (stdout) Hitachi Content
Platform Primer The functionality described here is based on
Hitachi Content Platform version 4.1, but some content might be
applicable to prior HCP versions. About Hitachi Content Platform
Hitachi Content Platform is a distributed storage system designed
to support large, growing re- positories of fixed-content data. HCP
stores objects that include both data and the corresponding
metadata. It distributes these objects across the storage space but
still presents them as files in a standard directory structure. 16.
16 HCP provides access to stored objects through the HTTP protocol,
as well as through user interfaces such as the namespace browser
and search console. HCP is a combination of hardware and software
that provides an object-based data storage envi- ronment. An HCP
repository stores all types of data, including simple text files as
well as multigiga- byte satellite, medical or database images. HCP
provides easy access to the repository for adding, retrieving and
deleting the stored data. HCP uses write once, read many (WORM)
storage technol- ogy and a variety of policies and internal
processes to ensure the integrity of the stored data and the
efficient use of storage capacity. Key features of HCP include:
Scalability up to 40PB of storage in a single cluster Capability to
provision a single cluster into multiple virtual content platforms
("tenants"), each withits own unique configuration and access
control to manage data placement and content distribu-tion to
appropriate audiences Connection capabilities to a wide range of
applications and protocols via http, REST, NFS, CIFSand more High
data integrity, with data integrity checking, RAID-6, replication,
encryption, WORM, multipleversions of objects and audit logging
Automation of data migration from old storage to new storage
Management and enforcement policies for retention, disposal,
shredding and other complianceand lifecycle management operations
Increased value of unstructured data using metadata and custom
metadata for automation andsearch Capability to create a single,
multipurpose, unstructured data platform for archive, cloud
andbackup capabilities Capability to monitor and report on storage
and bandwidth use of different tenants for charge-back Enhanced
management capabilities with comprehensive interfaces for cloud and
distributedenvironments Scalability to branch and remote offices
via Hitachi Data Ingestor The following section introduces basic
HCP concepts and includes information regarding HCP namespaces.
Object-based Storage HCP stores objects in the repository. Each
object permanently associates data HCP receives (for example, a
file, an image or a database) with information about that data,
called metadata. An object encapsulates: Fixed-content data, which
is an exact digital reproduction of data as it existed before it
wasstored. Once it is in the repository, this fixed content data
cannot be modified. System metadata offers system-managed
properties that describe the fixed-content data (forexample, its
size and creation date). System metadata includes settings, such as
retention and 17. 17data protection level, that influence how
transactions and internal processes affect the object. Custom
metadata is metadata that a user or application provides to further
describe an object. Itis typically specified as XML and can be used
to create self-describing objects. Future users andapplications can
use this metadata to understand and repurpose the object content.
Namespaces and Tenants An HCP repository is partitioned into
namespaces. A namespace is a logical grouping of objects such that
the objects in one namespace are not visible in any other
namespace. To the user of a namespace, the namespace is the
repository. Namespaces provide a mechanism for separating the data
stored for different applications, business units, or customers.
For example, a deployment could have one namespace for accounts
receivable and another for accounts payable. Namespaces also enable
operations to work against selected subsets of repository objects.
For example, a query could be performed that targets the accounts
receivable and accounts payable namespaces but not the employee
namespace. Namespaces are owned and managed by administrative
entities called tenants. A tenant typically corresponds to an
actual organization such as a company or a division or department
within a com- pany. A tenant can also correspond to an individual
person. Namespace Access HCP provides several techniques for
accessing and managing data in the namespace. These include: REST
interface Metadata query API Namespace browser Search console
Hitachi Data Migrator HCP client tools REST Interface Clients use
an HTTP-based REST interface to access the namespace. Using this
interface, actions can be performed such as adding objects to the
namespace, viewing and retrieving objects, chang- ing object
metadata and deleting objects. The namespace can be accessed
programmatically with applications, interactively with a
command-line tool or through a graphical user interface (GUI).
Figure 7 shows the relationship between original data, objects in a
namespace and the HTTP access protocol. 18. 18Figure 7. Client-HCP
Namespace: Relationship between Original Data, Objects in
aNamespace and HTTP Access to the HCP Data Store Metadata Query API
HCP allows clients to use HTTP requests to find objects that meet
specific criteria, including object change time, index setting,
operations on the object and the object location. If the client has
the appropriate permissions, it can query multiple namespaces, and
a single request can query multiple HCP namespaces and the default
namespace. A metadata query to HCP will return a set of records
containing metadata that describes the matching objects. If the
query matches a large number of objects, multiple requests can be
used to page sequentially through the records and retrieve only a
specific number of records in response to each request. Namespace
Browser The HCP namespace browser provides management of the
namespace content and the ability to view information about
namespaces. The browser functions include: List, view, and retrieve
objects and versions of objects Create empty directories Store and
delete objects Display namespace information, including: The
namespaces that can be accessed Retentionclasses for use within a
namespace Permissionsfor namespace access Statistics about a
namespace Search Console The HCP search console is an easy-to-use
web application that provides the capability to search for and
manage objects based on specified criteria. For example, a search
for objects stored before a certain date or larger than a specified
size could then be deleted or marked accordingly to prevent them
from being deleted. 19. 19 The search console works with either of
2 implementations, which must be enabled at the HCP system level:
The Hitachi Data Discovery Suite (HDDS) search facility interacts
with HDDS, which performssearches and returns results to the HCP
search console. HDDS is a separate product from HCP. The HCP search
facility is integrated with HCP and works internally to perform
searches andreturn results to the search console. Only one of the
search facilities can be enabled in the HCP GUI at any given time.
If neither is enabled, HCP does not support using the search
console to search namespaces. The system associated with the
enabled search facility is called the active search system. The
active search system (that is, HDDS or HCP) maintains an index of
data objects in each search- enabled namespace. The index is based
on object content and metadata. The active search system uses the
index for fast retrieval of search results. When objects are added
to or removed from the namespace or when object metadata changes,
the active search system automatically updates the index to keep it
current. For information on using the search console, please
reference [5]. Note: Not all namespaces support search if the
namespace administrator has not enabled search. Hitachi Data
Migrator Hitachi Data Migrator is a high-performance, multithreaded
client-side utility for viewing, copying, and deleting data. Data
Migrator functions include: Copy objects, files and directories
between local file systems, HCP namespaces and earlier HCParchives
Delete objects, files and directories, including performing bulk
delete operations View the content of objects and files, including
the content of old versions of objects Rename files and directories
on the local file system View object, file and directory properties
Create empty directories Add, replace or delete custom metadata for
objects Data Migrator has both a GUI and a command-line interface
(CLI). For information on using Data Migrator, please reference
[6]. HCP Client Tools HCP comes with a set of command-line tools
that allows data to be copied or moved between a client and an HCP
system. The tools also provide a search capability using specified
criteria. Additionally, empty directories can be created in a local
or remote file system or on an HCP system. The client tools support
multiple namespace access protocols and multiple client platforms.
The command syntax is the same for all supported configurations.
20. 20 For information on installing and using the client tools,
please reference [7]. Note: For most purposes, the HCP client tools
have been superseded by Hitachi Data Migrator. However, they have
some features, such as finding files that are not available in Data
Migrator. Transmitting Data in Compressed Format Object data or
custom metadata can be compressed in gzip format to save bandwidth
before sending it to HCP. The PUT request contains the subrequest
to tell HCP that data is compressed. HCP will then know to
decompress the data before storing it. Similarly, in a GET request,
HCP can be told to return object data or custom metadata in
compressed format. In this case, the returned data must first be
decompressed before use. HCP supports only the gzip algorithm for
compressed data transmission. HCP can be told that the request body
is compressed by including a Content-Encoding header with the value
gzip. In this case, HCP uses the gzip algorithm to decompress the
received data. HCP can be told to send a compressed response by
specifying an Accept-Encoding header. If the header specifies gzip,
a list of compression algorithms that includes gzip, or *, HCP uses
the gzip algorithm to compress the data before sending it. For
examples of sending and receiving objects in compressed format,
please reference Chapter 4, "Working with objects and versions" in
[8]. Notes: HCP can also compress and decompress metadata query API
requests and responses.For more information on this, please
reference the HCP product document titled "Using aNamespace," in
the section titled "Request HTTP elements." Since HCP normally
compresses stored object data and custom metadata, it is
unnecessaryto explicitly compress objects for storage. However, if
gzip-compressed objects or custommetadata are to be stored, do not
use a Content-Encoding header. To retrieve stored gzip-com-pressed
data, do not use an Accept-Encoding header. Data Access Permissions
All namespace access clients must have permission to access and
perform actions on data. Table 3 describes the permissions and the
operations allowed. 21. 21 TABLE 3. HCP PERMISSIONS AND ALLOWABLE
OPERATIONS Permission Operations Read yRetrieve objects and system
metadata.yCheck for object existence.yCheck for and retrieve custom
metadata. WriteyAdd objects.yCreate directories.y and change system
and custom metadata.Set Delete Delete objects, empty directories
and remove custom metadata. PurgeDelete objects and their
historical versions. Privileged yDelete or purge objects regardless
of retention.yPlace objects on hold. Search Search for objects. For
information on this, please reference Chapter 8 Usingthe HCP
metadata queryAPI Conduct search in [8]. Some operations require
multiple permissions. For example, to place an object on hold, the
user must have both write and privileged permissions. Similarly,
performing a privileged purge will require delete, privileged and
purge permissions. Permissions are set at 2 levels: Namespace-level
permissions. This permission mask specifies the maximum permissions
forany user that accesses the namespace. Data access account. This
specifies permissions for an individual user. Accessing anamespace
will require a data access account with a username and password.
The accountspecifies available namespaces and associated
permissions. The required permissions for a particular operation
must be enabled in both the namespace-level permission mask and the
corresponding data access account permissions. Replication
Replication is the process of keeping selected tenants and
namespaces in 2 HCP systems in sync with each other. Basically,
this entails copying object creations, deletions and metadata
changes from one system to the other. HCP also replicates the
tenant and namespace configuration, data access accounts and
retention classes. The HCP system in which the objects are
initially created is called the primary system. The 2nd system is
called the replica. Replication has several purposes, including: If
the primary system becomes unavailable (for example, due to network
issues), the replica canprovide continued data availability. If the
primary system suffers irreparable damage, the replica can serve as
a source for disasterrecovery. 22. 22 If an object cannot be read
from the primary system (for example, because a server is unavail-
able), HCP can try to read it from the replica. Note: Replication
is an add-on feature to HCP. Not all systems include it. Namespace
Operations Familiar commands and tools are used to perform
operations on a namespace. Some operations relate to specific types
of metadata. For more information on this metadata, please
reference Chapter 2, "Understanding objects" section in [8].
Operations that store or retrieve data can optionally transmit the
data in gzip-compressed format. For more information on this, see
the individual commands used for those operations. Operation
Restrictions The operations that can be performed are subject to
the following restrictions: The HTTP request headers must include
valid user information. The namespace must be configured to allow
HTTP or HTTPS access from the client IP address. The namespace
configuration and user permissions must allow the operation. For
information on user permissions, please reference Chapter 10,
"Using the Namespace Browser" in [8]. Supported Operations The
following operations can be performed on a namespace: Write data to
the namespace. If versioning is enabled, store new versions of
existing objects. Override default metadata when storing an object.
Create an empty directory in the namespace. Check for object
existence. View the content of an object. View object metadata.
Delete an object. Delete an empty directory. Set retention for an
object that has none. Extend the retention period for an object.
Set or change a retention class for an object. Hold or release an
object. Enable shredding of an object. Change the index setting for
an object. Add, replace or delete custom metadata for an object.
Add or retrieve object data and custom metadata in a single
operation. 23. 23 Check for and read custom metadata. List
retention classes available in the namespace. List namespace
permissions for the user. List the namespace statistics. List the
accessible namespaces. Use the HCP metadata query API to get
information about objects that meet specified criteria inone or
more namespaces. Prohibited operations HCP never allows users to:
Rename an object or directory. Overwrite a successfully stored
object. However, if versioning is enabled, new versions of anobject
can be written. Modify the fixed-content portion of an object.
Delete an object that is under retention if the privileged
permission is not granted or if thenamespace is configured to
prevent this operation. Delete a directory that contains one or
more objects. Shorten an explicitly set retention period. REST
Interface Primer The Representational State Transfer (REST)
interface is a behavioral model used by many database and
distributed web applications. Its beauty lies is in its simplicity.
From the Wikipedia definition: REST-style architectures consist of
clients and servers. Clients initiate requests to servers; servers
process requests and return appropriate responses. Requests and
responses are built around the transfer of representations of
resources. A resource can be essentially any coherent and
meaningful concept that may be addressed. A representation of a
resource is typically a document that captures the current or
intended state of a resource. At any particular time, a client can
either be in transition between application states or "at rest." A
client in a rest state is able to interact with its user, but
creates no load and consumes no per-client storage on the servers
or on the network. The client begins sending requests when it is
ready to make the transition to a new state. While one or more
requests are outstanding, the client is considered to be in
transition. The representation of each application state contains
links that may be used next time the client chooses to initiate a
new state transition. REST was initially described in the context
of HTTP, but is not limited to that protocol. RESTful architectures
can be based on other Application Layer protocols if they already
provide a rich and uniform vocabulary for applications based on the
transfer of meaningful representational state. RESTful applications
maximize the use of the pre- 24. 24 existing, well-defined
interface and other built-in capabilities provided by the chosen
network protocol, and minimize the addition of new
application-specific features on top of it. Service Offerings
Customization and support services are available. Please contact
your HDS Account Manager for additional information. 25. 25
Appendix A: References [1] Hitachi Content Platform (HCP):
http://www.hds.com/assets/pdf/hitachi-datasheet-content-
platform.pdf [2] REST interface:
http://en.wikipedia.org/wiki/Representational_State_Transfer [3]
FWTools for GIS imaging: http://fwtools.maptools.org [4] National
Imagery Transmission Format (NITF) files:
http://en.wikipedia.org/wiki/National_Imagery_ Transmission_Format
[5] HCP "Searching Namespaces" manual, part of the HCP Product
Documentation Set [6] HCP "Using HCP Data Migrator" manual, part of
the HCP Product Documentation Set [7] HCP "Using the HCP Client
Tools" manual, part of the HCP Product Documentation Set [8] HCP
"Using a Namespace" manual, part of the HCP Product Documentation
Set 26. 26 Appendix B: Feedback Hitachi Data Systems welcomes your
feedback. Please share your thoughts by sending an email message to
[email protected], [email protected],
[email protected] or [email protected]. Please be sure
to include the title of this white paper in your email message. 27.
Corporate Headquarters Regional Contact Information750 Central
Expressway Americas: +1 408 970 1000 or [email protected] Clara,
California 95050-2627 USA Europe, Middle East and Africa: +44 (0)
1753 618000 or [email protected] Pacific: +852 3189
7900 or [email protected] is a registered trademark
of Hitachi, Ltd., in the United States and other countries. Hitachi
Data Systems is a registered trademark and service mark of Hitachi,
Ltd., in the UnitedStates and other countries.All other trademarks,
service marks and company names in this document or website are
properties of their respective owners.Notice: This document is for
informational purposes only, and does not set forth any warranty,
expressed or implied, concerning any equipment or service offered
or to be offered byHitachi Data Systems Corporation. Hitachi Data
Systems Corporation 2011. All Rights Reserved. WP-410-A DG October
2011