WHITE PAPER

Hitachi Content Platform Custom Object Metadata Enhancement Tool

Advanced Metadata Management Capabilities for Hitachi Content Platform

By Christian Heiter, Michael Malaret and David Haberland of Hitachi Data Systems Federal Region, and Clifford Grimm of Hitachi Content Platform Engineering at Hitachi Data Systems

October 2011

Table of Contents

Executive Summary
Introduction
Customer Challenges
Hitachi Content Platform Custom Object Metadata Enhancement Tool: Standards, Performance and Custom Settings
  Based on Open Standards
  System Operation, Environment and Performance
  User Settings and Customization
Hitachi Content Platform Custom Object Metadata Enhancement Tool: Architecture and Operation
  Ingest Function Process Flow
  Augment Function Process Flow
  HCP Namespace Usage by HCP Custom Object Metadata Enhancement Tool
  Source or Destination Locations
  Reference Architecture and Host Implementation Guidelines
  Example Proof of Concept Implementation
  Parameters and Configuration Settings
Hitachi Content Platform Primer
  About Hitachi Content Platform
  Object-based Storage
  Namespaces and Tenants
  Namespace Access
  REST Interface
  Transmitting Data in Compressed Format
  Data Access Permissions
  Replication
  Namespace Operations
  REST Interface Primer
Service Offerings
Appendix A: References
Appendix B: Feedback

Executive Summary

Many organizations
must typically manage multiple data stores, some of which contain raw data objects with a small amount of metadata, while others contain related extended metadata. The metadata is usually custom metadata, which evolves over the life of the data object but cannot be stored with the object itself. Managing multiple disparate data stores adds considerable complexity and increases the total cost of ownership.

System implementation complexity can be reduced by integrating the raw objects with their corresponding metadata, while providing the ability to add custom metadata at any point in the future. If properly implemented, the new system will provide the capability for advanced searches, including a search across the metadata itself.

The Hitachi Content Platform (HCP) "custom object metadata enhancement tool" was developed to add custom metadata information to objects in an HCP repository. HCP provides an intelligent data store capability with retention and security policies, data protection and content search. The combination of HCP and this tool will reduce complexity and greatly expand the richness of a repository search, thus increasing the value of the data and providing more advanced decision making and inference capabilities. More powerful actionable intelligence will result from this broader search.

While this custom tool was originally developed to enhance HCP objects with geospatial metadata, it was intentionally implemented to be metadata-type agnostic. Using this tool, any custom metadata can easily be added to objects destined for an HCP repository during the ingest phase, or after they are already in the repository using an augmentation operation. Any open source or proprietary tool that can extract the metadata from an input file can be used.

The HCP custom object metadata enhancement tool is one of the initial components in a broader program to create a Hitachi Data Systems file and content services "ecosystem." This ecosystem will enhance the file and content solution product offerings from HDS with a set of tools that add capabilities and simplify usage in order to increase the value of the stored content.

This document is intended for the technical reader. It provides a technical summary of the custom object metadata enhancement tool as well as a high-level introduction to HCP. No prior knowledge of HCP is expected from the reader. The anticipated result is a better understanding of HCP plus the custom tool solution, and of how it can add value to the data while reducing the total cost of ownership for the customer.

Introduction

Hitachi
Data Systems has created a new tool called the Hitachi Content Platform (HCP) custom object metadata enhancement tool, which expands the capability of Hitachi Content Platform [1]. This tool allows file objects stored in HCP to be augmented with additional custom metadata information to significantly increase data correlation using HCP's index and search capability. Metadata enhancements will reduce the need for multiple data repositories containing duplicated objects, potentially simplifying the data architecture by integrating multiple disparate data stores.

The resulting expanded content store will greatly increase the data's value and provide advanced search and correlation capabilities. The tool thereby increases the effectiveness of content searches and the timeliness of actionable information.

Specific missions and applications can be supported, with HCP storing file objects such as images or other rich media plus their related custom metadata. The custom metadata could be proprietary, classified or based on open standards or formats. The metadata augmentation can be performed either during the initial object ingestion or by post-processing existing large data stores. The latter case allows a large repository to be updated with new information without having to re-ingest or create a new copy on another system. As needs change and new object information becomes available, additional metadata can be added.

The custom object metadata enhancement tool will allow HCP product features to be utilized across new application spaces. HCP provides scalability to 40PB of storage, with high data integrity and data replication. Multiple virtual content platforms can be created from a single physical implementation, with all resulting tenants securely managed with individualized options for data retention policies, encryption, versioning and detailed audit logging. Other existing features in HCP allow for distributed implementations to increase system resiliency. HCP also supports advanced Hitachi storage virtualization capabilities for even greater efficiency, scalability and flexibility.

Customer Challenges

Hitachi
Content Platform with the custom object metadata enhancement tool may present a viable solution for organizations with one or more of the following challenges:

- Very large data sets that have already been ingested into HCP, but which require enhancement of the stored information with custom metadata
- Inability to add custom metadata while ingesting content into HCP
- A need to cost-effectively enhance the search capabilities for large data stores across a larger information space for the same file objects
- Data located in distributed locations that would benefit from a distributed search capability
- Policy management of the data sets
- Enforced access rights and namespaces for security-protected data partitions
- Disparate data stores with multiple data and accompanying metadata sets

Hitachi Content Platform Custom Object Metadata Enhancement Tool: Standards, Performance and Custom Settings

HCP
custom object metadata enhancement tool is a standalone application that runs in conjunction with Hitachi Data Migrator software, powered by CommVault (see Figure 1). It discovers objects in a local user directory, extracts metadata information from each object and creates an XML file containing the metadata; then the tool either ingests the file with the object into HCP or adds the information to the corresponding object previously ingested.

Figure 1. Hitachi Content Platform Custom Metadata Enhancement Tool: Solution Architecture

Key features of the HCP custom object metadata enhancement tool include:

- Allows HCP file objects to be augmented with custom metadata
- Creates custom metadata to be stored in XML format in HCP
- Provides the capability to either add custom metadata during the ingestion phase or to post-process and augment existing HCP file objects with custom metadata
- Performs custom metadata operations either on local files or on mounted remote directories containing the files
- Runs periodically as a user-space application on any server
- Provides tool parameter settings and customization capabilities, including:
  - New file check and process interval
  - Update or replace existing custom metadata if the object already exists in the HCP data store
  - HCP source and destination location namespace
- Enhances the value of the information in the HCP data store by allowing for more advanced searches
- Provides an end-user pluggable custom metadata generation architecture
- Provides whole object ingestion with HCP v4.1, allowing for a more efficient single write operation
- Interfaces through the HCP Representational State Transfer (REST) interface
- Supported as a virtual machine

HCP
custom object metadata enhancement tool will periodically start the metadata extraction process. At that time it will either ingest the new files with the new metadata, or add the new metadata to existing objects already in the HCP data store. The tool provides an extensible custom metadata generation architecture, which allows the user to configure the tool to call the appropriate external application. The callable metadata extraction application can be any open source or proprietary software that extracts key information from the file object.

Based on Open Standards

HCP custom object metadata enhancement tool invokes user-pluggable applications to extract the metadata from the objects, and then reformats the data into an XML file to be ingested into HCP (see Figure 2). The XML open standard was selected because it will extend the useful life of the data and reduce long-term operational costs, since it does not require proprietary tools to support proprietary formats. As new data becomes available, the existing XML-based information in the data store can be further enhanced by any new application that creates new metadata.

Figure 2. XML-formatted Custom Metadata Sample Resulting from the FWTools Application: Includes Geospatial Information to Augment an Existing Hitachi Content Platform Object
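
A custom metadata file of the kind shown in Figure 2 might look like the following sketch. The element names here are illustrative only, not actual FWTools output; any well-formed XML produced by the extraction application can be stored.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hypothetical geospatial custom metadata; real element names
         depend on the extraction application used. -->
    <custom-metadata>
      <source-format>NITF</source-format>
      <geospatial>
        <latitude>38.8895</latitude>
        <longitude>-77.0352</longitude>
        <datum>WGS84</datum>
      </geospatial>
      <acquired>2011-06-15T14:32:00Z</acquired>
    </custom-metadata>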

HCP custom object metadata enhancement tool is constructed to use the HCP open-standard REST [2] interface. This industry-standard interface is used for distributed hypermedia systems such as the World Wide Web and typically involves an HTTP context. REST removes the need for proprietary interfaces, which accelerates integration and reduces long-term maintenance costs. It also provides the capability for simpler customization as mission needs change, even throughout the life of a long-term mission.

System Operation, Environment and Performance

Custom metadata can be added to the HCP data store in two different ways. The first allows the enhancement to be performed during the ingest operation. In this case, new objects are found in the user directory by HCP custom object metadata enhancement tool. The tool will first call the pluggable application to extract the metadata and create an XML representation of the resulting metadata. Then it will ingest the object and the corresponding metadata into the HCP data store.

The second method allows existing HCP file objects to be enhanced (augmented) with new metadata. HCP custom object metadata enhancement tool will see the new file in the input directory. If the exact object already exists in the data store, then the tool will call the external program to extract the metadata, convert it to an XML representation and ingest the newly formed metadata for the corresponding object.

HCP custom object metadata enhancement tool can be configured to search local directories on the same machine where it is running, or it can search for objects located in a mounted remote directory. The tool has also been tested to run in a virtual machine, pulling data from the local directory inside the virtual machine.

Performance can be enhanced with HCP v4.1, since it provides the capability for a whole-object ingestion operation. This allows a single write operation to be performed with both the file object and the corresponding metadata, thus saving network bandwidth and system resources.

User Settings and Customization

HCP custom object metadata enhancement tool provides a number of user-configurable settings, including:

- Metadata extraction application. This is the application that will be run on each file to extract the relevant metadata. It is implemented as a pluggable interface.
- Process run interval period. The user can select the interval at which the input user directory is checked for new files and processed. This setting allows for adaptation to situations ranging from new data arriving at an extremely high rate to new data arriving only infrequently.
- Update or replace selection. If HCP custom object metadata enhancement tool discovers that the object already exists in the HCP data store, the user has the option to either update or replace the existing custom metadata.
- Input directory. The user can specify either a local directory on the same machine where HCP custom object metadata enhancement tool is running, or a remote directory that has been previously mounted and is accessible.
- HCP destination namespace. The user can select the destination HCP namespace.
- HCP namespace authorization. The user can specify the HCP access authorization information for the destination HCP namespace.
- File process count. The user can specify the number of files that will be processed in each interval. Adjusting this will require some tuning, since there will be variability in the implementation: plug-in applications will process files at varying speeds, file sizes will vary and file addition rates will vary.

HCP custom object metadata enhancement tool provides a custom metadata generation architecture that is end-user pluggable. Therefore, any open-source, customer-proprietary or vendor-proprietary metadata extraction application can be used and changed as needed. Customization of the application itself can easily be performed by individuals with Java experience. C-language-based interfaces to the lower-level system operations are provided with HCP custom object metadata enhancement tool to allow for further customization, as required.
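
The plug-in contract itself is not published in this paper, but conceptually each class named in the tool's metadata.classes setting (see Table 2) turns a file into XML custom metadata. A minimal sketch of what such a contract might look like in Java, with all names hypothetical:

    import java.io.File;

    // Hypothetical plug-in contract; the actual interface shipped with the
    // tool may differ. Implementations typically wrap an external extraction
    // program such as FWTools and return the result as XML.
    public interface MetadataGenerator {
        // Report whether this generator understands the given file type.
        boolean canHandle(File source);

        // Run the extraction and return XML custom metadata,
        // or null if the file yields none.
        String extractMetadata(File source) throws Exception;
    }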

Hitachi Content Platform Custom Object Metadata Enhancement Tool: Architecture and Operation

HCP custom object metadata enhancement tool encapsulates a number of functions into an extensible and customizable tool suite (see Figure 3). It watches for new files in a local user directory and runs each file through an external metadata extraction program to see if there is any metadata. Then it ingests the resulting metadata to augment the corresponding HCP file object. If the file object does not already exist in the data store, the tool will ingest the object itself as well. The tool's running application program wakes up on a periodic basis to perform these functions.

Figure 3. Components and Interfaces between Hitachi Content Platform and Hitachi Content Platform Custom Object Metadata Enhancement Tool

The REST interface is used for all communication between HCP custom object metadata enhancement tool and HCP. This is done for several reasons, including portability, supportability and performance. REST is used by many database and distributed web applications and is implemented using a behavioral model. REST is an open standard, and its stateless model offers simplicity, which makes it much easier to integrate distributed components.

There are two operational modes for HCP custom object metadata enhancement tool: an in-band file object and metadata ingest mode, and an out-of-band metadata augmentation mode. Both are described below.

Ingest Function Process Flow

The ingest
the HCP data store. This function is useful when new data is being
ingested so that the accompanying metadata is added at the same
time as the file object. Detailed operation of the ingest function
is shown in Figure 4. At a user-defined periodic interval, the HCP
custom object metadata enhancement tool process will wake up and
begin searching for new files in the user directory. The list of
files is processed in order by the tool; each file in the resulting
list is provided as inputto the external metadata extraction
program. The extraction 11. 11 program will read the specified file
and send the resulting XML-formatted metadata information to the
tool . The tool will read the specified file and send the
information pair (object + metadata) to HCP. HCP will then write
each component to the respective location in the data store,
completing the ingest operation.Figure 4. Detailed HCP Custom
Object Metadata Enhancement Tool Ingest Process Flow:Post-processor
Extracts Custom Metadata, Augments Existing HCP Objects. Augment
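
To make the flow concrete, the following Java fragment sketches the two REST writes of an ingest operation. It is a minimal sketch, not the tool's actual code: the hostname, paths and the ?type=custom-metadata query parameter are placeholders patterned on the namespace REST interface, and authentication is omitted; consult "Using a Namespace" [8] for the exact request format.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class IngestSketch {
        // Placeholder namespace URL; real deployments follow the form in [8].
        static final String BASE = "https://ns0.tenant1.hcp.example.com/rest";

        // PUT the contents of a local file to the given HCP REST URL.
        static void put(String target, File content) throws Exception {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(target).openConnection();
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            // Data access credentials are required in practice; omitted here.
            try (FileInputStream in = new FileInputStream(content);
                 OutputStream out = conn.getOutputStream()) {
                in.transferTo(out);
            }
            System.out.println(target + " returned HTTP " + conn.getResponseCode());
        }

        public static void main(String[] args) throws Exception {
            // Step 1: write the file object itself into the namespace.
            put(BASE + "/images/image0001.ntf", new File("image0001.ntf"));
            // Step 2: attach the XML produced by the extraction plug-in as
            // custom metadata (the query parameter shown is an assumption).
            put(BASE + "/images/image0001.ntf?type=custom-metadata",
                new File("image0001.xml"));
        }
    }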

Augment Function Process Flow

The augment function is an out-of-band mode whereby existing file objects are post-processed in order to augment the stored information with new object metadata. This is useful when a large amount of data already exists in HCP; without this function, all of the objects would have to be re-ingested into another data repository, which could take considerable time and network resources.

Detailed operation of the augment function is shown in Figure 5. The customer has previously ingested a large number of files into the HCP data store. HCP custom object metadata enhancement tool will periodically wake up and query the existing HCP data store, searching for files without metadata that have not been modified since the previous query. Files matching the criteria are supplied to the metadata extraction application, which reads each file object from the local directory and provides any custom metadata from the files in XML format. HCP custom object metadata enhancement tool will then ingest the custom metadata to augment the corresponding HCP objects.

Figure 5. Detailed HCP Custom Metadata Enhancement Tool Process Flow During an Augment Function: Post-processor Extracts Custom Metadata, Augments Existing HCP Objects
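
Under the same assumptions as the ingest sketch above, the augment-side interaction reduces to an existence check followed by a custom metadata write. A hypothetical helper, written as an addition to that sketch (so the same imports and put() method apply):

    // Hypothetical augment step: attach metadata only if the object exists.
    // Reuses the put() helper from the ingest sketch; URL form per [8].
    static void augment(String objectUrl, File metadataXml) throws Exception {
        HttpURLConnection head =
            (HttpURLConnection) new URL(objectUrl).openConnection();
        head.setRequestMethod("HEAD"); // existence check, no body transferred
        if (head.getResponseCode() == 200) {
            put(objectUrl + "?type=custom-metadata", metadataXml);
        }
    }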

HCP Namespace Usage by HCP Custom Object Metadata Enhancement Tool

HCP provides access to the repository as partitioned namespaces. A namespace is a logical grouping of objects such that the objects in one namespace are not visible in any other namespace. To the user of a namespace, the namespace is the repository, and it may appear as a network-accessible mount point. This brief introduction allows for the discussion of source and destination locations; more detail on HCP and namespaces is provided below.

Source or Destination Locations

HCP
custom object metadata enhancement tool provides flexibility in the input source location as well as the output destination. In the case of an ingest operation, the input source could be a file system on the machine where HCP custom object metadata enhancement tool is running, or a file system on a network-mounted remote directory. For an augmentation operation, the objects would be sourced from the root folder in either an HCP default namespace or an authenticated namespace.

The destination of any HCP custom object metadata enhancement tool file operation is always an HCP repository, but either type of namespace is allowed. The destination namespace can be the same as the source namespace, or it can be a different namespace. The path should contain the root folder within the appropriate namespace. A summary of the allowable locations is shown in Table 1.

TABLE 1. HCP CUSTOM METADATA ENHANCEMENT TOOL: ALLOWABLE SOURCE AND DESTINATION LOCATIONS

Object Location        File System   HCP Default Namespace   HCP Authenticated Namespace
Source (input)         Yes           Yes                     Yes
Destination (output)   No            Yes                     Yes

Reference Architecture and Host Implementation Guidelines

In a typical implementation, HCP custom object metadata enhancement tool runs on a host machine that is not part of HCP. The tool requires minimal resources, and the host machine can be either a physical machine or a virtual machine. The processor, memory and storage requirements are driven more by the plug-in metadata extraction application, as well as by the size of the objects and the required object process rate. If possible, administrators should provide adequate memory to allow the operating system to keep the object, as well as the metadata extraction application, resident in memory, since the application will be called repeatedly (that is, for every new object to be processed).

Since HCP custom object metadata enhancement tool requires only a single machine (physical or virtual), its reference architecture is more dependent on the HCP implementation than on the tool node. Figure 6 depicts an example implementation with the tool's physical node connected to a 4-node HCP 500 system. This HCP was configured with failover and uses modular storage with LUNs provisioned from individual RAID groups. The tool node in this diagram shows the new content being sourced from either a local directory on the node or from a remote directory (but not both).

Figure 6. HCP Custom Object Metadata Enhancement Tool Reference Architecture: HCP Implementation as a 4-node HCP 500 Supporting Failover Using Modular Storage with LUNs Provisioned from Individual RAID Groups

Example Proof of Concept Implementation

As a
proof of concept demonstration, both HCP custom object metadata enhancement tool functions were utilized. The tool was first used to enhance existing objects that had been previously ingested but required augmentation with newly provided geospatial metadata information. The demonstration also ingested new objects augmented with the corresponding geospatial-based metadata.

The pluggable metadata application used was an open-source geographic information system (GIS) program called FWTools [3]. FWTools provides the ability to view geospatial information from a variety of format types, while also providing the ability to extract the metadata for the supported file types, including the National Imagery Transmission Format (NITF) [4]. NITF files are used by federal agencies and system integrators focused on correlating information in the objects with geospatial information, all from multiple events and data sources.

Parameters and Configuration Settings

HCP custom
object metadata enhancement tool has a number of tunable parameters and configuration settings that must be properly set before starting normal operation. All of these settings can be found in the "ingestor.properties" file and are listed in Table 2 along with their descriptions.

TABLE 2. TUNABLE HCP CUSTOM OBJECT METADATA ENHANCEMENT TOOL PARAMETERS AND CONFIGURATION SETTINGS

source.path
    Local path to the directory that contains the data to ingest.
source.maxBatchSize
    Maximum number of file handles to "batch" per loop iteration.
destination.user
    HCP data access: user to use for ingest.
destination.password
    HCP data access password for the destination.user account.
destination.passwordEncoded
    Indicates whether the destination.password value is encoded in MD5 format.
destination.rootpath
    Root path REST URL to the HCP location where content is placed.
metadata.classes
    Comma-separated, ordered list of classes to load to extract metadata from files.
execution.loopcount
    Number of times to load up the batch with files to process.
execution.stopRequestFile
    Name of a file in the process's local directory to watch for as a signal to stop processing.
execution.pauseRequestFile
    Name of a file on the local machine to watch for as a signal to pause processing: for as long as the file exists, the program is paused; delete the file to resume. Changing this value while the program is paused will not take effect until processing resumes.
execution.deleteSourceFiles
    Indicates whether source files should be deleted after being written to HCP. If a file does not have the correct permissions, the tool attempts to change them and tries again.
execution.forceDeleteSourceFiles
    Indicates whether deletion of source files should be forced by changing the source file permissions.
execution.deleteSourceEmptyDirs
    Indicates whether empty directories in the source tree should be periodically cleaned up.
execution.updateMetadata
    Indicates whether metadata should be updated for existing metadata on objects in HCP. If set to false, source files will be ignored (but deleted, if so indicated).
execution.pauseSleepInSeconds
    Number of seconds to sleep during a pause, between checks for resume.
execution.batchSleepInSecond
    Number of seconds to sleep at the end of a batch run before attempting another batch.
execution.debugging.httpheaders
    Indicates whether HTTP headers should be written to the console (stdout).
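
Pulling these settings together, a hypothetical ingestor.properties might look like the following. All values are illustrative only; the plug-in class name, URL and credentials are placeholders, not defaults.

    # Illustrative example only; adapt every value to the local environment.
    source.path=/data/incoming
    source.maxBatchSize=100
    destination.user=ingestuser
    destination.password=changeme
    destination.passwordEncoded=false
    destination.rootpath=https://ns0.tenant1.hcp.example.com/rest/images
    metadata.classes=com.example.metadata.NitfMetadataGenerator
    execution.loopcount=10
    execution.stopRequestFile=stop.request
    execution.pauseRequestFile=pause.request
    execution.deleteSourceFiles=false
    execution.forceDeleteSourceFiles=false
    execution.deleteSourceEmptyDirs=true
    execution.updateMetadata=true
    execution.pauseSleepInSeconds=30
    execution.batchSleepInSecond=60
    execution.debugging.httpheaders=false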

Hitachi Content Platform Primer

The functionality described here is based on Hitachi Content Platform version 4.1, but some content might be applicable to prior HCP versions.

About Hitachi Content Platform

Hitachi Content Platform is a distributed storage system designed to support large, growing repositories of fixed-content data. HCP stores objects that include both data and the corresponding metadata. It distributes these objects across the storage space but still presents them as files in a standard directory structure.

HCP provides access to stored objects through the HTTP protocol, as well as through user interfaces such as the namespace browser and search console.

HCP is a combination of hardware and software that provides an object-based data storage environment. An HCP repository stores all types of data, including simple text files as well as multigigabyte satellite, medical or database images. HCP provides easy access to the repository for adding, retrieving and deleting the stored data. HCP uses write once, read many (WORM) storage technology and a variety of policies and internal processes to ensure the integrity of the stored data and the efficient use of storage capacity.

Key features of HCP include:

- Scalability up to 40PB of storage in a single cluster
- Capability to provision a single cluster into multiple virtual content platforms ("tenants"), each with its own unique configuration and access control, to manage data placement and content distribution to appropriate audiences
- Connection capabilities to a wide range of applications and protocols via HTTP, REST, NFS, CIFS and more
- High data integrity, with data integrity checking, RAID-6, replication, encryption, WORM, multiple versions of objects and audit logging
- Automation of data migration from old storage to new storage
- Management and enforcement policies for retention, disposal, shredding and other compliance and lifecycle management operations
- Increased value of unstructured data using metadata and custom metadata for automation and search
- Capability to create a single, multipurpose, unstructured data platform for archive, cloud and backup capabilities
- Capability to monitor and report on storage and bandwidth use of different tenants for chargeback
- Enhanced management capabilities with comprehensive interfaces for cloud and distributed environments
- Scalability to branch and remote offices via Hitachi Data Ingestor

The following sections introduce basic HCP concepts and include information regarding HCP namespaces.
object permanently associates data HCP receives (for example, a
file, an image or a database) with information about that data,
called metadata. An object encapsulates: Fixed-content data, which
is an exact digital reproduction of data as it existed before it
wasstored. Once it is in the repository, this fixed content data
cannot be modified. System metadata offers system-managed
properties that describe the fixed-content data (forexample, its
size and creation date). System metadata includes settings, such as
retention and 17. 17data protection level, that influence how
transactions and internal processes affect the object. Custom
metadata is metadata that a user or application provides to further
describe an object. Itis typically specified as XML and can be used
to create self-describing objects. Future users andapplications can
use this metadata to understand and repurpose the object content.
Namespaces and Tenants An HCP repository is partitioned into
namespaces. A namespace is a logical grouping of objects such that the objects in one namespace are not visible in any other namespace. To the user of a namespace, the namespace is the repository.

Namespaces provide a mechanism for separating the data stored for different applications, business units or customers. For example, a deployment could have one namespace for accounts receivable and another for accounts payable. Namespaces also enable operations to work against selected subsets of repository objects. For example, a query could be performed that targets the accounts receivable and accounts payable namespaces but not the employee namespace.

Namespaces are owned and managed by administrative entities called tenants. A tenant typically corresponds to an actual organization, such as a company or a division or department within a company. A tenant can also correspond to an individual person.

Namespace Access

HCP provides several techniques for accessing and managing data in the namespace. These include:

- REST interface
- Metadata query API
- Namespace browser
- Search console
- Hitachi Data Migrator
- HCP client tools

REST Interface

Clients use
an HTTP-based REST interface to access the namespace. Using this interface, actions can be performed such as adding objects to the namespace, viewing and retrieving objects, changing object metadata and deleting objects. The namespace can be accessed programmatically with applications, interactively with a command-line tool or through a graphical user interface (GUI). Figure 7 shows the relationship between original data, objects in a namespace and the HTTP access protocol.

Figure 7. Client-HCP Namespace: Relationship between Original Data, Objects in a Namespace and HTTP Access to the HCP Data Store
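
As a simple illustration, retrieving an object through this interface amounts to an authenticated HTTP GET of the object's namespace path. A minimal Java sketch with a placeholder URL and authentication omitted (see [8] for the exact request format):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class RetrieveSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder object URL of the form <namespace host>/rest/<path>.
            URL url = new URL(
                "https://ns0.tenant1.hcp.example.com/rest/images/image0001.ntf");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET"); // data access credentials omitted
            try (InputStream in = conn.getInputStream()) {
                // Save the retrieved object to the local working directory.
                Files.copy(in, Paths.get("image0001.ntf"),
                           StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }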

Metadata Query API

HCP allows clients to use HTTP requests to find objects that meet specific criteria, including object change time, index setting, operations on the object and the object location. If the client has the appropriate permissions, a single request can query multiple HCP namespaces as well as the default namespace.

A metadata query to HCP returns a set of records containing metadata that describes the matching objects. If the query matches a large number of objects, multiple requests can be used to page sequentially through the records, retrieving only a specific number of records in response to each request.
namespace content and the ability to view information about
namespaces. The browser functions include: List, view, and retrieve
objects and versions of objects Create empty directories Store and
delete objects Display namespace information, including: The
namespaces that can be accessed Retentionclasses for use within a
namespace Permissionsfor namespace access Statistics about a
namespace Search Console The HCP search console is an easy-to-use
web application that provides the capability to search for and manage objects based on specified criteria. For example, a search could return all objects stored before a certain date or larger than a specified size; those objects could then be deleted or marked to prevent them from being deleted.

The search console works with either of two implementations, which must be enabled at the HCP system level:

- The Hitachi Data Discovery Suite (HDDS) search facility interacts with HDDS, which performs searches and returns results to the HCP search console. HDDS is a separate product from HCP.
- The HCP search facility is integrated with HCP and works internally to perform searches and return results to the search console.

Only one of the search facilities can be enabled in the HCP GUI at any given time. If neither is enabled, HCP does not support using the search console to search namespaces. The system associated with the enabled search facility is called the active search system.

The active search system (that is, HDDS or HCP) maintains an index of data objects in each search-enabled namespace. The index is based on object content and metadata, and the active search system uses it for fast retrieval of search results. When objects are added to or removed from the namespace, or when object metadata changes, the active search system automatically updates the index to keep it current.

For information on using the search console, please reference [5].

Note: A namespace supports search only if the namespace administrator has enabled search for that namespace.
namespace administrator has not enabled search. Hitachi Data
Migrator Hitachi Data Migrator is a high-performance, multithreaded
client-side utility for viewing, copying, and deleting data. Data
Migrator functions include: Copy objects, files and directories
between local file systems, HCP namespaces and earlier HCParchives
Delete objects, files and directories, including performing bulk
delete operations View the content of objects and files, including
the content of old versions of objects Rename files and directories
on the local file system View object, file and directory properties
Create empty directories Add, replace or delete custom metadata for
objects Data Migrator has both a GUI and a command-line interface
(CLI). For information on using Data Migrator, please reference
[6]. HCP Client Tools HCP comes with a set of command-line tools
that allow data to be copied or moved between a client and an HCP system. The tools also provide a search capability using specified criteria. Additionally, empty directories can be created in a local or remote file system or on an HCP system. The client tools support multiple namespace access protocols and multiple client platforms, and the command syntax is the same for all supported configurations.

For information on installing and using the client tools, please reference [7].

Note: For most purposes, the HCP client tools have been superseded by Hitachi Data Migrator. However, they have some features, such as finding files, that are not available in Data Migrator.

Transmitting Data in Compressed Format

Object data or
Migrator. Transmitting Data in Compressed Format Object data or
custom metadata can be compressed in gzip format to save bandwidth
before sending it to HCP. The PUT request contains the subrequest
to tell HCP that data is compressed. HCP will then know to
decompress the data before storing it. Similarly, in a GET request,
HCP can be told to return object data or custom metadata in
compressed format. In this case, the returned data must first be
decompressed before use. HCP supports only the gzip algorithm for
compressed data transmission. HCP can be told that the request body
is compressed by including a Content-Encoding header with the value
gzip. In this case, HCP uses the gzip algorithm to decompress the
received data. HCP can be told to send a compressed response by
specifying an Accept-Encoding header. If the header specifies gzip,
a list of compression algorithms that includes gzip, or *, HCP uses
the gzip algorithm to compress the data before sending it. For
examples of sending and receiving objects in compressed format,
please reference Chapter 4, "Working with objects and versions" in
[8]. Notes: HCP can also compress and decompress metadata query API
requests and responses.For more information on this, please
reference the HCP product document titled "Using aNamespace," in
the section titled "Request HTTP elements." Since HCP normally
compresses stored object data and custom metadata, it is
unnecessaryto explicitly compress objects for storage. However, if
gzip-compressed objects or custommetadata are to be stored, do not
use a Content-Encoding header. To retrieve stored gzip-com-pressed
data, do not use an Accept-Encoding header. Data Access Permissions
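
For example, a client could gzip-compress custom metadata on the fly and label the request body with the Content-Encoding header described above. A minimal sketch (placeholder URL and query parameter; authentication omitted):

    import java.io.FileInputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.zip.GZIPOutputStream;

    public class CompressedPutSketch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://ns0.tenant1.hcp.example.com"
                    + "/rest/images/image0001.ntf?type=custom-metadata");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            // Tell HCP the request body is gzip-compressed.
            conn.setRequestProperty("Content-Encoding", "gzip");
            try (FileInputStream in = new FileInputStream("image0001.xml");
                 GZIPOutputStream out =
                     new GZIPOutputStream(conn.getOutputStream())) {
                in.transferTo(out); // compress while streaming the body
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }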

Data Access Permissions

All namespace access clients must have permission to access and perform actions on data. Table 3 describes the permissions and the operations they allow.

TABLE 3. HCP PERMISSIONS AND ALLOWABLE OPERATIONS

Read
    Retrieve objects and system metadata. Check for object existence. Check for and retrieve custom metadata.
Write
    Add objects. Create directories. Set and change system and custom metadata.
Delete
    Delete objects and empty directories; remove custom metadata.
Purge
    Delete objects and their historical versions.
Privileged
    Delete or purge objects regardless of retention. Place objects on hold.
Search
    Search for objects. For information, please reference Chapter 8, "Using the HCP metadata query API," in [8].

Some operations require multiple permissions. For example, to place an object on hold, the user must have both write and privileged permissions. Similarly, performing a privileged purge requires the delete, privileged and purge permissions.

Permissions are set at two levels:
- Namespace-level permissions. This permission mask specifies the maximum permissions for any user that accesses the namespace.
- Data access account. This specifies permissions for an individual user. Accessing a namespace requires a data access account with a username and password. The account specifies the available namespaces and the associated permissions.

The required permissions for a particular operation must be enabled in both the namespace-level permission mask and the corresponding data access account permissions.

Replication

Replication is the process of keeping selected tenants and namespaces in two HCP systems in sync with each other. Basically, this entails copying object creations, deletions and metadata changes from one system to the other. HCP also replicates the tenant and namespace configuration, data access accounts and retention classes. The HCP system in which the objects are initially created is called the primary system; the second system is called the replica.

Replication has several purposes, including:

- If the primary system becomes unavailable (for example, due to network issues), the replica can provide continued data availability.
- If the primary system suffers irreparable damage, the replica can serve as a source for disaster recovery.
- If an object cannot be read from the primary system (for example, because a server is unavailable), HCP can try to read it from the replica.

Note: Replication is an add-on feature to HCP. Not all systems include it.

Namespace Operations

Familiar commands and tools are used to perform operations on a namespace. Some operations relate to specific types of metadata; for more information on this metadata, please reference the Chapter 2, "Understanding objects," section in [8]. Operations that store or retrieve data can optionally transmit the data in gzip-compressed format. For more information, see the individual commands used for those operations.

Operation Restrictions

The operations that can be performed are subject to the following restrictions:

- The HTTP request headers must include valid user information.
- The namespace must be configured to allow HTTP or HTTPS access from the client IP address.
- The namespace configuration and user permissions must allow the operation. For information on user permissions, please reference Chapter 10, "Using the Namespace Browser," in [8].

Supported Operations

The
following operations can be performed on a namespace:

- Write data to the namespace. If versioning is enabled, store new versions of existing objects.
- Override default metadata when storing an object.
- Create an empty directory in the namespace.
- Check for object existence.
- View the content of an object.
- View object metadata.
- Delete an object.
- Delete an empty directory.
- Set retention for an object that has none.
- Extend the retention period for an object.
- Set or change a retention class for an object.
- Hold or release an object.
- Enable shredding of an object.
- Change the index setting for an object.
- Add, replace or delete custom metadata for an object.
- Add or retrieve object data and custom metadata in a single operation.
- Check for and read custom metadata.
- List retention classes available in the namespace.
- List namespace permissions for the user.
- List the namespace statistics.
- List the accessible namespaces.
- Use the HCP metadata query API to get information about objects that meet specified criteria in one or more namespaces.

Prohibited Operations

HCP never allows users to:
- Rename an object or directory.
- Overwrite a successfully stored object. However, if versioning is enabled, new versions of an object can be written.
- Modify the fixed-content portion of an object.
- Delete an object that is under retention, if the privileged permission is not granted or if the namespace is configured to prevent this operation.
- Delete a directory that contains one or more objects.
- Shorten an explicitly set retention period.

REST Interface Primer

The Representational State Transfer (REST) interface is a behavioral model used by many database and distributed web applications. Its beauty lies in its simplicity. From the Wikipedia definition [2]:

REST-style architectures consist of clients and servers. Clients initiate requests to servers; servers process requests and return appropriate responses. Requests and responses are built around the transfer of representations of resources. A resource can be essentially any coherent and meaningful concept that may be addressed. A representation of a resource is typically a document that captures the current or intended state of a resource.

At any particular time, a client can either be in transition between application states or "at rest." A client in a rest state is able to interact with its user, but creates no load and consumes no per-client storage on the servers or on the network. The client begins sending requests when it is ready to make the transition to a new state. While one or more requests are outstanding, the client is considered to be in transition. The representation of each application state contains links that may be used the next time the client chooses to initiate a new state transition.

REST was initially described in the context of HTTP, but it is not limited to that protocol. RESTful architectures can be based on other application layer protocols if they already provide a rich and uniform vocabulary for applications based on the transfer of meaningful representational state. RESTful applications maximize the use of the pre-existing, well-defined interface and other built-in capabilities provided by the chosen network protocol, and minimize the addition of new application-specific features on top of it.

Service Offerings
your HDS Account Manager for additional information. 25. 25
Appendix A: References [1] Hitachi Content Platform (HCP):
http://www.hds.com/assets/pdf/hitachi-datasheet-content-
platform.pdf [2] REST interface:
http://en.wikipedia.org/wiki/Representational_State_Transfer [3]
FWTools for GIS imaging: http://fwtools.maptools.org [4] National
Imagery Transmission Format (NITF) files:
http://en.wikipedia.org/wiki/National_Imagery_ Transmission_Format
[5] HCP "Searching Namespaces" manual, part of the HCP Product
Documentation Set [6] HCP "Using HCP Data Migrator" manual, part of
the HCP Product Documentation Set [7] HCP "Using the HCP Client
Tools" manual, part of the HCP Product Documentation Set [8] HCP
"Using a Namespace" manual, part of the HCP Product Documentation
Set 26. 26 Appendix B: Feedback Hitachi Data Systems welcomes your
feedback. Please share your thoughts by sending an email message to
[email protected], [email protected],
[email protected] or [email protected]. Please be sure
to include the title of this white paper in your email message. 27.
Corporate Headquarters Regional Contact Information750 Central
Expressway Americas: +1 408 970 1000 or [email protected] Clara,
California 95050-2627 USA Europe, Middle East and Africa: +44 (0)
1753 618000 or [email protected] Pacific: +852 3189
7900 or [email protected] is a registered trademark
of Hitachi, Ltd., in the United States and other countries. Hitachi
Data Systems is a registered trademark and service mark of Hitachi,
Ltd., in the UnitedStates and other countries.All other trademarks,
service marks and company names in this document or website are
properties of their respective owners.Notice: This document is for
informational purposes only, and does not set forth any warranty,
expressed or implied, concerning any equipment or service offered
or to be offered byHitachi Data Systems Corporation. Hitachi Data
Systems Corporation 2011. All Rights Reserved. WP-410-A DG October
2011