Package ‘scidb’
March 27, 2013

Type Package
Title An R interface to SciDB
Version 1.0-1
Date 2013-03-27
Author Paradigm4, B. W. Lewis
Maintainer B. W. Lewis <[email protected]>
Copyright Paradigm4, Inc.
Description SciDB is an open-source array database (http://scidb.org). The scidb package provides an R interface to SciDB.
BugReports https://github.com/Paradigm4/SciDBR/issues
Depends stats, methods, iterators
License AGPL-3
LazyLoad yes
NeedsCompilation yes
Repository CRAN
Date/Publication 2013-03-27 18:17:28

R topics documented:
scidb-package
as.scidb
between
df2scidb
dim.scidb
dimnames.scidb
iquery
X A matrix of double-precision floating point values or a data.frame.
name The name of the SciDB array to create, defaulting to the R variable name if available.
rowChunkSize Maximum SciDB chunk size for the 1st array dimension.
colChunkSize Maximum SciDB chunk size for the 2nd array dimension (ignored for vectors and data.frames).
start Starting dimension numeric index value or values.
gc Set to TRUE to remove the SciDB array when the R object is garbage collected or R exits. FALSE means the SciDB array persists.
... additional arguments to pass to df2scidb (see df2scidb).
Details
Used with a matrix or vector argument, the as.scidb function creates a single-attribute SciDB array named name and copies the data from X into it, returning a scidb object reference to the new array. The SciDB array will be 1-D if X is a vector, and 2-D if X is a matrix.
If X is a data.frame, then as.scidb creates a one-dimensional multi-attribute SciDB array, with SciDB attributes representing each column of the data.frame. The functions as.scidb and df2scidb are equivalent in this use case.
The SciDB array row and column chunk sizes are set to the minimum of the number of rows and columns of X and the specified rowChunkSize and colChunkSize arguments, respectively. The column chunk size argument is ignored if X is a vector.
This function supports double-precision, integer (32-bit), logical, and single-character array attribute types.
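The chunk-size rule above can be sketched in plain R. This is only an illustration: effective_chunks is a hypothetical helper, not a function of the scidb package, and it needs no SciDB connection.

```r
# Hypothetical helper (not part of the scidb package): the chunk-size rule
# described above, i.e. the minimum of the data extent and the requested size.
effective_chunks <- function(X, rowChunkSize, colChunkSize) {
  if (is.null(dim(X)) || is.data.frame(X)) {
    # Vectors and data.frames map to 1-D arrays, so the column chunk size
    # is ignored.
    return(c(row = min(NROW(X), rowChunkSize)))
  }
  c(row = min(nrow(X), rowChunkSize),
    col = min(ncol(X), colChunkSize))
}

X <- matrix(runif(20), 5)   # 5 rows, 4 columns
effective_chunks(X, rowChunkSize = 1000, colChunkSize = 2)          # row 5, col 2
effective_chunks(runif(7), rowChunkSize = 1000, colChunkSize = 10)  # row 7
```

The chunk sizes passed here are arbitrary illustration values, not the package defaults.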
## Not run:
X <- matrix(runif(20),5)
A <- as.scidb(X)
as.scidb(iris)
scidblist()
print(A)
## End(Not run)
between between
Description
Use between to select contiguous subarrays in indexing operations. The between function would not normally be used directly but rather inside bracket indexing operations as shown in the example. This function is designed to support efficient indexing of contiguous subarrays for arrays with non-integer dimensions.
Usage
between(a,b)
Arguments
a A SciDB array range bound (numeric or string in the case of non-integer dimension)
b A SciDB array range bound (numeric or string in the case of non-integer dimension)
Value
A function that evaluates to a list of the specified bounds.
Note
Between requires argument values that correspond to the array dimension types (no dimension casting is performed).
## Not run:
# Build a two-dimensional array called 'yikes' with a non-integer dimension
scidbremove(c("cazart","yikes"),error=warning)
iquery("store(apply(build_sparse(<val:double>[i=0:9,5,1,j=0:20,2,0],random()/1000000000,i<j or j=20),a,format(i,’a
iquery("create array yikes<val:double>[a(string)=*,5,1,j=0:20,2,0]")
iquery("redimension_store(cazart,yikes)")
name The SciDB array name, defaults to the R variable name if available.
dimlabel Name the SciDB dimension.
chunkSize The SciDB chunk size.
rowOverlap The SciDB chunk overlap.
types An optional vector explicitly specifying the SciDB attribute types. Length must match the number of columns of the data frame.
nullable An optional vector indicating the SciDB nullable property of each attribute. Length must match the number of columns of the data frame.
real_format The format string used to print real values.
gc Optional logical value. If TRUE, then resulting SciDB array will be garbage-collected when the R variable referencing it is. The default value is FALSE.
Details
df2scidb is a workhorse utility function that transfers an R data frame into a 1-D SciDB array via intermediate CSV formatting. The columns of the data frame correspond to attributes in the SciDB array. The iquery function uses a similar method to return query results as R data frames.
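The idea of moving a data frame through an intermediate CSV representation can be illustrated locally with base R alone. This sketch is not the package's transfer code and does not contact SciDB; it only shows the round trip from data frame to CSV text and back.

```r
# Illustration only: serialise a small data frame to CSV text and parse it back.
df <- head(iris, 3)

# Each element of 'csv' is one line of CSV text; the first is the header row.
csv <- capture.output(write.csv(df, row.names = FALSE))
length(csv)    # 4: one header line plus three data rows

# Parsing the text recovers one column per original data frame column.
back <- read.csv(text = paste(csv, collapse = "\n"))
dim(back)      # 3 5
```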
Value
NULL is invisibly returned. SciDB errors are propagated as R error conditions.
## Not run:
# Build a two-dimensional array called 'yikes' with a non-integer dimension
scidbremove(c("cazart","yikes"),error=warning)
iquery("store(apply(build_sparse(<val:double>[i=0:9,5,1,j=0:20,2,0],random()/1000000000,i<j or j=20),a,format(i,’a
iquery("create array yikes<val:double>[a(string)=*,5,1,j=0:20,2,0]")
iquery("redimension_store(cazart,yikes)")
yikes <- scidb("yikes")
dimnames(yikes)
# Here is an alternate approach that iterates through dimension names
# along one dimension for huge arrays:
i <- iquery("yikes:a", return=TRUE, iterative=TRUE)
nextElem(i)
## End(Not run)
iquery Simple SciDB query tool
Description
Issue SciDB queries and optionally return output in a data frame.
query A SciDB query string (character). Separate multiple queries with semicolons.
return Set to TRUE to return output. Otherwise don’t return query output. Only available when afl=TRUE.
afl TRUE indicates query is in AFL form, FALSE indicates AQL.
iterative Set to TRUE to return a result iterator. FALSE returns entire result at once.
n Maximum number of rows to return when iterating through results.
excludecol An optional numeric range of columns to exclude from iterative results (only applies when iterative=TRUE).
... Options passed on to read.table used to parse results.
Details
The iquery function is a simple analog of the command-line SciDB iquery program.
Value
If return=TRUE, return the query result in data frame form (similar to the command-line -olcsv+ output option).
If return=FALSE, return the query ID number.
SciDB errors encountered during query processing are propagated to R and can be handled with normal R error handling mechanisms.
Set iterative=TRUE to return a result iterator. Use the iterator nextElem function to iteratively return results, a maximum of n results at a time. See help in the iterators package for examples and options.
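The "at most n results at a time" behaviour can be pictured with a small base R closure. This is a hypothetical stand-in written for illustration; it is neither the iterators package nor the iterator that iquery returns, and chunk_iterator is an invented name.

```r
# Hypothetical stand-in for a result iterator: each call returns the next
# block of at most n rows and signals an error once the data are exhausted.
chunk_iterator <- function(df, n) {
  pos <- 0
  function() {
    if (pos >= nrow(df)) stop("no more results")
    idx <- seq(pos + 1, min(pos + n, nrow(df)))
    pos <<- max(idx)             # remember how far we have read
    df[idx, , drop = FALSE]
  }
}

next_chunk <- chunk_iterator(data.frame(i = 1:10, x = (1:10) / 10), n = 4)
nrow(next_chunk())   # 4
nrow(next_chunk())   # 4
nrow(next_chunk())   # 2 (the final, shorter block)
```

Processing results block by block keeps memory use bounded by n rows rather than by the full result size.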
## Not run:
iquery("list('instances')",return=TRUE)
# A simple example that iterates through results using foreach
# Build an array with 1 million numbers from zero to 1.
iquery("store(build(<x:double>[i=1:1000000,100000,0],i/1000000),X)")
# Apply a function and return result in an iterator:
i <- iquery("apply(X, y, sin(x))", return=TRUE, iterative=TRUE)
# Sum up x and y (and dimension i too)
library("foreach")
foreach(j=i, .combine=function(...)colSums(rbind(...)))
# Compare with the much faster equivalent inside SciDB:
iquery("aggregate(apply(X, y, sin(x)),sum(x),sum(y))", return=TRUE)
Return 2nd dimension names of a SciDB array with two or more dimensions, or just the single dimension names of a 1-d SciDB array reference object. Warning! Huge SciDB arrays can return too much. See scidb for an example alternative approach.
Create an array-like R object reference to a SciDB array.
Usage
scidb(name, attribute = "", `data.frame`, gc)
Arguments
name Name of the SciDB array to reference.
attribute Name of an attribute within the array.
data.frame Return a data.frame-like object (requires 1D SciDB array).
gc TRUE means the SciDB array shall be removed when the R object is garbage collected or R exits. FALSE means the SciDB array persists.
Details
The referenced array may be any SciDB array. One-dimensional SciDB arrays may be referenced as data.frame-like objects in which the SciDB array attributes appear as data.frame columns, or as one-column matrices.
SciDB arrays of dimension 2 or more appear as R arrays. In such cases, only one attribute may be referenced per object. (The SciDB array may contain multiple attributes.)
If the SciDB array contains more than one attribute, use the attribute argument to specify which attribute to use in the reference object. The first listed attribute is used by default if no attribute is specified.
The scidb class supports sparse and dense SciDB arrays of any dimension. Attribute types real, integer (32-bit), logical, and single-character (one byte) are supported. Integer and non-integer dimensions are supported, with some limitations described below.
R does not have a native 64-bit integer type. SciDB uses 64-bit integer dimensions. The scidb package uses R double-precision floating point values to index SciDB integer dimensions, restricting R to dimension values below 2^(53).
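The 2^(53) bound follows from the 53-bit significand of IEEE-754 double-precision numbers. The following base R lines, which do not use the scidb package, show where integer values stop being exactly representable.

```r
# Doubles carry a 53-bit significand, so every integer is exact only up to 2^53.
.Machine$double.digits       # 53 on IEEE-754 platforms

limit <- 2^53
(limit - 1) + 1 == limit     # TRUE: arithmetic is still exact at the limit
limit + 1 == limit           # TRUE: 2^53 + 1 cannot be represented and rounds back

# By comparison, R's native integer type is only 32 bits wide.
.Machine$integer.max         # 2147483647
```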
String-valued SciDB non-integer dimensions are also supported. Other types of SciDB non-integer dimensions are not supported yet by the scidb package.
The scidb class generally follows SciDB indexing convention, which sometimes differs from R indexing conventions. In particular, note that the starting SciDB integer index is arbitrary, but often zero. (The upper-left corner of R arrays is always indexed by [1,1,...].) SciDB integer dimension indices will be displayed as R dimension names. Subarray indexing operations use the SciDB convention. Thus, zero and negative indices are literally interpreted and passed to SciDB. In particular, negative indices do not indicate index omission, unlike standard R arrays.
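For contrast, the lines below show the standard R behaviour that scidb objects do not follow, using ordinary R vectors and matrices only. The helper coord_to_pos is hypothetical and simply translates a coordinate with an arbitrary starting index into a 1-based R position.

```r
# Standard R: a negative index omits elements.
x <- c(10, 20, 30)
x[-1]          # 20 30 (the first element is dropped)

# Mimic a dimension that starts at 0 by storing the coordinates as dimnames,
# which is how SciDB integer indices are displayed.
m <- matrix(1:6, nrow = 2,
            dimnames = list(as.character(0:1), as.character(0:2)))
m["0", "2"]    # 5: looked up by coordinate label, not by R position

# Hypothetical helper: translate a coordinate to a 1-based R position.
coord_to_pos <- function(coord, start) coord - start + 1
m[coord_to_pos(0, start = 0), coord_to_pos(2, start = 0)]   # 5 again
```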
With the exception of the empty indexing operation, [], subarray indexing operations return new SciDB reference array objects. Use the empty indexing operation to materialize data from the SciDB backing array into a normal R array.
The scidb class omits dimension indices with all empty cells in sparse SciDB arrays when materializing data as R arrays with the [] operator.
Use the between function to efficiently index rectangular contiguous subarrays.
Subsets of sparse SciDB arrays are bounded by the data extent and returned to R as dense arrays with a default fill-in value in the empty cells. The default empty value may be specified during indexing with the default= argument (see the examples below), and globally using the package option options(scidb.default.value=NA) (see scidb-package). See the vignette examples for a more complete discussion of sparsity.
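The fill-in behaviour can be sketched in base R. The function densify below is a hypothetical illustration, not the package's implementation: it takes the coordinates and values of the non-empty cells, bounds the result by their extent, and fills every other cell with a default value.

```r
# Hypothetical helper: build a dense matrix from sparse (i, j, val) triplets.
densify <- function(i, j, val, default = NA) {
  rows <- seq(min(i), max(i))      # bound the result by the data extent
  cols <- seq(min(j), max(j))
  out <- matrix(default, length(rows), length(cols),
                dimnames = list(as.character(rows), as.character(cols)))
  # Write the non-empty cells; everything else keeps the default value.
  out[cbind(match(i, rows), match(j, cols))] <- val
  out
}

# Three non-empty cells at zero-based coordinates (0,1), (1,2) and (2,0).
d <- densify(i = c(0, 1, 2), j = c(1, 2, 0), val = c(1, 1, 1), default = 0)
dim(d)        # 3 3
d["0", "0"]   # 0: an empty cell filled with the default
sum(d)        # 3
```

The example is chosen so that every row and column holds a value; the sketch does not model the omission of entirely empty dimension indices.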
Value
A scidb object that references the indicated SciDB array.
## Not run:
scidbconnect()
# A basic 1-d array, showing selection of attribute:
df2scidb(iris,nullable=FALSE)
A <- scidb("iris", attribute="Petal_Width")
dim(A)

# A sparse 3-d array example:
scidbremove(A, error=warning)
iquery("store(build_sparse(<val:double>[i=0:9,10,0,j=0:9,5,0,k=0:9,2,0],k,k<99 and (j=1 or j=3 or j=5 or j=7)),A)")
A <- scidb("A")

# Indexing operations return new SciDB arrays:
A[,2:3,5:8]

# But, their data can be materialized into an R array with []:
A[,2:3,5:8] []

# A sparse 2-d array.
scidbremove(c("A"), error=warning)
iquery("store(build_sparse(<val:double>[i=0:9,5,0,j=0:9,5,0],i,i=j),A)")
A <- scidb("A")

# The materialization process from sparse SciDB arrays to R uses a default
# fill-in value globally specified in options("scidb.default.value"). Or, use
# the 'default' argument to change the default:
scidbremove(c("A","B"),error=warning)
iquery("store(build_sparse(<x:double>[i=0:9,5,0,j=0:9,5,0],1,i=(j-1)),A)")
iquery("create array B<x:double>[a(string)=10,5,0,b(string)=10,5,0]")
iquery("redimension_store(apply(A,a,'a'+string(i),b,'b'+string(j)),B)")
B <- scidb("B")

B[]
B[,default=0]
## End(Not run)
scidb-class Class "scidb"
Description
A class that represents SciDB arrays as R arrays.
Objects from the Class
Objects can be created by calls of the form new("scidb", ...), scidb("ARRAY_NAME", ...), or as.scidb(R_MATRIX, ...).
Slots
call: Object of class "call" How scidb was called.
name: Object of class "character" scidb array name.
D: Object of class "list" List of scidb dimension information.
dim: Object of class "numericOrNULL" Vector of dimension lengths.
length: Object of class "numeric" Length of array object.
attribute: Object of class "character" The SciDB array attribute in use by scidb array reference object (only one attribute may be referenced at a time).
attributes: Object of class "character" Vector of all available attributes for the SciDB array.
nullable: Object of class "logical" Is the attribute nullable (TRUE/FALSE)?
type: Object of class "character" SciDB type of the referenced attribute.
types: Object of class "character" Vector of SciDB types for all the array attributes.
gc: Object of class "environment" An environment used to link the SciDB array to the R garbage collector.
.S3Class: Object of class "character" ~~
Methods
%*% signature(x = "matrix", y = "scidb"): ...
%*% signature(x = "scidb", y = "matrix"): ...
%*% signature(x = "scidb", y = "scidb"): ...
crossprod signature(x = "matrix", y = "scidb"): ...
crossprod signature(x = "scidb", y = "matrix"): ...
crossprod signature(x = "scidb", y = "scidb"): ...
tcrossprod signature(x = "matrix", y = "scidb"): ...
tcrossprod signature(x = "scidb", y = "matrix"): ...
tcrossprod signature(x = "scidb", y = "scidb"): ...
is.scidb signature(x = "scidb"): ...
is.scidb signature(x = "ANY"): ...
print signature(x = "scidb"): ...
head signature(x = "scidb"): ...
filter signature(x = "scidb", y = "character"): ...
Display a heatmap-like image of the 2-d scidb array reference object x. grid(m,n) specifies the repartitioned array block sizes and op is a valid SciDB aggregation function applied to the repartitioned chunks.
show signature(object = "scidb"): ...
Notes
SciDB arrays are general n-dimensional sparse arrays with integer or non-integer dimensions. The scidb class represents SciDB arrays in a way that mimics standard R arrays in many ways.
Matrix arithmetic operations are overloaded for 2-D arrays.
host The host name or I.P. address of the SciDB database instance to connect to (character).
port The port number of the SciDB database simple HTTP service to connect to (integer).
Details
The SciDB connection state is maintained internally to the scidb package. We internalize state to facilitate operations involving scidb objects.
Thus, only one open SciDB connection is supported at a time.
One may connect to and use multiple SciDB databases by sequentially calling scidbconnect between operations. Note that scidb objects are not valid across different SciDB databases.
Value
NULL is invisibly returned. SciDB connection errors are propagated to R and may be handled with the usual R error handling mechanisms.
Note
Disconnection is automatically handled by the package.
Group the 1D SciDB array x by the indicated formula, applying the SciDB aggregate expression FUN, specified as a character string, to the groups. The SciDB aggregate expression must be named using the SciDB "as" keyword. This function will redimension the SciDB array and apply the aggregate. For example: A = as.scidb("iris"); g = aggregate(A, Petal_Length ~ Species, "avg(Petal_Length) as mean")
show signature(object = "scidbdf"): ...
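For readers more familiar with base R, the formula-based grouping used by the aggregate method resembles stats::aggregate applied to a local data frame. The sketch below is a local analogue only; it runs entirely in R, does not touch SciDB, and uses the local column name Petal.Length rather than the SciDB attribute name.

```r
# Local analogue: average petal length per species, computed in R.
g <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)

# Name the result column, much as the SciDB expression names it with "as mean".
names(g)[2] <- "mean"
g
```

The result has one row per group, which mirrors what grouping by Species would be expected to produce on the SciDB side.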
Notes
Like the related scidb class, the scidbdf class represents SciDB arrays as R objects. The scidbdf class presents 1-D SciDB arrays, potentially with many SciDB attributes (variables), as a data.frame-like object.
SciDB connections are automatically disconnected by the package in normal practice. The scidbdisconnect function forces the current connection to disconnect.
Examples
##---- Should be DIRECTLY executable !! ----
##-- ==> Define data, use random,
##--or do help(data=index) for the standard data sets.
## The function is currently defined as
function (e1, e2)
{
    x = basename(tempfile(pattern = "array"))
    scidbquery(paste("store(multiply(", e1@name, ",", e2@name,
        "),", x, ")", sep = ""))
    return(scidb(x))
}
scidbremove Remove an array.
Description
Remove (delete) an array from SciDB.
Usage
scidbremove(x, error = stop)
Arguments
x The name of the SciDB array to remove (character). This may also be a vector of names to remove.
error Error handling function.
Value
NULL is invisibly returned. SciDB errors are returned as normal R errors and may be handled accordingly.
Note
Supply a user-defined error handling function to avoid stopping on error (for example, when trying to delete a non-existing array). For example, use error=warning to convert errors to warnings but still report them.
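The error-handler pattern described in this note can be sketched without a database. Everything below is hypothetical: store is an R environment standing in for the SciDB catalogue, and remove_item is an invented function that only imitates the error= calling convention.

```r
# Hypothetical stand-in for a remote catalogue; no SciDB connection is used.
store <- new.env()
assign("A", 1, envir = store)

# Invented function imitating the error= convention: failures are handed to
# whichever handler the caller supplies (stop by default).
remove_item <- function(x, error = stop) {
  for (name in x) {
    if (exists(name, envir = store, inherits = FALSE)) {
      rm(list = name, envir = store)
    } else {
      error(paste("array", name, "does not exist"))
    }
  }
  invisible(NULL)
}

# With error = warning the problem is reported but processing continues,
# so "A" is still removed after the failure on "missing".
remove_item(c("missing", "A"), error = warning)
exists("A", envir = store, inherits = FALSE)   # FALSE

# With the default handler the same problem raises an ordinary R error.
res <- tryCatch(remove_item("missing"), error = function(e) "stopped")
res                                            # "stopped"
```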