DbCorbaClient / SamApi (or, what I did on my Thanksgiving vacation...) Lauri Loebel Carpenter 10 December 2002
DbCorbaClient / SamApi
• What dbServer does now
  – difficulties being addressed
  – hopeful outcomes
• New approach
  – what the new packages do
  – what the client code is enabled to do
• Specific use cases:
  – SamApiCore, SamApiClient
  – other potential uses...
• Still needs...
Take one SAM database...
Database Tables:

SQL> describe stations
 Name                       Null?    Type
 -------------------------- -------- --------------
 STATION_ID                 NOT NULL NUMBER(38)
 STATION_NAME               NOT NULL VARCHAR2(50)
 CREATE_DATE                NOT NULL DATE
 CREATE_USER                NOT NULL VARCHAR2(50)
 UPDATE_DATE                         DATE
 UPDATE_USER                         VARCHAR2(50)
 STATION_DESC                        VARCHAR2(4000)
 LIFE_CYCLE_STATUS_ID                NUMBER(38)
 STATION_MONITOR_LEVEL_ID            NUMBER(38)
add a pinch of oracleParser...

DictStations.py:

dict = {
    'aspects': [
        { 'cppname': '_lifeCycleStatusId',
          'dbname': 'LIFE_CYCLE_STATUS_ID',
          'dbtype': 'number',
          'desc': 0,
          'duplicate': 0,
          'id': 0,
          'idltype': 'long',
          'ooname': 'lifeCycleStatusId',
          'queryable': 1},
        { 'cppname': '_stationMonitorLevelId',
          'dbname': 'STATION_MONITOR_LEVEL_ID',
          'dbtype': 'number',
          'desc': 0,
          'duplicate': 0,
          'id': 0,
          'idltype': 'long',
          'ooname': 'stationMonitorLevelId',
          'queryable': 1},
        ......
and a dash of Dict_Modifier...
dictModifiers = {
    'AnalysisProjects' : {
        'dbname' : {
            'start_time'   : { 'dbtype' : 'datetime' },
            'project_desc' : { 'dbcase' : 'mixed' },
            'end_time'     : { 'dbtype' : 'datetime' }
        }
    },
    ...
    'Stations' : {
        'dbname' : {
            'station_desc' : { 'dbcase' : 'mixed' }
        }
    },
    ...
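One way to picture how a Dict_Modifier entry could be folded into the oracleParser output is sketched below. The helper name apply_modifiers, the 'varchar2' dbtype and the lower-case matching on dbname are assumptions made for illustration; the real dbgen code may merge the two dictionaries differently.

```python
import copy


def apply_modifiers(table_dict, modifiers):
    """Return a copy of table_dict with per-column overrides merged in.

    Hypothetical helper: 'modifiers' is the value stored under one table
    name in dictModifiers, e.g. dictModifiers['Stations'].
    """
    result = copy.deepcopy(table_dict)
    by_dbname = modifiers.get('dbname', {})
    for aspect in result['aspects']:
        # Assumption: modifier keys are lower-case, parser output is upper-case.
        overrides = by_dbname.get(aspect['dbname'].lower(), {})
        aspect.update(overrides)
    return result


# Trimmed-down aspects; the 'varchar2' dbtype is illustrative only.
stations = {'aspects': [
    {'dbname': 'STATION_DESC', 'dbtype': 'varchar2', 'ooname': 'stationDesc'},
    {'dbname': 'STATION_ID', 'dbtype': 'number', 'ooname': 'stationId'},
]}
dict_modifiers = {'Stations': {'dbname': {'station_desc': {'dbcase': 'mixed'}}}}

modified = apply_modifiers(stations, dict_modifiers['Stations'])
print(modified['aspects'][0])
```

Working on a deep copy keeps the parser output untouched, so the same Dict can be regenerated and re-modified without side effects.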
Combine thoroughly with dbgen:
Voila: DbStations.py:

class DbStations:
    def getOne(self, theColumnList=[], theWhereDict={},
               forUpdate=0, rollback=1):
        ...
    def get(self, theColumnList=[], theWhereDict={},
            minrows=0, maxrows=0, forUpdate=0, rollback=1,
            orderList=[], limit=0):
        ...
(..also methods to query and getAttrList).
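As a rough illustration of what a generated get/getOne has to do with theColumnList and theWhereDict, the sketch below turns object-style attribute names into a SELECT statement. The function build_select and its bind-variable style are hypothetical; the where-dictionary shapes (a plain value, or a dictionary with an 'oper' key) are taken from the client examples in this talk.

```python
def build_select(table, aspects, column_list=None, where_dict=None):
    """Build (sql, params) from object-style attribute names.

    Hypothetical sketch; the generated DbStations.get() may work differently.
    """
    name_map = dict((a['ooname'], a['dbname']) for a in aspects)
    # An empty column list means "all attributes", as in getOne([], ...).
    columns = column_list or [a['ooname'] for a in aspects]
    sql = "select %s from %s" % (
        ", ".join(name_map[c] for c in columns), table)
    clauses, params = [], []
    for attr, cond in sorted((where_dict or {}).items()):
        if isinstance(cond, dict):
            # {'oper': "like '%ana%'"} carries a raw operator expression.
            clauses.append("%s %s" % (name_map[attr], cond['oper']))
        else:
            # Plain values become equality tests with a numbered bind variable.
            clauses.append("%s = :%d" % (name_map[attr], len(params) + 1))
            params.append(cond)
    if clauses:
        sql += " where " + " and ".join(clauses)
    return sql, params


aspects = [
    {'ooname': 'stationId', 'dbname': 'STATION_ID'},
    {'ooname': 'stationName', 'dbname': 'STATION_NAME'},
]
like_sql, like_params = build_select(
    'stations', aspects, ['stationId', 'stationName'],
    {'stationName': {'oper': "like '%ana%'"}})
one_sql, one_params = build_select(
    'stations', aspects, [], {'stationName': 'central-analysis'})
print(like_sql)
print(one_sql, one_params)
```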
SAMDbServer Primitives:
[Diagram: DbStation.py (get, getOne, query), SAMDbServer, samUserIDL]
Problem: IDL Layer

// Struct to pass back requests
struct RequestInfo {
    unsigned long requestId;
    string requestStatus;
    string createDate;
    string createUser;
    unsigned long jobName;
};
typedef sequence<RequestInfo> RequestInfoList;

// Struct to pass back requests
struct FullRequestInfo {
    unsigned long requestId;
    string requestStatus;
    string workGroupName;
    string userName;
    unsigned long numberOfEvents;
    string comments;
    unsigned long priority;
    string email;
    string jobName;
};
typedef sequence<FullRequestInfo> FullRequestInfoList;

// Get 1 or more requests by workgroup and username
RequestInfoList getMCRequest (in string workGroup, in string userName)
    raises (SAM::Exception);
Very specific methods/structures
• Slow turnaround time for developing queries
  – new dbServer method,
  – new IDL interface,
  – new SAMCommon
• d0ora1 is making *logic* decisions about what to do with the data
• fnorb is not good at marshalling/unmarshalling complicated structures (very slow)
Need a way to pass arbitrary structures:
User: whatever = server.get(‘something’)
IDL Happens HERE
Server: whatever = internals.return(‘something’) or raise NoCanDo
Without having to know the exact structure and content of WHAT you know and what you WANT to know when you define the IDL.
Hence: DbCorbaClient layer
• Very simple interface for client code to do arbitrary get, getOne, query.
• Request is handled by the DbServer
  – no oracleClient required on client end
• IDL interface remains CONSTANT, does not need to be changed!
• All done via an IDL Dictionary (or DictionaryList) structure of CorbaAny values.
At the heart: corbaDictionary
• corbaDictionary( {pyDict} ) converts to IDL list of SAM::AttrValue pairs
• corbaDictionary( IDLList ) converts to {pyDict}
• Handles the python “None” value internally – DbCorbaClient will return None values!
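The round trip can be pictured without any CORBA machinery. In the sketch below a (tag, payload) tuple stands in for a CorbaAny and a list of (name, packed value) pairs stands in for the list of SAM::AttrValue pairs. The tag names and the pack/unpack functions are assumptions for illustration, not the corbaDictionary implementation.

```python
# Hypothetical tags; the real layer relies on CORBA Any type codes instead.
NONE_TAG, VALUE_TAG, DICT_TAG, LIST_TAG = 'none', 'value', 'dict', 'list'


def pack(value):
    """Wrap a Python value in a (tag, payload) pair standing in for an Any."""
    if value is None:
        # None has no natural IDL counterpart, so it travels as a marker.
        return (NONE_TAG, '')
    if isinstance(value, dict):
        return (DICT_TAG, [(k, pack(v)) for k, v in sorted(value.items())])
    if isinstance(value, (list, tuple)):
        return (LIST_TAG, [pack(v) for v in value])
    return (VALUE_TAG, value)


def unpack(packed):
    """Rebuild the Python value from a (tag, payload) pair."""
    tag, payload = packed
    if tag == NONE_TAG:
        return None
    if tag == DICT_TAG:
        return dict((k, unpack(v)) for k, v in payload)
    if tag == LIST_TAG:
        return [unpack(v) for v in payload]
    return payload


original = {
    'stationName': 'central-analysis',
    'updateUser': None,
    'children': [1, 2, {'nested': None}],
}
wire_form = pack(original)
restored = unpack(wire_form)
print(restored == original)
```

Because every level is packed the same way, arbitrarily nested dictionaries and lists pass through one fixed structure, which is what lets the interface stay constant. Tuples come back as lists in this sketch.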
Example:

Code:

#!/usr/bin/env python
import DbCorbaClient

s = DbCorbaClient.DbCorbaClient('DbStations')  # maps directly to DbStations.py
print(s.getAttrList())                         # maps directly to Dict aspects
print(s.get(['stationId', 'stationName'],
            {'stationName': {'oper': "like '%ana%'"}}))
print(s.getOne([], {'stationName': 'central-analysis'}))
print(s.query('select station_name from stations where' +
              ' station_id in (1,2,3,4,5)'))
Output:

['lifeCycleStatusId', 'stationMonitorLevelId', 'stationId', 'stationName',
 'createDate', 'createUser', 'updateDate', 'updateUser', 'stationDesc']
[[9, 'big_smp_analysis_server'], [1163, 'central-analysis'],
 [1353, 'linux-analysis-cluster-1'], [2003, 'ccin2p3-analysis']]
[1, 2, 1163, 'central-analysis', '10/19/1999', 'wellner', '10/29/2002',
 'lauri', 'D0MINO']
[['station1'], ['im'], ['station3'], ['station4'], ['station5']]
Return format: PY_DICT_FORMAT
• Default return format mimics the lists returned by get/getOne/query within the dbServer
• Optional PY_DICT_FORMAT can be used to return data as dictionaries instead of lists.
PY_DICT_FORMAT example:

Code:

#!/usr/bin/env python
import DbCorbaClient

s = DbCorbaClient.DbCorbaClient('DbStations',
                                returnFormat=DbCorbaClient.PY_DICT_FORMAT)
print(s.getAttrList())
print(s.get(['stationId', 'stationName'],
            {'stationName': {'oper': "like '%ana%'"}}))
print(s.getOne([], {'stationName': 'central-analysis'}))
Output:

['lifeCycleStatusId', 'stationMonitorLevelId', 'stationId', 'stationName',
 'createDate', 'createUser', 'updateDate', 'updateUser', 'stationDesc']
[{'stationId': 9, 'stationName': 'big_smp_analysis_server'},
 {'stationId': 1163, 'stationName': 'central-analysis'},
 {'stationId': 1353, 'stationName': 'linux-analysis-cluster-1'},
 {'stationId': 2003, 'stationName': 'ccin2p3-analysis'}]
{'createDate': '10/19/1999', 'stationMonitorLevelId': 2,
 'lifeCycleStatusId': 1, 'createUser': 'wellner',
 'stationName': 'central-analysis', 'updateDate': '10/29/2002',
 'stationDesc': 'D0MINO', 'stationId': 1163, 'updateUser': 'lauri'}
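The relationship between the default list format and PY_DICT_FORMAT amounts to pairing each row with the requested column names. The helper rows_to_dicts below is a hypothetical sketch of that pairing, using rows from the output above; it is not the DbCorbaClient code.

```python
def rows_to_dicts(column_list, rows):
    """Pair each row (a list of values) with the column names requested."""
    return [dict(zip(column_list, row)) for row in rows]


columns = ['stationId', 'stationName']
list_format = [[9, 'big_smp_analysis_server'], [1163, 'central-analysis']]

dict_format = rows_to_dicts(columns, list_format)
print(dict_format)
```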
This is the REAL POWER:
• We know how to pass arbitrarily complex python Dictionary Structures through CORBA.
• Using this, clients can now generate their own queries on the SAM data without requiring a new dbServer method for each and every one.
• corbaDictionary (from sam_common) encapsulates all of the complication of packing/unpacking the data to/from IDL
Bottom Line: Two important breakthroughs
• corbaDictionary
  – pass arbitrarily nested run-time dictionaries through a “constant” interface
• DbCorbaClient
  – client can call get, getOne, query without requiring an oracleClient license
First Application: SamApiCore
• Intelligent client wrapper around DbCorbaClient
• Each object represents ONE ROW of a dbTable (e.g., a getOne operation)
• adds our knowledge of “nameOrId” initialization
SamApiCore:
Direct mapping between Dict/DictModifier and object methods:
station = SamApiCore.SamApiCoreStation(nameOrId)
person = SamApiCore.SamApiCorePerson(usernameOrId)
...

station.stationName()
station.stationId()
...
person.firstName()
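A minimal sketch of the idea follows: one object per row, initialized from a name or an id, with one accessor method per attribute. FakeStationsTable stands in for DbCorbaClient('DbStations'), and CoreStation and its int-versus-string test are hypothetical; SamApiCore itself may be organized quite differently.

```python
class FakeStationsTable(object):
    """In-memory stand-in for DbCorbaClient('DbStations') (assumption)."""
    attrs = ['stationId', 'stationName', 'stationDesc']
    rows = [[1163, 'central-analysis', 'D0MINO']]

    def getAttrList(self):
        return list(self.attrs)

    def getOne(self, column_list, where_dict):
        # The fake ignores column_list and always returns the full row.
        for row in self.rows:
            record = dict(zip(self.attrs, row))
            if all(record[k] == v for k, v in where_dict.items()):
                return row
        raise LookupError("no row matches %r" % (where_dict,))


class CoreStation(object):
    """One row of the stations table, looked up by name or by id."""

    def __init__(self, name_or_id, table=None):
        table = table or FakeStationsTable()
        # Assumption: integers are ids, anything else is a name.
        key = 'stationId' if isinstance(name_or_id, int) else 'stationName'
        row = table.getOne([], {key: name_or_id})
        self._values = dict(zip(table.getAttrList(), row))

    def __getattr__(self, name):
        # Expose every attribute as a zero-argument method,
        # e.g. station.stationName().
        values = self.__dict__.get('_values', {})
        if name in values:
            return lambda: values[name]
        raise AttributeError(name)


by_name = CoreStation('central-analysis')
by_id = CoreStation(1163)
print(by_name.stationId(), by_id.stationName(), by_id.stationDesc())
```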
Example:
Code:

#!/usr/bin/env python
import SamApiCore

s = SamApiCore.SamApiCoreStation('central-analysis')

print("dbAttrList = %s" % s.getDbAttributeList())
print("stationName = %s" % s.stationName())
print("stationId = %s" % s.stationId())
print("stationDesc = %s" % s.stationDesc())
Output:

dbAttrList = ['lifeCycleStatusId', 'stationMonitorLevelId', 'stationId',
 'stationName', 'createDate', 'createUser', 'updateDate', 'updateUser',
 'stationDesc']
stationName = central-analysis
stationId = 1163
stationDesc = D0MINO
Next Layer: SamApiClient
• Basic sorts of “container” objects for one “SAM” object
  – Consumer: has consumerId, contains SamApiCore objects for station, project, application, etc.
  – DataFile: has fileId, methods to get fileLineage, consumer who produced the file, etc.
• “get” methods implemented internally using “_getAndSetForFutureReference”
  – results are cached in the object
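The caching behaviour can be sketched as follows. Only the method name _getAndSetForFutureReference comes from the slides; its signature here, the CachingObject base class and the fake_fetch_user lookup are assumptions for illustration.

```python
class CachingObject(object):
    """Base class that remembers the result of each lookup."""

    def __init__(self):
        self._cache = {}

    def _getAndSetForFutureReference(self, key, fetch):
        # Hypothetical signature: 'fetch' is called only on the first request.
        if key not in self._cache:
            self._cache[key] = fetch()
        return self._cache[key]


class Consumer(CachingObject):
    def __init__(self, consumer_id, fetch_user):
        CachingObject.__init__(self)
        self._consumer_id = consumer_id
        self._fetch_user = fetch_user

    def consumerId(self):
        return self._consumer_id

    def getUser(self):
        return self._getAndSetForFutureReference(
            'user', lambda: self._fetch_user(self._consumer_id))


calls = []


def fake_fetch_user(consumer_id):
    """Stand-in for a round trip to the dbServer; records each call."""
    calls.append(consumer_id)
    return 'jozwiak'


c = Consumer(40916, fake_fetch_user)
first = c.getUser()
second = c.getUser()
print(first, second, "server calls:", len(calls))
```

Each attribute costs at most one request to the server, no matter how many times the client asks for it.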
Example Client code:
#!/usr/bin/env python
import SamApiClient

c = SamApiClient.SamApiClientConsumer(40916)
print("Consumer id %s" % c.consumerId())
print("\tuser = %s" % c.getUser())
print("\tstation = %s" % c.getStation())
print("\tapplicationName = %s, applicationVersion = %s, applicationFamily = %s"
      % (c.getApplicationName(), c.getApplicationVersion(),
         c.getApplicationFamily()))
print("\twork group name = %s" % c.getWorkGroupName())
print("\tprojDefId = %s, projDefName = %s" % (c.getProjDefId(),
                                               c.getProjDefName()))

p = SamApiClient.SamApiClientPerson(c.getUser())
wgList = p.getRegisteredWorkingGroups()
print("%s is registered to use groups: %s" % (p.userName(), wgList))

snapshotFileIdList = c.getSnapshotFileIdList()
deliveredFileIdList = c.getDeliveredFileIdList()
consumedFileIdList = c.getConsumedFileIdList()
print("Snapshot contained %s files" % len(snapshotFileIdList))
print("Station delivered %s files" % len(deliveredFileIdList))
print("Consumer consumed %s files" % len(consumedFileIdList))
Output:

Consumer id 40916
    user = jozwiak
    station = chris
    applicationName = test-harness, applicationVersion = 1, applicationFamily = test-harness
    work group name = test
    projDefId = 62452, projDefName = test-harness__06-03-02-08-05-54
jozwiak is registered to use groups: ['mcc99', 'test', 'demo', 'algo',
 'trigsim', 'muon', 'calalign', 'emid', 'muid', 'tauid', 'test5', 'test1',
 'test2', 'test3', 'test4', 'online', 'd0production', 'dzero']
Snapshot contained 104 files
Station delivered 33 files
Consumer consumed 33 files
Another example: projDefTreeWalker.py
• Input: project definition name or id
• Output: all consumers that ever used this project definition, and anything you want to know about each consumer
  – which files were delivered/consumed
  – which application was used
  – what user/group/station/etc....
• ...in less than 70 lines of client code...
Samples to look at:
projDefTreeWalker:
source: http://d0db-dev.fnal.gov/sam_api/projDefTreeWalker.py
output: http://d0db-dev.fnal.gov/sam_api/projDefTreeWalker.output.txt
SamApiCore:
source: http://d0db-dev.fnal.gov/sam_api/coreTest.py
output: http://d0db-dev.fnal.gov/sam_api/coreTest.output.txt
SamApiClient:
source: http://d0db-dev.fnal.gov/sam_api/clientTest.py
output: http://d0db-dev.fnal.gov/sam_api/clientTest.output.txt
Other potential use cases:
• Metadata client?
  – passing of arbitrarily nested dictionaries?
  – sounds ideal!
• Station monitoring utilities?
  – get(‘what’) instead of dump(‘all’)
• output formatted by the client application, not by the station
• ???...
Current Status:
• DbCorbaClient layer: complete
  – but testing by others may uncover problems in the corbaDictionary layer (lists of lists of dictionaries, etc.?)
• SamApiCore: framework is complete, implementation is not.
  – I only implemented the tables I understand and use.
• SamApiClient: needs input from customers about how they wish to view the higher-level objects.
What else is needed?
• C++ implementation of corbaDictionary including the “None” value
  ... and then begin using this in the station code
• SamApiCore implementation for all DbTable.py files
  – enhancement to dbgen: auto-generate SamApiCore classes
  – a good shifter project?
... what else is needed?
• Should probably be converted to a directory structure a la python modules, rather than one big imported file
• tool to generate “sensible” documentation about the SamApiCore/SamApiClient classes
  – pydoc doesn’t quite make it...
• testing by folks other than myself
A big question:
• Are lots of little queries better than fewer incredibly big queries?
  – I hope so...
  – We may need to turn down the debugging in the log files...
Further Reading:
Pydoc-generated documentation, SamApi:
  http://d0db-dev.fnal.gov/sam_api/SamApiCore.html
  http://d0db-dev.fnal.gov/sam_api/SamApiClient.html
corbaDictionary:
  sam_common/src/python/CommonCorbaClasses.py
DbCorbaClient:
  sam_common/src/python/DbCorbaClient.py
  sam_db_server/src/DbCorbaImpl.py, InDbCorba.py
SamApi:
  sam_api/src/python/SamApiCore.py, SamApiClient.py
Open questions…
• Should it be called dbServerClient? dbClient?
  – Implement dbStationClient separately in C++
  – Note: dbClientImpl and dbCorbaClient both do their own unpacking of corbaDictionaries
• Need easier exception passing… or samExceptions need to start inheriting from Exception…
• Some of the container objects create too many contained objects without needing to
  – shouldn’t look up the children methods until a child is requested
  – Usually as a result of grandchildren being created…
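For the last point, one possible approach is to create each contained object only when it is first requested, so that grandchildren are never built unless someone walks down to them. The LazyChild descriptor and the Consumer/Project/Station classes below are hypothetical and only sketch the idea; they are not the SamApiClient classes.

```python
class LazyChild(object):
    """Descriptor that builds a contained object the first time it is read."""

    def __init__(self, factory):
        self._factory = factory
        self._name = '_lazy_' + factory.__name__

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self._name not in instance.__dict__:
            # First request: build the child and remember it on the parent.
            instance.__dict__[self._name] = self._factory(instance)
        return instance.__dict__[self._name]


created = []  # records which objects were actually constructed


class Station(object):
    def __init__(self, name):
        created.append('station')
        self.name = name


class Project(object):
    def __init__(self, name):
        created.append('project')
        self.name = name

    def _make_station(self):
        return Station('central-analysis')
    station = LazyChild(_make_station)


class Consumer(object):
    def __init__(self, consumer_id):
        created.append('consumer')
        self.consumer_id = consumer_id

    def _make_project(self):
        return Project('test-harness')
    project = LazyChild(_make_project)


c = Consumer(40916)
after_init = list(created)           # only the consumer exists so far
project_name = c.project.name        # the child is created here
after_child = list(created)
station_name = c.project.station.name  # the grandchild is created here
after_grandchild = list(created)
print(after_init, after_child, after_grandchild)
```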