CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/ Messaging Systems for the Grid Daniel Rodrigues
Apr 01, 2015
Messaging Systems for the Grid
Daniel Rodrigues
CERN IT Department, Internet Services, CH-1211 Genève 23, Switzerland, www.cern.ch/it
Summary
• Messaging Systems Overview
• Monitoring context in the Grid
• The MSG – Messaging System for Grids
• Fast Forward
Messaging Systems
• Before going any further, the philosophy: “Software development trend is to somehow mimic real world!” – Daniel Rodrigues
  – Procedural Programming: bureaucracy
  – Object Oriented: world entities and interaction
  – Aspects: cut through the mess!
  – Agents: real people
  – Messaging Systems: communication
    • It might be sound, image, snail mail, etc.
Messaging Systems
• Why use messaging?
  – For communicating we could use:
    • File transfer
    • Shared Databases
    • Remote Procedure Invocation
    • Web Services
    • Mail
    • CORBA
  – They do exist;
  – They have common ideas;
  – They share implementations;
  – You might be using more than one to achieve a result that suits your needs!
“Now look, you know different people think about life in different ways. Lawyers think life is a big court room; Doctors probably thinks life is like a big operation; Bus drivers think life is...er...a big bus I guess. Who knows what the hell those guys think. Anyway, I've always thought of life as a big football game...”
Black Grape, England’s Irie
Messaging Systems
• Why use messaging?
  – Key ideas and benefits:
    • Loosely coupled distributed communication;
    • Exceptional interoperability;
    • Asynchronous;
    • Reliable;
    • Configurable persistence (just like your tax collector)
  – Drawbacks:
    • More complex programming model (we do like bureaucracy after all)
    • Harder to implement sequenced and synchronous models
    • Performance? (maybe FTP could do the trick)
Messaging Systems
• Ok, may we finally see a picture?
[Figure: diagram with four Publisher nodes and one Consumer node]
Messaging Systems
• That’s all?
• Enterprise Integration Patterns
  – Designing, Building and Deploying Messaging Solutions
  – Gregor Hohpe / Bobby Woolf
• Core Patterns
• Some not so wild Patterns
Messaging Systems
• Patterns: Message
  – Header
    • Routing information
    • Description
  – Body
    • Data
    • Ignored by the messaging system
  – EventMessage, CommandMessage, DocumentMessage, RequestReply
  – Could be SOAP, JMS, STOMP, etc.
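The header/body split can be sketched in a few lines of Python. This is only an illustration of the pattern, not the format of any particular product: the `Message` class, the `route_key` helper and the `type` header are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """A message as the messaging system sees it: routable header, opaque body."""
    headers: dict = field(default_factory=dict)  # routing information and description
    body: str = ""                               # application data, ignored by the broker


def route_key(message):
    # The messaging system only inspects the header, never the body.
    return message.headers["destination"]


event = Message(
    headers={"destination": "/topic/grid.usage.transfer", "type": "EventMessage"},
    body="transferProtocol: GridFTP",
)
```

Calling `route_key(event)` returns the destination without touching the body, which is the point of the pattern: the body can change format freely as long as the header stays routable.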
Messaging Systems
• Patterns: Message Channel
  – Point-to-Point
    • Snail Mail
    • Queues
  – Publish-Subscribe
    • Television/radio Broadcast
    • Topics
  – DataTypeChannel, InvalidMessageChannel, DeadLetterChannel, ChannelAdapter, MessageBus, MessagingBridge
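The difference between the two channel types is easiest to see side by side. The following is a minimal in-memory sketch, assuming no broker at all; the class names are hypothetical and exist only to show the delivery semantics.

```python
from collections import deque


class PointToPointChannel:
    """Queue semantics: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whoever calls receive() first takes the message; nobody else sees it.
        return self._messages.popleft() if self._messages else None


class PublishSubscribeChannel:
    """Topic semantics: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


queue = PointToPointChannel()
queue.send("letter")
first, second = queue.receive(), queue.receive()

topic = PublishSubscribeChannel()
inbox_a, inbox_b = [], []
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("broadcast")
```

With the queue, the second `receive()` finds nothing left; with the topic, both inboxes hold the broadcast.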
Messaging Systems
• Patterns: Message Endpoint
  – Publisher
    • Gets data from the application and creates a message.
  – Consumer
    • Extracts data from a message and passes it on to the application.
  – SelectiveConsumer, CompetingConsumer, DurableSubscriber, MessageDispatcher, TransactionalClients, EventDrivenConsumer.
  – Endpoints either send or receive messages, and are channel specific (ears, mouth and eyes are not the same thing).
Messaging Systems
• Other Patterns
  – Message Routers
    • A message may be routed to different channels depending on its characteristics;
    • Simple example: use a wildcard topic! grid.usage.transfer.*, where it will be forwarded to grid.usage.transfer.<INFRASTRUCTURE>
  – Message Translators
    • Translation at different layers (data structure, types, representation, or transport).
    • e.g. transport protocols: TCP => HTTP => SOAP => JMS
  – Pipes and Filters
    • A message may need processing in different steps.
    • A message goes through pipes and filters that perform different functions (e.g., authN, authZ)
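The wildcard routing example can be sketched as follows. This assumes that `*` stands for exactly one dot-separated name, which is how ActiveMQ-style wildcards are usually described; the subscriber names and the infrastructure names `osg` and `egee` are hypothetical values used only for illustration.

```python
def topic_matches(pattern, topic):
    """Return True if `topic` matches `pattern`, where '*' stands for one name."""
    pattern_parts = pattern.split(".")
    topic_parts = topic.split(".")
    if len(pattern_parts) != len(topic_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, topic_parts))


def route(topic, subscriptions):
    """Return the names of the subscribers whose pattern matches `topic`."""
    return [name for name, pattern in subscriptions.items()
            if topic_matches(pattern, topic)]


subscriptions = {
    "all_transfers": "grid.usage.transfer.*",   # wildcard subscription
    "egee_only": "grid.usage.transfer.egee",    # one specific infrastructure
}
```

A message published on `grid.usage.transfer.osg` reaches only the wildcard subscriber, while one on `grid.usage.transfer.egee` reaches both.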
Messaging Systems
• Isn’t it too complex to implement all this?
  – Indeed.
• But someone has already done most of the work for you:
  – Commercial solutions:
    • Tibco Rendezvous, IBM WebSphere MQ, SUN Java Message Service, Microsoft MSMQ, BEA MessageQ, SonicMQ, 29West UME/LBM.
  – Open source providers:
    • Apache ActiveMQ, ObjectWeb JORAM, OpenJMS.
• Each is suited to different problems.
  – Integration on different platforms;
  – Latency concerns;
  – High throughput.
Messaging Systems
• Where is it used?
  – Financial Services
    • exchanges, brokerages, hedge funds;
  – Insurance Companies
  – Banking Industry
  – Telecoms
• Usually embedded in integrated solutions
  – Enterprise Backbones;
• WebSphere MQ example (March 2007):
  – 10,000 customers
  – 10 billion messages carrying US$1 quadrillion (US$ 1 000 000 000 000 000) worth of business transactions.
Summary
• Messaging Systems Overview
• Monitoring context in the Grid
• The MSG – Messaging System for Grids
• Fast Forward
Monitoring Context
• How does Message Oriented Middleware fit into the WLCG monitoring context?
• The Grid is a complex infrastructure, with many different services deployed in different environments.
• We need to monitor the services in order to:
  – Know when an action to repair is necessary;
  – Help improve the overall reliability;
  – Provide stakeholders with current and historical status information.
• A vast amount of monitoring data is produced
  – Local fabric monitoring (e.g., Nagios, LEMON)
  – Remote monitoring (e.g., SAM)
Monitoring Context
• Who is involved (stakeholders)?
  – Site Administrators
  – Grid Operators
    • CIC on Duty
    • Regional Operations Centre
  – WLCG Project management
  – Virtual Organizations
    • WLCG Experiments
  – Monitoring developers + operators
Monitoring Context
• High Level Model:
[Figure: high-level model diagram. Labels: LEMON, Nagios, SAM, R-GMA, SAME, GridView, Experiment Dashboard, GridIce, HTTP, LDAP, GOCDB, Dashboard]
Monitoring Context
• WLCG Monitoring Working Group:
  – Initially focused on stakeholder requirements
    • Distill into a set of architectural principles
    • Propose some new technologies to help
  – Reuse of standard commodity components
  – Used to design a site-local monitoring prototype
  – An attempt to extend this to a more global view
    • Knowing that the operations model is changing from central to regional/national/local
• Looking at the architectural principles…
Monitoring Context
• Reduce time to respond
  – “Site administrators are closest to the problems, and need to know about them first”
• Tell others what you want to know
  – “If you’re monitoring a site remotely, it’s only polite to give the data to the site” (Chris Brew)
  – Remote systems should feed back information to sites
• Don’t impose systems on sites
  – Cannot dictate a monitoring system
Monitoring Context
• No monolithic systems
  – Different systems should specialize in their areas of expertise
• No central bottlenecks
  – “Local problems detected locally shouldn’t require remote services to work out what the problem is”
• Specific visualization for each stakeholder
  – All are using the same underlying data
Monitoring Context
• The starting point is what we have now:
  – Availability testing framework – SAM/RSV
  – Job and Data reliability monitoring – Gridview
  – Grid topology – GOCDB/Registration DB
  – Dynamic view of the grid – BDII/CeMon
  – Accounting – APEL/Gratia
  – Experiment views – Dashboards
  – Fabric monitoring – Nagios, LEMON, …
  – Grid operations tools – CIC Portal
• They work together right now
  – To a certain extent!
Monitoring Context
• We need:
  – Loose coupling of systems
  – Distributed components
  – Reliable delivery of messages
  – Standard methods of communication
  – Flexibility to add new producers and consumers of the information without having to reconfigure everything
• Message Oriented Middleware provides this
  – And is widely used in similar scenarios
Monitoring Context
• Reliability and persistence of messaging are built into the broker network.
  – Mitigates the single points of failure we’ve had with previous solutions
Monitoring Context
• Not a silver bullet
  – Still can end up with spaghetti
• Tight specification of interaction of components
  – Message format specifications
  – Standard metadata schema
  – Message Queue naming schemas
  – Protocols
• System management is key
  – You’ve got code for free from the messaging system
  – But you need to write your management layer
    • Component co-ordination
    • Configuration
    • Message tracing
    • Debugging
Monitoring Context
• Conclusion
  – The monitoring context is highly distributed;
  – Many components could benefit from gathering common information in a reliable, flexible way;
  – MOM is a way of leveraging the current underlying infrastructures.
Monitoring Context
• A real life working example:
Monitoring Context
[Figure: worked example diagram. Labels: Application, Messaging System Adaptor, Transparent Broker Network, Database archiver component, Standard process, Standard components]
Summary
• Messaging Systems Overview
• Monitoring context in the Grid
• The MSG – Messaging System for Grids
• Fast Forward
MSG Overview
• An infrastructure providing an easy way to send messages;
  – Each message has a well defined format adhering to a message class specification
• Well defined set of message classes
• Three main components:
  – Apache ActiveMQ broker;
  – msg-publish-simple;
  – msg-consume2oracle;
• Using file-based SAN persistence;
• Publish-Subscribe Channels (Topics)
• Durable Subscribers
MSG: Message
• Message endpoints on a topic should:
  – Consumers: expect a well formatted message
  – Producers: send a properly formatted message
• Message Classes:
  – Each one has a corresponding specification
  – One message may contain multiple records
  – Each record consists of plain text key-value pairs, terminated by “EOT”
  – A few fields are mandatory: consumers are expecting them!
  – Some fields may be sent as a header (for later filtering using selectors)
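Checking a record against its message class could look like the sketch below. The field names come from the transfer example in these slides, but the specification itself is hypothetical: which fields are mandatory and which are optional is an assumption made for illustration, as are the names `TRANSFER_CLASS` and `validate_record`.

```python
# Hypothetical message class specification; the mandatory/optional split
# is an assumption, not the real MSG definition.
TRANSFER_CLASS = {
    "mandatory": {"transferProtocol", "publishingHost", "srcHost", "destHost"},
    "optional": {"voName", "numberBytes", "startTime", "endTime", "userName"},
}


def validate_record(record, message_class):
    """Return a list of problems; an empty list means the record is well formed."""
    problems = []
    for name in sorted(message_class["mandatory"] - set(record)):
        problems.append("missing mandatory field: " + name)
    known = message_class["mandatory"] | message_class["optional"]
    for name in sorted(set(record) - known):
        problems.append("unknown field: " + name)
    return problems
```

A producer would run such a check before sending, so that consumers can rely on the mandatory fields being present.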
MSG: Message
• Example:

destination: /topic/grid.usage.transfer
persistent:true
transferProtocol: GridFTP
msgEncodedTime: 2008-05-21T22:29:57,712Z

transferProtocol: GridFTP
publishingHost: lxfsrc5807.cern.ch
voName: cms
srcHost: lxfsrc5807.cern.ch
destHost: c2fs008.grid.sinica.edu.tw
gridftpStreams: 10
numberBytes: 2684354560
fileName: //castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_3e6
startTime: 20-05-2008T13:17:07.514952Z
endTime: 20-05-2008T13:33:58.156241Z
userName: cms001
EOT
transferProtocol: GridFTP
publishingHost: lxfsrc5807.cern.ch
voName: cms
srcHost: lxfsrc5807.cern.ch
destHost: diskserv-san-20.cr.cnaf.infn.it
gridftpStreams: 3
numberBytes: 2684354560
fileName: //castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_F1
startTime: 20-05-2008T13:17:46.811483Z
endTime: 20-05-2008T13:34:21.227585Z
userName: cms001
EOT
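A message body in this layout (plain-text key-value pairs, each record terminated by EOT) can be split into records with a few lines of Python. This is a sketch of one way a consumer might do it, not the code used by the MSG tools; `parse_records` is a hypothetical name and the sample body is a shortened version of the transfer example.

```python
def parse_records(body):
    """Split a message body into records (dicts), one per EOT-terminated block."""
    records, current = [], {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "EOT":
            records.append(current)
            current = {}
            continue
        # Split on the first colon only: values such as timestamps contain colons.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return records


body = """transferProtocol: GridFTP
voName: cms
startTime: 20-05-2008T13:17:07.514952Z
EOT
transferProtocol: GridFTP
voName: cms
gridftpStreams: 3
EOT
"""
records = parse_records(body)
```

Splitting on the first colon only matters here, because timestamps and file names carry colons and slashes of their own.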
MSG: Apache ActiveMQ
• Powerful open source message broker
  – Currently running v4.1 & v5.1
• Message Channels
  – Publish-Subscribe;
  – Point-to-Point;
  – VirtualDestinations, Wildcards, CompositeDestinations;
  – Synchronous / Asynchronous sending.
• Wide range of supported protocols and clients
  – OpenWire for high performance clients;
  – STOMP (Simple Text Oriented Messaging Protocol);
  – REST, XMPP, AMQP;
MSG: Apache ActiveMQ
• Configurable persistence
  – JDBC + high performance journal
  – File based MessageStore (since 5.0)
• Clustering
  – Master/Slave failover
    • Provides High Availability
  – Network of Brokers
    • Avoids the single point of failure of client/server or hub/spoke topologies
    • Store and forward with consumer priority
    • Increasing scalability
• Consumers and Producers load balancing
• Selectors
• Discovery
MSG: msg-publish-simple
• Sends messages into the Message Channel
  – Validates that messages are well formatted against the message class;
  – Reassembles records according to selected headers;
• Very lightweight script
  – Depends only on Python > 2.3
  – Uses Python asyncore
• Designed to run anywhere (e.g. on WNs)
  – Can use many broker endpoints (will select one which is available)
  – Can use either STOMP or plain HTTP
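The two ideas on this slide, sending over STOMP and picking whichever broker endpoint is available, can be sketched as below. This is not the msg-publish-simple code: it uses a plain blocking socket rather than asyncore, the function names are hypothetical, and a real session would also have to send a CONNECT frame and wait for CONNECTED before the SEND frame.

```python
import socket


def build_send_frame(destination, body, persistent=True):
    """Build a STOMP SEND frame: command, headers, blank line, body, NUL byte."""
    headers = ["destination:" + destination]
    if persistent:
        headers.append("persistent:true")
    frame = "SEND\n" + "\n".join(headers) + "\n\n" + body + "\x00"
    return frame.encode("utf-8")


def first_available(endpoints, timeout=2.0, connect=socket.create_connection):
    """Return an open connection to the first broker endpoint that accepts one."""
    for host, port in endpoints:
        try:
            return connect((host, port), timeout)
        except OSError:
            continue  # this broker is unreachable, try the next one
    raise RuntimeError("no broker endpoint is available")
```

The `connect` parameter exists only so the selection logic can be exercised without a network; by default it opens a real TCP connection.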
MSG: msg-consume2oracle
• Consumes messages
  – Creates a durable subscription;
  – Can read different message classes on different topics (one durable subscription per topic!)
• Publishes into Oracle.
  – Extracts records from incoming messages;
  – Inserts records into an Oracle View, corresponding to the message class definition.
  – Only need to worry about the trigger!
• Configurable system management
  – Publishes back client status information
    • Messages received in a topic;
    • Records inserted for a given message class;
• Very lightweight script
  – Depends only on Python > 2.3
  – and also on cx_Oracle
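Turning one extracted record into an insert on a view might look like the sketch below. It only builds the statement and the bind dictionary, so it runs without a database; the view name `transfer_view` and the helper `build_insert` are hypothetical, and column names are assumed to come from a trusted message class definition since they are placed directly in the SQL text.

```python
def build_insert(view_name, record):
    """Return (sql, binds) inserting one record using named bind variables."""
    columns = sorted(record)
    placeholders = ", ".join(":" + name for name in columns)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        view_name, ", ".join(columns), placeholders)
    return sql, {name: record[name] for name in columns}


sql, binds = build_insert(
    "transfer_view",  # hypothetical view name
    {"voName": "cms", "srcHost": "lxfsrc5807.cern.ch"},
)
# With an open cx_Oracle cursor this would be run as: cursor.execute(sql, binds)
```

Values travel as bind variables rather than being pasted into the statement, which keeps quoting problems out of the consumer.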
MSG: performance
• Extensive testing of the broker's many features under different configurations
• Test results are available on the twiki; here are some:
• Broker ran for 6 weeks with no crashes
  – 50 million messages of several sizes (0 to 10 kB) forwarded to consumers;
  – 12 million incoming messages from producers;
  – Up to 40 producers/80 consumers;
  – Stable under irregular testing pattern;
• Setting persistence limits throughput.
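As a rough order-of-magnitude check, the totals above can be turned into average rates. The calculation assumes the messages were spread evenly over the full 6 weeks, which the irregular testing pattern makes unlikely, so peak rates are not represented.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def average_rate(messages, weeks):
    """Average messages per second over a test period given in weeks."""
    return messages / (weeks * SECONDS_PER_WEEK)


forwarded = average_rate(50_000_000, 6)   # messages forwarded to consumers
incoming = average_rate(12_000_000, 6)    # messages received from producers
```

This gives roughly 14 messages per second forwarded and 3 per second incoming, on average.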
MSG: performance
• Throughput testing
MSG: performance
• Testing persistency
MSG: performance
• Testing persistency
MSG: performance
• Testing clustering
  – Fast internal OpenWire!
MSG: results
• Flagship: OSG RSV – SAM bridge
  – Running since January.
  – Crashed once, because there were not enough file descriptors configured.
• Gridview – GridFTP transfers.
  – Currently publishing from 27 cms t1transfer machines;
  – A validation consumer is in the testbed right now;
Summary
• Messaging Systems Overview
• Monitoring context in the Grid
• The MSG – Messaging System for Grids
• Fast Forward
  – In the monitoring context.
MSG: results
• Migrating to Regions
MSG: results
• Messaging based archiving & reporting
Thank you for your attention.
Additional Questions?