Node Architecture For Enterprise

From AccountIT
Revision as of 13:14, 16 March 2014 by Arvinder (Talk | contribs)


This page describes the overall technology stack used within the IB appliance, and to some degree also within the supporting business systems.

Overview

The diagram below gives an overview of the technologies used in the different nodes.


<< Figure: Node.png >>

Guiding Principles

The main guiding principle for choosing technologies is:

We want to have as few technologies as possible, yet for each problem we want to have the best possible technology.

- We want few technologies, because the fewer we have, the easier they are to master. At the same time we want the best technologies, because they solve our problems in the most efficient way.

- We should never have two different technologies for solving the same problem.

Play Framework

The AccountIT application will be based on the Play Framework. Play provides a UI framework that is based on HTML5 and JavaScript.

Messaging

Messaging is provided at two levels: internally within computing nodes, and externally between computing nodes. The first is provided by Apache Camel (via ActiveMQ) as mentioned above. The second is provided by RabbitMQ in a distributed cluster setup.

RabbitMQ will also be used to integrate into and between the supporting business systems, such as CRM, ERP, support and sales systems.

Datastore

We are moving to NoSQL database models, because these best support our data requirements. Several database models and technologies must be chosen to support our wide variety of data. The diagram below indicates which database models are best applied where. Where we use third-party applications, the database model is dictated by the application vendors.

<< Figure of database types >>

So far Riak CS has been chosen for key-value-oriented data. Riak CS is chosen because it provides an attractive peer-to-peer distributed clustering setup and supports very big files. As per our guiding principles we should not have any other technologies for this type of data, unless there is a very specific benefit from doing so.

The technology for the other database models has not yet been chosen. However, the following candidates are being considered:

   Cassandra, Neo4j or a traditional relational database for search-oriented data
   Neo4j or a traditional relational database (star schema) for analysis-oriented data, unless the analysis is done in third-party systems that provide their own databases
   Riak CS or workflow frameworks/systems for process-oriented data; again, third-party systems may be the better solution in this case

All of our existing systems use traditional relational databases, mostly Oracle, and we may still keep that technology, even if only for legacy reasons. In that case the Oracle-compatible EnterpriseDB is considered an attractive candidate.

Monitoring

With a system landscape consisting of nodes in clusters, monitoring occurs at two levels:

   At node level, with each node exposing monitoring information through a monitoring agent. A surveillance / monitoring tool used by IT operations can collect the information provided by the monitoring agent.
   At cluster level, with an aggregated view of the cluster of nodes, displaying the overall state of the cluster or of some node. The monitoring aggregate is hosted by the "Management Node".
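The aggregated view can be illustrated with a minimal sketch that reduces per-node states to one cluster state. Everything named here is an assumption made for illustration: the status values `OK`, `DEGRADED` and `DOWN`, the function `aggregate_cluster_state` and the rule for combining states are not defined by this page, and the Management Node may aggregate differently.

```python
from collections import Counter

# Hypothetical status vocabulary; the page does not define one.
OK, DEGRADED, DOWN = "OK", "DEGRADED", "DOWN"


def aggregate_cluster_state(node_states):
    """Reduce a {node_name: state} mapping to an overall state and per-state counts."""
    counts = Counter(node_states.values())
    if not node_states or counts[DOWN] == len(node_states):
        # No nodes reporting, or every node down: treat the cluster as down.
        overall = DOWN
    elif counts[DOWN] or counts[DEGRADED]:
        # At least one node is impaired, but not all of them are down.
        overall = DEGRADED
    else:
        overall = OK
    return {"overall": overall, "counts": dict(counts)}
```

With this rule a cluster is reported as degraded as soon as one node is impaired, which keeps partial failures visible in the aggregate instead of hiding them behind a majority of healthy nodes.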

The external interface of a monitoring agent is provided by Jolokia (http://jolokia.org), which exposes JMX MBean information as JSON over HTTP. Jolokia is packaged within Hawtio, which additionally provides a web UI for the JMX MBean information that Jolokia exposes. Thus each node, as well as the aggregated information, is available from a web browser through Hawtio.
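As a rough illustration of the JSON-over-HTTP interface, the sketch below builds a Jolokia "read" request for a single JMX attribute. The agent URL (host, port and path) is an assumption and depends on how the agent is deployed on each node. The helper names `build_read_request` and `read_attribute` are hypothetical. Only the Python standard library is used, and building the request is kept separate from sending it so the payload can be inspected without a running agent.

```python
import json
import urllib.request

# Assumed agent endpoint; the real host, port and path depend on how the
# Jolokia agent is deployed on each node.
JOLOKIA_URL = "http://localhost:8778/jolokia/"


def build_read_request(mbean, attribute):
    """Return the JSON body of a Jolokia 'read' request as UTF-8 bytes."""
    payload = {"type": "read", "mbean": mbean, "attribute": attribute}
    return json.dumps(payload).encode("utf-8")


def read_attribute(mbean, attribute, url=JOLOKIA_URL, timeout=5):
    """POST a read request to a Jolokia agent and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_read_request(mbean, attribute),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running agent, `read_attribute("java.lang:type=Memory", "HeapMemoryUsage")` should return a JSON object whose `value` field holds the heap usage figures of the JVM. A monitoring tool used by IT operations could poll each node in this way.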

For non-Java applications such as RabbitMQ and Riak, a Java client is needed that collects relevant information about the application's health state, so that this information can be exposed using Jolokia and Hawtio.

Deployment

   A very interesting technology to look at is the Apache Chukwa log collection and analysis framework. This clustered framework is ideal for logging from many nodes and even many IB appliances, which is required for both system-level and business-level monitoring. Alternatives include Apache Flume, Scribe (developed at Facebook) and Fluentd.
   Another area that needs a solid technological foundation is the network communication stack frameworks for implementing the ETI gateways.
   Automatic scaling and deployment need to be looked at.
   Tools and APIs for monitoring, metering and managing the appliance also need to be looked at. Such tools could include Hawtio, JMX etc. Integration into underlying operations frameworks such as Nagios or Tivoli is also necessary. Requirements are not fully known at this time.
   Authentication, including integration into Microsoft Active Directory (AD), LDAP, OAuth and SAML, also needs to be looked at. Requirements are not fully known at this time.