The abstract concept of Time in event processing

There has been an interesting thread in one of the Apama technical forums regarding the notion of time in Apama, a topic that applies more broadly to the world of CEP in general.

Timers and pauses are among the things that can make your application racy and non-deterministic. You don’t want your logic to be tied to differences in runtime systems and load. For example, say you have two pieces of input data that are 0.9 seconds apart when they arrive at your system input. The first event arrives at your CEP engine, and as a consequence you set a timer of 1 second to collect all the data within a one-second window of the first event. Obviously you expect the timer to fire after the second event is consumed. Right? Right! But what if, after the first event, a heavy calculation kicks in and keeps your engine so busy that it slows down the consumption of further events? The second event may or may not reach processing within one second. Should you have to worry about your application giving different outputs depending on your load?

Architectural considerations around time handling (and, more generally, around determinism) give you a good indication of how sophisticated a CEP engine is. The best frameworks around handle this type of problem at core level, like a second-level OS, so that at application level you don’t need to take care of such issues.

Apama, the CEP engine I am most familiar with, handles time in a brilliant way, unsurprisingly given the maturity of the platform. Time is effectively just another input event that the CEP engine puts on its own input queue. This way a timer cannot fire while you are processing information, because time events are processed sequentially together with all the other inputs. So, to go back to the example above, if you create a timer for 1 second, you are effectively reacting to a time event that has been merged into the input stream after the second piece of data. There is no possibility of a race: those two guys will be processed in diligent order.
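To make the idea concrete, here is a minimal toy sketch in Java. It is not Apama code and not how the Correlator is implemented; the class `TimeAsEventSketch`, its `Event` type and the 0.5 second tick spacing are all assumptions made up for illustration. The only point it shows is that, when clock ticks travel on the same queue as the data, a timer can only fire between events, so the piece of data that arrived at 0.9 seconds is always seen before the 1 second timer, however long the first event took to process.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy engine (not Apama code): time advances only when a tick event is dequeued. */
class TimeAsEventSketch {

    /** One entry of the input queue: either a clock tick or a piece of data. */
    static final class Event {
        final boolean isTick;
        final double tickTime;  // seconds; only meaningful for ticks
        final String payload;   // only meaningful for data events

        private Event(boolean isTick, double tickTime, String payload) {
            this.isTick = isTick;
            this.tickTime = tickTime;
            this.payload = payload;
        }

        static Event tick(double seconds) { return new Event(true, seconds, null); }
        static Event data(String payload) { return new Event(false, 0.0, payload); }
    }

    private final Queue<Event> inputQueue = new ArrayDeque<>();
    private final List<String> window = new ArrayList<>();
    private final List<String> log = new ArrayList<>();
    private double now = 0.0;          // engine time, moved only by ticks
    private boolean timerArmed = false;
    private double timerDeadline = 0.0;

    void send(Event e) { inputQueue.add(e); }

    /** Drains the queue strictly in order and returns what the timer reported. */
    List<String> run() {
        Event e;
        while ((e = inputQueue.poll()) != null) {
            if (e.isTick) {
                now = e.tickTime;
                if (timerArmed && now >= timerDeadline) {
                    log.add("timer fired at " + now + " with window " + window);
                    window.clear();
                    timerArmed = false;
                }
            } else {
                if (!timerArmed) {              // first data event opens a 1 second window
                    timerArmed = true;
                    timerDeadline = now + 1.0;
                }
                window.add(e.payload);          // however slow this step is, the next tick waits
            }
        }
        return log;
    }

    public static void main(String[] args) {
        TimeAsEventSketch engine = new TimeAsEventSketch();
        engine.send(Event.tick(0.0));
        engine.send(Event.data("A"));   // arrives at 0.0 s
        engine.send(Event.tick(0.5));
        engine.send(Event.data("B"));   // arrives at 0.9 s, before the 1.0 s tick
        engine.send(Event.tick(1.0));
        System.out.println(engine.run());
    }
}
```

Feeding the same queue from a file instead of a live clock would give exactly the same result, which is the property the rest of this post relies on.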

This is just brilliant, and it has a number of advantages. You can disable the automatic sending of such timing events and send them manually, or take them from a recorded input flow (for example yesterday’s market prices), and drive your CEP identically to when those events were first received. The time displayed in your clients will be artificially set to the past (identical to what those clients would have seen when those events were recorded in the first place). The corollary is that you can now take liberties with how time passes (in the internal CEP representation) just by changing the speed at which you send your input. Nothing will differ from your original real-time execution (and here is where determinism comes in handy), because time is just another input of your application, incidentally contained in a file, a database, or whatever your source is.

You want a cup of tea? Not to worry, just stop sending input data and time will freeze. The ability to go as slowly as you wish is important when you are trying to understand what happened in a certain situation, as when debugging. On the other hand, the ability to go as fast as possible is paramount when you are backtesting, to have a quick turnaround and understand how your new algo would have acted against hours, days or weeks of recorded data (think intra-night optimization of your algo parameters).

Unit testing is another area that benefits greatly from determinism. As you can drive time, you can write unit tests that perfectly reflect the artificial situation you want to recreate.

This is also exploited in CEP farms. When you have a cluster of CEP engines, you want them to have a time representation that is in sync across all of them and not driven by the clock of each box. One way to achieve this is to have only one CEP engine (possibly the entry point of all the data) produce time events and propagate them to the others.

Not to mention the ability to decouple your logic from your time source. I worked quite extensively in the Surveillance area with my friends at Turquoise and NYSE. If you want to detect an abuse that is, let’s say, caused by submitting two orders within a certain time, then you don’t want to mess around. The time you want to take into account is the time those orders arrived at the stock exchange (audited in the order as a field), not the time they arrived and were processed in your downstream Surveillance system, which could suffer from drift.

Oh man… I forgot about that, do I need to rewrite my application?

Nope, just set the time using the field included in your data, so your CEP will think the time is exactly when the event arrived at the exchange. No need to change your application: everything related to timing will diligently adapt to your new definition of time.
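As a rough illustration of the surveillance case, the Java sketch below takes "now" from a timestamp carried by each order rather than from the wall clock of the box doing the analysis. Everything here is hypothetical: the `Order` class, its `exchangeTimeMillis` field and the 500 ms window are names and values I made up for the example, not the real system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy check driven by event time: the clock is whatever the data says it is. */
class ExchangeTimeSketch {

    static final class Order {
        final String id;
        final long exchangeTimeMillis; // hypothetical field stamped by the exchange

        Order(String id, long exchangeTimeMillis) {
            this.id = id;
            this.exchangeTimeMillis = exchangeTimeMillis;
        }
    }

    /** Flags consecutive orders whose exchange timestamps are at most windowMillis apart. */
    static List<String> ordersTooClose(List<Order> orders, long windowMillis) {
        List<String> alerts = new ArrayList<>();
        Order previous = null;
        for (Order current : orders) {
            long now = current.exchangeTimeMillis; // time comes from the event, not the local clock
            if (previous != null && now - previous.exchangeTimeMillis <= windowMillis) {
                alerts.add(previous.id + "+" + current.id);
            }
            previous = current;
        }
        return alerts;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("A", 1_000L),
                new Order("B", 1_400L),   // 400 ms after A at the exchange
                new Order("C", 5_000L));
        System.out.println(ordersTooClose(orders, 500L));
    }
}
```

Because the local clock never enters the calculation, it does not matter whether the orders reach the surveillance system late, in a burst, or replayed from a file the day after.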


Posted in Apama, CEP | 2 Comments

Twitter hello world for CEP

So, we saw what the raw Twitter API can do. Twitter as a company is a very interesting example of web success through thinking outside the box. You will have noticed that the API has much more capability than the website itself. Many consider this the very reason for the explosion of the service: many developers think they can deliver a much better service than the official Twitter website, and as a matter of fact they do. Most users don’t use Twitter.com to update their status; they use some other client, mostly on smartphones, built on the API. And if you want to extract information from Twitter other than somebody’s timeline, you will probably need to use some service built, again, using the API.

Of course the real winner here is… Twitter. They have been smart enough to understand that their asset was not the number of users of their website, but the amount of information uploaded by users to their servers in the form, of course, of tweets. So now the service has gone mainstream, and somebody is a zillionaire.

Leaving aside the reasons for its success, the question you are probably asking right now is what you can do with a Twitter adapter linked to your CEP engine. The answer is: many things.

First of all, Twitter is effectively a publish/subscribe messaging service similar to JMS, but available in the cloud. You can use direct messages for P2P and a user’s timeline as a JMS topic. You can make it available to every client or make it private.

To give a real example, I hooked up the Twitter adapter to the Algorithmic Trading Accelerator (ATA) that comes with Apama for Capital Market. The ATA was built as a framework that gives you many of the things you need when you want to do algo trading. There is a blotter, a position tracker, an excellent risk firewall, and there are also some examples to showcase how you would create a trading algorithm application based on this framework. Moreover, it comes with an exchange simulator, so you can try your algo stuff without actually needing a real connection to the exchange.

So, the first thing I did was to use Twitter to send commands to my algo trading application (so, the ATA). To do that I set the Twitter adapter to forward all direct messages received to the CEP engine, and of course added some logic in the engine itself to interpret such messages. Then I took my phone and typed (advice: disable SureType):

[Image: Send Order to Apama Using Twitter]

Just write some logic to process the body and, unsurprisingly, this will be translated into a request for a new order to BUY 100 shares of Google at 351.93, using just an outright order:

[Image: ATA blotter after the ATA]

Now imagine you are at the pub and you have had a few beers. All of a sudden you feel the urge to buy 7.0 million barrels of crude oil… If that is the case, surely you don’t want to place an outright order, right? You would make the market flash crash… So take your BlackBerry and type:

[Image: Send Apama Iceberg Order using Twitter]

An Iceberg will be created, and you will buy only a thousand barrels per second! This is really what you want so you can enjoy your next beer: you’ll get your 7 million barrels in a little less than two hours. If you were at the office you would have seen the following execution dashboard:

[Image: Order slicing, avoid a flash crash...]
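For the curious, the arithmetic behind "a little less than two hours" is easy to check. The Java sketch below is only a back-of-the-envelope model of the slicing, assuming one fixed-size slice per second; the class and method names are mine and have nothing to do with the actual Iceberg implementation in the ATA.

```java
/** Back-of-the-envelope model of fixed-size order slicing, one slice per second. */
class IcebergSliceSketch {

    /** Number of slices needed to work totalQty in chunks of sliceQty (ceiling division). */
    static long slicesNeeded(long totalQty, long sliceQty) {
        return (totalQty + sliceQty - 1) / sliceQty;
    }

    /** Quantity of the slice with the given zero-based index; the last one may be smaller. */
    static long sliceQuantity(long totalQty, long sliceQty, long index) {
        long remaining = totalQty - index * sliceQty;
        return Math.max(0L, Math.min(sliceQty, remaining));
    }

    /** Formats a number of seconds as hours, minutes and seconds. */
    static String formatDuration(long seconds) {
        return String.format("%dh %02dm %02ds",
                seconds / 3600, (seconds % 3600) / 60, seconds % 60);
    }

    public static void main(String[] args) {
        long total = 7_000_000L;   // barrels to buy
        long perSecond = 1_000L;   // barrels per one-second slice
        long slices = slicesNeeded(total, perSecond);
        System.out.println(slices + " slices, " + formatDuration(slices));
    }
}
```

With 7,000,000 barrels at 1,000 per second you need 7,000 slices, that is 7,000 seconds, or 1h 56m 40s.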

Of course you can use Twitter connectivity not only to feed data in but, probably more critically, to output data in the form of status messages or direct messages. For instance, in the example below I have modified the Iceberg to output its status every minute to the user’s timeline:

Another neat functionality you can exploit for output is direct messaging. This pairs quite well with all the applications that raise alerts. I am thinking, for example, of the Abuse and Market Surveillance accelerator, which could alert users of severe problems (for Apama presales: how sweet would it be to raise an alert during a presentation and send it to your customer’s Twitter account…), but I am also thinking of the risk firewall.

The risk firewall has the concepts of risk warning and objection. A warning is usually raised on a softer condition, while an order rejection is usually caused by a more severe condition. You can, for example, have a rule under which your position on Oil raises an alert when you try to reach a long position of 5 million, but any order that could potentially cause a position bigger than 10 million is rejected. The system is much more sophisticated than this, as it keeps track of open and closed positions and gives algos the chance to reserve their position. It is a neat and well-architected system, and really every trading entity should use something with this level of protection.
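Stripped of all the sophistication, the two-level rule in that example boils down to something like the Java sketch below. This is only my simplified reading of the rule, not the ATA risk firewall API: the class, the `Decision` enum and the constants are hypothetical, and the real system also tracks open and closed positions and reservations, which are ignored here.

```java
/** Simplified two-threshold position check: warn on the soft limit, reject on the hard one. */
class RiskFirewallSketch {

    enum Decision { ACCEPT, WARN, REJECT }

    static final long WARN_LIMIT = 5_000_000L;    // soft limit taken from the example above
    static final long REJECT_LIMIT = 10_000_000L; // hard limit taken from the example above

    /** Evaluates the long position that would result if the order were fully filled. */
    static Decision check(long currentPosition, long orderQty) {
        long resulting = currentPosition + orderQty;
        if (resulting > REJECT_LIMIT) {
            return Decision.REJECT;   // position would be bigger than 10 million
        }
        if (resulting >= WARN_LIMIT) {
            return Decision.WARN;     // position would reach 5 million or more
        }
        return Decision.ACCEPT;
    }

    public static void main(String[] args) {
        System.out.println(check(0L, 1_000_000L));
        System.out.println(check(4_000_000L, 1_000_000L));
        System.out.println(check(9_500_000L, 1_000_000L));
    }
}
```

The WARN branch is the natural place to hook an outgoing direct message.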

So, just to give an example, I used the risk firewall that comes with the ATA to send me a DM whenever an alert is raised:

Of course, you can answer the tweet, and as the answer goes straight to the CEP engine, you can code handy commands like “ban:vitoI” or “cancellAll:vitoI”.
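Interpreting such replies takes very little code. The Java sketch below shows one possible way to split a message body of the form name:argument, which is the only format the two commands above tell us about. The `CommandParserSketch` class is hypothetical and is not the logic I actually run inside the engine.

```java
/** Splits a direct message body of the form "name:argument" into its two parts. */
class CommandParserSketch {

    static final class Command {
        final String name;
        final String argument;

        Command(String name, String argument) {
            this.name = name;
            this.argument = argument;
        }
    }

    /** Parses the body, rejecting anything that lacks a name or an argument. */
    static Command parse(String body) {
        String trimmed = body.trim();
        int colon = trimmed.indexOf(':');
        if (colon <= 0 || colon == trimmed.length() - 1) {
            throw new IllegalArgumentException("Expected name:argument but got: " + body);
        }
        String name = trimmed.substring(0, colon).trim();
        String argument = trimmed.substring(colon + 1).trim();
        return new Command(name, argument);
    }

    public static void main(String[] args) {
        Command c = parse("cancellAll:vitoI");
        System.out.println(c.name + " -> " + c.argument);
    }
}
```

Anything that does not parse is better rejected loudly than guessed at, especially when the sender may be at the pub.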

The nice thing is that you can effectively use any Twitter client or consumer and attach it to your application: a phone, a website, or, of course, another CEP engine located on the other side of the globe, without having to worry about the infrastructure needed, which could potentially be a nightmare. Of course latency is somewhat bigger than the few microseconds you may want for order-specific transmission. However, latency is not as big as I first expected. Direct messages are delivered within one second, and if you are listening on a user timeline it is still below 2 seconds.

Well, how cool was that? (It gets better, wait and see!)

Posted in Apama, ATA, CEP, Twitter | 1 Comment

Playing with twitter, the boring bits…

So, last time we left off in search of data. As I said, a very good place to start is Twitter: a free service, a good API, and some interesting applications possible.

The Twitter API is relatively easy to use. One thing to keep in mind is that Twitter limits you to between 150 and 350 requests per hour.

The implementation of the Twitter adapter is not difficult. Just choose your language and encapsulate the REST protocol into calls. The only boring part is that there are many types of request and response, and you will need to map each of these requests and responses into an event interface.

Laziness is one of the three virtues of a good programmer, so to avoid having to write all those definitions, I use meta-code to create this mapping automatically. In Apama land, this means having a piece of code that translates each upstream event (direction from the CEP engine to the adapter) into a Java class, using the field names to reconstruct the “set” methods to invoke. As the method name is resolved dynamically, I use Java reflection.
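Here is a minimal sketch of that upstream direction, assuming the event arrives in the adapter as a simple map of field names to values. The `StatusUpdate` class is a hypothetical stand-in for a request type of whatever Twitter client library you use, and `SetterMappingSketch` is not the adapter code itself, just the reflection trick in isolation.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Applies event fields to a plain Java object by resolving setX methods at runtime. */
class SetterMappingSketch {

    /** Hypothetical request class standing in for a real Twitter client type. */
    public static class StatusUpdate {
        private String status;
        private long inReplyTo;

        public void setStatus(String status) { this.status = status; }
        public void setInReplyTo(long inReplyTo) { this.inReplyTo = inReplyTo; }
        public String getStatus() { return status; }
        public long getInReplyTo() { return inReplyTo; }
    }

    /** For each field "foo", finds a public one-argument setFoo method and invokes it. */
    static void apply(Map<String, Object> fields, Object target) {
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            String key = field.getKey();
            String setter = "set" + Character.toUpperCase(key.charAt(0)) + key.substring(1);
            Method match = null;
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                    match = m;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalStateException("No setter " + setter + " on " + target.getClass());
            }
            try {
                match.invoke(target, field.getValue());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not invoke " + setter, e);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("status", "I love pizza");
        event.put("inReplyTo", 42L);
        StatusUpdate update = new StatusUpdate();
        apply(event, update);
        System.out.println(update.getStatus() + " / " + update.getInReplyTo());
    }
}
```

The sketch picks the first one-argument method with the right name, so overloaded setters would need extra care in a real adapter.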

On the downstream side you go the other way. You take the result of each “get” method (again, you loop and call the get* methods using reflection) and chuck it into an event with fields named after the method itself. So if getStatus() returns “I love pizza”, your event will contain the field “status”: “I love pizza”.
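The downstream direction can be sketched in the same way. Again, the `Tweet` class below is a hypothetical stand-in for a response type of the client library, and the resulting map simply plays the role of the event that would be sent to the Correlator.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Builds a field map from an object by calling every public no-argument getX method. */
class GetterMappingSketch {

    /** Hypothetical response class standing in for a real Twitter client type. */
    public static class Tweet {
        public String getStatus() { return "I love pizza"; }
        public String getAuthor() { return "vitoI"; }
    }

    /** getStatus() becomes field "status", getAuthor() becomes "author", and so on. */
    static Map<String, Object> toEventFields(Object source) {
        Map<String, Object> fields = new TreeMap<>(); // sorted keys, for a stable field order
        for (Method m : source.getClass().getMethods()) {
            String name = m.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && m.getParameterCount() == 0 && !name.equals("getClass");
            if (!isGetter) {
                continue;
            }
            String field = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            try {
                fields.put(field, m.invoke(source));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not invoke " + name, e);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(toEventFields(new Tweet()));
    }
}
```

Note that getClass() has to be skipped explicitly, otherwise every event would end up with a "class" field.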

Of course the event definitions need to follow this convention too but, being lazy, I also generated the mapping definition using the same approach, so consistency is always ensured between the event definitions in the CEP engine (called the Correlator in Apama land) and the runtime instances created by the adapter. This approach is robust to API changes and, moreover, can be applied to any Java adapter, so this is the way I usually build adapters (laziness^2).

The only limitation is that, while at CEP level you can have events as fields of other events, you cannot reflect this in the adapter semantics, as it cannot nest semantic mapping rules. In other words, when defining a given field as an event, you cannot just reuse the rule you have defined for that specific nested event.

That said, you can of course work around this limitation by simply exposing the fields of nested classes as first-level fields, so no big drama…

So, OK, once this is done, you pretty much get all the goodies that come with the Twitter API inside the CEP engine:

  • Search API Methods (search, access trends, etc)
  • Timeline Methods (public or user timelines, mentions, and retweets)
  • Status Methods (show, update, destroy methods)
  • User Methods (show, lookup, search, etc)
  • List, List Members, List Subscribers Methods
  • Direct Message Methods
  • Friendship Methods (create, destroy, etc)
  • Social Graph Methods
  • Account, Spam reporting and Block Methods
  • Favourite Methods
  • Saved Searches Methods
  • Trends Methods
  • Geo methods (nearby places, reverse geocode, etc)
  • Streaming stuff…

So, as the title already disclosed, these were the boring technical bits; next time we’ll see what we can do with all of this. I promise it is going to be juicy, so stay tuned!

Posted in Apama, CEP, Twitter | 1 Comment

In search of real time data

As a fellow blogger with a love for real CEP applications, my main drive is to show real applications. Real applications, though, need real data.

In the course of my career I have had no shortage of incoming data… exchange market data (read: prices) coming and going, with your application sitting there analysing and reacting by sending orders back to the exchange. All the sort of stuff you would imagine when working with algo trading applications.

I have also had times in which I wished I had less data. A few years ago I worked for close to a year to architect and implement the Apama Abuse Surveillance system in place at Turquoise. Turquoise is an equities trading platform based in London, effectively a stock exchange, and they send every single user request (new, cancel and amend order requests) and every trade to their Surveillance system. This means that your Event Processing Engine gets overwhelmed by thousands of events per second. Those are the times when people like me get very excited, as you need to get things right to cope with such an event rate while still looking for patterns. Apama makes things easier. Its service-oriented architecture means that you can go from a laptop to a farm with a relatively small amount of effort. Also, the fact that it is deterministic across platforms means you can develop on your laptop and deploy on a super fast Unix box in the blink of an eye. That was fun, and the guys at Turquoise are great.

So, this is to say that, while in places like investment banks, exchanges, factories and so on the amount of data to analyse is the challenge that CEP is there to solve, when you go back home and sit at your PC, having a feed of good and reliable data becomes the issue.

So, is there something that could provide that? Yeah, there is more than one service around that is worth exploring and that, as a matter of fact, I explored this (long) weekend.

The first idea I am going to show is a Twitter integration with Apama, and a series of applications based on that.

Stay tuned…

Posted in CEP | Leave a comment

A new blog for CEP

Hi all,

well this is my first blog post, ever. I never had a blog, nor a Facebook profile. I have never written anything meaningful, except when I was obliged at school or at work. And even the extent of meaningful in these cases has been long debated!

My name is Vito Imburgia. I was born many years ago in Palermo, Italy, and moved abroad to the Netherlands just after my degree to work for a software company, Progress Software. Shortly after moving to the Netherlands I had the opportunity to work in the Apama division, and moved to Cambridge, UK, where I was a consultant in the Capital Market.

After 8 years at Progress Software I decided that the time was ripe to move on, and I took on another challenge at one of the biggest Australian financial institutions, in Sydney, Australia. Since then I have worked in the algorithmic trading space as an algo trading specialist. The Apama engine is still my main focus area: you guessed it, I love the capabilities of the product and its language.

The purpose of this blog, though, is not to talk about me, nor about my wandering around the world, but to have a place where people like me can talk about Complex Event Processing (referred to in this blog as CEP), possibly in the context of concrete use cases and applications. An early disclosure: although I am not endorsed by Apama, my background has been Apama for a long time, and for me CEP is Apama and Apama is CEP.

This blog is not the place to decide or argue which technology is best suited for CEP, so I do hope anybody interested in or passionate about CEP gets in touch and puts themselves forward to contribute, no matter what their product background is: Esper, Aleri, Coral 8, Streambase, you name it.

Having said that, chances are that the world will ignore me, so I will probably be the main author posting stuff here. I am not a technology philosopher or an evangelist and, rather than talking about abstract concepts, I will try to focus on practical, working CEP examples, referencing real code and stuff that I have written in my spare time. Unfortunately (or fortunately), those samples and code will not be in any way related to my current, previous or future jobs, as those are strictly bound to other people’s intellectual property.

As you have probably figured out by this point, English is not my first language, so I hope you can bear with it. Albeit not perfect, it should be understandable.

To finish, CEP is coincidentally also the neighbourhood of Palermo where I grew up, more precisely in Cruillas, but this blog won’t be about that! Too bad, I have so many fun stories about growing up in this rough but charming area of Sicily…

Posted in Apama, CEP, Vito | 3 Comments