Nick Sellen

A framework for writing graph-based apps

12 Aug 2013

I've been working on a framework for building apps by describing them as a sequence of connected operations forming a graph. It's at an early stage but reasonably functional. It's not publicly available at the moment.

It has some relationship to:

This is a quick overview of a few aspects of it.

What are the main features?

Data interchange

A key property of flow-based frameworks is the way data is passed between nodes/operations (I use these terms interchangeably). Many other frameworks pass data packets consisting of single values. When an operation needs many values, extra techniques are required to keep them related (sending grouping packets, for example).

My format is a hierarchical data structure (which maps naturally to JSON, although it is stored as Java objects internally) that can carry lots of data at once. As it passes through the graph, values can be added or removed.

Here is an example data structure for an incoming HTTP request:

{
  "request" : {
    "headers" : {
      "Accept" : "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
      "Accept-Encoding" : "gzip,deflate,sdch",
      "Accept-Language" : "en-US,en;q=0.8",
      "Cache-Control" : "max-age=0",
      "Connection" : "keep-alive",
      "Content-Length" : "145",
      "Content-Type" : "application/x-www-form-urlencoded",
      "Cookie" : "__utma=255221783.1886109828.1316815608.1366932806.1367257395.37; __utmz=255221783.1366932806.36.4.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)",
      "Host" : "demo.nicksellen.co.uk:5005",
      "Origin" : "http://demo.nicksellen.co.uk:5005",
      "Referer" : "http://demo.nicksellen.co.uk:5005/",
      "User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36"
    },
    "host" : "demo.nicksellen.co.uk",
    "method" : "GET",
    "params" : { },
    "path" : "/"
  }
}
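
To make the idea concrete, here is a minimal Python sketch of one hierarchical structure travelling through two operations, with values added and removed along the way. The framework itself stores Java objects, so this is only an illustration: the helpers `get_path`, `set_path` and `remove_path` and the two operations are hypothetical names, not the framework's API.

```python
from typing import Any


def get_path(data: dict, path: str) -> Any:
    """Look up a dotted path such as 'request.params'."""
    node: Any = data
    for key in path.split("."):
        node = node[key]
    return node


def set_path(data: dict, path: str, value: Any) -> None:
    """Set a dotted path, creating intermediate dicts as needed."""
    *parents, last = path.split(".")
    node = data
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


def remove_path(data: dict, path: str) -> None:
    """Remove the value stored at a dotted path."""
    *parents, last = path.split(".")
    node = data
    for key in parents:
        node = node[key]
    del node[last]


# Hypothetical operations: each receives the whole structure
# and may add or remove values before passing it on.
def load_todos(data: dict) -> None:
    set_path(data, "todos.results", [{"id": 1, "name": "write post", "done": False}])


def render(data: dict) -> None:
    names = ", ".join(todo["name"] for todo in get_path(data, "todos.results"))
    set_path(data, "html.main", f"<ul>{names}</ul>")
    remove_path(data, "todos")  # not needed by later operations


data = {"request": {"method": "GET", "path": "/", "params": {}}}
for operation in (load_todos, render):
    operation(data)
```

Because every operation sees the same structure, related values stay together without any extra grouping mechanism.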

Threading model

Some flow-based frameworks map operations to threads and let each process run single-threaded (adding incoming calls to a queue). This framework inverts that: each incoming request runs in a single thread, and operations are expected to be callable from any thread.

This has the advantage that data and state for a given request are not passed between threads, and therefore no synchronization is required for most operations.

Network operations run on top of Netty and the graph executes in its event loop, so multiple incoming requests can be multiplexed onto each worker thread. Execution yields after each operation is performed.
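
The multiplexing can be pictured with a small Python sketch. It is a deliberately simplified model, assuming a single worker thread and a round-robin queue; it does not use Netty, and `run_request`, `event_loop` and `step` are hypothetical names for illustration only.

```python
from collections import deque
from typing import Callable, Iterator

Operation = Callable[[dict], None]


def run_request(data: dict, operations: list[Operation]) -> Iterator[None]:
    """Run one request's operations in order, yielding control after each."""
    for operation in operations:
        operation(data)
        yield


def event_loop(requests: list[Iterator[None]]) -> None:
    """Round-robin over the in-flight requests on a single thread."""
    queue = deque(requests)
    while queue:
        request = queue.popleft()
        try:
            next(request)          # run exactly one operation
            queue.append(request)  # then let the other requests have a turn
        except StopIteration:
            pass                   # this request has finished


trace: list[str] = []


def step(name: str) -> Operation:
    """Build an operation that records which request it ran for."""
    def operation(data: dict) -> None:
        trace.append(f"{data['id']}:{name}")
    return operation


operations = [step("route"), step("query"), step("respond")]
event_loop([
    run_request({"id": "A"}, operations),
    run_request({"id": "B"}, operations),
])
```

Each request keeps its own data and only ever runs on the one thread, so the operations need no locking even though the two requests are interleaved.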

Modelling execution flow

There is some relationship to actor-based systems, and some flow-based frameworks operate more like these - autonomous objects that can send messages at will, in which the graph represents the connections between the nodes/actors, but not the flow of execution.

In this framework the graph represents the flow of execution for a single request. Each operation is called at most once per request, and execution continues until no operations are left to run. This has advantages and disadvantages: it allows us to reason more about how execution proceeds (for example we know when nodes need to wait for other nodes to complete, we can propagate errors downstream, and we can tell if the execution will never reach a given node), but it does mean the execution flow is fixed at the time the graph is built.

A consequence (for now) is that a few common types of operation for flow-based applications are not possible - we can't have one operation split a file by lines and have subsequent operations process individual lines (I think a good solution will be to model this as nested flows - having operations which can kick off execution of another complete flow).
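
As a rough illustration of the "will execution ever reach this node?" reasoning, the Python sketch below runs a breadth-first search over the edges that can still fire once the router has made its choice. The edge list is loosely modelled on the todo app described later in this post; the functions `reachable_nodes` and `will_finish` are hypothetical and say nothing about how the framework actually implements this.

```python
from collections import deque

Edge = tuple[str, str]


def reachable_nodes(edges: set[Edge], start: str) -> set[str]:
    """Breadth-first search over the edges that can still fire."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for source, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Edges that always fire when their source node completes without error.
fixed: set[Edge] = {
    ("load todos from db", "render todos"),
    ("render todos", "render layout"),
    ("render layout", "send HTTP response"),
    ("add todo", "redirect to homepage"),
    ("redirect to homepage", "send HTTP response"),
}

# Conditional edges: the router fires at most one of these per request.
routes: dict[str, Edge] = {
    "GET /": ("HTTP router", "load todos from db"),
    "POST /": ("HTTP router", "add todo"),
}


def will_finish(route: str) -> bool:
    """True if the final node is still reachable after the router has chosen."""
    chosen = {routes[route]} if route in routes else set()
    return "send HTTP response" in reachable_nodes(fixed | chosen, "HTTP router")
```

With these edges, `will_finish("GET /")` is true, while an unmatched route such as `will_finish("GET /missing")` is false, so the request can be failed immediately instead of waiting for a node that will never run.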

A simple todo app

In the spirit of simple example apps, I've used the framework to build a basic todo app.

You can use the app live at demo.nicksellen.co.uk:5005

The graph

This is a graphviz diagram of the live application. It is generated automatically and is a 1-to-1 mapping of the running application.

Connections that might not be called are represented with dashed lines. All others will be called (assuming the source node is called and runs without error).

[Diagram: a little todo app]

The application specification

The JSON document below is the entire application. Running this in the framework will construct the operations, wire them up, create the database (an in-memory H2 SQL database in this case) and start listening on the specified port.

Any changes made to the application (currently picked up via file-watching, but could be an API call) will cause the application to be rebuilt and reloaded without any downtime. Not a single request will be dropped.

A few things to notice:

  • there is no ORM layer, just write SQL directly, it's nice!
  • SQL can have embedded values, e.g. insert into todos (name, done) values ({request.data.todo}, false) - at graph build time these are converted into JDBC prepared statements, and the values are bound at call time, so they are safe from SQL injection
  • unfinished: if you insert a todo name that is too long (more than 40 characters) you see a stack trace and a dump of the current data
    1. very short stack trace, everything runs on top of the netty event loop
    2. the bottom section shows you the type of data that passes between the operations
  • unfinished: there is no handling for unmatched URLs, which would normally be a 404 error - a response is still returned (with an error) as we can determine that execution will never reach the final node
  • nodes are not defined separately from connections - node definitions can be embedded in others, it's up to nodes to define how they are used - at graph build time the nodes and connections are flattened

{
	"name" : "a little todo app",
	"flows" : {
		"http" : {
			"type" : "http",
			"settings" : {
				"host" : "demo.nicksellen.co.uk",
				"port" : 5005,
				"start-at" : "HTTP router",
				"end-at" : "send HTTP response"
			}
		}
	},
	"services" : {
		"database" : {
			"type" : "h2",
			"settings" : {
				"url" : "jdbc:h2:mem:",
				"username" : "sa",
				"password" : "sa",
				"create-sql" : "create table todos (id bigint auto_increment, name varchar(40), done boolean, user varchar(40))"
			}
		}
	},
	"operations" : {
		"HTTP router" : {
			"type" : "http/router",
			"settings" : {
				"routes" : {
					"GET /" : {
						"type" : "sequential",
						"settings" : {
							"nodes" : [
								{
									"name" : "load todos from db",
									"type" : "jdbc/query",
									"settings" : {
										"query" : "select * from todos",
										"field" : "todos"
									}
								},
								{
									"name" : "render todos",
									"type" : "jade",
									"settings" : {
										"template" : "ul(class = 'todos')\n  for todo in todos.results\n    li(class = '#{todo.done ? \"done\" : \"\"}')\n      span(class = 'name')= todo.name\n      form(action = '/#{todo.id}', method = 'POST')\n        input(type = 'hidden', name = '_method', value = 'PUT')\n        input(type = 'hidden', name = 'done', value = todo.done ? 'false' : 'true')\n        input(type = 'submit', value = todo.done ? 'undo' : 'done')\n      if todo.done\n        form(action = '/#{todo.id}', method = 'POST')\n          input(type = 'hidden', name = '_method', value = 'DELETE')\n          input(type = 'submit', value = 'remove')\nform(class = 'add', action = '/', method = 'POST')\n  span I need to\n  input(type = 'text', name='todo', tabindex = 1)\n  input(type = 'submit', value='add')",
										"output" : "html.main"
									},
									"then" : "render layout"
								}
							]
						}
					},
					"POST /" : {
						"name" : "add todo",
						"type" : "jdbc/query",
						"settings" : {
							"query" : "insert into todos (name, done) values ({request.data.todo}, false)"
						},
						"then" : "redirect to homepage"
					},
					"PUT /{id}" : {
						"name" : "update doneness",
						"type" : "jdbc/query",
						"settings" : {
							"query" : "update todos set done = {request.data.done} where id = {request.params.id}"
						},
						"then" : "redirect to homepage"
					},
					"DELETE /{id}" : {
						"name" : "delete todo",
						"type" : "jdbc/query",
						"settings" : {
							"query" : "delete from todos where id = {request.params.id} limit 1"
						},
						"then" : "redirect to homepage"
					}
				}
			}
		},
		"redirect to homepage" : {
			"type" : "http/redirect",
			"settings" : {
				"url" : "/"
			},
			"then" : "send HTTP response"
		},
		"render layout" : {
			"type" : "jade",
			"settings" : {
				"template" : "html\n  head\n      style(type = 'text/css').\n\n        body { \n          font-family: arial; \n        } \n        .container {\n        width: 600px;\n        margin: 0 auto;\n        padding-top: 10px;\n        }\n        .container > h1 {\n        padding: 5px 10px;\n        border-bottom: 4px solid #ddd;\n        display: inline;\n        }\n        .name,\n        input[type=text] {\n        font-size: 28px;\n        }\n        ul {\n          margin:  0;\n          padding: 0;\n        margin-top: 30px;\n        margin-bottom: 30px;\n        }\n        li {\n          border-width: 3px;\n          border-style: solid;\n          border-color: #fff;\n          aborder-bottom: 3px solid red;\n        color: #f44;\n        font-weight: bold;\n          list-style: none;\n          margin: 10px 0;\n          padding: 10px 5px 10px 10px;\n        }\n        li.done {\n          border-color: #fff;\n        background-color: #fff;\n        }\n        li.done .name {  \n          text-decoration: line-through; \n        color: #3dce77;\n        }\n        li form,\n        li input[type=submit] {\n          display: inline;\n        }\n        input[type=submit] {\n          border: none;\n          display: inline;\n          margin: 2px 5px 2px 5px;\n          padding: 5px;\n          font-size: 14px;\n          background-color: transparent;\n          cursor: pointer;\n          color: #0380a5;\n        }\n        input[type=submit]:hover {\n          color: #33b0e5;\n        aborder-bottom: 1px solid #33b0e5;;\n        }\n        li input[type=submit] {\n          float: right;\n        }\n        form.add {\n        font-size: 22px;\n        }\n        form.add input[type=text] {\n        padding: 4px 10px;\n        margin-right: 10px;\n        width: 410px;\n        margin-left: 10px;\n        }\n        form.add input[type=submit] {\n        float: right;\n        margin: 0px 10px 0 0;\n        font-size: 30px;\n        }\n\n\n\n\n\n\n        \n  body\n    .container\n      h1 Todo!\n      != html.main\n"
			},
			"then" : "send HTTP response"
		},
		"send HTTP response" : {
			"type" : "no-op"
		}
	}
}
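
The embedded-value SQL in the spec above can be illustrated with a short Python sketch. It assumes a simple two-phase scheme: at build time each {path} becomes a `?` placeholder, and at call time the values are looked up in the request data and bound as parameters. It uses Python's `sqlite3` module as a stand-in for JDBC and H2, and `compile_query`, `lookup` and `run_query` are hypothetical names, not the framework's API.

```python
import re
import sqlite3
from typing import Any

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def compile_query(template: str) -> tuple[str, list[str]]:
    """Build time: swap each {path} for '?' and remember the paths in order."""
    paths = PLACEHOLDER.findall(template)
    return PLACEHOLDER.sub("?", template), paths


def lookup(data: Any, path: str) -> Any:
    """Resolve a dotted path such as 'request.params.id'."""
    for key in path.split("."):
        data = data[key]
    return data


def run_query(conn: sqlite3.Connection, compiled: tuple[str, list[str]], data: dict) -> sqlite3.Cursor:
    """Call time: bind the looked-up values as parameters, never as SQL text."""
    sql, paths = compiled
    return conn.execute(sql, [lookup(data, path) for path in paths])


conn = sqlite3.connect(":memory:")
conn.execute("create table todos (id integer primary key, name varchar(40), done boolean)")
conn.execute("insert into todos (name, done) values ('buy milk', 0)")

update = compile_query(
    "update todos set done = {request.data.done} where id = {request.params.id}"
)

# A hostile id is bound as a plain value, so it matches no row.
hostile = {"request": {"data": {"done": True}, "params": {"id": "1 or 1=1"}}}
hostile_rows = run_query(conn, update, hostile).rowcount

# A normal request updates exactly the intended row.
normal = {"request": {"data": {"done": True}, "params": {"id": 1}}}
normal_rows = run_query(conn, update, normal).rowcount
```

The compiled statement is `update todos set done = ? where id = ?`, so the SQL text never changes between requests and only the bound values do.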

Performance

It's pretty fast too. Here's the todo app (with 3 todos listed, using an H2 in-memory database) doing 1776 requests per second on an EC2 t1.micro instance (being tested from an m1.xlarge).

$ wrk -c 256 -d 10 -t 8 http://demo.nicksellen.co.uk:5005/
Running 10s test @ http://demo.nicksellen.co.uk:5005/
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   148.00ms   53.30ms 716.64ms   81.52%
    Req/Sec   222.61     20.38   286.00     65.46%
  17764 requests in 10.00s, 42.73MB read
Requests/sec:   1775.77
Transfer/sec:      4.27MB

... and for a simple template with no db lookup (not represented in the app spec above), nearly 7000 requests per second.

$ wrk -c 256 -d 10 -t 8 http://demo.nicksellen.co.uk:5005/simple
Running 10s test @ http://demo.nicksellen.co.uk:5005/simple
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    69.15ms  172.78ms   2.14s    94.66%
    Req/Sec     0.91k   373.77     1.19k    79.52%
  69096 requests in 10.00s, 5.80MB read
Requests/sec:   6910.15
Transfer/sec:    593.84KB