Deploying applications on yarn using Apache Twill – introduction

With the introduction of yarn, hadoop had transformed from a pure map reduce computation engine (and dfs), into a general cluster that supports different types of workloads, that coordinates their resource consumption.
They did that by extracting resource management functionality into a separate component, yarn.

Since then, several data processing engines made their way into hadoop e.g. spark, cloudera impala, tez etc..

In my opinion, the next logical step would be deploying our business applications on yarn, reducing our overall number of machines and taking advantage of yarn’s capabilities to improve our applications durability.

But yarn’s API is complex and fairly low level, which makes the resulting client code very verbose, and that creates a high entry barrier.

Apache twill for the rescue!

Apache twill abstracts away the complexities of using yarn, it exposes a very simple API to configure an application simple api to configure the application e.g. number of instances, allocated memory\cores etc… and a model similar to java’s threading model.

A twill application can have one or more different TwillRunnable implementations, each with a different resource allocation, we can also configure twill to deploy static files with our runnables, such as html\javascript files for web applications.

Take a look at the following twill application

A wrapper around a web application, it starts a web server and serves static files.

public class WebApp extends AbstractTwillRunnable {
    public void run() {
        // start web server
        // configure static resources path

Listens to messages on a queue and acts upon them

public class QueueEndpoint extends AbstractTwillRunnable {
    public void run() {
        // do something

Application specification

public class ExampleTwillApplication implements TwillApplication {
  public TwillSpecification configure() {
  return TwillSpecification.Builder.with()
       .setName("example twill application")
         .add(new WebApp(),
            .setMemory(2, ResourceSpecification.SizeUnit.GIGA)
         .add("public", this.getClass().getClassLoader().getResource("").toURI(), true)
         .add("", this.getClass().getClassLoader().getResource("").toURI(), false)
          .add(new QueueEndpoint(),
               .setMemory(1, ResourceSpecification.SizeUnit.GIGA)

Twill bundles several extra features that helps writing distributed application such as:

  • Discovery – a TwillRunnable instance can announce its presence and be discovered through the TwillController
  • Command messages – messages can be broadcast to all runnables or sent to specific ones.
  • Dynamic resource allocation – resource allocation can be changed using the application’s controller while the application is running.


Apache Twill makes deploying applications on yarn a breeze, if this kind of stuff is on your road map – I recommend that you’ll take a look at this library.
more info on the project’s site: 


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Website Powered by

Up ↑