Deploying applications on yarn using Apache Twill – introduction

With the introduction of yarn, hadoop had transformed from a pure map reduce computation engine (and dfs), into a general cluster that supports different types of workloads, that coordinates their resource consumption.
They did that by extracting resource management functionality into a separate component, yarn.

Since then, several data processing engines made their way into hadoop e.g. spark, cloudera impala, tez etc..

In my opinion, the next logical step would be deploying our business applications on yarn, reducing our overall number of machines and taking advantage of yarn’s capabilities to improve our applications durability.

But yarn’s API is complex and fairly low level, which makes the resulting client code very verbose, and that creates a high entry barrier.

Apache twill for the rescue!

Apache twill abstracts away the complexities of using yarn, it exposes a very simple API to configure an application simple api to configure the application e.g. number of instances, allocated memory\cores etc… and a model similar to java’s threading model.

A twill application can have one or more different TwillRunnable implementations, each with a different resource allocation, we can also configure twill to deploy static files with our runnables, such as html\javascript files for web applications.

Take a look at the following twill application

A wrapper around a web application, it starts a web server and serves static files.

public class WebApp extends AbstractTwillRunnable {
    @Override
    public void run() {
        // start web server
        // configure static resources path 
    }
}

Listens to messages on a queue and acts upon them

public class QueueEndpoint extends AbstractTwillRunnable {
    @Override
    public void run() {
        // do something
    }
}

Application specification

public class ExampleTwillApplication implements TwillApplication {
@Override
  public TwillSpecification configure() {
  return TwillSpecification.Builder.with()
       .setName("example twill application")
        .withRunnable()
         .add(new WebApp(),
            ResourceSpecification.Builder.with()
            .setVirtualCores(2)
            .setMemory(2, ResourceSpecification.SizeUnit.GIGA)
            .setInstances(2)
              .build())
            .withLocalFiles()
         .add("public", this.getClass().getClassLoader().getResource("static_web_files.zip").toURI(), true)
         .add("some-script.sh", this.getClass().getClassLoader().getResource("upload-files.sh").toURI(), false)
          .apply()
          .add(new QueueEndpoint(),
                ResourceSpecification.Builder.with()
               .setVirtualCores(1)
               .setMemory(1, ResourceSpecification.SizeUnit.GIGA)
               .setInstances(1)
                 .build())
               .noLocalFiles()
               .anyOrder()
               .build();
    }
}

Twill bundles several extra features that helps writing distributed application such as:

  • Discovery – a TwillRunnable instance can announce its presence and be discovered through the TwillController
  • Command messages – messages can be broadcast to all runnables or sent to specific ones.
  • Dynamic resource allocation – resource allocation can be changed using the application’s controller while the application is running.

Summary

Apache Twill makes deploying applications on yarn a breeze, if this kind of stuff is on your road map – I recommend that you’ll take a look at this library.
more info on the project’s site: 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s