Deep dive: memory management in Apache Spark

Memory allocation in Spark has three key contention points. This post breaks down the three and describes the progress that was made on each one.

The contention points are:

  1. Contention between memory allocated for execution and for storage (cache)
  2. Contention between tasks running in the same process
  3. Contention between operators executing in the same task

Contention between memory allocated for execution and for storage (cache)

Execution means the memory the running computation itself uses, while storage means the space used to cache RDDs for later reuse.

Static memory allocation – Spark v1.4

Memory was divided statically between execution and storage, and couldn't be shared even when unused. For example, if the memory allocated for storage went unused because little or no data was cached, it still couldn't be used for execution.
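For reference, this static split was driven by configuration. A minimal sketch, assuming the legacy (pre-1.6) settings with what I believe were their documented defaults:

    import org.apache.spark.SparkConf

    // Legacy static split (pre-1.6): each region gets a fixed fraction of
    // the heap, and unused space in one region cannot be borrowed by the other.
    val conf = new SparkConf()
      .setAppName("static-memory-example")
      .set("spark.storage.memoryFraction", "0.6") // fixed share for cached RDDs
      .set("spark.shuffle.memoryFraction", "0.2") // fixed share for execution (shuffle)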

Unified memory space – Spark v1.6

The same memory space is used for both execution and storage, and memory used for execution takes priority over memory used for cache. This means that whenever the two contend for memory, cached data is spilled to disk to make room.

The reason behind this decision is the understanding that spilled execution data will always be read back, while cached data might never be used again.

If your app relies on caching, you can specify an amount of storage memory that will never be evicted. That memory is not pre-allocated.
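This is what spark.memory.storageFraction controls in the unified model. A minimal sketch; the values shown are the defaults, not a recommendation:

    import org.apache.spark.SparkConf

    // Unified model (1.6+): one region shared by execution and storage.
    // spark.memory.storageFraction marks a portion of that region in which
    // cached blocks are protected from eviction; nothing is pre-allocated.
    val conf = new SparkConf()
      .setAppName("unified-memory-example")
      .set("spark.memory.fraction", "0.6")        // share of heap given to the unified region
      .set("spark.memory.storageFraction", "0.5") // portion protected from eviction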

Contention between tasks running in the same process

Static allocation

Each task is pre-allocated a fixed share relative to the number of tasks, i.e. given 4 tasks, each task gets 1/4 of the available memory, even if no other task is running.

Dynamic allocation – since Spark v1.0 (2014)

A task's share of memory is computed relative to the number of running tasks. When a new task starts, tasks that are already running spill to disk to free memory, i.e. a task gets 1/N of the available memory, where N is the number of currently running tasks.

This approach handles stragglers better: the last running task gets all of the available memory and, with it, a chance to finish faster. A common cause of stragglers is skewed partition sizes. (See the sketch below for how the 1/N share plays out.)
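To make the 1/N policy concrete, here is a toy sketch. This is my own simplification, not Spark's actual code; as far as I know, Spark's real execution memory pool also guarantees each task at least 1/(2N) before blocking, and actively asks over-allocated tasks to spill rather than just waiting for free space:

    import scala.collection.mutable

    // Toy model of the dynamic policy: a task may hold at most
    // poolSize / N bytes, where N is the number of currently running tasks,
    // and is expected to spill when it is granted less than it asked for.
    class FairMemoryPool(poolSize: Long) {
      private val taskMemory = mutable.Map.empty[Long, Long]

      def acquire(taskId: Long, bytes: Long): Long = synchronized {
        taskMemory.getOrElseUpdate(taskId, 0L)
        val maxShare = poolSize / taskMemory.size       // the 1/N cap
        val free     = poolSize - taskMemory.values.sum // unclaimed memory
        val used     = taskMemory(taskId)
        val granted  = math.max(0L, math.min(math.min(bytes, maxShare - used), free))
        taskMemory(taskId) = used + granted
        granted // caller spills to disk if granted < bytes
      }

      def release(taskId: Long): Unit = synchronized { taskMemory -= taskId }
    }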

Contention between operators executing in the same task

Some operators, like aggregations, use a lot of memory. If one operator uses up all of the memory available to the task, the next operator won't have any memory to work with and the program will fail.

A simple solution would be to statically allocate memory per operator, but obviously that wouldn't be optimal.

Cooperative spilling – the solution since Spark v1.6

Operators use all the memory they need. Whenever an operator needs extra memory and none is available, it instructs a previous operator to spill to disk and free some up.

This results in just enough data being spilled, and in fairness between operators.
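Here's a rough sketch of the idea, loosely inspired by Spark's internal MemoryConsumer mechanism. The names below are made up for illustration and are not Spark's actual API:

    // Rough sketch of cooperative spilling; names are illustrative only.
    trait SpillableOperator {
      def spillToDisk(): Long // write buffered data out, return bytes freed
    }

    class CooperativeTaskMemory(var free: Long, operators: Seq[SpillableOperator]) {
      // An operator asks for memory; if the pool is exhausted, the other
      // operators are asked to spill, stopping as soon as enough is free.
      def acquire(requester: SpillableOperator, bytes: Long): Boolean = {
        val others = operators.iterator.filter(_ ne requester)
        while (free < bytes && others.hasNext) {
          free += others.next().spillToDisk()
        }
        if (free >= bytes) { free -= bytes; true }
        else false // not enough memory even after everyone spilled
      }
    }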

Summary

Static pre-allocation of resources is the simpler solution, but it creates a situation where some memory goes unutilized just because it was pre-allocated to a task or operator that doesn't use all of it, or might not be running at all.

Instead, the guys at Databricks chose to allocate memory dynamically and, when there is contention, force other consumers (tasks or operators) to spill to disk and free up memory.

This results in better memory utilization, without the developer having to tune each job to get optimal performance.
