Memory allocation in Spark has three key contention points. This post breaks down each of the three and describes the progress that was made on each one. The contention points are: contention between memory allocated for execution and memory allocated for storage (cache); contention between tasks running in the same process; and contention between operators executing in the same...
Spark application logging
When coding a Spark application, we often want to write application logs to trace or track the application's progress. We would like to benefit from Spark's log4j configuration (log collection, etc.), so naturally we would declare a logger instance at the class level and use it in our closure. Unfortunately, we can't do...
Setting up a central logging infrastructure for Hadoop and Spark
Logs are critical for troubleshooting, but when an application is distributed across multiple machines, things get complicated. Things get even more complicated when your application uses third-party APIs and the answer you are looking for is hiding in the logs of one of those other systems (which are distributed as well). You end up going through lots...