This blog post explores the unexpected performance impact of using coalesce(1) in Apache Spark, a distributed computing framework. I discovered that while coalesce avoids a costly shuffle, it can unintentionally sacrifice parallelism in some cases, leading to a significant slowdown at runtime.
A few days ago, I wanted to preview a massive 1.1 GB file stored in an S3 bucket. Downloading it would have taken a long time, so I asked a senior colleague for help. He told me to run the usual AWS CLI copy command and pipe its output into the Unix tool head. I had the results within seconds, but the underlying process was not intuitive to me. Trying to uncover this magic ...
In this blog post, I'll walk you through the challenges, strategies, and insights I gained when I competed in the ACM-ICPC regionals, shedding light on the world of algorithmic problem-solving for college students. Join me as I recount our adventure, from the anxious prelude to the epic contest itself, and explore the valuable lessons we learned along the way.
In this blog post, I'll share my first-hand experience with GitHub Actions and show how it streamlined my development workflow. I'll walk you through its core concepts, how it fits into a project's structure, and how it solved a critical problem I was facing.