Follow the Apache Flink® Community for improving Flink's Runtime on Various Cluster Managers
Imagine you're organizing a large party. You need to arrange staff, allocate space, and coordinate activities in different areas. Some staff might only need to work during specific time slots, and some areas might need more or fewer people. If you can manage these resources flexibly, the party will run more smoothly.
Flink faces similar challenges when running on clusters. As a distributed system, it needs to run efficiently on different cluster management systems like YARN, Mesos, and Kubernetes. FLIP-6 was proposed to address this challenge, making Flink better at utilizing features of these cluster management systems.
Let's first look at the changes this improvement brings:
Before FLIP-6, Flink had some frustrating issues when running on clusters:
Like a restaurant that has to reserve every table at booking time, a Flink job had to request all of its resources at startup. This caused two problems: the job could reserve more resources than it used, which wasted them, and it was difficult to add resources later if they were needed.
Imagine if all restaurant tables had to be four-seaters: inconvenient for both couples and groups of six. Flink faced a similar issue on YARN, where all containers had to be the same size and could not be adjusted to actual needs.
Deploying Flink jobs in Docker or Kubernetes environments required first starting the framework and then submitting the job, like having to build the stage before starting the show. This two-step process felt unnatural.
Let's understand the new architecture introduced by FLIP-6 through a simple diagram:
The most important change in FLIP-6 is making each JobManager responsible for just one job. It's like assigning a dedicated stage manager to each performance, rather than having one manager handle multiple shows. This brings several benefits:
The ResourceManager became a smarter resource manager that can:
Improvements on YARN are particularly noticeable:
Main improvements:
In containerized environments:
In Mesos environments:
Flink can now dynamically request and release resources based on what a job actually needs. Like a restaurant adjusting seating based on customer numbers, this improves resource utilization.
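To make the effect of dynamic allocation concrete, here is a minimal sketch of a toy resource manager that grows and shrinks its pool of containers as demand changes. It is not Flink's API: the class `ToyResourceManager`, its `scale_to` method, the two-slots-per-container setting, and the demand figures are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Toy model of a resource manager holding containers (not Flink's API)."""
    slots_per_container: int = 2
    containers: int = 0
    log: list = field(default_factory=list)

    def scale_to(self, required_slots: int) -> int:
        """Request or release containers so capacity just covers required_slots."""
        # Ceiling division: the smallest number of containers that fits the demand.
        needed = -(-required_slots // self.slots_per_container)
        if needed > self.containers:
            self.log.append(f"request {needed - self.containers}")
        elif needed < self.containers:
            self.log.append(f"release {self.containers - needed}")
        self.containers = needed
        return self.containers


# Hypothetical slot demand of one job at five points in time.
demand = [2, 6, 6, 3, 0]

rm = ToyResourceManager()
held = [rm.scale_to(slots) for slots in demand]

# Static allocation would hold the peak number of containers for the whole run.
static_container_steps = max(held) * len(demand)
# Dynamic allocation only holds what each step needs.
dynamic_container_steps = sum(held)

print(held, rm.log)
print(static_container_steps, dynamic_container_steps)
```

In this made-up scenario, reserving the peak up front holds 15 container-steps, while requesting and releasing on demand holds 9.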
Flink now supports containers of different specifications within the same job. For example, compute-intensive operations can use more CPU resources, while memory-intensive operations can use larger memory configurations.
Especially in container environments, Flink jobs can now be deployed like regular containerized applications, eliminating the two-step process.
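The difference between uniform and per-operation container sizes can be illustrated with a small calculation. This is a sketch under stated assumptions: the `Profile` class, the operator names, and their CPU and memory figures are hypothetical and do not come from Flink or from FLIP-6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Resources requested for one container (illustrative, not a Flink class)."""
    cpu_cores: int
    memory_gb: int


# Hypothetical per-operator needs within a single job.
needs = {
    "parse": Profile(cpu_cores=4, memory_gb=2),   # compute-intensive
    "join": Profile(cpu_cores=1, memory_gb=16),   # memory-intensive
    "sink": Profile(cpu_cores=1, memory_gb=2),
}

# Uniform sizing: every container must be large enough for the most
# demanding operator in each dimension.
uniform = Profile(
    cpu_cores=max(p.cpu_cores for p in needs.values()),
    memory_gb=max(p.memory_gb for p in needs.values()),
)
uniform_total = Profile(uniform.cpu_cores * len(needs), uniform.memory_gb * len(needs))

# Heterogeneous sizing: each container is requested with its operator's own profile.
fitted_total = Profile(
    cpu_cores=sum(p.cpu_cores for p in needs.values()),
    memory_gb=sum(p.memory_gb for p in needs.values()),
)

print("uniform:", uniform_total)
print("fitted: ", fitted_total)
```

With these example numbers, three identical containers reserve 12 cores and 48 GB, whereas containers sized per operator reserve 6 cores and 20 GB for the same work.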
Choose the appropriate deployment mode based on your specific needs:
FLIP-6's improvements make Flink run more efficiently and flexibly on various cluster management systems. It's like equipping Flink with an intelligent resource scheduling system that can dynamically adjust resources based on actual needs, making job execution smoother.
This improvement also lays a solid foundation for Flink's future development. As cloud-native technologies evolve, containerization and dynamic resource management become increasingly important, and FLIP-6's improvements enable Flink to better adapt to these technology trends.
Notably, this FLIP has been completed and was released in Flink 1.5. Its implementation marks a significant step forward in Flink's resource management and deployment capabilities, and gives users a better experience.