Flink State Snapshots

Flink: Keyed State

tags: Flink State Snapshots
source: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/concepts/stateful-stream-processing/#keyed-state
Keyed state is maintained in what can be thought of as an embedded key/value store.
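To make the "embedded key/value store" picture concrete, here is a minimal sketch in plain Python. It does not use Flink's API; the class `KeyedCounter` and its `process` method are hypothetical names chosen for illustration. The point it tries to show is that each event only touches the state slot belonging to its own key.

```python
from collections import defaultdict


class KeyedCounter:
    """Toy stand-in for an operator with keyed state (not Flink's API)."""

    def __init__(self):
        # One state slot per key, like a key/value store local to the operator.
        self._state = defaultdict(int)

    def process(self, key, value):
        # Only the state for the current event's key is read and updated.
        self._state[key] += value
        return key, self._state[key]


counter = KeyedCounter()
results = [counter.process(k, v) for k, v in [("a", 1), ("b", 5), ("a", 2)]]
print(results)
```

The running total for key "a" is unaffected by the event for key "b", which is the isolation that keyed state is meant to provide.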

Flink: Exactly Once Guarantees

tags: Flink State Snapshots, Fault Tolerance via State Snapshots
source: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/learn-flink/fault%5Ftolerance/#exactly-once-guarantees
Depending on the choices you make, any of these outcomes is possible when a failure occurs:
- Flink makes no effort to recover from failures (at most once).
- Nothing is lost, but you may experience duplicated results (at least once).
- Nothing is lost or duplicated (exactly once).
Given that Flink recovers from faults by rewinding and replaying the source data streams, describing the ideal situation as "exactly once" does not mean that every event will be processed exactly once. Instead, it means that every event will affect the state being managed by Flink exactly once. ...
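The difference between "processed exactly once" and "affects state exactly once" can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not Flink code: a single source, a single running-sum operator, one checkpoint, and one injected failure. The function `run` and its parameters `checkpoint_at` and `fail_at` are hypothetical.

```python
events = [3, 1, 4, 1, 5, 9]


def run(events, checkpoint_at, fail_at):
    """Sum the events, with one checkpoint and one simulated failure."""
    state, offset = 0, 0
    processed = 0        # number of processing steps, replays included
    checkpoint = (0, 0)  # (source offset, operator state), saved together
    failed = False
    while offset < len(events):
        if offset == checkpoint_at:
            checkpoint = (offset, state)
        if offset == fail_at and not failed:
            failed = True
            # Recovery: rewind the source and restore the matching state.
            offset, state = checkpoint
            continue
        state += events[offset]
        offset += 1
        processed += 1
    return state, processed


final_state, processed = run(events, checkpoint_at=2, fail_at=4)
print(final_state, processed)
```

Two events are processed twice because of the replay, yet the final sum equals the sum of the input. That works only because the source offset and the operator state are restored from the same snapshot.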

Flink: How does State Snapshotting Work?

tags: Fault Tolerance via State Snapshots, Flink State Snapshots, Wikipedia: Chandy–Lamport algorithm
source: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/learn-flink/fault%5Ftolerance/#how-does-state-snapshotting-work
Workflow:
1. The checkpoint coordinator (part of the job manager) instructs a task manager to begin a checkpoint.
2. The task manager has all of the sources record their offsets and insert numbered checkpoint barriers into their streams.
3. The checkpoint barriers flow through the job graph, indicating the part of the stream before and after each checkpoint.
Checkpoint n will contain the state of each operator that resulted from having consumed every event before checkpoint barrier n, and none of the events after it. ...
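The barrier idea can be sketched with a single stream and a single operator. This is a simplified illustration in plain Python, not Flink's implementation: it leaves out the checkpoint coordinator, source offset recording, and barrier alignment across multiple inputs. `Barrier`, `source`, and `counting_operator` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Barrier:
    checkpoint_id: int


def source(events, barrier_every):
    """Yield events, inserting a numbered barrier after every `barrier_every` events."""
    checkpoint_id = 0
    for i, event in enumerate(events, start=1):
        yield event
        if i % barrier_every == 0:
            checkpoint_id += 1
            yield Barrier(checkpoint_id)


def counting_operator(stream):
    """Count events; on barrier n, snapshot the count as checkpoint n."""
    count = 0
    snapshots = {}
    for item in stream:
        if isinstance(item, Barrier):
            # State here reflects every event before the barrier and none after.
            snapshots[item.checkpoint_id] = count
        else:
            count += 1
    return snapshots


snapshots = counting_operator(source(list("abcdefg"), barrier_every=3))
print(snapshots)
```

Checkpoint 1 records a count of 3 and checkpoint 2 a count of 6; the seventh event arrives after the last barrier and so belongs to no checkpoint yet.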

Flink Checkpoint

tags: Flink State Snapshots, Fault Tolerance via State Snapshots
A snapshot taken automatically by Flink for the purpose of being able to recover from faults. Checkpoints can be incremental, and are optimized for being restored quickly.

Flink Savepoint

tags: Flink State Snapshots
A snapshot triggered manually by a user (or an API call) for some operational purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints are always complete, and are optimized for operational flexibility.

Flink Checkpoint Storage

tags: Flink State Snapshots, Fault Tolerance via State Snapshots, Flink Checkpoint
source: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/learn-flink/fault%5Ftolerance/#checkpoint-storage
Flink periodically takes persistent snapshots of all the state in every operator and copies these snapshots somewhere more durable, such as a distributed file system. In the event of a failure, Flink can restore the complete state of your application and resume processing as though nothing had gone wrong.
Two implementations of checkpoint storage are available:
- A distributed file system.
- The JobManager’s heap.

State Backends

tags: Flink State Snapshots, Fault Tolerance via State Snapshots, Stateful Stream Processing
Two implementations of state backends are available:
- RocksDB: an embedded key/value store that keeps its working state on disk. Overhead: accesses and updates involve serialization and deserialization.
- Java heap-based state backend: keeps its working state in memory, on the Java heap. Risk: a large amount of state can cause an out-of-memory error (OOM).
Conclusion: both of these state backends are able to do asynchronous snapshotting, meaning that they can take a snapshot without impeding the ongoing stream processing. ...
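The serialization overhead mentioned above can be illustrated with two toy stores. This is a rough analogy in plain Python, not how either Flink backend is implemented: `HeapStore` keeps live objects in a dict, while `SerializedStore` uses `pickle` on every access as a stand-in for a store that only holds bytes. Both class names are hypothetical.

```python
import pickle


class HeapStore:
    """Keeps live objects in memory; reads return the stored object itself."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class SerializedStore:
    """Keeps only bytes; every put serializes and every get deserializes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[pickle.dumps(key)] = pickle.dumps(value)

    def get(self, key):
        return pickle.loads(self._data[pickle.dumps(key)])


value = [1, 2, 3]
heap, serialized = HeapStore(), SerializedStore()
heap.put("k", value)
serialized.put("k", value)

same_object = heap.get("k") is value        # no copy was made
copied_object = serialized.get("k")         # rebuilt from bytes on each read
print(same_object, copied_object == value, copied_object is value)
```

The heap-style store hands back the very same object, which is cheap but ties capacity to available memory. The serialized store pays a conversion cost on each access, in exchange for state that no longer has to live as objects on the heap.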

Fault Tolerance via State Snapshots

tags: Flink State Snapshots, Stateful Stream Processing
source: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/learn-flink/fault%5Ftolerance/

Zhihu: Flink Real-Time Computing - An In-Depth Understanding of Checkpoint and Savepoint

tags: Flink, Flink State Snapshots, Flink Checkpoint, Flink Savepoint
source: https://zhuanlan.zhihu.com/p/79526638
