实时数据处理:Apache Kafka与Flink实战 实时数据处理Apache Kafka与Flink实战大家好我是欧阳瑞Rich Own。今天想和大家聊聊实时数据处理这个重要话题。作为一个全栈开发者实时数据处理已经成为现代应用的核心能力。今天就来分享一下Apache Kafka和Flink的实战经验。实时数据处理概述应用场景场景说明实时监控实时日志分析、监控告警实时推荐个性化推荐系统实时计算实时统计、实时报表实时风控欺诈检测、异常识别技术选型消息队列 → Kafka/RabbitMQ 实时计算 → Flink/Spark Streaming 消息存储 → Kafka/PulsarApache Kafka核心概念概念说明Topic消息主题Partition分区Producer生产者Consumer消费者Consumer Group消费者组生产者配置const { Kafka } require(kafkajs); const kafka new Kafka({ clientId: my-app, brokers: [localhost:9092] }); const producer kafka.producer(); async function produce() { await producer.connect(); await producer.send({ topic: user-events, messages: [ { value: JSON.stringify({ userId: 1, event: login }) }, { value: JSON.stringify({ userId: 2, event: purchase }) } ] }); await producer.disconnect(); }消费者配置const consumer kafka.consumer({ groupId: my-group }); async function consume() { await consumer.connect(); await consumer.subscribe({ topic: user-events, fromBeginning: true }); await consumer.run({ eachMessage: async ({ topic, partition, message }) { console.log({ value: message.value.toString() }); } }); }Apache Flink核心概念概念说明DataStream数据流Window窗口Operator算子State状态Flink作业示例import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; import org.apache.flink.streaming.api.datastream.DataStream; import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer; public class KafkaFlinkJob { public static void main(String[] args) throws Exception { StreamExecutionEnvironment env StreamExecutionEnvironment.getExecutionEnvironment(); DataStreamString stream env.addSource(new FlinkKafkaConsumer( user-events, new SimpleStringSchema(), properties )); stream .map(json - { UserEvent event parseJson(json); return event; }) .keyBy(event - event.userId) .window(TumblingEventTimeWindows.of(Time.minutes(5))) .count() .print(); env.execute(Kafka Flink Job); } }实时计算案例// 使用flink-streaming-java的JavaScript API const { StreamExecutionEnvironment } require(flink-streaming-java); const env StreamExecutionEnvironment.getExecutionEnvironment(); env.fromCollection([1, 2, 3, 4, 5]) .map(x x * 2) .filter(x x 5) .print(); env.execute(Simple Job);实战案例实时用户行为分析Kafka → Flink → Redis → Dashboard 1. 用户行为数据写入Kafka 2. Flink消费Kafka计算实时统计 3. 将结果写入Redis 4. Dashboard从Redis读取数据展示// Flink处理逻辑 const stream env.addSource(kafkaConsumer); stream .map(record JSON.parse(record.value())) .keyBy(record record.userId) .window(SlidingEventTimeWindows.of(Time.seconds(30), Time.seconds(10))) .aggregate( () ({ count: 0, events: [] }), (acc, record) { acc.count; acc.events.push(record); return acc; }, (key, window, aggregates) { return { userId: key, count: aggregates.count }; } ) .addSink(redisSink);总结实时数据处理是现代应用的核心能力。通过Kafka和Flink的组合可以构建高性能的实时数据处理系统。我的鬃狮蜥Hash对实时处理也有自己的理解——它总是实时监控周围环境捕捉任何移动的蟋蟀这也许就是自然界的实时数据处理吧如果你对实时数据处理有任何问题欢迎留言交流我是欧阳瑞极客之路永无止境技术栈Apache Kafka · Apache Flink · 实时计算