Big Data Real-Time Stream Processing and Analysis: Processing Big Data as Streams
Reading: The Big Data Era. 5 early observations on stream processing:
The article suggests adopting the right solution, Flink, for big data processing. Flink is interesting and is built for stream processing.
The broader view and takeaway may be to solve problems using the right solution. We have seen many painful attempts, historically and still in current practice: processing huge, large-scale data in traditional databases, processing unstructured data in relational databases, doing graph processing the table way, doing stream processing the micro-batch way, and so on. A specific problem should be handled by a solution built for that problem, and that solution can be the most efficient and convenient one.
Some good examples and points from the article.
“In reality, however, processing data with as low latency as possible has been a challenge for a long time….a customer asked me how to produce an up-to-date aggregation over a tumbling five-minute window of a growing table using Hive.”
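To make the Hive question concrete: in a streaming model, an up-to-date tumbling-window aggregate is maintained incrementally as each event arrives, instead of re-querying a growing table. The sketch below is plain Python, not Flink's API; it assumes events arrive as `(timestamp_seconds, value)` pairs and that the aggregate is a sum, and the names `tumbling_sums` and `window_start` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

WINDOW_SECONDS = 5 * 60  # tumbling five-minute windows


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Align an event timestamp (in seconds) to the start of its window."""
    return ts - (ts % size)


def tumbling_sums(events: Iterable[Tuple[int, float]]) -> Iterator[Tuple[int, float]]:
    """Yield (window_start, running_sum) after every incoming event.

    Each event updates only its own window's total, so the aggregate is
    current as soon as the event arrives; nothing is recomputed over the
    whole, ever-growing history.
    """
    sums: Dict[int, float] = defaultdict(float)
    for ts, value in events:
        start = window_start(ts)
        sums[start] += value
        yield start, sums[start]


if __name__ == "__main__":
    events = [(0, 1.0), (120, 2.0), (299, 3.0), (300, 4.0), (610, 5.0)]
    for start, total in tumbling_sums(events):
        print(f"window starting at {start:>3} s: sum = {total}")
```

Events at 0 s, 120 s and 299 s all land in the window starting at 0 s, while the event at 610 s opens the window starting at 600 s. A real engine would also close windows based on event-time watermarks and discard their state; this sketch keeps every window open for simplicity.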
“the customer and business users really need: a representation of data as a stream and the ability to do in-stream complex/stateful analytics.”
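In-stream stateful analytics means the operator keeps state between events and updates it as each event arrives, rather than waiting for a batch job. A minimal sketch, assuming a per-key running mean as the analytic; `KeyState`, `RunningMeanOperator` and `run` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class KeyState:
    """State kept per key between events."""
    count: int = 0
    total: float = 0.0


class RunningMeanOperator:
    """A keyed, stateful operator that emits an updated mean per event."""

    def __init__(self) -> None:
        self.state: Dict[str, KeyState] = {}

    def process(self, key: str, value: float) -> Tuple[str, float]:
        # Fetch (or create) this key's state and update it in place.
        st = self.state.setdefault(key, KeyState())
        st.count += 1
        st.total += value
        return key, st.total / st.count


def run(events: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Push each (key, value) event through one operator instance."""
    op = RunningMeanOperator()
    for key, value in events:
        yield op.process(key, value)


if __name__ == "__main__":
    print(list(run([("a", 2.0), ("b", 10.0), ("a", 4.0)])))
    # [('a', 2.0), ('b', 10.0), ('a', 3.0)]
```

Partitioning state by key is what allows a distributed engine to spread it across parallel workers; here a single dictionary stands in for that.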
“Customers and end-users wrangle with the latency gap in all kinds of interesting and expensive ways.”
“it’s refreshing to be given constructs of stream, state, time and snapshots as the building blocks of event processing rather than incomplete concepts of keys, values, and execution phases.”
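Of the building blocks named in that quote, state and snapshots together are what make recovery cheap: a snapshot records operator state along with the input position it reflects, so after a failure the state can be restored and only later events replayed. The sketch below is a single-operator, synchronous illustration that assumes a replayable input list; `CountingOperator` and `replay` are hypothetical names. Flink's actual checkpointing takes distributed, asynchronous snapshots and is considerably more involved.

```python
import copy
from typing import Dict, List, Tuple

Snapshot = Tuple[int, Dict[str, int]]  # (input offset, per-key counts)


class CountingOperator:
    """Counts events per key and can snapshot and restore its state."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.offset = 0  # index of the next input event to process

    def process(self, key: str) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def snapshot(self) -> Snapshot:
        # Copy the state so later updates cannot alter the saved snapshot.
        return self.offset, copy.deepcopy(self.counts)

    @classmethod
    def restore(cls, snap: Snapshot) -> "CountingOperator":
        op = cls()
        op.offset, op.counts = snap[0], copy.deepcopy(snap[1])
        return op


def replay(op: CountingOperator, stream: List[str]) -> CountingOperator:
    """Feed the operator every event at or after its recorded offset."""
    for key in stream[op.offset:]:
        op.process(key)
    return op


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a"]
    op = CountingOperator()
    for key in stream[:3]:
        op.process(key)
    snap = op.snapshot()   # (3, {"a": 2, "b": 1})
    op.process(stream[3])  # more work happens, then the process "fails"
    recovered = replay(CountingOperator.restore(snap), stream)
    print(recovered.counts)  # {'a': 3, 'b': 1, 'c': 1}: "c" is counted once
```

Because the offset is saved atomically with the counts, the event processed after the snapshot is neither lost nor double-counted on recovery.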
“The first approach is to use batch as a starting point, then try to build streaming on top of batch. This likely won’t meet strict latency requirements, though, because micro-batching to simulate streaming requires some fixed overhead; hence the proportion of the overhead increases as you try to reduce latency.”
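The overhead argument can be put in numbers. Assume every micro-batch pays a fixed cost o (scheduling, task launch and similar) and the batch interval is b; then the share of each interval spent on overhead is roughly o / b, which grows as b shrinks toward o. This is a deliberately simple model, and the 50 ms overhead below is a hypothetical figure for illustration, not a measurement of any system.

```python
def overhead_fraction(batch_interval_ms: float, fixed_overhead_ms: float) -> float:
    """Approximate share of a micro-batch interval consumed by fixed overhead.

    Simple model: each batch pays a constant overhead regardless of size,
    so the share is overhead / interval, capped at 1.0 (the interval is
    then entirely overhead and the job cannot keep up).
    """
    if batch_interval_ms <= 0:
        raise ValueError("batch interval must be positive")
    return min(1.0, fixed_overhead_ms / batch_interval_ms)


if __name__ == "__main__":
    HYPOTHETICAL_OVERHEAD_MS = 50.0  # illustrative value, not a measurement
    for interval in (1000.0, 500.0, 200.0, 100.0, 50.0):
        share = overhead_fraction(interval, HYPOTHETICAL_OVERHEAD_MS)
        print(f"interval {interval:>6.0f} ms -> overhead share {share:.0%}")
```

Under these assumptions, cutting the interval from 200 ms to 100 ms doubles the overhead share from 25% to 50%, which is the effect the quote describes; an event-at-a-time engine largely avoids paying this per-batch cost.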
“However, we asked ourselves: if the data is being generated in real time, why must it not be processed downstream in real time?”
“requirements around low latency processing and complex analysis cannot be met in an inexpensive, scalable and fault-tolerant way.”