Spring XD is built on the Spring ecosystem, including projects such as Spring Integration, Spring Data, and Spring Batch - CSDN Blog

Article link: https://blog.csdn.net/blog_programb/article/details/106174387

Spring XD supports the following Hadoop versions:

Officially supported Hadoop versions

Apache Hadoop 1.0.4: one of the Hadoop versions supported by early Spring XD releases.
Apache Hadoop 1.1.2: also supported by early Spring XD releases.
Apache Hadoop 2.0.5-alpha: an early Hadoop 2.x release, also supported by early Spring XD releases.
Apache Hadoop 2.4.1: supported by Spring for Apache Hadoop 2.1.
Apache Hadoop 2.5.2: supported by Spring for Apache Hadoop 2.1.
Apache Hadoop 2.6.0: the default version in Spring for Apache Hadoop 2.1.
Pivotal HD 1.0: supported by early Spring XD releases.
Pivotal HD 2.1: supported by Spring for Apache Hadoop 2.1.
Cloudera CDH5 5.3.0: supported by Spring for Apache Hadoop 2.1.
Hortonworks HDP 2.2: supported by Spring for Apache Hadoop 2.1.

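Other possibly supported versions

Hadoop 1.2.x: not explicitly listed as officially supported, but given that early Spring XD releases supported the Hadoop 1.x line, it may work after some configuration adjustments.

Note that Spring XD is no longer maintained, so newer Hadoop versions may lack official support. If you need a newer Hadoop version, expect to do additional configuration and testing to confirm compatibility.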

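Spring XD can be integrated with Hadoop in the following steps:

1. Set up the Hadoop environment

Make sure the Hadoop cluster is set up and running normally, and that HDFS is reachable with commands such as hadoop fs -ls.

2. Modify the Spring XD configuration file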

Edit Spring XD's configuration file (config/servers.yml in the Spring XD installation directory) and add the Hadoop settings:

spring:
  hadoop:
    fsUri: hdfs://<HDFS host>:<port>
    resourceManagerHost: <ResourceManager host>
    resourceManagerPort: <ResourceManager port>
    yarnApplicationClasspath: <YARN application classpath>

For example:

spring:
  hadoop:
    fsUri: hdfs://10.10.1.110:8020
    resourceManagerHost: 10.10.1.110
    resourceManagerPort: 8032
    yarnApplicationClasspath: /etc/hadoop/conf
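Before starting Spring XD it can help to sanity-check the fsUri and ResourceManager values configured above. The sketch below is a minimal, hypothetical helper (not part of Spring XD; the function names are illustrative) that parses an hdfs:// URI with Python's urllib.parse and optionally tries a plain TCP connection to the host and port.

```python
import socket
from urllib.parse import urlparse


def parse_fs_uri(fs_uri: str) -> tuple:
    """Split an hdfs://host:port URI into (host, port), rejecting anything else."""
    parsed = urlparse(fs_uri)
    # parsed.port itself raises ValueError for a non-numeric or out-of-range port
    if parsed.scheme != "hdfs" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected hdfs://<host>:<port>, got {fs_uri!r}")
    return parsed.hostname, parsed.port


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `host, port = parse_fs_uri("hdfs://10.10.1.110:8020")` followed by `is_reachable(host, port)` checks the NameNode address, and `is_reachable("10.10.1.110", 8032)` does the same for the ResourceManager. A successful connection only shows the port is open, not that the service behind it is healthy.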

3. Create the HDFS directory and set permissions

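Switch to the Hadoop (HDFS) user and run the following commands to create the directory and set its permissions:

hadoop fs -mkdir /xd
hadoop fs -chmod -R 777 /xd

Mode 777 lets every user read, write, and delete under /xd, which is convenient for testing; on a shared or production cluster, grant write access only to the user that runs the Spring XD containers.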
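HDFS uses a POSIX-style permission model, so the octal mode passed to hadoop fs -chmod decodes the same way as on a local file system. This small sketch uses Python's standard stat.filemode to render a mode as an ls -l style string for a directory; 755 is used only as a more restrictive point of comparison.

```python
import stat


def describe_dir_mode(octal_mode: str) -> str:
    """Render an octal permission string (as given to chmod) in ls -l style for a directory."""
    bits = int(octal_mode, 8)  # raises ValueError for non-octal input such as "789"
    if not 0 <= bits <= 0o777:
        raise ValueError("expected up to three octal digits, e.g. 755")
    # S_IFDIR marks the entry as a directory, which gives the leading "d"
    return stat.filemode(stat.S_IFDIR | bits)
```

`describe_dir_mode("777")` yields `drwxrwxrwx` (owner, group, and others may all read, write, and enter the directory), while `describe_dir_mode("755")` yields `drwxr-xr-x` (only the owner may write).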

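4. Start Spring XD

Start the Spring XD admin and container servers:

bin/xd-admin
bin/xd-container

In distributed mode, both servers also expect a running ZooKeeper ensemble and a message transport such as Redis or RabbitMQ. For a quick single-machine test, bin/xd-singlenode runs the admin and a container in one process.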

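5. Create and deploy a stream

Open the interactive xd-shell, then create and deploy the stream:

stream create --name myhdfsstream1 --definition "time | hdfs" --deploy

Spring XD now writes the data produced by the time source module into HDFS. By default the hdfs sink writes to the /xd/<stream name> directory, here /xd/myhdfsstream1.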
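The stream definition uses a Unix-pipe-like DSL: the first module is a source, the last is a sink, and anything in between is a processor. The hypothetical parser below only illustrates that structure for simple definitions such as time | hdfs, optionally with --key=value module options; it is not Spring XD's actual DSL parser and ignores quoting, named channels, labels, and taps.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    role: str  # "source", "processor" or "sink"
    options: dict = field(default_factory=dict)


def parse_stream(definition: str) -> list:
    """Split a simple pipe-style stream definition into modules (illustrative only)."""
    parts = [part.strip() for part in definition.split("|")]
    if len(parts) < 2 or not all(parts):
        raise ValueError("a stream needs at least a source and a sink")
    modules = []
    for index, part in enumerate(parts):
        name, *raw_options = part.split()
        options = {}
        for raw in raw_options:
            # "--fixedDelay=5" -> key "fixedDelay", value "5"
            key, _, value = raw.removeprefix("--").partition("=")
            options[key] = value
        if index == 0:
            role = "source"
        elif index == len(parts) - 1:
            role = "sink"
        else:
            role = "processor"
        modules.append(Module(name, role, options))
    return modules
```

For example, `[(m.role, m.name) for m in parse_stream("time | hdfs")]` gives `[("source", "time"), ("sink", "hdfs")]`. The sketch needs Python 3.9+ for `str.removeprefix`.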

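6. Verify that data is being written

Use the Hadoop CLI to check whether the data has been written to HDFS:

hadoop fs -ls /xd/myhdfsstream1

While the stream is deployed, the file currently being written may carry a .tmp suffix; it is closed when it rolls over or when the stream is undeployed.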
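If the Hadoop CLI is not installed on the machine you are checking from, the same directory listing can be fetched over the WebHDFS REST API with op=LISTSTATUS. This sketch assumes WebHDFS is enabled and that the NameNode HTTP port is the Hadoop 2.x default 50070 (Hadoop 3.x defaults to 9870); adjust both to your cluster. The helper names are illustrative.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def liststatus_url(namenode_host: str, path: str, port: int = 50070) -> str:
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    return f"http://{namenode_host}:{port}/webhdfs/v1{quote(path)}?op=LISTSTATUS"


def file_names(liststatus_json: str) -> list:
    """Extract plain file names (not subdirectories) from a LISTSTATUS response body."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses if status["type"] == "FILE"]


def list_hdfs_dir(namenode_host: str, path: str) -> list:
    """Query the NameNode over HTTP; requires network access to the cluster."""
    with urlopen(liststatus_url(namenode_host, path), timeout=10) as response:
        return file_names(response.read().decode("utf-8"))
```

With the example cluster above, `list_hdfs_dir("10.10.1.110", "/xd/myhdfsstream1")` would request `http://10.10.1.110:50070/webhdfs/v1/xd/myhdfsstream1?op=LISTSTATUS`.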

With these steps completed, Spring XD is integrated with Hadoop and can ingest, process, and store data.

Spring XD is an open-source, distributed, and extensible system for building data-intensive applications. Here is a detailed introduction:

Overview

Spring XD is designed to simplify the development of data-processing applications by providing a high-level, declarative programming model. It enables developers to quickly assemble and deploy complex data-processing pipelines.

Key Features

Modular Architecture: Spring XD is composed of modules such as sources, processors, and sinks. Sources ingest data from external systems such as files, databases, or message queues. Processors operate on the data, for example by transforming, filtering, or aggregating it. Sinks deliver the processed data to its final destination, such as a database, a file system, or a messaging system. This modular design makes components easy to customize and reuse.
Declarative Programming Model: Spring XD uses a declarative approach to define data-processing pipelines. Developers describe the flow of data and the operations to perform using a simple DSL (domain-specific language). For example, a pipeline that reads data from a file, processes it, and writes it to a database can be defined in a line or two of DSL, reducing the amount of boilerplate code required.
Scalability and High Availability: Spring XD runs as a distributed runtime in which an admin server deploys modules to a set of containers; the nodes are coordinated through Apache ZooKeeper, and modules exchange data over a message bus such as Redis, RabbitMQ, or Kafka. The runtime can also be deployed on Hadoop YARN. A module can be deployed as multiple instances across containers to spread the processing load, and if a container fails, the admin redeploys its modules to other available containers, minimizing downtime.
Integration with Spring Ecosystem: As part of the Spring family of projects, Spring XD integrates seamlessly with other Spring frameworks such as Spring Boot, Spring Data, and Spring Batch. This allows developers to leverage the existing Spring - based infrastructure and development practices, making it easier to build and manage data - intensive applications.
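To make the source/processor/sink split concrete, here is a minimal in-process sketch using Python generators. It only mimics the shape of a stream such as file | filter | transform | sink; real Spring XD modules run in containers and exchange messages over the message bus, and all function names here are illustrative.

```python
from typing import Callable, Iterable, Iterator

Processor = Callable[[Iterator[str]], Iterator[str]]


def lines_source(lines: Iterable[str]) -> Iterator[str]:
    """Source: emit records one at a time (an iterable stands in for a file)."""
    for line in lines:
        yield line.rstrip("\n")


def filter_processor(predicate: Callable[[str], bool]) -> Processor:
    """Processor: keep only records that satisfy the predicate."""
    return lambda records: (r for r in records if predicate(r))


def transform_processor(fn: Callable[[str], str]) -> Processor:
    """Processor: apply a function to every record."""
    return lambda records: (fn(r) for r in records)


def list_sink(target: list) -> Callable[[Iterator[str]], None]:
    """Sink: deliver records to their destination (here, a Python list)."""
    return target.extend


def run_pipeline(source: Iterator[str], processors: list, sink) -> None:
    """Chain source -> processors -> sink; records are pulled through lazily."""
    stream = source
    for process in processors:
        stream = process(stream)
    sink(stream)
```

For instance, running `lines_source(["a\n", "\n", "b\n"])` through `filter_processor(bool)` and `transform_processor(str.upper)` into `list_sink(out)` leaves `out == ["A", "B"]`. Swapping one processor for another does not touch the source or the sink, which is the kind of reuse the modular design aims for.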

Use Cases

Data Ingestion and Transformation: Spring XD is widely used for ingesting data from various sources, performing real-time or batch-based transformations, and loading the processed data into a data warehouse or a database. For example, it can be used to extract data from multiple CSV files, clean and transform the data, and load it into a relational database for further analysis.
Stream Processing: It is well suited for stream-processing applications where data arrives in a continuous stream and needs to be processed in real time. Spring XD can handle high-volume data streams, perform operations like filtering, aggregation, and event detection, and send the results to downstream systems for immediate action or further analysis.
Batch Processing: Spring XD can also be used for traditional batch-processing tasks, such as nightly data-loading jobs or batch-oriented data-transformation tasks. It can manage the execution of batch jobs, handle dependencies between jobs, and provide monitoring and reporting capabilities.
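Handling dependencies between batch jobs comes down to running each job only after the jobs it depends on. The sketch below shows that ordering with Python's graphlib.TopologicalSorter (Python 3.9+); the job names are hypothetical, and Spring XD itself expresses jobs and their orchestration through its own job definitions rather than code like this.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict) -> list:
    """Order jobs so each one runs after all jobs it depends on.

    `dependencies` maps a job name to the set of job names it needs first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


def run_jobs(dependencies: dict, jobs: dict) -> list:
    """Call each job in dependency order and return the order that was used."""
    order = execution_order(dependencies)
    for name in order:
        jobs[name]()
    return order
```

With `{"clean": {"extract"}, "load": {"clean"}, "report": {"load"}}`, the only valid order is extract, clean, load, report. When several jobs are independent, the sorter may return them in any valid order, so real schedulers often run such jobs in parallel.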

Community and Support

Spring XD had an active open-source community, which produced a wealth of documentation, tutorials, and sample code, and whose contributions drove the project's evolution through new features and bug fixes. Pivotal, the company behind Spring XD, also offered commercial support and services for enterprises that needed help deploying and maintaining Spring XD-based applications.

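Spring XD is an open-source distributed data processing platform designed to simplify the development and deployment of big data applications. It provides a set of tools and a programming model for coordinating and managing large-scale data flows and analytics tasks. Its main capabilities and uses are as follows:

Data ingestion and integration

Spring XD can ingest and integrate data from many sources, such as message queues, file systems, and HTTP. It supports multiple data formats (such as JSON and XML) and protocols (such as JMS and AMQP), and ingestion and transfer can be set up through simple configuration.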

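Batch processing

Spring XD provides strong batch processing capabilities for large datasets. It supports high-throughput, high-performance processing and integrates readily with Hadoop and other batch processing frameworks.

Stream processing

Spring XD can process real-time data streams. It supports real-time processing, filtering, transformation, and aggregation of streams, and stream pipelines can be defined through simple configuration. It is also designed for reliability and scalability, so it can handle high-concurrency data streams.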

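Data transformation and cleansing

Spring XD can transform and cleanse data so that formats are consistent and standardized. It supports several approaches, such as field mapping, rule engines, and rule-based filtering.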
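As an illustration of mapping plus rule-based filtering, the sketch below cleans CSV records (trimming whitespace, normalizing case, dropping rows that fail simple rules) and loads the survivors into a relational table, mirroring the CSV-to-database use case described earlier. It uses Python's csv module and an in-memory sqlite3 database as stand-ins; the customers table, its columns, and the cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Optional


def clean_row(row: dict) -> Optional[dict]:
    """Map one raw CSV record to a clean one, or return None to filter it out."""
    name = (row.get("name") or "").strip()
    email = (row.get("email") or "").strip().lower()
    # Rule-based filtering: require a name and something that looks like an email.
    if not name or "@" not in email:
        return None
    return {"name": name.title(), "email": email}


def load_csv(csv_text: str, conn: sqlite3.Connection) -> int:
    """Clean CSV text, insert the surviving rows, and return how many passed cleaning."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (name TEXT NOT NULL, email TEXT UNIQUE)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [cleaned for cleaned in map(clean_row, reader) if cleaned is not None]
    # INSERT OR IGNORE skips rows whose email is already in the table.
    conn.executemany("INSERT OR IGNORE INTO customers VALUES (:name, :email)", rows)
    conn.commit()
    return len(rows)
```

Keeping the per-record rules in `clean_row` separates the transformation logic from the loading step, so the same rules could be reused in a streaming processor or a batch job.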

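Real-time analytics and visualization

Spring XD can analyze data streams in real time, for example by maintaining counters and gauges, and the results can be presented as charts and reports that help users understand and analyze the data.

Machine learning

Spring XD also offers machine-learning support aimed at applying models to data as it flows through streams, for example scoring records with predictive models described in PMML, while model training is typically done in other tools. This can be applied in areas such as recommendation systems, intelligent analytics, and automated decision-making.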

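Spring XD is built on the Spring ecosystem, including projects such as Spring Integration, Spring Data, and Spring Batch. It provides an out-of-the-box executable server, a pluggable module system, a high-level configuration DSL (domain-specific language), and a simple model for distributing data processing instances on or off a Hadoop cluster.

Spring XD's architecture and its high degree of integration are its core strengths. By building on Spring Integration's routing and transformation mechanisms, it can connect a wide range of data sources and destinations. Its modular design also lets developers extend its functionality easily, for example by adding new sources, processors, or sinks (data receivers).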
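Spring Integration's routing follows the content-based router pattern from Enterprise Integration Patterns: each message is sent to a channel chosen from its content. The class below is a minimal Python sketch of that pattern only (Spring Integration configures routers declaratively in Spring, not like this); the header name, the channel names, and the "unrouted" default channel are assumptions for illustration.

```python
from typing import Any, Dict, List


class ContentBasedRouter:
    """Send each message to a channel chosen from one of its header values."""

    def __init__(self, header: str, default_channel: str = "unrouted"):
        self.header = header
        self.default_channel = default_channel
        self.mapping: Dict[Any, str] = {}
        self.channels: Dict[str, List[dict]] = {}

    def when(self, value: Any, channel: str) -> "ContentBasedRouter":
        """Register a rule: messages whose header equals `value` go to `channel`."""
        self.mapping[value] = channel
        return self

    def route(self, message: dict) -> str:
        """Append the message to its channel and return the channel name."""
        value = message.get("headers", {}).get(self.header)
        channel = self.mapping.get(value, self.default_channel)
        self.channels.setdefault(channel, []).append(message)
        return channel
```

For example, `ContentBasedRouter("type").when("order", "orders").when("refund", "refunds")` sends a message with header `type: order` to the orders channel and anything unmatched to the default channel, so new destinations can be added by registering another rule.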

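Although Spring XD is no longer actively maintained by VMware (its successor is Spring Cloud Data Flow), it was once a leading framework for big data processing and is still worth studying in depth.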

Today we are officially kicking off a new initiative called Spring XD whose theme is “tackling Big Data complexity” [1].

The Spring Data team has been incredibly busy over the past few years, not only providing support for NoSQL datastores but also simplifying the development experience with Hadoop. With the creation of the Spring for Apache Hadoop project, we made it easier to get started developing Hadoop applications by providing a rich configuration model and a consistent programming model across Hadoop ecosystem projects such as Hive and Pig. As Spring users would expect, one can:

Configure and run MapReduce jobs as container managed objects.

Use template helper classes for HDFS, HBase, Pig and Hive to remove boilerplate code from your applications.

Spring for Apache Hadoop provides a strong foundation for building Hadoop applications. Spring XD builds upon these foundational assets and further simplifies the process of creating real-world big data solutions. Specifically, Spring XD addresses common big data use-cases such as:

High throughput distributed data ingestion into HDFS from a variety of input sources.
Real-time analytics at ingestion time, e.g. gathering metrics and counting values.
Hadoop workflow management via batch jobs that combine interactions with standard enterprise systems (e.g. RDBMS) as well as Hadoop operations (e.g. MapReduce, HDFS, Pig, Hive or Cascading).
High throughput data export, e.g. from HDFS to a RDBMS or NoSQL database.
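Real-time analytics at ingestion time often means updating simple metrics as messages flow past. Spring XD ships analytics modules such as counter and field-value-counter for this; the sketch below imitates the idea of a field-value counter in plain Python (it is not the Spring XD module) by tallying the values of one JSON field while passing each message through unchanged. The "lang" field in the usage note is hypothetical.

```python
import json
from collections import Counter


class FieldValueCounter:
    """Tally the values of one JSON field while passing messages through unchanged."""

    def __init__(self, field_name: str):
        self.field_name = field_name
        self.counts: Counter = Counter()

    def __call__(self, message: str) -> str:
        record = json.loads(message)
        value = record.get(self.field_name) if isinstance(record, dict) else None
        # Only scalar values are counted; missing fields and nested objects are skipped.
        if isinstance(value, (str, int, float, bool)):
            self.counts[value] += 1
        return message
```

Feeding `'{"lang": "en"}'`, `'{"lang": "fr"}'`, and `'{"lang": "en"}'` through `FieldValueCounter("lang")` leaves `counts.most_common(1) == [("en", 2)]`. Because the counter returns each message untouched, it can observe a stream without changing what downstream modules receive.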

The Spring Data book covers several of these use-cases, and the sample code for that book is available in our GitHub repository. Those examples are built upon Spring Batch and Spring Integration in addition to the Spring for Apache Hadoop project.

When it comes to managing event-driven data ingestion streams, Spring Integration provides a proven model, inspired by the well-established Enterprise Integration Patterns. Likewise, Spring Batch is a powerful solution for managing workflows, with robust support for the most important requirements such as job state management and retry/restart capabilities, and is the basis for JSR-352.
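Job state management and retry/restart, as described for Spring Batch, can be pictured as a chunked loop that records how far it has committed. The sketch below is a simplified Python illustration under that assumption, not Spring Batch's API: a failed chunk is retried a bounded number of times, and if the step still fails, rerunning it with the same saved state resumes after the last committed chunk. StepState and run_chunked_step are hypothetical names.

```python
from typing import Callable, Sequence


class StepState:
    """Stand-in for persisted step state: the number of items already committed."""

    def __init__(self) -> None:
        self.committed = 0


def run_chunked_step(
    items: Sequence,
    write_chunk: Callable[[Sequence], None],
    state: StepState,
    chunk_size: int = 2,
    max_retries: int = 2,
) -> None:
    """Write items in chunks, retrying failed chunks and recording progress."""
    while state.committed < len(items):
        chunk = items[state.committed : state.committed + chunk_size]
        for attempt in range(max_retries + 1):
            try:
                write_chunk(chunk)
                break
            except OSError:
                if attempt == max_retries:
                    raise  # give up; state still points at this chunk
        state.committed += len(chunk)  # "commit" only after a successful write
```

A transient error is absorbed by the retry loop, while a persistent one stops the step with `state.committed` pointing at the failed chunk, so a later rerun with the same state skips the work that already succeeded.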

Extending the frameworks to support Big Data use-cases started with the book examples, but with Spring XD we aim to take that support to another level. First, we will provide a consistent model that spans the four use-case categories listed above. That model will be immediately familiar to those with Spring experience. Second, as Spring XD evolves we will be moving well beyond the API layer to provide an out-of-the-box executable server, a pluggable module system, a simple model for distributing data collection instances on or off the Hadoop cluster, and more.

If this sounds interesting to you, get involved! You can fork the repository and/or monitor JIRA. It’s practically a clean-slate now, but we wanted to make sure that our community members had a chance to get in on the ground floor. As always, we consider the feedback from our broad and passionate community to be our greatest asset. We have been doing a lot of prototyping over the past year, so you will see some code drops soon. Also, we plan to post blogs after each sprint so that you can follow along with the progress. And, if you haven’t yet registered for SpringOne, please do; Spring XD will be featured prominently.

Finally, be sure to sign up for our live streaming event tomorrow (April 24th): Pivotal: A New Platform for a New Era.

[1] XD = eXtreme Data, or ‘x’ as in y = mx + b 😉
