
  • Blog posts (27)
  • Favorites
  • Following

[Repost] Spark: The Definitive Guide, Chapter 14 Notes

In addition to the Resilient Distributed Dataset (RDD) interface, the second kind of low-level API in Spark is two types of “distributed shared variables”: broadcast variables and accumulators. T...

2019-03-04 10:36:00 · 315 views
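The Chapter 14 notes introduce broadcast variables and accumulators. A minimal sketch of both, assuming Spark 3.x (spark-core/spark-sql) on the classpath and local mode; the lookup table and sentence are hypothetical example data, not taken from the post. The accumulator is updated inside an action (`foreach`), because Spark guarantees each task's update is applied once only for actions, not for transformations that may be re-executed.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hypothetical data, local mode, Spark 3.x assumed.
object SharedVariablesSketch {
  def run(): (Seq[Int], Long) = {
    val spark = SparkSession.builder().master("local[*]").appName("shared-vars").getOrCreate()
    val sc = spark.sparkContext

    // Broadcast variable: a read-only value shipped once per executor instead of with every task.
    val lookup = Map("Spark" -> 1000, "Definitive" -> 200, "Big" -> -300, "Simple" -> 100)
    val lookupBc = sc.broadcast(lookup)

    // Accumulator: tasks can only add to it; the driver reads the merged total.
    val missing = sc.longAccumulator("missingWords")

    val words = sc.parallelize(
      "Spark The Definitive Guide : Big Data Processing Made Simple".split(" ").toSeq, 2)

    // Transformation that reads the broadcast value inside each task.
    val scores = words.map(w => lookupBc.value.getOrElse(w, 0)).collect().toSeq

    // Accumulator updated inside an action, so each task's contribution is counted once.
    words.foreach(w => if (!lookupBc.value.contains(w)) missing.add(1))

    val missingCount: Long = missing.value.longValue()
    spark.stop()
    (scores, missingCount)
  }

  def main(args: Array[String]): Unit = {
    val (scores, missingCount) = run()
    println(scores.mkString(", "))
    println(s"words not in lookup: $missingCount")
  }
}
```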

[Repost] Spark: The Definitive Guide, Chapter 13 Notes

This chapter covers the advanced RDD operations and focuses on key–value RDDs, a powerful abstraction for manipulating data. We also touch on some more advanced topics like custom partitioning, a...

2019-03-04 10:03:00 · 266 views
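As a small illustration of the key–value RDDs covered in Chapter 13, here is a sketch that builds pairs with `keyBy` and aggregates them with `reduceByKey`. It assumes Spark 3.x in local mode; the word list is hypothetical. `reduceByKey` combines values within each partition before shuffling, which is why it is usually preferred over `groupByKey` followed by a map when only an aggregate is needed.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hypothetical words, local mode, Spark 3.x assumed.
object KeyValueSketch {
  def run(): Map[Char, Int] = {
    val spark = SparkSession.builder().master("local[*]").appName("kv-rdd").getOrCreate()
    val sc = spark.sparkContext

    val words = sc.parallelize(Seq("spark", "scala", "hive", "hadoop", "sql"), 2)
    val counts = words
      .keyBy(_.head)        // RDD[(Char, String)]: key = first letter
      .mapValues(_ => 1)    // keep the keys, replace each value with 1
      .reduceByKey(_ + _)   // sum per key, combining inside partitions before the shuffle

    val result = counts.collectAsMap().toMap
    spark.stop()
    result
  }
}
```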

[Repost] Spark: The Definitive Guide, Chapter 12 Notes

What Are the Low-Level APIs? There are two sets of low-level APIs: there is one for manipulating distributed data (RDDs), and another for distributing and manipulating distributed shared variable...

2019-02-28 11:24:00 · 472 views
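For the first set of low-level APIs (RDDs) described in Chapter 12, a minimal sketch of creating an RDD from a local collection, running a lazy transformation plus an action, and converting between an RDD and a DataFrame. It assumes Spark 3.x in local mode; the numbers are arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: local mode, Spark 3.x assumed.
object LowLevelApiSketch {
  def run(): (Long, Seq[Int]) = {
    val spark = SparkSession.builder().master("local[*]").appName("low-level").getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext                 // entry point to the RDD API

    val nums = sc.parallelize(1 to 10, 2)       // RDD[Int] from a local collection
    val evens = nums.filter(_ % 2 == 0)         // lazy transformation
    val n = evens.count()                       // action: triggers execution

    val df = evens.toDF("n")                    // RDD[Int] -> DataFrame (needs spark.implicits._)
    val back = df.rdd.map(_.getInt(0)).collect().toSeq  // DataFrame -> RDD[Row] -> Int values

    spark.stop()
    (n, back)
  }
}
```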

[Repost] Spark: The Definitive Guide, Chapter 11 Notes

Datasets are a strictly Java Virtual Machine (JVM) language feature that work only with Scala and Java. Using Datasets, you can define the object that each row in your Dataset will consist of. In...

2019-02-23 14:51:00 · 366 views
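The Chapter 11 excerpt describes defining the object each Dataset row consists of. A minimal sketch using a Scala case class, assuming Spark 3.x in local mode; the `Flight` class and its rows are hypothetical sample data modeled on flight-summary columns. The case class is declared at the top level so Spark can derive an encoder for it.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical row type; top-level so Spark's implicits can derive an Encoder for it.
case class Flight(DEST_COUNTRY_NAME: String, ORIGIN_COUNTRY_NAME: String, count: Long)

object DatasetSketch {
  def run(): Seq[String] = {
    val spark = SparkSession.builder().master("local[*]").appName("datasets").getOrCreate()
    import spark.implicits._

    val flights: Dataset[Flight] = Seq(
      Flight("United States", "Romania", 15L),
      Flight("United States", "Croatia", 1L),
      Flight("Egypt", "United States", 15L)).toDS()

    // Typed operations: the lambdas receive Flight objects, so field access is compile-time checked.
    val routes = flights
      .filter(f => f.ORIGIN_COUNTRY_NAME != "United States")
      .map(f => s"${f.ORIGIN_COUNTRY_NAME}->${f.DEST_COUNTRY_NAME}")
      .collect()
      .toSeq

    spark.stop()
    routes
  }
}
```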

[Repost] Spark: The Definitive Guide, Chapter 10 Notes

What Is SQL? · Big Data and SQL: Apache Hive · Big Data and SQL: Spark SQL · The power of Spark SQL derives from several key facts: SQL analysts can now take advantage of Spark’s computation abilities ...

2019-02-23 11:05:00 · 307 views
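To make the Chapter 10 point concrete (SQL analysts running queries on Spark's engine), a minimal sketch that registers a DataFrame as a temporary view and queries it with `spark.sql`. It assumes Spark 3.x in local mode; the `flights` rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hypothetical data, local mode, Spark 3.x assumed.
object SparkSqlSketch {
  def run(): Seq[(String, Long)] = {
    val spark = SparkSession.builder().master("local[*]").appName("spark-sql").getOrCreate()
    import spark.implicits._

    val flights = Seq(
      ("United States", "Romania", 15L),
      ("United States", "Croatia", 1L),
      ("Egypt", "United States", 15L)
    ).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

    // Expose the DataFrame to SQL under a session-scoped name.
    flights.createOrReplaceTempView("flights")

    val rows = spark.sql(
      """SELECT DEST_COUNTRY_NAME, sum(count) AS total
        |FROM flights
        |GROUP BY DEST_COUNTRY_NAME
        |ORDER BY total DESC, DEST_COUNTRY_NAME""".stripMargin).collect()

    val result = rows.map(r => (r.getString(0), r.getLong(1))).toSeq
    spark.stop()
    result
  }
}
```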

[Repost] Spark: The Definitive Guide, Chapter 9 Notes

Spark core data sources: CSV, JSON, Parquet, ORC, JDBC/ODBC connections, plain-text files. The Structure of the Data Sources API · Read API Structure · The core structure for reading data is as follows: Data...

2019-02-23 09:58:00 · 337 views
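The Chapter 9 read API follows the pattern `format(...).option(...).schema(...).load(...)` on `DataFrameReader`. A minimal sketch of that chain for CSV, assuming Spark 3.x in local mode; the CSV content is hypothetical and written to a temporary file so the example is self-contained.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Sketch only: hypothetical CSV data, local mode, Spark 3.x assumed.
object ReadApiSketch {
  def run(): (Long, Long) = {
    val spark = SparkSession.builder().master("local[*]").appName("read-api").getOrCreate()

    val path = Files.createTempFile("flights", ".csv")
    val csv = "DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count\nUnited States,Romania,15\nEgypt,United States,15\n"
    Files.write(path, csv.getBytes(StandardCharsets.UTF_8))

    // Explicit schema instead of inferring it at read time.
    val schema = StructType(Seq(
      StructField("DEST_COUNTRY_NAME", StringType, nullable = true),
      StructField("ORIGIN_COUNTRY_NAME", StringType, nullable = true),
      StructField("count", LongType, nullable = true)))

    val df = spark.read
      .format("csv")
      .option("header", "true")     // first line holds column names
      .option("mode", "FAILFAST")   // fail immediately on malformed records
      .schema(schema)
      .load(path.toString)

    val rows = df.count()
    val total = df.selectExpr("sum(count)").first().getLong(0)
    spark.stop()
    Files.deleteIfExists(path)
    (rows, total)
  }
}
```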

[Repost] Spark: The Definitive Guide, Chapter 8 Notes

Join Expressions: A join brings together two sets of data, the left and the right, by comparing the value of one or more keys of the left and right and evaluating the result of a join expression t...

2019-02-19 12:29:00 · 193 views
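A minimal sketch of a Chapter 8 join expression: one key from the left DataFrame compared with one key from the right, evaluated under different join types. It assumes Spark 3.x in local mode; the `person` and `program` tables are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hypothetical tables, local mode, Spark 3.x assumed.
object JoinSketch {
  def run(): (Long, Long, Long) = {
    val spark = SparkSession.builder().master("local[*]").appName("joins").getOrCreate()
    import spark.implicits._

    val person = Seq((0, "Alice", 0), (1, "Bob", 1), (2, "Carol", 1), (3, "Dan", 5))
      .toDF("id", "name", "graduate_program")
    val program = Seq((0, "Masters"), (1, "Ph.D."), (2, "Masters"))
      .toDF("id", "degree")

    // Join expression: left key compared with right key.
    val joinExpr = person.col("graduate_program") === program.col("id")

    val inner = person.join(program, joinExpr).count()                   // default: inner join
    val leftOuter = person.join(program, joinExpr, "left_outer").count() // keeps unmatched left rows
    val leftAnti = person.join(program, joinExpr, "left_anti").count()   // only unmatched left rows

    spark.stop()
    (inner, leftOuter, leftAnti)
  }
}
```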

[Repost] Spark: The Definitive Guide, Chapter 7 Notes

Types of grouping: The simplest grouping is to just summarize a complete DataFrame by performing an aggregation in a select statement. A “group by” allows you to specify one or more keys as well as one or mo...

2019-02-19 11:06:00 · 183 views
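The two grouping styles from the Chapter 7 excerpt, an aggregation over the whole DataFrame in a `select` and a "group by" with keys plus aggregation functions, can be sketched as follows. This assumes Spark 3.x in local mode; the invoice rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, sum}

// Sketch only: hypothetical data, local mode, Spark 3.x assumed.
object GroupingSketch {
  def run(): (Long, Seq[(String, Long, Long)]) = {
    val spark = SparkSession.builder().master("local[*]").appName("grouping").getOrCreate()
    import spark.implicits._

    val df = Seq(("A", 10), ("A", 5), ("B", 7)).toDF("InvoiceNo", "Quantity")

    // Simplest grouping: summarize the complete DataFrame inside a select.
    val total = df.select(sum("Quantity")).first().getLong(0)

    // "group by": one key plus two aggregation functions.
    val perInvoice = df.groupBy("InvoiceNo")
      .agg(count("Quantity").as("n"), sum("Quantity").as("qty"))
      .orderBy("InvoiceNo")
      .collect()
      .map(r => (r.getString(0), r.getLong(1), r.getLong(2)))
      .toSeq

    spark.stop()
    (total, perInvoice)
  }
}
```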

[Repost] Spark: The Definitive Guide, Chapter 6 Notes

Where to Look for APIs: A DataFrame is essentially a Dataset of type Row, so keep checking https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset to discover API updates. DataFrameStatFunctions and DataFrameNaFunctions...

2019-02-16 12:40:00 · 183 views
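The Chapter 6 excerpt points to DataFrameNaFunctions, which a DataFrame exposes through `df.na`. A minimal sketch of filling and dropping nulls, assuming Spark 3.x in local mode; the rows and the fill values are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hypothetical data, local mode, Spark 3.x assumed.
object NaFunctionsSketch {
  def run(): (Seq[(Int, String)], Long) = {
    val spark = SparkSession.builder().master("local[*]").appName("na-functions").getOrCreate()
    import spark.implicits._

    val df = Seq[(Option[Int], Option[String])]((Some(1), Some("a")), (None, Some("b")), (Some(3), None))
      .toDF("num", "str")

    // df.na returns DataFrameNaFunctions; fill replaces nulls per column.
    val filled = df.na.fill(0L, Seq("num")).na.fill("missing", Seq("str"))
    val rows = filled.collect().map(r => (r.getInt(0), r.getString(1))).toSeq

    // drop() with no arguments removes rows containing a null in any column.
    val complete = df.na.drop().count()

    spark.stop()
    (rows, complete)
  }
}
```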

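[Repost] Spark: The Definitive Guide, Chapter 5 Notes

A DataFrame consists of a sequence of records, each of type Row. Columns represent computational expressions that can be performed on each individual record. A schema defines the name and data type of each column. Partitioning defines the physical distribution of a DataFrame or Dataset across the cluster. Schemas: you can either let the data source define the schema (known as schema-on-read) or define it explicitly yourself. Warning: schema-on-read can lead to precision issues; when using Spark for ET...

2019-02-14 16:58:00 · 181 views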
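As an alternative to schema-on-read, the Chapter 5 notes mention defining the schema explicitly. A minimal sketch that builds a `StructType` by hand and applies it to Row data, assuming Spark 3.x in local mode; the column names and the single row are hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Sketch only: hypothetical schema and data, local mode, Spark 3.x assumed.
object ManualSchemaSketch {
  def run(): (Seq[(String, String, Boolean)], Long) = {
    val spark = SparkSession.builder().master("local[*]").appName("manual-schema").getOrCreate()

    // Each StructField: column name, data type, nullability.
    val schema = StructType(Seq(
      StructField("some", StringType, nullable = true),
      StructField("col", StringType, nullable = true),
      StructField("names", LongType, nullable = false)))

    val rows = Seq(Row("Hello", null, 1L))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

    val fields = df.schema.fields.map(f => (f.name, f.dataType.typeName, f.nullable)).toSeq
    val value = df.first().getLong(2)

    spark.stop()
    (fields, value)
  }
}
```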

[Repost] subgraph Example

import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
val users: RDD[(VertexId, (String, String))] = sc.parallelize(Array( (3L, ("rxin", "student...

2018-12-20 11:40:00 · 538 views
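The excerpt above starts the users/relationships graph used to demonstrate `subgraph`. A minimal complete sketch of that pattern, assuming Spark 3.x with spark-graphx on the classpath and local mode; the vertex and edge data follow the excerpt's users RDD and are otherwise illustrative. Vertex 0 appears only in edges, so it receives the default attribute and is then filtered out, together with every edge that touches it.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Sketch only: local mode, Spark 3.x with spark-graphx assumed.
object SubgraphSketch {
  def run(): (Long, Long, Long) = {
    val spark = SparkSession.builder().master("local[*]").appName("subgraph").getOrCreate()
    val sc = spark.sparkContext

    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
      (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
      (4L, ("peter", "student"))))
    val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
      Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
      Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))

    // Vertices referenced only by edges (vertex 0 here) get this default attribute.
    val defaultUser = ("John Doe", "Missing")
    val graph = Graph(users, relationships, defaultUser)

    // Keep vertices whose attribute is not "Missing"; edges to removed vertices are dropped too.
    val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")

    val result = (graph.vertices.count(), validGraph.vertices.count(), validGraph.edges.count())
    spark.stop()
    result
  }
}
```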

[Repost] Learning Spark GraphX

import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
val userGraph: Graph[(String, String), String]
Name: Compile Error
Message: <console>:30: error: ...

2018-12-20 11:07:00 · 152 views

[Repost] [Quick Note] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/doc...

https://www.cnblogs.com/informatics/p/8276172.html
Reposted from: https://www.cnblogs.com/DataNerd/p/9154489.html

2018-06-08 10:54:00 · 98 views

[Repost] [Quick Note] python: symbol lookup error: /usr/lib/x86_64-linux-gnu/libatk-1.0.so.0: undefined symbol: g_log_...

python: symbol lookup error: /usr/lib/x86_64-linux-gnu/libatk-1.0.so.0: undefined symbol: g_log_structured_standard
https://packages.debian.org/sid/amd64/libatk1.0-0/download
sudo dpkg -i *.deb...

2018-05-27 01:43:00 · 2292 views

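[Repost] Programming Hive, Section 14.3: Analyzing an Error When Practicing Projection Transformations

While working through Section 14.3 (projection transformations), I ran the SQL statement hive (default)> SELECT TRANSFORM(col1, col2) USING '/bin/cut -f1' AS newA, newB FROM a; and got this error:
Ended Job = job_local1231989520_0004 with errors
Error during job, obtainin...

2018-05-03 22:48:00 · 680 views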

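[Repost] Maven: No sources to compile

Symptom: when running the package command with Maven, I got the message "No sources to compile", and the generated jar file contained no class files. Cause: the project was not created with Maven, so its directory structure was incorrect. Solution: create the project with Maven so that it gets the correct directory structure. Reference: https://stackoverflow.com/questions/27897104/maven-no-sources-...

2018-05-01 00:44:00 · 1367 views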

[Repost] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (make-assem...

When running the package goal with Maven, I got: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (make-assembly) on project hive-udf: Error reading assemblies: No assembly descri...

2018-04-30 19:42:00 · 2940 views

[Repost] Handling Maven Errors Like "Invalid packaging for parent pom.xml, must be _pom_ but is _xxx"...

When running the clean goal with Maven, I got the error Invalid packaging for parent pom.xml, must be pom but is _jar. I found a similar question on Stack Overflow: https://stackoverflow.com/questions/13330930/invalid-packaging-for-parent-pom-xml...

2018-04-30 18:54:00 · 1958 views

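[Repost] Installing and Configuring a Hadoop Environment from Scratch with Docker

Install and configure CentOS
Download the Docker image: docker pull centos
Reference: https://hub.docker.com/_/centos/
Start the Docker image and do the necessary configuration: ifconfig, ssh-server, wget
Problem: /usr/sbin/sshd -D does not run successfully. Workaround: skipped.
Reference: https://blog.csdn.net/m...

2018-04-12 07:08:00 · 64 views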

[Repost] Using Date as a DataFrame Column in Spark 2.1.0

Reference: How to store custom objects in Dataset?
Reposted from: https://www.cnblogs.com/DataNerd/p/8684613.html

2018-03-31 22:50:00 · 144 views
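One common way to get a date column into a Dataset is to use `java.sql.Date`, which Spark's built-in encoders map to `DateType`; `java.util.Date` has no built-in encoder, which is what leads to the custom-object question referenced above. This is a minimal sketch of that approach, not necessarily the one used in the original post; it assumes Spark 3.x in local mode, and the `Event` class and its rows are hypothetical.

```scala
import java.sql.Date

import org.apache.spark.sql.SparkSession

// Hypothetical row type; java.sql.Date is encoded as Spark's DateType.
case class Event(name: String, day: Date)

object DateColumnSketch {
  def run(): (String, Seq[String]) = {
    val spark = SparkSession.builder().master("local[*]").appName("date-column").getOrCreate()
    import spark.implicits._

    val events = Seq(
      Event("repost", Date.valueOf("2018-03-31")),
      Event("notes", Date.valueOf("2018-04-12"))).toDS()

    // The column type is Spark's DateType, so date comparisons work directly.
    val dayType = events.schema("day").dataType.typeName
    val after = events.filter($"day" > Date.valueOf("2018-04-01")).map(_.name).collect().toSeq

    spark.stop()
    (dayType, after)
  }
}
```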

[Repost] Code I've Written

Practice code I wrote while studying from books: GitHub link
Reposted from: https://www.cnblogs.com/DataNerd/p/8680703.html

2018-03-31 03:08:00 · 83 views

[Repost] Code I've Read

GitHub link
Reposted from: https://www.cnblogs.com/DataNerd/p/8680699.html

2018-03-31 03:03:00 · 95 views

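[Repost] ScipyLectures-simple Study Notes

Common magic functions in Chapter 11.4.3.
Chapter 2
String repetition:
>>> 2*b
'hellohello'
Type conversion:
>>> float(1)
1.0
Note the difference in integer division between Python 2 and Python 3:
# Python 2
>>> 3 / 2
1
# Python...

2017-12-05 19:01:00 · 119 views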

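[Repost] Machine Learning 1: One Month, 2017/11/24-2017/12/24

Machine Learning (Andrew Ng, Coursera) · Advanced Mathematics, Vol. 1 · Advanced Mathematics, Vol. 2 · Linear Algebra · Probability Theory and Mathematical Statistics · An Introduction to Optimization · Machine Learning Foundations · Machine Learning Techniques
Reposted from: https://www.cnblogs.com/DataNerd/p/7890983.html...

2017-11-24 15:52:00 · 110 views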

[Repost] Machine Learning Course MATLAB Exercises

Columns 6557 through 6560
   -7.6419   -0.3008   -6.2724   -4.7964
Columns 6561 through 6564
   -7.1002   -4.3957   -9.8648   -5.9318
Columns 6565 through 6568
   -8.8009   -6.6060 ...

2017-11-15 13:50:00 · 1392 views

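[Repost] Tuesday, November 14, 2017

Tuesday, November 14, 2017
TODO: Machine Learning Andrew Ng 2; Machine Learning Andrew Ng 3; Machine Learning Andrew Ng 4; Machine Learning Andrew Ng 5; Machine Learning Andrew Ng 6; Machine Learning Andrew Ng 7; Machine Learning Andrew Ng 8; Machine Learning Andrew Ng 9
Done: Machine Learning Andrew Ng 2-7; Machine Learning Andrew Ng...

2017-11-15 00:39:00 · 225 views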

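[Repost] About Me

An incomplete record of books I've read, videos I've watched, and MOOCs I've taken
Environment Setup & Configuration & etc.
Code I've Read
Code I've Written
Contact me by email: bruce-du@hotmail.com
Reposted from: https://www.cnblogs.com/DataNerd/p/7831732.html...

2017-11-14 11:41:00 · 99 views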
