2017-08-25
Spark and Scala
• Category: python
• Tags: python
Scala was created for smart people, and it will go on serving smart people. — Martin Odersky
Installing and Configuring Spark
On macOS, beware of conflicts between the system's built-in Python 2 and a separately installed Python 3: Spark uses Python 2 by default. To switch, add the following line to the configuration file in Spark's conf directory:

```
export PYSPARK_PYTHON=python3
```
Connecting Spark to Jupyter
1. Install anaconda for OSX.

2. Install jupyter by typing the next line in your terminal:

```
ilovejobs@mymac:~$ conda install jupyter
```

3. Update jupyter just in case:

```
ilovejobs@mymac:~$ conda update jupyter
```

4. Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6:

```
ilovejobs@mymac:~$ cd Downloads
ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
```

5. Create an `Apps` folder in your home directory:

```
ilovejobs@mymac:~/Downloads$ mkdir ~/Apps
```

6. Move the uncompressed folder `spark-1.5.1` to the `~/Apps` directory:

```
ilovejobs@mymac:~/Downloads$ mv spark-1.5.1/ ~/Apps
```

7. Move to the `~/Apps` directory and verify that Spark is there:

```
ilovejobs@mymac:~/Downloads$ cd ~/Apps
ilovejobs@mymac:~/Apps$ ls -l
drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1
```

8. Here is the first tricky part. Add the Spark binaries to your `$PATH`:

```
ilovejobs@mymac:~/Apps$ cd
ilovejobs@mymac:~$ echo "export PATH=$HOME/Apps/spark-1.5.1/bin:$PATH" >> .profile
```

9. Here is the second tricky part. Add these environment variables as well:

```
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" >> .profile
```

10. Source the profile to make these variables available in this terminal:

```
ilovejobs@mymac:~$ source .profile
```

11. Create a `~/notebooks` directory:

```
ilovejobs@mymac:~$ mkdir notebooks
```

12. Move to `~/notebooks` and run pyspark:

```
ilovejobs@mymac:~$ cd notebooks
ilovejobs@mymac:~/notebooks$ pyspark
```
SparkContext
By default a machine can hold only one SparkContext. This matters especially in Jupyter: if one notebook has already started a SparkContext, follow these steps to use `sc` in a newly created notebook:

1. Stop the existing SparkContext by calling `sc.stop()`
2. Shut down that notebook's kernel
3. Restart the kernel in the new notebook
4. There is no need to create a new context — just get the existing `sc`

Alternatively, simply restart pyspark.
Spark-RDD
pass
dzzxjl