
2017-08-25
Spark and Scala

• Category: python • Tags:

Scala was created for smart people, and it will keep serving smart people. — Martin Odersky

Installing and Configuring Spark

On macOS, watch out for conflicts between the system's built-in Python 2 and a separately installed Python 3. Spark uses Python 2 by default; to switch to Python 3, add the following line to the configuration file under Spark's conf directory (typically conf/spark-env.sh):

export PYSPARK_PYTHON=python3
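Once pyspark is available (installation follows below), a quick way to confirm the setting took effect is to check the interpreter version on both the driver and the executors from inside the pyspark shell. A minimal sketch; sc is the context pyspark creates automatically:

    import sys
    print(sys.version)                                                 # driver-side interpreter
    print(sc.parallelize([1]).map(lambda _: sys.version).collect())   # executor-side interpreter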

Connecting Spark to Jupyter

  1. Install anaconda for OSX.

  2. Install jupyter by typing the next line in your terminal.

    ilovejobs@mymac:~$ conda install jupyter

  3. Update jupyter just in case.

    ilovejobs@mymac:~$ conda update jupyter

  4. Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6.

    ilovejobs@mymac:~$ cd Downloads
    ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz

  5. Create an Apps folder in your home directory (e.g.):

    ilovejobs@mymac:~/Downloads$ mkdir ~/Apps

  6. Move the uncompressed folder spark-1.5.1 to the ~/Apps directory.

    ilovejobs@mymac:~/Downloads$ mv spark-1.5.1/ ~/Apps

  7. Move to the ~/Apps directory and verify that spark is there.

    ilovejobs@mymac:~/Downloads$ cd ~/Apps
    ilovejobs@mymac:~/Apps$ ls -l
    drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1

  8. Here is the first tricky part. Add the spark binaries to your $PATH:

    ilovejobs@mymac:~/Apps$ cd
    ilovejobs@mymac:~$ echo 'export PATH="$HOME/Apps/spark-1.5.1/bin:$PATH"' >> .profile

  9. Here is the second tricky part. Add these environment variables as well:

    ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
    ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" >> .profile

  10. Source the profile to make these variables available in this terminal:

    ilovejobs@mymac:~$ source .profile

  11. Create a ~/notebooks directory.

    ilovejobs@mymac:~$ mkdir notebooks

  12. Move to ~/notebooks and run pyspark (a quick sanity check for the notebook that opens follows this list):

    ilovejobs@mymac:~$ cd notebooks
    ilovejobs@mymac:~/notebooks$ pyspark
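
If everything went well, the pyspark command should open a Jupyter notebook in the browser with a SparkContext already bound to the name sc. A small sanity check for the first cell (a sketch; the numbers are arbitrary):

    # Run in a notebook cell; sc is created by pyspark itself
    rdd = sc.parallelize(range(100))
    print(rdd.sum())    # expect 4950
    print(sc.version)   # prints the Spark version string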

SparkContext

By default a machine can have only one SparkContext at a time. This matters especially with Jupyter: if one notebook has already started a SparkContext, follow the steps below to get a usable sc in a newly created notebook (a short sketch follows the list):

  • Stop the existing SparkContext by calling sc.stop()

  • Shut down that notebook's kernel

  • Restart the kernel in the new notebook

  • No need to create a new SparkContext there; just pick up the existing sc

  • Alternatively, restarting pyspark altogether accomplishes the same thing in one go
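
A minimal sketch of what this looks like in code (the app name "notebook-2" is a made-up placeholder; when pyspark launches the notebook it normally creates sc for you, so the explicit constructor is only needed when it does not):

    # Old notebook: release the single allowed SparkContext
    sc.stop()

    # New notebook: reuse the sc that pyspark provides, or build one explicitly
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setAppName("notebook-2")   # hypothetical app name
    sc = SparkContext(conf=conf)
    print(sc.parallelize([1, 2, 3]).count())      # expect 3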

Spark-RDD

pass


dzzxjl
