This repository was archived by the owner on Jul 26, 2025. It is now read-only.

Commit 512ff7a

readme, tispark: update TiSpark and enable sparkR and pyspark (#27)
* update tispark to 1.0
* add TiSparkR
* upgrade tispark version to 1.0.1
1 parent a6460a1 commit 512ff7a

6 files changed, with 125 additions and 4 deletions.

README.md

Lines changed: 9 additions & 0 deletions
@@ -184,3 +184,12 @@ scala> spark.sql("select count(*) from lineitem").show
 | 60175|
 +--------+
 ```
+
+You can also access Spark with Python or R using the following commands:
+
+```
+docker-compose exec tispark-master /opt/spark/bin/pyspark
+docker-compose exec tispark-master /opt/spark/bin/sparkR
+```
+
+More documents about TiSpark can be found [here](https://github.com/pingcap/tispark).

tispark/Dockerfile

Lines changed: 19 additions & 4 deletions
@@ -2,25 +2,40 @@ FROM anapsix/alpine-java:8
 
 ENV SPARK_VERSION=2.1.1 \
     HADOOP_VERSION=2.7 \
-    TISPARK_VERSION=0.1.0-SNAPSHOT \
+    TISPARK_VERSION=1.0.1 \
+    TISPARK_R_VERSION=1.1 \
+    TISPARK_PYTHON_VERSION=1.0.1 \
     SPARK_HOME=/opt/spark \
     SPARK_NO_DAEMONIZE=true \
     SPARK_MASTER_PORT=7077 \
     SPARK_MASTER_HOST=0.0.0.0 \
     SPARK_MASTER_WEBUI_PORT=8080
 
+ADD R /TiSparkR
+
 # base image only contains busybox version nohup and ps
 # spark scripts needs nohup in coreutils and ps in procps
 # and we can use mysql-client to test tidb connection
-RUN apk --no-cache add coreutils procps mysql-client python py-pip R \
-    && pip install pytispark==1.0.1 pyspark==2.1.2
+RUN apk --no-cache add \
+    coreutils \
+    mysql-client \
+    procps \
+    python \
+    py-pip \
+    R \
+    && pip install --no-cache-dir pytispark==${TISPARK_PYTHON_VERSION} \
+    && R CMD build TiSparkR \
+    && R CMD INSTALL TiSparkR_${TISPARK_R_VERSION}.tar.gz \
+    && rm -rf /TiSparkR_${TISPARK_R_VERSION}.tar.gz /TiSparkR
 
 RUN wget -q https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
     && tar zxf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz -C /opt/ \
     && ln -s /opt/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} ${SPARK_HOME} \
-    && wget -q http://download.pingcap.org/tispark-${TISPARK_VERSION}-jar-with-dependencies.jar -P ${SPARK_HOME}/jars \
+    && wget -q https://github.com/pingcap/tispark/releases/download/${TISPARK_VERSION}/tispark-core-${TISPARK_VERSION}-jar-with-dependencies.jar -P ${SPARK_HOME}/jars \
     && wget -q http://download.pingcap.org/tispark-sample-data.tar.gz \
     && tar zxf tispark-sample-data.tar.gz -C ${SPARK_HOME}/data/ \
     && rm -rf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz tispark-sample-data.tar.gz
 
+ENV PYTHONPATH=${SPARK_HOME}/python/lib/py4j-0.10.4-src.zip:${SPARK_HOME}/python:$PYTHONPATH
+
 WORKDIR ${SPARK_HOME}

tispark/R/DESCRIPTION

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+Package: TiSparkR
+Type: Package
+Title: TiSpark for R
+Version: 1.1
+Author: PingCAP
+Maintainer: Novemser <[email protected]>
+Description: A shabby thin layer to support TiSpark in R language.
+License: Apache 2.0
+Copyright: 2017 PingCAP, Inc.
+Encoding: UTF-8
+LazyData: true

tispark/R/NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+exportPattern("^[[:alpha:]]+")

tispark/R/R/tisparkR.R

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+#
+# Copyright 2017 PingCAP, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#
+
+# Title : TiSparkR
+# Objective : TiSpark entry for R
+# Created by: novemser
+# Created on: 17-11-1
+
+# Function:createTiContext
+# Create a new TiContext via the spark session passed in
+#
+# @return A new TiContext created on session
+# @param session A Spark Session for TiContext creation
+createTiContext <- function(session) {
+  sparkR.newJObject("org.apache.spark.sql.TiContext", session)
+}
+
+# Function:tidbMapDatabase
+# Mapping TiContext designated database to `dbName`.
+#
+# @param tiContext TiSpark context
+# @param dbName Database name to map
+# @param isPrefix Whether to use dbName As Prefix
+# @param loadStatistics Whether to use statistics information from TiDB
+tidbMapDatabase <- function(tiContext, dbName, isPrefix=FALSE, loadStatistics=TRUE) {
+  sparkR.callJMethod(tiContext, "tidbMapDatabase", dbName, isPrefix, loadStatistics)
+  paste("Mapping to database:", dbName)
+}

tispark/R/README.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+## TiSparkR
+TiSparkR is a thin layer built to support the R language with TiSpark.
+
+### Usage
+1. Download the TiSparkR source code and build a binary package (run `R CMD build R` in TiSpark root directory). Install it to your local R library (e.g. via `R CMD INSTALL TiSparkR_1.0.0.tar.gz`)
+
+2. Build or download TiSpark dependency jar `tispark-core-1.0-RC1-jar-with-dependencies.jar` [here](https://github.com/pingcap/tispark).
+
+3. `cd` to your Spark home directory, and run:
+```
+./bin/sparkR --jars /where-ever-it-is/tispark-core-${version}-jar-with-dependencies.jar
+```
+Note that you should replace the `TiSpark` jar path with your own.
+
+4. Use as below in your R console:
+```R
+# import tisparkR library
+> library(TiSparkR)
+# create a TiContext instance
+> ti <- createTiContext(spark)
+# Map TiContext to database:tpch_test
+> tidbMapDatabase(ti, "tpch_test")
+
+# Run a sql query
+> customers <- sql("select * from customer")
+# Print schema
+> printSchema(customers)
+root
+ |-- c_custkey: long (nullable = true)
+ |-- c_name: string (nullable = true)
+ |-- c_address: string (nullable = true)
+ |-- c_nationkey: long (nullable = true)
+ |-- c_phone: string (nullable = true)
+ |-- c_acctbal: decimal(15,2) (nullable = true)
+ |-- c_mktsegment: string (nullable = true)
+ |-- c_comment: string (nullable = true)
+
+# Run a count query
+> count <- sql("select count(*) from customer")
+# Print count result
+> head(count)
+count(1)
+1 150
+```
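
Note on step 1 of the TiSparkR README: `R CMD build` names its output `<Package>_<Version>.tar.gz`, so the `TiSparkR_1.0.0.tar.gz` in the README should be read as an example only. With the `DESCRIPTION` added in this commit (`Version: 1.1`), the tarball would be `TiSparkR_1.1.tar.gz`, which matches `TISPARK_R_VERSION=1.1` in the Dockerfile. The following is a minimal sketch of deriving that name instead of hard-coding it. The helper names `desc_field` and `r_tarball_name` are hypothetical and not part of TiSparkR, and the sketch assumes simple single-line `Field: value` entries.

```shell
#!/bin/sh
# Hypothetical helpers (not part of TiSparkR): derive the tarball name
# that `R CMD build` produces from a package DESCRIPTION file.

# Print the value of a single-line "Field: value" entry.
#   $1 = path to DESCRIPTION, $2 = field name (e.g. Version)
desc_field() {
  sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# R CMD build writes <Package>_<Version>.tar.gz.
#   $1 = path to DESCRIPTION
r_tarball_name() {
  printf '%s_%s.tar.gz\n' "$(desc_field "$1" Package)" "$(desc_field "$1" Version)"
}
```

After building, an install step could then be written as `R CMD INSTALL "$(r_tarball_name path/to/DESCRIPTION)"`, where `path/to/DESCRIPTION` is a placeholder for the package's `DESCRIPTION` file.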

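Note on the Dockerfile change: the TiSpark jar is now fetched from the GitHub releases of `pingcap/tispark` rather than from `download.pingcap.org`, and its file name gains a `tispark-core-` prefix. The sketch below only expands the URL templates from the Dockerfile for a given set of versions, so they can be inspected before a build. The function names `spark_url` and `tispark_jar_url` are hypothetical, and nothing here confirms that a given release asset exists.

```shell
#!/bin/sh
# Hypothetical helpers: expand the download URL templates used in
# tispark/Dockerfile. They print URLs only and download nothing.

# $1 = Spark version, $2 = Hadoop version
spark_url() {
  echo "https://archive.apache.org/dist/spark/spark-$1/spark-$1-bin-hadoop$2.tgz"
}

# $1 = TiSpark version
tispark_jar_url() {
  echo "https://github.com/pingcap/tispark/releases/download/$1/tispark-core-$1-jar-with-dependencies.jar"
}

# Values taken from the ENV block in this commit.
spark_url 2.1.1 2.7
tispark_jar_url 1.0.1
```

A printed URL can be probed before building the image, for example with `curl -fsIL "$(tispark_jar_url 1.0.1)"`, assuming `curl` is available on the host.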