Apache Griffin Installation
Introduction
1. How it works
Griffin loads data source definitions from the Hive metastore. Based on the data quality rules specified by the user, it translates each rule into a Spark program and leverages Spark's computing power to run the quality checks and analysis.
2. Modules
measure: the computation layer, written in Scala; runs the user-defined data quality rules on Spark.
service: the service layer; provides the backend API for the UI, handles scheduled jobs, and submits Spark programs to Livy.
ui: the presentation layer, built with Angular 2.
Installation
I. Cluster prerequisites
1. JDK (1.8 or later)
2. PostgreSQL (version 10.4) or MySQL (version 8.0.11)
3. Hadoop (2.6.0 or later)
4. Hive (version 2.x), installation reference: https://www.cnblogs.com/caoxb/p/11333741.html
5. Spark (version 2.2.1), installation reference: https://blog.csdn.net/k393393/article/details/92440892
6. Livy, installation reference: https://www.cnblogs.com/students/p/11400940.html
7. Elasticsearch (5.0 or later), reference: https://blog.csdn.net/fiery_heart/article/details/85265585
8. Scala
II. Installing Griffin
1. MySQL:
1) Create the quartz database in MySQL (make sure the name matches the database referenced later in spring.datasource.url).
2) Then run the Init_quartz_mysql_innodb.sql script to initialize the Quartz tables:
mysql -u <username> -p quartz < Init_quartz_mysql_innodb.sql
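If the quartz database does not exist yet, a minimal sketch for creating it (the root account and character set are assumptions; adjust to your environment):
# create the quartz database (root account assumed)
mysql -u root -p -e "CREATE DATABASE quartz DEFAULT CHARACTER SET utf8;"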
2. Hadoop and Hive:
Copy the Hadoop configuration files from the Hadoop server to the Livy server; here we assume they are placed under /usr/data/conf.
On the Hadoop server, create a /home/spark_conf directory on HDFS and upload Hive's configuration file hive-site.xml to it:
# create the /home/spark_conf directory
hadoop fs -mkdir -p /home/spark_conf
# upload hive-site.xml
hadoop fs -put hive-site.xml /home/spark_conf/
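To confirm the upload succeeded, you can list the directory:
# verify that hive-site.xml is in place
hadoop fs -ls /home/spark_conf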
3. Set the environment variables:
#!/bin/bash
export JAVA_HOME=/data/jdk1.8.0_192
# Spark home
export SPARK_HOME=/usr/data/spark-2.1.1-bin-2.6.3
# Livy bin directory
export LIVY_HOME=/usr/data/livy/bin
# Hadoop configuration directory
export HADOOP_CONF_DIR=/usr/data/conf
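To make these variables persist across shells, append the exports to a profile script and reload it (the path below is an assumption; any equivalent profile file works):
# assuming the exports above were saved to /etc/profile.d/griffin_env.sh
source /etc/profile.d/griffin_env.sh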
4. Livy configuration:
Update the livy.conf file under livy/conf:
livy.server.host = 127.0.0.1
livy.spark.master = yarn
livy.spark.deployMode = cluster
livy.repl.enable-hive-context = true
Start Livy:
livy-server start
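Once started, Livy's REST API should respond on its default port 8998; a quick sanity check:
# a fresh Livy server should return an empty session list
curl http://127.0.0.1:8998/sessions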
5. Elasticsearch configuration:
Create the griffin index in Elasticsearch:
curl -H "Content-Type: application/json" -XPUT "http://es:9200/griffin?include_type_name=true" -d '
{
    "aliases": {},
    "mappings": {
        "accuracy": {
            "properties": {
                "name": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "tmst": {
                    "type": "date"
                }
            }
        }
    },
    "settings": {
        "index": {
            "number_of_replicas": "2",
            "number_of_shards": "5"
        }
    }
}'
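You can verify the index and its mapping afterwards:
# inspect the newly created griffin index (host placeholder as above)
curl -XGET "http://es:9200/griffin?pretty"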
Building and deploying from source
Here I deploy Griffin by building and packaging from source. The source repository is https://github.com/apache/griffin.git, and the tag used is griffin-0.4.0.
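To fetch exactly that tag, something like the following should work:
# clone the repository and check out the 0.4.0 release tag
git clone https://github.com/apache/griffin.git
cd griffin
git checkout griffin-0.4.0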
The Griffin source tree is clearly structured into four modules: griffin-doc, measure, service, and ui. griffin-doc holds the documentation; measure interacts with Spark and executes the measurement jobs; service is a Spring Boot application that exposes the RESTful API consumed by the ui module, persists measurement jobs, and serves the results.
Once the source is imported and builds successfully, modify the following configuration files:
1. service/src/main/resources/application.properties:
# Apache Griffin application name
spring.application.name=griffin_service
# MySQL database configuration
spring.datasource.url=jdbc:mysql://10.xxx.xx.xxx:3306/griffin_quartz?useSSL=false
spring.datasource.username=xxxxx
spring.datasource.password=xxxxx
spring.jpa.generate-ddl=true
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.jpa.show-sql=true
# Hive metastore configuration
hive.metastore.uris=thrift://namenode.test01.xxx:9083
hive.metastore.dbname=default
hive.hmshandler.retry.attempts=15
hive.hmshandler.retry.interval=2000ms
# Hive cache time
cache.evict.hive.fixedRate.in.milliseconds=900000
# Kafka schema registry (configure as needed)
kafka.schema.registry.url=http://namenode.test01.xxx:8081
# Update job instance state at regular intervals
jobInstance.fixedDelay.in.milliseconds=60000
# Job instance expiry time, 7 days = 604800000 milliseconds; milliseconds is the only supported unit
jobInstance.expired.milliseconds=604800000
# Schedule the predicate job every 5 minutes, repeating at most 12 times
# Interval time units: s (second), m (minute), h (hour), d (day); only these four are supported
predicate.job.interval=5m
predicate.job.repeat.count=12
# external properties directory location
external.config.location=
# external BATCH or STREAMING env
external.env.location=
# login strategy ("default" or "ldap")
login.strategy=default
# LDAP settings, used when login.strategy is set to ldap
ldap.url=ldap://hostname:port
ldap.email=@example.com
ldap.searchBase=DC=org,DC=example
ldap.searchPattern=(sAMAccountName={0})
# hdfs default name
fs.defaultFS=
# Elasticsearch configuration
elasticsearch.host=griffindq02-test1-rgtj1-tj1
elasticsearch.port=9200
elasticsearch.scheme=http
# elasticsearch.user = user
# elasticsearch.password = password
# Livy configuration
livy.uri=http://10.104.xxx.xxx:8998/batches
# YARN URL
yarn.uri=http://10.104.xxx.xxx:8088
# griffin event listener
internal.event.listeners=GriffinJobEventHook
2. service/src/main/resources/quartz.properties
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
org.quartz.scheduler.instanceName=spring-boot-quartz
org.quartz.scheduler.instanceId=AUTO
org.quartz.threadPool.threadCount=5
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
# If you use PostgreSQL as your database, set this property to org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
# If you use MySQL as your database, set this property to org.quartz.impl.jdbcjobstore.StdJDBCDelegate
# If you use H2 as your database, StdJDBCDelegate, PostgreSQLDelegate, or others all work
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.useProperties=true
org.quartz.jobStore.misfireThreshold=60000
org.quartz.jobStore.tablePrefix=QRTZ_
org.quartz.jobStore.isClustered=true
org.quartz.jobStore.clusterCheckinInterval=20000
3. service/src/main/resources/sparkProperties.json:
{
    "file": "hdfs:///griffin/griffin-measure.jar",
    "className": "org.apache.griffin.measure.Application",
    "name": "griffin",
    "queue": "default",
    "numExecutors": 2,
    "executorCores": 1,
    "driverMemory": "1g",
    "executorMemory": "1g",
    "conf": {
        "spark.yarn.dist.files": "hdfs:///home/spark_conf/hive-site.xml"
    },
    "files": []
}
4. service/src/main/resources/env/env_batch.json:
{
    "spark": {
        "log.level": "INFO"
    },
    "sinks": [
        {
            "type": "CONSOLE",
            "config": {
                "max.log.lines": 10
            }
        },
        {
            "type": "HDFS",
            "config": {
                "path": "hdfs://namenodetest01.xx.xxxx.com:9001/griffin/persist",
                "max.persist.lines": 10000,
                "max.lines.per.file": 10000
            }
        },
        {
            "type": "ELASTICSEARCH",
            "config": {
                "method": "post",
                "api": "http://10.xxx.xxx.xxx:9200/griffin/accuracy",
                "connection.timeout": "1m",
                "retry": 10
            }
        }
    ],
    "griffin.checkpoint": []
}
After the configuration files are updated, run the following Maven command in a terminal (for example, IDEA's built-in terminal) to compile and package:
mvn -Dmaven.test.skip=true clean install
When the command completes, service-0.4.0.jar and measure-0.4.0.jar appear under the target directories of the service and measure modules, respectively. Copy both jars to the server. They are used as follows:
1. Upload the measure jar to the /griffin directory on HDFS:
# rename the jars
mv measure-0.4.0.jar griffin-measure.jar
mv service-0.4.0.jar griffin-service.jar
# upload griffin-measure.jar to the HDFS directory
hadoop fs -put griffin-measure.jar /griffin/
This is needed because when Spark runs the measurement job on the YARN cluster, it loads griffin-measure.jar from the /griffin directory on HDFS; without it, the class org.apache.griffin.measure.Application cannot be found.
2. Run griffin-service.jar to start the Griffin management service:
nohup java -jar griffin-service.jar > service.out 2>&1 &
A few seconds later, Apache Griffin's default UI is reachable (by default, Spring Boot listens on port 8080):
http://IP:8080
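To confirm the service is up from the command line (the version endpoint name follows the Griffin API guide and may differ between versions):
# check that the service responds
curl http://IP:8080/api/v1/version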
Measuring Kafka source data with Apache Griffin
http://griffin.apache.org/docs/usecases.html
Streaming data checks currently have no UI support; streaming monitoring jobs can instead be submitted through the API, as sketched below.
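A minimal sketch of such a submission, assuming the measure definition has been written to measure.json and that the endpoint follows the Griffin API guide (verify against your version):
# register a measure through the service REST API (endpoint name assumed)
curl -H "Content-Type: application/json" -X POST http://IP:8080/api/v1/measures -d @measure.json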