Connect client server
The following explains how to connect a client VM created by the user to the HDE cluster.
Hadoop
The steps to connect Hadoop are as follows:
1. Copy configuration files
The configuration of each component installed on the HDE cluster is located in /etc/[component name]/conf.
Some nodes may hold a component's configuration while others do not, so check each node. The example below shows how to check the configuration for Hadoop and Hive.
(To transfer the configuration files, compress each component's configuration directory with the tar command and copy the archive with the scp command.)
########################################
# Check hadoop configuration
########################################
$ cd /etc/hadoop/conf
$ ls -alh
total 216
drwxr-xr-x 3 ubuntu ubuntu 4096 Sep 26 18:19 ./
drwxr-xr-x 3 ubuntu ubuntu 4096 Sep 26 18:18 ../
-rw-r--r-- 1 ubuntu ubuntu 9610 Sep 26 18:19 capacity-scheduler.xml
-rw-r--r-- 1 ubuntu ubuntu 1335 Jul 29 2022 configuration.xsl
-rw-r--r-- 1 ubuntu ubuntu 2567 Jul 29 2022 container-executor.cfg
-rw-r--r-- 1 ubuntu ubuntu 5017 Sep 26 18:19 core-site.xml
-rw-rw-r-- 1 ubuntu ubuntu 0 Sep 26 18:19 dfs.hosts.exclude
-rw-r--r-- 1 ubuntu ubuntu 3999 Jul 29 2022 hadoop-env.cmd
########################################
# Check hive configuration
########################################
$ cd /etc/hive/conf
$ ls -alh
total 380
drwxr-xr-x 2 ubuntu ubuntu 4096 Sep 26 18:19 ./
drwxr-xr-x 10 ubuntu ubuntu 4096 Sep 26 18:21 ../
-rw-r--r-- 1 ubuntu ubuntu 1596 Oct 24 2019 beeline-log4j2.properties.template
-rw-r--r-- 1 ubuntu ubuntu 300727 Apr 4 2022 hive-default.xml.template
-rw-rw-r-- 1 ubuntu ubuntu 2194 Sep 26 18:19 hive-env.sh
-rw-r--r-- 1 ubuntu ubuntu 2365 Oct 24 2019 hive-env.sh.template
########################################
# Compress and copy hadoop configuration
########################################
# Compress hadoop configuration
$ tar czf hadoop-conf.tgz /etc/hadoop/conf/*
# Copy hadoop configuration
$ scp -i {PRIVATE_KEY file} hadoop-conf.tgz ubuntu@{target node IP}:{PATH}
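Once the archive is on the user VM it still has to be unpacked. The following is a minimal sketch, assuming GNU tar and assuming the archive was created as shown above; unpack_conf is a hypothetical helper name, not an HDE or Hadoop command.

```shell
# Sketch: unpack a copied configuration archive on the user VM.
# unpack_conf is a hypothetical helper, not part of HDE or Hadoop.
# $1 = archive path, $2 = destination directory
unpack_conf() {
  mkdir -p "$2" || return 1
  # GNU tar strips the leading "/" from member names when creating an
  # archive, so an archive of /etc/hadoop/conf unpacks to
  # etc/hadoop/conf under the destination directory.
  tar xzf "$1" -C "$2"
}
```

For example, unpack_conf hadoop-conf.tgz /home/ubuntu would place the files under /home/ubuntu/etc/hadoop/conf, which matches the directory layout used in the later steps.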
2. Add /etc/hosts settings
To connect to the cluster, the user VM needs the host information of the cluster nodes.
Check the /etc/hosts file on a node of the cluster you want to connect to and copy its entries to the user VM.
(The user VM's host information must also be added to each node of the cluster.)
- HDE cluster nodes
  - Add the user VM's host information
- User VM
  - Add the HDE cluster nodes' host information
########################################
# Modify /etc/hosts file
########################################
$ sudo vi /etc/hosts
########################################
# Check /etc/hosts settings
########################################
$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
172.30.35.48 host-172-30-35-48
172.30.34.235 host-172-30-34-235
172.30.34.124 host-172-30-34-124
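Editing /etc/hosts by hand on several nodes makes duplicate or missing entries easy to introduce. As an alternative to vi, the sketch below appends an entry only when it is not already present. The function names has_host_entry and add_host_entry are hypothetical, and the sketch assumes the usual "IP name [alias...]" line format.

```shell
# Sketch: add a hosts entry only if it is missing.
# has_host_entry and add_host_entry are hypothetical helper names.
# $1 = hosts file, $2 = IP address, $3 = host name

has_host_entry() {
  # Succeeds when some line maps exactly this IP to exactly this name.
  awk -v ip="$2" -v name="$3" '
    $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  ' "$1"
}

add_host_entry() {
  # Append "IP name" unless an equivalent entry already exists.
  has_host_entry "$1" "$2" "$3" || printf '%s %s\n' "$2" "$3" >> "$1"
}
```

Writing to /etc/hosts itself requires root privileges, so one option is to run the function against a copy of the file, review the result, and then install the copy with sudo.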
3. Download and extract files
Download Java (JDK 8) and the Hadoop version that matches your HDE version.
Component | Version | Location |
---|---|---|
java | jdk8 | |
HDE-1.0.0 | hadoop 2.10.1 | |
HDE-1.1.0 | hadoop 2.10.1 | |
HDE-1.1.1 | hadoop 2.10.2 | |
HDE-2.0.0 | hadoop 3.3.4 | |
########################
# Download files
# Java, Hadoop
$ wget https://objectstorage.kr-central-1.kakaocloud.com/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-install-file/component/OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
$ wget https://objectstorage.kr-central-1.kakaocloud.com/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-install-file/component/hadoop-3.3.4-kbp.tar.gz
########################
# Extract files
$ tar zxf hadoop-3.3.4-kbp.tar.gz
$ tar zxf OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
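The version table above can also be expressed as a small lookup, which may help when scripting the setup for more than one cluster. This is only a sketch of the mapping listed in the table; hadoop_version_for_hde is a hypothetical function name.

```shell
# Sketch: map an HDE version to the Hadoop version listed in the table.
# hadoop_version_for_hde is a hypothetical helper name.
hadoop_version_for_hde() {
  case "$1" in
    HDE-1.0.0|HDE-1.1.0) echo "2.10.1" ;;
    HDE-1.1.1)           echo "2.10.2" ;;
    HDE-2.0.0)           echo "3.3.4" ;;
    *) echo "unknown HDE version: $1" >&2; return 1 ;;
  esac
}
```

Only the download location for the Hadoop 3.3.4 archive appears in the commands above, so the lookup returns a version string and leaves the choice of download URL to the reader.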
4. Run commands
After extracting, you can run the following commands.
# Check current file structure
$ ls -alh
total 782620
drwxr-x--- 7 ubuntu ubuntu 4096 Sep 27 08:57 ./
drwxr-xr-x 3 root root 4096 Sep 27 08:54 ../
-rw-r--r-- 1 ubuntu ubuntu 220 Jan 6 2022 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Jan 6 2022 .bashrc
drwx------ 2 ubuntu ubuntu 4096 Sep 27 08:56 .cache/
-rw-r--r-- 1 ubuntu ubuntu 807 Jan 6 2022 .profile
drwx------ 2 ubuntu ubuntu 4096 Sep 27 08:54 .ssh/
-rw-rw-r-- 1 ubuntu ubuntu 191 Sep 27 08:56 .wget-hsts
-rw-rw-r-- 1 ubuntu ubuntu 103200089 Aug 9 05:14 OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
drwxrwxr-x 3 ubuntu ubuntu 4096 Sep 27 08:57 etc/
drwxr-xr-x 10 ubuntu ubuntu 4096 Jul 29 2022 hadoop-3.3.4/
-rw-rw-r-- 1 ubuntu ubuntu 698117781 Aug 9 05:10 hadoop-3.3.4-kbp.tar.gz
-rw-rw-r-- 1 ubuntu ubuntu 28867 Sep 27 08:57 hadoop-conf.tgz
drwxr-xr-x 8 ubuntu ubuntu 4096 Jul 15 2020 jdk8u262-b10/
# Export environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
# Run HDFS query commands
$ ./hadoop-3.3.4/bin/hadoop fs -ls hdfs://host-172-30-35-48/
Found 8 items
drwxrwxrwt - yarn hadoop 0 2023-09-27 00:07 hdfs://host-172-30-35-48/app-logs
drwxrwxrwx - hdfs hadoop 0 2023-09-26 23:18 hdfs://host-172-30-35-48/apps
drwxr-xr-t - yarn hadoop 0 2023-09-26 09:20 hdfs://host-172-30-35-48/ats
drwxr-xr-x - hdfs hadoop 0 2023-09-26 09:20 hdfs://host-172-30-35-48/hadoop
drwxr-xr-x - mapred hadoop 0 2023-09-26 09:20 hdfs://host-172-30-35-48/mr-history
drwxrwxrwt - hdfs hadoop 0 2023-09-26 09:21 hdfs://host-172-30-35-48/tmp
drwxr-xr-x - hdfs hadoop 0 2023-09-26 09:21 hdfs://host-172-30-35-48/user
drwxrwxrwt - yarn hadoop 0 2023-09-26 09:20 hdfs://host-172-30-35-48/var
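Before running HDFS commands it can be useful to confirm that the extracted JDK and the copied configuration are where the environment variables will point. The sketch below performs that check; check_client_dirs is a hypothetical helper name, and using core-site.xml as the marker file is an assumption based on the configuration listing in step 1.

```shell
# Sketch: sanity-check the directories used by the Hadoop client.
# check_client_dirs is a hypothetical helper name.
# $1 = JDK directory, $2 = Hadoop configuration directory
check_client_dirs() {
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable bin/java under: $1" >&2
    return 1
  fi
  if [ ! -f "$2/core-site.xml" ]; then
    echo "core-site.xml not found under: $2" >&2
    return 1
  fi
  return 0
}
```

With the paths from this guide, the call would be check_client_dirs /home/ubuntu/jdk8u262-b10 /home/ubuntu/etc/hadoop/conf.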
Hive
The steps to connect Hive are as follows:
- Copy configuration files
- Add /etc/hosts settings
- Download and extract files
- Run commands
The Copy configuration files and Add /etc/hosts settings steps are the same as for Hadoop.
Download and extract files
Download Java (JDK 8) and the Hive version that matches your HDE version.
Component | Version | Location |
---|---|---|
java | jdk8 | |
HDE-1.0.0 | hive 2.3.2 | |
HDE-1.1.0 | hive 2.3.9 | |
HDE-1.1.1 | hive 2.3.9 | |
HDE-2.0.0 | hive 3.1.3 | |
Connect Hive configuration
Hive depends on Hadoop, so configure Hadoop first.
After configuring Hadoop, export the required environment variables, including HIVE_CONF_DIR pointing to the Hive configuration copied from the cluster, and then connect with beeline.
#######################################
# Configure environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_HOME=/home/ubuntu/hadoop-3.3.4/
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HIVE_CONF_DIR=/home/ubuntu/etc/hive/conf/
#######################################
# Run beeline and check database information
$ ./apache-hive-3.1.3-bin/bin/beeline
Beeline version 3.1.3 by Apache Hive
beeline> !connect jdbc:hive2://host-172-30-35-48:10000/default
Connecting to jdbc:hive2://host-172-30-35-48:10000/default
Enter username for jdbc:hive2://host-172-30-35-48:10000/default:
Enter password for jdbc:hive2://host-172-30-35-48:10000/default:
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://host-172-30-35-48:10000/defau> show databases;
INFO : Compiling command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a): show databases
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a); Time taken: 1.181 seconds
INFO : Executing command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a); Time taken: 0.63 seconds
INFO : OK
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (2.467 seconds)
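The interactive session above can also be run as a single command, which is convenient for a quick connectivity check. The sketch below only builds the JDBC URL; hive_jdbc_url is a hypothetical helper name, and the defaults (port 10000, database default) are taken from the connection string used above.

```shell
# Sketch: build a HiveServer2 JDBC URL for beeline.
# hive_jdbc_url is a hypothetical helper name.
# $1 = HiveServer2 host, $2 = port (default 10000), $3 = database (default "default")
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "${2:-10000}" "${3:-default}"
}

# Possible use with beeline's -u (URL) and -e (query) options:
#   ./apache-hive-3.1.3-bin/bin/beeline \
#     -u "$(hive_jdbc_url host-172-30-35-48)" -e 'show databases;'
```

Whether a username or password is needed depends on how the cluster's HiveServer2 is configured; the session above left both empty.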
Spark
Download and extract files
Download Java (JDK 8) and the Spark version that matches your HDE version.
Component | Version | Location |
---|---|---|
java | jdk8 | |
HDE-1.0.0 | spark 2.4.6 | |
HDE-1.1.0 | spark 2.4.8 | |
HDE-1.1.1 | spark 2.4.8 | |
HDE-2.0.0 | spark 3.2.2 | |
Connect Spark configuration
#######################################
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_HOME=/home/ubuntu/hadoop-3.3.4/
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HIVE_CONF_DIR=/home/ubuntu/etc/hive/conf/
# Add the information output by the hadoop classpath command to spark-env.sh
# If the path to the hadoop commands is already added to PATH, no need to set it separately
$ export SPARK_DIST_CLASSPATH="../home/ubuntu/spark/hadoop-3.3.4//share/hadoop/yarn/lib/*:/home/ubuntu/spark/hadoop-3.3.4//share/hadoop/yarn/*.."
#######################################
# Check /etc/hosts information
# The user VM and the cluster nodes must each have the other's host information in /etc/hosts.
$ spark-shell
23/09/29 04:23:12 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
23/09/29 04:23:12 INFO BlockManagerMasterEndpoint: Registering block manager host-172-30-33-233:41301 with 366.3 MiB RAM, BlockManagerId(1, host-172-30-33-233, 41301, None)
Spark context Web UI available at http://host-172-30-33-46:4040
Spark context available as 'sc' (master = yarn, app id = application_1695806330771_0003).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.2.2
/_/
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_262)
Type in expressions to have them evaluated.
Type :help for more information.
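The SPARK_DIST_CLASSPATH value shown earlier is the output of the hadoop classpath command. Rather than pasting that output by hand, it can be captured directly. The following is a sketch under the assumption that Hadoop was extracted as in the Hadoop section; spark_dist_classpath is a hypothetical helper name.

```shell
# Sketch: derive SPARK_DIST_CLASSPATH from the extracted Hadoop distribution.
# spark_dist_classpath is a hypothetical helper name.
# $1 = Hadoop installation directory (a trailing slash is tolerated)
spark_dist_classpath() {
  # "hadoop classpath" prints the classpath the Hadoop scripts would use.
  "${1%/}/bin/hadoop" classpath
}

# Possible use before starting spark-shell, or inside spark-env.sh:
#   export SPARK_DIST_CLASSPATH="$(spark_dist_classpath /home/ubuntu/hadoop-3.3.4)"
```

Capturing the value this way keeps it in step with the Hadoop directory actually present on the VM, which may help avoid the ClassNotFoundException caused by an incomplete classpath.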
Check errors
- Common errors
  - Caused by: java.net.UnknownHostException: host-172-30-33-46
    - Occurs when the cluster's host information is not added to the user VM, or the user VM's host information is not added to the cluster nodes.
  - Caused by: java.lang.ClassNotFoundException: org.apache.log4j.spi.Filter
    - Occurs when the required classpath entries are not correctly added to SPARK_DIST_CLASSPATH.
HBase
Download and extract files
Download Java (JDK 8) and the HBase version that matches your HDE version.
Component | Version | Location |
---|---|---|
java | jdk8 | |
HDE-1.0.0 | hbase 1.4.13 | |
HDE-1.1.0 | hbase 1.7.1 | |
HDE-1.1.1 | hbase 1.7.1 | |
HDE-2.0.0 | hbase 2.4.13 | |
Connect HBase configuration
HBase depends on Hadoop, so the Hadoop configuration must be set up as well.
#######################################
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HBASE_CONF_DIR=/home/ubuntu/etc/hbase/conf/
$ ./hbase-1.4.13/bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020
hbase(main):001:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load
hbase(main):002:0> exit
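The same status check can be run without an interactive session by piping the command into the shell's non-interactive mode. This is a sketch that assumes the HBase release in use supports the -n (non-interactive) option of hbase shell; hbase_status is a hypothetical helper name.

```shell
# Sketch: run the HBase "status" command non-interactively.
# hbase_status is a hypothetical helper name.
# $1 = HBase installation directory (a trailing slash is tolerated)
hbase_status() {
  # -n runs the HBase shell in non-interactive mode, reading commands
  # from standard input.
  echo "status" | "${1%/}/bin/hbase" shell -n
}

# Possible use, with the environment variables above already exported:
#   hbase_status ./hbase-1.4.13
```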