Client-Server Connection

The following explains how to connect a client VM created by the user to the HDE cluster.

Hadoop

The procedure to connect Hadoop is as follows:

1. Copy Configuration Files

The configuration files for the components installed in the HDE cluster are located in /etc/[component-name]/conf.
Not every node necessarily has the configuration files for every component, so first check that the node you copy from has them. The commands below check the Hadoop and Hive configurations.
To copy the configuration files, compress the component's configuration directory with tar and transfer the archive to the user VM with scp.

Hadoop - Copy Configuration Files
########################################
# Check Hadoop settings
########################################
$ cd /etc/hadoop/conf
$ ls -alh
total 216
drwxr-xr-x 3 ubuntu ubuntu  4096 Sep 26 18:19 ./
drwxr-xr-x 3 ubuntu ubuntu  4096 Sep 26 18:18 ../
-rw-r--r-- 1 ubuntu ubuntu  9610 Sep 26 18:19 capacity-scheduler.xml
-rw-r--r-- 1 ubuntu ubuntu  1335 Jul 29  2022 configuration.xsl
-rw-r--r-- 1 ubuntu ubuntu  2567 Jul 29  2022 container-executor.cfg
-rw-r--r-- 1 ubuntu ubuntu  5017 Sep 26 18:19 core-site.xml
-rw-rw-r-- 1 ubuntu ubuntu     0 Sep 26 18:19 dfs.hosts.exclude
-rw-r--r-- 1 ubuntu ubuntu  3999 Jul 29  2022 hadoop-env.cmd
 
########################################
# Check Hive settings
########################################
$ cd /etc/hive/conf
$ ls -alh
total 380
drwxr-xr-x  2 ubuntu ubuntu   4096 Sep 26 18:19 ./
drwxr-xr-x 10 ubuntu ubuntu   4096 Sep 26 18:21 ../
-rw-r--r--  1 ubuntu ubuntu   1596 Oct 24  2019 beeline-log4j2.properties.template
-rw-r--r--  1 ubuntu ubuntu 300727 Apr  4  2022 hive-default.xml.template
-rw-rw-r--  1 ubuntu ubuntu   2194 Sep 26 18:19 hive-env.sh
-rw-r--r--  1 ubuntu ubuntu   2365 Oct 24  2019 hive-env.sh.template
 
########################################
# Compress and copy Hadoop configuration
########################################
# Compress Hadoop settings
$ tar czf hadoop-conf.tgz /etc/hadoop/conf/*
 
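# Copy Hadoop settings to the user VM
$ scp -i {PRIVATE_KEY_FILE} hadoop-conf.tgz ubuntu@{TARGET_NODE_IP}:{PATH}
 
# On the user VM, extract the archive in {PATH}
# (tar stored the paths without the leading /, so this creates {PATH}/etc/hadoop/conf)
$ tar xzf hadoop-conf.tgz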
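Because some nodes may lack the configuration for a given component, it can help to pack only the directories that actually exist on the node. The following is a minimal bash sketch, not part of HDE: pack_confs is a hypothetical helper, and its ROOT argument (normally /) exists only so the function can be tried against a scratch directory.

```shell
# Sketch: bundle the /etc/<component>/conf directories that exist on this node
# into one archive, skipping components whose configuration is missing here.
# pack_confs ROOT OUT COMPONENT...   (ROOT is normally "/")
pack_confs() {
  local root="$1" out="$2" comp dirs=""
  shift 2
  for comp in "$@"; do
    if [ -d "$root/etc/$comp/conf" ]; then
      dirs="$dirs etc/$comp/conf"
    else
      echo "skip: $root/etc/$comp/conf not found on this node" >&2
    fi
  done
  # Nothing to pack: fail so the caller can try another node.
  [ -n "$dirs" ] || return 1
  # -C stores relative paths (etc/<component>/conf/...), so extracting in the
  # home directory creates ./etc/<component>/conf.
  # $dirs is unquoted on purpose: component names contain no spaces.
  tar czf "$out" -C "$root" $dirs
}
```

For example, on a cluster node, followed by the same scp step as above and tar xzf hde-conf.tgz in the user VM's home directory (hde-conf.tgz is an arbitrary example name):
$ pack_confs / hde-conf.tgz hadoop hive hbase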

2. Add /etc/hosts Settings

To connect to the cluster, the user VM and the cluster nodes must be able to resolve each other's hostnames.
Check the /etc/hosts file on a cluster node and copy the cluster nodes' entries into the user VM's /etc/hosts.
Also add the user VM's entry to /etc/hosts on every cluster node.

- HDE cluster nodes: add the user VM's host information.
- User VM: add the HDE cluster nodes' host information.

Hadoop - Edit and Check /etc/hosts File
########################################
# Edit /etc/hosts file
########################################
$ sudo vi /etc/hosts
 
########################################
# Check /etc/hosts settings
########################################
$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain  localhost
172.30.35.48    host-172-30-35-48
172.30.34.235   host-172-30-34-235
172.30.34.124   host-172-30-34-124
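Repeating this edit by hand on every node is error-prone. A minimal bash sketch of an idempotent alternative follows; add_host_entry is a hypothetical helper that appends an IP/hostname pair only when the hostname is not already listed, and it assumes the hosts file ends with a newline.

```shell
# Sketch: append "IP<TAB>HOSTNAME" to a hosts file only if HOSTNAME is not
# already listed, so the step can be re-run safely on every node.
# add_host_entry FILE IP HOSTNAME
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Hostnames start at field 2; lines whose first field starts with # are comments.
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
    echo "exists: $name" >&2
    return 0
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

Run it from a root shell (for example after sudo -i) on each machine, once per entry:
$ add_host_entry /etc/hosts 172.30.35.48 host-172-30-35-48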

3. Download and Extract Files

Download the following version of Java and the Hadoop version that matches the HDE version.

HDE version	Java	Hadoop
HDE-1.0.0	JDK 8	2.10.1
HDE-1.1.0	JDK 8	2.10.1
HDE-1.1.1	JDK 8	2.10.2
HDE-2.0.0	JDK 8	3.3.4

########################
# Download files: Java and Hadoop
# (this example uses Hadoop 3.3.4, the version for HDE-2.0.0)
$ wget https://objectstorage.kr-central-2.kakaocloud.com/v1/e96c0af292734ab0845d64a061f9c96b/kbp-install-file/component/OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
$ wget https://objectstorage.kr-central-2.kakaocloud.com/v1/e96c0af292734ab0845d64a061f9c96b/kbp-install-file/component/hadoop-3.3.4-kbp.tar.gz
 
 
########################
# Extract the files
$ tar zxf hadoop-3.3.4-kbp.tar.gz
$ tar zxf OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
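To pick the right Hadoop archive for a given cluster, the version table can be encoded as a lookup. This bash sketch only mirrors the table in this section; hadoop_version_for is a hypothetical helper name.

```shell
# Sketch: map an HDE version to the Hadoop client version from the table
# (all listed HDE versions use JDK 8). Unknown versions return non-zero.
hadoop_version_for() {
  case "$1" in
    HDE-1.0.0|HDE-1.1.0) echo "2.10.1" ;;
    HDE-1.1.1)           echo "2.10.2" ;;
    HDE-2.0.0)           echo "3.3.4" ;;
    *) echo "unknown HDE version: $1" >&2; return 1 ;;
  esac
}
```

For example, the following prints 3.3.4, which matches hadoop-3.3.4-kbp.tar.gz in the download example:
$ hadoop_version_for HDE-2.0.0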

4. Execute Commands

After extracting the files, set the environment variables and run an HDFS command to verify the connection:

Execute Commands
# Check the current file structure
$ ls -alh
total 782620
drwxr-x---  7 ubuntu ubuntu      4096 Sep 27 08:57 ./
drwxr-xr-x  3 root   root        4096 Sep 27 08:54 ../
-rw-r--r--  1 ubuntu ubuntu       220 Jan  6  2022 .bash_logout
-rw-r--r--  1 ubuntu ubuntu      3771 Jan  6  2022 .bashrc
drwx------  2 ubuntu ubuntu      4096 Sep 27 08:56 .cache/
-rw-r--r--  1 ubuntu ubuntu       807 Jan  6  2022 .profile
drwx------  2 ubuntu ubuntu      4096 Sep 27 08:54 .ssh/
-rw-rw-r--  1 ubuntu ubuntu       191 Sep 27 08:56 .wget-hsts
-rw-rw-r--  1 ubuntu ubuntu 103200089 Aug  9 05:14 OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
drwxrwxr-x  3 ubuntu ubuntu      4096 Sep 27 08:57 etc/
drwxr-xr-x 10 ubuntu ubuntu      4096 Jul 29  2022 hadoop-3.3.4/
-rw-rw-r--  1 ubuntu ubuntu 698117781 Aug  9 05:10 hadoop-3.3.4-kbp.tar.gz
-rw-rw-r--  1 ubuntu ubuntu     28867 Sep 27 08:57 hadoop-conf.tgz
drwxr-xr-x  8 ubuntu ubuntu      4096 Jul 15  2020 jdk8u262-b10/
 
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
 
# Run HDFS command
$ ./hadoop-3.3.4/bin/hadoop fs -ls hdfs://host-172-30-35-48/
Found 8 items
drwxrwxrwt   - yarn   hadoop          0 2023-09-27 00:07 hdfs://host-172-30-35-48/app-logs
drwxrwxrwx   - hdfs   hadoop          0 2023-09-26 23:18 hdfs://host-172-30-35-48/apps
drwxr-xr-t   - yarn   hadoop          0 2023-09-26 09:20 hdfs://host-172-30-35-48/ats
drwxr-xr-x   - hdfs   hadoop          0 2023-09-26 09:20 hdfs://host-172-30-35-48/hadoop
drwxr-xr-x   - mapred hadoop          0 2023-09-26 09:20 hdfs://host-172-30-35-48/mr-history
drwxrwxrwt   - hdfs   hadoop          0 2023-09-26 09:21 hdfs://host-172-30-35-48/tmp
drwxr-xr-x   - hdfs   hadoop          0 2023-09-26 09:21 hdfs://host-172-30-35-48/user
drwxrwxrwt   - yarn   hadoop          0 2023-09-26 09:20 hdfs://host-172-30-35-48/var
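Many failures at this step come from a wrong JAVA_HOME or configuration path. A minimal bash sketch of a local pre-check follows; check_client_env is a hypothetical helper that only inspects local files and does not contact the cluster.

```shell
# Sketch: sanity-check the client environment before running Hadoop commands.
# Checks local files only; it does not contact the cluster.
check_client_env() {
  local rc=0
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME does not point to a JDK: '${JAVA_HOME:-}'" >&2
    rc=1
  fi
  # core-site.xml normally defines fs.defaultFS, the default file system URI.
  if [ -z "${HADOOP_CONF_DIR:-}" ] || [ ! -f "$HADOOP_CONF_DIR/core-site.xml" ]; then
    echo "HADOOP_CONF_DIR has no core-site.xml: '${HADOOP_CONF_DIR:-}'" >&2
    rc=1
  fi
  return "$rc"
}
```

After exporting JAVA_HOME and HADOOP_CONF_DIR:
$ check_client_env && ./hadoop-3.3.4/bin/hadoop fs -ls hdfs://host-172-30-35-48/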

Hive

The procedure to connect Hive is as follows:

1. Copy configuration files
2. Add /etc/hosts settings
3. Download and extract files
4. Execute commands

Note: The "Copy configuration files" and "Add /etc/hosts settings" steps are the same as for Hadoop. Copy /etc/hive/conf from the cluster in the same way as /etc/hadoop/conf.

Download and Extract Files

Download the following version of Java and Hive for the corresponding HDE version.

HDE version	Java	Hive
HDE-1.0.0	JDK 8	2.3.2
HDE-1.1.0	JDK 8	2.3.9

Connect Hive Configuration

Hive uses the Hadoop client, so complete the Hadoop setup first.
Then export the environment variables below, including HIVE_CONF_DIR for the copied Hive configuration, and connect with Beeline.

Hive Configuration Connection
#######################################
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_HOME=/home/ubuntu/hadoop-3.3.4/
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HIVE_CONF_DIR=/home/ubuntu/etc/hive/conf/
 
#######################################
# Run beeline and check the database info
$ ./apache-hive-3.1.3-bin/bin/beeline
Beeline version 3.1.3 by Apache Hive
beeline> !connect jdbc:hive2://host-172-30-35-48:10000/default
Connecting to jdbc:hive2://host-172-30-35-48:10000/default
Enter username for jdbc:hive2://host-172-30-35-48:10000/default:
Enter password for jdbc:hive2://host-172-30-35-48:10000/default:
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://host-172-30-35-48:10000/defau> show databases;
INFO  : Compiling command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a): show databases
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a); Time taken: 1.181 seconds
INFO  : Executing command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a); Time taken: 0.63 seconds
INFO  : OK
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (2.467 seconds)
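Beeline can also run a statement without the interactive prompt: pass the JDBC URL with -u and the statement with -e. The bash sketch below builds the URL used in the session above; hive_jdbc_url is a hypothetical helper, and its defaults (port 10000, database default) are simply the values from that example.

```shell
# Sketch: build the HiveServer2 JDBC URL used with !connect.
# hive_jdbc_url HOST [PORT] [DATABASE]
hive_jdbc_url() {
  local host="$1" port="${2:-10000}" db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}
```

For example, to list databases in one non-interactive call:
$ ./apache-hive-3.1.3-bin/bin/beeline -u "$(hive_jdbc_url host-172-30-35-48)" -e 'show databases;'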

Spark

Download and Extract Files

Download the following version of Java and the Spark version that matches the HDE version.

HDE version	Java	Spark
HDE-1.0.0	JDK 8	2.4.6

Connect Spark Configuration

Spark Configuration Connection
#######################################
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_HOME=/home/ubuntu/hadoop-3.3.4/
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HIVE_CONF_DIR=/home/ubuntu/etc/hive/conf/
 
# Add the output of the hadoop classpath command to spark-env.sh (the value below is abbreviated)
$ export SPARK_DIST_CLASSPATH="../home/ubuntu/spark/hadoop-3.3.4//share/hadoop/yarn/lib/*:/home/ubuntu/spark/hadoop-3.3.4//share/hadoop/yarn/*.."
 
# Check /etc/hosts settings
# /etc/hosts on the user VM and on every cluster node must list both the user VM and all cluster nodes

$ spark-shell
23/09/29 04:23:12 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
23/09/29 04:23:12 INFO BlockManagerMasterEndpoint: Registering block manager host-172-30-33-233:41301 with 366.3 MiB RAM, BlockManagerId(1, host-172-30-33-233, 41301, None)
Spark context Web UI available at http://host-172-30-33-46:4040
Spark context available as 'sc' (master = yarn, app id = application_1695806330771_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.2
      /_/
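Instead of pasting the hadoop classpath output by hand, SPARK_DIST_CLASSPATH can be filled from the command itself, which is the approach the Spark documentation describes for its "Hadoop free" builds. The bash sketch below wraps that; set_spark_dist_classpath is a hypothetical helper, and it assumes its argument (or HADOOP_HOME) points to the extracted Hadoop directory.

```shell
# Sketch: set SPARK_DIST_CLASSPATH from the output of "hadoop classpath".
# set_spark_dist_classpath [HADOOP_DIR]   (defaults to $HADOOP_HOME)
set_spark_dist_classpath() {
  local hadoop_home="${1:-${HADOOP_HOME:-}}"
  local classpath
  # Fail (non-zero) if the hadoop command is missing, errors, or prints nothing.
  classpath="$("$hadoop_home/bin/hadoop" classpath)" || return 1
  [ -n "$classpath" ] || return 1
  export SPARK_DIST_CLASSPATH="$classpath"
}
```

With JAVA_HOME exported (the hadoop script needs it):
$ set_spark_dist_classpath /home/ubuntu/hadoop-3.3.4
$ spark-shell
To make the setting persistent, the equivalent line export SPARK_DIST_CLASSPATH=$(/home/ubuntu/hadoop-3.3.4/bin/hadoop classpath) can go in spark-env.sh.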

Error Check

Common Errors
- Caused by: java.net.UnknownHostException: host-172-30-33-46
  - Occurs when the user VM's host entry is missing from /etc/hosts on the cluster nodes, or the cluster nodes' entries are missing from /etc/hosts on the user VM.
- Caused by: java.lang.ClassNotFoundException: org.apache.log4j.spi.Filter
  - Occurs when SPARK_DIST_CLASSPATH is not set, or does not contain the full Hadoop classpath.
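Since the UnknownHostException usually means a missing /etc/hosts entry, checking each machine narrows down which side is missing what. A minimal bash sketch follows; missing_hosts is a hypothetical helper that reads a hosts file directly (it does not consult DNS) and prints each hostname it cannot find.

```shell
# Sketch: print every HOSTNAME that has no entry in FILE; return 1 if any are missing.
# missing_hosts FILE HOSTNAME...
missing_hosts() {
  local file="$1" name rc=0
  shift
  for name in "$@"; do
    # Hostnames start at field 2; lines whose first field starts with # are comments.
    if ! awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it on the user VM and on each cluster node with the full list of hostnames, for example:
$ missing_hosts /etc/hosts host-172-30-35-48 host-172-30-34-235 host-172-30-34-124 host-172-30-33-46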

HBase

Download and Extract Files

Download the following version of Java and the HBase version corresponding to the HDE version.

HDE version	Java	HBase
HDE-1.0.0	JDK 8	1.4.13
HDE-1.1.0	JDK 8	1.7.1
HDE-1.1.1	JDK 8	1.7.1
HDE-2.0.0	JDK 8	2.4.13

Connect HBase Configuration

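HBase also requires the Hadoop setup, so complete the Hadoop steps first and copy /etc/hbase/conf from the cluster in the same way as the Hadoop configuration.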

Connect HBase Configuration
#######################################
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HBASE_CONF_DIR=/home/ubuntu/etc/hbase/conf/

$ ./hbase-1.4.13/bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020

hbase(main):001:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load

hbase(main):002:0> exit
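For scripted checks, HBase shell commands can also be piped in on standard input instead of typed at the prompt. The bash sketch below assumes that behavior of hbase shell; hbase_run is a hypothetical helper, and it appends exit so the shell always terminates.

```shell
# Sketch: run HBase shell commands non-interactively by piping them on stdin.
# hbase_run HBASE_DIR COMMAND...   (HBASE_DIR is the extracted HBase directory)
hbase_run() {
  local hbase_dir="$1"
  shift
  # One command per line, followed by "exit".
  printf '%s\n' "$@" exit | "$hbase_dir/bin/hbase" shell
}
```

With the environment variables above exported:
$ hbase_run ./hbase-1.4.13 status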
