
Client-Server Connection

The following explains how to connect a user-created client VM to the HDE cluster.

Hadoop

The procedure to connect Hadoop is as follows:

1. Copy Configuration Files

The configuration files for the components installed in the HDE cluster are located under /etc/[component-name]/conf.
Some nodes may have a component's configuration files while others do not, so check each node. The examples below show how to verify the Hadoop and Hive configurations.
(To copy the configuration files, move to the component's conf directory, compress the files with the tar command, and transfer the archive with the scp command.)

Hadoop - Copy Configuration Files
########################################
# Check Hadoop settings
########################################
$ cd /etc/hadoop/conf
$ ls -alh
total 216
drwxr-xr-x 3 ubuntu ubuntu 4096 Sep 26 18:19 ./
drwxr-xr-x 3 ubuntu ubuntu 4096 Sep 26 18:18 ../
-rw-r--r-- 1 ubuntu ubuntu 9610 Sep 26 18:19 capacity-scheduler.xml
-rw-r--r-- 1 ubuntu ubuntu 1335 Jul 29 2022 configuration.xsl
-rw-r--r-- 1 ubuntu ubuntu 2567 Jul 29 2022 container-executor.cfg
-rw-r--r-- 1 ubuntu ubuntu 5017 Sep 26 18:19 core-site.xml
-rw-rw-r-- 1 ubuntu ubuntu 0 Sep 26 18:19 dfs.hosts.exclude
-rw-r--r-- 1 ubuntu ubuntu 3999 Jul 29 2022 hadoop-env.cmd

########################################
# Check Hive settings
########################################
$ cd /etc/hive/conf
$ ls -alh
total 380
drwxr-xr-x 2 ubuntu ubuntu 4096 Sep 26 18:19 ./
drwxr-xr-x 10 ubuntu ubuntu 4096 Sep 26 18:21 ../
-rw-r--r-- 1 ubuntu ubuntu 1596 Oct 24 2019 beeline-log4j2.properties.template
-rw-r--r-- 1 ubuntu ubuntu 300727 Apr 4 2022 hive-default.xml.template
-rw-rw-r-- 1 ubuntu ubuntu 2194 Sep 26 18:19 hive-env.sh
-rw-r--r-- 1 ubuntu ubuntu 2365 Oct 24 2019 hive-env.sh.template

########################################
# Compress and copy Hadoop configuration
########################################
# Compress Hadoop settings
$ tar czf hadoop-conf.tgz /etc/hadoop/conf/*

# Copy Hadoop settings
$ scp -i {PRIVATE_KEY_FILE} hadoop-conf.tgz ubuntu@{TARGET_NODE_IP}:{PATH}
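Once the archive arrives on the client VM, it can be extracted there. Note that tar strips the leading slash from the archived paths, which is why the configuration ends up under /home/ubuntu/etc/hadoop/conf/ in the later steps (a sketch, assuming the archive was copied into the home directory):

```shell
# On the client VM: extract the copied archive in the home directory.
# tar stored the paths without the leading '/', so the files land under ./etc/hadoop/conf/
$ tar xzf hadoop-conf.tgz
$ ls etc/hadoop/conf/
```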

2. Add /etc/hosts Settings

To connect to the cluster, the cluster's host information must be added to the user VM.
To do this, check the /etc/hosts file on a cluster node and copy its entries to the user VM.
(Likewise, the user VM's host information must be added to the cluster's nodes.)

  • HDE Cluster Nodes
    • Add user VM's host information
  • User VM
    • Add the HDE cluster node's information
Hadoop - Edit and Check /etc/hosts File
########################################
# Edit /etc/hosts file
########################################
$ sudo vi /etc/hosts

########################################
# Check /etc/hosts settings
######################################## $ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
172.30.35.48 host-172-30-35-48
172.30.34.235 host-172-30-34-235
172.30.34.124 host-172-30-34-124
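For example, registering the client VM on a cluster node can be sketched as follows (the IP address 172.30.32.10 and the hostname my-client-vm are placeholders, not values from this cluster):

```shell
# Append the client VM's entry (placeholder IP and hostname) to /etc/hosts
$ echo "172.30.32.10 my-client-vm" | sudo tee -a /etc/hosts

# Confirm the entry was added
$ grep my-client-vm /etc/hosts
172.30.32.10 my-client-vm
```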

3. Download and Extract Files

Download the following version of Java and the Hadoop version that matches the HDE version.

Component    Version          Location
java         jdk8
HDE-1.0.0    hadoop 2.10.1
HDE-1.1.0    hadoop 2.10.1
HDE-1.1.1    hadoop 2.10.2
HDE-2.0.0    hadoop 3.3.4
########################################
# Download files (Java, Hadoop)
########################################
$ wget https://objectstorage.kr-central-2.kakaocloud.com/v1/e96c0af292734ab0845d64a061f9c96b/kbp-install-file/component/OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
$ wget https://objectstorage.kr-central-2.kakaocloud.com/v1/e96c0af292734ab0845d64a061f9c96b/kbp-install-file/component/hadoop-3.3.4-kbp.tar.gz


########################################
# Extract the files
########################################
$ tar zxf hadoop-3.3.4-kbp.tar.gz
$ tar zxf OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
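Before extracting, the archive contents can be previewed with tar's list mode; this is a quick sanity check that the download is a valid gzip tarball:

```shell
# List the first few entries of the archive without extracting it
$ tar tzf hadoop-3.3.4-kbp.tar.gz | head -5
```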

4. Execute Commands

After extracting the files, you can execute the following commands:

Execute Commands
# Check the current file structure
$ ls -alh
total 782620
drwxr-x--- 7 ubuntu ubuntu 4096 Sep 27 08:57 ./
drwxr-xr-x 3 root root 4096 Sep 27 08:54 ../
-rw-r--r-- 1 ubuntu ubuntu 220 Jan 6 2022 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Jan 6 2022 .bashrc
drwx------ 2 ubuntu ubuntu 4096 Sep 27 08:56 .cache/
-rw-r--r-- 1 ubuntu ubuntu 807 Jan 6 2022 .profile
drwx------ 2 ubuntu ubuntu 4096 Sep 27 08:54 .ssh/
-rw-rw-r-- 1 ubuntu ubuntu 191 Sep 27 08:56 .wget-hsts
-rw-rw-r-- 1 ubuntu ubuntu 103200089 Aug 9 05:14 OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
drwxrwxr-x 3 ubuntu ubuntu 4096 Sep 27 08:57 etc/
drwxr-xr-x 10 ubuntu ubuntu 4096 Jul 29 2022 hadoop-3.3.4/
-rw-rw-r-- 1 ubuntu ubuntu 698117781 Aug 9 05:10 hadoop-3.3.4-kbp.tar.gz
-rw-rw-r-- 1 ubuntu ubuntu 28867 Sep 27 08:57 hadoop-conf.tgz
drwxr-xr-x 8 ubuntu ubuntu 4096 Jul 15 2020 jdk8u262-b10/

# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/

# Run HDFS command
$ ./hadoop-3.3.4/bin/hadoop fs -ls hdfs://host-172-30-35-48/
Found 8 items
drwxrwxrwt - yarn hadoop 0 2023-09-27 00:07 hdfs://host-172-30-35-48/app-logs
drwxrwxrwx - hdfs hadoop 0 2023-09-26 23:18 hdfs://host-172-30-35-48/apps
drwxr-xr-t - yarn hadoop 0 2023-09-26 09:20 hdfs://host-172-30-35-48/ats
drwxr-xr-x - hdfs hadoop 0 2023-09-26 09:20 hdfs://host-172-30-35-48/hadoop
drwxr-xr-x - mapred hadoop 0 2023-09-26 09:20 hdfs://host-172-30-35-48/mr-history
drwxrwxrwt - hdfs hadoop 0 2023-09-26 09:21 hdfs://host-172-30-35-48/tmp
drwxr-xr-x - hdfs hadoop 0 2023-09-26 09:21 hdfs://host-172-30-35-48/user
drwxrwxrwt - yarn hadoop 0 2023-09-26 09:20 hdfs://host-172-30-35-48/var
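The exports above apply only to the current shell session. To keep them across logins, they can be appended to ~/.bashrc (a sketch, assuming the paths used in this guide; HADOOP_CONF_DIR is the variable the hadoop scripts read for the configuration directory):

```shell
# Persist the environment variables across logins
$ cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/home/ubuntu/jdk8u262-b10
export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
EOF
$ source ~/.bashrc
```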

Hive

The procedure to connect Hive is as follows:

  • Copy configuration files
  • Add /etc/hosts settings
  • Download and extract files
  • Execute commands
info

The Copy configuration files and Add /etc/hosts settings steps are the same as for Hadoop.

Download and Extract Files

Download the following version of Java and Hive for the corresponding HDE version.

Component    Version       Location
java         jdk8
HDE-1.0.0    hive 2.3.2
HDE-1.1.0    hive 2.3.9

Connect Hive Configuration

To connect Hive, configure Hadoop first.
After configuring Hadoop, export the required environment variables, then apply the Hive settings and proceed.

Hive Configuration Connection
#######################################
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_HOME=/home/ubuntu/hadoop-3.3.4/
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HIVE_CONF_DIR=/home/ubuntu/etc/hive/conf/

#######################################
# Run beeline and check the database info
$ ./apache-hive-3.1.3-bin/bin/beeline
Beeline version 3.1.3 by Apache Hive
beeline> !connect jdbc:hive2://host-172-30-35-48:10000/default
Connecting to jdbc:hive2://host-172-30-35-48:10000/default
Enter username for jdbc:hive2://host-172-30-35-48:10000/default:
Enter password for jdbc:hive2://host-172-30-35-48:10000/default:
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://host-172-30-35-48:10000/default> show databases;
INFO : Compiling command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a): show databases
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a); Time taken: 1.181 seconds
INFO : Executing command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=ubuntu_20230927090910_62ea1ad1-58b6-49ee-871b-ee07cb7ed32a); Time taken: 0.63 seconds
INFO : OK
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (2.467 seconds)

Spark

Download and Extract Files

Download the following version of Java and the Spark version that matches the HDE version.

Component    Version       Location
java         jdk8
HDE-1.0.0    spark 2.4.6

Connect Spark Configuration

Spark Configuration Connection
#######################################
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_HOME=/home/ubuntu/hadoop-3.3.4/
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HIVE_CONF_DIR=/home/ubuntu/etc/hive/conf/

# Add the output of the hadoop classpath command to SPARK_DIST_CLASSPATH
# (this can also be placed in spark-env.sh)
$ export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)

# Check /etc/hosts settings:
# confirm that the user VM's host information and the cluster's host information
# are present in the /etc/hosts file of both the user VM and the cluster nodes

$ spark-shell
23/09/29 04:23:12 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
23/09/29 04:23:12 INFO BlockManagerMasterEndpoint: Registering block manager host-172-30-33-233:41301 with 366.3 MiB RAM, BlockManagerId(1, host-172-30-33-233, 41301, None)
Spark context Web UI available at http://host-172-30-33-46:4040
Spark context available as 'sc' (master = yarn, app id = application_1695806330771_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.2
      /_/
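If spark-shell fails with the ClassNotFoundException described under Error Check below, it helps to confirm that the classpath variable was actually populated before launching:

```shell
# List the first entries of SPARK_DIST_CLASSPATH, one per line;
# empty output means the export did not take effect
$ echo "$SPARK_DIST_CLASSPATH" | tr ':' '\n' | head -3
```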

Error Check

  • Common Errors
    • Caused by: java.net.UnknownHostException: host-172-30-33-46
      • Occurs when the host information of the user VM is not added to the cluster, or vice versa.
    • Caused by: java.lang.ClassNotFoundException: org.apache.log4j.spi.Filter
      • Occurs when the SPARK_DIST_CLASSPATH information is not correctly added.
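When the UnknownHostException appears, a quick check is whether the name actually resolves through /etc/hosts on the machine that threw it (the hostname below is the one from the error example above):

```shell
# Resolve the hostname from the error; no output means the /etc/hosts entry is missing
$ getent hosts host-172-30-33-46
```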

HBase

Download and Extract Files

Download the following version of Java and the HBase version corresponding to the HDE version.

Component    Version        Location
java         jdk8
HDE-1.0.0    hbase 1.4.13
HDE-1.1.0    hbase 1.7.1
HDE-1.1.1    hbase 1.7.1
HDE-2.0.0    hbase 2.4.13

Connect HBase Configuration

To connect HBase, Hadoop also needs to be configured.

Connect HBase Configuration
#######################################
# Set environment variables
$ export JAVA_HOME=/home/ubuntu/jdk8u262-b10
$ export HADOOP_CONF_DIR=/home/ubuntu/etc/hadoop/conf/
$ export HBASE_CONF_DIR=/home/ubuntu/etc/hbase/conf/

$ ./hbase-1.4.13/bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020

hbase(main):001:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load

hbase(main):002:0> exit