Ethan Millar


Explore Metadata In The Kinds Of Tables In Apache Hive With Hadoop Integration Experts

In this post, Hadoop integration professionals explain how to explore metadata in the different kinds of tables in Apache Hive. Read on to see how Hadoop professionals work with metadata in Hive.

Introduction:

Apache Hadoop is a data framework that supports processing big data. Hive is a data warehouse built on top of Hadoop. Hive is very powerful for querying big data because it maps metadata onto the real data in the Hadoop Distributed File System (HDFS) and processes queries with MapReduce. In recent versions, Hive can also switch its execution engine to Spark or Tez. Hive additionally supports complex data types through UDFs and a variety of built-in functions; I will cover UDFs in Hive in another blog post.
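For example, in versions of Hive that support it, the execution engine can be switched per session. A minimal sketch (engine availability depends on what is installed on your cluster):

-- Show the current engine (older versions default to MapReduce, i.e. 'mr')
SET hive.execution.engine;

-- Switch this session to Tez, if the Tez engine is available
SET hive.execution.engine=tez;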

[Figure: Hadoop cluster (HDFS + MapReduce) showing the Name Node and Job Tracker]

Hive keeps a relational database, called the metastore, typically hosted on the master node (the name node in this setup), to store all of Hive's state. For example, when we create a table with the command "CREATE TABLE Student(id string) LOCATION '/data/sample/';", the table schema is stored in that database as Hive metadata.
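You can see this for yourself by querying the metastore database directly. This is only a sketch: it assumes a MySQL-backed metastore with the standard schema, where the TBLS and DBS tables hold the table and database entries (names can vary between Hive versions). Run it against the metastore database, not inside Hive:

-- List the Hive tables recorded in the metastore, with their type and database
SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID;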

Assume we have a partitioned table: the partition information is also stored in the metastore database on the name node, which lets Hive list partitions and locate data very quickly. All of this is called 'metadata'. Metadata includes information such as the table format, the mapping to its location, the data files, and so on, and it lives in the metastore rather than in the data files themselves.
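For instance, with a partitioned table each partition gets its own metastore entry, and Hive can list the partitions without reading any data files. A minimal sketch (student_by_year is a hypothetical table name):

-- Each ADD PARTITION writes a new entry into the metastore
CREATE TABLE student_by_year (id string)
PARTITIONED BY (year string);

ALTER TABLE student_by_year ADD PARTITION (year = '2016');

-- Answered from the metastore alone; no HDFS scan is needed
SHOW PARTITIONS student_by_year;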

When we drop an internal table (the default kind), Hive deletes both the data and the metadata. However, when we drop an external table, Hive deletes only the metadata, and the data remains on the Hadoop Distributed File System. Hive simply forgets about that data; it never touches the files themselves.
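You can check which kind of table you are about to drop with DESCRIBE FORMATTED; its output includes a "Table Type:" row that reads MANAGED_TABLE for internal tables and EXTERNAL_TABLE for external ones:

-- MANAGED_TABLE  -> DROP deletes metadata and data
-- EXTERNAL_TABLE -> DROP deletes metadata only
DESCRIBE FORMATTED mydatabase.sample;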

This distinction is very important when working with Hive on Hadoop. In my experience, I have seen many engineers and developers make this mistake and lose entire datasets from the data warehouse. I hope this blog helps you understand the metadata concept and the kinds of tables in Hive.

Environment

Java: JDK 1.7

Cloudera version: CDH 5.4.7; see http://www.cloudera.com/downloads/cdh/5-4-7.html

Initial steps

1. Prepare an input data file. Open vi to create a local file:

vi file1

1;Jack

2;Ryan

3;Jean


2. Put the local file onto the Hadoop Distributed File System (HDFS) with these commands:

hadoop fs -mkdir -p /data/mydata/sample

hadoop fs -put file1 /data/mydata/sample/
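Optionally, print the file back from HDFS to verify the upload:

hadoop fs -cat /data/mydata/sample/file1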


Code walkthrough and result verification

This is the Hive script that uses Hadoop and Hive to create and drop an external table and a default (internal) table:


-- Create the database if it does not exist yet
CREATE DATABASE IF NOT EXISTS mydatabase;

-- External table: DROP removes only the metadata
DROP TABLE IF EXISTS mydatabase.sample;

CREATE EXTERNAL TABLE mydatabase.sample
(
    accountId string,
    name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';

-- Internal (default) table: DROP removes metadata and data
DROP TABLE IF EXISTS mydatabase.sample;

CREATE TABLE mydatabase.sample
(
    accountId string,
    name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';
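If you save this script to a file, say sample.hql (a name chosen here just for illustration), you can run it non-interactively with the Hive CLI:

hive -f sample.hql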


1. Check that the local file was actually uploaded to the Hadoop Distributed File System:

hadoop fs -ls /data/mydata/sample/

It should list file1 under /data/mydata/sample/.


2. Open the Hive shell and run this command:


DROP TABLE IF EXISTS mydatabase.sample;

CREATE EXTERNAL TABLE mydatabase.sample
(
    accountId string,
    name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';


3. Use this command to check whether the table was created:

show create table mydatabase.sample;

-> It should show the structure of the sample table.


4. Drop the external table with this command:

drop table mydatabase.sample;


5. Repeat step 3 and observe that the table no longer exists.
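Listing the tables in the database is another quick way to confirm the drop:

show tables in mydatabase;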


6. Now check the data on HDFS to verify whether Hive deleted only the metadata or both the metadata and the data:

hadoop fs -ls /data/mydata/sample/

-> The data is still there. This confirms that dropping an external table deletes only the metadata.
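Because the data survives, you can simply re-create the external table over the same location and query it again; nothing has to be reloaded. A short sketch reusing the definition from step 2:

CREATE EXTERNAL TABLE mydatabase.sample
(
    accountId string,
    name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';

-- The rows from file1 are immediately visible again
SELECT * FROM mydatabase.sample;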


7. Now run this command to create a default (internal) Hive table:


DROP TABLE IF EXISTS mydatabase.sample;

CREATE TABLE mydatabase.sample
(
    accountId string,
    name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';


8. Follow steps 3, 4, 5, and 6 again to verify how Hive handles the metadata and the actual data on the Hadoop Distributed File System:

hadoop fs -ls /data/mydata/sample/

-> This time the data is gone. This confirms that dropping an internal table deletes both the metadata and the actual data.
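As a safety net for next time: if you must drop a managed table but want to keep its files, one common approach is to flip it to external first. A sketch (note that the property value is case-sensitive on older Hive versions, so use 'TRUE'):

-- Convert the managed table to external, then drop it
ALTER TABLE mydatabase.sample SET TBLPROPERTIES ('EXTERNAL'='TRUE');
DROP TABLE mydatabase.sample;   -- now removes metadata only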


The same behavior applies when you load data, build indexes, or create views on Hive tables, whether external or internal; see the sketch below. I hope this helps you understand how Hive works with each kind of table.
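For example, LOAD DATA places a file under the table's location on HDFS, while a view is pure metadata, so dropping it never touches data. A minimal sketch (it assumes the sample table from above still exists and that file2 is a hypothetical second input file):

-- Copy a local file into the table's location on HDFS
LOAD DATA LOCAL INPATH 'file2' INTO TABLE mydatabase.sample;

-- A view is metadata only; dropping it deletes no data
CREATE VIEW mydatabase.sample_names AS
SELECT name FROM mydatabase.sample;

DROP VIEW mydatabase.sample_names;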


This article was prepared by Hadoop integration professionals to help people learn how to explore metadata in the different kinds of tables in Apache Hive. Feel free to share your thoughts on this post with other readers.









"
Comments

Articles from Ethan Millar

View blog
2 years ago · 3 min. reading time

An economic cycle contains many ups and downs, which are constantly in effect. Though navigating dur ...

2 years ago · 1 min. reading time

A few years ago, making an HTTP call from Dynamics CRM Services used to be very complex. The develop ...

3 years ago · 4 min. reading time

Java is considered to be a user-friendly language. When it comes to dealing with the data of the cus ...

You may be interested in these jobs

  • Cognizant Technology Solutions

    Sr. Associate

    Found in: beBee S2 IN - 2 hours ago


    Cognizant Technology Solutions Bangalore, India OTHER

    Delivery Manager · Qualification: · B Sc, B Com, Relevant Diploma Degrees (CSC, Electronics), BEResponsibility: · Business / Customer• Understand and articulate complex problems related to the specific technology. · • Provide business development support by assisting in RFP/ RFI ...

  • Kenvue

    Lead Engineer

    Found in: Talent500 IN C2 - 2 hours ago


    Kenvue Bengaluru, India

    S4 HANA Full Stack Developer · Kenvue GCC, Consumer Health is recruiting for an S4 HANA Full Stack developer, located in Skillman, NJ. The Digital Platform Transformation Program is a critical component of the Consumer Health strategy to become a digital first company. Consumer H ...

  • Sisco Jobs

    Robotics Engineer

    Found in: Talent IN C2 - 2 hours ago


    Sisco Jobs Secunderabad, India

    Job Description · Job Title: Robotics Engineer · Location: Remote · Employment Type: Full-time · Role Description: · We are seeking a talented and driven Robotics Engineer to join our innovative team on a full-time basis. As a Robotics Engineer, you will be responsible for design ...