Introduction
HBase is a distributed, column-oriented database designed to provide quick random access to huge amounts of structured data. This tutorial shows how to set up HBase on the Hadoop Distributed File System (HDFS) using a Google Cloud instance.
As a file system, HDFS is well suited to sequential data access, but it lacks random read/write capability. HBase runs on top of HDFS and provides random read and write access. Data producers store data in HDFS through HBase, and data consumers read that data back randomly, also through HBase.
Apart from HBase, here are some other NoSQL databases:
Storage Mechanism in HBase
HBase is a column-oriented database, and the rows of each table are sorted by row key. The table schema defines only column families, which hold the key-value pairs. A table can have multiple column families, and each column family can have any number of columns. Subsequent column values are stored contiguously on disk. Each cell value in the table has a timestamp. In short, in HBase:
- Table is a collection of rows.
- Row is a collection of column families.
- Column family is a collection of columns.
- Column is a collection of key-value pairs.
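The hierarchy above can be sketched as nested maps. The snippet below is a toy, in-memory illustration in plain Python, not the HBase client API: the `ToyTable` class, its methods, and the sample row keys and values are all hypothetical names chosen for this example. It only shows how a cell is addressed by row key, column family, column, and timestamp, and how rows come back in sorted key order.

```python
import time


class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase API)."""

    def __init__(self, column_families):
        # The schema fixes only the column families; columns are created on write.
        self.column_families = set(column_families)
        # row key -> column family -> column -> {timestamp: value}
        self.rows = {}

    def put(self, row_key, family, column, value, timestamp=None):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        if timestamp is None:
            timestamp = time.time_ns()
        versions = (
            self.rows.setdefault(row_key, {})
            .setdefault(family, {})
            .setdefault(column, {})
        )
        versions[timestamp] = value  # every cell value carries a timestamp

    def get(self, row_key, family, column):
        # Return the value with the newest timestamp for this cell.
        versions = self.rows[row_key][family][column]
        return versions[max(versions)]

    def scan(self):
        # Yield rows in sorted row-key order.
        for row_key in sorted(self.rows):
            yield row_key, self.rows[row_key]


table = ToyTable(["personal", "professional"])
table.put("row2", "personal", "name", "Ravi", timestamp=1)
table.put("row1", "personal", "name", "Raju", timestamp=1)
table.put("row1", "personal", "name", "Rajesh", timestamp=2)

print(table.get("row1", "personal", "name"))  # Rajesh (newest version)
print([key for key, _ in table.scan()])       # ['row1', 'row2']
```

Note that new columns can be added on any write without touching the schema, whereas writing to an undeclared column family raises an error. This mirrors the point that the schema defines only column families.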