**Creating a table**
<hr/>

```sql
-- customer is the table name
-- NAME introduces a column family name; at least one column family is required,
-- and the keyword NAME must be written in uppercase
hbase(main):010:0> create 'customer', {NAME=>'addr'}, {NAME=>'order'}

-- NAME can also be omitted
hbase(main):010:0> create 'customer', 'addr', 'order'
```
The resulting directories in HDFS are:
```
/hbase/data/default/customer/c8e45208523fab04f160da91c24909bb/addr
/hbase/data/default/customer/c8e45208523fab04f160da91c24909bb/order
```
Each column family corresponds to one HDFS directory.

<br/>

**Inserting a single row**
<hr/>

```sql
-- r00001 is the rowkey
-- city, state, etc. are the columns
-- montreal, ON, etc. are the values
hbase(main):011:0> put 'customer', 'r00001', 'addr:city', 'montreal'
hbase(main):012:0> put 'customer', 'r00001', 'addr:state', 'ON'
hbase(main):013:0> put 'customer', 'r00001', 'order:numb', '123456'
hbase(main):014:0> put 'customer', 'r00001', 'order:date', '2015-12-19'
```
Checking the corresponding HDFS directories shows that they are still empty.
![](https://img.kancloud.cn/5d/d0/5dd0044fee23355b818deaf08aecf075_1465x258.png)
![](https://img.kancloud.cn/5f/91/5f91521f0f0d5b109a1be748d53f09fa_1486x282.png)

This shows that <mark>HBase does not write data straight to data files on disk; writes are first held in an in-memory buffer (the MemStore)</mark>. For demonstration purposes, the buffer can be flushed manually with the `flush` command.
```sql
-- Typing flush on its own prints usage examples
hbase(main):015:0> flush
  hbase> flush 'TABLENAME'
  hbase> flush 'REGIONNAME'
  hbase> flush 'ENCODED_REGIONNAME'

-- Flush the customer table, writing its data to disk
hbase(main):016:0> flush 'customer'
```
Once the flush completes, a data file appears under each column family directory. That file is an HFile.
![](https://img.kancloud.cn/10/c8/10c8419d82754511c136834bdf7e2909_1478x219.png)
![](https://img.kancloud.cn/61/8f/618fe8e5ff0d7843a2fd47ed34951057_1473x233.png)

If you try to view a file under a column family directory with hdfs commands, you will find that its content is binary. HBase provides its own tool for printing the content of such a file. The command below is run from the Linux command line, not inside the hbase shell.
```sql
[root@hadoop101 hbase]# bin/hbase hfile -v -p -f /hbase/data/default/customer/c8e45208523fab04f160da91c24909bb/addr/26972466157c4e06b73b83a57f49b05f

K: r00001/addr:city/1608146319888/Put/vlen=8/seqid=4 V: montreal
K: r00001/addr:state/1608146462198/Put/vlen=2/seqid=5 V: ON
Scanned kv count -> 2
```

<br/>

**Inserting multiple rows**
<hr/>

```sql
hbase(main):017:0> put 'customer', 'r00002', 'addr:city', 'miami'
hbase(main):018:0> put 'customer', 'r00002', 'addr:state', 'FL'
hbase(main):019:0> put 'customer', 'r00003', 'addr:city', 'dallas'
hbase(main):020:0> put 'customer', 'r00004', 'addr:state', 'TX'
hbase(main):021:0> flush 'customer'
```
This second `flush` produces a new HFile rather than appending to the existing one.
![](https://img.kancloud.cn/e0/bb/e0bb8de7f759543d2459feb98730e153_1484x288.png)
```sql
-- Inspect the new HFile
[root@hadoop101 hbase]# bin/hbase hfile -v -p -f /hbase/data/default/customer/c8e45208523fab04f160da91c24909bb/addr/c076c18ae7e74712aa805584b7a165b5

K: r00002/addr:city/1608148485122/Put/vlen=5/seqid=12 V: miami
K: r00002/addr:state/1608148500312/Put/vlen=2/seqid=13 V: FL
K: r00003/addr:city/1608148511348/Put/vlen=6/seqid=14 V: dallas
K: r00004/addr:state/1608148527251/Put/vlen=2/seqid=15 V: TX
Scanned kv count -> 4
```

<br/>

**Loading an HDFS file into an HBase table**
<hr/>

(1) The prepared test.csv data is shown below.
![](https://img.kancloud.cn/59/80/5980d62fdb1647918a776e3b5420a070_1264x152.png)
```sql
-- (2) Upload test.csv to HDFS
[root@hadoop101 hadoop]# bin/hdfs dfs -put /hbase_datas/test.csv /hbase/data/

-- (3) Create the customer table
hbase(main):008:0> create 'customer', {NAME=>'order'}

-- (4) Import the test.csv data on HDFS into the customer table
-- test.csv has three columns; HBASE_ROW_KEY is listed first in importtsv.columns,
-- so the first CSV column becomes the rowkey
[root@hadoop101 hbase]# bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,order:name,order:number customer /hbase/data/test.csv

-- (5) View the customer table
hbase(main):021:0> scan 'customer'
ROW        COLUMN+CELL
 1         column=order:name, timestamp=1608281767284, value=NoFrill
 1         column=order:number, timestamp=1608281767284, value=10
 2         column=order:name, timestamp=1608281767284, value=Lablaws
 2         column=order:number, timestamp=1608281767284, value=23
 3         column=order:name, timestamp=1608281767284, value=FoodMart
 3         column=order:number, timestamp=1608281767284, value=18
 4         column=order:name, timestamp=1608281767284, value=FoodLovers
 4         column=order:number, timestamp=1608281767284, value=26
 5         column=order:name, timestamp=1608281767284, value=Walmart
 5         column=order:number, timestamp=1608281767284, value=30
```

<br/>

**Scanning a table**
<hr/>

```sql
-- Scan the whole table
hbase(main):022:0> scan 'customer'
ROW        COLUMN+CELL
 r00001    column=addr:city, timestamp=1608146319888, value=montreal
 r00001    column=addr:state, timestamp=1608146462198, value=ON
 r00001    column=order:date, timestamp=1608146484642, value=2015-12-19
 r00001    column=order:numb, timestamp=1608146472978, value=123456
 r00002    column=addr:city, timestamp=1608148485122, value=miami
 r00002    column=addr:state, timestamp=1608148500312, value=FL
 r00003    column=addr:city, timestamp=1608148511348, value=dallas
 r00004    column=addr:state, timestamp=1608148527251, value=TX

-- Scan only the specified column
hbase(main):023:0> scan 'customer', {COLUMNS=>['order:numb'], VERSIONS=>2}
ROW        COLUMN+CELL
 r00001    column=order:numb, timestamp=1608146472978, value=123456

-- Scan a range of rows (STARTROW is inclusive, STOPROW is exclusive)
hbase(main):026:0> scan 'customer', {STARTROW=>'r00001', STOPROW=>'r00003'}
ROW        COLUMN+CELL
 r00001    column=addr:city, timestamp=1608146319888, value=montreal
 r00001    column=addr:state, timestamp=1608146462198, value=ON
 r00001    column=order:date, timestamp=1608146484642, value=2015-12-19
 r00001    column=order:numb, timestamp=1608146472978, value=123456
 r00002    column=addr:city, timestamp=1608148485122, value=miami
 r00002    column=addr:state, timestamp=1608148500312, value=FL
```

<br/>

**Viewing the table structure**
<hr/>

```sql
hbase(main):027:0> desc 'customer'
```

<br/>

**Updating a specific field**
```sql
-- A put to an existing rowkey and column writes a new version of that cell
hbase(main):028:0> put 'customer', 'r00001', 'order:numb', '654321'
```

<br/>

**Reading data with get**
```sql
-- Read one row
hbase(main):031:0> get 'customer', 'r00001'
COLUMN        CELL
 addr:city    timestamp=1608146319888, value=montreal
 addr:state   timestamp=1608146462198, value=ON
 order:date   timestamp=1608146484642, value=2015-12-19
 order:numb   timestamp=1608154475714, value=654321

-- Read one column family of a row
hbase(main):032:0> get 'customer', 'r00001', 'addr'
COLUMN        CELL
 addr:city    timestamp=1608146319888, value=montreal
 addr:state   timestamp=1608146462198, value=ON

-- Read one column of one column family of a row
hbase(main):033:0> get 'customer', 'r00001', 'addr:city'
COLUMN        CELL
 addr:city    timestamp=1608146319888, value=montreal
```

<br/>

**Counting the rows in a table**
<hr/>

```sql
hbase(main):034:0> count 'customer'
=> 4
```

<br/>

**Deleting data**
<hr/>

```sql
-- Delete all data for a rowkey
hbase(main):035:0> deleteall 'customer', 'r00001'

-- Delete the data in one column of a row
hbase(main):041:0> delete 'customer', 'r00002', 'addr:city'
```

<br/>

**Clearing all data from a table**
<hr/>

```sql
-- The table must be enabled before it can be truncated; if it is not, enable it first
hbase(main):047:0> enable 'customer'

-- Truncate the table. The sequence is disable, then truncate, but HBase performs the disable automatically
hbase(main):049:0> truncate 'customer'
```

<br/>

**Dropping a table**
<hr/>

```sql
-- A table must be in the disabled state before it can be dropped
hbase(main):051:0> disable 'customer'

-- Drop the table
hbase(main):052:0> drop 'customer'
```

<br/>

**Changing how many versions a table keeps**
<hr/>

```sql
-- Let the order column family of the customer table keep up to the 5 most recent versions
-- By default only one version is kept
hbase(main):082:0> alter 'customer', NAME=>'order', VERSIONS=>5

-- Write 6 values to order:numb
hbase(main):084:0> put 'customer', 'r00005', 'order:numb', '1'
hbase(main):084:0> put 'customer', 'r00005', 'order:numb', '2'
hbase(main):084:0> put 'customer', 'r00005', 'order:numb', '3'
hbase(main):084:0> put 'customer', 'r00005', 'order:numb', '4'
hbase(main):084:0> put 'customer', 'r00005', 'order:numb', '5'
hbase(main):084:0> put 'customer', 'r00005', 'order:numb', '6'

-- Read them back: only the 5 most recent versions are returned, even though 6 were requested
hbase(main):097:0> get 'customer', 'r00005', {COLUMN=>'order:numb', VERSIONS=>6}
COLUMN        CELL
 order:numb   timestamp=1608157210804, value=6
 order:numb   timestamp=1608157203564, value=5
 order:numb   timestamp=1608157196530, value=4
 order:numb   timestamp=1608157187598, value=3
 order:numb   timestamp=1608156987999, value=2
```

**Renaming a table**
```sql
-- (1) Disable the table
hbase shell> disable 'tableName'

-- (2) Take a snapshot of the table
hbase shell> snapshot 'tableName', 'tableSnapshot'

-- (3) Clone the snapshot into a table with the new name
hbase shell> clone_snapshot 'tableSnapshot', 'newTableName'

-- (4) Delete the snapshot
hbase shell> delete_snapshot 'tableSnapshot'

-- (5) Drop the original table
hbase shell> drop 'tableName'
```
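
Two behaviours seen earlier on this page, buffered writes that only reach a file on `flush` and the `VERSIONS=>5` cap on returned cells, can be sketched with a small toy model. This is not HBase code and does not use any HBase API: the class `ToyColumnFamily`, its methods, and the integer timestamps are all hypothetical names invented for illustration, and real HBase does far more (write-ahead log, compaction, regions).

```python
class ToyColumnFamily:
    """Toy stand-in for one column family (hypothetical, not an HBase API)."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions  # plays the role of VERSIONS in alter
        self.memstore = []                # buffered cells: (row, qualifier, ts, value)
        self.hfiles = []                  # one sorted, never-modified list per flush

    def put(self, row, qualifier, value, ts):
        # A put only touches the in-memory buffer; no "file" is created yet.
        self.memstore.append((row, qualifier, ts, value))

    def flush(self):
        # Each flush with pending data yields a brand-new file,
        # sorted by row, then qualifier, then newest timestamp first.
        if self.memstore:
            ordered = sorted(self.memstore, key=lambda c: (c[0], c[1], -c[2]))
            self.hfiles.append(ordered)
            self.memstore = []

    def get(self, row, qualifier, versions=1):
        # Reads merge every flushed file with the unflushed buffer.
        cells = [c for f in self.hfiles for c in f] + self.memstore
        matches = [(ts, v) for (r, q, ts, v) in cells if r == row and q == qualifier]
        matches.sort(reverse=True)  # newest first
        # Never return more than the family's configured maximum.
        return matches[:min(versions, self.max_versions)]


order = ToyColumnFamily(max_versions=5)
for ts, value in enumerate(["1", "2", "3", "4", "5", "6"], start=1):
    order.put("r00005", "numb", value, ts)

files_before_flush = len(order.hfiles)  # still 0: data sits in the buffer
order.flush()
files_after_flush = len(order.hfiles)   # 1: the flush created one file

latest = order.get("r00005", "numb", versions=6)
print(files_before_flush, files_after_flush)
print([value for _, value in latest])
```

Asking for 6 versions still yields only the 5 newest values, mirroring the `get ... VERSIONS=>6` output in the version-capacity section, and the file count goes from 0 to 1 only after `flush`, mirroring the empty HDFS directories seen before the first flush.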