
HDFS Heterogeneous Storage in Practice

[Note] Make sure the official documentation you read matches the version you are actually running! The storage-policy commands differ between versions (2.6.3 <-> 2.7.2)!

The test environment here uses 2.6.3: Heterogeneous Storage: Archival Storage, SSD & Memory.
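For reference, the policy-related commands differ roughly as follows between the two releases (a sketch based on the respective docs; double-check against your own version):

# 2.6.x: storage policies are managed through dfsadmin
hdfs dfsadmin -setStoragePolicy <path> <policyName>
hdfs dfsadmin -getStoragePolicy <path>

# 2.7.x: they moved to a dedicated storagepolicies subcommand
hdfs storagepolicies -setStoragePolicy -path <path> -policy <policyName>
hdfs storagepolicies -getStoragePolicy -path <path>
hdfs storagepolicies -listPolicies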

Configuration

I simply put the RAM disk under /dev/shm; mounting a dedicated tmpfs works about the same. The r2.7.2 document Memory Storage Support in HDFS does not exist for 2.6.3, but the concepts all still apply.
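If you prefer a dedicated tmpfs instead of /dev/shm, a minimal sketch (the mount point and size are placeholders; size it to the memory you can spare on the DataNode):

# mount a dedicated 2G tmpfs for the DataNode RAM disk (as root)
mkdir -p /mnt/dn-ramdisk
mount -t tmpfs -o size=2g tmpfs /mnt/dn-ramdisk

# make it survive reboots
echo 'tmpfs /mnt/dn-ramdisk tmpfs size=2g 0 0' >> /etc/fstab

The [RAM_DISK] entry in dfs.datanode.data.dir (next section) would then point at /mnt/dn-ramdisk instead of /dev/shm.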

1 Tune system parameters

vi /etc/security/limits.conf

  hadoop           -       nofile          65535
  hadoop           -       nproc           65535
  hadoop           -       memlock         268435456

The memlock limit must be raised, otherwise the DataNode fails to start with the error below.

2016-05-05 19:22:22,674 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 134217728 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1067)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:417)
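Note the units: dfs.datanode.max.locked.memory is in bytes, while the memlock value in limits.conf and ulimit -l are in KB (the 65536-byte limit in the error above is just the default ulimit -l of 64). A quick check after editing limits.conf, assuming the hadoop user from above:

# must print a value (in KB) whose byte equivalent is >= dfs.datanode.max.locked.memory
su - hadoop -c 'ulimit -l'

# 134217728 bytes = 128 MB, i.e. ulimit -l must be at least 131072 (KB)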

2 Add a RAM_DISK volume

vi hdfs-site.xml

  <property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/bigdata/hadoop/dfs/data,[RAM_DISK]/dev/shm/dfs/data</value>
  </property>

  <property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>134217728</value>
  </property>
  

Note how the RAM disk entry is written: it must be tagged with [RAM_DISK], otherwise the DataNode does not know the storage type of that path (the default is DISK). See Storage_Types_and_Storage_Policies:

The default storage type of a datanode storage location will be DISK if it does not have a storage type tagged explicitly.
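The same tagging syntax covers the other storage types (DISK, SSD, ARCHIVE, RAM_DISK). A hypothetical mixed dfs.datanode.data.dir value, assuming such mount points exist, would look like:

  [DISK]/data/bigdata/hadoop/dfs/data,[SSD]/ssd/dfs/data,[ARCHIVE]/archive/dfs/data,[RAM_DISK]/dev/shm/dfs/data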

3 Sync the configuration and restart DFS

[root@cu2 ~]# scp /etc/security/limits.conf cu3:/etc/security/
[hadoop@cu2 hadoop-2.6.3]$ rsync -vaz etc cu3:~/hadoop-2.6.3/ 

[hadoop@cu2 hadoop-2.6.3]$ sbin/stop-dfs.sh
[hadoop@cu2 hadoop-2.6.3]$ sbin/start-dfs.sh

Check the DataNode log: the /dev/shm/dfs/data path is added with StorageType RAM_DISK.

2016-05-05 19:33:39,862 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: /data/bigdata/hadoop/dfs/data/current
2016-05-05 19:33:39,862 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/bigdata/hadoop/dfs/data/current, StorageType: DISK
2016-05-05 19:33:39,863 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: /dev/shm/dfs/data/current
2016-05-05 19:33:39,863 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /dev/shm/dfs/data/current, StorageType: RAM_DISK

Also look at the contents of the RAM disk path:

[hadoop@cu2 hadoop-2.6.3]$ ssh cu3 tree /dev/shm/dfs
/dev/shm/dfs
└── data
    ├── current
    │   ├── BP-1108852639-192.168.0.148-1452322889531
    │   │   ├── current
    │   │   │   ├── finalized
    │   │   │   ├── rbw
    │   │   │   └── VERSION
    │   │   └── tmp
    │   └── VERSION
    └── in_use.lock

7 directories, 3 files

Testing

Three examples illustrate the usage by comparison. The first writes a file the default way (mainly as a baseline); the second passes an extra flag when writing the file; the third sets the storage policy on a directory (files and subdirectories inherit the parent directory's policy).

1 Test 1: default write

[hadoop@cu2 hadoop-2.6.3]$ hdfs dfs -put README.txt /tmp/

[hadoop@cu2 hadoop-2.6.3]$ hdfs fsck /tmp/README.txt -files -blocks -locations
...
/tmp/README.txt 1366 bytes, 1 block(s):  OK
0. BP-1108852639-192.168.0.148-1452322889531:blk_1073752574_11776 len=1366 repl=1 [192.168.0.148:50010]

[hadoop@cu3 hadoop-2.6.3]$ find /data/bigdata/hadoop/dfs/data/ /dev/shm/dfs/data/ -name "*1073752574*"
/data/bigdata/hadoop/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/finalized/subdir0/subdir41/blk_1073752574_11776.meta
/data/bigdata/hadoop/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/finalized/subdir0/subdir41/blk_1073752574
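For comparison with example 3 below, the policy of this baseline file can be queried the same way (output omitted here; a path that has never had a policy set simply behaves as the default DISK-only HOT policy):

hdfs dfsadmin -getStoragePolicy /tmp/README.txt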

2 Add the lazy_persist flag when writing a file

# With the -l flag, the client adds the LAZY_PERSIST flag under the hood.
[hadoop@cu2 hadoop-2.6.3]$ hdfs dfs -help put 
-put [-f] [-p] [-l] <localsrc> ... <dst> :
  Copy files from the local file system into fs. Copying fails if the file already
  exists, unless the -f flag is given.
  Flags:
                                                                       
  -p  Preserves access and modification times, ownership and the mode. 
  -f  Overwrites the destination if it already exists.                 
  -l  Allow DataNode to lazily persist the file to disk. Forces        
         replication factor of 1. This flag will result in reduced
         durability. Use with care.

# The -l flag forces the replication factor to 1! (see the fsck check after the tree listing below)
[hadoop@cu2 hadoop-2.6.3]$ hdfs dfs -put -l README.txt /tmp/readme.txt2

# The NameNode log shows the file being written to RAM_DISK storage
[hadoop@cu2 hadoop-2.6.3]$ less logs/hadoop-hadoop-namenode-cu2.log 

  2016-05-05 20:38:36,465 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/readme.txt2._COPYING_. BP-1108852639-192.168.0.148-1452322889531 blk_1073752578_11780{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-dcb2673f-3297-4bd7-af1c-ac0ee3eebaf9:NORMAL:192.168.0.30:50010|RBW]]}
  2016-05-05 20:38:36,592 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.0.30:50010 is added to blk_1073752578_11780{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[RAM_DISK]DS-bf1ab64f-7eb3-41e0-8466-43287de9893d:NORMAL:192.168.0.30:50010|FINALIZED]]} size 0
  2016-05-05 20:38:36,594 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/readme.txt2._COPYING_ is closed by DFSClient_NONMAPREDUCE_-1388277364_1

# Where the block actually landed
[hadoop@cu4 ~]$ tree /dev/shm/dfs/data/
/dev/shm/dfs/data/
├── current
│   ├── BP-1108852639-192.168.0.148-1452322889531
│   │   ├── current
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir42
│   │   │   │           ├── blk_1073752578
│   │   │   │           └── blk_1073752578_11780.meta
│   │   │   ├── rbw
│   │   │   └── VERSION
│   │   └── tmp
│   └── VERSION
└── in_use.lock
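Since -l forces a single replica, fsck should report repl=1 for this file; the same check used in example 1 applies (output not captured here):

hdfs fsck /tmp/readme.txt2 -files -blocks -locations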

3 Set the storage policy on a directory

[hadoop@cu2 hadoop-2.6.3]$ hdfs dfs -mkdir /ramdisk
[hadoop@cu2 hadoop-2.6.3]$ hdfs dfsadmin -setStoragePolicy /ramdisk LAZY_PERSIST 
Set storage policy LAZY_PERSIST on /ramdisk

[hadoop@cu2 hadoop-2.6.3]$ hdfs dfs -put README.txt /ramdisk

[hadoop@cu2 hadoop-2.6.3]$ hdfs dfsadmin -getStoragePolicy /ramdisk
The storage policy of /ramdisk:
BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}

# Wildcards are not supported (a loop workaround is sketched after this listing)
[hadoop@cu2 hadoop-2.6.3]$ hdfs dfsadmin -getStoragePolicy /ramdisk/*
getStoragePolicy: File/Directory does not exist: /ramdisk/*

[hadoop@cu2 hadoop-2.6.3]$ hdfs dfsadmin -getStoragePolicy /ramdisk/README.txt
The storage policy of /ramdisk/README.txt:
BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}


# Set the replication parameter to check that with multiple replicas only one is written to RAM_DISK
[hadoop@cu2 hadoop-2.6.3]$ hdfs dfs -Ddfs.replication=3 -put README.txt /ramdisk/readme.txt2

[hadoop@cu2 hadoop-2.6.3]$ hdfs dfsadmin -getStoragePolicy /ramdisk/readme.txt2
The storage policy of /ramdisk/readme.txt2:
BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}

[hadoop@cu2 hadoop-2.6.3]$ hdfs fsck /ramdisk/readme.txt2 -files -blocks -locations

  /ramdisk/readme.txt2 1366 bytes, 1 block(s):  OK
  0. BP-1108852639-192.168.0.148-1452322889531:blk_1073752580_11782 len=1366 repl=3 [192.168.0.30:50010, 192.168.0.174:50010, 192.168.0.148:50010]

[hadoop@cu3 ~]$ find /data/bigdata/hadoop/dfs/data /dev/shm/dfs/data -name "*1073752580*"
/data/bigdata/hadoop/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/finalized/subdir0/subdir42/blk_1073752580
/data/bigdata/hadoop/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/finalized/subdir0/subdir42/blk_1073752580_11782.meta

# The RAM_DISK contents have already been persisted to disk ("Lazy_Persist")
[hadoop@cu4 ~]$ find /data/bigdata/hadoop/dfs/data /dev/shm/dfs/data -name "*1073752580*"
/data/bigdata/hadoop/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/lazypersist/subdir0/subdir42/blk_1073752580
/data/bigdata/hadoop/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/lazypersist/subdir0/subdir42/blk_1073752580_11782.meta
/dev/shm/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/finalized/subdir0/subdir42/blk_1073752580
/dev/shm/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/finalized/subdir0/subdir42/blk_1073752580_11782.meta

[hadoop@cu5 ~]$ find /data/bigdata/hadoop/dfs/data /dev/shm/dfs/data -name "*1073752580*"
/data/bigdata/hadoop/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/finalized/subdir0/subdir42/blk_1073752580_11782.meta
/data/bigdata/hadoop/dfs/data/current/BP-1108852639-192.168.0.148-1452322889531/current/finalized/subdir0/subdir42/blk_1073752580
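As noted above, -getStoragePolicy does not accept wildcards; a small shell loop is one way to check every entry under a directory (a sketch, assuming the paths contain no spaces):

# print the effective policy for each entry under /ramdisk
for f in $(hdfs dfs -ls /ramdisk | awk 'NR>1 {print $NF}'); do
  hdfs dfsadmin -getStoragePolicy "$f"
done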

[Idea] For temporary files that get deleted right after processing, one could set the lazy-persist interval (dfs.datanode.lazywriter.interval.sec) very high, so they would never need to touch disk at all.

Don't count on it: everything gets persisted eventually anyway! It is just a write buffer, nothing more!! For one-shot data that never needs to be persisted, use Alluxio instead.

org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.LazyWriter#saveNextReplica
  org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.RamDiskAsyncLazyPersistService#submitLazyPersistTask
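To see what scan interval the lazy writer is configured with, the key mentioned above can be checked with getconf — a sketch (it only echoes what is set in the local config files; the built-in default should be 60 seconds):

hdfs getconf -confKey dfs.datanode.lazywriter.interval.sec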

References

–END
