Troubleshooting a case of slow HBase writes
Version: 1.2.6
Symptoms: HBase writes take a long time; the slow-put count on individual RegionServers rises; flushes on those machines are slow; when memory reaches the high watermark, requests are blocked until usage drops back below the high watermark (when memory reaches the low watermark, flushes run continuously until usage falls below the low watermark). The heap is currently 24 GB, with the high watermark at 9.6 GB (24 * 0.4) and the low watermark at 7.68 GB (9.6 * 0.8).
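To make the blocking behaviour concrete, here is a minimal Java sketch of the watermark logic described above. It is an illustration, not the actual RegionServer code (the real logic lives around MemStoreFlusher#reclaimMemStoreMemory); class and method names are invented, and only the 0.4 / 0.8 factors come from this cluster's configuration.

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical simplification of the global memstore watermark checks.
    public class WatermarkSketch {
        static final long HEAP = 24L << 30;           // 24 GB heap
        static final long HIGH = (long) (HEAP * 0.4); // 9.6 GB high watermark
        static final long LOW  = (long) (HIGH * 0.8); // 7.68 GB low watermark

        // Write path: above the high watermark, the put blocks until global
        // memstore usage drops back under HIGH (this is the "slow put").
        static void beforeWrite(AtomicLong globalMemstoreSize) throws InterruptedException {
            while (globalMemstoreSize.get() >= HIGH) {
                Thread.sleep(100); // the real code waits on a condition variable
            }
        }

        // Flusher side: once usage crosses the low watermark, keep picking
        // regions to flush until global usage is back under LOW.
        static boolean shouldKeepFlushing(AtomicLong globalMemstoreSize) {
            return globalMemstoreSize.get() >= LOW;
        }
    }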
Detailed analysis:
1 Slow flushes: writing to HDFS is not the slow part; the suspicion is that a scan during the flush takes too long, and may even have deadlocked.
2 Large flush sizes: easy to explain. Hitting 128 MB does not mean the region flushes immediately; the request just goes onto the flush queue. That queue can hold both ordinary region flushes (served FIFO) and watermark-triggered flushes (which prefer the region using the most memory). If flushing is slow, by the time your region is processed its memstore may already be far above the threshold (see the sketch after this list).
3 HLog count surging: is the deletion mechanism at fault? (An HLog can only be archived once every edit it contains has been flushed, so stalled flushes also pin old HLogs and their count keeps growing.)
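A minimal sketch of the two flush-queue behaviours from point 2: ordinary flush requests are served FIFO, while a watermark wakeup makes the flusher pick whichever region currently holds the most memstore memory. Class and method names are illustrative, not the real org.apache.hadoop.hbase.regionserver.MemStoreFlusher code.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical simplification of the flush queue described above.
    public class FlushQueueSketch {
        // region name -> current memstore size in bytes (stand-in for real state)
        private final Map<String, Long> memstoreSizes = new ConcurrentHashMap<>();

        // FIFO queue; a marker entry stands in for the real "wakeup" entry.
        private final LinkedBlockingQueue<String> flushQueue = new LinkedBlockingQueue<>();
        private static final String WAKEUP = "__WAKEUP__";

        // Ordinary flush: a region crossed 128 MB, enqueue it FIFO.
        void requestFlush(String region) {
            flushQueue.offer(region);
        }

        // Watermark flush: no region named, just wake the flusher.
        void wakeupFlushThread() {
            flushQueue.offer(WAKEUP);
        }

        // Flusher loop: FIFO for ordinary requests, biggest-region-first for
        // wakeups. A single slow flush stalls everything queued behind it.
        void flushLoop() throws InterruptedException {
            while (true) {
                String entry = flushQueue.take();
                String region = WAKEUP.equals(entry) ? biggestRegion() : entry;
                if (region != null) {
                    flush(region); // a 60 s flush here delays every later entry
                }
            }
        }

        private String biggestRegion() {
            return memstoreSizes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey).orElse(null);
        }

        private void flush(String region) {
            memstoreSizes.remove(region); // stand-in for the real flush to HDFS
        }
    }

The single take()/flush() loop is the crux: one slow flush turns into rising slow puts, because every region queued behind it keeps accumulating memstore memory while it waits.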
Solution: raise the high watermark to 14.4 GB (24 * 0.6) to avoid blocking requests as far as possible.
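Assuming the watermarks are driven by the standard HBase 1.x properties (hbase.regionserver.global.memstore.size and hbase.regionserver.global.memstore.size.lower.limit), the change would look roughly like this in hbase-site.xml; keeping the 0.8 lower limit moves the low watermark to 11.52 GB:

    <!-- Raise the global memstore high watermark from 0.4 to 0.6 of the
         24 GB heap, i.e. from 9.6 GB to 14.4 GB. -->
    <property>
      <name>hbase.regionserver.global.memstore.size</name>
      <value>0.6</value>
    </property>
    <!-- Low watermark stays at 0.8 of the high watermark (now 11.52 GB). -->
    <property>
      <name>hbase.regionserver.global.memstore.size.lower.limit</name>
      <value>0.8</value>
    </property>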
Log: a long watermark-triggered flush (this kind of flush flushes all stores of the chosen region); note that flushing 168 MB took 62971 ms, about 63 seconds:

    Flush of region hubble_app_log_diagnosis,574D9B34-A876-4A4B-A846-B5D0873BA2CC92233704017307906970718,1635282389698.d0b92fc4724e4f230616b26d03cffa17. due to global heap pressure. Total Memstore size=9.4 G, Region memstore size=168.0 M
    2022-01-18 08:32:09,640 INFO [MemStoreFlusher.1] regionserver.HRegion: Flushing 2/2 column families, memstore=168.04 MB
    2022-01-18 08:33:12,603 INFO [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=2549251141, memsize=168.0 M, hasBloomFilter=true, into tmp file hdfs://hubble:8020/hbase_data/data/default/hubble_app_log_diagnosis/d0b92fc4724e4f230616b26d03cffa17/.tmp/81e64cda1e87440dbfbd168bb4f9b318
    2022-01-18 08:33:12,610 INFO [MemStoreFlusher.1] regionserver.HStore: Added hdfs://hubble:8020/hbase_data/data/default/hubble_app_log_diagnosis/d0b92fc4724e4f230616b26d03cffa17/cf/81e64cda1e87440dbfbd168bb4f9b318, entries=520165, sequenceid=2549251141, filesize=19.6 M
    2022-01-18 08:33:12,611 INFO [MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~168.04 MB/176205272, currentsize=4.94 MB/5180784 for region hubble_app_log_diagnosis,574D9B34-A876-4A4B-A846-B5D0873BA2CC92233704017307906970718,1635282389698.d0b92fc4724e4f230616b26d03cffa17. in 62971ms, sequenceid=2549251141, compaction requested=true
An ordinary flush: when a region's total memstore size exceeds 128 MB, a flush of that region is triggered; the default flushPolicy flushes only the stores larger than 16 MB. If the region's total memstore reaches 128 MB * 4, writes are blocked with a RegionTooBusyException; this is different from the slow puts above (which block until global memory is back below the high watermark). Here a flush of a comparable 128 MB finished in 547 ms:

    Flushing 1/2 column families, memstore=128.00 MB; cf=128.01 MB
    2022-01-20 22:32:39,330 INFO [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=2366406713, memsize=128.0 M, hasBloomFilter=true, into tmp file hdfs://hubble:8020/hbase_data/data/default/hubble_app_log_diagnosis/da586b925c3f41515c0d6870e01371f5/.tmp/b932318cba5948b9baa0a92fc9763ed6
    2022-01-20 22:32:39,342 INFO [MemStoreFlusher.1] regionserver.HStore: Added hdfs://hubble:8020/hbase_data/data/default/hubble_app_log_diagnosis/da586b925c3f41515c0d6870e01371f5/cf/b932318cba5948b9baa0a92fc9763ed6, entries=396461, sequenceid=2366406713, filesize=14.5 M
    2022-01-20 22:32:39,343 INFO [MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~128.01 MB/134223640, currentsize=58.96 KB/60376 for region hubble_app_log_diagnosis,4352a67c50b1ebf3e70b712c75d4bb3992233704432624727940749,1624348860256.da586b925c3f41515c0d6870e01371f5. in 547ms, sequenceid=2366406713, compaction requested=true
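A sketch of the per-region thresholds behind this log. The names here are illustrative; the real behaviour is governed by hbase.hregion.memstore.flush.size (128 MB), hbase.hregion.memstore.block.multiplier (4), and the 16 MB per-store flush lower bound.

    // Hypothetical simplification of the per-region flush/block thresholds.
    public class RegionFlushSketch {
        static final long FLUSH_SIZE = 128L << 20;       // per-region flush trigger
        static final int  BLOCK_MULTIPLIER = 4;          // blocking size = 512 MB
        static final long STORE_LOWER_BOUND = 16L << 20; // per-store selection bound

        // Called on the write path with the region's current memstore size.
        static void checkResources(long regionMemstoreSize) {
            if (regionMemstoreSize >= FLUSH_SIZE * BLOCK_MULTIPLIER) {
                // Writes to this one region fail fast instead of queueing;
                // stand-in for HBase's RegionTooBusyException.
                throw new RuntimeException("RegionTooBusyException: above blocking memstore size");
            }
            if (regionMemstoreSize >= FLUSH_SIZE) {
                requestFlush(); // only stores above STORE_LOWER_BOUND get flushed
            }
        }

        static void requestFlush() { /* enqueue on the flush queue (see sketch above) */ }
    }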
Related issues: https://issues.apache.org/jira/browse/HBASE-2087
1 The method MemStoreFlusher#checkStoreFileCount is called from flushRegion, and flushRegion is called by the MemStoreFlusher#run thread. If checkStoreFileCount finds too many store files, it will stick around waiting on a compaction to happen. While it is hanging, MemStoreFlusher#run is held up and no other region can flush. Meanwhile WALs will be rolling and memory will be accumulating writes.
2 The issue of delayed memstore flushes still occurs after backporting HBASE-15871: a reverse scan takes a long time to seek to the previous row in a memstore full of deleted cells.
3 A memstore flush does not finish because of backwardSeek() in the memstore scanner.
4 In a write-heavy scenario one of the regions does not get flushed, causing RegionTooBusyException.
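To illustrate issue 1 above, a rough sketch of the delayed-flush behaviour being described: the flusher postpones a region's flush while a store has too many files, hoping a compaction lands first. Names and the loop structure are illustrative; the real knobs are hbase.hstore.blockingStoreFiles and hbase.hstore.blockingWaitTime.

    // Hypothetical simplification of issue 1: the flusher delays a flush
    // while a store has too many files, waiting for a compaction.
    public class BlockedFlushSketch {
        static final int  BLOCKING_STORE_FILES = 10;  // hbase.hstore.blockingStoreFiles
        static final long BLOCKING_WAIT_MS = 90_000L; // hbase.hstore.blockingWaitTime

        static void flushRegion(String region) throws InterruptedException {
            long start = System.currentTimeMillis();
            // While this loop waits, the flusher thread serves no other
            // region, WALs keep rolling, and memstore memory accumulates.
            while (storeFileCount(region) > BLOCKING_STORE_FILES
                    && System.currentTimeMillis() - start < BLOCKING_WAIT_MS) {
                requestCompaction(region);
                Thread.sleep(1000);
            }
            doFlush(region);
        }

        static int storeFileCount(String region) { return 0; /* stand-in */ }
        static void requestCompaction(String region) { /* stand-in */ }
        static void doFlush(String region) { /* stand-in */ }
    }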