zookeeper delete yarn rmstore

hadoop 版本为:apache 2.7.2
这几天在搭建的时候遇到yarn无法启动。报错信息:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
2019-06-18 16:34:08,362 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system already initialized!
2019-06-18 16:34:08,362 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
2019-06-18 16:34:08,362 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
2019-06-18 16:34:08,362 WARN org.apache.hadoop.metrics2.util.MBeans: Error creating MBean object name: Hadoop:service=ResourceManager,name=RMNMInfo
org.apache.hadoop.metrics2.MetricsException: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=ResourceManager,name=RMNMInfo already exists!
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:122)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:102)
at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:92)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:55)
at org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.<init>(RMNMInfo.java:59)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:549)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=ResourceManager,name=RMNMInfo already exists!
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:118)
... 20 more
2019-06-18 16:34:08,363 WARN org.apache.hadoop.metrics2.util.MBeans: Failed to register MBean "null"
javax.management.RuntimeOperationsException: Exception occurred trying to register the MBean
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:951)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
at org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.<init>(RMNMInfo.java:59)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:549)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: No object name specified
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:949)
... 21 more
2019-06-18 16:34:08,363 INFO org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo: Registered RMNMInfo MBean
2019-06-18 16:34:08,363 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2019-06-18 16:34:08,364 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.NodesListManager failed in state INITED; cause: org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics already exists!
org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics already exists!
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
at org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.registerMetrics(ClusterMetrics.java:74)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
2019-06-16 21:07:39,030 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=Users [yarn] and members of the groups [<users>] are allowed
2019-06-16 21:07:39,030 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:124)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:812)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:122)
... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:577)
at org.apache.hadoop.service.Abstr

解决办法:

我们需要删除yarn在ZK上的 rmstore 信息, 之后重启yarn,就可以了。

但是在删除zk上 rmstore 信息的时候, 遇到了问题, yarn在注册时候的时候自己添加上ACL。所以我们直接删除是不行的。

但我们可以可以重新设置一个ACL,就可以了, 如下:

1
2
3
4
5
6
cd $ZOOKEEPER_HOME/bin
./zkCli.sh # 连接hadoop配置的zk,如果是客户端需添加 -server IP:port
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hadoop-ha, hbase, rmstore]
[zk: localhost:2181(CONNECTED) 1] rmr /rmstore
Authentication is not valid : /rmstore/ZKRMStateRoot/RMVersionNode

我们可以看一下这个目录的ACL

1
2
3
4
5
[zk: localhost:2181(CONNECTED) 2] getAcl /rmstore/ZKRMStateRoot
'world,'anyone
: rwa
'digest,'shining-namenode01.host.com:yelhKlz39YVCV9p4NTModoBq9fw=
: cd

我们重新设置ACL,并删除目录

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
[zk: localhost:2181(CONNECTED) 3] setAcl /rmstore/ZKRMStateRoot world:anyone:rwcda
cZxid = 0x10000001b
ctime = Mon May 27 14:58:45 CST 2019
mZxid = 0x10000001b
mtime = Mon May 27 14:58:45 CST 2019
pZxid = 0x10016efd3
cversion = 380191
dataVersion = 0
aclVersion = 5
ephemeralOwner = 0x0
dataLength = 0
numChildren = 5
[zk: localhost:2181(CONNECTED) 4] getAcl /rmstore/ZKRMStateRoot
'world,'anyone
: cdrwa
[zk: localhost:2181(CONNECTED) 6] rmr /rmstore/ZKRMStateRoot
[zk: localhost:2181(CONNECTED) 7] ls /
[zookeeper, hadoop-ha, hbase, rmstore]
[zk: localhost:2181(CONNECTED) 8] rmr /rmstore
[zk: localhost:2181(CONNECTED) 9] ls /
[zookeeper, hadoop-ha, hbase]

之后重新启动yarn,让yarn重新在zk上注册就可以了。

相关文章链接:
https://jxy.me/2015/05/02/hadoop-resourcemanager-recovery/
https://my.oschina.net/dabird/blog/3023781

感谢您的支持!