Josh Stanfield

Unintended Consequences

Automating HiveServer/Hive Metastore Restarts After Lockup

Intro

At LivingSocial, one of the key components we use in the Hadoop ecosystem is Hive. In my time here I've seen us migrate from 0.7 up to (currently) 0.13. One of the problems I've encountered over the years has been HiveServer (1 or 2) or the Hive Metastore "locking up" - i.e. calls to the service just hang. When this happens, someone from our warehouse team usually logs into the server and manually restarts the init.d service (we aren't using Ambari or Cloudera Manager). Depending on response time, though, this can cause problems for our long-running ETL jobs overnight.

This post covers a method I recently discovered for emulating Hive service lockups. It will probably be old hat for many Java devs, but it was new to me.

Background

Over the years we've tried various monitoring scripts to check whether Hive has stopped responding. Some of the methods we've used include:

  • Checking for excessive CPU usage (usually Hive pegging one or more cores at 100%),
  • Real-time scanning of the log for errors, restarting if a particular error appeared more than 20 times in a 2-minute period,
  • A "simple query" (select * from table limit 5) executed every 30 minutes,
  • An every-{{ unit of time }} restart of the underlying service (usually once a day, but sometimes more frequent).

These all work to varying degrees, but we still encounter the occasional lockup that slips through the various checks. It would be great to be able to detect these lockups as soon as they occur, and immediately restart.
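The "simple query" probe above can be sketched as follows. This is a hedged sketch, not our actual script: the real command would be a `beeline` invocation against the cluster, so a stand-in `echo ok` is used here (via a hypothetical `HIVE_PROBE_CMD` environment variable) to keep the sketch runnable without Hive.

```ruby
require 'timeout'
require 'open3'

# Sketch of the "simple query" health probe. A real command would look
# something like:
#   beeline -u jdbc:hive2://host:10000 -e 'select * from some_table limit 5'
# We default to `echo ok` so the sketch runs without a Hive cluster.
PROBE_CMD = ENV.fetch('HIVE_PROBE_CMD', 'echo ok')

def probe_hive(timeout_s: 60)
  Timeout.timeout(timeout_s) do
    _out, status = Open3.capture2e(PROBE_CMD)
    status.success? ? :ok : :error
  end
rescue Timeout::Error
  :hung   # the query never came back - a likely lockup
end

puts probe_hive   # => ok (with the stand-in command)
```

The weakness of this style of check is the polling interval: a lockup right after a probe goes undetected for the full 30 minutes.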

What I found

Basically, I wanted a way to lock up Hive in a controlled environment. Googling "how to lock up a jvm" was...interesting, and not very fruitful. Eventually, a coworker asked, "why not just use Thread.sleep()?", which made a lot of sense to me.

But - I needed a way of injecting Thread.sleep() into the running hive-metastore process. So - I looked into jdb, the Java debugger. At first I tried attaching jdb to the running process, but I quickly found that doing so results in a read-only jdb connection.

So - I decided to start the Hive Metastore under jdb directly.

I ran the following in jdb to "lock up" the metastore:

Initializing jdb ...
> run
run org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service-0.13.1-cdh5.3.0.jar org.apache.hadoop.hive.metastore.HiveMetaStore
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started:
> threads
Group system:
(java.lang.ref.Reference$ReferenceHandler)0x160         Reference Handler                         cond. waiting
(java.lang.ref.Finalizer$FinalizerThread)0x15f          Finalizer                                 cond. waiting
(java.lang.Thread)0x15e                                 Signal Dispatcher                         running
(java.lang.Thread)0x45e                                 process reaper                            cond. waiting
Group main:
(java.lang.Thread)0x1                                   main                                      running
(org.apache.hadoop.hive.metastore.HiveMetaStore$3)0x552 Thread-4                                  cond. waiting
(com.google.common.base.internal.Finalizer)0x72c        com.google.common.base.internal.Finalizer cond. waiting
(java.lang.Thread)0x744                                 BoneCP-keep-alive-scheduler               cond. waiting
(java.lang.Thread)0x746                                 BoneCP-pool-watch-thread                  cond. waiting
(com.google.common.base.internal.Finalizer)0x84f        com.google.common.base.internal.Finalizer cond. waiting
(java.lang.Thread)0x850                                 BoneCP-keep-alive-scheduler               cond. waiting
(java.lang.Thread)0x851                                 BoneCP-pool-watch-thread                  cond. waiting
> suspend 0x1
>

By suspending the main thread, I could now see how other apps would respond. I issued a "desc table" command via beeline, and it hung! So now I've got something that appears to emulate a metastore lockup.

So - what can I do with this info?

How can I tell if the metastore has locked up?

I've played around with rbhive and knew that "thrift_socket" was the lowest point in its stack for HS2, so why not start there? Instead of thrift_socket, though, I figured I'd try a plain network socket. My first thought: just say "hi" over a socket connection to the running metastore instance (i.e. before suspending the thread):

[44] pry(main)> socket = Socket.new(:INET, :STREAM)
=> #<Socket:fd 15>
[45] pry(main)> socket.connect(sockaddr)
=> 0
[46] pry(main)> socket.write("GET / HTTP/1.0\r\n\r\n")
=> 18
[47] pry(main)> socket.read
=> ""

Hmmm - I got an empty string back. Not nil. Interesting. What happens when I try the same thing with the thread suspended?

[71] pry(main)> Timeout::timeout(15) {
[71] pry(main)*   socket = Socket.new(:INET, :STREAM)
[71] pry(main)*   socket.connect(sockaddr)
[71] pry(main)*
[71] pry(main)*   socket.write("hello")
[71] pry(main)*   socket.read
[71] pry(main)* }
Timeout::Error: execution expired
from (pry):84:in `read'

Great! The read times out when the thread is suspended. I also tried shutting down the metastore entirely and connecting to the port, which ended in Errno::ECONNREFUSED: Connection refused - connect(2) for 192.168.50.2:9083. So now we've got some relatively simple logic to determine whether the metastore has locked up!
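The healthy case - an empty string back, rather than nil - is easy to reproduce locally: Ruby's IO#read with no length argument reads to EOF and returns "" when the peer has closed the connection. A quick sketch with a throwaway local server (the server behavior here is a stand-in, not how the metastore actually handles a bogus request):

```ruby
require 'socket'

# Throwaway local server that accepts, drains the request, and closes -
# standing in for a healthy peer that closes the connection on us.
server = TCPServer.new('127.0.0.1', 0)   # port 0: pick any free port
port = server.addr[1]

Thread.new do
  client = server.accept
  begin
    client.readpartial(1024)
  rescue EOFError
  end
  client.close                           # this is EOF for the peer
end

sock = TCPSocket.new('127.0.0.1', port)
sock.write("GET / HTTP/1.0\r\n\r\n")
result = sock.read                       # read-to-EOF returns "", not nil
sock.close
puts result.inspect                      # => ""
```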

The rest of the way

Now that I had my logic, I wrote a simple Ruby script that daemonizes it, controlled via a SysV init script (our servers run CentOS). The script runs the check every 30 seconds and, on timeout, attempts a restart: first by stopping via service, then via kill -15, and finally via kill -9 (if needed).
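The escalation step looks roughly like this. This is a hedged sketch, not the production script: the real flow shells out to `sudo service hive-metastore stop` first, and here a throwaway `sleep` child process stands in for the service so the sketch is runnable.

```ruby
# Sketch of the escalating shutdown: kill -15 first, kill -9 as a last
# resort. A throwaway child process stands in for hive-metastore here;
# the real script would read the pid from the service's pidfile.
def stop_with_escalation(pid, grace_seconds: 2)
  Process.kill('TERM', pid)                          # polite: kill -15
  deadline = Time.now + grace_seconds
  until Time.now > deadline
    return :terminated if Process.waitpid(pid, Process::WNOHANG)
    sleep 0.1
  end
  Process.kill('KILL', pid)                          # last resort: kill -9
  Process.waitpid(pid)
  :killed
end

pid = spawn('sleep', '60')
puts stop_with_escalation(pid)   # => terminated
```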

One issue I found right after the initial deploy was that the monitor was continuously restarting the metastore (oops...). It turns out I needed to close_write the socket after writing "hello" - presumably because, without the half-close, the server never saw EOF, so my read blocked until the timeout fired and triggered a restart. After adding that, the monitor has run successfully (for the last 2+ days so far).
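The close_write effect is easy to demonstrate locally: if the peer reads until EOF, a client that never half-closes leaves both sides blocked in read. A sketch with a throwaway local server (again, a stand-in for the metastore's actual behavior):

```ruby
require 'socket'
require 'timeout'

# Local server that reads the full request (until EOF) before closing -
# standing in for a peer that waits for the client to finish sending.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

Thread.new do
  client = server.accept
  client.read          # blocks until the client half-closes
  client.close
end

sock = TCPSocket.new('127.0.0.1', port)
sock.write("hello")
sock.close_write       # without this line, both sides sit in read forever
reply = Timeout.timeout(5) { sock.read }   # "" once the server closes
sock.close
puts reply.inspect     # => ""
```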

After these changes, my code is pretty much this:

hive_metastore_restart.rb
require 'socket'
require 'timeout'

begin
  Timeout::timeout(30) do
    begin
      socket = TCPSocket.new @hive_metastore_server, @hive_metastore_port
      socket.write("hello")
      socket.close_write  # half-close so the server sees EOF and the read returns
      x = socket.read

      # not sure when this would happen....
      if x.nil?
        @monitored_app_state = :unknown
        ### raise something eventually
      else
        conditional_log(:running, "hive metastore appears to be running ok")
        @monitored_app_state = :running
      end
    rescue Errno::ECONNRESET, Errno::ECONNREFUSED => e
      # TCPSocket.new raises ECONNREFUSED when nothing is listening, so the
      # connect happens inside this begin block so it gets caught here
      conditional_log(:dead, "exception #{e} found. This typically occurs when hive-metastore is not running.")
      conditional_log(:dead, 'try running `sudo service hive-metastore status`')
      @monitored_app_state = :dead
    ensure
      socket.close if socket
    end
  end
rescue Timeout::Error
  ## restart hive-metastore!!
end

Hopefully this will help us avoid additional downtime with hive-metastore.