Using Verdict

Using Verdict

In terminal

Please see our Quick Start Guide for the instructions on connecting to Verdict in terminal. The page includes starting Verdict on top of Apache Hive, Apache Impala, and Apache Spark (and PySpark) in terminal.

JDBC in Java/Python applications

Verdict ships with a JDBC driver. Using the driver, you can use Verdict on top of your existing JDBC-supported database systems. Currently, Verdict supports the JDBC connections to Apache Hive and Apache Impala. Contact us if you want to use Verdict on other database systems. We need to add a small driver (due to non-standard SQL features used by different databases).

Class name for Verdict’s JDBC driver: edu.umich.verdict.jdbc.Driver
Connection string for Verdict’s JDBC driver:
- Hive: jdbc:verdict:hive2://host:port/default_database;key1=value1;key2=value2;...
- Impala: jdbc:verdict:impala://host:port/default_database;key1=value1;key2=value2;...

To enable the Kerberos authentication, add principal=user/host@domain pair in the key-value pairs of the JDBC connection string. You can also pass Verdict configuration options in key-value pairs. For example, to change Verdict’s log level to DEBUG, add verdict.loglevel=debug in the key-value pairs. See this page for more configuration options for Verdict.

Details

The rule for Verdict’s connection string is to include the verdict: after ‘jdbc:’. Internally, Verdict communicates with the target database (e.g., Hive, Impala, etc.) using a JDBC connection as well. For this, Verdict uses the passed JDBC connection string after removing (1) the verdict: keyword and (2) configuration options for Verdict. This means you can pass any existing JDBC options to Verdict as they are.

Java example

Below example connects to Verdict-on-Impala and runs a simple count query.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class VerdictJdbcExample {
  public static void main(String[] args) throws ClassNotFoundException, SQLException {
    Class.forName("edu.umich.verdict.jdbc.Driver");
    String url = "jdbc:verdict:impala://host:port/default";
    Connection conn = DriverManager.getConnection(url);
    Statement stmt = conn.createStatement();
    String sql2 = "select count(*) from orders";
    ResultSet rs = stmt.executeQuery(sql2);
  }
}

Of course, the Verdict’s JDBC driver (target/) must be included in the Java class path when the above code is compiled and run.

Python example

Below example connects to Verdict-on-Impala and runs a simple count query.

import jaydebeapi
import jpype
import pandas as pd
from jpype import *

class_name = 'edu.umich.verdict.jdbc.Driver'
url = 'jdbc:verdict:impala://host:port/default'
jvm_path = jpype.getDefaultJVMPath()
jpype.startJVM(jvm_path, "-Djava.class.path=%s" % classpath)
conn = jaydebeapi.connect(jclassname = class_name, url = url)
curs = conn.cursor()
curs.execute('select count(*) from orders')
columns = [desc[0] for desc in curs.description]       # getting column headers
pd.DataFrame(curs.fetchall(), columns = columns)

Note that the Python libraries needed for creating a JVM instance (jpype) and making a JDBC connection (jaydebeapi) must be installed.

In Apache Zeppelin

Apache Zeppelin is a notebook-based application that runs in a web browser. It can connect to Apache Spark by default and also to other database systems through the JDBC interface. Verdict can also easily integrate with Apache Zeppelin.

On Spark

Zeppelin includes the Spark interpreter by default. For Verdict to work in Spark interpreter, it is enough to include Verdict’s core jar file as a dependency. First, go to Zeppelin’s interpreter setting, and find the interpreter for Spark. Click the edit button and add the path to Verdict’s core jar file (which is created in the target directory when Verdict’s source code is built) as a dependency.

Now you can import Verdict and run queries as follows.

import edu.umich.verdict.VerdictSparkHiveContext

val vc = new VerdictSparkHiveContext(sc)               // sc: SparkContext
vc.sql("show databases").show(false)
df = vc.sql("select count(*) from instacart.orders")   // returns a Spark DataFrame
df.show(false)

On Hive, Impala

Zeppelin can connect to Verdict on any database (including Hive, Impala) using the JDBC interface. For this, go to Zeppelin’s interpreter setting, and click the create button. Enter “verdict-impala” (or any name you want) in “Interpreter Name”, choose “jdbc” in “Interpreter Group”, enter edu.umich.verdict.jdbc.Driver for default.driver, and enter jdbc:verdict:impala://host:port/schema for default.url (change the database name appropriately according to this page).

You may need to set default.user or other authentication fields as needed for your existing database connection (Verdict will pass those parameters as it makes another connection internally to your existing database).

Under dependencies, add all JDBC drivers for your existing databases and `` (which is created in the target directory when Verdict is built).

In Jupyter

On PySpark

You can use Verdict in Jupyter (that connects to PySpark) by following the similar approach as in this page. In other words, simply include the path to the Verdict’s core jar file as a Driver’s Java class path when starting a Jupyter notebook server.

$ export PYSPARK_DRIVER_PYTHO = "path-to-jupyter"
$ export PYSPARK_DRIVER_PYTHON_OPTS = "notebook"
$ export PYTHONPATH = $(pwd)/python:$PYTHONPATH
$ pyspark --driver-class-path $(pwd)/target/

The above command will start the Jupyter server in which you can import PySpark modules.

On Hive, Impala

You can connect to Verdict on any database that support JDBC connections (including Hive, Impala) as in this page.

In Hue

Hue supports custom JDBC connections. Please see this page for instructions.