We will install VerdictDB’s Python interface, ingest data (i.e., create another special copy of your existing table), and issue a simple query to VerdictDB. In this Quickstart Guide, we will use MySQL for VerdictDB’s backend database.
- Miniconda for Python 3.7: You can locally install miniconda by following the instructions on this page.
- Test Data (TPC-H 1G): Please follow the instructions on this page.
Install and Import
Install PyVerdict as follows. If you are using locally-installed miniconda, PyVerdict will also be locally installed.
pip install pyverdict --upgrade
Now, you can import PyVerdict as a regular module and check its version. Note that PyVerdict’s version number has no relationship with the version of core VerdictDB library.
import pyverdict pyverdict.__version__ # prints pyverdict version pyverdict.__verdictdb_version__ # prints the core VerdictDB version verdict = pyverdict.mysql('localhost', 'root', 'password') # make a connection verdict.sql('show schemas') # retrieves schemas
This examples assumes that the
lineitem table (in the test data) has been created under the
# VerdictDB relies on a special table called scramble to speed up its queries. verdict.sql('create scramble tpch1g.lineitem_x from tpch1g.lineitem') # Automatically use tpch1g.lineitem_x to speed up # 1 row(s) in the result (3.915 seconds) # c2 # 0 4291411.0 verdict.sql("select count(*) from tpch1g.lineitem where l_shipdate > '1994-01-02'")
To run the same query without any speedup, prepend
bypass in front of the query.
# 1 row(s) in the result (57.561 seconds) # count(*) # 0 4331220 verdict.sql("bypass select count(*) from tpch1g.lineitem where l_shipdate > '1994-01-02'")
VerdictDB (and pyverdict) also supports other aggregates, e.g., sum, avg, count(distinct col).
Note: When the same query is issued multiple times, some systems return answers immediately using cached results. If you want more accurate comparisons, simply change the query predicates slightly, e.g., ‘1994-01-03’ instead of ‘1994-01-02’.