Handling Reserved Keywords in DSBulk for Seamless Data Migration

Migrating to YugabyteDB offers significant advantages in terms of high availability, global distribution, and horizontal scalability—features essential for managing modern database workloads. However, data migration can be a complex process, particularly when transforming your schema definition. Differences in datatype support, query syntax, and core features across systems can complicate the transformation.

One of the challenges is dealing with reserved keywords in the source schema that cannot be directly used in the target system. This can require changes not only in the database schema during transformation but also in application code and related tooling.

What Are Reserved Keywords?

Reserved keywords are words that have a special meaning in a programming language or database system. These keywords are used to define operations or structures and therfore cannot be used as identifiers—such as column or table names—without specific handling. Examples of reserved keywords in Apache Cassandra include “select”, “update”, and “table”; these are part of the CQL grammar used to process data definition (DDL) and data manipulation (DML) statements.

Using reserved keywords in your schema is generally discouraged as it complicates both schema management and application logic. However, sometimes the business logic has been designed in a way that requires the use of these words; for example, if the business needs to keep a record of every update to a process, and the application engineers didn’t know about database reserved keywords, then you can end up with a table named “update.”

To account for these situations, identifier names that match reserved keywords can be enclosed in double quotes. For example:

CREATE TABLE "update" (id INT PRIMARY KEY);

YugabyteDB supports a Cassandra-compatible YCQL API and a Postgres-compatible YSQL API. In most cases, this compatibility allows for easy migrations, but there are minor, and intentional, differences relevant to YugabyteDB’s enhanced scalability and global distribution.

For the YCQL API, for example, a broader set of keywords are reserved compared to Cassandra, in order to support additional functionality (see this GitHub issue for a discussion). This expanded set of reserved keywords can cause issues when migrating schemas from Cassandra to YugabyteDB, especially during data import. YugabyteDB’s YSQL API also has reserved keywords similar to those in PostgreSQL, with minor differences.

The table below shows a comparison of reserved keywords among the database management systems:

Apache Cassandra YugabyteDB YCQL PostgreSQL YugabyteDB YSQL
add
all all all
allow allow
alter
analyse analyse analyse
analyze analyze analyze
and and and and
any any any
apply
array array array
as as as
asc asc asc asc
asymmetric asymmetric asymmetric
authorize authorize
batch
begin
both both both
by
case case case
cast cast cast
check check check
collate collate collate
column column column
columnfamily
constraint constraint constraint
create create create create
current_catalog current_catalog current_catalog
current_date current_date current_date
current_role current_role current_role
current_time current_time current_time
current_timestamp current_timestamp current_timestamp
current_user current_user current_user
default default default
deferrable deferrable deferrable
delete
desc desc desc desc
describe describe
distinct distinct distinct
do do do
drop
else else else
end end end
entries
except except except
execute
false false false
fetch fetch fetch
for for for
foreign foreign foreign
from from from from
full full full
grant grant grant grant
group group
having having having
if if
in in in in
index
infinity infinity
initially initially initially
insert
intersect intersect intersect
into into into into
is is is
keyspace keyspace
keyspaces
lateral lateral lateral
leading leading leading
limit limit limit limit
localtime localtime localtime
localtimestamp localtimestamp localtimestamp
materialized
modify modify
nan nan
norecursive
not not not not
null null null null
of
offset offset
on on on on
only only only
or or or or
order order order order
partition_hash
placing placing placing
primary primary primary primary
references references
rename
returning returning returning
returns
revoke
schema schema
select select select select
session_user session_user session_user
set
some some some
symmetric symmetric symmetric
table table table table
then then then
to to to to
token token
trailing trailing trailing
true true true
truncate
union union union
unique unique unique
unlogged
update
use use
user user
using using using using
variadic variadic variadic
view
when when
where where where where
window window window
with with with with

Note: The list of reserved keywords is subject to change. Tested on October 15, 2024.

Additional references: https://github.com/apache/cassandra/blob/trunk/src/resources/org/apache/cassandra/cql3/reserved_keywords.txt https://github.com/postgres/postgres/blob/master/src/include/parser/kwlist.h https://github.com/yugabyte/yugabyte-db/blob/master/src/yb/yql/cql/ql/kwlist.h

If you’d like to test a list of words against your own database management system or version, here is a short bash script to generate a set of table creation statements. You can then run the statements against a test copy of your database to see which ones throw a syntax error. Add words as needed.

#!/bin/bash

words=("add" "all" "allow" "alter" "analyse" "analyze" "and" "any" "apply" "array" "as" "asc" "asymmetric" "authorize" "batch" "begin" "both" "by" "case" "cast" "check" "collate" "column" "columnfamily" "constraint" "create" "current_catalog" "current_date" "current_role" "current_time" "current_timestamp" "current_user" "default" "deferrable" "delete" "desc" "describe" "distinct" "do" "drop" "else" "end" "entries" "except" "execute" "false" "fetch" "for" "foreign" "from" "full" "grant" "group" "having" "if" "in" "index" "infinity" "initially" "insert" "intersect" "into" "is" "keyspace" "keyspaces" "lateral" "leading" "limit" "localtime" "localtimestamp" "materialized" "modify" "nan" "norecursive" "not" "null" "of" "offset" "on" "only" "or" "order" "partition_hash" "placing" "primary" "references" "rename" "returning" "returns" "revoke" "schema" "select" "session_user" "set" "some" "symmetric" "table" "then" "to" "token" "trailing" "true" "truncate" "union" "unique" "unlogged" "update" "use" "user" "using" "variadic" "view" "when" "where" "window" "with")

for word in "${words[@]}"; do
  sql="create table $word (id int primary key);"
  echo $sql
done

Handling Reserved Keywords in DSBulk

The DSBulk tool (forked for YugabyteDB at https://github.com/yugabyte/dsbulk) is used for a variety of tasks involving Cassandra-compatible databases, including YugabyteDB YCQL. DSBulk is a versatile command-line tool supporting multiple data formats such as CSV and JSON, making it ideal for exporting data, loading data, and verifying data consistency.

When using DSBulk to import data, reserved keywords need to be handled properly to avoid errors. One way is by using the --query flag to specify the insert query directly, enclosing reserved keywords in double quotes. For example, if a column is named “update” (a reserved keyword), it can be handled like this:

dsbulk load -h 127.0.0.1 -url my.csv --query 'insert into keyspace1.table1(id,\"update\",name) values (?,?,?)' --header false

In this example, the keyword update is wrapped in double quotes to indicate that it is a column name rather than a part of the command syntax.

Tips for Managing Reserved Keywords During Migration

  • Know your keywords: Familiarize yourself with the reserved keywords in your database system. Each database has its own set, so refer to the code or documentation, and test as shown above.

  • Use proper escaping: If you are forced to use a reserved keyword as an identifier, make sure to escape it appropriately. In most database systems, this involves enclosing the keyword in double quotes.

  • Use the --query flag: The --query flag provides control by allowing you to specify insert statements directly during data loading, making it easier to handle reserved keywords properly.

By managing reserved keywords correctly, you can prevent syntax errors and ensure a smoother data migration process.