Handling Reserved Keywords in DSBulk for Seamless Data Migration
Migrating to YugabyteDB offers significant advantages in terms of high availability, global distribution, and horizontal scalability—features essential for managing modern database workloads. However, data migration can be a complex process, particularly when transforming your schema definition. Differences in datatype support, query syntax, and core features across systems can complicate the transformation.
One of the challenges is dealing with reserved keywords in the source schema that cannot be directly used in the target system. This can require changes not only in the database schema during transformation but also in application code and related tooling.
What Are Reserved Keywords?
Reserved keywords are words that have a special meaning in a programming language or database system. These keywords are used to define operations or structures and therfore cannot be used as identifiers—such as column or table names—without specific handling. Examples of reserved keywords in Apache Cassandra include “select”, “update”, and “table”; these are part of the CQL grammar used to process data definition (DDL) and data manipulation (DML) statements.
Using reserved keywords in your schema is generally discouraged as it complicates both schema management and application logic. However, sometimes the business logic has been designed in a way that requires the use of these words; for example, if the business needs to keep a record of every update to a process, and the application engineers didn’t know about database reserved keywords, then you can end up with a table named “update.”
To account for these situations, identifier names that match reserved keywords can be enclosed in double quotes. For example:
CREATE TABLE "update" (id INT PRIMARY KEY);
YugabyteDB supports a Cassandra-compatible YCQL API and a Postgres-compatible YSQL API. In most cases, this compatibility allows for easy migrations, but there are minor, and intentional, differences relevant to YugabyteDB’s enhanced scalability and global distribution.
For the YCQL API, for example, a broader set of keywords are reserved compared to Cassandra, in order to support additional functionality (see this GitHub issue for a discussion). This expanded set of reserved keywords can cause issues when migrating schemas from Cassandra to YugabyteDB, especially during data import. YugabyteDB’s YSQL API also has reserved keywords similar to those in PostgreSQL, with minor differences.
The table below shows a comparison of reserved keywords among the database management systems:
Apache Cassandra | YugabyteDB YCQL | PostgreSQL | YugabyteDB YSQL |
---|---|---|---|
add | |||
all | all | all | |
allow | allow | ||
alter | |||
analyse | analyse | analyse | |
analyze | analyze | analyze | |
and | and | and | and |
any | any | any | |
apply | |||
array | array | array | |
as | as | as | |
asc | asc | asc | asc |
asymmetric | asymmetric | asymmetric | |
authorize | authorize | ||
batch | |||
begin | |||
both | both | both | |
by | |||
case | case | case | |
cast | cast | cast | |
check | check | check | |
collate | collate | collate | |
column | column | column | |
columnfamily | |||
constraint | constraint | constraint | |
create | create | create | create |
current_catalog | current_catalog | current_catalog | |
current_date | current_date | current_date | |
current_role | current_role | current_role | |
current_time | current_time | current_time | |
current_timestamp | current_timestamp | current_timestamp | |
current_user | current_user | current_user | |
default | default | default | |
deferrable | deferrable | deferrable | |
delete | |||
desc | desc | desc | desc |
describe | describe | ||
distinct | distinct | distinct | |
do | do | do | |
drop | |||
else | else | else | |
end | end | end | |
entries | |||
except | except | except | |
execute | |||
false | false | false | |
fetch | fetch | fetch | |
for | for | for | |
foreign | foreign | foreign | |
from | from | from | from |
full | full | full | |
grant | grant | grant | grant |
group | group | ||
having | having | having | |
if | if | ||
in | in | in | in |
index | |||
infinity | infinity | ||
initially | initially | initially | |
insert | |||
intersect | intersect | intersect | |
into | into | into | into |
is | is | is | |
keyspace | keyspace | ||
keyspaces | |||
lateral | lateral | lateral | |
leading | leading | leading | |
limit | limit | limit | limit |
localtime | localtime | localtime | |
localtimestamp | localtimestamp | localtimestamp | |
materialized | |||
modify | modify | ||
nan | nan | ||
norecursive | |||
not | not | not | not |
null | null | null | null |
of | |||
offset | offset | ||
on | on | on | on |
only | only | only | |
or | or | or | or |
order | order | order | order |
partition_hash | |||
placing | placing | placing | |
primary | primary | primary | primary |
references | references | ||
rename | |||
returning | returning | returning | |
returns | |||
revoke | |||
schema | schema | ||
select | select | select | select |
session_user | session_user | session_user | |
set | |||
some | some | some | |
symmetric | symmetric | symmetric | |
table | table | table | table |
then | then | then | |
to | to | to | to |
token | token | ||
trailing | trailing | trailing | |
true | true | true | |
truncate | |||
union | union | union | |
unique | unique | unique | |
unlogged | |||
update | |||
use | use | ||
user | user | ||
using | using | using | using |
variadic | variadic | variadic | |
view | |||
when | when | ||
where | where | where | where |
window | window | window | |
with | with | with | with |
Note: The list of reserved keywords is subject to change. Tested on October 15, 2024.
Additional references: https://github.com/apache/cassandra/blob/trunk/src/resources/org/apache/cassandra/cql3/reserved_keywords.txt https://github.com/postgres/postgres/blob/master/src/include/parser/kwlist.h https://github.com/yugabyte/yugabyte-db/blob/master/src/yb/yql/cql/ql/kwlist.h
If you’d like to test a list of words against your own database management system or version, here is a short bash script to generate a set of table creation statements. You can then run the statements against a test copy of your database to see which ones throw a syntax error. Add words as needed.
#!/bin/bash
words=("add" "all" "allow" "alter" "analyse" "analyze" "and" "any" "apply" "array" "as" "asc" "asymmetric" "authorize" "batch" "begin" "both" "by" "case" "cast" "check" "collate" "column" "columnfamily" "constraint" "create" "current_catalog" "current_date" "current_role" "current_time" "current_timestamp" "current_user" "default" "deferrable" "delete" "desc" "describe" "distinct" "do" "drop" "else" "end" "entries" "except" "execute" "false" "fetch" "for" "foreign" "from" "full" "grant" "group" "having" "if" "in" "index" "infinity" "initially" "insert" "intersect" "into" "is" "keyspace" "keyspaces" "lateral" "leading" "limit" "localtime" "localtimestamp" "materialized" "modify" "nan" "norecursive" "not" "null" "of" "offset" "on" "only" "or" "order" "partition_hash" "placing" "primary" "references" "rename" "returning" "returns" "revoke" "schema" "select" "session_user" "set" "some" "symmetric" "table" "then" "to" "token" "trailing" "true" "truncate" "union" "unique" "unlogged" "update" "use" "user" "using" "variadic" "view" "when" "where" "window" "with")
for word in "${words[@]}"; do
sql="create table $word (id int primary key);"
echo $sql
done
Handling Reserved Keywords in DSBulk
The DSBulk tool (forked for YugabyteDB at https://github.com/yugabyte/dsbulk) is used for a variety of tasks involving Cassandra-compatible databases, including YugabyteDB YCQL. DSBulk is a versatile command-line tool supporting multiple data formats such as CSV and JSON, making it ideal for exporting data, loading data, and verifying data consistency.
When using DSBulk to import data, reserved keywords need to be handled properly to avoid errors. One way is by using the --query
flag to specify the insert query directly, enclosing reserved keywords in double quotes. For example, if a column is named “update” (a reserved keyword), it can be handled like this:
dsbulk load -h 127.0.0.1 -url my.csv --query 'insert into keyspace1.table1(id,\"update\",name) values (?,?,?)' --header false
In this example, the keyword update
is wrapped in double quotes to indicate that it is a column name rather than a part of the command syntax.
Tips for Managing Reserved Keywords During Migration
-
Know your keywords: Familiarize yourself with the reserved keywords in your database system. Each database has its own set, so refer to the code or documentation, and test as shown above.
-
Use proper escaping: If you are forced to use a reserved keyword as an identifier, make sure to escape it appropriately. In most database systems, this involves enclosing the keyword in double quotes.
-
Use the
--query
flag: The--query
flag provides control by allowing you to specify insert statements directly during data loading, making it easier to handle reserved keywords properly.
By managing reserved keywords correctly, you can prevent syntax errors and ensure a smoother data migration process.