migrating data

Database transformation from SQL Server to YugabyteDB

Share this post on:

A database transformation and migration project takes solid planning and testing. I’ve found that three common changes required when transforming a SQL Server database to YugabyteDB YSQL are related to syntax, performance, and stored procedures. These will get you started on your transformation project.

Syntax

Transforming a schema from MS SQL to YugabyteDB requires some minor syntax changes. This is true for any cross-database transformation. The YugabyteDB YSQL API utilizes PostgreSQL syntax.

Basic formatting in MS SQL scripts

If you load exported MS SQL project scripts directly into YugabyteDB, make the following basic changes:

  • Remove left and right brackets.
  • Remove “dbo.”
  • Change GO to semicolon.
  • Objects will be converted to lower case, although in most cases queries will be supported using any case format.

Datatype conversions

Database transformation also requires datatype changes due to syntax differences. Change the following datatypes when transforming from MS SQL to PostgreSQL/YugabyteDB.

BINARY(n) -> BYTEA
BIT -> BOOLEAN (or INT for a more flexible choice)
VARCHAR(max) -> TEXT
VARBINARY -> BYTEA
NVARCHAR(n) -> VARCHAR(n)
NVARCHAR(max) -> TEXT
NTEXT -> TEXT
FLOAT(p) -> DOUBLE PRECISION
DATETIME -> TIMESTAMP or TIMESTAMPZ to retain timezone
TINYINT -> SMALLINT
UNIQUEIDENTIFIER -> CHAR(16) or UUID
ROWVERSION -> BYTEA
SMALLMONEY -> MONEY
IMAGE -> BYTEA

Auto-increment 

IDENTITY (1, 1) is not supported; use GENERATED ALWAYS AS IDENTITY or sequences. Either way, it uses sequences behind the scenes, and using sequences explicitly will allow you more control over them.

The datatype of an identity column has to be integer (int, smallint, bigint).

See also: https://docs.yugabyte.com/preview/develop/best-practices-ysql/#faster-sequences-with-server-level-caching 

Extended properties

Use the COMMENT command to add detail like MS SQL’s extended properties to database objects. 

See also: https://docs.yugabyte.com/preview/api/ysql/the-sql-language/statements/ddl_comment/ 

Queries

Change top to limit to return a subset of rows from a query.

For example, change select top 1... to select.... limit 1.

Performance

From a performance perspective, consider a few basic things regarding indexing and primary keys when moving to a distributed system. 

Indexes

A critical performance improvement will be forming the primary key in a way that takes advantage of the distributed nature of YugabyteDB. The primary key consists of a partitioning/sharding key and a clustering/ordering key. For example, a primary key (userid, username) will distribute partitions based on userid and then order them within the partition based on username. Both partitioning keys and clustering keys can contain multiple columns.

The following two things are typically a good start:

  1. Change any existing primary key to the partitioning key within the primary key.
  2. Change any “CLUSTERED INDEX” indexes to the clustering keys within the primary key.

Further tuning will be required after these basic changes. For example, in a mature schema, the number of secondary indexes will have grown; the number transformed into YugabyteDB should be limited to what will be used by current select queries to avoid unnecessary load during writes. There are also often opportunities to combine columns into a single index, or to use partial indexes. Query logging can be enabled before a test run to determine which indexes are being used, as well as other performance data.

Views

Views are typically used in MS SQL to simplify joins on multiple tables. After basic performance tuning of partitioning/clustering keys as mentioned above, these can be revisited. One thing to consider is that joining multiple tables in a distributed SQL database requires multiple lookups across the cluster. The latency involved in this can be mitigated with things like decreasing the number of tablets for smaller (under 100s of millions of rows) tables and colocated lookup tables.

Stored Procedures

Finally, the language used to write stored procedures will often change when moving to a new system.

Stored procedures in MS SQL are written in Transact SQL, which has a different syntax than the typical PLPGSQL language used in PostgreSQL/YugabyteDB procedures and functions.

Although procedures are available, if a return value is required, a function will have to be used. The main difference from a developer point of view is that procedures are called and functions are selected. See an example below.

T-SQL:

create procedure dbo.adduser
(
   @UserID.         char(4) = NULL,
   @UserName        char(255) = NULL
)
as
BEGIN
   print 'Description: Will add names into the users table.'
   print ''
  
   if (@UserID is NULL) or (@UserName is NULL)
   begin
       print 'Please check your syntax:'
       print 'syntax: adduser parameter1, parameter2'
       print '  parameter1 [char] [required] = User ID'
       print '  parameter2 [char] [required] = User Name'
       print ''
       return
   end
  
   insert into users values(@UserID, @UserName, getdate())
   select * from users where userid = @UserID

END

CALL adduser('1234','Valerie');

PLPGSQL:

create function adduser
(
   userid   char(4) = NULL,
   username char(255) = NULL
)

returns setof user
language 'plpgsql'
as $$
  
BEGIN
   raise notice 'Description: Will add names into the users table.';
   raise notice '';
   
   if (userid is NULL) or (username is NULL) then

       begin
           raise notice 'Please check your syntax:';
           raise notice 'syntax: adduser parameter1, parameter2';
           raise notice '  parameter1 [char] [required] = User ID';
           raise notice '  parameter2 [char] [required] = User Name';
           raise notice '';
       end;
   else
   
       insert into users values(userid, username, now());
       return query
           select * from users where userid = userid;
   end if;

END;
$$;

SELECT adduser('1234','Valerie');

Author: Valerie Parham-Thompson

View all posts by Valerie Parham-Thompson >

Leave a Reply

Your email address will not be published. Required fields are marked *