CSQL Main Memory Database Management System and Middle tier cache Product Roadmap

Introduction

CSQL is compact main memory open source SQL database engine that delivers ultra fast performance. It can handle 100K selects/sec and 50K updates/sec. It can also work as a middle tier caching solution for any open source and commercial database thereby increasing the throughput of the application by 20 to 100 times without making any code changes.

CSQL - Main memory Database

Main memory databases are times faster than disk based database systems, as all the data is available in physical memory. It also avoids the buffer manager overhead which is found in disk based database systems. Moreover data access algorithms can work efficiently as compared to traditional disk based algorithms. One of the major factor which determines the performance of database management is disk I/O. Access time for main memory is orders of magnitude less than that of disk ( 100ns vs 10ms).

Real time applications perform random access(point lookups => f1 = ?) 90% of the time, for which records from different disk blocks needs to be read. This hampers the performance of the application. Main memory cost has reduced significantly in the recent years and even desktops now have 1 GB RAM. Database managment system which depends only on memory and does not any disk I/O will be many times faster than disk based database sytems. This led to design a main memory database engine which does not involve any disk I/O.

Focus

CSQL compact main memory open source SQL database engine, uses shared memory architecture, wherein the database will be available to the application process in its own process address space and can be accessed by simple pointer semantics. OS mutex mechanisms are costly as it requires context switch, csql implements its own mutex mechanism through atomic assembly instructions to reduce the machine cycles. CSQL is designed for one single purpose 'performance'. It is not a feature rich database product. It does only limited things, but with ultra fast high performance.

SQL and Standard interface support
CSQL supports all DML statements which involve single table and DDL statements related to table and index.
Standard JDBC and ODBC drivers are also available to access the SQL kernel. It also provides proprietary C++ SQL interface to access the SQL Kernel and C++ interface to access storage kernel. Apart from primitive data types, it supports Date, Time and TimeStamp data types.

Storage Engine Capabilities

CSQL storage engine supports all the ACID properties.
Atomicity is implemented by using physical and logical undo logs.
It supports Unique, Not null and primary key, foreign key contraints.
It supports the first three isolation levels, READ_UNCOMMITTED, READ_COMMITTED, READ_REPEATABLE.
Disk performance is stagnated and it is around 7.2K rpm for IDE and 15K for SCSI disks. But network speed is increasing at good pace. 1 Gbps Ethernet is reality now. Because of these reasons, CSQL decides to rely on network so that it performance improves with increase in network speed rather than getting stagnated as in case with disk based database systems. Because of the above said reason, durability property of transaction is supported in csql using synchronized replication.
Hash Index and Tree indexing mechanisms make sure that equality predicates and range predicates are executed faster. Index structures are highly concurrent as only short duration locks are only taken on them.
Multi granular locking aids to improve the concurency. Row level locking gives the highest level of concurrency.
Custom built fixed size and variable size allocators are highly concurrent and uses latch free algorithms which aids in increasing the performance and gives high level of concurrency.

Data Replication

CSQL supports update anywhere N-way replication model. Update logs are generated at SQL level rather than at storage layer. This reduces the network bandwidth usage and improves the overall throughput. Lets say user executes the statement "delete from emp where empno >100" with emp table having 1 million rows. This will generate 1 million minus 100 physical log records to be shipped if handled at storage layer. But in case of CSQL, it is handled at SQL layer which sends one packet containing the SQL statement. This statement is executed in the context of the transaction at the peer site.

CSQL supports three modes to synchronize the data between the replication sites. They are

a)DSYNC - when updates happens at source site, the same update operation is executed in the peer site before it returns to the application. This ensures that distributed locks are taken to provide distributed consistent data at any given point of time.

b)NSYNC - when updates happens at source site, update logs are generated and are sent to the peer site before it returns to the application. These logs will be executed at the peer site by process called "CSQLExecutor". This ensures that even if the source site goes down, it will be able to recover the data from the peer site. Logs that are sent to the peer site will eventually be executed by 'CSQLExecutor' process and will be available during the recovery process of source site.

c)ASYNC - when updates happens at source site, update logs are generated and immediately it returns to the application. This gives very good throughput but when source site crashes, it may loose few transactions as the logs generated are sent asynchronously by "LogShipper" process to the peer site.

The quorum shall be configured to have any site to be in any one of the above mode with other sites. For example site1 shall be NSYNC with site2 and ASYNC with site3 and vice versa. This gives application flexibility to choose appropriate mode between the replicated sites based on reliability, consistency and throughput requirements.

Database Caching

CSQL can act as updateable real time caching for relational table of any database system (commercial and open source database systems including ORACLE and Mysql). This cache resides in application tier (same host where application resides) and reduces the computation and network overhead involved in executing SQL statements. It shall be used by both read intensive applications as well as update intensive applications, thereby increasing the throughput by 20 to 100 times.

Caching is done at the table level granularity and application shall either specify tables to be cached in a configuration file or it can use CLI tool or API's to cache the table dynamically.It loads the full table from the target database into CSQL database. After loading, any select query will be handled by the csql itself. Non -select DML statements are executed at both csql and target database, so that the cache and target database are in sync. It supports three modes to execute DML statements at target database

Applications shall also access tables which are not there in cache, using the same driver provided by CSQL. These statements on non cached tables are handled by the 'Gateway' component in the caching subsystem, whose responsibility is to check, where the table is present and route it either csql or target database based on the SQL query. If application executes complex queries which are not supported in CSQL Engine, then they are also routed to target database for execution. This will allow the application to adapt the caching solution without any application code changes.

a)DSYNC - statement execute will return after the operation is succesfully executed at csql as well as target database. This gives high consistency between cached data and target database. If csql server goes down, during recovery it automatically loads the tables from the target database.

b)NSYNC - statement execute will return after the operation is successfully executed at csql and update logs are generated and sent to target database host. CSQLExecutor process will run on the host where target database is running whose responsibility is to receive the update logs and execute them on target database.

c)ASYNC - statement execute will return after the operation is successfully executed at csql and update logs are generated. These update logs and sent to target database host by another process called "Propagator". CSQLExecutor process will run on the host where target database is running whose responsibility is to receive the update logs from the propagator and execute them on target database.

Applications can choose appropriate mode based on durability and consistency requirements. For applications, where updates shall happen directly to the target database, it provides CLI and API to reload the table fully or incremental. Incremental loading will work best if only insert and delete operations needs to be loaded from target database. Direct updates to target database shall be fetched to cache by using full table reloading. Application also can configure to have automatic loading at regular intervals, either fully or incremental.

Incase of foreign key constraints, applications shall either use DYNC mode, in which constaint violations are detected during target database execution. Other alternative is to load all the tables which are involved in the relationship to cache and use either NSYNC or ASYNC options based on the requirement.

Product Roadmap

Latest Release
CSQL - 1.2 Beta (Dec-2007)
Features Included

Storage Engine with transaction, procmgmt, locking, logging and relational support
Primitive SQL Engine which supports DDL and Single table DML statements
Primitive JDBC Driver
Interactive SQL CLI - csql

Future Releases
CSQL - 1.3 Beta (Feb -2008)
Features Included

Primitive ODBC Driver

CSQL - 2.0 GA (Apr -2008)
Features Included
Database Caching with ASYNC, DSYNC, NSYNC modes

CSQL - 3.0 GA (Dec -2008)
Features Included

SQL99 support
Replication with ASYNC, DYSNC, NSYNC modes


Product Page