
Accelerate Postgres with CSQL MMDB April 15, 2009

Posted by Prabakaran Thirumalai in cache, csqlcache.

CSQL, a main memory database engine, provides transparent caching for Postgres databases with few or no application code changes. Main memory databases are many times faster than disk-based databases. By caching data close to the application, CSQL MMDB reduces network latency and provides unprecedented performance for data access.

For more information visit


Accelerate MySQL with CSQL MMDB April 15, 2009

Posted by Prabakaran Thirumalai in cache, csqlcache.

CSQL, a main memory database engine, provides transparent caching for MySQL databases with few or no application code changes. CSQL MMDB is 20-30X faster than disk-based databases. By caching data close to the application, CSQL MMDB reduces network latency and provides unprecedented performance for data access.

Main memory databases are many times faster than traditional disk-based database management systems such as Oracle, Sybase, MySQL and Postgres. This performance gain does not come only from avoiding the disk I/O needed to fetch data into memory. Even when a disk-based DBMS holds all of its data in the buffer cache, it is slower because its data structures and access algorithms are designed around disk-based storage.

For more information visit


Requirements of good data caching solution April 13, 2009

Posted by Prabakaran Thirumalai in cache, csqlcache.

This article outlines the features of a good data caching solution for read- and write-intensive applications.

Updateable Cache Tables
Most existing cache solutions are read-only, which limits their use to a small segment of applications, i.e. non-real-time applications.

Bi-Directional Updates
For updateable caches, updates made in the cache should be propagated to the target database, and updates made directly on the target database should reach the cache automatically.

Synchronous and Asynchronous update propagation
Updates on a cached table should be propagated to the target database in one of two modes. Synchronous mode ensures that by the time the database operation completes, the update has been applied at the target database as well. In asynchronous mode, updates are applied to the target database after a delay.
Synchronous mode gives high cache consistency and is suited to real-time applications. Asynchronous mode gives high throughput and is suited to near-real-time applications.
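
The two propagation modes can be sketched in C as follows. This is only an illustration: the names (cache_write, flush_pending, MODE_SYNC, MODE_ASYNC) and the in-memory arrays standing in for the cache and the target database are hypothetical and are not CSQL APIs.

```c
#include <stddef.h>

#define SLOTS 8          /* illustrative table size */
#define MAX_PENDING 16   /* illustrative queue capacity */

enum mode { MODE_SYNC, MODE_ASYNC };

/* Hypothetical stand-ins: one array for the cache, one for the target DB. */
static int cache_tbl[SLOTS];
static int target_tbl[SLOTS];

/* Updates accepted by the cache but not yet applied to the target. */
static struct { size_t slot; int value; } pending[MAX_PENDING];
static size_t n_pending = 0;

/* Apply every queued update to the target in arrival order. */
void flush_pending(void)
{
    for (size_t i = 0; i < n_pending; i++)
        target_tbl[pending[i].slot] = pending[i].value;
    n_pending = 0;
}

/* Write to the cache; returns 0 on success, -1 on a bad slot. */
int cache_write(enum mode m, size_t slot, int value)
{
    if (slot >= SLOTS)
        return -1;
    cache_tbl[slot] = value;
    if (m == MODE_SYNC) {
        /* Synchronous: drain older queued updates, then update the
         * target before the call returns. */
        flush_pending();
        target_tbl[slot] = value;
    } else {
        /* Asynchronous: queue the update; drain first if the queue is full. */
        if (n_pending == MAX_PENDING)
            flush_pending();
        pending[n_pending].slot = slot;
        pending[n_pending].value = value;
        n_pending++;
    }
    return 0;
}

int cache_read(size_t slot)  { return cache_tbl[slot]; }
int target_read(size_t slot) { return target_tbl[slot]; }
```

After an asynchronous cache_write(), cache_read() already returns the new value while target_read() still returns the old one until flush_pending() runs. That window is the consistency cost paid for the higher throughput.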

Multiple cache granularity: Database level, Table level and Result-set caching
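Major portions of corporate databases are historical and infrequently accessed. However, some information, such as premium customers’ data, should be instantly accessible. A caching solution should therefore allow caching at the level of a whole database, a single table, or a query result set.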

Recovery for cached tables
In case of a system or power failure, all committed transactions on the cached tables should be recovered when the caching platform restarts.

Tools to validate the coherence of cache
In asynchronous mode, the caches at different cache nodes and the target database may diverge. Such divergence needs to be resolved manually, and the caching solution should provide tools to identify the mismatches and take corrective measures if required.
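
As a rough idea of what such a validation tool does at its core, the C sketch below compares a cached copy of a table with the target copy and counts mismatching rows. The row layout and the function name count_mismatches are hypothetical. A real tool would also have to fetch both sides and handle missing or extra rows.

```c
#include <stddef.h>

/* Hypothetical row layout: a primary key and one payload column. */
struct row { int key; int value; };

/*
 * Count rows that differ between the cache copy and the target copy.
 * Both arrays are assumed to hold n rows sorted by the same key order.
 * The index of the first mismatch is stored in *first when first is
 * not NULL and at least one mismatch exists.
 */
size_t count_mismatches(const struct row *cache, const struct row *target,
                        size_t n, size_t *first)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (cache[i].key != target[i].key ||
            cache[i].value != target[i].value) {
            if (bad == 0 && first != NULL)
                *first = i;
            bad++;
        }
    }
    return bad;
}
```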

Horizontally Scalable
Clustering is employed in many solutions to increase availability and achieve load balancing. The caching platform should work in a clustered environment spanning multiple nodes while keeping the cached data coherent across nodes.

Transparent access to non-cached tables residing in the target database
The database cache should keep track of queries and route each one intelligently to either the cache or the origin database, based on data locality, without any application code modification.
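
A minimal C sketch of locality-based routing, assuming the table name has already been extracted from the query. The table list and the names route_table, ROUTE_CACHE and ROUTE_TARGET are hypothetical. A real router would parse the SQL and would send a query to the cache only if every table it touches is cached.

```c
#include <stddef.h>
#include <string.h>

enum route { ROUTE_CACHE, ROUTE_TARGET };

/* Hypothetical list of tables currently held in the cache. */
static const char *cached_tables[] = { "customers", "orders" };

/* Cached tables are served by the cache; everything else, including a
 * NULL name, goes to the target database. */
enum route route_table(const char *table)
{
    size_t n = sizeof cached_tables / sizeof cached_tables[0];

    if (table == NULL)
        return ROUTE_TARGET;
    for (size_t i = 0; i < n; i++)
        if (strcmp(table, cached_tables[i]) == 0)
            return ROUTE_CACHE;
    return ROUTE_TARGET;
}
```

Because the decision is made below the application, the same query text works whether or not a table is cached.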

Transparent Fail over
There should not be any service outage if the caching platform fails. Client connections should be routed to the target database.

Minimal changes to application for adapting the caching solution
Support for standard interfaces such as JDBC and ODBC lets the application work seamlessly without code changes. The caching solution should route all stored procedure calls to the target database so that they don’t need to be migrated.



Scaling applications/servers to handle more load, April 1, 2009

Posted by Prabakaran Thirumalai in csqlcache, cache.

With the speed of business increasing and the volume of information that enterprises must process growing as well, businesses in many industry domains need to make the transition to real-time data management in order to stay competitive.
Although there is huge demand for speed, enterprises are reluctant to migrate their applications, because they do not want to give up the existing database systems that they have used for many years and that have proven stable in their environment.

The CSQL main memory database executes transactions 30 times faster than leading disk-based database management systems.

CSQL Cache works in conjunction with an existing database management system (MySQL, Postgres, Oracle, etc.) and lets an application choose, on a per-table basis, between the feature-rich functionality of the existing database and the high performance of CSQL MMDB. By caching frequently accessed tables from the existing database management system close to the application host, an application can improve database throughput by 100 times.

Improves ROI by letting business applications process 1 million transactions in less than half a minute
Seamlessly plugs into the existing architecture with few or no code changes
Reduces network bandwidth usage and load on back-end systems
Requires no additional hardware to handle more load or more customers

For more information on the product, visit the product web site.


Measuring Cache miss August 25, 2008

Posted by Prabakaran Thirumalai in cache.

The cachegrind tool helps programmers quantify and understand the cache behavior of programs and algorithms. With that information, we can modify our code to be more cache-friendly and thereby make it run faster.

The following example demonstrates how to measure L1 and L2 cache misses on the Linux platform. Consider the following program, which initializes a two-dimensional array (test1.c):

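int array[10000][10000];
int setValue1()
{
    int i, j;
    for (i = 0; i < 10000; i++)
        for (j = 0; j < 10000; j++)
            array[j][i] = 1; /* loop body lost in the original; a column-wise store is assumed from the miss counts below */
    return 0;
}
int main()
{
    setValue1();
    return 0;
}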

Compile the program using gcc compiler

$gcc -O2 -o test test1.c

Run the executable generated using the valgrind tool

$valgrind --tool=cachegrind ./test
==2605== Cachegrind, an I1/D1/L2 cache profiler.
==2605== Copyright (C) 2002-2007, and GNU GPL’d, by Nicholas Nethercote et al.
==2605== Using LibVEX rev 1732, a library for dynamic binary translation.
==2605== Copyright (C) 2004-2007, and GNU GPL’d, by OpenWorks LLP.
==2605== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
==2605== Copyright (C) 2000-2007, and GNU GPL’d, by Julian Seward et al.
==2605== For more details, rerun with: -v
==2605== I refs: 500,153,442
==2605== I1 misses: 547
==2605== L2i misses: 545
==2605== I1 miss rate: 0.00%
==2605== L2i miss rate: 0.00%
==2605== D refs: 100,048,658 (33,484 rd + 100,015,174 wr)
==2605== D1 misses: 100,000,803 ( 629 rd + 100,000,174 wr)
==2605== L2d misses: 6,260,759 ( 589 rd + 6,260,170 wr)
==2605== D1 miss rate: 99.9% ( 1.8% + 99.9% )
==2605== L2d miss rate: 6.2% ( 1.7% + 6.2% )
==2605== L2 refs: 100,001,350 ( 1,176 rd + 100,000,174 wr)
==2605== L2 misses: 6,261,304 ( 1,134 rd + 6,260,170 wr)
==2605== L2 miss rate: 1.0% ( 0.0% + 6.2% )

The summary shows the access pattern with respect to the Level 1 and Level 2 caches: total instruction references (I refs) and total data references (D refs), Level 1 instruction misses (I1) and data misses (D1), and L2 instruction misses (L2i) and data misses (L2d).
From the above output, we can see that the D1 miss rate is 99.9% and the L2d miss rate is 6.2%. Of the 100,048,658 data references, 100,000,803 missed the L1 data cache.
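
The miss rates in the summary are simply misses divided by references. A small C helper makes the arithmetic explicit. The function name miss_rate_pct is only for illustration and is not part of valgrind.

```c
/* Miss rate as a percentage; returns 0.0 when there were no references. */
double miss_rate_pct(unsigned long long misses, unsigned long long refs)
{
    if (refs == 0)
        return 0.0;
    return 100.0 * (double)misses / (double)refs;
}
```

With the numbers above, miss_rate_pct(100000803ULL, 100048658ULL) is about 99.95 and miss_rate_pct(6260759ULL, 100048658ULL) is about 6.26, which the report shows as 99.9% and 6.2%.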

The above output is a summary for the whole program. When we run cachegrind, it also generates a file named cachegrind.out.<pid>, where <pid> is the process ID. This file contains L1 and L2 cache information for every function in the program, which helps identify the functions that are candidates for optimization.
For the above run, the file cachegrind.out.2605 contains the following information:

desc: I1 cache: 32768 B, 64 B, 8-way associative
desc: D1 cache: 32768 B, 64 B, 8-way associative
desc: L2 cache: 2097152 B, 64 B, 8-way associative
cmd: ./a.out
events: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
fn=(below main)
0 64 4 4 19 0 0 26 0 0
0 1037 14 12 956 3 1 34 1 1
0 25 2 2 9 0 0 5 0 0
0 500060005 0 0 2 1 1 100000001 100000000 6260000
0 9647 1 1 3161 14 11 0 0 0
0 657 3 3 103 2 2 0 0 0
0 711 2 2 241 0 0 7 0 0
0 8 2 2 2 0 0 0 0 0
0 19 2 2 7 0 0 5 0 0
summary: 500153442 547 545 33484 629 589 100015174 100000174 6260170

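To understand the output, we should know what each of these values represents.
A line such as fn=sbrk means the function is sbrk(). On each line below it, the first number is the source line number (0 when no line information is available), followed by the event counts in the order given on the events line: “Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw”.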

Ir Instructions Executed
I1mr L1 Instruction Cache read misses
I2mr L2 Instruction Cache read misses
Dr Total Memory Reads
D1mr L1 Data Cache read misses
D2mr L2 Data Cache read misses
Dw Total Memory Writes
D1mw L1 Data Cache write misses
D2mw L2 Data Cache write misses

For the setValue1() function, 500060005 instructions were executed with no instruction cache misses in either L1 or L2. Of its 100000001 memory writes, 100000000 missed the L1 data cache and 6260000 missed the L2 cache.
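
An L1 write miss rate close to 100% is what a column-by-column walk over a row-major C array would produce, since consecutive stores then land a whole row apart. The sketch below contrasts that traversal with a row-by-row one. The names fill_by_column, fill_by_row and grid_sum are hypothetical, the array is smaller than the 10000 x 10000 one above, and the row-wise version was not measured for this article. Running both under cachegrind is the way to confirm the difference.

```c
#define N 1024   /* smaller than the article's 10000 to keep the sketch light */

static int grid[N][N];

/* Column-wise traversal: consecutive stores are N * sizeof(int) bytes
 * apart, so nearly every store touches a different cache line. */
void fill_by_column(int value)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[j][i] = value;
}

/* Row-wise traversal: consecutive stores are adjacent in memory, so a
 * cache line is filled completely before the next one is touched. */
void fill_by_row(int value)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = value;
}

/* Sum of all elements, used to check that both fills store the same data. */
long long grid_sum(void)
{
    long long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += grid[i][j];
    return s;
}
```

Both functions leave the array in the same state. Only the order of the stores, and therefore the cache behavior, differs.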



Levels of Caching June 8, 2008

Posted by Prabakaran Thirumalai in cache.

A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch or to compute, compared to the cost of reading the cache. (Wiki)

Computers have several levels of cache to speed up operation, including processor cache, memory cache and disk cache. Caching can also be implemented on a web server for frequently accessed internet pages and in databases for frequently accessed data (tables). Cache technology is the use of a faster but smaller memory type to accelerate a slower but larger memory type.

When using a cache, you must check the cache to see if an item is in there. If it is there, it’s called a cache hit. If not, it is called a cache miss and the computer must wait for a round trip from the larger, slower memory area.
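
The hit/miss check can be sketched in C with a tiny direct-mapped cache in front of a slow lookup. All names here (dm_get, slow_fetch, DM_LINES) are hypothetical, and slow_fetch only stands in for the slower, larger memory.

```c
#include <stdbool.h>

#define DM_LINES 4   /* illustrative number of cache slots */

struct dm_line { bool valid; unsigned key; int value; };

static struct dm_line dm_cache[DM_LINES];
static unsigned long dm_hits, dm_misses;

/* Hypothetical slow backing store: derives the value from the key. */
static int slow_fetch(unsigned key)
{
    return (int)(key * 10u);
}

/* Look up key. On a hit the cached value is returned directly; on a
 * miss the value is fetched from the backing store and the slot is
 * overwritten with it. */
int dm_get(unsigned key)
{
    struct dm_line *l = &dm_cache[key % DM_LINES];

    if (l->valid && l->key == key) {
        dm_hits++;
        return l->value;
    }
    dm_misses++;
    l->valid = true;
    l->key = key;
    l->value = slow_fetch(key);
    return l->value;
}
```

Two keys that map to the same slot (for example 3 and 7 with four slots) evict each other, so alternating between them misses every time.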

Levels of Caching

L1 Cache

L1 cache is an abbreviation of Level 1 cache. It is also called primary cache.
L1 cache is a small, fast memory cache that is built into the CPU and helps speed access to important and frequently used data. It is used for temporary storage of instructions and data organised in blocks of 32 bytes.

Write back and Write through cache: Write through happens when a processor writes data simultaneously into cache and into main memory (to assure coherency). Write back occurs when the processor writes to the cache and then proceeds to the next instruction. The cache holds the write-back data and writes it into main memory when that data line in cache is to be replaced. Write back offers about 10% higher performance than write-through, but cache that has this function is more costly. A third type of write mode, write through with buffer, gives similar performance to write back.
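
The difference between the two write policies can be sketched in C with a one-line cache and a dirty bit. This is a software model for illustration only: the names (cache_store, WRITE_THROUGH, WRITE_BACK, mem_writes) are hypothetical and real hardware is far more involved.

```c
#include <stdbool.h>

#define MEM_WORDS 16   /* illustrative main-memory size */

enum wpolicy { WRITE_THROUGH, WRITE_BACK };

static int main_mem[MEM_WORDS];

/* A single-line cache holding one word, enough to show the dirty bit. */
static struct { bool valid, dirty; unsigned addr; int data; } wline;
static unsigned long mem_writes;   /* counts stores that reach main memory */

/* Write a dirty line back to main memory before it is replaced. */
static void evict(void)
{
    if (wline.valid && wline.dirty) {
        main_mem[wline.addr] = wline.data;
        mem_writes++;
    }
    wline.valid = false;
    wline.dirty = false;
}

/* Store data at addr; out-of-range addresses are ignored. */
void cache_store(enum wpolicy p, unsigned addr, int data)
{
    if (addr >= MEM_WORDS)
        return;
    if (wline.valid && wline.addr != addr)
        evict();                 /* the line is being replaced */
    wline.valid = true;
    wline.addr = addr;
    wline.data = data;
    if (p == WRITE_THROUGH) {
        main_mem[addr] = data;   /* memory is updated on every store */
        mem_writes++;
        wline.dirty = false;
    } else {
        wline.dirty = true;      /* memory is updated only on replacement */
    }
}
```

Repeated write-back stores to the same address cost no memory writes until the line is replaced, whereas every write-through store reaches main memory.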

Speed: 5 cycles

Granularity: word length (64 bit or 128 bit)

Backend: L2 Cache

L2 Cache

Level 2 cache, also referred to as secondary cache, is also located inside the processor.

Speed: 10 cycles

Granularity: word length (64 bit or 128 bit)



Main Memory

The principal level of system memory is referred to as main memory, or Random Access Memory (RAM).

Main memory is attached to the processor via its address and data buses. Each bus consists of a number of electrical circuits or bits. The width of the address bus dictates how many different memory locations can be accessed, and the width of the data bus how much information is stored at each location. Main memory is built up using DRAM chips, short for Dynamic RAM.

RAM is used as a cache for data that is initially loaded in from the hard disk (or other I/O storage systems).

Speed: 5 to 50 ns

Granularity: 4 KB (page size)

Backend: Disk blocks


Speed: 5 millisecs

Granularity: Files

Backend: Distributed systems.

HTTP Cache

Speed: 1 sec

Granularity: Internet Pages

Backend: Pages on disk

Database Cache

Speed: 1 millisec for select

Granularity: Table, Result Set

Backend: Database connected via network

