The cache-aside pattern is a common approach in which the application performs a cache-first lookup to get some data. When the data is found, we call it a cache-hit. When the data is not found, we call it a cache-miss, and the application performs a lookup from the database. In this article, we take a look at the benefits of using this pattern and provide a basic solution using Node.js, SQL Server and Redis.
What is an In-Memory Cache?
An in-memory cache is a database in which the data is stored in memory instead of on disk. This provides a performance benefit, as a data read can return in sub-millisecond average time. The trade-off is that the data is not persisted on disk, so it is lost on a system restart. This is a common trade-off, as this kind of database is used for short-lived storage and not as a system of record. These systems are also commonly known as a distributed cache because they usually reside in different clusters than the one where the disk-based database is hosted.
When to Use a Cache
We tend to use a cache for performance purposes. When building distributed applications with heavy reads and writes on the database, we often start to encounter performance issues as the load on the system grows. Compared to the speed of a cache, which returns in sub-millisecond average time, a disk-based storage read is in the double-digit millisecond average range. This is just the physics limitation of the hardware. It does not include other performance concerns like lack of index tuning, partitions and other factors that can lead to blocks and deadlocks on the database.
Cache-Aside Pattern
When adding a distributed cache into our system architecture, we need to start thinking about strategies that enable us to efficiently develop a solution that improves our overall system performance without doing an overall system redesign. For these kinds of solutions, we leverage the cache-aside pattern, which in its simplest form of implementation involves doing an asynchronous lookup into the cache and failing over to the disk-based database in the event of a cache-miss.
For the cases where there is a cache-miss, we also want to make sure that the cache is populated with the missing information. This helps us keep the reads on the disk-based database to a minimum.
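As a minimal sketch of this read path (assuming a cache client and a database client that expose the same asynchronous get and set calls; the service broker later in this article implements a fuller version of this flow):

//minimal sketch of the cache-aside read path
//assumes cache and database expose the same async get/set contract
async function read(key) {
    let data = await cache.get(key); //cache-first lookup
    if (data) {
        return data; //cache-hit
    }
    data = await database.get(key); //cache-miss: fail over to the database
    await cache.set(key, data); //populate the cache for future reads
    return data;
}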
Defining the Provider Interface
The first step for our solution is to define an interface contract that all the database providers should implement. The goal is that as we inject the providers into our components, the contract stays the same. This enables our code to be agnostic of the actual database technology that we are using. For our simple approach, we define our contract with a get function export. This function provides our components with database read support. The internal function can have a different name; as long as the exported name is kept the same, our components work properly.
module.exports.get = function getTelemetry() { … }
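As a fuller sketch, a provider module could look like the following. Here queryTelemetry is a hypothetical internal name; only the exported get name is part of the contract:

//sketch of a provider module honoring the contract
//queryTelemetry is a hypothetical internal helper name
function queryTelemetry() {
    //provider-specific read logic that returns a promise with the data
}

module.exports.get = queryTelemetry;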
Implementing the SQL Server Library
Now that we have our provider contract defined, we can implement the SQL Server library. This library just performs a SQL query to read information from our SQL database telemetry table for a particular time range. For this integration, we use the typeorm library, which works very similarly to the .NET Entity Framework library.
function init() {
    var context = orm.createConnection({
        "type": "mssql",
        "host": "servername",
        "entities": [
            new orm.EntitySchema(require("../models/entity/telemetry"))
        ]
    });
    return context;
}

function getTelemetry() {
    return q.Promise(async function (resolve, reject) {
        let context = init();
        context.then(async function (conn) {
            let repository = conn.getRepository("telemetry");
            let ts = new Date();
            let timeRange = new Date(ts.setHours(ts.getHours() - 1)); //one hour ago
            let dateFilter = timeRange.toLocaleDateString() + ' ' + timeRange.toLocaleTimeString();
            let data = await repository.createQueryBuilder()
                .where("telemetry.processed >= :dt", { dt: dateFilter })
                .getMany();
            resolve(data);
            await conn.close();
        })
        .catch(function (err) {
            reject(err); //surface connection/query errors to the caller
        });
    });
}
The init() function returns the database context, which enables us to make calls to the database and query the telemetry table. The data is returned as JSON, and it is mapped by using our entity definition, which is bound to the context through the entities property. Our model is shown below:
module.exports = {
    "name": "telemetry",
    "columns": {
        "telemetryId": {
            "primary": true,
            "type": "int",
            "generated": true
        },
        "deviceId": {
            "type": "varchar"
        },
        "temperature": {
            "type": "numeric"
        },
        "humidity": {
            "type": "numeric"
        },
        "sound": {
            "type": "numeric"
        },
        "processed": {
            "type": "datetime"
        },
        "created": {
            "type": "datetime"
        }
    }
}
Implementing the Redis Library
We are done with the SQL library, so we now move on to implementing our Redis library. Similar to our SQL Server implementation, we want to maintain the same contract. The big difference here is that the Redis API looks very different from a SQL query. This is expected, as Redis is a key/value pair system in which we find the data based on a particular key.
For our purposes, we define the key to be the same as the table name on our database. For date ranges on Redis, we use the concept of a score for our data entries; we basically use a JavaScript timestamp. This enables us to use the ZRangeByScore API call to query Redis for a time range, similar to how we do it with a SQL query.
//client context
const client = redis.createClient($config.REDIS.port, $config.REDIS.host);

function getTelemetry() {
    //async operation - return a promise
    return q.Promise(function (resolve, reject) {
        let ts = new Date();
        let from = ts.setHours(ts.getHours() - 1); //basic date range query: one hour ago
        let to = ts.setHours(ts.getHours() + 1); //ts was mutated above, so this is now
        let args = [tableName, from, to]; //tableName matches the SQL table name
        client.zrangebyscore(args, function (err, data) {
            if (err) {
                reject(err); //send back the error
                return;
            }
            let result = data.map(function (item) {
                return JSON.parse(item); //entries are stored as JSON strings
            });
            resolve(result); //send back the promise with data
        });
    });
}
Similar to the SQL context, we need to get the Redis context to be able to use the APIs. With the context instantiated, we make a call to get the items within the time range. Since we stored the data in JSON format, which is what we want to return to the client, we do not need to map anything. Our code returns a JSON array of telemetry data.
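The write side is outside the scope of this article's libraries, but as a sketch of the assumption behind the score-based query, an entry could be stored with its processed time as the score and the record serialized as JSON (addTelemetry is a hypothetical helper):

//sketch: store a telemetry entry with a JavaScript timestamp as the score
//the "telemetry" key matches the SQL table name as described above
function addTelemetry(client, entry) {
    return q.Promise(function (resolve, reject) {
        let score = new Date(entry.processed).getTime(); //timestamp score
        let value = JSON.stringify(entry); //store the record as JSON
        client.zadd(["telemetry", score, value], function (err, reply) {
            if (err) {
                reject(err);
            } else {
                resolve(reply);
            }
        });
    });
}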
Message Broker
Now that we have both our service libraries ready, let's take a look at implementing the cache-aside pattern in our message broker. Similar to our other libraries, we want our service broker to implement the same contract. The main reason is that we just want to inject the service into our components and not have to change their implementation.

Our service broker also needs to be able to support the cache and SQL providers. This is what enables us to manage the cache-aside pattern logic. As the service goes through the workflow, it uses the same implementation and passes only the provider reference to handle the database integration. This is the advantage of maintaining the same interface for all our providers: the logic looks the same because the APIs have the same contract definition.
function getFromProvider(provider, callback) {
    let result = null;
    let error = null;
    provider.get().done(function (data) {
        result = data;
        callback(result, error);
    },
    function (err) {
        error = err;
        callback(result, error);
    });
}

function getData() {
    console.log("broker get");
    return q.Promise(function (resolve, reject) {
        getFromProvider(cache, function (data, err) {
            if (!data || data.length === 0) {
                //cache-miss: fail over to the storage provider
                getFromProvider(storage, function (data, err) {
                    if (err) {
                        reject(err);
                    } else {
                        resolve(data);
                        //add to the cache
                        addToProvider(cache, data);
                    }
                });
            } else {
                //cache-hit
                resolve(data);
            }
        });
    });
}
At its simplest implementation, our service broker just queries the cache. If data is found, it returns the data. If no data is found, it fails over to the SQL query. It then populates the cache with the data from SQL, so the next time a request is made, the data is available from the cache.
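The addToProvider call is not shown in the snippets above. A minimal sketch, assuming the provider contract also exports an add function, could look like this:

//sketch: populate a provider after a cache-miss
//assumes the provider contract also exports an add function
//fire-and-forget so a failed cache write does not fail the read
function addToProvider(provider, data) {
    provider.add(data).catch(function (err) {
        console.log("cache population failed", err);
    });
}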
How to Use the Libraries
We are now ready to use our libraries. Let's recall that we have two database providers (redis and sql) as well as one service broker. We want our API server library to use a single provider to handle all the data access concerns and let the service broker integrate with the database providers. Let's see how to do that:
//message broker
const $repository = require('./modules/message-broker');
const $storage = require('./data_modules/sqlRepository'); //sql
const $cache = require('./data_modules/redisRepository'); //redis

//initialize the broker with the data providers
$repository.init($cache, $storage);

//api service
var $api = require('./modules/telemetry-api.js');
$api.init(app, $repository);
When we initialize the repository (service broker), we pass the references to the cache and sql providers. We then initialize the API with our repository reference. Notice that we could pass a reference to any of these libraries to our API because all of them implement the same interface. This enables us to use sql, cache or both for storage.
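The broker's init function is not shown above. A minimal sketch that simply captures the two provider references used by getData could look like this:

//sketch: the broker captures the injected provider references
//cache and storage are the module-level variables used by getData
let cache = null;
let storage = null;

module.exports.init = function (cacheProvider, storageProvider) {
    cache = cacheProvider;
    storage = storageProvider;
};

module.exports.get = getData; //same contract as the other providers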
API Library
Let’s take a
look at how the API library uses the provider.
//add the route for the telemetry data
app.get('/api/telemetry', getData)

//getData route handler
function getData(req, res) {
    console.log("GET Telemetry");
    provider.get().done(function (data) {
        res.json(data);
    },
    function (err) {
        console.log(err);
        res.status(400).send({ error: err });
    });
}
Our API library expects a provider instance to handle the data operations. The API defines the routes and the handler for those routes. When a GET request is received, the API library uses the same contract definition from the provider to get the data. The API library is agnostic of the database system. Since all the providers maintain the same contract, the API can use any of them: the cache, sql or the service broker. This enables us to minimize the changes on other components.
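As a sketch of one plausible shape for the API library's init (the route registration mirrors the snippet above; provider is the module-level reference used by the getData handler):

//sketch: the API library captures the injected provider reference
let provider = null;

module.exports.init = function (app, dataProvider) {
    provider = dataProvider;
    app.get('/api/telemetry', getData); //wire the route to the handler
};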
Conclusion
We have shown that by maintaining our interfaces, we are able to inject a service broker that manages different data storage systems to implement our cache-aside pattern. Note that the cache should always be populated the moment the database has new information. In our pattern, we populate the cache on a cache-miss event as an exception and not the norm. There are other patterns to handle this database-cache synchronization. This is often done via pipelines that are not part of the application.
Thanks for reading