The cache-aside pattern is a common approach in which the application performs a cache-first lookup to get some data. When the data is found, we call it a cache-hit. When the data is not found, we call it a cache-miss, and the application performs a lookup from the database. In this article, we take a look at the benefits of using this pattern and provide a basic solution using Node.js, SQL Server and Redis.
What is an In-Memory Cache?
An in-memory cache is a database in which the data is stored in memory instead of on disk. This provides a performance benefit, as a data read can return in sub-millisecond average time. The trade-off is that the data is not persisted on disk, so it is lost on a system restart. This is a common trade-off, as this kind of database is used for short-lived storage and not as a system of record. These systems are also commonly known as a distributed cache because they usually reside in different clusters than the one where the disk-based database is hosted.
When to Use a Cache
We tend to use a cache for performance purposes. When building distributed applications with heavy reads and writes on the database, we often start to encounter performance issues as the load on the system grows. Compared to the speed of a cache, which returns in sub-millisecond average time, a disk-based storage read is in the double-digit millisecond average range. This is just the physics limitation of the hardware. It does not include other performance concerns like lack of index tuning, partitions and other factors that can lead to blocks and deadlocks on the database.
Cache-Aside Pattern
When adding a distributed cache into our system architecture, we need to start thinking about strategies that enable us to efficiently develop a solution that improves our overall system performance without doing an overall system redesign. For these kinds of solutions, we leverage the cache-aside pattern, which in its simplest form of implementation involves doing an asynchronous lookup into the cache and failing over to the disk-based database in the event of a cache-miss.
For the cases where there is a cache-miss, we also want to make sure that the cache is populated with the missing information. This helps us keep the reads on the disk-based database to a minimum.
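As a minimal sketch of this read path (assuming a cache client and a database client that expose the same asynchronous get and set calls; the service broker later in this article implements a fuller version of this flow):

//minimal sketch of the cache-aside read path
//assumes cache and database expose the same async get/set contract
async function read(key) {
    let data = await cache.get(key); //cache-first lookup
    if (data) {
        return data; //cache-hit
    }
    data = await database.get(key); //cache-miss: fail over to the database
    await cache.set(key, data); //populate the cache for future reads
    return data;
}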
Defining the Provider Interface
The first step for our solution is to define an interface contract that all the database providers should implement. The goal is that as we inject the providers into our components, the contract stays the same. This enables our code to be agnostic of the actual database technology that we are using. For our simple approach, we define our contract with a get function export. This function provides our components with database read support. The internal function can have a different name; as long as the exported name is kept the same, our components work properly.
module.exports.get = function getTelemetry() { … }
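As a fuller sketch, a provider module could look like the following. Here queryTelemetry is a hypothetical internal name; only the exported get name is part of the contract:

//sketch of a provider module honoring the contract
//queryTelemetry is a hypothetical internal helper name
function queryTelemetry() {
    //provider-specific read logic that returns a promise with the data
}

module.exports.get = queryTelemetry;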
Implementing the SQL Server Library
Now that we have our provider contract defined, we can implement the SQL Server library. This library just performs a SQL query to read information from our SQL database telemetry table for a particular time range. For this integration, we use the typeorm library, which works very similarly to the .NET Entity Framework library.
function init() {
    var context = orm.createConnection({
        "type": "mssql",
        "host": "servername",
        "entities": [
            new orm.EntitySchema(require("../models/entity/telemetry"))
        ]
    });
    return context;
}

function getTelemetry() {
    return q.Promise(async function (resolve, reject) {
        let context = init();
        context.then(async function (conn) {
            let repository = conn.getRepository("telemetry");
            let ts = new Date();
            let timeRange = new Date(ts.setHours(ts.getHours() - 1)); //one hour ago
            let dateFilter = timeRange.toLocaleDateString() + ' ' + timeRange.toLocaleTimeString();
            let data = await repository.createQueryBuilder()
                .where("telemetry.processed >= :dt", { dt: dateFilter })
                .getMany();
            resolve(data);
            await conn.close();
        })
        .catch(function (err) {
            reject(err); //surface connection/query errors to the caller
        });
    });
}
The init() function returns the database context, which enables us to make calls to the database and query the telemetry table. The data is returned as JSON, and it is mapped by using our entity definition, which is bound to the context through the entities property. Our model is shown below:
module.exports = {
    "name": "telemetry",
    "columns": {
        "telemetryId": {
            "primary": true,
            "type": "int",
            "generated": true
        },
        "deviceId": {
            "type": "varchar"
        },
        "temperature": {
            "type": "numeric"
        },
        "humidity": {
            "type": "numeric"
        },
        "sound": {
            "type": "numeric"
        },
        "processed": {
            "type": "datetime"
        },
        "created": {
            "type": "datetime"
        }
    }
}
Implementing the Redis Library
We are done with the SQL library, so we now move on to implementing our Redis library. Similar to our SQL Server implementation, we want to maintain the same contract. The big difference here is that the Redis API looks very different from a SQL query. This is expected, as Redis is a key/value pair system in which we find the data based on a particular key.
For our purposes, we define the key to be the same as the table name on our database. For date ranges on Redis, we use the concept of a score for our data entries; we basically use a JavaScript timestamp. This enables us to use the ZRangeByScore API call to query Redis for a time range, similar to how we do it with a SQL query.
//client context
const client = redis.createClient($config.REDIS.port, $config.REDIS.host);

function getTelemetry() {
    //async operation - return a promise
    return q.Promise(function (resolve, reject) {
        let ts = new Date();
        let from = ts.setHours(ts.getHours() - 1); //basic date range query: one hour ago
        let to = ts.setHours(ts.getHours() + 1); //ts was mutated above, so this is now
        let args = [tableName, from, to]; //tableName matches the SQL table name
        client.zrangebyscore(args, function (err, data) {
            if (err) {
                reject(err); //send back the error
                return;
            }
            let result = data.map(function (item) {
                return JSON.parse(item); //entries are stored as JSON strings
            });
            resolve(result); //send back the promise with data
        });
    });
}
Similar to the SQL context, we need to get the Redis context to be able to use the APIs. With the context instantiated, we make a call to get the items within the time range. Since we stored the data in JSON format, which is what we want to return to the client, we do not need to map anything. Our code returns a JSON array of telemetry data.
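The write side is outside the scope of this article's libraries, but as a sketch of the assumption behind the score-based query, an entry could be stored with its processed time as the score and the record serialized as JSON (addTelemetry is a hypothetical helper):

//sketch: store a telemetry entry with a JavaScript timestamp as the score
//the "telemetry" key matches the SQL table name as described above
function addTelemetry(client, entry) {
    return q.Promise(function (resolve, reject) {
        let score = new Date(entry.processed).getTime(); //timestamp score
        let value = JSON.stringify(entry); //store the record as JSON
        client.zadd(["telemetry", score, value], function (err, reply) {
            if (err) {
                reject(err);
            } else {
                resolve(reply);
            }
        });
    });
}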
Message Broker
Now that we have both our service libraries ready, let's take a look at implementing the cache-aside pattern in our message broker. Similar to our other libraries, we want our service broker to implement the same contract. The main reason is that we just want to inject the service into our components and not have to change their implementation.

Our service broker also needs to be able to support the cache and SQL providers. This is what enables us to manage the cache-aside pattern logic. As the service goes through the workflow, it uses the same implementation and passes only the provider reference to handle the database integration. This is the advantage of maintaining the same interface for all our providers: the logic looks the same because the APIs have the same contract definition.
function getFromProvider(provider, callback) {
    let result = null;
    let error = null;
    provider.get().done(function (data) {
        result = data;
        callback(result, error);
    },
    function (err) {
        error = err;
        callback(result, error);
    });
}

function getData() {
    console.log("broker get");
    return q.Promise(function (resolve, reject) {
        getFromProvider(cache, function (data, err) {
            if (!data || data.length === 0) {
                //cache-miss: fail over to the storage provider
                getFromProvider(storage, function (data, err) {
                    if (err) {
                        reject(err);
                    } else {
                        resolve(data);
                        //add to the cache
                        addToProvider(cache, data);
                    }
                });
            } else {
                //cache-hit
                resolve(data);
            }
        });
    });
}
At its simplest implementation, our service broker just queries the cache. If data is found, it returns the data. If no data is found, it fails over to the SQL query. It then populates the cache with the data from SQL, so the next time a request is made, the data is available from the cache.
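The addToProvider call is not shown in the snippets above. A minimal sketch, assuming the provider contract also exports an add function, could look like this:

//sketch: populate a provider after a cache-miss
//assumes the provider contract also exports an add function
//fire-and-forget so a failed cache write does not fail the read
function addToProvider(provider, data) {
    provider.add(data).catch(function (err) {
        console.log("cache population failed", err);
    });
}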
How to Use the Libraries
We are now ready to use our libraries. Let's recall that we have two database providers (redis and sql) as well as one service broker. We want our API server library to use a single provider to handle all the data access concerns and let the service broker integrate with the database providers. Let's see how to do that:
//message broker
const $repository = require('./modules/message-broker');
const $storage = require('./data_modules/sqlRepository'); //sql
const $cache = require('./data_modules/redisRepository'); //redis

//initialize the broker with the data providers
$repository.init($cache, $storage);

//api service
var $api = require('./modules/telemetry-api.js');
$api.init(app, $repository);
When we initialize the repository (service broker), we pass the references to the cache and sql providers. We then initialize the API with our repository reference. Notice that we could pass a reference to any of these libraries to our API because all of them implement the same interface. This enables us to use sql, cache or both for storage.
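The broker's init function is not shown above. A minimal sketch that simply captures the two provider references used by getData could look like this:

//sketch: the broker captures the injected provider references
//cache and storage are the module-level variables used by getData
let cache = null;
let storage = null;

module.exports.init = function (cacheProvider, storageProvider) {
    cache = cacheProvider;
    storage = storageProvider;
};

module.exports.get = getData; //same contract as the other providers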
API Library
Let’s take a
look at how the API library uses the provider.
//add the route for the telemetry data
app.get('/api/telemetry', getData)

//getData route handler
function getData(req, res) {
    console.log("GET Telemetry");
    provider.get().done(function (data) {
        res.json(data);
    },
    function (err) {
        console.log(err);
        res.status(400).send({ error: err });
    });
}
Our API library expects a provider instance to handle the data operations. The API defines the routes and the handler for those routes. When a GET request is received, the API library uses the same contract definition from the provider to get the data. The API library is agnostic of the database system. Since all the providers maintain the same contract, the API can use any of them: the cache, sql or the service broker. This enables us to minimize the changes on other components.
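As a sketch of one plausible shape for the API library's init (the route registration mirrors the snippet above; provider is the module-level reference used by the getData handler):

//sketch: the API library captures the injected provider reference
let provider = null;

module.exports.init = function (app, dataProvider) {
    provider = dataProvider;
    app.get('/api/telemetry', getData); //wire the route to the handler
};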
Conclusion
We have shown that by maintaining our interfaces, we are able to inject a service broker that manages different data storage systems to implement our cache-aside pattern. Note that the cache should always be populated the moment the database has new information. In our pattern, we populate the cache on a cache-miss event as an exception and not the norm. There are other patterns to handle this database-cache synchronization. This is often done via pipelines that are not part of the application.
Thanks for reading