
Key-value stores

In this part we cover the following topics:


What a key-value store is -- basic ideas

I hope that all of us are familiar with arrays. This is one of the first (if not the first) data structures taught to computer science students -- I teach it every year during my Introduction to Computer Science lecture. After numbers (like integers and floats), characters, and boolean variables, the array is the simplest and most universal data structure, present (at least in some equivalent form) in almost any programming language. In its most basic form, we can say that an array is an ordered collection of values. Each value in the array is associated with an integer called its index. Indexes are taken from a given interval with no gaps -- each number from this interval corresponds to exactly one index, which in turn corresponds to exactly one value. The values are all of the same type.

Although arrays are good, they are not perfect, mostly because of the restriction to integers as indexes (very often from a language-specific range -- a range always starting at 0 is the best known) and the limitation of all values to the same type. A generalization of an array is an associative array, where arbitrary values are allowed both for the identifiers and for the array's values. Depending on the programming language, associative arrays go by a number of different names, including dictionary, map, hash map, hash table, and symbol table.

In its simplest form, we can say that a key-value store (database) is a dictionary. I will use the term dictionary (instead of any other, like associative array or map) because in my opinion it best describes all the related concepts. A book called a dictionary has a list of words (keys), and each word (key) has one definition. A definition may be simple or compound, consisting of many sub-definitions, depending on its complexity. The paper-based dictionary is a simple (analog, non-computer) key-value store where word entries represent keys and definitions (sometimes very elaborate) represent values. As long as dictionary entries (words) are sorted alphabetically, retrieval is fast: there is no need to scan the entire dictionary item by item, key by key, to find what we are looking for. On the other hand, there is no option to find something by scanning the contents (definitions) -- we could do this, but it would take too much time.

Like the dictionary, a key-value store is also indexed by the key. The key points directly to the value, which we can get without any search, regardless of the number of items in our store; access is almost instantaneous. A key-value store is a simple database that, when presented with a simple string (the key), returns an arbitrarily large BLOB (binary large object, or sometimes: basic large object) of data (the value). Because the database itself is very simple, so is its query language. To be more precise, there is no query language, because the set of operations (queries) is limited to adding and removing key-value pairs to/from the database.
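The whole "query language" can be sketched in a few lines of Python using an ordinary dictionary (a toy model, not any particular database's API; the key and value are sample data):

```python
# A key-value store reduced to its essence: a dictionary mapping
# string keys to opaque BLOBs (here: Python bytes).
store = {}

store["key001"] = b"Test string 01"   # "put": add or overwrite a pair
value = store["key001"]               # "get": a direct lookup, no searching
del store["key001"]                   # "delete": remove the pair

print(value)
```

Note that nothing here inspects the value: the store hands the BLOB back exactly as it was given.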


Differences between key-value and relational databases

Simplicity is a key word associated with key-value databases, where everything is simple. Unlike in relational databases, there are no tables, so there are no features associated with tables, such as columns and constraints on columns. If there are no tables, there is no need for joins. In consequence, foreign keys do not exist, and key-value databases do not support a rich query language such as SQL. Truth be told, their query language is very primitive.

The only extra feature supported by some key-value databases is buckets, or collections, used for creating separate namespaces within a database. Keys from one namespace do not collide with keys from another, so we can use the same keys in more than one namespace. This can be used to implement something analogous to a relational schema.

Contrary to relational databases, where meaningless keys are used, the keys in key-value databases are meaningful -- see the Key is the key in key-value databases section for more details.

While in relational databases we avoid duplicating data, in key-value databases (and in NoSQL in general) it is a common practice.


Essential features of key-value databases

Despite the huge variety of key-value databases, there exists a set of features common to all of them:

  • simplicity,
  • speed,
  • scalability.


Essential features of key-value databases: simplicity

As stated in the previous section, simplicity is a key word describing key-value databases.

Ask yourself: how many times do you really need a relational database? Is it really indispensable when developing a simple application with persons (company staff) and the skills they have? We spend our time developing a relational database with all of its requirements (do you remember normal forms?). For what? In the end, our application retrieves aggregations from the database to display person by person, with their skills, on a simple web page.

If we follow one of the agile methods, we need a flexible tool to rapidly test our changing ideas. With key-value stores, if we would like to track additional attributes, or remove some of them after our program is ready, we can simply add or change code in our program to take care of those attributes. There is no need to change database code to tell the database about the new attribute set.

In key-value databases, we work with a very simple data model which resembles a dictionary. The syntax for manipulating data is simple. Regardless of the type of operation, we specify a namespace and a key to indicate that we want to perform an action on a key-value pair; which action depends on our call. There are three operations performed on a key-value store: put, get, and delete.

  • put adds a new key-value pair to the table or updates a value if this key is already present.
  • get returns the value for a given key if it exists.
  • delete removes a key and its value from the table if it exists.
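The three operations, together with namespaces (buckets), can be sketched like this (a toy model; the class and all sample data are our own, not any real database's API):

```python
class ToyKeyValueDB:
    """Toy key-value database with namespaces (buckets)."""

    def __init__(self):
        self._namespaces = {}

    def put(self, namespace, key, value):
        # Adds a new key-value pair, or updates the value if the key exists.
        self._namespaces.setdefault(namespace, {})[key] = value

    def get(self, namespace, key):
        # Returns the value for the given key if it exists, else None.
        return self._namespaces.get(namespace, {}).get(key)

    def delete(self, namespace, key):
        # Removes the key and its value if present.
        self._namespaces.get(namespace, {}).pop(key, None)


db = ToyKeyValueDB()
db.put("customers", 10, "Luke Skywalker")
db.put("invoices", 10, "2018-09-05")   # same key, different namespace: no collision
print(db.get("customers", 10))
db.delete("customers", 10)
print(db.get("customers", 10))         # gone after delete
```

Notice how the same key, 10, lives happily in two namespaces at once -- exactly the bucket behavior described earlier.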

Another feature which simplifies the programmer's fate is typelessness. Values in key-value pairs have no type. They are, generally speaking, BLOBs, so we can put in everything we want. It is up to the application to determine what type of data is being used, such as an integer, a string, a JSON or XML file, or even binary data like an image. This feature is especially useful when the data type changes, or when we need to support two or more data types for the same attribute. Imagine, for example, a network of sensors where some of them return an integer value, others a logical state, an enumeration, or even a string. There is no problem with this in a key-value database.
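The sensor network example can be sketched directly (key names and readings are our assumptions):

```python
db = {}  # values have no declared type -- anything goes

db["sensor:1"] = 21                            # integer reading
db["sensor:2"] = True                          # logical state
db["sensor:3"] = "OPEN"                        # enumeration / string
db["sensor:4"] = b"\x89PNG..."                 # raw binary data
db["sensor:5"] = '{"temp": 21, "unit": "C"}'   # JSON as a string

# It is entirely up to the application to interpret each value.
for key in sorted(db):
    print(key, type(db[key]).__name__)
```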


Essential features of key-value databases: speed

In this case, speed is a consequence of simplicity. Supported by internal design features optimizing performance, key-value databases deliver high throughput for applications with data-intensive operations.


Essential features of key-value databases: scalability

Scalability is another much-wanted feature that all databases want to have. In the NoSQL: motivation part of this tutorial, two types of scaling were formulated: up and out. Both are examples of physical scaling, because we change the physical structure of our hardware stack. Regardless of the type, we do this with one goal in mind: to improve the capability to accommodate both reads and writes. Because when working with key-value databases we have no relational dependencies -- all write and read requests are independent -- we can consider two options:

  • master-slave replication,
  • masterless replication.

Master-slave replication
The master-slave architectures have a simple hierarchical structure. In this model, the master is a server in the cluster that accepts write and read requests. It is responsible for maintaining all writes and replicating (copying) new or updated data to all other servers in the cluster. These other servers only respond to read requests.

  • Pros An advantage of this model is simplicity. Except for the master, each node in the cluster only needs to communicate with one other server: the master. The master accepts all writes, so there is no need to coordinate write operations or resolve conflicts between multiple servers accepting writes.
  • Cons A disadvantage is that we have a single point of failure: if the master fails, the cluster cannot accept writes. This can impact the total availability of the cluster. To prevent this situation, most distributed systems based on this model use protocols by which the active servers can detect when the master server in the cluster fails. If the master fails, a new master is promoted from among the active servers. Once active as the master, it begins accepting write operations, and the cluster continues to function, accepting both read and write operations.

    Another disadvantage is that the master-slave replication model, with a single server accepting writes, does not work well when there is a large number of writes. No matter how many computers we have, only one accepts writes. This limits scalability.

Masterless replication
In this model, all nodes accept reads and writes, which solves the problem of a single point of failure but introduces two other problems.

  • Problem with writes. The question is: how to deal with independent writes so that information about the same object will always be saved on the same node? Fortunately, this can be solved with a correct key naming strategy (this will be discussed in the Key is the key in key-value databases section).
  • Problem with reads and replicas. In the master-slave model, all nodes have the same data, copied from the master to all slaves. In consequence, there is no problem if one fails (at least in the sense of losing data). To prevent loss of data, servers in a masterless replication model work in groups. Each time there is a write operation to one of the servers, it replicates that change to a small subset of all servers. The size of the subset is configured by the database administrator; for example, the Riak database has a default number of replicas equal to 3.


Limitations of key-value databases

There are a lot of key-value databases. Below are some general limitations which are in line with the general idea of this database type; for a given database, some of them may not hold.

  1. The only way to look up values is by key.
  2. Range queries are not supported.
  3. There is no standard query language comparable to SQL for relational databases. In consequence, queries written for one key-value database may not be portable to another.


Key is the key in key-value databases


Key design

As already stated, keys are used to index, or we can say uniquely identify, a value within a namespace in a key-value database. This makes keys sound pretty simple, and sometimes they are. On the other hand, keys are the only means we have to get to the value we are looking for. In key-value databases, generally speaking, there is no method to scan or search values, so the right key naming strategy is crucial. I think the term strategy fits this context better than any other, because correct key names allow us to win the information war, and they are a factor which makes some applications much faster and more responsive than others.

The key in a key-value store is very flexible and can be represented in many formats: a number, a string, JSON, or even something as unusual as binary data (an image), or a set or list.

Although it is not a strict rule, when working with relational databases, counters or sequences are very often used to generate keys. Working with numbers, it is easy to ensure that every call for a new key returns a number which is unique and unused so far. That is why application designers use them somewhat routinely to make keys (primary keys) for rows of data stored in a table. In relational databases, keys are used to connect (join) data stored in one table with other tables' data. A primary key of a row stored in another table is known as a foreign key. This is the main purpose, and because of the way relational databases work, it makes sense (sometimes it is even considered good practice) to have such meaningless keys in this case.

In key-value databases the rules are different. Although we may think of key-value databases as built on a very simple table with many rows and just two columns -- the first for the key and the second for the value -- they do not have a built-in table structure. If there are no tables, there are no rows and columns, so the question arises: how do we "join", combine, or somehow collect all information related to a given object?

Let's go back to our Star Wars based example from Relational model: normal forms. In third normal form we have three distinct tables. Now imagine that we want to keep the customer data

in a key-value database.

A first attempt may look like this
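Such a first attempt can be sketched as follows (the sample value is our assumption):

```python
# Key: a bare customer number; value: a single, unlabeled piece of data.
db = {}
db[10] = "Luke Skywalker"   # an assumed sample value

# Is this the customer's name? An address? There is no way to tell from
# the key, and there is no room left for any other attribute.
print(db[10])
```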

The drawbacks of this are obvious. First, we have no information about which of the customer's details we have under index 10: her/his name, or maybe age, or maybe... Second: how can we store other information related to this customer?

Using another namespace might be a solution
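A namespace-per-attribute layout might look like this (one dict stands in for each namespace; all values are assumed samples):

```python
# One namespace per attribute -- the same key no longer collides...
names = {10: "Luke Skywalker"}
ages = {10: 28}
emails = {10: "luke@rebellion.example"}

# ...but every new attribute needs a brand-new namespace.
print(names[10], ages[10])
```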

but this approach leads to a potentially huge set of namespaces, which is not easy to maintain and use. Some of us can live with this. OK, so how can we store information about invoice details?

Of course, we can use all of the above data as values and put them into our key-value database, for example in the following JSON format
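For instance, the whole customer aggregate can be serialized as one JSON value (field names and data here are assumed samples):

```python
import json

db = {}
# The entire customer, invoices included, stored as a single JSON string.
db[10] = json.dumps({
    "name": "Luke Skywalker",
    "invoices": [
        {"number": 1, "items": ["lightsaber", "droid"]},
        {"number": 2, "items": ["X-wing"]},
    ],
})

# Even to read the name alone, the whole JSON document must be parsed.
customer = json.loads(db[10])
print(customer["name"])
```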

or we can even write it as

Hmm... It's not bad, but not as good as it may seem. Notice that every time, we have to parse this JSON to get even a very basic piece of information, like the customer's name.

Avoid using many namespaces. Remember: the key is the key
I hope that with these examples I was able to convince you that something should change in our approach. Because key-value databases are simple, we do not have too many possibilities. As mentioned earlier, you can construct meaningful key names that carry the needed information. For example:

Do not follow relational pattern
With this example we face another important issue related to keys. Let's say that now we want to store information related to invoice details. We can do this in many different ways. Following the relational pattern, for C3PO we have

Following the relational pattern is what we should avoid: in this case, it would result in a discontinuous invoice number range per customer. It would be more adequate to enumerate invoices per customer

so that we can iterate over all invoices related to the customer identified by number 30.
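Per-customer numbering gives each customer the gapless sequence 1, 2, 3, ..., so iteration is a matter of counting upwards until the first missing key (key patterns and sample data are our assumptions):

```python
db = {
    "Customer:30:invoice:1": "C3PO's first invoice",
    "Customer:30:invoice:2": "C3PO's second invoice",
}

invoices = []
n = 1
while f"Customer:30:invoice:{n}" in db:   # stop at the first gap
    invoices.append(db[f"Customer:30:invoice:{n}"])
    n += 1

print(invoices)
```

With globally numbered keys (say Invoice:1023, Invoice:1027, ...) no such simple loop exists, because we cannot predict which numbers belong to customer 30.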

Mind aggregation you expect to use
On the other hand, if we suppose that we will use the data most often for order processing, another key naming convention might be more relevant
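An invoice-centric convention might look like this (the key pattern and data are our assumptions):

```python
# Everything needed to process one order sits under a common prefix.
db = {
    "Invoice:1023:customer": 30,
    "Invoice:1023:date": "2018-09-05",
    "Invoice:1023:total": 149.99,
}

# Order processing reads one invoice at a time, so globally unique
# invoice numbers fit this access pattern well.
print(db["Invoice:1023:customer"])
```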

As we can see, in this case following the relational pattern in numbering invoices sounds good.

Again, this is a sign that correct key naming is a strategy and should be chosen very carefully, with respect both to the aggregation boundaries we discussed in the previous part, Column family (BigTable) stores: Aggregation related model, and to the application's (developers') future needs.

Ranges of values
Dealing with ranges of values is another thing which should be considered. If we expect that in the future we will need to process our invoices by date or date range, the following naming convention

would be better than
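For illustration, date-first keys group one day's invoices under a common, sortable prefix (key patterns and sample data are our assumptions):

```python
db = {
    "Invoice:2018-09-04:1019": 99.99,
    "Invoice:2018-09-05:1023": 149.99,
    "Invoice:2018-09-05:1031": 19.99,
}

# All invoices from 2018-09-05, selected by key prefix alone; with
# ID-first keys (Invoice:1023:2018-09-05) this would require looking
# inside every key.
day = sorted(k for k in db if k.startswith("Invoice:2018-09-05:"))
print(day)
```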

If you have developed relational data models, you might have noticed parallels between the key-naming convention we have just presented and table names, column names, and primary keys. Concatenating a table name with a primary key and a column name to get the Customer:10:name key is equivalent to a relational table customer, with a column called name, and a row identified by the primary key ID of 10. It is worth stressing that we should avoid key naming conventions which mimic a relational database schema. If that is the case, it is worth considering a database replacement. Using a key-value database as a relational one does not seem reasonable, though perhaps in some specific cases (for example, when the total number of tables is low) it could be effective.


Under the hood

Using keys to locate values and support scalability: making hash on keys

Partitioning is the process of grouping key-value pairs into subsets and assigning those groups to different nodes in a cluster. Now we will show how a hash function may be used for this purpose.

An undoubted advantage of numbers as keys is that in this form we may use them directly to look up associated values, even if we have a cluster. We may also use them to evenly distribute the write load across all available servers. For example, if we are working with an eight-server cluster, we can send one-eighth of all writes to each server: the first write to Server 1, the second to Server 2, the third to Server 3, and so on, in a round-robin fashion.

From the previous material we can infer that although using numbers to identify locations may be a good idea, it is not flexible enough -- that is why dictionaries are used instead of arrays. We want to go one step further and use not only integers or strings but also more advanced objects like lists or even binary data. To make this possible, we should be able to somehow transform the whole range of possible keys into numbers. Fortunately, we can do this with the help of hash functions. A hash function is a function that takes an arbitrary sequence of bytes and returns a pseudo-unique, fixed-length string of characters (which may be interpreted as a number). The returned string is only pseudo-unique because there can be two byte sequences (truth be told, there are infinitely many such sequences) for which a hash function returns the same value, although in a real application this is highly unlikely.

Now we can take advantage of the fact that the hash function returns a number and divide it (the number) by the number of servers in our cluster. More precisely, we take the hash number modulo the number of servers in our cluster. Modulo division returns the remainder, so for example 8 (the hash number) mod 8 (because we have 8 servers in our exemplary cluster) returns 0, while 12 mod 8 returns 4. This way we can ensure write balancing: given a key, we find its hash value and then calculate the result of the modulo operation. The remainder we obtain points to the server where the write should be performed.
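The hash-then-modulo scheme can be sketched in a few lines; SHA-256 serves here only as an example hash function (real databases use their own hashing schemes):

```python
import hashlib

SERVERS = 8  # our exemplary eight-server cluster


def server_for(key: str) -> int:
    """Map an arbitrary key to a server number: hash, then modulo."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % SERVERS   # the remainder picks a server (0..7)


# Every client computes the same server for the same key, so a read
# always lands where the corresponding write put the data.
print(server_for("Customer:10:name"))
```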

Using keys to locate values and support scalability: range partitioning

Another method of partitioning that evenly distributes key-value pairs across all nodes is called range partitioning. It assumes a sort order is defined over the keys. For example, we could partition by customer number or date if they are part of the key, as we discussed in the Key design section. Range partitioning is much more demanding than hash partitioning: our naming convention should guarantee that, regardless of the data being saved, the distribution of key-value pairs stays balanced. For example, what should we do if a few customers shop much more often than others, or some items are top sellers chosen by almost every customer? In such a case, some of our servers may be selected much more often than others.
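A minimal sketch of range partitioning, with hand-picked boundaries over the sorted key space (an assumption; a real database would rebalance them as data grows):

```python
import bisect

# Two boundaries split the key space among 3 nodes:
# node 0: keys below "Customer:20"; node 1: up to "Customer:40"; node 2: the rest.
boundaries = ["Customer:20", "Customer:40"]


def node_for(key: str) -> int:
    # bisect_right tells us which of the sorted ranges the key falls into.
    return bisect.bisect_right(boundaries, key)


print(node_for("Customer:10:name"))       # lands on node 0
print(node_for("Customer:30:invoice:1"))  # lands on node 1
```

Note that the comparison is lexicographic, so numeric IDs embedded in keys should be zero-padded (Customer:0100, not Customer:100) if they are meant to sort numerically.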


Values

Working with a key-value database, we have to carefully select a key naming strategy. Similarly, we have to balance aggregation boundaries for values, to make writes and reads more efficient and to reduce latency.

Below are some strategies. Whether they are good or bad depends on your case -- we will try to highlight their pros and cons.

  • Values which are big aggregates. The following aggregate may serve as an example of this strategy

    The advantage of using a structure such as this is that much of the information about the invoice is available with a single key lookup. By storing all the information together, we might reduce the number of disk seeks that must be performed to read all the needed data.
    On the other hand, when an additional item is added, or even when an existing one is edited (changed), the whole structure has to be written to disk. As the structure grows in size, the time required to read and write the data increases.
    Another drawback is that we have to read this big structure even if we need only a small piece of information -- this way we waste time reading it and memory storing it.
  • Keep commonly used values together. Another approach is to store only commonly used values together -- for example, invoice and customer details under separate keys. Now we have more seeks and more read operations, but we spend less time reading any particular piece of data.
  • Small values support caching. Assume that our database keeps data we have read before in a memory buffer (cache), so if we want it again, the database can serve it much faster than from disk. Of course, the size of the cache is limited, so we may be able to store, say, 2 big structures or 10 smaller ones. Now if we want to print a third customer's name, we have to remove 1 big structure and replace it with a new one, or, in the second case, remove 1 small structure (for example, with the second customer's name) and replace it with the third customer's name. In the second case, if we then need the second customer's items, there is a chance that all of them are still in memory, while in the first case we may be sure that all of them have to be reloaded.
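The first two strategies can be contrasted in a few lines (key patterns and data are our assumptions):

```python
import json

# Strategy 1: one big aggregate -- a single lookup, but every read and
# every change touches the whole structure.
big = {"Invoice:1023": json.dumps({
    "customer": "C3PO",
    "date": "2018-09-05",
    "items": ["battery", "oil bath"],
})}
name = json.loads(big["Invoice:1023"])["customer"]  # parse all, use a little

# Strategy 2: commonly used values under separate keys -- more lookups,
# but each one fetches only what is actually needed.
split = {
    "Invoice:1023:customer": "C3PO",
    "Invoice:1023:items": json.dumps(["battery", "oil bath"]),
}
name2 = split["Invoice:1023:customer"]              # direct, nothing to parse

print(name, name2)
```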


Working example, 1: Riak basics

Riak is an open source, distributed database. Riak is architected for:

  • Low-Latency: Riak is designed to store data and serve requests predictably and quickly, even during peak times.
  • Availability: Riak replicates and retrieves data intelligently, making it available for read and write operations even in failure conditions.
  • Fault-Tolerance: Riak is fault-tolerant so you can lose access to nodes due to network partition or hardware failure and never lose data.
  • Operational Simplicity: Riak allows you to add machines to the cluster easily, without a large operational burden.
  • Scalability: Riak automatically distributes data around the cluster and yields a near-linear performance increase as capacity is added.

In Riak

  • Objects in the database are uninterpreted atomic binary entities.
  • Objects are addressed by unique keys.
  • To facilitate key handling, objects are collected in buckets. Buckets are essentially a flat namespace in Riak and may also be seen as a common prefix for a set of keys, or as a table name if you want some reference to the relational model.
  • Because it is a distributed database with data replication across nodes, the degree of replication is configurable, with a default of 3 copies.

Interestingly, Riak (formally Riak KV) can also work as a time series database (Riak TS), as well as a document store, with two features added to it -- Riak Search and Riak Data Types -- that make it easier to query.


Riak installation

All the details needed to install and use Riak can be found on the Riak docs page. The operating system we chose is Xubuntu 18.04.01 (retrieved 2018.09.05).

Before we start the installation, be sure to install curl, either with a package manager (like Synaptic) or with a terminal command

Having curl installed, we can type (following the Riak docs)


nosql@riak:~/Desktop$ mkdir download
nosql@riak:~/Desktop$ cd download/
nosql@riak:~/Desktop/download$ wget --content-disposition https://packagecloud.io/basho/riak/packages/ubuntu/xenial/riak_2.2.3-1_amd64.deb/download.deb
nosql@riak:~/Desktop/download$ sudo dpkg -i riak_2.2.3-1_amd64.deb

When installed, we can verify the Riak KV installation (http://docs.basho.com/riak/kv/2.2.3/setup/installing/verify/). To start a Riak node, use the riak start command:


nosql@riak:~$ sudo riak start

A successful start will return no output. If there is a problem starting the node, an error message is printed to standard error. Once our node has started, we can initially check that it is running with the riak ping command:

nosql@riak:~$ sudo riak ping
pong

The command will respond with pong if the node is running or Node <nodename> not responding to pings if it is not.


Open files limit. As we may have noticed, if we haven't adjusted our open files limit (ulimit -n), Riak will warn us at startup. For real testing or development, it is advised to increase the operating system's default open files limit when running Riak. For now, the defaults are enough.

Does it really work?

One convenient means of testing whether Riak really works is the riak-admin test command:


nosql@riak:~$ sudo riak-admin test
Successfully completed 1 read/write cycle to 'riak@127.0.0.1'


Few words about HTTP and curl


HTTP basics

Whenever possible, we will be using the HTTP protocol and curl in this tutorial. curl is a very versatile tool that allows transferring data from or to a server using one of the supported protocols: DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, TELNET and TFTP. Here we will present only some basic information about HTTP and curl, so that we have an understanding of the things we will do.

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It functions as a request-response protocol in the client-server computing model. It has been the foundation of data communication for the World Wide Web since 1990 (development of HTTP was initiated by Tim Berners-Lee at CERN in 1989; the first definition of HTTP/1.1, the version of HTTP in common use, appeared in RFC 2068 in 1997, which was made obsolete by RFC 2616 in 1999 and then again by the RFC 7230 family of RFCs in 2014), but it can also be used for other purposes, via extensions of its request methods, error codes, and headers. It provides a standardized way for computers to communicate with each other. The HTTP specification defines how a client's request is constructed and sent to the server, and how the server responds to the request.

Basic features of HTTP

  • Media independence Any type of data can be sent over HTTP as long as both the client and the server know how to handle the data content. Both the client and the server are required to specify the content type using the appropriate MIME type.
  • Stateless The server and client are aware of each other only during the current request-response exchange. Afterwards, both of them forget about each other. Due to this nature of the protocol, the client cannot retain information between different requests across web pages.

An HTTP client sends an HTTP request to a server in the form of a request message with the following format [RFC 7230: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, 3. Message Format]

  • The start-line can be either request-line for request or status-line for response. The request-line begins with a method token, followed by the request-target and the protocol version, and ending with CRLF. The elements are separated by space SP characters.
  • The header section contains zero or more header fields, each followed by a CRLF. Each header field consists of a case-insensitive field name followed by a colon (:), optional leading whitespace, the field value, and optional trailing whitespace.
  • The CRLF line is an empty line (i.e. a line with the CRLF only) indicating the end of the header field section.
  • The message body (if any) of an HTTP message is used to carry the payload body of that request or response. The message body is identical to the payload body unless a transfer coding has been applied. The presence of a message body in a request is signaled by a Content-Length or Transfer-Encoding header field.

The request method indicates the operation to be performed on the resource identified by the given request-target. The method is case-sensitive and should always be written in uppercase. There are nine different request methods: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, and PATCH, of which we will describe only the four most useful in our case:

  • GET The GET method is used to retrieve information from the given server using a given URI (request-target). Requests using GET should only retrieve data and should have no other effect on the data.
  • POST A POST request is used to send data to the server. The POST method requests that the server accept the entity enclosed in the request as a new subordinate of the web resource identified by the URI (request-target).
  • PUT The PUT method requests that the enclosed entity be stored under the supplied URI (request-target). If the URI refers to an already existing resource, it is modified; if the URI does not point to an existing resource, then the server can create the resource with that URI.
  • DELETE Removes all the current representations of the target resource given by URI (request-target).

Methods GET, PUT and DELETE are defined to be idempotent, meaning that multiple identical requests should have the same effect as a single request. Note that idempotence refers to the state of the system after the request has completed, so while the action the server takes (e.g. deleting a record) or the response code it returns may be different on subsequent requests, the system state will be the same every time. For example, the normal HTTP response codes for DELETE operations are 204 No Content and 404 Not Found. 404 responses are normal, in the sense that DELETE operations are idempotent and not finding the resource has the same effect as deleting it.
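The DELETE semantics just described can be modeled in a few lines, with a dict standing in for the server's state:

```python
# Idempotence of DELETE: repeating the operation leaves the system in
# the same final state, even though the "response" (found vs. not
# found) differs between attempts.
store = {"key001": "Test string 01"}

found_first = store.pop("key001", None) is not None   # like 204 No Content
found_second = store.pop("key001", None) is not None  # like 404 Not Found

print(found_first, found_second, store)
```

After either one DELETE or two, the resource is equally gone -- that is exactly what idempotence promises.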



Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.

In contrast, the POST method is not necessarily idempotent, and therefore sending an identical POST request multiple times may further affect state or cause further side effects.

Note that whether a method is idempotent is not enforced by the protocol or web server. It is perfectly possible to write a web application in which (for example) a database insert or other non-idempotent action is triggered by a GET or other request. Ignoring this recommendation, however, may result in undesirable consequences, if a user agent assumes that repeating the same request is safe when it isn't.

After receiving and interpreting a request message, a server responds with an HTTP response message, which has a form similar to the request, except that the start-line takes the form of a status-line

The status-code element is a 3-digit integer code describing the result of the server's attempt to understand and satisfy the client's corresponding request. The rest of the response message is to be interpreted in light of the semantics defined for that status code.


curl basics

curl options we will need

  • -d/--data DATA Specify the DATA to be sent with POST to the HTTP server.
  • -D/--dump-header FILE Write the received headers to the FILE.
  • -H/--header LINE Pass custom header LINE to the server.
  • -i/--include Include protocol headers in the output.
  • -u/--user USER[:PASSWORD] Server USER and PASSWORD.
  • -X/--request COMMAND Specify request COMMAND to use.

Below are examples (taken from this tutorial) of some curl commands

  • Using GET to retrieve information from the given server

    or simply

    or the same plus header information

    As we can see, for HTTP, we can have the header information shown before the data by using -i/--include. curl also understands the -D/--dump-header option when getting files over both FTP and HTTP, and it will then store the headers in the specified file.

  • Using the PUT method to request that data be stored under the supplied URI

    Another example (with more than one header)

    Another example (with user credentials)

  • Using DELETE method


Creating objects

Basho officially supports a number of open-source client libraries for various programming languages and environments. See the client libraries page for a listing of community-supported clients. In Working example, 1: Riak basics section we will communicate with Riak via HTTP as the most universal way.

Here is the basic form of writes (object creation):
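A sketch of this generic form, consistent with the working examples that follow; TYPE, BUCKET and KEY are placeholder shell variables, not part of Riak's API. The write itself needs a running Riak node, so the command is only printed here.

```shell
# Sketch of Riak's HTTP write URL scheme; TYPE, BUCKET and KEY are
# placeholders, filled here with the values used later in this section.
TYPE="type001"; BUCKET="bucket001"; KEY="key001"
URL="http://localhost:8098/types/$TYPE/buckets/$BUCKET/keys/$KEY"
# Printed rather than executed (requires a live Riak node):
echo "curl -X PUT -H 'Content-Type: text/plain' -d 'value' $URL"
```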

There is no need to intentionally create buckets in Riak. They pop into existence when keys are added to them, and disappear when all keys have been removed from them. If we don’t specify a bucket’s type, the type default will be applied. If we're using HTTP, POST can be used instead of PUT. The only difference between POST and PUT is that we should POST in cases where we want Riak to auto-generate a key.
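To illustrate the POST case, here is a hedged sketch: the trailing key segment is omitted and Riak generates a key, reporting it in the Location response header. The command is only printed, since running it needs a live Riak node.

```shell
# Hedged sketch of a POST write with an auto-generated key.
# TYPE and BUCKET are placeholder shell variables.
TYPE="type001"; BUCKET="bucket001"
POST_URL="http://localhost:8098/types/$TYPE/buckets/$BUCKET/keys"
# Printed rather than executed (requires a live Riak node):
echo "curl -i -X POST -H 'Content-Type: text/plain' -d 'value' $POST_URL"
```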

Here is an example of storing an object (short text: Test string 01) under the key key001 in the bucket bucket001, which bears the type type001


nosql@riak:~$ curl -X PUT -H "Content-Type: text/plain" -d "Test string 01" http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Unknown bucket type: type001

Notice that although we don't have to create a bucket in advance, we do have to create and activate the bucket type we want to use -- the above command only works once the type001 bucket type has been created and activated.

The steps below create and activate the bucket type


nosql@riak:~$ sudo riak-admin bucket-type create type001 '{"props":{}}'
type001 created

WARNING: After activating type001, nodes in this cluster
can no longer be downgraded to a version of Riak prior to 2.0
nosql@riak:~$ sudo riak-admin bucket-type activate type001
type001 has been activated

WARNING: Nodes in this cluster can no longer be
downgraded to a version of Riak prior to 2.0

Now we can again try to add an object


nosql@riak:~$ curl -X PUT \
> -H "Content-Type: text/plain" \
> -d "Test string 01" \
> http://localhost:8098/types/type001/buckets/bucket001/keys/key001

Hmmm... No errors, no other messages... It's time to get something from our database.


Reading objects

You can think of writes in Riak as analogous to HTTP PUT (POST) requests. Similarly, reads in Riak correspond to HTTP GET requests. We specify a bucket type, bucket, and key, and Riak either returns the object that’s stored there--including its siblings (more on that later)--or it returns not found (the equivalent of an HTTP 404 Not Found).

Here is the basic command form for retrieving a specific key from a bucket:


nosql@riak:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Test string 01

If there’s no object stored in the location where you attempt a read, you’ll get the not found response.

nosql@riak:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key002
not found


Updating objects

If an object already exists under a certain key and we want to write a new object to that key, Riak needs to know what to do, especially if multiple writes are happening at the same time. Which of the objects being written should be deemed correct? These questions can arise quite frequently in distributed, eventually consistent systems.

Riak decides which object to choose in case of conflict using causal context. Causal context tracks the causal history of an object; it is attached to every Riak object as metadata and is not readable by humans. Using causal context in an update involves the following steps:

  1. Fetch the object.
  2. Modify the object’s value (without modifying the fetched context object).
  3. Write the new object to Riak.

The most important thing to bear in mind when updating objects is this: we should always read an object prior to updating it unless we are certain that no object is stored there. If we are storing sensor data in Riak and using timestamps as keys, for example, then we can be sure that keys are not repeated. In that case, making writes to Riak without first reading the object is fine. If we’re not certain, however, then it is recommended to always read the object first.
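The three steps above can be sketched in shell. The headers file below is synthetic (example-causal-context== is a made-up value); in a live session it would be produced by fetching the object with curl -D headers.txt from a running Riak node, and the final write is printed rather than executed.

```shell
# Sketch of the read-modify-write cycle with a causal context.
# Step 1 is simulated: the headers file is synthetic, and the
# X-Riak-Vclock value (example-causal-context==) is made up.
cat > headers.txt <<'EOF'
HTTP/1.1 200 OK
X-Riak-Vclock: example-causal-context==
Content-Type: text/plain
EOF
# Step 2: extract the causal context from the saved headers
VCLOCK=$(sed -n 's/^X-Riak-Vclock: //p' headers.txt)
# Step 3: write back with the context attached (printed, not executed)
echo "curl -X PUT -H 'Content-Type: text/plain' -H \"X-Riak-Vclock: $VCLOCK\" -d 'new value' http://localhost:8098/types/type001/buckets/bucket001/keys/key001"
```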

Updating Objects. Object Update Anti-patterns


In versions of Riak prior to 1.4, Riak used vector clocks as the sole means of tracking the history of object updates. In Riak versions 2.0 and later, we recommend using dotted version vectors instead. Dotted version vectors address scalability with a small set of servers mediating replica access by a large number of concurrent clients.



When using curl, the context object is attached to the X-Riak-Vclock header


nosql@riak:~$ curl -i http://localhost:8098/types/type001/buckets/bucket001/keys/key001
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNlMCUy5rEy9LiyXOXLAgA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Link: ; rel="up"
Last-Modified: Thu, 06 Sep 2018 10:26:38 GMT
ETag: "1KTnww1Q52P9d666EtbP41"
Date: Thu, 06 Sep 2018 10:28:56 GMT
Content-Type: text/plain
Content-Length: 14

As we can see, for HTTP, we can get the header information shown before the data by using -i/--include. curl also understands the -D/--dump-header option when getting files from both FTP and HTTP, and will then store the headers in the specified file.


nosql@riak:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Test string 01
nosql@riak:~$ cat headers.txt
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNlMCUy5rEy9LiyXOXLAgA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Link: ; rel="up"
Last-Modified: Thu, 06 Sep 2018 10:26:38 GMT
ETag: "1KTnww1Q52P9d666EtbP41"
Date: Thu, 06 Sep 2018 10:29:30 GMT
Content-Type: text/plain
Content-Length: 14

When performing a write to the same key, that same header needs to accompany the write for Riak to be able to use the context object.

Before we do this, let's check what happens if we ignore the X-Riak-Vclock header


nosql@riak:~$ curl -X PUT -H "Content-Type: text/plain" -d "Test string 01_updated" http://localhost:8098/types/type001/buckets/bucket001/keys/key001
nosql@riak:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Siblings:
1KTnww1Q52P9d666EtbP41
38C20HjUtlR8syP3CMGlVW

Something went wrong -- we will come back to this case a little later. Let's create another object and update it according to the steps above (using the X-Riak-Vclock header).


nosql@riak:~$ curl -X PUT -H "Content-Type: text/plain" -d "Test string 02" http://localhost:8098/types/type001/buckets/bucket001/keys/key002
nosql@riak:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key002
Test string 02
nosql@riak:~$ cat headers.txt
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frZuctCDCCUy5rEyzN/McpUvCwA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Link: ; rel="up"
Last-Modified: Thu, 06 Sep 2018 10:30:30 GMT
ETag: "4j7mSGlt44VnZkh8eaCltP"
Date: Thu, 06 Sep 2018 10:30:55 GMT
Content-Type: text/plain
Content-Length: 14

Having a new object (string Test string 02) we can try to modify it


nosql@riak:~$ curl -X PUT -H "Content-Type: text/plain" -H "X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frZuctCDCCUy5rEyzN/McpUvCwA=" -d "Test string 02 updated" http://localhost:8098/types/type001/buckets/bucket001/keys/key002
nosql@riak:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key002
Test string 02 updated
nosql@riak:~$ cat headers.txt
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frZuctCDCCUy5bEy8G5lucqXBQA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Link: ; rel="up"
Last-Modified: Thu, 06 Sep 2018 10:31:55 GMT
ETag: "5RUinoCMbFaMyGdfofm5bZ"
Date: Thu, 06 Sep 2018 10:32:00 GMT
Content-Type: text/plain
Content-Length: 22


Conflict Resolution. Siblings

A sibling [Conflict Resolution. Siblings] is created when Riak is unable to resolve the canonical version of an object being stored, i.e. when Riak is presented with multiple possible values for an object and can’t figure out which one is most causally recent. The following scenarios can create sibling values inside of a single object:

  • Concurrent writes -- If two writes occur simultaneously from clients, Riak may not be able to choose a single value to store, in which case the object will be given a sibling. These writes could happen on the same node or on different nodes.
  • Stale causal context -- Writes from any client using a stale causal context. This is a less likely scenario if a client updates the object by reading the object first, fetching the causal context currently attached to the object, and then returning that causal context to Riak when performing the update. However, even if a client follows this protocol when performing updates, a situation may occur in which an update happens from a different client while the read/write cycle is taking place. This may cause the first client to issue the write with an old causal context value and for a sibling to be created. A client is "misbehaved" if it habitually updates objects with a stale context object or with no context object at all.
  • Missing causal context -- If an object is updated with no causal context attached, siblings are very likely to be created. This is an unlikely scenario if we are using a Basho dedicated client library, but it can happen if we are manipulating objects using a client like curl and forgetting to set the X-Riak-Vclock header.

So we have a conflict: there are two objects (strings Test string 01 and Test string 01_updated) under the same key key001 in the bucket bucket001, which bears the type type001.


nosql@riak:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Siblings:
1KTnww1Q52P9d666EtbP41
38C20HjUtlR8syP3CMGlVW

As we can see, reading an object with sibling values will result in some form of “multiple choices” response (e.g., 300 Multiple Choices in HTTP). If we’re using the HTTP interface and want to view all sibling values, we can attach an Accept: multipart/mixed header to our request to get all siblings in one response:


nosql@riak:~$ curl -X GET -H "Accept: multipart/mixed" -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key001

--LJ8Xy5UHU8z3L3OtKvMwM7BX9DM
Content-Type: text/plain
Link: ; rel="up"
Etag: 1KTnww1Q52P9d666EtbP41
Last-Modified: Thu, 06 Sep 2018 11:26:38 GMT

Test string 01
--LJ8Xy5UHU8z3L3OtKvMwM7BX9DM
Content-Type: text/plain
Link: ; rel="up"
Etag: 38C20HjUtlR8syP3CMGlVW
Last-Modified: Thu, 06 Sep 2018 11:27:10 GMT

Test string 01 updated
--LJ8Xy5UHU8z3L3OtKvMwM7BX9DM--
nosql@riak:~$ cat headers.txt
HTTP/1.1 300 Multiple Choices
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNnMCUy5bEy/KthvcqXBQA=
Vary: Accept, Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Last-Modified: Thu, 06 Sep 2018 10:27:10 GMT
ETag: "6nZEerBvuMlMlcQITGUY7C"
Date: Thu, 06 Sep 2018 21:28:56 GMT
Content-Type: multipart/mixed; boundary=LJ8Xy5UHU8z3L3OtKvMwM7BX9DM
Content-Length: 421

We can request an individual sibling by adding the vtag query parameter, which specifies which sibling to retrieve.


nosql@riak:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Siblings:
1KTnww1Q52P9d666EtbP41
38C20HjUtlR8syP3CMGlVW
nosql@riak:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key001?vtag=1KTnww1Q52P9d666EtbP41
Test string 01
nosql@riak:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key001?vtag=38C20HjUtlR8syP3CMGlVW
Test string 01 updated

To resolve the conflict, store the resolved version with the X-Riak-Vclock given in the response (in our case: X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNnMCUy5bEy/KthvcqXBQA=).


nosql@riak:~$ curl -X PUT -H "Content-Type: text/plain" -H "X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNnMCUy5bEy/KthvcqXBQA=" -d "Test string 01 updated" http://localhost:8098/types/type001/buckets/bucket001/keys/key001
nosql@riak:~$ curl -X GET http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Test string 01 updated


Deleting objects

The delete command looks like this
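A sketch of the generic form, consistent with the session shown in this section; TYPE, BUCKET and KEY are placeholder shell variables, and the command is printed rather than executed.

```shell
# Sketch of Riak's HTTP delete form; TYPE, BUCKET and KEY are
# placeholders matching the session in this section.
TYPE="type001"; BUCKET="bucket001"; KEY="key002"
DEL_CMD="curl -X DELETE http://localhost:8098/types/$TYPE/buckets/$BUCKET/keys/$KEY"
echo "$DEL_CMD"
```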

and we can use it as shown below


nosql@riak:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key002
Test string 02 updated
nosql@riak:~$ curl -X DELETE http://localhost:8098/types/type001/buckets/bucket001/keys/key002
nosql@riak:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key002
not found