Storing data in NoSQL database


This is a complete, long, but detailed tutorial in which you will learn:

  • how to create a Django project,
  • how to upload data via a web page -- in this case you will upload GPX files with GPS track points,
  • how to process uploaded GPX files and transform them into JSON documents,
  • how to save a JSON document in a document-oriented NoSQL database (CouchDB),
  • how to use CouchDB views to process data in a document database -- in this case you will calculate the total track length for each user in the database.


Project description and prerequisites

Create an application that allows you to:

  • import data about the location of an object in space (GPS coordinates) from a GPX file (sample data in GPX format);
  • display the length of each route (preferably using built-in database features; in the case of CouchDB, a view should be used);
  • display the total length of all routes for each user (preferably using built-in database features; in the case of CouchDB, a view with the map-reduce mechanism should be used).

In addition, we assume that:

  • one route is one document;
  • each route is assigned to a specific user identified by a unique number userId;
  • each route has its unique (within a given user) number trackId.
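
As a rough illustration of these assumptions, one route could be represented by a single dictionary that is later serialized to one JSON document. Only userId and trackId come from the requirements above; the description and points fields, the keys inside each point and both helper names are assumptions made for this sketch.

```python
import json


def make_route_document(user_id, track_id, points, description=""):
    """Build one route as a single dictionary (one route = one document)."""
    return {
        "userId": user_id,          # identifies the owner of the route
        "trackId": track_id,        # unique only within the given user
        "description": description,
        "points": list(points),     # assumed field holding the track points
    }


def is_unique_within_user(documents):
    """Return True if no user has two routes with the same trackId."""
    seen = set()
    for doc in documents:
        key = (doc["userId"], doc["trackId"])
        if key in seen:
            return False
        seen.add(key)
    return True


if __name__ == "__main__":
    doc = make_route_document(2, 1, [{"lat": 52.0, "lon": 19.0}], "u2, t1")
    print(json.dumps(doc, indent=2))
```

Note that the same trackId may appear for two different users, because uniqueness is only required within one user.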

As a starting point for our project we need:

  • a correctly installed Python with all supporting tools;
  • an installed CouchDB;
  • curl.

Installing CouchDB is a straightforward process: see Installation or follow the latest installation procedure in 1.1. Installation on Unix-like systems. On the Document stores page you can find examples of working with CouchDB.

To install Python with all needed packages and then manage it I use Anaconda (see a basic working Django project).

If curl is not installed on your system, use your package manager or terminal commands to install it.


Create starting Django project

  • Use Anaconda Navigator
    Use Anaconda Navigator to:

    1. Create new virtual environment of the name project.
    2. Install all required libraries.
    1. Create new virtual environment
      Open Terminal and run Anaconda Navigator

      In Anaconda Navigator select Environments and then Create

      Name the new environment project, select the required version of Python and press Create.

    2. Install all required libraries
      In Anaconda Navigator in package section of created environment install all needed packages:

      • xmltodict

      Because I can't find all required packages in Anaconda Navigator, I will install some of them manually later.

      Now you can quit Anaconda Navigator as we will not need it any more.

    3. Install Python library for CouchDB
      Use the pip install couchdb command to install the couchdb library

      Read the following materials

    4. Create database and user
      1. Create database
      2. Add user (login=userL, password=userP)
      3. Grant access to the project base for user userL
      4. Put {"testKey" : "testValue"} data into project base under the key 001
      5. Get from project base document under the 001 key
      6. Now you should have CouchDB installed and fully operational.
  • Create Django project
    1. Open Terminal and check all available environments
    2. Activate our newly created environment
    3. Check Python version
    4. Install Django (if this is the first time you install Django, a few more messages will be displayed)
    5. Create project folder
    6. Create a Django project named project. This name has nothing in common with environment project -- both names could be different.

      The project directory should be laid out like this
    7. Run the Django development server

      Check terminal window in case of any problems as this is a place where Django prints all messages -- very often it helps to identify where the problem is located.

  • Add our main Django application
    Add our main Django application main to the existing Django project project

    1. Open a new terminal tab and type (keep server running in the previous tab)

      The project directory should be laid out like this
    2. Edit views.py file
      Open the file ~/Desktop/project/project/main/views.py and put the following Python code in it:
    3. Create urls.py file in main application

      In ~/Desktop/project/project/main directory create a file called urls.py

      Put the following Python code in it:

      With this code, Django's routing mechanism will call the index function located in the ~/Desktop/project/project/main/views.py file every time we enter the URL http://127.0.0.1:8000/ in a web browser. To make this possible we have to complete one more step.

    4. Add code to the project's urls.py file

      Add to ~/Desktop/project/project/project/urls.py the following code:

      Now Django knows that routing for the / path is located in the main application's routing file (located in ~/Desktop/project/project/main/urls.py).

    5. Test application

      Go to web browser and type http://127.0.0.1:8000/ url

    6. Add/modify ~/Desktop/project/project/main/urls.py file
    7. Create a new view

      Add new code to ~/Desktop/project/project/main/views.py

    8. Test application

      Go to web browser and type http://127.0.0.1:8000/job url

      Replace

      with

      and again check results in web browser

      As you can see, we can also pass HTML code as an argument to HttpResponse. For simple, temporary cases we can prepare the HTML response by hand in the body of the view (as we do in the job case). For real production purposes we should use templates, and this is what we are going to do now.

Now you should have a basic understanding of Django project structure, the routing mechanism and how views are called. You can download a full project at this stage.
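
The CouchDB preparation steps listed earlier in this section (create the database, add the user userL, grant access, store and read the test document 001) map onto a handful of calls to CouchDB's HTTP API. A minimal sketch using only the Python standard library might look like the following. The server address, the admin credentials admin/admin and all helper names are assumptions of this sketch, not part of the original project; the requests are only built and printed here, and send has to be called explicitly against a running server.

```python
import base64
import json
import urllib.request

COUCHDB_URL = "http://127.0.0.1:5984"  # assumption: default local CouchDB address


def build_request(method, path, body=None, auth=None):
    """Prepare (but do not send) one HTTP request to CouchDB."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(COUCHDB_URL + path, data=data, method=method)
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if auth is not None:
        token = base64.b64encode("{}:{}".format(*auth).encode("utf-8"))
        req.add_header("Authorization", "Basic " + token.decode("ascii"))
    return req


def setup_requests(admin, db="project", user="userL", password="userP"):
    """Return the requests for the five preparation steps, in order."""
    user_doc = {"name": user, "password": password, "roles": [], "type": "user"}
    security = {"admins": {"names": [], "roles": []},
                "members": {"names": [user], "roles": []}}
    member = (user, password)
    return [
        build_request("PUT", "/" + db, auth=admin),                        # create database
        build_request("PUT", "/_users/org.couchdb.user:" + user, user_doc, admin),  # add user
        build_request("PUT", "/" + db + "/_security", security, admin),   # grant access
        build_request("PUT", "/" + db + "/001", {"testKey": "testValue"}, member),
        build_request("GET", "/" + db + "/001", auth=member),             # read it back
    ]


def send(req):
    """Send a prepared request and decode the JSON answer."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for request in setup_requests(admin=("admin", "admin")):
        print(request.get_method(), request.full_url)
```

Separating the building of a request from sending it makes it possible to inspect every call before anything is changed on the server.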


Django and templates

More about templates you can find at

You should read at least the first link before you proceed with reading this tutorial further.

  • Create base template
    1. To serve pages with templates, we will start by writing the HTML in a template fashion. We need to add the templates folder. Go to the main folder and create new subfolders

      So in the main folder we create a templates/main folder. The main folder inside templates is not required (we can put template files directly in the templates folder), but it is common practice -- this way we get a namespace for our templates. Namespaces are very helpful when we have more than one app in one Django project. Django will use the first template it finds whose name matches, and if you had a template with the same name in a different application, Django would be unable to distinguish between them. We need to be able to point Django at the right one, and the best way to ensure this is by namespacing them, that is, by putting those templates inside another directory named for the application itself.
    2. In order to start with something a bit more modern than bare-bones HTML, we will implement a base template with Bootstrap. Bootstrap is a frontend framework that gives us a good set of styles and components to start with.

      Let’s add some HTML to ~/Desktop/project/project/main/templates/main/index.htm -- we simply use Starter template from Bootstrap Introduction page.

    3. Before we proceed we should add main app to INSTALLED_APPS in ~/Desktop/project/project/project/settings.py file. Change it from

      to

    4. In ~/Desktop/project/project/main/views.py file replace

      with

    5. Now we finally can go to the browser and see our results, as you can see in figure below
  • Add static contents used by templates
    We see the page, but files are loaded from an external content delivery network (CDN), which may be good for production sites, but in development I prefer to work completely offline. We will not discuss whether this is good or bad; instead we will treat it as an excuse to show how to use static content in templates.

    1. Create the required folder structure: add a static folder to our main app folder, and inside it (under static/main) two subfolders: css and js

      Folder for static assets is defined in settings.py file as
    2. We need a copy of all required js and css files inside our repository. To do this quickly, we can download them from the links that we already have in the HTML. We can use curl (or wget) for this task:

      If everything goes well you should have bootstrap.min.css file inside main/static/main/css folder and bootstrap.min.js, jquery.min.js and popper.min.js in main/static/main/js folder.
    3. All the linked assets are now available offline. The last thing to do is to replace the links to external sites with links to local files. To do so we will change index.htm to use the static template tag
    4. After applying changes you should see no changes in web browser (compare this figure with the figure we get in the last step of Create base template subsection above -- both should be the same).

      You can also verify if static works in terminal where you should see either 200 or 304 response status codes

      If you see code 404

      quit the server with CONTROL-C and run it again with

  • Template inheritance

    Now we will create a simple hierarchy of templates: we will have one base template with the HTML code common to all pages, and then we will extend it with the content specific to each page.

    1. Create ~/Desktop/project/project/main/templates/main/base.htm file, copy code from index.htm and paste into base.htm.
    2. Edit base.htm to the following form

      New elements are

      • title and h1 tag with text changed to Project.
      • Long menu section to display app menu.
      • block content section -- this is a place where we will "paste" an extending HTML code.
    3. Edit index.htm to the following form
    4. Go to the browser and see the results

      You should see app menu with Home, About and Job elements. When you select Job you should see an old page

      and an error when you select About page

      Now we are going to fix this.

    5. Create ~/Desktop/project/project/main/templates/main/job.htm file with the following contents

    6. Create ~/Desktop/project/project/main/templates/main/about.htm file with the following contents

    7. Add this line

      to ~/Desktop/project/project/main/urls.py file. After this change urls.py should have the following contents

    8. Almost done. Edit ~/Desktop/project/project/main/views.py file and replace job function with

    9. Now you should be able to switch between pages and all of them should have the same style

Now you should have a basic understanding of Django templates. You can download a full project at this stage.


Uploading files: models and forms

Now we will implement a basic form to upload GPX files to our application. We will do this with the help of Django's models and forms, which makes this task quite easy.

  • Define MEDIA folder
    1. Open the ~/Desktop/project/project/project/settings.py file and at the end of this file add

      This way you define a MEDIA folder named media, located in the ~/Desktop/project/project folder. In this folder we will save our uploaded files.
  • Add model
    1. Open the ~/Desktop/project/project/main/models.py file and paste the following code there

      This is a typical Django model. Take a look at line with the document field definition where we specify that uploaded files will be saved in gpx/%Y%m%d folder located in MEDIA folder.
    2. Check contents of db.sqlite3 file

      As you can see, db.sqlite3 is an empty file.
    3. Press CONTROL-C to stop server, apply database changes and again run server
    4. Check now contents of db.sqlite3 file

      As you can see, now db.sqlite3 file is not empty. Let's check what is inside

      Yes, our model was turned into SQL statements.
  • Add form
    1. Create forms.py file in ~/Desktop/project/project/main folder and paste there the following code

      As you can see, based on previously created model, we will create form with the userId, trackId, description and document fields.
    2. To bring this form to life, add gpx_upload.htm file to the templates folder and paste there the following code

      Notice the use of the url Django tag to get a correct and valid URL based on the contents of urls.py, particularly the line

      where we can find name='job'.
    3. We have to set urls to call proper method. Add this line

      to ~/Desktop/project/project/main/urls.py file. After this change urls.py should have the following content
    4. To ~/Desktop/project/project/main/views.py file add two imports

      and change line

      to the form
    5. To the same views.py file add the function called when a URL with gpx_upload is requested. Add this code
    6. Change job.htm code so it will display all uploaded files
    7. Finally change job function
  • Make upload test
    1. Enter the following URL into the web browser: http://127.0.0.1:8000/job -- you should see something similar to the figure below
    2. Click on Import GPX and upload form should be displayed

      We can click on Return to Job to go back to Job screen.

    3. Fill the form

      and press Upload button.

    4. You should see some info about uploaded file
    5. Also a folder media should be created in ~/Desktop/project/project folder and in this folder there should be our uploaded files

      Notice that if you try to upload the same file multiple times, a random suffix will be added to its name

    6. Information about the uploaded files can also be found in the database

Now you should have a basic understanding of the role of Django models and forms. You can download a full project at this stage.


Processing files

Now we will stop saving uploaded files, as our goal is to save data in a document store, CouchDB in this case. Before we can save the data, we have to process it to extract only what is necessary.

Below is a small fragment of a GPX file

As we can see, the root tag is gpx. Inside this tag there may be one or more trk tags. Every trk tag encloses one or more trkseg tags, which in turn may contain one or more trkpt tags. A trkpt tag describes one point: its coordinates, elevation, recording time and possibly some other attributes, such as temperature in the above case (ns3:atemp tag).

To simplify the task I assume that in every file there is only one trk with only one trkseg.
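
To make the gpx / trk / trkseg / trkpt nesting more concrete, here is a small parsing sketch. It uses xml.etree.ElementTree from the standard library rather than the xmltodict package installed for this project, so it should be read as an illustration of the structure and not as the project's own processing code. The sample data is made up, and the names extract_points, GPX_NS and SAMPLE_GPX are hypothetical. In line with the assumption above, only the first trk and its first trkseg are read.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace; element names have to be qualified with it when searching.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up sample with two track points (not the tutorial's data).
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="sketch">
  <trk>
    <trkseg>
      <trkpt lat="51.7500" lon="19.4500">
        <ele>200.0</ele>
        <time>2020-01-01T10:00:00Z</time>
      </trkpt>
      <trkpt lat="51.7501" lon="19.4502">
        <ele>201.5</ele>
        <time>2020-01-01T10:00:05Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
"""


def extract_points(gpx_text):
    """Return the points of the first trkseg of the first trk as dictionaries."""
    root = ET.fromstring(gpx_text)
    segment = root.find("gpx:trk/gpx:trkseg", GPX_NS)
    if segment is None:
        return []
    points = []
    for trkpt in segment.findall("gpx:trkpt", GPX_NS):
        elevation = trkpt.findtext("gpx:ele", default=None, namespaces=GPX_NS)
        points.append({
            "lat": float(trkpt.get("lat")),   # coordinates are attributes
            "lon": float(trkpt.get("lon")),
            "ele": None if elevation is None else float(elevation),
            "time": trkpt.findtext("gpx:time", default=None, namespaces=GPX_NS),
        })
    return points


if __name__ == "__main__":
    for point in extract_points(SAMPLE_GPX):
        print(point)
```

Extension tags such as ns3:atemp are simply ignored here, which matches the idea of keeping only the necessary data.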

  • Allow saving data after successful upload
    1. Edit ~/Desktop/project/project/main/views.py file and change gpxUpload function to the form

      Here we add a call to handleUploadedFile function, so now we have to implement it.
    2. Edit ~/Desktop/project/project/main/views.py file and add new handleUploadedFile function
    3. Add more imports at the top of the ~/Desktop/project/project/main/views.py file

      Now you should have the following imports there
    4. Create the temp directory: ~/Desktop/project/project/media/gpx/temp.
    5. Open web browser and use http://127.0.0.1:8000/job URL. Click on Import GPX, fill the form

      and press Upload. If everything goes well you should be able to locate document.dat file in the ~/Desktop/project/project/media/gpx/temp directory

  • Process XML file and prepare document ready to be uploaded to document store
    1. Add the following two methods to the ~/Desktop/project/project/main/views.py file

    2. Add a call to transform function in handleUploadedFile -- replace

      with

    3. Add one more import at the top of the ~/Desktop/project/project/main/views.py file

    4. Open web browser and use http://127.0.0.1:8000/job URL. Click on Import GPX, fill the form

      and press Upload. If everything goes well, you should notice that the document.dat file has a different size (it is now much bigger than before). You can check its first 300 bytes (characters) with the head command

      We can observe the general structure of the document

Now you should have a basic understanding of pure data saving in Django. You can download a full project at this stage.
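
The last two checks of this section, writing the transformed document to document.dat and looking at its first 300 characters, can also be reproduced in plain Python. The sketch below is only an approximation of that flow: the document fields and the helper names save_document and peek are assumptions, and a temporary directory stands in for media/gpx/temp.

```python
import json
import os
import tempfile


def save_document(document, directory, filename="document.dat"):
    """Serialize the document to JSON and write it into the given directory."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(document, handle)
    return path


def peek(path, size=300):
    """Return the first `size` characters of a file, for a quick look at it."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read(size)


if __name__ == "__main__":
    # Hypothetical document; the field names are assumptions of this sketch.
    doc = {
        "userId": 2,
        "trackId": 1,
        "description": "u2, t1",
        "points": [{"lat": 51.75, "lon": 19.45}] * 50,
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = save_document(doc, os.path.join(tmp, "gpx", "temp"))
        print(os.path.getsize(path), "bytes")
        print(peek(path))
```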


Put data into CouchDB

  • Use couchdb library
    1. Add one more import at the top of the ~/Desktop/project/project/main/views.py file
    2. At the end of handleUploadedFile function located in the ~/Desktop/project/project/main/views.py file add the following lines
    3. Open web browser and use http://127.0.0.1:8000/job URL. Click on Import GPX, fill the form (in my case: userId=2, trackId=1, description="u2, t1") and press Upload.
    4. If no errors are printed go to Fauxton and login as admin (with login=admin and password=admin in our case)
    5. Among all available databases you should be able to find project database

      Select the project database (click on it)

      You should see two documents: the first (with the 001 key) is our test document, and the second (with the 87fb51cb2fb5da7500518221e8001b72 key in our case) is the document related to the track we have just uploaded (in my case: userId=2, trackId=1, description="u2, t1"). You can click on it to see its content

Now you should know how to save data in CouchDB. You can download a full project at this stage.


Get data from CouchDB with views

Now we will retrieve data from CouchDB. We can do this by simply querying for a document with a given key, but this is not a typical use case: in document databases keys are meaningless, so it is very hard to identify a document by its key. More often we will use views, and this is what we are going to do now. You can read more about querying in Working example, 1.2: CouchDB basics - querying

  • Add simple view
    The first view we will implement is intended to calculate the length of every path. So this view (like every view) will receive a document and then return a document of the form

    To simplify our considerations we assume that the distance between two recorded points is always equal to 1 m, so numberOfPoints will be the same as length.

    1. Go to Fauxton, select project database, then New Doc under the Design Documents section

      put as its content

      and save it.

    2. Select New View under the queries section
    3. In the Index name field type the name of our new view: getTrackLength

      and paste the following code

      as a map function

    4. Leave the Reduce dropdown set to NONE, as we will not use a reduce function in this case, and press the green Create Document and then Build Index button.
    5. You should see getTrackLength in Fauxton
    6. To verify if getTrackLength view works, open terminal and type

    7. In Fauxton you can press on getTrackLength and if you hover over the document you will see view result
  • Update Django view
    1. At the end of ~/Desktop/project/project/main/views.py file paste the code

    2. Add

      to the ~/Desktop/project/project/main/urls.py file.

    3. Change job.htm file located in templates folder ~/Desktop/project/project/main/templates/main/ so now it has the following content

    4. Add track_length.htm file to templates folder ~/Desktop/project/project/main/templates/main/ and paste there the following code

    5. Open web browser and use http://127.0.0.1:8000/job URL
    6. Click on Track length per track

      You should see length of a track 1 of user 2.

    7. Use http://127.0.0.1:8000/job URL to add one more track for user 2, add one track for user 3.
    8. Use our view to get their length
  • Add view with reduce part
    The second view we will implement is intended to calculate, for every user, the total length of all paths related to this user. This view will return a document of the form

    where the keys are userIds and the values are the total length of all paths related to the user identified by the corresponding key.

    1. Now we will create view with reduce part. Select, as we did it before, project database in Fauxton and then select New View under the queries section
    2. In the Index name field type the name of our new view: getAllTracksLength, paste the following code

      as a map function.
    3. In Reduce dropdown select CUSTOM and paste the following code as Custom Reduce function

      and press green Create Document and then Build Index button.
    4. You should see getAllTracksLength in Fauxton
    5. To verify if getAllTracksLength view works, open terminal and type
  • Second update Django view
    1. At the end of ~/Desktop/project/project/main/views.py file paste the code

      Then add

      to the ~/Desktop/project/project/main/urls.py file.
    2. Change job.htm file located in templates folder ~/Desktop/project/project/main/templates/main/ so now it has the following content
    3. Add tracks_length_all.htm file to templates folder ~/Desktop/project/project/main/templates/main/ and paste there the following code
    4. Open web browser and use http://127.0.0.1:8000/job URL.
    5. Click on All tracks length per user

      You should see length of all tracks for user 2 (which is 6278+6278=12556) and 3 (which is 6278).

Now you should know how to use views to get data from CouchDB. You can download a full project at this stage.
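
The real CouchDB views are JavaScript functions stored in a design document. To see what the two views compute, their map and reduce steps can be imitated locally in Python. The sketch below is such an imitation under the 1 m per point simplification used in this section: the points field name, the key shapes and all function names are assumptions, and run_view only mimics a grouped view query, it does not talk to CouchDB.

```python
from collections import defaultdict


def map_track_length(doc):
    """Map step of the first view: one row per track, value = number of points."""
    points = doc.get("points")
    if points is not None:  # documents without points (e.g. the test document) emit nothing
        yield (doc["userId"], doc["trackId"]), len(points)


def map_user_length(doc):
    """Map step of the second view: key is the user, value is the track length."""
    points = doc.get("points")
    if points is not None:
        yield doc["userId"], len(points)


def reduce_sum(values):
    """Reduce step: add up all values emitted for one key."""
    return sum(values)


def run_view(documents, map_function, reduce_function=None):
    """Apply map to every document, then optionally reduce the rows per key."""
    rows = [row for doc in documents for row in map_function(doc)]
    if reduce_function is None:
        return sorted(rows)
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_function(values) for key, values in sorted(grouped.items())}


if __name__ == "__main__":
    track = [{}] * 6278  # placeholder points; only their count matters here
    documents = [
        {"testKey": "testValue"},
        {"userId": 2, "trackId": 1, "points": track},
        {"userId": 2, "trackId": 2, "points": track},
        {"userId": 3, "trackId": 1, "points": track},
    ]
    print(run_view(documents, map_track_length))
    print(run_view(documents, map_user_length, reduce_sum))
```

With two tracks of 6278 points for user 2 and one for user 3, the reduced result is 12556 for user 2 and 6278 for user 3, the same totals as in the last step above.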

If you don't want to use an external library or learn CouchDB details, you can send a POST request manually -- more about sending POST requests can be found here
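
A minimal sketch of such a manual POST, using only the Python standard library, could look like this. A POST to the database URL asks CouchDB to store the JSON body as a new document under a server-generated key. The database URL, the userL/userP credentials as defaults and the function names are assumptions of this sketch; post_document needs a running CouchDB, so the example below only builds and prints the request.

```python
import base64
import json
import urllib.request


def build_post_document(document, db_url="http://127.0.0.1:5984/project",
                        user="userL", password="userP"):
    """Prepare a POST request that asks CouchDB to store one new document."""
    token = base64.b64encode((user + ":" + password).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        db_url,
        data=json.dumps(document).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


def post_document(document, **kwargs):
    """Send the request and return CouchDB's JSON answer as a dictionary."""
    request = build_post_document(document, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_post_document({"userId": 2, "trackId": 1, "description": "u2, t1"})
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```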