Our technology choices are growing each day, and, so does the architecture complexity. One of the major pains points often ignored by app developers is the database selection and management.
Often the ORM and database mapping keeps changing as a result of poor developer and DBA(Database administrator) synchronization.
Coming back to the huge database choices available today – from JSON document storage like MongoDB to object-relational databases like PostgreSQL to Graph relational databases like Riak – You have a wide array of “databases” at your disposal to build modern mobile applications.
Crisp a messaging platform built their entire platform on Firebase as the database for their application. But, they quickly ran into the following challenges:
- Difficulties in handling complex queries
- Storing data offline
- Managing object relations at web scale
For Crisp it was extremely important to ensure that customers are able to effectively communicate with their customers. And, Firebase proved to be a bad fit here!
Customers were often missing important communications sent via the app. They later on moved to a SQL database for data storage and MongoDB to store the messages for better performance and scalability.
Crisp isn’t the only story where a database dramatically compromised quality and performance of an application.
The face of disruption, Uber had to migrate their database from PostgreSQL to MySQL. Uber made this choice as they faced inefficient database replication features and constant migration from one PostgreSQL version to another.
The face of disruption, Uber had to migrate their database from PostgreSQL to MySQL.
Note: We are not in any capacity implying that you should only rely on a single database for your app. Often for complex and large-scale solutions, your app might require 2 or more than 2 types of databases.
Different type of Databases for mobile applications:
MYSQL: An open source, multi-threaded, and easy to use SQL database.
PostgreSQL: A powerful, open source object-based, relational-database that is highly customizable.
Redis: An open source, low maintenance, key/value store that is used for data caching in mobile applications.
MongoDB: A schemaless, JSON document database which is known for its flexibility and scalability.
Memcached: A distributed cache system which is multi-threaded, and used primarily for caching objects to speed up applications by alleviating the database load.
MariaDB: A popular open source relational database which was created by the original developers of MySQL.
Cassandra: A free and open source, NoSQL database which is designed to handle a large amount of unstructured data at any scale.
SQLite : Embedded database, common usage is to provide local data storage capabilities on mobile phones.
InfluxDB: An open source, fast and time series database which is written in Go.
RethinkDB: An open source, and document-oriented database which stores data in JSON format and sync in real time with the application.
Riak DB: A distributed NoSQL database, which offers high availability, fault tolerance and data resiliency as its core feature.
Couchbase: A full Stack NoSQL database that supports offline sync, full CRUD, and query capabilities and runs locally on the device.
ArangoDB– An open source NoSQL database which is known for its multi-model, graph and geo algorithm features.
Selecting a right database for Mobile applications(with multiple layers of data) is a big challenge. I was shocked to see the challenges that developers face when they integrate mobile app with databases.
I remember when complete developer podcast did an episode on big the challenge becomes, you can listen to the episode here.
Applications with multiple layers of data
Selection of the right database with multiple layers of data can quickly become challenging.
For Instance, a task manager application with multiple fields and tables on top of a different set of fields and tables. In such case, each database relies on another, and often it becomes difficult to manage the data.
noSQL with multiple layers of data
The problem with NoSQL databases for these applications is their position in the schema. The data that you will store would be too deep, somewhere around 5th or 6th level.
And since CRUD operations are involved, when multiple concurrent users are trying to access an information hosted on NoSQL, the query speed is very slow.
MySQL and multiple layers of data
For the app in question, MySQL performs very well. This works great if you have a fixed schema/database structure and you aren’t planning to routinely update it.
If you add a new field or a table, the entire structure of database needs to be built again.
So what should you select if your app has multiple data layers
From an engineer’s perspective, the most app changes over-time. With a structured database, you won’t be able to make these changes frequently.
Using an unstructured database like MongoDB would bring in flexibility to change without going through the painful process that you would find yourself in with structured databases.
For example, MongoDB allows you to work with unstructured data in multiple layers, adding new app features with flexibility – without the need to change your overall structure of your database.
Data synchronization across multiple devices, and multiple users
When data synchronization and offline features are at the core of your mobile app, the users might have access to the same data and have the same CRUD rights.
This basically means that you would end in a situation where multiple users are editing the same data block – concurrently!
In other situations, we have an app that uses API to communicate with the server and perform CRUD operations. When a mobile app is offline, there’s no API connecting and dictating the read-write operations.
When a user now edits a shared doc that others are editing as well, the writes may or may not be accepted by the server when the app goes online again.
Let’s imagine this challenge better. Assume that we have the following situation:
Some users are editing records while they’re offline. So, how does the app responds to their changes? Let’ see which edits will be sent to the server first.
Assessing how complex the situation can get is a bit tricky, and many fail to accommodate for edge cases here. The solution here would require a locking mechanism to prevent data loss while synchronization.
Lookout for any database that supports efficient MVCC, what these MVCC supporting database would do is timestamping each data record they contain. The record updates time-stamp whenever any change is made offline.
For offline apps, what database you should select?
Generally speaking, I wouldn’t recommend MySQL for this purpose, it wasn’t designed for a such a use case either. PostgreSQL also fails to handle such locking operations, Uber famously had to abandon it as it had inefficient MVCC replication support.
If you are constrained to carry out heavy manipulations locally on your mobile app, you should stick to CouchDB and REST Client’s that have a local caching database(PouchDB). The solution would perform better than SQL based databases.
Encryption keys conflicts and databases
The data synchronized between several devices is usually secured with a symmetric encryption algorithm and a key. The key is stored on each device and is encrypted with a password.
When a user changes the password, only the key gets re-encrypted. It gets sync and updated across all devices whenever there’s a network connection.
But, this creates issues when a new device doesn’t have a network connection and locally creates its own new but incompatible key.
Note that, this isn’t a database selection challenge. This is rather a practice-based issue. Developer experts recommend not to encrypt security keys when you have implemented offline synchronization in the first place.
Whenever possible, prevent encrypting your keys to make the database performance a lot better.
Network interruptions and availability
When any SQL databases lose network connection with the client side storage, it typically generates an error message instead of transferring the data as needed. If this problem occurs frequently, you may need to reconfigure your database.
Apart from that, there are other challenges to deal with MySQL and network interruptions:
- Persisting local data including unsynchronized transactions and app state
- Getting mobile transactions and operations back to the main server-side database
Above listed issues occur on behalf of network interrupts, keeping a database that offers better reliability and resists connection loss is a better option in this case.
Application scalability from zero to million users
Scaling = replacing all components of a car while driving it at 100mph
– Mike Krieger, Instagram Co-founder
When you think of scaling your application, you think of adding more resources in form of servers and making the database engine more efficient.
The Database should be able to utilize the resources and handle parallel processing, that means the Database must be multi-threaded.
Multithreading allows a database to schedule parallel tasks on the available resources and minimize the workload on the server-side.
Apart from multithreading, Distributed Design of a database is significantly important for scalability.
In a distributed designed database, you can split up the services on different threads to minimize the workload on the main database. This drastically improves the parallel processing of databases.
Pushing new app updates and database changes
Often developers run into problems of pushing new updates and not keeping up to date with the database changes.
When you’re building a new feature for the app the chances are that there will be some additional storage requirements. Your data format could change, you might have to make changes in the existing database – In any case, you would end up adding new fields and tables.
A way to prevent redoing the whole database is by managing the object relationships, make sure that they remain in synchronization.
Whether you choose a SQL database like MySQL or a noSQL database like Mongo, you will have to make these few changes to ensure the data integrity of the mobile app:
Enable onCreate(..) and onUpgrade function
These functions make sure that after you upgrade your application any changes(Schema, data formats) in the database are synced with the application. So, when users make any query, they don’t get random results.
Keeping up with Old Database versions
Mapping app and database versioning are important for app developers. Whenever developers push an updated version of their apps to their users, they often fail to account for a legacy database, API versions.
These older versions are what your app’s users would use when they don’t update their app with the same frequency as you do.
This cause the app to break and not work properly, most likely a 1-star rating or an uninstall.
I will try to illustrate the challenge with a simple illustration.
Imagine you have an app user that’s currently using V1.0, and you just pushed V3.0. The user updates the app from V1.0 to V3.0. Note that your app’s user has skipped V2.0 altogether.
Rather than routing a user through V2.0, most developers would just let them upgrade to V3.0. And they don’t account for what’s going to break here.
Account for user’s who aren’t going to update, manage API and Database versions to support their apps. Using Automated deployment tools here can also help you get really efficient with DB versioning.
Database versioning – Final thoughts
On an end note, you can choose SQLite with any server-side database, and it would work fine. SQLite has built-in OpenHelper tool to check database versions and to call functions such as onUpgrade or onCreate.
Alternately, you could use a migration library like Sails to migrate between different versions of a database. Although, this only works with MySQL and PostgreSQL. You can find the tutorial here.
Using multiple databases in the same app
While building your application you may realize that you need to store your data in the data structure where certain queries can’t be fully optimized for performance.
This happens in use cases where you have complex search requirements, robust reporting or where you need to run analysis on incoming data quickly.
Under these situations, you would need more than one type of database to support your application. Uber, for example, has MySQL, MongoDB and lot of other databases.
They used MongoDB for their CDN, MySQL for business side logic. Using MongoDB in their case made it very simple to store high volume incoming data.
At the end of the day, it is all dependent upon your app’s business and technology needs.
If you aren’t sure on how to integrate two databases within the same application, follow this tutorial.
Multi-Version Concurrency Control (MVCC) Limitations of Databases
A support for Multi-Version Concurrency Control (MVCC) allows simultaneous access without blocking the threads or processes involved.
MVCC allows a reader to view a snapshot of the data before the writer’s changes, allowing the read and write operations to continue in parallel.
For instance, look at this table to see which databases have MVCC implementation:
|MySQL||Partially, when used with InnoDB|
|PostgreSQL||Yes, but not efficient|
|MariaDB||Partially, when used with XtraDB|
Predictive caching in mobile apps
Predictive caching can be applied to improve the performance of your mobile app by looking at how, when, and where your users are using your application.
Based upon your understanding, a segment of users can be identified and can be served with a specific information they always lookup based on their behavioral traits.
The best part is that the data can be made available and cached locally before the users even log-in to your app.
MongoDB provides predictive caching that allows developers to serve users with predicted-cached data before they even ask for it.
Low latency challenges in Real-time streaming apps
Low Latency is crucial for streaming applications, specifically for online gaming applications. Latency can be defined as the time it takes to access to data from the server. A lower latency allows an application to become more interactive to users.
With high latency, a user would get frustrated by refreshing the app screen and it still won’t return results in time. Anything that goes below 500ms is considered to be high latency.
Optimizing latency is a continuous cycle, and each database has its own limits to what extent it can be replicated or optimized.
There are multiple methods to reduce latency, and replication is most popular among developers.
Disruptive startups use database replication to replicate their production database to a target, and then let users interact with it i.e. run queries, perform read/write operations.
With each different database, you would find them limited in terms of latency for specific use cases. However, here are some databases with latency lower than the average:
- PostgreSQL for transactional operations in real time
- RethinkDB built for real-time application
- PipelineDB, SQL database for streaming applications
Caching databases to reduce load
We often add a caching layer to reduce the load on servers. We often cache data on this caching layer to prevent a recurring request for the same data. This reduces the number requests going to the server and renders our server efficient for performance.
To reduce on the server you can use a caching server database like Redis or Memcached. They are useful if you have queries returning data that doesn’t change often.
Instead of your app hitting the database every time it asks for information, your caching server can simply return the cached version while updating the cached data at set intervals (say, once every 30 seconds).
If you’re building an application which requires heavy caching on the local side:
Redis and Memcached are preferred.
For more specific use-cases, refer to this table to determine which database would be more suitable for your specific application requirement.
Databases for different use-cases
|When your app needs||Compatible database type||Recommended Database|
|Complex transactions||Relational database||MySQL or VoltDB|
|Scalability(Load balancing, partitioning, automatic sharding etc)||Document oriented, schema less design||MongoDB|
|Implementing social network operations||Graph database||Riak|
|Wide variety of data types support||Document database||CouchDB or MongoDB|
|Offline reporting with large datasets||Database which supports MapReduce||MongoDB or Hadoop|
|CRUD operations||Document database||CouchDB or MongoDB|
|Built-in search feature||noSQL graph database||Riak|
|To cache or store blob data||Key-value store||Membase|
|Continuous reads and writes||Document or key-value store||CouchDB or Membase|
If you have any questions or suggestions, feel free to drop a comment.