Why do so many programs use rational databases instead of loading the data during startup and keeping it in memory? Especially for smaller datasets I would think, that a database adds unnecessary complexity and overhead. Also, a lot of data can be saved using modern RAM and when using an in-memory approach, optimized data structures can be utilised to further improve the performance
Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!
Cross posting is strongly encouraged in the instance. If you feel your post or another person’s post makes sense in another community cross post into it.
Hope you enjoy the instance!
Follow the wormhole through a path of communities !webdev@programming.dev
Relational database/RDBMS? It’s because the added complexity was necessary or desirable for some reason - relational databases are pretty good at managing data fairly quickly, often with features to deal with timing issues, concurrency, transactions, security, auditing, replication… They have theory going back decades and are still highly relevant. Same as any other software offering, you sometimes expect it to provide features you can’t, won’t or shouldn’t implement yourself.
The overhead is likely negligible, or was considered a fair tradeoff, or the database is actually better at its job in the given scenario. Hopefully. Sometimes people really do add shitloads of very unnecessary overhead by mistake, or overengineer their solutions terribly (cause it’s fun)
Good luck keeping an entire corporate database in RAM only. It’s not a 1:1 ratio of space on disk to space on RAM, pointers to the appropriate addresses also have to be stored in memory.
Even for a relatively small database, say, 100MB, keeping it in RAM only is a safety hazard. Even if you fully isolate the memory, make it completely “impregnable” from any outside application, memory corruption remains a problem. If you don’t isolate, memory leaking into or out of the database is also a problem, the former being much more damaging.
The fact that you still have to dump it into a hard drive also shows the main problem of keeping it in RAM: if the database service goes down, you “lose” the entire database
Put it another way, your suggestion is to have Notepad open, reading/writing the data and only saving at certain intervals.
First, persistency. You data lifecycle may not be directly proportional to your applications lifecycle. You may need it even after the app is shut down.
Second, RDBM systems provide a well defined memory/storage structure and API - “structured query language”. This enables you to easily query your data and acquire suitable views that are beneficial for your applications purposes.
Third, It’s always better to outsource your data layer to a battle tested, and trustworty database then trying to reinvent the wheel.
So this paves a road for you to focus on your business logic than splitting that focus for the data layer and business logic.
And also it may save you down the road.
Sure, your dataset might not be big now but what about in a year’s time? It’s easier to start out using a database than to have to transition it later.
(Relational) Databases can be in-memory. And unless you have very little data, your in-memory storage will probably turn into a database, and if it’s relational it might even turn into a pseudo-relational one. Just without all the benefits of 1000 of dev-hours of optimization. But next, you also need to persist your data. And you probably don’t want to lose everything if your app crashes. More stuff that is already done for you, if you use a non-memory database that probably will hold all frequently accessed data in memory anyway. And there are many, many more issues like that.
edited to be friendlier, original was
Why would their experience be relevant? They’re asking a question, so obviously they have things to learn. You could be nicer about it.
Partially, because experience in an area means one can understand the answer, but to a probably bigger part see below.
It read to me like they asked as if they know better than everyone else, and were ranting about others doing it wrong. But that was actually an assumption on my part, and may simply be their style, or even me completely misreading things. So thanks for calling me out on it.
Like everyone has mentioned, because you want the data to persist across program runs. By all means, use in-memory state for truly ephemeral things like caches. You will need both for any real world task.
One more reason to use a database, even if the persisted set of data is small, is the query engine. SQLite is absolutely perfect for such small tasks. Writing SQL to query the data can save you from tons of wastefully repetitive app code.
RDBMS do not imply persisted data. There are plenty of databases which support or are designed explicitly to work as in-memory data stores. Perhaps the most popular one is SQLite, whose in-memory store is a popular choice.
deleted by creator
I think you answered your own question in a way. For smaller datasets and workloads which only require one machine, storing your data in memory or on disk locally is probably fine.
When you start getting larger datasets or need to have a dataset that’s consistent across multiple servers, you need something more than just the ram of the server.
At that point it’s best not to reinvent the wheel and just use some kind of pre existing datastore who’s tradeoffs make sense for your use case. Some of these datastores do use RAM under the hood for performance but there are pros and cons as with anything.
I presume you’re referring to relational databases instead of rational.
The responsibility of a RDBMS is to implement a set of performance-focused data structures that help clients reliably get the data that they need in the fastest possible way, without having to reinvent the wheel.
More often than not, this data does not fit in the heap.
Also, in many usecases there is more than a single client.
Hope this helps.
Because the data needs to be persistent?
It all depends on the implementation and need.
In-memory structures are usually faster to work with, but harder to coordinate multiple updates from multiple sources (different applications, services, etc).
Databases have all sort of failsafe mechanisms to ensure data integrity and recovery options, in most times there is no need to reinvent it all over again.
Persistent - do you need to access the data again once your program was finished? How often does the data change by other programs/tasks once you read it? How big is your data and how complex are the connections between your data objects?
Many times the implementation is a mixed approach. It is better to know and calculate the needs before you start your project, but as it usually happen, once you get performance issues, you start optimizing adding in-memory cache or scale to a bigger database.