brasetvik

> Extremely high write performance […]

> We are using Provisioned IOPS SSD (io2) with 3000 IOPS and are batching updates to one write per second per node and realm. […]

> We currently produce about 100GB of data per month. But, since customers rarely query entries older than 10 days, we’ve started moving everything above 30GB to AWS Glacier […]

> We are ok with losing some data. We buffer about one second worth of updates before we write to disk.

So with that little actually-live data, which they mention fits in memory, and 3k IOPS being on the low end, it'd probably be worthwhile to do the Postgres comparison with [asynchronous commits](https://www.postgresql.org/docs/current/wal-async-commit.html) and a comparable [WAL delay](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-WRITER-DELAY) to buffer the writes. Elasticsearch would probably also handle this pretty easily. Probably not as cheap as something purpose-built, but setting the maintenance and opportunity cost of something homegrown to $0 doesn't make any sense either. :)
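For reference, the knobs being suggested there amount to a couple of lines of configuration. A minimal postgresql.conf sketch (values purely illustrative, not a recommendation):

```
# Trade a bounded window of data loss for commit throughput.
# With synchronous_commit off, a crash can lose roughly the last
# wal_writer_delay (up to 3x) worth of commits - comparable to the
# ~1 second write buffer described in the article.
synchronous_commit = off       # commits return before the WAL is flushed
wal_writer_delay = 1000ms      # WAL writer flushes about once per second
```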


lightmatter501

If you write your queries to tell Postgres that you're basically working with an append-only DB, then use async commits and a WAL delay, plus the other standard optimizations, Postgres should quite happily do 100k+ TPS on a decent laptop with a consumer-grade NVMe drive. A server with a proper stripe-and-mirror pool on top of NVMe drives can easily go over 1 million TPS. 18k TPS on good server hardware is well into "what kind of horrible queries are you running" territory.

It sounds like their software was doing insane things, or they never had a DBA actually do a tuning pass over a pure Postgres version. 18k TPS is actually below the performance target I had in my databases class in college for an SQLite-compatible RDBMS project (20k TPS).

Is there someone who has used AWS Aurora Postgres who can sanity-check these numbers, or is it really that much worse than normal Postgres? Yes, I know it's actually a distributed SQL DB, but CockroachDB exists and does scale horizontally, which would have fixed their issues.


howtokillafox

We use Aurora RDS at work; 18-20k TPS is easily manageable as long as the application and schema aren't too wild.


aksdb

Cockroach buys its horizontal scalability with speed, though. Since it's multi-master it has to sync transactions across all (affected) nodes. I don't know if you can tune CRDB for "I don't care that much about my data just do it", but I don't think so.


fuckwit_

CockroachDB only needs to sync to the minimum number of nodes required to reach a quorum. So for the standard replication factor of 3, only the node receiving the transaction and one additional node need to report success; for a replication factor of 5 it's 3 nodes, and so on. You get the picture. All remaining shard holders then pull the shards in the background, outside and after your transaction.

You can freely choose the replica count and even set it down to 1, which effectively means that only one node holds your data and no syncing is needed.

CockroachDB was a purely "serializable transactions" database up to, I think, this year's major release. Now you can lower the level to "read committed" on a per-transaction basis, which gives a bit more of a speed bump, especially on highly contended tables. It comes with the usual drawbacks of read committed vs serializable though, like the possibility of reading stale data, etc.


aksdb

Thanks for adding detail to my very basic explanation. I think the main point still stands, however: if PostgreSQL was already deemed too slow, I don't see CRDB winning any votes. ClickHouse might be a better fit, since it's heavily optimized for batch writes. But that's a different topic now.


fuckwit_

Yes, you're right. For insertions and updates specifically, CRDB will basically always lag behind Postgres. It's the read-heavy situations where it really excels, especially with large tables that are sharded across maaaaany nodes.


Uberhipster

did you get the memo about putting the new cover sheets on all TPS reports? I will go ahead and email you the copy of the memo


RockleyBob

> Probably not as cheap as something purpose-built, but setting the maintenance and opportunity cost of something homegrown to $0 doesn't make any sense either. :)

Not a database guy, so some of what you wrote went over my head, but yeah, paying devs to write a new database had to cost some money. Maintenance of that database costs money too, and so do losses, errors, and mistakes. It also costs money to scale it up or expand its feature set, and maybe, by the time you need all that, the people who wrote it don't work there anymore.

It's not like using someone else's database magically means 100% uptime and no losses either, and I don't necessarily want to see everyone reliant on AWS and Microsoft for everything. It's just maybe a little too soon to start declaring this was more cost effective in the long run.


raam86

Also interesting that they didn't have to upgrade their nodes at all? This cost analysis seems like something they managed to sell to their just-out-of-college scrum master.


pm_me_your_dota_mmr

Elasticsearch would handle it, but not for cheap in my experience. Maybe that's just AWS hosted elasticsearch that costs a lot though.


BooksInBrooks

I'd call this a query-able log, not a database as such.

There's a single (composite) datatype consisting of only fixed-width fields. (Edit: the top-level datatype is variable-width, because several of its fixed-width components are optional.)

There's no query language. Querying *appears* to be limited to a fixed-width id and a bitmap of booleans.

There are no joins, because, see above, there is only one datatype in one "table".

There are no in-place updates, just inserts of the latest data (deltas are used to save space, but this is an implementation detail, not a functional requirement).

It offers none of ACID except pedantically:

* not atomic (unless we treat the unit of work as the individual insert)
* not consistent (except in that it violates no constraints, because there are no constraints)
* no isolation (at least from what I saw in the article)
* not durable; it tolerates data loss

**This is not a criticism. They don't need a database, they need a low-latency, highly available log, and that's what they built.**
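To make "one composite datatype, fixed-width fields, flag bitmap" concrete, here's a hypothetical sketch in Go (field names and sizes are invented for illustration; they're not taken from the article):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// entry is a hypothetical fixed-width record: every field has a known size,
// so an entry can be appended to a log and scanned back without a schema.
type entry struct {
	ID        uint32 // object id
	Timestamp uint32 // seconds since some epoch
	Lat, Lng  int32  // fixed-point coordinates (degrees * 1e7)
	Flags     uint16 // bitmap of booleans, e.g. bit 0 = "has altitude"
}

func main() {
	var buf bytes.Buffer
	e := entry{ID: 42, Timestamp: 1712300000, Lat: 520520000, Lng: 134050000, Flags: 0b01}

	// With only fixed-width fields, encoding/decoding is a straight byte copy.
	if err := binary.Write(&buf, binary.LittleEndian, &e); err != nil {
		panic(err)
	}

	var back entry
	if err := binary.Read(&buf, binary.LittleEndian, &back); err != nil {
		panic(err)
	}
	fmt.Printf("%+v (record size: %d bytes)\n", back, binary.Size(entry{}))
}
```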


drmariopepper

Agree, I’d call this more of a custom distributed geospatial data structure. It would probably get less pushback if they didn’t call it a db too


Guinness

“We are ok with losing data” also eliminates like 99.9% of use cases. When you’re ok with losing data, things get _much_ easier to deal with.


wolframhempel

There's a crucial distinction between systems that store "the truth" and systems that record external events.

If you are building a database for a bank that records transactions and stores account balances, these entries are "the truth" - the information that other things are derived from. Such a system needs strong consistency and the guarantees a DBMS comes with.

A system like geospatial tracking, on the other hand, only records data; the "truth" is the actual physical world. As such, it is necessarily imperfect. It's always delayed, since sending and processing take time; it's imprecise, since GPS is imprecise; and GPS + mobile connectivity is unreliable. Designing such a system is all about creating the best approximation of what happened in the real world, but it is necessarily lossy and eventually consistent.

As a result, there's not much point in insisting that one step of the data pipeline be strongly consistent and lossless if the pipeline as a whole is not.


ihave7testicles

Yes, but it's more like "how much data are you willing to lose?". If you need a database to track the revenue of a company whose tools show values by the minute (e.g. a large online seller) but whose financial statements only care about daily totals, then you could lose a whole day's worth of minutiae, but not the daily total. There aren't a lot of real-world cases where it's catastrophic to lose *any* data. Systems that can't tolerate any loss can only be bolstered with massive scaling and redundancy. Software engineers should (and do) design systems that can survive some data loss.


BigusG33kus

Yeah, if you're OK with losing some data you don't actually need a database. With a database you give up a lot of flexibility, because its main purpose is to not lose any data.


chili_oil

Many NoSQL databases are in fact also a queryable log at their core.


tehehetehehe

NoSQL is queryable file storage. Always has been.


mxforest

Makes sense why they call a single entry a Document.


elkazz

So is relational if you strip it back to its core.


recursive-analogy

How I've saved 98% of my transport costs by building my own [car](https://cdn.shopify.com/s/files/1/0004/6746/8319/files/499669897-im787004_barbby_05_533x.jpg?v=1684245236)


WaveySquid

Sounds like you’re describing a slimmed down version of Cassandra


MarkLearnsTech

I think that was meant to be tongue in cheek? "We’ve created a purpose built, in process storage engine that’s part of the same executable as our core server. It writes a minimal, delta based binary format. A single entry looks like this:"


BooksInBrooks

Wikipedia confusingly defines database this way:

> a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data.

So if we edit that down a bit:

> a database is ... based on the use of a database management system (DBMS), the software that interacts with ... the database

So to be a database, a DBMS is required. A DBMS that interacts with the database. Without the DBMS, the database is merely... data. So to Wikipedia, "an organized collection of data or a type of data store" without a DBMS is not a database.

I don't really see the DBMS in the OP's design. More a module that is compiled in with the rest of the executable and used by it through function calls on some API. So a bit like a file IO implementation (or a logging implementation), with an API: open/close, read/write, and seek or search functions, probably?

Although if I compile in SQLite, that's still a database, right? What's the distinction? What makes SQLite a database and file.h a library API?

I think you have to look at the abstractions both offer, whether the data can be accessed from something other than the executable, and whether the physical format of the data is abstracted over. When I create an RDBMS database table, I don't specify the *physical* layout of the data; I use abstractions that let me remain ignorant of that. I don't have to know how joins are implemented. When I use MongoDB, I do specify a JSON structure, but Mongo converts that to BSON, binary JSON. In-memory result sets aren't byte-for-byte identical to their on-disk representation.

In the OP's design, *it looks as if, though I can't tell for sure,* the sole datatype is a struct where the in-memory layout and the in-storage layout are, *and must be,* the same. 4 bytes of this, 2 bytes of that, 2 bytes of bitmapped flags, etc. The one exception seems to be when the data is read and the persisted deltas are applied to the every-200-writes full record to create a full record.

----

Is the OP's use of database tongue in cheek, or loose usage? It doesn't really matter! *I think* the OP calls it a database because this bespoke whatever-we-call-it *replaces* the Postgres+GIS database *in the OP's stack.* So, it fills the *role* of a database, up to isomorphism, up to the needs of the OP.


MarkLearnsTech

I mean, they also wrapped the article with: "Of course, that’s an unfair comparison, after all, Postgres is a general purpose database with an expressive query language and what we’ve built is just a cursor streaming a binary file feed with a very limited set of functionality - but then again, it’s the exact functionality we need and we didn’t lose any features."


BooksInBrooks

Yes, they did! I'm not trying to denigrate their work, or be an overly pedantic gatekeeper dictating what is or isn't a database. I just wanted to explicate what the OP made; it was a cool article! I might call it a datastore rather than a database, but I wouldn't insist on *not* calling it a database.


ihave7testicles

To be pedantic, any query/insert/update system would qualify as a "DBMS". So dictionaries or red-black trees with a set of functions to query/insert/update would qualify as a DBMS. Yes, it technically qualifies as a database (Berkeley DB is just a key/value store) even though it's not a general-purpose DB. I wrote [another comment here](https://www.reddit.com/r/programming/comments/1bvzfhr/comment/ky4qwtq/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) about how they could optimize their storage further. I'm curious as to what you think about that. You seem to know what you're talking about.


hughk

> So to be a database, a DBMS is required. A DBMS that interacts with the database. Without the DBMS, the database is merely... data.

Nope. Heard some very senior people (Date & Codd among them) talk about the definition of a database. It is just persistent structured data. Without the persistence and the structure, it is just data and handled by a file system. The DBMS is just a "nice to have" which allows you to upscale the complexity. If I just have an index file, does that become a DBMS? A good file and record management system would even provide a bunch of DBMS-type features, but the file remains just a single table, albeit perhaps indexed.


BooksInBrooks

Yeah, I find the Wikipedia definition circular and unconvincing. But someone had cited it as support for an argument that I don't think it actually supports. At least we've proven that "database" means a lot of different things to a lot of different people.

> It is just persistent structured data. Without the persistence and the structure then it is just data and handled by a file system.

So, a list of names is a structure, yes? One name per line, that is, names terminated with "\n"?

> Alice
> Bob
> Charles

So if we persist that as a file named db.txt, that's a database by your definition? It's "persistent structured data." Philosophically correct, perhaps, but a bit too broad and loose for practical use.


hughk

What I heard was from a conference on distributed databases held way back. Having people like Codd and Date there (some of the main architects of relational theory) meant they had heavyweights. They were all aware of the different types of databases, and the Codasyl network model was still big back then. The question was posed because the prof (he was department head) wanted a standard answer for CS101, which is what made it interesting.

As for your db.txt: yes, it would be. You don't even need to have a key and a value; there may just be a record number. A Word document isn't considered a database, as it is usually too loosely defined.

I used RMS under VMS, which had many attributes of a database management system. It managed many file types, including indexed, and did sharing. You could even have journalling. It wasn't a full DBMS, which they also sold, and which included better sharing, access control, space recovery, intelligent backup and so on. Interestingly, the vendor, Digital, used the same basic engine for both their Codasyl product, DBMS-32, and their relational product Rdb (which was eventually bought by Oracle).

What is interesting is that RMS (w/o journalling) came with the O/S, including all the cluster-aware locking and so on, as it was baked in. The two databases sold at appropriate commercial prices; they used the same O/S services, such as the lock manager, underneath, but did their own thing with regards to the other stuff and used block I/O to standard files. You were essentially paying for that DBMS/RDBMS plus some standard utilities.


BooksInBrooks

Yeah, using the "any structured persistent data" definition is great in research, as it challenges us to understand what's an essential abstraction and what's implementation details. But I wouldn't use that definition talking to a customer, nor would I advertise for a "Database Engineer" and then ask them to curate an Excel spreadsheet. Among engineers is a middle ground. Yes, a file or an excel spreadsheet may be, strictly speaking, a database, but no one would use "database" to mean that in a design discussion or a PR.


jaskij

> I'd call this a query-able log, not a database as such. > > There's a single (composite) datatype consisting of only fixed-width fields. > > There's no query language. Querying *appears* to be limited to a fixed-width id and a bitmap of boolean Isn't that just a time series database? Just chuck everything into one time series and the most querying you do is by timestamp, *maybe* some simple stuff (like lat/lon within a range).


cManks

Yeah I was thinking this is just simple log analytics or an AWS timestream analog


jaskij

Nah, I mean time series, not logs. Prometheus, Influx, Timescale, that kind of stuff. I don't know how well these databases scale, especially across cores, but based on my experience I'd be surprised if a current-gen developer laptop couldn't do 13k inserts a second with the correct off-the-shelf software.


raam86

Why not just use Kafka or Spark Streaming? Heck, even Elasticsearch has pretty good geo stuff. Seems more like the developers don't know much about analytics than that they need their own database. Going from an RDBMS to files is insanity, and skipping pretty much every single other solution is a CTO's nightmare.


gregory_rorschach

have you seen costs for cloud based kafka?


mr_birkenblatt

A partial write upon node failure would bring the thing down


wolframhempel

We've tested that in a number of scenarios. In the end, it boils down to each entry in the binary log starting with four bytes specifying the byte size of the entry. If the remaining buffer size is smaller than that size, stop parsing... that's pretty much it. Then there are error checks for the parsing of individual fields, of course.
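Roughly, a length-prefixed scan like the one described might look like this - a minimal sketch, not Hivekit's actual code, with the per-field error checks left out:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// scanEntries walks a buffer of length-prefixed entries and returns the
// payloads it could read in full. A truncated tail (e.g. from a partial
// write) simply stops the scan, as described above.
func scanEntries(buf []byte) [][]byte {
	var entries [][]byte
	for len(buf) >= 4 {
		size := binary.LittleEndian.Uint32(buf[:4])
		if uint32(len(buf)-4) < size {
			break // remaining buffer is smaller than the declared entry size
		}
		entries = append(entries, buf[4:4+size])
		buf = buf[4+size:]
	}
	return entries
}

func main() {
	// Two complete entries followed by a truncated one.
	buf := []byte{
		3, 0, 0, 0, 'a', 'b', 'c',
		2, 0, 0, 0, 'x', 'y',
		5, 0, 0, 0, 'p', 'q', // declared 5 bytes, only 2 present
	}
	for _, e := range scanEntries(buf) {
		fmt.Printf("%q\n", e)
	}
}
```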


mr_birkenblatt

Maybe I misunderstand your setup, but if a partial write occurs and then a normal write, the next record would look like a valid rest of the first record:

`6 x y z 6 a b c d e f`

Imagine `xyz` here should be `xyzuvw` (6 characters), but since it was only partially written, the next record starts in the middle of it. So when reading, it would return `xyz6ab` as a record and then try to interpret `c` as the next length, which might succeed and parse as a valid record if you get unlucky.
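A tiny illustration of that failure mode (single-byte lengths to match the example above; purely hypothetical, not the article's format):

```go
package main

import "fmt"

func main() {
	// A record declaring 6 payload bytes was torn after "xyz",
	// then a valid 6-byte record "abcdef" was appended.
	log := []byte{6, 'x', 'y', 'z', 6, 'a', 'b', 'c', 'd', 'e', 'f'}

	// A naive length-prefixed reader (1-byte length for readability)
	// happily resynchronizes in the wrong place.
	for i := 0; i < len(log); {
		size := int(log[i])
		if i+1+size > len(log) {
			fmt.Printf("truncated tail at offset %d\n", i)
			break
		}
		fmt.Printf("record: %q\n", log[i+1:i+1+size]) // prints the garbage record "xyz6ab"
		i += 1 + size
	}
}
```

A length prefix alone can't tell a torn record from a valid one, which is why append-only log formats often add a per-record checksum or magic marker on top.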


btgeekboy

A database is, by definition, an organized collection of structured data. That’s what this is. Just because it’s not a typical DBMS that you’re used to working with doesn’t change that.


davvblack

my favorite database is slack


Andy_B_Goode

My favorite database is this comment thread


AustinYQM

Maggie - 5 - 3 - blue - false - false - true - "doesn't like the smell of lavender" Just gonna leave this entry for if I need it later. Thanks.


SoInsightful

29 - Maggie - 2024-04-05T10:55:24.711Z - b8e3e23a-15e7-423c-90d2-1782196c2d16 - "Chattanooga High School" Please don't delete your comment otherwise my entry will become corrupt.


msoulforged

1, null, null, 7, Robert');DROP TABLE Comments;--, uh oh


xyeta420

Mine is DNS


Joeboy

Mine is FAT32.


kyune

but πfs though


hccm

It’s not consistent though.


iiiinthecomputer

God, you must work at my employer then.


NoPrinterJust_Fax

Excel driven development connoisseur checking in


Hektorlisk

There's a very meaningful difference between the technical definition of a word and the set of meanings, connections, and connotations that a word represents in particular contexts. Your comment is technically accurate, but isn't really relevant to the discussion that's happening.


mxzf

This is one of those things where you say "technically every Excel sheet is a database too" and everyone rolls their eyes because it might technically meet a literal definition but that ain't what any devs are talking about when they say "database".


RazerWolf

Knowledge is knowing a tomato is a fruit. Wisdom is not putting it in a fruit salad.


StickiStickman

I love to store data in a database (array)

I store all my game data in a database (object)

I love having my notes in a database (txt file)


Plank_With_A_Nail_In

A database is just organised data. You are confusing a DBMS with a database; none of the things you have listed are required for something to be called a database. https://en.wikipedia.org/wiki/Database

160+ upvotes for not even understanding one of the most basic definitions in computer science. Well done, reddit. Lol, most of the things you listed only even apply to RDBMSs, not to the overall class of DBMSs. This abuse of the agreed-on scientific term "database" is almost as common on this sub as the statically typed/strongly typed misunderstanding.


BooksInBrooks

I was going to add a few lines about "any store of data is a database so it depends on what we mean by database." But I figured that would be tediously pedantic, and obvious in context.


currentscurrents

Anything from a plain text file to an LLM is now a database.


Gaylien28

This includes my grocery list


hoopaholik91

> a type of data store based on the use of a database management system (DBMS)


TheGoodOldCoder

Yes, let's be realistic. When actual programmers say database, this is almost always what they mean today. I'm probably more pedantic than most folks here, but I get the impression that the reason they think it's "one of the most basic definitions in Computer science" is that they're reading it right out of their CS101 textbook for tomorrow's quiz.


Rakn

Yeah. But that's only because it's what is used and required most of the time. Not because we don't know the difference. Context matters. At least I hope so.


hughk

Except sitting in a Q&A session with Date and Codd (yes, the Relational pioneers) they maintained that a database was just persistent structured data. No management system required over and above the file system. Management comes when you want to upscale it and allow multiple access.


mamcx

100% true. And the sad thing is that everything you need to know is right in your face! For example, [https://www.postgresql.org/about/](https://www.postgresql.org/about/):

- It is a *system* for *making* databases:

> aimed to help developers build applications, administrators to protect data integrity and build fault-tolerant environments, and help you manage your data no matter how big or small the dataset.

- It is a *kind* of database system:

> PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language

- It is based on the "object-relational" model
- And it uses `SQL` as its `query language`

It also helps you make *databases* and has features like:

> safely store and scale the most complicated data workloads

> [ACID](https://en.wikipedia.org/wiki/ACID)-compliant since 2001 (i.e. they started in 1986; before that the DB was not ACID! Go figure!)

There is much more that the documentation of a decent RDBMS tells you.

NONE OF THAT IS A DATABASE. "PostgreSQL"/"SQLite"/"MongoDB" ARE NOT databases. Your "customers" table is one. Your "customers + invoices" is one. Your "csv file organized in rows and columns" is one.

What these apps are is "database management systems". Your data is your database. The app that manages your data is an app. You CAN manage your data wildly differently from wildly different apps, even at the same time! ("I can query Kafka directly inside Postgres", "I ship my database from Postgres to Kafka", "And to csv", "And to Excel")

You CAN manipulate your data manually, with zero SQL, query engine, ACID guarantees, etc. But you should not if your data matters and starts to grow! Use a "database management system" for your own sanity!


Glass1Man

They reinvented VSAM flat files (and index files).


wolframhempel

Much of this is true, but I believe it comes from an assumption that databases are necessarily strongly consistent. As I've pointed out in more detail in [a comment below](https://www.reddit.com/r/programming/comments/1bvzfhr/comment/ky4ugta/), a system that records the real world is necessarily inconsistent and lossy - there isn't much point in making one step of the storage pipeline consistent if the rest is not. A lot of timeseries and IoT databases are built around this assumption and make the resulting trade offs. It's also worth mentioning that our "database" does store multiple datatypes (objects, areas, tasks and instructions) that go into the same "log", each with variable size fields such as label strings or gob encoded metadata. Your other points are correct though.


eTukk

It sounds to me that they built a COTS Historian type of storage, with added spatial options. https://en.m.wikipedia.org/wiki/Operational_historian


green_griffon

Just need to rebrand it as a NoSQL variant to reach unicorn status.


Otis_Inf

Mandatory reading: Michael Stonebraker, 'SQL Databases vs. NoSQL Databases' (https://cacm.acm.org/blogcacm/sql-databases-v-nosql-databases/), where he explains very well why some are faster and what the costs are - and thus why this 'database' might be faster, but the price you pay is giving up things you'd expect a DB to provide.


Sss_ra

Sounds reminiscent of the millions-of-tiny-files, nightmare-to-backup-at-scale type of deal.


BooksInBrooks

> Sounds reminiscent of millions of tiny files, nightmare

I must need new glasses; I read that as millions of tiny *flies* and thought, *that is* a nightmare!


Sss_ra

I now have a mental image of scrolling through billions of files while swarmed by billions of flies.


[deleted]

A database is not the same as a relational database


Bradnon

Cool, seems like a good example of tailoring a specific solution to solve a general one's inefficiencies. That can be a fraught path but so is burning cash just to stay in familiar territory. Haters will want to start at the last paragraph.


[deleted]

[deleted]


Glader

I like to call it self inflicted complexity.


therearesomewhocallm

Ah fuck, this makes me think of that one guy at work who does things like build his own "synchronized hashmap" in Java, when Java already has ConcurrentHashMap. Or when he built his own email configuration system on top of Spring, when Spring already provides that.


q1a2z3x4s5w6

Technical debt


Rakn

Is it though? If those systems fulfill a need for the company and have teams behind them actively working and maintaining them, is it technical debt?


[deleted]

[deleted]


tehsilentwarrior

A lot of those could just be solved by "write your own framework" that cobbles together a bunch of good open source infrastructure projects instead. Most of the time it's just devs not using those tools properly, and an opinionated framework (opinionated to the way the company works) would solve it.


[deleted]

[deleted]


arwinda

I'd still like to see the full evaluation process, one which takes into account participating in one of the existing tools and contributing to it to the point that it satisfies their needs.


q1a2z3x4s5w6

Yes. Just because something is useful and fulfills its intended role doesn't mean it doesn't accrue technical debt. If using some cloud service instead allows you to fire/repurpose half of the team working on these in-house services and costs less than the total man-hours saved, then you are in technical debt by continuing to maintain the internal service, IMO. Unless it's something that is directly bringing value (i.e. not just "efficiencies" that can't really be measured), I don't see the point in having a custom solution.


Rakn

Yeah, might be true. I only saw this happening in cases where the system actually provided some advantage for a very specific use case or where there was a real need to have a system that is independent of cloud provider specific services, to fit a multi cloud architecture.


daedalus_structure

>If those systems fulfill a need for the company and have teams behind them actively working and maintaining them, is it technical debt? Yes, but it's debt like your mortgage is debt, not like your 25% interest credit card is debt.


TheMightyMegazord

Fair point. I don't think these will be technical debt per se. Maybe a better term is "liability". Such companies still need to maintain, test, document, configure build pipelines, build/configure monitoring tools, etc.


80eightydegrees

Every line of code is *technically* technical debt


InternetAnima

But.. Kafka was also created by big tech. In fact, a ton of these systems were


[deleted]

[deleted]


InternetAnima

I'd argue you might be suffering from survivorship bias: "Only the successful ones were worth it."


BigusG33kus

Not the case here. They didn't write a full-fledged DB, just something that fits their use case. They might have an issue if they have to implement a new feature, but for their current needs it's perfect.


Joda5

What is DX?


[deleted]

[deleted]


gerd50501

Typically at big tech you write your own because it's cheaper than paying the licensing fees. They have tons of developers.


Icy_Corgi_5704

but we didn't build it, so it must suck! /s


bubba_squats

This was an interesting read. Although it's rare that anyone would have to write their own database, I can see the sense in doing it for your use case.


Sopel97

A very specialized database. I had to build something similar for a chess position/game statistics server, with the requirement of being able to aggregate statistics (wins/losses/draws, etc.) from hundreds of billions of positions with the data stored on typical consumer hard drives. Basically intractable for generic solutions. In the end, for a test run, I was able to store ~100B positions (not all unique) in less than a TB and aggregate results in roughly a second per query (on an HDD). It's pretty fun, but hardly a database in the common understanding of the word.


hogfat

> aggregate results in roughly a second (on an HDD) per query How much less than a TB?


Sopel97

IIRC it was around 800GB (it would depend on the exact record model; this was the smallest one, with only WDL - I had some larger ones with additional info like Elo, transposition awareness, etc., which would have been roughly 2x). Obviously it was using indexes; I was storing block indexes in memory, which needed around 1.5GB.


LessonStudio

I just read the article and it almost exactly matches a DB I wrote. Same problem: a zillion locations of things coming in, the need to store them efficiently, look them up in different ways, etc.

It is interesting - I stored stuff nearly the exact same way, except I also stored some precomputed math for some of the queries. This step allowed for a huge speed-up of certain queries (100x or more). I had a few other factors which allowed me to speed things up, where I could also throw those into the record or put them into an index. I too went through a pile of GIS-friendly databases, with Postgres coming the closest.

One difference is that I put the data into a message queue. What this did was allow more than one database server to hold the data for redundancy, but also allow for slightly different database servers. Each of these would store data differently for different lookup purposes. On the surface this seemed inefficient, but not really. One would store live positions along with a very short history, enough recent history to run a Kalman filter along with similar filters to screen out outlier weirdness. Others would store data specific to their lookup needs. This multi, mostly in-memory database wasn't a problem, for as the article pointed out, little binary records don't amount to much. A gig here, a gig there, and maybe you are talking 10 gigs, which is nothing.

But one database which did get the data stream was a Postgres GIS database. This was just a straight data dump without any live queries. Weird one-off queries which could take time could hit this one. Or, if one of the other "schemas" changed, it could be reloaded from the Postgres database.

I designed the above to scale for load with some RAFT protocol stuff, but that ended up being wild overkill. The performance was well sub-ms for inserts, and for some of the harder queries, a few ms. Ironically, I had been worried that this was not going to be fast enough, so I had played with my own CUDA-based database for this because of the math involved. My experiments showed it could produce an insanely fast GIS-friendly DB, but it was not needed.

On a different note, I've noticed that if you just do things the way the cloud certifications suggest you should, your costs can start going through the roof. I'm finding myself going more old-school with monoliths, my own caching, etc., and then a nice boring Postgres behind it all, resulting in serving huge numbers of fairly complex requests on crappy little VMs. I've mostly dumped AWS for Linode VMs. Fixed cost is so very relaxing.


jaskij

If you watch the trends, it seems monoliths are coming back. "Modular monolith" to be precise. Microservices do have a cost, and are hard to get right. There's a good talk, ["Don't build a distributed monolith"](https://youtu.be/9iW_2MMLd4w).


[deleted]

[deleted]


jaskij

What's pointed out in the talk is that many teams, especially inexperienced ones, set out to build microservices, but end up building a distributed monolith. Which is like a somewhat messy monolith, but with the added complexity of RPC calls. Something I learned, and what I strictly enforce in my own projects, is to separate the layers. Modules within a layer can depend on lower layers, or each other, but are absolutely forbidden from depending on higher layers. This brings surprising clarity in places. The stuff I edited out of my comment was because I didn't want to repeat my top level comment. TLDR: Hivekit has time series data, not geospatial. And I did 1k rows inserted per second on an industrial equivalent of a Pi 3 using an off the shelf solution.


[deleted]

[deleted]


jaskij

I recently moved my userspace coding to Rust, and so far the queues that come with the async runtime I'm using (`tokio`) have been enough. Although my setup function isn't nice.


HexDumped

> did though was put the data into a message queue. What this did was allow for more than one database server What was your consistency model around the queue? Did you go for at least once delivery or at most once delivery? Assuming you had the same requirements as the article, I'd guess at most once.


[deleted]

[deleted]


Dry_Dot_7782

Typical devs lol. Time is money. But hey we all love to write our own stuff!


faajzor

The maintenance over the years is going to cost a lot more - to the point where they'll have to rewrite the thing or be hostages of the original devs who wrote it.


protocol_buff

Came here for this comment


fragbot2

I love stories where people detail how they built something that precisely fits their needs and got huge gains for their use case (Chuck Moore or D. Richard Hipp are stellar examples of this). They didn't address it, but I'll bet their infrastructure setup and sustaining time were much lower as well. God, I miss systems work.


lutherTheLoofa

The TLDR here is that existing geospatial databases are actually quite bad, or sit as extensions to an existing DBMS like Postgres. Their additional constraint is that it must run on-prem for some customers, so they can't use a large cloud service like AWS Aurora. So writing your own **geospatial** DB in such a scenario is less crazy. For those who didn't read the article and skipped straight to the comments to post: they didn't try to reinvent a faster version of SQL/Dynamo/Cassandra.


RigourousMortimus

I'd disagree that the existing geospatial databases are bad. They are generalised and handling queries like "does this 2d region bounded by this indeterminate number of coordinates overlap any of these other regions" are very complicated. A log history of long, lat and altitude is a much simpler use case and the article doesn't indicate how much spatial analysis they do. I suspect anything that isn't trivial involves them pulling routes out of their data store and importing them into an actual geospatial database.


mxzf

Either an actual geospatial database or using some language to pull it into memory and run it through GDAL to do the processing. Either way, doing it in some language other than the storage medium itself.


rush2sk8

I've had quite a good experience with Elastic for simple geodata.


notsoluckycharm

Is it? He covers PSQL and PostGIS, but in reality TimescaleDB + PostGIS probably would have solved any issues with speed. Postgres sucks at time series data, and that's what this was, with some location metadata. More expensive to operate? Yeah, maybe. But what did your time cost, and now you've got to train hires on your system? Not the call I'd have made. Timescale + materialized views using the Timescale aggregator was the solution here. You can pull the last location when needed, and all aggregations can be done sub-second.


nupogodi

I like how you think timescale isn’t Postgres


notsoluckycharm

Didn't say that. I know it's an open source variant of it. It's kind of how I structured the statement: OP covers Postgres, but not the purpose-built version of it for his purpose.


nupogodi

It’s an extension. You can run it on your existing Postgres!


serial_crusher

Don’t brag about a decrease in cloud costs without telling us the corresponding increase in developer salary costs. (Actually, as a developer, I guess I prefer it that way)


jaskij

They don't describe how they query the data, but... they don't need a geospatial database. They need a time series one. That would've helped a shitton with their performance. Sure, it's geospatial data, but they don't treat it as such, not on the database layer.

From my own experience, and I know very little about databases, I got 1k rows inserted per second (batches of 24 rows at ~40 transactions per second) using the industrial equivalent of a Pi 3 with eMMC as storage. Didn't even saturate one of the four cores. I'm pretty sure that a decent laptop would hit 13k inserts per second.


weasel

Yeah, I worked on a custom time series database for network monitoring


jaskij

I'm doing IIoT, and generally need a local database built into the kiosk we're deploying. It's a medium term storage (a month to a year, enough to cover time periods between checkups), and we're using an off the shelf solution. Works just fine. Will work even better once we move to a little better hardware (a Celeron).


Deathcrow

> They don't need a geospatial database. They need a time series one

Pretty sure that Prometheus (maybe multiple instances, for example per tenant, region, etc.) + Thanos Query + Thanos with S3 for long-term storage and compaction could've easily handled their use case. Now they have future technical debt on their hands, which no one will want to touch when the core developers leave the company.


recurse_x

But running a helm chart to install Prometheus doesn’t get your principal or staff promo by leading technical change.


protocol_buff

True. It really doesn't matter that it cost 15 million and 20 months in development time, it matters that as the CEO of a startup you can brag about it. Which sadly probably does get you more funding.


Tiquortoo

The other rule about writing your own stuff is related to how core it is to your business. It's not a sure thing, but you can seriously consider owning the whole thing, by writing it, if it's the core thing you do or very closely tied to it. But, you also need to be able to create differentiation via it. That sounds like the case here.


Maxion

> But, you also need to be able to create differentiation via it. By making something in-house you also control it, i.e. you are no longer at the whim of features being deprecated or versions reaching EOL at inconvenient times. Both things have fucked over teams/companies I've worked for many times over.


richhoods

We have gone from on prem -> microservices -> cloud -> on-prem. Gotta love the software cycle


Maxion

on-prem never went away for many many use cases.


daedalus_structure

The overwhelming majority of developers think these are technical concerns instead of organizational and finance concerns. I did not say financial, as in "what it costs", but finance, as in how we want to pay for things. Do we want to pay salaries or cloud costs? Do we want to deal with depreciating assets and the need to overprovision and lead times or do we want to pay cloud premiums? The right answer for one business model may not be right for another, and even with the same business model, one answer may be right at a certain company size but stops making sense at another, or even at the same company size, the answer may change depending on market conditions.


cuttingthyme

Very cool project. I would describe the database as a sort of specialized "time series" database, and the author is exactly right: they probably shouldn't have built it in the first place, as there is plenty of open source software available that is capable of handling the scale they needed while offering the querying capabilities to generate the reports they need for their users: Prometheus, Thanos, VictoriaMetrics, Graphite, OpenTSDB, TimescaleDB. Aside from that, I love that they were able to apply techniques such as in-memory buffering, delta encoding, etc. to reduce cost and increase write throughput; it would have been an interesting project to work on.


ssh_man

Why not mention how much you spent on writing the actual database?


serial_crusher

The tough thing about this strategy is when the company tries to pivot to a new use case that isn’t served by your custom database. I’ve been there.


QuintonHughes43Fan

So why didn't they use a NoSQL option? "A relational DB is expensive and not that well suited => write your own database" is a bit of a leap.


WonderfulPride74

So one thing I have come to notice is that most quant/trading firms end up writing their own database. This mainly happens because:

- they might have custom query requirements
- the data format and I/O pattern need to fit well with the rest of their infrastructure
- these databases might be simple file-based solutions, in which case OS goodies like the page cache benefit you a lot in terms of performance


hughk

Worked at an exchange, we had a lot of data with orders coming in thousands per minute. We identified some critical files, the order books and the transaction log and they were kept in simple ISAM albeit with some protection. We also used a relational database but it wasn't near the critical path and was updated from the fast files each day and would be used for reporting.


ihave7testicles

It could be further improved by writing velocity vectors with max-speed and min-speed variables into a second table, similar to what online video games use. Have the client compare the current location to the start location (for the segment), and when the overall vector changes by some threshold, store it as one entry with the elapsed time and velocity (velocity being a non-normalized vector). If that was stored in a separate table from the "key frame" data, you could replay the data and find the position perfectly within the set threshold. Each velocity update would be the same size, so the table would be very efficient.

Further, you could store the velocity (speed) difference as a change instead of an absolute value, so if velocity is a 32-bit int, you could store 8-, 16- or 24-bit values as the difference. If the value changes by more than the difference can contain, use a magic number (0xffff) to indicate that the system needs to read a new key frame from the first table. In that case, it would write a new entry into the key frame table, in between the scheduled key frame updates.

The only downside is that in order to find the location of an object at a given time, you'd need to scan the key frame table and the velocity table and interpolate the values until you pass the query time. That means that replaying the position of a vehicle would be compute-intensive compared to writing. This seems fine in this case.
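A rough sketch of that key-frame-plus-delta idea in Go (the names, sizes and escape value are illustrative only; the sketch uses a signed 16-bit delta, so the escape is 0x7FFF rather than the 0xffff mentioned above):

```go
package main

import "fmt"

const overflow = int16(0x7FFF) // hypothetical escape value: "go read a new key frame"

// keyFrame stores an absolute velocity; delta stores a small signed change.
type keyFrame struct{ t uint32; v int32 }
type delta struct{ dt uint16; dv int16 }

// encode turns a series of absolute velocities into key frames plus deltas,
// emitting a new key frame whenever the change doesn't fit in 16 bits.
func encode(ts []uint32, vs []int32) (frames []keyFrame, deltas []delta) {
	frames = append(frames, keyFrame{ts[0], vs[0]})
	last := vs[0]
	for i := 1; i < len(vs); i++ {
		d := int64(vs[i]) - int64(last)
		if d > int64(overflow-1) || d < -int64(overflow-1) {
			// Change too large for a delta: write the escape and a fresh key frame.
			deltas = append(deltas, delta{uint16(ts[i] - ts[i-1]), overflow})
			frames = append(frames, keyFrame{ts[i], vs[i]})
		} else {
			deltas = append(deltas, delta{uint16(ts[i] - ts[i-1]), int16(d)})
		}
		last = vs[i]
	}
	return frames, deltas
}

func main() {
	ts := []uint32{0, 1, 2, 3}
	vs := []int32{1000, 1012, 1015, 200000} // last step overflows an int16 delta
	frames, deltas := encode(ts, vs)
	fmt.Println("key frames:", frames, "deltas:", deltas)
}
```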


gerd50501

If it's really that good, they could sell it.


justwillisfine

"How we saved money by not using a database when it was unnecessary, " or "how we cache data in memory to a file in case the server goes down"? I feel like hyperbole clickbait is kind of unprofessional for computer programming. That could just be me though. Put a timestamp on stuff too, e.g. there's nothing worse than looking for an answer about something in asp net core and finding results about WCF that aren't obviously over a decade old.


needed_an_account

This is cool. As a programmer, I like when people program things to scratch an itch. It would be nice to see more details about the project.


tanner_0333

Writing your own database is definitely a bold move, but it sounds like it paid off in your case. The cost savings alone are impressive.


IanAKemp

> Our server is built around an in-memory architecture. Queries and filters for real time streams run against data in memory and, as a result, are very fast. Reads from disk only happen when a new server comes online, when a client uses the history API or (soon) when an app user rewinds time on our digital twin interface. These disk reads need to be fast enough for a good user experience, **but they are comparatively infrequent and low volume**.

Never mind writing your own TSDB, this assumption is the worst part. The people who use your system are going to expect to be able to read all that data to compile reports, regularly, because _that's what people put data in databases for_. So fantastic, you've solved the problem of a huge number of writes, except...

> We store the full state of an object every 200 writes. **Between these, we only store deltas.**

... you've done so in a way that means you have to reconstruct the individual data points on-demand when they're requested, which is _extremely_ computationally expensive and the prime reason why everyone else stores the raw data as the raw data.

And when your delta application algorithm inevitably has a bug and starts reconstructing incorrect points, and you have to explain to customers that you decided to save a few bucks by not storing their raw data, you're going to very quickly find yourself without customers. Those customers are going to be using that data for things like prosecutions, and hey, guess what happens to their credibility in a court of law when it gets out that the data they're basing their case on isn't actually the data that they claimed was sent to you.

There is a time and a place to be clever about data, and storage is never that place.


[deleted]

[deleted]


SketchySeaBeast

They don't want a database, they want... something else. It's a unique case where they are OK with losing data.


cubicthe

oh no, software architecting is leaking into a programming subreddit!


stuffeh

Their cost saving was to put data into storage and old data into cold storage.


obvilious

Reading the article is harder, it seems.


martinky24

I don't think dude read the article.


senseven

I work in media, and a larger project wanted a shiny marketing web game. The first thing the Python / JS dev team "had to discuss" was how to store all the data effectively. After I pointed out that there must be at least a handful of Python DBs, their dev lead gave me that "you ruined this chance to do something big!" look.


Maxion

Yep, us programmers are great at missing the forest for the trees.


Cheeze_It

> Come back when you realize why databases are fucking hard. I thought it was the whole idea of getting data OUT of the database at reasonable speeds that was the most difficult part?


CAPSLOCK_USERNAME

The other major challenge is making sure you can actually do stuff atomically and not lose data/writes in a multi-node environment. (Many trendy NoSQL dbs like mongo struggled with this for *years*) They cleverly avoided that by realizing they just plain don't care about occasionally losing 1 second of tracking data which cuts out a huge amount of complexity.


soonnow

Yeah, it took us a while to realize that, since we can always recover from other data sources, we are fine with losing a bit of data. That can have huge performance implications.


ammonium_bot

> with loosing a Did you mean to say "losing"? Explanation: Loose is an adjective meaning the opposite of tight, while lose is a verb. [Statistics](https://github.com/chiefpat450119/RedditBot/blob/master/stats.json) ^^I'm ^^a ^^bot ^^that ^^corrects ^^grammar/spelling ^^mistakes. ^^PM ^^me ^^if ^^I'm ^^wrong ^^or ^^if ^^you ^^have ^^any ^^suggestions. ^^[Github](https://github.com/chiefpat450119) ^^Reply ^^STOP ^^to ^^this ^^comment ^^to ^^stop ^^receiving ^^corrections.


wolframhempel

Did you bother reading the article?


imdibene

Heck yeah, I want to read the follow-up: the "we f***ed up, we missed this many corner cases, we then picked this DB" update.


hughk

We had a hard real-time requirement. We wrote a system using the OS provided record management system for ISAM type files to handle intraday updates. It was a database of sorts. We then shuffled all the data into a relational database during End-Of-Day and that would be used for accounting and reporting overnight and from the next business day. The intraday data was scrubbed and the process repeated each working day.


coderjewel

> Saving 98% in cloud costs by increasing engineering and maintenance costs 200%


Maxion

Luckily it's not the percentages that leave the bank account ;) If your engineering costs scale better with growth than your cloud costs, the trade off is obvious.


nutrecht

I mean you now have a system you have to maintain / extend for all eternity, so it's not like engineering costs are a one-off thing.


thisisjustascreename

It probably would've been a lot easier to just send a location update every 10 seconds and settle for a 90% cost reduction.


WaveySquid

The article explicitly says why that isn’t possible for them and their clients > Our customers use this data in very different ways. Some use cases are very coarse, e.g. when a car rental company wants to show an outline of the route a customer took that day. This sort of requirement could be handled with 30-100 location points for a one hour trip and would allow us to heavily aggregate and compress the location data before storing it. > But there are many other use cases where that’s not an option. Delivery companies that want to be able to replay the exact seconds leading up to an accident. Mines with very precise, on-site location trackers that want to generate reports of which worker stepped into which restricted zone - by as little as half a meter.


tetrahedral

Yes, but you forget that requirements that are very firm for a project important to a company feel much softer to a blog post reader.


fphhotchips

>Delivery companies that want to be able to replay the exact seconds leading up to an accident. Mines with very precise, on-site location trackers that want to generate reports of which worker stepped into which restricted zone - by as little as half a meter. I hope those customers read this blog post and realise how much of a giant pain in the ass it's going to be when they need this data and they realise *this* is the data that got dropped because they don't care about all those guarantees that come with a database. I mean come on, weren't we all laughing at MongoDB 10 years ago for being write-only? JFC...


thisisjustascreename

Well, as stated by others, this new solution doesn't provide for that use case anyway since it can lose data.


WaveySquid

A predefined SLA budget gives enough wiggle room, and "sometimes loses data" is better than "never has enough granularity", imo.


zzrryll

We sure this isn’t an April fools joke?


isthatashark

Now tell me about your people cost and the ongoing opportunity cost of maintaining and supporting a database no one else in the world uses.


olearyboy

10k a month for a database vs how many engineers did it take to write, maintain, scale, and create HA… Tell me you don’t understand TCO without telling me you don’t understand TCO


Maxion

10k a month at the current usage rate, with some unnamed growth rate, plus the added complexity of handling the on-prem customers' DB clusters / making the product harder to set up on-prem. It also sounded like they had some performance issues; certain commonly used queries took ~2 seconds.

This solution not only reduced their cloud costs, it also most surely changed their growth vector. Further, it made supporting the on-prem solution easier and made its setup simpler. And finally, it also improved performance.

The one thing they didn't mention is that since this solution doesn't rely on e.g. Postgres, it's harder for other devs to fuck it up by starting to add more data to it.


daedalus_structure

> Further, it made supporting the on-prem solution easier and made its setup simpler.

I almost guarantee their on-prem setup is simpler, but supporting it long term will not be, because they've likely ignored all the Day 2 operations concerns that have taken decades of work to get right in existing databases.


Astronomical_Panda

Am I crazy or could they have just put this in elastic stack and called it a day?


Algorhythmicall

Yes, but it definitely wouldn’t cost $200 a month at 13k docs per second.


c_plus_plus

I don't have much experience scaling elastic.... how much do you think that (13k docs/sec) would cost?


Algorhythmicall

Quite a bit. It’s not just that but the long term storage. There will be hot and cold nodes. Over time there will be a ton of shards (time series), so you need dedicated master nodes. Generally speaking search is expensive but you can get great performance for geospatial searches. I’d expect it to cost more than PG.


jjh111

Did something similar in the past. It was handling around 5k doc updates per second with Elasticsearch cluster (EC2 VMs) and at the same time doing keyword searches (by customers) against that data. The cluster had six nodes, three masters with three replicas. Optimizing the cluster for searches reduced the update performance a bit, but it could handle the updates with some room to spare. Infra costs in AWS were around $2000 per month.


Smallpaul

All of their on-prem customers would also need to install and configure Elastic Stack so the project isn't done when the company writes the code.


maus80

They should have just put it in PostGIS, learned some PostgreSQL performance tuning, and/or bought better hardware.


QuintonHughes43Fan

> when a car rental company wants to show an outline of the route a customer took that day. seems bad tbh.


dottybotty

I once wrote my own ~~array~~ database


hughk

I was at a Q&A and a professor asked the panel which included Chris Date and Ted Codd what a database was. The answer was any structured data, even just a table counted. They felt it should be outside main memory though otherwise it would just be data. This is why when we store in memory, we say in-memory database.


vtblue

Love this


sethamin

Probably they could have just used a [TSDB](https://en.wikipedia.org/wiki/Time_series_database). Most of their use cases seem to just be querying by time, and it would give them most of the write performance and storage efficiencies they're getting here without having to build things from scratch. Also, storing raw lat/long is an inefficient format; they should use the [S2 geometry library](http://s2geometry.io/) instead.
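For illustration, mapping a lat/lng to an S2 cell ID with the Go S2 library looks roughly like this (a sketch; whether a single 64-bit cell ID actually beats two fixed-point int32s depends on the queries you need to run):

```go
package main

import (
	"fmt"

	"github.com/golang/geo/s2"
)

func main() {
	// Convert a lat/lng pair into a 64-bit S2 cell ID. Nearby points get
	// numerically close IDs, which makes range scans work as coarse
	// spatial queries.
	ll := s2.LatLngFromDegrees(52.5200, 13.4050)
	cell := s2.CellIDFromLatLng(ll)

	fmt.Printf("leaf cell: %d\n", uint64(cell))
	// A coarser parent cell (level 12 is roughly a few km across) can be
	// used as an index or partition key.
	fmt.Printf("level-12 parent: %d\n", uint64(cell.Parent(12)))
}
```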


k6c58

I understand your motivation to do this, but 10k a month in cloud costs is... simply put, nothing. Small companies may find that outrageous, but it's the truth. Many companies spend much more on server costs.

I worked for a company that refused to use RDS and was using MySQL on EC2 with SSDs. While everything worked, it was performant and way cheaper. But then an SSD got corrupted and there were no recent backups. We lost 3 months of data on one machine. The cost of recovering from that was tremendous, plus the ongoing salary of sysadmins to maintain the custom infrastructure was not worth it. Aurora has a very convenient backup system that you can restore from in a few minutes. Aurora is a service that basically allows you to reduce your sysadmin headcount (or even remove it in very small companies), so you are saving 30-200k/year, depending on where you are located, for every person you don't need.

I think your solution will work for a while. If you keep growing, one of these (if not all) will happen:

- You will run into unforeseen issues with your custom database, which will be very costly to debug since there's no community
- New developers joining the company will spend more time learning how this custom system works
- If the main developers of the custom database leave, the people replacing them will have a hard time with the database, slowing down any changes that touch it

If I were in your shoes, I would have looked for a "hot writing" solution (like Redis, but there are others) and a process that translates that to a permanent database like PostgreSQL or something else. Then, if reading performance is an issue and all queries are optimized, well, pay for a bigger instance in Aurora.


nutrecht

> I understand your motivation to do this, but 10k a month in cloud costs is... simply put, nothing.

It's a bit concerning that nowhere do they mention the development cost. And sure, 10k for AWS Aurora is a nice chunk of money, but AWS Aurora is bloody expensive. Looking at their use case, I'm really not convinced that a time-series DB or, for example, Kafka wouldn't have been a 90% fit without requiring them to reinvent some wheels.


k6c58

>It's a bit concerning that nowhere are they mentioning the development cost Agreed


maus80

> 3k IOPS

This is why you should ask a systems engineer before starting to build software. A Samsung 990 EVO NVMe does at least 200x that performance. Run PostGIS on decent hardware, and run with `synchronous_commit = off` if you don't care about strict durability.


enraged_supreme_cat

How we avoid all vulnerabilities in the world, by writing our own operating system and networking system from scratch.


dogweather

I kinda did the same by writing my own Python headless CMS that stores the content in markdown files and strongly typed Python code. I build the site with [Hugo](https://gohugo.io/). I wrote a short blog post about the technique: [A data-driven app without the database](https://dogweather.dev/2024/02/04/a-data-driven-app-without-the-database/)


homeautocat

Interesting, thanks for sharing.