Assistant

Distributed databases like Dynamo and Voldemort use vector clocks to manage the ordering of multi-version records primarily to address challenges related to distributed systems, such as concurrent updates and eventual consistency. Here are the key reasons and advantages of using vector clocks over direct timestamps:

1. Handling Concurrent Updates

  • Vector Clocks: They allow the system to track causality between different versions of a record. Each node maintains a vector clock, which is essentially an array of counters, one for each node in the system. When a node updates a record, it increments its own counter in the vector clock. This enables the system to determine if one version of a record is causally related to another (i.e., if one update happened before another). A minimal sketch of this bookkeeping appears after this list.
  • Direct Timestamps: Using a single timestamp can lead to ambiguity in the case of concurrent updates. If two nodes update the same record at the same time, it becomes difficult to determine which update should take precedence without additional context.
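
As a rough illustration of the bookkeeping described in the list above (a sketch only; the node names and helper functions are made up, not Dynamo's or Voldemort's actual code), a vector clock can be kept as a map from node id to counter:

  # Illustrative vector-clock bookkeeping. A clock is a dict mapping node id -> counter.
  def increment(clock, node):
      clock = dict(clock)                 # copy, so each version keeps its own clock
      clock[node] = clock.get(node, 0) + 1
      return clock

  def happened_before(a, b):
      # a happened before b iff every counter in a is <= its counterpart in b, and a != b.
      nodes = set(a) | set(b)
      return a != b and all(a.get(n, 0) <= b.get(n, 0) for n in nodes)

  v1 = increment({}, "A")              # {"A": 1}
  v2 = increment(v1, "A")              # {"A": 2}: A updated the record again
  v3 = increment(v1, "B")              # {"A": 1, "B": 1}: B updated the same base version
  print(happened_before(v1, v2))       # True  -> v2 supersedes v1
  print(happened_before(v2, v3) or happened_before(v3, v2))  # False -> concurrent versions

Two versions whose clocks are not ordered in either direction, like v2 and v3 here, are exactly the concurrent updates discussed in the sections that follow.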

2. Causal Relationships

  • Vector Clocks: They allow for the representation of causal relationships between updates. If one update can be determined to have happened before another, the system can resolve conflicts more intelligently.
  • Direct Timestamps: Timestamps do not inherently provide information about the causal relationship between updates. This can lead to conflicts that are harder to resolve, especially in the presence of network partitions.

3. Conflict Resolution

  • Vector Clocks: When conflicts arise (e.g., when two nodes concurrently modify the same record), vector clocks can help identify the conflicting versions. The system can either merge the changes or present multiple versions of the record to the user, allowing for application-level conflict resolution.
  • Direct Timestamps: With direct timestamps, if two updates happen simultaneously, the system may struggle to determine which version is more "recent," leading to a simpler but less effective resolution mechanism.

4. Scalability

  • Vector Clocks: As the system scales and more nodes are added, vector clocks can still effectively track the state of updates across all nodes without losing the context of causality.
  • Direct Timestamps: More nodes can lead to clock synchronization issues, especially if relying on physical clocks. This can introduce inconsistencies and complicate the ordering of updates.

Summary

In summary, vector clocks provide a robust mechanism for tracking the versioning and ordering of records in distributed databases, effectively managing concurrency and enabling intelligent conflict resolution. Direct timestamps, while simpler, lack the necessary context for handling the complexities of distributed systems, leading to potential ambiguities and challenges in maintaining consistency.

Henry Robinson

Firstly, as others have pointed out, using the physical timestamp of a record is no good because you can't guarantee that time will move forward in a uniform manner across all nodes in the cluster, which means a 'later' record might have an earlier timestamp than some record produced before it.

You could instead use a logical timestamp, otherwise known as a Lamport Clock. A Lamport Clock is a single integer value that is passed around the cluster with every message sent between nodes. Each node keeps a record of what it thinks the latest (i.e. highest) Lamport Clock value is, and if it hears a larger value from some other node, it updates its own value.

Every time a database record is produced, the producing node can attach the current Lamport Clock value + 1 to it as a timestamp. This sets up a total ordering on all records with the valuable property that if record A may causally precede record B, then A's timestamp < B's timestamp.
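
As a rough sketch of that scheme (illustrative only, not any particular database's code), a Lamport Clock is just one integer that is bumped on every local event and merged, by taking the maximum, whenever a larger value is heard from another node:

  # Illustrative Lamport Clock.
  class LamportClock:
      def __init__(self):
          self.value = 0

      def tick(self):                   # local event, e.g. producing a record
          self.value += 1
          return self.value

      def send(self):                   # value to attach to an outgoing message
          return self.tick()

      def receive(self, remote_value):  # heard a value from some other node
          self.value = max(self.value, remote_value)
          return self.tick()

A record produced after receive() necessarily carries a larger timestamp than the message that influenced it, which is the causal property described next.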

By 'causally precede', I mean that the node that produced A may have sent some messages which caused another node to send messages which caused another node to... etc. until the node that produced B receives a message, before it actually creates B, which may have originated at A. The idea being that we want to capture whether the production of A may have influenced B at all, and we do that by tagging this timestamp on to all messages we send, which are the only mechanisms we have for affecting the behaviour of other nodes in the system.

So Lamport Clocks are great for making sure we know when there's a causal dependency between records. But there is a problem. Because Lamport Clocks induce a total ordering over all records, they actually imply more dependencies than truly exist. If two records are not causally related at all, and were produced completely independently by separate nodes that did not communicate, they will still have Lamport Clock timestamps which imply that one is ordered before the other, which is a false positive. For some applications this is fine, but not for Dynamo.

The reason is that Dynamo wants to know if two messages cannot possibly be causally related. This situation, which Lamport Clocks cannot detect, arises in Dynamo when there is the possibility of a conflict between two record versions, perhaps due to a partition. So instead we need a timestamp type that admits only a partial ordering, so we can detect when two timestamps are not ordered with respect to each other.

Vector clocks allow us to do that. Roughly speaking, a VC timestamp A is less than a VC timestamp B if all members of A are less than or equal to their counterpart in B. So we can detect causal dependency. But if some but not all of the individual timestamps in A are less-than-or-equal-to, and some are greater than their counterparts in B, A and B cannot be causally related.

Of course, it's the way that the individual timestamps get incremented in Vector Clocks that gives rise to this property; you can't just pick a datatype and a partial ordering and assume that this will work. But VCs can be viewed simply as an array of Lamport Clocks, one per node, and so simply and elegantly generalise Lamport Clocks at the cost of the space to store them.
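
Concretely, the comparison rule described above can be sketched as follows (clocks as maps from node id to counter; the node names are invented for the example):

  # Sketch of the partial order on vector clocks.
  def compare(a, b):
      nodes = set(a) | set(b)
      a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
      b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
      if a_le_b and b_le_a:
          return "equal"
      if a_le_b:
          return "before"       # A may causally precede B
      if b_le_a:
          return "after"        # B may causally precede A
      return "concurrent"       # neither dominates: a genuine conflict

  print(compare({"X": 2, "Y": 1}, {"X": 2, "Y": 1, "Z": 1}))  # before
  print(compare({"X": 2, "Y": 1}, {"X": 2, "Z": 1}))          # concurrent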

Bankim Bhavsar

Clocks across different nodes can be skewed, and hence timestamps aren't reliable. Also, in an optimistic concurrency control system, vector clocks (i.e., logical clocks) help establish a happens-before relationship between events even in the case of network partitions.

For example:
Suppose there are 2 nodes A and B in the system. For a particular object, the vector clock starts at 0,0, where the 1st zero is the version of the object on node A and the 2nd zero is the version on node B.

When node A updates the object it can only increment its own version, i.e. the vector clock becomes 1,0. Now suppose that because of a network partition the update from node A doesn't reach node B. So when node B locally updates the object, its version will be 0,1. After the network is restored and the update from A is propagated to B, B compares the vector 1,0 sent by A with its local vector 0,1. For an update from a remote node to be accepted, every element of the remote vector should be greater than or equal to the corresponding element in the local vector. Here the vector sent by A isn't greater than or equal to B's local vector, and hence there is a conflict. This is how the happens-before relationship is established.
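
In code, the acceptance rule from this example looks roughly like the following sketch (the clock is represented as a two-element list [A, B]; this is illustrative, not Dynamo's or Voldemort's implementation):

  # Sketch of the acceptance rule from this example.
  def dominates(remote, local):
      # Accept the remote version only if it is >= the local clock in every position.
      return all(r >= l for r, l in zip(remote, local))

  local_at_b    = [0, 1]   # B's own update made during the partition
  remote_from_a = [1, 0]   # A's update arriving once the partition heals
  if not dominates(remote_from_a, local_at_b):
      print("conflict: neither version descends from the other")  # this branch runs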

Recommended readings:

  1. Leslie Lamport on Time, Clocks and Ordering of events: http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf
  2. Optimistic Replication: http://research.microsoft.com/pubs/66979/tr-2003-60.pdf
Avinash Gupta Konda

In distributed systems we cannot trust clocks, because different machines might have different clocks. However, what we can trust is the ordering of events.

Now the big issue is how to detect that the system is not aware of the ordering of events.

This is where vector clocks help: they let the system ascertain whether it can order the events or not, and they commonly leave it up to the client to reconcile when it cannot order the events.

Please refer to the explanation below. I use the term causal ordering for the ordering of events.

Vector clocks are used when you are maintaining multiple replicas and the replicas have diverged. In this case the replicas diverged mainly because you ended up allowing the client to perform an update when the replicas were not in sync in the first place. This kind of requirement is valid when you want 100% availability, irrespective of whether all replicas are in sync or not.

Now the issue is that because you allowed writes to replicas that had already diverged, no causal ordering is retained, meaning the order of writes is lost.

Typically what such systems do is expose this causal ordering violation to the client when it reads (once all replicas have come back up after recovering from failure) and place the reconciliation responsibility on the client. Vector clocks are used to detect this causal ordering violation. I'm aware of 2 systems (not exactly file systems) that incorporate vector clocks for this detection: Riak and Amazon Dynamo.

Let me give a simple example of how this comes into play:

Assume client C1 creates a new object D1 and it goes to replica Sx.

Let's call the nodes that hold the replicas Sx, Sy, and Sz.

The vector clock is (Sx, 1).

C1 updates the object; let's call this mutation D2. Let's even assume the write goes to the same replica Sx. The VC is (Sx, 2).

At this point, Sy and Sz are aware of these mutations D1 and D2. So they know D2 is the latest.

Let’s say client updates again. Call it mutation D3. This time write goes to Sy.

Its vector clock is [(Sx, 2), (Sy, 1)]. Now, due to communication failures, assume node Sz is not aware of the mutation D3.

Let's say client C2 updates again; call it mutation D4. This time the write goes to Sz. Its vector clock is [(Sx, 2), (Sz, 1)].

Now the important catch here is that the copies at Sy and Sz have diverged. Meaning the system has some parts which are unaware of D3 (Sz, Sx) and some parts which are unaware of D4 (Sy, Sx).

This is what violation of causal ordering means, as mentioned above. Let's assume some client C3 reads both D3 and D4. The context is now just a summary of the clocks and the detection of the violation. C3 gets the following VCs: [(Sx, 2), (Sy, 1)] and [(Sx, 2), (Sz, 1)], or simply put, [(Sx, 2), (Sy, 1), (Sz, 1)].

Client C3 performs the reconciliation. From the summary of VCs received, client C3 can assert that there is a causal ordering violation. Hence the reconciliation.

It can perform the mutation now. If Sx receives it, then the new VC at Sx is [(Sx, 3), (Sy, 1), (Sz, 1)].

If Sy happens to receive it, then the new VC at Sy is [(Sx, 2), (Sy, 2), (Sz, 1)].

To summarize, given 2 VCs Object1 [(a, id1), (b, id2), …] and Object2 [(a, idx), (b, idy), …], we say they are causally ordered (and hence the system knows the last overwrite) if the id of each marker in the first object is less than or equal to the id of the corresponding marker in the second object.
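
Under the representation used in this example, the check client C3 performs and the merged clock it writes back can be sketched like this (reusing the Sx/Sy/Sz names from above; illustrative only, not Riak's or Dynamo's actual code):

  # Sketch: detect the D3/D4 divergence and build the merged clock for the repaired write.
  def descends(a, b):
      # True if clock a has seen everything clock b has (a >= b element-wise).
      return all(a.get(node, 0) >= count for node, count in b.items())

  d3 = {"Sx": 2, "Sy": 1}
  d4 = {"Sx": 2, "Sz": 1}

  if not descends(d3, d4) and not descends(d4, d3):
      # Neither version descends from the other: the causal ordering was violated,
      # so the client must reconcile. The merged clock summarises both histories.
      merged = {n: max(d3.get(n, 0), d4.get(n, 0)) for n in sorted(set(d3) | set(d4))}
      print(merged)   # {'Sx': 2, 'Sy': 1, 'Sz': 1}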

Christopher Smith

Honestly, Henry's answer is quite good, and you could really just stop there. I'd like to add one additional point to the discussion about a timestamp based solution.

Using timestamps isn't necessarily bad iff you address three requirements/constraints that tend to be outside the default contract (which doesn't reflect pragmatic real world behaviour most of the time).

  • Your timestamps are monotonically increasing within a given database node/thread. This is not difficult to achieve as time for the most part behaves like this. In the cases where it does not, you can either take the node temporarily offline/read-only or simply have logic where next_timestamp = (current < last) ? ++last : current. As long as your timestamps are sufficiently granular, this is not a tough sell.
  • Should timestamps for two mutations precisely match, you have a heuristic that defines which one happened first (e.g. the operation that happened on the lowest node ID wins). Combined with the previous point, this gives you something akin to a Lamport Clock, but with much less overhead.
  • Most importantly: you loosen constraints such that for two mutations sent to two different nodes/threads which occur within the maximum clock skew inside the database, the order of the mutations is unknown. Essentially, the database gets to choose the order. (A sketch of this scheme follows the list.)
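
A sketch of that scheme (hypothetical helper names, not Cassandra's actual code): each mutation is stamped with a (timestamp, node id) pair, timestamps are forced to be monotonic per node, and ties fall to the lowest node id.

  # Sketch: last-write-wins with per-node monotonic timestamps and a node-id tiebreak.
  import time

  class NodeStamper:
      def __init__(self, node_id):
          self.node_id = node_id
          self.last = 0

      def next_stamp(self):
          current = time.time_ns()                       # sufficiently granular clock
          self.last = current if current > self.last else self.last + 1
          return (self.last, self.node_id)

  def winner(a, b):
      # a and b are (stamp, value) pairs; higher timestamp wins, lowest node id breaks ties.
      (ts_a, node_a), _ = a
      (ts_b, node_b), _ = b
      if ts_a != ts_b:
          return a if ts_a > ts_b else b
      return a if node_a < node_b else b

The per-node monotonic guard mirrors the next_timestamp logic from the first bullet, and the explicit tiebreak is the heuristic from the second.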


This is fairly close to the contract guarantees with Cassandra if you are using server-side timestamps. The one exception is that Cassandra's logic for resolving matching timestamps occurs at the field level instead of the operation level, which breaks the otherwise strong guarantees it makes about write atomicity. If, when confronted with a timestamp collision, it were to simply compare node IDs and declare in favour of the lowest (or highest), it could provide its atomicity guarantees within the bounds of "eventual consistency".

Such an approach is essentially a special case of a vector clock with a simplified heuristic. It reflects the practical realities of how most mutations are reflected and resolved in the real world, is resilient in the face of partitioning, and can be implemented quite efficiently.

Note that technically the first constraint isn't needed, but without it application developers would find the interface semantics significantly more confusing.

Nicolae Marasoiu

Timestamps are not reliable given the clock skew, timezone interpretation risks and granularity. Vector clocks are the most used way to manage a causal ordering of your events, which is a partial order (the events which do not have an order between them are called concurrent events). Thus it has the extra advantage of not putting excessive order between the events, and this creates opportunities in concurrent processing of those events for various purposes.

Now depending on your system, you may use normal timestamps, Lamport timestamps, vector clocks or other means and you can combine them, using vectors for some features which profit from distributed processing, and linear time for any other features if it makes more sense. There is always a tradeoff involved, between scalability, simplicity, feature richness, flexibility, and so on.

Yuan Xiaodan

Only a few databases use vector clocks, because of the CAP principle.

Any database which wants to ensure high availability has to sacrifice consistency. For example, Dynamo uses eventual consistency, which is much weaker than sequential consistency.

Dudez Mobi.vrsc

Distributed ledger technology (DLT) represents a paradigm shift in how data is stored, verified, and transferred, offering a stark contrast to traditional centralized databases. Understanding the fundamental principles of DLT and how it differs from centralized systems is key to appreciating its potential impact across various industries. Here's a breakdown of the core principles and differences:

Fundamental Principles of Distributed Ledger Technology (DLT)

  1. Decentralization: Unlike centralized databases managed by a single entity, DLT operates across a network of distributed nodes. Each node holds a copy of the ledger, ensuring no single point of control or failure. This decentralization enhances security and resilience against attacks or data corruption.
  2. Transparency and Immutability: Changes to a distributed ledger are visible to all participants and require consensus before they can be appended. Once recorded, the data is immutable; it cannot be altered or deleted, ensuring a tamper-evident record of all transactions.
  3. Consensus Mechanisms: DLT employs various consensus algorithms (e.g., Proof of Work, Proof of Stake) to validate transactions and achieve agreement among network participants on the ledger's state. This ensures trust in the data's accuracy and integrity without the need for a central authority.
  4. Smart Contracts: Some distributed ledgers (e.g., blockchain) support smart contracts—self-executing contracts with the terms of the agreement directly written into code. Smart contracts automate and enforce contractual agreements based on predefined rules, further reducing reliance on intermediaries.
  5. Security: The distributed nature of the ledger, combined with cryptographic techniques (like hashing and digital signatures), secures the data against unauthorized access and fraud. Even if one or more nodes are compromised, the network's overall integrity remains protected.

Differences from Traditional Centralized Databases

  1. Control and Ownership: Centralized databases are controlled by a single organization that manages access and maintains the database's integrity. In contrast, DLT is a collaborative system with no central control, where data integrity is maintained collectively by all participants.
  2. Data Integrity and Trust: In centralized systems, trust is placed in the managing organization to act honestly and competently. DLT, however, relies on its inherent design and consensus mechanisms to ensure data integrity, removing the need for trust in a single entity.
  3. Resilience and Security: Centralized databases present single points of failure, making them vulnerable to cyber-attacks, data breaches, and downtime. DLT's distributed architecture significantly enhances resilience, as compromising the system would require attacking a majority of the nodes simultaneously.
  4. Transparency: Centralized systems often operate as "black boxes," where data changes are not transparent to external parties. DLT offers greater transparency, as transactions are visible to all participants, fostering trust and collaboration.
  5. Intermediation: Traditional systems frequently rely on intermediaries for validation and settlement processes, which can introduce delays and additional costs. DLT reduces the need for intermediaries through smart contracts and direct peer-to-peer interactions, streamlining operations and potentially reducing costs.
  6. Accessibility and Participation: Centralized databases are accessible only to users granted permission by the controlling entity. Distributed ledgers, especially public ones, allow anyone to participate, contributing to a more open and inclusive system.

DLT is not a one-size-fits-all solution and may not replace centralized databases in all scenarios. However, for applications requiring high levels of trust, transparency, and security without reliance on a central authority, DLT offers compelling advantages. Its continued evolution and application across diverse fields highlight its potential to redefine how we think about and manage data in a digital world.

Aryna Khan

So, a distributed ledger is all about decentralization. It's like a cool gang of interconnected nodes, with no single boss calling the shots. Each node maintains a copy of the ledger, ensuring that no one entity has all the power. This decentralized approach promotes transparency and resilience, making it harder for anyone to manipulate the data. It's like a Democratic party where everyone has a say!

On the other hand, traditional centralized databases are quite the opposite. They're like the strict principal in charge, with full control over the data. Changes and access to the database are tightly managed by a central authority, which can introduce some vulnerabilities and single points of failure. Not so fun, right?

Now, let's talk about consensus mechanisms. Distributed ledgers use these fancy mechanisms to reach agreement among the nodes. It's like getting everyone on the same page, ensuring that all participants agree on the validity and order of transactions. This way, they maintain a single version of truth, which is pretty important, you know?

In contrast, centralized databases don't bother with consensus mechanisms. They don't need to, because the central authority calls the shots. Whatever they say goes, and everyone just has to follow along. Trust in centralized databases relies on the authority controlling the data, rather than on consensus among the participants. It's like a one-man show, really.

Now, let's talk about transparency and auditability. Distributed ledgers are like an open book, allowing participants to view and verify all the transactions recorded on the ledger. The data is usually immutable, meaning it can't be changed without everyone's agreement or permission. It's like a digital trail that can't be tampered with easily. Good for keeping things honest!

In comparison, traditional centralized databases might not offer the same level of transparency and auditability. The central authority can modify or delete data as they please, making it tricky to trace and verify the history of transactions. It's like playing a game of hide and seek with the data.

Security and integrity are crucial, too! Distributed ledgers use fancy cryptographic techniques to keep the data secure and intact. Each transaction is linked to previous ones using cryptography, creating an unbreakable chain of blocks. Plus, the distributed nature of the ledger adds an extra layer of security. It's like a fortress that hackers find hard to breach.

But with traditional centralized databases, security relies on the measures put in place by the central authority. If they're not careful, vulnerabilities or unauthorized access can put the data at risk. It's like relying on a guard to keep the keys safe, but we all know guards can have their off days.

Last but not least, let's talk trust and trustlessness. Distributed ledgers are all about building trust among participants. The consensus mechanism, cryptographic integrity, and transparent transactions create an environment where people can trust each other without relying on a central authority. It's like a big trust circle where everyone's got each other's backs.

In contrast, traditional centralized databases depend on trust in the central authority. Participants have to put their faith in that authority to maintain the accuracy and security of the data. It's like trusting the boss to always do the right thing, even when they might not deserve it.

So, my friends, the key difference between distributed ledgers and traditional centralized databases boils down to decentralization, consensus mechanisms, transparency, security, and trust models. Distributed ledgers offer a more inclusive, transparent, and secure way of managing data, while traditional centralized databases rely on a central authority for control and trust.

John Smart

(1) Large-scale in 2020 is tens of millions of queries per second. This scale can’t be met by a single central database server as one CPU isn’t able to keep up with this load.

(2) Having a single point of failure with only one central DB instance isn’t going to meet the service level committed to by large services. For example, how do you run a security patch on software and keep that central DB running? Organizations commit to 99.99% or more uptime, so that single instance isn’t going to cut it.

(3) Worldwide services must serve users in the physical world, so having servers in a central location presents problems like signal latency. Latency can significantly degrade performance and lead to stability issues. There are also risks when network regions go down. A distributed DB means locating data closer to the customers that use it. This is the principle of locality.

Scott McNulty

The short answer is that they do it different ways. And sometimes they aren’t as successful as they would like.

The slightly longer answer goes partway into the weeds. I don’t have time to write the long answer and you wouldn’t read it, I suspect.

This is one of the reasons for the NoSQL movement in databases, by the way. (keyword: cap theorem)

The key to the situation is that often they only care about the most recently written record. They can be that way because they ship the whole business record around instead of breaking it up into its components as is done in normalization.

The system has to handle the record shipping in such a way that if it gets shipped multiple times it won’t affect the outcome. (keyword: idempotent)

MongoDB simplifies this by having a single reference node called a primary. Writes are performed there and then the other nodes read from it, or from one of their neighbors. If there are too many records to hold in a single database, they "shard", which is partitioning the records across separate locations.

Cassandra asks every node that has a copy of the record for the most recent version in its possession and then offers up the most recent of those. It then writes that copy back, as the most current, to every node that holds a copy. This is complex, and that complexity leads to problems, but the benefits are high.
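
What that paragraph describes, asking every replica for its copy, taking the newest, and writing it back, can be sketched as follows (hypothetical Replica and Record types; this is not Cassandra's actual read-repair code):

  # Sketch of read-time reconciliation: pick the newest replica copy, then repair the rest.
  from collections import namedtuple

  Record = namedtuple("Record", ["value", "timestamp"])

  class Replica:
      def __init__(self):
          self.store = {}
      def get(self, key):
          return self.store.get(key, Record(None, 0))
      def put(self, key, record):
          self.store[key] = record

  def read_with_repair(replicas, key):
      copies = [(node, node.get(key)) for node in replicas]       # ask every replica
      newest = max((record for _, record in copies), key=lambda r: r.timestamp)
      for node, record in copies:
          if record.timestamp < newest.timestamp:
              node.put(key, newest)                               # repair the stale replica
      return newest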

The DNS system, that thing in your computer that transforms a web link into an IP address and gets you a web page is a database but does not get managed by a database management system. It assumes that it has a good record if the time to live on the record (keyword: TTL) hasn’t expired. When it expires, the local system asks for an updated record but only if someone requests that site again. (Simplified, but that’s the concept.)

In several cases, you might get stale data. This is called “eventual consistency” which is a bit of a misnomer because a high churn system is often only consistent at the record level.

If you think of some examples you can see how this works in the real world. A password change takes time to propagate. You are often told to wait 15 minutes before trying to use a new password.

A basic catalog put online would be something that could lag by quite a bit. This is the name and description of the item but not the inventory.

The issue is when there is a limited supply of something (like toilet paper in a pandemic) and there are multiple customers who want it. If it goes into a customer’s shopping cart, you don’t want to pull it out while they are shopping. But how long should a customer be able to keep something in their cart before you pull it out and sell it to someone who is ready to buy it? Two weeks? Ten minutes?

The non-pandemic example is the case of concert tickets. Lots of unique seats, different prices, and so on. A great teaching example.

The business decides a rule, the programming team (possibly the architects) figure out how to get as close to the rule as possible, and then marketing sets the expectations of the customer as well as they can. They pick a technology that best suits their needs for that.

I hope that clarifies things.

Ronald

DynamoDB Global Tables allows deploying a multi-region, multi-master DynamoDB replication solution. It is a fully-managed solution, where users need not write any custom code to make changes to data. DynamoDB automatically updates the data before replicating it across different regions.

Sourabh Sanghi

I am going to explain about both read and write consistencies. Since this question is also marked in the nosql category , I will answer the question from the perspective of nosql database Cassandra.

Cassandra falls under the AP part of CAP (C - Consistency, A - Availability, P - Partition tolerance), which means that Cassandra is not a strongly consistent database but an eventually consistent one. Having said that, there are ways in which consistency can be achieved in Cassandra.

Consistency in Cassandra is achieved by something called a quorum, which is a parameter that decides how many nodes of the Cassandra cluster a particular piece of data is read from or written to.

For read consistency, a quorum read internally uses timestamps to decide which data should be sent to the user in case there is an inconsistency in the data received from the various nodes.

When it comes to write consistency, the timestamp does not come into the picture, as the write acknowledgement is sent to the user only after the data is written/updated on the required number of replica nodes (as set by the write consistency level), which ensures consistency, versus an inconsistent distributed database where the write acknowledgement is sent to the user as soon as the data is written/updated on any one of the nodes of the cluster.

Ricky Nelson

I think it's a really simple question. Do you have relational data? If so, choose an RDBMS; if not, a key-value store. For example, are you going to have users? Are those users going to have scores, shopping carts, profiles, x, y, or z? Those are relationships best defined in an RDBMS. Do you have objects that don't really have a relationship that you just want really fast access to? Like maybe a messaging server; that's a good use of a key-value store. Also, I think there are cases where you would want to use the two together. For example, if you have a really complex query in your RDBMS, those can be expensive. You might consider caching the results of those queries in a key-value store.

Alex Genadinik

Here are two videos that document the difference. They are done by Ben Engber who is the CEO of Thumbtack technologies which is one of the leading companies in the NoSQL space.

Here is the video on document databases:

And here is the video on key-value stores:

Nikhil Kumar Singh

Distributed transactions are easier to implement on eventually consistent databases such as AWS DynamoDB, which supports single-operation, single-row ACID. More advanced options include strongly consistent databases like MongoDB that support fully distributed ACID. Other databases that support distributed transactions include Spanner, CockroachDB, TiDB, YugabyteDB, and Apache Cassandra.

Siddharth Teotia

As far as I understand, the reasons should ideally be the same as for “why use a distributed system”.

If there is an extremely powerful single machine having properties like:

  • Loads of memory.
  • Huge amount of reliable storage with super fast I/O.
  • Great processing speed and computing power (10s, 100s and may be thousands of cores).
  • Extremely reliable and fault tolerant. The machine should always be available; up and running with zero downtime.
  • High speed networking infrastructure connecting clients for low latency client server communication. The network never goes down.
  • And any other thing that will add to the computing power of this machine.

If we really have a system like this, we don’t need to deploy a software in a distributed system. If ages down the line, such a system is developed then there won’t be any need to design and develop distributed systems. A single computer will solve all the problems like a magic wand.

The problem is such a system doesn’t exist, and that is why we run into design problems like availability, fault tolerance, throughput, latency, scalability, reliability, network partitions, data consistency, data distribution, replication and millions of other issues that might turn out to be a big obstacle in the success of business. Until the magical system is developed, these problems can’t be solved on a single machine.

We can start with a single machine, and scale it vertically by adding more resources (computing power, memory, hardware etc), but it is very likely that at some point vertical scaling by adding powerful (and also expensive) software/hardware will not turn out to be cost effective.

Moreover, a single machine is very likely going to be a bottleneck in throughput and scalability. It will of course be a single point of failure, and thus the system won’t really be fault tolerant.

Thus the software is developed in a way such that it spans multiple nodes (cheap commodity hardware with reasonable resources). In other words, we scale out horizontally and develop a distributed system. Here we do things like: replicating data to multiple nodes for greater availability, more nodes also means more computing power, no single point of failure, greater availability etc etc. We then have to think about maintaining data consistency or trading it off with some other systemic property.

The point I am trying to make is that we develop a distributed system to address the goals that can’t be achieved with a single computer (at least in today’s world). But the thing is that such problems become more complex and challenging to solve in a distributed system, and that is why we need to study distributed algorithms, well known problems, solutions that are typically considered when architecting any distributed system.

There shouldn't be such a thing as "here are the reasons for developing a distributed file system" and "here are the reasons for developing a distributed database".

Of course file system and database solve different purpose and have their own set of features and uses, but the fundamental reasons for developing them in a distributed manner are same.

JJS

It's been a while since I read the Dynamo paper but I'd say there are two ways for a Dynamo-style DB to handle this:

1) serialize on the vnode level
2) use client IDs rather than vnode IDs in the vector clocks

Riak actually lets you choose:
Vector Clocks

Marin Dimitrov

check out OpenTSDB [1][2]

[1] http://opentsdb.net/
[2] http://assets.en.oreilly.com/1/event/55/OpenTSDB_%20A%20Scalable,%20Distributed%20Time%20Series%20Database%20Presentation.pdf

Aman Nagwanshi

Transactions in distributed ledger systems are processed by a consensus protocol. This protocol ensures that every transaction is verified and synchronized across every node on the network to create an immutable, valid state. Transactions in a regular database system are processed locally within the database, and are not verified or synchronized to any other nodes. Additionally, by using a consensus protocol, distributed ledgers can be more secure than regular database systems as the decentralized validation helps ensure that data is not modified or altered throughout the ledger.

Andrii Vozniuk

You may want to have a look at SciDB [1]

[1] http://www.scidb.org/

Steph G

I would wager, very safely, that people who deal with large-scale applications prefer distributed databases, as these suit their needs better than centralized databases would!

Quora User

From 14th June’17, when you create a new DynamoDB table using the AWS Management Console, the table will have Auto Scaling enabled by default. DynamoDB Auto Scaling automatically adjusts read and write throughput capacity, in response to dynamically changing request volumes, with zero downtime.

Previously, you had to manually provision read and write capacity based on anticipated application demands. If estimated incorrectly, this could result in underprovisioning or overprovisioning capacity. Underprovisioning could slow down application performance, and overprovisioning could result in underutilized resources and higher costs. With DynamoDB Auto Scaling, you simply set your desired throughput utilization target, minimum and maximum limits, and Auto Scaling takes care of the rest.

DynamoDB Auto Scaling works with Amazon CloudWatch to continuously monitor actual throughput consumption, and scales capacity up or down automatically, when actual utilization deviates from your target. Auto Scaling can be enabled for new and existing tables, and global secondary indexes. You can enable Auto Scaling with just a few clicks in the AWS Management Console, where you'll also have full visibility into scaling activities. You can also manage DynamoDB Auto Scaling programmatically, using the AWS Command Line Interface and the AWS Software Development Kits.

There is no additional cost to use DynamoDB Auto Scaling, beyond what you already pay for DynamoDB and CloudWatch alarms. DynamoDB Auto Scaling is available in all AWS regions, effective immediately.
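As a rough illustration of the programmatic route, here is a hedged boto3 sketch (DynamoDB Auto Scaling is configured through the Application Auto Scaling API); the table name, capacity limits, and target value below are placeholders, not recommendations.

```python
# Sketch: enabling auto scaling for a table's read capacity via the
# Application Auto Scaling API (boto3). Table name and limits are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/MyTable",                      # hypothetical table
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)

# Attach a target-tracking policy: keep consumed/provisioned reads near 70%.
autoscaling.put_scaling_policy(
    PolicyName="MyTableReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/MyTable",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```

The same pair of calls with the write-capacity scalable dimension would cover writes.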

Profile photo for Kenneth Love

I really only have experience with Redis (on the key-value side) and MongoDB (on the document side), so some of this may not apply across the board.

Key-value DBs are typically simpler, both in terms of API and content. You would use one when you need small snippets of data handily available, like a hit counter, session data, or votes.

Document databases, on the obvious other hand, are more complex. Since documents have little or no predictable structure, API calls tend to be longer and more detailed than they would be on a key-value database. Documents can obviously be used for complex bits of data, too. For example, documents could be custom form submissions or templates.
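To illustrate that contrast, here is a small hedged sketch assuming local Redis and MongoDB servers and the redis and pymongo client libraries; the key names and document fields are made up.

```python
# Sketch of the contrast, assuming local Redis and MongoDB servers and the
# redis / pymongo client libraries. Key names and fields are placeholders.
import redis
from pymongo import MongoClient

# Key-value: a hit counter and session data are tiny, flat values.
r = redis.Redis()
r.incr("page:home:hits")                       # atomic counter
r.setex("session:abc123", 3600, "user=42")     # small value with a TTL

# Document: a custom form submission has nested, unpredictable structure.
db = MongoClient().myapp
db.submissions.insert_one({
    "form": "contact",
    "fields": {"name": "Ada", "topic": "billing"},
    "tags": ["urgent"],
})
# Queries can reach into the document structure:
doc = db.submissions.find_one({"fields.topic": "billing"})
```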

Profile photo for Gaive Gandhi

Using the AWS Web Console, you can disable auto scaling for a DynamoDB table.

Go to DynamoDB, choose the table, and open the Capacity section's Auto Scaling sub-section. Uncheck Read capacity, Write capacity, or both to disable auto scaling.
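If you prefer to script it, a hedged boto3 sketch of the equivalent step is to deregister the table's scalable targets from Application Auto Scaling; the table name here is a placeholder.

```python
# Sketch: the programmatic equivalent of unchecking both boxes, assuming the
# table was registered with Application Auto Scaling. Table name is a placeholder.
import boto3

autoscaling = boto3.client("application-autoscaling")

for dimension in ("dynamodb:table:ReadCapacityUnits",
                  "dynamodb:table:WriteCapacityUnits"):
    autoscaling.deregister_scalable_target(
        ServiceNamespace="dynamodb",
        ResourceId="table/MyTable",        # hypothetical table
        ScalableDimension=dimension,
    )
```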

Profile photo for Richard Lewis

The intended order is defined by the primary key. However, the sequenced delivery order of transactions is defined by timestamps. In a distributed database like Cassandra, each field (cell) carries its own write timestamp, which is used to reconcile conflicting writes across replicas (last write wins) and so helps validate consistency as well.
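Purely as a toy illustration (not Cassandra's actual code), the per-cell timestamp reconciliation idea looks roughly like this in Python:

```python
# Toy illustration (not Cassandra's implementation): each cell carries its own
# write timestamp, and replicas reconcile a column by keeping the latest write.
from collections import namedtuple

Cell = namedtuple("Cell", ["value", "timestamp_micros"])

def reconcile(replica_a: Cell, replica_b: Cell) -> Cell:
    """Last-write-wins: the cell with the higher timestamp survives."""
    return replica_a if replica_a.timestamp_micros >= replica_b.timestamp_micros else replica_b

# Two replicas hold different versions of the same column:
a = Cell("alice@old.example", 1_700_000_000_000_000)
b = Cell("alice@new.example", 1_700_000_000_500_000)
assert reconcile(a, b).value == "alice@new.example"
```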

Profile photo for Magnus

Latency is also a challenge, especially if the “ledger” must be updated synchronously.

Profile photo for PSR

Amazon DynamoDB is a fully managed NoSQL database. When you create a DynamoDB table, you provision the desired amount of request capacity based on the expected amount of read and write traffic and the average size of each item. This provisioned capacity can be changed as application requirements change.

You can auto scale Amazon DynamoDB using an open source tool called “Dynamic DynamoDB”, developed by independent developer Sebastian Dahlgren. The tool is flexible and highly configurable, and it manages the process of scaling the provisioned throughput of DynamoDB tables. You can use Dynamic DynamoDB to scale your tables up and down automatically, and you can restrict scaling activities to certain time slots. You can scale read and write throughput capacity independently using upper and lower thresholds, and you can set a minimum and maximum for each value.

Update: AWS recently introduced Auto Scaling for DynamoDB.

Profile photo for Andrew

To me, document-oriented databases are key-value stores with specific metadata and API implementations around them that make using key-value stores theoretically easier for a particular use case.

Take Mongo: you have a pre-defined hierarchy of keys and values where the root level must be databases.

Those have children, which are predefined key-value pairs whose values must be collections.

Those have children, which are predefined key-value pairs whose values must be documents.

Documents have predefined children, which must be BSON (i.e., extended JSON) objects.

JSON is nothing but a key-value structure with a particular standard format.

Mongo uses its metadata about these key-value stores to provide predictable APIs. The most novel thing about it is how it distributes its data; the fixed key-value store implementation itself is not interesting.

Replicating the actual data portion of this simple key-value store, with its fixed four levels, would be relatively trivial on any platform that can handle key-value stores. In Postgres, with the built-in JSON data type, I could have a generic key-value store model working in a weekend, with APIs that could functionally replace most Mongo applications minus the distributed portion. Give me a month and I would have something far more versatile and extensible than JSON as the only option for document children. Doing SQL across collections would also be trivial, which is something Mongo doesn't do well without a map-reduce job.
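A minimal sketch of that "generic document store in Postgres" idea, assuming psycopg2 and the JSONB type; the connection string, table, and column names are placeholders, not a finished design.

```python
# Sketch of the idea above: a generic document store on top of Postgres's
# JSONB type, using psycopg2. The DSN, table, and column names are placeholders.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=myapp")        # hypothetical connection string
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        collection text  NOT NULL,
        key        text  NOT NULL,
        doc        jsonb NOT NULL,
        PRIMARY KEY (collection, key)
    )
""")

# "Insert a document into a collection" (upsert on the composite key):
cur.execute(
    "INSERT INTO documents (collection, key, doc) VALUES (%s, %s, %s) "
    "ON CONFLICT (collection, key) DO UPDATE SET doc = EXCLUDED.doc",
    ("users", "42", Json({"name": "Ada", "roles": ["admin"]})),
)

# Query into the document body, with plain SQL available across collections:
cur.execute(
    "SELECT key FROM documents WHERE collection = %s AND doc->>'name' = %s",
    ("users", "Ada"),
)
print(cur.fetchall())
conn.commit()
```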

The distributed portion of Mongo is the non-trivial work that is what is p...

Profile photo for David Brower

Kind of. What time is it in a distributed system? That is a harder question to answer than it may seem. The usual solution, a "Lamport clock", carries some complexity and side effects of its own that can cause scalability issues.
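For reference, the core Lamport clock rules (increment on each local event, jump past any timestamp seen on a received message) fit in a few lines of Python; this is a minimal sketch, not tied to any particular database.

```python
# Minimal Lamport clock: a counter per process, bumped on every local event
# and fast-forwarded past any timestamp seen on an incoming message.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event (including sending a message)."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """On receive, jump past whichever clock is ahead."""
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
t_send = p.tick()          # p sends a message stamped 1
q.receive(t_send)          # q's clock becomes 2, preserving "send before receive"
assert q.time > t_send
```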

Profile photo for Matthew Cooke

Assuming equal knowledge across all the datastores:
If you have several classes of things that you are storing data on, those things are clearly related to each other (e.g., students, teachers, classes, subjects, fees, etc.), and you are likely to want to query the data in many different ways in the future, then a relational database is a good option.

If the frequently accessed data is likely to be bigger than you can fit in a machine with a large amount of RAM, or worse, if even the indexes can't fit in a machine with a large amount of RAM, and particularly if the data you are storing is not relational (perhaps it is hierarchical or relatively flat), then it may be worth considering the NoSQL datastores you mention. Some more information on their relative merits is here: Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison

In practice you should take into account how much experience people have with the different data stores and who might have to maintain it in the future. Relational data stores do have many advantages over the NoSQL data stores for the majority of everyday use cases I can think of; it's important, though, that the data is modelled sensibly.

A number of open source communities are now using primary and replica. Drupal, for instance, made the change in the issue: Replace "master/slave" terminology with "primary/replica"

Profile photo for Ben Darfler

I would imagine it's because they haven't created index rebuilding logic yet. It's not an easy problem to solve, and so, for now, it's not supported. I'm sure it is on their roadmap, though, as they are always improving their services.

Profile photo for Jean-Christophe Huc

The short answer is that there are a lot of choices. Well, as of April 9th, 2016, there were… well… you can count them here! You will notice if you scroll down to the comment section there that even then, which is ancient history in Internet time, people were shouting out to include others. Compare that list to this one to see how much has changed in just months.

This is a top ten list, but it isn’t an ordered list. It would be interesting to get some comments below to see how readers think they should be ordered. It would be even more interesting to see if anyone agrees with every entry on this list, and as Spock would say, “Fascinating,” if two people agree on the order…

InfluxDB scores right up near the very top on several software blogs, making it into the top ten multiple times.

Druid scores within the top 10 time series databases, again on multiple lists.

… read more here!

Profile photo for Jeff Nelson

If the "master" node doesn't actually do any work then I prefer to refer to it as the coordinator node and the "slaves" as worker nodes.

If the "master" node does do some work and merely uses the "slaves" as a backup or complement or some sort, then I prefer primary and secondary nodes.

To begin with, there are not a lot of differences between key-value and document-type NoSQL databases. Key-value DBs are considered for more ‘primitive’, data-type-agnostic use cases, i.e., it really does not matter to your application whether you store a blob, XML, JSON, or any other data type. Document-type databases, on the other hand, offer specialized capabilities built around a pre-understanding of the data types in the value field. For example, MongoDB/MapR-DB supports JSON and offers auto-indexing of JSON fields.

AWS DynamoDB falls more into the latter type. There are additional capabilities offered by AWS DynamoDB, e.g., automatic setup of secondary indexes built around an understanding of the data types, that place it more in the ‘document-type NoSQL’ category.

Profile photo for Randall R Schulz

We would gladly, happily use relational database management systems, today synonymous with those that support SQL, were it possible to scale them to the size we need.

We use distributed databases ’cause we have to, not because we want to.

Distributed databases are in every way inferior to relational database systems. (*)

Well, except in scalability.

(*) If you think not having to create a schema is an advantage, I deem you a hobbyist programmer. “Constraints liberate. Liberties constrain.”
