BLU Discuss list archive



NoSQL vs SQL



On Tue, Dec 7, 2010 at 9:08 AM, Mark Woodward <markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org> wrote:

> On 12/07/2010 07:29 AM, Dan Ritter wrote:
> > On Tue, Dec 07, 2010 at 12:07:13AM -0500, Mark Woodward wrote:
> >
> >> The most important aspect of these scalability discussions, and one
> >> which I frequently find lacking in the "NoSQL" camp, is a critique of
> >> just how many full-scale transactions a SQL database can handle.
> >>
> > ...
> >
> >
> >> average head positioning time. This gives us a worst-case average of
> >> 77 arbitrary write operations per second.
> >>
> >> The actual IOPS figure is higher given seek times these days, but
> >> let's use the worst-case scenario. (We could use a RAID system and
> >> multiply the performance.)
> >>
> >> In an ACID database configuration, assume half the maximum, i.e. 36
> >> writes per second on the database. That's 36 transactions (read/write)
> >> per second. Assuming one transaction per page view, that amounts to
> >> about 90 million page views a month (on average) as a sustainable number.
> >>
> >> So, a good SQL database with no scaling tricks, on a bog-standard
> >> modern PC-based server, will serve a web site as busy as all but the
> >> very most popular sites on the web. Someone, please tell me: what are
> >> the NoSQL guys going on about with regard to scalability?
> >>
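
A quick back-of-the-envelope check of the arithmetic quoted above, taking its
own figures at face value (the 36 writes/second number is the post's assumption,
not a measurement):

    writes_per_sec = 36                      # roughly half of the 77 worst-case
                                             # writes/sec, to allow ACID overhead
    seconds_per_month = 60 * 60 * 24 * 30
    page_views = writes_per_sec * seconds_per_month  # one write per page view
    print(page_views)                        # 93,312,000 -- "about 90 million"
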
> > There are basically no sites that use a database and only want one record
> > view/update per page view.
> >
> For the sake of argument, "one transaction" is shorthand for one
> write/update operation. I'm assuming a database's page cache and/or
> memcache (or something like it) will cope with repeated session-based
> data. That's a design detail, but your point is taken.
> > Let's say we have, oh, a dating site. The login page may only need one DB
> > query. How about the second page, where they will helpfully show you the
> > names, photos and blurbs of ten or so eligible people, plus a check on
> > how many messages you've received, plus another on how many you've sent,
> > and... you can get into 50 queries pretty quickly. Add some billing and
> > logging records and you could have a couple of updates in there, too.
> >
> Again, not to be pedantic, but common queries where nothing is modified
> are a lot less expensive. Obviously Facebook is a completely different
> animal; I mean, jeez, they practically have the population of the earth
> on-line. That's not what I'm talking about; a "dating site" is an
> interesting case.
>
> Let's break it down: how many members? How many members on-line at any
> time? What are the usage patterns? Let's assume you use PHP with some
> sort of distributed session cache mechanism. I'm using a system I
> designed for DMN a few years back as the basis of my response.
>
> A login is, yes, one transaction: two disk I/O operations, a retrieval
> of data and a storage of the change of state. The login also brings the
> user's information into the page cache of the database, so most of the
> subsequent select operations that deal directly with user info will hit
> the page cache and avoid the disk. The first page view will always be
> the hardest, as it brings the most data into the various caching
> subsystems (PHP session cache, any memcache object definitions, and, of
> course, database disk page buffers). Subsequent page views, as the user
> uses the system, tend to hit the page cache; only alterations of data,
> and retrievals of data not currently in the page cache, force I/O
> operations.
>
> The efficacy of the page cache depends directly on the total number of
> members and on the number of members on-line and active at any one
> time. You aren't going to have a thousand users synchronizing their
> watches and pressing a button at the exact same time. So, unless your
> load forces the database to read disk pages for every single select
> operation, because the variability of the requests exceeds what your
> page cache can hold, user actions (multiple disk I/Os per page) can be
> averaged out over a number of page views.
> > Now, a clever system will aggregate the most often requested data into
> > a single row, when it can... that helps with the second page, but not
> > with profile pages or a mail-analogue system. And a clever system will
> > cache as much frequently-requested data as possible.
> >
> Many databases do this transparently, but yes, systems like memcached
> can be very helpful.
> > But then there's Facebook, and anything even vaguely like it. The obvious
> > first thing to do is to optimize time at the cost of storage: store a
> > copy of the new message with each of the recipients, so that when they
> > read their update page, that takes one transaction instead of as many
> > as they have friends. And pretty quickly you run out of disk space. So
> > you decide that you have a read-mostly environment, and have many, many
> > read-only slave databases to handle lookups, and most of those lookups
> > don't need anything near the power of SQL -- really, you could get
> > away with a key-value lookup, if only it were FAST -- and before you
> > know it, you've re-invented MongoDB.
> >
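
The fan-out-on-write idea described above, as a minimal Python sketch (the
dict-backed store and function names are illustrative assumptions, not any
particular NoSQL product's API):

    feeds = {}   # user_id -> list of (author, message)

    def post_message(author, friends, message):
        for friend in friends:                 # one stored copy per recipient
            feeds.setdefault(friend, []).append((author, message))

    def read_feed(user_id):
        return feeds.get(user_id, [])          # one key-value lookup, no joins
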
> As the video I posted shows, Facebook *has* to implement these things,
> and for good reason: the scale at which they operate is HUGE.
> Unbelievably huge. But it took them several years to get where they are
> and, like I was saying, all but the busiest web sites would be well
> handled by a standard database.
>
> Yes, if you are Facebook, you have a different set of problems. Hell, if
> you are L.L. Bean you have a different set of problems. However, 99% of
> the web sites out there will never have those problems. As the saying
> goes, "Let's hope we have to address these problems later."
>
> > By way of contrast, I know there are sites which do real work on
> > data people care about in sophisticated ways -- and they
> > generally use full SQL databases, because that's what they need.
> >
> > SQL and ACID solve some tough problems. Key/value systems solve
> > some speed issues, as do caches. There are lots of solutions
> > because there are lots of problems.
> >
>
> My issue with the NoSQL mentality is the idea of "no." You can't have a
> conversation when it starts with "no." Then there is always the refrain
> "Use the right tool for the job," which is a euphemism for limiting your
> options and is usually a defense of the indefensible. Well, more often
> than not you can't really know what the right tool is, or the level of
> sophistication required, at the beginning of a project, especially with
> Agile-style development. So you start down the road myopically focusing
> on one aspect of the project, "scaling," whilst completely ignoring vast
> parts of the project, like: what are you going to do with the data once
> you get it? What MIGHT you WANT to do with the data? Sure, you're
> covered for the next 5 years of (wishful thinking) scaling, at the
> expense of access to your data.
>
> Almost all of the NoSQL solutions are dreadfully lacking in any sort of
> capability to let you capitalize on your data beyond mere regurgitation
> to a web page. Your data will never become "information," and
> information is where the money comes from.
>
> My general rule of thumb is, "it is better to have it and not need it
> than to need it and not have it." So, I would say, the question
> shouldn't be "Do I need a SQL database?"; it should be "Are there areas
> where a SQL database doesn't work?"
>
> Mark.
> _______________________________________________
> Discuss mailing list
> Discuss-mNDKBlG2WHs at public.gmane.org
> http://lists.blu.org/mailman/listinfo/discuss
>

I think the NoSQL term "eventually consistent" sums up why it should not be
used for all situations. They say that eventually all data will be
committed. But suppose your bank were running a NoSQL database: you just
deposited your money and expected the transaction to show up when you looked
at your account from your phone, but it wasn't there, and the bank told you
your account would be "eventually consistent" with what they were telling
you. Would you be happy?
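
A toy illustration of that read-after-write problem with eventual consistency
(hypothetical in-memory stores standing in for a primary and a lagging replica,
not any real bank or database):

    primary = {"balance": 100}
    replica = dict(primary)       # the replica lags behind the primary

    primary["balance"] += 50      # your deposit commits on the primary
    print(replica["balance"])     # 100 -- the phone app reads a replica that
                                  # hasn't caught up, so the deposit is missing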

How about an environment where you have to make sure that transaction 1 is
processed before transaction 2? From what I've read about NoSQL, those
systems cannot guarantee that.

Sure, for sites like Facebook and Twitter, where you have status updates and
profiles, NoSQL is the perfect fit.

-matt





