Most companies realize they need to become more data-driven in order
to make better decisions and identify new opportunities. Many also
recognize the need for new tools to analyze their data, much of it
stored in operational systems.
At the same time, for their
operational systems, a growing number of companies have adopted NoSQL
databases, the most popular of which is the document database MongoDB.
Unfortunately, document databases are nobody’s first choice for
analytics, so people end up using ETL to move data from MongoDB to an
RDBMS or Hadoop for analysis. ETL processing adds latency, however --
perhaps too much latency if you want your business to be "data driven."
Now
a new open source analytics tool, SlamData, has arrived to operate
directly on MongoDB data. I spoke with Jeff Carr, the CEO of SlamData,
about his new offering. He contrasts his open source solution with that
of Pentaho, a traditional BI tool that supports MongoDB -- but does so
by transforming the document database into an RDBMS. According to Carr,
"Pentaho is built for relational data to make the data look like tables.
That is not an easy thing to do."
I asked Carr about his target
market. At the moment, because the tool is still in its early stages,
SlamData's user base mostly consists of developers. As the tool matures,
he hopes for adoption by business analysts and/or nondevelopers who at
least know SQL.
The
latest version of SlamData allows SQL-fluent users to gather results
based on queries of MongoDB collections of documents, which you manage
through a GUI that uses a simple notebook metaphor. The front end is
browser-based, so there's no annoying client-side install. Already, in
the unreleased GitHub version, SlamData has added charting features to
the mix.
In
order to deal with the difference between documents and tables,
SlamData extends SQL with an XPath-like notation. Rather than querying
from a table name (or collection name), you might query FROM person[*].address[*].city.
This should represent a short learning curve for SQL-loving data
analysts or power business users, while being inconsequential for
developers.
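
To make the path notation concrete, here is a minimal Python sketch of what a path such as person[*].address[*].city could mean when applied to nested documents. It is not SlamData's implementation. The helper names (parse_path, expand_path) and the sample data are hypothetical, and the collection is modeled as a plain list stored under the key "person".

```python
from typing import Any, Iterator


def parse_path(path: str) -> list[str]:
    """Split "person[*].address[*].city" into ["person", "*", "address", "*", "city"]."""
    steps: list[str] = []
    for part in path.split("."):
        if part.endswith("[*]"):
            steps.extend([part[:-3], "*"])
        else:
            steps.append(part)
    return steps


def expand_path(value: Any, steps: list[str]) -> Iterator[Any]:
    """Yield every value reached by following steps through nested dicts and lists.

    A "*" step fans out over each element of a list; any other step is a dict key.
    Paths that do not match the data simply yield nothing.
    """
    if not steps:
        yield value
        return
    head, rest = steps[0], steps[1:]
    if head == "*":
        if isinstance(value, list):
            for item in value:
                yield from expand_path(item, rest)
    elif isinstance(value, dict) and head in value:
        yield from expand_path(value[head], rest)


# Hypothetical sample data: a "person" collection whose documents embed
# a list of addresses.
data = {
    "person": [
        {"name": "Ann", "address": [{"city": "Boulder"}, {"city": "Denver"}]},
        {"name": "Raj", "address": [{"city": "Austin"}]},
    ]
}

cities = list(expand_path(data, parse_path("person[*].address[*].city")))
print(cities)
```

The point of the sketch is that each [*] behaves like an implicit "for each element" over an embedded array, which is the part plain SQL has no table-shaped way to express.
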
The power of SlamData resides in its back-end
SlamEngine, which implements a multidimensional relational algorithm and
works on the data in place, without reshaping it into tables first. The JVM
(Scala) back end supplies a REST interface, which allows developers to
access SlamData’s algorithm for their own uses.
There’s overlap
between the back end of the project and Apache Drill. According to Carr,
Drill is Hadoop-based and has only fledgling support for MongoDB. He
also stated that it has scant commercial support (only MapR) and is “not
very active if you look at the commit logs.” (I looked at the commit
logs and Drill seems pretty active to me: 17 contributors made more than 100 commits last month.)
Both the SlamData front end and SlamEngine
are on GitHub and offered under the GNU Affero GPL. For now, this is
all free. The company plans to pursue a hybrid model, much like a coffee
shop's free Wi-Fi, and sell “enterprise grade features” like LDAP integration. The
company will also sell support for both the open source and proprietary
versions. New releases of the open source version will add charting and
support for other JSON-speaking NoSQL databases such as Cassandra.
Clearly,
there’s room for NoSQL-specific analytics tools. Consider the effort of
getting MongoDB data into Hadoop with the Mongo connector or into Hive
in order to query it with a JDBC/SQL-speaking BI tool. There’s all that
ETL involved, mapping from documents to tables. With the likes of
SlamData, you could turn your analysts loose on the production database (bad
idea) or create a replica -- which, provided the infrastructure is
available, is nearly a push-button affair in MongoDB’s management tool.
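
For a sense of the document-to-table mapping that an ETL step has to perform, here is a rough Python sketch. It assumes a hypothetical person document with an embedded list of addresses; the field names and the flatten_to_rows helper are illustrative only and do not come from any particular ETL tool.

```python
from typing import Any


def flatten_to_rows(person: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one nested document into flat rows, one per embedded address.

    Parent fields are repeated on every row, and fields missing from an
    address become None, much as they would become NULL in a table.
    """
    rows = []
    for address in person.get("address", []):
        rows.append(
            {
                "person_id": person["_id"],
                "name": person["name"],
                "city": address.get("city"),
                "zip": address.get("zip"),
            }
        )
    return rows


# Hypothetical documents: one with two addresses, one with none.
ann = {
    "_id": 1,
    "name": "Ann",
    "address": [{"city": "Boulder", "zip": "80301"}, {"city": "Denver"}],
}
raj = {"_id": 2, "name": "Raj"}

ann_rows = flatten_to_rows(ann)
raj_rows = flatten_to_rows(raj)
print(ann_rows)
print(raj_rows)
```

Even in this toy case the mapping forces decisions: parent fields get duplicated across rows, missing fields turn into nulls, and a person with no addresses disappears from the output unless the job handles that case explicitly.
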
SlamData
is developing rapidly. It’s probably not ready for analysts to use, but
it might be an interesting tool for developers who want to show what
can be done with NoSQL data. It’s also further evidence of the new
maturity of the NoSQL space, particularly MongoDB. As Carr puts it:
“Data guys have done a great job of innovating, but the analytics people
are behind.” Maybe SlamData will evolve enough to be a first step in
catching up.