When To Use Hadoop vs In-Memory vs MPP


With so many options available, choosing among big data tools can be confusing. This post explains the benefits and uses of each option to help guide choices for your next big data project.

Spark, Storm, or Kafka

The in-memory option is a data grid that provides real-time data access to applications that are critical to the revenue stream of the business.

MangoDB

The MPP option's massively parallel processing style of data management makes it an excellent choice for analytics.

Hadoop is your research and development arm. It is the landing spot for all data and, backed by a SQL query engine, lets you explore all of that data to identify new insights and opportunities that you can later operationalize with MPP or in-memory.

Questions you need to consider when choosing:
  • When do I need it? Now? Later?
  • What do I want to do with it? Single-event processing (which includes some analytics)? Transactions? Exploratory analytics?
  • How will I query and search? Structured, regular queries? Using an alternate index (other sources)? Unstructured or unknown?
  • How do I need to store it? Temporarily? Persistently, though I'm not required to? Persistently, because I'm required to?
  • Where is it coming from? Events or a stream? A file? ETL?
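
To make the questions above a little more concrete, here is a minimal sketch of how the first three answers might be turned into a starting suggestion. The function name, the answer labels, and the rules themselves are assumptions for illustration only; they loosely follow the descriptions above (in-memory for real-time access, MPP for analytics, Hadoop for exploration) and are not a definitive decision procedure.

```python
def suggest_platform(needed_now: bool, purpose: str, query_style: str) -> str:
    """Hypothetical rule of thumb that maps answers to a starting option.

    needed_now:  True if the data is needed in real time, False if later.
    purpose:     "event", "transaction", or "exploratory" (assumed labels).
    query_style: "structured", "alternate_index", or "unknown" (assumed labels).
    """
    # Real-time event or transaction work points toward in-memory.
    if needed_now and purpose in ("event", "transaction"):
        return "in-memory"
    # Exploratory work, or data whose shape is not yet known, points toward Hadoop.
    if purpose == "exploratory" or query_style == "unknown":
        return "Hadoop"
    # Otherwise assume regular, structured analytics and suggest MPP.
    return "MPP"


if __name__ == "__main__":
    print(suggest_platform(True, "transaction", "structured"))
    print(suggest_platform(False, "exploratory", "unknown"))
    print(suggest_platform(False, "event", "structured"))
```

The storage and source questions are left out to keep the sketch short. In practice they would narrow the choice further, and many projects end up combining more than one option.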




