Tuesday, June 14, 2022

Real Time Big Data Analytics Architecture

Credits https://www.linkedin.com/in/scgupta 

Lambda Architecture is a popular data processing architecture that combines batch and stream processing: the batch path provides throughput, the streaming path provides low latency, and you balance the two against your requirements.
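
To make the batch-plus-stream idea concrete, here is a minimal in-memory sketch in Python. It assumes a toy page-view counting workload. The names `build_batch_view`, `SpeedLayer`, and `query` are hypothetical and exist only for illustration; a real deployment would use a batch engine and a stream processor rather than plain Python objects.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute counts from the full dataset (high throughput, high latency)."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def ingest(self, event):
        # Each event is applied immediately, so it is visible with low latency.
        self.realtime_view[event["page"]] += 1

    def reset(self):
        # Called once a new batch view has absorbed these events.
        self.realtime_view.clear()


def query(batch_view, realtime_view, page):
    """Serving layer: merge the batch view and the real-time view at query time."""
    return batch_view[page] + realtime_view[page]


# Hypothetical data: three events already covered by the last batch run.
master_dataset = [{"page": "home"}, {"page": "home"}, {"page": "pricing"}]
batch_view = build_batch_view(master_dataset)

# One new event arrives after the batch run and only the speed layer sees it.
speed = SpeedLayer()
speed.ingest({"page": "home"})

home_views = query(batch_view, speed.realtime_view, "home")
```

The query merges both views, so recent events are counted without waiting for the next batch run; how often the batch view is rebuilt is one of the knobs for trading latency against cost.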

Just like the CAP theorem for distributed systems, big data analytics has a similar trade-off. You can have only two of the following: high throughput, low latency, and low cost.
Leaders often ask for real-time analytics. You shouldn't comply blindly. Ask them what they would do differently if they had real-time analytics, and which of their decision-making workflows would change if analytics moved from, say, daily to real-time.
Then ask them to quantify the business value of those benefits. You should implement real-time analytics only if that value is substantially higher than the development and operating (OpEx) costs.
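
The value-versus-cost check can be written down as simple arithmetic. The sketch below is only an illustration: the function name `should_build_realtime`, the three-year horizon, and the `margin` multiplier (one way to express "substantially higher") are assumptions, not figures from the original post.

```python
def should_build_realtime(annual_value, dev_cost, annual_opex, years=3, margin=2.0):
    """Return True if the quantified value clearly exceeds dev plus OpEx cost.

    `years` and `margin` are hypothetical knobs: the horizon over which value
    and OpEx are counted, and how many times larger than the cost the value
    must be before the project counts as worthwhile.
    """
    total_value = annual_value * years
    total_cost = dev_cost + annual_opex * years
    return total_value >= margin * total_cost


# Made-up numbers: value 1.5M over three years versus cost 0.9M is not "substantially higher".
marginal_case = should_build_realtime(annual_value=500_000, dev_cost=300_000, annual_opex=200_000)

# Made-up numbers: value 4.5M over three years versus cost 0.9M clears the 2x margin.
strong_case = should_build_realtime(annual_value=1_500_000, dev_cost=300_000, annual_opex=200_000)
```

With these inputs the first case is rejected and the second is accepted. The point is to force an explicit number for the business value before committing to the extra cost.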



SQL vs. NoSQL Datastore Cheat Sheet

Reference link:

https://www.linkedin.com/feed/update/urn:li:activity:6896321682114514944/ 



Updated SQL vs. NoSQL Datastore Cheat Sheet

Decision tree to pick a suitable datastore based on:

🔹 Application Type
Transactions (OLTP) or analytics (OLAP)?

🔹 SQL or NoSQL
Structured, semi-structured, unstructured data?

🔹 Use Case Specialization (if any)
In-memory, time series, immutable ledger, full-text search?

🔹 Deployment
Major cloud, cloud-agnostic, or on-prem?
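
The four questions above can be captured as a small requirements record that is narrowed down step by step. The following Python sketch is illustrative only: all names are hypothetical, it does not reproduce the cheat sheet's actual recommendations, and the rule "structured data maps to SQL, everything else to NoSQL" is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Workload(Enum):
    OLTP = "transactions (OLTP)"
    OLAP = "analytics (OLAP)"


class DataShape(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class Deployment(Enum):
    MAJOR_CLOUD = "a major cloud"
    CLOUD_AGNOSTIC = "cloud-agnostic infrastructure"
    ON_PREM = "on-prem infrastructure"


@dataclass(frozen=True)
class Requirements:
    workload: Workload
    data_shape: DataShape
    specialization: Optional[str]  # e.g. "in-memory", "time series", or None
    deployment: Deployment


def datastore_profile(req: Requirements) -> str:
    """Walk the four questions in order and describe the datastore category."""
    # Assumption: structured data suggests SQL, anything else suggests NoSQL.
    model = "SQL" if req.data_shape is DataShape.STRUCTURED else "NoSQL"
    parts = [f"{model} datastore for {req.workload.value}"]
    if req.specialization:
        parts.append(f"specialized for {req.specialization}")
    parts.append(f"deployed on {req.deployment.value}")
    return ", ".join(parts)


profile = datastore_profile(
    Requirements(Workload.OLAP, DataShape.SEMI_STRUCTURED, "time series", Deployment.ON_PREM)
)
```

The resulting profile is a category description, not a product pick; the cheat sheet itself maps each such category to concrete datastores.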


Resources:
0. ML4Devs newsletter: 

https://www.linkedin.com/newsletters/ml4devs-6875240399443763200/

1. SQL vs. NoSQL Database: When to Use, How to Choose

https://towardsdatascience.com/datastore-choices-sql-vs-nosql-database-ebec24d56106

#BigData #DataEngineering #DataScience #MachineLearning #MLOps #CloudComputing #Microservices