Notes from Big Data Class-1

1. Volume (not just value)

  • 204 million emails
  • 1.8 million likes on Facebook
  • 200,000 photos
  • 1.3 million videos
  • 72 hours of video uploaded

2009 => 0.8 ZB Data
2020 => 36 ZB Data
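
The growth between these two figures can be worked out directly. The sketch below is only illustrative: it assumes smooth compound growth between 2009 and 2020, which the notes do not state (they only give the two endpoints), and the variable names are made up for the example.

```python
# Data volume figures from the notes, in zettabytes (ZB).
volume_2009_zb = 0.8
volume_2020_zb = 36.0
years = 2020 - 2009

# Overall growth factor between the two years.
growth_factor = volume_2020_zb / volume_2009_zb

# Compound annual growth rate implied by the two endpoints.
# ASSUMPTION: growth is smooth and compounding; the notes do not say this.
cagr = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.0f}x")
print(f"Implied annual growth: {cagr:.1%}")
```

Under that assumption, the data volume grows about 45 times in 11 years, which is roughly 41% per year.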


2449841200 TB of data is produced on each Boeing flight. Precision and safety are important.

Problems in Data Processing

  • Storage
  • Data Acquisition
  • Retrieval

2. Velocity

It’s important for real-time systems.

V = Δx / Δt

  1. The speed of creating data
  2. The speed of storing data
  3. The speed of analyzing data => (Fast) Real-time processing
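
The formula V = Δx / Δt can be sketched in code. The notes do not define x, so this example assumes x is the amount of data (in bytes) and t is wall-clock time. The function name `data_velocity` and the sample readings are hypothetical.

```python
from datetime import datetime


def data_velocity(bytes_start, bytes_end, t_start, t_end):
    """Return the average data rate in bytes per second (V = Δx / Δt)."""
    delta_x = bytes_end - bytes_start
    delta_t = (t_end - t_start).total_seconds()
    if delta_t <= 0:
        raise ValueError("t_end must be later than t_start")
    return delta_x / delta_t


# Hypothetical readings: total bytes stored at two points in time.
t0 = datetime(2024, 1, 1, 12, 0, 0)
t1 = datetime(2024, 1, 1, 12, 1, 0)
rate = data_velocity(1_000_000, 7_000_000, t0, t1)
print(f"{rate:.0f} bytes/second")
```

The same calculation applies to each of the three speeds listed above (creating, storing, analyzing); only the quantity being measured changes.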

Late decisions => missed opportunities 🙂

  1. Instantly capture
  2. Feed real-time to machine
  3. Process real-time
  4. Act

This process is used for business decisions.
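
The four steps above (capture, feed, process, act) can be sketched as a small loop. This is a minimal illustration, not a real streaming system: the readings, the rolling-average rule, the window size, and the threshold are all assumptions made up for the example.

```python
from collections import deque


def capture(raw_events):
    """Step 1: capture events as they arrive (here, read from a list)."""
    for event in raw_events:
        yield event


def process(window):
    """Step 3: compute the average of the most recent readings."""
    return sum(window) / len(window)


def act(average, threshold):
    """Step 4: turn the processed result into a decision."""
    return "alert" if average > threshold else "ok"


def run_pipeline(raw_events, window_size=3, threshold=50.0):
    # Step 2: each captured event is fed into a bounded window,
    # so only the latest `window_size` readings are kept.
    window = deque(maxlen=window_size)
    decisions = []
    for reading in capture(raw_events):
        window.append(reading)
        decisions.append(act(process(window), threshold))
    return decisions


# Hypothetical sensor readings.
print(run_pipeline([40, 45, 60, 70, 30]))
```

A decision is produced for every incoming reading rather than once at the end, which is the point of the "late decisions" remark: the action follows the data as closely as possible.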

3. Variety [Forms of data] = Complexity

Data is heterogeneous. It comes in different forms, such as:

  1. Text
  2. Videos
  3. Tables
  4. Audio

Structural Variety – Semantic Variety

Media Variety – Availability

4. Veracity = Quality

  • Accuracy of data
  • Reliability of data source

5. Valence 

Measure the similarity between data items with correlation.

Data connectivity: two data items are connected when they are related to each other.
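
One way to measure similarity with correlation is the Pearson correlation coefficient. The sketch below is an assumption about how the idea could be applied, not a method given in the notes: the helper names `pearson` and `connected` and the 0.8 threshold are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def connected(xs, ys, threshold=0.8):
    """Treat two data items as connected if |correlation| >= threshold.

    ASSUMPTION: the 0.8 threshold is arbitrary and chosen for illustration.
    """
    return abs(pearson(xs, ys)) >= threshold


# Hypothetical data items.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]   # moves exactly with a
c = [5, 1, 4, 2, 3]    # only weakly related to a

print(connected(a, b))
print(connected(a, c))
```

Here `a` and `b` count as connected and `a` and `c` do not. Counting how many pairs are connected gives a rough picture of how dense the connections in a data set are.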

6. Value

Value = Velocity + Valence + Veracity + Volume + Variety

This is also called the diamond five.