Notes from Big Data Class-3

Foundation

Distributed File System = retrieval (index) + storage (filesystem)

Stand-Alone Problems 

  • Large data storage
  • Multiple processes
  • Access Results
  • Mobility

We use a rack system to solve these problems:

  • Access
  • Concurrency
  • Fault Tolerance
  • Stability

Problems: Data Consistency

Single Computer -> Parallel Computer -> Commodity Cluster (affordable, low experimental, distributed systems)

  • Data Storage
  • Data parallelism
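
A minimal sketch of data parallelism, assuming nothing beyond the bullet above: the same operation runs on separate partitions of the data, and the partial results are combined at the end. The names `split` and `parallel_sum_of_squares` are made up for illustration, and the threads only stand in for cluster nodes (in CPython they do not give real speed-up for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Split a list into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-partition operation: identical for every chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Apply the same operation to each chunk, then combine the partials."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

On a real cluster each chunk would live on a different node and the operation would be shipped to the data, but the split / apply / combine shape is the same.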

Big Data Programming Models

Parallel data scalability

Classic Programming Model: Abstraction + Runtime Library + Programming Language

Distributed File System

Requirements:

  • Split volumes of data
  • Access data fast
  • Distribute computation to nodes
  • Replicate data partitions
  • Recover files when needed
  • Enable adding more racks
  • Optimized for specific data types
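
A rough sketch of three of the requirements above (split, replicate, recover). This is not how any particular file system does it. The function names, the node names, and the round-robin placement rule are all assumptions made for illustration.

```python
def split_into_blocks(data, block_size):
    """Split a bytes object into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(n_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule: block b goes to nodes b, b+1, ... (mod N).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(n_blocks)
    }


def recover(block_id, placement, failed_nodes):
    """Return the first surviving node that holds the block, or None."""
    for node in placement[block_id]:
        if node not in failed_nodes:
            return node
    return None


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["n0", "n1", "n2", "n3"], replication=2)
    print(blocks)
    print(placement)
    print(recover(0, placement, failed_nodes={"n0"}))
```

With a replication factor of 2, losing one node still leaves every block readable. Losing both nodes that hold a block makes `recover` return `None`.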

MapReduce -> Programming Model -> Specific Implementation

  • Digital Signatures
  • Hashing Algorithms
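
A minimal single-machine sketch of the MapReduce model as a word count. The notes do not say how hashing relates to MapReduce. One common use, assumed here, is hashing a key to decide which reducer receives it. All function names are illustrative, and SHA-256 is an arbitrary choice of stable hash.

```python
import hashlib
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, n_reducers):
    """Pick a reducer for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_reducers


def shuffle(pairs, n_reducers):
    """Shuffle: group values by key inside each reducer's bucket."""
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        buckets[partition(key, n_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}


def word_count(lines, n_reducers=2):
    """Run map, shuffle and reduce in sequence and merge reducer outputs."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    result = {}
    for bucket in shuffle(pairs, n_reducers):
        result.update(reduce_phase(bucket))
    return result


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster data data"]))
```

Because the same key always hashes to the same reducer, each word is counted in exactly one place, so merging the reducer outputs never has to reconcile duplicates.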