Towards Robust Distribut ed Sy stems Inktomi at a Glance
Company Overview
“INKT” on NASDAQ
Founded 1996 out of UC
Berkeley
~700 Employees
Applications
Search Technology
Network Products
Online Shopping
brooksideWireless Systems
Our Perspective
Inktomi builds two
distributed systems:
–Global Search Engines
–Distributed Web Caches Bad on scalable
cluster & parallel
computing technology But very little u of classic “Distributed Systems” don’
d b cooper>数学一对一家教There exist working DS:
–Simple protocols: DNS, WWW
–Inktomi arch, Content Delivery Networks
–Napster, Verisign, AOL
But the are not classic DS:
–Not distributed objects
–No RPC
–No modularity
–Complex ones are single owner (except phones)
Three Basic Issues
Where is the state? Consistency vs. Availability Understanding Boundaries
Where’s the state?
(not all locations are equal)
PODC Keynote, July 19, 2000
Santa Clara Cluster
•Very uniform •No monitors •No people •No cables •Working power •Working A/C •Working BW
Delivering High Availability
We kept up the rvice through: Crashes & disk failures (weekly) Databa upgrades (daily)
Software upgrades (weekly to monthly) OS upgrades (twice) Power outage (veral)
Network outages (now have 11 connections) Physical move of all equipment (twice)
Persistent State is HARD
Classic DS focus on the computation, not the data –this is WRONG, computation is the easy part
Data centers exist for a reason
–can’t have consistency or availability without them
Other locations are for caching only:
–proxies, bastations, t-top boxes, desktops
–phones, PDAs, …
Distributed systems can’t ignore location distinctions
AP
Active Proxy:
Bootstraps thin devices
into infrastructure, runs
mobile code
AP
Workstations & PCs Berkeley Ninja Architecture
Ba:Scalable, highly-
available platform for
persistent-state rvices
Internet
PDAs
Consistency vs. Availability
(ACID vs. BASE)ACID vs. BASE
DBMS rearch is about ACID (mostly)
But we forfeit “C” and “I” for availability, graceful degradation, and performance
This tradeoff is fundamental.
BASE:
–B asically A vailable
–S oft-state
–E ventual consistency
ACID vs. BASE
ACID
Strong consistency Isolation
Focus on “commit” Nested transactions Availability?
Conrvative
(pessimistic)
Difficult evolution
(e.g. schema)
BASE
Weak consistency
–stale data OK
Availability first
Best effort
Approximate answers OK Aggressive (optimistic)
Simpler!
Faster
Easier evolution
But I think it’s a spectrum The CAP Theorem
C onsistency A vailability
Tolerance to network
P artitions
Theorem: You can have at
most two of the properties
for any shared-data system
Forfeit Partitions如何提高胆量
C onsistency A vailability
brandeisTolerance to network
P artitions
Examples
Single-site databas
Cluster databas
LDAP
拜师学艺的意思xFS file system
Traits
2-pha commit
cache validation
protocols
教育在线网Forfeit Availability
C onsistency A vailability
Tolerance to network
P artitions
Examples
Distributed databas
Distributed locking
Majority protocols
Traits
Pessimistic locking
Make minority
partitions unavailable
Forfeit Consistency
C onsistency A vailability
Tolerance to network
P artitions
priceline comExamples
Coda
Web cachinge
DNS
Traits
expirations/leas
conflict resolution
optimistic
The Tradeoffs are Real
The whole space is uful
Real internet systems are a careful mixture of
ACID and BASE subsystems
direction
–We u ACID for ur profiles and logging (for revenue)
But there is almost no work in this area
Symptom of a deeper problem: systems and
databa communities are parate but
overlapping (with distinct vocabulary)bossini
CAP Take Homes
Can have consistency & availability within a
cluster (foundation of Ninja), but it is still hard in practice
OS/Networking good at BASE/Availability, but terrible at consistency
Databas better at C than Availability
Wide-area databas can’t have both
Disconnected clients can’t have both
All systems are probabilistic…Understanding Boundaries (the RPC hangover)