Nikolay Novik
PyConUA 2017
aiomysql, aioobc, aiogibsonaiomonitor, aiohttp_debugtoolbar, aiobotocore,
aiohttp_mako, aiohttp_admin, aiorwlock
Stateless protocol is proved technique, use it like duck tape
Notice that user data fetched several times and cached on multiple servers.
Avoided are extra trips to the database which reduces latency. Even if the database is down the request can be handled.
Geo spatial index service to match driver and user
Orleans used as backbone for server part of Halo game, including: presence, statistics, cheat detection, etc
San Diego Supercomputer Center uses Serf to coordinate compute resources in multiple locations, cluster size is about 2k nodes
Services that predicts reselling prices of different products, based on product specification
This simple algorithms made Akamai multi billion worth company
Consistent hashing minimizes number of keys, need to be remapped
http://blog.carlosgaldino.com/consistent-hashing.html
In case of adding capacity, only fraction of keys will be moved
In case of node failure next address will handle related keys
Virtual nodes help with keys distribution, moving it close to 1/n
We have routing and job distribution, lets figure out how to add and remove nodes.
Issues
Broadcast: could be used for cluster membership update

Basic gossip protocol
Heavy packet loss does not stop dissemination, it simply will take a bit longer, 2 times for 50% loss.
We can route jobs and communicate cluster update, last component is failure detector.
In asynchronous distributed systems, the detection of crash failures is imperfect. There will be false positives and false negatives.
Chandra, Tushar Deepak, and Sam Toueg. "Unreliable failure detectors for reliable distributed systems." Journal of the ACM (JACM) 43.2 (1996): 225-267.
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership. Protocol
Provides a mechanism to reduce the rate of false positives by “suspecting” a process before “declaring” it as failed within the group.
Ordering between messages is important, but total order is not required, only happens before/casual ordering.
Nodes in each subnet can talk to each as result declares peers on other subnet as dead.
| Name | Language | Developer | Description |
|---|---|---|---|
| ??? | Python | ??? | ??? |
| RingPop | node.js | Uber | Used as services for matching user and driver with follow up location update |
| Serf | golang | Hashicorp | Used in number applications for instance in HPC to manage computing resources |
| Orleans | .NET | Microsoft | General purpose framework, used in Halo online game |
| Orbit/jGroups | Java | EA Games | Used in Bioware games, such as DragonAge game, not sure where thou. Inspired by Orleans |
| riak_core | Erlang | Basho | Building block for Riak database and erlang distributed systems |
| Akka | Scala | Lightblend | General purpose distribute systems framework, often used as microservsies platform |
Famous paper from MIT, describes synthetic network coordinates, based on ping delays, used in Serf/Consul for data center fail over
Notice coordinate drifting in space and stable distance between clusters
For huge clusters full membership is not scalable, paper proposes partial membership protocol
Even for failure rates as high as 95%, HyParView still manages to maintain a reliability value in the order of deliveries to 90% of the active processes.
Orleans uses a one-hop distributed hash table that maps actors between machines, as result actors could be moved across the cluster
aio-libs: https://github.com/aio-libs