At a basic level, a distributed system is a collection of computers that work together to form a single computer for the end-user.
All these distributed machines have one shared state and operate concurrently.
They are able to fail independently without damaging the whole system, much like microservices.
These interdependent, autonomous computers are linked by a network so they can communicate and exchange information easily.
Unlike traditional databases, which are stored on a single machine, a distributed system lets a user communicate with any machine as if the whole system were a single machine. Most applications today use some form of distributed database and must account for its homogeneous or heterogeneous nature.
In a homogeneous distributed database, every node runs the same database management system and uses the same data model; such systems are generally easier to manage and to scale by adding nodes. Heterogeneous databases, on the other hand, allow multiple data models or different database management systems, using gateways to translate data between nodes.
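As a rough illustration of how a gateway can bridge heterogeneous nodes, the sketch below translates a record between a relational-style row and a document-style record. The schemas, field names, and cents-based price unit are hypothetical, not taken from any particular database.

```python
# Minimal sketch of a gateway between two heterogeneous nodes: one stores
# relational rows, the other document-style records. Field names and the
# cents-based price unit are assumptions for illustration.

def row_to_document(row: tuple) -> dict:
    """Translate a relational row (id, name, price_cents) into a document."""
    record_id, name, price_cents = row
    return {"_id": record_id, "name": name, "price": price_cents / 100}

def document_to_row(doc: dict) -> tuple:
    """Translate a document back into a relational row."""
    return (doc["_id"], doc["name"], int(round(doc["price"] * 100)))

if __name__ == "__main__":
    row = (42, "keyboard", 1999)
    doc = row_to_document(row)
    assert document_to_row(doc) == row  # round-trips without losing data
    print(doc)
```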
Distributed computing is a field of computer science that studies distributed systems.
The term “distributed computing” describes a digital infrastructure in which a network of computers solves pending computational tasks. Despite being physically separated, these autonomous computers work together closely in a process where the work is divvied up. The hardware being used is secondary to the method here. In addition to high-performance computers and workstations used by professionals, you can also integrate minicomputers and desktop computers used by private individuals.
Distributed hardware cannot use a shared memory due to being physically separated, so the participating computers exchange messages and data (e.g. computation results) over a network. This inter-machine communication occurs locally over an intranet (e.g. in a data center) or across the country and world via the internet. Messages are transferred using internet protocols such as TCP/IP and UDP.
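To make the message-passing model concrete, here is a minimal sketch of two nodes exchanging work and a computed result over TCP using Python's standard socket module. The local address, port, and comma-separated payload format are arbitrary choices for the example.

```python
# Minimal sketch of message passing over TCP between two "nodes" (here, two
# threads in one process). The port and payload format are illustrative.
import socket
import threading

HOST, PORT = "127.0.0.1", 50007  # assumed free local port
ready = threading.Event()

def server() -> None:
    """Accept one connection, sum the numbers received, send back the result."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        ready.set()  # signal that the server node is accepting connections
        conn, _ = srv.accept()
        with conn:
            numbers = conn.recv(1024).decode().split(",")
            total = sum(int(n) for n in numbers)  # the "computation result"
            conn.sendall(str(total).encode())

def client() -> None:
    """Send a small piece of work to the server node and print its reply."""
    ready.wait()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"1,2,3,4")
        print("result from remote node:", cli.recv(1024).decode())

if __name__ == "__main__":
    t = threading.Thread(target=server)
    t.start()
    client()
    t.join()
```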
In line with the principle of transparency, distributed computing strives to present itself externally as a functional unit and to simplify the use of technology as much as possible. For example, users searching for a product in the database of an online shop perceive the shopping experience as a single process and do not have to deal with the modular system architecture being used.
In short, distributed computing is a combination of task distribution and coordinated interactions.
Generally, there are three kinds of distributed computing systems with the following goals:
Distributed Information Systems: distribute information across different servers via multiple communication models
Distributed Pervasive Systems: use embedded computer devices (e.g. ECG monitors, sensors, mobile devices)
Distributed Computing Systems: computers in a network communicate via message passing
Throughput and performance are related but distinct concepts in distributed systems and computing:
Throughput
Definition: Throughput is the measure of how many tasks, requests, or units of work a system can process or complete per unit of time.
Focus: It focuses on the system's capacity to handle load — how much work the system can do overall.
Example: Number of transactions processed per second or data packets transmitted per second.
Key factors: Network bandwidth, concurrency, parallelism, processing power, and protocol efficiency influence throughput.
Goal: Maximize the amount of work done, often measured in units per second or minute.
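A minimal sketch of how throughput can be measured, assuming a placeholder request handler that simply burns CPU; in a real system the handler would be an RPC or database call.

```python
# Minimal sketch: measure throughput as completed requests per second.
# The handler only simulates work; the request count is illustrative.
import time

def handle_request() -> None:
    """Placeholder for one unit of real work (e.g. serving one request)."""
    sum(i * i for i in range(10_000))

def measure_throughput(num_requests: int = 1_000) -> float:
    """Return how many requests were completed per second of wall-clock time."""
    start = time.perf_counter()
    for _ in range(num_requests):
        handle_request()
    elapsed = time.perf_counter() - start
    return num_requests / elapsed

if __name__ == "__main__":
    print(f"throughput: {measure_throughput():.1f} requests/s")
```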
Performance
Definition: Performance is a broader term that reflects how well a system or component fulfills its function, often measured by latency, response time, and throughput combined.
Focus: Performance focuses on the speed and efficiency of the system to respond and complete tasks.
Example: User experience affected by response time, system latency, and throughput.
Goal: Minimize latency (delay) and response time while maximizing throughput for an optimal user experience and system efficiency.
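The companion sketch below records per-request latency and reports the mean and tail (p99) response times alongside throughput, again using a simulated workload.

```python
# Minimal sketch: measure per-request latency and report mean and p99 response
# time alongside throughput. The workload is simulated, not a real service.
import statistics
import time

def handle_request() -> None:
    """Placeholder for one unit of real work."""
    sum(i * i for i in range(10_000))

def measure_latency(num_requests: int = 1_000) -> None:
    latencies = []
    for _ in range(num_requests):
        start = time.perf_counter()
        handle_request()
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    p99 = sorted(latencies)[int(0.99 * num_requests)]
    print(f"mean latency: {statistics.mean(latencies) * 1000:.2f} ms")
    print(f"p99 latency:  {p99 * 1000:.2f} ms")
    print(f"throughput:   {num_requests / total:.1f} requests/s")

if __name__ == "__main__":
    measure_latency()
```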
High Performance Computing (HPC) and High Throughput Computing (HTC) are two distinct computing paradigms designed to address different types of computational workloads:
High Performance Computing (HPC)
Focus: Achieving maximum performance and speed for individual, large-scale, complex, and tightly-coupled computational tasks.
Characteristics:
Uses many processors working in parallel.
Requires high-speed interconnects and low-latency communication between processors.
Tasks typically run for short periods but demand intense computation.
Metrics: Measured in FLOPS (Floating Point Operations Per Second).
Use cases: Scientific simulations, engineering design, weather forecasting, drug discovery.
Scaling: Usually scales up with powerful CPUs or GPUs tightly coupled.
Resource Management: Uses job schedulers for controlling tightly coupled resources.
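Real HPC codes rely on MPI and low-latency interconnects between nodes; as a local stand-in, the sketch below splits one large floating-point task across all CPU cores and reports a rough FLOP/s figure. The problem size and the two-FLOPs-per-iteration estimate are illustrative assumptions.

```python
# Minimal sketch of the HPC style: one large numerical task split across all
# local CPU cores, with an approximate FLOP/s estimate.
import time
from multiprocessing import Pool, cpu_count

def partial_dot(bounds: tuple) -> float:
    """Compute part of a dot product: one multiply and one add per index."""
    start, stop = bounds
    total = 0.0
    for i in range(start, stop):
        total += float(i) * 0.5
    return total

if __name__ == "__main__":
    n = 2_000_000
    workers = cpu_count()
    step = n // workers
    chunks = [(i * step, n if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]

    t0 = time.perf_counter()
    with Pool(workers) as pool:
        result = sum(pool.map(partial_dot, chunks))
    elapsed = time.perf_counter() - t0

    # Roughly 2 floating-point operations (multiply + add) per index.
    print(f"result={result:.3e}, ~{2 * n / elapsed / 1e6:.1f} MFLOPS on {workers} cores")
```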
High Throughput Computing (HTC)
Focus: Maximizing the total number of tasks completed over a long period, often handling many smaller, loosely-coupled, independent tasks.
Characteristics:
Tasks are often independent and can be distributed over many nodes or clusters.
The overall workload runs for long durations (months or years) rather than in brief, intensive bursts.
Focuses on high task completion rates rather than individual task speed.
Metrics: Measured by the number of jobs completed over a long window (jobs per month or year).
Use cases: Large-scale simulations, scientific research with many independent tasks, bioinformatics, business analytics.
Scaling: Scales horizontally by adding more distributed resources.
Resource Management: Uses distributed resource management for loosely coupled jobs.
Fault Tolerance: Robust to individual task failures without risking overall system integrity.
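A minimal sketch of the HTC pattern: many independent jobs fanned out to a worker pool, where a failing job is simply recorded for resubmission instead of aborting the run. The failure rate and job function are invented for the example.

```python
# Minimal sketch of the HTC style: many independent jobs distributed across a
# pool of workers; an individual failure does not stop the overall run.
import random
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_job(job_id: int) -> int:
    """One independent job; occasionally fails to simulate an unreliable node."""
    if random.random() < 0.1:
        raise RuntimeError(f"job {job_id} failed")
    return job_id * job_id

if __name__ == "__main__":
    jobs = list(range(100))
    completed, failed = {}, []

    with ProcessPoolExecutor() as pool:
        futures = {pool.submit(run_job, j): j for j in jobs}
        for fut in as_completed(futures):
            job_id = futures[fut]
            try:
                completed[job_id] = fut.result()
            except RuntimeError:
                failed.append(job_id)  # record and move on; the run continues

    print(f"{len(completed)} jobs completed, {len(failed)} failed and can be resubmitted")
```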