Winter 2025 Course Journal: EECS 491

Course Title: Distributed Systems

Motivation

This is one of only three upper-level courses that don't conflict with my schedule. Also, Brian Noble is one of my favorite professors.

When he was teaching 482, Brian would bring up 491 sometimes, and if he teaches it, it has to be good. Although I'm not a software person, it might come in handy for designing a mesh of embedded systems.

Go language

I wrote a little bit of Go in high school but it was nothing serious.

Brian said we use Go because it was invented at Google to power their distributed systems, concurrency and all that stuff. He also emphasized we were to write "idiomatic Go", which means making use of strengths in the language.

Being C-like makes it easy to pick up, but there's new stuff. So far, my favorite thing is the for ... select loop, which is incredibly cool.

Over the winter I've been doing an embedded project in C++ with FreeRTOS. In FreeRTOS, one way to block a task until data shows up is to receive from a queue. The problem is, I need to wait for three events (represented as three structs), but you can only shove one type in a queue. So what I ended up doing was writing a wrapper struct with an event type and a pointer, which I then cast to the right struct based on the event type.

With Go, each queue would be a channel with its own element type, and you could wait on all three at once.
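Here is a minimal sketch of that idea. The three event types and the waitForEvents helper are made up for illustration; they are not from my FreeRTOS project or from the course code.

```go
package main

import "fmt"

// Three hypothetical event types, standing in for the three structs
// from the FreeRTOS project.
type ButtonEvent struct{ Pin int }
type SensorEvent struct{ Celsius float64 }
type TimerEvent struct{ Tick int }

// waitForEvents blocks on all three channels at once and records a
// description of each event in the order it was received. It returns
// once it has handled n events.
func waitForEvents(buttons <-chan ButtonEvent, sensors <-chan SensorEvent, timers <-chan TimerEvent, n int) []string {
	var log []string
	for len(log) < n {
		select {
		case b := <-buttons:
			log = append(log, fmt.Sprintf("button on pin %d", b.Pin))
		case s := <-sensors:
			log = append(log, fmt.Sprintf("sensor reads %.1f C", s.Celsius))
		case t := <-timers:
			log = append(log, fmt.Sprintf("timer tick %d", t.Tick))
		}
	}
	return log
}

func main() {
	buttons := make(chan ButtonEvent)
	sensors := make(chan SensorEvent)
	timers := make(chan TimerEvent)

	// Each producer runs in its own goroutine, so the arrival order
	// is not fixed from run to run.
	go func() { buttons <- ButtonEvent{Pin: 4} }()
	go func() { sensors <- SensorEvent{Celsius: 21.5} }()
	go func() { timers <- TimerEvent{Tick: 1} }()

	for _, line := range waitForEvents(buttons, sensors, timers, 3) {
		fmt.Println(line)
	}
}
```

Each channel keeps its own type, so there is no wrapper struct and no casting. The three lines can print in any order.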

The go keyword in Go spawns a goroutine, which is basically like spawning a thread, but safer and cheaper.

Another big thing that's different from the C world, namely 482, is that we're forbidden from using mutexes or any low-level synchronization mechanism. We're also cautioned against passing values by pointer to two concurrent processes. Copying a couple kilobytes is negligible compared to network roundtrips.
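A sketch of what this style can look like, assuming a hypothetical key/value store type that I made up for illustration (it is not the course's starter code). One goroutine owns the map, and everyone else sends it copies of their data over channels, so no mutex is needed.

```go
package main

import "fmt"

// putRequest carries copies of the key and value, not pointers, so
// the sender and the owner goroutine never share mutable memory.
type putRequest struct {
	key, value string
}

// getRequest carries its own reply channel for the answer.
type getRequest struct {
	key   string
	reply chan string
}

// store is a hypothetical owner of a map: a single goroutine holds
// the data, and all access goes through the channels below.
type store struct {
	puts chan putRequest
	gets chan getRequest
	done chan struct{}
}

func newStore() *store {
	s := &store{
		puts: make(chan putRequest),
		gets: make(chan getRequest),
		done: make(chan struct{}),
	}
	go s.run()
	return s
}

// run is the only code that ever touches data.
func (s *store) run() {
	data := make(map[string]string)
	for {
		select {
		case p := <-s.puts:
			data[p.key] = p.value
		case g := <-s.gets:
			g.reply <- data[g.key]
		case <-s.done:
			return
		}
	}
}

func (s *store) Put(key, value string) { s.puts <- putRequest{key, value} }

func (s *store) Get(key string) string {
	reply := make(chan string)
	s.gets <- getRequest{key, reply}
	return <-reply
}

func (s *store) Close() { close(s.done) }

func main() {
	s := newStore()
	finished := make(chan struct{})
	for i := 0; i < 3; i++ {
		go func(id int) {
			s.Put(fmt.Sprintf("k%d", id), fmt.Sprintf("v%d", id))
			finished <- struct{}{}
		}(i)
	}
	for i := 0; i < 3; i++ {
		<-finished // wait for every writer using a channel, not a WaitGroup
	}
	fmt.Println(s.Get("k1"))
	s.Close()
}
```

The channels are unbuffered, so a Put only returns once the owner has taken the request, and the owner stores it before serving anything else. That is why the final Get sees "v1" without any locking.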

Non-deterministic behavior

I've taken 482, so I'm familiar with programs where things happen in a different order each time. And I know that, if there is more than one thing in a pool, you cannot predict which one comes out first. This can lead to non-deterministic behavior, and that's ok as long as it's in the set of "allowed" non-deterministic behaviors. This reminds me of a Tom Scott video that explains why computers "can't count", e.g. why the number of upvotes on a Reddit post changes every time you refresh.

Project 2 and 3

Not good. I only got ~80% each on the autograder.

P2 is primary/backup. P3 is Paxos. They're different ways to ensure consistent data if part of a system dies.

P2 was probably a skill issue. Some excuses I can make for P3 are:

  • I got hit by a car and missed the discussion section on p3
  • I attended MCFC and basically wasted three days

I'm passing all my test cases locally, unless I run it with the -race flag, in which case some of them deadlock. How the fuck does this happen.

I was really excited when, upon guarding the termination channel properly, I passed all part A cases. But I couldn't get part B to work no matter what I tried. The struggles devolved into trying obviously wrong things just to see what the fuck happens, because that's what you do when the deadline is in six hours.

I did not expect this course to be such a pain.

Piazza "resolved"

One day I ran into an error where Go would complain

rpc: service/method request ill-formed: Prepare

I searched on Piazza and found a post with the exact same problem. Their follow-up was:

resolved

tfw smh.

After a while I figured it out, and posted my best passive aggressive reply to date:

I ran into the same error and found this piazza post. The moment I gazed upon the ancient incantation that begins with an “R” and ends with “esolved”, an angel descended from heaven and brought me to the library of sacred texts amongst seas of untold secrets. She summoned for me a book titled “Trying Stuff until it Works”; it was when my finger touched its dusty cover that did the epiphany surface, and I fell into a thousand years’ sleep. When I woke up I quickly tried what the book revealed to me, and it worked magically. Anyway the solution is to use “Paxos.Prepare” etc for the rpc name
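For anyone else who lands here, this is a small reproduction using Go's standard net/rpc package, whose error message matches the one above. I'm assuming the project's RPC helper sits on top of net/rpc; the PrepareArgs and PrepareReply fields and the helper functions are made up for this sketch.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// PrepareArgs and PrepareReply are hypothetical message types; the
// real project's fields may differ.
type PrepareArgs struct{ Seq, Proposal int }
type PrepareReply struct{ OK bool }

// Paxos is registered with net/rpc under its type name, "Paxos".
type Paxos struct{}

// Prepare has the shape net/rpc expects: an exported method with two
// arguments (the second a pointer) and an error return.
func (px *Paxos) Prepare(args *PrepareArgs, reply *PrepareReply) error {
	reply.OK = true
	return nil
}

// newClient wires a client to an in-process server over net.Pipe,
// so the example needs no real network.
func newClient() (*rpc.Client, error) {
	server := rpc.NewServer()
	if err := server.Register(&Paxos{}); err != nil {
		return nil, err
	}
	clientConn, serverConn := net.Pipe()
	go server.ServeConn(serverConn)
	return rpc.NewClient(clientConn), nil
}

// callPrepare invokes the given service method name and reports
// whether the reply came back OK.
func callPrepare(client *rpc.Client, name string) (bool, error) {
	var reply PrepareReply
	err := client.Call(name, &PrepareArgs{Seq: 1, Proposal: 1}, &reply)
	return reply.OK, err
}

func main() {
	client, err := newClient()
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// A bare method name has no "Service." prefix, so the server rejects it.
	_, err = callPrepare(client, "Prepare")
	fmt.Println("bare name:", err)

	// The name must be "Type.Method".
	ok, err := callPrepare(client, "Paxos.Prepare")
	fmt.Println("qualified name:", ok, err)
}
```

The first call comes back with "rpc: service/method request ill-formed: Prepare", and the second succeeds. net/rpc looks for the dot that separates the service name from the method name, and a bare "Prepare" has none.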

Project 4

P4 is a sharded KV store made of Paxos groups. "Sharded" means every group is responsible only for a subset of keys, which allows for a form of distributed storage. A special Paxos group manages the others, which grants it the name "ShardMaster".
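To make "responsible only for a subset of keys" concrete, here is a rough sketch. Everything in it is an assumption for illustration: the shard count, the hash function, the Config layout, and the group IDs are made up and are not the course's actual definitions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// numShards is a hypothetical shard count.
const numShards = 10

// Config is a hypothetical snapshot of the kind of table a
// ShardMaster might hand out: which group serves which shard.
type Config struct {
	Num    int            // configuration version
	Shards [numShards]int // shard index -> group ID
}

// keyToShard maps a key to a shard by hashing it. The course may use
// a different mapping; this one is only for illustration.
func keyToShard(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % numShards)
}

// groupFor returns the ID of the group that serves the key.
func (c Config) groupFor(key string) int {
	return c.Shards[keyToShard(key)]
}

// owns reports whether group gid serves the key under this
// configuration. A group would refuse requests for keys it does not own.
func (c Config) owns(gid int, key string) bool {
	return c.groupFor(key) == gid
}

func main() {
	cfg := Config{Num: 1}
	for shard := range cfg.Shards {
		cfg.Shards[shard] = 100 + shard%2 // alternate between groups 100 and 101
	}
	for _, key := range []string{"apple", "banana", "cherry"} {
		fmt.Printf("%q -> shard %d -> group %d\n", key, keyToShard(key), cfg.groupFor(key))
	}
}
```

A client looks up the group for a key in the latest configuration and sends its request there. When the configuration changes, shards move between groups.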

Anyway, with a fair bit of thinking that's way more level-headed than the P3 mud, it was less complicated than I thought. The instructors defined a list of RPC error codes, and I passed all test cases — even the most difficult ones — without even using all of these. Well, if it works it works.

Why though? Have I really internalized the course principles without realizing it, or was I just lucky? It might be impossible to know.

Project stats

For each project, I counted the lines I wrote (code, comments, and blank lines) that differ from the starter code, which is basically what git diff --stat reports.

  • P1 MapReduce:

    • wordcount: 21
    • mrdistrib: 61
  • P2 Primary/Backup:

    • viewservice: 246
    • pbservice: 348
  • P3 Paxos:

    • paxos: 531
    • rsm: 56
    • kvpaxos: 136
  • P4 ShardKV:

    • paxos: 527 (copied from p3)
    • rsm: 56 (copied from p3)
    • shardmaster: 362
    • shardkv: 266

Exams

Both exams were on Canvas, with a hefty number of questions. All of them required substantial typing: full sentences where you explain your reasoning, and occasionally code. They were open-book and open-internet, but chatbots and human help were not allowed. They were graded against a rubric. It must be difficult to grade this course.

End of course

In 373, I attempted to build a pair of bicycle lights that blink in unison. What I did not know was that it was a rudimentary distributed system, where two systems try to reach consensus by communicating. Leslie Lamport might disagree, though: even when it was broken, I had at least heard of both machines.

Anyway, what I learned in 491 is, agreeing on something gets hard once you have more than one agent capable of making decisions, whether it's goroutines, servers, or people. Will I be using Paxos when I'm out with an even number of friends? Absolutely not. But I will remember that there is always a tradeoff between consistency and efficiency.

Paxos is consistent, and I've come to accept that it's about as efficient as we can ever get. But part of what makes Paxos efficient lies in the fact that the agents trust each other. They just want to get someone's work done. Without this trust, the system would operate under tyranny, such as hardware interrupts and a task scheduler, as I learnt in 482.

I wish the world was more like 491 than 482.