Accountability and Incentives for Consensus

It has been decided to build decentralized Starknet using consensus engines based on the well-known Tendermint consensus algorithm. To be safe and live, Tendermint consensus requires that more than 2/3 of the participants are correct, that is, follow the algorithm. In a Proof-of-Stake (PoS) context, setting up a correct node is a technological challenge in itself: (1) the node needs to keep the private key it uses to sign consensus messages on a computer that is continuously connected to the Internet, which poses a security challenge to the setup, and (2) the node needs high availability, as downtime of one node may result in downtime (or reduced performance, e.g., throughput) of the whole chain. We thus argue in the following that it is best practice to incentivize node operators to take this technological challenge seriously.

A prerequisite to such incentivization schemes is to collect evidence of misconfiguration or misbehavior. When we talk about evidence here, we are only interested in provable pieces of information, e.g., whether a node has used its private key to sign two conflicting messages (which is forbidden by PoS consensus algorithms, including Tendermint), so-called equivocation (double vote). We don’t consider subjective criteria, e.g., whether a node did not respond before a timeout expired.
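
To make “provable” concrete, here is a minimal sketch of what checking a double vote boils down to: two validly signed vote messages from the same validator for the same height, round, and vote type, but carrying different values. The types and field names below are ours, written in Rust-like form for illustration only (they are not taken from any particular engine), and signature verification is elided.

```rust
// Hypothetical types; real engines define their own vote and key formats.
#[derive(PartialEq, Clone, Copy)]
enum VoteType { Prevote, Precommit }

struct Vote {
    validator: [u8; 32],        // address / public key of the signer
    height: u64,
    round: u32,
    vote_type: VoteType,
    value_id: Option<[u8; 32]>, // None encodes a vote for nil
    signature: Vec<u8>,
}

/// Two votes constitute provable equivocation iff they come from the same signer,
/// refer to the same (height, round, vote type), carry different values, and both
/// signatures verify (signature checking is elided in this sketch).
fn is_double_vote(a: &Vote, b: &Vote) -> bool {
    a.validator == b.validator
        && a.height == b.height
        && a.round == b.round
        && a.vote_type == b.vote_type
        && a.value_id != b.value_id
}
```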

What is the typical case for equivocation seen in production systems? Let’s look at CometBFT, a battle-tested consensus engine based on Tendermint consensus, which records only one specific kind of misbehavior, namely duplicate vote evidence. While actual attacks are rare, equivocation has still been observed in production as a result of misconfiguration. Many companies operating a validator implement the validator node itself as a fault-tolerant setup (in order to achieve availability), keeping copies of the validator’s private key on multiple machines; for instance, the tools tmkms and Horcrux help manage validator keys. If, however, such a fault-tolerant setup is implemented poorly or misconfigured, this may result in duplicate (and sometimes conflicting) signatures in a protocol step, even though no actual attack was intended.

While a single instance of an unintentional double vote by one process typically does not pose big problems (it cannot by itself cause disagreement), repeated unintentional double votes by several validators with large voting power might eventually lead to disagreement and a chain halt. Therefore it makes sense to incentivize individual operators to fix their setup while the whole system is still operational.

We thus propose that in Starknet, too, such behavior should lead to mild penalties (e.g., not paying fees to the validator for some time, or taking a small portion of their stake as a penalty), as part of an incentivization scheme motivating validator operators to fix such issues and ensure the reliability of their nodes. I think the concrete incentivization scheme is a matter for the Starknet community and the node operators to agree on; all of this lies in the application layer. In the remainder of this post, I would like to focus on the consensus layer and lay out some options regarding what provable evidence consensus may provide to the application.

Misbehavior types

Here we give some explanation about attacks on Tendermint. If you are already familiar with them and are just interested in our conclusions, scroll down to the last section.

Tendermint is a variant of the seminal DLS algorithm by Dwork, Lynch and Stockmeyer. It shares with DLS the property that if less than one third of the processes are faulty, agreement is guaranteed. If more than two thirds of the processes are faulty, they have control over the system.

In order to bring the system to disagreement, the faulty processes need to actively deviate from the protocol. By superficial inspection of the pseudo code (cf. Algorithm 1 in the arXiv paper), we derive the following:

  • [Double vote] correct processes never send two (conflicting) vote messages
    (PREVOTE, PRECOMMIT) for the same height and round (that is, messages that
    differ in the value they carry; nil is also considered a value here), and
  • [Double propose] a correct proposer never sends two different proposals
    (i.e., PROPOSAL messages) for the same height and round, and
  • [Bad proposer] a correct process whose ID is different from the one
    returned by proposer(h, r) does not send a proposal for height h and
    round r.

A more involved inspection shows that if a correct process p locks a value (setting lockedValue_p and lockedRound_p in lines 38 and 39), then it sends a prevote for a different value in a later round (line 30) only if the condition of lines 28/29 is satisfied. That is, only if it receives a proposal for that value with some valid round vr, together with 2f+1 matching prevotes for it in round vr, such that vr >= lockedRound_p (line 29). In other words,

  • [Amnesia] a correct process never sends a prevote for a value v if
    it has locked a different value v' before and hasn’t received a proposal
    and sufficiently many prevotes for v with valid round vr >= lockedRound_p
    (see the sketch below).
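
For readers who want to see the locking rule in one place, the following is a rough Rust-style transcription of the prevote condition of lines 28/29. The names are ours and the preconditions (step equals propose, 0 <= vr < current round) are elided; this is a sketch, not actual engine code.

```rust
// Rough transcription of the prevote rule from lines 28/29; names are hypothetical.
struct State {
    locked_round: i64,              // -1 if nothing is locked yet
    locked_value: Option<[u8; 32]>, // id of the locked value, if any
}

/// Upon a proposal for value `v` with valid round `vr`, backed by 2f+1 prevotes
/// for `v` in round `vr`: prevote `v` only if `v` is valid and either the lock is
/// not newer than `vr` (locked_round <= vr) or `v` is the locked value itself;
/// otherwise prevote nil. Prevoting `v` when this condition does not hold is
/// exactly the "amnesia" misbehavior.
fn prevote_on_reproposal(state: &State, v: [u8; 32], vr: i64, v_is_valid: bool) -> Option<[u8; 32]> {
    if v_is_valid && (state.locked_round <= vr || state.locked_value == Some(v)) {
        Some(v) // prevote for id(v)
    } else {
        None    // prevote for nil
    }
}
```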

Remark on the term “amnesia”. Amnesia is a violation of the locking mechanism that Dwork, Lynch, and Stockmeyer introduced into their algorithm: a process locks a value in a round if the value is supported by more than 2/3 of the processes. A process that has locked a value can only be convinced to release that lock if more than two thirds of the processes have a lock for a later round. With less than one third of the processes faulty, if a process decides value v in a round r, the algorithm ensures that more than two thirds have a lock on value v for that round. As a result, once a value is decided, no other value v' != v will be supported by enough correct processes. However, if more than one third of the processes are faulty, adversarial processes may lock a value v in a round and in a later round “forget” that they did so and support a different value.

It has been shown by formal verification (see results obtained with Ivy and Apalache) that if there are between one third and two thirds of faults, every attack on Tendermint consensus that leads to a violation of agreement is either a “double vote” or an “amnesia” attack.

What evidence to collect

We argue that the only two types of evidence that make sense to collect are “double vote” and “amnesia”. By the verification results mentioned above, they are the ones actually required to disrupt the system.

Why not “double propose”?

First, it doesn’t harm safety by itself, as processes also need to double vote to produce agreement violations.
Second, in consensus engine implementations there are sometimes no self-contained PROPOSAL messages; rather, proposals are big chunks of data transmitted in block parts or streamed, so the mapping of algorithmic PROPOSAL messages to what we see in implementations is not so direct. Consequently, we don’t think it makes sense to go down this rabbit hole.

Why not “bad proposer”?

First, by itself it doesn’t harm safety, as correct processes will simply disregard the produced proposals.
Second, we are only interested in “provable evidence”. While in principle it can be proven, much more data, partly on consensus internals, needs to be included in the evidence: checking that a process was not the proposer of a certain round and height requires knowing the state of the proposer selection algorithm at that specific point, which in turn depends on the state of the application at that point. Again, it doesn’t seem to make sense to investigate this, given that there is no value added.

Why “double vote”?

We have laid out above that, just to keep the system stable and operational, an incentivization scheme against double votes is very pragmatic. It motivates validator operators to fix misconfigurations and ensure the reliability of their nodes.
So it makes sense for the consensus engine to collect this. Observe that, in contrast to “bad proposer” discussed above, the data needed to prove the misbehavior is very concise. See the evidence data structure from CometBFT, which basically just consists of two signed vote messages.
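
For reference, here is CometBFT’s duplicate vote evidence paraphrased in Rust-like form; the original is a Go struct, so treat the exact field set shown here as our reading of it and an illustration rather than the exact API.

```rust
// Paraphrase of CometBFT's DuplicateVoteEvidence; illustration only.
struct Vote; // stands in for a full signed vote (signer, height, round, type, value, signature)

struct DuplicateVoteEvidence {
    vote_a: Vote,            // first signed vote
    vote_b: Vote,            // conflicting signed vote for the same height/round/type
    total_voting_power: i64, // voting power of the validator set at that height
    validator_power: i64,    // voting power of the equivocating validator
    timestamp: u64,          // time associated with the evidence
}
```

The two signed votes are all that is needed to prove the misbehavior; the voting-power fields are bookkeeping used when computing penalties.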

What about Amnesia?

Regarding the amnesia attack, there are trade-offs that we would like to start a discussion around:

  • Pros
    • together with “double vote” this would allow an incentivization scheme against all behaviors that can lead to disagreement
    • it would allow us to shield the consensus engine against all attacks on safety, since we could generate evidence for forensics
  • Cons
    • out-of-the-box, Tendermint consensus does not support provable amnesia evidence. However, we have developed a slight adaptation of Tendermint (roughly speaking, it adds one additional round field to votes) that would make amnesia provable. (It doesn’t involve extra steps or performance penalties, so this point is actually partly a pro.)
    • our solution doesn’t necessarily help with the “fix misconfigurations” issue, as it only produces evidence when we have conflicting commits

Conclusions

We argue that a mild form of incentivization is useful to stabilize the system and keep it operational. Such an incentivization scheme must be based on provable data. Based on these two requirements, we suggest that the consensus engine may collect two types of evidence. We are strongly in favor of “double vote” evidence and recommend that the Starknet community agree on an incentivization scheme that is acceptable for users and node operators. We are also in favor of considering “amnesia” evidence, although this perhaps needs a broader discussion.

Thanks for the post, Josef.

Bad proposer - I want to review the consensus internals that you reference:

  1. Validator Set - widely known for each height
  2. Height - public identifier of where the block is
  3. Round - somewhat of an implementation detail of consensus (Precommit quorums must have the same round, so anyone who inspects the blockchain must be aware of this field)
  4. fn proposer - algorithm for selecting a proposer. Our plan for Starknet is that this will be a function with inputs: validator_set, height, round, seed:
    a. seed is a “random” number that is publicly known (onchain) by H-X

In fact, our current thinking is for there to be an onchain source of truth where you can call proposer(V, H, R, Seed) -> ID. Furthermore, our plan for streaming is that a proposal opens with ProposalInit(H, R, ID, Sig). Therefore, under our current plan it may be feasible to enforce bad proposers.
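
To illustrate, here is a minimal sketch of such an onchain lookup and the resulting check. All names are placeholders taken from this thread, not a finalized interface, and the selection logic shown (hashing the inputs and picking an index) is purely a stand-in; the actual Starknet algorithm (e.g., stake weighting) is not specified here.

```rust
// Hypothetical sketch of proposer(V, H, R, Seed) -> ID as described above.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type ValidatorId = [u8; 32];

fn proposer(validator_set: &[ValidatorId], height: u64, round: u32, seed: u64) -> ValidatorId {
    // Placeholder selection: hash the inputs and pick an index into the set.
    let mut hasher = DefaultHasher::new();
    (height, round, seed).hash(&mut hasher);
    validator_set[(hasher.finish() as usize) % validator_set.len()]
}

// Checking a ProposalInit(H, R, ID, Sig) then reduces to verifying the signature
// and comparing ID against the output of proposer for that height and round.
fn is_bad_proposer(validator_set: &[ValidatorId], height: u64, round: u32,
                   seed: u64, claimed_id: ValidatorId) -> bool {
    proposer(validator_set, height, round, seed) != claimed_id
}
```

With such a deterministic, publicly computable function, “bad proposer” evidence could indeed reduce to a signed ProposalInit plus the inputs needed to recompute the expected proposer.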

Since this is not a safety risk, I don’t suggest prioritizing this now; I just wanted to review the feasibility and hear what you think.

Could you please share more details about your suggestion for Amnesia?

Thank you for your detailed analysis regarding the Tendermint consensus algorithm and double voting penalty system. I’ve carefully reviewed your proposals and agree with several points:

  1. A penalty mechanism is necessary to incentivize validator operators and prevent misconfigurations.
  2. Your approach of keeping penalties mild is very logical, as our goal is to improve the system rather than punish.

However, I have some concerns:

  1. What will be the impact of the proposed penalty system on small validator operators? I believe this needs more detailed assessment.
  2. What mechanism do you envision for distinguishing between technical failures and intentional misuse?
  3. I would like to see more details about how community participation will be ensured in determining penalty durations and amounts.

I would appreciate hearing your thoughts on these points and working together to develop solutions.

I agree with your assessment. On-chain this is not a big issue. The only point to take care of would be that we potentially need to maintain some historical data (over the last couple of days) in order to still know who has been proposer in the recent past, in case the evidence is submitted late (or reaches the chain late).

However, I don’t have a clear picture of who may actually need to verify evidence. It would also be possible that evidence is submitted to the L1 contract directly, as the stake is managed there. If this is the envisioned use case, then the question of additional data, and how to prove it to a component outside of consensus (i.e., the L1 contract), potentially becomes more of an issue.

We talked about this with @_dd recently, and it would be great to get clarity whether and where evidence processing should happen.

Yes, the main point about proving the “bad proposer” misbehavior is that it is not trivial to produce self-contained evidence.

We need to rely on some form of oracle that is able to provide us, at any time until the evidence expiration time (which can be logical or physical), with the expected proposer for a given forkID, height, and round, in order to attest that the “bad proposer” is indeed “bad”.
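
To illustrate the dependency, the oracle could be expressed as an interface along the following lines (a sketch with made-up names, only to show what it would need to answer):

```rust
// Illustrative oracle interface; all names are hypothetical.
type ValidatorId = [u8; 32];

trait ProposerOracle {
    /// Returns the expected proposer for (fork_id, height, round), or None once
    /// the queried height falls outside the evidence expiration window and the
    /// historical data has been pruned.
    fn expected_proposer(&self, fork_id: u64, height: u64, round: u32) -> Option<ValidatorId>;
}
```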

Thanks for the feedback @haroldtrk, and the good questions. All of them touch crucial points.

To give some context, I am working with Informal Systems on contributing to the design of the consensus engine. The post discusses the need for an incentivization scheme. As we haven’t discussed details of what such a scheme should look like (e.g., actual values for parameters and penalties), we would need input and collaboration from more stakeholders to fully respond to your questions. But I am happy to participate in the conversation. To your points:

  1. There are two (somewhat conflicting) concerns. (i) A penalty system might scare away small operator companies, which may lead to less decentralization. (ii) If penalties are proportional to the stake, small operators might be less incentivized to do a proper setup. I guess these points need to be balanced. Are you more worried about (i) or (ii)?

  2. From the consensus engine viewpoint, double signing is unfavorable regardless of the reason (technical or intentional), and the two are also indistinguishable. So we propose in this post that we provide evidence to the incentivization system in any case. Perhaps at that level, some form of social consensus may be used to pardon technical mishaps: a technical failure usually results in the team socializing their failure, apologizing, or explaining in post-mortems how they plan to improve; this serves as a good way to separate intentional from technical issues, which is a basis for community decisions.

  3. Community participation is also a question for the Starknet Foundation, I believe. I hope that with this post we can get a broader discussion going.

I hope these ideas help in the discussion.

Thank you, @josef-widder, for your detailed response. The context you provided is very insightful. While I agree with many of your points, I’d like to share some thoughts and questions regarding specific aspects you mentioned:

  1. (i) The potential exclusion of small operators is indeed a critical issue. However, (ii) the risks posed by inadequate setups cannot be ignored either. My primary concern leans toward (i), ensuring small operators are not adversely affected. That said, I believe we need to discuss in more detail how to balance these two points. Do you have additional suggestions for achieving this balance?
  2. The difficulty of distinguishing between technical failures and malicious intent is a significant challenge. Your proposal for a social consensus approach makes sense. However, what processes do you envision to ensure such a mechanism operates objectively? For example, should there be guidelines or criteria for evaluating post-mortem reports and improvement plans shared after a technical failure?
  3. On community participation, you pointed out the role of the Starknet Foundation, which is a valid observation. However, without an initial framework or guiding principles, it may be challenging to expand the discussion. What would you suggest as a starting point to develop a more comprehensive approach for this?

Thank you again for your response. I’m looking forward to collaborating further and refining these ideas to create a more inclusive solution.

Hi, @haroldtrk.

I agree that the discussion here might be a bit abstract, without having a concrete framework/design for an incentivization scheme to discuss. With this post, I am trying to raise awareness that such a scheme is needed. Once we have agreement on that, we might start to draft an incentivization scheme. This should include the definition of parameters such as durations and extent (e.g. percentages or fixed values) of penalties. Then, we can have a meaningful conversation regarding Point 1 (small validators), by discussing concrete values of these parameters.

Regarding Point 2, I believe guidelines are important; they give clarity on what criteria a decision is based on.