Storage
Introduction
This guide explains how Gitopia's decentralized storage system works, helping you understand the technology behind our platform and how to participate as a storage provider.
Gitopia combines a custom storage module on the Gitopia blockchain with a network of storage providers to create a robust, decentralized storage network for git repositories. This system ensures your code is always available, secure, and efficiently distributed across the network.
How It Works
System Overview
Our storage system is built on three key components working together to create a reliable and efficient storage solution. The Git Server handles all your git operations, making sure your code is properly stored and retrieved. The IPFS Cluster manages the distributed storage network, ensuring your data is replicated across multiple providers. Finally, the Blockchain Integration keeps track of repository packfiles and release assets, and manages the rewards for storage providers.
Understanding the Data Flow
When you push code to Gitopia, the system first processes your changes through the git server. This server creates a new packfile containing your changes and distributes it across the IPFS Cluster network. The blockchain is then updated with the new packfile information.
When you pull code, the system first checks if the data is available locally. If it's not found in the local cache, the system retrieves it from the IPFS Cluster network. This approach ensures fast access to frequently used data while maintaining the benefits of distributed storage.
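The pull path described above is essentially a read-through cache. The following is a minimal sketch of that idea, not Gitopia's actual implementation: the `PackfileReader` class, the `fetch_from_cluster` callback, and the in-memory `fake_cluster` are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class PackfileReader:
    """Read-through cache: serve local data first, otherwise ask the cluster."""

    def __init__(self, fetch_from_cluster: Callable[[str], bytes]) -> None:
        # Hypothetical stand-in for an IPFS Cluster lookup keyed by CID.
        self._fetch = fetch_from_cluster
        self._cache: Dict[str, bytes] = {}
        self.cluster_fetches = 0  # counts how often the cluster was queried

    def get(self, cid: str) -> bytes:
        # 1. Check whether the data is already available locally.
        if cid in self._cache:
            return self._cache[cid]
        # 2. Otherwise retrieve it from the cluster and keep a local copy.
        data = self._fetch(cid)
        self.cluster_fetches += 1
        self._cache[cid] = data
        return data


# Hypothetical in-memory "cluster", for demonstration only.
fake_cluster = {"cid-1": b"packfile bytes"}
reader = PackfileReader(lambda cid: fake_cluster[cid])

first = reader.get("cid-1")   # cache miss: fetched from the cluster
second = reader.get("cid-1")  # cache hit: served locally
```

In this sketch the second read never touches the cluster, which is why frequently used data stays fast to access.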
Storage Approval Workflow
To ensure data integrity and user control, Gitopia employs a two-step approval workflow for updating repository storage (packfiles, LFS objects, and release assets). This process prevents storage providers from modifying repository contents without the user's explicit consent.
Here’s how it works:
- Client Initiates Push: A user pushes changes to a repository using git push. The git-remote-gitopia client sends the new data to the designated storage provider.
- Provider Proposes Update: The storage provider receives the data, pins it to IPFS to get a new Content Identifier (CID), and calculates its Merkle root. It then submits a proposal transaction (e.g., MsgProposeRepositoryPackfileUpdate) to the Gitopia blockchain. This proposal contains the new CID, size, and a reference to the user who initiated the push.
- Client Approves Update: After successfully pushing the data to the provider, the git-remote-gitopia client automatically queries the blockchain for the pending proposal. It then signs and sends an approval transaction (e.g., MsgApproveRepositoryPackfileUpdate).
- Blockchain Finalizes State: The storage module verifies the approval transaction. If valid, it updates the repository's state with the new CID, adjusts the user's storage quota, and handles any associated storage fees. The pending proposal is then deleted.
This propose/approve mechanism ensures that only the user who initiated the change can authorize updates to their repository's storage, providing a crucial layer of security in the decentralized network.
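The propose/approve flow can be modeled as a small state machine. The sketch below is an illustrative in-memory model, not the storage module's real code: `StorageModule`, `Proposal`, and their fields are hypothetical names, storage fees are omitted, and quota handling is reduced to a simple byte counter.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Proposal:
    repo_id: int
    user: str      # the user who initiated the push
    new_cid: str
    size: int      # size of the new data in bytes


class StorageModule:
    """Toy model of the on-chain state touched by propose/approve."""

    def __init__(self) -> None:
        self.repo_cid: Dict[int, str] = {}
        self.usage: Dict[str, int] = {}  # simplified quota accounting
        self.pending: Dict[Tuple[int, str], Proposal] = {}

    def propose(self, proposal: Proposal) -> None:
        # Submitted by the storage provider after pinning the data.
        self.pending[(proposal.repo_id, proposal.user)] = proposal

    def approve(self, signer: str, repo_id: int) -> None:
        # Only the user named in the proposal can approve it.
        key = (repo_id, signer)
        if key not in self.pending:
            raise PermissionError(f"no pending proposal for signer {signer!r}")
        proposal = self.pending.pop(key)  # the proposal is deleted on approval
        self.repo_cid[repo_id] = proposal.new_cid
        self.usage[signer] = self.usage.get(signer, 0) + proposal.size


chain = StorageModule()
chain.propose(Proposal(repo_id=1, user="alice", new_cid="cid-new", size=2048))

try:
    chain.approve(signer="mallory", repo_id=1)  # not the initiating user
    mallory_blocked = False
except PermissionError:
    mallory_blocked = True

chain.approve(signer="alice", repo_id=1)  # the initiating user approves
```

Keying pending proposals by both repository and initiating user is one simple way to express that a provider's proposal has no effect until that specific user signs off.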
Handling Concurrent Updates (Race Conditions)
Gitopia's decentralized nature means multiple users could try to update the same repository simultaneously. To prevent conflicts and ensure data consistency, the system uses an optimistic concurrency control mechanism. This prevents a "lost update" scenario, where one user's changes are unknowingly overwritten by another's.
How It Works
The process relies on verifying the repository's state before applying any changes. The key is the OldCid (the previous Content Identifier of the repository's packfile), which acts as a version identifier.
- State Check on Proposal: When a user pushes a change, the storage provider creates an update proposal (e.g., MsgProposeRepositoryPackfileUpdate). This proposal includes the OldCid, which is the CID of the packfile before the new changes. The blockchain verifies that this OldCid matches the current CID of the repository on-chain. If it doesn't, another update has already been processed, and the proposal is rejected.
- State Re-Check on Approval: During the approval step (MsgApproveRepositoryPackfileUpdate), the blockchain performs the same check again. It verifies that the repository's CID has not changed between the proposal and the approval. This double-check ensures that no other updates have slipped in during the short window of the approval process.
What Happens in a Race Condition?
Imagine two users, Alice and Bob, both trying to push to the same repository.
- Alice and Bob both fetch the current state of the repository, which has a packfile CID of CID_A.
- Alice pushes her changes first. The provider proposes an update from OldCid: CID_A to NewCid: CID_B. Alice's client approves it, and the blockchain updates the repository's state to CID_B.
- A moment later, Bob pushes his changes. His provider proposes an update from OldCid: CID_A to NewCid: CID_C.
- When the blockchain processes Bob's proposal, it checks the OldCid. It sees that the proposal expects the old CID to be CID_A, but the current on-chain CID is now CID_B.
- The check fails, and Bob's transaction is rejected with an error indicating the repository state has changed.
Bob's git push command will fail. He will need to git pull Alice's changes, merge them with his own, and then push again. This new push will correctly use CID_B as the OldCid, and the update will succeed. This mechanism guarantees that updates are applied sequentially and that no changes are accidentally overwritten.
Becoming a Storage Provider
What You Need to Know
To become a storage provider, you'll need to meet certain hardware and software requirements. These requirements ensure that you can effectively participate in the network and provide reliable storage services. For detailed setup instructions, please refer to our Storage Provider Guide.
As a storage provider, you'll participate in our verification system to ensure the reliability of the network. This system includes periodic challenges to verify that you're maintaining the data properly. Successful participation in this system comes with rewards, while missed challenges may result in penalties. Learn more about how the Storage Challenge System works and its economic incentives.