IPFS#

A distributed file system whose principle is similar to BitTorrent: files are split into chunks, each chunk gets a CID and hashes at various levels for storage and verification, and a DHT (Distributed Hash Table) is used for lookup and routing.

IPFS Documentation#

https://docs.ipfs.io/ — mainly the Concepts and How-tos sections.
IPFS generates a different CID for every piece of content. If a fixed link is needed, it can be provided through IPNS, but IPNS is not suitable for rapidly changing content: its update interval is measured in minutes.
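For a stable address, a CID can be published under the node's own IPNS name. A minimal sketch using the go-ipfs readme CID that also appears later in this article:

# Publish a CID under this node's PeerID (its IPNS name)
ipfs name publish /ipfs/QmQPeNsJPyVWPFDVHb77w8G42Fvo15z4bG2X8D2GhfbSXc
# Resolve the IPNS name back to the CID it currently points to
ipfs name resolve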

Some services using IPFS#

Gateways#

The official checker for available public gateways is at https://ipfs.github.io/public-gateway-checker/. Below are some gateways that tested as available.

go-ipfs#

go-ipfs, written in Go, is the main open-source implementation of IPFS: https://github.com/ipfs/go-ipfs/

Configuration instructions for go-ipfs#

Detailed configuration instructions can be found at https://github.com/ipfs/go-ipfs/blob/master/docs/config.md, which is more detailed than the manual.

Compiling go-ipfs#

Refer to https://github.com/ipfs/go-ipfs#download-and-compile-ipfs

The GitHub repository is over 30 MB, so cloning may take a long time; if it is too slow or errors out, add a proxy. Compilation is done with the make build command, and most of the time is spent downloading dependency libraries.
Set GOPROXY before running it to avoid download timeouts.

# Check the Go version; 1.13 or newer is required
go version
# Set a proxy for git (http.proxy also applies to https remotes)
git config --global http.proxy 'socks5://127.0.0.1:10080'
# Check
git config -l
# Clone the repository
git clone https://github.com/ipfs/go-ipfs.git
cd go-ipfs/
# Set GOPROXY
export GOPROXY=https://goproxy.cn
# Check
echo $GOPROXY
# Compile
make build
# Check compilation result
./cmd/ipfs/ipfs version

Running IPFS on Armbian for Amlogic S905L#

Download the arm64 build from the go-ipfs GitHub releases. The Armbian version here is 5.99, with kernel 5.3.0.

# Download
wget https://github.com/ipfs/go-ipfs/releases/download/v0.5.1/go-ipfs_v0.5.1_linux-arm64.tar.gz
# Extract
tar xvf go-ipfs_v0.5.1_linux-arm64.tar.gz
# Run install.sh to install; the script copies the ipfs binary into /usr/local/bin
cd go-ipfs
./install.sh

Initialize the node and start the daemon as a normal (non-root) user.

# Check version
ipfs --version
# Initialize node
ipfs init
# View instructions
ipfs cat /ipfs/QmQPeNsJPyVWPFDVHb77w8G42Fvo15z4bG2X8D2GhfbSXc/readme
 
# This command does not run in the background; it is recommended to start it inside a screen session
ipfs daemon

If the IPFS node is not running on the computer you are browsing from, two parts of the configuration need to be changed before the webui can be accessed.

The first is Addresses.API and Addresses.Gateway.

"Addresses": {
  ...
  "API": "/ip4/127.0.0.1/tcp/5001",
  "Gateway": "/ip4/127.0.0.1/tcp/8080"
}

Change the API address to /ip4/0.0.0.0/tcp/5001 to listen on all network interfaces. If the server has a public IP this is a security risk, so it is better to set it to the internal interface address.
Change the Gateway address to the internal or public address. If the IPFS node sits on an internal network and is reached from outside through NAT, an internal address is sufficient.
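The same changes can be made with the ipfs config command instead of editing the file (a sketch; the address is an example internal interface, and the daemon must be restarted afterwards):

# Bind the API to an internal interface (example address)
ipfs config Addresses.API /ip4/192.168.13.25/tcp/5001
# Expose the Gateway on the same interface (example address)
ipfs config Addresses.Gateway /ip4/192.168.13.25/tcp/8080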

The other is API.HTTPHeaders, which avoids cross-origin (CORS) errors when accessing the remote node's webui from your own computer.

"API": {
    "HTTPHeaders": {}
},

Add two items to HTTPHeaders; Gateway.HTTPHeaders can be used as a reference for the format.

"Access-Control-Allow-Methods": [
  "GET","PUT","POST"
],
"Access-Control-Allow-Origin": [
  "*"
]
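The same headers can also be set with ipfs config --json, which avoids hand-editing the JSON (these keys match the snippet above):

ipfs config --json API.HTTPHeaders.Access-Control-Allow-Methods '["GET", "PUT", "POST"]'
ipfs config --json API.HTTPHeaders.Access-Control-Allow-Origin '["*"]'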

After modifying the configuration, restart IPFS and open http://IP:5001/webui/ to see the current node status, the number and size of stored blocks, details of connected peers, and the current configuration. The interface even provides some demo files to browse.

Through the Gateway, any file on IPFS whose CID can be resolved can be accessed. Accessed blocks are cached, so later requests for the same blocks are served from the local cache instead of being downloaded from remote nodes again.
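As a quick check, a CID can be fetched through the local Gateway port; the readme directory CID from the init step above works as a test object (assuming the default port 8080):

# First request: the blocks are fetched from remote nodes and cached locally
curl -o readme.txt http://127.0.0.1:8080/ipfs/QmQPeNsJPyVWPFDVHb77w8G42Fvo15z4bG2X8D2GhfbSXc/readme
# Repeating the request is served from the local block cache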

After selecting a file to upload, the transfer happens in the background of the page (whether the tab can be closed has not been confirmed), and the growth of total storage can be seen on the status page. How quickly a published file becomes visible depends on its size: files under 10 MB can be found quickly through a remote gateway, while a file close to 500 MB may take half an hour to an hour. This is related to how a node announces the CIDs of its content to other nodes: a single block is 256 KB (262,144 bytes), so a 500 MB file produces over 2,000 CIDs, which makes the announcement take much longer.
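The number of blocks behind a CID can be checked locally, which makes the announcement delay easier to estimate (a sketch; <CID> is a placeholder for the file you added):

# Count every block referenced by the CID, recursively
ipfs refs -r <CID> | wc -l
# Show the cumulative size of the DAG behind the CID
ipfs object stat <CID>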

When adding a file in the webui, wait until the upload has finished before pinning it. If a file is added by CID instead, submitting the CID alone does not trigger synchronization; pinning the CID makes IPFS start fetching it from other nodes, and once synchronization completes it appears in the pin list.
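The same flow works from the command line; pinning a CID that is not yet local triggers the synchronization described above (<CID> is a placeholder):

# Pin a CID; this blocks until all of its blocks have been fetched
ipfs pin add <CID>
# List recursive pins to confirm it shows up once synchronization finishes
ipfs pin ls --type=recursive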

Installing as a service#

Create the file /etc/systemd/system/ipfs.service and write:

[Unit]
Description=IPFS Daemon
After=syslog.target network.target remote-fs.target nss-lookup.target
[Service]
Type=simple
ExecStart=/usr/local/bin/ipfs daemon --enable-namesys-pubsub
User=milton
[Install]
WantedBy=multi-user.target

Then register and start it through systemctl.
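For reference, a typical sequence (the same commands are used again for the cluster service later in this article):

sudo systemctl daemon-reload
sudo systemctl enable ipfs
sudo systemctl start ipfs
sudo systemctl status ipfs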

Configuration instructions#

IPFS exposes three main ports: API, Gateway, and Swarm. A few other configuration items are also worth noting:

  • API: defaults to port 5001, providing the webui for managing and controlling IPFS. When setting the listening network interface, care should be taken not to expose it to the public network.
  • Gateway: defaults to port 8080, providing content lookup and download services for ipfs/CID.
  • Swarm: defaults to port 4001, this port is used to listen for requests from other IPFS nodes.
  • Addresses.NoAnnounce lists internal IPs that should not be announced. Be careful not to exclude 127.0.0.1 and ::1: other nodes appear to use these to check whether the current node supports IPv4 or IPv6, and excluding them prevents the node from keeping connections with other peers (connect and ping succeed, but the peer never shows up in swarm peers).
  • Swarm.AddrFilters lists internal IP ranges to ignore; when a peer announces addresses inside these ranges, those addresses are filtered out.
  • Discovery.MDNS.Enabled: set this to false to avoid searching for nodes on the local network. Example ipfs config commands for these settings follow the list.
  • Peering.Peers adds nodes that need to be kept connected.
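A quick way to apply these settings without hand-editing the config file is ipfs config (a sketch; the address range below is only an example, and --json replaces the whole value):

# Disable mDNS discovery on the local network
ipfs config --json Discovery.MDNS.Enabled false
# Ignore peer addresses inside an internal range (example range; this replaces the whole list)
ipfs config --json Swarm.AddrFilters '["/ip4/192.168.0.0/ipcidr/16"]'
# Check what the node will refuse to announce
ipfs config show | grep -A 8 NoAnnounce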

Fixed nodes#

For a self-built network it is necessary to keep your own nodes connected to each other. Under the default IPFS behaviour, however, even if your own nodes are set as bootstrap nodes, connections between them will still be closed once a node has been running for a while and the number of connected peers grows. To keep them connected, use the Peering section of the configuration file, formatted as follows; the second entry is the access address of ipfs.runfission.com.

{
  "Peering": {
    "Peers": [
      {
        "ID": "QmPeerID1",
        "Addrs": ["/ip4/18.1.1.1/tcp/4001"]
      },
      {
        "ID": "QmVLEz2SxoNiFnuyLpbXsH6SvjPTrHNMU88vCQZyhgBzgw",
        "Addrs": ["/ip4/3.215.160.238/tcp/4001", "/ip4/3.215.160.238/udp/4001/quic"]
      }
    ]
  }
  ...
}

For nodes listed in the Peering configuration:

  1. In connection management, it will protect the connection between this node and the specified node; IPFS will never actively (automatically) close this connection, and it will not close this connection even when the connection count reaches the limit.
  2. It will establish a connection at startup.
  3. If the connection is lost due to network reasons or the other node going offline, IPFS will continuously attempt to reconnect, with the interval length between attempts randomly ranging from 5 seconds to 10 minutes.

Running IPFS under public NAT#

Operating environment#

The server with the public IP runs CentOS 7, with public IP 118.119.120.121 and internal IP 192.168.13.10.
The internal server runs Ubuntu 18.04, with internal IP 192.168.13.25.

Setting up port forwarding on the public server#

Enable the global masquerade (forwarding) switch.

firewall-cmd --permanent --zone=public --add-masquerade

Forward the public IP port 4002 to the internal server's port 4001.

# TCP port forwarding
firewall-cmd --permanent --zone=public --add-forward-port=port=4002:proto=tcp:toaddr=192.168.13.25:toport=4001
# UDP port forwarding
firewall-cmd --permanent --zone=public --add-forward-port=port=4002:proto=udp:toaddr=192.168.13.25:toport=4001
# Apply the settings
firewall-cmd --reload
# Check
firewall-cmd --zone=public --list-all

If the gateway is an OpenWRT router, just add the rules under Firewall -> Port Forwards. Note that after adding the forwarding rules, you also need a traffic rule allowing WAN access to this device on that port.

Limiting the number of connected nodes#

Use the Swarm.ConnMgr.HighWater parameter. In version 0.6.0 this setting did not work well: after the node had been running for a long time, the number of peers far exceeded the limit. In version 0.7.0 it works as expected.

"Swarm": {
    ...
    "ConnMgr": {
        "GracePeriod": "30s",
        "HighWater": 500,
        "LowWater": 100,
        "Type": "basic"
    },
    ...
}

Configure the IPFS service on the internal server.

# Installation process omitted
 
# Initialize in server mode
ipfs init --profile=server
 
# Modify the configuration, see the specific instructions below
vi .ipfs/config
 
# Start
ipfs daemon

The server mode has several changes compared to the normal mode:

  1. Addresses.NoAnnounce in server mode will list all internal IPs, which will not be announced.
  2. Swarm.AddrFilters in server mode will list all internal IPs; peers connecting with internal IPs will be filtered out.
  3. Discovery.MDNS.Enabled in server mode is set to false to avoid initiating node searches in the internal network.

In addition to the API, Gateway, and API HTTPHeaders settings used on normal nodes, you also need to configure Addresses.Announce with this node's public IP and forwarded port.

"Announce": [
  "/ip4/118.119.120.121/tcp/4002",
  "/ip4/118.119.120.121/udp/4002/quic"
],
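After restarting the daemon, the announced addresses can be verified on the node itself; ipfs id prints the addresses the node advertises, which should now include the public entries above:

ipfs id
# The "Addresses" field should contain /ip4/118.119.120.121/tcp/4002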

Issues encountered#

Gateways configured with OpenWRT do not have this problem, but with CentOS as the gateway the source address apparently is not passed through correctly, so many third-party peers show up with the gateway's IP. Connections to such a peer succeed, but ping fails. The entries in Swarm.AddrFilters that cover the gateway's subnet should therefore be removed.

The reason can be seen in this node's swarm peers. Peers that this node connected to actively are recorded with their public IPs, but peers that connected in passively (through the gateway 118.119.120.121) are all recorded with the gateway's internal IP. Under the AddrFilters rules those addresses are discarded, which is why ipfs swarm connect succeeds but ipfs ping fails.

$ ipfs swarm peers
/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ
/ip4/111.231.85.77/tcp/4001/p2p/QmWv1eLMNHPpwYKzREQBEpDfYjW6YXCrVpVyBZVjAuSd2i
...
/ip4/192.168.13.10/tcp/10041/p2p/QmXUZth5Pr2u1cW65F7gUeFwjkZFfduE1dwqiysNnrTwXd
/ip4/192.168.13.10/tcp/10053/p2p/QmPM3bepMUKpYTza67coD1Ar3efL7FPBFbGRMc42QLf4q9
/ip4/192.168.13.10/tcp/10202/p2p/QmVBzjW2MyNrSuR48GjcqB6SAJTnw8Y9zaFbgbegh4bRx4
/ip4/192.168.13.10/tcp/1024/p2p/QmbBEfFfw59Vya9CJuzgt4qj9f57sPZGiyK7fup8iKqkTr
/ip4/192.168.13.10/tcp/1025/p2p/QmcmhwCeLkBJvcq6KJzN58BRZg1B1N8m3uA3JyQKpVn64E
...
/ip4/192.168.13.10/udp/6681/quic/p2p/QmcSCBpek4YF5aAsY7bUMxiL7tacoYMeXUJUpU4wctqX4w
/ip4/192.168.13.10/udp/8050/quic/p2p/QmWeuXCNKAHfbineKMqo3U3dvVSz2em1w67pj5Up6tkUXo
/ip4/206.189.69.143/tcp/31071/p2p/12D3KooWHDr5W3Tse17mr4HSzuQm44dVQYp8Bb638mQknsyeHXSP
/ip4/206.189.69.250/tcp/30511/p2p/12D3KooWRd1BNPd8PMfxpCT7TNCFY4XSZsy8v8Cmm36H136yxzub
...

To check further, test whether this node can ping the passively connected peers, taking one peer ID for the test.

ipfs ping QmXUZth5Pr2u1cW65F7gUeFwjkZFfduE1dwqiysNnrTwXd
PING QmXUZth5Pr2u1cW65F7gUeFwjkZFfduE1dwqiysNnrTwXd.
Pong received: time=26.43 ms
Pong received: time=25.70 ms
Pong received: time=26.31 ms
...

This indicates that the nodes recorded as the gateway's internal IP are available and should be retained.
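One way to keep them is to remove the matching subnet from Swarm.AddrFilters and restart the daemon (a sketch; in the server profile the entry covering 192.168.13.x is /ip4/192.168.0.0/ipcidr/16, and the list written back should contain whatever other entries your config already has):

# Inspect the current filter list
ipfs config show | grep -A 20 AddrFilters
# Write the list back without the entry that covers the gateway's subnet (truncated example)
ipfs config --json Swarm.AddrFilters '["/ip4/10.0.0.0/ipcidr/8", "/ip4/172.16.0.0/ipcidr/12"]'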

These peers connected through the gateway fall into two categories: nodes without a public IP and nodes with one:

  • For nodes with a public IP, it is unclear whether, after the initial connection, this node dials back using the peer's announced address and updates the recorded address based on the result. If it does, these peers only stay in the list with an internal IP briefly before being updated to their public addresses, which other nodes can then use to connect to them.
  • For nodes without a public IP, this node cannot reach the peer's announced address and can only connect through its own gateway's internal IP, so these peers remain in the list with internal IPs and cannot be shared with other nodes.

Optimizing download speed#

To read files by CID, first choose a fast gateway. The ipfs.io gateway is the most reliable for retrieving files, but connection problems can make it very slow.

To share files by CID, you must make sure that the gateway the other party will use is in the peer list of the node holding the file, so maintaining a list of fast gateways matters. Adding those gateways to your Peering.Peers greatly improves how quickly your files become available.
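Once a fast gateway node's PeerID and multiaddrs are known, they can be added in one step (a sketch; the ID and address below are placeholders, and --json replaces the entire Peering.Peers array):

ipfs config --json Peering.Peers '[{"ID": "QmGatewayPeerID", "Addrs": ["/ip4/203.0.113.1/tcp/4001"]}]'
# Restart the daemon; the connections to these nodes will then be protected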

Upgrading IPFS#

For major version upgrades you need the fs-repo-migrations tool. See its documentation; it is essentially two steps: back up the .ipfs directory and run fs-repo-migrations. Stop the IPFS service before doing this.
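A minimal upgrade sequence, assuming the systemd service from earlier in this article and fs-repo-migrations already available in PATH:

# Stop the daemon
sudo systemctl stop ipfs
# Back up the repository
cp -a ~/.ipfs ~/.ipfs.bak
# Migrate the repository to the layout expected by the new version
fs-repo-migrations
# Install the new go-ipfs binary, then start the service again
sudo systemctl start ipfs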

IPFS Desktop#

Installing IPFS Desktop on Windows 10 puts it by default in the user directory C:\Users\Milton\AppData\Local\Programs\IPFS Desktop; after installation the program directory is 265 MB.
The data directory is C:\Users\Milton\.ipfs, with the same format and content as go-ipfs. Two IPFS Desktop processes and two ipfs processes run in the background, consuming about 500 MB of memory in total.

The status window is actually a webview displaying the content of the webui.

The configuration only changes the connection manager water marks (LowWater/HighWater) to 50 and 300; everything else is the same.

Resources can be accessed if they exist (cached) on directly connected nodes; otherwise, they cannot be accessed.

IPFS Private Network and Cluster#

Refer to https://labs.eleks.com/2019/03/ipfs-network-data-replication.html
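For the private-network part, every node must share the same swarm.key. The file format below is the standard pre-shared key format; the generation command is one common recipe and is an assumption, not taken from the linked article:

# Generate a key on one node and copy the file to ~/.ipfs/swarm.key on every node
echo -e "/key/swarm/psk/1.0.0/\n/base16/\n$(tr -dc 'a-f0-9' < /dev/urandom | head -c 64)" > ~/.ipfs/swarm.key
# Optionally make the daemon refuse to start unless the private network key is present
export LIBP2P_FORCE_PNET=1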

Run IPFS as a service, with automatic startup on boot.

# Create service file
sudo vi /etc/systemd/system/ipfs.service
 
# File content
[Unit]
Description=IPFS Daemon
After=syslog.target network.target remote-fs.target nss-lookup.target
[Service]
Type=simple
ExecStart=/usr/local/bin/ipfs daemon --enable-namesys-pubsub
User=root
[Install]
WantedBy=multi-user.target
# End of file content
 
# Add service
sudo systemctl daemon-reload
sudo systemctl enable ipfs
sudo systemctl start ipfs
sudo systemctl status ipfs

Add IPFS Cluster as a service.

# Create service
sudo nano /etc/systemd/system/ipfs-cluster.service
 
# File content start, note that After includes the ipfs service to ensure the startup order
[Unit]
Description=IPFS-Cluster Daemon
Requires=ipfs.service
After=syslog.target network.target remote-fs.target nss-lookup.target ipfs.service
[Service]
Type=simple
ExecStart=/home/ubuntu/gopath/bin/ipfs-cluster-service daemon
User=root
[Install]
WantedBy=multi-user.target
# End of file content
 
# Add to system services
sudo systemctl daemon-reload
sudo systemctl enable ipfs-cluster
sudo systemctl start ipfs-cluster
sudo systemctl status ipfs-cluster

Application scenarios of IPFS#

Through practical testing, for IPFS nodes in the public network, once a connection is established, the speed of CID publishing and reading is very fast. Excluding the network's inherent latency, the time from request to the start of transmission is basically within 2 seconds, and the transmission speed depends on the bandwidth between the two points.

File Sharing#

The indexing form of IPFS content is very suitable for file sharing among teams, as each modification generates an index change, allowing for version control. The distributed file access points and download points facilitate cluster scalability, and the caching feature can reduce the impact of hot data on bandwidth resources.

Audio and Video Distribution#

IPFS could replace existing PT (private tracker) download networks. Since the individual files shared via PT are large, unified pin management across all nodes in the group is needed to guarantee access speed for hot content at every node and to ensure that long-tail content keeps enough replicas and is not lost.

Streaming Media and Download Acceleration#

The characteristics of IPFS make it a natural replacement for CDN services, for static files such as images, CSS, JS, and archives, as well as live-streaming services that are not highly time-sensitive. As ISPs roll out IPv6 widely, the bandwidth of households with public IPv6 addresses can be used for regional content acceleration.

libp2p#

libp2p is a well-packaged p2p module that has been separated from IPFS. The module already includes mechanisms such as PeerId, MultiAddress, and Protocol Handler, making it easy to expand your own applications.

The Go language implementation can be found at https://github.com/libp2p/go-libp2p, with sample code at https://github.com/libp2p/go-libp2p-examples.

The usage in the code generally follows these steps:

  1. Create Host
  2. For the specified protocol, set StreamHandler on the Host
  3. If there are local ports providing services, create the corresponding service and listen on the port
  4. Specify the target node and protocol, create Stream
  5. Write data to the Stream
  6. Read data from the Stream, close or not close the Stream based on business needs

After starting the Host, if you need to keep it running, you can use the following methods:

// Method 1: use an empty select
select {} // hang forever

// Method 2: use the built-in HTTP server's ListenAndServe
http.ListenAndServe(serveArgs, p)

// Method 3: use a channel that is never written to
<-make(chan struct{}) // hang forever