Friday, December 02, 2022

Software Systems and Entropy Increase

Entropy increase is a basic law of the universe: the overall entropy of a closed system can only increase, never decrease.

The development of civilization is a fight against this law, a search for a slim chance of survival.

Human beings cannot defeat entropy increase outright; at best there are one or two workarounds.

Transfer Entropy

A closed system tends only toward chaos. The easiest way to avoid entropy increase is to transfer it: break out of the closure and become an open system, expelling entropy and bringing in energy.

This is why children fight over toys. When the older brother snatches the younger brother's toy, the younger brother is naturally upset, while the older brother's mood improves a little. Of course, after the fight, the combined "mood entropy" of the two has still increased.

The Internet has been able to grow into a huge global system precisely because it keeps absorbing innovations from all walks of life, keeps evolving, and keeps pushing back its own drift toward chaos. Its entropy has been transferred to other industries, with enormous impact on their traditional models.

Draw the Boundary

If entropy cannot be transferred, the only option is to carve out a few subsystems and draw boundaries, so that entropy increase in one part does not spread to the main body of the system.

In economics, a system without clear boundaries often sees "bad money drive out good money". The more factors are mixed together, the more likely the system is to degenerate.

The layering, modules, components, microservices, and so on in software architecture design are all ways of drawing boundaries, trying to keep the overall complexity of the software within a certain range.

Restraint

When nothing else works, the only option left is to stop churning and slow down the entropy increase through restraint.

For a software service, this means putting it into maintenance mode: no new features are added, and no new users are accepted.

Wednesday, November 09, 2022

Attack and Defense in Distributed Finance (DeFi)

Over the past four years (2018–2022), the blockchain-based decentralized finance (DeFi) industry has raised 253 billion US dollars, while losses due to attacks have exceeded 3 billion US dollars. Although this is lower than the losses of the traditional financial system, it still sounds an alarm for fintech researchers: decentralized finance is not a silver bullet in the face of multi-layered, complex attacks.

The paper "SoK: Decentralized Finance (DeFi) Attacks", co-authored by researchers from Imperial College London, the Technical University of Munich, the University of Macau, the Swiss Federal Institute of Technology, the University of California, Berkeley, and others, analyzed 77 papers, 30 audit reports, and 181 incidents, and produced some interesting findings.

First, in terms of architecture, attacks involve four layers, from top to bottom:

  • Protocol layer: applications for decentralized scenarios, such as digital currencies and exchange services;
  • Smart contract layer: the code, data structures, and execution environment implementing the financial logic;
  • Consensus layer: consensus algorithms (PoW, PoS, etc.) and incentive mechanisms;
  • Network layer: communication and network protocols, traffic analysis, data transmission, etc.

In addition, there are auxiliary services off the chain, including clients, the operations layer, oracles, etc.

A few statistics:

  • Attacks are trending upward, peaking in August 2021 with a monthly loss of 600 million US dollars; there were 3.1 attacks per month in 2020 and 8.5 in 2022;
  • Attacks mostly occur at the protocol layer (mainly stablecoins and lending applications), the smart contract layer, and auxiliary services;
  • Academic research is spread fairly evenly across layers, including the network and consensus layers; in contrast, almost all industry audit reports focus on smart contracts, with a small share covering auxiliary services;
  • Most attacks are not fast and could be stopped by pausing the protocol, yet only 1 of the 87 protocols studied could respond within an hour;
  • Auditing in advance effectively prevents attacks: 15.49% of unaudited protocols were attacked, versus only 4.09% of audited ones;
  • Early detection is even more effective, and most contract vulnerabilities can be detected in advance; however, effective detection tools for the protocol layer are still lacking;
  • Most attackers can be traced, because they rely on centralized exchanges and mining services.

These statistics show that attack and defense in DeFi is actually very close to traditional security. The most common attacks are often not technically sophisticated and could be identified early and stopped effectively, but systematic detection tools are still lacking. In addition, academia and industry focus on different layers.

Tuesday, August 16, 2022

Mastering Systematic Thinking

The whole is greater than the sum of its parts, and structure determines system behavior.

What is System Thinking?

Systems thinking is an approach to problem solving that looks at the problem holistically.

Unlike the simple, intuitive way of thinking about a problem in isolation, systems thinking observes the behavior, structure, and relationships of a complex system, summarizes its internal laws at different levels, and understands how it operates. Furthermore, those internal laws can be changed by adjusting the structure of the system, and thereby the behavior of the system.

Focus on the whole, not the parts. Focus on connections, not things.

For example, on seeing an apple fall to the ground, the intuitive conclusion is simply that ripe apples fall.

Systems thinking may need to consider:

What is the connection between apples, fruit trees and the ground?

What internal law causes the behavior of the apple falling to the ground?

What factors can be changed to prevent apples from falling to the ground?

......

Another example is inflation: intuitive thinking may attribute it to extra currency issuance, while systems thinking also takes into account the cyclical laws and distribution mechanisms of the economy.

Why master systems thinking?

The world itself is a complex system, and many problems in real life are dealing with complexity. For example, designing a bridge, building an assembly line, implementing an enterprise software, and so on.

Simple systems tend to be linear, i.e. 1+1=2, while complex systems are usually nonlinear, i.e. 1+1>2.

Usually, due to the limitations of knowledge, cognition, and way of thinking, it is difficult for humans to intuitively see the whole of things.

Also, it's hard to understand it directly for most complex objects.

These all require dissection and thinking using systems thinking, which can help us analyze problems more comprehensively.

How to Master System Thinking

Systematic thinking can be mastered through training, which mainly includes the following steps:

First, observe the dynamic behavior of the system, including its events and behavioral characteristics, and summarize its rules of behavior.

Next, predict its possible internal structure from those behavioral laws.

Then, divide and conquer: split the system into multiple small, simple subsystems according to that structure.

Finally, to verify that the predicted structure is accurate, build prototypes and run experiments, adjusting based on the experimental feedback.

Wednesday, July 13, 2022

Some whimsical thoughts


1. From the point of view of computational complexity, polynomial means equivalence and exponential means difference; from the point of view of numbers, polynomials form an equivalence class while exponentials create differences of rank. So, with high probability, P != NP. But why are exponents so special? Can this be analyzed from an information-theoretic perspective? Also, under differentiation a polynomial drops a degree (a kind of dimensionality reduction) while an exponential stays invariant; is that related?

2. Financial activity is like water, flowing in the direction of potential decline. The amount of water in the process does not change, but it will affect the environment on the path. Without path control, water flow can cause bad results. If there is demand, and there is no path, the flow may spontaneously rush out a new path.

3. The basic principles of the natural sciences have been studied relatively clearly, but the basic principles of the social sciences seem to be unclear, and in many places contradict the laws of the natural sciences. But living things themselves should obey the laws of nature. Unless it’s because of the introduction of a new variable through some spontaneous or other behavior.

4. The meaning of mathematics is to grasp the essence of a problem. Consistency in a system can be seen through different forms, for example by looking at low dimensions from high dimensions, or at simpler number systems from the complex numbers. Conversely, generalizing from concrete operations such as addition, subtraction, multiplication, and division to more abstract, general operations reflects the laws of association between things. Perhaps to study mathematics is to study how the universe is constructed.

5. Many times, the idea for solving a problem cannot be found because the problem has not been clearly defined, so the definition is the hardest part. The reason the Tao Te Ching is difficult to understand is precisely the lack of this premise of definition. In this sense, the range of human perception depends on the expressive power of language. So, are the expressive abilities of different languages equivalent? If not, which language is more efficient?

6. Asking questions is sometimes more important than solving them, especially in cutting-edge fields. Science asks questions and then finds solutions; engineering cares more about achieving things better.

7. Increasing entropy means eliminating difference; decreasing entropy means introducing difference. The basic form of difference expression is comparison. And sorting is undoubtedly a natural comparison. Nature doesn’t like comparisons, but humans do.

8. The traditional goal of Eastern education is universal basic training, which means some students will never get enough from it. In the information age there are more ways to access knowledge, which means personal learning becomes more critical, and education will drift back toward the elite model of Western education.

9. In the past, human civilization was based on individual abilities, that is, the development of the whole depended on talented individuals. And this model has basically approached its limit. The follow-up is either to develop group intelligence cooperation; or to see if machine-assisted thinking can break through.

10. The world is smooth in most cases, which means that the laws summarized through local analysis are likely to be general. It also means that many things have prior knowledge. But once there are a few unsmooth situations, there will be more out of control than expected.

11. Learning new knowledge will go through four stages: not understanding, feeling understood but unable to speak, being able to explain clearly to people in the same field, and being able to explain clearly across fields. Therefore, being able to talk about a thing means that you have already started; talking about a complex issue in a simple way means you really understand it.

12. Transformation (compression, encryption…) does not change the dimension and does not lose information, so it is reversible; but dimensionality reduction (hash) will lose information and be irreversible.

13. If DNA is the software's code, then an organism is the running form of that software being interpreted. But where does consciousness come from? The essence of the physical world is information, yet consciousness is not information.

14. Natural evolution sits between intelligent design and random evolution, but because there are external influences, it is closer to intelligent design than to random selection. The evolution of civilization, however, shows no visible external influence, which may be why civilization evolves more slowly than nature.

15. There are two major directions in the evolution of civilization: one is that all individuals greatly strengthen their ability to communicate and form a community; the other is that science and technology develop so far that individuals separate completely from the group and exist independently.

16. The difference between countable and uncountable is that after separately countable division, the former will be finite, while the latter will remain infinite.

17. Information → increase order → break symmetry → destroy energy → generate mass. But how does the reverse direction generate information from mass? If that can also be shown, then information, energy, and mass can be unified, and the world will hold no more mysteries.

18. Both absolutely ordered and absolutely disordered systems are fragile, and most sustainable systems are always somewhere in between, and can constantly adjust themselves.

19. From the point of view of the scalability and complexity of interaction, the peer-to-peer network is polynomial, the centralized network is linear, and the hierarchical network is logarithmic. This, in turn, means that as the size of the network increases, most will naturally go through such processes as peer-to-peer, centralized, and hierarchical.

20. The ultimate mystery of how the world works is fascinating. When I was young, I naturally took physics as my research direction. However, it soon became apparent that a subject inside a system cannot fully recognize the system itself. Probably the best way to understand a system is to construct it yourself; information science brings a glimmer of that possibility.

21. A paradox of traditional economic theory is that optimizing the allocation of resources has to go through the production and consumption links, which leads to crises (economic, financial, war) that inevitably arrive periodically. In fact, the two do not have to be tied together at all.

22. The only difference between a market and a plan is who gets most of the resources, the contestants or the referees. Most countries now participate in both.

23. The reason why organisms evolve is because they have been unable to get rid of the influence of uncertain environmental changes; the reason why human beings stopped evolution is to create a controllable external environment through technological means.

24. In the early 20th century, the reason why physicists were able to discover the theory of relativity may have something to do with the emergence of this form of film at that time. Experience, abstraction, and association are important abilities. One of the important reasons why it is difficult for humans to imagine certain inferences in science is the lack of similar experience.

25. time == space, computation == storage.

26. Energy == Matter. Matter is a property of space, and space is a representation of information. There is a deeper conservation behind this. The incomprehensible experiments of quantum mechanics become much more natural if the world is understood as something like a computation.

27. Resources are generally limited, so some members are allocated more and others less, requiring more labor and preferably acknowledging labor.

28. Local optima often lead to failure to achieve global optima. The way to improve is to introduce information and trust. Trust in technology will be the key to a new level of human social organization.

29. The essence of finance is to transcend time and space; the essence of exchange is to eliminate uncertainty through information.

30. The essence of mathematics is intuition, the cognition of the laws of the world. Calculation is just a process, not a necessity.

31. The only thing that can affect the speed of space-time is the electromagnetic force, perhaps because it is the fundamental force that makes up the world.

32. Perhaps the reason why fractals are ubiquitous in nature is that they are easier to generate with a small amount of computation.

Friday, April 22, 2022

Latest Progress in the Fed's Digital Currency

Since 2016, Central Bank Digital Currency (CBDC) has gradually become an important subject of research and development experiments by central banks around the world. In terms of application scenarios, general-purpose CBDC targets retail, online shopping, personal payments, and so on, roughly corresponding to cash scenarios, and is the main research direction at present. In addition, there are CBDCs aimed at financial institutions' reserves.

The Federal Reserve has been cautious in its exploration of digital currencies and pushed back on Facebook's Libra project, but it has been conducting research of its own, mainly through its financial innovation lab and the "Hamilton" project run by its Boston branch.

Note: The project name honors two people: Alexander Hamilton, the first U.S. Treasury Secretary and founder of the American financial system, and Margaret Hamilton, director of software engineering at the MIT Instrumentation Laboratory, who led software development for the Apollo program.

Hamilton Project

The "Hamilton" project is an exploratory research project by the Federal Reserve Boston Branch and the MIT Monetary Research Center.

The project is divided into two phases:

  • Phase 1: Solve core issues such as high performance, reliable transactions, scalability, and privacy protection. Target 100,000 TPS, second-level confirmation, multi-region fault tolerance.
  • Phase 2: Solve key issues such as auditability, programmable contracts, support for intermediary layers, attack prevention, and offline transactions.

After several years of work, the first phase was completed in February this year. The source code, OpenCBDC, was released as open source software, written mainly in C++ under the MIT license; the project address is mit-dci/opencbdc-tx.

Two execution engines were tested: the single-ordering-node Atomizer engine (order-preserving) reached a peak of 170,000 TPS, while the parallel 2PC execution engine (without a global ordering) reached 1.7 million TPS.

In terms of architecture, it is similar to other central bank digital currency systems, drawing on the technical characteristics of blockchain and cryptocurrencies.

  • A centralized transaction-processing architecture is adopted, because the central bank provides a strong premise of trust;
  • Transactions are authorized by private-key signatures;
  • Users spend the currency through a wallet client;
  • Following the UTXO model, spent funds are destroyed and new funds are created (see the sketch below);
  • Transaction verification and execution are decoupled, making the system easier to scale.
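To make the UTXO-style flow concrete, here is a minimal, hypothetical sketch in Python: spending an output destroys it and mints new outputs for the recipient and the change. The names and structures are illustrative only, are not taken from the OpenCBDC codebase (which is written in C++), and signature verification is omitted.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical sketch of the UTXO-style flow described above: spending an
# output destroys it and creates new outputs. Not OpenCBDC's actual API.

@dataclass(frozen=True)
class UTXO:
    owner: str   # holder's public key (signature checks omitted in this sketch)
    value: int   # amount in the smallest currency unit

class Ledger:
    def __init__(self) -> None:
        self.unspent: dict[str, UTXO] = {}  # utxo id -> unspent output
        self._nonce = 0                     # keeps minted ids unique

    def mint(self, owner: str, value: int) -> str:
        self._nonce += 1
        uid = hashlib.sha256(f"{owner}:{value}:{self._nonce}".encode()).hexdigest()
        self.unspent[uid] = UTXO(owner, value)
        return uid

    def transfer(self, uid: str, recipient: str, amount: int) -> list[str]:
        """Destroy the spent output and create outputs for recipient and change."""
        spent = self.unspent.get(uid)
        if spent is None or amount > spent.value:
            raise ValueError("invalid spend")
        del self.unspent[uid]                    # spent funds are destroyed
        outputs = [self.mint(recipient, amount)] # new funds are created
        if spent.value > amount:
            outputs.append(self.mint(spent.owner, spent.value - amount))
        return outputs

ledger = Ledger()
coin = ledger.mint("alice_pk", 100)
ledger.transfer(coin, "bob_pk", 30)   # Bob receives 30, Alice gets 70 in change
```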

The project is still at an early stage, and the scenarios under consideration are very limited. The author believes there is still a long way to go before it can be deployed in practice.

Several major open issues at present:

  • How is identity verification implemented? It still relies on the public/private key mechanism, which can be accelerated with dedicated hardware.
  • How is anti-money-laundering monitoring done? This may be handled off to the side, as an extension.
  • How is fine-grained auditing of identity and transaction data achieved? The goal is to let different roles see different granularities, which can be achieved through data isolation and encryption mechanisms.
  • How is the currency issued? It can be exchanged directly with individuals, or distributed through commercial banks as a second tier (the latter is the approach taken by the digital renminbi).
  • How does it integrate with the existing financial system? It can go through a transaction gateway, or simply not connect at first and run separately.

Summary

Objectively speaking, given a centralized architecture, it is not difficult to build a high-throughput transaction system with existing software and hardware. The difficulty lies in supporting complex financial services, inter-transaction dependencies, and scalability, while balancing conflicting requirements such as compliance, auditability, and privacy protection. That usually takes a great deal of hands-on experience.

Tuesday, April 12, 2022

Decentralized Exchange

If you want to exchange between different digital assets, you need to go through intermediary channels such as exchanges.

A traditional exchange is a centralized model: the two parties trade at an exchange rate through a platform provided by a third party, and the platform usually collects a fee on each transaction. This model is not only costly but also carries the risk of over-reliance on the trading platform.

To solve these problems, the Decentralized Exchange (DEX) was designed. The initial idea was to let both parties exchange directly, peer to peer, through a blockchain-based protocol. Since no trading platform is involved, transaction costs are low, trades can complete in near real time, and platform-related security risks disappear. Decentralized exchanges are currently one of the hottest topics in decentralized finance.

To implement a decentralized exchange, some basic problems need to be solved:

  • The transaction can be completed automatically without manual participation;
  • No one can fake or deceive the other party during the transaction;
  • Calculate the exchange rate automatically and complete the transaction according to the exchange rate;
  • Avoid excessive market volatility and losses.

At present, decentralized exchanges fall into three main modes, according to where order records are kept: on-chain bookkeeping, off-chain bookkeeping, and automated market makers.

On-chain accounting

The idea of on-chain bookkeeping is very simple: exchange orders and transactions are stored directly on the blockchain.

This mode is simple to implement, but has major flaws.

  • Every order needs to go on chain and incurs a fee; when trading is frequent, the bookkeeping cost is too high;
  • All information is recorded on chain, which may let others learn of pending transactions in advance and profit from them;
  • When there are many transactions, the performance requirements on the blockchain are very high, and most public chains cannot support them.

Platforms adopting this scheme include Stellar and others.

Off-chain accounting

In contrast to on-chain bookkeeping, off-chain bookkeeping stores transactions on a third-party platform. Third-party platforms only write transactions to the blockchain when needed.

This method avoids writing a large number of transactions to the blockchain, but it relies on a third-party platform, which reintroduces significant security risk.

Platforms that have adopted this solution include Binance and others.

Automated market maker

Similar to market makers in the securities market, smart contracts can be used to implement an automated market maker mechanism (AMM).

When users need to exchange currency, they do not directly trade with other users, but exchange with blockchain smart contracts.

Behind the smart contract, the exchange rate is calculated in real time from its liquidity pool and a pricing curve (such as the constant-product hyperbola, a straight line, etc.). A small fee is charged per transaction (e.g. Uniswap charges 0.3%).
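To make this concrete, here is a minimal sketch of a constant-product pool (x·y = k) with a Uniswap-style 0.3% fee. It is illustrative only, not Uniswap's actual contract logic; real AMMs also handle integer arithmetic, slippage limits, and so on.

```python
# Minimal constant-product AMM sketch (x * y = k) with a 0.3% fee.
# Illustrative only; not a real exchange contract.

class ConstantProductPool:
    def __init__(self, reserve_x: float, reserve_y: float, fee: float = 0.003):
        self.x = reserve_x   # reserve of token X
        self.y = reserve_y   # reserve of token Y
        self.fee = fee

    def quote_y_out(self, dx: float) -> float:
        """How much Y a trader receives for depositing dx of X."""
        dx_after_fee = dx * (1 - self.fee)   # the fee stays in the pool
        k = self.x * self.y                  # invariant before the trade
        new_x = self.x + dx_after_fee
        new_y = k / new_x                    # keep x * y = k
        return self.y - new_y

    def swap_x_for_y(self, dx: float) -> float:
        dy = self.quote_y_out(dx)
        self.x += dx
        self.y -= dy
        return dy

pool = ConstantProductPool(reserve_x=1_000, reserve_y=1_000)
print(pool.swap_x_for_y(10))   # slightly less than 10 due to the fee and slippage
```

Note how the fee stays inside the pool: that is where the liquidity providers' income mentioned below comes from.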

This mechanism does not rely on an order book, so transaction costs are generally low and the risks are small.

Users can also deposit the currency they hold into the liquidity pool according to the protocol and become a Liquidity Provider (LP). Liquidity providers earn a share of the transaction fees.

The main problem with this model is that market depth depends on the liquidity pool, and the tension between LP income and transaction costs must be balanced. Also, when prices fluctuate sharply, LPs may suffer impermanent loss.

Typical implementation protocols include Uniswap, Bancor, etc., and platforms include Chainlink, Kyber, etc.

Friday, April 08, 2022

From Digital Artwork to NFT

Over the past few years, digital artwork has explored the possibilities of the NFT (Non-Fungible Token), a new digital medium for asset transactions.

In the future, a large number of item transactions could be carried out in the form of NFTs. NFTs can also digitize asset ownership, making it easier to realize value more fully.

The emergence of NFT represents the urgent need for traders to switch from paper-based contracts to digital contract-based transactions.

Note: If you want to understand the ins and outs of web 3.0, you can read From web 1.0 to web 3.0.

Digital Art

Digital artwork refers to a set of data with certain artistic value generated by computer technology. Similar to traditional physically created artworks, digital artworks are considered unique and collectible. Since the transactions of digital art are mostly carried out through cryptocurrencies, it is also called CryptoArt.

Back in 1993, Hal Finney (later an early Bitcoin contributor) discussed the idea of "crypto trading cards" on the Cypherpunks mailing list, possibly the earliest discussion of crypto artwork and NFTs.

Digital artwork mainly includes the following characteristics:

  • Based on a blockchain platform: once the generation rules are fixed, no additional artworks can be issued, and existing ones cannot be modified or counterfeited;
  • Transactions are mostly in cryptocurrency, and all records are publicly visible and traceable;
  • After a purchase, ownership is recorded on the distributed ledger and cannot be tampered with or faked;
  • Art transactions are completed directly and immediately, with no traditional third party;
  • The digital artwork itself is not scarce, and is even easily copied, but its ownership is unique and recognized by the market.

In 2014, Robby Dermody, Adam Krellenstein, Ouziel Slama and others launched the Counterparty trading platform based on the Bitcoin network. The platform provides peer-to-peer financial transactions through the Metadata Token Protocol, supports the creation of tokens, decentralized asset transactions, and more. In September 2016, the "Rare Pepe" project was launched, becoming an early digital artwork.

On June 23, 2017, Larva Labs launched the CryptoPunks project. The project created 10,000 punk avatars, each as a unique 24x24 8-bit pixmap. Initially, the project was released for free on the Ethereum network, hoping to honor the spirit of punk. Later, with the publicity and participation of enthusiasts, the project attracted the attention of a large number of users and even investment institutions. It is still very active to this day, and the single price is often tens of thousands of dollars. In June 2021, Avatar No. 7523 sold for $11.8 million. CryptoPunks have unique cultural interest as collectibles and are considered to be the beginning of the later trend of encrypted digital art. Since then, Larva Labs has also developed Autoglyphs and Meebits projects, which have also attracted market attention.

On November 28, 2017, the Axiom Zen team (which later incubated Dapper Labs) launched the CryptoKitties game based on Ethereum trading. Each player can buy a digital cat with ether, breed offspring, and sell it. All records are publicly visible on the Ethereum network. Through this game, players can learn to master the basic usage of ether. The game was once very popular. The price of a single digital cat once exceeded 100,000 US dollars, and related transactions accounted for nearly 20% of the transaction traffic of the Ethereum network, causing transaction delays and blockages. The success of the game has also inspired many imitators. The ERC-721 standard that the project follows is widely adopted.

In April 2021, Yuga Labs launched the Bored Ape Yacht Club project on the Ethereum network, which includes 10,000 different ape portraits, generated by computers. Among them, the portrait numbered 8817 was auctioned for a high price of 3.4 million US dollars.

NFT

Although many NFTs today are digital artworks, the scope of NFTs is actually much broader: anything that can circulate in the digital world can be an NFT, including paintings, photography, music, books, games, and so on.

NFT literally means non-fungible token. Traditional cryptocurrencies are fungible (Fungible Tokens): there is no difference between any two coins, they can replace each other, and they can usually be split into smaller units, as with Bitcoin. NFTs, on the other hand, are unique, cannot be replaced by other NFTs, and often cannot be divided into smaller units. For example, an NFT of a painting represents that painting itself and cannot be replaced by any other NFT.

At present, NFT products often have the following characteristics:

  • Not interchangeable;
  • Cannot be split into smaller units;
  • Usually exists as a single copy.

The non-fungible nature of NFTs makes it easy to anchor them to objects in the physical world, such as real estate, cars, and collectibles. The property rights of any real object can be tied to an NFT, which is why NFTs are considered to have great potential.

In 2020, with the massive issuance of sovereign currencies around the world, NFTs have begun to be more and more sought after. In 2021, the NFT project has ushered in a big explosion, so 2021 is also called "the first year of NFT" by many people. At present, most NFTs are traded through platforms such as Opensea, Rarible, and Nifty Gateway, and rely on the Ethereum network and IPFS for storage.

The emergence of the NFT idea is very natural, and its earliest prototype can be traced back to the Bitcoin-based ColoredCoin that appeared in 2012. Colored coins have color attributes, and colors can be used to represent different assets. This provides feasibility for real-world assets to be put on the chain.

But the Bitcoin network does not support smart contracts, which limits its expressiveness. The Ethereum network, launched in July 2015, strengthened support for smart contracts, making the emergence of a large number of NFTs a reality. In particular, in September 2017, the ERC-721 specification was formally proposed and became the reference standard for a large number of Ethereum-based NFT projects. That same year the CryptoKitties project launched, and the concept of the non-fungible token was formally established. In 2018, the Ethereum community also proposed the ERC-1155 standard supporting batch transactions, which is currently supported by the trading marketplace Rarible.

Sky Mavis developed the game Axie Infinity in 2018, which has since become one of the popular games on the Ethereum network. It supports players to trade virtual pets and land resources through NFT. At the same time, players can obtain points and exchange them by playing games. Some virtual pets cost as much as 300 ether (about $1 million).

In October 2020, Dapper Labs partnered with the NBA to launch the NBA Top Shot game project. It uploads the highlight video clips of a player in the game to the public chain Flow developed by Dapper Labs and makes it as an NFT product. After the NBA Top Shot project was launched, it attracted the participation of a large number of users. The total turnover has exceeded 200 million US dollars, and the NFT price of some products such as player LeBron James's slam dunk video once soared to 400,000 US dollars.

In addition, the British Museum, the Russian Hermitage Museum, etc. have also auctioned NFT products of world famous paintings.

In February 2021, Linkin Park band member Mike Shinoda sold an NFT music composition on the platform Zora for a whopping $400,000.

In February 2021, digital artist Mike Winkelmann (aka Beeple) combined 5,000 digital paintings, created daily over 13 and a half years starting in May 2007, into a 316 MB image NFT, "Everydays: The First 5000 Days", which sold at Christie's for a historic $69.34 million (42,329 ETH) to cryptocurrency investor Vignesh Sundaresan.

In April 2021, Centrifuge successfully secured a MakerDAO loan using the house as collateral.

In December 2021, the digital artist known as Pak sold "The Merge", a work made up of 312,686 digital units, on the art auction platform Nifty Gateway to 28,983 buyers for a total of $91.8 million, currently the most expensive NFT work.

Advantages of NFT

The use of NFTs for transactions includes the following advantages:

  • Instant settlement: once buyer and seller reach a deal on the platform and it is written into the blockchain, ownership of the NFT is transferred and the transaction record is stored on the distributed ledger, where it cannot be tampered with;
  • Not easy to fake: Once the NFT product is confirmed, its transaction history will be completely recorded, and it is difficult for others to fake it;
  • Improve efficiency: NFT-based transactions are processed automatically through smart contracts, and the processing efficiency is much higher than manual operations;
  • Reduce costs: The handling fee of NFT platforms is usually much lower than the intermediary fee for real transactions, and reducing transaction costs can also promote the prosperity of the market.

Problems with NFTs

The rapid development of NFT has also led to the emergence of some problems:

  • Auditing issues for NFTs: Before becoming an NFT and being traded, the platform or auditor needs to confirm the actual ownership of the bound items. Once there is a false property right situation, there needs to be a way to roll back, which puts forward new requirements for the current distributed ledger technology;
  • The problem of rational return of the market: At present, excellent NFT products are very scarce, resulting in many products being hyped up with inflated prices after they are launched. Excessive prosperity in the early stages of the development of new things often leads to the rapid creation and bursting of bubbles. The trading platform should design a more rational auction mechanism and raise the threshold for participation.
  • Interconnection between different platforms: NFTs based on different platforms often adopt different standards, and it is difficult to interconnect with each other, which limits the circulation of NFTs in the larger market.

Design and Management of Digital Token

NIST's report NISTIR 8301, "Blockchain Networks: Token Design and Management Overview", published in February 2021, specifically addresses issues related to digital tokens. Several topics in it are worth thinking about.

The focus of future architecture


According to the mainstream design, the architecture is divided into 5 layers from bottom to top:

  • Physical layer: Physical hardware. Including servers, network infrastructure, etc.;
  • Network layer: A network that supports peer-to-peer network communication. Such as the Internet, enterprise network, etc.;
  • Blockchain layer: The implementation of blockchain-related protocols. Including consensus, storage, smart contracts, etc.;
  • Integration layer: An integrated application based on blockchain smart contracts. Such as middleware, offline computing storage, etc.;
  • Interface Layer: User-facing interface. Such as data analysis, client applications, etc.

The bottom and top layers are rich in research and practice, but the integration layer, which is highly customizable and functionally complex, currently faces great challenges. In traditional software architecture the middleware layer is likewise the most important: only when the middle layer is stable and powerful can it effectively link what sits above and below it.

At present, industry has not studied this middle layer in depth, and few of its problems fall into traditional academic categories. Only open source communities such as Hyperledger have begun some exploration based on practice.

Token Definition and Classification

The report explicitly defines a token as data that a service can use to verify an exchange, and classifies tokens into blockchain-based tokens (such as various NFTs and FTs) and self-contained tokens (such as JWTs).

The mainstream blockchain-based token implementation is authentication data bound to an account, where accounts can be on-chain or off-chain (in client wallets). A token implementation can be native or contract-based; it can use the UTXO model or the account model; it can be divisible or indivisible.

This actually implies that digital currency is treated separately. After all, the significance of a digital currency lies in the currency itself; the underlying implementation is just the carrier. Current official digital currencies are more about digitizing real currency than about being the native medium of circulation for the digital world that geeks envision.

Token exchange

Exchange is mainly done through atomic swaps. Without a custodian, the exchange either completes fully or falls back to the original state. Implementations mostly rely on hash locks and time locks.

Atomic swaps can be within the same chain or across chains.

It relies heavily on the security of the contract.
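As an illustration of the hash-lock and time-lock idea, the sketch below models the claim and refund conditions of a hashed time-lock contract (HTLC) in Python. It is a conceptual sketch, not any particular chain's contract code; in a real swap the same hash lock is used on both chains, so revealing the preimage to claim one leg lets the counterparty claim the other, and the time lock refunds whichever leg never completes.

```python
import hashlib
import time

# Conceptual sketch of a hashed time-lock contract (HTLC) used in atomic swaps:
# funds can be claimed with the secret preimage before the deadline, otherwise
# refunded to the sender. Real implementations live in on-chain contract code.

class HTLC:
    def __init__(self, sender: str, receiver: str, amount: int,
                 hash_lock: bytes, timeout_s: int):
        self.sender = sender
        self.receiver = receiver
        self.amount = amount
        self.hash_lock = hash_lock              # sha256(secret)
        self.deadline = time.time() + timeout_s
        self.settled = False

    def claim(self, preimage: bytes) -> bool:
        """Receiver claims by revealing the secret before the deadline."""
        if self.settled or time.time() >= self.deadline:
            return False
        if hashlib.sha256(preimage).digest() != self.hash_lock:
            return False
        self.settled = True                     # pay `amount` to receiver
        return True

    def refund(self) -> bool:
        """Sender recovers the funds after the deadline (time lock)."""
        if self.settled or time.time() < self.deadline:
            return False
        self.settled = True                     # return `amount` to sender
        return True

secret = b"swap-secret"
lock = hashlib.sha256(secret).digest()
htlc = HTLC("alice", "bob", 10, lock, timeout_s=3600)
assert htlc.claim(secret)   # Bob claims by revealing the secret in time
```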

Cross-chain support

This covers two scenarios: two-way exchange and one-way transfer. Implementations depend either on an intermediate coordination system or on supporting contracts and protocols on both sides.

There is no practical solution for this yet, and further development is required.

The Ceiling of Civilization

The law of technological development

Have you ever wondered what the development of civilization looks like? Does it sprout and grow like a tree, eventually becoming a towering one? Or is it like gathering shellfish on a wild shore, purely a matter of random luck?

In fact, it is more like climbing a tower of civilization. After each climb to a new floor (a technological revolution), it takes hundreds or even tens of millions of years to explore and develop on that floor, and then stagnate, until by chance a few wise people happen to look up and find the entrance to the floor above.

Fortunately, over the past million years, we have been climbing upwards, and technology and civilization have also been advancing.

Unfortunately, we may have reached the very top of the Tower of Civilization.

The Illusion of the Tower of Civilization

Some people will say, no, technology is still progressing, and life will be better.

Indeed, as technology is applied more fully, food may become more abundant, the environment cleaner, and working hours shorter...

However, within the same floor, no matter how deeply technology is applied, the upper limit is fixed.

The Stone Age lasted millions of years, and even given millions more, humans relying on stone technology could never have built a rocket to fly to the moon. Similarly, relying on fossil energy, no matter how much further we develop, we will not leave the Milky Way before the sun goes out.

If we do not keep climbing, technology may still appear to improve, but as we approach the ceiling the rate of development will become slower and slower.

Metaverse

Why hype the "metaverse"? Why put virtual reality, blockchain, cloud computing and other technologies together to give the virtual world a new name?

On the one hand, the capital market needs new concepts, and on the other hand, it is precisely because the development of modern technology has almost stagnated.

At the beginning of the twentieth century, with breakthroughs in a series of basic sciences such as quantum mechanics, relativity, and genetics, the ceiling of human science and technology broke through to an unprecedented height, and with it our abilities.

However, perhaps a few keen people have begun to realize that although new technological achievements are still emerging, from the perspective of technology as a whole, human civilization seems to have stagnated.

Where is the entrance to the upper tower? It is currently unknown.

A possible way of thinking is that since the progress of science and technology in reality is too slow, can we use the power of the virtual (digital) world? The biggest advantage of the virtual world is that its time flow rate (frequency) is different from that of the physical world, which allows calculations and experiments to be performed in the virtual world at a speed far exceeding that of the real world.

For example, after a thousand years of development and exploration in the virtual world, it may only be a year in the real world.

Opportunity or trap

So, is the metaverse really the cure? Not necessarily.

Everything has two sides. The virtual world will be a brand new thing, and our current cognitive systems will face great challenges. In the metaverse, everyone can be whoever they want to be and live the life they want.

Just like the world of The Matrix: when that time comes, how many people will still remember the original intention of returning to reality? Will humanity as a whole sink completely and become the walking dead? That would be the greatest tragedy.

Future

There is a theory in astronomy: based on the conditions of a galaxy, the upper limit of the development of its civilization can be inferred. For example, a small galaxy with poor material is difficult to break through no matter how it develops. Innate conditions will lock the possibility of growth.

The solar system is running out of time for us.

Controllable Solutions for Large-scale Systems

One of the fundamental issues facing any system that grows to a certain size is controllability.

At small scales, controllability is usually not a bottleneck and the system can be managed by operating its nodes directly. However, as the scale grows, controllability and complexity become key constraints on whether the system can keep expanding.

To deal with the complexity of control, there are two basic ideas: decoupled control and spontaneous control.

Decoupled Control

The simplest idea is to separate the control plane from the data plane and management plane of the system. The decoupled control plane can construct a unified interface for operation and reuse control logic.

Take a herd of sheep as an example: directly controlling each sheep's movement would be very difficult, but tying a rope to each sheep (decoupling a control plane) and leading them by the ropes is much simpler.

Applied to the Internet, this idea gave birth to the Software Defined Network; applied to cloud-native software systems, it gave birth to the Service Mesh. The same applies to economic and financial systems.

It should be emphasized that controllability is the ability to manage the system in an orderly manner. Therefore, the management of the control plane can be centralized or distributed.

Spontaneous control

Spontaneous control is the opposite of decoupled control: instead of stripping control away, it devolves control, giving the components of the system more ability to control themselves.

Take the herd of sheep again: if, through training, each sheep automatically follows the lead sheep, then you only need to guide the lead sheep to guide the whole flock.

Applied to communication systems, the idea of spontaneous control gave birth to the Internet and Web 2.0; applied to encrypted electronic currency, it gave birth to Bitcoin; applied to economic systems, it gave birth to several Nobel Prizes.

This idea is likely to be the basis for large-scale intelligent systems (robot clusters) in the future.

The Future of Control Science

The cybernetics of small-scale systems matured in the last century. Many modern systems are large-scale, and research into how to control them is urgently needed.

From Web 1.0 to Web 3.0

The Internet ecosystem has transformed from web 1.0 to web 2.0 over the past three decades. Where web 3.0 is headed is worth thinking about and exploring.

web 1.0

In the early 1990s, after the invention of the HTTP protocol, websites sprang up one after another. Early players such as AOL and Yahoo created amazing growth miracles.

These websites shared a common trait: the owner provided the content, and users could only read it and use the service. In other words, the classic single-producer, many-consumer model.

The web 1.0 model put the right to speak on the World Wide Web in the hands of a few website providers: the few decided the mainstream voice, and the rest were the silent majority.

web 2.0

web 1.0 spawned the booming .com bubble. Ten years later, in the fall of 2001, Internet stocks crashed, marking the beginning of the 1.0 model's gradual decline.

People began to discuss a new generation of web models, and after several years of exploration, the 2.0 model gradually became popular. This model begins to support user interaction, that is, users can create and decide content.

In the early days of web 2.0, the blog was the typical application, representing the transition from users as pure consumers to users who could also be content producers. Later, social networking sites were born, and Twitter, Facebook, and others rose to fame. This pattern continues to this day.

While users can interact and create content, the platform remains firmly in the hands of a handful of internet giants, meaning they can easily channel and control so-called "mainstream" voices.

web 3.0

With growing emphasis on privacy protection and awareness of personal data rights, users are becoming dissatisfied with the existing network ecosystem and hope to reclaim their own interests through a new generation of network models.

Web 3.0 carries this hope. The concept is still taking shape; early explorations mainly combine distributed ledger technology to let users control their own identity information and limit how their data is shared. Decentralized finance, data storage, and trading applications have emerged.

Summary

From early web 1.0 to the still-debated web 3.0, it is clear that users' demand for interactive participation keeps increasing, and that they want control over their own data on the network. This will be a huge challenge to the existing Internet ecosystem, and it may take a decade or even several decades to evolve.

Saturday, April 02, 2022

Design a Health Code System That Will Never Crash

With the epidemic not yet over, the health code on a mobile phone makes it very convenient to quickly identify at-risk groups. However, if the back-end system cannot answer queries in time, users face the embarrassment of being unable to prove they are low-risk. So, is it possible to provide a reliable service with limited resources?

This article attempts to use modern information technology to design a low-cost (average daily cost of about $0.15) health code service that will not collapse.

Disclaimer: All analyses in this article, such as virtual machine prices, are based on public information on the Internet, and the actual situation may be different.

If you want to know how to optimally and quickly detect the new coronavirus, you can refer to "Optimal Grouping Algorithm for Covid-19 Detection".

Demand analysis

First analyze the functional requirements, the system must be able to achieve at least the following two basic operations:

  • Query: given an ID number, return in time whether the person is at risk (e.g. a red or green code; more complex systems could also return a risk score or location data, which is ignored here);
  • Update: the back-end data can be updated, including users transitioning between healthy and at-risk status.

In terms of performance, serving a province means supporting on the order of 100 million users; serving a city, on the order of 10 million users is enough.

Assume here that the unit is a province, supporting the daily queries of 100 million users. If each user makes an average of 10 queries per day (8 am to 8 pm), the average throughput is about 24 thousand queries per second (100,000,000 * 10 / 12 / 3600). Assuming the peak is 10 times the average, the system needs to support a maximum of about 240 thousand queries per second.

A user ID number has 18 digits, taking about 36 bytes stored as a string or 8 bytes stored as a number, which is almost negligible relative to the network protocol headers.

Assuming a single request message is 200 bytes, the peak access bandwidth is 240,000 * 200 * 8 bits ≈ 384 Mbps, so a conservative estimate of 400 Mbps of bandwidth is sufficient.
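These back-of-the-envelope numbers can be reproduced in a few lines (the figures are the assumptions above: 100 million users, 10 queries a day over 12 hours, a 10x peak, 200-byte requests):

```python
# Capacity estimate for the health code service, using the assumptions above.
users = 100_000_000
queries_per_user_per_day = 10
active_hours = 12

avg_qps = users * queries_per_user_per_day / (active_hours * 3600)
peak_qps = 10 * avg_qps              # assume the peak is 10x the average

request_bytes = 200
peak_mbps = peak_qps * request_bytes * 8 / 1e6

print(f"average ~{avg_qps:,.0f} qps, peak ~{peak_qps:,.0f} qps")
print(f"peak bandwidth ~{peak_mbps:.0f} Mbps")
# average ~23,148 qps, peak ~231,481 qps, ~370 Mbps; rounding the average up to
# 24,000 qps first, as the text does, gives 240,000 qps and ~384 Mbps
```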

Analysis of the characteristics of the problem

The hardest part of the problem is supporting a high query rate. Since the proportion of at-risk people is normally small (otherwise a health code service would be pointless), this is a typical sparse data query problem.

For such problems, the usual idea is to store only the sparse data (the ID numbers of at-risk people); if a query finds nothing, the person is not at risk.

Assuming that the risk ratio is 1%, the actual stored data is 1 million ID numbers (about 36 MB in string storage, which is easy to put into memory), which is not a large-scale dataset.

Initial design



The simplest design stores all at-risk records directly in a database. Choosing the common open source MySQL, a single machine easily handles thousands of queries per second; with common optimizations (caching, indexes, an in-memory engine, etc.), it can exceed 10,000 queries per second.

A rough estimate is that the system needs 24 servers. Taking rented public cloud virtual machines as an example, a 4-core 8 GB virtual machine costs less than 1 yuan per day on average, so 24 virtual machines cost about 24 yuan; 400 Mbps of bandwidth costs about 1000 yuan per day.

Overall, the average daily cost is 1024 yuan ($160).

Considering that this service can support hundreds of millions of users, the result is already quite good. But engineers never give up before reaching the limit, so let us see whether it can be optimized further.

Optimization Solution 1: replace the database

First, since a query only needs to check whether a record exists, the data volume is small, and the tolerance for relaxed read-write consistency is high, consider replacing the database with a NoSQL store such as Redis.

A NoSQL database on an ordinary server can reach 100 thousand queries per second, reducing the number of servers to 3.

The average daily cost of this plan is 1003 yuan.

Optimization Solution 2: Use Bloom filter

The Bloom filter is a classic data structure for checking whether an element belongs to a set. It allows false positives: if it says an element exists, the element may in fact be absent; but if it says an element does not exist, it is definitely absent.

This property suits the at-risk check very well: if the Bloom filter says someone is not at risk (the vast majority of cases), they are definitely not at risk; if it says they may be at risk, the database can confirm.

Storing one million entries at a 3% false positive rate, a Bloom filter needs only about 1 MB of memory, and query throughput easily reaches a million per second. In fact, many high-performance databases (such as Redis) also use this data structure to improve performance.
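As a rough check of the 1 MB figure, the textbook sizing formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2 give the following (a sketch, not tied to any particular Bloom filter library):

```python
import math

# Bloom filter sizing for n = 1,000,000 at-risk IDs at a 3% false positive rate.
n = 1_000_000
p = 0.03

m_bits = -n * math.log(p) / (math.log(2) ** 2)   # optimal number of bits
k = (m_bits / n) * math.log(2)                   # optimal number of hash functions

print(f"{m_bits / 8 / 2**20:.2f} MiB, {k:.1f} hash functions")
# ~0.87 MiB and ~5 hash functions, consistent with the ~1 MB estimate above
```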

At this point, only 1 server is required to handle all queries easily.

The average daily cost is 1001 yuan.

Optimization Solution 3: preset filter

Readers may notice that although the number of servers has dropped from 24 to 1, the overall cost has barely changed, because bandwidth accounts for most of it. So, can the network bandwidth be optimized?

The most direct idea is to push the filter down to the client. The vast majority of risk-free users then get their result from a local query, and only the few at-risk or falsely flagged users need to query the server.

It is estimated that the bandwidth requirement has been reduced to 1% of the original, and the average daily cost has been reduced to 5 yuan.

Optimization Solution 4: use content distribution network


Careful readers may spot another problem: when the preset filter on the client needs to be updated, the update itself still consumes a lot of network bandwidth. This is where the Content Delivery Network (CDN) comes in.

A CDN deploys edge servers at multiple geographic locations ahead of time and caches the data to be distributed on them; its bandwidth cost is lower than distributing directly from the core network.

In this way, even if the client needs to update the data every day, it will not be a big problem.

What if you don't even want to pay for this?

The Ultimate Solution: Distributed Network Applications


The essence of a network is simply to transmit data. If clients can send data directly to each other, the distribution pressure on the server drops dramatically. This is exactly what P2P networks are good at.

By letting the client support P2P distribution, the problem of updating the client-side filter is neatly solved.

Going further, the server can distribute both the filter and the complete dataset to clients through the P2P network, and clients can query it locally. Trading storage for computation in this way, even if the server goes down or the network fails, the health code can still be displayed.

Lazier still: if you don't even want to build P2P networking into the client, just encrypt the data and publish it to an existing P2P network (think IPFS). The client can download and decrypt the data whenever it needs to update.
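A sketch of the "encrypt, publish, decrypt on the client" step, using the symmetric Fernet scheme from the Python cryptography package purely as an illustration; publishing to and fetching from a P2P network such as IPFS is stubbed out here as local file I/O:

```python
from cryptography.fernet import Fernet

# Illustration of the "encrypt the filter data, publish it, decrypt on the client"
# idea. Fernet is used only as a convenient symmetric cipher; the P2P publish and
# fetch steps are represented by placeholder file reads and writes.

key = Fernet.generate_key()          # distributed to clients through a secure channel
cipher = Fernet(key)

filter_blob = b"...serialized Bloom filter / risk data..."
encrypted = cipher.encrypt(filter_blob)

with open("filter.enc", "wb") as f:  # stand-in for "publish to the P2P network"
    f.write(encrypted)

with open("filter.enc", "rb") as f:  # stand-in for "client downloads the blob"
    downloaded = f.read()

assert cipher.decrypt(downloaded) == filter_blob   # client restores the filter locally
```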

In this way, the server does not even need to provide query services, and the average daily cost is reduced to 1 yuan ($0.15).

Summary

In conclusion, this article walks through the step-by-step evolution of a high-performance concurrent query system: choosing an appropriate database and optimizing it cuts the number of servers to roughly 10%; a Bloom filter lets a single machine absorb massive query volumes; a client-side preset filter reduces server load to about 1% of that; a CDN enables fast updates of client data; and combining P2P technology greatly improves scalability and reliability, finally bringing the average daily cost down to 1 yuan.

Of course, this article only analyzes possible design schemes from a theoretical point of view, and there must be a lot of work to be done between system design and implementation. In actual production, at least the following aspects need to be considered:

  • Load balancing: It is assumed that load balancing is performed by presetting the server address on the client.
  • Security precautions: It is assumed that the data center where the server is located provides a basic security precaution system.
  • Privacy protection: It is assumed that the client can better protect user privacy through encryption.
  • ......

Technology, as the primary productive force, exists precisely to make everyone's life better and easier, doesn't it?

Optimal Grouping Algorithm for COVID-19 Detection

In response to the outbreak, countries around the world need to test potentially infected people. Because test reagents are relatively scarce, using as few reagents as possible becomes an interesting problem. We assume here that sample volume is sufficient and ignore constraints on detection time.

Many countries currently use group (pooled) testing: the people to be tested are divided into groups, and samples within a group are mixed and tested together. If a group tests positive, it is tested further.

A natural question: how to group to save reagents?

Consider first the two extreme cases.

A small number of infected people

Usually, infected people within a group are a minority.

Suppose a group has 64 people with only 1 infected person, and we split evenly into 2, 4, 8, or 64 parts at each round.

  • A 2-way split needs 6 rounds, 2*6 = 12 tests in total;
  • A 4-way split needs 3 rounds, 4*3 = 12 tests in total;
  • An 8-way split needs 2 rounds, 8*2 = 16 tests in total;
  • A 64-way split needs 1 round, 64 tests in total.

In general, with a large number of people N and an x-way split, the total number of tests is about y = x * log_x(N). It is easy to show that this function reaches its minimum at x = e.

In this case, to save reagents, x should be chosen near e, such as 2 or 3. Ideally N is chosen to be a power of x, so the groups divide evenly.
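A quick check of the claim that the minimum sits at x = e, treating x as continuous:

```latex
y(x) = x \log_x N = \frac{x \ln N}{\ln x},
\qquad
\frac{dy}{dx} = \ln N \cdot \frac{\ln x - 1}{(\ln x)^2} = 0
\;\Longrightarrow\; \ln x = 1 \;\Longrightarrow\; x = e \approx 2.72,
```

which is why 2 and 3, the nearest integers, are the practical choices.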

A large number of infected people

In the second scenario, all 64 people are infected, and the 2, 4, 8, and 64 division method is also used.

  • A 2-way split needs 6 rounds, 2+4+8+16+32+64 = 126 tests in total;
  • A 4-way split needs 3 rounds, 4+16+64 = 84 tests in total;
  • An 8-way split needs 2 rounds, 8+64 = 72 tests in total;
  • A 64-way split needs 1 round, 64 tests in total.

Clearly, in this case the 64-way split (testing everyone individually) uses the fewest tests.

With N people and an x-way split, the total number of tests is about S = x + x^2 + ... + x^log_x(N) = x * (x^log_x(N) - 1) / (x - 1) = x * (N - 1) / (x - 1), which is clearly a decreasing function of x.

In this case, the larger the split x, the better.

The general case

If there are N people and the infection rate is p, what should x be?

This is a bit more complicated than the two cases above; we reason backward from the last round.

First, draw out the detection path, which forms a typical tree.

There are p*N infected people, and in the worst case they share no common parent node in the tree. The last round then requires p*N*x tests; pushing back one level, that round also requires p*N*x tests, and so on, until the remaining positive pools share a common parent node, after which each higher level needs only x tests.

That is, the total is p*N*x + p*N*x + ... + x + x + ....

In the worst case, the infected samples stay separated from each other all the way up to the second level, where all pools finally share a common parent. The total number of tests is then S = p*N*x*(log_x(N) - 1) + x, and the optimal x turns out to be around 3.


This formula is accurate when the infection rate p is low; when p is high, the formula overestimates the number of tests, because several infected people at the bottom then share a common parent node. The result is therefore an upper bound on the number of reagents required.

Applied to real life

Assuming a city with a resident population of N = 20,000,000 (20 million) and an infection rate of p = 0.0001 (one in ten thousand), substituting into the formula gives

S = 2000 * x * (log_x(20,000,000) - 1) + x. The minimum is reached around x = 2.91 (which can be rounded to 3), giving S ≈ 85,782 tests over about 16 rounds of testing.
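These numbers can be checked numerically; the sketch below simply evaluates the formula from the text over a grid of pool sizes:

```python
import math

# Evaluate S(x) = p*N*x*(log_x(N) - 1) + x for the city example in the text.
N = 20_000_000   # resident population
p = 0.0001       # infection rate

def total_tests(x: float) -> float:
    return p * N * x * (math.log(N, x) - 1) + x

# scan pool sizes between 2 and 10 to locate the minimum
best_x = min((i / 100 for i in range(200, 1001)), key=total_tests)
print(f"optimal x ~ {best_x:.2f}, S ~ {total_tests(best_x):,.0f}")   # ~2.91, ~85,782
print(f"rounds with x = 3: {math.ceil(math.log(N, 3))}")             # ~16 rounds
```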

Therefore, with the optimal grouping, the number of reagent uses is only about 0.43% of the population to be tested, a huge saving. The price is the need for many rounds of testing.

In practice, pooling can be applied only in the first few rounds to balance reagent count against detection time. For example, if testing must finish within a given time, the group size at each level can be adjusted accordingly.

P.S. Interestingly, no matter what value p takes, the optimal x never exceeds 4.