Parsing Ethereum transaction data with Ethers-RS


Introduction

If you've used Ethereum before, you'll have no doubt noticed the strange data attached to contract calls. In your wallet, it's probably labelled "hex" or "hex data". On Etherscan, it's called "input data" and can be found at the very bottom of the transaction details page.

Etherscan input data field

By default, Etherscan displays the data as shown below:

Function: buyShares(address sharesSubject, uint256 amount)

MethodID: 0x6945b123
[0]:  0000000000000000000000006ca979b89edda4b0c6ee4e290636eb752c9dd1ae
[1]:  0000000000000000000000000000000000000000000000000000000000000001

Looks spooky, but fear not, as this article should help demystify call data!

Huge thanks to Ali Ashar from frontal.io for their "Guide to decode and analyze Ethereum Transactions" blog post. If you want a more in-depth understanding of what we will look at in this article, make sure to read Ali's guide.

Getting transaction data using the JSON-RPC API

Every Ethereum client implements the JSON-RPC spec. Whilst I won't be going into detail on what this means, all you need to know is that we can request on-chain data from any RPC endpoint. For instance, if we need to get information on a specific transaction using its hash, we can simply POST the following to an RPC endpoint:

curl --location 'https://base-mainnet.public.blastapi.io' \
--header 'Content-Type: application/json' \
--data '{
    "jsonrpc": "2.0",
    "method": "eth_getTransactionByHash",
    "params": [
        "0x311ea59a9cfd3a15c17d185bf1d2a7341c05960e2a8df30c1e2a9fe3925f402a"
    ],
    "id": 1
}'

Note that this is a public RPC for Base.

The response is as follows:

{
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "blockHash": "0x32ee186fcd6c16a4a54890f195db35fa0f0b948446b07b1cb8f0b7780ce6643c",
        "blockNumber": "0x332d74",
        "from": "0x6ca979b89edda4b0c6ee4e290636eb752c9dd1ae",
        "gas": "0x1453c",
        "gasPrice": "0x5fcc7098",
        "maxFeePerGas": "0x66e7fed0",
        "maxPriorityFeePerGas": "0x59682f00",
        "hash": "0x311ea59a9cfd3a15c17d185bf1d2a7341c05960e2a8df30c1e2a9fe3925f402a",
        "input": "0x6945b1230000000000000000000000006ca979b89edda4b0c6ee4e290636eb752c9dd1ae0000000000000000000000000000000000000000000000000000000000000001",
        "nonce": "0x0",
        "to": "0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4",
        "transactionIndex": "0x5",
        "value": "0x0",
        "type": "0x2",
        "accessList": [],
        "chainId": "0x2105",
        "v": "0x1",
        "r": "0xc8132e17df3d7925745fc4451e485b4e2fb9a9c628df76eb0a55bc4950648dd3",
        "s": "0xe9cd861c7d7555efd793b0b7f4bc91780605b9c206a8c750bedf9380c729072"
    }
}
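As a side note, the numeric fields in this response ("blockNumber", "gas", "chainId" and so on) are hex-encoded quantities. A minimal sketch of turning them into integers is shown here; parse_quantity is a name made up for this example, and it assumes the value fits in a u64.

```rust
/// Parses a JSON-RPC hex quantity such as "0x332d74" into a u64.
/// Returns None if the "0x" prefix is missing or the digits are not valid hex.
/// Assumption: the value fits in 64 bits.
fn parse_quantity(hex: &str) -> Option<u64> {
    let digits = hex.strip_prefix("0x")?;
    u64::from_str_radix(digits, 16).ok()
}
```

With the response above, parse_quantity("0x332d74") gives block number 3353972 and parse_quantity("0x2105") gives 8453, which is the chain ID of Base.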

Awesome! The "input" field is the equivalent of the "hex" or "input data" we discussed earlier. But what can we do with this? How does Etherscan decode this into the info shown below? It just looks like a bunch of hieroglyphics...

Etherscan decoded transaction data showing sharesSubject and amount fields in a chart

Understanding "input"

As implied, the input data contains extra information related to a transaction. This includes the function called and the parameters passed into the call. Let's lay out the basics:

  • The data is laid out in 32-byte words and encoded into hex, where each byte is 2 hex characters. That is to say, each chunk of data is 64 characters long in the encoded version.

  • We can ignore and remove the 0x (first 2 characters) as it does not play a part in parsing the data.

Moving forward, we will be using this transaction for demonstration purposes.

Okay, now let's take a closer look at the data itself. Here is some sample data from our example transaction:

0x6945b1230000000000000000000000006ca979b89edda4b0c6ee4e290636eb752c9dd1ae0000000000000000000000000000000000000000000000000000000000000001

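After removing the 0x, you may have noticed that the data is 136 characters long. This doesn't seem to make sense; I've just told you the data is divided into 64-character chunks, after all. The explanation is that the first 8 characters are not a parameter at all: they identify the method being called, and the remaining 128 characters are the two 64-character chunks.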

The first 8 hex characters (4 bytes) of input data are the method identifier, also known as the function selector. It is generated by taking the method name and its argument types (argument names are not included), removing all whitespace, taking the Keccak-256 hash of this string and converting the first 4 bytes of the hash to hex.

buyShares(address,uint256)
-> Hash with Keccak-256
-> Take first 4 bytes of Hash
-> Convert to hex
Result = "6945b123"
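The recipe above can be sketched in plain Rust. The standard library has no Keccak, so this sketch carries its own compact Keccak-256 (the original Keccak padding that Ethereum uses, not NIST SHA3-256); in a real project you would more likely pull in a hashing crate. The names keccak_f, keccak256 and selector are made up for this example.

```rust
// Round constants for the iota step of Keccak-f[1600].
const RC: [u64; 24] = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
// Rotation offsets (rho) and lane destinations (pi), in the order the lanes are visited.
const ROTC: [u32; 24] = [
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44,
];
const PILN: [usize; 24] = [
    10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1,
];

/// Applies the 24-round Keccak-f[1600] permutation to the state in place.
fn keccak_f(st: &mut [u64; 25]) {
    for rc in RC.iter() {
        // Theta: mix each column's parity into the neighbouring columns.
        let mut bc = [0u64; 5];
        for i in 0..5 {
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        }
        for i in 0..5 {
            let t = bc[(i + 4) % 5] ^ bc[(i + 1) % 5].rotate_left(1);
            for j in (0..25).step_by(5) {
                st[j + i] ^= t;
            }
        }
        // Rho and pi: rotate each lane and move it to its new position.
        let mut t = st[1];
        for i in 0..24 {
            let j = PILN[i];
            let next = st[j];
            st[j] = t.rotate_left(ROTC[i]);
            t = next;
        }
        // Chi: non-linear step applied row by row.
        for j in (0..25).step_by(5) {
            let mut row = [0u64; 5];
            row.copy_from_slice(&st[j..j + 5]);
            for i in 0..5 {
                st[j + i] ^= !row[(i + 1) % 5] & row[(i + 2) % 5];
            }
        }
        // Iota: inject the round constant.
        st[0] ^= *rc;
    }
}

/// Keccak-256 as used by Ethereum (original Keccak padding, 136-byte rate).
fn keccak256(input: &[u8]) -> [u8; 32] {
    const RATE: usize = 136;
    // Pad with 0x01, then zeros, then set the top bit of the block's last byte.
    let mut padded = input.to_vec();
    padded.push(0x01);
    while padded.len() % RATE != 0 {
        padded.push(0x00);
    }
    let last = padded.len() - 1;
    padded[last] |= 0x80;

    let mut st = [0u64; 25];
    for block in padded.chunks(RATE) {
        // Absorb the block as little-endian 64-bit lanes, then permute.
        for (i, chunk) in block.chunks(8).enumerate() {
            let mut lane = [0u8; 8];
            lane.copy_from_slice(chunk);
            st[i] ^= u64::from_le_bytes(lane);
        }
        keccak_f(&mut st);
    }
    let mut out = [0u8; 32];
    for i in 0..4 {
        out[i * 8..(i + 1) * 8].copy_from_slice(&st[i].to_le_bytes());
    }
    out
}

/// Returns the 8 hex characters of the method identifier for a signature
/// written as name(type1,type2,...) with no whitespace.
fn selector(signature: &str) -> String {
    keccak256(signature.as_bytes())[..4]
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect()
}
```

Calling selector("buyShares(address,uint256)") should reproduce the method identifier from our example transaction.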

These characters are not contract-unique. The identifier depends only on the method name and argument types, so any two contracts exposing the same method share it. Every ERC721 contract, for example, will have the same set of identifiers for the standard ERC721 methods. If you have a transaction's call data but neither the contract's source nor its ABI, you can check what the function might be by entering those first 8 characters into https://www.4byte.directory/.

Okay, now let's remove the method identifier, "6945b123", from our call data. We are left with a string of length 128. Perfect! That's two 32-byte parameters, representing the input data for our function. The first 32 bytes (64 characters) represent an address, since that is buyShares' first parameter, and the final 32 bytes represent a uint256.
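The slicing described so far can be sketched in a few lines of Rust. This is only an illustration under simple assumptions: split_call_data is a made-up name, and the function only handles call data whose arguments are whole 32-byte words.

```rust
/// Splits hex call data into its 8-character method identifier and a list of
/// 64-character words. The leading "0x" is optional.
/// Returns None if the data is not hex, is shorter than the identifier, or
/// the remainder is not a whole number of 64-character words.
fn split_call_data(input: &str) -> Option<(&str, Vec<&str>)> {
    let hex = input.strip_prefix("0x").unwrap_or(input);
    let is_hex = hex.bytes().all(|b| b.is_ascii_hexdigit());
    if !is_hex || hex.len() < 8 || (hex.len() - 8) % 64 != 0 {
        return None;
    }
    let (selector, args) = hex.split_at(8);
    let words = (0..args.len() / 64)
        .map(|i| &args[i * 64..(i + 1) * 64])
        .collect();
    Some((selector, words))
}
```

Feeding it the "input" value of our example transaction should yield "6945b123" and two words, one per parameter.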

💡
Note: Ethereum addresses are 20 bytes long. They are padded to 32 bytes by adding twelve zero bytes (24 hex 0s) before the address begins.

Thus, using the contract's ABI or source code, we can fully parse the call data. In our ABI, the buyShares() method is defined as follows:

{
    "inputs": [
      {
        "internalType": "address",
        "name": "sharesSubject",
        "type": "address"
      },
      {
        "internalType": "uint256",
        "name": "amount",
        "type": "uint256"
      }
    ],
    "name": "buyShares",
    "outputs": [],
    "stateMutability": "payable",
    "type": "function"
  }

Putting 2 and 2 together gives the following:

Type      Name            Bytes
address   sharesSubject   0000000000000000000000006ca979b89edda4b0c6ee4e290636eb752c9dd1ae
uint256   amount          0000000000000000000000000000000000000000000000000000000000000001

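To unpack the address, replace the 12 leading bytes (24 leading 0s) with 0x. This gives 0x6ca979b89edda4b0c6ee4e290636eb752c9dd1ae, which matches the "from" field of our transaction.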
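As a rough sketch, the two words could be decoded by hand as follows. The function names decode_address and decode_uint are made up for this example, and decode_uint assumes the value fits in 128 bits, whereas a real uint256 may need all 256.

```rust
/// Decodes a 64-character word into a 0x-prefixed address.
/// Returns None unless the word is valid hex and its 12 padding bytes
/// (24 hex characters) are all zero.
fn decode_address(word: &str) -> Option<String> {
    if word.len() != 64 || !word.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let (padding, address) = word.split_at(24);
    if padding.bytes().all(|b| b == b'0') {
        Some(format!("0x{}", address))
    } else {
        None
    }
}

/// Decodes a 64-character word into an unsigned integer.
/// Assumption: the value fits in a u128; larger uint256 values return None.
fn decode_uint(word: &str) -> Option<u128> {
    if word.len() != 64 || !word.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let (high, low) = word.split_at(32);
    if high.bytes().all(|b| b == b'0') {
        u128::from_str_radix(low, 16).ok()
    } else {
        None
    }
}
```

Applied to the example transaction, the first word decodes to the sharesSubject address and the second to an amount of 1.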

Using Ethers-RS

Whilst it's possible to whip up our own algorithm to parse input data with our newfound knowledge, there is no need for us to do so! Ethers-RS provides us with all the tooling necessary to parse data.

First off, you're going to need the contract's ABI. If the contract is verified on Etherscan, you can get it from there.

Using ethers' Abigen, we can run the following:

use ethers_contract_abigen::Abigen;

Abigen::new("SampleContract", "sampleAbi.json")?.generate()?.write_to_file("sample.rs")?;

This will create a file named "sample.rs" containing almost everything we could ever need related to our contract based on the ABI we've provided.

When we have input data we need to parse from this specific contract, we can decode it as follows:

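use ethers::abi::AbiDecode; // trait that provides decode()
// Adjust this path to wherever the generated sample.rs is declared as a module in your crate.
use crate::contract::SampleContractCalls;

// input_data holds the raw call data bytes, method identifier included.
let data = SampleContractCalls::decode(input_data)?;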

This returns a value of the SampleContractCalls enum. Each variant corresponds to one contract method and wraps a struct holding the parsed parameters of that call.
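To give a feel for working with the decoded value, here is a hand-written stand-in for the generated types. The variant name BuyShares, the struct name BuySharesCall and the field names shares_subject and amount are assumptions about what Abigen would produce for this ABI, so check your generated sample.rs for the real definitions. The generated fields would use ethers' own address and integer types; plain String and u128 are used here so the sketch compiles on its own.

```rust
// Hand-written stand-ins, NOT the generated code: names and field types are assumptions.
#[derive(Debug)]
struct BuySharesCall {
    shares_subject: String, // an address type in the generated code
    amount: u128,           // a 256-bit integer type in the generated code
}

// The generated enum would have one variant per contract method;
// only buyShares is modelled here.
#[derive(Debug)]
enum SampleContractCalls {
    BuyShares(BuySharesCall),
}

/// Turns a decoded call into a human-readable summary by matching on the variant.
fn describe(call: &SampleContractCalls) -> String {
    match call {
        SampleContractCalls::BuyShares(c) => {
            format!("buyShares(sharesSubject = {}, amount = {})", c.shares_subject, c.amount)
        }
    }
}
```

Matching on the enum like this lets a single decode call handle every method of the contract, with the compiler checking that no variant is forgotten.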

The End

And... That's really all there is to it!

I hope you enjoyed this article and found it useful. As always, any feedback is appreciated! Take care.
