Download Bitcoin Core and let it download the blockchain. You end up with a set of files named blk00000.dat, blk00001.dat, and so on, in the blocks directory. These files contain the raw blocks, and each file holds up to 128 MiB (a mebibyte is 2^20 bytes) of data.
Let's read the first one:
Block 0 (Genesis block):
f9 be b4 d9 1d 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3b a3 ed fd 7a 7b 12 b2 7a c7 2c 3e 67 76 8f 61 7f c8 1b c3 88 8a 51 32 3a 9f b8 aa 4b 1e 5e 4a 29 ab 5f 49 ff ff 00 1d 1d ac 2b 7c 01 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 4d 04 ff ff 00 1d 01 04 45 54 68 65 20 54 69 6d 65 73 20 30 33 2f 4a 61 6e 2f 32 30 30 39 20 43 68 61 6e 63 65 6c 6c 6f 72 20 6f 6e 20 62 72 69 6e 6b 20 6f 66 20 73 65 63 6f 6e 64 20 62 61 69 6c 6f 75 74 20 66 6f 72 20 62 61 6e 6b 73 ff ff ff ff 01 00 f2 05 2a 01 00 00 00 43 41 04 67 8a fd b0 fe 55 48 27 19 67 f1 a6 71 30 b7 10 5c d6 a8 28 e0 39 09 a6 79 62 e0 ea 1f 61 de b6 49 f6 bc 3f 4c ef 38 c4 f3 55 04 e5 1e c1 12 de 5c 38 4d f7 ba 0b 8d 57 8a 4c 70 2b 6b f1 1d 5f ac 00 00 00 00
Block 1:
f9 be b4 d9 d7 00 00 00 01 00 00 00 6f e2 8c 0a b6 f1 b3 72 c1 a6 a2 46 ae 63 f7 4f 93 1e 83 65 e1 5a 08 9c 68 d6 19 00 00 00 00 00 98 20 51 fd 1e 4b a7 44 bb be 68 0e 1f ee 14 67 7b a1 a3 c3 54 0b f7 b1 cd b6 06 e8 57 23 3e 0e 61 bc 66 49 ff ff 00 1d 01 e3 62 99 01 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 07 04 ff ff 00 1d 01 04 ff ff ff ff 01 00 f2 05 2a 01 00 00 00 43 41 04 96 b5 38 e8 53 51 9c 72 6a 2c 91 e6 1e c1 16 00 ae 13 90 81 3a 62 7c 66 fb 8b e7 94 7b e6 3c 52 da 75 89 37 95 15 d4 e0 a6 04 f8 14 17 81 e6 22 94 72 11 66 bf 62 1e 73 a8 2c bf 23 42 c8 58 ee ac 00 00 00 00
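Format of a block record in the file: <magic (4 bytes)><size (4 bytes)><header (80 bytes)><tx count (varint)><tx data (variable)>. The size is a little-endian integer that counts the bytes after it (header plus transactions).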
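The record layout above can be walked with a simple loop. Below is a minimal sketch in Python. It assumes an unobfuscated mainnet blk*.dat file; recent Bitcoin Core releases can XOR block files with a key stored in blocks/xor.dat, and this sketch ignores that. The name iter_raw_blocks is hypothetical. The loop stops at the first record that does not begin with the magic bytes, because the unused tail of a file may be zero-filled.

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # first 4 bytes of every block record


def iter_raw_blocks(stream):
    """Yield each block (80-byte header + transactions) from a binary blk*.dat stream."""
    while True:
        prefix = stream.read(8)  # 4-byte magic + 4-byte little-endian size
        if len(prefix) < 8 or prefix[:4] != MAINNET_MAGIC:
            return  # end of file, or padding after the last block
        (size,) = struct.unpack("<I", prefix[4:])
        block = stream.read(size)
        if len(block) < size:
            return  # truncated final record
        yield block


# Usage (example path; adjust to your own data directory):
# with open("blocks/blk00000.dat", "rb") as f:
#     for raw in iter_raw_blocks(f):
#         print(len(raw))
```

Blocks in the files are not guaranteed to be in height order, because the node stores them as they arrive. Linking blocks by the previous-block hash in each header is more reliable than relying on file position.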
Block 0 (Genesis block)
f9 be b4 d9 – network magic. It is always the same on mainnet and marks the start of a block.
1d 01 00 00 – block size, little endian: 0x0000011d = 285 bytes
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3b a3 ed fd 7a 7b 12 b2 7a c7 2c 3e 67 76 8f 61 7f c8 1b c3 88 8a 51 32 3a 9f b8 aa 4b 1e 5e 4a 29 ab 5f 49 ff ff 00 1d 1d ac 2b 7c – the block header. It is fixed at 80 bytes and has the form <version (4 bytes)><previous block hash (32 bytes)><merkle root (32 bytes)><timestamp (4 bytes)><bits (4 bytes)><nonce (4 bytes)>
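Because every field has a fixed size, Python's struct module can unpack the whole header in one call. Below is a minimal sketch. It assumes the raw block bytes (everything after magic and size) are already in memory, and the names BlockHeader, parse_header and block_hash are made up for this example. The hash is reversed at the end because block explorers display hashes in the opposite byte order from the one stored on disk.

```python
import hashlib
import struct
from typing import NamedTuple


class BlockHeader(NamedTuple):
    version: int
    prev_block: bytes   # 32 bytes, in stored (on-disk) byte order
    merkle_root: bytes  # 32 bytes, in stored (on-disk) byte order
    timestamp: int      # Unix time in seconds
    bits: int           # compact target, e.g. 0x1d00ffff
    nonce: int


HEADER_FORMAT = "<I32s32sIII"  # little endian, no padding: 4+32+32+4+4+4 = 80 bytes


def parse_header(raw: bytes) -> BlockHeader:
    """Unpack the 80-byte header at the start of a block."""
    return BlockHeader(*struct.unpack_from(HEADER_FORMAT, raw))


def block_hash(raw: bytes) -> str:
    """Double SHA-256 of the header, reversed into the order explorers display."""
    digest = hashlib.sha256(hashlib.sha256(raw[:80]).digest()).digest()
    return digest[::-1].hex()
```

For the genesis header, block_hash returns 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f. That is the byte-reversed form of the 6fe28c...0000000000 value that block 1 stores as its previous block hash.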
01 00 00 00 – version
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 – previous block hash (all zeros, since this is the very first block)
3b a3 ed fd 7a 7b 12 b2 7a c7 2c 3e 67 76 8f 61 7f c8 1b c3 88 8a 51 32 3a 9f b8 aa 4b 1e 5e 4a – the merkle root of the transactions. The transactions are everything in the block after the header and the transaction count. Here there is only one transaction (the coinbase transaction), so the merkle root is simply sha256(sha256(transaction)), as this Python session shows:
>>> import hashlib
>>> r2 = "01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 4d 04 ff ff 00 1d 01 04 45 54 68 65 20 54 69 6d 65 73 20 30 33 2f 4a 61 6e 2f 32 30 30 39 20 43 68 61 6e 63 65 6c 6c 6f 72 20 6f 6e 20 62 72 69 6e 6b 20 6f 66 20 73 65 63 6f 6e 64 20 62 61 69 6c 6f 75 74 20 66 6f 72 20 62 61 6e 6b 73 ff ff ff ff 01 00 f2 05 2a 01 00 00 00 43 41 04 67 8a fd b0 fe 55 48 27 19 67 f1 a6 71 30 b7 10 5c d6 a8 28 e0 39 09 a6 79 62 e0 ea 1f 61 de b6 49 f6 bc 3f 4c ef 38 c4 f3 55 04 e5 1e c1 12 de 5c 38 4d f7 ba 0b 8d 57 8a 4c 70 2b 6b f1 1d 5f ac 00 00 00 00"
>>> rr2 = r2.replace(" ", "")
>>> rr2
'01000000010000000000000000000000000000000000000000000000000000000000000000ffffffff4d04ffff001d0104455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73ffffffff0100f2052a01000000434104678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5fac00000000'
>>> rrr2 = bytes.fromhex(rr2)
>>> hashlib.sha256(rrr2).digest().hex()
'27362e66e032c731c1c8519f43063fe0e5d070db1c0c3552bb04afa18a31c6bf'
>>> hashlib.sha256(hashlib.sha256(rrr2).digest()).digest().hex()
'3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a'
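With one transaction, the merkle root is just that transaction's double SHA-256. For blocks with more transactions, Bitcoin hashes pairs of hashes level by level. When a level has an odd number of entries, the last hash is paired with itself. Below is a minimal sketch of that rule. The names dsha256 and merkle_root are illustrative, and the hashes are in the same on-disk byte order as in the session above.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: list) -> bytes:
    """Merkle root from a list of 32-byte transaction hashes (on-disk byte order)."""
    if not txids:
        raise ValueError("a block always has at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

For the genesis block, merkle_root([dsha256(rrr2)]) is simply dsha256(rrr2), so the result matches the header value 3ba3...5e4a.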
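29 ab 5f 49 – timestamp: 0x495fab29 = 1231006505 seconds since the Unix epoch (2009-01-03 18:15:05 UTC)
ff ff 00 1d – bits (the compact target encoding, worked through for block 1 below)
1d ac 2b 7c – nonce
01 – number of transactions, here 1. This is a variable-length integer (varint) field. A first byte of 0xfc or less is the value itself, while 0xfd, 0xfe and 0xff announce a 2-, 4- or 8-byte little-endian number that follows. So there is one transaction in the block.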
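The varint rule above fits in a few lines. Below is a minimal sketch. The name read_varint is hypothetical, and Bitcoin Core itself calls this encoding CompactSize. The function returns the decoded value together with the offset just past it, so that a parser can keep reading.

```python
def read_varint(data: bytes, offset: int = 0) -> tuple:
    """Decode a varint at data[offset]; return (value, offset after the varint)."""
    first = data[offset]
    if first < 0xFD:
        return first, offset + 1  # the byte itself is the value
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]  # marker byte, then a little-endian integer
    end = offset + 1 + width
    if end > len(data):
        raise ValueError("truncated varint")
    return int.from_bytes(data[offset + 1:end], "little"), end
```

For example, read_varint(b"\x01") gives (1, 1), the transaction count read above.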
01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 4d 04 ff ff 00 1d 01 04 45 54 68 65 20 54 69 6d 65 73 20 30 33 2f 4a 61 6e 2f 32 30 30 39 20 43 68 61 6e 63 65 6c 6c 6f 72 20 6f 6e 20 62 72 69 6e 6b 20 6f 66 20 73 65 63 6f 6e 64 20 62 61 69 6c 6f 75 74 20 66 6f 72 20 62 61 6e 6b 73 ff ff ff ff 01 00 f2 05 2a 01 00 00 00 43 41 04 67 8a fd b0 fe 55 48 27 19 67 f1 a6 71 30 b7 10 5c d6 a8 28 e0 39 09 a6 79 62 e0 ea 1f 61 de b6 49 f6 bc 3f 4c ef 38 c4 f3 55 04 e5 1e c1 12 de 5c 38 4d f7 ba 0b 8d 57 8a 4c 70 2b 6b f1 1d 5f ac 00 00 00 00 – Transactions form the remaining part of the block, in the form <version (4 bytes)><inputs><outputs><locktime (4 bytes)>
01 00 00 00 – version: the number 1 as a 4-byte little-endian integer. Early transactions all use version 1; version 2 was introduced later by BIP 68 (relative lock-time).
01 – another varint, giving the number of inputs. Since it is <= 0xfc it is read directly: there is one input.
Inputs are of the form <previous transaction id (32 bytes)><previous output index (4 bytes)><scriptsig length (varint)><scriptsig><sequence (4 bytes)>
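00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 – previous transaction id. It is all zeros because a coinbase input creates new coins instead of spending an earlier output.
ff ff ff ff – previous output index, 0xffffffff for a coinbase input
4d – scriptsig length (a varint): 0x4d = 77, so the next 77 bytes are the scriptsig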
04 ff ff 00 1d 01 04 45 54 68 65 20 54 69 6d 65 73 20 30 33 2f 4a 61 6e 2f 32 30 30 39 20 43 68 61 6e 63 65 6c 6c 6f 72 20 6f 6e 20 62 72 69 6e 6b 20 6f 66 20 73 65 63 6f 6e 64 20 62 61 69 6c 6f 75 74 20 66 6f 72 20 62 61 6e 6b 73 – Scriptsig
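04 – push the next 4 bytes as one element (opcodes 0x01 to 0x4b push that many bytes)
ff ff 00 1d – element (the same bytes as the header's bits field)
01 – push the next 1 byte
04 – element
45 – push the next 69 (4*16+5) bytes
54 68 65 20 54 69 6d 65 73 20 30 33 2f 4a 61 6e 2f 32 30 30 39 20 43 68 61 6e 63 65 6c 6c 6f 72 20 6f 6e 20 62 72 69 6e 6b 20 6f 66 20 73 65 63 6f 6e 64 20 62 61 69 6c 6f 75 74 20 66 6f 72 20 62 61 6e 6b 73 – element, the ASCII text "The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"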
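The push rules used above can be applied mechanically. Below is a minimal sketch that splits a script into elements. It assumes only the push opcodes matter for this walk-through: it handles direct pushes (0x01 to 0x4b) and OP_PUSHDATA1/2/4 (0x4c to 0x4e), and returns every other byte as a bare opcode number. The name script_elements is hypothetical.

```python
def script_elements(script: bytes) -> list:
    """Split a script into pushed data (bytes objects) and other opcodes (ints)."""
    pushdata_width = {0x4C: 1, 0x4D: 2, 0x4E: 4}  # OP_PUSHDATA1/2/4
    elements = []
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:
            size = op  # the opcode itself is the number of bytes to push
        elif op in pushdata_width:
            width = pushdata_width[op]
            size = int.from_bytes(script[i:i + width], "little")  # length follows the opcode
            i += width
        else:
            elements.append(op)  # any other opcode, e.g. 0xac = OP_CHECKSIG or 0x00 = OP_0
            continue
        if i + size > len(script):
            raise ValueError("push runs past the end of the script")
        elements.append(script[i:i + size])
        i += size
    return elements


print(script_elements(bytes.fromhex("04ffff001d0104")))  # [b'\xff\xff\x00\x1d', b'\x04']
```

Applied to the full 77-byte scriptsig, the third element decodes with .decode("ascii") to the newspaper headline.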
ff ff ff ff – sequence
01 – now we get to the outputs. This varint gives the number of outputs. It is <= 0xfc, so there is only one output.
Outputs are of the form <amount (8 bytes)><scriptpubkey length (varint)><scriptpubkey>
00 f2 05 2a 01 00 00 00 – amount in satoshis, an 8-byte little-endian integer: 0x000000012a05f200 = 5,000,000,000 satoshis = 50 BTC
43 – length of the scriptpubkey: 0x43 = 67, so the next 67 bytes are the scriptpubkey
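The amount conversion above is a single little-endian read. Below is a minimal sketch; decode_amount and format_btc are illustrative names. It keeps amounts as integer satoshis and formats them with integer arithmetic, which avoids floating-point rounding.

```python
SATOSHIS_PER_BTC = 100_000_000


def decode_amount(field: bytes) -> int:
    """Output amount: an 8-byte little-endian count of satoshis."""
    if len(field) != 8:
        raise ValueError("amount field must be exactly 8 bytes")
    return int.from_bytes(field, "little")


def format_btc(satoshis: int) -> str:
    """Format satoshis as BTC using integer arithmetic only."""
    whole, frac = divmod(satoshis, SATOSHIS_PER_BTC)
    return f"{whole}.{frac:08d} BTC"


print(format_btc(decode_amount(bytes.fromhex("00f2052a01000000"))))  # 50.00000000 BTC
```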
41 04 67 8a fd b0 fe 55 48 27 19 67 f1 a6 71 30 b7 10 5c d6 a8 28 e0 39 09 a6 79 62 e0 ea 1f 61 de b6 49 f6 bc 3f 4c ef 38 c4 f3 55 04 e5 1e c1 12 de 5c 38 4d f7 ba 0b 8d 57 8a 4c 70 2b 6b f1 1d 5f ac – scriptpubkey
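41 – push the next 65 (4*16+1) bytes as one element
04 67 8a fd b0 fe 55 48 27 19 67 f1 a6 71 30 b7 10 5c d6 a8 28 e0 39 09 a6 79 62 e0 ea 1f 61 de b6 49 f6 bc 3f 4c ef 38 c4 f3 55 04 e5 1e c1 12 de 5c 38 4d f7 ba 0b 8d 57 8a 4c 70 2b 6b f1 1d 5f – the public key being paid to (65 bytes, uncompressed, hence the leading 04)
ac – OP_CHECKSIG. A script of the form <public key> OP_CHECKSIG is called pay-to-pubkey (P2PK): spending the output requires a signature from that key.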
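A P2PK output can be recognised from its shape alone: one push of a public key followed by OP_CHECKSIG. Below is a minimal sketch. The name p2pk_pubkey is hypothetical, and the sketch checks only the byte layout; it does not check that the key is a valid curve point. It also accepts the 33-byte compressed-key form (push 0x21), which later P2PK outputs can use.

```python
OP_CHECKSIG = 0xAC


def p2pk_pubkey(script_pubkey: bytes):
    """Return the public key if the script is <push pubkey> OP_CHECKSIG, else None."""
    if not script_pubkey or script_pubkey[-1] != OP_CHECKSIG:
        return None
    push_len = script_pubkey[0]
    if push_len in (65, 33) and len(script_pubkey) == push_len + 2:
        return script_pubkey[1:-1]  # 65 bytes uncompressed (04...) or 33 compressed (02/03...)
    return None
```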
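00 00 00 00 – locktime, 0 here. A locktime of 0 lets the transaction be included in any block. A non-zero value sets the earliest block height (values below 500,000,000) or Unix time (values from 500,000,000 up) at which the transaction may be mined. It is enforced only when at least one input has a sequence number other than 0xffffffff. Locktime is still in use today. What was disabled early on as insecure was the original scheme for replacing unconfirmed transactions through the sequence field.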
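The field-by-field walk-through above can be turned into a parser. Below is a minimal sketch. It assumes the legacy (pre-SegWit) serialization used by these early blocks, and it rejects SegWit transactions, which place a 0x00 marker byte where the input count would be. Reader and parse_legacy_tx are hypothetical names. The txid is the double SHA-256 of the raw transaction, reversed into display order.

```python
import hashlib


class Reader:
    """Cursor over a bytes object; all integers are little endian."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) < n:
            raise ValueError("unexpected end of data")
        self.pos += n
        return chunk

    def uint(self, n: int) -> int:
        return int.from_bytes(self.take(n), "little")

    def varint(self) -> int:
        first = self.uint(1)
        if first < 0xFD:
            return first
        return self.uint({0xFD: 2, 0xFE: 4, 0xFF: 8}[first])


def parse_legacy_tx(r: Reader) -> dict:
    """Parse one pre-SegWit transaction starting at the reader's position."""
    start = r.pos
    version = r.uint(4)
    n_in = r.varint()
    if n_in == 0:
        raise ValueError("SegWit marker found; this sketch only handles legacy transactions")
    inputs = []
    for _ in range(n_in):
        prev_txid = r.take(32)[::-1].hex()  # display order
        prev_index = r.uint(4)
        script_sig = r.take(r.varint())
        sequence = r.uint(4)
        inputs.append({"prev_txid": prev_txid, "prev_index": prev_index,
                       "script_sig": script_sig, "sequence": sequence})
    outputs = []
    for _ in range(r.varint()):
        value = r.uint(8)  # satoshis
        script_pubkey = r.take(r.varint())
        outputs.append({"value": value, "script_pubkey": script_pubkey})
    locktime = r.uint(4)
    raw = r.data[start:r.pos]
    txid = hashlib.sha256(hashlib.sha256(raw).digest()).digest()[::-1].hex()
    return {"version": version, "inputs": inputs, "outputs": outputs,
            "locktime": locktime, "txid": txid}
```

Parsing the genesis coinbase transaction this way should give the txid 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b, which is the merkle root read in reverse byte order.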
Block 1
f9 be b4 d9 – network magic, marking the start of the block
d7 00 00 00 – block size: 0xd7 = 215 bytes
01 00 00 00 6f e2 8c 0a b6 f1 b3 72 c1 a6 a2 46 ae 63 f7 4f 93 1e 83 65 e1 5a 08 9c 68 d6 19 00 00 00 00 00 98 20 51 fd 1e 4b a7 44 bb be 68 0e 1f ee 14 67 7b a1 a3 c3 54 0b f7 b1 cd b6 06 e8 57 23 3e 0e 61 bc 66 49 ff ff 00 1d 01 e3 62 99 – the block header. It is fixed at 80 bytes and has the form <version (4 bytes)><previous block hash (32 bytes)><merkle root (32 bytes)><timestamp (4 bytes)><bits (4 bytes)><nonce (4 bytes)>
01 00 00 00 – version
6f e2 8c 0a b6 f1 b3 72 c1 a6 a2 46 ae 63 f7 4f 93 1e 83 65 e1 5a 08 9c 68 d6 19 00 00 00 00 00 – previous block hash calculated as follows:
>>> r = "01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3b a3 ed fd 7a 7b 12 b2 7a c7 2c 3e 67 76 8f 61 7f c8 1b c3 88 8a 51 32 3a 9f b8 aa 4b 1e 5e 4a 29 ab 5f 49 ff ff 00 1d 1d ac 2b 7c"
>>> rr = r.replace(" ", "")
>>> rr
'0100000000000000000000000000000000000000000000000000000000000000000000003ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a29ab5f49ffff001d1dac2b7c'
>>> rrr = bytes.fromhex(rr)
>>> rrr
b'\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00;\xa3\xed\xfdz{\x12\xb2z\xc7,>gv\x8fa\x7f\xc8\x1b\xc3\x88\x8aQ2:\x9f\xb8\xaaK\x1e^J)\xab_I\xff\xff\x00\x1d\x1d\xac+|'
>>> import hashlib
>>> hashlib.sha256(rrr).digest().hex()
'af42031e805ff493a07341e2f74ff58149d22ab9ba19f61343e2c86c71c5d66d'
>>> hashlib.sha256(hashlib.sha256(rrr).digest()).digest().hex()
'6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000'
98 20 51 fd 1e 4b a7 44 bb be 68 0e 1f ee 14 67 7b a1 a3 c3 54 0b f7 b1 cd b6 06 e8 57 23 3e 0e – merkle root of the transactions, calculated as follows:
>>> r = "01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 07 04 ff ff 00 1d 01 04 ff ff ff ff 01 00 f2 05 2a 01 00 00 00 43 41 04 96 b5 38 e8 53 51 9c 72 6a 2c 91 e6 1e c1 16 00 ae 13 90 81 3a 62 7c 66 fb 8b e7 94 7b e6 3c 52 da 75 89 37 95 15 d4 e0 a6 04 f8 14 17 81 e6 22 94 72 11 66 bf 62 1e 73 a8 2c bf 23 42 c8 58 ee ac 00 00 00 00"
>>> rr = r.replace(" ", "")
>>> rrr = bytes.fromhex(rr)
>>> hashlib.sha256(hashlib.sha256(rrr).digest()).digest().hex()
'982051fd1e4ba744bbbe680e1fee14677ba1a3c3540bf7b1cdb606e857233e0e'
61 bc 66 49 – timestamp: 0x4966bc61 = 1231469665 seconds since the Unix epoch (2009-01-09 02:54:25 UTC)
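The timestamp is a 4-byte little-endian Unix time, so converting it takes one read and one datetime call. Below is a minimal sketch; decode_timestamp is an illustrative name.

```python
from datetime import datetime, timezone


def decode_timestamp(field: bytes) -> datetime:
    """Header timestamp: 4-byte little-endian Unix time (seconds), interpreted as UTC."""
    return datetime.fromtimestamp(int.from_bytes(field, "little"), tz=timezone.utc)


print(decode_timestamp(bytes.fromhex("61bc6649")).isoformat())  # 2009-01-09T02:54:25+00:00
```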
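ff ff 00 1d – bits, a compact encoding of the target. Read as a little-endian integer it is 0x1d00ffff. The top byte, 0x1d, is an exponent and the low three bytes, 0x00ffff, are a mantissa, so target = mantissa * 2^(8*(exponent-3)):
>>> a = 0x00ffff
>>> b = 0x1d
>>> target = a * 2**(8*(b-3))
>>> target.to_bytes(32, 'big').hex()
'00000000ffff0000000000000000000000000000000000000000000000000000'
A block is accepted only if its hash, read as a 256-bit little-endian number, is at or below this target. That is why the stored block hash above ends in a run of zero bytes. Difficulty is defined as this difficulty-1 target divided by the current target, so these early blocks have difficulty 1.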
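The same expansion and the proof-of-work comparison can be written as functions. Below is a minimal sketch; bits_to_target, check_pow and difficulty are illustrative names. The sketch assumes the sign bit (0x00800000) of the compact form is never set, which holds for valid headers, and it handles exponents of 3 or less by shifting right instead of left.

```python
import hashlib

DIFF1_BITS = 0x1d00ffff  # bits of the early blocks, i.e. the difficulty-1 target


def bits_to_target(bits: int) -> int:
    """Expand compact bits: the top byte is an exponent, the low 3 bytes a mantissa."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF  # 0x00800000 is a sign bit, never set in valid headers
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def check_pow(header: bytes) -> bool:
    """True if the header's double SHA-256, read as a little-endian number, is <= its target."""
    bits = int.from_bytes(header[72:76], "little")  # bits sit at bytes 72..75 of the header
    digest = hashlib.sha256(hashlib.sha256(header[:80]).digest()).digest()
    return int.from_bytes(digest, "little") <= bits_to_target(bits)


def difficulty(bits: int) -> float:
    """Difficulty relative to the difficulty-1 target."""
    return bits_to_target(DIFF1_BITS) / bits_to_target(bits)
```

For block 1's 80-byte header, check_pow should return True and difficulty(0x1d00ffff) returns 1.0. Changing the nonce would almost certainly make the check fail, and that is the point of mining.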
01 e3 62 99 – nonce
01 – number of transactions, here 1. As in block 0, this varint is <= 0xfc, so it is the value itself: there is one transaction in the block.
01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 07 04 ff ff 00 1d 01 04 ff ff ff ff 01 00 f2 05 2a 01 00 00 00 43 41 04 96 b5 38 e8 53 51 9c 72 6a 2c 91 e6 1e c1 16 00 ae 13 90 81 3a 62 7c 66 fb 8b e7 94 7b e6 3c 52 da 75 89 37 95 15 d4 e0 a6 04 f8 14 17 81 e6 22 94 72 11 66 bf 62 1e 73 a8 2c bf 23 42 c8 58 ee ac 00 00 00 00 – Transactions
01 00 00 00 – version
01 – Number of ins
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 – prev tx id
ff ff ff ff – prev tx index
07 – the next 7 bytes are scriptsig
04 ff ff 00 1d 01 04 – Scriptsig
ff ff ff ff – sequence
01 – num outs
00 f2 05 2a 01 00 00 00 – out amount 50 BTC
43 – length of scriptpubkey
41 04 96 b5 38 e8 53 51 9c 72 6a 2c 91 e6 1e c1 16 00 ae 13 90 81 3a 62 7c 66 fb 8b e7 94 7b e6 3c 52 da 75 89 37 95 15 d4 e0 a6 04 f8 14 17 81 e6 22 94 72 11 66 bf 62 1e 73 a8 2c bf 23 42 c8 58 ee ac – scriptpubkey: 41 pushes a 65-byte public key (not an address), followed by ac = OP_CHECKSIG
00 00 00 00 – locktime
Future work:
- How is a block mined? (What does that mean? Does the 'proof of work' simply slow the network down to give everyone an opportunity to reach consensus?)
- How to submit a transaction to the network?
- Investigate some more 'interesting' transactions than the ones above (and write Python code to parse the blocks)
- more