[SIMP] String standardization

Simple Summary

A standard struct representing strings for consistent usage between contracts & standardization of string interfacing with Dapps.

Abstract

Many applications, NFTs in particular, need strings longer than 31 characters, because URIs and IPFS hashes rarely fit in a single short string. Chaining several short strings (up to 31 characters each) would allow longer strings, but using them on-chain becomes difficult and inefficient.

This proposal aims at standardizing the string struct in the same way uint256 has been redefined.

In memory (during contract execution), strings are represented as character arrays. In storage, they are packed into short strings so that each storage slot is filled as much as possible.

Specification

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC-2119.

string struct

This struct is designed to store strings in memory while executing contracts. It stores the string’s length (string.len) as well as a felt pointer representing the string’s content as a single-character array (string.data) e.g. “Hello” would be represented as string.len = 5 and string.data = [0x48, 0x65, 0x6c, 0x6c, 0x6f] or string.data = ['H', 'e', 'l', 'l', 'o'].

struct string:
  member len : felt
  member data : felt*
end

String storage

Most projects will need to store and edit several strings. Declaring a separate storage variable for each string would be impractical, so a key-string storage system is a better choice: each string is stored and accessed using a unique string_id.

Character arrays make strings easy to use in memory, but storing them that way is highly inefficient. Strings SHOULD therefore be stored as an array of short strings.

Two storage variables are then needed:

  • string_ss_len(string_id : felt) -> (ss_len : felt): The number of short strings used for the whole string.
  • strings_data(string_id : felt, index : felt) -> (ss : felt): Each part of the string, indexed from 0 to ss_len-1.
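To make the layout concrete, here is a minimal off-chain sketch in Python. It is not the library's implementation: the two dictionaries only stand in for the storage variables, and `write_string` / `read_string` are hypothetical helper names. It assumes ASCII characters and 31 characters per short string.

```python
SHORT_STRING_SIZE = 31  # max ASCII characters packed into one short string

# Hypothetical stand-ins for the two storage variables described above.
string_ss_len = {}  # string_id -> number of short strings
strings_data = {}   # (string_id, index) -> short string packed as an integer


def write_string(string_id, chars):
    """Pack a list of character codes into short strings and store them."""
    chunks = [chars[i:i + SHORT_STRING_SIZE]
              for i in range(0, len(chars), SHORT_STRING_SIZE)]
    string_ss_len[string_id] = len(chunks)
    for index, chunk in enumerate(chunks):
        # Big-endian packing: the first character ends up in the highest byte.
        strings_data[(string_id, index)] = int.from_bytes(bytes(chunk), "big")


def read_string(string_id):
    """Rebuild the list of character codes from the stored short strings."""
    chars = []
    for index in range(string_ss_len.get(string_id, 0)):
        ss = strings_data[(string_id, index)]
        size = (ss.bit_length() + 7) // 8  # number of bytes actually used
        chars.extend(ss.to_bytes(size, "big"))
    return chars
```

One limitation of this sketch: a chunk that starts with NUL characters (code 0) would lose them when read back, since leading zero bytes are not recoverable from the packed integer alone.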

In-contract string usage

Every function using strings as arguments MUST expect a string struct.

Every function returning a string for use in-contract MUST return a string struct.

External contract interaction

As short strings encode text in ASCII, it is RECOMMENDED to use ASCII encoding for strings. If another encoding is used, it SHOULD be specified.

Sending and receiving strings from contracts in Dapps will follow the same principle as the string struct.

Strings MUST be returned as arrays of single characters and functions MUST expect arrays of single characters.
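On the Dapp side, the conversion to and from single-character arrays could look like the sketch below. The helper names `to_char_array` and `from_char_array` are hypothetical (they are not the cairopen.utils functions used later in this post), and ASCII is assumed as recommended above.

```python
def to_char_array(text):
    """Encode text as a list of single-character felts (ASCII code points)."""
    # str.encode raises UnicodeEncodeError if the text is not pure ASCII.
    return list(text.encode("ascii"))


def from_char_array(chars):
    """Decode a list of single-character felts back into text."""
    # bytes.decode raises UnicodeDecodeError for values above 127.
    return bytes(chars).decode("ascii")


hello = to_char_array("Hello")
print(len(hello), hello)       # 5 [72, 101, 108, 108, 111]
print(from_char_array(hello))  # Hello
```

Rejecting non-ASCII input early keeps the off-chain encoding consistent with what the contract expects.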

String usage

Here are short examples of interacting with strings in Dapps:

Return a string from a contract:

%lang starknet

from starkware.cairo.common.alloc import alloc

@view
func send_hello{syscall_ptr : felt*}() -> (str_len : felt, str : felt*):
  let str_len = 5
  let (str) = alloc()
  assert str[0] = 'H'
  assert str[1] = 'e'
  assert str[2] = 'l'
  assert str[3] = 'l'
  assert str[4] = 'o'

  return (str_len, str)
end

from cairopen.utils import felt_arr_to_str

async def receive():
  execution_info = await string_contract.send_hello().call()
  hello = execution_info.result.str

  print(len(hello))
  print(hello)
  print(felt_arr_to_str(hello))

# Prints:
# 5
# [ 72, 101, 108, 108, 111 ] or [ 0x48, 0x65, 0x6c, 0x6c, 0x6f ]
# Hello

Send a string to a contract and write it in storage:

from cairopen.utils import str_to_felt_arr

async def send_hello():
  hello = str_to_felt_arr("Hello")
  print(len(hello))
  print(hello)

  await string_contract.write(hello).invoke()

# Prints:
# 5
# [ 72, 101, 108, 108, 111 ] or [ 0x48, 0x65, 0x6c, 0x6c, 0x6f ]

%lang starknet

from starkware.cairo.common.cairo_builtins import HashBuiltin

from cairopen.string.string import String
from cairopen.string.type import string

@external
func write{syscall_ptr : felt*, pedersen_ptr : HashBuiltin*, range_check_ptr}(str_len : felt, str : felt*):
  let _str = string(str_len, str)

  String.write('hello', _str) # To use the string later on, use String.read('hello')

  return ()
end

Read a string from storage:

%lang starknet

from starkware.cairo.common.cairo_builtins import HashBuiltin, BitwiseBuiltin

from cairopen.string.string import String

@view
func read{
    syscall_ptr : felt*, pedersen_ptr : HashBuiltin*, bitwise_ptr : BitwiseBuiltin*, range_check_ptr
}() -> (str_len : felt, str : felt*):
  let (str) = String.read('hello')
  return (str.len, str.data)
end

from cairopen.utils import felt_arr_to_str

async def read():
  execution_info = await string_contract.read().call()
  hello = execution_info.result.str

  print(len(hello))
  print(hello)
  print(felt_arr_to_str(hello))

# Prints:
# 5
# [ 72, 101, 108, 108, 111 ] or [ 0x48, 0x65, 0x6c, 0x6c, 0x6f ]
# Hello

We developed and documented a full Cairo library for string storage and usage, available for review here: cairopen-contracts/src/cairopen/string at main · CairOpen/cairopen-contracts · GitHub

17 Likes

Hey,

Could you elaborate or give an example about how to store the Strings?
I’d like to be sure I fully understand that part before stating an opinion :slightly_smiling_face:

G.

4 Likes

Hey,

Sure :slight_smile: We have a complete implementation here: cairopen-contracts/storage.cairo at main · CairOpen/cairopen-contracts · GitHub

with documentation here: cairopen-contracts/README.md at main · CairOpen/cairopen-contracts · GitHub

But in essence, at runtime strings are represented as character arrays, as described above. When written to storage, the string is converted into short strings to optimise storage space (i.e. each storage slot is filled as much as possible).

For example, “Hello” at runtime is represented as ['H', 'e', 'l', 'l', 'o'] to facilitate character indexing, manipulation, etc. When stored, it becomes a single short string ‘Hello’, i.e. 0x48656c6c6f in hex or 310939249775 in decimal. As there is only one short string (“Hello” is shorter than 31 characters), the stored short-string count is 1, hence:

strings_len.read('my_string') = 1
strings_data.read('my_string', 0) = 'Hello'
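The hex and decimal values can be checked off-chain with a couple of lines of Python. This is only a sketch, and `to_short_string` is a hypothetical helper name; it assumes the usual short-string convention of big-endian ASCII bytes.

```python
def to_short_string(text):
    """Pack up to 31 ASCII characters into a single integer (short string)."""
    if len(text) > 31:
        raise ValueError("a short string holds at most 31 characters")
    return int.from_bytes(text.encode("ascii"), "big")


value = to_short_string("Hello")
print(hex(value))  # 0x48656c6c6f
print(value)       # 310939249775
```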

For a longer string, for example “This string is longer than thirty-one characters”, we would get ['T', 'h', 'i', 's', ' ', 's', 't', 'r', 'i', 'n', 'g', ' ', 'i', 's', ' ', 'l', 'o', 'n', 'g', 'e', 'r', ' ', 't', 'h', 'a', 'n', ' ', 't', 'h', 'i', 'r', 't', 'y', '-', 'o', 'n', 'e', ' ', 'c', 'h', 'a', 'r', 'a', 'c', 't', 'e', 'r', 's'] at runtime, and when stored we would get:

strings_len.read('my_other_string') = 2 # as it takes 2 short strings (31 characters max each) to store a 48-character string
strings_data.read('my_other_string', 0) = 'This string is longer than thir' # 31 chars
strings_data.read('my_other_string', 1) = 'ty-one characters' # 17 chars
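If you want to double-check the chunk boundaries for this example, plain Python slicing is enough. This is just an off-chain illustration, not something taken from the library:

```python
text = "This string is longer than thirty-one characters"

# Cut the text into pieces of at most 31 characters, in order.
chunks = [text[i:i + 31] for i in range(0, len(text), 31)]

print(len(text))    # 48
print(len(chunks))  # 2
for chunk in chunks:
    print(len(chunk), repr(chunk))
# 31 'This string is longer than thir'
# 17 'ty-one characters'
```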

Reading the string from storage works the other way around: we cut each short string into individual characters to rebuild the character array.

To allow storage of several strings, each string is accessed through a unique id (e.g. here, ‘my_string’ and ‘my_other_string’). We advise using short strings as ids so that they stay readable and do not conflict. For ERC721 URIs, for example, you could use 'URI' + tokenId to get a unique string key for each token URI and avoid conflicts with other stored strings.
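One possible reading of the 'URI' + tokenId idea, sketched in Python: build the key as the short string "URI" followed by the decimal token id. Both the key format and the `uri_key` helper name are assumptions made for illustration, not what the library prescribes.

```python
def uri_key(token_id):
    """Build a short-string key such as 'URI42' for a given token id."""
    key = f"URI{token_id}"
    if len(key) > 31:
        raise ValueError("key does not fit in a single short string")
    # Same packing as any short string: big-endian ASCII bytes.
    return int.from_bytes(key.encode("ascii"), "big")


print(hex(uri_key(1)))  # 0x55524931, i.e. 'URI1'
```

With a 3-character prefix, token ids of up to 28 decimal digits still fit in one short string.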

I hope I was clear enough :sweat_smile:

6 Likes

If the string that needs to be stored is just a regular long string, then I’d rather use an incremental index rather than a string as an id that a dev would have to hard code.
If I think of anything else I’ll add it, so apart from that it looks good :slight_smile:
Will probably already use it whenever I encounter string in my projects!
G.

5 Likes

Hey, I have two suggestions:

  1. I’d rather see the name String to indicate it’s a non-native type. IMO it’s more transparent, follows the convention of capitalized names for structs, and is future-proof (if Cairo ever gets native string support, it’s plausible they’d choose the name string).

  2. I think there might be an elegant way to combine namespaces and encodings. You could use the encoding name as the name for the namespace (from cairopen.Ascii import String). That way, you can open up to having more encodings (UTF-8, 7-bit ASCII) using the same API.

5 Likes

Hey,

  1. I understand your point about naming the type String (it was my approach at first), but then the name String would be taken and I wouldn’t be able to use it for the namespace. Do you suggest another name for the namespace?

  2. I really like the idea of having namespaces based on encoding :slight_smile:. In the end, most of the functions under the namespace would be the same, but for storage and other character-related functions it would definitely be helpful. I think we could create a namespace StringUtil with all encoding-agnostic functions and then, as you suggested, a namespace for each encoding. Would it make sense to name the encoding-related namespaces after the encodings themselves (e.g. from cairopen.string.ASCII import StringAscii)? This way one could use different encodings in the same contract (for whatever reason) without running into a name conflict such as:

from cairopen.string.ASCII import String
from cairopen.string.UTF_8 import String

Also, going with this option would free up the String name for the type :slight_smile:

4 Likes

Cairo supports import renaming (from foo import bar as baz), so that would also be an option for users to resolve the conflict when importing from different encodings.

I never really used it and Cairo is somewhat fussy with all these imports and silent conflicts, so I’m not sure if there aren’t some hidden dragons with this approach. Just mentioning it if you want to do some testing :slight_smile:

3 Likes

Oh yes, I forgot about import renaming. I will try this and update the lib.

I will write tests to try import renaming as well :slight_smile:

2 Likes

Ok,

After a pretty good upgrade, here is how it goes:

Type String

The String type is now capitalized to follow non-native type guidelines.

Namespace StringUtil

Imported using from cairopen.string.utils import StringUtil

Comprised of codec-agnostic utility functions such as append a character at the end, concatenate strings, etc.
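To give an idea of what codec-agnostic means here, a small Python model: these operations only touch the length and the character array, never the encoding. The `String` class mirrors the struct, while `concat` and `append_char` are hypothetical names for this sketch and may not match the library's function names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class String:
    len: int         # number of characters
    data: List[int]  # one felt per character


def concat(a, b):
    """Return a new String made of a's characters followed by b's."""
    return String(a.len + b.len, a.data + b.data)


def append_char(s, char):
    """Return a new String with one character appended at the end."""
    return String(s.len + 1, s.data + [char])


hello = String(5, [72, 101, 108, 108, 111])
print(append_char(hello, 33).data)  # [72, 101, 108, 108, 111, 33]
```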

Codec namespaces StringCodec

Imported using from cairopen.string.<codec> import StringCodec

For now only the ASCII encoding exists (default encoding when using short strings) using from cairopen.string.ASCII import StringCodec

Comprised of all codec-dependent functions such as converting a felt into its string value (e.g. 12345 → “12345”), converting short strings into strings (e.g. ‘Hello’ → “Hello”), reading/writing to storage, etc.
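As an illustration of a codec-dependent function, here is how the felt-to-string conversion (12345 → “12345”) could be sketched in Python for ASCII. The name `felt_to_string` is hypothetical for this sketch, and the logic is an off-chain model rather than the library's Cairo code.

```python
def felt_to_string(value):
    """Return the ASCII character codes of value's decimal representation."""
    if value < 0:
        raise ValueError("expected a non-negative integer")
    if value == 0:
        return [ord("0")]
    digits = []
    while value > 0:
        value, digit = divmod(value, 10)  # peel off the last decimal digit
        digits.append(ord("0") + digit)   # ASCII digits start at 0x30
    digits.reverse()                      # most significant digit first
    return digits


print(felt_to_string(12345))  # [49, 50, 51, 52, 53]
```

The digit offset (0x30) is what makes this function depend on the codec: another encoding could map digits to different values.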

Import renaming works well so in the future one could use

from cairopen.string.ASCII import StringCodec as ASCII
from cairopen.string.UTF_8 import StringCodec as UTF_8

The doc has also been updated to reflect these modifications: cairopen-contracts/README.md at main · CairOpen/cairopen-contracts · GitHub

4 Likes

Hey, thanks for starting this. I don’t know why there is no string native type in Cairo though.

Just a simple question: have you considered simply copying the design used in Solidity instead? It’s similar to this proposal, but not exactly the same.

1 Like

Hi,

Could you elaborate on what those differences are?