YSC4230 Lecture 5

Arrays

How to represent arrays?

  • static
    • array contains direct address
  • dynamic
    • array contains indirect address

arrays are just collections of address pointers.

Multi-dimensional arrays

We need to know the size of these.

OCaml, Java have a size of array, during runtime (Waste more space).

movq -8(%rax) %rdx // load size, %rax is pointing at A[0] so we need to go back a word (-8).
cmpq %rdx %rcx // compare idx to bound
j l __ok // jump if ok
callq __err_oob

__ok:
  movq (%rax, %rcx, 8) dest

Strings

C strings lie in text segment, so they are read only, so they can be shared.

Tagged Datatypes

Datatypes may be recursive, in that case we can use pointer, because it is known size.

In implementations which have length stored allow us to have O(1) access.

How to compile switch?

break instructions compiled to a jump to the next label after the switch blk.

%tag = ⟦e⟧
br label %l1
l1: %cmp1 = icmp eq %tag, $tag1
br %cmp1 label %b1, label %l
b1: ⟦s1⟧
br label %l2
l2: %cmp2 = icmp eq %tag, $tag2
br %cmp2 label %b2, label %l
b2: ⟦s2⟧
br label %l
lN: %cmpN = icmp eq %tag, $tagN
br %cmpN label %bN, label %merg
bN: ⟦sN⟧
br label %merg
merge:

Pattern matching

Bind variables. Patterns can nest.

Strategy:

  • Flatten nested patterns into matches against one constructor at a time.
  • Compile match against the tags of the datatype for C-style switches.
  • Code for each branch must copy data from e to variables bound in the patterns.

LLVM Datatypes

LLVM IR: uses types to describe the structure of data.

data t = void
       | i1 | i8 | i64 
       | [<#elts> x t]
       | fty
       | {t1, t2, ...}
       | t* (pointers)
       | %Tident (named type)

array types are monomorphic.

getelementptr

Type Casting

bitcast

Regular Expressions

  • ɛ, empty string
  • ‘a’, ordinary character
  • R1 | R2, alternatives, stands for choice of R1 or R2
  • R1R2, concatenation, stands for R1, followed by R2.
  • R*, Kleene star, stands for zero or more repetitions of R

Extensions

  • “foo”, a string, requivalnet to ‘f’ ‘o’ ‘o’
  • R+, stands for one or more repetitions of R, equivalent to RR*
  • R?, Zero or one occurrences of R, equivalent to (E|R)
  • [‘a’-‘z’] One of a or b or c or … z, equivalent to (a|b|…|z)
  • [^‘0’-‘9’] Any char except 0 through 9 (expensive)
  • R as x Name the string matched by R as x.