Benchmarking reading binary values from a file with F#, Python, Julia, R, Go and OCaml

In one of my recent posts I showed some F# code for storing and reading integers as binary data. Since then I've created different versions of the reading part in Python, Julia, Go, OCaml and R to be able to compare the performance of the different languages on this simple task and to get a feeling on what it's like to program in Julia, Go and OCaml as I hadn't created any programs in these languages yet.

Below table shows the benchmark results for reading 10 times 10000 values and 10000 times 10 values with links to the source code files in the different programming languages:

Language 10 times 10000 values 10000 times 10 values
F# 6 seconds 20 seconds
Python 26 seconds 40 seconds
Julia 45 seconds 72 seconds
Go 8 seconds 25 seconds
OCaml 2.5 seconds 48 seconds
R 110 seconds NA

The overall fastest version is the one written in F# but note that it's also the version I have tweaked the most. As I'm not very experienced in most of the languages so any performance tips are welcome. Note that I tried using memory mapped files in .NET and Python this improved performance when querying lots of values from the same file but also made it worse in other cases.

The implementation of the functionality is most of the times rather similar in the different languages. Some notable differences where:

  • Julia apparently doesn't have a null value so I refrained from checking whether the read integer value was equal to the int32 minimum value (-2147483648).
  • In Go converting the bytes to integers was faster with a custom function.
  • I didn't find a function in the OCaml Core library to convert bytes to a 32-bit integer, but luckily I found one on Stack Overflow.

F#:
open System
open System.IO

let readValue (reader:BinaryReader) cellIndex = 
    reader.BaseStream.Seek(int64 (cellIndex*4), SeekOrigin.Begin) |> ignore
    match reader.ReadInt32() with
    | Int32.MinValue -> None
    | v -> Some(v)
        
let readValues indices fileName = 
    use reader = new BinaryReader(File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    let values = Array.map (readValue reader) indices
    values

Python:
def read_values(filename, indices):
    # indices are sorted and unique
    values = []
    with open(filename, 'rb') as f:
        for index in indices:
            f.seek(index*4L, os.SEEK_SET)
            b = f.read(4)
            v = struct.unpack("@i", b)[0]
            if v == -2147483648:
                v = None
            values.append(v)
    return values

Julia:
function readvalue(stream, position)
    seek(stream, position)
    return read(stream, Int32)
end

function readvalues(filename::String, indices)
    stream = open(filename, "r")
    try
        return Int32[readvalue(stream, index*4) for index in indices]
    finally
        close(stream)
    end
end

Go:
import ("os")
func bytes2int(b []byte) int32 {
    v := int32(0)
    for  i := 0; i < 4; i++ {
        v = v | (int32(b[i]) << (uint(8*i)))
    }
    return v
}

func readValues(indices []int, filename string) []int32 {
    results := make([]int32, len(indices))
    b := make([]byte, 4)
    f,_ := os.Open(filename)
    for i, cellIndex := range indices {
        f.Seek(int64(cellIndex*4), os.SEEK_SET)
        f.Read(b)
        value := bytes2int(b) // around 10-20% faster then binary.Read
        if value != -2147483648 {
            results[i] = value
        } else {
            results[i] = 99999
        }
    }
    return results
}

OCaml:
let input_le_int32 inchannel = (* http://stackoverflow.com/a/6031286/477367 *)
  let res = ref 0l in
    for i = 0 to 3 do
      let byte = input_byte inchannel in
        res := Int32.logor !res (Int32.shift_left (Int32.of_int byte) (8*i))
    done;

    match !res with
      | -2147483648l -> None
      | v -> Some(v)

let readvalue inchannel index =
  seek_in inchannel (index*4);
  input_le_int32 inchannel

let readvalues (indices:int array) filename =
  let inchannel = open_in_bin filename in
    try
      let result = Array.map (readvalue inchannel) indices in
        close_in inchannel;
        result
    with e ->
      close_in_noerr inchannel;
      raise e

R:

read.values <- function(filename, indices) {
  conn <- file(filename, "rb")
  read.value <- function(index) {
    seek(conn, where=index*4)
    readBin(conn, integer(), size = 4, n = 1, endian = "little")
  }
  r <- sapply(indices,read.value)
  close(conn)
  r[r==-2147483648] <- NA
  r
}

Any suggestions for improving the implementation in one of the above programming languages ? Which language would you like to compare my results with ? Which other language(s) do you expect to be faster than my benchmark results ? Can you help me out with a version in your favorite language or in C, fortran, Common Lisp, Scheme, Clojure, Java or J ?

Becoming Functional

Here is my first book review from the O'Reilly Reader Review Program. I picked Becoming Functional by Joshua Backfield because I thought it would be nice to compare it to the latest book I've read on Functional Programming in JavaScript by Michael Fogus (accidentally also published by O'Reilly).

A step by step introduction to some basic concepts of functional programming like higher order functions (passing a function to a method), pure functions (for the same input always return the same output without side effects), immutable variables, recursion and lazy evaluation.

The first chapters are in Java but starting from chapter 4 the book switches to Groovy and the latest chapters are in Scala. The Java code is not very complicated and is probably understandable for anyone with knowledge of statically typed object-oriented programming languages. Note that all Java samples use Java 7 which is rather verbose for functional programming which is one of the reasons the book switches to Groovy and Scala. The other reason is off course is that these languages support functional concepts out of the box.

Content wise the books starts with an introduction to functional programming then makes the transition from first-class functions to higher order functions and pure functions. Next are immutable variables and recursion, with a nice introduction to tail recursion. Followed by small side steps to laziness and statements. To conclude with pattern matching functional object-oriented programming. The last chapter is surprisingly as it first gives advice on transitioning into functional programming then talks about new design patterns and ends with a full implementation of a simplistic database in Scala.

It's not a bad book in the sense that it teaches incorrect material but it's not a very good book neither. It's a book written for Java developers that have never heard of the basic concepts of functional programming but personally I prefer a different approach to achieve this goal. Instead of refactoring code from the imperative style into the functional style and along the way introducing concepts, I think it's faster and easier to learn functional programming by clearly explaining and showing the concepts directly. Only then you should elaborate on how to migrate from the original code to the functional style and explain why this is a good thing to do. Although it's a rather short book (an estimated 140 pages) it is not very dense but it gives semi-real world example based on requirements from a fictional company XXY. Only the most common functional programming concepts are introduced.

My advice is to buy this book if you're a Java developer who has no experience in functional programming.

Otherwise I would skip this book and instead read:
Or take a look at my page of recommendations.