Implementing map-reduce in F#

Introduction

MapReduce is a software paradigm popularised by Google in which we take a set of tuples (key-value pairs), transform (map) them into an intermediate set of key-value pairs, and then perform some aggregation (reduce) operation on the intermediate values to obtain a result set. This is a useful way to express a problem because it yields an obvious way to “divide and conquer” the computation in a way that lends itself to parallel/distributed computing, thus providing a fairly simple way to perform computations on extremely large data sets.

It can be quite difficult to grok at first, so I decided to try implementing one of the examples from the MongoDB documentation in F# (if interested, see shell example 2). In this example, we have a list of people and the types of pet each of them has. We wish to calculate the total number of each animal.

The Code

Again, F# proves to be a remarkably succinct language to express problems, in this case the built in syntactic sugar for tuples is a godsend!

UPDATE (25-May-2010) – Controlflow helpfully suggested that I could make my original code somewhat neater by using pattern matching to decompose tuples. I’ve updated the code below with these improvements.

#light

// Simple example of map-reduce  in F#
// Counts the total numbers of each animal

// Map function for our problem domain
let mapfunc (k,v) =
    v |> Seq.map (fun(pet) -> (pet, 1))

// Reduce function for our problem domain
let reducefunc (k,(vs:seq<int>)) =
    let count = vs |> Seq.sum
    k, Seq.ofList([count])

// Performs map-reduce operation on a given set of input tuples
let mapreduce map reduce (inputs:seq<_*_>) =
    let intermediates = inputs |> Seq.map map |> Seq.concat
    let groupings = intermediates |> Seq.groupBy fst |> Seq.map (fun(x,y) -> x, Seq.map snd y)
    let results = groupings |> Seq.map reduce
    results

// Run the example...
let alice = ("Alice",["Dog";"Cat"])
let bob = ("Bob",["Cat"])
let charlie = ("Charlie",["Mouse"; "Cat"; "Dog"])
let dennis = ("Dennis",[])

let people = [alice;bob;charlie;dennis]

let results = people |> mapreduce mapfunc reducefunc

for result in results do
    let animal = fst result
    let count = ((snd result) |> Seq.toArray).[0]
    printfn "%s : %s" animal (count.ToString())

printfn "Press any key to exit."

System.Console.ReadKey() |> ignore

This yields the expected results:

Dog : 2

Cat : 3

Mouse : 1

Exercise for the reader

Parallelise this implementation (for a single machine this should be trivial by using the Parallel LINQ integration provided in the F# Powerpack).

Advertisements

6 thoughts on “Implementing map-reduce in F#

  1. What is this?
    Seq.map (fun(input) -> map (fst(input), snd(input)))

    F# can decompose tuples for you like this:
    Seq.map (fun (x,y) -> map (x, y))

    But there is absolutely no need for doing this, why no just to pass map itself as mapping function:
    Seq.map map

    Take a look at:
    Seq.map (fun (x) -> (fst(x), snd(x) |> Seq.map(fun(s) -> snd(s))))

    And equal code:
    Seq.map (fun (x,y) -> x, Seq.map snd y))

    F# is not just “C# with powerful type inference”, why not using functions as first-class values and declarative tuples decomposing?

    • Thanks for your feedback. I didn’t realise F# could decompose tuples like that, very neat! As you guessed I’m a C# developer so I’m still learning the best ways of doing things in a more functional language.

  2. Pingback: Dew Drop – May 24, 2010 | Alvin Ashcraft's Morning Dew

  3. Pingback: Rick Minerich's Development Wonderland : F# Discoveries This Week 05/28/2010

  4. I suggest to use available methods for sequences without transforming them to lists/arrays:

    let reducefunc (k,(vs:seq)) =
    let count = vs |> Seq.sum
    k, Seq.ofList([count])

    is simply
    let reducefunc (k,(vs:seq)) = k, vs |> Seq.sum |> Seq.singleton

    Further you can use (snd result) |> Seq.head to extract the count from a result tuple instead of creating an array and getting the first element out of it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s