Tuesday, March 29, 2016

Go Channels slow for bulk data

I've been writing some Go code recently for processing moderately large amounts of time-series data (millions of data points). The processing involves aggregating multiple time series by combining similar time periods, then performing simple operations across sets.

The pipelines post on the golang blog demonstrates how channels can be used for some really neat-looking (and potentially parallel) code, with very little work.

Without worrying too much about the implementations of Interpolate(), RollingAverage() and GetAllData(), I ended up with some code that followed this format:

type Value struct {
  Timestamp time.Time
  Value     float64
}

// Create the processing pipeline
input := make(chan *Value)
steadyData := Interpolate(input, 300) // Fix the interval between points to 5 minutes
output := RollingAverage(steadyData)  // Calculate rolling average

// Write all data to the pipeline from a separate goroutine, closing
// the channel when done so the pipeline can drain and output closes
for v := range GetAllData() {
    input <- v
  }
}()

// Fetch the results from the end
for v := range output {
  fmt.Println("Got an output value", v.Value)
}

Interpolate(), RollingAverage() and GetAllData() each start a goroutine, so all the processing stages can run in parallel.

It seems relatively elegant and makes it very easy to insert other steps into the pipeline or change the order of functions. It's generally what I'd regard as pretty code.

Unfortunately, it's SLOW. Extremely slow. I ended up throwing away all the pipeline code and just passing around []*Value everywhere, taking the hit of creating and copying new slices, and the potential loss of parallelism from using only a single core.

Even when the number crunching in each step is relatively complex, the gain from using more cores is dwarfed by the overhead of channel communication.

To demonstrate the performance difference, this is the code I threw together, which you can run to see for yourself:

package main

import (
  "fmt"
  "math/rand"
  "time"
)

type Value struct {
  Timestamp time.Time
  Value     float64
}

func averageOfChan(in chan *Value) float64 {
  var sum float64
  var count int
  for v := range in {
    sum += v.Value
    count++
  }
  return sum / float64(count)
}

func averageOfSlice(in []*Value) float64 {
  var sum float64
  var count int
  for _, v := range in {
    sum += v.Value
    count++
  }
  return sum / float64(count)
}

func main() {
  // Create a large array of random numbers
  input := make([]*Value, 1e7)
  for i := 0; i < 1e7; i++ {
    input[i] = &Value{time.Unix(int64(i), 0), rand.Float64()}
  }

  func() {
    st := time.Now()
    in := make(chan *Value, 1e4)
    go func() {
      defer close(in)
      for _, v := range input {
        in <- v
      }
    }()
    averageOfChan(in)
    fmt.Println("Channel version took", time.Since(st))
  }()

  func() {
    st := time.Now()
    averageOfSlice(input)
    fmt.Println("Slice version took", time.Since(st))
  }()
}

Running this on my home PC, I get this:
Channel version took 1.14759465s
Slice version took 24.839719ms

Yes, it's 46x faster to pass around a slice in this contrived (but representative) example. I did attempt to optimise this by changing the size of the input channel, and 1e4 is about the fastest channel size I found.

In short: channels are neat. pipelines are neat. channels are slow.

I'd be happy to hear if I'm doing something wrong or there is a better (faster) way.

Thursday, October 28, 2010

PHP XSRF Protection

This week I was in a training course which included how XSRF attacks work and how they can be relatively easily defended against.
I did a quick Google search for libraries which provide XSRF protection in a really simple way and came up empty. So I wrote one.

This class provides a simple method of protecting form submission from common Cross Site Request Forgery (XSRF) attacks.

Protection is accomplished by adding a randomised hidden field to forms, which is checked when the form is processed. If the hidden field doesn't exist or has been modified, the request should be rejected.

The method used is stateless and does not require any session management to be used. This allows the request to be easily handled by a load balanced cluster of frontends that don't share session information.

Protection against replay attacks can also be provided using this same method, but it requires session-local storage, which makes the approach stateful and requires distributed session management if multiple web servers are being used.

If you're interested, check out http://sites.dparrish.com/php-xsrf-protection for the code.

Sunday, December 13, 2009

IPv6 on Karmic Koala

Ubuntu versions prior to Karmic Koala use tspc to create an IPv6 tunnel. In Karmic this has been replaced with gw6c, and the configuration file is similar but not exactly the same. For reference, here is my /etc/gw6c/gw6c.conf which uses the Aarnet tunnel broker in Australia.

This is mostly the sample configuration file, with the changed bits in italics.
userid=dparrish
passwd=********
server=broker.aarnet.net.au
auth_method=any
host_type=router
if_tunnel_v6v4=sit1
if_tunnel_v6udpv4=tun0
if_tunnel_v4v6=
prefixlen=64
if_prefix=eth0
auto_retry_connect=yes
retry_delay=30
retry_delay_max=300
keepalive=yes
keepalive_interval=30
tunnel_mode=v6anyv4
client_v4=auto
client_v6=auto
template=linux
proxy_client=no
broker_list=/var/lib/gw6c/tsp-broker-list.txt
last_server=/var/lib/gw6c/tsp-last-server.txt
always_use_same_server=no
log_stderr=0
log_file=2
log_filename=/var/log/gw6c/gw6c.log
log_rotation=yes
log_rotation_size=32
log_rotation_delete=no
syslog_facility=USER


After you have saved this file, run service gw6c restart to start the daemon. You can then check for your new IPv6 address using ip addr ls tun0.

If you want to enable firewalling on this tunnel (HIGHLY RECOMMENDED), you can use the following two files:
  • linux.sh - Copy to /usr/share/gw6c/template/linux.sh and chmod 755
  • firewall.sh - Copy to /etc/gw6c/firewall.sh and chmod 755


You should also edit firewall.sh to allow/disallow what you want. By default, this script allows ssh in to any machines behind your router, and any packets out from internal machines.

EDIT: If gw6c fails to start and doesn't give any useful information at all, try copying /usr/share/gw6c/template/linux.sh to /var/lib/gw6c/template/linux.sh.

Friday, June 12, 2009

What's going on



This is a quick test post to let people know what's going on in my life.

I'm working for Google as a Site Reliability Engineer. I recently bought a new R1 (the motorbike).

Thursday, February 19, 2009

Dbackup 1.2.0

It's been quite a long time since the last release, and during that time I had a nasty crash that took out my server, including its git repository. However, backups came to the rescue and dbackup is available again. Announcing version 1.2.0, which includes:

  • Updated documentation for each application

  • Changed the protocol to include flags on a list

  • Added the dry_run flag to restore

  • Some bug fixes and performance enhancements


Along with the new version comes a move to Google Code, sitting side-by-side with libcli and rollout. You can find the code now at http://code.google.com/p/dbackup.

The latest release is available as a tarball only, at http://dbackup.googlecode.com/files/dbackup-1.2.0.tar.gz.

Thursday, February 5, 2009

I LEGO N.Y.

This guy has put together some amazing lego creations that remind him of New York. No wait... They aren't amazing, but they are quite funny.

http://niemann.blogs.nytimes.com/2009/02/02/i-lego-ny/?em


Not to scale.

Saturday, April 26, 2008

Rollout Moved

There’s been enough actual interest in Rollout for me to move it to a community site. I chose Google Code because I work there, and it’s much faster than Sourceforge.

The new URL for Rollout is: http://code.google.com/p/rollout.

Unfortunately they don’t support git, so I have to deal once again with Subversion.

The new checkout instructions are:
svn checkout http://rollout.googlecode.com/svn/trunk/ rollout

The manual is online there, and an issue tracker, so submit bugs!