Fuzzy collect module

This module collects fuzzy hashes from an isolated spamtrap and publishes them to the local fuzzy storages using the replication protocol. This is depicted in the following figure:

Operation mode

To enable collection, set up an Rspamd instance on the spamtrap that runs in collection-only mode. Here is a minimal configuration for such an instance:

# The common sections are skipped; only the relevant configuration is shown
worker "fuzzy" {
  bind_socket = "*:11335";
  count = 1;
  # Important to enable this
  collection_only = true;
  # This is needed to verify signed collection requests (discussed below)
  collection_signkey = "utenidt7xdkys5ite89w4gntrdgbsd9gp9rzjjtzzzwx693cei8y";
  # This is needed to encrypt communication between collector and this storage
  collection_keypair = {
    pubkey = "ffg1m6rqi3doy7qggqbr4qjwxw6ahy56nr4zs47doz3nn6euhsty";
    privkey = "y6qjkr4htunjwm7i9cxzzu413tnobe8cjmgmo916i1hdy4yh1s4y";
    id = "eg6ccqr91bt7bkfspufk5kgrejr8sriypkixo5a5xje83nhd58jnjnusr9ppcjtkgyqc7x1fyqpqkazxk6wnnf9buuxbguspyme7trn";
    encoding = "base32";
    algorithm = "curve25519";
    type = "kex";
  }
  # Allow local updates
  allow_update = ["localhost"];
  # Collection should be performed once per minute
  sync = 1m;

}

# Needed for `rspamc fuzzy_add`
worker "controller" {
   bind_socket = "localhost:11334";
   secure_ips = "127.0.0.1";
}
# Needed to send hashes to local storage
fuzzy_check {
    min_bytes = 100;
    rule "main" {
        timeout = 1s;
        retransmits = 7;
        servers = "localhost:11335";
        symbol = "FUZZY_UNKNOWN";
        mime_types = "*";
        max_score = 20.0;
        read_only = no;
        skip_unknown = yes;
        algorithm = "mumhash";
        fuzzy_map = {
            FUZZY_DENIED {
                max_score = 20.0;
                flag = 1;
            }
            FUZZY_PROB {
                max_score = 10.0;
                flag = 2;
            }
            FUZZY_WHITE {
                max_score = 2.0;
                flag = 3;
            }
        }
        learn_condition =<<EOD
return function(task)
  return true
end
EOD
    }
}

Then, set up a local Rspamd instance in your network and configure this plugin on it. This instance then queries the remote fuzzy storage to gather hashes. The connection uses plain HTTP with HTTPCrypt encryption. To protect the collecting fuzzy storage from unauthorized clients and replay attacks, each data request carries an additional signature, calculated using the following procedure:

  1. The fuzzy storage generates a random cookie (128 bytes).
  2. The fuzzy_collect module requests this cookie from the /cookie path; the connection is encrypted using collection_keypair and a fresh random key generated by the fuzzy_collect plugin.
  3. The fuzzy_collect module digitally signs the cookie using the ed25519 algorithm.
  4. The digital signature is placed in an HTTP header named Signature.
  5. The fuzzy_collect module makes another (encrypted) request to the /data path, including the Signature header.
  6. The fuzzy storage verifies the signature using collection_signkey, sends the (encrypted) content of the update queue, and generates a new cookie to protect against replays.
  7. The fuzzy_collect module receives the updates and sends them to all defined mirrors using the fuzzy mirroring protocol (again HTTP + HTTPCrypt).

After this procedure, all local storages are updated from the collection storage.
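
The cookie exchange above can be modelled in a few lines of Python. This is only an illustrative sketch, not Rspamd's implementation: the class and function names (`CollectionStorage`, `collect`, `sign`) are hypothetical, the HTTPCrypt encryption layer is left out entirely, and HMAC-SHA256 from the standard library stands in for the ed25519 signature so that the example runs without third-party packages. What it does show is why rotating the cookie after each successful /data request makes a captured Signature header useless for a replay.

```python
import hashlib
import hmac
import secrets

COOKIE_LEN = 128  # bytes, as in step 1 of the procedure


def sign(key: bytes, cookie: bytes) -> bytes:
    # ASSUMPTION: HMAC-SHA256 is a stand-in for the ed25519 signature
    # used by the real protocol; it keeps the sketch stdlib-only.
    return hmac.new(key, cookie, hashlib.sha256).digest()


class CollectionStorage:
    """Toy model of a fuzzy storage running in collection-only mode."""

    def __init__(self, key: bytes):
        self._key = key
        self._cookie = secrets.token_bytes(COOKIE_LEN)  # step 1
        self.queue = []  # pending updates (hashes) to hand out

    def get_cookie(self) -> bytes:
        # Models the request to /cookie (step 2), without encryption.
        return self._cookie

    def get_data(self, signature: bytes) -> list:
        # Models the request to /data (steps 5 and 6).
        expected = sign(self._key, self._cookie)
        if not hmac.compare_digest(signature, expected):
            raise PermissionError("signature does not match current cookie")
        updates, self.queue = self.queue, []
        # A new cookie invalidates the signature that was just used.
        self._cookie = secrets.token_bytes(COOKIE_LEN)
        return updates


def collect(storage: CollectionStorage, key: bytes, mirrors: list) -> bytes:
    """Run one collection round and return the signature that was sent."""
    cookie = storage.get_cookie()          # step 2
    signature = sign(key, cookie)          # steps 3 and 4
    updates = storage.get_data(signature)  # steps 5 and 6
    for mirror in mirrors:                 # step 7
        mirror.extend(updates)
    return signature
```

In this model, calling `collect` once drains the storage queue into every mirror list, and passing the returned signature to `get_data` a second time raises `PermissionError` because the cookie has already been replaced.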

Here is a sample module configuration with comments about each relevant option:

fuzzy_collect {
  # Remote storage in collection mode
  collect_server = "example.com:11335";
  # Generated by `rspamadm keypair -s -u`
  sign_keypair = {
      pubkey = "utenidt7xdkys5ite89w4gntrdgbsd9gp9rzjjtzzzwx693cei8y";
      privkey = "qag97momihhozgxgxszzfwwyeaqj837ugg9jj1ywruw5xxru3oa8dtrk8n59gwyczmdtq6jiprnjgcnc86p46jqu1nxxxj9h9u3okxy";
      id = "xdhpeeyr9ubiy1wkpbzzr31jidy9dkpy5r5edbi1k9xpzpiwuyj3ye9wht7jtxifto1t8ip5fhppse9yeme1ysrx4iq19sqfp6etp4n";
      encoding = "base32";
      algorithm = "curve25519";
      type = "sign";
  }
  # The pubkey part of `collection_keypair` on the collection storage
  collect_pubkey = "ffg1m6rqi3doy7qggqbr4qjwxw6ahy56nr4zs47doz3nn6euhsty";
  # Local mirrors
  mirrors = {
    collection = {
      server = "127.0.0.1:11335";
      # This keypair is used to authenticate updates
      # Generated by `rspamadm keypair -u`
      keypair = {
          pubkey = "tabmk61uimctbrhudoqi9xc7pi8rudgk464semper3dfj18irgqy";
          privkey = "ohu7pnpn9ozhdcs3bocpzroo9r9s51z1j35o5troufkdgmd94fty";
          id = "q37sihtpn9xq5wpuooooqnc9fhr3paf7s3na4yofmqs6c3xkzw99iwk9dpbdfxamfi5htumxuqdnhe7pa51o6pguyoqii8xx54sygod";
          encoding = "base32";
          algorithm = "curve25519";
          type = "kex";
      }
      # Public key of the local fuzzy storage (its `sync_keypair` pubkey), used to encrypt data
      pubkey = "iecwytmuxddau9pawxutmgn184jihhc1u7thdou9dpmhidysftxy";
    }
  }
}

Here is the relevant local storage configuration to allow updates:

worker "fuzzy" {
  bind_socket = "127.0.0.1:11335";
  masters = ["127.0.0.1"];
  # Copied from `fuzzy_collect` (pubkey of the mirror's `keypair`)
  master_key = "tabmk61uimctbrhudoqi9xc7pi8rudgk464semper3dfj18irgqy";
  # Generated by `rspamadm keypair -u`
  sync_keypair = {
    # Copied to `fuzzy_collect` (the mirror's `pubkey` option)
    pubkey = "iecwytmuxddau9pawxutmgn184jihhc1u7thdou9dpmhidysftxy";
    privkey = "a1ixysgokojzue9nb1e6z1dmf9i145jhsyt5g1giyqp19hibs3uy";
  }
  # Other options are skipped
}
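
Since four public keys have to be copied between the three configurations above, a small cross-check can help catch copy-and-paste mistakes. The following Python sketch is not part of Rspamd: the dictionary layout and the `check_keys` helper are hypothetical, and the values are copied by hand from the examples in this document rather than parsed from real configuration files.

```python
# Public keys hand-copied from the three example configurations above.
# The flat dictionary layout is an assumption made for this sketch only.
collection_storage = {
    "collection_signkey": "utenidt7xdkys5ite89w4gntrdgbsd9gp9rzjjtzzzwx693cei8y",
    "collection_keypair.pubkey": "ffg1m6rqi3doy7qggqbr4qjwxw6ahy56nr4zs47doz3nn6euhsty",
}
fuzzy_collect = {
    "sign_keypair.pubkey": "utenidt7xdkys5ite89w4gntrdgbsd9gp9rzjjtzzzwx693cei8y",
    "collect_pubkey": "ffg1m6rqi3doy7qggqbr4qjwxw6ahy56nr4zs47doz3nn6euhsty",
    "mirror.keypair.pubkey": "tabmk61uimctbrhudoqi9xc7pi8rudgk464semper3dfj18irgqy",
    "mirror.pubkey": "iecwytmuxddau9pawxutmgn184jihhc1u7thdou9dpmhidysftxy",
}
local_storage = {
    "master_key": "tabmk61uimctbrhudoqi9xc7pi8rudgk464semper3dfj18irgqy",
    "sync_keypair.pubkey": "iecwytmuxddau9pawxutmgn184jihhc1u7thdou9dpmhidysftxy",
}


def check_keys(collection, collect, local):
    """Return descriptions of key pairs that do not match (empty if all match)."""
    pairs = [
        ("collection_signkey vs sign_keypair.pubkey",
         collection["collection_signkey"], collect["sign_keypair.pubkey"]),
        ("collection_keypair.pubkey vs collect_pubkey",
         collection["collection_keypair.pubkey"], collect["collect_pubkey"]),
        ("master_key vs mirror keypair.pubkey",
         local["master_key"], collect["mirror.keypair.pubkey"]),
        ("sync_keypair.pubkey vs mirror pubkey",
         local["sync_keypair.pubkey"], collect["mirror.pubkey"]),
    ]
    return [name for name, left, right in pairs if left != right]


if __name__ == "__main__":
    problems = check_keys(collection_storage, fuzzy_collect, local_storage)
    print("all keys consistent" if not problems else problems)
```

Each entry in `pairs` corresponds to one "copied from" or "obtained from" comment in the sample configurations, so the list doubles as a summary of which key goes where.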