2016-10-09

Testing Jubatus's outlier detection feature (jubaanomaly)

Jubatus

References

I want to detect unauthorized logins to Facebook with Jubatus (1) - Qiita
Anomaly Tutorial (Python) — Jubatus
Data Conversion — Jubatus

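Sample program

What it does

Generates test data consisting of user names and source IP addresses.
※Occasionally, the source IP address is that of a hypothetical cracker.

Starts and stops jubaanomaly.

Prints the detection result only for records that come from a hypothetical cracker's IP.

The training behavior depends on the command-line argument.

With "string1"

Trains on a single string value, "username:IP address".

With "string2"

Trains on two string values, "username" and "IP address".

With "ip_to_num"

Trains on "username" as a string value and "IP address" as a numeric value.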

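Config file (anomaly.json)

For weighting in num_rules, type is set to str, for the reason quoted below.

The reference URL states the following:

"num" Uses the given number as the weight as-is.
"str" Treats the given number as a string. Use this for data where the magnitude of the number itself carries no meaning, such as IDs. The weight is 1.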

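ignore_kth_same_point is enabled, because without it the score becomes inf.
The reference URL states the following:

Prevents the score from becoming inf by limiting the number of duplicate data points that can be registered to nearest_neighbor_num - 1. This parameter is optional; its default value is false (disabled). (Boolean)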
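The inf can be reproduced with a textbook LOF calculation. The sketch below computes only the local reachability density (lrd), the term LOF divides by, assuming exact brute-force k nearest neighbours (Jubatus itself uses the LSH approximation configured under "method", so this is not its implementation). When k+1 or more identical points are registered, each copy's k nearest neighbours are all copies at distance 0, so the mean reachability distance is 0 and the density becomes infinite. The helper name lrd_all and the sample points are hypothetical; math.dist needs Python 3.8+.

```python
import math


def lrd_all(points, k):
    """Local reachability density of every point (brute-force k-NN, textbook LOF)."""
    n = len(points)
    # indices of the k nearest neighbours of each point (excluding the point itself)
    nbrs = [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))[:k]
            for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [math.dist(points[i], points[nbrs[i][-1]]) for i in range(n)]
    dens = []
    for i in range(n):
        # reachability distance of i from neighbour j: max(k-distance of j, d(i, j))
        reach = [max(k_dist[j], math.dist(points[i], points[j])) for j in nbrs[i]]
        mean = sum(reach) / k
        # all neighbours are identical copies -> mean reachability 0 -> density inf
        dens.append(math.inf if mean == 0 else 1.0 / mean)
    return dens


k = 3
base = [(0.0, 0.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
print(lrd_all(base + [(1.0, 1.0)] * (k + 1), k))  # the four copies get inf
print(lrd_all(base + [(1.0, 1.0)] * (k - 1), k))  # every density is finite
```

Keeping the number of identical points below k+1, which is what capping duplicates at nearest_neighbor_num - 1 achieves, leaves every density finite in this example.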

{
 "method" : "lof",
 "parameter" : {
  "nearest_neighbor_num" : 10,
  "reverse_nearest_neighbor_num" : 30,
  "method" : "euclid_lsh",
  "ignore_kth_same_point" : true,
  "parameter" : {
   "hash_num" : 8,
   "table_num" : 16,
   "probe_num" : 64,
   "bin_width" : 10,
   "seed" : 1234
  }
 },

 "converter" : {
  "string_filter_types": {},
  "string_filter_rules": [],
  "num_filter_types": {},
  "num_filter_rules": [],
  "string_types": {},
  "string_rules": [{"key":"*", "type":"str", "global_weight" : "bin", "sample_weight" : "bin"}],
  "num_types": {},
  "num_rules": [{"key" : "*", "type" : "str"}]
 }
}
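In the config above, num_rules uses type "str". To see why that matters for the ip_to_num test, here is a minimal Python sketch, assuming features are compared by Euclidean distance (as with euclid_lsh). The feature keys ("ip", "ip=...") and helper names are hypothetical and do not reproduce Jubatus's internal key format; the point is only that with "num" numerically close IPs end up close together, whereas with "str" every distinct IP is equally far from every other.

```python
import ipaddress
import math


def ip_to_int(ip):
    # "10.0.0.1" -> 167772161
    return int(ipaddress.IPv4Address(ip))


def to_features(value, num_type):
    """Hypothetical conversion of one numeric datum entry named "ip"."""
    if num_type == "num":
        # "num": the number itself becomes the weight of a single feature
        return {"ip": float(value)}
    if num_type == "str":
        # "str": the number is treated as a string -> one feature per distinct value, weight 1
        return {"ip=" + str(value): 1.0}
    raise ValueError("unknown type: " + num_type)


def euclid(a, b):
    """Euclidean distance between two sparse feature vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(key, 0.0) - b.get(key, 0.0)) ** 2 for key in keys))


normal1, normal2, cracker = (ip_to_int(ip) for ip in ("10.0.0.1", "10.0.0.2", "183.60.122.126"))
for t in ("num", "str"):
    print(t,
          euclid(to_features(normal1, t), to_features(normal2, t)),
          euclid(to_features(normal1, t), to_features(cracker, t)))
```

With "num", 10.0.0.1 and 10.0.0.2 are 1.0 apart while the cracker IP is about 2.9 billion away; with "str", both distances are sqrt(2).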

Test program (anomaly_test.py)

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import random
import signal
import sys
import time

from jubatus.anomaly import client
from jubatus.common import Datum


TNAMES = ["string1", "string2", "ip_to_num"]
NUM_OF_USER = 1000
NUM_OF_DATA = 10000
HUSEI = 1000  # on average, 1 in HUSEI records uses a cracker IP


# IPs of servers that made many unauthorized login attempts against my own server
CRACKERS = ["183.60.122.126", "188.140.127.155", "58.186.158.62", "91.224.160.184", "79.169.110.246", "46.186.241.89"]


def make_users():
    return ["user" + str(i) for i in range(NUM_OF_USER)]


def make_ip():
    # With probability 1/HUSEI pick a cracker IP, otherwise a normal internal IP
    if random.randint(1, HUSEI) == HUSEI:
        return random.choice(CRACKERS)
    return "10.0.0." + str(random.randint(1, 254))


def make_data(users):
    return [[random.choice(users), make_ip()] for _ in range(NUM_OF_DATA)]


def do_exit(sig, stack):
    print('You pressed Ctrl+C.')
    print('Stop running the job.')
    sys.exit(0)


def ip2int(ip):
    # "a.b.c.d" -> 32-bit integer (a list, so indexing works on Python 2 and 3)
    o = [int(x) for x in ip.split('.')]
    return (16777216 * o[0]) + (65536 * o[1]) + (256 * o[2]) + o[3]


def make_datum(tname, ent):
    datum = Datum()
    if tname == "string1":
        # one string value: "username:IP"
        datum.add_string("string", ent[0] + ":" + ent[1])
    elif tname == "string2":
        # two string values: username and IP
        datum.add_string("user", ent[0])
        datum.add_string("src_ip", ent[1])
    elif tname == "ip_to_num":
        # username as a string value, IP as a numeric value
        datum.add_string("user", ent[0])
        datum.add_number("ip", ip2int(ent[1]))
    return datum


def exec_test(tname, data):
    proc_per = 10
    step = len(data) // proc_per  # report progress every 10%
    cur = 0
    stime = time.time()
    # create the client once, not once per record
    anom = client.Anomaly("127.0.0.1", 9199, tname)

    for cnt, ent in enumerate(data, 1):
        ret = anom.add(make_datum(tname, ent))

        # print the result only for records from cracker IPs
        if ent[1] in CRACKERS:
            print(ret, ent)

        if step and cnt % step == 0:
            cur += proc_per
            etime = int(time.time() - stime)
            print("===>" + str(cur) + "% completed (elapse: " + str(etime) + " sec)")


def op_srv(op):
    if op[0] == "start":
        com = "jubaanomaly -f " + op[1] + " > /dev/null 2>&1 &"
    else:
        com = "pkill jubaanomaly > /dev/null 2>&1"
    if os.system(com):
        print("Error: jubaanomaly " + op[0] + " failed")
        sys.exit(1)


def get_args():
    if len(sys.argv) != 3:
        print("Error: Invalid args. Usage: " + sys.argv[0] + " CONFIG {" + "|".join(TNAMES) + "}")
        sys.exit(1)

    config, tname = sys.argv[1], sys.argv[2]

    if not os.path.isfile(config):
        print("Error: " + config + " does not exist")
        sys.exit(1)
    if tname not in TNAMES:
        print("Error: " + tname + " is invalid test name")
        sys.exit(1)
    return config, tname


def main():
    signal.signal(signal.SIGINT, do_exit)
    config, tname = get_args()

    users = make_users()
    print("make users complete")
    data = make_data(users)
    print("make data complete")

    op_srv(["start", config])
    time.sleep(5)  # give jubaanomaly time to start listening
    print("Test Started")
    exec_test(tname, data)
    op_srv(["stop"])
    print("Test Finished")


if __name__ == '__main__':
    main()

How to run & results

# ./anomaly_test.py ./anomaly.json "string1"
make users complete
make data complete
Test Started
(id_with_score{id: 34, score: 1.00283694267}, ['user0', '58.186.158.62'])
(id_with_score{id: 223, score: 1.00143611431}, ['user80', '183.60.122.126'])
===>10% completed (elapse: 0 sec)
(id_with_score{id: 290, score: 0.999702095985}, ['user61', '91.224.160.184'])
(id_with_score{id: 294, score: 0.999072134495}, ['user13', '79.169.110.246'])
(id_with_score{id: 343, score: 0.99412637949}, ['user50', '58.186.158.62'])
(id_with_score{id: 405, score: 1.00214076042}, ['user12', '58.186.158.62'])
(id_with_score{id: 472, score: 1.0037945509}, ['user90', '183.60.122.126'])
(id_with_score{id: 486, score: 1.00199890137}, ['user19', '91.224.160.184'])
===>20% completed (elapse: 2 sec)
(id_with_score{id: 570, score: 1.00471949577}, ['user94', '58.186.158.62'])
(id_with_score{id: 673, score: 0.996484994888}, ['user90', '188.140.127.155'])
(id_with_score{id: 709, score: 1.00101566315}, ['user0', '183.60.122.126'])
===>30% completed (elapse: 5 sec)
(id_with_score{id: 852, score: 1.01610195637}, ['user35', '183.60.122.126'])
(id_with_score{id: 977, score: 0.999998986721}, ['user19', '46.186.241.89'])
===>40% completed (elapse: 10 sec)
(id_with_score{id: 1053, score: 1.01017296314}, ['user99', '58.186.158.62'])
(id_with_score{id: 1184, score: 0.998991131783}, ['user57', '46.186.241.89'])
(id_with_score{id: 1229, score: 0.997941493988}, ['user96', '183.60.122.126'])
===>50% completed (elapse: 15 sec)
===>60% completed (elapse: 22 sec)
===>70% completed (elapse: 29 sec)
(id_with_score{id: 1783, score: 1.00462818146}, ['user1', '46.186.241.89'])
===>80% completed (elapse: 37 sec)
(id_with_score{id: 2013, score: 1.00843286514}, ['user64', '79.169.110.246'])
(id_with_score{id: 2018, score: 1.00108575821}, ['user4', '58.186.158.62'])
(id_with_score{id: 2053, score: 1.00647413731}, ['user72', '188.140.127.155'])
(id_with_score{id: 2227, score: 1.00535583496}, ['user3', '91.224.160.184'])
(id_with_score{id: 2231, score: 1.0007673502}, ['user16', '46.186.241.89'])
===>90% completed (elapse: 46 sec)
(id_with_score{id: 2353, score: 0.995516657829}, ['user97', '58.186.158.62'])
(id_with_score{id: 2421, score: 0.9998447299}, ['user94', '188.140.127.155'])
(id_with_score{id: 2475, score: 0.992462217808}, ['user87', '183.60.122.126'])
===>100% completed (elapse: 57 sec)
Test Finished

# ./anomaly_test.py ./anomaly.json "string2"
make users complete
make data complete
Test Started
(id_with_score{id: 38, score: 1.00619769096}, ['user4', '91.224.160.184'])
(id_with_score{id: 131, score: 0.9985871315}, ['user85', '188.140.127.155'])
(id_with_score{id: 176, score: 1.06174123287}, ['user15', '91.224.160.184'])
===>10% completed (elapse: 0 sec)
(id_with_score{id: 308, score: 1.01850771904}, ['user79', '46.186.241.89'])
(id_with_score{id: 442, score: 1.02138268948}, ['user53', '58.186.158.62'])
===>20% completed (elapse: 2 sec)
(id_with_score{id: 728, score: 1.00389277935}, ['user87', '46.186.241.89'])
===>30% completed (elapse: 3 sec)
(id_with_score{id: 771, score: 0.993154287338}, ['user29', '183.60.122.126'])
(id_with_score{id: 822, score: 1.00076854229}, ['user96', '79.169.110.246'])
(id_with_score{id: 854, score: 1.01049315929}, ['user94', '183.60.122.126'])
(id_with_score{id: 880, score: 1.02746069431}, ['user20', '91.224.160.184'])
(id_with_score{id: 903, score: 1.03314602375}, ['user41', '183.60.122.126'])
(id_with_score{id: 910, score: 1.00415050983}, ['user67', '79.169.110.246'])
===>40% completed (elapse: 6 sec)
(id_with_score{id: 1006, score: 1.01191151142}, ['user84', '91.224.160.184'])
(id_with_score{id: 1175, score: 1.01555621624}, ['user93', '91.224.160.184'])
(id_with_score{id: 1243, score: 0.986846208572}, ['user96', '79.169.110.246'])
===>50% completed (elapse: 9 sec)
(id_with_score{id: 1290, score: 0.999491155148}, ['user84', '91.224.160.184'])
(id_with_score{id: 1363, score: 1.0839984417}, ['user95', '183.60.122.126'])
(id_with_score{id: 1435, score: 0.988385975361}, ['user71', '79.169.110.246'])
(id_with_score{id: 1451, score: 1.07448995113}, ['user50', '46.186.241.89'])
===>60% completed (elapse: 13 sec)
(id_with_score{id: 1506, score: 1.01763594151}, ['user22', '46.186.241.89'])
(id_with_score{id: 1659, score: 1.02214670181}, ['user27', '58.186.158.62'])
===>70% completed (elapse: 17 sec)
(id_with_score{id: 1919, score: 1.01828491688}, ['user27', '79.169.110.246'])
(id_with_score{id: 1944, score: 1.01462376118}, ['user30', '183.60.122.126'])
===>80% completed (elapse: 22 sec)
(id_with_score{id: 2030, score: 1.01424443722}, ['user30', '188.140.127.155'])
(id_with_score{id: 2203, score: 1.05265760422}, ['user54', '91.224.160.184'])
===>90% completed (elapse: 28 sec)
(id_with_score{id: 2276, score: 0.991128385067}, ['user79', '46.186.241.89'])
(id_with_score{id: 2305, score: 0.999580144882}, ['user87', '188.140.127.155'])
(id_with_score{id: 2308, score: 1.02546048164}, ['user23', '183.60.122.126'])
(id_with_score{id: 2403, score: 0.996041715145}, ['user55', '183.60.122.126'])
===>100% completed (elapse: 34 sec)
Test Finished

# ./anomaly_test.py ./anomaly.json "ip_to_num"
make users complete
make data complete
Test Started
(id_with_score{id: 11, score: 1.40382528305}, ['user79', '183.60.122.126'])
(id_with_score{id: 96, score: 1.65164899826}, ['user30', '79.169.110.246'])
===>10% completed (elapse: 0 sec)
(id_with_score{id: 302, score: 1.38489890099}, ['user77', '183.60.122.126'])
(id_with_score{id: 304, score: 1.52720057964}, ['user16', '183.60.122.126'])
(id_with_score{id: 334, score: 1.37992143631}, ['user58', '58.186.158.62'])
(id_with_score{id: 339, score: 1.53620314598}, ['user7', '58.186.158.62'])
(id_with_score{id: 375, score: 1.511734128}, ['user27', '91.224.160.184'])
===>20% completed (elapse: 2 sec)
(id_with_score{id: 524, score: 1.37163031101}, ['user22', '183.60.122.126'])
===>30% completed (elapse: 3 sec)
(id_with_score{id: 880, score: 1.40898621082}, ['user93', '58.186.158.62'])
(id_with_score{id: 891, score: 1.32244873047}, ['user49', '79.169.110.246'])
===>40% completed (elapse: 5 sec)
(id_with_score{id: 1217, score: 1.43259418011}, ['user20', '183.60.122.126'])
(id_with_score{id: 1242, score: 1.44561052322}, ['user98', '91.224.160.184'])
===>50% completed (elapse: 6 sec)
(id_with_score{id: 1300, score: 1.69272983074}, ['user14', '46.186.241.89'])
(id_with_score{id: 1382, score: 1.36963653564}, ['user78', '79.169.110.246'])
(id_with_score{id: 1426, score: 1.37384569645}, ['user77', '46.186.241.89'])
===>60% completed (elapse: 7 sec)
(id_with_score{id: 1547, score: 1.37469255924}, ['user32', '188.140.127.155'])
(id_with_score{id: 1746, score: 1.34071433544}, ['user80', '46.186.241.89'])
(id_with_score{id: 1749, score: 1.764798522}, ['user18', '91.224.160.184'])
===>70% completed (elapse: 7 sec)
(id_with_score{id: 1827, score: 1.44498622417}, ['user76', '183.60.122.126'])
(id_with_score{id: 1925, score: 1.32021319866}, ['user17', '58.186.158.62'])
===>80% completed (elapse: 8 sec)
(id_with_score{id: 2104, score: 1.12577271461}, ['user68', '188.140.127.155'])
(id_with_score{id: 2128, score: 1.35998618603}, ['user53', '58.186.158.62'])
(id_with_score{id: 2192, score: 1.13407361507}, ['user31', '79.169.110.246'])
(id_with_score{id: 2211, score: 1.47604465485}, ['user29', '188.140.127.155'])
===>90% completed (elapse: 9 sec)
(id_with_score{id: 2292, score: 1.1355766058}, ['user61', '91.224.160.184'])
(id_with_score{id: 2305, score: 1.38899683952}, ['user81', '58.186.158.62'])
(id_with_score{id: 2349, score: 1.21521854401}, ['user15', '58.186.158.62'])
(id_with_score{id: 2385, score: 1.15831208229}, ['user61', '183.60.122.126'])
(id_with_score{id: 2408, score: 1.55122351646}, ['user12', '46.186.241.89'])
(id_with_score{id: 2482, score: 1.24194133282}, ['user17', '46.186.241.89'])
===>100% completed (elapse: 9 sec)
Test Finished

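What I learned

Differences in string data seem to get only weak weight? (In the string1 and string2 runs the cracker records scored roughly 0.99 to 1.08, while in the ip_to_num run they scored roughly 1.13 to 1.76.)
The more data has been learned, the longer training takes? (In the string1 run, each 10% took about 2 sec at first and about 11 sec at the end.)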

2016-10-08

Studying the Jubatus tutorial (data conversion: strings)

Jubatus

Reference URLs

Anomaly Tutorial — Jubatus
Data Conversion — Jubatus

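What is data conversion?

In general, machine learning cannot directly handle unstructured data such as text.

Therefore, features must be extracted from such data to obtain feature vectors. This is data conversion.

A feature vector is key-value data whose keys are strings and whose values are numbers.

This data conversion makes it possible to handle natural language, images, and audio in a uniform way.

Jubatus has this conversion feature built in, and it can be flexibly customized through the config file.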

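The data conversion flow

① The client generates a datum (the raw input for learning) from the training data and passes it to the server.

② The server applies filters to the datum.

③ The server applies feature extraction (weighting) to the filtered datum to obtain a feature vector.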

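Filtering on the server

The datum provided by the client is key:value data, where each key is a string.
A value can be one of the following three types:
1) string
2) number
3) binary data

Filtering takes a datum as input, generates new datum entries according to the specified rules, and adds them to the data to be learned.

Example) In the reference URL's example, HTML tags are removed from the datum entry (message) generated by the client, and the result is added as a datum entry under a different key (message-detagged).

Filtering differs depending on the value type. This article covers only string data. (Because I'm bad at math...)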

This is configured with the following parameters in the config file.

string_filter_types

Defines conversion rules.

Example)

"string_filter_types": {
  "detag": { "method": "regexp", "pattern": "<[^>]*>", "replace": "" }
},

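What this does:
① Defines a filter named "detag" that removes HTML tags
② "method" specifies "regexp", a regular-expression filter
③ "pattern" specifies the pattern to match
④ "replace" specifies what to replace matching text with

"detag" is a user-defined name.
Other options are available for "method", "pattern", and "replace"; see the reference URL for details.
This only defines the filter (conversion rule); which datum it is applied to is not defined here.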

string_filter_rules

Applies a rule defined in string_filter_types to a datum and adds a new datum entry to be learned.

Example)

"string_filter_rules": [
  { "key": "message", "type": "detag", "suffix": "-detagged" }
]

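What this does:
① "key" specifies the datum entry to filter.
※"message" is the key the client program defines when it generates the datum from the training data.
② "type" specifies the conversion rule "detag" defined in string_filter_types.
③ "suffix" specifies the suffix appended to form the key of the newly generated datum entry.

As a result, the following datum entry is created and added to the data to be learned:
key: message-detagged
value: the data with tags removed

Although not shown in the example above, the "except" parameter can be used to specify keys to exclude from matching.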
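The filter step can be mimicked in a few lines of Python. This is a minimal sketch, not Jubatus's implementation: it supports only the "regexp" method, matches "key" exactly (Jubatus rules also accept key patterns), ignores "except", and the function name apply_string_filters is hypothetical. The two dictionaries mirror the JSON from the config examples.

```python
import re

# mirrors "string_filter_types": filter definitions only
STRING_FILTER_TYPES = {
    "detag": {"method": "regexp", "pattern": "<[^>]*>", "replace": ""},
}

# mirrors "string_filter_rules": which datum entry each filter is applied to
STRING_FILTER_RULES = [
    {"key": "message", "type": "detag", "suffix": "-detagged"},
]


def apply_string_filters(datum, types, rules):
    """Return a copy of datum with one new entry added per matching rule."""
    result = dict(datum)
    for rule in rules:
        if rule["key"] not in datum:  # simplification: exact key match only
            continue
        spec = types[rule["type"]]
        if spec["method"] != "regexp":
            raise NotImplementedError(spec["method"])  # only regexp is sketched
        filtered = re.sub(spec["pattern"], spec["replace"], datum[rule["key"]])
        # the new key is the original key plus the suffix
        result[rule["key"] + rule["suffix"]] = filtered
    return result


datum = {"message": "<p>Hello <b>Jubatus</b></p>"}
print(apply_string_filters(datum, STRING_FILTER_TYPES, STRING_FILTER_RULES))
```

The original "message" entry is kept, and "message-detagged" is added next to it, which matches the description above.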

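Feature extraction (weighting) on the server

This processing is applied to the datum to obtain a feature vector.

This is configured with the following parameters in the config file.

string_types

Defines weighting rules.

Example)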

"string_types": {
  "bigram":  { "method": "ngram", "char_num": "2" }
},

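What this does:
① Defines a weighting rule named "bigram"
② "method" specifies the weighting algorithm ("ngram")
  ngram uses N adjacent characters as features. (Such features are called N-gram features.)
③ "char_num" is an option for the algorithm specified in ②: the N in "N adjacent characters".

"bigram" is a user-defined name.
Other algorithms, with their own parameters, can be specified for "method".

See the reference URL for details.
This only defines the weighting rule; which datum it is applied to is not defined here.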
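As a rough illustration of what an "ngram" rule with "char_num": "2" extracts, here is a minimal Python sketch; char_ngrams is a hypothetical helper, and Jubatus's own extractor may differ in details such as strings shorter than N.

```python
def char_ngrams(text, n):
    """Return every run of n adjacent characters in text (character N-grams)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("jubatus", 2))   # ['ju', 'ub', 'ba', 'at', 'tu', 'us']
print(char_ngrams("徳川家康", 2))    # ['徳川', '川家', '家康']
```

Because it works character by character, it needs no word segmentation, which is convenient for Japanese text.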

string_rules

Applies the rules defined in string_types to the datum to compute the weights.

Example)

"string_rules": [
  { "key": "message",          "type": "bigram", "sample_weight": "tf",  "global_weight": "bin" },
  { "key": "message-detagged", "type": "space",  "sample_weight": "bin", "global_weight": "bin" }
]

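What this does:
① "key" specifies the datum entry to apply the rule to.
② "type" specifies the algorithm defined in string_types (here "bigram").
③ "sample_weight" determines the weight within a single datum.
  With "bin", the weight is always 1.
  With "tf", the weight is the number of times the feature appears in the datum's string.
④ "global_weight" specifies a global weight computed from all the data seen so far.
  With "bin", the weight is always 1.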

That's all for today!
From feature extraction onward, I don't really get it yet...
Let's look at how it behaves with a sample program.
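As a small first step, the difference between "tf" and "bin" for sample_weight can be sketched as follows. This assumes (1) the "space" type splits the string on whitespace, and (2) the final weight is sample_weight multiplied by global_weight, which is 1 for "bin". The function names are hypothetical and only "bin" is implemented for global_weight.

```python
from collections import Counter


def sample_weights(tokens, mode):
    """Per-datum weights: "tf" = occurrence count, "bin" = always 1."""
    counts = Counter(tokens)
    if mode == "tf":
        return {t: float(c) for t, c in counts.items()}
    if mode == "bin":
        return {t: 1.0 for t in counts}
    raise ValueError("unknown sample_weight: " + mode)


def feature_vector(text, sample_weight, global_weight="bin"):
    tokens = text.split()  # assumption: behaves like the "space" type
    if global_weight != "bin":
        raise NotImplementedError("only global_weight \"bin\" is sketched")
    # a "bin" global weight is always 1, so the final weight equals the sample weight
    return {t: w * 1.0 for t, w in sample_weights(tokens, sample_weight).items()}


print(feature_vector("to be or not to be", "tf"))   # to and be count twice
print(feature_vector("to be or not to be", "bin"))  # every feature weighted 1
```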

2016-10-08

Jubatus tutorial notes

Jubatus

Reference URL

Tutorial — Jubatus

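Classifier tutorial

Uses jubaclassifier to classify input data.

What the sample program does

Given the given name of a historical shogun, it guesses the family name.
Example)
Input) 家康 (Ieyasu) → Output) 徳川 (Tokugawa)
Input) 尊氏 (Takauji) → Output) 足利 (Ashikaga)

Recommender tutorial

Uses jubarecommender to recommend similar data.
※jubarecommender can be used for things like product recommendations on e-commerce sites.

What the sample program does

Learns the batting records of professional baseball position players and recommends players of a similar type (record).
Example)
Input) 中田翔 (Sho Nakata) → Output) 井口資仁 (Tadahito Iguchi), 新井貴浩 (Takahiro Arai), 中村紀洋 (Norihiro Nakamura)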

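Anomaly tutorial

Uses jubaanomaly for outlier detection.
It uses LOF (Local Outlier Factor).
LOF detects outliers in N-dimensional space by checking how densely points lie around a point, compared with the density around its neighbors.
It can be used for fraud detection and failure detection.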
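To make the LOF description concrete, here is a minimal textbook LOF in pure Python (3.8+ for math.dist), using exact brute-force neighbours. Jubatus approximates neighbour search with LSH, so its scores will not match this sketch exactly. A score near 1 means a point is about as dense as its neighbours; a score well above 1 marks an outlier. The function name lof_scores and the sample points are hypothetical.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor of every point, using exact k nearest neighbours."""
    n = len(points)

    def d(i, j):
        return math.dist(points[i], points[j])

    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: d(i, j))[:k] for i in range(n)]
    k_dist = [d(i, nbrs[i][-1]) for i in range(n)]  # distance to the k-th neighbour

    lrd = []
    for i in range(n):
        # reachability distance of i from neighbour j: max(k-distance of j, d(i, j))
        mean_reach = sum(max(k_dist[j], d(i, j)) for j in nbrs[i]) / k
        lrd.append(math.inf if mean_reach == 0 else 1.0 / mean_reach)

    # LOF = average density of the neighbours / density of the point itself
    return [sum(lrd[j] for j in nbrs[i]) / (k * lrd[i]) for i in range(n)]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(lof_scores(points, k=2))  # four scores of 1.0 and a much larger one for (10, 10)
```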

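What the sample program does

Regression tutorial

jubaregression provides linear regression (Regression).
Linear regression estimates output data from input data.
It can be used for stock price prediction and power consumption forecasting.

What the sample program does

Estimates rent from rental property information (distance from the station, floor area, building age, etc.).

Graph tutorial

jubagraph provides graph mining (Graph).
Graph mining extracts things like central nodes and shortest paths from a given graph structure.
It can be used for social community analysis and network structure analysis.

What the sample program does

A program that estimates the shortest route on railway lines.
Example)
Input) 品川 (Shinagawa), お茶ノ水 (Ochanomizu) → Output) 品川 → 東京 → お茶ノ水 (Shinagawa → Tokyo → Ochanomizu, the shortest route)

※This is done by building a graph of how the Yamanote Line and the Chuo Line connect.

Stat tutorial

jubastat provides statistical analysis (Stat).
It analyzes time-series data over a configurable window and can be used for sensor monitoring and anomaly detection.

What the sample program does

Learns the diameter, weight, and price of oranges, apples, and melons, and computes statistics such as the sum and standard deviation of each parameter per fruit.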
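The "window" idea can be sketched in plain Python: keep only the most recent window_size values per key and compute statistics over them. This is an illustration under assumptions, not jubastat's implementation; the class name WindowStat, the key "orange_weight", and its values are hypothetical, and the standard deviation here is the population one (statistics.pstdev).

```python
import statistics
from collections import defaultdict, deque


class WindowStat:
    """Keeps the latest window_size values per key and computes statistics on them."""

    def __init__(self, window_size):
        # deque(maxlen=...) drops the oldest value automatically when it is full
        self.windows = defaultdict(lambda: deque(maxlen=window_size))

    def push(self, key, value):
        self.windows[key].append(value)

    def window_sum(self, key):
        return sum(self.windows[key])

    def window_stddev(self, key):
        return statistics.pstdev(list(self.windows[key]))


stat = WindowStat(window_size=3)
for w in (150.0, 160.0, 170.0, 180.0):  # hypothetical orange weights
    stat.push("orange_weight", w)
print(stat.window_sum("orange_weight"))     # 510.0: only the last 3 values count
print(stat.window_stddev("orange_weight"))
```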

2016-10-08

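Doing the Jubatus tutorial!

Jubatus

Reference URL

Tutorial — Jubatus

Overview of the tutorial program

・Classifies natural-language text.
・Uses News 20 as the evaluation data.
  Home Page for 20 Newsgroups Data Set
・News 20 is an evaluation data set for natural-language classification.
  80% is training data
  20% is test data
・News 20 is a collection of newsgroup posts: there are 20 groups, and many different people post messages to them.
・The tutorial program reads the training data, then reads messages as test data and guesses which group each message was posted to.
・In my environment, the accuracy was 71% on the 1st run, 74% on the 2nd, 75% on the 3rd, and 75% on the 4th.

Notes

・Uses jubaclassifier, which provides classification.
・Runs jubaclassifier on the Jubatus server with the config file as an argument.
・The main items in the config file are:
  method: specifies the algorithm, e.g. perceptron
  converter: specifies how input data is converted into feature vectors
  parameter: not specified this time
・The client sends in the training data and the test data and gets the test results.
・Training is done with the train method.
・Testing is done with the classify method.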
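For reference, the perceptron named as a "method" option fits in a few lines. This is a textbook multi-class perceptron over sparse {feature: value} dictionaries, with train/classify names chosen to mirror the client methods; it is not Jubatus's implementation, and the labels and features below are made up.

```python
from collections import defaultdict


class Perceptron:
    """Multi-class perceptron on sparse {feature: value} dictionaries."""

    def __init__(self):
        # one weight vector per label: label -> {feature: weight}
        self.w = defaultdict(lambda: defaultdict(float))

    def score(self, label, x):
        wl = self.w[label]
        return sum(wl.get(f, 0.0) * v for f, v in x.items())

    def classify(self, x):
        """Return the highest-scoring label, or None before any training."""
        if not self.w:
            return None
        return max(self.w, key=lambda label: self.score(label, x))

    def train(self, label, x):
        """Update the weights only when the current prediction is wrong."""
        pred = self.classify(x)
        if pred != label:
            for f, v in x.items():
                self.w[label][f] += v     # pull the correct label towards x
                if pred is not None:
                    self.w[pred][f] -= v  # push the wrong label away from x


# made-up training data: two "newsgroups" with bag-of-words features
data = [("sports", {"ball": 1.0, "goal": 1.0}),
        ("computers", {"cpu": 1.0, "ram": 1.0})]
p = Perceptron()
for _ in range(3):  # a few passes over the data
    for label, x in data:
        p.train(label, x)
print(p.classify({"goal": 1.0}))  # sports
print(p.classify({"cpu": 1.0}))   # computers
```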

2016-10-02

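How to add per-user source network restrictions to Postfix SMTP authentication

Use Postfix's policy service.

Reference URLs

Introducing SPF into POSTFIX (not wanting to be used as a stepping stone
Postfix SMTP Access Policy Delegation

What is a policy service?

A feature that calls your own program to implement fine-grained access control.
The program only has to return a relay decision based on the attributes Postfix passes when calling it (such as the SASL-authenticated user name).
→ The input is checked by Postfix before it reaches the program, so it feels easy and safe to use!
The Postfix package includes a sample program.
Since it doesn't touch the mail data itself, the risk of losing mail seems low (?).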

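How the policy service I implemented works

① The client issues the RCPT command.

② Postfix passes information like the following to the program on standard input.
Example)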

request=smtpd_access_policy
protocol_state=RCPT
protocol_name=SMTP
helo_name=some.domain.tld
queue_id=8045F2AB23
sender=foo@bar.tld
recipient=bar@foo.tld
client_address=1.2.3.4
client_name=another.domain.tld
instance=123.456.7
sasl_method=plain
sasl_username=you
sasl_sender=
ccert_subject=solaris9.porcupine.org
ccert_issuer=Wietse Venema
ccert_fingerprint=C2:9D:F4:87:71:73:73:D9:18:E7:C2:F3:C1:DA:6E:04
size=12345
[empty line]

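③ The program makes the relay decision using the following information from Postfix:

Source IP (client_address)
SMTP-authenticated user (sasl_username)

It queries the DB for the network the user is allowed to connect from and checks whether the source IP belongs to it.

④ The program writes the result to standard output as the following data.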

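To reject relaying:

action=reject ERROR MESSAGE
[empty line]

To not reject relaying:

action=dunno
[empty line]

After action=, you can specify any action that is valid in a Postfix access table.
Postfix manual - access(5)

dunno behaves the same as when nothing in the table matches.
With it, the restrictions listed after the policy check in smtpd_recipient_restrictions are still evaluated.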
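The request/response exchange can be exercised without Postfix or MySQL. This is a minimal sketch assuming Python 3: a hypothetical in-memory dictionary, ALLOWED_NETWORKS, mirroring the test data, stands in for the access table, and the reject text is arbitrary. It reads name=value lines up to the empty line and returns one action line followed by an empty line.

```python
import io
import ipaddress

# hypothetical stand-in for the "access" table: user -> allowed source network
ALLOWED_NETWORKS = {"test1": "127.0.0.1/32", "test2": "172.31.27.95/32"}


def read_request(stream):
    """Read name=value lines until the empty line that ends one policy request."""
    attrs = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            break
        name, sep, value = line.partition("=")  # split on the first "=" only
        if sep:
            attrs[name] = value
    return attrs


def policy_response(attrs, allowed=ALLOWED_NETWORKS):
    """Return the reply Postfix expects: one action line plus an empty line."""
    user = attrs.get("sasl_username", "").split("@")[0]
    client = attrs.get("client_address", "")
    if not user or not client:
        return "action=dunno\n\n"  # not SASL-authenticated: defer to later restrictions
    network = allowed.get(user)
    if network and ipaddress.ip_address(client) in ipaddress.ip_network(network):
        return "action=dunno\n\n"  # allowed network: continue to permit_sasl_authenticated
    return "action=reject Relay not allowed from this network\n\n"


req = io.StringIO("request=smtpd_access_policy\nclient_address=1.2.3.4\nsasl_username=test1\n\n")
print(repr(policy_response(read_request(req))))
```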

Prerequisite

For each user, store the source network allowed to connect in the DB.

Building the test environment

main.cf

smtpd_recipient_restrictions = check_policy_service unix:private/policy, permit_sasl_authenticated, reject_unauth_destination
smtpd_sasl_auth_enable = yes
smtpd_sasl_security_options = noanonymous
broken_sasl_auth_clients = yes
smtpd_sasl_local_domain = $myhostname
smtpd_sasl_path = smtpd

master.cf

policy unix - n n - - spawn user=nobody argv=/bin/python /usr/libexec/postfix/smtpd-policy-chk.py

/usr/libexec/postfix/smtpd-policy-chk.py

#!/usr/bin/python
# -*- coding: utf-8 -*-

import ipaddress
import sys
import syslog

import MySQLdb


def output_log(mes):
    syslog.openlog()
    syslog.syslog("DEBUG: " + mes)
    syslog.closelog()


def get_attr():
    """Read the name=value lines of one request until the empty line (or EOF)."""
    attr = dict()
    for line in iter(sys.stdin.readline, ""):
        ent = line.rstrip("\n")
        if not ent:
            output_log("Fin input")
            break
        if "=" in ent:
            # split on the first "=" only, so values that contain "=" stay intact
            k, v = ent.split("=", 1)
            attr[k] = v
            output_log("Input: " + ent)
        else:
            output_log("Invalid entry: " + ent)
    return attr


def chk_relay(client_address, sasl_username):
    connector = MySQLdb.connect(host="localhost", db="testdb", user="testuser", passwd="password", charset="utf8")
    try:
        cursor = connector.cursor()
        # placeholder instead of string concatenation, to prevent SQL injection
        cursor.execute("select network from access where user = %s", (sasl_username,))
        result = cursor.fetchall()
        cursor.close()
    finally:
        connector.close()

    # u"%s" yields a unicode string, which ipaddress requires on Python 2
    ip = ipaddress.ip_address(u"%s" % client_address)
    # relay if the source IP is in any network registered for the user
    for (network,) in result:
        if ip in ipaddress.ip_network(u"%s" % network):
            return "relay"
    return "no_relay"


def reply(action):
    # a policy response is one "action=..." line followed by an empty line
    sys.stdout.write("action=" + action + "\n\n")
    sys.stdout.flush()


def main():
    attr = get_attr()
    if "client_address" in attr and "sasl_username" in attr:
        client_address = attr["client_address"]
        sasl_username = attr["sasl_username"].split("@")[0]
        if client_address and sasl_username:
            if chk_relay(client_address, sasl_username) == "relay":
                output_log("Result 1")
                reply("dunno")
            else:
                output_log("Result 2")
                reply("reject Dameyo")
        else:
            # not SASL-authenticated: leave the decision to the later restrictions
            output_log("Result 3")
            reply("dunno")
    else:
        output_log("Result 4")
        reply("dunno")
    sys.exit(0)


if __name__ == '__main__':
    main()

Test data

mysql
> create database testdb;
> grant all on testdb.* to testuser@localhost;
> flush privileges;
> set password for testuser@localhost=password('password');
> use testdb
> create table access (user VARCHAR(32), network VARCHAR(64));
> insert into access (user, network) values("test1", "127.0.0.1/32");
> insert into access (user, network) values("test2", "172.31.27.95/32");
> create table user (user VARCHAR(32), password VARCHAR(64), maildir VARCHAR(64));
> insert into user (user, password) values("test1", "password");
> insert into user (user, password) values("test2", "password");
> select * from user;
+-------+----------+---------+
| user  | password | maildir |
+-------+----------+---------+
| test1 | password | NULL    |
| test2 | password | NULL    |
+-------+----------+---------+
2 rows in set (0.00 sec)

> select * from access;                                                                        
+-------+-----------------+
| user  | network         |
+-------+-----------------+
| test1 | 127.0.0.1/32    |
| test2 | 172.31.27.95/32 |
+-------+-----------------+
2 rows in set (0.00 sec)

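Results

Succeeding at SMTP authentication alone does not allow relaying.
The source network must also be registered in the DB.

Final notes

The program above is only meant for a quick behavior check and doesn't handle the finer details.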

2016-09-26

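Understanding the derivative of the sigmoid function

To get here, I first had to study derivatives and logarithms... I'm exhausted.

Reference URL

Differentiating the Sigmoid Function - A Laid-back Engineer's Diary

What is the sigmoid function?

Neural networks

The sigmoid function is often used as the function that converts the internal state of a cell into its output value. However large the absolute value of the input, the output stays within the range 0 to 1, so it can be said to respond in a way similar to a biological cell. For a negative input the output is below 0.5, and for a positive input it is above 0.5.

The derivative of the sigmoid function is used in neural networks (for example, when training them with backpropagation).

The sigmoid function formula

General form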

           1
f(x) = -----------
       1 + e^(-ax)

a is called the gain. Changing the gain changes the steepness of the curve.

Standard form

           1
f(x) = -----------
       1 + e^(-x)

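The gain is 1.

Derivative of the sigmoid function

Its distinctive feature is that the derivative contains the function itself.

The derivative formula

f'(x) = (1 - f(x))f(x)

Indeed, the function itself ( f(x) ) appears in its derivative.

Proving the derivative formula

Let's differentiate it by hand.

① Start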

           1
f(x) = -----------
       1 + e^(-x)

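② Rewrite it using the laws of exponents

f(x) = (1 + e^(-x))^(-1)

③ To make it easier to differentiate, rewrite it as a composite function.

1) Define the following:
u = 1 + e^(-x)

2) Rewriting the original expression in terms of u gives a composite function:
f(u) = u^(-1)

The derivative of a composite function is the product of the derivatives of the two functions (the chain rule: f'(x) = f'(u) * u').

④ Differentiate f(u) with respect to u.

Formula: (x^n)' = nx^(n-1)

f(u) = u^(-1)
→ Applying the formula to the right-hand side gives the derivative:

f'(u) = -1 * u^(-1-1) = -u^(-2)

⑤ Differentiate the newly defined function u with respect to x.

Use the following three formulas: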
a) (a^x)' = a^x * loge(a)
b) (e^x)' = e^x
c) (ax)' = a

u = 1 + e^(-x)

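・1 is a constant, so its derivative is zero.
・e is Napier's constant.

Here, too, we use the derivative of a composite function.
Treating -x as a function v gives the following composite function:

v = -x
u = 1 + e^v

Differentiating v with formula c:
(v)' = -1

Differentiating e^v with formula b:
(e^v)' = e^v

So the derivative of the composite function is:
(u)' = -1 * e^v
(u)' = -1 * e^(-x)
(u)' = -e^(-x)

⑥ Now differentiate the original composite function, f'(x) = f'(u) * u':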

f'(x) = -u^(-2) * -e^(-x)
f'(x) = u^(-2) * e^(-x)

Substitute u back:

f'(x) = (1 + e^(-x))^(-2) * e^(-x)

↓

             e^(-x)
f'(x) = ---------------
        (1 + e^(-x))^2

↓

          e^(-x)        1
f'(x) = ---------- * ----------
        1 + e^(-x)   1 + e^(-x)


Now rewrite the left-hand factor as follows:
  e^(-x)     1 + e^(-x)       1
---------- = ---------- - ----------
1 + e^(-x)   1 + e^(-x)   1 + e^(-x)


↓

         1 + e^(-x)       1              1
f'(x) =( ---------- - ---------- ) * ----------
         1 + e^(-x)   1 + e^(-x)     1 + e^(-x)

Here, substitute the original function:
           1
f(x) = -----------
       1 + e^(-x)

↓

Finally, we arrive at the following form!

f'(x) =(1 - f(x)) * f(x)
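The result can be checked numerically: the sketch below compares (1 - f(x)) f(x) with a central finite-difference approximation of the derivative at a few points. The helper names are hypothetical; with h = 1e-5 the two should agree to well within 1e-8.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    # the formula derived above: f'(x) = (1 - f(x)) * f(x)
    fx = sigmoid(x)
    return (1.0 - fx) * fx


def numerical_derivative(f, x, h=1e-5):
    # central difference: (f(x + h) - f(x - h)) / 2h
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(x, sigmoid_prime(x), numerical_derivative(sigmoid, x))
```

At x = 0, f(0) = 0.5, so f'(0) = 0.5 * 0.5 = 0.25, which is the steepest slope of the curve.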

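What I learned!

My brain just can't truly understand math! It's enough to memorize the formulas and put them to good use!
The derivative of the sigmoid function contains the original function!
Expanding equations in various ways reveals different forms and relationships. Maybe that's the fun of math?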

2016-09-25

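What is the natural logarithm?

References

[Logarithms] Index | Math for Adults Relearning It
[Sequences] The Arrival of the Natural Logarithm II: The Base of the Natural Logarithm | Math for Adults Relearning It

The logarithm used together with Napier's constant e.

What is Napier's constant in the first place?

・It is written as e.

・Like pi and the golden ratio, it is an irrational number.

It is expressed by the following formula: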

           1
lim  (1 + ---)^n
n→∞        n

The value of this expression is bounded above and converges; the limit is 2.7182818.... This is Napier's constant.
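The convergence can be observed directly: the sketch below evaluates (1 + 1/n)^n for growing n and compares it with math.e. The helper name is hypothetical; floating-point rounding limits how large n can usefully get.

```python
import math


def napier_approx(n):
    """(1 + 1/n)^n, which approaches e as n grows."""
    return (1 + 1 / n) ** n


for n in (1, 10, 100, 1000, 10**6):
    print(n, napier_approx(n), math.e - napier_approx(n))
```

The values increase with n (2.0 at n = 1) and the gap to math.e shrinks steadily.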

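・Its uses are hard to see at first, but it seems to be an important constant in mathematics. As I understand it, its merit is that it lets you handle infinitely many steps within a bounded range and is convenient for logarithmic calculations, which allows arbitrarily fine calculations. One concrete property: e^x is its own derivative, (e^x)' = e^x.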

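Now, what is the natural logarithm?

It is a logarithm whose base is Napier's constant.

→ For this reason, Napier's constant is also called the base of the natural logarithm.

The formula for the natural logarithm is as follows.

First, Napier's constant is expressed by the following formula...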

           1
lim  (1 + ---)^n = e
n→∞        n

If the 1 in 1/n is replaced by an arbitrary number r, the following holds:

           r
lim  (1 + ---)^n = e^r
n→∞        n

See the reference URL for the proof.

Writing the above in logarithmic form...

Let the right-hand side be x:

x = e^r
r = loge(x)

This is a logarithm whose base is Napier's constant (an irrational number).
→ This is what we call the natural logarithm.
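The two formulas can also be checked numerically. Using a large finite n as a stand-in for the limit, (1 + r/n)^n comes out close to e^r, and math.log, whose default base is e, turns that x back into approximately r. The helper name and r = 1.5 are arbitrary choices.

```python
import math


def exp_approx(r, n):
    """(1 + r/n)^n, which approaches e^r as n grows."""
    return (1 + r / n) ** n


r = 1.5
x = exp_approx(r, 10**7)
print(x, math.exp(r))  # x = e^r (approximately)
print(math.log(x))     # r = log_e(x) (approximately 1.5)
```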