2017-09-17

regex_rez: Tcl bindings for RE2

regex_rez


For now, though, I only need the basics, so I have implemented just the simplest operations: fullmatch, partialmatch, replace, and globalreplace.
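
As a rough illustration of what those four operations mean, here is a sketch that uses only Tcl's built-in `regexp` and `regsub` as stand-ins. These are not regex_rez calls, and the sample text and pattern are made up for the example; the exact regex_rez signatures for replace and globalreplace are not shown in this post, so they are left out.

```tcl
# Built-in stand-ins for the four operations; these are NOT regex_rez calls.
# The sample text and pattern are hypothetical.
set text "tel: 555-1234 or 555-9876"
set pattern {\d{3}-\d{4}}

# "Full match": the pattern must cover the whole string, so anchor it.
set full [regexp "^(?:$pattern)\$" $text]

# "Partial match": the pattern may match anywhere in the string.
set partial [regexp $pattern $text]

# "Replace": substitute only the first match.
set first [regsub $pattern $text "XXX-XXXX"]

# "Global replace": substitute every match.
set all [regsub -all $pattern $text "XXX-XXXX"]

puts "full=$full partial=$partial"
puts $first
puts $all
```

The anchored form fails here because the text contains more than the phone number, while the unanchored form succeeds; that is the difference between a full and a partial match.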


Wikipedia:
RE2 (software)


The first test program uses Tcl's built-in regular expressions:
puts -nonewline "Please input a number: "
flush stdout
gets stdin number
if {$number <= 0 || $number >= 10} {
   puts "The range is 1 - 9."
   exit
}

set max [expr {pow(10, $number)}]
set result [list]

puts "\nNow get the result:"
puts "==========="
puts "Start"
puts "==========="
puts [time {
set re {1.*1|2.*2|3.*3|4.*4|5.*5|6.*6|7.*7|8.*8|9.*9|0.*0}
for {set i 1} {$i < $max} {incr i} {
    if {[regexp $re $i] != 1} {
        lappend result $i
    }
}
} 1]
puts "\n"
puts [join $result ", "]
puts "\n==========="
puts "End"
puts "==========="
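
The alternation in the pattern matches any number in which some digit occurs at least twice, so the loop collects the numbers whose digits are all distinct. As a small sanity check, the sketch below compares that pattern with a back-reference version on the numbers below 1000. The proc name is hypothetical, and the back-reference form only works with the built-in engine, since RE2 does not support back-references.

```tcl
# The pattern used in the test program.
set re {1.*1|2.*2|3.*3|4.*4|5.*5|6.*6|7.*7|8.*8|9.*9|0.*0}

# Hypothetical helper (not from the original post): returns 1 when some
# digit appears at least twice in $n, using a back-reference.
proc hasRepeatedDigit {n} {
    return [regexp {(\d).*\1} $n]
}

set distinct [list]
set mismatches 0
for {set i 1} {$i < 1000} {incr i} {
    set viaAlternation [regexp $re $i]
    # Count any number on which the two patterns disagree.
    if {$viaAlternation != [hasRepeatedDigit $i]} {
        incr mismatches
    }
    # Keep the numbers that the alternation pattern does not match.
    if {!$viaAlternation} {
        lappend distinct $i
    }
}
puts "distinct-digit numbers below 1000: [llength $distinct]"
puts "mismatches: $mismatches"
```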

The second test program uses RE2:

package require regex_rez

puts -nonewline "Please input a number: "
flush stdout
gets stdin number
if {$number <= 0 || $number >= 10} {
   puts "The range is 1 - 9."
   exit
}

set max [expr {pow(10, $number)}]
set result [list]

puts "\nNow get the result:"
puts "==========="
puts "Start"
puts "==========="
puts [time {
set re [regex::re2 create {1.*1|2.*2|3.*3|4.*4|5.*5|6.*6|7.*7|8.*8|9.*9|0.*0}]
for {set i 1} {$i < $max} {incr i} {
    if {[$re partialmatch $i] != 1} {
        lappend result $i
    }
}
} 1]
puts "\n"
puts [join $result ", "]
$re close
puts "\n==========="
puts "End"
puts "==========="

(* Updated the second test program to add $re close, so that the object is closed properly.)

Built-in engine, for inputs 1 through 9:
475 microseconds per iteration
1029 microseconds per iteration
6884 microseconds per iteration
36152 microseconds per iteration
315566 microseconds per iteration
3134428 microseconds per iteration
32075931 microseconds per iteration
325211857 microseconds per iteration
2147483648 microseconds per iteration

RE2, for inputs 1 through 9:
325 microseconds per iteration
649 microseconds per iteration
7262 microseconds per iteration
48204 microseconds per iteration
347111 microseconds per iteration
2727082 microseconds per iteration
21171756 microseconds per iteration
173788033 microseconds per iteration
1628503061 microseconds per iteration


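This is not a careful comparison, but in this test the built-in regular expressions hold up well as long as the number of values tested stays below about a million (for inputs 3 through 5 they are even slightly faster than RE2). Once the count reaches a million or more, Tcl's built-in regular expressions look clearly slower.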
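
To put numbers on that comparison, the following sketch divides each built-in timing by the matching RE2 timing. The figures are copied from the two lists above; nothing is re-measured, and the variable names are only for illustration.

```tcl
# Timings copied from the two lists above, in microseconds, for inputs 1..9.
set builtin {475 1029 6884 36152 315566 3134428 32075931 325211857 2147483648}
set re2     {325 649 7262 48204 347111 2727082 21171756 173788033 1628503061}

set ratios [list]
set n 0
foreach b $builtin r $re2 {
    incr n
    # Force floating-point division; a ratio above 1 means RE2 was faster.
    set ratio [expr {double($b) / $r}]
    lappend ratios $ratio
    puts [format "input %d: built-in / RE2 = %.2f" $n $ratio]
}
```

A ratio below 1 means the built-in engine was faster for that input, and a ratio above 1 means RE2 was faster.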

An interesting problem.
