[83879] trunk/dports/java

hum at macports.org hum at macports.org
Mon Sep 12 07:46:04 PDT 2011


Revision: 83879
          http://trac.macports.org/changeset/83879
Author:   hum at macports.org
Date:     2011-09-12 07:46:02 -0700 (Mon, 12 Sep 2011)
Log Message:
-----------
New port: lucene-gosen 1.1.1 - a Japanese morphological analyzer for Apache Lucene/Solr

Added Paths:
-----------
    trunk/dports/java/lucene-gosen/
    trunk/dports/java/lucene-gosen/Portfile
    trunk/dports/java/lucene-gosen/files/
    trunk/dports/java/lucene-gosen/files/mapping-japanese.txt
    trunk/dports/java/lucene-gosen/files/stoptags_ja.txt
    trunk/dports/java/lucene-gosen/files/stopwords_ja.txt

Added: trunk/dports/java/lucene-gosen/Portfile
===================================================================
--- trunk/dports/java/lucene-gosen/Portfile	                        (rev 0)
+++ trunk/dports/java/lucene-gosen/Portfile	2011-09-12 14:46:02 UTC (rev 83879)
@@ -0,0 +1,51 @@
+# -*- coding: utf-8; mode: tcl; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- vim:fenc=utf-8:ft=tcl:et:sw=4:ts=4:sts=4
+# $Id$
+
+PortSystem          1.0
+
+name                lucene-gosen
+version             1.1.1
+categories          java textproc japanese
+platforms           darwin
+maintainers         hum openmaintainer
+license             LGPL-2.1
+
+homepage            http://code.google.com/p/${name}/
+description         a Japanese morphological analyzer for Apache Lucene/Solr
+long_description    ${name} is ${description}.
+
+master_sites        http://${name}.googlecode.com/files/
+distfiles           ${distname}-ipadic.jar
+checksums           sha1    2637bf3c4bdb4cb0efd095f4d423e5efb37e4543 \
+                    rmd160  3feacc9c2ac81e2c15f3b8d6a377a85dd6fb6d01
+
+variant naist description {Use naist chasen dictionary instead of ipadic} {
+    distfiles       ${distname}-naist-chasen.jar
+    checksums       sha1    bf5cde340da36e0a1fc506c60358eca7c23347fa \
+                    rmd160  b76436ceb14bd92cc0ad449734ba8633bb15c02c
+}
+
+extract {}
+
+use_configure       no
+supported_archs     noarch
+
+build {}
+
+destroot {
+    # install the jar file.
+    xinstall -d ${destroot}${prefix}/share/java/${name}/lib
+    copy ${distpath}/${distfiles} \
+                ${destroot}${prefix}/share/java/${name}/lib
+    # install the config files.
+    xinstall -d ${destroot}${prefix}/share/java/${name}/conf
+    xinstall -m 644 -W ${filespath} \
+        mapping-japanese.txt \
+        stoptags_ja.txt \
+        stopwords_ja.txt \
+        ${destroot}${prefix}/share/java/${name}/conf
+}
+
+livecheck.type      regex
+livecheck.url       http://code.google.com/p/${name}/downloads/list
+livecheck.regex     ${name}\-(\[0-9.\]+)


Property changes on: trunk/dports/java/lucene-gosen/Portfile
___________________________________________________________________
Added: svn:keywords
   + Id
Added: svn:eol-style
   + native

Added: trunk/dports/java/lucene-gosen/files/mapping-japanese.txt
===================================================================
--- trunk/dports/java/lucene-gosen/files/mapping-japanese.txt	                        (rev 0)
+++ trunk/dports/java/lucene-gosen/files/mapping-japanese.txt	2011-09-12 14:46:02 UTC (rev 83879)
@@ -0,0 +1,185 @@
+#
+# Fullwidth-Halfwidth Latin mappings
+# Halfwidth-FullWidth Kana mappings
+#
+"\uff66\uff9e" => "\u30fa"
+"\uff73\uff9e" => "\u30f4"
+"\uff76\uff9e" => "\u30ac"
+"\uff77\uff9e" => "\u30ae"
+"\uff78\uff9e" => "\u30b0"
+"\uff79\uff9e" => "\u30b2"
+"\uff7a\uff9e" => "\u30b4"
+"\uff7b\uff9e" => "\u30b6"
+"\uff7c\uff9e" => "\u30b8"
+"\uff7d\uff9e" => "\u30ba"
+"\uff7e\uff9e" => "\u30bc"
+"\uff7f\uff9e" => "\u30be"
+"\uff80\uff9e" => "\u30c0"
+"\uff81\uff9e" => "\u30c2"
+"\uff82\uff9e" => "\u30c5"
+"\uff83\uff9e" => "\u30c7"
+"\uff84\uff9e" => "\u30c9"
+"\uff8a\uff9e" => "\u30d0"
+"\uff8a\uff9f" => "\u30d1"
+"\uff8b\uff9e" => "\u30d3"
+"\uff8b\uff9f" => "\u30d4"
+"\uff8c\uff9e" => "\u30d6"
+"\uff8c\uff9f" => "\u30d7"
+"\uff8d\uff9e" => "\u30d9"
+"\uff8d\uff9f" => "\u30da"
+"\uff8e\uff9e" => "\u30dc"
+"\uff8e\uff9f" => "\u30dd"
+"\uff9c\uff9e" => "\u30f7"
+"\uff01" => "\u0021"
+"\uff02" => "\u0022"
+"\uff03" => "\u0023"
+"\uff04" => "\u0024"
+"\uff05" => "\u0025"
+"\uff06" => "\u0026"
+"\uff07" => "\u0027"
+"\uff08" => "\u0028"
+"\uff09" => "\u0029"
+"\uff0a" => "\u002a"
+"\uff0b" => "\u002b"
+"\uff0c" => "\u002c"
+"\uff0d" => "\u002d"
+"\uff0e" => "\u002e"
+"\uff0f" => "\u002f"
+"\uff10" => "\u0030"
+"\uff11" => "\u0031"
+"\uff12" => "\u0032"
+"\uff13" => "\u0033"
+"\uff14" => "\u0034"
+"\uff15" => "\u0035"
+"\uff16" => "\u0036"
+"\uff17" => "\u0037"
+"\uff18" => "\u0038"
+"\uff19" => "\u0039"
+"\uff1a" => "\u003a"
+"\uff1b" => "\u003b"
+"\uff1c" => "\u003c"
+"\uff1d" => "\u003d"
+"\uff1e" => "\u003e"
+"\uff1f" => "\u003f"
+"\uff20" => "\u0040"
+"\uff21" => "\u0041"
+"\uff22" => "\u0042"
+"\uff23" => "\u0043"
+"\uff24" => "\u0044"
+"\uff25" => "\u0045"
+"\uff26" => "\u0046"
+"\uff27" => "\u0047"
+"\uff28" => "\u0048"
+"\uff29" => "\u0049"
+"\uff2a" => "\u004a"
+"\uff2b" => "\u004b"
+"\uff2c" => "\u004c"
+"\uff2d" => "\u004d"
+"\uff2e" => "\u004e"
+"\uff2f" => "\u004f"
+"\uff30" => "\u0050"
+"\uff31" => "\u0051"
+"\uff32" => "\u0052"
+"\uff33" => "\u0053"
+"\uff34" => "\u0054"
+"\uff35" => "\u0055"
+"\uff36" => "\u0056"
+"\uff37" => "\u0057"
+"\uff38" => "\u0058"
+"\uff39" => "\u0059"
+"\uff3a" => "\u005a"
+"\uff3b" => "\u005b"
+"\uff3c" => "\u005c"
+"\uff3d" => "\u005d"
+"\uff3e" => "\u005e"
+"\uff3f" => "\u005f"
+"\uff40" => "\u0060"
+"\uff41" => "\u0061"
+"\uff42" => "\u0062"
+"\uff43" => "\u0063"
+"\uff44" => "\u0064"
+"\uff45" => "\u0065"
+"\uff46" => "\u0066"
+"\uff47" => "\u0067"
+"\uff48" => "\u0068"
+"\uff49" => "\u0069"
+"\uff4a" => "\u006a"
+"\uff4b" => "\u006b"
+"\uff4c" => "\u006c"
+"\uff4d" => "\u006d"
+"\uff4e" => "\u006e"
+"\uff4f" => "\u006f"
+"\uff50" => "\u0070"
+"\uff51" => "\u0071"
+"\uff52" => "\u0072"
+"\uff53" => "\u0073"
+"\uff54" => "\u0074"
+"\uff55" => "\u0075"
+"\uff56" => "\u0076"
+"\uff57" => "\u0077"
+"\uff58" => "\u0078"
+"\uff59" => "\u0079"
+"\uff5a" => "\u007a"
+"\uff5b" => "\u007b"
+"\uff5c" => "\u007c"
+"\uff5d" => "\u007d"
+"\uff5e" => "\u007e"
+"\uff65" => "\u30fb"
+"\uff66" => "\u30f2"
+"\uff67" => "\u30a1"
+"\uff68" => "\u30a3"
+"\uff69" => "\u30a5"
+"\uff6a" => "\u30a7"
+"\uff6b" => "\u30a9"
+"\uff6c" => "\u30e3"
+"\uff6d" => "\u30e5"
+"\uff6e" => "\u30e7"
+"\uff6f" => "\u30c3"
+"\uff70" => "\u30fc"
+"\uff71" => "\u30a2"
+"\uff72" => "\u30a4"
+"\uff73" => "\u30a6"
+"\uff74" => "\u30a8"
+"\uff75" => "\u30aa"
+"\uff76" => "\u30ab"
+"\uff77" => "\u30ad"
+"\uff78" => "\u30af"
+"\uff79" => "\u30b1"
+"\uff7a" => "\u30b3"
+"\uff7b" => "\u30b5"
+"\uff7c" => "\u30b7"
+"\uff7d" => "\u30b9"
+"\uff7e" => "\u30bb"
+"\uff7f" => "\u30bd"
+"\uff80" => "\u30bf"
+"\uff81" => "\u30c1"
+"\uff82" => "\u30c4"
+"\uff83" => "\u30c6"
+"\uff84" => "\u30c8"
+"\uff85" => "\u30ca"
+"\uff86" => "\u30cb"
+"\uff87" => "\u30cc"
+"\uff88" => "\u30cd"
+"\uff89" => "\u30ce"
+"\uff8a" => "\u30cf"
+"\uff8b" => "\u30d2"
+"\uff8c" => "\u30d5"
+"\uff8d" => "\u30d8"
+"\uff8e" => "\u30db"
+"\uff8f" => "\u30de"
+"\uff90" => "\u30df"
+"\uff91" => "\u30e0"
+"\uff92" => "\u30e1"
+"\uff93" => "\u30e2"
+"\uff94" => "\u30e4"
+"\uff95" => "\u30e6"
+"\uff96" => "\u30e8"
+"\uff97" => "\u30e9"
+"\uff98" => "\u30ea"
+"\uff99" => "\u30eb"
+"\uff9a" => "\u30ec"
+"\uff9b" => "\u30ed"
+"\uff9c" => "\u30ef"
+"\uff9d" => "\u30f3"
+"\uff9e" => "\u3099"
+"\uff9f" => "\u309a"

Added: trunk/dports/java/lucene-gosen/files/stoptags_ja.txt
===================================================================
--- trunk/dports/java/lucene-gosen/files/stoptags_ja.txt	                        (rev 0)
+++ trunk/dports/java/lucene-gosen/files/stoptags_ja.txt	2011-09-12 14:46:02 UTC (rev 83879)
@@ -0,0 +1,410 @@
+# set of default stop tags:
+# uncomment a part of speech to treat those words as stopwords.
+# the entire tagset is provided here for convenience.
+#
+#####
+#  noun: unclassified nouns
+#名詞
+#
+#  noun-common: Common nouns or nouns where the sub-classification is undefined
+#名詞-一般
+#
+#  noun-proper: Proper nouns where the sub-classification is undefined 
+#名詞-固有名詞
+#
+#  noun-proper-misc: miscellaneous proper nouns
+#名詞-固有名詞-一般
+#
+#  noun-proper-person: Personal names where the sub-classification is undefined
+#名詞-固有名詞-人名
+#
+#  noun-proper-person-misc: names that cannot be divided into surname and 
+#  given name; foreign names; names where the surname or given name is unknown.
+#  e.g. お市の方
+#名詞-固有名詞-人名-一般
+#
+#  noun-proper-person-surname: Mainly Japanese surnames.
+#  e.g. 山田
+#名詞-固有名詞-人名-姓
+#
+#  noun-proper-person-given_name: Mainly Japanese given names.
+#  e.g. 太郎
+#名詞-固有名詞-人名-名
+#
+#  noun-proper-organization: Names representing organizations.
+#  e.g. 通産省, NHK
+#名詞-固有名詞-組織
+#
+#  noun-proper-place: Place names where the sub-classification is undefined
+#名詞-固有名詞-地域
+#
+#  noun-proper-place-misc: Place names excluding countries.
+#  e.g. アジア, バルセロナ, 京都
+#名詞-固有名詞-地域-一般
+#
+#  noun-proper-place-country: Country names. 
+#  e.g. 日本, オーストラリア
+#名詞-固有名詞-地域-国
+#
+#  noun-pronoun: Pronouns where the sub-classification is undefined
+#名詞-代名詞
+#
+#  noun-pronoun-misc: miscellaneous pronouns: 
+#  e.g. それ, ここ, あいつ, あなた, あちこち, いくつ, どこか, なに, みなさん, みんな, わたくし, われわれ
+#名詞-代名詞-一般
+#
+#  noun-pronoun-contraction: Spoken language contraction made by combining a 
+#  pronoun and the particle 'wa'.
+#  e.g. ありゃ, こりゃ, こりゃあ, そりゃ, そりゃあ 
+#名詞-代名詞-縮約
+#
+#  noun-adverbial: Temporal nouns such as names of days or months that behave 
+#  like adverbs. Nouns that represent amount or ratios and can be used adverbially,
+#  e.g. 金曜, 一月, 午後, 少量
+#名詞-副詞可能
+#
+#  noun-verbal: Nouns that take arguments with case and can appear followed by 
+#  'suru' and related verbs (する, できる, なさる, くださる)
+#  e.g. インプット, 愛着, 悪化, 悪戦苦闘, 一安心, 下取り
+#名詞-サ変接続
+#
+#  noun-adjective-base: The base form of adjectives, words that appear before な ("na")
+#  e.g. 健康, 安易, 駄目, だめ
+#名詞-形容動詞語幹
+#
+#  noun-numeric: Arabic numbers, Chinese numerals, and counters like 何 (回), 数.
+#  e.g. 0, 1, 2, 何, 数, 幾
+#名詞-数
+#
+#  noun-affix: noun affixes where the sub-classification is undefined
+#名詞-非自立
+#
+#  noun-affix-misc: Of adnominalizers, the case-marker の ("no"), and words that 
+#  attach to the base form of inflectional words, words that cannot be classified 
+#  into any of the other categories below. This category includes indefinite nouns.
+#  e.g. あかつき, 暁, かい, 甲斐, 気, きらい, 嫌い, くせ, 癖, こと, 事, ごと, 毎, しだい, 次第, 
+#       順, せい, 所為, ついで, 序で, つもり, 積もり, 点, どころ, の, はず, 筈, はずみ, 弾み, 
+#       拍子, ふう, ふり, 振り, ほう, 方, 旨, もの, 物, 者, ゆえ, 故, ゆえん, 所以, わけ, 訳,
+#       わり, 割り, 割, ん-口語/, もん-口語/
+#名詞-非自立-一般
+#
+#  noun-affix-adverbial: noun affixes that that can behave as adverbs.
+#  e.g. あいだ, 間, あげく, 挙げ句, あと, 後, 余り, 以外, 以降, 以後, 以上, 以前, 一方, うえ, 
+#       上, うち, 内, おり, 折り, かぎり, 限り, きり, っきり, 結果, ころ, 頃, さい, 際, 最中, さなか, 
+#       最中, じたい, 自体, たび, 度, ため, 為, つど, 都度, とおり, 通り, とき, 時, ところ, 所, 
+#       とたん, 途端, なか, 中, のち, 後, ばあい, 場合, 日, ぶん, 分, ほか, 他, まえ, 前, まま, 
+#       儘, 侭, みぎり, 矢先
+#名詞-非自立-副詞可能
+#
+#  noun-affix-aux: noun affixes treated as 助動詞 ("auxiliary verb") in school grammars 
+#  with the stem よう(だ) ("you(da)").
+#  e.g.  よう, やう, 様 (よう)
+#名詞-非自立-助動詞語幹
+#  
+#  noun-affix-adjective-base: noun affixes that can connect to the indeclinable
+#  connection form な (aux "da").
+#  e.g. みたい, ふう
+#名詞-非自立-形容動詞語幹
+#
+#  noun-special: special nouns where the sub-classification is undefined.
+#名詞-特殊
+#
+#  noun-special-aux: The そうだ ("souda") stem form that is used for reporting news, is 
+#  treated as 助動詞 ("auxiliary verb") in school grammars, and attach to the base 
+#  form of inflectional words.
+#  e.g. そう
+#名詞-特殊-助動詞語幹
+#
+#  noun-suffix: noun suffixes where the sub-classification is undefined.
+#名詞-接尾
+#
+#  noun-suffix-misc: Of the nouns or stem forms of other parts of speech that connect 
+#  to ガル or タイ and can combine into compound nouns, words that cannot be classified into
+#  any of the other categories below. In general, this category is more inclusive than 
+#  接尾語 ("suffix") and is usually the last element in a compound noun.
+#  e.g. おき, かた, 方, 甲斐 (がい), がかり, ぎみ, 気味, ぐるみ, (~した) さ, 次第, 済 (ず) み,
+#       よう, (でき)っこ, 感, 観, 性, 学, 類, 面, 用
+#名詞-接尾-一般
+#
+#  noun-suffix-person: Suffixes that form nouns and attach to person names more often
+#  than other nouns.
+#  e.g. 君, 様, 著
+#名詞-接尾-人名
+#
+#  noun-suffix-place: Suffixes that form nouns and attach to place names more often 
+#  than other nouns.
+#  e.g. 町, 市, 県
+#名詞-接尾-地域
+#
+#  noun-suffix-verbal: Of the suffixes that attach to nouns and form nouns, those that 
+#  can appear before スル ("suru").
+#  e.g. 化, 視, 分け, 入り, 落ち, 買い
+#名詞-接尾-サ変接続
+#
+#  noun-suffix-aux: The stem form of そうだ (様態) that is used to indicate conditions, 
+#  is treated as 助動詞 ("auxiliary verb") in school grammars, and attach to the 
+#  conjunctive form of inflectional words.
+#  e.g. そう
+#名詞-接尾-助動詞語幹
+#
+#  noun-suffix-adjective-base: Suffixes that attach to other nouns or the conjunctive 
+#  form of inflectional words and appear before the copula だ ("da").
+#  e.g. 的, げ, がち
+#名詞-接尾-形容動詞語幹
+#
+#  noun-suffix-adverbial: Suffixes that attach to other nouns and can behave as adverbs.
+#  e.g. 後 (ご), 以後, 以降, 以前, 前後, 中, 末, 上, 時 (じ)
+#名詞-接尾-副詞可能
+#
+#  noun-suffix-classifier: Suffixes that attach to numbers and form nouns. This category 
+#  is more inclusive than 助数詞 ("classifier") and includes common nouns that attach 
+#  to numbers.
+#  e.g. 個, つ, 本, 冊, パーセント, cm, kg, カ月, か国, 区画, 時間, 時半
+#名詞-接尾-助数詞
+#
+#  noun-suffix-special: Special suffixes that mainly attach to inflecting words.
+#  e.g. (楽し) さ, (考え) 方
+#名詞-接尾-特殊
+#
+#  noun-suffix-conjunctive: Nouns that behave like conjunctions and join two words 
+#  together.
+#  e.g. (日本) 対 (アメリカ), 対 (アメリカ), (3) 対 (5), (女優) 兼 (主婦)
+#名詞-接続詞的
+#
+#  noun-verbal_aux: Nouns that attach to the conjunctive particle て ("te") and are 
+#  semantically verb-like.
+#  e.g. ごらん, ご覧, 御覧, 頂戴
+#名詞-動詞非自立的
+#
+#  noun-quotation: text that cannot be segmented into words, proverbs, Chinese poetry, 
+#  dialects, English, etc. Currently, the only entry for 名詞 引用文字列 ("noun quotation") 
+#  is いわく ("iwaku").
+#名詞-引用文字列
+#
+#  noun-nai_adjective: Words that appear before the auxiliary verb ない ("nai") and
+#  behave like an adjective.
+#  e.g. 申し訳, 仕方, とんでも, 違い
+#名詞-ナイ形容詞語幹
+#
+#####
+#  prefix: unclassified prefixes
+接頭詞
+#
+#  prefix-nominal: Prefixes that attach to nouns (including adjective stem forms) 
+#  excluding numerical expressions.
+#  e.g. お (水), 某 (氏), 同 (社), 故 (~氏), 高 (品質), お (見事), ご (立派)
+接頭詞-名詞接続
+#
+#  prefix-verbal: Prefixes that attach to the imperative form of a verb or a verb
+#  in conjunctive form followed by なる/なさる/くださる.
+#  e.g. お (読みなさい), お (座り)
+接頭詞-動詞接続
+#
+#  prefix-adjectival: Prefixes that attach to adjectives.
+#  e.g. お (寒いですねえ), バカ (でかい)
+接頭詞-形容詞接続
+#
+#  prefix-numerical: Prefixes that attach to numerical expressions.
+#  e.g. 約, およそ, 毎時
+接頭詞-数接続
+#
+#####
+#  verb: unclassified verbs
+#動詞
+#
+#  verb-main:
+#動詞-自立
+#
+#  verb-auxiliary:
+動詞-非自立
+#
+#  verb-suffix:
+#動詞-接尾
+#
+#####
+#  adjective: unclassified adjectives
+#形容詞
+#
+#  adjective-main:
+#形容詞-自立
+#
+#  adjective-auxiliary:
+#形容詞-非自立
+#
+#  adjective-suffix:
+#形容詞-接尾
+#
+#####
+#  adverb: unclassified adverbs
+#副詞
+#
+#  adverb-misc: Words that can be segmented into one unit and where adnominal 
+#  modification is not possible.
+#  e.g. あいかわらず, 多分
+#副詞-一般
+#
+#  adverb-particle_conjunction: Adverbs that can be followed by の, は, に, 
+#  な, する, だ, etc.
+#  e.g. こんなに, そんなに, あんなに, なにか, なんでも
+#副詞-助詞類接続
+#
+#####
+#  adnominal: Words that only have noun-modifying forms.
+#  e.g. この, その, あの, どの, いわゆる, なんらかの, 何らかの, いろんな, こういう, そういう, ああいう, 
+#       どういう, こんな, そんな, あんな, どんな, 大きな, 小さな, おかしな, ほんの, たいした, 
+#       「(, も) さる (ことながら)」, 微々たる, 堂々たる, 単なる, いかなる, 我が」「同じ, 亡き
+#連体詞
+#
+#####
+#  conjunction: Conjunctions that can occur independently.
+#  e.g. が, けれども, そして, じゃあ, それどころか
+接続詞
+#
+#####
+#  particle: unclassified particles.
+助詞
+#
+#  particle-case: case particles where the subclassification is undefined.
+助詞-格助詞
+#
+#  particle-case-misc: Case particles.
+#  e.g. から, が, で, と, に, へ, より, を, の, にて
+助詞-格助詞-一般
+#
+#  particle-case-quote: the "to" that appears after nouns, a person’s speech, 
+#  quotation marks, expressions of decisions from a meeting, reasons, judgements,
+#  conjectures, etc.
+#  e.g. ( だ) と (述べた.), ( である) と (して執行猶予...)
+助詞-格助詞-引用
+#
+#  particle-case-compound: Compounds of particles and verbs that mainly behave 
+#  like case particles.
+#  e.g. という, といった, とかいう, として, とともに, と共に, でもって, にあたって, に当たって, に当って,
+#       にあたり, に当たり, に当り, に当たる, にあたる, において, に於いて,に於て, における, に於ける, 
+#       にかけ, にかけて, にかんし, に関し, にかんして, に関して, にかんする, に関する, に際し, 
+#       に際して, にしたがい, に従い, に従う, にしたがって, に従って, にたいし, に対し, にたいして, 
+#       に対して, にたいする, に対する, について, につき, につけ, につけて, につれ, につれて, にとって,
+#       にとり, にまつわる, によって, に依って, に因って, により, に依り, に因り, による, に依る, に因る, 
+#       にわたって, にわたる, をもって, を以って, を通じ, を通じて, を通して, をめぐって, をめぐり, をめぐる,
+#       って-口語/, ちゅう-関西弁「という」/, (何) ていう (人)-口語/, っていう-口語/, といふ, とかいふ
+助詞-格助詞-連語
+#
+#  particle-conjunctive:
+#  e.g. から, からには, が, けれど, けれども, けど, し, つつ, て, で, と, ところが, どころか, とも, ども, 
+#       ながら, なり, ので, のに, ば, ものの, や ( した), やいなや, (ころん) じゃ(いけない)-口語/, 
+#       (行っ) ちゃ(いけない)-口語/, (言っ) たって (しかたがない)-口語/, (それがなく)ったって (平気)-口語/
+助詞-接続助詞
+#
+#  particle-dependency:
+#  e.g. こそ, さえ, しか, すら, は, も, ぞ
+助詞-係助詞
+#
+#  particle-adverbial:
+#  e.g. がてら, かも, くらい, 位, ぐらい, しも, (学校) じゃ(これが流行っている)-口語/, 
+#       (それ)じゃあ (よくない)-口語/, ずつ, (私) なぞ, など, (私) なり (に), (先生) なんか (大嫌い)-口語/,
+#       (私) なんぞ, (先生) なんて (大嫌い)-口語/, のみ, だけ, (私) だって-口語/, だに, 
+#       (彼)ったら-口語/, (お茶) でも (いかが), 等 (とう), (今後) とも, ばかり, ばっか-口語/, ばっかり-口語/,
+#       ほど, 程, まで, 迄, (誰) も (が)([助詞-格助詞] および [助詞-係助詞] の前に位置する「も」)
+助詞-副助詞
+#
+#  particle-interjective: particles with interjective grammatical roles.
+#  e.g. (松島) や
+助詞-間投助詞
+#
+#  particle-coordinate:
+#  e.g. と, たり, だの, だり, とか, なり, や, やら
+助詞-並立助詞
+#
+#  particle-final:
+#  e.g. かい, かしら, さ, ぜ, (だ)っけ-口語/, (とまってる) で-方言/, な, ナ, なあ-口語/, ぞ, ね, ネ, 
+#       ねぇ-口語/, ねえ-口語/, ねん-方言/, の, のう-口語/, や, よ, ヨ, よぉ-口語/, わ, わい-口語/
+助詞-終助詞
+#
+#  particle-adverbial/conjunctive/final: The particle "ka" when unknown whether it is 
+#  adverbial, conjunctive, or sentence final. For example:
+#       (a) 「A か B か」. Ex:「(国内で運用する) か,(海外で運用する) か (.)」
+#       (b) Inside an adverb phrase. Ex:「(幸いという) か (, 死者はいなかった.)」
+#           「(祈りが届いたせい) か (, 試験に合格した.)」
+#       (c) 「かのように」. Ex:「(何もなかった) か (のように振る舞った.)」
+#  e.g. か
+助詞-副助詞/並立助詞/終助詞
+#
+#  particle-adnominalizer: The "no" that attaches to nouns and modifies 
+#  non-inflectional words.
+助詞-連体化
+#
+#  particle-adnominalizer: The "ni" and "to" that appear following nouns and adverbs 
+#  that are giongo, giseigo, or gitaigo.
+#  e.g. に, と
+助詞-副詞化
+#
+#  particle-special: A particle that does not fit into one of the above classifications. 
+#  This includes particles that are used in Tanka, Haiku, and other poetry.
+#  e.g. かな, けむ, ( しただろう) に, (あんた) にゃ(わからん), (俺) ん (家)
+助詞-特殊
+#
+#####
+#  auxiliary-verb:
+助動詞
+#
+#####
+#  interjection: Greetings and other exclamations.
+#  e.g. おはよう, おはようございます, こんにちは, こんばんは, ありがとう, どうもありがとう, ありがとうございます, 
+#       いただきます, ごちそうさま, さよなら, さようなら, はい, いいえ, ごめん, ごめんなさい
+感動詞
+#
+#####
+#  symbol: unclassified Symbols.
+#記号
+#
+#  symbol-misc: A general symbol not in one of the categories below.
+#  e.g. [○◎@$〒→+]
+記号-一般
+#
+#  symbol-comma: Commas
+#  e.g. [,、]
+記号-読点
+#
+#  symbol-period: Periods and full stops.
+#  e.g. [..。]
+記号-句点
+#
+#  symbol-space: Full-width whitespace.
+記号-空白
+#
+#  symbol-open_bracket:
+#  e.g. [({‘“『【]
+記号-括弧開
+#
+#  symbol-close_bracket:
+#  e.g. [)}’”』」】]
+記号-括弧閉
+#
+#  symbol-alphabetic:
+#記号-アルファベット
+#
+#####
+#  other: unclassified other
+#その他
+#
+#  other-interjection: Words that are hard to classify as noun-suffixes or 
+#  sentence-final particles.
+#  e.g. (だ)ァ
+その他-間投
+#
+#####
+#  filler: Aizuchi that occurs during a conversation or sounds inserted as filler.
+#  e.g. あの, うんと, えと
+フィラー
+#
+#####
+#  non-verbal: non-verbal sound.
+非言語音
+#
+#####
+#  fragment:
+#語断片
+#
+#####
+#  unknown: unknown part of speech.
+#未知語

Added: trunk/dports/java/lucene-gosen/files/stopwords_ja.txt
===================================================================
--- trunk/dports/java/lucene-gosen/files/stopwords_ja.txt	                        (rev 0)
+++ trunk/dports/java/lucene-gosen/files/stopwords_ja.txt	2011-09-12 14:46:02 UTC (rev 83879)
@@ -0,0 +1,13 @@
+# short set of japanese stopwords
+いう
+する
+人物
+さま
+すること
+ため
+もの
+おいて
+なる
+できる
+おく
+ある
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.macosforge.org/pipermail/macports-changes/attachments/20110912/8d8b25a8/attachment-0001.html>


More information about the macports-changes mailing list