# analysis-dy-hanlp

**Repository Path**: tmingze97/analysis-dy-hanlp

## Basic Information

- **Project Name**: analysis-dy-hanlp
- **Description**: No description available
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-12-13
- **Last Updated**: 2024-12-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

STConvert Analysis for Elasticsearch
==================================

STConvert is an analyzer that converts Chinese characters between Traditional and Simplified.
[Chinese Simplified/Traditional conversion][Simplified to Traditional][Traditional to Simplified][Simplified/Traditional query expansion]

You can download the pre-built package from the [release page](https://github.com/medcl/elasticsearch-analysis-stconvert/releases).

The plugin includes an analyzer: `stconvert`,
a tokenizer: `stconvert`,
a token-filter: `stconvert`,
and a char-filter: `stconvert`.

Supported config:

- `convert_type`: default `s2t`; available options:
    1. `s2t`: convert characters from Simplified Chinese to Traditional Chinese
    2. `t2s`: convert characters from Traditional Chinese to Simplified Chinese

- `keep_both`: default `false`

- `delimiter`: default `,`

Custom example:

```
PUT /stconvert/
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "tsconvert" : {
                    "tokenizer" : "tsconvert"
                }
            },
            "tokenizer" : {
                "tsconvert" : {
                    "type" : "stconvert",
                    "delimiter" : "#",
                    "keep_both" : false,
                    "convert_type" : "t2s"
                }
            },
            "filter": {
                "tsconvert" : {
                    "type" : "stconvert",
                    "delimiter" : "#",
                    "keep_both" : false,
                    "convert_type" : "t2s"
                }
            },
            "char_filter" : {
                "tsconvert" : {
                    "type" : "stconvert",
                    "convert_type" : "t2s"
                }
            }
        }
    }
}
```

Analyze tests

```
GET stconvert/_analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["tsconvert"],
  "text" : "国际國際"
}

Output:
{
  "tokens": [
    {
      "token": "国际国际",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    }
  ]
}
```

Normalizer usage

```
DELETE index

PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "tsconvert": {
          "type": "stconvert",
          "convert_type": "t2s"
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [
            "tsconvert"
          ],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

PUT index/_doc/1
{
  "foo": "國際"
}

PUT index/_doc/2
{
  "foo": "国际"
}

GET index/_search
{
  "query": {
    "term": {
      "foo": "国际"
    }
  }
}

GET index/_search
{
  "query": {
    "term": {
      "foo": "國際"
    }
  }
}
```
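
The normalizer example works because Elasticsearch applies a keyword field's normalizer both when a value is indexed and when a term query is run against that field, so the Traditional and Simplified spellings collapse to the same stored term. The following Python sketch is a toy model of that behavior, not the plugin's implementation: `T2S_TABLE` is a hypothetical two-character mapping standing in for the plugin's full dictionary, and `index_docs` and `term_query` are illustrative names that do not exist in Elasticsearch.

```python
from typing import Dict, List

# Hypothetical two-character table; the real plugin ships a full
# Traditional/Simplified dictionary. This is an assumption for illustration.
T2S_TABLE: Dict[str, str] = {"國": "国", "際": "际"}


def t2s(text: str) -> str:
    """Map each Traditional character to Simplified, leaving others unchanged."""
    return "".join(T2S_TABLE.get(ch, ch) for ch in text)


def my_normalizer(value: str) -> str:
    """Mimic the normalizer chain from the example: t2s char filter, then lowercase."""
    return t2s(value).lower()


def index_docs(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Build a term -> doc ids map, normalizing each keyword before storing it."""
    inverted: Dict[str, List[int]] = {}
    for doc_id, value in sorted(docs.items()):
        inverted.setdefault(my_normalizer(value), []).append(doc_id)
    return inverted


def term_query(inverted: Dict[str, List[int]], value: str) -> List[int]:
    """Normalize the query term the same way as indexed values, then look it up."""
    return inverted.get(my_normalizer(value), [])


# Same two documents as in the normalizer example.
index = index_docs({1: "國際", 2: "国际"})
simplified_hits = term_query(index, "国际")
traditional_hits = term_query(index, "國際")
print(simplified_hits, traditional_hits)
```

In this toy model both queries return documents 1 and 2, since both spellings are stored and looked up under the single term `国际`.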