5337431273

pending completion

Build # 5337431273

Build Type

push

github

Committed by

wannaphong

Commit Message

Add กาลพฤกษ์ to list words

Run Details

3573 of 6329 relevant lines covered (56.45%)

0.56 hits per line

Source File

35.0

/pythainlp/parse/core.py

# -*- coding: utf-8 -*-
# Copyright (C) 2016-2023 PyThaiNLP Project
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List, Optional, Union


_tagger = None
_tagger_name = ""


def dependency_parsing(
    text: str,
    model: Optional[str] = None,
    tag: str = "str",
    engine: str = "esupar",
) -> Union[List[List[str]], str]:
    """
    Dependency Parsing

    :param str text: text to parse
    :param str model: model to use with the engine \
        (for esupar, transformers_ud, and ud_goeswith)
    :param str tag: output type ("str" or "list")
    :param str engine: name of the dependency parser
    :return: str (CoNLL-U) or List
    :rtype: Union[List[List[str]], str]

    **Options for engine**
        * *esupar* (default) - Tokenizer, POS-tagger, and dependency-parser \
            with BERT/RoBERTa/DeBERTa models. `GitHub \
            <https://github.com/KoichiYasuoka/esupar>`_
        * *spacy_thai* - Tokenizer, POS-tagger, and dependency-parser \
            for Thai language, working on Universal Dependencies. \
            `GitHub <https://github.com/KoichiYasuoka/spacy-thai>`_
        * *transformers_ud* - TransformersUD \
            `GitHub <https://github.com/KoichiYasuoka/>`_
        * *ud_goeswith* - POS-tagging and dependency-parsing \
            using `goeswith` for subwords

    **Options for model (esupar engine)**
        * *th* (default) - KoichiYasuoka/roberta-base-thai-spm-upos model \
            `Huggingface \
            <https://huggingface.co/KoichiYasuoka/roberta-base-thai-spm-upos>`_
        * *KoichiYasuoka/deberta-base-thai-upos* - DeBERTa(V2) model \
            pre-trained on Thai Wikipedia texts for POS-tagging and \
            dependency-parsing `Huggingface \
            <https://huggingface.co/KoichiYasuoka/deberta-base-thai-upos>`_
        * *KoichiYasuoka/roberta-base-thai-syllable-upos* - RoBERTa model \
            pre-trained on Thai Wikipedia texts for POS-tagging and \
            dependency-parsing. (syllable level) `Huggingface \
            <https://huggingface.co/KoichiYasuoka/roberta-base-thai-syllable-upos>`_
        * *KoichiYasuoka/roberta-base-thai-char-upos* - RoBERTa model \
            pre-trained on Thai Wikipedia texts for POS-tagging \
            and dependency-parsing. (char level) `Huggingface \
            <https://huggingface.co/KoichiYasuoka/roberta-base-thai-char-upos>`_

    To train a model for esupar, see \
    `GitHub <https://github.com/KoichiYasuoka/esupar>`_

    **Options for model (transformers_ud engine)**
        * *KoichiYasuoka/deberta-base-thai-ud-head* (default) - \
            DeBERTa(V2) model pretrained on Thai Wikipedia texts \
            for dependency-parsing (head-detection on Universal \
            Dependencies) as question-answering, derived from \
            deberta-base-thai and trained on \
            th_blackboard.conll. `Huggingface \
            <https://huggingface.co/KoichiYasuoka/deberta-base-thai-ud-head>`_
        * *KoichiYasuoka/roberta-base-thai-spm-ud-head* - \
            RoBERTa model pretrained on Thai Wikipedia texts \
            for dependency-parsing. `Huggingface \
            <https://huggingface.co/KoichiYasuoka/roberta-base-thai-spm-ud-head>`_

    **Options for model (ud_goeswith engine)**
        * *KoichiYasuoka/deberta-base-thai-ud-goeswith* (default) - \
            DeBERTa(V2) model pre-trained on Thai Wikipedia \
            texts for POS-tagging and dependency-parsing \
            (using goeswith for subwords) `Huggingface \
            <https://huggingface.co/KoichiYasuoka/deberta-base-thai-ud-goeswith>`_

    :Example:
    ::

        from pythainlp.parse import dependency_parsing

        print(dependency_parsing("ผมเป็นคนดี", engine="esupar"))
        # output:
        # 1       ผม      _       PRON    _       _       3       nsubj   _       SpaceAfter=No
        # 2       เป็น     _       VERB    _       _       3       cop     _       SpaceAfter=No
        # 3       คน      _       NOUN    _       _       0       root    _       SpaceAfter=No
        # 4       ดี       _       VERB    _       _       3       acl     _       SpaceAfter=No

        print(dependency_parsing("ผมเป็นคนดี", engine="spacy_thai"))
        # output:
        # 1       ผม              PRON    PPRS    _       2       nsubj   _       SpaceAfter=No
        # 2       เป็น             VERB    VSTA    _       0       ROOT    _       SpaceAfter=No
        # 3       คนดี             NOUN    NCMN    _       2       obj     _       SpaceAfter=No
    """
    global _tagger, _tagger_name
    if _tagger_name != engine:
        if engine == "esupar":
            from pythainlp.parse.esupar_engine import Parse

            _tagger = Parse(model=model)
        elif engine == "transformers_ud":
            from pythainlp.parse.transformers_ud import Parse

            _tagger = Parse(model=model)
        elif engine == "spacy_thai":
            from pythainlp.parse.spacy_thai_engine import Parse

            _tagger = Parse()
        elif engine == "ud_goeswith":
            from pythainlp.parse.ud_goeswith import Parse

            _tagger = Parse(model=model)
        else:
            raise NotImplementedError(
                f"The engine '{engine}' is not supported."
            )
    _tagger_name = engine
    return _tagger(text, tag=tag)
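
The function above loads an engine lazily and caches the parser under the engine name only, so a later call with the same engine but a different `model` reuses the parser that is already loaded. As a rough sketch of one alternative (not the PyThaiNLP implementation), the cache could be keyed on the `(engine, model)` pair instead. Everything named below (`_StubParse`, `get_parser`, `_SUPPORTED`, `_cache`) is hypothetical; the stub stands in for the real engine `Parse` classes so the example runs without any model downloads.

```python
from typing import Dict, List, Optional, Tuple, Union

ParseResult = Union[List[List[str]], str]


class _StubParse:
    """Hypothetical stand-in for an engine's Parse class (not PyThaiNLP code)."""

    def __init__(self, engine: str, model: Optional[str] = None) -> None:
        self.engine = engine
        self.model = model

    def __call__(self, text: str, tag: str = "str") -> ParseResult:
        # Emit one fake three-column row per character, purely for illustration.
        rows = [[str(i), ch, "_"] for i, ch in enumerate(text, start=1)]
        if tag == "list":
            return rows
        return "\n".join("\t".join(row) for row in rows)


# Engine names taken from the docstring above.
_SUPPORTED = ("esupar", "transformers_ud", "spacy_thai", "ud_goeswith")

# Assumption: one cached parser per (engine, model) pair.
_cache: Dict[Tuple[str, Optional[str]], _StubParse] = {}


def get_parser(engine: str = "esupar", model: Optional[str] = None) -> _StubParse:
    """Return a cached parser, creating it on first use for this pair."""
    if engine not in _SUPPORTED:
        raise NotImplementedError(f"The engine '{engine}' is not supported.")
    key = (engine, model)
    if key not in _cache:
        _cache[key] = _StubParse(engine, model)
    return _cache[key]


if __name__ == "__main__":
    parser = get_parser("esupar")
    print(parser("ab", tag="list"))
```

With this keying, switching models creates a new parser while repeated calls with the same pair reuse the cached one. The trade-off is memory: every pair that has been requested stays loaded until the cache is cleared.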

Line coverage for this file: the module-level import and globals, the `def dependency_parsing(` line, the `if _tagger_name != engine:` check, the `if engine == "esupar":` test, and the esupar `Parse` import were each hit once. Every `Parse(...)` instantiation, the `transformers_ud`, `spacy_thai`, and `ud_goeswith` branches, the `raise NotImplementedError`, the `_tagger_name = engine` assignment, and the final `return` were not covered.
