PyThaiNLP / pythainlp / 11625814262

01 Nov 2024 07:14AM UTC coverage: 20.782% (+20.8%) from 0.0%
Pull #952

github

web-flow
Merge c8385dcae into 515fe7ced
Pull Request #952: Specify a limited test suite

45 of 80 new or added lines in 48 files covered. (56.25%)

1537 of 7396 relevant lines covered (20.78%)

0.21 hits per line

Source File

/pythainlp/tokenize/nlpo3.py (coverage: 0.0%)
Every executable line in this file is uncovered. The imports of nlpo3.load_dict and pythainlp.corpus.path_pythainlp_corpus are marked NEW in this pull request.

# -*- coding: utf-8 -*-
# SPDX-FileCopyrightText: 2016-2024 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
from sys import stderr
from typing import List

from nlpo3 import load_dict as nlpo3_load_dict
from nlpo3 import segment as nlpo3_segment

from pythainlp.corpus import path_pythainlp_corpus
from pythainlp.corpus.common import _THAI_WORDS_FILENAME

# Register the default Thai word list with nlpO3 once, at import time.
_NLPO3_DEFAULT_DICT_NAME = "_67a47bf9"
_NLPO3_DEFAULT_DICT = nlpo3_load_dict(
    path_pythainlp_corpus(_THAI_WORDS_FILENAME), _NLPO3_DEFAULT_DICT_NAME
)


def load_dict(file_path: str, dict_name: str) -> bool:
    """Load a dictionary file into an in-memory dictionary collection.

    The loaded dictionary will be accessible through the assigned dict_name.
    *** This function does not override an existing dict name. ***

    :param file_path: Path to a dictionary file
    :type file_path: str
    :param dict_name: A unique dictionary name, used for reference.
    :type dict_name: str
    :return: True if the dictionary is loaded successfully, False otherwise
    :rtype: bool

    :See Also:
        * \
            https://github.com/PyThaiNLP/nlpo3
    """
    msg, success = nlpo3_load_dict(file_path=file_path, dict_name=dict_name)
    # Check the returned flag, not the built-in type `bool`.
    if not success:
        print(msg, file=stderr)
    return success


def segment(
    text: str,
    custom_dict: str = _NLPO3_DEFAULT_DICT_NAME,
    safe_mode: bool = False,
    parallel_mode: bool = False,
) -> List[str]:
    """Break text into tokens.

    Python binding for nlpO3, a Rust implementation of the newmm engine.

    :param str text: text to be tokenized
    :param str custom_dict: dictionary name, as assigned with load_dict(),\
        defaults to pythainlp/corpus/common/words_th.txt
    :param bool safe_mode: reduce the chance of long processing time for\
        long text with many ambiguous breaking points, defaults to False
    :param bool parallel_mode: use multithreaded mode, defaults to False

    :return: list of tokens
    :rtype: List[str]

    :See Also:
        * \
            https://github.com/PyThaiNLP/nlpo3
    """
    return nlpo3_segment(
        text=text,
        dict_name=custom_dict,
        safe=safe_mode,
        parallel=parallel_mode,
    )