tokenize

package module
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 14, 2025 License: BSD-2-Clause Imports: 2 Imported by: 0

README

A Go tokenizer for Dutch text.

This is a wrapper in Go for the tokenizer that is part of Alpino.

The file nobr/libtok_no_breaks.c is a copy from the Alpino source.

The file br/libtok1.c is derived from the Alpino source by this command:

perl -p -e 's/QDATUM|new_t_accepts|qentry|qinit|qinsert|qpeek|qremove|queue|replace_from_queue|resize_buf|t_accepts|transition_struct|trans|unknown_symbol/$&1/g' \
    libtok.c > br/libtok1.c

Install

go get github.com/rug-compling/tokenize

If you get an error like this:

# github.com/rug-compling/tokenize
libtok1.c:1:1: error: unknown type name ‘version’
 version https://git-lfs.github.com/spec/v1
 ^
libtok1.c:1:14: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘:’ token
 version https://git-lfs.github.com/spec/v1
              ^

... run this command: git lfs install

If you get an error like this:

git: 'lfs' is not a git command. See 'git --help'.

... install git-lfs from https://git-lfs.github.com/

For newer versions of Go, you may also need to set this:

export GOPRIVATE=github.com/rug-compling/tokenize

Then, try again.

Docs

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Dutch

func Dutch(text string, withLineBreaks bool) (string, error)

Types

This section is empty.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL