Package 'image.textlinedetector' reference manual

Title:	Segment Images in Text Lines and Words
Description:	Find text lines in scanned images and segment the lines into words. Includes an implementation of the paper 'Novel A* Path Planning Algorithm for Line Segmentation of Handwritten Documents' by Surinta O. et al (2014) <doi:10.1109/ICFHR.2014.37>, available at <https://github.com/smeucci/LineSegm>; an implementation of 'A Statistical approach to line segmentation in handwritten documents' by Arivazhagan M. et al (2007) <doi:10.1117/12.704538>; and a wrapper for an image segmentation technique to detect words in text lines, as described in the paper 'Scale Space Technique for Word Segmentation in Handwritten Documents' by Manmatha R. and Srimal N. (1999) <doi:10.1007/3-540-48236-9_3>, with code available at <https://github.com/arthurflor23/text-segmentation>. Also provides functionality to put cursive text in images upright using the approach defined in the paper 'A new normalization technique for cursive handwritten words' by Vinciarelli A. and Luettin J. (2001) <doi:10.1016/S0167-8655(01)00042-3>.
Authors:	Jan Wijffels [aut, cre, cph] (R wrapper), Vrije Universiteit Brussel - DIGI: Brussels Platform for Digital Humanities [cph] (R wrapper), Jeroen Ooms [ctb, cph] (More details in LICENSE.note file), Arthur Flôr [ctb, cph] (More details in LICENSE.note file), Saverio Meucci [ctb, cph] (More details in LICENSE.note file), Yeara Kozlov [ctb, cph] (More details in LICENSE.note file), Tino Weinkauf [ctb, cph] (More details in LICENSE.note file), Harald Scheidl [ctb, cph] (More details in LICENSE.note file)
Maintainer:	Jan Wijffels <[email protected]>
License:	MIT + file LICENSE
Version:	0.2.4
Built:	2025-03-07 10:08:44 UTC
Source:	https://github.com/digi-vub/image.textlinedetector

Text Line Segmentation based on the A* Path Planning Algorithm

Description

Text Line Segmentation based on the A* Path Planning Algorithm
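
As a rough illustration of the idea only, the sketch below runs a plain A* search on a small binary matrix (1 = ink) and looks for a cheap path from the left edge to the right edge, starting on a row between two text lines. It is not the package's implementation: the function name `astar_row_path`, the `ink_penalty` cost and the way `mfactor` scales the heuristic are assumptions made for this example, and the cost functions in Surinta et al. are more elaborate.

```r
# Toy A* search from the left edge to the right edge of a binary image.
# ink: matrix with 1 = ink, 0 = background (hypothetical toy input).
# All names and cost terms below are illustrative assumptions.
astar_row_path <- function(ink, start_row, mfactor = 1, ink_penalty = 50) {
  nr <- nrow(ink); nc <- ncol(ink)
  key    <- function(ri, ci) (ci - 1) * nr + ri   # linear index of a cell
  col_of <- function(k) (k - 1) %/% nr + 1
  row_of <- function(k) (k - 1) %% nr + 1
  g      <- rep(Inf, nr * nc)                     # best known cost so far
  parent <- rep(NA_integer_, nr * nc)
  closed <- rep(FALSE, nr * nc)
  start  <- key(start_row, 1)
  g[start] <- 0
  open <- start
  # Heuristic: remaining horizontal distance, scaled by mfactor
  h <- function(k) mfactor * (nc - col_of(k))
  while (length(open) > 0) {
    cur  <- open[which.min(g[open] + h(open))]    # node with the lowest f = g + h
    open <- setdiff(open, cur)
    closed[cur] <- TRUE
    if (col_of(cur) == nc) {                      # reached the right edge
      path <- cur
      while (!is.na(parent[path[1]])) path <- c(parent[path[1]], path)
      return(data.frame(x = col_of(path), y = row_of(path)))
    }
    # Neighbours: right, up, down (never back to the left)
    for (d in list(c(0, 1), c(-1, 0), c(1, 0))) {
      ri <- row_of(cur) + d[1]
      ci <- col_of(cur) + d[2]
      if (ri < 1 || ri > nr || ci > nc) next
      nb <- key(ri, ci)
      if (closed[nb]) next
      cost <- g[cur] + 1 + ink_penalty * ink[ri, ci]  # crossing ink is expensive
      if (cost < g[nb]) {
        g[nb] <- cost
        parent[nb] <- cur
        open <- union(open, nb)
      }
    }
  }
  NULL
}

ink <- matrix(0, nrow = 5, ncol = 6)
ink[3, 3:4] <- 1                       # a blob of ink on the starting row
p <- astar_row_path(ink, start_row = 3)
p                                      # x/y positions of the path found
```

The path steps around the ink blob because every ink pixel adds `ink_penalty` to the cost. With `mfactor = 1` the heuristic never overestimates the remaining cost in this toy setting; larger values make the search greedier.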

Usage

image_textlines_astar(x, morph = FALSE, step = 2, mfactor = 5, trace = FALSE)

Arguments

`x`	an object of class magick-image
`morph`	logical indicating whether to apply a morphological 5x5 filter
`step`	step size of A-star
`mfactor`	multiplication factor in the cost heuristic of the A-star algorithm
`trace`	logical indicating whether to show the evolution of the line detection

Value

a list with elements

n: the number of lines found
overview: an opencv-image of the detected areas
paths: a list of data.frames with the x/y location of the baseline paths
textlines: a list of opencv-images, one for each rectangular text line area
lines: a data.frame with the x/y positions of the detected lines

Examples


library(opencv)
library(magick)
library(image.textlinedetector)
path   <- system.file(package = "image.textlinedetector", "extdata", "example.png")
img    <- image_read(path)
img    <- image_resize(img, "x1000")
areas  <- image_textlines_astar(img, morph = TRUE, step = 2, mfactor = 5, trace = TRUE)
areas  <- lines(areas, img)
areas$n
areas$overview
areas$lines
areas$textlines[[2]]
areas$textlines[[4]]
combined <- lapply(areas$textlines, FUN=function(x) image_read(ocv_bitmap(x)))
combined <- do.call(c, combined)
combined
image_append(combined, stack = TRUE)



plt <- image_draw(img)
lapply(areas$paths, FUN=function(line){
  lines(x = line$x, y = line$y, col = "red")  
})
dev.off()
plt

Crop an image to extract only the region containing text

Description

Applies a sequence of image operations to obtain the region which contains the relevant text by cropping white space on the borders of the image. This is done in the following steps: morphological opening, morphological closing, blurring, Canny edge detection, convex hull contours of the edges, keeping only contours above the mean contour area, finding approximated contour lines of the convex hull contours of these, dilation and thresholding.
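
The sketch below illustrates only the final idea, cropping away white borders, on a toy grayscale matrix in base R. It does not reproduce the morphological, edge-detection or contour steps listed above, and the helper name `crop_to_content` and the toy data are assumptions made for this example.

```r
# Crop a grayscale matrix to the bounding box of its non-white pixels.
# img: matrix with 255 = white background (hypothetical toy input).
crop_to_content <- function(img, white = 255) {
  ink_rows <- which(rowSums(img < white) > 0)   # rows containing any ink
  ink_cols <- which(colSums(img < white) > 0)   # columns containing any ink
  if (length(ink_rows) == 0) return(img)        # nothing but background
  img[min(ink_rows):max(ink_rows), min(ink_cols):max(ink_cols), drop = FALSE]
}

page <- matrix(255, nrow = 8, ncol = 10)
page[3:5, 4:7] <- 0                             # a dark block of "text"
cropped <- crop_to_content(page)
dim(cropped)
```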

Usage

image_textlines_crop(x)

Arguments

`x`	an object of class magick-image

Value

an object of class magick-image

Examples


library(opencv)
library(magick)
library(image.textlinedetector)
path  <- system.file(package = "image.textlinedetector", "extdata", "example.png")
img   <- image_read(path)
image_info(img)
img   <- image_textlines_crop(img)
image_info(img)

Text Line Segmentation based on valley finding in projection profiles

Description

Text Line Segmentation based on valley finding in projection profiles
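
To make the idea of a projection profile concrete, the base R sketch below sums the ink in each row of a toy binary matrix and treats runs of non-empty rows as text lines. It is a deliberately simplified stand-in: real valley finding has to cope with touching or skewed lines (for example by smoothing the profile and searching for local minima), and the toy data and variable names are assumptions made for this example.

```r
# Toy binary image: 1 = ink, 0 = background (hypothetical data)
img <- matrix(0, nrow = 12, ncol = 20)
img[2:4, 3:18]   <- 1    # first text line
img[7:9, 2:15]   <- 1    # second text line
img[11:12, 5:10] <- 1    # third text line

profile <- rowSums(img)  # horizontal projection profile: ink per row
is_text <- profile > 0   # rows with ink; empty rows are the valleys

# Consecutive runs of text rows form the detected lines
runs   <- rle(is_text)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
lines_found <- data.frame(top = starts[runs$values], bottom = ends[runs$values])
lines_found
```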

Usage

image_textlines_flor(
  x,
  light = TRUE,
  type = c("none", "niblack", "sauvola", "wolf")
)

Arguments

`x`	an object of class magick-image
`light`	logical indicating whether to remove light effects due to scanning
`type`	which type of binarisation to perform before doing line segmentation

Value

a list with elements

n: the number of lines found
overview: an opencv-image of the detected areas
textlines: a list of opencv-images, one for each text line area

Examples


library(opencv)
library(magick)
library(image.textlinedetector)
path   <- system.file(package = "image.textlinedetector", "extdata", "example.png")
img    <- image_read(path)
img    <- image_resize(img, "1000x")
areas  <- image_textlines_flor(img, light = TRUE, type = "sauvola")
areas  <- lines(areas, img)
areas$n
areas$overview
combined <- lapply(areas$textlines, FUN=function(x) image_read(ocv_bitmap(x)))
combined <- do.call(c, combined)
combined
image_append(combined, stack = TRUE)

Find Words by Connected Components Labelling

Description

Filter the image using a Gaussian kernel and extract the connected components, which are considered to be words.
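
The base R sketch below shows why smoothing helps, using a one-dimensional simplification: the column profile of a toy text line is blurred with a Gaussian kernel so that small gaps between letters are bridged, and the remaining runs of columns are treated as words. The package itself works on the two-dimensional image; the kernel size, sigma, threshold and toy data here are assumptions chosen for this example, not the package defaults.

```r
# Toy text line: 1 = ink, 0 = background (hypothetical data)
line <- matrix(0, nrow = 5, ncol = 30)
line[2:4, c(2:4, 6:8)]     <- 1   # word 1: two letters with a 1-pixel gap
line[2:4, c(14:16, 18:20)] <- 1   # word 2
line[2:4, 26:28]           <- 1   # word 3

# Normalised Gaussian kernel (illustrative size and sigma)
kernel_size <- 5
sigma <- 1
k <- dnorm(seq_len(kernel_size) - (kernel_size + 1) / 2, sd = sigma)
k <- k / sum(k)

# Smooth the zero-padded column profile so gaps inside a word are bridged
pad      <- kernel_size %/% 2
profile  <- colSums(line)
padded   <- c(rep(0, pad), profile, rep(0, pad))
smoothed <- as.vector(stats::filter(padded, k, sides = 2))
smoothed <- smoothed[(pad + 1):(pad + length(profile))]
is_word  <- smoothed > 0.5

# Label connected runs of columns; each run is treated as one word
runs   <- rle(is_word)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
words  <- data.frame(left = starts[runs$values], right = ends[runs$values])
words
```

A wider kernel or larger sigma merges letters that are further apart, at the risk of merging neighbouring words.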

Usage

image_wordsegmentation(x, kernelSize = 11L, sigma = 11L, theta = 7L)

Arguments

`x`	an object of class opencv-image containing black/white binary data (type CV_8UC1)
`kernelSize`	size of the kernel
`sigma`	sigma of the kernel
`theta`	theta of the kernel

Value

a list with elements

n: the number of words found
overview: an opencv-image of the detected areas
words: a list of opencv-images, one for each word area

Examples


library(opencv)
library(magick)
library(image.textlinedetector)
path  <- system.file(package = "image.textlinedetector", "extdata", "example.png")
img   <- image_read(path)
img   <- image_resize(img, "x1000")
areas <- image_textlines_flor(img, light = TRUE, type = "sauvola")
areas$overview
areas$textlines[[6]]
textwords <- image_wordsegmentation(areas$textlines[[6]])
textwords$n
textwords$overview
textwords$words[[2]]
textwords$words[[3]]

Extract the polygons of the textlines

Description

Extract the polygons of the textlines as a cropped rectangular image containing the image content of the line segmented polygon

Usage

## S3 method for class 'textlines'
lines(x, image, crop = TRUE, channels = c("bgr", "gray"), ...)

Arguments

`x`	an object of class `textlines` as returned by `image_textlines_astar` or `image_textlines_flor`
`image`	an object of class magick-image
`crop`	extract only the bounding box of the polygon of the text lines
`channels`	either 'bgr' or 'gray' to work on the colored data or on binary greyscale data
`...`	further arguments passed on

Value

the object x where element textlines is replaced with the extracted polygons of text lines

Examples

## See the examples in ?image_textlines_astar or ?image_textlines_flor

Deslant images by putting cursive text upright

Description

This algorithm sets handwritten text in images upright by removing cursive writing style. One can use it as a preprocessing step for handwritten text recognition.

image_deslant expects a magick-image and performs grayscaling before doing deslanting
ocv_deslant expects an ocv-image and does not perform grayscaling before doing deslanting
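
The base R sketch below illustrates the general principle of deslanting by shearing: it tries several shear values within a lower and upper bound and keeps the one that concentrates ink in the fewest columns. It is a simplified stand-in, not the package's code; the helper names `shear_image` and `score`, the scoring rule (sum of squared column counts) and the toy data are assumptions made for this example, and the criterion in Vinciarelli and Luettin is more selective about which columns count.

```r
# Shear a binary image (1 = ink) horizontally: each row is shifted in
# proportion to its height above the bottom row.
shear_image <- function(img, alpha) {
  nr <- nrow(img); nc <- ncol(img)
  out <- matrix(0, nr, nc)
  for (r in seq_len(nr)) {
    shift <- round(alpha * (nr - r))
    cols  <- seq_len(nc) + shift
    keep  <- cols >= 1 & cols <= nc          # drop pixels sheared out of the image
    out[r, cols[keep]] <- img[r, keep]
  }
  out
}

# Upright strokes concentrate ink in few columns, which makes the sum of
# squared column counts large (a simplified criterion).
score <- function(img) sum(colSums(img)^2)

# Toy slanted stroke (hypothetical data): leans to the right going up
img <- matrix(0, nrow = 5, ncol = 9)
for (r in 1:5) img[r, 3 + (5 - r)] <- 1

alphas  <- seq(-1, 1, by = 0.5)              # candidate shear values
scores  <- sapply(alphas, function(a) score(shear_image(img, a)))
best    <- alphas[which.max(scores)]
upright <- shear_image(img, best)
upright
```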

Usage

ocv_deslant(image, bgcolor = 255, lower_bound = -1, upper_bound = 1)

image_deslant(image, bgcolor = 255, lower_bound = -1, upper_bound = 1)

Arguments

`image`	an object of class opencv-image (for `ocv_deslant`) with pixel values between 0 and 255 or a magick-image (for `image_deslant`)
`bgcolor`	integer value with the background color to use to fill the gaps of the sheared image that is returned. Defaults to white: 255
`lower_bound`	lower bound of shear values. Defaults to -1
`upper_bound`	upper bound of shear values. Defaults to 1

Value

an object of class opencv-image or magick-image with the deslanted image

Examples



library(magick)
library(opencv)
library(image.textlinedetector)
path <- system.file(package = "image.textlinedetector", "extdata", "cursive.png")
img  <- ocv_read(path)
img  <- ocv_grayscale(img)
img
up   <- ocv_deslant(img)
up

img  <- image_read(path)
img
image_deslant(img)

