tabula-sharp

tabula-sharp is a library for extracting tables from PDF files — it is a port of tabula-java

Supports netstandard2.0, net462, net471, net6.0, net8.0
No java bindings

NuGet packages available on the releases page and on www.nuget.org:

Differences with tabula-java

Uses PdfPig, and not PdfBox.
Coordinate system starts from the bottom left point (going up) of the page, and not from the top left point (going down).
The NurminenDetectionAlgorithm is replaced by SimpleNurminenDetectionAlgorithm, because it requieres an image management library.
Table results might be different because of the way PdfPig builds Letters bounding box.

Usage

Stream mode - BasicExtractionAlgorithm

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
	ObjectExtractor oe = new ObjectExtractor(document);
	PageArea page = oe.Extract(1);
	
	// detect canditate table zones
	SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
	var regions = detector.Detect(page);
	
	IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
	List<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox)); // take first candidate area
	var table = tables[0];
	var rows = table.Rows;
}

Lattice mode - SpreadsheetExtractionAlgorithm

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
	ObjectExtractor oe = new ObjectExtractor(document);
	PageArea page = oe.Extract(1);

	IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();
	List<Table> tables = ea.Extract(page);
	var table = tables[0];
	var rows = table.Rows;
}

Name		Name	Last commit message	Last commit date
Latest commit History 202 Commits
.github		.github
Tabula.Csv		Tabula.Csv
Tabula.Json		Tabula.Json
Tabula.Tests		Tabula.Tests
Tabula		Tabula
images		images
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
tabula-sharp.sln		tabula-sharp.sln

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tabula-sharp

Differences with tabula-java

Usage

Stream mode - BasicExtractionAlgorithm

Lattice mode - SpreadsheetExtractionAlgorithm

Results

Stream mode - BasicExtractionAlgorithm

Lattice mode - SpreadsheetExtractionAlgorithm

About

Releases 9

Packages

Languages

License

BobLd/tabula-sharp

Folders and files

Latest commit

History

Repository files navigation

tabula-sharp

Differences with tabula-java

Usage

Stream mode - BasicExtractionAlgorithm

Lattice mode - SpreadsheetExtractionAlgorithm

Results

Stream mode - BasicExtractionAlgorithm

Lattice mode - SpreadsheetExtractionAlgorithm

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 9

Packages 0

Languages

Packages