I had some trouble parsing PDFs in Next.js, so I made this template for anyone facing the same issues. I hope it saves you some time and trouble. It's a basic create-next-app project with PDF parsing implemented using the pdf2json library and file uploading handled by FilePond.
- Clone the repository:
  git clone [repository-url]
- Navigate to the project directory:
  cd nextjs-pdf-parser
- Install dependencies:
  npm install # or yarn install
- Windows only: In app\api\upload\route.ts on line 22, change tempFilePath to a valid path. Make sure it starts from the root drive, for example: C:/coding/nextjs-pdf-parser/public/${fileName}.pdf
- Run the development server:
  npm run dev # or yarn dev
Visit http://localhost:3000 to view the application.
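The Windows-only step in the setup hardcodes an absolute path. A possible alternative, sketched here and not what the template itself does, is to build the temporary path from the OS temp directory with Node's built-in `os` and `path` modules, so the result is absolute on every platform. The helper name `buildTempPdfPath` is hypothetical, and writing to the temp directory instead of `public/` is an assumption about what the route needs.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of the template): builds an absolute path
// for the temporary PDF, by default inside the OS temp directory.
function buildTempPdfPath(fileName: string, baseDir: string = os.tmpdir()): string {
  // path.basename drops any directory components from the supplied name,
  // so the file always lands directly inside baseDir.
  const safeName = path.basename(fileName);
  // path.resolve returns an absolute path using the platform's separators.
  return path.resolve(baseDir, `${safeName}.pdf`);
}

console.log(buildTempPdfPath("example"));
```

If you try this, `tempFilePath` in the upload route would be set from the helper's return value instead of a drive-specific string.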
Navigate to http://localhost:3000 and use the FilePond uploader to select and upload a PDF. Once uploaded, the PDF's content is parsed and printed to the server console (note: it is not printed to the browser console).
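To illustrate the parse-then-print step, here is a minimal sketch of wrapping an event-based parser in a Promise. It assumes the pdf2json event names `pdfParser_dataReady` and `pdfParser_dataError` and the `loadPDF` and `getRawTextContent` methods. The `PdfParserLike` interface, the `parsePdfText` helper, and the `FakeParser` stand-in are all hypothetical names introduced here so the example runs without pdf2json installed. It is not the template's actual route code.

```typescript
import { EventEmitter } from "node:events";

// Minimal structural type for the parser methods used below.
// This interface is an assumption for illustration, not pdf2json's own typing.
interface PdfParserLike {
  on(event: string, listener: (...args: any[]) => void): unknown;
  loadPDF(filePath: string): unknown;
  getRawTextContent(): string;
}

// Hypothetical helper: resolves with the raw text once parsing finishes,
// or rejects if the parser reports an error.
function parsePdfText(parser: PdfParserLike, filePath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    parser.on("pdfParser_dataError", (errData: any) => {
      reject(errData?.parserError ?? errData);
    });
    parser.on("pdfParser_dataReady", () => {
      resolve(parser.getRawTextContent());
    });
    // Register listeners first, then start parsing.
    parser.loadPDF(filePath);
  });
}

// Stand-in parser so the sketch runs on its own; it "parses" instantly.
class FakeParser extends EventEmitter implements PdfParserLike {
  loadPDF(_filePath: string): void {
    this.emit("pdfParser_dataReady", {});
  }
  getRawTextContent(): string {
    return "Hello from a fake PDF";
  }
}

// Logs on the server side, mirroring where the template prints its output.
parsePdfText(new FakeParser(), "example.pdf").then((text) => console.log(text));
```

In a real route you would pass a pdf2json instance in place of `FakeParser` and the saved temp file path in place of `"example.pdf"`.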
- nodeUtil is not defined error: To bypass this error, the following configuration was added to next.config.js:
const nextConfig = {
  experimental: {
    // Keep pdf2json out of the server bundle so it is loaded with Node's native require
    serverComponentsExternalPackages: ['pdf2json'],
  },
};
module.exports = nextConfig;
See more details here
- Blank output from pdfParser.getRawTextContent(): This issue might be due to incorrect type definitions. There are two potential solutions:
  - Fix TypeScript definitions: Update the type definition for PDFParser.
  - Bypass type checking: Instantiate PDFParser as shown:
    const pdfParser = new (PDFParser as any)(null, 1);
For more details, refer to my comment on this GitHub issue.
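A note on the next.config.js workaround for the nodeUtil error: as far as I know, Next.js 15 moved this option out of `experimental` and renamed it `serverExternalPackages`. The sketch below assumes you have upgraded to such a version and are using a TypeScript config file (`next.config.ts`). The template as shipped uses the `experimental` form in next.config.js, so treat this as an untested variant.

```typescript
// next.config.ts (assumes Next.js 15 or later)
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Same intent as experimental.serverComponentsExternalPackages:
  // keep pdf2json out of the server bundle.
  serverExternalPackages: ["pdf2json"],
};

export default nextConfig;
```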
A special thanks to the following libraries and their contributors:
- FilePond: For providing a seamless and user-friendly file uploading experience.
- pdf2json: For its efficient and robust PDF parsing capabilities.
MIT License