This script is a tool that allows you to extract PDF files from books on the Universidad de los Andes platform. It uses the Selenium library for web scraping to access the content of the books and generate the PDF files.

abelarismendy/pdf-extractor

PDF Extractor

This script is a tool that allows you to extract PDF files from books on the Universidad de los Andes platform (http://www.ebooks7-24.com.ezproxy.uniandes.edu.co/). It uses the Selenium library for web scraping to access the content of the books and generate the PDF files.

Requirements

  • Python 3.6 or higher
  • Selenium library
  • Chrome browser
  • ChromeDriver

Setup

  1. Install Python 3.6 or higher.
  2. Install the required libraries by running pip install -r requirements.txt.
  3. Download and install the Chrome browser.
  4. Download the appropriate version of ChromeDriver for your system and place it in the drivers folder.
  5. Update the chrome_path variable in the script with the path to the ChromeDriver executable.
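
Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: only the chrome_path variable name and the drivers folder come from the steps above. The file name chromedriver, the helper names resolve_driver and make_browser, and the use of the Selenium 4 Service API are assumptions.

```python
import os
from pathlib import Path

# Assumed location: step 4 puts the driver in the "drivers" folder.
# The file name is a guess (on Windows it would be chromedriver.exe).
chrome_path = os.path.join("drivers", "chromedriver")


def resolve_driver(path):
    """Return the absolute driver path, or raise if it is missing or not executable."""
    driver = Path(path).expanduser().resolve()
    if not driver.is_file():
        raise FileNotFoundError(f"ChromeDriver not found at {driver}")
    if not os.access(driver, os.X_OK):
        raise PermissionError(f"ChromeDriver at {driver} is not executable")
    return str(driver)


def make_browser(path=chrome_path):
    """Start Chrome through the driver at `path` (hypothetical helper)."""
    # Imported lazily so the path check above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=resolve_driver(path)))
```

Checking the path before starting the browser turns a misplaced driver into a clear error message. Note that the Service class belongs to Selenium 4; older Selenium releases passed the driver path differently.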

Usage

  1. Run the script by executing python3 main.py in the terminal.
  2. A dialog window will prompt you to enter the book id (4 digits). The id is the last 4 digits of the URL shown in the 'URL del ebook' field on the book page. Enter it and click OK.
  3. The script will open the Chrome browser and navigate to the login page for the Universidad de los Andes platform.
  4. If you are already logged in to the platform, the script will proceed to the book page. If you are not logged in, you will need to complete the login form and the 2-step verification process before the script can proceed.
  5. The script will prompt you to enter the first page and the last page of the book that you want to extract. Enter the page numbers and click OK.
  6. Once the script has access to the book page, it will start extracting the PDF files and saving them to the pdf folder in the script's directory.
  7. After all pages have been processed, the script will merge the single-page PDFs into one PDF file in the books folder. The file is named after the book code.
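
The input checks and the final merge described in the steps above could look something like the sketch below. It is not taken from the script. The function names, the per-page file naming scheme (1.pdf, 2.pdf, ...), and the choice of the pypdf library for merging are all assumptions made for illustration; only the pdf and books folder roles and the 4-digit book id come from the steps.

```python
import re
from pathlib import Path


def parse_book_id(text):
    """Accept exactly four digits, as the dialog in step 2 asks for."""
    text = text.strip()
    if not re.fullmatch(r"[0-9]{4}", text):
        raise ValueError(f"book id must be 4 digits, got {text!r}")
    return text


def parse_page_range(first, last):
    """Return (first, last) as ints, requiring 1 <= first <= last."""
    first, last = int(first), int(last)
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {first}-{last}")
    return first, last


def ordered_pages(pdf_dir):
    """List single-page PDFs sorted by page number (assumes names like 12.pdf)."""
    pages = [p for p in Path(pdf_dir).glob("*.pdf")
             if re.fullmatch(r"[0-9]+", p.stem)]
    return sorted(pages, key=lambda p: int(p.stem))


def merge_pages(pdf_dir, books_dir, book_id):
    """Merge the page files into <books_dir>/<book_id>.pdf (hypothetical helper)."""
    # pypdf is an assumption; imported lazily so the helpers above
    # need only the standard library.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for page in ordered_pages(pdf_dir):
        writer.append(str(page))
    target = Path(books_dir) / f"{book_id}.pdf"
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as handle:
        writer.write(handle)
    return target
```

Sorting by the numeric value of the file name matters here: a plain alphabetical sort would place 10.pdf before 2.pdf and scramble the merged book.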

Disclaimer

This script is intended for educational use only and should not be used for any illegal or unauthorized purposes. The content generated by this script is for personal or educational use only and should not be distributed or copied without the express permission of the copyright holders. Although this script may be useful for accessing the content of books, you are encouraged to access that content through the official channels.

The author of this script does not endorse or condone any illegal or unauthorized use of this script, and is not responsible for any misuse or damages caused by using this script. This script is provided "as is" without warranty of any kind, either express or implied. The author shall not be liable for any damages resulting from the use of this script.

By using this script, you confirm that you have read and understood this disclaimer and agree to be bound by its terms. If you do not agree to the terms of this disclaimer, you are not authorized to use this script.
