- Clone the repo
git clone https://github.com/Zeeshanahmad4/NLP--Data-extraction-Microsoft-Word-documents-into-a-CSV.git
- Install Python packages
pip install docx docxpy
This project extracts data from Microsoft Word documents into a CSV file or an Excel file. The input is about 250 Word documents, each structured as a Microsoft Word table with repeating titles. All documents are in French, and both the text and the hyperlinks in each document must be kept in the extracted output.
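As a rough illustration of the extraction described above, the sketch below reads the table rows of one .docx file and writes them as CSV. It uses only the Python standard library, not the docx / docxpy packages this repository installs, and it is not necessarily how scriptpyfile.py works. It assumes each table row holds a title cell followed by a value cell. The names `read_table_rows` and `write_csv` and the CSV column names are hypothetical.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside a .docx archive, in ElementTree's {uri} form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def cell_text(cell):
    """Join the text runs of each paragraph in a table cell, one line per paragraph."""
    paragraphs = (
        "".join(t.text or "" for t in p.iter(W + "t")) for p in cell.iter(W + "p")
    )
    return "\n".join(paragraphs).strip()


def read_table_rows(docx_file):
    """Return (title, text, links) for every table row with at least two cells.

    Assumption: the first cell is the repeated title, the second is its value.
    """
    with zipfile.ZipFile(docx_file) as archive:
        body = ET.fromstring(archive.read("word/document.xml"))
        try:
            rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
        except KeyError:
            rels = None  # no relationships part, so no external hyperlinks

    # Map relationship ids (for example "rId1") to hyperlink target URLs.
    targets = {}
    if rels is not None:
        for rel in rels.iter(REL + "Relationship"):
            targets[rel.get("Id")] = rel.get("Target")

    rows = []
    for row in body.iter(W + "tr"):
        cells = row.findall(W + "tc")
        if len(cells) < 2:
            continue
        # Internal anchors have no relationship id and are skipped.
        links = [
            targets[h.get(R + "id")]
            for h in cells[1].iter(W + "hyperlink")
            if h.get(R + "id") in targets
        ]
        rows.append((cell_text(cells[0]), cell_text(cells[1]), links))
    return rows


def write_csv(rows, out):
    """Write the rows to an open text file, with links joined by spaces."""
    writer = csv.writer(out)
    writer.writerow(["title", "text", "links"])
    for title, text, links in rows:
        writer.writerow([title, text, " ".join(links)])
```

To process a folder of documents, one could call `read_table_rows` on each path and pass the combined rows to `write_csv` with a file opened as `open("output.csv", "w", newline="", encoding="utf-8-sig")`. The `utf-8-sig` encoding writes a byte order mark, which helps Excel display French accented characters correctly.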
- scriptpyfile.py
- script.ipynb
- output.csv
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.