
JavaScript element interaction web scraping script for data analysis


prompto416/IM-sheets-automation


Initial-margin-rate-auto-update-script-project-API

NOTE: This project is practical work I created for my company, hence confidential information has been left empty or blurred.

The script scrapes initial margin rate data from the Stock Exchange of Thailand (SET) website and from Excel files, writes the new data to a target Google Sheet via the Google API, and then sends a notification to LINE via the LINE Notify API.

Additionally, the script includes a function that downloads the Google Sheet as a PDF and a function that converts the PDF to JPG for previewing in the LINE messaging app.

1) Scraping the download link from a hidden HTML element by running Selenium WebDriver so that JavaScript-rendered elements can be found

Scraping a hidden or JavaScript-rendered element isn't simple, because the element is not present in the raw HTML the server returns. A workaround is to use WebDriver to actually run the website in a browser so the JavaScript executes; once the element has been rendered, it can be scraped from the page source. Note that you should give the WebDriver enough time to load the page; otherwise the function will return an empty result, because the code ran before the website finished loading.
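As an alternative to fixed `time.sleep()` calls, a script can poll until the element appears or a timeout expires. Selenium provides this as `WebDriverWait(driver, timeout).until(...)` combined with `expected_conditions.presence_of_element_located`; the minimal sketch below shows the same polling idea in plain Python so it can run without a browser. `wait_until` is a hypothetical helper, not part of the original project.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> Optional[T]:
    """Hypothetical helper: call `probe` until it returns a truthy value.

    Returns the first truthy result, or None once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            # The element (or whatever the probe looks for) is ready
            return result
        if time.monotonic() >= deadline:
            # Gave up: the page never produced what we were waiting for
            return None
        time.sleep(interval)
```

With Selenium itself, the equivalent would be `WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CLASS_NAME, "rules-books-render-recursive")))`, which returns as soon as the element exists instead of always sleeping for the full delay.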

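import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

def scrapeHTML_string():
    # Raw string so the backslashes in the Windows path are kept literally
    PATH = r'C:\Program Files (x86)\chromedriver.exe'
    s = Service(PATH)
    options = webdriver.ChromeOptions()
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    # Run Chrome without a visible window (options.headless is deprecated in Selenium 4)
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options, service=s)
    url = ("https://www.set.or.th/th/tch/rules-regulations/regulations?fbclid=IwAR06NR4BDsK_1Sl-6QzyzHHuW-sHpgbE8uo6dtF0qGx6Udwo1eolYEvHnRM#noti-margin-rate-2022")
    try:
        driver.get(url)
        # Give the page's JavaScript time to render the hidden element
        time.sleep(3)
        # Parse the rendered HTML (not the raw server response)
        soup = BeautifulSoup(driver.page_source, 'lxml')
        elements = soup.find(class_="rules-books-render-recursive")
        elements_string = str(elements)

        # If the element was not found, str(None) is just "None",
        # so a very short string means the page had not finished loading
        if len(elements_string) > 100:
            # f = open('cacheDebug.txt','w')
            # f.write(elements_string)
            # f.close()
            # print('finished writing file')
            return elements_string
        else:
            print('Error: None Type Detected!')
            return None
    finally:
        # Always close the browser, even after an early return
        driver.quit()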
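`scrapeHTML_string()` returns the markup of the `rules-books-render-recursive` block as a string, so the download links still have to be pulled out of it. The sketch below does that with the standard library's `html.parser` instead of BeautifulSoup; it assumes the margin-rate files are linked as `<a href="...">` tags ending in `.xls` or `.xlsx`, and `ExcelLinkParser` / `extract_excel_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ExcelLinkParser(HTMLParser):
    """Hypothetical parser: collect href values of <a> tags that point at Excel files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: the margin-rate files end in .xls or .xlsx (query string ignored)
        if href and href.lower().split("?")[0].endswith((".xls", ".xlsx")):
            # Turn relative links into absolute URLs
            self.links.append(urljoin(self.base_url, href))


def extract_excel_links(html: str, base_url: str = "https://www.set.or.th") -> list[str]:
    """Return absolute URLs of all Excel links found in `html`, in document order."""
    parser = ExcelLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Usage: `links = extract_excel_links(scrapeHTML_string() or "")` gives a list of candidate download URLs that the next step can fetch.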

2) Downloading the Excel file, then reading its data and writing it to Google Sheets, either by sending API requests directly or by using a client library that wraps those requests

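NOTE: Credentials from the Google Cloud Console are required (writing to a sheet needs OAuth or service-account credentials; an API key alone only allows reading public data).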
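A minimal sketch of the write step, assuming the Excel rows have already been read into a list of lists and that `service` comes from `googleapiclient.discovery.build("sheets", "v4", credentials=...)`. It uses the Sheets API v4 `spreadsheets.values.update` method; the helper names, the sheet name, and the starting row are hypothetical, not taken from the original script.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def a1_range(sheet: str, rows: list, start_row: int = 1, start_col: int = 1) -> str:
    """Build an A1 range such as 'Sheet1'!A2:B4 that exactly fits `rows`."""
    width = max(len(r) for r in rows)
    end_row = start_row + len(rows) - 1
    end_col = start_col + width - 1
    quoted = sheet.replace("'", "''")  # single quotes in sheet names are doubled
    return f"'{quoted}'!{column_letter(start_col)}{start_row}:{column_letter(end_col)}{end_row}"


def write_rows(service, spreadsheet_id: str, sheet: str, rows: list, start_row: int = 1):
    """Hypothetical helper: overwrite a block of cells via spreadsheets.values.update."""
    return (
        service.spreadsheets()
        .values()
        .update(
            spreadsheetId=spreadsheet_id,
            range=a1_range(sheet, rows, start_row),
            # USER_ENTERED lets Sheets parse "50%" as a percentage, like typing it in
            valueInputOption="USER_ENTERED",
            body={"values": rows},
        )
        .execute()
    )
```

Computing the range from the data keeps the write confined to exactly the cells being replaced, so neighbouring columns in the sheet are left untouched.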

Built-in Function: Auto Image Editor

[Figure: Before / After]

Built-in Function: Downloading the sheet as PDF and converting PDF to JPG (sample)
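One common way to download a single tab as PDF is the spreadsheet's export URL, fetched with an authorized session. This endpoint is widely used but is not part of the documented Sheets API; the Drive API v3 `files.export` method with `mimeType="application/pdf"` is the documented alternative. For the JPG step, the sketch assumes the third-party `pdf2image` package (which needs Poppler installed); this is an assumed tool choice, not necessarily what the original script uses, and both function names are hypothetical.

```python
from urllib.parse import urlencode


def sheet_pdf_export_url(spreadsheet_id: str, gid: int) -> str:
    """Hypothetical helper: build the PDF export URL for one tab of a Google Sheet.

    `gid` is the numeric tab id shown as `#gid=...` in the sheet's browser URL.
    Assumption: relies on the widely used, not formally documented /export endpoint.
    """
    params = {"format": "pdf", "gid": gid}
    return f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?{urlencode(params)}"


def pdf_to_jpgs(pdf_path: str, out_prefix: str, dpi: int = 150) -> list:
    """Hypothetical helper: save each PDF page as <out_prefix>_<n>.jpg and return the paths.

    Assumption: pdf2image (plus Poppler) is installed; imported lazily so the
    URL helper above works without it.
    """
    from pdf2image import convert_from_path

    paths = []
    for n, page in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        path = f"{out_prefix}_{n}.jpg"
        page.save(path, "JPEG")  # each page is a PIL image
        paths.append(path)
    return paths
```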

LINE Notify request syntax

import requests

url = 'https://notify-api.line.me/api/notify'
# Your LINE Notify access token
token = ''
headers = {
    'content-type': 'application/x-www-form-urlencoded',
    'Authorization': 'Bearer ' + token,
}
# Image to send; it must be reachable at a public HTTPS URL (e.g. uploaded to Imgur)
imageurl = 'https://media.discordapp.net/attachments/938633479950827520/1014175836242464891/unknown.png'
while True:
    msg = input("Enter your message (leave blank to quit): ")
    if not msg:
        break
    # Send the typed message together with the image
    r = requests.post(url, headers=headers, data={'message': msg, 'imageThumbnail': imageurl, 'imageFullsize': imageurl})
    # Text-only variant:
    # r = requests.post(url, headers=headers, data={'message': msg})
    print(r.text)
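A minimal sketch that separates building the form fields from sending them, so the payload can be validated before any request is made. It assumes LINE Notify's documented rules that `message` is required and limited to 1000 characters; the function names are hypothetical, and it uses the standard library's `urllib` instead of `requests` to stay dependency-free. Note that LINE announced LINE Notify would end service on 31 March 2025, so the endpoint may no longer accept requests.

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional

NOTIFY_URL = "https://notify-api.line.me/api/notify"
MAX_MESSAGE_CHARS = 1000  # LINE Notify's documented limit for `message`


def build_notify_payload(message: str, image_url: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build and validate the form fields for one notification."""
    if not message.strip():
        raise ValueError("LINE Notify requires a non-empty message")
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message longer than {MAX_MESSAGE_CHARS} characters")
    payload = {"message": message}
    if image_url:
        # Same URL used for both the thumbnail and the full-size preview
        payload["imageThumbnail"] = image_url
        payload["imageFullsize"] = image_url
    return payload


def send_notify(token: str, payload: Dict[str, str]) -> str:
    """Hypothetical helper: POST the payload as a form and return the response body."""
    data = urllib.parse.urlencode(payload).encode("utf-8")
    req = urllib.request.Request(
        NOTIFY_URL,
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Usage: `send_notify(token, build_notify_payload("Initial margin rates updated", imageurl))`.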
