Trouble scraping dates in a custom format from tabular content

I've written a script in Python, in combination with Selenium, to parse some dates from a table on a webpage. The table sits under the header NPL Victoria Betting Odds, and the tabular data are inside the table whose id is tournamentTable. You can see three dates there: 10 Aug 2018, 11 Aug 2018 and 12 Aug 2018. I want to parse them and pair each one with the times listed under it, as shown in my expected output below.

Link to the webpage

This is my attempt so far:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from bs4 import BeautifulSoup

    link = "find the link above"

    def get_content(driver,url):
        driver.get(url)
        for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr"))):
            try:
                idate = items.find_element_by_css_selector("th span[class^='datet']").text
            except Exception:
                idate = ""
            try:
                itime = items.find_element_by_css_selector("td.table-time").text
            except Exception:
                itime = ""
            print(f'{idate}--{itime}')

    if __name__ == '__main__':
        driver = webdriver.Chrome()
        wait = WebDriverWait(driver,10)
        try:
            get_content(driver,link)
        finally:
            driver.quit()

The output I'm currently getting:

    --
    10 Aug 2018--
    --
    --09:30
    --10:15
    11 Aug 2018--
    --
    --05:00
    --05:00
    --09:00
    12 Aug 2018--
    --
    --06:00
    --06:00

My expected output:

    10 Aug 2018--09:30
    10 Aug 2018--10:15
    11 Aug 2018--05:00
    11 Aug 2018--05:00
    11 Aug 2018--09:00
    12 Aug 2018--06:00
    12 Aug 2018--06:00

Try using the following code:

    def get_content(driver,url):
        driver.get(url)
        dates = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr.center.nob-border"))))
        for d in range(dates):
            item = driver.find_elements_by_css_selector("#tournamentTable tr.center.nob-border")[d]
            try:
                idate = item.find_element_by_css_selector("th span[class^='datet']").text
            except Exception:
                idate = ""
            for time_td in item.find_elements_by_xpath(".//following::td[contains(@class, 'table-time') and not((preceding::tr[@class='center nob-border'])[%d])]" % (d + 2)):
                try:
                    itime = time_td.text
                except Exception:
                    itime = ""
                print(f'{idate}--{itime}')

I'm not using Selenium, but the dates you want can be extracted with BeautifulSoup alone. Each date and time is encoded as a Unix timestamp inside the class names of the tags:

    from bs4 import BeautifulSoup
    import requests
    import re
    import datetime

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}

    r = requests.get('http://www.oddsportal.com/soccer/australia/npl-victoria/', headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')

    for td in soup.select('table#tournamentTable td.datet'):
        for c in td['class']:
            # the class that looks like t<digits>... carries the Unix timestamp
            if re.match(r't\d+', c):
                unix_timestamp = int(re.match(r't(\d+)', c)[1])
                d = datetime.datetime.utcfromtimestamp(unix_timestamp).strftime('%d %b %Y--%H:%M')
                print(d)

Prints:

    10 Aug 2018--09:30
    10 Aug 2018--10:15
    11 Aug 2018--05:00
    11 Aug 2018--05:00
    11 Aug 2018--09:00
    12 Aug 2018--06:00
    12 Aug 2018--06:00

If you also want the matches printed:

    for td in soup.select('table#tournamentTable td.datet'):
        for c in td['class']:
            if re.match(r't\d+', c):
                unix_timestamp = int(re.match(r't(\d+)', c)[1])
                d = datetime.datetime.utcfromtimestamp(unix_timestamp).strftime('%d %b %Y--%H:%M')
                print(d, end=' ')
                # the next cell after the time cell holds the match name
                print(td.find_next('td').text)

Prints:

    10 Aug 2018--09:30 Melbourne Knights - Port Melbourne Sharks
    10 Aug 2018--10:15 Pascoe Vale - Dandenong Thunder
    11 Aug 2018--05:00 Avondale FC - Bentleigh Greens
    11 Aug 2018--05:00 Northcote City - Bulleen
    11 Aug 2018--09:00 Hume City - Oakleigh Cannons
    12 Aug 2018--06:00 Heidelberg Utd - Green Gully
    12 Aug 2018--06:00 South Melbourne - Kingston City
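The timestamp-decoding step in this answer can be tried without fetching the page. Below is a minimal sketch, assuming the class list of a time cell looks something like `['table-time', 'datet', 't1533893400-1-1-1-1']`; that exact list and the helper name `decode_class_timestamp` are made up for illustration, and only the `t<digits>` prefix is taken from the regex above. It uses the timezone-aware `datetime.fromtimestamp(..., tz=timezone.utc)`, since `utcfromtimestamp` is deprecated as of Python 3.12.

```python
import re
from datetime import datetime, timezone

# A class token that starts with "t" followed by digits carries the timestamp.
TIMESTAMP_CLASS = re.compile(r"t(\d+)")


def decode_class_timestamp(class_names, fmt="%d %b %Y--%H:%M"):
    """Return the first timestamp-like class formatted in UTC, or None.

    `class_names` is a list of CSS class tokens (hypothetical input shape,
    mirroring what BeautifulSoup returns for td['class']).
    """
    for name in class_names:
        match = TIMESTAMP_CLASS.match(name)
        if match:
            moment = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            return moment.strftime(fmt)
    return None


# Hypothetical class list for one time cell.
print(decode_class_timestamp(["table-time", "datet", "t1533893400-1-1-1-1"]))
```

Note that `%b` follows the process locale, so the month abbreviation is only guaranteed to be English under the default C locale.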

The problem is that each row contains either an idate or an itime, never both, so you're overwriting one of them with an empty string on every iteration.

Change your excepts so they no longer reset the date, and it prints fine for me:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from bs4 import BeautifulSoup

    link = "http://www.oddsportal.com/soccer/australia/npl-victoria/"

    def get_content(driver,url):
        driver.get(url)
        idate = ''
        for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr"))):
            # reset the time on every row so a stale time is never paired with a new date
            itime = ''
            try:
                idate = items.find_element_by_css_selector("th span[class^='datet']").text
            #except Exception: idate = ""
            except:
                pass  # no date in this row: keep the last date seen
            try:
                itime = items.find_element_by_css_selector("td.table-time").text
                # print('itime: ',itime)
            # except Exception: itime = ""
            except:
                pass
            if idate !='' and itime !='':
                print(f'{idate}--{itime}')

    if __name__ == '__main__':
        driver = webdriver.Chrome()
        wait = WebDriverWait(driver,10)
        try:
            get_content(driver,link)
        finally:
            driver.quit()
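The idea in this answer boils down to remembering the last date seen while walking the rows. Here is a minimal sketch of that carry-forward logic, detached from Selenium so it can be run on its own. The function name `pair_dates_with_times` and the `(date_text, time_text)` tuples are assumptions for illustration, standing in for the text pulled out of each `tr`; the sample rows are the first few lines of the question's current output.

```python
def pair_dates_with_times(rows):
    """Pair each time with the most recent date seen above it.

    `rows` is an iterable of (date_text, time_text) tuples, one per table
    row; either element may be an empty string (hypothetical input shape).
    """
    pairs = []
    current_date = ""
    for date_text, time_text in rows:
        if date_text:
            # A date header row: remember it for the rows that follow.
            current_date = date_text
        if current_date and time_text:
            pairs.append(f"{current_date}--{time_text}")
    return pairs


# Rows shaped like the question's current output: date rows carry no time,
# time rows carry no date, and some rows carry neither.
rows = [
    ("", ""),
    ("10 Aug 2018", ""),
    ("", ""),
    ("", "09:30"),
    ("", "10:15"),
    ("11 Aug 2018", ""),
    ("", ""),
    ("", "05:00"),
]

for line in pair_dates_with_times(rows):
    print(line)
```

Because the time comes from the current row only, a date header row never gets paired with a time left over from the previous row.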