How do I scrape this web page with Python and lxml? Empty list returned

For educational purposes, I am trying to scrape this page step by step with Python and lxml, starting with the movie names.

From what I have read so far in the Python docs on lxml and the W3Schools pages on XPath, this code should give me all the movie titles in a list:

    from lxml import html
    import requests

    page = requests.get('http://www.rottentomatoes.com/browse/dvd-top-rentals/')
    tree = html.fromstring(page.text)
    # Text of every <h3> whose class attribute is exactly "movieTitle"
    movies = tree.xpath('//h3[@class="movieTitle"]/text()')
    print(movies)

Basically, it should give me every h3 element anywhere in the document whose class attribute has the value "movieTitle". However, when I run the code, it just prints an empty list.

I can't figure out why.

I experimented a bit on my own, so I ran:

    movies = tree.xpath('//h3[@class]/text()')
    print(movies)

This one should return the text of any h3 that has a class attribute, but instead it returns this list:

 ['From RT Users Like You!', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '] 

I tried targeting the first string in this list by its class value ("noSpacing center"), and that returned the single string successfully. So I'm sure there is something I don't understand about how lxml / XPath work. Can anyone point me in a useful direction? Thanks in advance!

The information at http://www.rottentomatoes.com/browse/dvd-top-rentals/ is not present in the page's HTML itself; it is loaded afterwards through XMLHttpRequests.
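A quick way to confirm this kind of diagnosis is to check whether text you can see in the browser occurs anywhere in the raw response. The following is only a sketch: `raw_html` is an invented stand-in for `page.text`, and `missing_from_source` is a hypothetical helper, not part of requests or lxml.

```python
def missing_from_source(raw_html, expected_texts):
    """Return the expected strings that do not occur in the raw HTML."""
    return [text for text in expected_texts if text not in raw_html]


# Invented markup standing in for the server response (assumption):
# the static heading is present, the movie list is filled in later by JavaScript.
raw_html = """<html><body>
<h3 class="heading">From RT Users Like You!</h3>
<div id="content"></div>
<script src="app.js"></script>
</body></html>"""

missing = missing_from_source(raw_html, ["From RT Users Like You!", "Inside Out"])
print(missing)  # ['Inside Out']
```

If a title is reported as missing, no XPath expression run against that response can find it, and the data has to come from somewhere else.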

The API you are looking for seems to be:

http://d3biamo577v4eu.cloudfront.net/api/private/v1.0/m/list/find?page=1&limit=30&type=dvd-top-rentals&services=amazon%3Bamazon_prime%3Bflixster%3Bhbo_go%3Bitunes%3Bnetflix_iw%3Bvudu&sortBy=popularity

The query string is built from whichever filters are selected.

So you need to send your requests to that endpoint (instead of the URL you are requesting now) and parse the JSON to extract the data you want.
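As a minimal sketch of the parsing step, here is how titles could be pulled out of such a response. The `sample_body` string is a trimmed, hand-written stand-in for the real response (an assumption); only the `results`, `id`, `title` and `tomatoScore` keys used elsewhere in this answer appear in it, and `extract_titles` is a hypothetical helper name.

```python
import json

# Hand-written stand-in for the response body (assumption, heavily trimmed).
sample_body = """
{"results": [
    {"id": 771306118, "title": "Inside Out", "tomatoScore": 98},
    {"id": 771312080, "title": "Vacation", "tomatoScore": 26}
]}
"""


def extract_titles(body):
    """Parse a JSON body and return the title of each entry in 'results'."""
    data = json.loads(body)
    # .get() with defaults keeps this from raising if a key is absent
    return [movie.get("title") for movie in data.get("results", [])]


titles = extract_titles(sample_body)
print(titles)  # ['Inside Out', 'Vacation']
```

With requests, `response.json()` does the `json.loads` step for you.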

Change the "page" GET parameter to fetch the following pages.
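One way to vary the page number without editing the URL by hand is to build the query string from its parts. This is a sketch that assumes the endpoint keeps accepting the same parameters as the URL above; `build_list_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://d3biamo577v4eu.cloudfront.net/api/private/v1.0/m/list/find"
# Service names taken from the query string shown above
SERVICES = ["amazon", "amazon_prime", "flixster", "hbo_go",
            "itunes", "netflix_iw", "vudu"]


def build_list_url(page, limit=30):
    """Build the list URL for one page; urlencode escapes ';' as %3B."""
    params = {
        "page": page,
        "limit": limit,
        "type": "dvd-top-rentals",
        "services": ";".join(SERVICES),
        "sortBy": "popularity",
    }
    return BASE_URL + "?" + urlencode(params)


# URLs for the first three pages
urls = [build_list_url(page) for page in range(1, 4)]
for url in urls:
    print(url)
```

Each URL can then be passed to `requests.get` in a loop, stopping once a page comes back with no results.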

Example with cURL + jq:

    ➜ ~ curl -s http://d3biamo577v4eu.cloudfront.net/api/private/v1.0/m/list/find\?page\=1\&limit\=30\&type\=dvd-top-rentals\&services\=amazon%3Bamazon_prime%3Bflixster%3Bhbo_go%3Bitunes%3Bnetflix_iw%3Bvudu\&sortBy\=popularity | jq '.results[].title'
    "Inside Out"
    "Vacation"
    "The End Of The Tour"
    "She's Funny That Way"
    "Best Of Enemies"
    "Before We Go"
    "Pixels"
    "The Gift"
    "Southpaw"
    "Max"
    "Jurassic World"
    "San Andreas"
    "Paper Towns"
    "Dragon Ball Z: Resurrection 'F'"
    "Testament Of Youth"
    "The Wolfpack"
    "Z For Zachariah"
    "Tomorrowland"
    "Terminator Genisys"
    "Dope"
    "The Gallows"
    "Magic Mike XXL"
    "Insidious: Chapter 3"
    "Me and Earl and the Dying Girl"
    "Particle Fever"
    "Dark Places"
    "What We Did on Our Holiday"
    "Avengers: Age of Ultron"
    "Spy"
    "Poltergeist"

Example with Python + Requests:

    import requests

    URL = "http://d3biamo577v4eu.cloudfront.net/api/private/v1.0/m/list/find?page=2&limit=30&type=dvd-top-rentals&services=amazon%3Bamazon_prime%3Bflixster%3Bhbo_go%3Bitunes%3Bnetflix_iw%3Bvudu&sortBy=popularity"

    response = requests.get(URL)

    for movie in response.json()['results']:
        movie_id = movie.get('id')
        title = movie.get('title')
        # Fall back to empty values so a missing field does not raise
        synopsis = movie.get('synopsis') or ''
        actors = ', '.join(movie.get('actors') or [])
        tomato_score = movie.get('tomatoScore')
        popcorn_score = movie.get('popcornScore')
        mpaa_rating = movie.get('mpaaRating')
        runtime = movie.get('runtime')

        print("Id: {}".format(movie_id))
        print("Title: {}".format(title))
        print("Synopsis: {}".format(synopsis))
        print("Actors: {}".format(actors))
        print("Rating: {}".format(mpaa_rating))
        print("Runtime: {}".format(runtime))
        print("Scoring:")
        print("Tomato score: {}".format(tomato_score))
        print("PopCorn score: {}".format(popcorn_score))
        print("")


    Id: 771306118
    Title: Inside Out
    Synopsis: Inventive, gorgeously animated, and powerfully moving, Inside Out is another outstanding addition to the Pixar library of modern animated classics.
    Actors: Amy Poehler, Bill Hader, Lewis Black
    Rating: PG
    Runtime: 1 hr. 34 min.
    Scoring:
    Tomato score: 98
    PopCorn score: 90

    Id: 771312080
    Title: Vacation
    Synopsis: Borrowing a basic storyline from the film that inspired it but forgetting the charm, wit, and heart, Vacation is yet another nostalgia-driven retread that misses the mark.
    Actors: Ed Helms, Christina Applegate, Leslie Mann
    Rating: R
    Runtime: 1 hr. 39 min.
    Scoring:
    Tomato score: 26
    PopCorn score: 53

    Id: 771411936
    Title: The End Of The Tour
    Synopsis: Brilliantly performed and smartly unconventional, The End of the Tour pays fitting tribute to a singular talent while offering profoundly poignant observations on the human condition.
    Actors: Jesse Eisenberg, Jason Segel, Anna Chlumsky
    Rating: R
    Runtime: 1 hr. 45 min.
    Scoring:
    Tomato score: 92
    PopCorn score: 88

    Id: 771402562
    Title: She's Funny That Way
    Synopsis: She's Funny That Way is an affectionate, talent-filled throwback to screwball comedies of old -- which makes it even more frustrating that the laughs are disappointingly few and far between.
    Actors: Imogen Poots, Owen Wilson, Jennifer Aniston
    Rating: R
    Runtime: 1 hr. 33 min.
    Scoring:
    Tomato score: 38
    PopCorn score: 38

    Id: 771412027
    Title: Best Of Enemies
    Synopsis: Smart, fascinating, and funny, Best of Enemies takes a penetrating -- and wildly entertaining -- look back at the dawn of pundit politics.
    Actors: Dick Cavett, Gore Vidal, John Lithgow
    Rating: R
    Runtime: 1 hr. 27 min.
    Scoring:
    Tomato score: 94
    PopCorn score: 94

    Id: 771387770
    Title: Before We Go
    Synopsis: BEFORE WE GO, the directorial debut of Chris Evans, follows the journey of two strangers stuck in New York City for the night. Starting as convenient acquaintances, the two soon grow into each other's most trusted confidants when a night of unexpected adventure forces them to confront their fears and take control of their lives. (C) Radius-TWC
    Actors: Alice Eve, Chris Evans, Daniel Spink
    Rating: PG-13
    Runtime: 1 hr. 29 min.
    Scoring:
    Tomato score: 25
    PopCorn score: 64

    Id: 771263974
    Title: Pixels
    Synopsis: Much like the worst arcade games from the era that inspired it, Pixels has little replay value and is hardly worth a quarter.
    Actors: Adam Sandler, Peter Dinklage, Kevin James
    Rating: PG-13
    Runtime: 1 hr. 46 min.
    Scoring:
    Tomato score: 17
    PopCorn score: 51

    Id: 771415974
    Title: The Gift
    Synopsis: The Gift is wickedly smart and playfully subversive, challenging the audience's expectations while leaving them leaning on the edges of their seats.
    Actors: Jason Bateman, Rebecca Hall, Joel Edgerton
    Rating: R
    Runtime: 1 hr. 48 min.
    Scoring:
    Tomato score: 93
    PopCorn score: 78

    Id: 771258788
    Title: Southpaw
    Synopsis: Jake Gyllenhaal delivers an impressively committed performance, but Southpaw beats it down with a dispiriting drama that pummels viewers with genre clichés.
    Actors: Jake Gyllenhaal, Forest Whitaker, Rachel McAdams
    Rating: R
    Runtime: 2 hr. 3 min.
    Scoring:
    Tomato score: 59
    PopCorn score: 79

    Id: 771379965
    Title: Max
    Synopsis: Max has good intentions and tries to hearken back to classic family-friendly features, but its disjointed, manipulative plot overwhelms the efforts of its talented human and canine stars.
    Actors: Josh Wiggins, Lauren Graham, Thomas Haden Church
    Rating: PG
    Runtime: 1 hr. 51 min.
    Scoring:
    Tomato score: 35
    PopCorn score: 73

    Id: 771324839
    Title: Jurassic World
    Synopsis: Jurassic World can't match the original for sheer inventiveness and impact, but it works in its own right as an entertaining -- and visually dazzling -- popcorn thriller.
    Actors: Chris Pratt, Bryce Dallas Howard, Vincent D'Onofrio
    Rating: PG-13
    Runtime: 2 hr. 3 min.
    Scoring:
    Tomato score: 71
    PopCorn score: 80

    Id: 771374432
    Title: San Andreas
    Synopsis: San Andreas has a great cast and outstanding special effects, but amidst all the senses-shattering destruction, the movie's characters and plot prove less than structurally sound.
    Actors: Dwayne "The Rock" Johnson, Carla Gugino, Alexandra Daddario
    Rating: PG-13
    Runtime: 1 hr. 54 min.
    Scoring:
    Tomato score: 50
    PopCorn score: 56

    Id: 771385882
    Title: Paper Towns
    Synopsis: Paper Towns isn't as deep or moving as it wants to be, yet it's still earnest, well-acted, and thoughtful enough to earn a place in the hearts of teen filmgoers of all ages.
    Actors: Nat Wolff, Cara Delevingne, Halston Sage
    Rating: PG-13
    Runtime: 1 hr. 49 min.
    Scoring:
    Tomato score: 55
    PopCorn score: 53

    Id: 771419940
    Title: Dragon Ball Z: Resurrection 'F'
    Synopsis: Even the complete obliteration of his physical form can't stop the galaxy's most evil overlord. After years in spiritual purgatory, Frieza has been resurrected and plans to take his revenge on the Z-Fighters of Earth. Facing off against Frieza's powerful new form, and his army of 1,000 soldiers, Goku and Vegeta must reach new levels of strength in order to protect Earth from their vengeful nemesis.
    Actors: Koichi Yamadera, Todd Haberkorn, Sean Schemmel
    Rating: Unrated
    Runtime: 1 hr. 33 min.
    Scoring:
    Tomato score: 75
    PopCorn score: 86

    Id: 771385848
    Title: Testament Of Youth
    Synopsis: Testament of Youth is well-acted and beautifully filmed, adding up to an enriching if not adventurous experience for fans of British period dramas.
    Actors: Alicia Vikander, Kit Harington, Dominic West
    Rating: PG-13
    Runtime: 2 hr. 9 min.
    Scoring:
    Tomato score: 81
    PopCorn score: 79

    Id: 771412119
    Title: The Wolfpack
    Synopsis: Offering a unique look at modern fears and our fascination with film, The Wolfpack is a fascinating -- and ultimately haunting -- urban fable.
    Actors: Visnu Angulo, Susanne Angulo, Oscar Angulo
    Rating: R
    Runtime: 1 hr. 20 min.
    Scoring:
    Tomato score: 84
    PopCorn score: 73

    Id: 771408592
    Title: Z For Zachariah
    Synopsis: Z for Zachariah wrings compelling drama out of its simplistic premise -- albeit at a pace that may test the patience of less contemplative viewers.
    Actors: Chiwetel Ejiofor, Chris Pine, Margot Robbie
    Rating: PG-13
    Runtime: 1 hr. 35 min.
    Scoring:
    Tomato score: 77
    PopCorn score: 47

    Id: 771306778
    Title: Tomorrowland
    Synopsis: Ambitious and visually stunning, Tomorrowland is unfortunately weighted down by uneven storytelling.
    Actors: George Clooney, Hugh Laurie, Brittany Robertson
    Rating: PG
    Runtime: 1 hr. 47 min.
    Scoring:
    Tomato score: 50
    PopCorn score: 52

    Id: 771359910
    Title: Terminator Genisys
    Synopsis: Mired in its muddled mythology, Terminator: Genisys is a lurching retread that lacks the thematic depth, conceptual intelligence, or visual thrills that launched this once-mighty franchise.
    Actors: Emilia Clarke, Jason Clarke, Jai Courtney
    Rating: PG-13
    Runtime: 1 hr. 59 min.
    Scoring:
    Tomato score: 26
    PopCorn score: 59

    Id: 771412133
    Title: Dope
    Synopsis: Featuring a starmaking performance from Shameik Moore and a refreshingly original point of view from writer-director Rick Famuyiwa, Dope is smart, insightful entertainment.
    Actors: Shameik Moore, Kiersey Clemons, Tony Revolori
    Rating: R
    Runtime: 1 hr. 55 min.
    Scoring:
    Tomato score: 87
    PopCorn score: 85

    Id: 771417481
    Title: The Gallows
    Synopsis: Narratively contrived and visually a mess, The Gallows sends viewers on a shaky tumble to the bottom of the found-footage horror barrel.
    Actors: Cassidy Gifford, Pfeifer Brown, Ryan Shoos
    Rating: R
    Runtime: 1 hr. 27 min.
    Scoring:
    Tomato score: 16
    PopCorn score: 25

    Id: 771378808
    Title: Magic Mike XXL
    Synopsis: Magic Mike XXL has enough narrative thrust and beefy charm to deliver another helping of well-oiled entertainment, even if this sequel isn't quite as pleasurable as its predecessor.
    Actors: Channing Tatum, Matt Bomer, Joe Manganiello
    Rating: R
    Runtime: 1 hr. 55 min.
    Scoring:
    Tomato score: 62
    PopCorn score: 61

    Id: 771375494
    Title: Insidious: Chapter 3
    Synopsis: Insidious: Chapter 3 isn't as terrifying as the original, although it boasts surprising thematic depth and is enlivened by another fine performance from Lin Shaye.
    Actors: Dermot Mulroney, Stefanie Scott, Angus Sampson
    Rating: PG-13
    Runtime: 1 hr. 37 min.
    Scoring:
    Tomato score: 60
    PopCorn score: 54

    Id: 771412075
    Title: Me and Earl and the Dying Girl
    Synopsis: Beautifully scripted and perfectly cast, Me & Earl & the Dying Girl is a coming-of-age movie with uncommon charm and insight.
    Actors: Thomas Mann, RJ Cyler, Olivia Cooke
    Rating: PG-13
    Runtime: 1 hr. 44 min.
    Scoring:
    Tomato score: 81
    PopCorn score: 88

    Id: 771364355
    Title: Particle Fever
    Synopsis: The concepts behind its heady subject matter may fly over the heads of most viewers, but Particle Fever presents it in such a way that even the least science-inclined viewers will find themselves enraptured.
    Actors: Savas Dimopoulos, Nima Arkani-Hamed, Fabiola Gianotti
    Rating: Unrated
    Runtime: 1 hr. 39 min.
    Scoring:
    Tomato score: 95
    PopCorn score: 84

    Id: 771362649
    Title: Dark Places
    Synopsis: 25 years after testifying against her brother as the person responsible for massacring her entire family, a haunted woman (Charlize Theron) is approached by a secret society that specializes in complex, unsolved cases. Nicholas Hoult, Corey Stoll, and Chloe Moretz co-star in this Mandalay Pictures thriller directed by Gilles Paquet-Brenner, and based on the novel by Gillian Flynn. ~ Jason Buchanan, Rovi
    Actors: Charlize Theron, Nicholas Hoult, Corey Stoll
    Rating: R
    Runtime: 1 hr. 54 min.
    Scoring:
    Tomato score: 26
    PopCorn score: 35

    Id: 771357112
    Title: What We Did on Our Holiday
    Synopsis: Witty and well-cast, What We Did on Our Holiday injects unlikely laughs into a story dealing with dark, difficult themes.
    Actors: Rosamund Pike, David Tennant, Billy Connolly
    Rating: PG-13
    Runtime: 1 hr. 35 min.
    Scoring:
    Tomato score: 73
    PopCorn score: 73

    Id: 771313962
    Title: Avengers: Age of Ultron
    Synopsis: Exuberant and eye-popping, Avengers: Age of Ultron serves as an overstuffed but mostly satisfying sequel, reuniting its predecessor's unwieldy cast with a few new additions and a worthy foe.
    Actors: Robert Downey Jr., Chris Evans, Mark Ruffalo
    Rating: PG-13
    Runtime: 2 hr. 21 min.
    Scoring:
    Tomato score: 74
    PopCorn score: 85

    Id: 771361497
    Title: Spy
    Synopsis: Simultaneously broad and progressive, Spy offers further proof that Melissa McCarthy and writer-director Paul Feig bring out the best in one another -- and delivers scores of belly laughs along the way.
    Actors: Melissa McCarthy, Jason Statham, Rose Byrne
    Rating: R
    Runtime: 1 hr. 57 min.
    Scoring:
    Tomato score: 93
    PopCorn score: 81

    Id: 770799339
    Title: Poltergeist
    Synopsis: Paying competent homage without adding anything of real value to the original Poltergeist, this remake proves just as ephemeral (but half as haunting) as its titular spirit.
    Actors: Sam Rockwell, Rosemarie DeWitt, Kyle Catlett
    Rating: PG-13
    Runtime: 1 hr. 33 min.
    Scoring:
    Tomato score: 31
    PopCorn score: 23

    Id: 771370467
    Title: Entourage
    Synopsis: Entourage retains many elements of the HBO series, but feels less like a film than a particularly shallow, cameo-studded extended episode of the show.
    Actors: Jeremy Piven, Adrian Grenier, Kevin Dillon
    Rating: R
    Runtime: 1 hr. 45 min.
    Scoring:
    Tomato score: 32
    PopCorn score: 64

    Id: 771412037
    Title: Cop Car
    Synopsis: Cop Car boasts a terrific premise and a grimly gripping opening act -- and for some viewers, that will be enough to compensate for the movie's uneven denouement.
    Actors: Kevin Bacon, Shea Whigham, Camryn Manheim
    Rating: R
    Runtime: 1 hr. 26 min.
    Scoring:
    Tomato score: 79
    PopCorn score: 52

    Id: 771412114
    Title: Unexpected
    Synopsis: Unexpected proves a thoughtful and well-acted -- if somewhat mild -- look at worthy, thought-provoking themes.
    Actors: Anders Holm, Cobie Smulders, Gail Bean
    Rating: R
    Runtime: 1 hr. 30 min.
    Scoring:
    Tomato score: 67
    PopCorn score: 49

    Id: 771356696
    Title: Pitch Perfect 2
    Synopsis: Pitch Perfect 2 sings in sweet comedic harmony, even if it doesn't hit quite as many high notes as its predecessor.
    Actors: Anna Kendrick, Rebel Wilson, Brittany Snow
    Rating: PG-13
    Runtime: 1 hr. 54 min.
    Scoring:
    Tomato score: 66
    PopCorn score: 67

    Id: 771354922
    Title: Furious 7
    Synopsis: Serving up a fresh round of over-the-top thrills while adding unexpected dramatic heft, Furious 7 keeps the franchise moving in more ways than one.
    Actors: Vin Diesel, Paul Walker, Jason Statham
    Rating: PG-13
    Runtime: 2 hr. 20 min.
    Scoring:
    Tomato score: 81
    PopCorn score: 84

    Id: 771270966
    Title: Cinderella
    Synopsis: Refreshingly traditional in a revisionist era, Kenneth Branagh's Cinderella proves Disney hasn't lost any of its old-fashioned magic.
    Actors: Lily James, Cate Blanchett, Richard Madden
    Rating: PG
    Runtime: 1 hr. 45 min.
    Scoring:
    Tomato score: 85
    PopCorn score: 79

    Id: 771359745
    Title: Love & Mercy
    Synopsis: As unconventional and unwieldy as the life and legacy it honors, Love & Mercy should prove moving for Brian Wilson fans while still satisfying neophytes.
    Actors: Paul Dano, Elizabeth Banks, Brett Davern
    Rating: PG-13
    Runtime: 2 hr.
    Scoring:
    Tomato score: 89
    PopCorn score: 86

    Id: 771378525
    Title: Monkey Kingdom
    Synopsis: Monkey Kingdom's breathtaking footage of primates in the wild is likely to please animal lovers of all ages.
    Actors: Tina Fey
    Rating: G
    Runtime: 1 hr. 25 min.
    Scoring:
    Tomato score: 94
    PopCorn score: 77

    Id: 771412081
    Title: The Overnight
    Synopsis: Witty and unpredictable, The Overnight benefits from writer-director Patrick Brice's sure-handed touch and strong performances from a talented cast.
    Actors: Adam Scott, Taylor Schilling, Jason Schwartzman
    Rating: R
    Runtime: 1 hr. 20 min.
    Scoring:
    Tomato score: 82
    PopCorn score: 60

    Id: 771374337
    Title: Diplomacy
    Synopsis: For filmgoers who value character development and smart dialogue over plot, Diplomacy yields rich, powerfully acted rewards.
    Actors: André Dussollier, Niels Arestrup, Robert Stadlober
    Rating: Unrated
    Runtime: 1 hr. 25 min.
    Scoring:
    Tomato score: 93
    PopCorn score: 80

    Id: 771373149
    Title: The Age of Adaline
    Synopsis: The Age of Adaline ruminates on mortality less compellingly than similarly themed films, but is set apart by memorable performances from Blake Lively and Harrison Ford.
    Actors: Blake Lively, Michiel Huisman, Harrison Ford
    Rating: PG-13
    Runtime: 1 hr. 49 min.
    Scoring:
    Tomato score: 54
    PopCorn score: 67

    Id: 771028170
    Title: Mad Max: Fury Road
    Synopsis: With exhilarating action and a surprising amount of narrative heft, Mad Max: Fury Road brings George Miller's post-apocalyptic franchise roaring vigorously back to life.
    Actors: Tom Hardy, Charlize Theron, Nicholas Hoult
    Rating: R
    Runtime: 2 hr.
    Scoring:
    Tomato score: 97
    PopCorn score: 87

    Id: 770683518
    Title: I'll See You in My Dreams
    Synopsis: I'll See You in My Dreams would be worth watching even if Blythe Danner's central performance was all it had going for it, but this thoughtful drama satisfies on multiple levels.
    Actors: Blythe Danner, Martin Starr, Sam Elliott
    Rating: PG-13
    Runtime: 1 hr. 35 min.
    Scoring:
    Tomato score: 94
    PopCorn score: 70

    Id: 771387966
    Title: Good Kill
    Synopsis: Thought-provoking, timely, and anchored by a strong performance from Ethan Hawke, Good Kill is a modern war movie with a troubled conscience.
    Actors: Ethan Hawke, January Jones, Zoë Kravitz
    Rating: R
    Runtime: 1 hr. 43 min.
    Scoring:
    Tomato score: 76
    PopCorn score: 49

    Id: 771377895
    Title: Dior and I
    Synopsis: Dior and I will obviously appeal to fashion fans, but this beautifully tailored documentary may draw in even the least sartorially inclined.
    Actors: Omar Berrada, Marion Cotillard, Anna Wintour
    Rating: R
    Runtime: 1 hr. 30 min.
    Scoring:
    Tomato score: 82
    PopCorn score: 80

    Id: 771387025
    Title: Glen Campbell: I'll Be Me
    Synopsis: The heartrendingly honest Glen Campbell: I'll Be Me offers a window into Alzheimer's that should prove powerful viewing for Campbell fans and novices alike.
    Actors: Bruce Springsteen, Bill Clinton, Paul McCartney
    Rating: PG
    Runtime: 1 hr. 56 min.
    Scoring:
    Tomato score: 100
    PopCorn score: 91

    Id: 771378194
    Title: Boulevard
    Synopsis: Boulevard features a richly layered performance from Robin Williams, but that may be this dour drama's sole distinctive feature.
    Actors: Robin Williams, Kathy Baker, Roberto Aguire
    Rating: R
    Runtime: 1 hr. 28 min.
    Scoring:
    Tomato score: 51
    PopCorn score: 33

    Id: 770804151
    Title: About Elly
    Synopsis: About Elly offers viewers performances as powerful as its thought-provoking ideas, and adds another strong entry to Asghar Farhadi's impressive filmography.
    Actors: Golshifteh Farahani, Taraneh Alidoosti, Taraneh Alidousti
    Rating: Unrated
    Runtime: 1 hr. 59 min.
    Scoring:
    Tomato score: 97
    PopCorn score: 86

Using selenium is another option: it drives a real browser, so you can wait until the page has fully loaded (that is, including all the JavaScript manipulation). You don't have to use Firefox; you can use other browsers, or a headless browser such as PhantomJS if you don't need the actual site to be displayed.

    from lxml import html
    from selenium import webdriver

    browser = webdriver.Firefox()
    browser.get("http://www.rottentomatoes.com/browse/dvd-top-rentals/")
    # page_source is the browser's current DOM, including content added by JavaScript so far
    tree = html.fromstring(browser.page_source)
    movies = tree.xpath('//h3[@class="movieTitle"]/text()')
    browser.close()
    print(movies)


    ['Inside Out', 'Vacation', 'The End Of The Tour', "She's Funny That Way", 'Best Of Enemies', 'Before We Go', 'Pixels', 'The Gift', 'Southpaw', 'Max', 'Jurassic World', 'San Andreas', 'Paper Towns', "Dragon Ball Z: Resurrection 'F'", 'Testament Of Youth', 'The Wolfpack', 'Z For Zachariah', 'Tomorrowland', 'Terminator Genisys', 'Dope', 'The Gallows', 'Magic Mike XXL', 'Insidious: Chapter 3', 'Me and Earl and the Dying Girl', 'Particle Fever', 'Dark Places', 'What We Did on Our Holiday', 'Avengers: Age of Ultron', 'Spy', 'Poltergeist']