How to input a regex in string.replace?

I need some help declaring a regex. My inputs are like the following:

    this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>

The required output is:

    this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. and there are many other lines in the txt files with such tags

I've tried this:

    #!/usr/bin/python
    import os, sys, re, glob
    for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
        for line in reader:
            line2 = line.replace(' ', '')
            line = line2.replace(' ', '')
            line2 = line.replace('', '')
            line = line2.replace('', '')
            print line

I've also tried this (but it seems I'm using the wrong regex syntax):

    line2 = line.replace(' ', '')
    line = line2.replace(' ', '')
    line2 = line.replace('', '')
    line = line2.replace('', '')

I don't want to hard-code the replace for every number from 1 to 99...

This tested snippet should do it:

    import re
    line = re.sub(r"</?\[\d+>", "", line)

Edit: Here is a commented version explaining how it works:

    line = re.sub(r"""(?x)  # Use free-spacing mode (the flag must come first).
        <     # Match a literal '<'
        /?    # Optionally match a '/'
        \[    # Match a literal '['
        \d+   # Match one or more digits
        >     # Match a literal '>'
        """, "", line)
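The same free-spacing pattern can also be written by passing the `re.VERBOSE` flag rather than embedding `(?x)` in the pattern. This is only a minimal sketch; the names `tag` and `sample` and the input string are assumptions made for illustration.

```python
import re

# Passing re.VERBOSE has the same effect as a leading (?x) in the pattern.
tag = re.compile(r"""
    <      # a literal '<'
    /?     # an optional '/'
    \[     # a literal '['
    \d+    # one or more digits
    >      # a literal '>'
""", re.VERBOSE)

sample = "with<[1> in between</[1> and then"  # illustrative input
print(tag.sub("", sample))  # with in between and then
```

Keeping the flag outside the pattern avoids any dependence on where the inline flag sits in the string.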

Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: "metacharacters" that need to be escaped (i.e. with a backslash placed in front; the rules are different inside and outside character classes). There is an excellent online tutorial at: www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!
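To make the escaping point concrete, here is a small sketch. The sample strings and variable names are made up for illustration. It shows a literal `[` escaped outside a character class, metacharacters used unescaped inside a class, and `re.escape()` doing the escaping for a fixed string.

```python
import re

text = "price: 3.50 (approx) [1]"  # illustrative input

# Outside a character class, '[' opens a class, so a literal '[' needs a backslash.
bracket_ref = re.findall(r"\[\d+\]", text)

# Inside a character class, '.', '(' and ')' are ordinary characters.
punctuation = re.findall(r"[.()]", text)

# re.escape() backslash-escapes the metacharacters of a literal string.
literal = re.escape("<[1>")
cleaned = re.sub(literal, "", "with<[1> in between")

print(bracket_ref)
print(punctuation)
print(cleaned)
```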

str.replace() does fixed replacements. Use re.sub() instead.
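A short sketch of the difference, using an assumed one-line sample: `str.replace()` treats its first argument as an exact substring, so a regex passed to it is searched for literally and nothing changes, while `re.sub()` interprets it as a pattern.

```python
import re

line = "where the<[99> number ranges from 1-100</[99>."  # illustrative input

# Exact substring: only the opening tag for 99 is removed.
fixed = line.replace("<[99>", "")

# A regex handed to str.replace() is looked up literally and never matches.
unchanged = line.replace(r"</?\[\d+>", "")

# re.sub() treats it as a pattern: any number, opening or closing tag.
by_regex = re.sub(r"</?\[\d+>", "", line)

print(fixed)
print(unchanged)
print(by_regex)
```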

I would go like this (regex explained in the comments):

    import re

    # If you need to use the regex more than once, it is suggested to compile it.
    pattern = re.compile(r"<\/{0,}\[\d+>")

    # <\/{0,}\[\d+>
    #
    # Match the character “<” literally «<»
    # Match the character “/” literally «\/{0,}»
    #    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
    # Match the character “[” literally «\[»
    # Match a single digit 0..9 «\d+»
    #    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    # Match the character “>” literally «>»

    subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>"""

    result = pattern.sub("", subject)

    print(result)

If you want to learn more about regex, I recommend reading Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.

The easiest way:

    import re

    txt = 'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>'

    # Remove anything that starts with '<' and runs up to the next '>'.
    out = re.sub(r"(<[^>]+>)", '', txt)
    print(out)
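One thing worth checking before using the broad pattern: `<[^>]+>` removes any angle-bracketed run, not only the numbered tags. The sketch below compares it with the narrower `</?\[\d+>`; the input string (including the `<b>` markup) is assumed for illustration.

```python
import re

txt = "keep <b>bold</b> but drop<[1> this</[1> tag"  # illustrative input

# Broad: strips every <...> run, including the <b> markup.
broad = re.sub(r"<[^>]+>", "", txt)

# Narrow: strips only tags of the form <[N> or </[N>.
narrow = re.sub(r"</?\[\d+>", "", txt)

print(broad)
print(narrow)
```

If the files contain no other angle-bracket markup, both give the same result and the broad form is fine.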

The replace method of string objects does not accept regular expressions, only fixed strings (see the documentation: http://docs.python.org/2/library/stdtypes.html#str.replace ).

You have to use the re module:

    import re
    newline = re.sub(r"<\/?\[[0-9]+>", "", line)
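A side note on `[0-9]` versus `\d`, as a sketch under the assumption that the code runs on Python 3 with `str` patterns: there `\d` also matches non-ASCII decimal digits, while `[0-9]` (or `\d` with `re.ASCII`) matches ASCII digits only. The input string is made up for illustration.

```python
import re

# "\u0663" is ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit.
line = "ascii<[12> and non-ascii<[\u0663>"

with_d = re.sub(r"</?\[\d+>", "", line)
with_range = re.sub(r"</?\[[0-9]+>", "", line)
with_ascii_flag = re.sub(r"</?\[\d+>", "", line, flags=re.ASCII)

print(with_d)
print(with_range == with_ascii_flag)
```

For tags numbered 1 to 99 in ASCII, as in the question, the two spellings behave identically.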

You don't have to use regular expressions (for your sample string):

    >>> s
    'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'
    >>> for w in s.split(">"):
    ...     if "<" in w:
    ...         print(w.split("<")[0])
    ...
    this is a paragraph with
     in between
     and then there are cases ... where the
     number ranges from 1-100
    .
    and there are many other lines in the txt files
    with
     such tags

Another option, compiling the pattern once and applying it to every .txt file in the current directory:

    import glob
    import os
    import re
    import sys

    # Matches opening and closing tags such as <[1> or </[99>.
    pattern = re.compile(r"</?\[\d+>")

    for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
        with open(infile) as reader:
            for line in reader:
                # Pattern.sub() takes (replacement, string).
                retline = pattern.sub("", line)
                sys.stdout.write(retline)