Is there a way to convert number words to integers?

I need to convert one into 1, two into 2 and so on.

Is there a way to do this with a library or a class or anything?

Most of this code sets up the numwords dict, which is only done on the first call.

    def text2int(textnum, numwords={}):
        # numwords is a mutable default on purpose: it is filled on the
        # first call and reused as a cache afterwards.
        if not numwords:
            units = [
                "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                "sixteen", "seventeen", "eighteen", "nineteen",
            ]

            tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

            scales = ["hundred", "thousand", "million", "billion", "trillion"]

            # Each word maps to a (scale, increment) pair.
            numwords["and"] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                # "hundred" is 10**2; the later scales are 10**3, 10**6, ...
                numwords[word] = (10 ** (idx * 3 or 2), 0)

        current = result = 0
        for word in textnum.split():
            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0

        return result + current

    print(text2int("seven billion one hundred million thirty one thousand three hundred thirty seven"))
    # 7100031337
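To see how the `(scale, increment)` pairs combine, here is a small trace of the same accumulation loop. It is only a sketch: the `trace` helper and the trimmed `NUMWORDS` table are hypothetical names introduced for illustration and are not part of the answer above.

```python
# Hypothetical, trimmed table: word -> (scale, increment), as in text2int.
NUMWORDS = {
    "one": (1, 1), "three": (1, 3), "seven": (1, 7), "thirty": (1, 30),
    "hundred": (100, 0), "thousand": (1000, 0),
}

def trace(text):
    """Return the (word, current, result) state after each word."""
    current = result = 0
    steps = []
    for word in text.split():
        scale, increment = NUMWORDS[word]
        current = current * scale + increment
        if scale > 100:
            # "thousand" and larger scales close the current group.
            result += current
            current = 0
        steps.append((word, current, result))
    return steps

for step in trace("thirty one thousand three hundred thirty seven"):
    print(step)
# The last step is ('seven', 337, 31000), so result + current is 31337.
```

Note that "hundred" multiplies `current` in place, while "thousand" and above move the finished group into `result`.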

If anyone is interested, I hacked up a version that keeps the rest of the string (though it may have bugs, I haven't tested it too much).

    def text2int(textnum, numwords={}):
        if not numwords:
            units = [
                "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                "sixteen", "seventeen", "eighteen", "nineteen",
            ]

            tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

            scales = ["hundred", "thousand", "million", "billion", "trillion"]

            numwords["and"] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                numwords[word] = (10 ** (idx * 3 or 2), 0)

        ordinal_words = {'first': 1, 'second': 2, 'third': 3, 'fifth': 5, 'eighth': 8, 'ninth': 9, 'twelfth': 12}
        ordinal_endings = [('ieth', 'y'), ('th', '')]

        textnum = textnum.replace('-', ' ')

        current = result = 0
        curstring = ""
        onnumber = False
        for word in textnum.split():
            if word in ordinal_words:
                scale, increment = (1, ordinal_words[word])
                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0
                onnumber = True
            else:
                for ending, replacement in ordinal_endings:
                    if word.endswith(ending):
                        candidate = word[:-len(ending)] + replacement
                        # Only strip the ending when a number word is left,
                        # so that words like "with" are not mangled.
                        if candidate in numwords:
                            word = candidate
                            break

                if word not in numwords:
                    if onnumber:
                        # Flush the number built so far.
                        curstring += repr(result + current) + " "
                    curstring += word + " "
                    result = current = 0
                    onnumber = False
                else:
                    scale, increment = numwords[word]

                    current = current * scale + increment
                    if scale > 100:
                        result += current
                        current = 0
                    onnumber = True

        if onnumber:
            curstring += repr(result + current)

        return curstring

Example:

    >>> print(text2int("I want fifty five hot dogs for two hundred dollars."))
    I want 55 hot dogs for 200 dollars.

There could be issues if you have, say, "$200". But this was really rough.
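One case where attached punctuation matters: a token such as `hundred.` is not a key in the lookup table, so a number at the end of a sentence gets cut short. A possible workaround, sketched here under the assumption that punctuation can be split off into its own tokens before the lookup, is a small regex tokenizer. The name `split_tokens` is hypothetical and not part of the answer above.

```python
import re

def split_tokens(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

print(split_tokens("for two hundred dollars."))
print(split_tokens("I paid $200"))
```

The trade-off is that the original spacing is lost, so rebuilding the sentence afterwards needs some care about where spaces go around punctuation.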

Thanks for the code snippet... it saved me a lot of time!

I needed to handle a couple of extra parsing cases, such as ordinal words ("first", "second"), hyphenated words ("one-hundred"), and hyphenated ordinal words like "fifty-seventh", so I added a couple of lines:

    def text2int(textnum, numwords={}):
        if not numwords:
            units = [
                "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                "sixteen", "seventeen", "eighteen", "nineteen",
            ]

            tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

            scales = ["hundred", "thousand", "million", "billion", "trillion"]

            numwords["and"] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                numwords[word] = (10 ** (idx * 3 or 2), 0)

        ordinal_words = {'first': 1, 'second': 2, 'third': 3, 'fifth': 5, 'eighth': 8, 'ninth': 9, 'twelfth': 12}
        ordinal_endings = [('ieth', 'y'), ('th', '')]

        textnum = textnum.replace('-', ' ')

        current = result = 0
        for word in textnum.split():
            if word in ordinal_words:
                scale, increment = (1, ordinal_words[word])
            else:
                # Turn regular ordinals into cardinals: "twentieth" -> "twenty".
                for ending, replacement in ordinal_endings:
                    if word.endswith(ending):
                        word = "%s%s" % (word[:-len(ending)], replacement)

                if word not in numwords:
                    raise Exception("Illegal word: " + word)

                scale, increment = numwords[word]

            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0

        return result + current

I have just released a Python module to PyPI called word2number for this exact purpose. https://github.com/akshaynagpal/w2n

Install it using:

    pip install word2number

Make sure your pip is updated to the latest version.

Usage:

    from word2number import w2n

    print(w2n.word_to_num("two million three thousand nine hundred and eighty four"))
    # 2003984

I needed something a bit different, since my input comes from a speech-to-text conversion and the right answer is not always to sum the numbers. For example, "my zipcode is one two three four five" should not convert to "my zipcode is 15".
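The difference between summing and concatenating can be shown with a minimal sketch. It handles only runs of single-digit words; `DIGITS` and `join_digit_words` are hypothetical names used for illustration, not the code from this answer.

```python
# Hypothetical table of single-digit words.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def join_digit_words(text):
    """Replace each run of digit words with the concatenated digits."""
    out = []
    run = ""
    for word in text.split():
        if word in DIGITS:
            run += DIGITS[word]      # concatenate instead of adding
        else:
            if run:
                out.append(run)      # flush the finished run
                run = ""
            out.append(word)
    if run:
        out.append(run)
    return " ".join(out)

print(join_digit_words("my zipcode is one two three four five"))
# my zipcode is 12345
```

A full solution has to decide per word whether it continues a number ("three hundred") or starts a new digit, which is what the tweaked function tracks with extra state.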

I took Andrew's answer and tweaked it to handle a few other cases that people highlighted as errors, and also added support for examples like the zipcode one I mentioned above. Some basic test cases are shown below, but I'm sure there is still room for improvement.

    def is_number(x):
        if type(x) == str:
            x = x.replace(',', '')
        try:
            # int() rather than float(): from_numword() converts with int(),
            # so "1.5" or "nan" must not count as numbers here.
            int(x)
        except (TypeError, ValueError):
            return False
        return True

    def text2int(textnum, numwords={}):
        units = [
            'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
            'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
            'sixteen', 'seventeen', 'eighteen', 'nineteen',
        ]
        tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
        scales = ['hundred', 'thousand', 'million', 'billion', 'trillion']
        ordinal_words = {'first': 1, 'second': 2, 'third': 3, 'fifth': 5, 'eighth': 8, 'ninth': 9, 'twelfth': 12}
        ordinal_endings = [('ieth', 'y'), ('th', '')]

        if not numwords:
            numwords['and'] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                numwords[word] = (10 ** (idx * 3 or 2), 0)

        textnum = textnum.replace('-', ' ')

        current = result = 0
        curstring = ''
        onnumber = False
        lastunit = False
        lastscale = False

        def is_numword(x):
            if is_number(x):
                return True
            if x in numwords:
                return True
            return False

        def from_numword(x):
            if is_number(x):
                scale = 0
                increment = int(x.replace(',', ''))
                return scale, increment
            return numwords[x]

        for word in textnum.split():
            if word in ordinal_words:
                scale, increment = (1, ordinal_words[word])
                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0
                onnumber = True
                lastunit = False
                lastscale = False
            else:
                for ending, replacement in ordinal_endings:
                    if word.endswith(ending):
                        candidate = word[:-len(ending)] + replacement
                        # Only strip the ending when a number is left, so
                        # that words like "with" are not mangled.
                        if is_numword(candidate):
                            word = candidate
                            break

                if (not is_numword(word)) or (word == 'and' and not lastscale):
                    if onnumber:
                        # Flush the current number we are building
                        curstring += repr(result + current) + " "
                    curstring += word + " "
                    result = current = 0
                    onnumber = False
                    lastunit = False
                    lastscale = False
                else:
                    scale, increment = from_numword(word)
                    onnumber = True

                    if lastunit and (word not in scales):
                        # Assume this is part of a string of individual numbers to
                        # be flushed, such as a zipcode "one two three four five"
                        curstring += repr(result + current)
                        result = current = 0

                    if scale > 1:
                        current = max(1, current)

                    current = current * scale + increment
                    if scale > 100:
                        result += current
                        current = 0

                    lastscale = False
                    lastunit = False
                    if word in scales:
                        lastscale = True
                    elif word in units:
                        lastunit = True

        if onnumber:
            curstring += repr(result + current)

        return curstring

Some tests...

    one two three -> 123
    three forty five -> 345
    three and forty five -> 3 and 45
    three hundred and forty five -> 345
    three hundred -> 300
    twenty five hundred -> 2500
    three thousand and six -> 3006
    three thousand six -> 3006
    nineteenth -> 19
    twentieth -> 20
    first -> 1
    my zip is one two three four five -> my zip is 12345
    nineteen ninety six -> 1996
    fifty-seventh -> 57
    one million -> 1000000
    first hundred -> 100
    I will buy the first thousand -> I will buy the 1000  # probably should leave ordinal in the string
    thousand -> 1000
    hundred and six -> 106
    1 million -> 1000000

Here's the trivial case approach:

    >>> number = {'one': 1,
    ...           'two': 2,
    ...           'three': 3,}
    >>>
    >>> number['two']
    2

Or are you looking for something that can handle "twelve thousand, one hundred seventy-two"?

This could easily be hardcoded into a dictionary if there's a limited set of numbers you'd like to parse.

For slightly more complex cases, you'll probably want to generate this dictionary automatically, based on the relatively simple grammar of numbers. Something along the lines of this (generalized, of course):

    for i in range(10):
        myDict[30 + i] = "thirty-" + singleDigitsDict[i]
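A generalized version of that idea might look like the sketch below. It builds the table in the word-to-number direction, since that is what the question asks for, and covers 0 to 99 only. The names `UNITS`, `TENS` and `build_word_table` are assumptions made for this example.

```python
UNITS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {
    20: "twenty", 30: "thirty", 40: "forty", 50: "fifty",
    60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety",
}

def build_word_table():
    """Map every number word from 'zero' to 'ninety-nine' to its value."""
    table = {word: value for value, word in enumerate(UNITS)}
    for base, tens_word in TENS.items():
        table[tens_word] = base
        # Skip digit 0 so that "thirty-zero" is never generated.
        for digit in range(1, 10):
            table[tens_word + "-" + UNITS[digit]] = base + digit
    return table

table = build_word_table()
print(table["thirty-seven"])  # 37
print(len(table))             # 100
```

Beyond 99 the table grows quickly, which is where a parsing approach becomes more practical than enumeration.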

If you need something more extensive, then it looks like you'll need natural language processing tools. This article might be a good starting point.

This is the C# implementation of the code in the first answer:

    public static double ConvertTextToNumber(string text)
    {
        string[] units = new string[] {
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        };

        string[] tens = new string[] {"", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};

        string[] scales = new string[] { "hundred", "thousand", "million", "billion", "trillion" };

        // Requires: using System; using System.Collections.Generic;
        Dictionary<string, ScaleIncrementPair> numWord = new Dictionary<string, ScaleIncrementPair>();
        numWord.Add("and", new ScaleIncrementPair(1, 0));
        for (int i = 0; i < units.Length; i++)
        {
            numWord.Add(units[i], new ScaleIncrementPair(1, i));
        }

        // Start at 1: both leading entries are "", and a duplicate key would throw.
        for (int i = 1; i < tens.Length; i++)
        {
            numWord.Add(tens[i], new ScaleIncrementPair(1, i * 10));
        }

        for (int i = 0; i < scales.Length; i++)
        {
            if (i == 0)
                numWord.Add(scales[i], new ScaleIncrementPair(100, 0));
            else
                numWord.Add(scales[i], new ScaleIncrementPair(Math.Pow(10, (i * 3)), 0));
        }

        double current = 0;
        double result = 0;

        foreach (var word in text.Split(new char[] { ' ', '-', '—'}))
        {
            ScaleIncrementPair scaleIncrement = numWord[word];
            current = current * scaleIncrement.scale + scaleIncrement.increment;
            if (scaleIncrement.scale > 100)
            {
                result += current;
                current = 0;
            }
        }
        return result + current;
    }

    public struct ScaleIncrementPair
    {
        public double scale;
        public int increment;
        public ScaleIncrementPair(double s, int i)
        {
            scale = s;
            increment = i;
        }
    }

Made a change so that text2int(scale) returns the correct conversion. For example, text2int("hundred") => 100.

    import re

    numwords = {}

    def text2int(textnum):
        if not numwords:
            units = [
                "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                "sixteen", "seventeen", "eighteen", "nineteen"]

            tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

            scales = ["hundred", "thousand", "million", "billion", "trillion",
                      'quadrillion', 'quintillion', 'sextillion', 'septillion',
                      'octillion', 'nonillion', 'decillion']

            numwords["and"] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                numwords[word] = (10 ** (idx * 3 or 2), 0)

        ordinal_words = {'first': 1, 'second': 2, 'third': 3, 'fifth': 5, 'eighth': 8, 'ninth': 9, 'twelfth': 12}
        ordinal_endings = [('ieth', 'y'), ('th', '')]
        current = result = 0
        # Split on whitespace and hyphens.
        tokens = re.split(r"[\s-]+", textnum)
        for word in tokens:
            if word in ordinal_words:
                scale, increment = (1, ordinal_words[word])
            else:
                for ending, replacement in ordinal_endings:
                    if word.endswith(ending):
                        word = "%s%s" % (word[:-len(ending)], replacement)

                if word not in numwords:
                    raise Exception("Illegal word: " + word)

                scale, increment = numwords[word]

            if scale > 1:
                # A bare scale word ("hundred") counts as "one hundred".
                current = max(1, current)

            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0

        return result + current
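The change comes down to treating a bare scale word as if "one" preceded it. Here is a stripped-down sketch of just that step; the `apply_word` helper and its `implicit_one` flag are hypothetical and exist only to contrast the two behaviours.

```python
def apply_word(current, scale, increment, implicit_one):
    """Apply one (scale, increment) pair to the running value."""
    if implicit_one and scale > 1:
        # With nothing accumulated yet, pretend "one" came first.
        current = max(1, current)
    return current * scale + increment

print(apply_word(0, 100, 0, implicit_one=False))  # 0
print(apply_word(0, 100, 0, implicit_one=True))   # 100
print(apply_word(3, 100, 0, implicit_one=True))   # 300
```

Without the adjustment, "hundred" on its own multiplies zero by 100 and the result stays 0.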

There's a Ruby gem by Marc Burns that does it. I recently forked it to add support for years. You can call Ruby code from Python.

    require 'numbers_in_words'
    require 'numbers_in_words/duck_punch'

    nums = ["fifteen sixteen", "eighty five sixteen", "nineteen ninety six",
            "one hundred and seventy nine", "thirteen hundred",
            "nine thousand two hundred and ninety seven"]
    nums.each {|n| p n; p n.in_numbers}

Results:

    "fifteen sixteen"
    1516
    "eighty five sixteen"
    8516
    "nineteen ninety six"
    1996
    "one hundred and seventy nine"
    179
    "thirteen hundred"
    1300
    "nine thousand two hundred and ninety seven"
    9297

Quick and dirty Java port of e_h's C# implementation (above). Note that both return double, not int.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class Text2Double {

        public double Text2Double(String text) {

            String[] units = new String[]{
                    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                    "sixteen", "seventeen", "eighteen", "nineteen",
            };

            String[] tens = new String[]{"", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};

            String[] scales = new String[]{"hundred", "thousand", "million", "billion", "trillion"};

            Map<String, ScaleIncrementPair> numWord = new LinkedHashMap<>();
            numWord.put("and", new ScaleIncrementPair(1, 0));

            for (int i = 0; i < units.length; i++) {
                numWord.put(units[i], new ScaleIncrementPair(1, i));
            }

            for (int i = 1; i < tens.length; i++) {
                numWord.put(tens[i], new ScaleIncrementPair(1, i * 10));
            }

            for (int i = 0; i < scales.length; i++) {
                if (i == 0)
                    numWord.put(scales[i], new ScaleIncrementPair(100, 0));
                else
                    numWord.put(scales[i], new ScaleIncrementPair(Math.pow(10, (i * 3)), 0));
            }

            double current = 0;
            double result = 0;

            for (String word : text.split("[ -]")) {
                // get() returns null for an unknown word, so the next line
                // throws NullPointerException on illegal input.
                ScaleIncrementPair scaleIncrement = numWord.get(word);
                current = current * scaleIncrement.scale + scaleIncrement.increment;
                if (scaleIncrement.scale > 100) {
                    result += current;
                    current = 0;
                }
            }
            return result + current;
        }
    }

    // A second public class has to live in its own file, ScaleIncrementPair.java.
    public class ScaleIncrementPair {
        public double scale;
        public int increment;

        public ScaleIncrementPair(double s, int i) {
            scale = s;
            increment = i;
        }
    }

A quick solution is to use inflect.py to generate a dictionary for translation.

inflect.py has a number_to_words() function that turns a number (e.g. 2) into its word form (e.g. 'two'). Unfortunately, its reverse (which would let you avoid the translation dictionary route) isn't offered. All the same, you can use that function to build the translation dictionary:

    >>> import inflect
    >>> p = inflect.engine()
    >>> word_to_number_mapping = {}
    >>>
    >>> for i in range(1, 100):
    ...     word_form = p.number_to_words(i)  # 1 -> 'one'
    ...     word_to_number_mapping[word_form] = i
    ...
    >>> print(word_to_number_mapping['one'])
    1
    >>> print(word_to_number_mapping['eleven'])
    11
    >>> print(word_to_number_mapping['forty-three'])
    43

If you're willing to commit some time, it might be possible to examine the inner workings of the number_to_words() function and build your own code to do this dynamically (I haven't tried to do this).

This code works only for numbers up to 99, in both directions (word to int and int to word). Handling larger numbers would take another 10-20 lines of code and some simple logic. This is just simple code for beginners.

    num = input("Enter the number you want to convert : ").strip()

    mydict = {'0': 'Zero', '1': 'One', '2': 'Two', '3': 'Three', '4': 'Four', '5': 'Five',
              '6': 'Six', '7': 'Seven', '8': 'Eight', '9': 'Nine', '10': 'Ten',
              '11': 'Eleven', '12': 'Twelve', '13': 'Thirteen', '14': 'Fourteen',
              '15': 'Fifteen', '16': 'Sixteen', '17': 'Seventeen', '18': 'Eighteen',
              '19': 'Nineteen'}
    mydict2 = ['', '', 'Twenty', 'Thirty', 'Forty', 'Fifty', 'Sixty', 'Seventy', 'Eighty', 'Ninety']

    if num.isdigit():
        # int -> word
        value = int(num)
        if value < 20:
            print(" :---> " + mydict[str(value)])
        elif value < 100:
            var1 = value % 10    # units digit
            var2 = value // 10   # tens digit
            # Multiples of ten have no units word.
            print(" :---> " + mydict2[var2] + (mydict[str(var1)] if var1 else ''))
        else:
            print("----->Invalid Input<-----")
    else:
        # word -> int
        num = num.lower()
        dict_w = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6,
                  'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11,
                  'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15,
                  'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}
        tens_w = [w.lower() for w in mydict2]
        if not num:
            print("----->Please Enter Input<-----")
        elif num in dict_w:
            print(" :---> " + str(dict_w[num]))
        elif num in tens_w:
            print(" :---> " + str(tens_w.index(num) * 10))
        else:
            # Split e.g. "twentyfive", "twenty five" or "twenty-five" after the "ty".
            cut = num.find("ty") + 2
            str2 = num[:cut]                # tens part
            str1 = num[cut:].strip(" -")    # units part
            if str2 in tens_w and dict_w.get(str1, 10) < 10:
                print(" :---> " + str(tens_w.index(str2) * 10 + dict_w[str1]))
            else:
                print("----->Invalid Input<-----")