Get the closest date to a given date

Given this base date:

base_date = "10/29 06:58 AM" 

I want to find the tuple in the list whose date is closest to base_date, but it must not be an earlier date.

 list_date = [('10/30 02:18 PM', '-103', '-107'), ('10/30 02:17 PM', '+100', '-110'), ('10/29 02:15 AM', '-101', '-109')]

So here the output should be ('10/30 02:17 PM', '+100', '-110'). It can't be the third tuple, because that date is before the base date.

My question is: is there a module for this kind of date comparison? I first tried converting all the data to AM format and then comparing, but my code gets ugly with lots of slicing.

Edit:

Large list for testing:

 [('10/30 02:18 PM', '+13 -103', '-13 -107'), ('10/30 02:17 PM', '+13 +100', '-13 -110'), ('10/30 02:15 PM', '+13 -101', '-13 -109'), ('10/30 02:14 PM', '+13 -103', '-13 -107'), ('10/30 01:59 PM', '+13 -105', '-13 -105'), ('10/30 01:46 PM', '+13 -106', '-13 -104'), ('10/30 01:37 PM', '+13 -105', '-13 -105'), ('10/30 01:24 PM', '+13 -107', '-13 -103'), ('10/30 01:23 PM', '+13 -106', '-13 -104'), ('10/30 01:05 PM', '+13 -103', '-13 -107'), ('10/30 01:02 PM', '+13 -104', '-13 -106'), ('10/30 12:55 PM', '+13 -103', '-13 -107'), ('10/30 12:51 PM', '+13.5 -110', '-13.5 +100'), ('10/30 12:44 PM', '+13.5 -108', '-13.5 -102'), ('10/30 12:38 PM', '+13.5 -107', '-13.5 -103'), ('10/30 12:35 PM', '+13 -102', '-13 -108'), ('10/30 12:34 PM', '+13 -103', '-13 -107'), ('10/30 12:06 PM', '+13.5 -110', '-13.5 +100'), ('10/30 11:57 AM', '+13.5 -108', '-13.5 -102'), ('10/30 11:36 AM', '+13.5 -107', '-13.5 -103'), ('10/30 09:01 AM', '+13.5 -110', '-13.5 +100'), ('10/30 08:59 AM', '+13.5 -108', '-13.5 -102'), ('10/30 08:13 AM', '+13.5 -105', '-13.5 -105'), ('10/30 06:11 AM', '+13.5 +100', '-13.5 -110'), ('10/30 06:09 AM', '+13.5 -105', '-13.5 -105'), ('10/30 06:04 AM', '+13.5 -110', '-13.5 +100'), ('10/30 05:32 AM', '+13.5 -105', '-13.5 -105'), ('10/30 04:48 AM', '+13.5 -107', '-13.5 -103'), ('10/30 12:51 AM', '+13.5 -110', '-13.5 +100'), ('10/29 01:31 PM', '+13.5 -105', '-13.5 -105'), ('10/29 01:31 PM', '+13 +103', '-13 -113'), ('10/29 01:28 PM', '+13 -102', '-13 -108'), ('10/29 07:59 AM', '+13 -105', '-13 -105'), ('10/29 07:20 AM', '+13 -103', '-13 -107'), ('10/29 07:14 AM', '+13 -105', '-13 -105'), ('10/29 04:47 AM', '+13 +100', '-13 -110'), ('10/29 04:14 AM', '+13 -105', '-13 -105'), ('10/28 08:17 PM', '+12.5 +100', '-12.5 -110'), ('10/28 12:52 PM', '+12.5 -105', '-12.5 -105')] 

Large list for test 2:

 [('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 12:44 AM', '+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]

```python
>>> from datetime import timedelta, datetime
>>> base_date = "10/29 06:58 AM"
>>> b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
>>> def func(x):
...     d = datetime.strptime(x[0], "%m/%d %I:%M %p")
...     delta = d - b_d if d > b_d else timedelta.max
...     return delta
...
>>> min(list_date, key=func)
('10/30 02:17 PM', '+100', '-110')
```

datetime.strptime converts the date string into a datetime object, so b_d now looks like this:

```python
>>> b_d
datetime.datetime(1900, 10, 29, 6, 58)
```
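Because the format string has no year field, strptime fills in 1900, as the repr above shows. One consequence worth knowing: 1900 is not a leap year, so a date such as '02/29' cannot be parsed this way (recent Python versions also warn about parsing a day of month without a year). A minimal sketch of a workaround, assuming you know which year the data belongs to; the helper parse_with_year and its year argument are hypothetical:

```python
from datetime import datetime

FMT = "%m/%d %I:%M %p"

def parse_with_year(s, year):
    # Prepend an explicit year (hypothetical helper) so that strptime
    # does not fall back to its default year of 1900.
    return datetime.strptime(f"{year}/{s}", "%Y/" + FMT)

try:
    # No year in the string: the default year 1900 has no February 29.
    datetime.strptime("02/29 10:00 AM", FMT)
except ValueError as exc:
    print("without a year:", exc)

print(parse_with_year("02/29 10:00 AM", 2024))  # 2024-02-29 10:00:00
```

Comparisons and subtraction then work exactly as with b_d, as long as every string is parsed with the same year.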

Now we can write a function to pass to the key parameter of min:

 delta = d - b_d if d > b_d else timedelta.max 

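If d > b_d, that is, if the date passed to min is later than base_date, assign their difference to delta; otherwise assign timedelta.max, the largest possible timedelta, so that an earlier date can never be the minimum: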

```python
>>> timedelta.max
datetime.timedelta(999999999, 86399, 999999)
```
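One caveat: if every date in the list is earlier than base_date, every key is timedelta.max and min simply returns the first tuple. A minimal sketch that returns None in that case instead, using the default argument of min (Python 3.4+). The name closest_not_earlier is hypothetical, and it uses >= so that a date equal to the base date also counts as "not earlier":

```python
from datetime import datetime

FMT = "%m/%d %I:%M %p"

def closest_not_earlier(base_date, list_date):
    """Return the tuple whose date is the nearest one at or after base_date,
    or None if every date in list_date is earlier."""
    base = datetime.strptime(base_date, FMT)
    # Parse each date once, pairing it with its tuple.
    parsed = ((datetime.strptime(t[0], FMT), t) for t in list_date)
    # Keep only the candidates that are not earlier than the base date.
    later = ((dt, t) for dt, t in parsed if dt >= base)
    # min() compares the parsed datetimes; default handles the empty case.
    best = min(later, key=lambda pair: pair[0], default=None)
    return None if best is None else best[1]

list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]
print(closest_not_earlier("10/29 06:58 AM", list_date))  # ('10/30 02:17 PM', '+100', '-110')
print(closest_not_earlier("11/01 12:00 AM", list_date))  # None
```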

Update:

```python
>>> from datetime import timedelta, datetime
>>> base_date = '10/29 06:59 AM'
>>> b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
>>> def func(x):
...     d = datetime.strptime(x[0], "%m/%d %I:%M %p")
...     delta = d - b_d if d > b_d else timedelta.max
...     return delta
...
>>> lis2 = [('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 12:44 AM', '+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]
>>> min(lis2, key=func)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
```

Timing comparisons:

Script:

```python
from datetime import datetime, timedelta

list_date = [('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 12:44 AM', '+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]
base_date = "10/29 06:58 AM"

def func1(list_date):
    # http://stackoverflow.com/a/17249420/846892
    get_datetime = lambda s: datetime.strptime(s, "%m/%d %I:%M %p")
    base = get_datetime(base_date)
    later = filter(lambda d: get_datetime(d[0]) > base, list_date)
    return min(later, key=lambda d: get_datetime(d[0]))

def func2(list_date):
    # http://stackoverflow.com/a/17249470/846892
    b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
    def func(x):
        d = datetime.strptime(x[0], "%m/%d %I:%M %p")
        delta = d - b_d if d > b_d else timedelta.max
        return delta
    return min(list_date, key=func)

def func3(list_date):
    # http://stackoverflow.com/a/17249529/846892
    fmt = '%m/%d %I:%M %p'
    d = datetime.strptime(base_date, fmt)
    def foo(x):
        return (datetime.strptime(x[0], fmt) - d).total_seconds() > 0
    return sorted(list_date, key=foo)[-1]

def func4(list_date):
    # http://stackoverflow.com/a/17249441/846892
    fmt = '%m/%d %I:%M %p'
    base_d = datetime.strptime(base_date, fmt)
    candidates = ((datetime.strptime(d, fmt), d, x, y) for d, x, y in list_date)
    candidates = min((dt, d, x, y) for dt, d, x, y in candidates if dt > base_d)
    return candidates[1:]
```

Results:

```
>>> from so import *
>>> # check output first
>>> func1(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func2(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func3(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func4(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> %timeit func1(list_date)
100 loops, best of 3: 3.07 ms per loop
>>> %timeit func2(list_date)  # winner
100 loops, best of 3: 1.59 ms per loop
>>> %timeit func3(list_date)
100 loops, best of 3: 1.91 ms per loop
>>> %timeit func4(list_date)
1000 loops, best of 3: 2.02 ms per loop
>>> # increase the input size
>>> list_date = list_date * 10**3
>>> len(list_date)
48000
>>> %timeit func1(list_date)
1 loops, best of 3: 3.6 s per loop
>>> %timeit func2(list_date)  # winner
1 loops, best of 3: 1.99 s per loop
>>> %timeit func3(list_date)
1 loops, best of 3: 2.09 s per loop
>>> %timeit func4(list_date)
1 loops, best of 3: 2.02 s per loop
>>> # increase the input size again
>>> list_date = list_date * 10
>>> len(list_date)
480000
>>> %timeit func1(list_date)
1 loops, best of 3: 36.4 s per loop
>>> %timeit func2(list_date)  # winner
1 loops, best of 3: 20.2 s per loop
>>> %timeit func3(list_date)
1 loops, best of 3: 22.8 s per loop
>>> %timeit func4(list_date)
1 loops, best of 3: 22.7 s per loop
```
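%timeit is an IPython magic command. In a plain Python script, the standard-library timeit module can take comparable measurements. A minimal sketch, where closest_later is a hypothetical stand-in modelled on func2 and the input is the question's three-tuple list repeated; the figures it prints depend on the machine and Python version and are not comparable to the ones above:

```python
import timeit
from datetime import datetime, timedelta

FMT = "%m/%d %I:%M %p"
BASE = datetime.strptime("10/29 06:58 AM", FMT)

def closest_later(list_date):
    # Same idea as func2: a later date is ranked by its distance from BASE,
    # an earlier one gets timedelta.max so min() never prefers it.
    def key(x):
        d = datetime.strptime(x[0], FMT)
        return d - BASE if d > BASE else timedelta.max
    return min(list_date, key=key)

# The three tuples from the question, repeated to get a larger input.
list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')] * 100

# repeat() returns the total time of `number` calls for each of 3 runs;
# the fastest run is the one least disturbed by other activity.
runs = timeit.repeat(lambda: closest_later(list_date), number=10, repeat=3)
print(f"best of 3: {min(runs) / 10 * 1000:.2f} ms per call")
```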

This can be done with the datetime module, which can parse the date string into a datetime object; datetime objects support comparison and date arithmetic.

```python
from datetime import datetime

# function for parsing strings using a specific format
get_datetime = lambda s: datetime.strptime(s, "%m/%d %I:%M %p")

base = get_datetime(base_date)
later = filter(lambda d: get_datetime(d[0]) > base, list_date)
closest_date = min(later, key=lambda d: get_datetime(d[0]))
```

Linear search?

```python
import time

base_date = "10/29 06:58 AM"

def str_to_my_time(my_str):
    # The year defaults to 1900; time.mktime may reject such early dates
    # on some platforms.
    return time.mktime(time.strptime(my_str, "%m/%d %I:%M %p"))

base_dt = str_to_my_time(base_date)
list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]

best_delta = float("inf")  # Python 3 has no sys.maxint
best_match = None
for t in list_date:
    the_dt = str_to_my_time(t[0])
    delta_sec = the_dt - base_dt
    if (delta_sec >= 0) and (delta_sec < best_delta):
        best_delta = delta_sec
        best_match = t
print(best_match, best_delta)
```

Output:

 ('10/30 02:17 PM', '+100', '-110') 112740.0 
```python
import time

# The function
def to_sec(date_string):
    return time.mktime(time.strptime(date_string, '%m/%d %I:%M %p'))

# The test
base_date = "10/29 06:58 AM"
base_date_sec = to_sec(base_date)
result = None
difference = float("inf")  # Python 3 has no sys.maxint
list_date = [
    ('10/30 02:18 PM', '-103', '-107'),
    ('10/30 02:17 PM', '+100', '-110'),
    ('10/29 02:15 AM', '-101', '-109')
]
for date_str in list_date:
    diff_sec = to_sec(date_str[0]) - base_date_sec
    if diff_sec >= 0 and diff_sec < difference:
        result = date_str
        difference = diff_sec
print(result)
```
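Both time.mktime loops above convert to epoch seconds a date whose year defaults to 1900, which time.mktime may refuse on some platforms. A minimal sketch of the same single pass using datetime subtraction instead; the name linear_closest is hypothetical:

```python
from datetime import datetime

FMT = "%m/%d %I:%M %p"

def linear_closest(base_date, list_date):
    """One pass over list_date. Returns (best tuple, seconds after base_date),
    or (None, None) if every date is earlier than base_date."""
    base = datetime.strptime(base_date, FMT)
    best_match, best_delta = None, None
    for t in list_date:
        # Subtracting datetimes gives a timedelta; no epoch conversion needed.
        delta_sec = (datetime.strptime(t[0], FMT) - base).total_seconds()
        if delta_sec >= 0 and (best_delta is None or delta_sec < best_delta):
            best_match, best_delta = t, delta_sec
    return best_match, best_delta

list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]
print(linear_closest("10/29 06:58 AM", list_date))
```

For the question's data this yields the same tuple and the same 112740.0 seconds as the mktime version.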
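```python
import datetime

# %I (not %H) is needed for %p to take effect when parsing
fmt = '%m/%d %I:%M %p'
d = datetime.datetime.strptime(base_date, fmt)

def foo(x):
    return (datetime.datetime.strptime(x[0], fmt) - d).total_seconds() > 0

# relies on list_date being ordered newest-first, as in the question
sorted(list_date, key=foo)[-1]
```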
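The sort above ranks items only by True/False, and Python's sort is stable, so [-1] is the last later-than-base tuple in the original order; that is the closest one only for newest-first input. A minimal sketch of an order-independent key (the names closest_by_sort and rank are hypothetical): ranking by the pair (is earlier, date) puts the nearest not-earlier date first.

```python
from datetime import datetime

FMT = "%m/%d %I:%M %p"

def closest_by_sort(base_date, list_date):
    base = datetime.strptime(base_date, FMT)

    def rank(x):
        d = datetime.strptime(x[0], FMT)
        # False sorts before True: dates at or after base come first, and
        # within that group the earliest (i.e. nearest) date comes first.
        return (d < base, d)

    ranked = sorted(list_date, key=rank)
    # If even the first item is earlier than base, no date qualifies.
    if not ranked or rank(ranked[0])[0]:
        return None
    return ranked[0]

# Same tuples as the question, deliberately in oldest-first order.
list_date = [('10/29 02:15 AM', '-101', '-109'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/30 02:18 PM', '-103', '-107')]
print(closest_by_sort("10/29 06:58 AM", list_date))
```

min(list_date, key=rank) would pick the same item in one pass without sorting.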

Decorate, filter, find the closest date, undecorate:

```python
>>> base_date = "10/29 06:58 AM"
>>> list_date = [
...     ('10/30 02:18 PM', '-103', '-107'),
...     ('10/30 02:17 PM', '+100', '-110'),
...     ('10/29 02:15 AM', '-101', '-109')
... ]
>>> import datetime
>>> fmt = '%m/%d %I:%M %p'
>>> base_d = datetime.datetime.strptime(base_date, fmt)
>>> candidates = ((datetime.datetime.strptime(d, fmt), d, x, y) for d, x, y in list_date)
>>> candidates = min((dt, d, x, y) for dt, d, x, y in candidates if dt > base_d)
>>> print(candidates[1:])
('10/30 02:17 PM', '+100', '-110')
```

You could consider putting the list of dates into a Pandas index and then using the `truncate` or `get_loc` functions.

```python
import pandas as pd

## Initial inputs
list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/29 02:15 AM', '-101', '-109'),
             ('10/30 02:17 PM', '+100', '-110')]  # reordered to show the method is input-order insensitive
base_date = "10/29 06:58 AM"

## Make a data frame with the data
df = pd.DataFrame(list_date)
df.columns = ['date', 'val1', 'val2']
dateIndex = pd.to_datetime(df['date'], format='%m/%d %I:%M %p')
df = df.set_index(dateIndex)
df = df.sort_index()  # ascending: earliest date on top

## Find the result
base_dateObj = pd.to_datetime(base_date, format='%m/%d %I:%M %p')
# truncate(before=...) drops the rows earlier than the base date, so the
# first remaining row is the closest date that is not earlier than it
result = df.truncate(before=base_dateObj).iloc[0]
(result['date'], result['val1'], result['val2'])
# result is ('10/30 02:17 PM', '+100', '-110')
```

Reference: this link

I was looking into this problem and found several answers, most of which check every element. My dates are sorted (and I assume most people's are), so if yours are too, use numpy:

```python
import numpy as np

# dates is a sorted numpy array of np.datetime64 objects
dates = np.array([date1, date2, date3, ...], dtype=np.datetime64)
timestamp = np.datetime64('Your date')
np.searchsorted(dates, timestamp)
```
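A concrete version of the snippet above, as a sketch: np.datetime64 needs a full ISO-8601 date, so the question's dates are written here with a hypothetical year (2013) and minute precision, already sorted ascending:

```python
import numpy as np

# The question's dates in ISO-8601 form; the year 2013 is a made-up assumption.
dates = np.array(['2013-10-29T02:15', '2013-10-30T14:17', '2013-10-30T14:18'],
                 dtype='datetime64[m]')
timestamp = np.datetime64('2013-10-29T06:58')

# Binary search: i is the first position where dates[i] >= timestamp.
i = np.searchsorted(dates, timestamp)
print(i, dates[i])
```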

searchsorted uses binary search, which relies on the dates being sorted, and is therefore very efficient. If you use pandas, you can do this:

```python
dates = df.index  # df is a DatetimeIndex-ed dataframe
timestamp = pd.to_datetime('your date here', format='its format')
np.searchsorted(dates, timestamp)
```

The function returns the index of the first date that is not earlier than the one searched for, i.e. the closest later date. If the searched date is itself among the dates, its own index is returned; if that is not wanted, pass side='right' as an argument. To get the date itself, do this:

 dates[np.searchsorted(dates, timestamp)]
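One edge case: if timestamp is later than every date, searchsorted returns len(dates) and the indexing line above raises IndexError. A minimal sketch of the same binary-search idea without numpy, using the standard-library bisect module and guarding that case; the helper name next_date_at_or_after is hypothetical, and the list must already be sorted ascending:

```python
import bisect
from datetime import datetime

FMT = "%m/%d %I:%M %p"

def next_date_at_or_after(sorted_dates, target):
    """sorted_dates: ascending list of datetimes. Returns the first one >= target,
    or None when target is later than all of them."""
    i = bisect.bisect_left(sorted_dates, target)  # same position as side='left'
    return sorted_dates[i] if i < len(sorted_dates) else None

dates = sorted(datetime.strptime(s, FMT)
               for s in ['10/30 02:18 PM', '10/30 02:17 PM', '10/29 02:15 AM'])
print(next_date_at_or_after(dates, datetime.strptime("10/29 06:58 AM", FMT)))
print(next_date_at_or_after(dates, datetime.strptime("10/31 12:00 AM", FMT)))
```

bisect.bisect_right would mirror side='right', skipping a date equal to the target.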