Loto 7 (loto7) Prediction


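I have only just started studying machine learning, and as practice I wrote a Python program that predicts Loto 7 (loto7) numbers. Of course, there is no guarantee that its numbers will win.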

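The first models I learned were the random forest model (RandomForestRegressor) and the decision tree model (DecisionTreeRegressor), so I applied these two as they are: the program trains on past winning numbers and then outputs predicted numbers for a specified date.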

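The underlying assumption is that, although lottery numbers are said to be random, they might be related to the draw date. So when you give the program a date, it outputs seven numbers predicted from that date.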
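One way to probe the premise that the draw date carries information is to hold out part of the data and compare the model's error with a baseline that ignores the date. The sketch below is only an illustration and is separate from the main program: the helper names `date_features` and `holdout_mae`, the 50-tree setting, and the randomly generated stand-in data are all assumptions made for the example. With the real file, `X` and `y` would come from loto7_train.csv instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def date_features(dates: pd.Series) -> pd.DataFrame:
    """Split a datetime Series into year/month/day feature columns."""
    return pd.DataFrame({
        "LotoYear": dates.dt.year,
        "LotoMonth": dates.dt.month,
        "LotoDay": dates.dt.day,
    })


def holdout_mae(X, y, random_state=1):
    """Return (model MAE, baseline MAE) measured on a 25% hold-out split."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, random_state=random_state
    )
    model = RandomForestRegressor(n_estimators=50, random_state=random_state)
    model.fit(X_train, y_train)
    model_mae = mean_absolute_error(y_val, model.predict(X_val))
    # Baseline that ignores the date: always predict the training mean.
    baseline = np.full(len(y_val), y_train.mean())
    baseline_mae = mean_absolute_error(y_val, baseline)
    return model_mae, baseline_mae


# Synthetic stand-in data (an assumption, NOT real draw results):
# 300 weekly dates and one random number between 1 and 37 per date.
rng = np.random.default_rng(0)
dates = pd.Series(pd.date_range("2015-01-02", periods=300, freq="7D"))
X = date_features(dates)
y = pd.Series(rng.integers(1, 38, size=len(dates)), name="Num1")

model_mae, baseline_mae = holdout_mae(X, y)
print(f"model MAE: {model_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

If the date really helped, the model's mean absolute error would be clearly lower than the baseline's. On purely random numbers the two should end up close to each other.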

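Save the source below as loto7.py and run it with "python loto7.py". (Download the dataset it uses, loto7_train.csv, from https://loto7.thekyo.jp/download/index. The script reads it from ./input/loto7_train.csv, and reads the dates to predict from ./loto7_test.csv, which needs a Date column.)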

import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Path of the training file (adjust it if loto7_train.csv is saved elsewhere)
loto7_file_path = './input/loto7_train.csv'
loto_data = pd.read_csv(loto7_file_path)

# Create the seven target series, train_y1 to train_y7 (one per drawn number)
train_y1 = loto_data.Num1
train_y2 = loto_data.Num2
train_y3 = loto_data.Num3
train_y4 = loto_data.Num4
train_y5 = loto_data.Num5
train_y6 = loto_data.Num6
train_y7 = loto_data.Num7

# Create X
loto_data['Date'] = pd.to_datetime(loto_data['Date'])
loto_data['LotoYear']=loto_data['Date'].dt.year
loto_data['LotoMonth']=loto_data['Date'].dt.month
loto_data['LotoDay']=loto_data['Date'].dt.day

features = ['LotoYear', 'LotoMonth', 'LotoDay']
X = loto_data[features]

# Define one model per target number. random_state=1 keeps runs reproducible
rf_model1 = RandomForestRegressor(random_state=1)
rf_model2 = RandomForestRegressor(random_state=1)
rf_model3 = RandomForestRegressor(random_state=1)
rf_model4 = RandomForestRegressor(random_state=1)
rf_model5 = RandomForestRegressor(random_state=1)
rf_model6 = RandomForestRegressor(random_state=1)
rf_model7 = RandomForestRegressor(random_state=1)

# Fit one model per target number
rf_model1.fit(X, train_y1)
rf_model2.fit(X, train_y2)
rf_model3.fit(X, train_y3)
rf_model4.fit(X, train_y4)
rf_model5.fit(X, train_y5)
rf_model6.fit(X, train_y6)
rf_model7.fit(X, train_y7)


# path to file you will use for predictions
test_data_path = './loto7_test.csv'

# read test data file using pandas
test_data = pd.read_csv(test_data_path)

# create test_X which comes from test_data but includes only the columns you used for prediction.
# The list of columns is stored in a variable called features
test_data['Date'] = pd.to_datetime(test_data['Date'])
test_data['LotoYear']=test_data['Date'].dt.year
test_data['LotoMonth']=test_data['Date'].dt.month
test_data['LotoDay']=test_data['Date'].dt.day
features = ['LotoYear', 'LotoMonth', 'LotoDay']
test_X = test_data[features]

# Predict each of the seven numbers for the dates in the test file
num1_preds = rf_model1.predict(test_X)
num2_preds = rf_model2.predict(test_X)
num3_preds = rf_model3.predict(test_X)
num4_preds = rf_model4.predict(test_X)
num5_preds = rf_model5.predict(test_X)
num6_preds = rf_model6.predict(test_X)
num7_preds = rf_model7.predict(test_X)

# print(num1_preds,num2_preds,num3_preds,num4_preds,num5_preds,num6_preds,num7_preds)
print('---------Random Forest Model----------')

print('---------Rounded up to the next integer----------')
# Round each prediction up to the next integer
print(np.ceil(num1_preds))
print(np.ceil(num2_preds))
print(np.ceil(num3_preds))
print(np.ceil(num4_preds))
print(np.ceil(num5_preds))
print(np.ceil(num6_preds))
print(np.ceil(num7_preds))

print('---------Decision Tree Model----------')

# Define one model per target number. random_state=1 keeps runs reproducible
dt_model1 = DecisionTreeRegressor(random_state=1)
dt_model2 = DecisionTreeRegressor(random_state=1)
dt_model3 = DecisionTreeRegressor(random_state=1)
dt_model4 = DecisionTreeRegressor(random_state=1)
dt_model5 = DecisionTreeRegressor(random_state=1)
dt_model6 = DecisionTreeRegressor(random_state=1)
dt_model7 = DecisionTreeRegressor(random_state=1)

# Fit one model per target number
dt_model1.fit(X, train_y1)
dt_model2.fit(X, train_y2)
dt_model3.fit(X, train_y3)
dt_model4.fit(X, train_y4)
dt_model5.fit(X, train_y5)
dt_model6.fit(X, train_y6)
dt_model7.fit(X, train_y7)


# test_X, built above from ./loto7_test.csv, is reused here,
# so the test file does not need to be read and converted again.

# Predict each of the seven numbers for the dates in the test file
num1_preds = dt_model1.predict(test_X)
num2_preds = dt_model2.predict(test_X)
num3_preds = dt_model3.predict(test_X)
num4_preds = dt_model4.predict(test_X)
num5_preds = dt_model5.predict(test_X)
num6_preds = dt_model6.predict(test_X)
num7_preds = dt_model7.predict(test_X)

print('---------Rounded up to the next integer----------')
# Round each prediction up to the next integer
print(np.ceil(num1_preds))
print(np.ceil(num2_preds))
print(np.ceil(num3_preds))
print(np.ceil(num4_preds))
print(np.ceil(num5_preds))
print(np.ceil(num6_preds))
print(np.ceil(num7_preds))
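The script prints seven separate arrays, and nothing stops two of the rounded values from being equal or from falling outside the range of a real ticket (Loto 7 uses seven distinct numbers from 1 to 37). A minimal sketch of one possible post-processing step follows. The helper `to_ticket` and its collision rule (step up to the next free number, wrapping around to 1) are hypothetical choices for illustration, not part of the program above.

```python
import numpy as np


def to_ticket(raw_preds, low=1, high=37, count=7):
    """Turn raw regression outputs into distinct integers within [low, high]."""
    # Round up, as the script does with np.ceil, then clamp to the valid range.
    rounded = np.clip(np.ceil(raw_preds), low, high).astype(int)
    ticket = []
    for value in rounded[:count]:
        # On a collision, step up to the next free number, wrapping to `low`.
        while value in ticket:
            value = low if value >= high else value + 1
        ticket.append(int(value))
    return sorted(ticket)


# Example raw predictions for one date (made-up values for illustration).
raw = [3.2, 3.9, 10.0, 17.4, 17.6, 36.5, 40.1]
print(to_ticket(raw))  # → [1, 4, 5, 10, 18, 19, 37]
```

For a single date in the test file, the input would be the first element of each prediction array, for example `[num1_preds[0], ..., num7_preds[0]]` written out in full.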

If you have any opinions or advice, please leave a comment.

