The Challenges of Bicycle Sharing Systems
A bike sharing system is an evolution of the traditional bike rental system, in which users register for membership and can then rent and return bikes (Kiefer & Behrendt 2016, pp. 79-88).
This process has now been automated in modern bike sharing systems, which are becoming increasingly useful in urban centres throughout the world (Kumar et al. 2016, p. 21597), since bikes provide cheap and convenient transport over short distances. Nonetheless, managing a bike sharing system presents problems.
The major problem is rebalancing of the bikes (Rivers & Koedinger 2017, pp. 37-64). An imbalance arises in the system when users generate an asymmetric demand pattern, so for the system to work effectively, the bikes at each station must be rebalanced. Machine learning algorithms are useful for solving the routing problems that arise, especially during rush hour (Jian et al. 2016, pp. 602-613).
For continuous operation of a bike sharing system, dynamic clustering techniques should be implemented to predict patterns of excess demand for bikes (Carpenter et al. 2017).
Handling the Demand Imbalance Problem
For bike rebalancing to be effective, the inventory target levels must be accurately predicted. In this project, three regression models have been implemented on a bike sharing dataset from Kaggle, as provided in the assignment dataset (bike sharing dataset) (Orfanakis & Papadakis 2016). The algorithms are as follows (a comparison sketch is given after the list):
- Decision tree algorithm
- Gradient boosting algorithm
- Linear regression algorithm
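As a preview of how the three models can be compared on a common footing, the sketch below scores each model by 10-fold cross-validated RMSE. This is an illustration only: the feature columns chosen here are an assumption, using the Kaggle-format CSV that appears later in this report.

import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# Assumed setup: the Kaggle-format bike sharing CSV used in the modeling sections below
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/bikeshare.csv'
bikes = pd.read_csv(url, index_col='datetime', parse_dates=True)
bikes['hour'] = bikes.index.hour
X = bikes[['hour', 'workingday', 'temp', 'humidity', 'windspeed']]  # illustrative feature choice
y = bikes['count']

models = {
    'decision tree': DecisionTreeRegressor(max_depth=7, random_state=1),
    'gradient boosting': GradientBoostingRegressor(random_state=1),
    'linear regression': LinearRegression(),
}
for name, model in models.items():
    mse = cross_val_score(model, X, y, cv=10, scoring='neg_mean_squared_error')
    print(name, np.mean(np.sqrt(-mse)))  # cross-validated RMSE per model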
Dataset Description
The dataset was retrieved from the UCI Machine Learning Repository at the following URL: https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset . It has been enriched with seasonal and weather-related information; this enrichment was done at the University of Porto. The dataset contains hourly data and daily data files with column names as headers. The headers are as shown below in the output from the coding section of this assignment:
import pandas as pd

data_path = 'C:/Users/ROSANA/Desktop/bike/hour.csv'
rides = pd.read_csv(data_path)
rides.head()
|   | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---------|--------|--------|----|------|----|---------|---------|------------|------------|------|--------|------|-----------|--------|------------|-----|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.80 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.80 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |
# One-hot encode the categorical fields
dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
for field in dummy_fields:
    dummies = pd.get_dummies(rides[field], prefix=field)
    rides = pd.concat([rides, dummies], axis=1)

# Drop the original categorical columns and other unused fields
fields_to_drop = ['instant', 'dteday', 'season', 'weathersit', 'mnth', 'hr', 'weekday', 'atemp', 'workingday']
data = rides.drop(fields_to_drop, axis=1)
data.head()
|   | yr | holiday | temp | hum | windspeed | casual | registered | cnt | season_1 | season_2 | … | hr_21 | hr_22 | hr_23 | weekday_0 | weekday_1 | weekday_2 | weekday_3 | weekday_4 | weekday_5 | weekday_6 |
|---|----|---------|------|------|-----------|--------|------------|-----|----------|----------|---|-------|-------|-------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| 0 | 0 | 0 | 0.24 | 0.81 | 0.0 | 3 | 13 | 16 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0.22 | 0.80 | 0.0 | 8 | 32 | 40 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0.22 | 0.80 | 0.0 | 5 | 27 | 32 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0.24 | 0.75 | 0.0 | 3 | 10 | 13 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 0 | 0 | 0.24 | 0.75 | 0.0 | 0 | 1 | 1 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

5 rows × 59 columns
Scaling target variables
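The exact scaling code is not reproduced in this report. A minimal sketch, assuming each continuous column is standardized to zero mean and unit variance (which is consistent with the scaled output shown below), would be:

# Standardize the continuous columns; keep each column's mean and standard
# deviation so predictions can later be converted back to rental counts
quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    data.loc[:, each] = (data[each] - mean) / std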
After the target variables were scaled, the following was the output:
|   | yr | holiday | temp | hum | windspeed | casual | registered | cnt | season_1 | season_2 | … | hr_21 | hr_22 | hr_23 | weekday_0 | weekday_1 | weekday_2 | weekday_3 | weekday_4 | weekday_5 | weekday_6 |
|---|----|---------|-----------|----------|-----------|-----------|------------|-----------|----------|----------|---|-------|-------|-------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| 0 | 0 | 0 | -1.334609 | 0.947345 | -1.553844 | -0.662736 | -0.930162 | -0.956312 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | -1.438475 | 0.895513 | -1.553844 | -0.561326 | -0.804632 | -0.823998 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | -1.438475 | 0.895513 | -1.553844 | -0.622172 | -0.837666 | -0.868103 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | -1.334609 | 0.636351 | -1.553844 | -0.662736 | -0.949983 | -0.972851 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 0 | 0 | -1.334609 | 0.636351 | -1.553844 | -0.723582 | -1.009445 | -1.039008 | 1 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Since this assignment involves plotting the given dataset, a Python notebook has been used.
The dataset contains a total of 17,379 records at an hourly granularity.
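This count can be confirmed with a quick check (a small sketch, using the rides frame loaded earlier):

print(len(rides))  # 17379 hourly records in the UCI file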
Building the Regression Models
PART 1
- Decision Trees
The following code snippets implement the decision tree algorithm:
import pandas as pd
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# read the data and set "datetime" as the index
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/bikeshare.csv'
bikes = pd.read_csv(url, index_col='datetime', parse_dates=True)

# rename the "count" column and derive an "hour" feature from the index
bikes.rename(columns={'count': 'total'}, inplace=True)
bikes['hour'] = bikes.index.hour
bikes.head()
bikes.tail()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | total | hour |
|----------|--------|---------|------------|---------|-------|--------|----------|-----------|--------|------------|-------|------|
| 2012-12-19 19:00:00 | 4 | 0 | 1 | 1 | 15.58 | 19.695 | 50 | 26.0027 | 7 | 329 | 336 | 19 |
| 2012-12-19 20:00:00 | 4 | 0 | 1 | 1 | 14.76 | 17.425 | 57 | 15.0013 | 10 | 231 | 241 | 20 |
| 2012-12-19 21:00:00 | 4 | 0 | 1 | 1 | 13.94 | 15.910 | 61 | 15.0013 | 4 | 164 | 168 | 21 |
| 2012-12-19 22:00:00 | 4 | 0 | 1 | 1 | 13.94 | 17.425 | 61 | 6.0032 | 12 | 117 | 129 | 22 |
| 2012-12-19 23:00:00 | 4 | 0 | 1 | 1 | 13.12 | 16.665 | 66 | 8.9981 | 4 | 84 | 88 | 23 |
# 10-fold cross-validation of a depth-7 decision tree; X and y are the feature
# matrix and target (their construction is shown in the linear regression
# section below)
treereg = DecisionTreeRegressor(max_depth=7, random_state=1)
scores = cross_val_score(treereg, X, y, cv=10, scoring='neg_mean_squared_error')
np.mean(np.sqrt(-scores))

OUTPUT: 107.64196789476493

# Refit a shallower tree so that it can be inspected
treereg = DecisionTreeRegressor(max_depth=3, random_state=1)
treereg.fit(X, y)

OUTPUT: DecisionTreeRegressor(criterion='mse', max_depth=3, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=1, splitter='best')
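The export_graphviz import in the decision tree snippet suggests the fitted tree was also visualized; a minimal sketch of that step (the output file name here is an assumption):

# Write the fitted depth-3 tree to a Graphviz .dot file for inspection
export_graphviz(treereg, out_file='tree_bikeshare.dot',
                feature_names=list(X.columns), filled=True)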
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor

# Fit a random forest on the training split; train_x and train_y are the
# training features and target from a train/test split (see the sketch below)
rfr = RandomForestRegressor().fit(train_x, train_y)
prediction_rfr = rfr.predict(train_x)
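The gradient boosting regressor named in the algorithm list is imported above but never fitted; a minimal sketch of that step (the hyperparameters shown are illustrative assumptions, not tuned values):

gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=1)
gbr.fit(train_x, train_y)
prediction_gbr = gbr.predict(train_x)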
data_path = 'C:/Users/ROSANA/Desktop/bike/hour.csv'
# Read the training data (the output below matches the Kaggle bike sharing format)
train_data = pd.read_csv(data_path)
train_data.head(3)
|   | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count |
|---|----------|--------|---------|------------|---------|------|--------|----------|-----------|--------|------------|-------|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 3 | 13 | 16 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 8 | 32 | 40 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 5 | 27 | 32 |
import matplotlib.pyplot as plt

# Compare training-set predictions with the actual counts
prediction_rfr = rfr.predict(train_x)
plt.figure(figsize=(5, 5))
plt.scatter(prediction_rfr, train_y)
plt.plot([0, 1000], [0, 1000], color='red')  # identity line: perfect predictions
plt.xlim(-100, 1000)
plt.ylim(-100, 1000)
plt.xlabel('prediction')
plt.ylabel('train_y')
plt.title('Random Forest Regressor Model')
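In this plot, the red identity line marks perfect predictions: points above it are hours where actual demand exceeded the prediction, and points below it are over-predictions. Because the model is evaluated on its own training data here, a tight fit around the line should be read as an upper bound on performance rather than as evidence of generalization.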
import pandas as pd
from datetime import date

url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/bikeshare.csv'
bikes = pd.read_csv(url, index_col='datetime', parse_dates=True)

# Number of whole months elapsed since the start of the data (January 2011)
def calculate_period(timestamp):
    initial_date = date(2011, 1, 1)
    current_date = timestamp.date()
    return (current_date.year - initial_date.year) * 12 + (current_date.month - initial_date.month)

possible_features = [
    'season', 'holiday', 'workingday', 'weather',
    'temp', 'atemp', 'windspeed', 'month',
    'hour', 'year', 'week_day']
target = 'count'
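The feature list above refers to derived columns ('month', 'hour', 'year', 'week_day') that are not present in the raw CSV. A minimal sketch of how they could be derived from the datetime index (an assumption; the original derivation code is not shown):

# Derive calendar features from the datetime index (assumed step)
bikes['month'] = bikes.index.month
bikes['hour'] = bikes.index.hour
bikes['year'] = bikes.index.year
bikes['week_day'] = bikes.index.weekday
# Months elapsed since January 2011, using the helper defined above
bikes['period'] = [calculate_period(ts) for ts in bikes.index]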
Building a linear regression model
# Start with temperature as the only feature
feature_cols = ['temp']
X = bikes[feature_cols]
y = bikes.total

# Plot the mean rental count for each hour of the day
bikes.groupby('hour').total.mean().plot()

# Use the hour and working-day indicators as features instead
feature_cols = ['hour', 'workingday']
X = bikes[feature_cols]
y = bikes.total

linreg = LinearRegression()
linreg.fit(X, y)
linreg.coef_
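Note that encoding 'hour' as a single numeric feature forces the model to assume that rentals change linearly over the course of the day, while the hourly averages plotted above are strongly non-linear, with morning and evening commute peaks; this limits how far the fitted coefficients can be trusted.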
Use 10-fold cross-validation for the linear regression model.

scores = cross_val_score(linreg, X, y, cv=10, scoring='neg_mean_squared_error')
np.mean(np.sqrt(-scores))

Output: 165.2232866891297
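This cross-validated RMSE (about 165.2) is noticeably worse than the decision tree's 107.6 reported above, which is consistent with the non-linear hourly demand pattern that a linear model with a numeric hour feature cannot capture.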
Conclusion
In conclusion, this project, together with the many other studies carried out on the bike sharing dataset, demonstrates that machine learning algorithms can be used to solve the prediction problem faced by bike sharing systems in cities around the world (Diamond & Boyd 2016, pp. 2909-2913). User behaviour and bike-usage patterns can be observed in the implemented regression models. The many experiments performed on this real dataset show how effective regression models can be in addressing the bike sharing problem (Salvatier, Wiecki & Fonnesbeck 2016, p. 55).
References
Kiefer, C. and Behrendt, F., 2016. Smart e-bike monitoring system: real-time open source and open hardware GPS assistance and sensor data for electrically-assisted bicycles. IET Intelligent Transport Systems, 10(2), pp.79-88.
Jian, N., Freund, D., Wiberg, H.M. and Henderson, S.G., 2016, December. Simulation optimization for a large-scale bike-sharing system. In Proceedings of the 2016 Winter Simulation Conference (pp. 602-613). IEEE Press.
Rivers, K. and Koedinger, K.R., 2017. Data-driven hint generation in vast solution spaces: a self-improving python programming tutor. International Journal of Artificial Intelligence in Education, 27(1), pp.37-64.
Orfanakis, V. and Papadakis, S., 2016, December. Teaching basic programming concepts to novice programmers in secondary education using Twitter, Python, Arduino and a coffee machine. In Hellenic Conference on Innovating STEM Education (HISTEM), Greece.
Salvatier, J., Wiecki, T.V. and Fonnesbeck, C., 2016. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2, p.55.
Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P. and Riddell, A., 2017. Stan: A probabilistic programming language. Journal of Statistical Software, 76(1).
Kumar, S., Vo, A.D., Qin, F. and Li, H., 2016. Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data. Scientific Reports, 6, p.21597.
Diamond, S. and Boyd, S., 2016. CVXPY: A Python-embedded modeling language for convex optimization. The Journal of Machine Learning Research, 17(1), pp.2909-2913.