A Data Fusion-Based Hybrid Sensory System for Older People’s Daily Activity and Daily Routine Recognition
Yan Wang, Shuang Cang, Hongnian Yu, IEEE Senior Member

Sensor-based human activity recognition (HAR) has received considerable attention due to its wide applications in health care. Each sensor modality has its advantages and limitations, and a single sensor modality may not cope with the complex situations encountered in practice. To address this challenge, we design and develop a practical hybrid sensory HAR system for older people. To enhance the performance of the system, we propose a unique data fusion method that combines wearable sensors and ambient sensors. The wearable sensors are used to identify specific daily activities. The ambient sensors deliver the occupant’s room-level daily routine and, together with the wearable sensors, provide more comprehensive monitoring; meanwhile, the captured room-level location information is also used in the data fusion to trigger sub-classification models pre-trained on wearable data. We also explore a new feature set extracted from the wearable sensors to improve system performance. We evaluate the system experimentally using four typical mutual information-based feature selection methods and the support vector machine (SVM) classification algorithm, rather than more complex algorithms, with the aim of finding a practical way to improve recognition accuracy. The ground-truth data are gathered from 21 subjects and cover 17 daily activities, with a total sample size of 2,142,000. The experimental results demonstrate the effectiveness of our method. Using wearable data only, the new feature set improves the accuracy from 89.81% ± 0.54 to 96.82% ± 0.15, and data fusion with ambient information further increases the accuracy to 98.32%.


I. INTRODUCTION
INCREASED life expectancy coupled with declining birth rates leads to an aging population structure [1]. Aging-caused changes, such as physical or cognitive decline, can affect people's daily life, resulting in injuries, mental health problems or a lack of physical activity. Around 75% of older people prefer to continue living in their own homes, and a majority of them (between 60 and 70) believe they can live independently and accomplish daily tasks without a caregiver [2]. Providing this group of older people with formal or traditional care may imply extra costs and even disturb their everyday life. However, some assistance is still needed to maintain or improve their quality of life. In recent decades, advances in assistive technologies have promoted independent, active and healthy aging [3], and HAR-based systems have become one of the most promising solutions to assist older people's daily life [4][5][6][7].
HAR learns activities from a series of observations of subjects' actions in real-life settings, using ambient sensors, wearable sensors or a hybrid sensory modality. In an ambient-sensor-based HAR (ASHAR) system, dozens of sensors are typically deployed in the home, attached to a door, a kettle, a fridge, the floor, etc., to provide contextual information related to the defined activities [8][9][10]. ASHAR can be less obtrusive because no sensors are worn on the body, but this usually comes at the cost of poor flexibility and complex sensor deployment. ASHAR also works only within a limited area, and purely ambient sensors are less capable of identifying detailed changes and elaborate actions. A wearable-sensor-based HAR (WSHAR) system identifies human activities by mining informative data from wearable sensors with computational algorithms, and it can function in a relatively large space. Generally, placing more sensors on multiple body parts improves the performance and robustness of WSHAR. For example, Laudanski et al. [10] identify post-stroke gait-related activities by placing two inertial measurement units (IMUs) on the less-affected and the affected shank, respectively. Their experimental results show that the highest classification accuracy is achieved using both sensor positions. Cleland et al. [11] further study the impact of combining multiple accelerometers from different body positions (chest, wrist, hip, foot, lower back and thigh), and their results indicate that combining two or more sensor positions achieves better accuracy. Our previous work also demonstrates that combining six wrist-worn sensors with a chest-attached heart rate sensor can improve daily activity recognition performance [12].
However, multiple sensors with complex on-body placement can lead to higher cost, difficult practical deployment and obtrusiveness for older users. Pure WSHAR systems also have limitations that may lead to less accurate recognition of certain activities with similar sensor-derived attributes, such as brushing teeth and eating [6]. Consequently, WSHAR systems face either the problem of complex on-body sensor placement or a limited capacity to identify elaborate actions, which motivates the development of hybrid sensory systems to tackle these problems [4]. Hybrid sensory systems that combine single sensor modalities have thus been explored in HAR. Stikic et al. [13] combine data from accelerometers and Radio Frequency Identification (RFID). They place 191 RFID tags on 55 objects to provide the primary information for activity recognition, and the accelerometers are only used when the RFID data are not sufficient. Their experimental results suggest that the sensor combination improves recognition performance compared with using either sensing modality separately. Roy et al. [14] use ambient and mobile data in a multi-inhabitant environment for daily activity detection; their initial results reach around 70%, which is higher than the performance obtained using smartphone-based accelerometers alone. Atallah et al. [15] combine ear-worn sensors (an accelerometer and an oxygen saturation (SpO2) sensor) with ambient-mounted blob sensors to detect changes in patients' daily patterns. Zhu and Sheng [16] use three motion sensors and two cameras to identify body activities and hand gestures simultaneously.
The cameras installed on the wall are used to capture the wearer's location information, and the wearable motion sensors attached to the right thigh, the right hand and the waist are used to record motion-related information. They explore the correlation between human activities and location information and evaluate the effectiveness and accuracy of their method in a mock home environment. Other studies report improved activity recognition performance by combining wearable sensors with infrared sensors [17,18], where data fusion consists of directly feeding the features extracted from the two sensor sources to the classifiers. This paper proposes a unique data fusion method based on a hybrid sensory system for monitoring older people's daily activities and daily routines, leveraging the strengths of both WSHAR and ASHAR. The paper makes two main contributions, described below.
1) Proposing an effective data fusion method based on a practical hybrid HAR system. The wearable data are used to recognize specific daily activities. The location information captured by the room-mounted passive infrared (PIR) sensors has two functions. First, it is used to infer the user's room-level daily routine. Second, according to the rooms in which each activity generally occurs, we divide the overall task of recognizing all the defined activities into room-based sub-tasks, and in data fusion the location information triggers the sub-models pre-trained on the corresponding wearable data. Since each sub-model only needs to recognize a smaller number of activities, both efficiency and accuracy can be improved. Our method thus effectively combines information from the two sensor sources for HAR, and it differs from other data fusion methods reported in HAR.
2) Exploring augmented features from a limited number of sensors to improve accuracy. We implement a group of attitude-related features (ARFs) and evaluate their contribution to HAR in our system. Most previous WSHAR studies employ conventionally used features (CUFs) generated from one channel (axis) or multiple channels of a single sensor, e.g., the mean of the acceleration readings along the x-axis, or the correlation between the x-axis and y-axis acceleration readings. Only a few studies exploit a handful of ARFs, e.g., the tilt, yaw or pitch angle [19,20]. Unlike CUFs, ARFs are generated from multiple wearable sensors rather than from one individual sensor.
The rest of the paper is organized as follows. Section II presents the proposed data fusion method based on a hybrid sensory system and the sensors used in this research. Section III introduces the data acquisition and the methods used for data processing. The experimental results are presented in Section IV. Section V provides the discussion, and Section VI concludes the paper and outlines future directions.

A. Proposed system and data fusion method
Our previous work [12] develops a multi-sensor activity recognition system and investigates the contribution of seven types of wearable sensors to HAR. However, seven sensors placed on three body parts can cause obtrusiveness and high cost in real use. In this paper, we use fewer wearable sensors and explore augmented features from these limited sensors to improve accuracy. Another problem in HAR is that some activities, such as brushing teeth and eating (feeding), or wiping and ironing, are difficult to recognize accurately using wearable sensors alone because they involve similar wrist movements [6]. In this work, we assume, for instance, that eating is unlikely to happen in a bathroom. Thus, if the ambient information tells the classifier that the user is in the bathroom at a specific moment, it becomes easier to differentiate brushing teeth from eating. To address these problems, we propose a practical approach that applies a data fusion method in our hybrid sensory system to combine the two sources of sensory data, and we also explore an augmented feature set from the wearable sensors. Figure 1 shows the proposed system, which has three blocks: wearable information processing, ambient information processing and data fusion. The wearable sensing involves a wrist-worn device with five initially selected sensors inside, delivering observations of the user's motion. Each ambient sensing set (containing a PIR sensor) is installed in one room and provides the user's room-level location information. The system targets older people who live alone, which means that, most of the time, only one ambient sensing set captures "1" (presence) while the others capture "0" (absence) at any specific moment. The long-term series of "0"s and "1"s recorded in this way can reveal the occupant's daily routine.
As presented in Fig. 1, we first compare the individual performance of the ARFs and the CUFs before applying data fusion, and the best-performing feature set is used for classification and data fusion. Data fusion is the core of the system: it uses the ambient "presence" ("1") information to trigger a sub-classification model. The sub-models are pre-trained on the wearable data of the activities assigned to a specific room. For example, when room n is detected as occupied, only sub-model n is activated at that moment. Each sub-model is thus responsible for recognizing fewer activities than in the scenario of recognizing all the defined activities without data fusion (the whole model). In this way, the overall recognition accuracy can be improved without additional computation. The system switches to the "whole model" mode when more than one occupant is detected, i.e., when two or more "1"s are captured at the same time. The whole model recognizes all the defined activities together using only the wearable data.
In general, the system has three function modes: the whole classification model (the pure wearable sensing mode), which identifies specific daily activities; the pure ambient sensing mode, which delivers the occupant's room-level daily routine; and the room-based sub-classification mode (with data fusion), which provides spatio-temporal monitoring together with the wearable sensors. The first mode can work alone when ambient sensing fails, and the second mode can roughly identify the user's daily routine without wearable sensing. The data-fusion mode provides more accurate and complementary HAR monitoring when both the wearable sensing and the ambient sensing function properly. We evaluate the proposed method with the ground-truth data following the procedure in Fig. 1.
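The mode switching described above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function name `fused_predict`, the dictionary-based room-to-model lookup, and the handling of an all-"0" reading (treated like an ambient-sensing failure, so the system falls back to the whole model) are all assumptions.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type: a trained classifier mapping one feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def fused_predict(
    features: Sequence[float],
    presence: Dict[str, int],
    sub_models: Dict[str, Classifier],
    whole_model: Classifier,
) -> str:
    """Room-triggered data fusion for one window (illustrative sketch).

    presence maps each room to its PIR reading for the window:
    1 = presence, 0 = absence.
    """
    occupied = [room for room, flag in presence.items() if flag == 1]
    if len(occupied) == 1 and occupied[0] in sub_models:
        # Exactly one room is occupied: only that room's sub model runs.
        return sub_models[occupied[0]](features)
    # Two or more "1"s (more than one occupant), or no "1" at all (assumed to
    # mean the ambient sensing failed): use the wearable-only whole model.
    return whole_model(features)


# Usage with toy stand-in classifiers.
subs = {"Bathroom": lambda f: "Brush", "Kitchen": lambda f: "Cook"}
whole = lambda f: "Eat"
print(fused_predict([0.1, 0.2], {"Bathroom": 1, "Kitchen": 0}, subs, whole))  # Brush
print(fused_predict([0.1, 0.2], {"Bathroom": 1, "Kitchen": 1}, subs, whole))  # Eat
```

Keeping the sub models in a dictionary keyed by room mirrors the idea that each PIR set triggers exactly one pre-trained model, and the whole model stays available as the fallback.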

B. Sensors and sensor placement
In this subsection, we present the details of the sensors used in the proposed system ( Fig.1).

1). Wearable sensors
We initially select five wearable sensors: a 3-axis accelerometer (MPU6050, range of ±2 g), a 3-axis gyroscope (MPU6050, range of ±1000°/s), a 3-axis magnetometer (HMC588, range of ±4.07 Gauss), a barometer for height measurement (BMP180, with a height resolution of 0.5 m) and a temperature sensor (BMP180, range of −10 to 60 ℃). The accelerometer measures linear motion, the gyroscope measures rotational motion, and the magnetometer provides the direction of the ambient magnetic field. These three inertial sensors enable the measurement of motion-caused variations and offer useful information for activity recognition [6,20,21]; we also derive the ARFs from them in this work. The barometer and the temperature sensor are selected because height variations are likely to be linked to certain activities, such as climbing stairs or exercising, and temperature changes usually accompany some specific activities, such as cooking or eating.
We integrate the selected sensors into a specifically customized module, as shown in Fig. 2. The upper device in Fig. 2(a) is the wearable device with the five built-in sensors, and the lower one is the receiver. The wearable device has an on-board processing system that delivers the attitude angles. Thus, the wearable module provides 3 attitude values (yaw, pitch and roll) of the device and another 11 readings from the 5 individual sensors. All the readings are wirelessly recorded by a nearby laptop at a sampling rate of 20 Hz.
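Eq. (1) presents the data series obtained from the wearable module, with one 14-value sample per time step:

$$\mathbf{S} = \{\mathbf{s}_i\}, \quad \mathbf{s}_i = \left[\psi_i,\ \theta_i,\ \phi_i,\ a^{x}_i,\ a^{y}_i,\ a^{z}_i,\ g^{x}_i,\ g^{y}_i,\ g^{z}_i,\ m^{x}_i,\ m^{y}_i,\ m^{z}_i,\ T_i,\ H_i\right] \qquad (1)$$

where $i$ denotes the index of the data series with respect to the sampling rate; $\psi_i$, $\theta_i$ and $\phi_i$ are the yaw, pitch and roll angles; $a$, $g$ and $m$ are the accelerometer, gyroscope and magnetometer readings along the x-, y- and z-axes; and $T_i$ and $H_i$ are the temperature and the height.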

2). Ambient sensors
We use ambient sensors to detect a user's in-home location.
The passive infrared (PIR) sensor is selected because of its utility, low cost and energy efficiency in smart homes [22,23]. Our ambient sensor module consists of two parts (see Fig. 2(b)): the Receiving Terminal Unit (RTU) and the Centre Unit (CU). The CU cyclically polls the status of each RTU and receives the data sent from the RTUs. The readings obtained from the ambient sensors are processed into series of binary digits, where "1" represents presence and "0" represents absence.

3). Sensor placement
Sensor placement is an important issue for WSHAR: sensors placed on different body parts offer diverse information and lead to different recognition performance [11]. The wrist is a promising position for detecting activities, as most activities are associated with wrist movements [12,24]. Additionally, in the survey in [25], 299 respondents from 4 countries identified the wrist as the most preferred placement when asked where they would like to wear sensors. Taking both recognition performance and user acceptance into account, we place the wearable sensors on the dominant wrist (Fig. 1). Each PIR sensor set is placed on the floor behind the door of its room (see Fig. 1) for simplicity and to avoid disturbances.

III. DATA PROCESSING
This section presents the methods and algorithms involved in the further stages of Fig. 1: (A) data acquisition, (B) data pre-processing, (C) feature extraction, (D) feature selection and (E) classification and performance assessment.

A. Data acquisition
This paper focuses on indoor daily activity recognition for older people in order to observe their routine activities and abnormal patterns. We predefine the 17 activities listed in Table I, which can basically reveal independent life skills [26]. They include basic survival tasks (walking, eating, cooking, etc.), activities for maintaining an independent life at home (using phones, mopping, washing, ironing, etc.) and abnormal activities (falls, long-term lying). Some activities, such as using the toilet, dressing/undressing and bathing, are not included because of privacy concerns or the unavailability of data due to the limitations of the sensor modules. Although we do not directly monitor toilet use or bathing, the ambient sensors capture how often and how long the occupant uses the bathroom. A larger data set is beneficial for evaluating the proposed system; with 17 activities, our data set is large enough for our experimental purpose.
The data collection procedures have been approved by the Bournemouth University Research Ethics Committee. The data are collected in our home-based hybrid sensory environment. All activities except Falls are collected from 21 older subjects (aged 60 to 74; 11 females and 10 males; all right-handed), whose basic information is shown in Table II. Fall detection is one of the important tasks in HAR [27]. Considering the older subjects' safety, we recruit 21 young subjects (aged 25 to 35; 11 females and 10 males) who, in place of the older subjects, perform natural falls in different directions (forward, backward, left-side and right-side) onto a mattress.
During data collection, the wearable device (the upper one in Fig. 2(a)) is tightly bound to the subject's dominant wrist to acquire the movement-caused signals from the sensors inside. Meanwhile, we deploy a PIR sensor set (the RTU in Fig. 2(b)) in each room (Fig. 1) to capture the user's presence and absence information. Taking the home structure into account, our predefined activities are assigned to four groups (Table I) according to where they occur, i.e., 5 activities in Bathroom, 8 in Kitchen, 10 in Living room and 5 in Bedroom (some activities belong to more than one room). We prepare an activity list for each room. The subjects are encouraged to perform each activity independently and in their own way, and they may take breaks at any time during data collection. If the collection of an activity is interrupted, its valid data segments are added together. We label the data manually and mark the start and end time of each activity. The whole data collection process lasts over twenty days. Each older subject completes 16 activities and each younger subject performs 1 activity (Falls); merging Falls with the 16 daily activities gives 17 activities. The valid data for each activity last 5 minutes at a sampling rate of 20 Hz, so the total sample size of the wearable data is 2,142,000 for 17 activities and 21 subjects. The data contain no overlap or disturbances between activities. Fig. 3 presents some data collection cases with the corresponding raw data, where the y-axis shows the readings from different sensors and the x-axis the number of data points. The raw data of different activities show diverse values and variations.
Regarding the attitude angles, Fig. 3 shows that the yaw angle fluctuates between 100 and 150 degrees for Cook and between slightly under 250 and over 300 degrees for Mop, while for Falls it stays relatively steady just over 200 degrees and then drops dramatically when the fall occurs. Mining useful information from the raw data can facilitate later learning in HAR.
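As a quick check of the arithmetic behind the reported sample size, the sketch below multiplies out the figures stated in the text (21 subjects, 17 activities, 5 minutes per activity, 20 Hz). The variable names are illustrative only.

```python
# Figures taken from the data acquisition description.
subjects = 21
activities = 17
minutes_per_activity = 5
sampling_rate_hz = 20

# Samples in one 5-minute recording of one activity by one subject.
samples_per_recording = minutes_per_activity * 60 * sampling_rate_hz
# Every subject contributes one recording per activity.
total_samples = subjects * activities * samples_per_recording

print(samples_per_recording)  # 6000
print(total_samples)          # 2142000
```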

B. Data preprocessing
The data obtained from the PIR sensors are processed into {0, 1} digital series, so the pre-processing here refers to the wearable sensory data. To facilitate later learning, the time data series in Eq. (1), denoted $\mathbf{S}$, needs to be segmented into fixed-length sub-windows. It is generally acknowledged that a window length of several seconds can sufficiently capture the cycles of activities such as walking, running and using stairs [28,29]. Here, we follow the principles in [29] and set the segmentation length to 12.8 s (256 samples per window). Meanwhile, a 50% overlap between consecutive windows is applied to reduce possible information loss at the edges of adjacent sub-windows. The total number of windows $N$ for a data series is then obtained as

$$N = \left\lfloor \frac{L - O}{W - O} \right\rfloor \qquad (2)$$

where $L$ is the data length, $O$ is the overlap size, $W$ is the segmentation length, and $\lfloor \cdot \rfloor$ rounds a number down to the next lower integer. After segmentation, $\mathbf{S}$ is split into $N$ sub-windows $\{\mathbf{S}^{1}, \mathbf{S}^{2}, \dots, \mathbf{S}^{N}\}$. No smoothing or median filtering is applied to the raw data before feature extraction.
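The segmentation step can be sketched as below, taking a segmentation length W = 256 and an overlap O = 128 (50%) as stated in the text and using the floor formula N = ⌊(L − O)/(W − O)⌋ for the window count. The helper names are hypothetical. For one 5-minute recording at 20 Hz (6,000 samples), the sketch yields 45 windows.

```python
from typing import List, Sequence


def num_windows(length: int, window: int = 256, overlap: int = 128) -> int:
    """N = floor((L - O) / (W - O)); 0 if the series is shorter than W."""
    if length < window:
        return 0
    return (length - overlap) // (window - overlap)


def segment(series: Sequence[float], window: int = 256, overlap: int = 128) -> List[Sequence[float]]:
    """Split a series into N windows of length W whose starts are W - O apart."""
    step = window - overlap
    return [series[k * step : k * step + window]
            for k in range(num_windows(len(series), window, overlap))]


recording = [0.0] * 6000  # one 5-minute recording at 20 Hz
print(num_windows(len(recording)))  # 45
```

Only complete windows are kept, so the last 112 samples of a 6,000-sample recording are not covered by any window.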

C. Feature extraction
Feature extraction plays a pivotal role in HAR; it typically transforms the original data into informative features, such as … [30], time-domain features [31], frequency-domain features [6] and other hybrid features [19]. As mentioned previously, the commonly used features are generated from an individual channel (axis) of a sensor, i.e., CUFs, whereas the ARFs are derived from multiple sensory channels or multiple sensors. As illustrated in Fig. 4, roll describes the sides of the device moving up and down, pitch describes the head of the device moving up and down, and yaw describes the head moving right and left. Fig. 3 shows that the attitude angles of the wearable device vary across activities, which implies that the ARFs have potential for activity recognition. Hence, our research explores the contribution of the ARFs to HAR based on the collected data. We apply typical time-domain and frequency-domain features to the observations to generate CUFs and ARFs for later comparison. The obtained feature space can be presented as

$$\mathbf{F} = \Phi(\mathbf{S})$$

where $\Phi$ is the feature extraction function set, which implements the calculation of all the features used in this study, and $\mathbf{S}$, given in Eq. (1), is the data series obtained from the wearable device. We denote all the extracted features as All (ARFs + CUFs), the features related to the wearable device's attitude as ARFs, and the remaining features as CUFs. Feature extraction is conducted in each segmentation window $\mathbf{S}^{k}$. The details of the specific features used in this paper are given in the Appendix.
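To make the time- and frequency-domain idea concrete, the sketch below computes a few typical features on one window of a single channel, for example the yaw angle (for an ARF) or one accelerometer axis (for a CUF). The feature set here (mean, standard deviation, range and dominant frequency) is a hypothetical illustration; the features actually used are listed in the Appendix. A direct DFT keeps the example dependency-free.

```python
import cmath
import math
from typing import Dict, Sequence


def window_features(x: Sequence[float], fs: float = 20.0) -> Dict[str, float]:
    """A few typical time- and frequency-domain features of one window."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population std
    centred = [v - mean for v in x]
    # Magnitudes of DFT bins 1..n/2 (direct O(n^2) DFT; fine for 256 samples).
    mags = [
        abs(sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(1, n // 2 + 1)
    ]
    k_peak = 1 + max(range(len(mags)), key=mags.__getitem__)
    return {
        "mean": mean,
        "std": std,
        "range": max(x) - min(x),
        "dominant_freq_hz": k_peak * fs / n,  # bin index -> Hz
    }


# Usage: a synthetic yaw window oscillating at 2.5 Hz around 200 degrees.
yaw = [200 + 30 * math.sin(2 * math.pi * 2.5 * t / 20) for t in range(256)]
print(window_features(yaw)["dominant_freq_hz"])  # 2.5
```

The same function can be applied to every channel of every window; what distinguishes an ARF from a CUF in this framing is only which channel is fed in.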
For the CUFs, we do not apply all types of features evenly to each of the 5 sensors. This is because people live on different floors and experience different weather conditions and changing room environments, which means that some features (such as the maximum or mean of the height or the temperature) are less useful for distinguishing activities. Only features that represent the variations of the observations, rather than their absolute values, are applied to the height and temperature measurements. Features with many null values, or with similar or equal values across different activities, are removed manually. The resulting feature pool is listed in Table III with abbreviations. Since this pool contains the potential features for activity recognition but often includes many redundant and irrelevant ones, feature selection is applied to select the optimal feature subset and reduce the dimensionality of the feature space.

D. Feature selection
Mutual information (MI)-based feature selection algorithms form a large family of the existing feature selection (FS) methods. Algorithms in this family usually exploit different filter criteria to measure the importance of the candidate features. The FS process is independent of any classifier and can therefore achieve a good trade-off between performance and efficiency. We use four MI-based FS methods from [32]: minimum Redundancy Maximum Relevance (mRMR), Joint Mutual Information (JMI), Conditional Mutual Information Maximization (CMIM) and Double Input Symmetrical Relevance (DISR).
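As an illustration of how an MI-based filter criterion works, the sketch below implements greedy mRMR in its difference form (relevance minus mean redundancy) on discrete features. It assumes the features have already been discretized and is not the toolbox used in [32]; JMI, CMIM and DISR follow the same greedy forward loop with a different scoring criterion.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_info(x: Sequence[int], y: Sequence[int]) -> float:
    """I(X; Y) in nats, estimated from counts of two discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())


def mrmr(features: List[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Greedy mRMR (difference form): maximise relevance minus mean redundancy."""
    relevance = [mutual_info(f, labels) for f in features]
    selected = [max(range(len(features)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(features)):
        best, best_score = -1, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            redundancy = sum(mutual_info(features[j], features[s]) for s in selected) / len(selected)
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected


# Usage: f0 and f1 are identical copies of the "high bit" of a 4-class label,
# f2 is the "low bit"; mRMR keeps f0 and then prefers f2 over the redundant f1.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = list(f0)
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
print(mrmr([f0, f1, f2], labels, k=2))  # [0, 2]
```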

E. Classification and performance assessment
The support vector machine (SVM) is one of the most robust and accurate of the well-known classification algorithms [11,33,34]. We use the LIBSVM package in MATLAB [35] with the RBF kernel to train and test on our ground-truth data using 10-fold cross-validation. The available data from all subjects are split into 10 roughly equal-sized folds, each containing roughly the same number of patterns from each activity of each subject. Eight folds are used as training data, one fold for validation, and one fold for testing the model. Each of the 10 folds is used exactly once as test data, so the test data are always unseen by the classifier. The results reported in the rest of the paper are the averages of the 10 test measures.
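One way to build folds that keep each (subject, activity) group balanced, and to rotate the 8/1/1 split, is sketched below. Cutting each group into contiguous blocks of windows (rather than interleaving them) and choosing fold (r + 1) mod 10 for validation in round r are assumptions, not details given in the text; contiguous blocks keep most 50%-overlapping neighbouring windows in the same fold.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def assign_folds(groups: Sequence[Hashable], k: int = 10) -> List[int]:
    """Fold id per pattern; each group (in time order) is cut into k contiguous blocks."""
    sizes = Counter(groups)
    seen: Dict[Hashable, int] = defaultdict(int)
    folds = []
    for g in groups:
        folds.append(seen[g] * k // sizes[g])
        seen[g] += 1
    return folds


def split_round(folds: Sequence[int], r: int, k: int = 10) -> Tuple[List[int], List[int], List[int]]:
    """Round r: fold r is the test set, fold (r + 1) % k validates, the rest train."""
    val = (r + 1) % k
    train = [i for i, f in enumerate(folds) if f not in (r, val)]
    valid = [i for i, f in enumerate(folds) if f == val]
    test = [i for i, f in enumerate(folds) if f == r]
    return train, valid, test


# Usage: two (subject, activity) groups of 45 windows each.
groups = [("S1", "Walk")] * 45 + [("S1", "Cook")] * 45
folds = assign_folds(groups)
train, valid, test = split_round(folds, r=0)
print(len(train), len(valid), len(test))  # 72 8 10
```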

IV. RESULTS AND ANALYSIS
A. Identification of the functions of the selected wearable sensors and the contribution of different feature sets
We initially select 5 types of sensors, which are integrated into a wrist-worn device (Fig. 2). The diversity of the sensors is expected to compensate for the possibly insufficient information obtained when placing them only on the wrist. Since it is impractical to show the performance of all possible combinations of the 5 sensors, we divide them into the groups listed in the first column of Table IV, according to their contributions in related studies [6,12,14,20,36], to identify the sensors' functions in this work. The mRMR, JMI, CMIM and DISR are applied individually to select the best feature subsets from the CUF pool for each sensor group, and the selected features are fed into SVM classifiers with 10-fold cross-validation. Table IV shows the classification accuracies for the different sensor groups with the four FS methods. When a single sensor is used, the accelerometer and the gyroscope achieve the higher average accuracies of 83.62% and 82.18%, respectively; the magnetometer gives a lower average accuracy of 74.25%; and the temperature sensor and the barometer give the lowest results, indicating that they are of little use on their own. When two of the accelerometer, the gyroscope and the magnetometer are combined, the classification accuracies improve to between 84.26% and 86.35%. The combination of the accelerometer, the gyroscope and the magnetometer (AGM) gives the highest average accuracy of 87.97%, and the best accuracy among all the groups, 89.81%, is achieved using mRMR with SVM. When the barometer and/or the temperature sensor are combined with the AGM, the accuracy remains unchanged at 89.81%.
These results indicate that the temperature sensor and the barometer fail to improve the recognition accuracy, possibly because the features extracted from these two sensors are less discriminative or are overwhelmed by the features extracted from the other sensors. Thus, only the AGM sensors are used in the later stages.
The results above are based on the CUFs. We also apply the mRMR, JMI, CMIM and DISR to the ARFs and to all the features (i.e., CUFs + ARFs in Table III) to evaluate the performance of the ARFs. Fig. 5 compares the accuracy of the ARFs, the CUFs and the ARFs + CUFs. The ARFs (the red curves) achieve the highest accuracy for all the FS methods used, followed by the ARFs + CUFs (the blue curves) and the CUFs (the green curves). With the JMI, CMIM and DISR, the ARFs already produce higher accuracies using only 5 to 20 selected features. The ARFs + CUFs perform better than the CUFs, with best accuracies of around 92% and 90%, respectively. When FS is applied to the ARFs + CUFs, taking the mRMR as an example, nearly half of the selected features come from the CUFs and the other half from the ARFs. The detailed results are shown in Table V, where all the results are obtained with 30 selected features. Specifically, the mRMR produces the highest accuracy (89.8%) on the CUFs; the CMIM achieves the highest accuracy (91.74%) on the CUFs + ARFs; and the JMI, CMIM and DISR reach greatly improved accuracies of over 96% on the ARFs.
Table V also shows the classification accuracy for each activity for the three best cases (in bold). The last column of Table V shows, for each activity, the accuracy difference between the ARFs and the CUFs. The accuracy increases to varying degrees for the vast majority of the activities, especially for some activities that are misclassified when using the CUFs. For example, Read shows the largest increase (20.45%) on the ARFs; it is usually misclassified as Lie on the CUFs. Next is Mop, with a rise of 12.69%, followed by Wipe, with an increase of 12.16%; Wipe is easily misclassified as Iron when using the CUFs. Exercise and Phone see only slightly increased accuracy, and a drop in accuracy occurs only for Stand. Falls and Walk achieve their highest accuracies on the CUFs + ARFs feature set. Read, Watch, Walk and Stairs are the most difficult activities to recognize. The ARFs with the CMIM perform best, with an overall accuracy difference of 6.91% relative to the CUFs with the mRMR. The data fusion with ambient information described below is therefore based on the ARFs because of their better performance in Table V.
Figure 6 presents a subject's daily routine inferred from the room-level ambient information, which tells us when, how long and how often (WHH) the user stays in specific rooms. Fig. 6 also shows that the monitored person got up in the bedroom at around 6:30 am, went to bed at about 9:30 pm, used the toilet once at night, etc. Furthermore, the room-level daily routine over a long period can reveal whether the user is able to actively organize daily life, or whether the user's routine is abnormal compared with the normal routine.
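The when/how long/how often (WHH) summary can be derived from each room's binary PIR series by run-length encoding the runs of "1"s. A minimal sketch, assuming one reading per fixed interval; `seconds_per_sample` is a hypothetical parameter, since the PIR reporting interval is not stated here.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def presence_episodes(series: Sequence[int]) -> List[Tuple[int, int]]:
    """(start index, length) of every run of consecutive 1s in a PIR series."""
    episodes: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(series):
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            episodes.append((start, i - start))
            start = None
    if start is not None:  # series ends while the room is still occupied
        episodes.append((start, len(series) - start))
    return episodes


def whh_summary(rooms: Dict[str, Sequence[int]], seconds_per_sample: float) -> Dict[str, dict]:
    """When, how long and how often the occupant is in each room."""
    summary = {}
    for room, series in rooms.items():
        eps = presence_episodes(series)
        summary[room] = {
            "when_s": [s * seconds_per_sample for s, _ in eps],
            "how_long_s": sum(n for _, n in eps) * seconds_per_sample,
            "how_often": len(eps),
        }
    return summary


# Usage with one reading per minute (assumed interval).
day = {"Bathroom": [0, 1, 1, 0, 0, 1, 0], "Bedroom": [1, 0, 0, 1, 1, 0, 1]}
print(whh_summary(day, seconds_per_sample=60))
```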
Accordingly, combining the ambient information with the wearable-sensor-based decisions can deliver more comprehensive monitoring. To fuse the data from the wearable and ambient sensors, we propose a simple but effective data fusion method, as shown in Fig. 1, which differs from the published methods described in Section I. The method is based on the following assumption: some activities can be restricted to specific rooms according to where they occur; e.g., cooking is highly unlikely to take place in a bathroom, and teeth brushing is unlikely to take place in a bedroom. The user's location information can therefore be used to trigger the room-based sub-models, each of which is responsible for recognizing only a limited set of activities. As a result, after the ambient information is fused with the wearable information, the whole classifier turns into several parallel sub-classifiers. To unify the home structures where we collect data, the 17 activities are assigned to four groups (see Table I), i.e., 5 activities in Bathroom, 8 in Kitchen, 10 in Living room and 5 in Bedroom. The number of activity types in each room decreases, which reduces the recognition requirements and simplifies the classification models compared with recognizing all the activities together. To facilitate comparison, the sample size used for each activity remains unchanged before and after data fusion. The experimental results for the scenarios without data fusion (all activities) and with data fusion (room-based sub-models) are shown in Table VI. The Accuracy, Precision, Recall and F-score show similar trends for each model, so the following analyses are based on accuracy. From Table VI, the CMIM with SVM achieves the highest accuracy of 98.32% after the room-level location information is combined, followed by the JMI and DISR with accuracies of 97.89% and 97.66%, respectively.
mRMR, by contrast, produces the largest increase, of around 3.35%, from 93.46% to 96.81% after data fusion. Figure 7 further compares the performance before and after data fusion, including the scenario that uses only CUFs-based wearable sensing. All four FS methods show the same trend in recognition accuracy, i.e., $\text{Accuracy}_{\text{Fusion}} > \text{Accuracy}_{\text{ARFs}} > \text{Accuracy}_{\text{CUFs}}$.

B. Fusion with ambient information
With the location information captured by the PIR sensors, each room-based sub-model is assigned fewer activities, and hence most sub-models achieve improved performance. Table VI also shows that the accuracies for Bathroom, Kitchen and Bedroom, as well as the overall accuracy, increase greatly after data fusion; only Living room obtains a slightly higher or lower accuracy. More importantly, the improved accuracies are achieved with fewer features than the 30 features needed when dealing with all the activities together. Taking mRMR and CMIM as examples, Table VII lists the features selected for the corresponding models: only 2 or 3 features produce an accuracy of over 99.3% in Bedroom, and both Kitchen and Bathroom achieve accuracies of over 98% using no more than 20 features. Table VII also gives the computational time of feature selection on the same computer configuration: for both mRMR and CMIM, the time for each room-level task drops compared with the task of recognizing all the activities together.
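The greedy selection behind mRMR and CMIM can be sketched with plug-in mutual-information estimates on discrete data. This is a textbook-style illustration, not the implementation used in the paper: it assumes the features have already been discretised, uses the difference form of mRMR, $I(f;C) - \frac{1}{|S|}\sum_{s \in S} I(f;s)$, and the CMIM score $\min_{s \in S} I(f;C \mid s)$, where $S$ is the set of already-selected features. The helper names and toy columns are hypothetical, and JMI and DISR are not shown.

```python
from collections import Counter
from math import log2


def entropy(*cols):
    """Joint Shannon entropy (bits) of one or more discrete columns."""
    joint = Counter(zip(*cols))
    n = sum(joint.values())
    return -sum(c / n * log2(c / n) for c in joint.values())


def mi(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cmi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def mrmr_score(f, selected, F, c):
    """Relevance to the labels minus mean redundancy with selected features."""
    relevance = mi(F[f], c)
    if not selected:
        return relevance
    redundancy = sum(mi(F[f], F[s]) for s in selected) / len(selected)
    return relevance - redundancy


def cmim_score(f, selected, F, c):
    """Least extra information f carries about the labels given any selected feature."""
    if not selected:
        return mi(F[f], c)
    return min(cmi(F[f], c, F[s]) for s in selected)


def greedy_select(F, c, k, score):
    """Forward selection: repeatedly add the highest-scoring remaining feature."""
    selected, remaining = [], list(range(len(F)))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(f, selected, F, c))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy discretised features (one list per feature) and activity labels.
c = [0, 0, 0, 0, 1, 1, 1, 1]
F = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # f0: informative but imperfect
    [0, 0, 0, 1, 1, 1, 1, 1],  # f1: exact copy of f0 (fully redundant)
    [0, 0, 0, 1, 0, 0, 0, 0],  # f2: weak alone, but fixes f0's error
]
print(greedy_select(F, c, 2, mrmr_score))  # -> [0, 2]
print(greedy_select(F, c, 2, cmim_score))  # -> [0, 2]
```

Both criteria skip the duplicate f1 even though it is as relevant as f0, which illustrates how MI-based selection can keep the feature count small: a redundant feature adds little once an informative one has been chosen.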
To study the performance of each activity before and after data fusion, we examine the results from mRMR in detail and also review those from CMIM. Tables VIII and IX show the correct and incorrect classifications for each activity. When mRMR is used for feature selection, Table IX indicates that the vast majority of activities achieve increased accuracy after data fusion. For instance, the Read obtains the largest increase (10.09%), followed by the Stairs (8.49%) and the Mop (5.28%). Only the Fall and the Stand show a slight drop in accuracy. The improvement can be attributed to the separation of easily confused activities into different room groups, which avoids misclassification between them. In Table VIII, 1.92% of the Phone patterns are incorrectly classified as the Brush when the wearable sensors are used alone. However, once the Brush is limited to the Bathroom by data fusion, the accuracy of the Phone rises from 97.77% in Table VIII to 99.95% in Table IX. Similarly, 13.1% of the Read patterns are misclassified as the Lie before data fusion (Table VIII), whereas after data fusion only 5.38% of the Read patterns are misclassified as the Watch (Table IX); this partly explains the large increase in accuracy for the Read. Overall, Read/Watch and Walk/Stairs remain the two most confusing pairs of activities, although their recognition accuracies clearly improve compared with the scenario without room-level location information. The Clean, the Cook, the Exercise, the Phone, the Stand and the Wash appear easy to distinguish from other activities, whether or not ambient information is combined.
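The per-activity figures quoted above are read off row-normalised confusion matrices. Assuming each row of Tables VIII and IX holds the true activity and the diagonal gives that activity's accuracy (i.e., its recall), the computation can be sketched as below; the counts are toy values, not the paper's data, and the function names are hypothetical.

```python
def row_normalise(cm):
    """Convert a count confusion matrix (rows = true class) to row percentages."""
    return [[100.0 * v / sum(row) for v in row] for row in cm]


def overall_accuracy(cm):
    """Correctly classified patterns (diagonal) divided by all patterns."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 100.0 * correct / sum(map(sum, cm))


def per_activity_report(cm, labels):
    """Per-activity accuracy plus the activity it is most often confused with."""
    pct = row_normalise(cm)
    report = {}
    for i, label in enumerate(labels):
        others = [k for k in range(len(labels)) if k != i]
        j = max(others, key=lambda k: pct[i][k])  # largest off-diagonal entry
        report[label] = (pct[i][i], labels[j], pct[i][j])
    return report


# Toy counts (rows = true activity, columns = predicted activity).
labels = ["Read", "Watch", "Lie"]
cm = [[80, 5, 15],
      [10, 90, 0],
      [2, 0, 98]]
for activity, (acc, confused_with, rate) in per_activity_report(cm, labels).items():
    print(f"{activity}: {acc:.2f}% correct, {rate:.2f}% as {confused_with}")
print(f"overall: {overall_accuracy(cm):.2f}%")
```

Because the sample size per activity is kept equal, the overall accuracy here coincides with the mean of the per-activity accuracies; with unequal class sizes the two would differ.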
The CMIM results, detailed in the supporting document, show somewhat different findings. Activities that already have accuracies of over 99% before data fusion, such as the Clean, the Exercise and the Phone, show only a slight increase or no change in accuracy. The Stairs and the Walk, on the other hand, show further increases of 4.97% and 3.66%, respectively. Large improvements are also found for the Read, the Watch, the Stand and the Mop.

V. DISCUSSION
Regarding the roles of the selected wearable sensors, the results in Table IV suggest that the best sensor combination chosen by the MI-based feature selection methods is the accelerometer, the gyroscope and the magnetometer. Related studies have shown that the barometer (for measuring height) [12,37,38] and the temperature sensor [12] can contribute to HAR when combined with other sensors. This paper finds that the usefulness of a sensor depends not only on its intrinsic characteristics but also on the specific information extracted from it. The mean, the max or other absolute values from the barometer contribute to activity recognition [12,37,38], but such features are important for classification only in a specific environment, e.g., on the same floor or over a short period. For example, if the maximum height value is useful for detecting Activity A on the fifth floor, it may be much less useful for detecting the same activity on the ground floor. A similar issue applies to the temperature sensor: if the mean temperature is useful for differentiating Activity B in winter, it may be invalid for the same activity in summer or in a different thermal environment. Because people live on different floors and experience different weather conditions and changing room environments, absolute features (such as the max/min of the height or the mean of the temperature) are less useful for distinguishing activities. Therefore, only features that represent relative variations in height and temperature, such as the peak-to-peak amplitude or the standard deviation, are used in this study.
Note: Exer. denotes Exercise in Tables VIII and IX.
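The argument for relative features can be checked numerically: a constant offset, such as a different floor for the barometer or a different season for the temperature sensor, shifts absolute features like the mean and max but leaves the peak-to-peak amplitude and standard deviation unchanged. The sketch below uses a toy height trace; the function names and values are illustrative assumptions, not the features or data of the study.

```python
from statistics import mean, pstdev


def absolute_features(signal):
    """Offset-dependent features: change when the whole signal is shifted."""
    return {"mean": mean(signal), "max": max(signal)}


def relative_features(signal):
    """Offset-invariant features: depend only on variation within the window."""
    return {"ptp": max(signal) - min(signal), "std": pstdev(signal)}


# Toy barometric height (metres) for the same wrist movement on two floors.
ground_floor = [0.0, 0.4, 0.9, 0.5, 0.1]
fifth_floor = [h + 15.0 for h in ground_floor]  # same movement, 15 m higher

for name, window in [("ground", ground_floor), ("fifth", fifth_floor)]:
    print(name, absolute_features(window), relative_features(window))
```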
Nonetheless, the experimental results in Table IV show that none of the temperature- or barometer-related features is selected by the applied feature selection methods when these two sensors are used together with any other sensor. A likely reason is that the height- and temperature-related features are outweighed by more informative features from the other sensors. As a result, the temperature sensor and the barometer do not improve recognition accuracy with MI-based feature selection methods, although they might be useful with other feature selection approaches. Our future work will further evaluate the feature sets with other state-of-the-art feature selection algorithms.

Our proposed hybrid system is simple and practical: it deploys only three wrist-worn wearable sensors and one type of ambient sensor (a PIR sensor) installed in each room. Using the ambient information to trigger the room-based sub-models provides a unique way to combine the ambient and wearable information. The improved performance after data fusion can be attributed to two factors: 1) the smaller number of activity types per room reduces the recognition requirements of each room-based model; and 2) separating easily confused activities into different rooms avoids, to some extent, misclassification between them. After data fusion, the HAR system becomes more comprehensive, monitoring both specific activities and the daily routine in space and time simultaneously. Regarding individual activities in Tables VIII and IX, the most easily classified activities include the Brush, the Lie, the Cook, the Phone, the Exercise and the Wash, whereas the most difficult include the Walk, the Stairs, the Watch and the Read, although their performance improves after data fusion.
One possible reason for the lower recognition accuracies of the Read and the Watch is that the two activities produce similar wrist attitudes; this warrants further study. The unexpected misclassification between the Stairs and the Walk is partly because some subjects' homes have a short, flat landing between two flights of stairs, and the data collected while walking on the landing are labelled as the Stairs instead of the Walk. This small amount of mislabelled data is difficult to correct in the raw data.
The following remarks compare our research with other related studies. First, the practical aspect can be seen in the number of sensors and their deployment. The studies in [11,20] use fewer wearable sensors, but they either recognize fewer activities or require a complex on-body sensor deployment. The study in [20] uses the same wearable sensors as ours, but it uses only the CUFs without exploring the ARFs. Our previous work [12] reaches performance similar to this work using similar data mining techniques, but it deploys 7 wearable sensors on 3 different body parts, which may cause obtrusiveness or discomfort for older people in real use. Our current work uses only 3 wearable sensors on the wrist while producing comparable performance. Second, our sensor combination method is unique. Although combining wearable sensors with ambient sensors in HAR has been investigated, we propose and implement a different data fusion method. Both Stikic et al. [17] and our work combine infrared sensors with wearable sensors. Stikic et al. [17] directly use the number of activations from the infrared sensors as input to the classifiers. In our hybrid system, however, the infrared sensors play a different role: instead of serving as classifier input, the binary location information derived from them triggers the sub-classification models for data fusion. In other words, the task of recognizing all 17 defined activities is divided into several sub-tasks according to the room-level location information captured by the infrared sensors. This improves the overall accuracy in a practical manner.

VI. CONCLUSIONS
We develop a practical HAR system that simultaneously monitors older people's specific daily activities and daily routine. The system uses a unique data fusion approach to hybridize the wearable and ambient information. A group of attitude-related features (ARFs) is implemented and experimentally evaluated. The initial results are promising: with the four applied FS methods plus SVM classifiers on the ground-truth data, the ARFs perform better than the CUFs, and the data fusion in our hybrid system improves accuracy compared with recognizing all the activities using only the wearable sensors. We train and test the current models on all the data from all the subjects to obtain a model for general users; the models can also be trained and tested subject-dependently to meet specific requirements. Additionally, the wearable network and the ambient network can each function as a stand-alone network when the other fails: the former can distinguish the wearer's specific activities on its own, and the latter can monitor a person's room-level daily routine on its own.
The study, however, has a few limitations. The first is that the system targets only older people who live alone. If the application is scaled up to a multi-person setting, each specific user must be identified before the sub-classification models can be activated. The impact of pets or visitors on the PIR sensors should also be studied and evaluated further. We considered only rooms with a single door in this paper; in future work, we will explore using more PIR sensors to handle rooms with multiple doors. The second limitation is the fixed assignment of activities to each room. As a case study, we define the activities that most likely take place in each room in order to verify our hypothesis. In real use, since house structures and people's habits vary, we cannot be certain which activities occur only in a specific room; e.g., the Read can take place anywhere. The third limitation concerns the hardware: the wearable and ambient networks are separate in this paper, and the data analysis, apart from testing, is performed offline. The next prototype could merge the two networks into one after further evaluation.
It is worth pointing out that we do not intend to identify all possible daily activities in this paper; we predefine and detect a limited set of activities. An extension of our work could therefore focus on semi-supervised or active learning of activities based on feature mapping and feature similarity, in which some of the activities defined in this paper would be treated as unlabelled at both the home level and the room level, addressing the second limitation. This is also expected to partly tackle the overlap between activities by treating some interwoven activities as unseen variants of a base, more important activity, such as drinking tea while reading a newspaper. Another direction is a practical investigation of the ARFs compared with the CUFs in terms of efficiency, additional power consumption and so on. Our current work focuses on using feature selection methods based on CCA (Canonical Correlation Analysis) and sparse filtering to further evaluate the handcrafted features we extracted. Meanwhile, we are also working on deep learning to learn features automatically from the raw data for a comparative study.