Spatio-Temporal Partial Sensing Forecast for Long-term Traffic (2024)

Zibo Liu, Department of Computer & Information Science & Engineering, University of Florida, USA (ziboliu@ufl.edu)
Zhe Jiang, Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL, USA (zhe.jiang@ufl.edu)
Zelin Xu, Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL, USA (zelin.xu@ufl.edu)
Tingsong Xiao, Department of Computer & Information Science & Engineering, University of Florida, USA (xiaotingsong@ufl.edu)
Zhengkun Xiao, Department of Computer & Information Science & Engineering, University of Florida, USA (xiaoz@ufl.edu)
Haibo Wang, Department of Computer Science, University of Kentucky, USA (haibo@ieee.org)
Shigang Chen, Department of Computer & Information Science & Engineering, University of Florida, USA (sgchen@ufl.edu)

Abstract.

Traffic forecasting uses recent measurements by sensors installed at chosen locations to forecast future road traffic. Existing work either assumes that all locations are equipped with sensors or focuses on short-term forecasting. This paper studies partial-sensing forecasting of long-term traffic, assuming sensors are available at only some locations. The study is important for lowering the infrastructure investment cost of traffic management, since deploying sensors at all locations could incur prohibitively high cost. However, the problem is challenging due to the unknown distribution at unsensed locations, the intricate spatio-temporal correlations in long-term forecasting, as well as noise in data and irregularities in traffic patterns (e.g., road closures). We propose a Spatio-Temporal Partial Sensing (STPS) forecast model for long-term traffic prediction, with several novel contributions: a rank-based embedding technique to capture irregularities and overcome noise, a spatial transfer matrix to overcome the spatial distribution shift from permanently sensed locations to unsensed locations, and a multi-step training process that utilizes all available data to successively refine the model parameters for better accuracy. Extensive experiments on several real-world traffic datasets demonstrate that STPS outperforms the state of the art and achieves superior accuracy in partial sensing long-term forecasting.

1. Introduction

Background: The traffic forecast problem is to use recent measurements by sensors installed at chosen locations to forecast the future road traffic at these locations. This problem has significant practical value in traffic management and route planning, considering the ever-worsening traffic conditions and congestion in cities across the world. Most existing work considers a full-sensing scenario, where sensors are installed at all locations. We observe that a lower-cost, more flexible solution should support partial sensing, where only some locations have sensors, while traffic forecasting still covers the other locations without sensors. This research is in its nascent stage, with limited recent work on short-term forecasting. This paper investigates the partial sensing long-term forecast problem, which aims to train a prediction model that uses recent measurements from sensed locations (with sensors) to forecast traffic at unsensed locations (without sensors) deep into the future. Consider a scenario in a road network where only 10% of the locations are equipped with sensors. The long-term forecast model may utilize just one hour of recent data from the sensed locations to predict traffic conditions up to eight hours ahead at the unsensed 90% of the locations, where the amount of predicted future data is 72 times that of the past measurements. The capability of long-term traffic forecasting under partial sensing can play a crucial role in low-cost intelligent transportation. For example, early warnings about future traffic conditions may help traffic planners in traffic signal management, law enforcement operations, medical assistance, and disaster response (cheng2020mitigating; balasubramanian2015adaptive).

Figure 1. Traffic flow rates at locations 44, 121, and 33 over two days from the PEMS08 dataset.

Challenges: The partial sensing long-term forecast problem poses several non-trivial challenges. Fig. 1 shows the traffic rates at locations 44, 121 and 33 over two days from the PEMS08 dataset. Let's suppose locations 44 and 121 are sensed, whereas location 33 is not. How could we train a model to infer the traffic rates of location 33 deep into the future based on the recent measurements at locations 44 and 121, when their patterns are so different? This requires not only the adaptability of a model to transfer knowledge from sensed locations to unsensed locations based on the limited input features, but also learning the intrinsic, subtle spatio-temporal connections across the locations. Next, long-term forecasting requires learning informative embeddings capable of capturing more intricate spatial and temporal correlations than what is required for short-term forecasting. In contrast to short-term models, which might use an hour of data to predict the next hour, long-term models must forecast far beyond this scope, necessitating a more nuanced understanding and representation of the data. We also observe traffic fluctuations at the locations of Fig. 1, particularly at location 44. Smoothing out fluctuations and reducing the impact of other noise is important in forecasting long-term traffic. Finally, besides the normal daily/weekly traffic patterns, there are irregular patterns caused by infrequent events such as road closures due to accidents or heavy traffic on holidays, which have limited training data. For example, at location 44 in Fig. 1, which is on a highway, the traffic rate drops abnormally to almost zero from 08/19 15:00 to 20:00, possibly caused by an accident. If the model is not effective in capturing such irregularities, traffic forecasts at such times will carry significant errors.

Related Work: The existing work on traffic forecasting can be categorized into full sensing forecast, where all locations of interest are deployed with sensors, and partial sensing forecast, where only some locations are deployed with sensors. Full sensing forecast has been studied extensively, including recurrent neural networks (RNNs) (veeriah2015differential; NIPS2015_07563a3f; Zhang_Zheng_Qi_2017), convolutional neural networks (CNNs) or Multi-Layer Perceptrons (MLPs) with graph neural networks (GNNs) (yu2017spatio; li2018diffusion; wu2019graph; li2020spatial; Jiang_Wang_Yong_Jeph_Chen_Kobayashi_Song_Fukushima_Suzumura_2023; yi2023fouriergnn; wang2024towards), neural differential equations (NDEs) (chen2018neural; schirmer2022modeling; kidger2020neural; kidger2022neural; liu2023graphbased), transformers (liu2023spatio; 10.1609/aaai.v37i4.25556), and the most recent mixture-of-experts (MoE) (shazeer2017outrageously) based spatio-temporal model (lee2024testam). Only two recent papers, Frigate (10.1145/3580305.3599357) and STSM (su2024spatial), studied the partial sensing forecast problem, but they are designed for short-term forecasting. Besides, STSM utilizes location properties, such as nearby shopping malls, as extra prior knowledge.

The prior work can also be categorized into short-term spatio-temporal forecasting (referenced above) and long-term time series forecasting (deng2023learning; liu2023itransformer; nie2022time; wu2022timesnet), where the latter does not sufficiently model spatial correlation patterns.

There exist spatial transfer learning methods for traffic forecasting (mallick2021transfer; tang2022domain; ouyang2023citytrans; jin2023transferable; lu2022spatio), which utilize traffic measurements from additional locations to help forecast the traffic at the locations of interest. They belong to the full sensing category, with extra input from additional locations. Several imputation methods (li2023missing; lana2018imputation; zhang2021missing; xu2020ge) estimate the missing data at the locations of interest, where sensors may sometimes miss measurements. They do not belong to the partial sensing category because all the locations have sensors. Also related are the spatial extrapolation methods that could be used to estimate traffic data at unsensed locations based on data from sensed locations within the same time period, such as non-negative matrix factorization methods (lee2000algorithms; jing2020completion; li2022enhanced) and GNN methods (wu2021inductive; zheng2023increase; hu2023graph; appleby2020kriging). However, these methods are for extrapolation only and do not model the spatio-temporal dynamics of long-term traffic forecasting. Their performance in our experiments is not as good as the best of the models designed for traffic forecasting.

In summary, this is the first work specifically on the partial sensing long-term traffic forecast problem.

Contributions: We propose a Spatio-Temporal Partial Sensing forecast model (STPS) for long-term traffic forecasting. STPS has three main technical contributions. First, we design a rank-based node embedding, which helps capture the characteristics of traffic irregularities and makes the model more robust against noise when learning the intricate spatio-temporal correlations needed for long-term forecasting. Second, we propose a spatial transfer module, which enhances our model's adaptability by extracting dynamic traffic patterns from transfer weights based on the rank-based embedding in addition to spatial adjacency, allowing for more nuanced and accurate predictions. Third, we use a multi-step training process to fully utilize the available training data, enabling successive refinement of the model parameters. Extensive experiments on several real-world traffic datasets demonstrate that our model outperforms the state of the art and achieves superior accuracy in partial sensing long-term forecasting.

2. Preliminaries

2.1. Problem Definitions

Definition 1: Road Topology and Traffic Flow Rates. Consider a road system, and let $N$ be the set of chosen locations where traffic statistics are of interest. Let $n=|N|$. The road topology of these locations is represented as a graph $\mathcal{G}=(N,A)$, where $A=[A_{i,j}, i,j\in N]\in\mathbb{R}^{n\times n}$ is an adjacency matrix. $A_{i,j}=1$ indicates that there is a road between locations $i$ and $j$; otherwise $A_{i,j}=0$. A traffic flow consists of all the vehicles that pass a location in $N$; the flow rate is defined as the number of vehicles passing through the location during a preset time interval. It is a discrete function of time if we partition the time into a series of time intervals of a certain length, e.g., 5 min. For simplicity, we normalize each time interval as one unit of time. Let $M$ be a subset of locations, i.e., $M\subseteq N$, and $T$ be a series of time intervals. As an example, $T$ could be $\{t-l+1,...,t-1,t\}$ of $l$ intervals, where $t$ is the current time. The traffic matrix over locations $M$ and times $T$ is defined as $\mathcal{X}_{M,T}=[\mathcal{X}_{i,j}, i\in M, j\in T]$, where $\mathcal{X}_{i,j}$ is the flow rate at location $i$ during time $j$. Further denote the rate vector at location $i$ as $\mathcal{X}_{i,T}=[\mathcal{X}_{i,j}, j\in T]$. Hence, $\mathcal{X}_{M,T}=[\mathcal{X}_{i,T}, i\in M]$.
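To make the notation concrete, the following minimal NumPy sketch builds a toy adjacency matrix and traffic matrix; all sizes and values are illustrative assumptions, not from the paper's datasets.

```python
import numpy as np

# Toy sizes, chosen only for illustration.
n = 5            # number of locations |N|
l = 12           # number of past intervals (one hour of 5-minute slots)

# Symmetric 0/1 adjacency: A[i, j] = 1 iff a road links locations i and j.
A = np.zeros((n, n), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1

# Traffic matrix X_{N,T}: vehicles per interval at every location over T.
rng = np.random.default_rng(0)
X_NT = rng.integers(0, 300, size=(n, l))

X_iT = X_NT[2]   # rate vector X_{i,T} at location i = 2, shape (l,)
```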

Definition 2: Problem of Full-Sensing Traffic Forecast. Suppose all the locations in $N$ are equipped with permanent sensors to continuously measure the flow rates. The problem is to forecast a future traffic matrix $\mathcal{X}_{N,T'}$ over all locations in $N$, based on a past traffic matrix $\mathcal{X}_{N,T}$ that has just been measured by the sensors, where $T'=\{t+1,...,t+l'\}$, $T=\{t-l+1,...,t-1,t\}$, and $t$ is the current time. It is called full-sensing traffic forecast because all locations under forecast are fully equipped with sensors to provide traffic information of the recent past. In most work (fang2021spatial; song2020spatial; jin2022automated; shao2022spatial; jia2020residual; ji2022stden), both $l$ and $l'$ are set to 1 hour for short-term forecast. Note that $l$ should not be too large, to avoid excessively large models and the computation costs that come with them. Moreover, research has shown that a too-large $l$ may actually degrade forecast accuracy (zeng2023transformers). For long-term forecast, $l'$ is set much larger than $l$.

Definition 3: Problem of Partial-Sensing Traffic Forecast. Suppose only a subset $M$ of locations are equipped with permanent sensors to continuously measure their flow rates. The subset of locations without permanent sensors is denoted as $M'=N-M$. Let $m=|M|$ and $m'=|M'|$. The partial adjacency matrix is $A_{M,M'}=[A_{i,j}, i\in M, j\in M']\in\mathbb{R}^{m\times m'}$. The problem is to forecast a future traffic matrix $\mathcal{X}_{M',T'}$ over the locations without sensors, based on a recent traffic matrix $\mathcal{X}_{M,T}$ that has just been measured at the locations with sensors, where $T'=\{t+1,...,t+l'\}$, $T=\{t-l+1,...,t-1,t\}$, and $t$ is the current time. It is called partial-sensing traffic forecast because the locations in $N$ are only partially equipped with sensors; for long-term forecast, $l'\gg l$. This problem is the focus of our work.

In practice, one may want to forecast both $\mathcal{X}_{M',T'}$ and $\mathcal{X}_{M,T'}$ from $\mathcal{X}_{M,T}$. In this paper, we only consider $\mathcal{X}_{M',T'}$ because predicting $\mathcal{X}_{M,T'}$ from $\mathcal{X}_{M,T}$ is essentially the full-sensing prediction problem, which has been thoroughly studied.
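The shapes involved in the partial-sensing problem can be sketched as follows; the split of locations into $M$ and $M'$ and all sizes are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, lp = 5, 12, 96                 # |N|, past window l, horizon l' >> l
A = np.zeros((n, n), dtype=int)      # toy road graph
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1

M  = np.array([0, 2])                # sensed locations (permanent sensors)
Mp = np.array([1, 3, 4])             # unsensed locations M' = N - M

X_MT  = rng.integers(0, 300, size=(len(M), l))  # measured input X_{M,T}
A_MMp = A[np.ix_(M, Mp)]             # partial adjacency A_{M,M'}, (m, m')
# Goal: produce a forecast of shape (|M'|, l') for X_{M',T'} from X_MT.
```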

2.2. Inference vs Training

Figure 2. Partial-sensing forecast: data available at inference (left) vs. during training (middle and right plots).

The problem of partial sensing traffic forecast is illustrated by the left plot of Fig. 2. After a forecast model is trained and deployed, the sensors at the locations in $M$ will measure flow rates $\mathcal{X}_{M,T}$, which will be used as input to the model to infer (forecast) the future rates at the locations in $M'$, i.e., $\mathcal{X}_{M',T'}$. Hence, when we use the model for forecasting, the only available data is $\mathcal{X}_{M,T}$, which is shown in blue.

However, to train such a model (before its actual deployment), as illustrated in the middle plot, we will need the ground truth of $\mathcal{X}_{M',T'}$ in the training data, which is shown in gray. We assume that mobile sensors are deployed at the locations in $M'$ for a period of time to collect the training data. This is reasonable because mobile sensors can be re-deployed at different road systems for data collection, offering a lower overall cost than installing permanent sensors at all locations of all road systems that need traffic forecast.

With mobile sensors deployed to collect $\mathcal{X}_{M',T'}$, we may naturally use them to collect $\mathcal{X}_{M',T}$ as well. The permanent sensors at $M$ will collect $\mathcal{X}_{M,T'}$. So, the training data will contain $\mathcal{X}_{M,T}$, $\mathcal{X}_{M',T}$, $\mathcal{X}_{M,T'}$ and $\mathcal{X}_{M',T'}$, as illustrated in gray in the third plot. While the input to the model is $\mathcal{X}_{M,T}$ and the output is $\mathcal{X}_{M',T'}$, we should still fully utilize the information of $\mathcal{X}_{M',T}$ and $\mathcal{X}_{M,T'}$, which is available in the training data, to build an accurate model.
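For illustration, one way to pack the four observed blocks of a single training sample is sketched below; the dictionary keys and toy values are our own naming, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, l, lp = 5, 12, 96
M, Mp = np.array([0, 2]), np.array([1, 3, 4])
X_past   = rng.integers(0, 300, size=(n, l))    # measured over T
X_future = rng.integers(0, 300, size=(n, lp))   # measured over T'

sample = {
    "X_M_T":   X_past[M],       # model input (also available at inference)
    "X_Mp_T":  X_past[Mp],      # target of the dynamic adaptation step
    "X_M_Tp":  X_future[M],     # target of the long-term forecasting step
    "X_Mp_Tp": X_future[Mp],    # target of the aggregation step (final output)
}
```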

2.3. Embeddings

Embeddings are a common technique to enhance feature expression. We adopt the embedding setting in (shao2022spatial; liu2023spatio), where the time-of-day embedding and the day-of-week embedding capture the temporal information, and the node embedding captures the spatial property of locations. We use $E^{tod}\in\mathbb{R}^{n\times d}$, $E^{dow}\in\mathbb{R}^{n\times d}$, and $E^{v}\in\mathbb{R}^{n\times d}$ to denote them respectively, where $d$ is the embedding dimension. A day is modelled as $N^{tod}=288$ discrete units of 5 minutes each. A week has $N^{dow}=7$ days. The time-of-day is $i/N^{tod}$ for the $i$th unit in the day, and the day-of-week is $i/N^{dow}$ for the $i$th day in the week. Trainable vectors $B_i^{tod}\in\mathbb{R}^{d}$, $i\in[0,N^{tod})$, and $B_i^{dow}\in\mathbb{R}^{d}$, $i\in[0,N^{dow})$, form the time-of-day embedding bank and the day-of-week embedding bank.
In practice, for the input flow rate data $\mathcal{X}_{N,T}\in\mathbb{R}^{n\times l}$, $T=\{t-l+1,...,t-1,t\}$, we only consider the time feature of the flow rate data $\mathcal{X}_{N,t-l+1}\in\mathbb{R}^{n}$ at $t-l+1$, yielding $E^{tod}\in\mathbb{R}^{n\times d}$ and $E^{dow}\in\mathbb{R}^{n\times d}$. For the node embedding, given any location $i\in N$, the trainable node embedding bank is $B_i^{v}\in\mathbb{R}^{d}$. For the flow rate data $\mathcal{X}_{N,T}$, we obtain $E^{v}\in\mathbb{R}^{n\times d}$ from the node embedding bank $B_i^{v}$ over all $n$ locations. In the following, we will use $M$, $M'$, $N$ as the subscript of the embedding bank $B$ and the embedding $E$ to indicate the locations of interest.
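A minimal PyTorch sketch of the three embedding banks and their lookups follows; the sizes and the timestamp values are illustrative assumptions.

```python
import torch
import torch.nn as nn

n, d = 5, 16                       # locations and embedding dimension (toy)
N_tod, N_dow = 288, 7              # 5-minute slots per day, days per week

tod_bank  = nn.Embedding(N_tod, d)   # B^{tod}
dow_bank  = nn.Embedding(N_dow, d)   # B^{dow}
node_bank = nn.Embedding(n, d)       # B^{v}

slot, day = 107, 3                   # timestamp of interval t - l + 1 (toy)
E_tod = tod_bank(torch.full((n,), slot))   # (n, d), same slot for all nodes
E_dow = dow_bank(torch.full((n,), day))    # (n, d)
E_v   = node_bank(torch.arange(n))         # (n, d), one vector per location
```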

In addition to the above, this paper will introduce a new rank-based node embedding.

3. The Proposed Approach

Figure 3. The STPS model: the three-step training phase (left) and the three-step inference phase (right).

This section introduces our Spatio-Temporal Partial Sensing (STPS) traffic forecast model. In Fig. 3, the training phase (left plot) consists of three steps: (1) the dynamic adaptation step, which builds a module with $\mathcal{X}_{M,T}$ as input and $\mathcal{X}_{M',T}$ as expected output; (2) the long-term forecasting step, which builds another module with $\mathcal{X}_{M,T}$ and the previous module's output as input, and with $\mathcal{X}_{M,T'}$ as expected output; and (3) the aggregation step, which builds yet another module with $\mathcal{X}_{M,T}$ and the previous two modules' output as input, and with $\mathcal{X}_{M',T'}$ as expected output. We refer to these three modules as the dynamic adaptation module, the long-term forecasting module, and the aggregation module, respectively; together they form the proposed STPS. Each step leverages the module parameters derived from its previous step(s) and incorporates new information (expected output) to learn its module parameters, ensuring progressive enhancement in model performance.

The testing (or inference) phase in the right plot consists of the same three steps: (1) it uses the dynamic adaptation module, with $\mathcal{X}_{M,T}$ as input, to infer $\mathcal{X}_{M',T}$; (2) it uses the long-term forecasting module, with $\mathcal{X}_{M,T}$ and the output of the previous step as input, to infer $\mathcal{X}_{M,T'}$; and (3) it uses the aggregation module, with $\mathcal{X}_{M,T}$ and the output of the previous two steps as input, to finally forecast $\mathcal{X}_{M',T'}$.
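The three-step inference chain can be summarized by the sketch below; the module call signatures are assumptions made for illustration rather than the model's actual interfaces.

```python
def stps_forecast(X_M_T, dyn_adapt, long_term, aggregate):
    """Chain the three trained STPS modules at inference time.

    X_M_T: recent measurements at sensed locations, shape (|M|, l).
    Returns the forecast for unsensed locations, shape (|M'|, l').
    """
    X_Mp_T_hat = dyn_adapt(X_M_T)                    # step 1: (|M'|, l)
    X_M_Tp_hat = long_term(X_M_T, X_Mp_T_hat)        # step 2: (|M|, l')
    return aggregate(X_M_T, X_Mp_T_hat, X_M_Tp_hat)  # step 3: (|M'|, l')
```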

3.1. Rank-based Node Embedding

First, consider irregular traffic patterns. As shown in Fig. 1, location 121 (top blue curve), on a highway in dataset PEMS08, demonstrates a regular traffic pattern from the daytime high on 08/19 to the deep-night low and back to the daytime high on 08/20, and so on. These daily (or weekly) traffic patterns can be captured by the temporal embeddings $E^{tod}$ (time of day) and $E^{dow}$ (day of week), which however are less effective at addressing irregular patterns that do not have daily or weekly cycles. For example, in Fig. 1, although location 44 (middle red curve), also on a highway in PEMS08, demonstrates a daily traffic pattern, it experienced a sharp drop to almost zero at 08/19 15:00 and recovered about 5 hours later. This could be a highway closure or partial blockage due to reasons such as a car accident. Such irregular events are infrequent in the dataset, which makes it harder to train our model to recognize them. The existing node embedding $E^{v}$ from the prior work cannot change temporally to reflect such changes in the flow rates.

To better capture such irregular traffic patterns, we introduce a rank-based node embedding. Consider an arbitrary time interval $j\in T$ in the input $\mathcal{X}_{M,T}$. We assign a rank value to each location $k\in M$: sort $\mathcal{X}_{i,j}$, $\forall i\in M$, in ascending order (with ties broken arbitrarily); the rank of location $k$ at time $j$ is the index of $\mathcal{X}_{k,j}$'s position in the ordered list. Our first hypothesis is that the new rank-based embedding can help distinguish the irregularities caused by road closures or other rare events, such as the one at location 44 in Fig. 1, since the location's rank during the time of road closure will probably be among the lowest, whereas its rank at normal times would be much higher. In the event of a road closure, the ranks of dependent locations over adjacent time periods will likely change too, deviating from normal patterns. It is easier to tell such irregularities based on location, rank, time-of-day, and day-of-week than based on location, time-of-day, and day-of-week alone, since the rank value provides an additional hint. For spatio-temporal irregularities, the exact flow rates of the affected locations vary greatly at different times in a day or on different days in a week. Yet ranks, which are the indices of those locations' rates in the ascending list of all locations' rates, can be considered normalized, relative flow rates, whose distribution tends to be more stable over time than the actual rates, revealing the intrinsic pattern of an irregularity. This property helps the model identify irregular patterns and separate them from normal patterns, even with the limited training data for irregularities due to their infrequent occurrence.
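The rank assignment itself is a per-interval sort; here is a minimal NumPy sketch, where rank 0 denotes the lowest rate and ties are broken by sort order, matching the "ties broken arbitrarily" convention above.

```python
import numpy as np

def ranks_per_interval(X):
    """Rank each location within every time interval by flow rate.

    X: (num_locations, num_intervals). Returns integer ranks of the
    same shape, where 0 marks the lowest rate in that interval.
    """
    order = np.argsort(X, axis=0, kind="stable")  # ascending per column
    ranks = np.empty_like(order)
    cols = np.arange(X.shape[1])
    ranks[order, cols] = np.arange(X.shape[0])[:, None]
    return ranks

X = np.array([[120, 0], [60, 75], [90, 30]])   # 3 locations, 2 intervals
print(ranks_per_interval(X))                   # [[2 0] [0 2] [1 1]]
```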

Our second hypothesis is that the new rank-based embedding can make the model more robust to noise in a traffic pattern. In Fig. 1, except for the irregularity at 08/19 15:00, location 44 exhibits a normal daily traffic pattern with noise as the traffic fluctuates, which could be natural or sometimes due to inaccurate sensor readings. Noise can also arise from inference errors. Refer to the right plot in Fig. 3 for inference. After the first step, we have the estimated $\mathcal{X}_{M',T}$, which carries the inference errors from the dynamic adaptation module. These errors are noise to the next step. Compared to the actual flow rates, the ranks are discretized values that tend to smooth out noisy fluctuations and be more stable. Especially for partial sensing, the fewer sensed locations there are, the more stable the rank-based embedding becomes. The stability of the ranks could also enhance the model's adaptability from one spatial area (say, with sensor data) to a new area (without sensor data), where the actual flow rates can differ greatly but the relative ranks follow stable patterns. In other words, the rank-based embedding is more robust to the distribution shift from the sensed locations to the unsensed locations, and between the training dataset and the test dataset.

We have run extensive experiments on the rank-based node embedding to test the validity of the above hypotheses.

Specifically, at each time interval, we rank the flow rates at all locations. For the $i$-th rank, the vector $B_i^{r}\in\mathbb{R}^{d}$ denotes the rank-based node embedding bank entry for rank $i$. For input data $\mathcal{X}_{N,T}$ (or $\mathcal{X}_{M,T}$ for the first step), we rank each time interval in $T$ over the $n$ locations, yielding the rank feature $E^{r}\in\mathbb{R}^{n\times l\times d}$. A fully connected layer is then applied to the rank-based node embedding to aggregate over the length $l$, resulting in an aggregated rank-based node embedding ${E^{r}}'\in\mathbb{R}^{n\times d}$, encapsulating a generalized high-dimensional rank feature for each location over a period.
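A sketch of turning the per-interval ranks into the aggregated embedding ${E^{r}}'$; the bank size (one entry per possible rank) and the single linear aggregation layer are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

n, l, d = 5, 12, 16
rank_bank = nn.Embedding(n, d)       # B^r: one vector per possible rank
agg = nn.Linear(l * d, d)            # fully connected aggregation over l

ranks = torch.randint(0, n, (n, l))  # stand-in for the computed ranks
E_r = rank_bank(ranks)               # (n, l, d): rank feature E^r
E_r_agg = agg(E_r.reshape(n, l * d)) # (n, d): aggregated embedding E^{r'}
```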

Ultimately, a Multi-Layer Perceptron (MLP) processes the raw data $\mathcal{X}_{N,T}$ (or $\mathcal{X}_{M,T}$ for the first step) to produce the feature embedding $E^{f}\in\mathbb{R}^{n\times d}$. We then concatenate this with the other four embeddings, the time-of-day embedding $E^{tod}$, the day-of-week embedding $E^{dow}$, the node embedding $E^{v}$, and the rank-based node embedding ${E^{r}}'$, to form an aggregated feature $H\in\mathbb{R}^{n\times 5d}$. This aggregated feature then passes through another MLP to produce a high-dimensional representation $H'\in\mathbb{R}^{n\times 5d}$.

(1) $E^{f} = \mathrm{MLP}(\mathcal{X}_{N,T})$
(2) $H = E^{f} \,\|\, E^{v} \,\|\, {E^{r}}' \,\|\, E^{tod} \,\|\, E^{dow}$
(3) $H' = \mathrm{MLP}(H)$
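Eqs. (1)-(3) amount to a feature MLP, a five-way concatenation, and a fusion MLP. A sketch follows; the two-layer MLP depth is our assumption, as the paper does not specify it here.

```python
import torch
import torch.nn as nn

class InputEncoder(nn.Module):
    """Sketch of Eqs. (1)-(3); the two-layer MLPs are an assumption."""

    def __init__(self, l, d):
        super().__init__()
        self.feat = nn.Sequential(nn.Linear(l, d), nn.ReLU(), nn.Linear(d, d))
        self.fuse = nn.Sequential(nn.Linear(5 * d, 5 * d), nn.ReLU(),
                                  nn.Linear(5 * d, 5 * d))

    def forward(self, X, E_v, E_r, E_tod, E_dow):
        E_f = self.feat(X)                                # Eq. (1): (n, d)
        H = torch.cat([E_f, E_v, E_r, E_tod, E_dow], -1)  # Eq. (2): (n, 5d)
        return self.fuse(H)                               # Eq. (3): (n, 5d)
```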

3.2. Dynamic Adaptation Step

The main objective of this step is to capture the latent correlations between sensed and unsensed locations and to transfer the feature embedding from sensed locations to unsensed locations. This addresses the challenge of the shift to an unknown distribution at unsensed locations. In Fig. 3, the dynamic adaptation step is represented by the red arrow from $\mathcal{X}_{M,T}$ to $\mathcal{X}_{M',T}$. The input for this step is the historical data from the permanent sensors deployed at the sensed locations, denoted as $\mathcal{X}_{M,T}$. The output is the predicted historical data for the unsensed locations, represented by $\hat{\mathcal{X}}_{M',T}$.

3.2.1. Node Embedding Enhanced Spatial Transfer Matrix from Sensed Locations to Unsensed Locations

In response to the challenges posed by the unknown data distribution at unsensed locations, irregular fluctuations, and the need for more robust embeddings, we devise a node embedding enhanced spatial transfer matrix. The spatial correlation is enhanced by incorporating the node embedding bank $B_M^{v}$ and the rank-based node embedding ${E_M^{r}}'$. By integrating these two components, $B_M^{v}$ and ${E_M^{r}}'$ can learn spatial correlations from the partial adjacency matrix $A_{M,M'}$, thereby producing more robust embeddings. Besides, the node embedding bank $B_M^{v}$ makes the spatial transfer module aware of high-dimensional location properties, while the rank-based node embedding ${E_M^{r}}'$ makes the spatial transfer matrix sensitive to changes in traffic patterns and enhances the model's adaptability. Consider the example of a highway closure, where the traffic pattern is irregular and the flow rate drops to zero. The rank embedding ${E_M^{r}}'$ encodes the current rank of a flow rate into a high-dimensional space. In this case, the rank drops to the bottom of the order, generating a low-order pattern in the spatial transfer matrix, which in turn produces a dynamic adaptation to the unsensed locations $M'$. In contrast, relying solely on sensor values and temporal embeddings would not have been able to achieve this.

Eq. (4) combines three facets of spatial knowledge: the partial adjacency matrix $A_{M,M'}=[A_{i,j}, i\in M, j\in M']\in\mathbb{R}^{m\times m'}$, the node embedding bank $B_M^{v}=\{B_i^{v}\,|\,B_i^{v}\in\mathbb{R}^{d}, i\in M\}\in\mathbb{R}^{m\times d}$, and the rank-based node embedding ${E_M^{r}}'=\{{E_i^{r}}'\,|\,{E_i^{r}}'\in\mathbb{R}^{d}, i\in M\}\in\mathbb{R}^{m\times d}$.

(4) ${A_{M,M'}}' = A_{M,M'} + ({E_{M}^{r}}' + B_{M}^{v})\,({B_{M'}^{v}})^{\tau}$
(5) $\hat{\mathcal{X}}_{M',T} = \mathrm{MLP}\big(({A_{M,M'}}')^{\tau}\,{H_{M,T}}'\big)$

We start the dynamic adaptation step by generating the representation ${H_{M,T}}'\in\mathbb{R}^{m\times 5d}$ from the input $\mathcal{X}_{M,T}\in\mathbb{R}^{m\times l}$ through MLPs; $\tau$ denotes matrix transposition. After the node embedding enhanced spatial transfer matrix is applied, an MLP maps the high-dimensional representation of width $5d$ to the desired output length $l$.
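Eqs. (4)-(5) can be sketched in code as below; the random bank initialization and the single output layer standing in for the MLP are assumptions.

```python
import torch
import torch.nn as nn

class SpatialTransfer(nn.Module):
    """Sketch of Eqs. (4)-(5): transfer from sensed M to unsensed M'."""

    def __init__(self, m, mp, d, l):
        super().__init__()
        self.B_M  = nn.Parameter(torch.randn(m, d))   # node bank B^v_M
        self.B_Mp = nn.Parameter(torch.randn(mp, d))  # node bank B^v_{M'}
        self.out  = nn.Linear(5 * d, l)               # maps 5d to length l

    def forward(self, A_MMp, E_M_r, H_M):
        # Eq. (4): A' = A_{M,M'} + (E^{r'}_M + B^v_M)(B^v_{M'})^T, (m, m')
        A_prime = A_MMp + (E_M_r + self.B_M) @ self.B_Mp.T
        # Eq. (5): X_hat_{M',T} = MLP(A'^T H'_M), shape (m', l)
        return self.out(A_prime.T @ H_M)
```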

3.3. Long-Term Forecasting Step

The primary objective of this step is to enhance the model's long-term forecasting power and to further refine the parameters established during the initial training step. One challenge is to fully utilize the embeddings learned from limited spatio-temporal data. From the previous dynamic adaptation step, we obtain well-trained embeddings that carry the adaptation information from sensed locations to unsensed locations. Thus, the forecasting structure in this step can leverage that knowledge to achieve a better outcome.

The long-term forecasting step is shown in Fig. 3, from $\mathcal{X}_{M,T}$ and $\mathcal{X}_{M',T}$ to $\mathcal{X}_{M,T'}$ (blue arrows). The input consists of the historical data from the sensed locations, $\mathcal{X}_{M,T}$, and the predicted historical values at the unsensed locations, $\hat{\mathcal{X}}_{M',T}$ (from the dynamic adaptation step). The output of the long-term forecasting step is the future data at the sensed locations, $\hat{\mathcal{X}}_{M,T'}$.

To preserve our model's resistance to noise, we first re-rank the flow rates across the sensed data and the predicted historical unsensed data, yielding the rank-based node embedding. As explained above, noisy predictions can be alleviated by the rank-based node embedding. By incorporating the additional ground truth information $\mathcal{X}_{M,T'}$, the model's parameters are further fine-tuned and optimized.

3.3.1. Node Embedding Enhanced Spatial Transfer Matrix from All Locations to Sensed Locations

Beyond the advantages highlighted in the dynamic adaptation step (from $\mathcal{X}_{M,T}$ to $\mathcal{X}_{M',T}$), the spatial transfer matrix in this step (from $\mathcal{X}_{M,T}$ and $\mathcal{X}_{M',T}$ to $\mathcal{X}_{M,T'}$) is capable of learning precise long-term temporal patterns through shared and well-trained parameters. Here $B_N^{v}=[B_M^{v}, B_{M'}^{v}]$, where $B_M^{v}$ and $B_{M'}^{v}$ were learned in the previous adaptation step. Besides, the rank-based node embedding ${E_N^{r}}'$ is also partially trained by the previous adaptation step: we obtain the first $m$ rank entries from ${E_M^{r}}'$. The newly introduced parameters include the forecasting part, the MLP that maps the high-dimensional representation of width $5d$ to the desired output length $l'$. The rank-based embedding ${E_{M'}^{r}}'$ is better trained on these foundations. These two components, utilized later in the aggregation step, enhance the effectiveness of subsequent processes.

We use ${\mathcal{X}_{N,T}}'$ to represent the input of the forecasting step, the concatenated data ${\mathcal{X}_{N,T}}' = [\mathcal{X}_{M,T}, \hat{\mathcal{X}}_{M',T}]$ with $N = [M, M']$. We first obtain the representation ${H_{N,T}}' \in \mathbb{R}^{n \times 5d}$ through the MLPs from ${\mathcal{X}_{N,T}}' \in \mathbb{R}^{n \times l}$. The rank-based node embedding is ${E_N^r}' = \{{E_i^r}' \mid {E_i^r}' \in \mathbb{R}^d, i \in N\} \in \mathbb{R}^{n \times d}$, and the node embedding bank is $B_N^v = \{B_i^v \mid B_i^v \in \mathbb{R}^d, i \in N\} \in \mathbb{R}^{n \times d}$.

(6) $\quad {A_{N,M}}' = A_{N,M} + ({E_N^r}' + B_N^v)\,({B_M^v})^{\tau}$

(7) $\quad \hat{\mathcal{X}}_{M,T'} = MLP'\big(({A_{N,M}}')^{\tau}\, {H_{N,T}}'\big)$
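To make the transfer computation concrete, below is a minimal PyTorch sketch of equations (6) and (7). The tensor names mirror the symbols above; the sizes, the random initializations, and the single `torch.nn.Linear` standing in for $MLP'$ are illustrative assumptions, not our released implementation.

```python
import torch

# Illustrative sizes: n locations in total, m sensed, embedding width d,
# output length l' = 96 intervals.
n, m, d, l_out = 170, 20, 64, 96

A_NM  = torch.rand(n, m)        # partial adjacency A_{N,M} (all -> sensed)
E_N_r = torch.randn(n, d)       # rank-based node embedding E_N^r'
B_N_v = torch.randn(n, d)       # node embedding bank B_N^v = [B_M^v, B_M'^v]
B_M_v = B_N_v[:m]               # sensed part of the bank
H_N_T = torch.randn(n, 5 * d)   # representation H_{N,T}' of the input X_{N,T}'

# Eq. (6): enhance the partial adjacency with the embedding product.
A_NM_prime = A_NM + (E_N_r + B_N_v) @ B_M_v.T          # shape (n, m)

# Eq. (7): transfer the representation to sensed locations and forecast.
mlp_prime = torch.nn.Linear(5 * d, l_out)              # stands in for MLP'
X_hat_M_Tprime = mlp_prime(A_NM_prime.T @ H_N_T)       # shape (m, l')
```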

3.4. Aggregation Step

So far, the model has learned how to adapt knowledge from sensed locations to unsensed locations in the dynamic adaptive step (from $\mathcal{X}_{M,T}$ to $\mathcal{X}_{M',T}$ in Fig. 3) and how to perform long-term forecasting in the long-term forecasting step (from $\mathcal{X}_{M,T}$ and $\mathcal{X}_{M',T}$ to $\mathcal{X}_{M,T'}$ in Fig. 3). Intuitively, the aggregation step inherits the optimal parameters learned in the previous steps and aggregates them. As shown in Fig. 3, the input comprises three parts: the historical sensed data $\mathcal{X}_{M,T}$, the predicted historical unsensed data $\hat{\mathcal{X}}_{M',T}$, and the predicted future sensed data $\hat{\mathcal{X}}_{M,T'}$. We treat the historical data at all locations, ${\mathcal{X}_{N,T}}'$, as the concatenation of the first two parts. The output is the predicted future data for the unsensed locations, $\hat{\mathcal{X}}_{M',T'}$.

Following the methodology of the previous steps, the two parts of the input first get their embeddings ${H_{N,T}}' \in \mathbb{R}^{n \times 5d}$ and ${H_{M,T'}}' \in \mathbb{R}^{m \times 5d}$. The sensed and unsensed node embedding banks $B_M^v$ and $B_{M'}^v$, the sensed and unsensed rank embeddings ${E_M^r}'$ and ${E_{M'}^r}'$, and $MLP'$ are all trained in the previous steps. Thus, they carry the valuable knowledge learned in those steps into the final model parameters and the predicted future data for the unsensed locations.

(8) $\quad {A_{N,M'}}' = A_{N,M'} + ({E_N^r}' + B_N^v)\,({B_{M'}^v})^{\tau}$

(9) $\quad {A_{M,M'}}' = A_{M,M'} + ({E_M^r}' + B_M^v)\,({B_{M'}^v})^{\tau}$

(10) $\quad \hat{\mathcal{X}}_{M',T'} = \alpha \cdot MLP'\big(({A_{N,M'}}')^{\tau}\, {H_{N,T}}'\big) + (1-\alpha) \cdot MLP'\big(({A_{M,M'}}')^{\tau}\, {H_{M,T'}}'\big)$
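The aggregation in equations (8)-(10) can be sketched in the same style. Again, the shapes, the random placeholders, and the single linear layer standing in for $MLP'$ are assumptions for illustration; note that we read the bank term in (9) as $B_M^v$ for dimensional consistency with $A_{M,M'}$.

```python
import torch

n, m, d, l_out = 170, 20, 64, 96
m_prime = n - m
alpha = 0.5                            # ratio factor of the two parts

A_NMp  = torch.rand(n, m_prime)        # partial adjacency, all -> unsensed
A_MMp  = torch.rand(m, m_prime)        # partial adjacency, sensed -> unsensed
E_N_r  = torch.randn(n, d)             # rank-based embedding of all locations
E_M_r  = E_N_r[:m]                     # sensed part
B_N_v  = torch.randn(n, d)             # node embedding bank
B_M_v, B_Mp_v = B_N_v[:m], B_N_v[m:]
H_N_T  = torch.randn(n, 5 * d)         # representation of historical data, all locations
H_M_Tp = torch.randn(m, 5 * d)         # representation of predicted future sensed data

mlp_prime = torch.nn.Linear(5 * d, l_out)   # shared stand-in for MLP'

# Eqs. (8)-(9): embedding-enhanced spatial transfer matrices.
A_NMp_prime = A_NMp + (E_N_r + B_N_v) @ B_Mp_v.T       # (n, m')
A_MMp_prime = A_MMp + (E_M_r + B_M_v) @ B_Mp_v.T       # (m, m')

# Eq. (10): alpha-weighted aggregation of the two forecasts.
X_hat_Mp_Tp = alpha * mlp_prime(A_NMp_prime.T @ H_N_T) \
    + (1 - alpha) * mlp_prime(A_MMp_prime.T @ H_M_Tp)  # (m', l')
```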

4. Experiment

4.1. Setting and Datasets

Table 1. Statistics of the five traffic flow datasets (all in CA, USA).

Dataset  | Time Span             | Time Interval | Locations | Time Intervals
PEMS03   | 9/1/2018 - 11/30/2018 | 5 min         | 358       | 26,208
PEMS04   | 1/1/2018 - 2/28/2018  | 5 min         | 307       | 16,992
PEMS08   | 7/1/2016 - 8/31/2016  | 5 min         | 170       | 17,856
PEMS-BAY | 1/1/2017 - 5/31/2017  | 5 min         | 325       | 52,116
METR-LA  | 3/1/2012 - 6/30/2012  | 5 min         | 207       | 34,272

4.1.1. Datasets

We use five widely used public traffic flow datasets: METR-LA, PEMS-BAY, PEMS03, PEMS04, and PEMS08. (The datasets are provided in the STSGCN github repository at https://github.com/Davidham3/STSGCN/ and the DCRNN github repository at https://github.com/liyaguang/DCRNN.) The data statistics are shown in Table 1. All datasets measure flow rates, i.e., the number of passing vehicles at each location during each 5-min interval. We pre-process the flow rates by z-score normalization before using them as model input: the z-score normalization subtracts the mean rate of a dataset from each flow rate and then divides the result by the rates' standard deviation. We split each dataset into three subsets in a 3:1:1 ratio for training, validation, and testing. A traffic-forecast model uses 1 hour of flow-rate data to predict the future 1, 2, 4, and 8 hours of flow-rate data. Because the time interval for flow-rate measurement is five minutes, each hour has 12 intervals.
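As a concrete illustration of this pre-processing, the following is a small NumPy sketch of the z-score normalization and the 3:1:1 temporal split. The helper name and the use of dataset-wide statistics are our reading of the description above, not code from the paper.

```python
import numpy as np

def zscore_and_split(flow, train=0.6, val=0.2):
    """z-score normalize a (locations x intervals) flow matrix with
    dataset-wide statistics, then split along time in a 3:1:1 ratio."""
    mean, std = flow.mean(), flow.std()
    norm = (flow - mean) / std
    t = flow.shape[1]
    t1, t2 = int(t * train), int(t * (train + val))
    return norm[:, :t1], norm[:, t1:t2], norm[:, t2:], (mean, std)

# Example on PEMS08-sized data: 170 locations, 17,856 five-minute intervals.
train_set, val_set, test_set, stats = zscore_and_split(np.random.rand(170, 17856))
```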

Experiments were conducted on a server with AMD EPYC 7742 64-Core Processor @ 2.25 GHz, 500 GB of RAM, and NVIDIA A100 GPU with 80 GB memory.

4.1.2. Hyperparameters

The datasets have different numbers of locations $n$. The numbers of sensed locations $m$ and unsensed locations $m'$ vary in our experiments, with $m + m' = n$. The number $l$ of time intervals in the model input is 12, and the number $l'$ of time intervals in the model output is 96. For the embedding parameters, $N^{dow} = 7$, $N^{tod} = 288$, and the dimension is $d = 64$. We use two layers of CNN with a residual connection as the MLP structure in each of the three steps in Figure 3. The input passes through one CNN layer, the ReLU activation function (agarap2018deep, ), a dropout layer (srivastava2014dropout, ) with a 0.15 dropout rate, and then the second CNN layer. A residual connection (szegedy2017inception, ) from the original input is then added to the result for the final output. $\alpha$ in the aggregation step is 0.5. During training, we set the batch size to 64, the learning rate to $10^{-3}$, and the weight decay to $10^{-3}$ for all datasets. The optimizer is AdamW (loshchilov2018decoupled, ). We use Mean Absolute Error (MAE) as the loss function.
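The MLP structure described above can be sketched in PyTorch as follows. The 1x1-convolution layout, the channel arrangement, and the class name are illustrative assumptions; only the two CNN layers, the ReLU, the 0.15 dropout rate, and the residual connection come from the text.

```python
import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    """Two CNN layers with ReLU and dropout in between, plus a residual
    connection from the original input, as described in the hyperparameters."""
    def __init__(self, channels, dropout=0.15):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=1)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                # x: (batch, channels, locations)
        h = self.conv2(self.drop(torch.relu(self.conv1(x))))
        return x + h                     # residual connection to the input
```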

4.1.3. Baselines

The baselines for comparison include a representative non-deep-learning Matrix Factorization method (lee2000algorithms, ); long-term traffic forecasting temporal models, such as PatchTST* (nie2022time, ) and iTransformer* (liu2023itransformer, ); short-term traffic forecasting spatio-temporal models, such as D2STGNN* (shao2022decoupled, ), FourierGNN* (yi2023fouriergnn, ), STSGCN* (song2020spatial, ), STGODE* (fang2021spatial, ), ST-SSL* (ji2023spatio, ), DyHSL* (zhao2023dynamic, ), STID* (shao2022spatial, ), MegaCRN* (jiang2023spatio, ), STEP* (shao2022pre, ), STAEFormer* (liu2023spatio, ), and TESTAM* (lee2024testam, ); and spatial extrapolation models, such as IGNNK (wu2021inductive, ) and STGNP (hu2023graph, ). The long-term forecasting temporal models and short-term forecasting spatio-temporal models need to be adapted from their full-sensing designs to work on the long-term partial sensing traffic forecast studied in this paper. The adaptation adds fully connected layers to map the feature dimensions from the sensed locations to the unsensed locations. For the short-term forecasting models, the output length is increased from 1 hour to 8 hours. These adapted models are identified in later experiments with a suffix *. For the spatial extrapolation models, in order to use them in the context of partial sensing traffic forecast, we first apply a full sensing forecast model (such as STAEFormer (liu2023spatio, )) that uses $\mathcal{X}_{M,T}$ to forecast $\mathcal{X}_{M,T'}$, and then use the spatial extrapolation models to estimate $\mathcal{X}_{M',T'}$ from $\mathcal{X}_{M,T'}$. We stress that not all models can be adapted to the partial sensing long-term traffic forecast task. We have therefore excluded from our comparison the models that either performed very poorly (such as Frigate (10.1145/3580305.3599357, ), which performed worse than all baselines above) or could not be adapted to our experimental setting (such as STSM (su2024spatial, ), which uses environmental knowledge of each location, such as nearby shopping malls). For all baselines, we executed their original code and adhered to their recommended configurations. The reported results are the average outcomes of four separate experiments with four random seeds, where under the same random seed the selection of unsensed locations is the same across all baselines and our work.

Table 2. Average forecast accuracy (MAE / RMSE / MAPE) over 96 future intervals (8 hours), under weighted selection of unsensed locations.

Models | PEMS03 ($m'$=250, 69%) | PEMS04 ($m'$=250, 81%) | PEMS08 ($m'$=150, 88.2%) | PEMS-BAY ($m'$=250, 76.9%) | METR-LA ($m'$=150, 72.4%)
Matrix Factorization (lee2000algorithms, ) | 69.01 / 110.17 / 135.41 | 91.38 / 150.09 / 129.65 | 76.01 / 132.64 / 85.13 | 6.25 / 15.63 / 20.21 | 16.72 / 34.21 / 49.83
PatchTST* (nie2022time, ) | 57.71 / 86.33 / 103.23 | 58.28 / 86.15 / 54.99 | 43.90 / 64.68 / 28.59 | 4.58 / 8.66 / 12.50 | 11.67 / 21.81 / 26.86
iTransformer* (liu2023itransformer, ) | 67.31 / 103.07 / 127.88 | 87.44 / 125.04 / 99.1 | 69.81 / 91.03 / 61.24 | 5.13 / 9.60 / 15.62 | 13.65 / 22.58 / 27.51
D2STGNN* (shao2022decoupled, ) | 37.31 / 61.1 / 45.69 | 47.29 / 75.41 / 48.98 | 47.37 / 67.4 / 35.2 | 6.62 / 10.94 / 15.52 | 14.93 / 27.25 / 42.33
FourierGNN* (yi2023fouriergnn, ) | 35.55 / 52.90 / 60.43 | 43.46 / 62.47 / 43.99 | 44.97 / 62.66 / 34.13 | 3.83 / 7.36 / 10.87 | 7.80 / 14.34 / 36.41
STSGCN* (song2020spatial, ) | 28.57 / 45.90 / 39.17 | 32.70 / 49.02 / 27.03 | 31.74 / 49.00 / 21.47 | 3.12 / 6.48 / 7.87 | 4.77 / 9.37 / 15.26
STGODE* (fang2021spatial, ) | 23.78 / 38.69 / 32.40 | 27.81 / 42.45 / 25.53 | 33.01 / 51.20 / 24.20 | 3.15 / 6.22 / 7.76 | 6.48 / 11.49 / 19.79
ST-SSL* (ji2023spatio, ) | 24.98 / 42.61 / 36.15 | 27.71 / 44.38 / 22.07 | 30.92 / 47.65 / 22.39 | 3.15 / 6.31 / 7.86 | 5.96 / 12.98 / 14.82
DyHSL* (zhao2023dynamic, ) | 25.12 / 45.22 / 30.19 | 29.17 / 46.44 / 22.03 | 29.10 / 45.59 / 21.87 | 3.68 / 7.67 / 10.06 | 8.14 / 16.15 / 25.85
STID* (shao2022spatial, ) | 21.70 / 39.67 / 24.26 | 24.17 / 40.76 / 17.14 | 24.99 / 42.77 / 15.42 | 3.00 / 5.98 / 7.43 | 6.96 / 14.00 / 19.68
MegaCRN* (jiang2023spatio, ) | 21.93 / 40.32 / 28.30 | 24.15 / 39.2 / 18.15 | 29.41 / 47.57 / 20.58 | 3.09 / 6.26 / 7.42 | 5.50 / 10.36 / 18.42
STEP* (shao2022pre, ) | 21.84 / 39.91 / 25.82 | 23.50 / 39.14 / 17.40 | 28.35 / 46.51 / 17.30 | 4.84 / 9.13 / 13.01 | 7.1 / 12.51 / 25.75
TESTAM* (lee2024testam, ) | 23.54 / 43.12 / 27.62 | 24.11 / 39.13 / 17.11 | 26.47 / 44.43 / 16.43 | 3.15 / 6.31 / 7.40 | 4.28 / 8.66 / 14.05
STAEFormer* (liu2023spatio, ) | 22.47 / 41.60 / 26.09 | 23.11 / 38.32 / 16.35 | 23.31 / 41.25 / 14.63 | 2.72 / 5.58 / 6.50 | 4.07 / 8.43 / 13.05
STAEFormer + IGNNK (wu2021inductive, ) | 23.01 / 42.32 / 27.12 | 23.97 / 39.21 / 17.09 | 23.99 / 43.15 / 15.32 | 2.80 / 5.90 / 6.89 | 4.32 / 8.78 / 14.21
STAEFormer + STGNP (hu2023graph, ) | 22.59 / 43.15 / 26.61 | 23.10 / 38.40 / 16.39 | 23.37 / 41.39 / 14.91 | 2.73 / 5.60 / 6.55 | 4.19 / 8.58 / 13.29
Our Method STPS | 18.41 / 33.51 / 20.69 | 21.19 / 37.07 / 15.49 | 22.18 / 37.96 / 14.14 | 2.31 / 4.71 / 5.59 | 3.31 / 6.61 / 9.45

4.1.4. Location Selection

For partial sensing, we use two methods for the selection of sensed/unsensed locations: random selection and weighted selection. In random selection, the probability of each location being unsensed is the same, determined by the ratio $m'/n$ for given $m'$ and $n$. In contrast, weighted selection is driven by a rate-based bias. The intuition is that locations with higher flow rates are more valuable for installing permanent sensors. Given the value of $m'$, we select $n - m'$ locations for installing permanent sensors. For each sensor, we pick its location with probability proportional to the flow rates among the locations that have not yet been selected for sensors.
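A minimal sketch of the weighted selection procedure follows, assuming average flow rates are available per location; the function name and seed handling are illustrative.

```python
import numpy as np

def select_sensed_locations(avg_flow, num_sensed, seed=0):
    """Weighted selection: pick sensed locations one at a time, with probability
    proportional to average flow rate among the not-yet-selected locations."""
    rng = np.random.default_rng(seed)
    remaining = list(range(len(avg_flow)))
    sensed = []
    for _ in range(num_sensed):
        rates = np.asarray([avg_flow[i] for i in remaining], dtype=float)
        pick = rng.choice(len(remaining), p=rates / rates.sum())
        sensed.append(remaining.pop(int(pick)))
    return sorted(sensed), sorted(remaining)   # sensed M, unsensed M'

# Example: n = 170 locations (PEMS08), m = 20 sensors, m' = 150 unsensed.
M, M_prime = select_sensed_locations(np.random.rand(170) * 100, num_sensed=20)
```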

4.1.5. Accuracy Metrics

We evaluate the traffic forecast accuracy of the proposed work and the baseline models by Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). Given the ground truth $\mathcal{X}_{i,j}$ and the forecast result $\hat{\mathcal{X}}_{i,j}$, $i \in M'$, $j \in T'$,

(11) $\quad \text{MAE at time } j = \frac{1}{m'} \sum_{i=1}^{m'} \left| \mathcal{X}_{i,j} - \hat{\mathcal{X}}_{i,j} \right|$

(12) $\quad \text{MAPE at time } j = \Big( \frac{1}{m'} \sum_{i=1}^{m'} \left| \frac{\mathcal{X}_{i,j} - \hat{\mathcal{X}}_{i,j}}{\mathcal{X}_{i,j}} \right| \Big) \times 100\%$

(13) $\quad \text{RMSE at time } j = \sqrt{ \frac{1}{m'} \sum_{i=1}^{m'} \big( \mathcal{X}_{i,j} - \hat{\mathcal{X}}_{i,j} \big)^2 }$
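For reference, the three metrics at a single interval $j$ translate directly from equations (11)-(13); the NumPy helpers below are a straightforward transcription.

```python
import numpy as np

def mae_at_j(x, x_hat):
    """Eq. (11): mean absolute error over the m' unsensed locations at interval j."""
    return np.mean(np.abs(x - x_hat))

def mape_at_j(x, x_hat):
    """Eq. (12): mean absolute percentage error, in percent."""
    return np.mean(np.abs((x - x_hat) / x)) * 100.0

def rmse_at_j(x, x_hat):
    """Eq. (13): root mean squared error."""
    return np.sqrt(np.mean((x - x_hat) ** 2))
```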

4.2. Traffic Forecast Accuracy

Table 2 compares our method STPS with the baselines on traffic forecast accuracy in terms of average MAE, MAPE, and RMSE over 96 future intervals (i.e., 8 hours). All experiments were conducted using the weighted selection method. The number of unsensed locations is set to 250 for the larger datasets, PEMS03, PEMS04, and PEMS-BAY, and reduced to 150 for PEMS08 and METR-LA, which have fewer locations. The percentage of unsensed locations ranges from 69% to 88.2%. We vary the number of unsensed locations and study its impact on accuracy shortly. The table shows that our model outperforms all baseline models in traffic forecast accuracy, for example, achieving a 28% improvement in MAPE over the best baseline result on the METR-LA dataset.

The non-deep-learning matrix factorization method performs poorly, probably because it is unable to capture complex spatio-temporal dependencies. The long-term traffic forecasting models, PatchTST* and iTransformer*, do not sufficiently model spatial correlation patterns, resulting in relatively poor performance. Among all the baseline models, STAEFormer*, a short-term spatio-temporal traffic forecasting model, achieves the best overall performance, benefiting from its sophisticated embedding techniques (e.g., node embedding, time-of-day and day-of-week embedding) coupled with a robust transformer structure. TESTAM* performs close to STAEFormer*, thanks to its informative mixture-of-experts architecture. STID* also exhibits strong performance by leveraging embedding techniques similar to STAEFormer*. FourierGNN* shows relatively weak performance, likely because it treats spatio-temporal data purely as a graph, which is inadequate for capturing the temporal correlations crucial to long-range forecasting. The spatial extrapolation models, IGNNK and STGNP, were originally designed to extrapolate to a relatively modest extent, such as 30% of locations, in contrast to our experimental setting of extrapolating to the 69-88.2% of locations that are unsensed from only the 11.8-31% of locations that are sensed. In addition, they rely on STAEFormer for extrapolating in the temporal dimension from one hour of data to eight hours. Overall, they perform slightly worse than STAEFormer*. In comparison, our model STPS performs better than all baselines, as expected, because this is the first work on the problem of partial sensing long-term traffic forecast with a targeted overall design, in particular the novel three-step model shown in Figure 3 and a new rank-based embedding.

4.3. Number of Unsensed Locations

[Figure 4: average RMSE on PEMS08 with a varying number of unsensed locations, under random selection (left) and weighted selection (right).]

Figure 4 compares our method STPS with the baselines on forecast accuracy in terms of RMSE over dataset PEMS08, with the number of unsensed locations varying from 50 to 100 to 150. The left plot uses the random selection method, and the right plot uses the weighted selection method. In each plot, for each number of unsensed locations on the horizontal axis, a group of bars presents the average RMSE values of the Full Sensing Benchmark (leftmost), STID*, ..., and our method STPS (rightmost), respectively.

The Full Sensing Benchmark establishes an accuracy bound that a partial sensing model such as STPS can possibly achieve. We reason that a partial sensing model without the recent knowledge of unsensed locations, i.e., $\mathcal{X}_{M',T}$, should not beat a full sensing model with the knowledge of $\mathcal{X}_{M',T}$. This knowledge of unsensed locations is hypothetical, and thus the Full Sensing Benchmark is not implementable, but it gives us an accuracy bound for a partial sensing model. Table 3 shows the forecast accuracy of the full-sensing baselines on dataset PEMS08, assuming the knowledge of $\mathcal{X}_{M',T}$, such that they can use $\mathcal{X}_{N,T}$ to infer $\mathcal{X}_{M',T'}$, where $N = M + M'$. Among the full-sensing baselines, PDFormer performs the best in RMSE, and thus its results are used as the Full Sensing Benchmark in Figure 4.

Table 3. Forecast accuracy of the full-sensing baselines on PEMS08, given the hypothetical knowledge of $\mathcal{X}_{M',T}$.

Models | MAE | RMSE | MAPE
STID (shao2022spatial, ) | 19.59 | 32.20 | 13.05
PDFormer (10.1609/aaai.v37i4.25556, ) | 18.58 | 31.86 | 12.64
STAEFormer (liu2023spatio, ) | 18.77 | 32.73 | 12.10
TESTAM (lee2024testam, ) | 18.90 | 32.81 | 12.45
MegaCRN (jiang2023spatio, ) | 19.99 | 33.67 | 13.09
STEP (shao2022pre, ) | 18.97 | 32.98 | 12.66
D2STGNN (shao2022decoupled, ) | 34.50 | 46.13 | 29.96
FourierGNN (yi2023fouriergnn, ) | 43.71 | 62.66 | 38.70
ST-SSL (ji2023spatio, ) | 24.35 | 37.31 | 17.59
DyHSL (zhao2023dynamic, ) | 26.04 | 39.28 | 20.09

Note that (1) the vertical axis of Figure 4, RMSE, begins at 27.5, not zero, and (2) it ends at 45.0, while the taller bars (which go beyond the figure) have their heights printed at the top of the figure.

First, the figure shows that the choice between the random selection method and the weighted selection method does not cause a significant difference in forecast accuracy. This suggests that preferring to install sensors at locations with larger flow rates does not improve performance significantly over random selection of sensor locations.

Second, our model STPS outperforms the baselines under different numbers of unsensed locations, consistent with the results in Table 2. The performance of STPS is worse than the Full Sensing Benchmark, as expected, but when the ratio of unsensed locations to all locations is relatively low, e.g., $m' = 50$ and $m'/n = 50/170 = 29.4\%$, the accuracy of STPS is very close to the Full Sensing Benchmark. Even when the ratio is high, e.g., $m' = 150$ and $m'/n = 150/170 = 88.2\%$, STPS is only about 10% less accurate than the Full Sensing Benchmark. This comparison demonstrates the validity of the partial sensing approach in traffic forecasting: by installing a significantly smaller number of permanent sensors (e.g., $m = n - m' = 170 - 150 = 20$), we can achieve long-term traffic forecasting accuracy within about 10% of what full sensing models achieve under the much more expensive deployment of sensors at all locations.

We have performed similar experiments on all other datasets, with results in terms of MAE and MAPE as well. While we do not include these numerous figures, aside from numerical differences they all confirm the conclusions we draw from Figure 4.

4.4. Forecast Length

[Figure 5: RMSE on PEMS08 at forecast lengths of 1, 2, 4, and 8 hours.]

Our partial sensing long-term traffic forecast experiments output traffic estimates for 8 hours (i.e., 96 time intervals) into the future. The forecast length refers to a specific time interval in the output. For example, when we consider a forecast length of 1 hour (i.e., the 12th interval), we refer to the traffic estimates for the last interval of the first hour into the future.

Fig. 5 compares our method with the baselines and the hypothetical Full Sensing Benchmark on forecast accuracy in terms of RMSE over dataset PEMS08, with the forecast length varying over 1, 2, 4, and 8 hours. The figure shows that our method significantly outperforms all baselines at all forecast lengths. This result complements the average accuracy comparison in Table 2 by providing a more detailed comparison of forecast accuracy over time. Interestingly, the accuracy of our method converges towards the accuracy bound of the hypothetical Full Sensing Benchmark as the forecast length stretches from 1 hour to 8 hours, suggesting the suitability of STPS as a long-term traffic estimator.

Again, we have performed similar experiments on all other datasets, with results in terms of MAE and MAPE as well; they lead to similar conclusions. The same holds for the other experiments in the sequel, as we cannot present the results on all datasets over all metrics due to space limitations.

4.5. Ablation Study

Table 4. Ablation study: average MAE, RMSE, and MAPE over 8-hour forecasts on PEMS08, under weighted selection with $m' = 50$.

Method | MAE | RMSE | MAPE
Our Method STPS | 16.58 | 29.22 | 17.85
1 Step (see Fig. 3, from $\mathcal{X}_{M,T}$ to $\mathcal{X}_{M',T'}$) | 18.47 | 32.08 | 18.83
2 Steps (from $\mathcal{X}_{M,T}$ to $\mathcal{X}_{M',T}$, then from $\mathcal{X}_{M,T}$, $\mathcal{X}_{M',T}$ to $\mathcal{X}_{M',T'}$) | 17.24 | 30.18 | 18.06
Plain Spatial Transfer Matrix | 17.80 | 32.35 | 17.87
No Spatial Transfer Matrix | 17.76 | 32.35 | 17.85
No Rank-based Node Embedding | 17.39 | 31.95 | 17.89

To investigate the effect of the different components of STPS, we conduct ablation experiments on PEMS08 with several variants of our model.

• 1-Step: This variant keeps the node embedding enhanced spatial transfer matrix and the rank-based node embedding technique but simplifies the model to a single-step structure. Referring to Fig. 3, it directly uses the past data from the sensed locations, $\mathcal{X}_{M,T}$, to predict the future unsensed data $\mathcal{X}_{M',T'}$, without involving other supervised data.

• 2-Steps: Compared to STPS, the 2-steps model first estimates the past unsensed data $\mathcal{X}_{M',T}$ and then aggregates $\mathcal{X}_{M,T}$ and the estimate of $\mathcal{X}_{M',T}$ to predict the future unsensed data $\mathcal{X}_{M',T'}$.

• Plain Spatial Transfer Matrix: We retain STPS's three-step structure but add a learnable matrix to the partial adjacency matrix, instead of the node embedding bank $B^v$ and the rank-based node embedding $E^r$ in (4).

• No Spatial Transfer Matrix: This variant excludes the spatial transfer matrix of STPS. Instead, it employs an MLP to map the input sensor dimensions to the desired output dimensions, akin to the method we used for adapting the full sensing models to the partial sensing task.

• No Rank-based Node Embedding: We remove the rank-based node embedding, while keeping all other embeddings as well as the three-step structure.

Table 4 compares the proposed STPS with the above variants in terms of average MAE, MAPE, and RMSE over 8-hour traffic forecasts on dataset PEMS08, under the weighted selection of unsensed locations with $m' = 50$. Comparing STPS (3 steps) with 1-Step and 2-Steps, we see that our three-step training process keeps improving forecast accuracy as more supervised information is used. Comparing STPS with No Rank-based Node Embedding shows that the rank-based node embedding improves the average forecast accuracy; its significance will be further revealed in the next subsection when we take a deeper look at its impact. Comparing STPS with Plain Spatial Transfer Matrix and No Spatial Transfer Matrix shows that our node embedding enhanced spatial transfer matrix is more effective than a simple spatial transfer matrix or none at all.

4.6. Rank-based Node Embedding

[Figure 6: RMSE of STPS and STPS without rank-based node embedding as the number of unsensed locations varies from 50 to 150.]

Extending the comparison between STPS and No Rank-based Node Embedding (i.e., STPS without rank) in Table 4, we present additional RMSE results in Fig. 6 with the number $m'$ of unsensed locations varying from 50 to 150. The performance gap between STPS and its variant without rank widens as $m'$ increases, demonstrating that the rank-based node embedding is more beneficial when the number of unsensed locations is high.

[Figure 7: percentage MAE improvement of STPS over its variant without rank, across 20 bins of forecasts sorted in descending order of MAE.]

One of our purposes in introducing the rank-based node embedding is to help the STPS model capture irregular traffic patterns. The hypothesis is that irregular patterns cause significant forecast errors even though they do not happen frequently. Therefore, with the rank-based embedding, we should observe a larger worst-case accuracy improvement than the average-case improvement in Table 4. To verify this hypothesis, we sort all 8 hours of traffic forecasts in descending order of MAE and group them into 20 bins, each with 5% of all forecasts. We compute the average MAE in each bin for STPS, do the same for its variant without rank, and then compute the percentage improvement in average MAE of STPS over its variant without rank. The results are presented in Figure 7, where STPS consistently improves over its variant without rank across all bins. The improvement among the first 5% of forecasts is close to 35%, where the largest irregularities are suspected to reside.
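The binning procedure can be sketched as follows. Since the text does not specify which model's errors drive the sort, the sketch sorts by the no-rank variant's MAE as one plausible reading; the function name and array layout are illustrative.

```python
import numpy as np

def binned_mae_improvement(err_stps, err_norank, num_bins=20):
    """Sort per-forecast MAEs in descending order, split into equal bins
    (5% each for 20 bins), and report the percentage improvement of STPS
    over the no-rank variant within each bin."""
    order = np.argsort(-err_norank)            # hardest forecasts first
    bins = np.array_split(order, num_bins)
    improvements = []
    for b in bins:
        m_stps, m_norank = err_stps[b].mean(), err_norank[b].mean()
        improvements.append((m_norank - m_stps) / m_norank * 100.0)
    return improvements
```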

4.7. Robustness Study

[Figure 8: robustness to training noise of STPS with and without the rank-based node embedding.]

To demonstrate the robustness contributed by the rank-based node embedding, we conduct an experiment that introduces noise into the training dataset and tests the model on the noise-free test dataset. The added noise is intended to simulate sensor noise, where a sensor generates a measurement that deviates from the true flow rate. The noise follows a zero-mean i.i.d. Gaussian distribution $\mathcal{N}(0, \gamma^2)$, where $\gamma^2 \in \{0, 10, 50, 100\}$; $\gamma^2 = 0$ represents the scenario without noise.
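A minimal sketch of the noise injection follows, with the function name and data shapes as illustrative assumptions; only the zero-mean i.i.d. Gaussian form and the variance levels come from the setup above.

```python
import numpy as np

def add_sensor_noise(train_flow, variance, seed=0):
    """Simulate sensor noise by adding zero-mean i.i.d. Gaussian noise
    N(0, variance) to the training flow rates; the test set stays clean."""
    if variance == 0:
        return train_flow
    rng = np.random.default_rng(seed)
    return train_flow + rng.normal(0.0, np.sqrt(variance), size=train_flow.shape)

# Noise levels used in the robustness study: variance in {0, 10, 50, 100}.
for var in (0, 10, 50, 100):
    noisy_train = add_sensor_noise(np.random.rand(170, 1000) * 100, var)
```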

The experimental results are presented in Fig. 8. For STPS without rank-based node embedding, its performance significantly deteriorates as the level of noise increases. In contrast, STPS (with rank-based node embedding) demonstrates remarkable resistance to noise, with much smaller performance degradation as the noise level rises.

4.8. Efficiency Study

[Figure 9: efficiency-accuracy tradeoff, total training time versus average RMSE on PEMS08.]

Fig. 9 compares the efficiency of STPS with the other models in the context of the efficiency-accuracy tradeoff, on the same server described in Section 4.1. The batch size is uniformly set to 16. We record the total training time and the average RMSE over 8 hours of traffic forecast on dataset PEMS08 with $m' = 50$. Our model achieves the best RMSE among all models with relatively low computational cost. This dual advantage of high accuracy and efficiency sets our model apart in the realm of partial sensing traffic forecast. On the one hand, STID*, FourierGNN*, STGODE*, and ST-SSL* exhibit commendable efficiency, yet they fall short of the accuracy level demonstrated by STPS. On the other hand, MegaCRN*, TESTAM*, STAEFormer*, STAEFormer+IGNNK, STAEFormer+STGNP, STSGCN*, DyHSL*, D2STGNN*, STEP*, and PDFormer* show slower training and worse accuracy than STPS. This balance of efficiency and accuracy makes STPS an attractive solution for real-world applications in intelligent traffic management.

4.9. Parameter Sensitivity

[Figure 10: parameter sensitivity of STPS with respect to $\alpha$ and the embedding dimension $d$.]

We study the parameter sensitivity of our model STPS with two experiments on the average RMSE over 8 hours of traffic forecast on dataset PEMS08 with $m' = 150$. One parameter is $\alpha$, the ratio factor between the two parts in the aggregation step; the other is the embedding dimension $d$. From Fig. 10, the RMSE at $\alpha = 0.5$ is slightly better than the RMSEs at other $\alpha$ values, and the RMSE at $d = 64$ is slightly better than the RMSEs at other $d$ values. The likely reason is that when the dimension $d$ is smaller than 64, STPS may not fully capture the complex spatio-temporal patterns, while when $d$ is larger than 64, STPS may suffer from overfitting.

5. Conclusion

In this paper, we introduce a novel Spatio-Temporal Partial Sensing (STPS) forecast model for long-term traffic, learned through multiple training steps. The approach leverages limited sensed data to enable long-term forecasting. Within STPS, we introduce a new rank-based node embedding to capture irregular traffic patterns, resist noise, and handle rate fluctuations, and a node embedding enhanced spatial transfer matrix that improves data representation across the training phases through shared parameters. Extensive experiments conducted on several real-world traffic datasets demonstrate that our proposed method outperforms the state-of-the-art and achieves superior prediction accuracy in partial sensing long-term traffic forecasting.

References

  • (1)Agarap, A.F.Deep learning using rectified linear units (relu).arXiv preprint arXiv:1803.08375 (2018).
  • (2)Appleby, G., Liu, L., and Liu, L.-P.Kriging convolutional networks.In Proceedings of the AAAI Conference on Artificial Intelligence (2020), vol.34, pp.3187โ€“3194.
  • (3)Balasubramanian, P., and Rao, K.R.An adaptive long-term bus arrival time prediction model with cyclic variations.Journal of Public Transportation 18, 1 (2015), 1โ€“18.
  • (4)Chen, R.T., Rubanova, Y., Bettencourt, J., and Duvenaud, D.K.Neural ordinary differential equations.Advances in neural information processing systems 31 (2018).
  • (5)Cheng, Z., Pang, M.-S., and Pavlou, P.A.Mitigating traffic congestion: The role of intelligent transportation systems.Information Systems Research 31, 3 (2020), 653โ€“674.
  • (6)Deng, J., Chen, X., Jiang, R., Yin, D., Yang, Y., Song, X., and Tsang, I.W.Learning structured components: Towards modular and interpretable multivariate time series forecasting.arXiv preprint arXiv:2305.13036 (2023).
  • (7)Fang, Z., Long, Q., Song, G., and Xie, K.Spatial-temporal graph ode networks for traffic flow forecasting.In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (2021), pp.364โ€“373.
  • (8)Gupta, M., Kodamana, H., and Ranu, S.Frigate: Frugal spatio-temporal forecasting on road networks.In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (New York, NY, USA, 2023), KDD โ€™23, Association for Computing Machinery, p.649โ€“660.
  • (9)Hu, J., Liang, Y., Fan, Z., Chen, H., Zheng, Y., and Zimmermann, R.Graph neural processes for spatio-temporal extrapolation.In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2023), pp.752โ€“763.
  • (10)Ji, J., Wang, J., Huang, C., Wu, J., Xu, B., Wu, Z., Zhang, J., and Zheng, Y.Spatio-temporal self-supervised learning for traffic flow prediction.In Proceedings of the AAAI Conference on Artificial Intelligence (2023), vol.37, pp.4356โ€“4364.
  • (11)Ji, J., Wang, J., Jiang, Z., Jiang, J., and Zhang, H.Stden: Towards physics-guided neural networks for traffic flow prediction.In Proceedings of the AAAI Conference on Artificial Intelligence (2022), vol.36, pp.4048โ€“4056.
  • (12)Jia, J., and Benson, A.R.Residual correlation in graph neural network regression.In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020), pp.588โ€“598.
  • (13)Jiang, J., Han, C., Zhao, W.X., and Wang, J.Pdformer: propagation delay-aware dynamic long-range transformer for traffic flow prediction.In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (2023), AAAIโ€™23/IAAIโ€™23/EAAIโ€™23, AAAI Press.
  • (14)Jiang, R., Wang, Z., Yong, J., Jeph, P., Chen, Q., Kobayashi, Y., Song, X., f*ckushima, S., and Suzumura, T.Spatio-temporal meta-graph learning for traffic forecasting.Proceedings of the AAAI Conference on Artificial Intelligence 37, 7 (Jun. 2023), 8078โ€“8086.
  • (15)Jiang, R., Wang, Z., Yong, J., Jeph, P., Chen, Q., Kobayashi, Y., Song, X., f*ckushima, S., and Suzumura, T.Spatio-temporal meta-graph learning for traffic forecasting.In Proceedings of the AAAI Conference on Artificial Intelligence (2023), vol.37, pp.8078โ€“8086.
  • (16)Jin, G., Li, F., Zhang, J., Wang, M., and Huang, J.Automated dilated spatio-temporal synchronous graph modeling for traffic prediction.IEEE Transactions on Intelligent Transportation Systems (2022).
  • (17)Jin, Y., Chen, K., and Yang, Q.Transferable graph structure learning for graph-based traffic forecasting across cities.In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2023), pp.1032โ€“1043.
  • (18)Jing-Tao, S., and Qiu-Yu, Z.Completion of multiview missing data based on multi-manifold regularised non-negative matrix factorisation.Artificial Intelligence Review 53, 7 (2020), 5411โ€“5428.
  • (19)Kidger, P.On neural differential equations.arXiv preprint arXiv:2202.02435 (2022).
  • (20)Kidger, P., Morrill, J., Foster, J., and Lyons, T.Neural controlled differential equations for irregular time series.Advances in Neural Information Processing Systems 33 (2020), 6696โ€“6707.
  • (21)Laรฑa, I., Olabarrieta, I.I., Vรฉlez, M., and DelSer, J.On the imputation of missing data for road traffic forecasting: New insights and novel techniques.Transportation research part C: emerging technologies 90 (2018), 18โ€“33.
  • (22)Lee, D., and Seung, H.S.Algorithms for non-negative matrix factorization.Advances in neural information processing systems 13 (2000).
  • (23)Lee, H., and Ko, S.TESTAM: A time-enhanced spatio-temporal attention model with mixture of experts.In The Twelfth International Conference on Learning Representations (2024).
  • (24)Li, M., Sheng, L., Song, Y., and Song, J.An enhanced matrix completion method based on non-negative latent factors for recommendation system.Expert Systems with Applications 201 (2022), 116985.
  • (25)Li, M., and Zhu, Z.Spatial-temporal fusion graph neural networks for traffic flow forecasting.Proceedings of the AAAI Conference on Artificial Intelligence 35, 5 (May 2021), 4189โ€“4196.
  • (26)Li, X., Li, H., Lu, H., Jensen, C.S., Pandey, V., and Markl, V.Missing value imputation for multi-attribute sensor data streams via message propagation.Proceedings of the VLDB Endowment 17, 3 (2023), 345โ€“358.
  • (27)Li, Y., Yu, R., Shahabi, C., and Liu, Y.Diffusion convolutional recurrent neural network: Data-driven traffic forecasting.In International Conference on Learning Representations (2018).
  • (28)Liu, H., Dong, Z., Jiang, R., Deng, J., Deng, J., Chen, Q., and Song, X.Spatio-temporal adaptive embedding makes vanilla transformer sota for traffic forecasting.In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023), pp.4125โ€“4129.
  • (29)Liu, Y., Hu, T., Zhang, H., Wu, H., Wang, S., Ma, L., and Long, M.itransformer: Inverted transformers are effective for time series forecasting.In The Twelfth International Conference on Learning Representations (2023).
  • (30) Liu, Z., Shojaee, P., and Reddy, C. K. Graph-based multi-ODE neural networks for spatio-temporal traffic forecasting. Transactions on Machine Learning Research (2023).
  • (31) Loshchilov, I., and Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (2019).
  • (32) Lu, B., Gan, X., Zhang, W., Yao, H., Fu, L., and Wang, X. Spatio-temporal graph few-shot learning with cross-city knowledge transfer. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022), pp. 1162–1172.
  • (33) Mallick, T., Balaprakash, P., Rask, E., and Macfarlane, J. Transfer learning with graph neural networks for short-term highway traffic forecasting. In 2020 25th International Conference on Pattern Recognition (ICPR) (2021), IEEE, pp. 10367–10374.
  • (34) Nie, Y., Nguyen, N. H., Sinthong, P., and Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730 (2022).
  • (35) Ouyang, X., Yang, Y., Zhou, W., Zhang, Y., Wang, H., and Huang, W. CityTrans: Domain-adversarial training with knowledge transfer for spatio-temporal prediction across cities. IEEE Transactions on Knowledge and Data Engineering (2023).
  • (36) Schirmer, M., Eltayeb, M., Lessmann, S., and Rudolph, M. Modeling irregular time series with continuous recurrent units. In International Conference on Machine Learning (2022), PMLR, pp. 19388–19405.
  • (37) Shao, Z., Zhang, Z., Wang, F., Wei, W., and Xu, Y. Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (2022), pp. 4454–4458.
  • (38) Shao, Z., Zhang, Z., Wang, F., and Xu, Y. Pre-training enhanced spatial-temporal graph neural network for multivariate time series forecasting. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022), pp. 1567–1577.
  • (39) Shao, Z., Zhang, Z., Wei, W., Wang, F., Xu, Y., Cao, X., and Jensen, C. S. Decoupled dynamic spatial-temporal graph neural network for traffic forecasting. Proceedings of the VLDB Endowment 15, 11 (2022), 2733–2746.
  • (40) Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., and Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538 (2017).
  • (41) Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-k., and Woo, W.-c. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems (2015), C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28, Curran Associates, Inc.
  • (42) Song, C., Lin, Y., Guo, S., and Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (2020), vol. 34, pp. 914–921.
  • (43) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
  • (44) Su, X., Qi, J., Tanin, E., Chang, Y., and Sarvi, M. Spatial-temporal forecasting for regions without observations. Advances in Database Technology – EDBT 27, 3 (2024), 488–500.
  • (45) Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence (2017), vol. 31.
  • (46) Tang, Y., Qu, A., Chow, A. H., Lam, W. H., Wong, S., and Ma, W. Domain adversarial spatial-temporal network: A transferable framework for short-term traffic forecasting across cities. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (2022), pp. 1905–1915.
  • (47) Veeriah, V., Zhuang, N., and Qi, G.-J. Differential recurrent neural networks for action recognition. In Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 4041–4049.
  • (48) Wang, B., Wang, P., Zhang, Y., Wang, X., Zhou, Z., Bai, L., and Wang, Y. Towards dynamic spatial-temporal graph learning: A decoupled perspective. In Proceedings of the AAAI Conference on Artificial Intelligence (2024), vol. 38, pp. 9089–9097.
  • (49) Wu, H., Hu, T., Liu, Y., Zhou, H., Wang, J., and Long, M. TimesNet: Temporal 2D-variation modeling for general time series analysis. In The Eleventh International Conference on Learning Representations (2023).
  • (50) Wu, Y., Zhuang, D., Labbe, A., and Sun, L. Inductive graph neural networks for spatiotemporal kriging. In Proceedings of the AAAI Conference on Artificial Intelligence (2021), vol. 35, pp. 4478–4485.
  • (51) Wu, Z., Pan, S., Long, G., Jiang, J., and Zhang, C. Graph WaveNet for deep spatial-temporal graph modeling. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (2019), IJCAI'19, AAAI Press, pp. 1907–1913.
  • (52) Xu, D., Wei, C., Peng, P., Xuan, Q., and Guo, H. GE-GAN: A novel deep learning framework for road traffic state estimation. Transportation Research Part C: Emerging Technologies 117 (2020), 102635.
  • (53) Yi, K., Zhang, Q., Fan, W., He, H., Hu, L., Wang, P., An, N., Cao, L., and Niu, Z. FourierGNN: Rethinking multivariate time series forecasting from a pure graph perspective. In Thirty-seventh Conference on Neural Information Processing Systems (2023).
  • (54) Yu, B., Yin, H., and Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875 (2017).
  • (55) Zeng, A., Chen, M., Zhang, L., and Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence (2023), vol. 37, pp. 11121–11128.
  • (56) Zhang, J., Zheng, Y., and Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. Proceedings of the AAAI Conference on Artificial Intelligence 31, 1 (Feb. 2017).
  • (57) Zhang, W., Zhang, P., Yu, Y., Li, X., Biancardo, S. A., and Zhang, J. Missing data repairs for traffic flow with self-attention generative adversarial imputation net. IEEE Transactions on Intelligent Transportation Systems 23, 7 (2021), 7919–7930.
  • (58) Zhao, Y., Luo, X., Ju, W., Chen, C., Hua, X.-S., and Zhang, M. Dynamic hypergraph structure learning for traffic flow forecasting. In IEEE International Conference on Data Engineering (ICDE) (2023).
  • (59) Zheng, C., Fan, X., Wang, C., Qi, J., Chen, C., and Chen, L. INCREASE: Inductive graph representation learning for spatio-temporal kriging. In Proceedings of the ACM Web Conference 2023 (2023), pp. 673–683.