A Simulation Study of Cross-Length Transfer of Non-Adjacent Dependencies based on Simple Recurrent Networks
Zhang Ruhai, Guo Xiuyan, Ling Xiaoli, Zheng Li, Jiang Shan, Zoltan Dienes
Journal of Psychological Science, 2026, Vol. 49, Issue (2): 282-288.
This study investigated whether simple recurrent networks (SRNs) could learn abstract non-adjacent dependencies and generalize them across sequences of different lengths. Building on previous findings that highlight the human ability to unconsciously acquire and transfer non-adjacent structural dependencies (Jiang & Guan, 2018), the simulations tested whether SRNs could reproduce this cross-length transfer.
SRNs were trained on tonal sequences derived from the “level/oblique” (ping/ze) categorization, reflecting prior cognitive categories available to human participants. The network architecture included input, hidden, and output layers, with feedback loops enabling temporal integration. A total of 150 SRN models were constructed by systematically varying three key parameters: the number of hidden units (5, 10, 15, 30, 60, or 120), learning rate (.1, .3, .5, .7, or .9), and momentum (.1, .3, .5, .7, or .9). Each model was subjected to 25 independent training sessions initialized with random weights, resulting in 3,750 simulations.
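For concreteness, the following is a minimal sketch of an Elman-style SRN exposing the three varied parameters (hidden units, learning rate, momentum). It is not the authors' implementation: the input/output coding, sigmoid output units, and one-step error truncation are illustrative assumptions.

```python
import numpy as np

class ElmanSRN:
    """Minimal Elman-style simple recurrent network (illustrative sketch).

    Predicts the next sequence element from the current input plus a copy
    of the previous hidden state (the context layer feedback loop).
    """

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, momentum=0.5, seed=0):
        rng = np.random.default_rng(seed)              # random initial weights
        self.W_xh = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.W_hh = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_hy = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.lr, self.momentum = lr, momentum
        self.vel = [np.zeros_like(W) for W in (self.W_xh, self.W_hh, self.W_hy)]

    def forward(self, xs):
        """Run one sequence; return the output activations at each step."""
        h = np.zeros(self.W_hh.shape[0])
        ys = []
        for x in xs:
            h = np.tanh(x @ self.W_xh + h @ self.W_hh)   # context feedback
            ys.append(1.0 / (1.0 + np.exp(-(h @ self.W_hy))))  # sigmoid outputs
        return np.array(ys)

    def train_step(self, xs, ts):
        """One pass over a sequence with one-step-truncated error
        propagation (a simplification) and a momentum weight update."""
        h_prev = np.zeros(self.W_hh.shape[0])
        grads = [np.zeros_like(W) for W in (self.W_xh, self.W_hh, self.W_hy)]
        for x, t in zip(xs, ts):
            h = np.tanh(x @ self.W_xh + h_prev @ self.W_hh)
            y = 1.0 / (1.0 + np.exp(-(h @ self.W_hy)))
            dy = y - t                                   # output error
            dh = (self.W_hy @ dy) * (1.0 - h ** 2)       # back through tanh
            grads[0] += np.outer(x, dh)
            grads[1] += np.outer(h_prev, dh)
            grads[2] += np.outer(h, dy)
            h_prev = h
        # momentum update: v <- m*v - lr*grad; W <- W + v
        for i, W in enumerate((self.W_xh, self.W_hh, self.W_hy)):
            self.vel[i] = self.momentum * self.vel[i] - self.lr * grads[i]
            W += self.vel[i]
```

Under this reading, the study's 150 models correspond to the 6 × 5 × 5 grid of hidden-layer sizes, learning rates, and momenta, each re-initialized with 25 random seeds.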
Models were trained exclusively on sequences of length 10 and subsequently tested on sequences of lengths 8, 10, and 12. Learning performance was assessed with cosine similarity between the network outputs and the target sequences, and z-scores were calculated to quantify discrimination between grammatical and ungrammatical strings. Human benchmark data were sourced from Jiang and Guan (2018).
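The two scoring steps can be sketched as follows; flattening each sequence into a single vector and the pooled-SD form of the z statistic are assumptions, since the abstract does not give the exact formulas.

```python
import numpy as np

def cosine_score(output, target):
    """Cosine similarity between a network's output sequence and the
    target sequence, with both flattened into single vectors (assumed)."""
    o, t = np.ravel(output), np.ravel(target)
    return o @ t / (np.linalg.norm(o) * np.linalg.norm(t))

def discrimination_z(grammatical_scores, ungrammatical_scores):
    """z statistic for how well a model separates grammatical from
    ungrammatical test strings (pooled-SD effect size, assumed form)."""
    g = np.asarray(grammatical_scores)
    u = np.asarray(ungrammatical_scores)
    pooled_sd = np.sqrt((g.var(ddof=1) + u.var(ddof=1)) / 2)
    return (g.mean() - u.mean()) / pooled_sd
```

On this reading, a model counts as "human-like" when its z statistic falls inside the range of scores observed for Jiang and Guan's (2018) participants.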
The results revealed that trained SRNs significantly outperformed untrained models across all sequence lengths, confirming the successful acquisition of the nonlocal dependencies. Furthermore, a notable number of SRNs exhibited discrimination performance that fell within the typical human range: 35 models for the 8-element test set, 23 models for the 10-element set, and 38 models for the 12-element set. Notably, several SRN models demonstrated consistent human-like behavior across both trained and novel lengths. Specifically, five models aligned with human data in both the 8- and 10-length tests, and four models aligned in both the 10- and 12-length tests.
These findings suggest that, under specific parameter settings, SRNs were capable not only of learning abstract non-adjacent dependencies but also of transferring them flexibly to structurally novel sequences. Whereas earlier studies primarily demonstrated that SRNs can learn fixed-length correspondences, this study highlighted their potential to acquire variable-to-variable mappings, reflecting the concept of “operations over variables” proposed by Marcus (2001).
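Marcus's phrase denotes rules defined over variables rather than over specific tokens; the canonical illustration from that literature is an ABA pattern that any tokens can instantiate. A toy check (not from the present study):

```python
def is_aba(seq):
    """True when a 3-element sequence instantiates the ABA pattern,
    whatever the actual tokens are (a rule over variables)."""
    a, b, a2 = seq
    return a == a2 and a != b

print(is_aba(["ga", "ti", "ga"]))  # True: novel tokens still satisfy the rule
print(is_aba(["wo", "fe", "fe"]))  # False: ABB, not ABA
```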
The introduction of tonal category labels (ping/ze) as non-terminal markers likely provided a cognitive scaffold that facilitated the abstraction of structural rules. This approach mirrored how human learners leveraged prior conceptual knowledge to enhance statistical learning, offering insights into the interaction between prior knowledge and the acquisition of novel patterns.
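As a purely illustrative picture of this scaffolding (not the study's actual input scheme), each tone could be coded by its identity plus a single ping/ze category unit serving as the non-terminal marker; the tone-to-category assignment below follows the common modern-Mandarin convention (tones 1-2 level, tones 3-4 oblique).

```python
import numpy as np

# Mandarin tones 1-4; tones 1 and 2 are "level" (ping), 3 and 4 "oblique" (ze).
PING = {1, 2}

def encode_tone(tone):
    """One-hot tone identity plus an appended ping/ze category unit
    acting as a non-terminal marker (illustrative encoding)."""
    vec = np.zeros(5)
    vec[tone - 1] = 1.0                        # tone identity
    vec[4] = 1.0 if tone in PING else 0.0      # category marker
    return vec
```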
From a computational modeling perspective, the results implied that SRNs, despite their architectural simplicity, could mimic key aspects of human implicit learning, including structural abstraction and transfer. Furthermore, the ability of some SRNs to perform comparably to humans under specific conditions supported the use of SRNs as viable models for studying the cognitive mechanisms underlying implicit knowledge acquisition and generalization.
This study broadly contributed to bridging cognitive psychology and artificial intelligence research. The findings suggested that relatively simple recurrent architectures possess latent capacities for flexible generalization, an essential feature for developing AI systems capable of human-like learning. Additionally, by examining SRNs’ behavior on non-finite-state structures resembling those found in natural language, the study enriched our understanding of how internal memory dynamics support the processing of complex structures.
Overall, the present work advanced the field by systematically demonstrating the capacity of SRNs for abstract, nonlocal dependency learning and structural transfer. It provided empirical evidence for their utility in modeling implicit learning processes and contributed to the theoretical foundations of future cognitive and AI modeling efforts.
Key words: simple recurrent networks / non-adjacent dependencies / transfer
References

[1] Dai, H., Zhu, C., & Liu, D. (2018). Is implicit knowledge abstract? Evidence from the transfer of implicit sequence learning. Acta Psychologica Sinica, 50(9), 965-974.
Whether implicit knowledge is abstract and generalizable has been disputed, and transfer is an effective test of whether acquired knowledge is abstract. The study examined differences in the transfer of implicit sequence learning under five response-stimulus interval (RSI) conditions ranging from 0 ms to 1000 ms, aiming to show that transfer emerges as the RSI changes and thereby to demonstrate the abstractness of implicit knowledge through transfer. Results showed that as the RSI increased, transfer underwent a qualitative change from absent to present, demonstrating that implicit knowledge can be abstract. The implicit sequence learning effect and the novel-stimulus effect of the transfer blocks jointly promoted transfer: pure implicit sequence learning was a necessary but not sufficient condition for transfer, while the transfer blocks (novel stimuli) accelerated the learning of implicit knowledge. The unknowable but transferable implicit knowledge produced under these experimental conditions showed characteristics of fringe consciousness.
[2] Guo, X. (2003). Implicit learning and tacit knowledge. Educational Research, 12, 31-36.
[3] Guo, X., & Yang, Z. (2002). The research history of implicit learning. Psychological Development and Education, 17(3), 85-90.
[4] Jiang, S., & Guan, S. (2018). Implicit learning of Chinese tonal level-mapping rules and its length transfer effect. Psychological Exploration, 38(4), 326-332.
[5] Li, F., & Liu, B. (2018). What kind of memory buffer does the implicit learning of nonlocal rules use? Evidence from neural network simulations. Journal of Psychological Science, 41(4), 796.
[6] Sun, P., Li, X., Zhang, Q., Shang, H., & Ling, X. (2022). The effect of sleep on the offline consolidation of implicit perceptual and motor sequence learning. Acta Psychologica Sinica, 54(12), 1467-1480.
Learning that occurs during the offline period is called offline consolidation: after knowledge is first acquired, its memory trace remains stable or even improves without additional practice. Previous studies have preliminarily examined the effect of sleep on the offline consolidation of implicit perceptual and motor sequence learning, but they did not fully separate perceptual from motor sequences, so whether sequence type moderates the effect of sleep on the offline consolidation of implicit sequence learning required further investigation. Moreover, research on explicit learning has found that complex sequences benefit from sleep more readily than simple ones, showing sleep-based offline consolidation; whether sequence complexity likewise moderates this effect for implicit perceptual and motor sequence learning was unclear. Accordingly, with perceptual and motor sequences fully separated, this study manipulated sequence length and structure across three experiments, creating sequence rules at three levels of complexity. For motor sequences, offline consolidation occurred regardless of sleep when the rule was simple, whereas complex motor rules showed offline consolidation only after sleep; for perceptual sequences, no offline consolidation occurred at any level of difficulty. These results indicate that the sleep-based offline consolidation of implicit sequence knowledge is moderated by sequence type and complexity, providing a new perspective on the debate over offline consolidation in implicit learning.
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
The dominant theory of what people can learn implicitly is that they learn chunks of adjacent elements in sequences. A type of musical grammar that goes beyond specifying allowable chunks is provided by serialist or 12‐tone music. The rules constitute operations over variables and could not be appreciated as such by a system that can only chunk elements together. A series of studies investigated the extent to which people could implicitly (or explicitly) learn the structures of serialist music. We found that people who had no background in atonal music did not learn the structures, but highly selected participants with an interest in atonal music could implicitly learn to detect melodies instantiating the structures. The results have implications for both theorists of implicit learning and composers who may wish to know which structures they put into a piece of music can be appreciated.
[15]
[16]
Jamieson and Mewhort (2009b) proposed an account of performance in the artificial-grammar judgement-of-grammaticality task based on Hintzman's (1986) model of retrieval, Minerva 2. In the account, each letter is represented by a unique vector of random elements, and each exemplar is represented by concatenating its constituent letter vectors. Although successful in simulating several experiments, Kinder (2010) showed that the model fails for three selected experiments. We track the model's failure to a constraint introduced by concatenating letter vectors to construct the exemplar representation. To fix the problem, we use a holographic representation. Holographic representation not only provides the flexibility missing with the concatenation scheme but also acknowledges variability in what subjects notice when they inspect training exemplars. Armed with holographic representations, we show that the model successfully captures the three problematic data sets. We argue for retrospective accounts, like the present one, that acknowledge subjects' skill in drawing unexpected inferences based on memory of studied items against prospective accounts that require subjects to learn statistical regularities in the training set in anticipation of an undefined classification test.
[17]
The study aims to help characterize the sort of structures about which people can acquire unconscious knowledge. It is already well established that people can implicitly learn n-grams (chunks) and also repetition patterns. We explore the acquisition of unconscious structural knowledge of symmetry. Chinese Tang poetry uses a specific sort of mirror symmetry, an inversion rule with respect to the tones of characters in successive lines of verse. We show, using artificial poetry to control both n-gram structure and repetition patterns, that people can implicitly learn to discriminate inversions from non-inversions, presenting a challenge to existing models of implicit learning.
[18]
An artificial grammar learning experiment is reported which investigated whether three types of information are learned during this kind of task: information about the positions of single letters, about fragments of training strings, and about entire training strings. Results indicate that participants primarily learned information about string fragments and, to a lesser extent, information about positions of letters. Two connectionist models, an autoassociator and a simple recurrent network (SRN), were tested on their ability to account for these results. In the autoassociator simulations, similarity of test items to entire training items had a large effect, which was at variance with the experimental results. The results of the SRN simulations almost perfectly matched the experimental ones.
[19]
[20]
This paper addresses the nature of the temporary storage buffer used in implicit or statistical learning. Kuhn and Dienes [Kuhn, G., and Dienes, Z. (2005). Implicit learning of nonlocal musical rules: implicitly learning more than chunks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(6), 1417-1432] showed that people could implicitly learn a musical rule that was solely based on non-local dependencies. These results seriously challenge models of implicit learning that assume knowledge merely takes the form of linking adjacent elements (chunking). We compare two models that use a buffer to allow learning of long distance dependencies, the Simple Recurrent Network (SRN) and the memory buffer model. We argue that these models - as models of the mind - should not be evaluated simply by fitting them to human data but by determining the characteristic behaviour of each model. Simulations showed for the first time that the SRN could rapidly learn non-local dependencies. However, the characteristic performance of the memory buffer model rather than the SRN more closely matched how people came to like different musical structures. We conclude that the SRN is more powerful than previous demonstrations have shown, but its flexible learned buffer does not explain people's implicit learning (at least, the affective learning of musical structures) as well as fixed memory buffer models do.
[21]
[22]
[23]
[24]
[25]
Sports are replete with strategies, yet coaching lore often emphasizes 'quieting the mind', 'trusting the body' and 'avoiding overthinking' in referring to the importance of relying less on high-level explicit strategies in favor of low-level implicit motor learning. We investigated the interactions between explicit strategy and implicit motor adaptation by designing a sensorimotor learning paradigm that drives adaptive changes in some dimensions but not others. We find that strategy and implicit adaptation synergize in driven dimensions, but effectively cancel each other in undriven dimensions. Independent analyses-based on time lags, the correlational structure in the data and computational modeling-demonstrate that this cancellation occurs because implicit adaptation effectively compensates for noise in explicit strategy rather than the converse, acting to clean up the motor noise resulting from low-fidelity explicit strategy during motor learning. These results provide new insight into why implicit learning increasingly takes over from explicit strategy as skill learning proceeds.
[26]
[27]
[28]
[29]
[30]
[31]
My goal in this article is to encourage greater integration of the fields of AI and cognitive psychology by reviewing work on shared interests. I begin with examples that link my early research related to AI with my current efforts to organize knowledge in the cognitive sciences. I then describe how cognitive psychologists have contributed to the methods explained in The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World (Domingos, 2015), including how these methods can be combined. The final section discusses three benefits of building bridges: using computational models in AI as theoretical models in cognitive psychology, solving joint computational problems, and facilitating the interactions between people and machines.
[32]
Implicit learning is a core process for the acquisition of a complex, rule-based environment from mere interaction, such as motor action, skill acquisition, or language. A body of evidence suggests that implicit knowledge governs music acquisition and perception in nonmusicians and musicians, and that both expert and nonexpert participants acquire complex melodic, harmonic, and other features from mere exposure. While current findings and computational modeling largely support the learning of chunks, some results indicate learning of more complex structures. Despite the body of evidence, more research is required to support the cross-cultural validity of implicit learning and to show that core and more complex music theoretical features are acquired implicitly.
[33]
[34]
Although many studies have provided evidence that abstract knowledge can be acquired in artificial grammar learning, it remains unclear how abstract knowledge can be attained in sequence learning. To address this issue, we proposed a dual simple recurrent network (DSRN) model that includes a surface SRN encoding and predicting the surface properties of stimuli and an abstract SRN encoding and predicting the abstract properties of stimuli. The results of Simulations 1 and 2 showed that the DSRN model can account for learning effects in the serial reaction time (SRT) task under different conditions, and the manipulation of the contribution weight of each SRN accounted for the contribution of conscious and unconscious processes in inclusion and exclusion tests in previous studies. The results of human performance in Simulation 3 provided further evidence that people can implicitly learn both chunking and abstract knowledge in sequence learning, and the results of Simulation 3 confirmed that the DSRN model can account for how people implicitly acquire the two types of knowledge in sequence learning. These findings extend the learning ability of the SRN model and help understand how different types of knowledge can be acquired implicitly in sequence learning.