Simian Documentation
Download Simian and extract the binary file, then use these instructions to run the Similarity Analyzer tool:
Command Line Interface
Simian's command line interface allows you to run it from a shell, shell script, or batch file, scanning a directory for all files matching a pattern.
The general form for the Java version is:
java -jar simian.jar [options] [files]
Files can be specified as any regular shell glob or simply as a list of files, and can be
mixed with the -includes option. The examples below show only the arguments that follow java -jar simian.jar.
For example, to find all Java files in all subdirectories of the current directory:
"**/*.java"
To find all Java files in the current directory and set the threshold to 3:
-threshold=3 "*.java"
To find all C# files in the current directory: "*.cs"
To find all C source and header files in all subdirectories of the current directory: "**/*.c" "**/*.h"
To find all Java files in two different directories:
"/dir1/*.java" "/dir2/*.java"
To find all Java files in all subdirectories, excluding test classes:
-includes=**/*.java -excludes=**/*Test.java
To find all Java files in the current directory and ignore numbers:
-ignoreNumbers "*.java"
To find all Ruby files and display the results in XML format:
-formatter=xml "*.rb"
To find all Ruby files and send the results to a file in Emacs-compatible format:
-formatter=emacs:c:\temp\simian.log "*.rb"
To read the configuration from a file in which each line specifies at most one valid
command-line argument: -config=simian.config
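A config file can also be handed to the simianCheck Gradle task described under Gradle Task. The sketch below assumes each line of simian.config is written exactly as that argument would appear on the command line; the file contents shown in the comment are purely illustrative, and the file location in the project root is an assumption.

```kotlin
tasks.register<JavaExec>("simianCheck") {
    // Illustrative simian.config in the project root, one argument per line:
    //   -threshold=6
    //   -ignoreNumbers
    //   -includes=src/main/java/**/*.java
    val configFile = layout.projectDirectory.file("simian.config").asFile
    // Workaround: with "-jar" as the main class, Gradle launches "java -jar <simian jar> <args>"
    mainClass.set("-jar")
    args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
        "-config=${configFile.absolutePath}",
    )
}
```

Keeping the options in a file lets the command line and the build share one set of settings.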
Notes
The default JVM heap size should be adequate for most projects. If Simian fails with the error below,
increase the heap size with the JVM's maximum-heap option (-mx, spelled -Xmx on current JVMs), for example java -Xmx1g -jar simian.jar [options] [files].
Exception in thread "main" java.lang.OutOfMemoryError
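When Simian runs through the simianCheck Gradle task described under Gradle Task, the heap limit has to be set on the JVM that the task forks, not on Gradle itself. A minimal sketch, assuming the task is registered as a JavaExec as shown there; the 1g value is an arbitrary example:

```kotlin
tasks.named<JavaExec>("simianCheck") {
    // Passed to the forked JVM as -Xmx1g, ahead of "-jar <simian jar>"
    maxHeapSize = "1g"
}
```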
Gradle Task
This method lets you integrate Simian with the Gradle Build Tool, an open-source build system for Java, Android, and Kotlin development.
The following examples show the steps needed to use Simian to check a Java project for code duplication.
Note: Adjust the path of the Simian jar to suit your environment. It is recommended to host the jar in a Maven repository rather than in a local folder as shown here.
Add Simian to the list of dependencies in your build.gradle.kts file:
dependencies {
...
implementation("simian:simian:4.0.0")
...
}
Define a simianCheck task to run the checker with all default options:
tasks.register<JavaExec>("simianCheck") {
    // One -includes pattern per main source directory
    val inclFiles = sourceSets["main"].java.srcDirs.map { "-includes=${it.path}/**/*.java" }
    // Workaround: with "-jar" as the main class, Gradle launches "java -jar <simian jar> <args>"
    mainClass.set("-jar")
    args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
        *inclFiles.toTypedArray(),
    )
}
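To run the duplication check as part of a normal ./gradlew check, the task can be attached to the lifecycle check task. This sketch assumes the java plugin (or another plugin that provides check) is applied:

```kotlin
tasks.named("check") {
    // Run Simian whenever the standard verification tasks run
    dependsOn("simianCheck")
}
```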
To exclude test classes if they exist in the same tree as the source:
val exclFiles = sourceSets["main"].java.srcDirs.map { "-excludes=${it.path}/**/*Test.java" }
args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
*inclFiles.toTypedArray(),
*exclFiles.toTypedArray(),
)
To change the minimum number of lines that are considered a match:
...
args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
*inclFiles.toTypedArray(),
"-threshold=6",
)
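Rather than hard-coding the threshold, it can be read from a Gradle property so it can be tuned per run. A sketch, where simianThreshold is a hypothetical property name and 6 mirrors the default listed under Simian Processing Options:

```kotlin
tasks.register<JavaExec>("simianCheck") {
    val inclFiles = sourceSets["main"].java.srcDirs.map { "-includes=${it.path}/**/*.java" }
    // Hypothetical property name; falls back to Simian's default threshold of 6
    val threshold = providers.gradleProperty("simianThreshold").getOrElse("6")
    mainClass.set("-jar")
    args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
        *inclFiles.toTypedArray(),
        "-threshold=$threshold",
    )
}
```

With this variant, ./gradlew simianCheck -PsimianThreshold=3 runs the check with a threshold of 3, and the value can also be set in gradle.properties.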
To force the language used for processing:
...
args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
"-includes=path/to/sources/**/*.*",
"-language=java",
)
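If only files with unrecognized extensions need a language, the defaultLanguage option from the Simian Processing Options table is the gentler choice: it applies only when Simian cannot infer the language itself. A sketch using the same placeholder source path:

```kotlin
tasks.register<JavaExec>("simianCheck") {
    mainClass.set("-jar")
    args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
        "-includes=path/to/sources/**/*.*",
        // Recognized extensions keep their own language;
        // files Simian cannot classify are processed as Java
        "-defaultLanguage=java",
    )
}
```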
To make the build fail if one or more matches are found:
...
args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
"-includes=path/to/sources/**/*.*",
"-failOnDuplication",
)
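Conversely, to have Simian report duplication without breaking the build, Gradle's JavaExec can be told to ignore the process exit status. This sketch assumes Simian signals duplication through a non-zero exit code, which is how a failing JavaExec task normally breaks a Gradle build:

```kotlin
tasks.named<JavaExec>("simianCheck") {
    // Do not fail the Gradle build when Simian exits with a non-zero status;
    // Simian's output still appears in the build log
    isIgnoreExitValue = true
}
```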
To set a build property if one or more matches are found:
...
args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
"-includes=path/to/sources/**/*.*",
"-failureProperty=test.failure",
)
By default, Simian produces output in plain text. You can override this with the -formatter
argument, which takes a type (plain, xml, emacs, vs, or yaml) and an optional output file name,
separated by a colon. For example, to send output to a file:
...
args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
"-includes=path/to/sources/**/*.*",
"-formatter=plain:simian-log.txt",
)
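Writing the report under Gradle's build directory keeps it next to other generated reports and lets ./gradlew clean remove it. A sketch, where build/reports/simian/simian.txt is only a suggested location and the directory is created up front in case Simian does not create missing parent directories:

```kotlin
tasks.register<JavaExec>("simianCheck") {
    // Suggested location: build/reports/simian/simian.txt
    val reportFile = layout.buildDirectory.file("reports/simian/simian.txt").get().asFile
    mainClass.set("-jar")
    args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
        "-includes=path/to/sources/**/*.*",
        "-formatter=plain:${reportFile.absolutePath}",
    )
    // Make sure the report directory exists before Simian writes to it
    doFirst { reportFile.parentFile.mkdirs() }
}
```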
To produce XML output:
...
args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
"-includes=path/to/sources/**/*.*",
"-formatter=xml:simian-log.xml",
)
You may specify any number of formatters, which lets you produce both XML and plain-text output if necessary.
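A sketch that passes two -formatter arguments in a single run, one plain-text log and one XML report. It assumes the command-line version accepts repeated -formatter arguments the same way; check this against your Simian version.

```kotlin
tasks.register<JavaExec>("simianCheck") {
    mainClass.set("-jar")
    args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
        "-includes=path/to/sources/**/*.*",
        // Two formatters in one run: a plain-text log and an XML report
        "-formatter=plain:simian-log.txt",
        "-formatter=xml:simian-log.xml",
    )
}
```

Relative file names resolve against the task's working directory, which for JavaExec defaults to the project directory.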
Simian Processing Options
| Option | Languages | Default | Possible values | Description |
|---|---|---|---|---|
| formatter | all | none | plain, xml, emacs, vs (visual studio), yaml, null | Specifies the format in which processing results will be produced. |
| threshold | all | 6 | integer >= 2 | Matches will contain at least the specified number of lines. |
| language | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes all files are in the specified language |
| defaultLanguage | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes files are in the specified language if none can be inferred |
| failOnDuplication | all | true | boolean | Causes the checker to fail the current process if duplication is detected |
| reportDuplicateText | all | false | boolean | Prints the duplicate text in reports |
| ignoreBlocks | all | none | string | Ignores all lines between specified START/END markers |
| ignoreCurlyBraces | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Curly braces are ignored. |
| ignoreIdentifiers | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | Completely ignores all identifiers. |
| ignoreIdentifierCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | Matches identifiers irrespective of case. E.g. MyVariableName and myvariablename would both match. |
| ignoreRegions | C# | false | boolean | Ignore lines between #region/#endregion. |
| ignoreStrings | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | Completely ignores the contents of string literals. E.g. "Hello" and "Goodbye" would both match. |
| ignoreStringCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | true | boolean | "Hello, World" and "HELLO, WORLD" would both match. |
| ignoreNumbers | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | int x = 1; and int x = 576; would both match. |
| ignoreCharacters | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | 'A' and 'Z' would both match. |
| ignoreCharacterCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | 'A' and 'a' would both match. |
| ignoreLiterals | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | 'A', "one" and 27.8 would all match. |
| ignoreSubtypeNames | Java, C, Groovy | false | boolean | BufferedReader, StringReader and Reader would all match. |
| ignoreModifiers | Java, C#, C, C++, JavaScript, Groovy | true | boolean | Ignores modifiers such as public, protected, static, etc. |
| ignoreVariableNames | Java, C, Groovy | false | boolean | Completely ignores variable names (field, parameter and local). E.g. int foo = 1; and int bar = 1; would both match. |
| balanceParentheses | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | Ensures that expressions inside parentheses that are split across multiple physical lines are considered as one. |
| balanceCurlyBraces | Ruby | false | boolean | Ensures that expressions inside curly braces that are split across multiple physical lines are considered as one. |
| balanceSquareBrackets | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Ensures that expressions inside square brackets that are split across multiple physical lines are considered as one. |
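Processing options are passed to the Gradle task like any other argument. A sketch, assuming boolean options are switched on simply by naming them, as -ignoreNumbers is in the command-line examples; the particular combination and the threshold of 4 are arbitrary:

```kotlin
tasks.register<JavaExec>("simianCheck") {
    val inclFiles = sourceSets["main"].java.srcDirs.map { "-includes=${it.path}/**/*.java" }
    mainClass.set("-jar")
    args(layout.projectDirectory.file("libs/simian-4.0.0.jar").asFile.absolutePath,
        *inclFiles.toTypedArray(),
        "-threshold=4",         // report runs of 4 or more similar lines
        "-ignoreNumbers",       // int x = 1; matches int x = 576;
        "-ignoreStrings",       // string literal contents are not compared
        "-balanceParentheses",  // parenthesized expressions split across lines count as one
    )
}
```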
Simian recognizes the following file extensions:
| Language | Extensions |
|---|---|
| Java | java |
| C Sharp | cs, c#, csharp |
| C | c, h, m |
| C++ | cpp, hpp, cplusplus, inl |
| Ruby | rb, ruby |
| COBOL | cobol |
| ABAP | abap |
| XML | xml, xsl, xsd |
| Jakarta Server Pages | jsp |
| ASP | asp |
| JavaScript | js, javascript |
| HTML | html, htm |
| Visual Basic | vb, bas, cls, frm |
| Lisp | lisp, lsp |
| Groovy | groovy |
| Plain Text | default when language cannot be determined |