Machine Learning Book Recommendations

I am often asked to recommend books on machine learning (ML).  Here is a WIP list of my recommendations: http://www.amazon.com/gp/registry/wishlist/1TLVK1WKJF47S/

 

Posted in Uncategorized | Leave a comment

Using AWS Lambda Alias with API Gateway

I tried to use an AWS Lambda function with a specific alias through API Gateway, as described here, by granting permission with a command like:

aws lambda add-permission --function-name arn:aws:lambda:us-east-1:xxxxxx:function:getData --source-arn arn:aws:execute-api:us-east-1:xxxxxx:t463p6g84d/*/POST/data --principal apigateway.amazonaws.com --statement-id  --action lambda:InvokeFunction

This resulted in an error:

aws: error: argument operation: Invalid choice, valid choices are:

This could be due to the following:

The CLI command must be issued with credentials that have permission to call the “add-permission” action of the Lambda APIs.

However, I still couldn’t quite get it to work. What did work was adding an `API Endpoint` from the Lambda console, which assigns the necessary permissions. It seems that you can change the endpoint later on and the permission will still work (I guess it is not attached to a specific endpoint). Also, don’t forget to re-deploy your API once you’ve made the changes.

 

 

keywords:

Add Permission to Lambda Function

You defined your Lambda function as a stage variable; you must manually give permissions to all the functions you will use. You can do this by running the below AWS CLI command for each function, replacing the stage variable in the function-name parameter with the necessary function name.

aws lambda add-permission --function-name arn:aws:lambda:us-east-1:1111111111:function:getSectorData:${stageVariables.lambdaAlias} --source-arn arn:aws:execute-api:us-east-1:1111111111:111/*/POST/data --principal apigateway.amazonaws.com --statement-id 111-4f1f-b5e5-111 --action lambda:InvokeFunction

 

 


AWS Lambda invoking from code

You can invoke a Lambda function from your code (details); you’d probably want to use the AWSLambdaAsyncClient.

CAUTION: you have to be very careful not to make recursive calls (unless that is your intention); otherwise you might get into an infinite loop. Even worse, if each invocation makes several recursive calls, the number of running Lambda instances can grow exponentially.

Here is a small snippet that attempts to ensure you are not making directly recursive calls (note that you might still end up with indirectly recursive calls). Use at your own risk. Add the following to your handler function:

// nextFunctionName: the name of the function you are about to invoke
if (context.getFunctionName == nextFunctionName) {
  throw new RuntimeException(context.getFunctionName + " is making a recursive lambda call")
}
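
The check above only catches a function invoking itself directly. One possible way to also bound indirect recursion (A calls B, B calls A) is to carry a hop counter in the payload and refuse to continue past a limit. The sketch below is an assumption, not part of any AWS API: `Invocation`, `maxHops`, and `nextInvocation` are hypothetical names, and the actual call through the Lambda client is left out.

```scala
// Hypothetical payload wrapper: `hops` counts how many Lambda-to-Lambda
// invocations have already happened in this chain.
final case class Invocation(functionName: String, payload: String, hops: Int)

object RecursionGuard {
  // Assumed upper bound on chain length; pick a value that fits your workflow.
  val maxHops: Int = 5

  // Returns the invocation to send next, or an error message if sending it
  // would be a direct self-call or would exceed the hop limit.
  def nextInvocation(current: Invocation,
                     nextFunctionName: String,
                     nextPayload: String): Either[String, Invocation] =
    if (current.functionName == nextFunctionName)
      Left(s"${current.functionName} is making a recursive lambda call")
    else if (current.hops + 1 > maxHops)
      Left(s"hop limit $maxHops exceeded when calling $nextFunctionName")
    else
      Right(Invocation(nextFunctionName, nextPayload, current.hops + 1))
}
```

A handler would build the next `Invocation` with this helper and only call the Lambda client when it gets a `Right`. If a function is meant to re-invoke itself, the first check would have to be dropped and the hop limit alone would bound the chain.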

 

 

 

keywords: call lambda from java scala code directly

 

 

 


DynamoDB: batch delete

Excerpt from my program; adapt as needed:

 

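import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient
import com.amazonaws.services.dynamodbv2.document.{DynamoDB, PrimaryKey, TableWriteItems}
import com.amazonaws.services.dynamodbv2.model.ScanRequest
import scala.collection.JavaConverters._

val client = new AmazonDynamoDBClient()
val dynamo = new DynamoDB(client)

// Note: a single scan call returns at most 1 MB of data; larger tables need paging.
val scanRequest = new ScanRequest()
  .withTableName(LocationRecord.TABLE_NAME)

val items = client.scan(scanRequest).getItems.asScala
// DynamoDB rejects larger batches: "Member must have length less than or equal to 25"
items.grouped(25).foreach { group =>
  val delItems = new TableWriteItems(LocationRecord.TABLE_NAME)
  group.foreach { item =>
    delItems.addPrimaryKeyToDelete(new PrimaryKey(LocationRecord.FLD_ID, item.get(LocationRecord.FLD_ID).getS))
  }
  dynamo.batchWriteItem(delItems)
}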
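
A batch write may come back with some requests unprocessed (for example when throttled), so a delete-all loop usually has to resubmit them. The sketch below shows only the batching-and-retry logic in plain Scala. `writeBatch` is a hypothetical stand-in for the real DynamoDB call and is assumed to return the keys that were not processed; `deleteAll` and `maxAttempts` are likewise illustrative names.

```scala
object BatchDelete {
  // Assumed limit, taken from the DynamoDB error message quoted above.
  val MaxBatchSize = 25

  // Deletes all keys in batches of at most MaxBatchSize.
  // `writeBatch` is a hypothetical stand-in for the DynamoDB call: it receives
  // one batch and returns the keys that were left unprocessed.
  // Returns the keys still unprocessed after `maxAttempts` tries per batch.
  def deleteAll[K](keys: Seq[K], maxAttempts: Int)(writeBatch: Seq[K] => Seq[K]): Seq[K] =
    keys.grouped(MaxBatchSize).flatMap { batch =>
      var pending = batch
      var attempt = 0
      while (pending.nonEmpty && attempt < maxAttempts) {
        pending = writeBatch(pending) // resubmit only what was not processed
        attempt += 1
      }
      pending
    }.toVector // force evaluation so every batch is actually written
}
```

In real code you would probably also wait (with some backoff) between attempts; that is left out here to keep the sketch short.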

 

see also: http://stackoverflow.com/questions/9154264/what-is-the-recomended-way-to-delete-a-large-number-of-items-from-dynamodb

keywords: aws dynamodb delete all records rows


does not map a @DynamoDBHashKey attribute; ensure a public, zero-parameter get method/field is annotated

Problem:

```
Class does not map a @DynamoDBHashKey attribute; ensure a public, zero-parameter get method/field is annotated
com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMappingException: Class does not map a @DynamoDBHashKey attribute; ensure a public, zero-parameter get method/field is annotated
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMappingsRegistry$Mappings.getHashKey(DynamoDBMappingsRegistry.java:245)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.needAutoGenerateAssignableKey(DynamoDBMapper.java:682)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.save(DynamoDBMapper.java:708)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.save(DynamoDBMapper.java:669)
```

Solution:

Add `@beanGetter` to the annotation, so that it lands on the getter generated by `@BeanProperty` rather than on the field: `@(DynamoDBHashKey @beanGetter)`

e.g.: `@(DynamoDBHashKey @beanGetter)(attributeName="ID") @BeanProperty var ID: String = null`

Problem:

```
The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: )
com.amazonaws.AmazonServiceException: The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: )
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1389)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:902)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2000)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1970)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.updateItem(AmazonDynamoDBClient.java:1798)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper$SaveObjectHandler.doUpdateItem(DynamoDBMapper.java:1095)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper$2.executeLowLevelRequest(DynamoDBMapper.java:795)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper$SaveObjectHandler.execute(DynamoDBMapper.java:974)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.save(DynamoDBMapper.java:824)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.save(DynamoDBMapper.java:669)
```

Solution:

`attributeName` needs to be provided for `DynamoDBHashKey` explicitly (even if it matches the field name).

e.g.: `@(DynamoDBHashKey @beanGetter)(attributeName="ID") @BeanProperty var ID: String = null`

 

 

com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMappingException: Class org.activeintel.al.crawler.LocationRecord does not map a @DynamoDBHashKey attribute; ensure a public, zero-parameter get method/field is annotated
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMappingsRegistry$Mappings.getHashKey(DynamoDBMappingsRegistry.java:245)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.needAutoGenerateAssignableKey(DynamoDBMapper.java:682)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.save(DynamoDBMapper.java:708)
at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.save(DynamoDBMapper.java:669)


AWS Lambda: extending execution time limit

At the time of writing, AWS Lambda limits execution time to 5 minutes. However, in quite a few cases it is possible to work around this limit by re-invoking your function iteratively. Here are brief details.

You can keep tabs on the remaining time with: com.amazonaws.services.lambda.runtime.Context#getRemainingTimeInMillis

Once you are almost out of time, you can re-invoke your own function directly from Lambda (details). Make sure to use the AWSLambdaAsyncClient and pass your function whatever it needs to continue, e.g. resumptionPoint, intermediateResults, etc. Also make sure you don’t get into an infinite loop (it’s going to be expensive).
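
The idea can be sketched in plain Scala, leaving out the actual re-invocation. Everything here is illustrative: `TimeBudget`, `Outcome`, and `safetyMarginMillis` are hypothetical names, the work items are plain integers, and `remainingMillis` stands in for `Context#getRemainingTimeInMillis`.

```scala
object TimeBudget {
  // Outcome of one invocation: either all work is done, or we stopped early
  // and report where the next invocation should resume.
  sealed trait Outcome
  final case class Done(results: Vector[Int]) extends Outcome
  final case class Resume(resumptionPoint: Int, intermediateResults: Vector[Int]) extends Outcome

  // Processes items starting at `resumptionPoint` until the work is finished
  // or the remaining time drops to `safetyMarginMillis` or below.
  // `remainingMillis` stands in for Context#getRemainingTimeInMillis.
  def run(items: Vector[Int],
          resumptionPoint: Int,
          intermediateResults: Vector[Int],
          safetyMarginMillis: Long,
          remainingMillis: () => Long)(process: Int => Int): Outcome = {
    var index = resumptionPoint
    var results = intermediateResults
    while (index < items.length && remainingMillis() > safetyMarginMillis) {
      results = results :+ process(items(index))
      index += 1
    }
    if (index >= items.length) Done(results)
    else Resume(index, results) // the caller would re-invoke the function with these values
  }
}
```

On `Resume`, the handler would pass `resumptionPoint` and `intermediateResults` to the next invocation. To avoid the infinite loop mentioned above, it seems wise to stop if an invocation returns the same `resumptionPoint` it started with, since that means no progress was made.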

 

 

 

keywords

Maximum execution duration per request 300 seconds

 

 


Using Google Scholar for managing your publications bibliography

Google Scholar (GS) is a very nifty tool. However, some of its features are not intuitive.

Export Your Publications from your profile

You can do that by clicking the checkbox (next to Title) in the header; the export icon appears once the checkbox is clicked.

Update/Edit Citation Information

Sometimes GS gets the citation details of some articles wrong. To edit them, you have to jump through a couple of hoops. Click on the “+ Add” button.

Then click on “Add article manually”

Fill in the details.

Then go back to the list of your papers, select the new one and the one that needs to be updated, and click merge.

 

 

 

bib bibtex zotero mendeley

edit citation information


AWS Lambda Reduce Jar Size

AWS Lambda limits the size of jar to 50 MB.

There are some things you can do to reduce the size of your jar.

use [[ProGuard|http://proguard.sourceforge.net/]]

! Summary
configuration:

`shrink.proguard`

to make fat jar:

`sbt assembly`

to make test jar

`sbt test:package`

to shrink file

`java -jar /opt/proguard/lib/proguard.jar @shrink.proguard`

to test that jar works:

`scala -J-Xmx4G -J-Xms4G -cp "/home/neil/.ivy2/cache/org.scalatest/scalatest_2.11/bundles/scalatest_2.11-2.2.6.jar:./target/scala-2.11/neatapp-assembly-0.1-SNAPSHOT_shrunk.jar:./target/scala-2.11/neatapp_2.11-0.1-SNAPSHOT-tests.jar" org.scalatest.run org.activeintel.MySpec`

! Configuring ProGuard

`shrink.proguard`

```conf
-injars your-assembly-0.1-SNAPSHOT.jar
-outjars your-assembly-0.1-SNAPSHOT_shrunk.jar
-libraryjars /usr/lib/jvm/java-8-oracle/jre/lib/rt.jar
-dontobfuscate
-dontwarn
-dontoptimize
-keep class org.activeintel.** {
    public protected private *;
}
-keep class org.something.else.** {
    public protected private *;
}
```

! Test

This step is very important to check the sanity of your shrunk jar.

!! Shrinking

First build the fat jar: `sbt assembly`

Then shrink it: `java -jar /opt/proguard/lib/proguard.jar @shrink.proguard`

Then run the tests against the shrunk jar:

```
scala -J-Xmx4G -J-Xms4G -cp "/home/neil/.ivy2/cache/org.scalatest/scalatest_2.11/bundles/scalatest_2.11-2.2.6.jar:/home/neil/tmp/agric/sector-graph-extractor-assembly-0.1-SNAPSHOT_shrunk.jar:/home/neil/tmp/agric/test-cases/target/scala-2.11/sector-graph-extractor-test_2.11-0.1-SNAPSHOT.jar" org.scalatest.run org.activeintel.sector.graph.extractor.SectorGraphExtractorSpec
```

!! Packaging Tests in a jar

modify `build.sbt` [[ref|http://stackoverflow.com/questions/16389446/compile-tests-with-sbt-and-package-them-to-be-run-later]]
```sbt
publishArtifact in (Test, packageBin) := true // jar of tests: `sbt test:package`
```

to package run:

`sbt test:package`

keywords: scala sbt reduce jar size shrink aws lambda smaller remove unused dependencies


scala: case class: overloaded method constructor with alternatives: cannot be applied

I had a case class with an overloaded constructor (needed for DynamoDB) and kept getting the following error:

overloaded method constructor with alternatives:
cannot be applied to (Double)

Cause: some of the fields had no default values, so none of the constructor alternatives matched the arguments supplied.

In some cases it could also be partly due to the case class limit of 22 fields.
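
A minimal sketch of the situation, under the assumption that the error came from calling the constructor with only some of the fields as named arguments. `Reading` and its fields are hypothetical. With a default on every field, the primary constructor accepts a subset of the arguments, while the explicit no-argument constructor (the kind a mapper usually needs) stays available.

```scala
// Hypothetical record type; every field has a default value.
final case class Reading(id: String = "", value: Double = 0.0) {
  // No-argument constructor, e.g. for frameworks that instantiate reflectively.
  def this() = this("", 0.0)
}

object ReadingDemo {
  // Only `value` is supplied; `id` falls back to its default.
  // If `id` had no default, neither constructor would accept this call, and
  // the compiler would likely report the "overloaded method constructor" error.
  val partial: Reading = new Reading(value = 1.5)
}
```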


TiddlyWiki MarkDown

I am a huge fan of TiddlyWiki; if you haven’t tried it yet, do give it a try.

It has a nice integration with Markdown, making it quite useful for portable code documentation.

There is a small omission in the tw-md documentation: for your tiddler you’ll need to specify “Type: text/x-markdown” at the bottom.

 

 
